Why NLP Still Matters in the Age of AI Agents
A language-first view of modern AI systems (2026). (Source: "Companies Bring AI Agents to Healthcare," WSJ.) A system that sounds simple—until it isn't: imagine a health system designing a conversational AI service for telemedicine. Patients describe symptoms, concerns, and fragments of medical history in free text. The system responds conversationally, drawing on prior encounters, internal documentation, and clinical guidelines. It answers routine questions, summarizes relevant context, and—when appropriate—routes cases to clinicians. ...
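As a purely illustrative aside (not from the post): the triage step described above can be modeled as a small, explicit decision type. This is a hypothetical F# sketch; the union cases, the `routeMessage` function, and the keyword heuristic are my own assumptions, not the system's actual design.

```fsharp
// Hypothetical sketch of the triage decision described in the post:
// answer routine questions, summarize prior context, or escalate to a clinician.
// The cases and the naive keyword heuristic are illustrative only.
type Route =
    | AnswerDirectly of question: string
    | SummarizeContext of patientId: string
    | EscalateToClinician of reason: string

let routeMessage (patientId: string) (message: string) : Route =
    let text = message.ToLowerInvariant()
    if text.Contains("chest pain") || text.Contains("shortness of breath") then
        EscalateToClinician "possible urgent symptom"
    elif text.Contains("summary") || text.Contains("last visit") then
        SummarizeContext patientId
    else
        AnswerDirectly message

// Usage (illustrative):
// routeMessage "p-001" "I have had mild chest pain since yesterday" |> printfn "%A"
```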
From Vague to Vivid
Opening Happy New Year! In my previous post, My Winter NLP Journey, I wrote about motivation: why I wanted to build an old model like GPT-2 and why implementation feels like the fastest path to understanding. I am glad to report I completed the journey as planned. This post is a reflection on what actually changed in my head after working through the course on my own. The biggest gain was simple to name but hard to achieve: several ideas that were vague for years finally became clear. ...
My Winter NLP Journey
The Gradient That Changed Everything It was almost a coincidence. On Christmas Eve this year, I came across a math problem asking me to compute the partial derivatives of Word2Vec’s naive softmax loss — standard fare for any NLP course. But something compelled me to keep going, to really understand what these update rules were doing. The result was deceptively simple: $$ \frac{\partial J}{\partial v_c} = -u_o + \sum_{w\in V} \Pr[w|c] \, u_w = U(\hat{y} - y) $$ What struck me wasn’t the math itself, which is elegant but straightforward. What caught my attention was the structure of the learning process. This formulation says that updating the center word vector $v_c$ requires knowing the current state of all context vectors $U$. But updating $U$ requires knowing $v_c$ (the partial derivative with respect to $U$ is the other piece of the puzzle). This chicken-and-egg dependency — where each parameter set treats the other as temporarily fixed — is reminiscent of Expectation-Maximization algorithms. It isn’t EM in the formal sense, but the alternating dependence—treating one parameter block as fixed while updating the other—shares the same structural intuition. ...
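For reference, here is the other half of that derivative pair, written in the same notation as above, under the assumption (consistent with the quoted gradient) that $U$ collects the outside vectors $u_w$ as columns, $\hat{y}$ is the softmax distribution with $\hat{y}_w = \Pr[w|c]$, and $y$ is the one-hot indicator of the observed outside word $o$:

$$ \frac{\partial J}{\partial u_w} = (\hat{y}_w - y_w)\, v_c, \qquad \frac{\partial J}{\partial U} = v_c\,(\hat{y} - y)^{\top} $$

Each gradient depends on the other parameter block held at its current value, which is exactly the alternating structure the post points to.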
Post-mortem on the Nov 2025 Cloudflare Outage: When a Single Assumption Went Global
An FP Practitioner’s Perspective on the Cloudflare Feature-File Outage Disclaimer: I am not affiliated with Cloudflare. This analysis is based entirely on publicly available incident reports and technical discussions. The architectural reconstructions and code examples represent my educated interpretation of what likely occurred, informed by functional programming principles and my own experience with similar failure modes. Where I speculate beyond published details, I have tried to make those inferences explicit. ...
AoC 2025 Final Reflection: Serious FP Journey
The Final Commit For the first time since Advent of Code began, the event ends on 12 Dec 2025 instead of on Christmas Day. Twelve days instead of twenty-five. My final solution—Day 12’s NP-hard tiling problem—took over 15 seconds to run, a humbling reminder that not every problem yields to elegance. But it works, and that’s what matters. The shortened 12-day format changed the rhythm entirely: less time to “warm up,” less room for recovery, and far more emphasis on momentum and clarity of thought. In that sustained sprint, F# didn’t just “work”—it got out of the way. Looking back at my commit history, I wrote less code this year than in 2024, but I enjoyed it more. ...
AoC 2025 Midpoint Review: How F# Clicks For Me
We are at the halfway mark of this year’s shortened 12-day Advent of Code. As I wrote three weeks ago, I decided to run an experiment: I abandoned my usual comfortable tool (Python) and my previous “challenge” tool (Rust) to solve everything in F#. In 2023 and 2024, I solved AoC in Rust. I treated it as a software engineering exercise: structured projects, cargo run --example dayXX, and strict memory discipline. ...
Working Toward Robustness (F# c-MLE Project Part 2)
How a lucky random number sequence hid a huge error until I tested on a different platform — and what it taught me about production numerical code A few days ago, I wrote about implementing c-MLE in F# as my first substantial project in the language. The code worked. Tests passed. The optimizer converged to reasonable parameter estimates with violations driven to machine precision. Then, on a whim, I tested it on my M1 Max. Same code. Same seed. Same data. The estimated parameter was off by 38%. ...
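Checking a single seed let a lucky draw hide the bug, so here is a minimal sketch of the kind of multi-seed stability check that might have surfaced it earlier. The `estimate` wrapper, the seed list, and the tolerance are hypothetical, not code from the project.

```fsharp
// Hypothetical regression check: run the estimator under several seeds
// and require the estimates to agree within a relative tolerance.
// `estimate : int -> float` is an assumed wrapper around the c-MLE fit.
let relativeSpread (xs: float list) =
    let lo, hi = List.min xs, List.max xs
    (hi - lo) / max (abs lo) 1e-12

let checkSeedStability (estimate: int -> float) (seeds: int list) (tol: float) =
    let estimates = seeds |> List.map estimate
    let spread = relativeSpread estimates
    if spread > tol then
        failwithf "Estimates vary by %.2f%% across seeds (tolerance %.2f%%)"
                  (spread * 100.0) (tol * 100.0)
    else
        printfn "Seed-stability check passed (spread %.4f%%)" (spread * 100.0)

// Usage (illustrative): checkSeedStability runFit [1; 2; 3; 42] 0.01
```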
My First F# Project: Implementing Constrained Optimization from Scratch
One week ago, I wrote about why I’m learning F# — a language I may never use professionally, but one I believe will change how I think in the languages I do use. The thesis was simple: F# might be the sweet spot for researchers who need readable, concise, and safe code without paying Rust’s memory-management tax or Haskell’s conceptual overhead. That was, at best, weakly educated speculation. This is what happened when I tried to prove it. ...
Why I'm Learning F# (And Why It May Matter For You Data Scientists)
There’s something strange about learning a programming language you may never use professionally. When I tell people I’m learning F#, the responses are almost predictable: Are you switching careers? Is your team moving to .NET? No. I still work in population and behavioral health research as a data scientist and numerical programmer, where Python dominates and Rust handles the performance-critical parts of our pipeline. F# is unlikely to appear in production systems at my job. ...
Functional Programming in Python
I confess I gave up on Haskell about ten years ago. It wasn’t for lack of trying. I had spent months wrestling with type classes, drowning in monad transformers, and debugging cryptic compiler errors that felt more like philosophical riddles than helpful feedback. The promise of pure functional programming (FP) was intoxicating – bulletproof correctness, elegant abstractions, programs that composed like mathematical proofs. But the reality was different. The learning curve was steep, the tooling was sparse, and most importantly, I couldn’t use it professionally. Try convincing a team to rewrite a production system in Haskell. Try hiring engineers who know it. Try getting management approval for a language most people have never heard of. ...