One week ago, I wrote about why I’m learning F# — a language I may never use professionally, but one I believe will change how I think in the languages I do use. The thesis was simple: F# might be the sweet spot for researchers who need readable, concise, and safe code without paying Rust’s memory-management tax or Haskell’s conceptual overhead.
That was, at best, weakly educated speculation. This is what happened when I tried to prove it.

Finding the Right First Project
I needed something that would test F# in my actual domain — numerical and statistical programming — without being so ambitious that the language learning would drown in project complexity.
The criteria:
- Meaningful: Not a toy example. Something I’d genuinely want in my toolkit.
- Scoped: Completable in a focused week, not a months-long odyssey.
- Familiar territory: Constrained optimization, likelihood-based inference, matrix operations — problems I’ve solved before in Python and Rust.
My background includes, among other things, several projects involving optimization and penalized regression. That pointed toward a specific class of problems: maximum likelihood estimation with side constraints.
Then I came across a paper published last year that fit perfectly.
The Paper: Constrained Maximum Likelihood Estimation
The method comes from Cao et al. (2024):
A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes.
Lifetime Data Analysis, 30, 624–648.
The paper addresses a problem I’ve encountered repeatedly in population health research:
You have a well-calibrated base model $\phi(X)$ for predicting some outcome. You want to improve it by adding a new predictor $Z$. But your training data comes from a different population than your target population — and if you just fit a standard logistic regression on $(X, Z)$, the resulting model will be miscalibrated where it matters most.
This is covariate shift, and it’s endemic to healthcare analytics. Your training data is whoever showed up at a particular clinic; your target population is the broader population you actually want to serve.
What Goes Wrong Without Constraints
Standard maximum likelihood assumes your training distribution matches your target distribution:
$$ P_S(X, Z, Y) \approx P(X, Z, Y) $$

When this assumption fails (different covariate distributions, different outcome prevalence, or both), the fitted model can dramatically misestimate risk, especially in the tails. A model that looks good on average can be dangerously wrong for high-risk patients.
The c-MLE Solution
The key insight is to constrain the optimization. Instead of just maximizing likelihood, you maximize likelihood subject to calibration constraints derived from a trusted base model.
Concretely, you partition the risk spectrum into intervals based on the base model $\phi(X)$, then enforce that your new model’s predictions agree with the base model’s calibration on average within each interval:
$$ (1 - \delta)\,\pi_r \le \mathbb{E}\Big[g_\theta(X, Z) \mid \phi(X) \in I_r\Big] \le (1 + \delta)\,\pi_r $$

where $\pi_r$ is the true average risk in interval $I_r$ and $\delta$ is a tolerance parameter.
This forces the new model to inherit the base model’s calibration in the target population while still benefiting from the additional predictor $Z$.
How c-MLE Works
The implementation requires three interlocking pieces.
1. Joint Likelihood
We model both the outcome $Y$ and the new predictor $Z$ given $X$:
$$ \mathcal{L}(\theta, \tau, \sigma) = \sum_i \Big[\log P_\theta(Y_i \mid X_i, Z_i) + \log f_{\tau,\sigma}(Z_i \mid X_i)\Big] $$

The first term is standard logistic regression. The second term models the conditional distribution of $Z \mid X$, necessary because we need to compute expectations over $Z$ when evaluating the calibration constraints.
2. Conditional Model for $Z \mid X$
Following the paper, I model:
$$ \log(Z) \sim \mathcal{N}(\tau^\top [1, X], \sigma^2), \quad \text{truncated to } (-\infty, 0] $$

This places $Z \in (0, 1]$, appropriate for the paper’s application (breast density) and many other bounded predictors.
3. Penalty-Based Constraint Enforcement
Rather than solving a constrained optimization problem directly, we convert it to an unconstrained problem via penalty:
$$ J(\theta) = -\mathcal{L}(\theta) + \lambda \sum_r \max(0, \text{violation}_r)^2 $$

This is less elegant than Lagrangian methods but far simpler to implement, and simplicity matters when the goal is learning a new language.
Implementation Architecture
The F# implementation follows a “functional core, imperative shell” pattern:
| Module | Responsibility | Purity |
|---|---|---|
| `MathHelpers` | Sigmoid, truncated log-normal density, Monte Carlo sampling | Pure |
| `Fitting` | Joint log-likelihood, risk prediction | Pure |
| `Constraints` | Calibration constraint evaluation | Pure |
| `ParameterOps` | Vector ↔ record conversion for optimization | Pure |
| `Optimization` | BFGS wrapper, numerical gradients | Impure |
The pure core contains all the statistical logic. The impure shell handles optimization iteration, logging, and I/O. This separation isn’t just aesthetic — it makes the statistical code testable and the optimizer swappable.
Domain Types as Documentation
The type definitions directly mirror the statistical structure:
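(A simplified sketch of their shape; every name except `Parameters` and `CalibrationTarget.expected_risks`, which come up later in this post, is illustrative rather than the exact repo definition.)

```fsharp
// Simplified sketch of the domain types; field names other than
// expected_risks are illustrative, not the exact repo definitions.
type Parameters =
    { beta: float list      // logistic coefficients for P(Y | X, Z), including beta_z
      tau: float list       // regression coefficients for log Z | X
      sigma: float }        // scale of the truncated log-normal

type CalibrationTarget =
    { intervals: (float * float) list  // risk intervals I_r over the base model phi(X)
      expected_risks: float list       // target average risk pi_r for each interval
      tolerance: float }               // the delta in the calibration constraints
```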
Anyone familiar with the paper can read these types and understand the implementation’s structure. The types aren’t incidental — they are the specification.
Key Implementation Patterns
A few excerpts that illustrate what F# brings to this problem.
Railway-Oriented Error Handling
The main API uses a result computation expression:
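(Sketched here assuming a `result { }` builder, hand-rolled or from a library like FsToolkit.ErrorHandling; the step names are illustrative.)

```fsharp
// Sketch of the top-level fitting pipeline; helper names are illustrative.
// Assumes a result { } builder, e.g. open FsToolkit.ErrorHandling.
let fitConstrainedModel data target =
    result {
        let! validated = validateInputs data target           // Error for bad shapes/values
        let! density   = fitDensityModel validated            // OLS on log Z -> tau, sigma
        let! fitted    = optimizeWithSchedule density target  // penalty-scheduled BFGS
        return fitted
    }
```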
Each step can fail. The `let!` syntax short-circuits on error, threading failures through without explicit checking. Compare to Python:
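(A hypothetical equivalent; function names are illustrative.)

```python
# Hypothetical Python counterpart of the same pipeline; names are illustrative.
def fit_constrained_model(data, target):
    validated = validate_inputs(data, target)
    if validated is None:          # easy to forget any one of these checks
        return None
    density = fit_density_model(validated)
    if density is None:
        return None
    return optimize_with_schedule(density, target)
```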
The Python version is readable, but the error handling is manual and easy to forget. The F# version makes failure paths explicit in the type system: if a function returns `Result<'a, 'e>`, you must handle the error case.
Pipelines That Read Like Math
Constraint evaluation flows naturally:
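(A sketch of the per-interval evaluation, reusing the `CalibrationTarget` sketch from above; helper shapes are illustrative.)

```fsharp
// Sketch: violation of the calibration constraint for interval r.
// phis are base-model risks phi(X_i); risks are per-subject Monte Carlo
// estimates of E[g_theta(X_i, Z) | X_i]. Names are illustrative.
let constraintViolation (target: CalibrationTarget) (phis: float[]) (risks: float[]) r =
    let lo, hi = List.item r target.intervals
    let piR = List.item r target.expected_risks
    let avgRisk =
        Array.zip phis risks
        |> Array.filter (fun (phi, _) -> phi >= lo && phi < hi)  // subjects with phi(X) in I_r
        |> Array.averageBy snd                                   // average predicted risk there
    // positive when avgRisk leaves [(1 - delta) pi_r, (1 + delta) pi_r]
    max (avgRisk - (1.0 + target.tolerance) * piR)
        ((1.0 - target.tolerance) * piR - avgRisk)
```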
The data flows top to bottom. Each transformation is explicit. The structure mirrors the mathematical operation: filter to the relevant interval, compute weighted average risk via Monte Carlo, aggregate.
Tail Recursion for Rejection Sampling
Sampling from a truncated distribution requires rejection sampling — repeatedly draw until you get a valid sample. In Python, this would be a while loop with mutable state. In F#, tail recursion is idiomatic:
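(In sketch form, using MathNet’s normal sampler; this is the version that gets replaced later in the post.)

```fsharp
open System
open MathNet.Numerics.Distributions

// Sketch: rejection sampling from N(mu, sigma^2) truncated to (-inf, 0],
// written as a tail-recursive loop rather than a mutable while loop.
let sampleTruncatedNormal (rng: Random) mu sigma =
    let rec loop () =
        let z = Normal.Sample(rng, mu, sigma)  // propose from the untruncated normal
        if z <= 0.0 then z                     // accept if inside the truncation region
        else loop ()                           // otherwise retry; the tail call becomes a loop
    loop ()
```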
The recursive structure expresses the algorithm directly: sample, check, retry if needed. F# optimizes tail recursion to a loop, so there’s no stack overhead — you get the clarity of recursion with the performance of iteration.
Making It Actually Work: Iterative Improvements
The first version compiled and ran. It also produced nonsense.
This is the reality of numerical programming: a correct-looking implementation can fail silently when the optimizer fights against Monte Carlo noise, penalty weights are miscalibrated, or numerical edge cases create cliffs in the objective landscape.
Here’s what I fixed — and why it mattered.
Problem 1: Monte Carlo Variance Was Destabilizing the Optimizer
The original implementation sampled fresh random numbers every time the constraint function was evaluated. This meant the objective function itself was stochastic — BFGS would compute a gradient, take a step, and find the objective had changed not because of the step but because of sampling noise.
Solution: Common Random Numbers (CRN)
I pre-generate and freeze the noise vectors in an `IntegrationContext`:
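(A sketch of the idea; the real record carries more than this, and the names are illustrative.)

```fsharp
open System

// Sketch: common random numbers. Uniform draws are generated once, up front,
// and reused for every evaluation of the objective and constraints.
type IntegrationContext =
    { uniforms: float[][] }   // one frozen noise vector per subject

let createIntegrationContext seed nSubjects nDraws =
    let rng = Random(seed)
    { uniforms =
        Array.init nSubjects (fun _ ->
            Array.init nDraws (fun _ -> rng.NextDouble())) }
```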
Now the same uniform noise is reused across all optimization iterations. The objective function becomes deterministic for a given parameter vector, and the optimizer can make consistent progress.
Problem 2: Rejection Sampling Was Inefficient and Non-Deterministic
The original sampler used rejection sampling with a while loop. Combined with the variance problem above, this created unpredictable behavior.
Solution: Inverse Transform Sampling
For a truncated normal, you can sample directly via the inverse CDF:
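(A sketch using MathNet’s normal CDF and quantile function.)

```fsharp
open MathNet.Numerics.Distributions

// Sketch: inverse-transform sampling from N(mu, sigma^2) truncated to (-inf, 0].
// A frozen uniform u in (0, 1) maps deterministically to a draw of log Z.
let sampleLogZ mu sigma (u: float) =
    let massBelowZero = Normal.CDF(mu, sigma, 0.0)  // P(W <= 0) for W ~ N(mu, sigma^2)
    Normal.InvCDF(mu, sigma, u * massBelowZero)     // quantile inside the truncated region
```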
This maps pre-generated uniform noise to the truncated distribution deterministically. No rejection, no variance, no surprises.
Problem 3: Fixed Penalty Weight Couldn’t Balance Likelihood and Constraints
This problem requires some context. True c-MLE is a constrained optimization problem:
$$ \max_\theta \mathcal{L}(\theta) \quad \text{subject to} \quad g_r(\theta) \le 0, \quad r = 1, \dots, R $$

where $g_r(\theta)$ encodes the calibration constraints (upper and lower bounds on expected risk per interval).
The Theoretically Correct Approach: KKT Conditions
The Karush-Kuhn-Tucker (KKT) conditions characterize the solution. At the optimum $\theta^*$, there exist multipliers $\lambda_r \ge 0$ such that:
$$ \nabla \mathcal{L}(\theta^*) = \sum_r \lambda_r \nabla g_r(\theta^*) $$

with complementary slackness: $\lambda_r g_r(\theta^*) = 0$ for each $r$.
Intuitively, the gradient of the objective is a linear combination of constraint gradients — you can’t improve the objective without violating some binding constraint.
Why This Requires SQP
Solving the KKT system directly is intractable for nonlinear problems. The standard approach is Sequential Quadratic Programming (SQP), which iteratively:
- Approximates the Lagrangian Hessian
- Linearizes constraints around the current point
- Solves a Quadratic Programming (QP) subproblem to get a search direction
- Updates both parameters $\theta$ and multipliers $\lambda$
SQP is the workhorse of nonlinear constrained optimization: it’s what you’d use in MATLAB’s `fmincon`, Python’s `scipy.optimize.minimize(..., method='SLSQP')`, or dedicated solvers like IPOPT and SNOPT.
MathNet Doesn’t Have SQP
MathNet.Numerics provides excellent unconstrained optimizers — BFGS, conjugate gradient, Newton-Raphson — but no constrained solvers. This is a common gap in numerical libraries outside the optimization-specialist ecosystem.
So I needed an alternative.
Penalty Methods: The Classic Workaround
The textbook solution is to convert the constrained problem to an unconstrained one:
$$ \min_\theta \Big[ -\mathcal{L}(\theta) + \rho \sum_r \max(0, g_r(\theta))^2 \Big] $$

The penalty term $\rho \sum_r \max(0, g_r)^2$ adds a cost for constraint violations. As $\rho \to \infty$, the penalized solution approaches the true constrained solution.
The catch: a fixed $\rho$ is a blunt instrument. Too small and constraints are ignored. Too large and the optimizer fixates on constraints before finding a good likelihood region — or worse, the problem becomes ill-conditioned and BFGS fails to converge.
Solution: Penalty Scheduling
Instead of picking a single $\rho$, I start small and increase geometrically:
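(In outline; here `minimizeBfgs` and `maxViolation` are stand-ins for the MathNet wrapper and the constraint check, and the cap on $\rho$ is illustrative.)

```fsharp
// Sketch of the outer loop: solve, check violations, raise rho, repeat.
// minimizeBfgs : (theta -> float) -> theta -> theta        (stand-in)
// maxViolation : (theta -> float list) -> theta -> float   (stand-in: largest positive g_r)
let solveWithSchedule negLogLik constraints theta0 =
    let rec loop rho theta =
        let penalized t =
            negLogLik t
            + rho * (constraints t |> List.sumBy (fun g -> (max 0.0 g) ** 2.0))
        let theta' = minimizeBfgs penalized theta   // warm-start from the last solution
        if maxViolation constraints theta' < 1e-4 || rho > 1e4
        then theta'
        else loop (rho * 5.0) theta'                // tighten constraints and re-solve
    loop 0.1 theta0
```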
The optimizer starts with $\rho = 0.1$, finds a good likelihood region, then progressively tightens constraints by multiplying $\rho$ by 5 each iteration until violations fall below tolerance.
This is a simplified form of the augmented Lagrangian method (also called the method of multipliers). A full implementation would also update Lagrange multiplier estimates between iterations, but even this basic penalty schedule works far better than a fixed weight — and it’s implementable with any unconstrained optimizer.
Problem 4: Numerical Cliffs in the Objective
Returning `-1e10` for bad density values creates a cliff in the objective landscape. The optimizer can’t compute meaningful gradients near these cliffs, so it gets stuck or diverges.
Solution: Soft Floors
Instead of hard cutoffs, I use soft floors that create steep slopes rather than cliffs:
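(A sketch of the shape; the floor and slope values are illustrative.)

```fsharp
open System

// Sketch: soften a log-density that can underflow. A hard clamp at -1e10 has
// zero gradient past the cliff; this keeps a steep but finite slope instead.
let softFloorLog (floor: float) (slope: float) (logp: float) =
    if Double.IsNaN logp || Double.IsNegativeInfinity logp then
        floor - slope                       // worst case: well below the floor
    elif logp < floor then
        floor + slope * (logp - floor)      // large gradient, but it exists
    else logp
```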
The gradient is enormous in bad regions, but it exists — the optimizer can escape.
Watching It Converge
Running the example produces a convergence trace, and the story is visible in the numbers:
- Initial fit: The density model estimates $\sigma \approx 0.17$ from the data via OLS on $\log Z$.
- Early iterations ($\rho = 0.1$ to $2.5$): The optimizer focuses primarily on likelihood, with constraints as a gentle nudge. Violations drop from 16% to 4%.
- Middle iterations ($\rho = 12.5$ to $312.5$): Constraints become dominant. The optimizer sacrifices some likelihood fit to satisfy calibration bounds. Violations drop to 0.08%.
- Final iterations ($\rho = 1562.5$ to $7812.5$): Fine-tuning. Violations drop below the $10^{-4}$ tolerance and the loop terminates.
- Result: $\hat{\beta}_z = 0.9778$ against a true value of $1.0$, a 2.2% relative error. The model is both well-fitted and calibrated.
This is exactly what the penalty scheduling approach is designed to produce: find a good solution first, then enforce constraints progressively rather than all at once.
Lessons Learned
F# Let Me Focus on the Logic
This was the main thesis, and it held up. Implementing c-MLE, I spent my cognitive budget on:
- Likelihood functions and their gradients
- Constraint geometry and penalty formulation
- Domain modeling — how parameters, data, and targets relate
I did not spend time on:
- Defensive `None` checks scattered through the codebase
- Debugging silent `NaN` propagation from type mismatches
- Tracing runtime errors back to incorrect assumptions
In Python, I think about logic and defensive coding simultaneously. In Rust, I think about logic and ownership simultaneously. In F# — for this class of problem — I just thought about the logic.
The Ecosystem Gap Was Smaller Than I Feared
I expected to miss NumPy and SciPy constantly. I didn’t.
MathNet.Numerics covered everything c-MLE needed: dense matrices, linear solvers, probability distributions, and BFGS optimization. The API is clean, the documentation is adequate, and performance was never a bottleneck for this problem.
This won’t hold universally. If I needed GPU acceleration, deep learning primitives, or the full `scipy.stats` menagerie, I’d be reaching for Python. But for algorithm-focused work at moderate scale (implementing a paper, prototyping a method, building a simulation), the gap is narrower than I expected.
The Right Tool Depends on What You’re Building
Here’s the distinction that crystallized for me:
Python’s ecosystem is unbeatable when you’re calling established methods. Need logistic regression with L2 regularization? `LogisticRegression(penalty='l2').fit(X, y)` and you’re done. The ecosystem does the work.
But when you’re implementing novel methods — translating a recent paper into working code, extending an algorithm, building something that doesn’t exist in a library yet — you’re not composing APIs. You’re writing the logic those APIs would eventually wrap.
That’s where F# earns its keep.
Python’s ecosystem is the paved road. When you’re going where the roads don’t reach yet, you need different equipment.
Types Caught Real Bugs
Two specific examples:
- I initially had `CalibrationTarget.expected_risks` as `float[]` instead of `float list`. The type mismatch when I tried to use `List.indexed` forced me to think about whether I wanted array semantics or list semantics, and I realized a list was correct for the access patterns I needed.
- The `Parameters` record made it impossible to accidentally pass density parameters where logistic parameters were expected. In Python, both would be `np.ndarray`, and the bug would surface as wrong results, not a type error.
These aren’t dramatic saves. They’re small friction points that, in Python, would have become debugging sessions later.
Ionide Is Genuinely Excellent
The tooling exceeded expectations. Type inference feedback is near-instant. Errors are precise. Refactoring works reliably. The REPL integration lets you test functions interactively without the ceremony of a full test harness.
Coming from rust-analyzer (which is excellent but must reason about lifetimes and borrow checking), Ionide feels faster — not because it’s better engineered, but because F#’s type system is simpler. The tooling benefits from the language’s design.
Imperative First, Functional Later
My actual workflow wasn’t “think functionally from the start.” It was:
- Write something that works, often imperatively
- Notice the shape of the computation
- Refactor toward pipelines or recursion when the pattern clarifies
For example, my first version of computing expected risks per calibration interval looked like this:
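(Reconstructed in spirit; the names are illustrative.)

```fsharp
// First pass: imperative and Python-shaped, with mutable accumulators.
// Assumes every interval contains at least one subject.
let expectedRisksPerInterval (intervals: (float * float) list)
                             (phis: float[]) (risks: float[]) =
    let results = ResizeArray<float>()
    for lo, hi in intervals do
        let mutable total = 0.0
        let mutable count = 0
        for i in 0 .. phis.Length - 1 do
            if phis.[i] >= lo && phis.[i] < hi then
                total <- total + risks.[i]
                count <- count + 1
        results.Add(total / float count)
    List.ofSeq results
```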
It worked. Once I saw the pattern — filter, then average — the functional version was obvious:
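(Again a sketch, with the same non-empty-interval assumption.)

```fsharp
// After refactoring: per interval, filter to phi(X) in I_r, then average.
let expectedRisksPerInterval intervals (phis: float[]) (risks: float[]) =
    intervals
    |> List.map (fun (lo, hi) ->
        Array.zip phis risks
        |> Array.filter (fun (phi, _) -> phi >= lo && phi < hi)
        |> Array.averageBy snd)
```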
The same progression happened with rejection sampling: a `while not done` loop became tail recursion once I understood the algorithm well enough to express it declaratively.
F# accommodates this gracefully. `mutable` exists. `for` loops exist. You can write Python-shaped code, get it working, then incrementally tighten it. The language doesn’t demand purity; it just makes purity available when you’re ready.
This matters for learning. I wasn’t fighting the language while also figuring out the algorithm. I could fall back on familiar patterns, then level up the code once the logic was solid.
What This Project Didn’t Test
Honesty requires noting the limitations of a single project:
- Scale: This is ~550 lines. I don’t know how F# feels at 10,000 lines.
- Interop: I didn’t need to call Python or C libraries. The MathNet.Numerics dependency was sufficient.
- Concurrency: The implementation is single-threaded. F#’s async and parallelism story remains untested for me.
- Discriminated unions: The problem didn’t call for complex state modeling. I used records and functions, not DUs.
A Note on Rigor
The penalty scheduling approach works well for exploration and practical use, but it’s an approximation. For work requiring strong theoretical guarantees — a submission to the Journal of Statistical Software, say, or a methods paper where you need to prove KKT conditions are satisfied to numerical tolerance — you’d want a proper SQP solver.
This opens two future directions:
-
Interop with external solvers: Call IPOPT or SNOPT via P/Invoke or a subprocess. F# on .NET makes native interop feasible, though not trivial.
-
Implement SQP in F#: A more ambitious project — but F#’s expressiveness for numerical code makes it less daunting than it would be in many languages. The QP subproblem could use an active-set method; the outer loop is just the schedule I already have, plus multiplier updates.
Neither was necessary for this project’s goal: learning F# through a meaningful implementation. But they’re natural next steps if c-MLE becomes a tool I use seriously.
Future projects will probe these edges. For now, I’ve validated that F# works for the core of what I do: implementing statistical methods.
Wrapping Up
c-MLE is now implemented, tested against synthetic data, and ready to apply to real problems. More importantly, I have evidence for the claim I made last week:
F# gives me executable mathematics — code that reflects the underlying reasoning, backed by a type system that eliminates entire categories of runtime bugs.
This wasn’t a toy example chosen to flatter the language. It was a 2024 paper, with real mathematical complexity, implemented from scratch. And F# made it easier than Python would have — not because Python couldn’t do it, but because F# caught mistakes earlier, composed more cleanly, and let me focus on the statistics instead of the scaffolding.
If you work in numerical or statistical programming and you’ve felt the friction I described (`None` checks everywhere, runtime type errors, code that doesn’t look like the math it implements), F# might be worth an afternoon of exploration.
The implementation is available on GitHub: https://github.com/SaehwanPark/c-mle-fp
References
Cao, Y., Ma, W., Zhao, G., McCarthy, A. M., & Chen, J. (2024). A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes. Lifetime Data Analysis, 30, 624–648. https://doi.org/10.1007/s10985-024-09628-9