[Image: Hadamard matrix. Source: https://mathworld.wolfram.com/HadamardMatrix.html]

Going Deeper: The Mathematical Engine Behind BRR

In our previous posts on Balanced Repeated Replication (BRR) and Fay’s method, we explored how these techniques solve the variance estimation problem in complex surveys. We saw the elegant result: create a set of replicate weights, recompute your statistics using each replicate, and combine the results to get variance estimates that properly account for the survey design.

But we left something crucial as a black box: how exactly are these replicates constructed? We said “use a Hadamard matrix” and moved on, focusing instead on the weighting schemes and the variance formulas. For many practitioners, that’s sufficient – major survey data providers like the Medicare Current Beneficiary Survey (MCBS), NHANES, and many state-level behavioral health surveys provide pre-computed replicate weights in their public use files. You load the data, use the supplied weights, trust the mathematics, get your standard errors.

But if you’re reading this, you’re probably not satisfied with black boxes. You want to understand why it works. What makes one particular matrix construction “just right” for this problem? Why do these specific patterns of $+1$ and $-1$ entries give us valid variance estimates? And what’s so special about Hadamard matrices anyway?

This post unpacks that machinery. We’ll see how a mathematical structure discovered in 1893 – long before survey sampling was even formalized – provides an unexpectedly perfect solution to the variance estimation dilemma. The key insight is one of those beautiful moments in applied mathematics where an abstract object, studied for its own theoretical interest, turns out to be exactly what a practical problem needs.

Here’s what we’ll cover:

  • The precise requirements that replicate construction must satisfy
  • What Hadamard matrices are and why their properties matter
  • The explicit connection between matrix structure and variance decomposition
  • A bit of history: from pure mathematics to survey statistics
  • Why this particular construction is both theoretically sound and computationally elegant

By the end, you’ll see not just how BRR works, but why it had to work this way – and why Hadamard matrices are the natural (perhaps only?) solution to the problem.


The Hadamard Solution

What We Need

Let’s be precise about the requirements. Suppose we have a stratified survey with $H$ strata, each containing exactly 2 PSUs. We want to construct $R$ replicates such that:

  1. Balance: Each PSU appears in exactly half the replicates with weight doubled, and half with weight zeroed (or in Fay’s case, reduced)
  2. Orthogonality: The perturbations across different replicates should be “independent” in a specific sense – no replicate’s pattern should be predictable from others
  3. Efficiency: We want $R$ to be reasonably small (not exponentially large in $H$) for computational tractability
  4. Variance decomposition: The replicate variances should combine to give an unbiased estimate of the true sampling variance

These requirements seem demanding. Balance suggests symmetry. Orthogonality suggests independence. Efficiency demands compactness. How can we possibly satisfy all of these simultaneously?

Enter Hadamard

A Hadamard matrix $\mathbf{H}$ of order $n$ is an $n \times n$ matrix with entries in $\{-1, +1\}$ satisfying:

$$ \mathbf{H} \mathbf{H}^T = n \mathbf{I}_n $$

That’s it. Remarkably simple definition, yet packed with implications.

Let’s see a concrete example. For $n = 4$:

$$ \mathbf{H}_4 = \begin{pmatrix} +1 & +1 & +1 & +1 \\ +1 & -1 & +1 & -1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & -1 & +1 \end{pmatrix} $$

Check the orthogonality: take rows 2 and 3. Their inner product is:

$$ (+1)(+1) + (-1)(+1) + (+1)(-1) + (-1)(-1) = 1 - 1 - 1 + 1 = 0 $$

Every pair of distinct rows is orthogonal. And each row has norm $\sqrt{4} = 2 = \sqrt{n}$.
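
If you want to verify this yourself, here's a minimal check in Python with NumPy (my own sketch, not part of the original derivation) that $\mathbf{H}_4$ satisfies the defining identity:

import numpy as np

# The order-4 Hadamard matrix shown above
H4 = np.array([[ 1,  1,  1,  1],
               [ 1, -1,  1, -1],
               [ 1,  1, -1, -1],
               [ 1, -1, -1,  1]])

# Defining property: H H^T = n I
print(H4 @ H4.T)  # prints 4 times the identity matrix
assert np.array_equal(H4 @ H4.T, 4 * np.eye(4, dtype=int))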

The BRR Interpretation

Now map this to our survey problem. Suppose $H = 4$ strata:

  • Each column represents a stratum (PSU pair)
  • Each row represents a replicate
  • Entry $(r, h)$ tells us: in replicate $r$, which PSU from stratum $h$ gets doubled weight?
    • $+1$: double the first PSU, zero the second
    • $-1$: zero the first PSU, double the second

Look at row 2: $(+1, -1, +1, -1)$. This means:

  • Stratum 1: keep PSU 1, drop PSU 2
  • Stratum 2: drop PSU 1, keep PSU 2
  • Stratum 3: keep PSU 1, drop PSU 2
  • Stratum 4: drop PSU 1, keep PSU 2

Each replicate “splits” the strata in a different way. And here’s the magic: the orthogonality of the matrix rows ensures the balance and independence properties we need.
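
As a tiny illustration (a hypothetical Python/NumPy snippet of my own, mirroring the reading above), you can decode every row of $\mathbf{H}_4$ into its keep/drop pattern:

import numpy as np

H4 = np.array([[ 1,  1,  1,  1],
               [ 1, -1,  1, -1],
               [ 1,  1, -1, -1],
               [ 1, -1, -1,  1]])

# Each row is a replicate; +1 means "keep PSU 1, drop PSU 2", -1 means the reverse
for r, row in enumerate(H4, start=1):
    pattern = "; ".join(
        f"stratum {h}: keep PSU {1 if e == 1 else 2}"
        for h, e in enumerate(row, start=1))
    print(f"Replicate {r}: {pattern}")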

Why This Works (Preview)

The orthogonality condition $\mathbf{H} \mathbf{H}^T = n \mathbf{I}_n$ has immediate consequences:

  1. Row balance: In normalized form (flip column signs so the first row is all $+1$’s), every other row has exactly $n/2$ entries of each sign – this is what forces $n$ to be even and connects to the Hadamard conjecture
  2. Column balance: Likewise, every column other than the all-ones one splits evenly, showing $+1$ in exactly $n/2$ of the rows
  3. Pairwise orthogonality: Different replicates perturb the data in uncorrelated ways

These properties translate directly into the unbiased variance estimation we need. But to see why this gives correct variance estimates, we need to understand the mathematical properties more deeply.

That’s next.


Mathematical Properties

Orthogonality and What It Buys Us

The defining property $\mathbf{H} \mathbf{H}^T = n \mathbf{I}_n$ is deceptively powerful. Let’s unpack what it means.

If we denote the rows of $\mathbf{H}$ as $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n$, then:

$$ \langle \mathbf{h}_i, \mathbf{h}_j \rangle = \begin{cases} n & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} $$

Since entries are $\pm 1$, this means:

  • Self inner product: $\sum_{k=1}^{n} h_{ik}^2 = n$ (automatically satisfied since each entry squares to 1)
  • Mutual orthogonality: For $i \neq j$, exactly half the positions agree in sign, half disagree

That second point is crucial. If row $i$ has $+1$ in position $k$ and row $j$ has $+1$ in position $k$, they “agree”. If they have opposite signs, they “disagree”. Orthogonality means: across all $n$ positions, agreements and disagreements perfectly cancel.

In the BRR context, this means: across all strata, two different replicates make opposite perturbation choices exactly as often as they make the same choice. This is what creates statistical independence in the replicate estimates.

The Determinant

Another beautiful property: Hadamard matrices achieve the maximum possible determinant for matrices with bounded entries.

Hadamard’s own inequality states that for any $n \times n$ real matrix $\mathbf{A}$ with rows $\mathbf{a}_1, \ldots, \mathbf{a}_n$:

$$ |\det(\mathbf{A})| \leq \prod_{i=1}^{n} \|\mathbf{a}_i\| $$

with equality if and only if the rows are orthogonal.

For entries in $\{-1, +1\}$, we have $\|\mathbf{a}_i\| = \sqrt{n}$, so:

$$ |\det(\mathbf{H})| = n^{n/2} $$

This maximal determinant property means the rows span the space as “independently” as possible – no row is even remotely close to being a linear combination of others. In statistical terms, our replicates capture distinct, non-redundant information about sampling variability.

Balance: The Even Order Requirement

Here’s an elegant constraint. Multiplying any column of $\mathbf{H}$ by $-1$ preserves the Hadamard property, so we can always normalize the matrix so that its first row is all $+1$’s.

Every other row must then be orthogonal to that all-ones row, which forces its entries to sum to zero:

$$ s_i = \sum_{j=1}^{n} h_{ij} = 0 \quad \text{for } i \geq 2 $$

So each of those rows contains exactly $n/2$ entries equal to $+1$ and $n/2$ equal to $-1$, which already requires $n$ to be even.

Push one step further. For $n \geq 3$, take any two rows beyond the first and sort the columns by the pair of signs they show: let $a$, $b$, $c$, $d$ count the columns with patterns $(+,+)$, $(+,-)$, $(-,+)$, $(-,-)$. Balance of each row gives $a + b = a + c = n/2$, and orthogonality of the two rows means agreements equal disagreements, so $a + d = b + c = n/2$. Solving this system gives

$$ a = b = c = d = \frac{n}{4} $$

These counts are integers, so $n$ must be divisible by 4. Thus for $n > 2$, Hadamard matrices can exist only when $n \equiv 0 \pmod{4}$; the Hadamard conjecture asserts the converse – that a Hadamard matrix exists for every such order, alongside the trivial orders 1 and 2.

For BRR purposes, this means the number of replicates must be 1, 2, or a multiple of 4 (the number of strata can be anything up to that). This isn’t a limitation we chose – it’s forced by the mathematics.

The Hadamard Conjecture

The existence question is fascinating: do Hadamard matrices exist for all $n \equiv 0 \pmod{4}$?

As of 2025, this remains unresolved, though they’ve been explicitly constructed for all $n = 4k \leq 664$ and many larger values. The smallest unresolved order only creeps upward as new constructions are found.

For practical survey work, we have explicit constructions for all relevant sizes (more on this in the construction section). But the theoretical question remains one of the elegant open problems in combinatorics.

Why These Properties Matter for Variance

Let’s connect this back to statistics. Suppose in stratum $h$, the true variance contribution is $\sigma_h^2$. When we form replicate $r$, we perturb stratum $h$ according to entry $h_{rh}$ of our Hadamard matrix.

The replicate variance estimator has the form:

$$ \hat{V} = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 $$

where $\hat{\theta}_r$ is the estimate using replicate $r$.

The key insight: because of orthogonality, when we expand this variance estimator, the cross terms between different strata vanish (on average), leaving only the true stratum variances $\sigma_h^2$. The balance property ensures we’re not systematically over- or under-weighting any stratum.

The mathematical elegance is this: the same properties that make Hadamard matrices interesting to pure mathematicians – orthogonality, balance, maximal determinant – are precisely what we need for unbiased variance estimation.


The BRR Connection

The Variance Decomposition

Now we make the connection explicit. Let’s formalize the survey setting and show exactly how Hadamard structure gives us correct variance estimates.

Setup: We have $H$ strata, each with 2 PSUs. Let $y_{hi}$ denote the value from PSU $i$ in stratum $h$ (these are themselves typically weighted sums of individual observations within the PSU). The full-sample estimator is:

$$ \hat{\theta} = \sum_{h=1}^{H} \frac{1}{2}(y_{h1} + y_{h2}) $$

In general, this could be a more complex statistic, but the linear case reveals the core mechanism.

Replicate Construction: For replicate $r$, we use row $r$ of a Hadamard matrix $\mathbf{H}$ of order $R \geq H$. Define:

$$ \hat{\theta}_r = \sum_{h=1}^{H} \frac{1}{2}\left[(1 + h_{rh})y_{h1} + (1 - h_{rh})y_{h2}\right] $$

When $h_{rh} = +1$: the coefficient on $y_{h1}$ is $1$ and the coefficient on $y_{h2}$ is $0$ – relative to the full-sample coefficients of $\tfrac{1}{2}$ each, we’ve doubled one PSU and zeroed the other.

When $h_{rh} = -1$: the coefficient on $y_{h1}$ is $0$ and the coefficient on $y_{h2}$ is $1$.

This is the “half-sample” approach. Fay’s method modifies this by using $(1 + \epsilon h_{rh})$ and $(1 - \epsilon h_{rh})$ for $0 < \epsilon < 1$, but the mathematics is similar.

The Deviation

The key quantity is the deviation of replicate $r$ from the full sample:

$$ \hat{\theta}_r - \hat{\theta} = \sum_{h=1}^{H} \frac{1}{2}\left[(1 + h_{rh})y_{h1} + (1 - h_{rh})y_{h2}\right] - \sum_{h=1}^{H} \frac{1}{2}(y_{h1} + y_{h2}) $$

$$ = \sum_{h=1}^{H} \frac{1}{2}\left[h_{rh}y_{h1} - h_{rh}y_{h2}\right] = \frac{1}{2}\sum_{h=1}^{H} h_{rh}(y_{h1} - y_{h2}) $$

Let $d_h = y_{h1} - y_{h2}$ denote the difference within stratum $h$. Then:

$$ \hat{\theta}_r - \hat{\theta} = \frac{1}{2}\sum_{h=1}^{H} h_{rh} d_h $$

Beautiful! The deviation is a weighted sum of within-stratum differences, with weights coming from the Hadamard matrix.

The Variance Estimator

The BRR variance estimator is:

$$ \hat{V}_{\text{BRR}} = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 = \frac{1}{4R} \sum_{r=1}^{R} \left(\sum_{h=1}^{H} h_{rh} d_h\right)^2 $$

Expanding the square:

$$ \hat{V}_{\text{BRR}} = \frac{1}{4R} \sum_{r=1}^{R} \sum_{h=1}^{H} \sum_{k=1}^{H} h_{rh} h_{rk} d_h d_k $$

$$ = \frac{1}{4R} \sum_{h=1}^{H} \sum_{k=1}^{H} d_h d_k \left(\sum_{r=1}^{R} h_{rh} h_{rk}\right) $$

Now here’s where Hadamard orthogonality shines. The inner sum $\sum_{r=1}^{R} h_{rh} h_{rk}$ is the inner product of columns $h$ and $k$ of $\mathbf{H}$.

By the Hadamard property, the columns are also orthogonal, so for the $H$ columns we’re actually using, $\mathbf{H}^T \mathbf{H} = R \mathbf{I}_H$:

$$ \sum_{r=1}^{R} h_{rh} h_{rk} = \begin{cases} R & \text{if } h = k \\ 0 & \text{if } h \neq k \end{cases} $$

The cross terms vanish! We’re left with:

$$ \hat{V}_{\text{BRR}} = \frac{1}{4R} \sum_{h=1}^{H} d_h^2 \cdot R = \frac{1}{4} \sum_{h=1}^{H} d_h^2 $$
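
To see the cancellation numerically, here's a minimal sketch (Python/NumPy, with made-up PSU totals – purely illustrative, not from any survey) that builds the replicate estimates for this linear statistic and confirms $\hat{V}_{\text{BRR}} = \tfrac{1}{4}\sum_h d_h^2$:

import numpy as np

# Toy example: H = 4 strata, 2 PSUs each; y[h, i] is a made-up PSU total
y = np.array([[10.0, 12.0],
              [ 7.0,  9.5],
              [14.0, 11.0],
              [ 5.0,  6.0]])

H4 = np.array([[ 1,  1,  1,  1],       # rows = replicates, columns = strata
               [ 1, -1,  1, -1],
               [ 1,  1, -1, -1],
               [ 1, -1, -1,  1]])

theta_full = (0.5 * (y[:, 0] + y[:, 1])).sum()

# theta_r = sum_h 0.5 * [(1 + h_rh) y_h1 + (1 - h_rh) y_h2]
theta_reps = np.array([
    (0.5 * ((1 + row) * y[:, 0] + (1 - row) * y[:, 1])).sum()
    for row in H4
])

v_brr = np.mean((theta_reps - theta_full) ** 2)
d = y[:, 0] - y[:, 1]
assert np.isclose(v_brr, 0.25 * np.sum(d ** 2))   # cross terms cancel exactly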

Why This Is Unbiased

Under the survey design, each stratum is independently sampled. The variance of $\hat{\theta}$ is:

$$ \text{Var}(\hat{\theta}) = \sum_{h=1}^{H} \text{Var}\left(\frac{1}{2}(y_{h1} + y_{h2})\right) $$

For a two-PSU stratum with equal probability sampling:

$$ \text{Var}\left(\frac{1}{2}(y_{h1} + y_{h2})\right) = \frac{1}{4}\text{Var}(y_{h1} - y_{h2}) = \frac{1}{4}\mathbb{E}[d_h^2] $$

(this identity assumes the two PSUs within a stratum are selected independently with a common expectation – i.e., with-replacement sampling, or the usual with-replacement approximation to a without-replacement design).

Therefore:

$$ \mathbb{E}[\hat{V}_{\text{BRR}}] = \frac{1}{4} \sum_{h=1}^{H} \mathbb{E}[d_h^2] = \sum_{h=1}^{H} \text{Var}\left(\frac{1}{2}(y_{h1} + y_{h2})\right) = \text{Var}(\hat{\theta}) $$

The estimator is unbiased. And this works precisely because the Hadamard matrix orthogonality killed the cross terms.
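
A quick Monte Carlo check makes the unbiasedness tangible. The sketch below (Python/NumPy; the stratum means and SDs are invented for illustration, and PSU values are drawn i.i.d. within each stratum, matching the with-replacement assumption above) compares the average BRR estimate to the true sampling variance:

import numpy as np

rng = np.random.default_rng(42)
H, n_sims = 8, 50_000
mu = np.array([10.0, 12.0, 8.0, 15.0, 9.0, 11.0, 14.0, 7.0])   # stratum means (invented)
sigma = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 0.8, 2.5, 1.2])     # stratum SDs (invented)

theta_hats = np.empty(n_sims)
v_brr = np.empty(n_sims)
for s in range(n_sims):
    y = rng.normal(mu[:, None], sigma[:, None], size=(H, 2))   # draw 2 PSUs per stratum
    theta_hats[s] = (0.5 * (y[:, 0] + y[:, 1])).sum()          # full-sample estimator
    d = y[:, 0] - y[:, 1]
    v_brr[s] = 0.25 * (d ** 2).sum()                           # the BRR estimator derived above

print(theta_hats.var())   # empirical Var(theta_hat), approx sum(sigma^2)/2 = 8.415
print(v_brr.mean())       # mean BRR estimate; should agree closely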

What If We Used a Non-Hadamard Matrix?

Suppose we used a random $\pm 1$ matrix $\mathbf{M}$ instead. Then $\sum_{r} m_{rh} m_{rk}$ would generally be nonzero for $h \neq k$, introducing bias terms:

$$ \hat{V}_{\text{bad}} = \frac{1}{4R} \sum_{h=1}^{H} d_h^2 \sum_{r=1}^{R} m_{rh}^2 + \frac{1}{4R} \sum_{h \neq k} d_h d_k \sum_{r=1}^{R} m_{rh} m_{rk} $$

Those cross terms $d_h d_k$ could be positive or negative, large or small – we’d have no control over the bias. The estimator would be unreliable.

Hadamard matrices are special because they’re the unique $\pm 1$ matrices that guarantee these cross terms vanish.

The Elegance

Step back and appreciate what just happened:

  1. We needed to perturb $H$ strata in $R$ different ways
  2. We needed these perturbations to be “balanced” and “independent”
  3. We formalized this as: find a matrix where columns are orthogonal
  4. Hadamard matrices are exactly the $\pm 1$ matrices with orthogonal columns
  5. This orthogonality directly implies unbiased variance estimation

The mathematics didn’t just “work out” – it worked out because we chose the mathematically correct structure. This is applied mathematics at its finest: a deep structural property (orthogonality) solving a practical problem (variance estimation) in the most elegant possible way.


Historical Thread

Hadamard (1893): Pure Mathematics

Jacques Hadamard introduced these matrices in his 1893 paper on determinants. His motivation was entirely theoretical – he was studying the maximum value that $|\det(\mathbf{A})|$ could achieve for matrices with bounded entries.

The result was his celebrated inequality: for an $n \times n$ matrix with rows $\mathbf{a}_1, \ldots, \mathbf{a}_n$,

$$ |\det(\mathbf{A})| \leq \prod_{i=1}^{n} \|\mathbf{a}_i\| $$

with equality if and only if the rows are orthogonal. For entries in $\{-1, +1\}$, this gives $|\det(\mathbf{H})| = n^{n/2}$.

Hadamard’s question: when can we achieve this bound? He showed constructions for certain orders and proved the necessary condition $n \equiv 0 \pmod{4}$ for $n > 2$. The existence question became known as the Hadamard conjecture.

This was pure combinatorics. No applications, no statistics, no surveys – just beautiful mathematics for its own sake.

Paley (1933): Quadratic Residues

Raymond Paley made a major breakthrough in 1933 using number theory. He showed that Hadamard matrices of order $n$ exist whenever $n - 1$ is a prime power and $n \equiv 0 \pmod{4}$.

His construction used quadratic residues modulo primes – a completely different approach from Hadamard’s Sylvester-like recursive methods. This opened up large classes of orders where Hadamard matrices were known to exist.

Still, this remained in the realm of pure mathematics and combinatorics. The statistical applications were decades away.

Plackett-Burman (1946): Experimental Design

The first hint of statistical utility came from experimental design. Robin Plackett and J. P. Burman, working on screening experiments for industrial applications, needed to study many factors efficiently.

In a 1946 paper, they showed how Hadamard matrices could design experiments where $n-1$ factors are tested in $n$ runs, with each factor balanced and orthogonal to others. This allowed experimenters to estimate main effects without confounding, using far fewer runs than a full factorial design would require.

The Plackett-Burman designs became a cornerstone of industrial statistics – but still, these were about designed experiments, not observational surveys.

McCarthy (1966, 1969): The Survey Breakthrough

The leap to survey sampling came from Philip J. McCarthy, working with the U.S. National Center for Health Statistics. Half-sample replication had already been used at the Census Bureau for variance estimation in the Current Population Survey (CPS); in a 1966 monograph, McCarthy took up the question of which half-samples to use.

His insight: if we have $H$ strata with 2 PSUs each, a “half-sample” is formed by selecting one PSU from each stratum – $2^H$ possible combinations in all. But which ones should we use? Random selection would give us uncertain properties.

McCarthy realized that Hadamard matrices provided exactly the structured selection needed. The orthogonality ensured balanced, independent perturbations. The mathematical properties Hadamard studied for determinants translated directly into statistical properties for variance estimation.

McCarthy called this “Balanced Half-Sample Replication” – later shortened to Balanced Repeated Replication (BRR). His 1969 paper in the Review of the International Statistical Institute formalized the approach and provided theoretical justification.

This was the bridge: a 73-year journey from pure mathematics to practical survey statistics.

Fay (1984, 1989): The Refinement

Robert Fay at the U.S. Census Bureau extended McCarthy’s work. He observed that the full half-sample approach (doubling one PSU, zeroing the other) could be unstable for small samples or extreme statistics.

Fay’s modification was elegant: instead of weights $(2, 0)$, use $(1+\epsilon, 1-\epsilon)$ for $0 < \epsilon < 1$. The retained factor $1 - \epsilon$ is what survey documentation usually quotes as the Fay coefficient, with common values of 0.5 or 0.3 (i.e., $\epsilon = 0.5$ or $0.7$). This “shrinks” the perturbation while maintaining the Hadamard structure’s orthogonality.

The variance formula adjusts by a factor of $1/\epsilon^2$:

$$ \hat{V}_{\text{Fay}} = \frac{1}{\epsilon^2 R} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 $$

Fay’s method became standard in U.S. federal surveys. It’s what you see in NHANES, MCBS, and most large-scale health surveys today.

Modern Developments

Current work focuses on:

  • Partial Hadamard matrices: What to do when $H$ is not a multiple of 4? Use the first $H$ columns of a larger Hadamard matrix, or use “nearly orthogonal” constructions
  • Computational efficiency: Fast algorithms for generating large Hadamard matrices and computing replicate estimates
  • Generalized BRR: Extensions to more than 2 PSUs per stratum, using more general orthogonal arrays
  • Nonlinear statistics: Understanding the behavior of BRR variance estimates for complex statistics like quantiles, where linearity breaks down

The Unifying Thread

The history reveals something profound about mathematical research. Hadamard studied determinants. Paley studied number theory. Neither could have imagined their work would eventually help health researchers compute confidence intervals for obesity prevalence.

Yet the mathematics was “right” in a deep sense – the structure they discovered was exactly what an applied problem would need, seven decades later. This is the quiet power of pure mathematics: today’s abstraction becomes tomorrow’s essential tool.

The Hadamard matrix wasn’t adapted to survey statistics. It was discovered to be the natural solution, as if it had been waiting all along.


Construction Methods

Sylvester’s Recursive Construction

The most elegant and practical construction for BRR purposes comes from James Joseph Sylvester (1867), predating even Hadamard’s work. The idea is breathtakingly simple: build large Hadamard matrices by recursively doubling.

Start with the trivial case:

$$ \mathbf{H}_1 = \begin{pmatrix} 1 \end{pmatrix} $$

Now, given a Hadamard matrix $\mathbf{H}_n$ of order $n$, construct order $2n$ using the Kronecker product:

$$ \mathbf{H}_{2n} = \mathbf{H}_2 \otimes \mathbf{H}_n = \begin{pmatrix} \mathbf{H}_n & \mathbf{H}_n \\ \mathbf{H}_n & -\mathbf{H}_n \end{pmatrix} $$

where $\mathbf{H}_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is the base case.

Let’s see this in action:

$$ \mathbf{H}_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$

$$ \mathbf{H}_4 = \begin{pmatrix} \mathbf{H}_2 & \mathbf{H}_2 \\ \mathbf{H}_2 & -\mathbf{H}_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} $$

$$ \mathbf{H}_8 = \begin{pmatrix} \mathbf{H}_4 & \mathbf{H}_4 \\ \mathbf{H}_4 & -\mathbf{H}_4 \end{pmatrix} $$

and so on. This gives us Hadamard matrices of order $2^k$ for all $k \geq 0$.

Why It Works

We need to verify that $\mathbf{H}_{2n} \mathbf{H}_{2n}^T = 2n \mathbf{I}_{2n}$.

$$ \mathbf{H}_{2n} \mathbf{H}_{2n}^T = \begin{pmatrix} \mathbf{H}_n & \mathbf{H}_n \\ \mathbf{H}_n & -\mathbf{H}_n \end{pmatrix} \begin{pmatrix} \mathbf{H}_n^T & \mathbf{H}_n^T \\ \mathbf{H}_n^T & -\mathbf{H}_n^T \end{pmatrix} $$

Block multiply:

$$ = \begin{pmatrix} \mathbf{H}_n \mathbf{H}_n^T + \mathbf{H}_n \mathbf{H}_n^T & \mathbf{H}_n \mathbf{H}_n^T - \mathbf{H}_n \mathbf{H}_n^T \\ \mathbf{H}_n \mathbf{H}_n^T - \mathbf{H}_n \mathbf{H}_n^T & \mathbf{H}_n \mathbf{H}_n^T + \mathbf{H}_n \mathbf{H}_n^T \end{pmatrix} $$

Since $\mathbf{H}_n \mathbf{H}_n^T = n \mathbf{I}_n$ by induction:

$$ = \begin{pmatrix} 2n \mathbf{I}_n & \mathbf{0} \\ \mathbf{0} & 2n \mathbf{I}_n \end{pmatrix} = 2n \mathbf{I}_{2n} $$

Perfect! The inductive step preserves the Hadamard property.

The Beauty of Recursion

This construction reveals deep structure. Each doubling embeds four copies of the previous matrix – three positive, one negative. The pattern has a fractal quality: zoom in on any aligned $2^k \times 2^k$ block of $\mathbf{H}_{2^m}$ (where $k < m$), and you’ll see either $\mathbf{H}_{2^k}$ or $-\mathbf{H}_{2^k}$.

From a computational perspective, this is a gift. We can:

  1. Generate $\mathbf{H}_{2^k}$ in $O(4^k) = O(n^2)$ time
  2. Store only the recursion pattern, not the full matrix (though for BRR, we typically do store it)
  3. Parallelize the construction naturally

For BRR with $H$ strata, we choose the smallest $k$ such that $2^k \geq H$, construct $\mathbf{H}_{2^k}$, and use the first $H$ columns. The remaining columns are simply discarded.

Paley’s Construction (When $n \equiv 0 \pmod{4}$ but $n \neq 2^k$)

When $n$ is divisible by 4 but not a power of 2, Sylvester’s construction doesn’t directly give us order $n$. Paley’s construction fills some of these gaps.

For a prime $p \equiv 3 \pmod{4}$, Paley constructs a Hadamard matrix of order $p + 1$. Define the Legendre symbol:

$$ \left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue mod } p \\ -1 & \text{if } a \text{ is a quadratic non-residue mod } p \\ 0 & \text{if } a \equiv 0 \pmod{p} \end{cases} $$

The construction builds a $p \times p$ core with entries

$$ h_{ij} = \left(\frac{j-i}{p}\right) \quad \text{for } i \neq j $$

sets the diagonal entries (where the symbol would be 0) to $-1$, and then borders the core with an additional row and column of all $+1$’s to reach order $p+1$.

For example, with $p = 7$, we get a Hadamard matrix of order 8 – but since $8 = 2^3$, Sylvester’s method is simpler here. Paley’s construction shines for primes like $p = 11$ (giving order 12) or $p = 19$ (giving order 20).
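
Here's a small sketch of that construction in Python/NumPy (the function names are mine, and it assumes $p$ is a prime with $p \equiv 3 \pmod{4}$); running it with $p = 11$ yields an order-12 Hadamard matrix:

import numpy as np

def legendre(a, p):
    # Legendre symbol via Euler's criterion: a^((p-1)/2) mod p, mapped into {-1, 0, +1}
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def paley_hadamard(p):
    # Paley construction I; assumes p is prime and p % 4 == 3
    Q = np.array([[legendre(j - i, p) for j in range(p)] for i in range(p)])
    core = Q - np.eye(p, dtype=int)          # replace the zero diagonal with -1
    H = np.ones((p + 1, p + 1), dtype=int)   # border row and column of +1's
    H[1:, 1:] = core
    return H

H12 = paley_hadamard(11)
assert np.array_equal(H12 @ H12.T, 12 * np.eye(12, dtype=int))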

Combining Constructions

A powerful technique: if Hadamard matrices of orders $m$ and $n$ exist, then a Hadamard matrix of order $mn$ exists (via Kronecker product). This extends our reach significantly:

$$ \mathbf{H}_{mn} = \mathbf{H}_m \otimes \mathbf{H}_n $$

For instance, $\mathbf{H}_{12} = \mathbf{H}_4 \otimes \mathbf{H}_3$… except $\mathbf{H}_3$ doesn’t exist (3 is not divisible by 4). But we can get $\mathbf{H}_{12}$ from Paley’s construction or by combining smaller known matrices.

This is how mathematicians have systematically constructed Hadamard matrices for all $n = 4k \leq 664$.

Practical Algorithm for BRR

For survey statistics implementation, here’s the standard approach (sketched below in Python with NumPy):

import numpy as np

def sylvester_hadamard(n):
    # Hadamard matrix of order n (n must be a power of 2), built by recursive doubling
    if n == 1:
        return np.array([[1]])
    h_half = sylvester_hadamard(n // 2)
    return np.block([[h_half,  h_half],
                     [h_half, -h_half]])

def generate_brr_hadamard(H):
    # Smallest power of 2 >= H (the number of strata), keeping the first H columns
    k = 0 if H <= 1 else int(np.ceil(np.log2(H)))
    return sylvester_hadamard(2 ** k)[:, :H]

Simple, elegant, and guaranteed to work for any $H$. The resulting matrix might have more rows than strictly necessary (e.g., using 64 replicates for 50 strata), but the redundancy is mild and the orthogonality is perfect.

Why Not Use Smaller Matrices?

One might ask: if we have $H = 50$ strata, why use $R = 64$ replicates? Could we use exactly $R = 50$?

We could, but we’d lose guaranteed orthogonality. Since $50 = 2 \times 25$ and $25$ is odd, there’s no Hadamard matrix of order 50. We could use a $52 \times 52$ Hadamard (since $52 = 4 \times 13$) and take 50 columns, but we’d still need 52 replicates.

The power-of-2 approach via Sylvester is so simple and fast that the mild oversampling in replicates is considered acceptable. For $H = 50$, the difference between 50 and 64 replicates is negligible computationally – each replicate just means recomputing the statistic once more.

The Construction Elegance

The beauty here is that the simplest construction (Sylvester) covers the most common cases. Most surveys don’t have unusual numbers of strata. Using powers of 2 is natural, implementation is trivial, and the mathematics is bulletproof.

When you see “replicate weights computed using BRR” in a survey codebook, this is almost certainly what happened behind the scenes: Sylvester’s recursive doubling, with some columns potentially trimmed to match the stratum count.


Why It Works – The Deep Insight

The Fundamental Problem Restated

We’ve seen the mechanics – how Hadamard matrices lead to unbiased variance estimates. But let’s go deeper. Why is this the natural solution? What fundamental statistical principle is at play?

The core challenge in survey variance estimation is this: we observe one realization from a complex sampling design, but variance requires understanding what would happen across repeated sampling. We can’t actually re-sample the population, so we need to approximate that variability using only the data we have.

The Half-Sample Principle

McCarthy’s insight was to recognize that each stratum with 2 PSUs contains, in miniature, the sampling variability we need. The difference $d_h = y_{h1} - y_{h2}$ captures how much variation exists in that stratum.

If we could resample that stratum, sometimes we’d get PSU 1, sometimes PSU 2. The variability in our estimates across those hypothetical resamples is exactly what $d_h$ measures.

But here’s the subtlety: we need to combine these stratum-level variations correctly. We can’t just average $d_h^2$ across strata – that ignores how the strata contribute jointly to the overall estimator.

The Independence Requirement

For variance to decompose cleanly as $\text{Var}(\sum_h X_h) = \sum_h \text{Var}(X_h)$, we need independence across strata. This is built into the survey design – strata are sampled independently.

But when we form replicates, we’re creating artificial “pseudo-samples” that perturb multiple strata simultaneously. If those perturbations are correlated across strata, we introduce spurious covariance terms.

Hadamard orthogonality ensures: perturbations in different strata are uncorrelated across the replicate ensemble. Specifically, for strata $h$ and $k$:

$$ \frac{1}{R}\sum_{r=1}^{R} h_{rh} h_{rk} = \begin{cases} 1 & \text{if } h = k \\ 0 & \text{if } h \neq k \end{cases} $$

This is the discrete analogue of statistical independence. Across all replicates, the perturbation patterns in different strata are perfectly uncorrelated.

Variance Decomposition via Orthogonality

Let’s think about the replicate deviations as vectors. Define:

$$ \mathbf{v} = \begin{pmatrix} \hat{\theta}_1 - \hat{\theta} \\ \hat{\theta}_2 - \hat{\theta} \\ \vdots \\ \hat{\theta}_R - \hat{\theta} \end{pmatrix} $$

We showed earlier that $\hat{\theta}_r - \hat{\theta} = \frac{1}{2}\sum_{h=1}^{H} h_{rh} d_h$. In matrix form:

$$ \mathbf{v} = \frac{1}{2} \mathbf{H}_{R \times H} \mathbf{d} $$

where $\mathbf{d} = (d_1, d_2, \ldots, d_H)^T$ and $\mathbf{H}_{R \times H}$ uses the first $H$ columns of a Hadamard matrix.

The variance estimator is:

$$ \hat{V}_{\text{BRR}} = \frac{1}{R} \|\mathbf{v}\|^2 = \frac{1}{4R} \|\mathbf{H} \mathbf{d}\|^2 = \frac{1}{4R} \mathbf{d}^T \mathbf{H}^T \mathbf{H} \mathbf{d} $$

If $\mathbf{H}$ were square ($R = H$), we’d have $\mathbf{H}^T \mathbf{H} = R \mathbf{I}$, giving:

$$ \hat{V}_{\text{BRR}} = \frac{1}{4R} \mathbf{d}^T (R \mathbf{I}) \mathbf{d} = \frac{1}{4} \|\mathbf{d}\|^2 = \frac{1}{4} \sum_{h=1}^{H} d_h^2 $$

When $R > H$ (using a larger Hadamard and taking $H$ columns), we still have $\mathbf{H}^T \mathbf{H} = R \mathbf{I}_H$ for the column submatrix, so the result holds.
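
This claim is easy to sanity-check. A quick sketch (Python, using SciPy's scipy.linalg.hadamard, which generates Sylvester-type matrices for power-of-2 orders) confirms that the truncated column submatrix still satisfies $\mathbf{H}^T \mathbf{H} = R\,\mathbf{I}_H$:

import numpy as np
from scipy.linalg import hadamard   # Sylvester-type Hadamard matrices (power-of-2 orders)

H_strata, R = 50, 64                # 50 strata -> smallest power-of-2 order is 64
Hmat = hadamard(R)[:, :H_strata]    # keep only the first 50 columns

# Columns of the truncated matrix remain orthogonal: H^T H = R * I_H
assert np.array_equal(Hmat.T @ Hmat, R * np.eye(H_strata, dtype=int))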

The Geometric Interpretation

Think of the space of all possible perturbations as $\mathbb{R}^H$ – one dimension per stratum. The vector $\mathbf{d} = (d_1, \ldots, d_H)$ represents the “true” variability signature of our sample.

The Hadamard matrix $\mathbf{H}$ defines $R$ probe directions in this space (the rows). Each replicate probes the variability along one direction. Because the columns of $\mathbf{H}$ are orthogonal, $\mathbf{H}^T\mathbf{H} = R\mathbf{I}$, so the map $\mathbf{d} \mapsto \mathbf{H}\mathbf{d}/\sqrt{R}$ preserves lengths: taken together, the probes treat every coordinate direction of stratum-space equally.

When we compute $\frac{1}{R} \|\mathbf{H} \mathbf{d}\|^2$, we are therefore recovering the squared length of $\mathbf{d}$ exactly. The orthogonality ensures this measurement is unbiased – we’re not double-counting or missing any variance components.

In contrast, a matrix without orthogonal columns would weight some directions of stratum-space more heavily than others. Some variance components would be over-measured, others under-measured, and the resulting bias would be unpredictable.

Why $\pm 1$ Entries?

Another deep question: why must the Hadamard matrix have entries in $\{-1, +1\}$ specifically?

This comes from the half-sample constraint. For each stratum in each replicate, we must choose one of the two PSUs. We can’t use both partially (that’s the full sample), and we can’t use neither (we’d have no data from that stratum).

The $\pm 1$ encoding naturally captures this binary choice:

  • $+1$: select PSU 1
  • $-1$: select PSU 2

Fay’s method relaxes this slightly, using $(1 + \epsilon h_{rh})/2$ and $(1 - \epsilon h_{rh})/2$ as weights, which corresponds to a convex combination of the two PSUs. But the underlying Hadamard structure is unchanged – we’re still using $h_{rh} \in \{-1, +1\}$ to determine the perturbation direction, just scaling it by $\epsilon$.

The Uniqueness Claim

Here’s a bold statement: Hadamard matrices are essentially the unique solution to the BRR problem.

More precisely: if we require:

  1. Exactly 2 PSUs per stratum
  2. Each replicate selects exactly one PSU from each stratum (half-sample property)
  3. Unbiased variance estimation for linear statistics
  4. A modest number of replicates (on the order of $H$, not exponential in $H$)

Then we need a matrix with:

  • Entries in $\{-1, +1\}$ (from requirement 2)
  • Orthogonal columns (from requirement 3)
  • $R \geq H$ rows – fewer cannot support $H$ orthogonal columns – with $R$ kept as small as possible (requirement 4)

These conditions define Hadamard matrices (up to row/column permutations and sign flips).

So the solution isn’t just “a good choice” – it’s the only choice that satisfies all requirements simultaneously. This is why the mathematics feels inevitable once you understand the problem.

The Replication Philosophy

BRR embodies a general principle in statistics: when you can’t repeat the experiment, simulate repetition using structure.

We can’t actually re-run the survey. But by systematically perturbing the sample using Hadamard structure, we create $R$ “pseudo-surveys” that mimic what independent replications would produce. The orthogonality ensures these pseudo-surveys are as “independent” as possible given that they’re all derived from the same data.

This is philosophically similar to:

  • Bootstrap: resample the data to simulate sampling variability
  • Jackknife: leave out observations systematically to measure influence
  • Cross-validation: partition data to simulate out-of-sample performance

BRR’s special property is that the Hadamard structure gives us exact (unbiased) results for linear statistics, not just asymptotic approximations.

The Inevitability of the Solution

When McCarthy connected Hadamard matrices to survey variance, he wasn’t imposing an arbitrary mathematical gadget onto a statistical problem. He was recognizing that the problem’s inherent structure – binary choices per stratum, independence across strata, unbiased estimation – is the structure of Hadamard matrices.

The mathematics didn’t bend to fit the application. The application revealed its natural mathematical form, and that form had already been studied for 70 years.

This is the deepest “why” of all: BRR works because it respects the true structure of the sampling process, and Hadamard matrices are the mathematical objects that encode that structure exactly.


Computational Elegance

From Theory to Implementation

We’ve explored the mathematical beauty of Hadamard matrices and their theoretical perfection for BRR. But elegant mathematics doesn’t always translate to practical computation. Fortunately, Hadamard matrices are as computationally graceful as they are theoretically sound.

Storage Requirements

For a survey with $H$ strata, we need a Hadamard matrix with $H$ columns and $R$ rows (where $R \geq H$). Using Sylvester’s construction, $R = 2^{\lceil \log_2 H \rceil}$.

Each entry is $\pm 1$, requiring just 1 bit of storage (the sign). For $H = 100$ strata, we need $R = 128$, giving a $128 \times 100$ matrix – that’s 12,800 bits, or 1.6 KB.

Even for massive surveys with $H = 1000$ strata, we need $R = 1024$, giving roughly 128 KB. This fits comfortably in L2 cache on modern processors.

Generation Speed

Sylvester’s recursive construction is embarrassingly efficient. The recurrence:

$$ \mathbf{H}_{2n} = \begin{pmatrix} \mathbf{H}_n & \mathbf{H}_n \\ \mathbf{H}_n & -\mathbf{H}_n \end{pmatrix} $$

requires no arithmetic operations – just copying and sign flipping. Starting from $\mathbf{H}_1 = [1]$:

  • $\mathbf{H}_2$: 4 elements, 1 sign flip
  • $\mathbf{H}_4$: 16 elements, 4 sign flips
  • $\mathbf{H}_8$: 64 elements, 16 sign flips
  • $\mathbf{H}_{2^k}$: $4^k$ elements, with $4^{k-1}$ sign flips in the final doubling (one quarter of the matrix)

Total operations: $O(n^2)$ for generating an $n \times n$ Hadamard matrix. In practice, for $n = 1024$, this takes microseconds.

Replicate Weight Computation

Given the Hadamard matrix and base weights $w_{hi}$ for PSU $i$ in stratum $h$, computing replicate weights is straightforward:

$$ w_{hi}^{(r)} = \begin{cases} 2 w_{hi} & \text{if } i = 1 \text{ and } h_{rh} = +1 \\ 0 & \text{if } i = 1 \text{ and } h_{rh} = -1 \\ 0 & \text{if } i = 2 \text{ and } h_{rh} = +1 \\ 2 w_{hi} & \text{if } i = 2 \text{ and } h_{rh} = -1 \end{cases} $$

Or more concisely: $w_{hi}^{(r)} = w_{hi} \cdot (1 + h_{rh} \cdot (-1)^{i-1})$.

For Fay’s method: $w_{hi}^{(r)} = w_{hi} \cdot (1 + \epsilon \cdot h_{rh} \cdot (-1)^{i-1})$.

This is a single multiply-add per weight, vectorizable on modern CPUs. For $N$ observations, $H$ strata, and $R$ replicates, computing all replicate weights takes $O(NR)$ operations – linear in both sample size and replicate count.
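
As a concrete sketch (Python/NumPy; the array names, shapes, and the function itself are my own illustration, not any package's API), here is that multiply-add vectorized for both standard BRR and Fay's method:

import numpy as np

def replicate_weights(base_weights, stratum, psu, hadamard_cols, epsilon=1.0):
    # base_weights:  (N,) base survey weights
    # stratum:       (N,) stratum index 0..H-1 for each observation
    # psu:           (N,) PSU indicator within stratum, 0 or 1
    # hadamard_cols: (R, H) matrix of +/-1 factors (rows = replicates, columns = strata)
    # epsilon = 1 gives full BRR (weights of 2w or 0); 0 < epsilon < 1 gives Fay's method
    signs = hadamard_cols[:, stratum] * np.where(psu == 0, 1, -1)   # (R, N): h_rh * (-1)^(i-1)
    return base_weights * (1.0 + epsilon * signs)                   # (R, N) replicate weights

With $\epsilon = 1$ this reproduces the $(2w, 0)$ weights above; with $\epsilon = 0.7$ it corresponds to the commonly quoted Fay coefficient of 0.3.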

Variance Estimation

The variance formula is:

$$ \hat{V}_{\text{Fay}} = \frac{1}{\epsilon^2 R} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 $$

The computational bottleneck is computing $\hat{\theta}_r$ for each replicate – the statistic itself, not the BRR machinery. For a mean, this is $O(N)$ per replicate, giving $O(NR)$ total. For more complex statistics (quantiles, regression coefficients, nonlinear functions), the per-replicate cost scales accordingly.

But crucially, the BRR overhead is negligible compared to computing the statistic itself. The Hadamard structure adds essentially no computational burden beyond the inherent cost of computing $R$ replicate estimates.

Parallelization

Replicate estimates are independent – each can be computed in parallel. This is trivial to implement (sketched here with Python’s concurrent.futures; compute_statistic stands in for whatever estimator you’re running):

from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Each replicate estimate is independent of the others, so they parallelize trivially
with ProcessPoolExecutor() as pool:
    theta_reps = list(pool.map(partial(compute_statistic, data), replicate_weights))

variance = sum((t - theta_full) ** 2 for t in theta_reps) / (epsilon ** 2 * len(theta_reps))

On modern multi-core systems, this gives near-linear speedup with the number of cores. For $R = 128$ replicates on a 16-core machine, we get roughly 16× acceleration.

The Hadamard structure imposes no dependencies between replicates – they’re mathematically independent, so they’re computationally independent. Elegant theory enables elegant implementation.

Memory Access Patterns

Hadamard matrices have excellent cache behavior. The Sylvester construction creates a recursive block structure where nearby rows share large subblocks of identical or sign-flipped entries.

When computing replicate weights sequentially ($r = 1, 2, 3, \ldots$), we access rows of the Hadamard matrix in order. Each row access brings the next row into cache (via prefetching). The block structure means consecutive rows often differ in only one block, minimizing cache misses.

For $\mathbf{H}_{1024}$, every row consists of a row of $\mathbf{H}_{512}$ followed by either that same row again or its negation – half of each row determines the other half, and the same is true recursively within each half. This locality is a direct consequence of the recursive construction – another gift from the mathematical structure.

Comparison to Alternative Methods

Let’s compare BRR to other variance estimation approaches:

Taylor Series Linearization (TSL):

  • Requires derivatives of the statistic
  • Complex for nonlinear statistics
  • Potentially biased for highly nonlinear estimators
  • But: only needs one computation of the statistic

Bootstrap:

  • Requires $B \gg R$ resamples (typically $B \geq 1000$)
  • Each resample is a random draw, no structure to exploit
  • Asymptotically consistent but computationally expensive
  • Memory-intensive if storing all bootstrap samples

Jackknife:

  • Requires $N$ replicates (one per observation)
  • For large samples, computationally prohibitive
  • Good for bias estimation, but inefficient for variance

BRR:

  • A modest, fixed number of replicates: $R = 2^{\lceil \log_2 H \rceil} < 2H$, independent of the sample size $N$
  • Structured perturbations, efficient to generate and apply
  • Exact unbiasedness for linear statistics
  • Parallelizable with no dependencies

For a survey with $N = 100{,}000$ observations and $H = 200$ strata:

  • BRR: $R = 256$ replicates
  • Jackknife: 100,000 replicates
  • Bootstrap: 1,000+ replicates

BRR is $\sim$400× more efficient than jackknife, $\sim$4× more efficient than bootstrap, while maintaining theoretical rigor.

Software Implementation

Major statistical packages have highly optimized BRR implementations:

  • R: survey package (Thomas Lumley) – industry standard
  • SAS: PROC SURVEYMEANS with VARMETHOD=BRR
  • Stata: svy commands with vce(brr)
  • Python: samplics package

These implementations pre-generate Hadamard matrices up to common sizes (typically $n \leq 4096$), store them compressed, and use vectorized operations for weight computation. The result: variance estimation that feels instantaneous even for large surveys.

The Elegance Payoff

This is where mathematical elegance delivers practical dividends:

  1. Simple structure (recursive doubling) → fast generation
  2. Orthogonality → small number of replicates needed
  3. Independence → perfect parallelization
  4. $\pm 1$ entries → minimal storage, fast arithmetic
  5. Block structure → cache-friendly access patterns

Every theoretical property we admired has a computational benefit. The mathematics isn’t just “correct” – it’s efficient.

Scalability

How does BRR scale as surveys grow larger?

For $H$ strata:

  • Storage: $O(H^2)$ bits for the $R \times H$ matrix (since $R = 2^{\lceil \log_2 H \rceil} < 2H$), at one bit per entry
  • Generation time: $O(H^2)$
  • Replicate weight computation: $O(NH)$ where $N$ is sample size
  • Variance computation: $O(NR) = O(NH)$

This is nearly optimal – we can’t do better than $O(N)$ since we must process every observation at least once.

Even for hypothetical massive surveys with $H = 10{,}000$ strata:

  • $R = 16{,}384$ replicates
  • Matrix storage: $\sim$20 MB
  • Generation: milliseconds
  • Practical? Absolutely.

The scalability isn’t just good – it’s effectively unlimited for any realistic survey design.

The Full Circle

We started with Hadamard studying determinants in 1893. We’ve arrived at health researchers in 2025 computing confidence intervals for obesity prevalence in milliseconds.

The path from abstract mathematics to practical impact is rarely this direct. But when mathematical structure aligns perfectly with a practical problem, the solution is both beautiful and effortless.

Hadamard matrices don’t just solve the BRR problem – they solve it with such elegance that the computation becomes invisible. That’s the hallmark of truly great mathematics applied well.


Closing Thoughts

The story of Hadamard matrices and BRR is a reminder that the boundaries between “pure” and “applied” mathematics are more permeable than they first appear. Hadamard sought to understand determinants. McCarthy sought to estimate variances. The connection between these goals wasn’t obvious, but it was profound.

When you next see “replicate weights” in a survey dataset, remember the 132-year journey from French academia to American statistical agencies to your analysis. Remember the orthogonality that makes it work, the recursion that makes it fast, and the history that makes it remarkable.

And perhaps most importantly: remember that today’s abstract mathematics – whatever seems purely theoretical, whatever lacks obvious application – may be tomorrow’s indispensable tool. The mathematics is patient. It waits to be discovered, not invented.