3 What are replicate weights?

PASEC does not use a simple random sample of students. Instead, students are selected through a complex multi-stage sampling design involving schools, classrooms and students. As a result, observations are not statistically independent, and the usual standard error formulas, which assume simple random sampling, are inappropriate.

If we ignored the sampling design and analysed the data using ordinary weighted statistics, point estimates such as means would often remain similar, but standard errors would usually be underestimated.

The idea behind replicate weights

Replicate weights provide a practical way of estimating sampling variance.

Rather than calculating standard errors directly from information about strata and primary sampling units, PASEC supplies a series of alternative weights known as replicate weights. Each replicate weight reflects a slightly modified version of the original sample.

The statistic of interest is calculated repeatedly:

Once using the final weight.
Once for each replicate weight.

The variation across these replicate estimates is then used to calculate the sampling variance.
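The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PASEC's own code: the function name `weighted_mean`, the four toy ages, and the three toy replicate weight vectors are all hypothetical and are not drawn from any PASEC dataset.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of `values` using `weights`."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy data: ages of four students and made-up weights.
ages = [7.0, 8.0, 7.0, 9.0]
final_weight = [10.0, 10.0, 20.0, 20.0]

# Made-up replicate weights: each is a perturbed copy of the final weight.
replicate_weights = [
    [20.0, 0.0, 20.0, 20.0],
    [10.0, 10.0, 40.0, 0.0],
    [0.0, 20.0, 20.0, 20.0],
]

# Once using the final weight.
theta = weighted_mean(ages, final_weight)

# Once for each replicate weight.
theta_reps = [weighted_mean(ages, w) for w in replicate_weights]

print(round(theta, 4))                      # 7.8333
print([round(t, 4) for t in theta_reps])    # [7.6667, 7.1667, 8.0]
```

The spread of the values in `theta_reps` around `theta` is what the variance calculation summarises.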

Replicate weights in PASEC

PASEC 2019 uses the paired jackknife replication method.

In statistical resampling, the delete-one (leave-one-out) jackknife omits a single observation per iteration, generating \(n\) subsamples. The paired jackknife instead omits predefined groups of observations, such as clusters or paired units, together. Delete-one is suited to simple, small datasets; the paired approach is essential for complex, stratified survey data, where dropping single observations would not respect the clustering and within-cluster correlation inherent in the data.

For practical purposes, this means that the replicate weights supplied in the PASEC datasets should be used as-is. Researchers do not need to create or modify them.

The student-level datasets contain:

Dataset	Final weight	Replicate weights
Grade 2	rwgt0	rwgt1-rwgt45
Grade 6	rwgt0	rwgt1-rwgt90

The final weight (rwgt0) is used to produce population estimates. The replicate weights are used to estimate standard errors.
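When working with these columns programmatically, it can help to generate the column names rather than type them out. The sketch below uses only the names and counts from the table above; the identifiers `N_REPLICATES`, `FINAL_WEIGHT` and `replicate_columns` are hypothetical helpers introduced for illustration.

```python
# Replicate counts per dataset, taken from the table above.
N_REPLICATES = {"Grade 2": 45, "Grade 6": 90}
FINAL_WEIGHT = "rwgt0"


def replicate_columns(dataset):
    """Return the replicate weight column names for a dataset."""
    return [f"rwgt{r}" for r in range(1, N_REPLICATES[dataset] + 1)]


grade2_cols = replicate_columns("Grade 2")
print(grade2_cols[0], grade2_cols[-1], len(grade2_cols))  # rwgt1 rwgt45 45
```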

Fortunately, researchers do not need to perform the replicate calculations by hand. R packages such as Rrepest and Stata’s survey commands carry them out automatically.

The paired jackknife variance estimator can be written as:

\[ \mathrm{Var}(\hat{\theta}) = \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 \]

where

\(\hat{\theta}\) is the estimate using the final weight rwgt0
\(\hat{\theta}_r\) is the estimate using replicate weight \(r\) (rwgt1-rwgt45 for Grade 2; rwgt1-rwgt90 for Grade 6)
\(R\) is the number of replicate weights (\(R = 45\) for Grade 2; \(R = 90\) for Grade 6).

The standard error is the square root of the sampling variance, \(\sqrt{\mathrm{Var}(\hat{\theta})}\).
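The estimator translates directly into code. The following is a minimal Python sketch of the formula as written above (a plain sum of squared deviations, with no additional scaling factor); the function names `jackknife_variance` and `jackknife_se` and the toy numbers are hypothetical.

```python
import math


def jackknife_variance(theta, theta_reps):
    """Sum of squared deviations of the replicate estimates from theta."""
    return sum((theta_r - theta) ** 2 for theta_r in theta_reps)


def jackknife_se(theta, theta_reps):
    """Standard error: square root of the jackknife sampling variance."""
    return math.sqrt(jackknife_variance(theta, theta_reps))


# Toy illustration with made-up estimates (not PASEC results).
toy_theta = 10.0
toy_reps = [9.0, 11.0, 10.0, 12.0]

toy_variance = jackknife_variance(toy_theta, toy_reps)
toy_se = jackknife_se(toy_theta, toy_reps)

print(toy_variance)        # 6.0
print(round(toy_se, 4))    # 2.4495
```

In practice `theta_reps` would hold one estimate per replicate weight, so 45 values for Grade 2 and 90 for Grade 6.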

For a directly observed variable such as age, which is used in the worked example in Section 3.1, only this sampling variance is needed; there is no imputation variance to combine.

3.1 Worked example: mean age in Grade 2, Benin

The following example estimates the weighted mean age (qe22) of Grade 2 students in Benin, excluding missing values coded as 99. The mean is calculated once using the final weight and once using each of the 45 paired jackknife replicate weights.

Step 1: Calculate the weighted mean using rwgt0:

\[ \hat{\theta} = 7.5865 \]

Step 2: Calculate the weighted mean \(\hat{\theta}_r\) using each replicate weight.

	Mean
rwgt1	7.561751
rwgt2	7.562885
rwgt3	7.578231
rwgt4	7.587026
\(\vdots\)	\(\vdots\)
rwgt45	7.566877

Only the first four replicates and the last one are shown above; the same calculation is repeated for all 45 replicate weights.

Step 3: For each replicate, calculate the squared deviation from the final estimate:

	\(\hat{\theta}_r\)	\((\hat{\theta}_r - \hat{\theta})^2\)
rwgt1	7.561751	0.0006
rwgt2	7.562885	0.0006
rwgt3	7.578231	0.0001
rwgt4	7.587026	0.0000
\(\vdots\)	\(\vdots\)	\(\vdots\)
rwgt45	7.566877	0.0004

The replicate means are all very close to \(\hat{\theta}\), so each squared deviation is small.
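The squared deviations in Step 3 can be recomputed from the printed values. The sketch below uses only the figures given in Steps 1 and 2; the variable names are hypothetical. Because \(\hat{\theta}\) is printed to four decimals, the last digits may differ slightly from a calculation on unrounded values.

```python
# Final-weight estimate from Step 1 (as printed, four decimals).
theta = 7.5865

# Replicate means printed in Step 2 (five of the 45 replicates).
shown = {
    "rwgt1": 7.561751,
    "rwgt2": 7.562885,
    "rwgt3": 7.578231,
    "rwgt4": 7.587026,
    "rwgt45": 7.566877,
}

# Squared deviation of each replicate mean from the final estimate.
squared_dev = {name: (value - theta) ** 2 for name, value in shown.items()}

for name, sq in squared_dev.items():
    print(f"{name}: {sq:.6f}")

# Partial sum over the five replicates shown only; the full sampling
# variance needs all 45 replicates, so this is smaller than the total.
partial_sum = sum(squared_dev.values())
print(f"partial sum: {partial_sum:.6f}")
```

Printing six decimals rather than four makes it easier to see how much each replicate contributes to the total.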

Step 4: The sampling variance is the sum of the squared deviations across all 45 replicates:

\[ \mathrm{Var}(\hat{\theta}) = \sum_{r=1}^{45} (\hat{\theta}_r - \hat{\theta})^2 = 0.003582 \]

Step 5: The standard error is the square root of the sampling variance:

\[ SE = \sqrt{0.003582} = 0.0599 \]

Summary

Quantity	Symbol	Value
Point estimate	\(\hat{\theta}\)	7.59
Sampling variance	\(\mathrm{Var}(\hat{\theta})\)	0.0036
Standard error	\(SE\)	0.06

The estimated mean age is 7.59 years with a standard error of 0.06 years. This matches the result obtained using repest in the Stata chapter and Rrepest in the R chapter, where the jackknife calculation is carried out automatically.