7 Analyzing PASEC 2019 data in Stata

This section provides a practical introduction to analysing PASEC 2019 data in Stata using the repest package. It is intended for users with a basic working knowledge of Stata and includes step-by-step examples covering common PASEC analyses.

7.1 Loading data in Stata

For hands-on examples we’ll start with the Grade 2 study.

* Set the path to your working folder if needed.
* Paths below are relative to the project root.

use "data/PASEC2019_GRADE2_INT.dta", clear
(Data file created by EpiData based on CIV_06_LIVRET_2A.rec)

It is good practice to run your analysis from a do-file rather than the command window, as this makes your work reproducible. All examples in this guide are written as do-file code available here.
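A do-file for these examples might begin along the following lines. This is only a sketch: the Stata version, project folder and log-file name are hypothetical placeholders to adapt, while the data path is the one used above.

```stata
* Minimal do-file header (sketch; adjust to your own setup)
version 16              // assumed version; set this to the Stata version you use
clear all
set more off

* Hypothetical project root: replace with your own folder
cd "C:/projects/pasec2019"

* Keep a text log of the session (hypothetical file name)
capture log close
log using "pasec_grade2_analysis.log", replace text

* Load the Grade 2 data using a path relative to the project root
use "data/PASEC2019_GRADE2_INT.dta", clear

* ... analysis commands go here ...

log close
```

Running the whole file from top to bottom, rather than selected lines, is what makes the results reproducible.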

7.2 Applying English labels

To encourage and assist non-French-speaking users in analysing the rich PASEC data, we have supplied do-files that apply English variable and value labels. These files are in the data/ folder of this project.

* Apply English labels

do "data/PASEC2019_Grade2_EN_labels.do"

* Show both numeric values and value labels 

numlabel, add 
qd26y: all characters numeric; replaced as int
(835 missing values generated)

qd27y: all characters numeric; replaced as int
(835 missing values generated)

qe21y: all characters numeric; replaced as int

qe21m: all characters numeric; replaced as byte

qe21d: all characters numeric; replaced as byte

The numlabel, add command displays the numeric code alongside the value label in tabulations (for example, 1. Benin rather than just Benin). This makes it easier to write if conditions using the correct numeric codes in subsequent commands.
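As a small illustration, the numeric codes behind a labelled variable can be listed and then used in an if condition. The sketch below looks up whichever value label is attached to ID_PAYS rather than assuming its name.

```stata
* Find the value label attached to ID_PAYS and list its codes
local lbl : value label ID_PAYS
label list `lbl'

* Use the numeric code (1 = Benin) in an if condition
count if ID_PAYS == 1
```

Because the first two lines use a local macro, run them together in the same do-file selection.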

After loading the file, let’s check that it contains the key variables that repest needs. The five plausible values for language and mathematics are LECT_PV1 to LECT_PV5 and MATHS_PV1 to MATHS_PV5. The final weight is rwgt0, the replicate weights are rwgt1 to rwgt45, and countries are identified by ID_PAYS.

describe LECT_PV1-MATHS_PV5 ID_PAYS rwgt0 rwgt1-rwgt45
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
LECT_PV1        double  %10.0g                First plausible value in reading
LECT_PV2        double  %10.0g                Second plausible value in reading
LECT_PV3        double  %10.0g                Third plausible value in reading
LECT_PV4        double  %10.0g                Fourth plausible value in reading
LECT_PV5        double  %10.0g                Fifth plausible value in reading
MATHS_PV1       double  %10.0g                First plausible value in
                                                mathematics
MATHS_PV2       double  %10.0g                Second plausible value in
                                                mathematics
MATHS_PV3       double  %10.0g                Third plausible value in
                                                mathematics
MATHS_PV4       double  %10.0g                Fourth plausible value in
                                                mathematics
MATHS_PV5       double  %10.0g                Fifth plausible value in
                                                mathematics
ID_PAYS         float   %32.0g     country    Country identifier
rwgt0           double  %10.0g                Student replicate weight 0
rwgt1           double  %10.0g                Student replicate weight 1
rwgt2           double  %10.0g                Student replicate weight 2
rwgt3           double  %10.0g                Student replicate weight 3
rwgt4           double  %10.0g                Student replicate weight 4
rwgt5           double  %10.0g                Student replicate weight 5
rwgt6           double  %10.0g                Student replicate weight 6
rwgt7           double  %10.0g                Student replicate weight 7
rwgt8           double  %10.0g                Student replicate weight 8
rwgt9           double  %10.0g                Student replicate weight 9
rwgt10          double  %10.0g                Student replicate weight 10
rwgt11          double  %10.0g                Student replicate weight 11
rwgt12          double  %10.0g                Student replicate weight 12
rwgt13          double  %10.0g                Student replicate weight 13
rwgt14          double  %10.0g                Student replicate weight 14
rwgt15          double  %10.0g                Student replicate weight 15
rwgt16          double  %10.0g                Student replicate weight 16
rwgt17          double  %10.0g                Student replicate weight 17
rwgt18          double  %10.0g                Student replicate weight 18
rwgt19          double  %10.0g                Student replicate weight 19
rwgt20          double  %10.0g                Student replicate weight 20
rwgt21          double  %10.0g                Student replicate weight 21
rwgt22          double  %10.0g                Student replicate weight 22
rwgt23          double  %10.0g                Student replicate weight 23
rwgt24          double  %10.0g                Student replicate weight 24
rwgt25          double  %10.0g                Student replicate weight 25
rwgt26          double  %10.0g                Student replicate weight 26
rwgt27          double  %10.0g                Student replicate weight 27
rwgt28          double  %10.0g                Student replicate weight 28
rwgt29          double  %10.0g                Student replicate weight 29
rwgt30          double  %10.0g                Student replicate weight 30
rwgt31          double  %10.0g                Student replicate weight 31
rwgt32          double  %10.0g                Student replicate weight 32
rwgt33          double  %10.0g                Student replicate weight 33
rwgt34          double  %10.0g                Student replicate weight 34
rwgt35          double  %10.0g                Student replicate weight 35
rwgt36          double  %10.0g                Student replicate weight 36
rwgt37          double  %10.0g                Student replicate weight 37
rwgt38          double  %10.0g                Student replicate weight 38
rwgt39          double  %10.0g                Student replicate weight 39
rwgt40          double  %10.0g                Student replicate weight 40
rwgt41          double  %10.0g                Student replicate weight 41
rwgt42          double  %10.0g                Student replicate weight 42
rwgt43          double  %10.0g                Student replicate weight 43
rwgt44          double  %10.0g                Student replicate weight 44
rwgt45          double  %10.0g                Student replicate weight 45
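The same check can be scripted so that a do-file stops early if something is missing. The following is a sketch using standard Stata commands; the macro names repw and nrep are illustrative.

```stata
* Stop with an error if any of these key variables is absent or non-numeric
confirm numeric variable LECT_PV1 LECT_PV5 MATHS_PV1 MATHS_PV5 rwgt0 ID_PAYS

* Expand the replicate-weight range and count the variables in it
unab repw : rwgt1-rwgt45
local nrep : word count `repw'
display "Replicate weights found: `nrep'"

* Report missing values in the plausible values and the final weight
misstable summarize LECT_PV1-MATHS_PV5 rwgt0
```

For the Grade 2 file the count should be 45, matching the NREP() value used later in this section.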

7.3 The repest package

To get started, install the repest package. It automates the handling of plausible values and replicate weights, which makes it easier to analyse PASEC data correctly in Stata. In the Stata command window, type:

ssc install repest, replace 

You only need to run this command once; there is no need to re-install repest each time you open Stata. The replace option updates the package if a newer version is available.
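If you prefer to keep the installation step inside a do-file, one possible pattern is to install the package only when Stata cannot already find it. This is a sketch using standard Stata commands.

```stata
* Install repest only if it is not already on the ado-path
capture which repest
if _rc {
    ssc install repest
}

* Open the help file to check the syntax of the installed version
help repest
```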

7.4 repest command syntax

The basic syntax of the repest command is as follows:

repest svyname [if] [in] , estimate(cmd [,cmd_options]) [options]
  • svyname: Either one of the study names supported by the package (e.g., PISA, TIMSS, PIRLS) or SVY, which allows you to specify the survey design yourself.

  • estimate(cmd [,cmd_options]): Specifies the statistical command to run. cmd can be any Stata command that accepts weights (for example, mean, reg or qreg) or one of the built-in repest commands means, freq, summarize, corr and quantiletable. Command-specific options are passed after a comma within the parentheses.

7.5 Before you begin: Set up repest for PASEC

PASEC is not one of the studies with built-in survey specifications in the repest package. Throughout this guide we therefore use SVY for svyname and supply all survey parameters explicitly in the svyparm() option.

PASEC uses the paired jackknife method to create the replicate weights. In both grades the final weight is rwgt0. The replicate weights are rwgt1 to rwgt90 in the Grade 6 data (90 weights) and rwgt1 to rwgt45 in the Grade 2 data (45 weights). Each domain has five plausible values: MATHS_PV1 to MATHS_PV5 for mathematics and LECT_PV1 to LECT_PV5 for language.

The required parameters are as follows:

| Survey setting | svyparms() suboption | PASEC 2019 Grade 6 | PASEC 2019 Grade 2 |
|---|---|---|---|
| Final weight | final_weight_name() | rwgt0 | rwgt0 |
| Replicate weights | rep_weight_name() | rwgt | rwgt |
| Variance factor | variancefactor() | 1 | 1 |
| Number of replications | NREP() | 90 | 45 |
| Number of plausible values | NBpv() | 5 | 5 |

Commands for analysing the Grade 2 data will have the following syntax:

repest SVY [if] [in] , estimate(cmd [,cmd_options]) [options] svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 

For Grade 6, we need to specify that there are 90 replicate weights. Commands for the Grade 6 data are as follows:

repest SVY [if] [in] , estimate(cmd [,cmd_options]) [options] svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1)) 
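Because the survey parameters are identical in every call, one option is to store them once in a macro and reuse them. The sketch below uses global macros with illustrative names (g2parm and g6parm are not part of repest); Stata substitutes the macro text before repest parses the command.

```stata
* Store the Grade 2 survey parameters once (macro name is illustrative)
global g2parm "NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)"

* The Grade 6 equivalent differs only in the number of replicate weights
global g6parm "NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1)"

* Example call with the Grade 2 data in memory
repest SVY if ID_PAYS==1, estimate(mean MATHS_PV@) svyparm($g2parm)
```

A global is used here so that the definition survives when only part of a do-file is run. The examples in this guide spell the parameters out in full so that each command can be copied on its own.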

7.6 PASEC Analyses Examples

7.6.1 Calculating Mean Age

Let’s start by calculating the average age of students in Grade 2 in Benin. The age variable is qe22. Let’s inspect the ID_PAYS and qe22 variables.

tab ID_PAYS
              Country identifier |      Freq.     Percent        Cum.
---------------------------------+-----------------------------------
                        1. Benin |      1,654        7.54        7.54
                 2. Burkina Faso |      1,884        8.59       16.13
                      3. Burundi |      1,664        7.59       23.72
                     4. Cameroun |      1,780        8.12       31.83
                        5. Congo |      1,553        7.08       38.91
                6. Cote D'Ivoire |      1,332        6.07       44.99
                        7. Gabon |      1,157        5.28       50.26
                       8. Guinee |      1,086        4.95       55.21
                   9. Madagascar |      1,883        8.59       63.80
                       10. Niger |      1,730        7.89       71.69
11. Democratic Republic of Congo |      1,050        4.79       76.47
                     12. Senegal |      1,341        6.11       82.59
                        13. Chad |      1,727        7.87       90.46
                        14. Togo |      2,092        9.54      100.00
---------------------------------+-----------------------------------
                           Total |     21,933      100.00
tab qe22
       Student age |
            [4–16] |      Freq.     Percent        Cum.
-------------------+-----------------------------------
                 4 |          2        0.01        0.01
                 5 |         83        0.38        0.39
                 6 |      1,295        5.90        6.29
                 7 |      5,467       24.93       31.22
                 8 |      6,923       31.56       62.78
                 9 |      4,261       19.43       82.21
                10 |      2,099        9.57       91.78
                11 |        800        3.65       95.43
                12 |        458        2.09       97.52
                13 |        189        0.86       98.38
                14 |         73        0.33       98.71
                15 |         41        0.19       98.90
                16 |         19        0.09       98.98
                17 |          8        0.04       99.02
                18 |          1        0.00       99.02
                19 |          1        0.00       99.03
                20 |          1        0.00       99.03
                22 |          1        0.00       99.04
       99. Missing |        211        0.96      100.00
-------------------+-----------------------------------
             Total |     21,933      100.00

NOTE: Benin is coded as \(1\). Missing age values are coded as \(99\). Because \(99\) is a placeholder used to indicate missing data rather than a learner’s actual age, these observations should be excluded from analyses involving age. Failure to do so will produce misleading results. To calculate the average age of students in Grade \(2\) in Benin, the syntax is as follows:

*Calculating average age in Benin 

repest SVY if ID_PAYS==1&qe22<99, estimate(mean qe22) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(file C:\Users\cash\AppData\Local\Temp\ST_912c_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_912c_000005.tmp saved

_pooled.
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        qe22 |    7.56647   .0598516   126.42   0.000     7.449163    7.683777
------------------------------------------------------------------------------

Because age is an observed variable rather than a plausible-value variable, repest uses only the sampling variance derived from the replicate weights. The reported standard error therefore reflects uncertainty arising from the sample design but not measurement uncertainty.

The estimated mean age of Grade 2 students in Benin is \(7.57\) years (\(95\%\) CI: \(7.45–7.68\)). The age tabulation above, which pools all countries, also shows many students aged 9–11 enrolled in Grade 2, a pattern consistent with the grade repetition and late entry common in the region.
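An alternative to filtering with the if qualifier each time is to create a cleaned copy of the age variable in which the 99 placeholder is set to missing. The sketch below assumes the Grade 2 data are in memory; age_clean is an illustrative name, not a variable in the PASEC file.

```stata
* Copy age, leaving the 99 "Missing" placeholder as Stata missing (.)
gen age_clean = qe22 if qe22 < 99
label variable age_clean "Student age (99 set to missing)"

* mean drops observations with missing age_clean, so no age filter is needed
repest SVY if ID_PAYS==1, estimate(mean age_clean) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))
```

This should give the same estimate as the command above, and the cleaned variable can be reused in later analyses without repeating the condition.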

Alternative to generate the same result - svy

PASEC provides replicate weights specifically so that users can reproduce the official variance estimation procedure.

If the public data contain:

  • the final sampling weight,

  • the sampling strata,

  • the primary sampling units (PSUs),

and these accurately reflect the actual design, then

svyset psu [pw=weight], strata(strata) 
svy: mean score 

will generally produce design-consistent standard errors.

However, this is not always possible because:

  1. The public design variables may be incomplete or masked.

  2. The weighting process may include calibration steps not reflected in the released design variables.

For these reasons, an ordinary svyset declaration based on strata and PSUs will generally not reproduce the official PASEC standard errors exactly.

Stata’s svyset can instead be configured to use the paired jackknife replicate weights directly, producing the same standard errors as the official PASEC results.

The syntax to generate the same result is:

*set up for svyset to generate same result: 

svyset [pweight=rwgt0], vce(jackknife) jkrweight(rwgt1-rwgt45, multiplier(1) fpc(0)) mse 
svy: mean qe22 if ID_PAYS==1&qe22<99 

7.6.2 Calculating Mean Mathematics Proficiency

Now we will use the \(5\) plausible values to estimate mean mathematics scores in Benin.

Plausible values are a set of multiple imputations of each student's proficiency. The repest package treats a variable as a set of plausible values when the \(@\) symbol appears in its name in place of the plausible-value number.

For example:

estimate(mean MATHS_PV@)

tells repest to:

  • analyse all five mathematics plausible values;

  • combine results appropriately; and

  • calculate standard errors that reflect both sampling and measurement uncertainty.

This allows researchers to obtain valid estimates without having to implement the multiple-imputation calculations manually.

*Calculating mean mathematics scores in Benin 

repest SVY if ID_PAYS==1, estimate(mean MATHS_PV@) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))
(file C:\Users\cash\AppData\Local\Temp\ST_a0a0_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_a0a0_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   525.0697   7.159119    73.34   0.000      511.038    539.1013
------------------------------------------------------------------------------

The estimated mean mathematics proficiency in Benin is \(525\) points. The standard error of \(7.16\) reflects both sampling uncertainty and uncertainty arising from the plausible values. The confidence interval indicates that the population mean is likely to lie between \(511\) and \(539\) points.
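The confidence interval in the output is based on the normal approximation, so it can be checked by hand from the reported estimate and standard error. The sketch below simply re-enters the two numbers from the output above; the scalar names are illustrative.

```stata
* Reported estimate and standard error for Benin (copied from the output above)
scalar b_math  = 525.0697
scalar se_math = 7.159119

* 95% interval: estimate plus or minus the 97.5th normal percentile times the SE
display "Lower bound: " b_math - invnormal(0.975)*se_math
display "Upper bound: " b_math + invnormal(0.975)*se_math
```

The two bounds should agree with the interval in the repest output up to rounding.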

PASEC proficiency scales are constructed using IRT procedures and should primarily be interpreted comparatively. The absolute value of the scale has no substantive meaning independent of the proficiency framework.

Unlike age, mathematics proficiency is represented by five plausible values. repest estimates the mean separately for each plausible value, combines the five estimates using multiple-imputation formulas, and then incorporates the replicate-weight variance to produce the final standard error.
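The combination step can be written out as follows. This is a sketch of the standard multiple-imputation formulas as commonly applied to large-scale assessments, not a statement of repest's internal code; the symbols are introduced here for illustration. Let \(\hat{\theta}_m\) be the estimate from plausible value \(m\) using the final weight, \(\hat{\theta}_{m,r}\) the same estimate using replicate weight \(r\), \(M = 5\) the number of plausible values, \(R = 45\) the number of replicate weights in Grade 2, and \(f = 1\) the variance factor.

\[
\hat{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m,
\qquad
U_m = f\sum_{r=1}^{R}\left(\hat{\theta}_{m,r}-\hat{\theta}_m\right)^2,
\qquad
U = \frac{1}{M}\sum_{m=1}^{M}U_m
\]

\[
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m-\hat{\theta}\right)^2,
\qquad
\mathrm{SE}\left(\hat{\theta}\right) = \sqrt{U + \left(1+\frac{1}{M}\right)B}
\]

Here \(U\) is the sampling variance and \(B\) is the variance between plausible values, which captures measurement uncertainty. For an observed variable such as age, all \(\hat{\theta}_m\) are identical, so \(B = 0\) and only the sampling variance remains.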

Alternative to generate the same result - pv

An alternative user-written Stata command for analysing plausible values is pv. The syntax to generate the same result is:

pv, pv(MATHS_PV*) jrr jk(2) weight(rwgt0) rw(rwgt1-rwgt45): mean @pv [aw = @w] if ID_PAYS==1 

Now let’s get means of mathematics and literacy for each country by including the by option.

*Means of mathematics and literacy for each country 

repest SVY, estimate(mean MATHS_PV@ LECT_PV@) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) by(ID_PAYS) 
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(file C:\Users\cash\AppData\Local\Temp\ST_9148_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_9148_000005.tmp saved

1.....
ID_PAYS : 1
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   525.0697   7.159119    73.34   0.000      511.038    539.1013
    LECT_PV_ |   524.8164   7.711942    68.05   0.000     509.7013    539.9315
------------------------------------------------------------------------------

2.....
ID_PAYS : 2
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   498.7166   8.233047    60.57   0.000     482.5801    514.8531
    LECT_PV_ |   493.4861   9.747319    50.63   0.000     474.3817    512.5905
------------------------------------------------------------------------------

3.....
ID_PAYS : 3
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   614.3862   2.370557   259.17   0.000       609.74    619.0324
    LECT_PV_ |   624.9706   4.534335   137.83   0.000     616.0835    633.8578
------------------------------------------------------------------------------

4.....
ID_PAYS : 4
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   516.7354   7.994531    64.64   0.000     501.0664    532.4044
    LECT_PV_ |   522.1636   8.392591    62.22   0.000     505.7144    538.6127
------------------------------------------------------------------------------

5.....
ID_PAYS : 5
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   591.9041   6.298265    93.98   0.000     579.5598    604.2485
    LECT_PV_ |   582.4071   7.480158    77.86   0.000     567.7462    597.0679
------------------------------------------------------------------------------

6.....
ID_PAYS : 6
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   522.5248   4.062804   128.61   0.000     514.5618    530.4877
    LECT_PV_ |   516.5937   5.404587    95.58   0.000     506.0009    527.1865
------------------------------------------------------------------------------

7.....
ID_PAYS : 7
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   595.9217   9.399738    63.40   0.000     577.4986    614.3449
    LECT_PV_ |   610.2501    14.4539    42.22   0.000     581.9209    638.5792
------------------------------------------------------------------------------

8.....
ID_PAYS : 8
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   519.3488   9.395843    55.27   0.000     500.9333    537.7644
    LECT_PV_ |    469.039   10.25872    45.72   0.000     448.9322    489.1457
------------------------------------------------------------------------------

9.....
ID_PAYS : 9
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   549.7152   3.798524   144.72   0.000     542.2702    557.1602
    LECT_PV_ |   568.8426   6.895106    82.50   0.000     555.3285    582.3568
------------------------------------------------------------------------------

10.....
ID_PAYS : 10
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   544.9191   6.358256    85.70   0.000     532.4571     557.381
    LECT_PV_ |   534.6824   7.193762    74.33   0.000     520.5829     548.782
------------------------------------------------------------------------------

11.....
ID_PAYS : 11
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   567.7779   8.240901    68.90   0.000      551.626    583.9298
    LECT_PV_ |   530.9825    10.5383    50.39   0.000     510.3278    551.6372
------------------------------------------------------------------------------

12.....
ID_PAYS : 12
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   563.4415   6.083019    92.63   0.000      551.519     575.364
    LECT_PV_ |   557.1325   9.344315    59.62   0.000      538.818     575.447
------------------------------------------------------------------------------

13.....
ID_PAYS : 13
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   522.4416   6.840793    76.37   0.000     509.0338    535.8493
    LECT_PV_ |   508.4999     7.8032    65.17   0.000     493.2059    523.7939
------------------------------------------------------------------------------

14.....
ID_PAYS : 14
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   MATHS_PV_ |   489.3993   5.288786    92.54   0.000     479.0335    499.7651
    LECT_PV_ |   474.8978   7.203616    65.92   0.000      460.779    489.0167
------------------------------------------------------------------------------

There is substantial variation in achievement across participating countries. Burundi records the highest Grade 2 mathematics score (\(614\) points), while Togo records the lowest (\(489\) points). Reading and mathematics performance are strongly correlated across countries, although some countries perform relatively better in one domain than the other. Formal statistical comparisons between countries should be conducted using regression models rather than visual inspection of the means alone.

Notice that standard errors vary considerably across countries: Burundi (country \(3\)) has notably smaller standard errors than countries such as Gabon (\(7\)) or Guinea (\(8\)). This likely reflects differences in how strongly performance is clustered within schools and in the precision of the sampling design in each country.
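With fourteen countries, reading estimates off the Results window quickly becomes tedious. The repest help file describes an outfile() option that saves the displayed results to a Stata dataset. The sketch below assumes that option behaves as documented in your installed version; the file name g2_country_means is hypothetical.

```stata
* Save the country-level estimates to a dataset (hypothetical file name)
repest SVY, estimate(mean MATHS_PV@ LECT_PV@) by(ID_PAYS) outfile(g2_country_means) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))

* Inspect the saved results without losing the data in memory
preserve
use "g2_country_means.dta", clear
describe
list, clean noobs
restore
```

Check help repest for the exact names of the variables written to the file before using them in tables or graphs.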

7.6.3 Moving beyond means: summary statistics with repest

repest has some built-in commands that are very useful for analysing large-scale assessments, including summary statistics that describe the full distribution.

The summarize command generates point estimates and standard errors for a range of statistics beyond the mean, such as percentiles and standard deviations.

*First create an indicator for girl 

gen girl=qe23==2 if qe23<9 

*Beyond means - looking at the distribution of mathematics scores in Benin 

repest SVY if ID_PAYS==1, estimate(summarize MATHS_PV@, stats(mean sd p5 p25 p50 p75 p95)) by(girl) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(28 missing values generated)

0 1
(file C:\Users\cash\AppData\Local\Temp\ST_8294_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_8294_000005.tmp saved

0.....
girl : 0
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
MATHS_PV__~n |   535.7972   7.346251    72.93   0.000     521.3988    550.1956
MATHS_PV__sd |   106.7786   7.377892    14.47   0.000     92.31815     121.239
MATHS_PV__p5 |   371.9733   9.424126    39.47   0.000     353.5023    390.4442
MATHS_PV_~25 |   460.3875   7.622528    60.40   0.000     445.4476    475.3274
MATHS_PV_~50 |   536.3236   5.948664    90.16   0.000     524.6644    547.9828
MATHS_PV_~75 |   598.2395   9.216155    64.91   0.000     580.1762    616.3028
MATHS_PV_~95 |   713.3866   33.84142    21.08   0.000     647.0587    779.7146
------------------------------------------------------------------------------

1.....
girl : 1
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
MATHS_PV__~n |   513.0814   7.796753    65.81   0.000     497.8001    528.3628
MATHS_PV__sd |   101.0514   6.809575    14.84   0.000     87.70485    114.3979
MATHS_PV__p5 |   364.1102   10.49694    34.69   0.000     343.5366    384.6839
MATHS_PV_~25 |   441.1321   7.258763    60.77   0.000     426.9052     455.359
MATHS_PV_~50 |   507.5715    8.92481    56.87   0.000     490.0792    525.0638
MATHS_PV_~75 |   575.9452   12.09901    47.60   0.000     552.2316    599.6588
MATHS_PV_~95 |   682.0938    23.3877    29.16   0.000     636.2547    727.9328
------------------------------------------------------------------------------

Looking beyond average performance reveals important differences across the distribution. Boys in Benin have higher mathematics scores at the mean, median and upper tail of the distribution. For example, the median score among boys is approximately \(536\) points compared with \(508\) points among girls. The percentile estimates show that the gender gap is not confined to a small group of high-performing students but is evident across much of the achievement distribution.
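Before relying on a derived indicator such as girl, it is worth cross-checking it against the source variable. The sketch below assumes the girl variable created above is in memory, with 1 for girls and 0 for boys; the label name girl_lbl is illustrative.

```stata
* Cross-check the indicator against the original variable, showing missing values
tabulate qe23 girl if ID_PAYS==1, missing

* Attach readable labels to the indicator (label name is illustrative)
label define girl_lbl 0 "Boy" 1 "Girl"
label values girl girl_lbl
label variable girl "Student is a girl"
```

Every non-missing value of qe23 should map to exactly one value of girl, and the missing codes should map to missing.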

The quantiletable command creates quantile tables.

*Quintile table for mathematics and language in Benin 

repest SVY if ID_PAYS==1, estimate(quantiletable MATHS_PV@ LECT_PV@, nquantiles(5)) svyparm(NBpv(5)  final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(file C:\Users\cash\AppData\Local\Temp\ST_29e4_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_29e4_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
MATHS_PV__q1 |    387.316   5.641749    68.65   0.000     376.2584    398.3737
MATHS_PV__q2 |   465.0336   5.777279    80.49   0.000     453.7104    476.3569
MATHS_PV__q3 |   522.5471   6.296739    82.99   0.000     510.2057    534.8885
MATHS_PV__q4 |   575.5634   7.330665    78.51   0.000     561.1956    589.9313
MATHS_PV__q5 |   675.3612   18.45676    36.59   0.000     639.1866    711.5358
 LECT_PV__q1 |   417.5467   7.716463    54.11   0.000     402.4227    432.6707
 LECT_PV__q2 |   474.0085    6.38251    74.27   0.000      461.499    486.5179
 LECT_PV__q3 |   510.6889   7.230566    70.63   0.000     496.5172    524.8605
 LECT_PV__q4 |   561.0914   11.73183    47.83   0.000     538.0974    584.0853
 LECT_PV__q5 |   661.1535   20.08941    32.91   0.000      621.779     700.528
------------------------------------------------------------------------------

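The quantile table divides students into five equal-sized groups (quintiles) and reports a mean for each group. The reported values are quintile means rather than the score thresholds separating adjacent quintiles: the five mathematics values average to about \(525\) points, which is the overall Benin mean estimated earlier. For example, students in the highest-performing mathematics quintile in Benin score about \(675\) points on average. In the quantiletable syntax the first variable listed (here MATHS_PV@) defines the quintiles and the second (LECT_PV@) is the outcome, so the LECT_PV rows should be read as mean reading scores within each mathematics quintile.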

Other built-in repest commands are means, freq and corr.

7.6.4 Estimating PASEC proficiency levels

PASEC defines a set of proficiency levels that describe what students know and can do at different points along the mathematics scale. In this example, we estimate the proportion of Grade 2 students in Benin who fall into each of the four PASEC mathematics proficiency levels.

The four proficiency levels for Grade 2 mathematics are defined by PASEC as follows:

| PASEC Proficiency Level | Score |
|---|---|
| 1 | \(< 400.34\) |
| 2 | \(400.34–489.03\) |
| 3 | \(489.03–577.73\) |
| 4 | \(> 577.73\) |

These boundaries are set on the PASEC international scale and are the same for all countries, enabling cross-country comparisons. If we want to provide estimates for the PASEC defined levels of proficiency, we first need to create dummy variables for each level.

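*Use PASEC defined levels: create a 0/1 indicator for each level and each plausible value 

foreach var of varlist MATHS_PV1-MATHS_PV5 { 

gen prop1_`var'=`var'<=400.34 if `var'<. 

gen prop2_`var'=`var'>400.34&`var'<=489.03 if `var'<. 

*Upper bound uses <= so that a score of exactly 577.73 is not left out of every level 

gen prop3_`var'=`var'>489.03&`var'<=577.73 if `var'<. 

gen prop4_`var'=`var'>577.73 if `var'<. 

} 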

repest SVY  if ID_PAYS==1, estimate(mean prop1_MATHS_PV@ prop2_MATHS_PV@ prop3_MATHS_PV@ prop4_MATHS_PV@ ) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(file C:\Users\cash\AppData\Local\Temp\ST_a784_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_a784_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
prop1_MATH~_ |    .109777   .0135943     8.08   0.000     .0831327    .1364213
prop2_MATH~_ |   .2709238   .0186251    14.55   0.000     .2344192    .3074284
prop3_MATH~_ |   .3307259   .0207808    15.92   0.000     .2899964    .3714554
prop4_MATH~_ |   .2885734   .0256526    11.25   0.000     .2382951    .3388516
------------------------------------------------------------------------------

The results indicate that approximately \(11\%\) of Grade 2 students in Benin are in Level 1, the lowest of the four levels, while around \(29\%\) reach Level 4, the highest. The middle levels hold the largest share: about \(27\%\) of students are in Level 2 and \(33\%\) in Level 3. Because these estimates are derived from plausible values, the reported standard errors incorporate both measurement and sampling uncertainty.
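The four estimated proportions sum to one because every student with a valid plausible value should fall into exactly one level. A minimal sketch of how this could be checked before running repest is shown below; it assumes the prop1_ to prop4_ indicators created above are in memory, and the check_ variable name is purely illustrative.

```stata
* Sketch: confirm that the four level indicators are mutually exclusive
* and exhaustive for each plausible value (check_* is a temporary helper name).
foreach var of varlist MATHS_PV1-MATHS_PV5 {
    * Row total of the four 0/1 indicators; it should equal 1 for every scored student
    egen byte check_`var' = rowtotal(prop1_`var' prop2_`var' prop3_`var' prop4_`var')
    assert check_`var' == 1 if `var' < .
    drop check_`var'
}
```

If any student were left unclassified or counted twice, assert would stop the do-file with an error.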

7.6.5 Testing for differences between groups

The table below highlights the distinction between two common types of comparisons in PASEC data: within-country comparisons (e.g., boys versus girls in the same country) and between-country comparisons (e.g., Benin versus Burkina Faso). Because these comparisons involve different survey structures, they require slightly different analytical approaches.

The examples that follow demonstrate how to test for differences in achievement between groups using regression models in repest.

Comparison type | What it measures | Survey design impact
Within-country (e.g., boys vs. girls in Senegal) | The gap between two demographic subgroups who share the same sampling strata, schools, and teachers. | High covariance. Because groups are clustered together in the same schools, their errors are correlated.
Between-country (e.g., Senegal vs. Benin) | The gap between two entirely independent populations with completely separate sampling frames. | Zero covariance. Sampling units in Country A have no mathematical relationship to sampling units in Country B.

In repest, use over() for within-country comparisons and by() for between-country comparisons. Do not use over() with the country variable: with the SVY survey type, the over() option is designed for within-country subgroup comparisons only. Using regression achieves the same goal and works correctly with the SVY option.
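The two options could be used along the following lines. This is a hedged sketch, not verified output: it assumes the Grade 2 data are loaded, that girl is a 0/1 indicator, and that over() accepts the test suboption to report the difference between groups.

```stata
* Sketch only: assumes the Grade 2 data are in memory and girl is a 0/1 indicator.

* Within-country: mean mathematics score for boys and girls in Benin,
* with a test of the difference between the two groups
repest SVY if ID_PAYS==1, estimate(means MATHS_PV@) over(girl, test) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))

* Between-country: separate estimates for Benin (1) and Burkina Faso (2)
repest SVY if ID_PAYS<=2, estimate(means MATHS_PV@) by(ID_PAYS) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))
```

The by() call returns one table per country and does not itself test the difference between countries, which is why the regression approach is used for that purpose.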

7.6.5.1 Testing for differences within countries - differences between boys and girls

Are there differences in mathematics scores between boys and girls? We can use linear regression to test for this.

*Test for differences between boys and girls mathematics scores in Benin

repest SVY if ID_PAYS==1, estimate(reg MATHS_PV@ girl) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(file C:\Users\cash\AppData\Local\Temp\ST_7208_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_7208_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        girl |  -22.71577   4.891148    -4.64   0.000    -32.30224   -13.12929
       _cons |   535.7972   7.346251    72.93   0.000     521.3988    550.1956
------------------------------------------------------------------------------

The coefficient on girl provides the estimated difference in mean mathematics scores between girls and boys, together with the appropriate standard error and significance test. On average, girls’ scores are \(22.7\) points lower than boys’ in Benin.

What about differences in the proportion reaching PASEC level \(4\)?

*Test differences between boys and girls in percentage reaching PASEC level 4 in Benin

repest SVY if ID_PAYS==1, estimate(reg prop4_MATHS_PV@  girl) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1)) 
(file C:\Users\cash\AppData\Local\Temp\ST_48bc_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_48bc_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        girl |  -.0850427   .0286047    -2.97   0.003    -.1411068   -.0289785
       _cons |   .3287349   .0268158    12.26   0.000     .2761768    .3812929
------------------------------------------------------------------------------

Girls are \(8.5\) percentage points less likely to reach the highest proficiency level in mathematics than boys in Benin.
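Because girl is the only regressor in both models, the constant is the estimate for boys and the constant plus the girl coefficient is the estimate for girls. The short calculation below works this through using the coefficients copied from the two tables above; the scalar names are illustrative only.

```stata
* Worked example using the coefficients reported in the two output tables above
scalar score_cons = 535.7972      // boys' mean mathematics score
scalar score_girl = -22.71577     // girls minus boys
scalar lvl4_cons  = .3287349      // share of boys reaching Level 4
scalar lvl4_girl  = -.0850427     // girls minus boys

display "Mean score, boys:     " %6.1f score_cons
display "Mean score, girls:    " %6.1f (score_cons + score_girl)
display "Share Level 4, boys:  " %5.3f lvl4_cons
display "Share Level 4, girls: " %5.3f (lvl4_cons + lvl4_girl)
```

This gives a mean score of about 513 for girls, and roughly a quarter of girls reaching Level 4 compared with about a third of boys.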

7.6.5.2 Testing for differences between countries

Are there differences in mean mathematics scores between Benin and Burkina Faso?

*Test differences in mathematics scores between Benin and Burkina Faso 

repest SVY if ID_PAYS<=2, estimate(reg MATHS_PV@ i.ID_PAYS) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))  
(file C:\Users\cash\AppData\Local\Temp\ST_6328_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_6328_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 _1b_ID_PAYS |          0  (omitted)
  _2_ID_PAYS |  -26.35308   11.12174    -2.37   0.018     -48.1513   -4.554867
       _cons |   525.0697   7.159119    73.34   0.000      511.038    539.1013
------------------------------------------------------------------------------

Mean mathematics scores are \(26.4\) points lower in Burkina Faso than in Benin, a statistically significant difference (\(p = 0.018\)).
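The test statistic, p-value and confidence interval in the table follow directly from the coefficient and its standard error under a normal approximation. The sketch below reproduces them from the values reported above; the scalar names are illustrative.

```stata
* Reproduce z, p and the 95% confidence interval from the reported
* coefficient and standard error for Burkina Faso (_2_ID_PAYS)
scalar diff_b  = -26.35308
scalar diff_se = 11.12174

scalar diff_z = diff_b / diff_se
scalar diff_p = 2 * normal(-abs(diff_z))            // two-sided p-value
scalar ci_lo  = diff_b - invnormal(0.975) * diff_se
scalar ci_hi  = diff_b + invnormal(0.975) * diff_se

display "z = " %5.2f diff_z "   p = " %5.3f diff_p
display "95% CI: " %7.2f ci_lo " to " %7.2f ci_hi
```

The confidence interval excludes zero, which is another way of seeing that the difference is significant at the \(5\%\) level.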

7.6.5.3 Testing for differences in percentiles

Standard linear regression (regress) cannot be used to test differences in percentiles because it models only the conditional mean. Use quantile regression (qreg) instead.

Quantile regression is specified in the same way as linear regression, but instead of fitting a line through the conditional mean of the data, it fits a line through a chosen percentile (quantile) of the data.

  • qreg, quantile(0.50) models the median (\(50th\) percentile).

  • qreg, quantile(0.25) models the \(25th\) percentile (lower quartile).

  • qreg, quantile(0.75) models the \(75th\) percentile (upper quartile).

Just like regress, qreg is an estimation command that stores its coefficients (\(\beta\)) in e(b) and their variance-covariance matrix in e(V), so it can be passed to repest in the same way.
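To see what qreg stores outside of repest, the following sketch uses the auto dataset that ships with Stata rather than PASEC data. The variables price and foreign simply stand in for a score and a group indicator.

```stata
* Illustration on Stata's built-in auto data (not PASEC data)
sysuse auto, clear

* 25th percentile of price, comparing foreign with domestic cars
qreg price foreign, quantile(0.25)

* Stored results: coefficient vector and variance-covariance matrix
matrix list e(b)
matrix list e(V)

* The coefficient on the group indicator can also be accessed directly
display "Difference at the 25th percentile: " _b[foreign]
```

With a single binary regressor, the constant estimates the chosen percentile in the reference group and the coefficient on the indicator estimates the difference in that percentile between the two groups.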

The example below tests for differences between boys and girls in Benin at the \(25th\) and \(75th\) percentiles.

*Test for differences at the 25th and 75th percentile between boys and girls in Benin 

repest SVY if ID_PAYS==1, estimate(qreg MATHS_PV@ girl, quantile(0.25)) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))
(file C:\Users\cash\AppData\Local\Temp\ST_4d9c_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_4d9c_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        girl |  -19.25537   8.848027    -2.18   0.030    -36.59719    -1.91356
       _cons |   460.3875   7.622528    60.40   0.000     445.4476    475.3274
------------------------------------------------------------------------------

repest SVY if ID_PAYS==1, estimate(qreg MATHS_PV@ girl, quantile(0.75)) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(45) variancefactor(1))
(file C:\Users\cash\AppData\Local\Temp\ST_743c_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_743c_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        girl |  -22.29431   9.654296    -2.31   0.021    -41.21638   -3.372237
       _cons |   598.2395   9.216155    64.91   0.000     580.1762    616.3028
------------------------------------------------------------------------------

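The point estimate of the gap between boys and girls is slightly larger at the \(75th\) percentile (\(22.3\) points) than at the \(25th\) percentile (\(19.3\) points). Both gaps are statistically significant, but with standard errors of around \(9\) points each, these results do not show that the gap itself differs between the two percentiles.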

7.6.6 Multivariate Regression

Now let’s use the Grade \(6\) data. There are \(90\) replicate weights so we will need to replace NREP(45) with NREP(90) in the svyparm() suboptions.

Grade 6 setup

repest SVY [if] [in] , estimate(cmd [,cmd_options]) [options] svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1)) 

In this example, we’ll analyse the association between students’ mathematics scores (MATHS_PV@) and their gender (qe63) and socio-economic status (ses) in Côte d’Ivoire. The ses variable is a composite socioeconomic status index derived from the PASEC household questionnaire.

* Load Grade 6 data
use "data/PASEC2019_GRADE6_INT.dta", clear
(PASEC2019_LIVRET_D_FIN_PRIMAIRE)

* Apply English labels
do "data/PASEC2019_Grade6_EN_labels.do"

* Show both numeric values and value labels
numlabel, add
qd26y: all characters numeric; replaced as int
(1339 missing values generated)

qd27y: all characters numeric; replaced as int
(1339 missing values generated)

qe61y: all characters numeric; replaced as int

qe61m: all characters numeric; replaced as byte

qe61d: all characters numeric; replaced as byte

* First, generate an indicator variable for girls

gen girl=qe63==2 if qe63<9 
(16 missing values generated)

* Association between mathematics scores and socio-economic status and gender 

repest SVY if ID_PAYS==6, estimate(reg MATHS_PV@ ses girl) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1)) results(add(r2 N)) 
(file C:\Users\cash\AppData\Local\Temp\ST_52dc_000005.tmp not found)
file C:\Users\cash\AppData\Local\Temp\ST_52dc_000005.tmp saved

_pooled.....
 : _pooled
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         ses |    2.17009   .3883977     5.59   0.000     1.408845    2.931335
        girl |  -11.62055   2.855666    -4.07   0.000    -17.21755   -6.023549
       _cons |   349.0048    18.6696    18.69   0.000      312.413    385.5965
        e_r2 |   .0786266   .0248018     3.17   0.002      .030016    .1272371
         e_N |       3800   207.3596    18.33   0.000     3393.583    4206.417
------------------------------------------------------------------------------

The results(add(r2 N)) option adds extra statistics to the results table, taken from the results that the estimation command stores in e(). Here we add R-squared (r2), a measure of model fit, and the number of observations (N). repest prefixes these names with e_, so they appear in the output table as e_r2 and e_N.
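Because the svyparm() suboptions are identical in every Grade 6 call, one way to reduce repetition and typing errors is to store them once in a local macro. The sketch below assumes the Grade 6 data and the girl indicator from above are in memory; the macro name svy6 is arbitrary.

```stata
* Store the Grade 6 survey settings once (svy6 is an arbitrary macro name)
local svy6 "svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1))"

* Same model as above, with R-squared and N added to the table
repest SVY if ID_PAYS==6, estimate(reg MATHS_PV@ ses girl) `svy6' results(add(r2 N))

* A second model that reuses the same settings
repest SVY if ID_PAYS==6, estimate(reg MATHS_PV@ girl) `svy6'
```

Local macros exist only while the do-file (or the selected block of lines) is running, so the local line must be executed together with the commands that use it.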

The regression estimates the association between mathematics proficiency, socioeconomic status (SES), and gender among Grade \(6\) students in Côte d’Ivoire while correctly accounting for both plausible values and the PASEC replicate-weight design.

The results indicate a statistically significant positive association between socioeconomic status and mathematics achievement. Holding gender constant, a one-unit increase in the SES index is associated with an increase of approximately \(2.17\) points in the mathematics score (\(p < 0.001\)), so students from more advantaged socioeconomic backgrounds tend to perform better in mathematics. The model as a whole explains only about \(8\%\) of the variance in scores (e_r2 \(= 0.079\)), so most of the variation between students is not accounted for by SES and gender.

The coefficient on the female indicator (girl) is \(-11.62\) (\(p < 0.001\)), indicating that girls score, on average, about 12 points lower than boys with similar socioeconomic status. Because SES is included in the model, this estimated gender gap reflects differences between boys and girls after accounting for socioeconomic background.

The constant term (\(349\)) represents the predicted mathematics score for a boy with an SES value of zero. While the intercept is necessary for estimation, it is typically of limited substantive interest because an SES value of zero may not correspond to a meaningful student profile.
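One common way to make the constant easier to interpret is to centre SES at its mean before fitting the model, so that the constant becomes the predicted score for a boy of average SES. The sketch below is one possible approach, not part of the PASEC procedures: ses_c is a hypothetical variable name, and the weighted mean is treated as a fixed number rather than re-estimated in each replicate.

```stata
* Sketch: centre SES at its weighted mean for Côte d'Ivoire (ses_c is a new, illustrative variable)
summarize ses [aweight = rwgt0] if ID_PAYS == 6
generate ses_c = ses - r(mean) if ID_PAYS == 6

* Same model as before, with centred SES
repest SVY if ID_PAYS==6, estimate(reg MATHS_PV@ ses_c girl) svyparm(NBpv(5) final_weight_name(rwgt0) rep_weight_name(rwgt) NREP(90) variancefactor(1))
```

Centring shifts only the constant; the coefficients on SES and girl should be the same as in the uncentred model.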

Overall, the results suggest that both socioeconomic status and gender are important correlates of mathematics achievement in Côte d’Ivoire. Higher socioeconomic status is associated with better performance, while girls perform less well than boys on average, even after controlling for socioeconomic differences.

It is important to remember that these coefficients describe associations rather than causal effects. The regression does not imply that increasing a student’s socioeconomic status by one unit would necessarily increase their mathematics score by \(2.17\) points. Other factors associated with socioeconomic status, such as school quality, parental education, household resources, and learning opportunities, may also contribute to the observed relationship.