Chapter 5 Intro to data wrangling with tidyverse

The tidyverse is a modern collection of R packages designed to make working with data easier and more consistent. Now, before we begin data wrangling with dplyr, let’s take a closer look at how the tidyverse works and why it has become so central to modern R practice.

5.1 What makes the tidyverse special?

The tidyverse is not just a group of packages - it represents a philosophy of data analysis. Its design emphasizes clarity, consistency, and reproducibility. When you learn one tidyverse package, many of the same ideas and syntax patterns carry over to others.

At its heart are three key ideas:

  1. Tidy data principles
  2. Every tidyverse package expects data to be in tidy format, where:
    • each variable forms a column;
    • each observation forms a row, and
    • each value has its own cell.
  3. A consistent syntax and grammar .

Most tidyverse functions are verb-based and take a data frame as their first argument. In this context, a verb simply means a function that performs one clear action on your data - such as filtering, sorting, or summarising. Because each function both takes in and returns a data frame, you can easily chain them together in a logical sequence. This leads us to the next key feature.

5.2 Why learn dplyr first?

Among all tidyverse packages, dplyr is often the best starting point because it provides a simple, intuitive grammar for data manipulation. It introduces five core verbs - filter(), arrange(), mutate(), summarise(), and group_by() - that cover most day-to-day data wrangling tasks.

Once you understand how these verbs interact with each other (and with the pipe), you will find it much easier to explore, clean, and prepare real data for analysis or visualization.

In the next section, we will use dplyr verbs to clean and transform data, and combine operations into pipelines for more readable code. By the end, you will not only know how to use dplyr, but also start thinking the tidyverse way-using clear, expressive code that mirrors the logic of your analysis.

5.3 Data wrangling with dplyr

So far, we have been using base Rto learn the fundamentals - how to navigate RStudio, create and work with different data types, and produce simple descriptive summaries. When we learned to import data, we also took a brief, first look at the tidyverse, a collection of modern R packages that make working with data easier and more consistent.

Now, we are going to explore one of the most widely used tidyverse packages, dplyr , which provides a simple and intuitive way to manipulate data. With dplyr, you can filter, arrange, transform, and summarise data using functions that read almost like plain English. These functions (called verbs) work seamlessly together, and when combined with the pipe operator (%>%), they allow you to build clear, step-by-step workflows that are both efficient and easy to follow.

By the end of this lesson you should be able to:

  • Save data efficiently by converting a .dta file into an .RData object.

  • Apply data transformation techniques using the five key dplyr verbs:

    • filter()

    • arrange()

    • mutate()

    • summarise()

    • group_by()

  • Use the pipe operator (%>%) to write cleaner and more readable R code.

  • Combine multiple dplyr verbs to perform complex data manipulations (e.g., grouped filtering or mutating).

  • Perform various join operations to merge and relate data across multiple tables.

5.4 Import Data

ICAN-ICAR 2025 is a nationally representative, household-based survey of \(89,185\) children in \(56,913\) households. The survey assesses children aged 5-16 on foundational numeracy and reading skills.

Download the file “ican-icar-2025-v1.dta” from DataFirst and save it to the data directory in your main project folder (make a folder called data if you haven’t already). Note that the metadata for this data is found on the site.

# load the data

dat = read_csv("data/ican-icar-2025-v1.csv")

Load the tidyverse collection of packages, which loads the following packages: ggplot2, tibble, tidyr, readr, purrr, and dplyr.

  • ggplot2: For data visualization (this will be revisited in detail later)
  • tibble: For creating and working with tidy data frames
  • tidyr: For tidying messy data
  • readr:For importing rectangular data (like CSV files) quickly and efficiently
  • purrr:For functional programming
  • dplyr: For data manipulation using verbs like filter(), mutate(), summarise(), and arrange().

Load the tidyverse by running the command below:

library(tidyverse)

5.5 Working with Tibbles

Tibbles are data frames (under the hood) that work very well with tidyverse packages. They are easier to read and manipulate than traditional data frames, especially when you have many rows or columns. In addition, each column reports its data type, a nice feature borrowed from str(). If your data is currently a standard data frame, convert it to a tibble using as_tibble().

Tibbles are designed so that you do not accidently overwhelm your console when you print large data frames.

Here’s an example.

# convert ratings to a "tibble"
dat = as_tibble(dat)

A nice feature of tibbles is that if you display them in the console (by typing dat, for example) only the first 10 rows and all columns are shown.

dat
## # A tibble: 96,452 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Mozambique  c101        c101      Urban    uuid:f580… 2025-09-23          979     4      1     NA     1     1     2     2     1
##  2     2 Mozambique  c101        c101      Urban    uuid:b98e… 2025-09-23          986     7      1     NA     1     1     2     2     1
##  3     3 Mozambique  c101        c101      Urban    uuid:2611… 2025-09-23         2167     6      1     NA     1     1     2     2     1
##  4     4 Mozambique  c101        c101      Urban    uuid:d46b… 2025-09-23         2078     6      1     NA     1     1     1     1     1
##  5     5 Mozambique  c101        c101      Urban    uuid:df92… 2025-09-23         1251     5      1     NA     1     1     2     2     1
##  6     6 Mozambique  c101        c101      Urban    uuid:ade5… 2025-09-23         2199     5      1     NA     1     1     2     2     1
##  7     7 Mozambique  c101        c101      Urban    uuid:3f84… 2025-09-23         2889     5      1     NA     1     1     2     2     1
##  8     8 Mozambique  c101        c101      Urban    uuid:ff7b… 2025-09-23         1588     4      1     NA     1     0     2     2     1
##  9     9 Mozambique  c101        c101      Urban    uuid:7730… 2025-09-23         2203     4      1     NA     1     1     2     2     1
## 10    10 Mozambique  c101        c101      Urban    uuid:e210… 2025-09-24         4647     5      1     NA     2     1     2     2     1
## # ℹ 96,442 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

To understand your data at a high-level, you can use skim() from skimr package :

skim(dat)
Table 5.1: Data summary
Name dat
Number of rows 96452
Number of columns 147
_______________________
Column type frequency:
character 8
Date 1
numeric 138
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
CountryName 0 1.0000000000000000000000 4 10 0 11 0
TierOneUnit 0 1.0000000000000000000000 4 4 0 121 0
VillageID 0 1.0000000000000000000000 4 4 0 627 0
Location 0 1.0000000000000000000000 5 5 0 2 0
HHID 0 1.0000000000000000000000 41 41 0 56880 0
ChildID 0 1.0000000000000000000000 48 66 0 96452 0
AssessmentLanguage 7153 0.9300000000000000488498 4 10 0 16 0
EnrolmentStatus 0 1.0000000000000000000000 13 18 0 3 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
SubmissionDate 0 1 2025-07-27 2025-11-11 2025-09-02 102

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
…1 0 1.00000000000000000000000 4.822650000000000000168e+04 2.784343999999999868990e+04 1.000000000000000000000e+00 24113.7500000000000000000000 4.822650000000000000168e+04 7.233925000000000000121e+04 9.645200000000000000074e+04 ▇▇▇▇▇
duration 0 1.00000000000000000000000 2.142300000000000181769e+03 1.711170000000000072709e+03 -3.813000000000000000010e+04 986.0000000000000000000000 1.658999999999999999927e+03 2.785999999999999999882e+03 2.014199999999999999949e+04 ▁▁▁▇▁
hh06a 0 1.00000000000000000000000 7.629999999999999893419e+00 6.219999999999999751310e+00 1.000000000000000000000e+00 4.0000000000000000000000 6.000000000000000000000e+00 8.000000000000000000000e+00 8.000000000000000000000e+01 ▇▁▁▁▁
hh06b1 19208 0.80000000000000004440892 6.700000000000000399680e-01 4.699999999999999733546e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▃▁▁▁▇
hh06b2 77244 0.20000000000000001110223 2.879999999999999893419e+00 1.320000000000000062172e+00 1.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 3.000000000000000000000e+00 8.000000000000000000000e+00 ▇▅▂▁▁
hh06c 44238 0.54000000000000003552714 1.270000000000000017764e+00 4.400000000000000022204e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▇▁▁▁▃
hh06d 0 1.00000000000000000000000 5.200000000000000177636e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▇
hh07a 0 1.00000000000000000000000 2.180000000000000159872e+00 4.600000000000000199840e-01 1.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 3.000000000000000000000e+00 ▁▁▇▁▂
hh07b 0 1.00000000000000000000000 2.149999999999999911182e+00 5.300000000000000266454e-01 1.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 3.000000000000000000000e+00 ▁▁▇▁▂
hh07c 0 1.00000000000000000000000 1.500000000000000000000e+00 8.000000000000000444089e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 4.000000000000000000000e+00 ▇▃▁▁▁
hh07d 0 1.00000000000000000000000 2.410000000000000142109e+00 1.580000000000000071054e+00 1.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 4.000000000000000000000e+00 7.000000000000000000000e+00 ▇▂▂▁▁
hh07e 34735 0.64000000000000001332268 9.899999999999999911182e-01 1.199999999999999955591e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
hh07f 0 1.00000000000000000000000 9.300000000000000488498e-01 2.500000000000000000000e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
hh07g 0 1.00000000000000000000000 4.299999999999999933387e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▆
hh07h 0 1.00000000000000000000000 3.499999999999999777955e-01 4.799999999999999822364e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▅
hh07i 0 1.00000000000000000000000 5.100000000000000088818e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▇
hh07j 0 1.00000000000000000000000 2.200000000000000011102e-01 4.099999999999999755751e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▂
hh07k 0 1.00000000000000000000000 5.699999999999999511502e-01 4.899999999999999911182e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▆▁▁▁▇
hh07l 0 1.00000000000000000000000 4.899999999999999911182e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▇
hh07m 0 1.00000000000000000000000 8.900000000000000133227e-01 3.099999999999999977796e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
hh07n 10304 0.89000000000000001332268 7.099999999999999644729e-01 4.500000000000000111022e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▃▁▁▁▇
hh07o 0 1.00000000000000000000000 1.400000000000000133227e-01 3.499999999999999777955e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▁
hh07p 0 1.00000000000000000000000 3.499999999999999777955e-01 4.799999999999999822364e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▅
hh07p_3 82754 0.14000000000000001332268 5.000000000000000277556e-02 2.099999999999999922284e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▁
hh07q 0 1.00000000000000000000000 4.099999999999999755751e-01 4.899999999999999911182e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▆
ch02 0 1.00000000000000000000000 1.000000000000000000000e+01 3.250000000000000000000e+00 5.000000000000000000000e+00 7.0000000000000000000000 1.000000000000000000000e+01 1.300000000000000000065e+01 1.600000000000000000000e+01 ▇▆▆▅▅
ch03 0 1.00000000000000000000000 1.500000000000000000000e+00 5.000000000000000000000e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 3.000000000000000000000e+00 ▇▁▇▁▁
ch04a 0 1.00000000000000000000000 1.040000000000000035527e+00 2.300000000000000099920e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch04b 0 1.00000000000000000000000 1.020000000000000017764e+00 2.000000000000000111022e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch04c 0 1.00000000000000000000000 1.020000000000000017764e+00 1.900000000000000022204e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch04d 0 1.00000000000000000000000 1.030000000000000026645e+00 2.300000000000000099920e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch04e 0 1.00000000000000000000000 1.030000000000000026645e+00 2.300000000000000099920e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch04f 0 1.00000000000000000000000 1.070000000000000062172e+00 3.099999999999999977796e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▁▁
ch05 0 1.00000000000000000000000 8.699999999999999955591e-01 3.300000000000000155431e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
ch06a 12227 0.86999999999999999555911 4.070000000000000284217e+00 2.870000000000000106581e+00 0.000000000000000000000e+00 2.0000000000000000000000 4.000000000000000000000e+00 6.000000000000000000000e+00 1.200000000000000000043e+01 ▇▅▆▂▁
ch06b 12227 0.86999999999999999555911 1.290000000000000035527e+00 5.200000000000000177636e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 3.000000000000000000000e+00 ▇▁▂▁▁
ch06c 12227 0.86999999999999999555911 8.299999999999999600320e-01 3.800000000000000044409e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▂▁▁▁▇
ch06d 12227 0.86999999999999999555911 6.199999999999999955591e-01 4.899999999999999911182e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▅▁▁▁▇
ch06e 20279 0.79000000000000003552714 2.000000000000000111022e-01 4.000000000000000222045e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▂
ch07a 84225 0.13000000000000000444089 2.300000000000000099920e-01 4.199999999999999844569e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▂
ch07b 93741 0.02999999999999999888978 2.022710000000000036364e+03 2.430000000000000159872e+00 2.014999999999999999879e+03 2022.0000000000000000000000 2.023999999999999999849e+03 2.023999999999999999849e+03 2.024999999999999999870e+03 ▁▁▂▅▇
ch07c 93741 0.02999999999999999888978 3.740000000000000213163e+00 2.540000000000000035527e+00 0.000000000000000000000e+00 2.0000000000000000000000 3.000000000000000000000e+00 6.000000000000000000000e+00 1.200000000000000000043e+01 ▇▆▆▂▁
ch07d 93741 0.02999999999999999888978 4.629999999999999893419e+00 1.840000000000000079936e+00 1.000000000000000000000e+00 3.0000000000000000000000 5.000000000000000000000e+00 6.000000000000000000000e+00 7.000000000000000000000e+00 ▃▂▁▆▇
ch08 0 1.00000000000000000000000 1.900000000000000022204e-01 4.000000000000000222045e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▂
ch09 0 1.00000000000000000000000 1.600000000000000033307e-01 3.699999999999999955591e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 0.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▂
ch10a 12227 0.86999999999999999555911 4.899999999999999911182e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▇
ch10b 54923 0.42999999999999999333866 2.720000000000000195399e+00 1.590000000000000079936e+00 1.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 3.000000000000000000000e+00 7.000000000000000000000e+00 ▇▃▁▁▁
ch10c 0 1.00000000000000000000000 5.300000000000000266454e-01 9.100000000000000310862e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 4.000000000000000000000e+00 ▇▂▁▁▁
pt00 0 1.00000000000000000000000 2.189999999999999946709e+00 1.110000000000000097700e+00 0.000000000000000000000e+00 1.0000000000000000000000 3.000000000000000000000e+00 3.000000000000000000000e+00 3.000000000000000000000e+00 ▂▃▁▁▇
pt01b 14342 0.84999999999999997779554 3.604999999999999715740e+01 7.940000000000000390799e+00 1.800000000000000000087e+01 30.0000000000000000000000 3.500000000000000000000e+01 4.000000000000000000000e+01 8.000000000000000000000e+01 ▅▇▂▁▁
pt01c 14342 0.84999999999999997779554 6.899999999999999467093e-01 4.600000000000000199840e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▃▁▁▁▇
pt01d 40204 0.57999999999999996003197 1.530000000000000026645e+00 8.900000000000000133227e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 4.000000000000000000000e+00 ▂▇▆▃▁
pt01e 14342 0.84999999999999997779554 5.000000000000000000000e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▇
pt01f 55814 0.41999999999999998445688 2.100000000000000088818e+00 1.199999999999999955591e+00 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 3.000000000000000000000e+00 4.000000000000000000000e+00 ▇▁▁▅▃
pt02b 31684 0.67000000000000003996803 4.299000000000000199212e+01 9.539999999999999147349e+00 1.800000000000000000087e+01 36.0000000000000000000000 4.200000000000000000260e+01 4.900000000000000000087e+01 8.000000000000000000000e+01 ▂▇▆▂▁
pt02c 31684 0.67000000000000003996803 7.299999999999999822364e-01 4.400000000000000022204e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▃▁▁▁▇
pt02d 49106 0.48999999999999999111822 1.600000000000000088818e+00 9.200000000000000399680e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 4.000000000000000000000e+00 ▂▇▆▃▁
pt02e 31684 0.67000000000000003996803 8.699999999999999955591e-01 3.400000000000000244249e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
pt02f 40199 0.57999999999999996003197 2.339999999999999857891e+00 1.250000000000000000000e+00 1.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 4.000000000000000000000e+00 4.000000000000000000000e+00 ▇▂▁▅▅
assessment 0 1.00000000000000000000000 9.300000000000000488498e-01 2.600000000000000088818e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▁▁▁▁▇
sample 7153 0.93000000000000004884981 1.449999999999999955591e+00 5.000000000000000000000e-01 1.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▇▁▁▁▆
l1_1 7153 0.93000000000000004884981 1.169999999999999928946e+00 9.100000000000000310862e-01 0.000000000000000000000e+00 0.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▆▁▂▁▇
l1_2 7153 0.93000000000000004884981 1.189999999999999946709e+00 8.699999999999999955591e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▃▁▇
l1_3 7153 0.93000000000000004884981 1.120000000000000106581e+00 8.800000000000000044409e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▆▁▃▁▇
l1_4 7153 0.93000000000000004884981 1.030000000000000026645e+00 8.800000000000000044409e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▇▁▅▁▇
l2_1 7153 0.93000000000000004884981 1.560000000000000053291e+00 7.399999999999999911182e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l2_2 7153 0.93000000000000004884981 1.580000000000000071054e+00 7.299999999999999822364e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l2_3 7153 0.93000000000000004884981 1.540000000000000035527e+00 7.500000000000000000000e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l2_4 7153 0.93000000000000004884981 1.520000000000000017764e+00 7.600000000000000088818e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l2_5 7153 0.93000000000000004884981 1.469999999999999973355e+00 7.900000000000000355271e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l3_1 33971 0.65000000000000002220446 1.689999999999999946709e+00 6.500000000000000222045e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l3_2 33971 0.65000000000000002220446 1.739999999999999991118e+00 5.999999999999999777955e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l3_3 33971 0.65000000000000002220446 1.709999999999999964473e+00 6.300000000000000044409e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l3_4 33971 0.65000000000000002220446 1.689999999999999946709e+00 6.400000000000000133227e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l3_5 33971 0.65000000000000002220446 1.659999999999999920064e+00 6.700000000000000399680e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l4_1 33971 0.65000000000000002220446 1.750000000000000000000e+00 5.999999999999999777955e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l4_2 33971 0.65000000000000002220446 1.699999999999999955591e+00 6.300000000000000044409e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l4_3 33971 0.65000000000000002220446 1.739999999999999991118e+00 5.999999999999999777955e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l4_4 33971 0.65000000000000002220446 1.659999999999999920064e+00 6.600000000000000310862e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l4_5 33971 0.65000000000000002220446 1.639999999999999902300e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
l4_6 33971 0.65000000000000002220446 1.629999999999999893419e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
l5_1 45843 0.52000000000000001776357 1.679999999999999937828e+00 6.700000000000000399680e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l5_2 45843 0.52000000000000001776357 1.639999999999999902300e+00 6.899999999999999467093e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l5_3 45843 0.52000000000000001776357 1.620000000000000106581e+00 6.999999999999999555911e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l5_4 45843 0.52000000000000001776357 1.510000000000000008882e+00 7.600000000000000088818e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
l5_5 45843 0.52000000000000001776357 1.250000000000000000000e+00 8.599999999999999866773e-01 0.000000000000000000000e+00 0.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▃▁▇
l6_1 52471 0.46000000000000001998401 1.780000000000000026645e+00 5.500000000000000444089e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l6_2 52471 0.46000000000000001998401 1.739999999999999991118e+00 5.699999999999999511502e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
l6_3 52471 0.46000000000000001998401 1.699999999999999955591e+00 5.999999999999999777955e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
l6_4 52471 0.46000000000000001998401 1.610000000000000097700e+00 6.700000000000000399680e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
l6_5 52471 0.46000000000000001998401 1.449999999999999955591e+00 7.700000000000000177636e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▃▁▇
n1 7153 0.93000000000000004884981 1.780000000000000026645e+00 5.500000000000000444089e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n2 7153 0.93000000000000004884981 1.709999999999999964473e+00 5.799999999999999600320e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
n3 7153 0.93000000000000004884981 1.750000000000000000000e+00 5.799999999999999600320e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n4 7153 0.93000000000000004884981 1.300000000000000044409e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▇▁▇
n5 7153 0.93000000000000004884981 1.560000000000000053291e+00 6.899999999999999467093e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n6 7153 0.93000000000000004884981 1.280000000000000026645e+00 7.299999999999999822364e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▃▁▇▁▇
n7 7153 0.93000000000000004884981 1.070000000000000062172e+00 7.500000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▇▁▆
n8 7153 0.93000000000000004884981 1.000000000000000000000e+00 7.399999999999999911182e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▇▁▅
n9 7153 0.93000000000000004884981 1.159999999999999920064e+00 8.399999999999999689138e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▅▁▇
n10 7153 0.93000000000000004884981 1.050000000000000044409e+00 8.199999999999999511502e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▇▁▇▁▇
n11 7153 0.93000000000000004884981 1.669999999999999928946e+00 6.400000000000000133227e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
n12 7153 0.93000000000000004884981 1.590000000000000079936e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
n13 7153 0.93000000000000004884981 1.610000000000000097700e+00 6.700000000000000399680e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▂▁▇
n14 7153 0.93000000000000004884981 1.530000000000000026645e+00 7.099999999999999644729e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n15 7153 0.93000000000000004884981 1.250000000000000000000e+00 6.999999999999999555911e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▃▁▇▁▇
n16 7153 0.93000000000000004884981 1.639999999999999902300e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n17 7153 0.93000000000000004884981 1.659999999999999920064e+00 6.800000000000000488498e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n18 7153 0.93000000000000004884981 1.639999999999999902300e+00 6.899999999999999467093e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n19 7153 0.93000000000000004884981 1.629999999999999893419e+00 6.899999999999999467093e-01 0.000000000000000000000e+00 2.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▁▁▇
n20 7153 0.93000000000000004884981 1.620000000000000106581e+00 6.999999999999999555911e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n21 7153 0.93000000000000004884981 1.479999999999999982236e+00 7.800000000000000266454e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n22 7153 0.93000000000000004884981 1.469999999999999973355e+00 7.800000000000000266454e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n23 7153 0.93000000000000004884981 1.439999999999999946709e+00 7.900000000000000355271e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n24 7153 0.93000000000000004884981 1.419999999999999928946e+00 8.000000000000000444089e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n25 7153 0.93000000000000004884981 1.439999999999999946709e+00 7.900000000000000355271e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n26 7153 0.93000000000000004884981 1.449999999999999955591e+00 7.800000000000000266454e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▂▁▇
n27 7153 0.93000000000000004884981 1.310000000000000053291e+00 8.299999999999999600320e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▃▁▃▁▇
n28 7153 0.93000000000000004884981 1.209999999999999964473e+00 8.299999999999999600320e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▅▁▅▁▇
n29 7153 0.93000000000000004884981 1.070000000000000062172e+00 8.199999999999999511502e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▆▁▇▁▇
n30 7153 0.93000000000000004884981 1.080000000000000071054e+00 8.499999999999999777955e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▆▁▅▁▇
n31 47996 0.50000000000000000000000 1.590000000000000079936e+00 6.300000000000000044409e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▃▁▇
n32 54717 0.42999999999999999333866 1.520000000000000017764e+00 6.400000000000000133227e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▅▁▇
n33 63307 0.34000000000000002442491 1.560000000000000053291e+00 6.500000000000000222045e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▁▁▃▁▇
n34 60377 0.36999999999999999555911 1.270000000000000017764e+00 7.800000000000000266454e-01 0.000000000000000000000e+00 1.0000000000000000000000 1.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▃▁▅▁▇
n35 54717 0.42999999999999999333866 1.370000000000000106581e+00 7.299999999999999822364e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▅▁▇
n36 60377 0.36999999999999999555911 1.379999999999999893419e+00 7.299999999999999822364e-01 0.000000000000000000000e+00 1.0000000000000000000000 2.000000000000000000000e+00 2.000000000000000000000e+00 2.000000000000000000000e+00 ▂▁▅▁▇
IcarAssessTime 23441 0.76000000000000000888178 6.509999999999999786837e+00 1.623000000000000042676e+01 1.020000000000000017764e+00 2.7000000000000001776357 4.900000000000000355271e+00 7.620000000000000106581e+00 7.097200000000000272727e+02 ▇▁▁▁▁
IcanAssessTime 18215 0.81000000000000005329071 8.759999999999999786837e+00 2.378999999999999914865e+01 1.020000000000000017764e+00 3.6299999999999998934186 6.370000000000000106581e+00 9.949999999999999289457e+00 7.129800000000000181725e+02 ▇▁▁▁▁
AssHomeLang 7153 0.93000000000000004884981 6.500000000000000222045e-01 4.799999999999999822364e-01 0.000000000000000000000e+00 0.0000000000000000000000 1.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▅▁▁▁▇
HHWeightProvided 0 1.00000000000000000000000 3.105789999999999963492e+03 3.794869999999999890658e+03 4.199999999999999844569e-01 584.4099999999999681676854 2.112119999999999890757e+03 3.926999999999999999781e+03 5.715222000000000116340e+04 ▇▁▁▁▁
MPLMath 7314 0.92000000000000003996803 4.500000000000000111022e-01 5.000000000000000000000e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▆
MPLReading 7314 0.92000000000000003996803 3.099999999999999977796e-01 4.600000000000000199840e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▃
MPLBoth 7314 0.92000000000000003996803 2.899999999999999800160e-01 4.500000000000000111022e-01 0.000000000000000000000e+00 0.0000000000000000000000 0.000000000000000000000e+00 1.000000000000000000000e+00 1.000000000000000000000e+00 ▇▁▁▁▃
MathIRTScore 7314 0.92000000000000003996803 -4.199999999999999844569e-01 1.050000000000000044409e+00 -3.069999999999999840128e+00 -1.0700000000000000621725 -3.499999999999999777955e-01 2.000000000000000111022e-01 2.140000000000000124345e+00 ▂▃▇▅▂
StandardErrorMath 7314 0.92000000000000003996803 2.300000000000000099920e-01 1.700000000000000122125e-01 1.199999999999999955591e-01 0.1499999999999999944489 1.600000000000000033307e-01 2.500000000000000000000e-01 9.499999999999999555911e-01 ▇▁▁▁▁
ReadingIRTScore 7314 0.92000000000000003996803 -3.800000000000000044409e-01 1.310000000000000053291e+00 -3.270000000000000017764e+00 -1.3200000000000000621725 -7.000000000000000666134e-02 5.100000000000000088818e-01 1.899999999999999911182e+00 ▂▃▅▇▃
StandardErrorReading 7314 0.92000000000000003996803 3.499999999999999777955e-01 2.800000000000000266454e-01 8.000000000000000166533e-02 0.1100000000000000005551 2.399999999999999911182e-01 5.300000000000000266454e-01 1.080000000000000071054e+00 ▇▂▂▁▂

5.6 The Five Key dplyr Verbs

5.6.1 Filtering rows with filter()

The filter() function from dplyr is used to extract rows from the data that meet certain conditions.

Here we illustrate the use of filter() by extracting all children from Mozambique.

dat_moz = filter(dat, CountryName == "Mozambique")
dat_moz
## # A tibble: 8,255 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Mozambique  c101        c101      Urban    uuid:f580… 2025-09-23          979     4      1     NA     1     1     2     2     1
##  2     2 Mozambique  c101        c101      Urban    uuid:b98e… 2025-09-23          986     7      1     NA     1     1     2     2     1
##  3     3 Mozambique  c101        c101      Urban    uuid:2611… 2025-09-23         2167     6      1     NA     1     1     2     2     1
##  4     4 Mozambique  c101        c101      Urban    uuid:d46b… 2025-09-23         2078     6      1     NA     1     1     1     1     1
##  5     5 Mozambique  c101        c101      Urban    uuid:df92… 2025-09-23         1251     5      1     NA     1     1     2     2     1
##  6     6 Mozambique  c101        c101      Urban    uuid:ade5… 2025-09-23         2199     5      1     NA     1     1     2     2     1
##  7     7 Mozambique  c101        c101      Urban    uuid:3f84… 2025-09-23         2889     5      1     NA     1     1     2     2     1
##  8     8 Mozambique  c101        c101      Urban    uuid:ff7b… 2025-09-23         1588     4      1     NA     1     0     2     2     1
##  9     9 Mozambique  c101        c101      Urban    uuid:7730… 2025-09-23         2203     4      1     NA     1     1     2     2     1
## 10    10 Mozambique  c101        c101      Urban    uuid:e210… 2025-09-24         4647     5      1     NA     2     1     2     2     1
## # ℹ 8,245 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Next we extract the children from Mozambique that received an Reading IRT Score greater than \(1.5\). Multiple filter conditions are created with & (and) and | (or).

filter(dat, CountryName == "Mozambique" & ReadingIRTScore > 1.5)
## # A tibble: 390 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    18 Mozambique  c101        c101      Urban    uuid:789a… 2025-09-25         2511     7      1     NA     1     1     2     2     1
##  2    35 Mozambique  c101        c101      Urban    uuid:9a7b… 2025-09-25         2868     8      1     NA     1     1     2     2     1
##  3   204 Mozambique  c101        c107      Urban    uuid:00db… 2025-09-22          898     6      1     NA     2     1     2     2     1
##  4   221 Mozambique  c101        c108      Rural    uuid:a3f7… 2025-09-27         1643    12      1     NA     2     1     2     2     1
##  5   252 Mozambique  c101        c108      Rural    uuid:4b57… 2025-10-24          965     6      1     NA     2     1     2     2     1
##  6   421 Mozambique  c101        c114      Rural    uuid:b166… 2025-09-22         3217     4      1     NA     1     0     3     3     1
##  7   603 Mozambique  c101        c120      Rural    uuid:fff9… 2025-09-21         2585     6      1     NA     1     1     2     2     1
##  8   677 Mozambique  c101        c122      Rural    uuid:6354… 2025-09-27         1443    10      0     NA    NA     1     2     2     1
##  9   730 Mozambique  c102        c124      Rural    uuid:06e4… 2025-10-24          630     4      0     NA    NA     1     3     3     2
## 10   731 Mozambique  c102        c124      Rural    uuid:d48d… 2025-10-24         1187     3      0     NA    NA     1     3     3     2
## # ℹ 380 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Here’s another way of writing the same condition as above:

filter(dat, CountryName == "Mozambique", ReadingIRTScore > 1.5)
## # A tibble: 390 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    18 Mozambique  c101        c101      Urban    uuid:789a… 2025-09-25         2511     7      1     NA     1     1     2     2     1
##  2    35 Mozambique  c101        c101      Urban    uuid:9a7b… 2025-09-25         2868     8      1     NA     1     1     2     2     1
##  3   204 Mozambique  c101        c107      Urban    uuid:00db… 2025-09-22          898     6      1     NA     2     1     2     2     1
##  4   221 Mozambique  c101        c108      Rural    uuid:a3f7… 2025-09-27         1643    12      1     NA     2     1     2     2     1
##  5   252 Mozambique  c101        c108      Rural    uuid:4b57… 2025-10-24          965     6      1     NA     2     1     2     2     1
##  6   421 Mozambique  c101        c114      Rural    uuid:b166… 2025-09-22         3217     4      1     NA     1     0     3     3     1
##  7   603 Mozambique  c101        c120      Rural    uuid:fff9… 2025-09-21         2585     6      1     NA     1     1     2     2     1
##  8   677 Mozambique  c101        c122      Rural    uuid:6354… 2025-09-27         1443    10      0     NA    NA     1     2     2     1
##  9   730 Mozambique  c102        c124      Rural    uuid:06e4… 2025-10-24          630     4      0     NA    NA     1     3     3     2
## 10   731 Mozambique  c102        c124      Rural    uuid:d48d… 2025-10-24         1187     3      0     NA    NA     1     3     3     2
## # ℹ 380 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

The %in% command is often useful when using dplyr verbs:

# Extract children from Mozambique who are ages 5, 9, and 13 years old and have an Reading IRT Score greater than 1.5
filter(dat, CountryName == "Mozambique", ch02 %in% c(5, 9, 13), ReadingIRTScore > 1.5)
## # A tibble: 74 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    18 Mozambique  c101        c101      Urban    uuid:789a… 2025-09-25         2511     7      1     NA     1     1     2     2     1
##  2   252 Mozambique  c101        c108      Rural    uuid:4b57… 2025-10-24          965     6      1     NA     2     1     2     2     1
##  3  1083 Mozambique  c102        c138      Urban    uuid:d763… 2025-10-02          720     3      1     NA     1     1     3     3     1
##  4  1199 Mozambique  c102        c141      Rural    uuid:5603… 2025-10-09          581     4      1     NA     2     1     3     3     1
##  5  2165 Mozambique  c103        c173      Urban    uuid:a7af… 2025-08-11         2111     6      0     NA    NA     0     3     3     4
##  6  2265 Mozambique  c104        c176      Urban    uuid:7fcf… 2025-09-22         2155     7      1     NA     1     1     2     2     1
##  7  2291 Mozambique  c104        c177      Urban    uuid:7363… 2025-09-18          991     3      1     NA     1     0     3     3     1
##  8  2343 Mozambique  c104        c178      Urban    uuid:63b9… 2025-09-22         2007     6      1     NA     1     1     2     2     1
##  9  2737 Mozambique  c104        c187      Rural    uuid:c37b… 2025-10-24          504     5      1     NA     2     1     3     3     4
## 10  3305 Mozambique  c104        c205      Rural    uuid:70cf… 2025-09-19         8859     2      1     NA     2     1     3     3     2
## # ℹ 64 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Let’s practice filtering data in R:

# Extract children from Kenya and Mali living in urban areas and are males

filter(dat, CountryName %in% c("Kenya", "Mali"), Location == "Urban", ch03 == 2)

# Extract children that live in a household with 5 to 10 individuals inclusive and were assessed in French

filter(dat, hh06a >= 5 & hh06a <= 10, AssessmentLanguage == "French")

Try the following exercises on your own:

  1. Extract all children who were assessed in English and met the minimum proficiency level for reading.

  2. How many children from the previous question are from Tanzania.

  3. Use %in% to filter children from question 1 that live in 2, 9, or 20 indivdual household and receive help with homework.

  4. How many children from the previous question are assisted by their mother with homework?

5.6.2 Introducing the pipe

The pipe operator %>% is a very useful way of chaining together multiple operations. A typical format is something like:

data %>% operation 1 %>% operation 2

You read the code from left to right: Start with data, apply some operation (operation 1) to it, get a result, and then apply another operation (operation 2) to that result, to generate another result (the final result, in this example). A useful way to think of the pipe is as similar to “then”.

The main goal of the pipe is to make code easier, by focusing on the transformations rather than on what is being transformed. Usually this is the case, but it’s also possible to get carried away and end up with a huge whack of piped statements. Deciding when to break a block up is an art best learned by experience.

Example

# filtering with the pipe
dat %>% 
  filter(CountryName == "Mozambique")
## # A tibble: 8,255 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Mozambique  c101        c101      Urban    uuid:f580… 2025-09-23          979     4      1     NA     1     1     2     2     1
##  2     2 Mozambique  c101        c101      Urban    uuid:b98e… 2025-09-23          986     7      1     NA     1     1     2     2     1
##  3     3 Mozambique  c101        c101      Urban    uuid:2611… 2025-09-23         2167     6      1     NA     1     1     2     2     1
##  4     4 Mozambique  c101        c101      Urban    uuid:d46b… 2025-09-23         2078     6      1     NA     1     1     1     1     1
##  5     5 Mozambique  c101        c101      Urban    uuid:df92… 2025-09-23         1251     5      1     NA     1     1     2     2     1
##  6     6 Mozambique  c101        c101      Urban    uuid:ade5… 2025-09-23         2199     5      1     NA     1     1     2     2     1
##  7     7 Mozambique  c101        c101      Urban    uuid:3f84… 2025-09-23         2889     5      1     NA     1     1     2     2     1
##  8     8 Mozambique  c101        c101      Urban    uuid:ff7b… 2025-09-23         1588     4      1     NA     1     0     2     2     1
##  9     9 Mozambique  c101        c101      Urban    uuid:7730… 2025-09-23         2203     4      1     NA     1     1     2     2     1
## 10    10 Mozambique  c101        c101      Urban    uuid:e210… 2025-09-24         4647     5      1     NA     2     1     2     2     1
## # ℹ 8,245 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

The main usefulness of the pipe is when combining multiple operations:

Guided practice

# first filter on country then on IRT Score
dat_moz = dat %>% filter(CountryName == "Mozambique") %>% filter(ReadingIRTScore > 1.5)
dat_moz
# another way of doing the same thing
dat %>% filter(CountryName == "Mozambique" & ReadingIRTScore > 1.5) 

5.6.3 Arranging rows with arrange()

The arrange() function from dplyr lets you order rows in your data based on one or more columns.

Example

Ordering children from Mozambique in a descending order of age (note the use of desc):

arrange(dat_moz, desc(ch02))
## # A tibble: 390 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    35 Mozambique  c101        c101      Urban    uuid:9a7b… 2025-09-25         2868     8      1     NA     1     1     2     2     1
##  2   677 Mozambique  c101        c122      Rural    uuid:6354… 2025-09-27         1443    10      0     NA    NA     1     2     2     1
##  3   794 Mozambique  c102        c127      Urban    uuid:4c66… 2025-09-25          404     4      0     NA    NA     1     3     3     2
##  4  1898 Mozambique  c103        c165      Rural    uuid:45c8… 2025-08-09          437     3      1     NA     2     1     2     2     1
##  5  2281 Mozambique  c104        c176      Urban    uuid:04e1… 2025-09-22         3718     9      1     NA     1     1     2     2     1
##  6  2310 Mozambique  c104        c177      Urban    uuid:b01c… 2025-09-18         2063     7      1     NA     1     1     2     2     1
##  7  2321 Mozambique  c104        c177      Urban    uuid:0f3a… 2025-09-18         4206     9      1     NA     1     1     3     3     1
##  8  3251 Mozambique  c104        c203      Rural    uuid:3877… 2025-09-21          643     3      0     NA    NA     0     3     3     4
##  9  3355 Mozambique  c105        c206      Urban    uuid:0f4a… 2025-09-22         2927     7      1     NA     1     0     2     2     1
## 10  3378 Mozambique  c105        c207      Urban    uuid:7ecd… 2025-09-26         1052     5      1     NA     1     1     2     2     1
## # ℹ 380 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Subsequent arguments to arrange() can be used to arrange by multiple columns. Here we first children from Mozambique by age (in descending order) and then by household size (in ascending order)

Guided practice

arrange(dat_moz, desc(ch02), hh06a)

We can also use the pipe to do the same thing:

Example

dat_moz %>% arrange(desc(ch02))
## # A tibble: 390 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    35 Mozambique  c101        c101      Urban    uuid:9a7b… 2025-09-25         2868     8      1     NA     1     1     2     2     1
##  2   677 Mozambique  c101        c122      Rural    uuid:6354… 2025-09-27         1443    10      0     NA    NA     1     2     2     1
##  3   794 Mozambique  c102        c127      Urban    uuid:4c66… 2025-09-25          404     4      0     NA    NA     1     3     3     2
##  4  1898 Mozambique  c103        c165      Rural    uuid:45c8… 2025-08-09          437     3      1     NA     2     1     2     2     1
##  5  2281 Mozambique  c104        c176      Urban    uuid:04e1… 2025-09-22         3718     9      1     NA     1     1     2     2     1
##  6  2310 Mozambique  c104        c177      Urban    uuid:b01c… 2025-09-18         2063     7      1     NA     1     1     2     2     1
##  7  2321 Mozambique  c104        c177      Urban    uuid:0f3a… 2025-09-18         4206     9      1     NA     1     1     3     3     1
##  8  3251 Mozambique  c104        c203      Rural    uuid:3877… 2025-09-21          643     3      0     NA    NA     0     3     3     4
##  9  3355 Mozambique  c105        c206      Urban    uuid:0f4a… 2025-09-22         2927     7      1     NA     1     0     2     2     1
## 10  3378 Mozambique  c105        c207      Urban    uuid:7ecd… 2025-09-26         1052     5      1     NA     1     1     2     2     1
## # ℹ 380 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Finally, here’s an example of combining filter and arrange operations with the pipe:

dat %>% 
  filter(CountryName == "Mozambique" & ReadingIRTScore > 1.5) %>% 
  arrange(desc(ch02))
## # A tibble: 390 × 147
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    35 Mozambique  c101        c101      Urban    uuid:9a7b… 2025-09-25         2868     8      1     NA     1     1     2     2     1
##  2   677 Mozambique  c101        c122      Rural    uuid:6354… 2025-09-27         1443    10      0     NA    NA     1     2     2     1
##  3   794 Mozambique  c102        c127      Urban    uuid:4c66… 2025-09-25          404     4      0     NA    NA     1     3     3     2
##  4  1898 Mozambique  c103        c165      Rural    uuid:45c8… 2025-08-09          437     3      1     NA     2     1     2     2     1
##  5  2281 Mozambique  c104        c176      Urban    uuid:04e1… 2025-09-22         3718     9      1     NA     1     1     2     2     1
##  6  2310 Mozambique  c104        c177      Urban    uuid:b01c… 2025-09-18         2063     7      1     NA     1     1     2     2     1
##  7  2321 Mozambique  c104        c177      Urban    uuid:0f3a… 2025-09-18         4206     9      1     NA     1     1     3     3     1
##  8  3251 Mozambique  c104        c203      Rural    uuid:3877… 2025-09-21          643     3      0     NA    NA     0     3     3     4
##  9  3355 Mozambique  c105        c206      Urban    uuid:0f4a… 2025-09-22         2927     7      1     NA     1     0     2     2     1
## 10  3378 Mozambique  c105        c207      Urban    uuid:7ecd… 2025-09-26         1052     5      1     NA     1     1     2     2     1
## # ℹ 380 more rows
## # ℹ 131 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Guided practice

Let’s attempt these exercises:

  1. Arrange children from urban Mali who receive help with homework in descending order using their total assessment time for math
dat %>%
  filter(CountryName == "Mali" & ch10a == 1) %>%
  arrange(desc(IcanAssessTime))
#Tip: Use desc() to sort a column in descending order. Without desc(), arrange() sorts in ascending order by default.

Independent Practice

  1. Filter children who are 5 years old who have a lot of difficulty in seeing and attained an Math IRT score greater than 2 arrange them by descending ICAN assessment time. Combine filter(), arrange(), and %>% into a single pipeline.

  2. Sort dat to find the most slowest child in reading assessment time. Did they meet minimum proficiency level for reading?

  3. How could you use arrange() to sort all missing values to the start? (Hint: use is.na()).

5.6.4 Selecting columns with select()

The ICAN-ICAR 2025 survey data has \(146\) columns. This is not unusual if one works with big data. It is highly likely that there is few columns (or variables) that you would be interested in. In that case, select() allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.

select() is a quite similar to filter() for columns. The syntax is straightforward, the first argument gives the data, and then you list the variables you want to select!

Example

# Select columns by name: 

dat_new = select(dat, ChildID, Location, ch02, MPLBoth)
dat_new
## # A tibble: 96,452 × 4
##    ChildID                                          Location  ch02 MPLBoth
##    <chr>                                            <chr>    <dbl>   <dbl>
##  1 uuid:f5808f62-6a22-4247-ae68-9eb891657449_child1 Urban        8       0
##  2 uuid:b98e2157-5941-457c-8249-cea77f3c4b47_child1 Urban       12       0
##  3 uuid:26117653-2d45-474f-a7ea-6fa591e1bfe7_child1 Urban        8       0
##  4 uuid:d46bcfd9-f3e5-44ff-825e-6371ee6a1b9c_child1 Urban        9       0
##  5 uuid:df927f9a-f03b-480b-ada3-ddfde15b8ae8_child1 Urban       10       0
##  6 uuid:ade5f0d8-768a-479c-956b-976e569ff49d_child1 Urban       13       0
##  7 uuid:3f84d69b-e298-43d8-8c77-0ae0e5ad4ad8_child1 Urban        8       0
##  8 uuid:ff7b1d96-0e69-400c-955e-22097e185a4b_child1 Urban        8       1
##  9 uuid:7730b493-addc-4c60-8a8e-9d31376770c2_child1 Urban       16       0
## 10 uuid:e2108c51-8a38-452a-94f5-dd128cf38990_child1 Urban        8       0
## # ℹ 96,442 more rows

To exclude variables just put a minus sign in front of them:

Example

select(dat_new, -Location)
## # A tibble: 96,452 × 3
##    ChildID                                           ch02 MPLBoth
##    <chr>                                            <dbl>   <dbl>
##  1 uuid:f5808f62-6a22-4247-ae68-9eb891657449_child1     8       0
##  2 uuid:b98e2157-5941-457c-8249-cea77f3c4b47_child1    12       0
##  3 uuid:26117653-2d45-474f-a7ea-6fa591e1bfe7_child1     8       0
##  4 uuid:d46bcfd9-f3e5-44ff-825e-6371ee6a1b9c_child1     9       0
##  5 uuid:df927f9a-f03b-480b-ada3-ddfde15b8ae8_child1    10       0
##  6 uuid:ade5f0d8-768a-479c-956b-976e569ff49d_child1    13       0
##  7 uuid:3f84d69b-e298-43d8-8c77-0ae0e5ad4ad8_child1     8       0
##  8 uuid:ff7b1d96-0e69-400c-955e-22097e185a4b_child1     8       1
##  9 uuid:7730b493-addc-4c60-8a8e-9d31376770c2_child1    16       0
## 10 uuid:e2108c51-8a38-452a-94f5-dd128cf38990_child1     8       0
## # ℹ 96,442 more rows

You can also use select() to reorder variables. A useful function here is everything(). This is useful if you have a handful of variables you’d like to move to the start of the data frame.

# reorder so MPLBoth is first 
select(dat_new, MPLBoth, everything())

There are other helper functions that you can use within select(): - starts_with("abc") matches column names that begin with “abc”. - ends_with("xyz") matches column names that end with "xyz". -contains(“ijk”) matches column names that contain “ijk”. - num_range("x", 1:3) matchesx1,x2, andx3`.

# match columns that begin with "l1" but exclude location variable
dat %>%
  select(ChildID, starts_with("l1"))
## # A tibble: 96,452 × 5
##    ChildID                                           l1_1  l1_2  l1_3  l1_4
##    <chr>                                            <dbl> <dbl> <dbl> <dbl>
##  1 uuid:f5808f62-6a22-4247-ae68-9eb891657449_child1     1     1     1     1
##  2 uuid:b98e2157-5941-457c-8249-cea77f3c4b47_child1     2     2     1     2
##  3 uuid:26117653-2d45-474f-a7ea-6fa591e1bfe7_child1     0     0     0     0
##  4 uuid:d46bcfd9-f3e5-44ff-825e-6371ee6a1b9c_child1     2     1     2     2
##  5 uuid:df927f9a-f03b-480b-ada3-ddfde15b8ae8_child1     1     1     1     1
##  6 uuid:ade5f0d8-768a-479c-956b-976e569ff49d_child1     1     1     1     1
##  7 uuid:3f84d69b-e298-43d8-8c77-0ae0e5ad4ad8_child1     1     2     2     2
##  8 uuid:ff7b1d96-0e69-400c-955e-22097e185a4b_child1     2     2     2     2
##  9 uuid:7730b493-addc-4c60-8a8e-9d31376770c2_child1     2     1     2     1
## 10 uuid:e2108c51-8a38-452a-94f5-dd128cf38990_child1     2     1     0     1
## # ℹ 96,442 more rows
# match column names that contain "hh" (these are household-related variables)
dat %>%
  select(ChildID, contains("hh"), -HHWeightProvided)
## # A tibble: 96,452 × 25
##    ChildID     HHID  hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c hh07d hh07e hh07f hh07g hh07h hh07i hh07j hh07k hh07l hh07m hh07n
##    <chr>       <chr> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 uuid:f5808… uuid…     4      1     NA     1     1     2     2     1     1     1     1     1     1     1     0     1     0     1     0
##  2 uuid:b98e2… uuid…     7      1     NA     1     1     2     2     1     1     1     1     1     1     1     1     1     1     1     0
##  3 uuid:26117… uuid…     6      1     NA     1     1     2     2     1     1     1     1     1     1     1     0     1     1     1     1
##  4 uuid:d46bc… uuid…     6      1     NA     1     1     1     1     1     4     1     1     1     1     1     0     1     1     1     0
##  5 uuid:df927… uuid…     5      1     NA     1     1     2     2     1     4     1     1     1     0     0     0     1     0     0    NA
##  6 uuid:ade5f… uuid…     5      1     NA     1     1     2     2     1     4     1     1     1     1     1     0     1     0     1     1
##  7 uuid:3f84d… uuid…     5      1     NA     1     1     2     2     1     1     1     1     1     1     1     1     1     1     1     1
##  8 uuid:ff7b1… uuid…     4      1     NA     1     0     2     2     1     1     1     1     1     1     0     1     1     0     1     1
##  9 uuid:7730b… uuid…     4      1     NA     1     1     2     2     1     4     1     1     1     1     0     0     1     0     1     1
## 10 uuid:e2108… uuid…     5      1     NA     2     1     2     2     1     4     1     1     0     0     0     0     0     1     1     1
## # ℹ 96,442 more rows
## # ℹ 4 more variables: hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>
# match column names with l2.1, l2.2, l2.3
dat %>%
  select(ChildID, num_range("l2_",1:3))
## # A tibble: 96,452 × 4
##    ChildID                                           l2_1  l2_2  l2_3
##    <chr>                                            <dbl> <dbl> <dbl>
##  1 uuid:f5808f62-6a22-4247-ae68-9eb891657449_child1     0     0     0
##  2 uuid:b98e2157-5941-457c-8249-cea77f3c4b47_child1     2     1     2
##  3 uuid:26117653-2d45-474f-a7ea-6fa591e1bfe7_child1     0     0     0
##  4 uuid:d46bcfd9-f3e5-44ff-825e-6371ee6a1b9c_child1     2     2     2
##  5 uuid:df927f9a-f03b-480b-ada3-ddfde15b8ae8_child1     1     1     1
##  6 uuid:ade5f0d8-768a-479c-956b-976e569ff49d_child1     2     2     2
##  7 uuid:3f84d69b-e298-43d8-8c77-0ae0e5ad4ad8_child1     0     2     2
##  8 uuid:ff7b1d96-0e69-400c-955e-22097e185a4b_child1     2     2     2
##  9 uuid:7730b493-addc-4c60-8a8e-9d31376770c2_child1     2     2     2
## 10 uuid:e2108c51-8a38-452a-94f5-dd128cf38990_child1     1     1     1
## # ℹ 96,442 more rows

See ?select() for more details

Independent Practice

  1. Select only the child-relatd columns.

  2. Exclude the numeracy-related items.

5.6.5 Adding new variables with mutate()

Mutating operations add a new column to a dataframe using existing columns. Here’s a trivial example to get started:

mutate(dat, this_is = "stupid")  
## # A tibble: 96,452 × 148
##     ...1 CountryName TierOneUnit VillageID Location HHID       SubmissionDate duration hh06a hh06b1 hh06b2 hh06c hh06d hh07a hh07b hh07c
##    <dbl> <chr>       <chr>       <chr>     <chr>    <chr>      <date>            <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Mozambique  c101        c101      Urban    uuid:f580… 2025-09-23          979     4      1     NA     1     1     2     2     1
##  2     2 Mozambique  c101        c101      Urban    uuid:b98e… 2025-09-23          986     7      1     NA     1     1     2     2     1
##  3     3 Mozambique  c101        c101      Urban    uuid:2611… 2025-09-23         2167     6      1     NA     1     1     2     2     1
##  4     4 Mozambique  c101        c101      Urban    uuid:d46b… 2025-09-23         2078     6      1     NA     1     1     1     1     1
##  5     5 Mozambique  c101        c101      Urban    uuid:df92… 2025-09-23         1251     5      1     NA     1     1     2     2     1
##  6     6 Mozambique  c101        c101      Urban    uuid:ade5… 2025-09-23         2199     5      1     NA     1     1     2     2     1
##  7     7 Mozambique  c101        c101      Urban    uuid:3f84… 2025-09-23         2889     5      1     NA     1     1     2     2     1
##  8     8 Mozambique  c101        c101      Urban    uuid:ff7b… 2025-09-23         1588     4      1     NA     1     0     2     2     1
##  9     9 Mozambique  c101        c101      Urban    uuid:7730… 2025-09-23         2203     4      1     NA     1     1     2     2     1
## 10    10 Mozambique  c101        c101      Urban    uuid:e210… 2025-09-24         4647     5      1     NA     2     1     2     2     1
## # ℹ 96,442 more rows
## # ℹ 132 more variables: hh07d <dbl>, hh07e <dbl>, hh07f <dbl>, hh07g <dbl>, hh07h <dbl>, hh07i <dbl>, hh07j <dbl>, hh07k <dbl>,
## #   hh07l <dbl>, hh07m <dbl>, hh07n <dbl>, hh07o <dbl>, hh07p <dbl>, hh07p_3 <dbl>, hh07q <dbl>, ChildID <chr>, ch02 <dbl>, ch03 <dbl>,
## #   ch04a <dbl>, ch04b <dbl>, ch04c <dbl>, ch04d <dbl>, ch04e <dbl>, ch04f <dbl>, ch05 <dbl>, ch06a <dbl>, ch06b <dbl>, ch06c <dbl>,
## #   ch06d <dbl>, ch06e <dbl>, ch07a <dbl>, ch07b <dbl>, ch07c <dbl>, ch07d <dbl>, ch08 <dbl>, ch09 <dbl>, ch10a <dbl>, ch10b <dbl>,
## #   ch10c <dbl>, pt00 <dbl>, pt01b <dbl>, pt01c <dbl>, pt01d <dbl>, pt01e <dbl>, pt01f <dbl>, pt02b <dbl>, pt02c <dbl>, pt02d <dbl>,
## #   pt02e <dbl>, pt02f <dbl>, assessment <dbl>, sample <dbl>, l1_1 <dbl>, l1_2 <dbl>, l1_3 <dbl>, l1_4 <dbl>, l2_1 <dbl>, l2_2 <dbl>, …

Now we have \(147\) columns, including this_is.

A more useful use of mutate is to construct a new variable based on existing variables. This is the way that mutate is almost always used.

Example

# create "math_savvy" column using MathIRTScore if the child has a score above 2
dat_new = dat %>%
mutate(math_savvy = if_else(MathIRTScore > 2, "Yes", "No"))

Hopefully, you’re getting used to the pipe by now, so let’s embed a mutating operation within a larger pipe than we’ve used before.

# count using count() how many children are math-savvy  across countries
# first exclude all NAs from math_savvy column

dat %>%
  mutate(math_savvy = if_else(MathIRTScore > 2, "Yes", "No")) %>%
  filter(!is.na(math_savvy)) %>%
  count(CountryName, math_savvy)
## # A tibble: 12 × 3
##    CountryName math_savvy     n
##    <chr>       <chr>      <int>
##  1 Bangladesh  No          6479
##  2 Kenya       No          6669
##  3 Mali        No          9588
##  4 Mexico      No          8109
##  5 Mozambique  No          8020
##  6 Nepal       No          4694
##  7 Nicaragua   No          6230
##  8 Pakistan    No          7236
##  9 Pakistan    Yes         1966
## 10 Senegal     No          8098
## 11 Tanzania    No         13166
## 12 Uganda      No          8883

They are all from Pakistan!

Independent Practice

  1. Convert both the ICAN and ICAR assessment time for each assessed child from seconds to hours. Extract all assessed children that took more than hour for both ICAN and ICAR assessments.

5.6.6 Aggregating over rows with summarise()

The summarise() verb (or summarize() will also work) summarises the rows in a data frame in some way. When applied to the whole data frame, it will collapse it to a single row. For example, here we calculate the average age and the median Reading IRT Score for children in Uganda living in rural areas:

dat %>% 
  filter(CountryName == "Uganda") %>%
  summarise(mean_age = mean(ch02),
            median_irt_score = median(ReadingIRTScore, na.rm = TRUE))
## # A tibble: 1 × 2
##   mean_age median_irt_score
##      <dbl>            <dbl>
## 1     9.90           -0.538

You need to watch out for NAs when using summarise(). If one exists, operations like mean() will return NA. You can exclude NAs from calculations using na.rm = TRUE.

summarise() is most useful when combined with group_by(), which imposes a grouping structure on a data frame. After applying group_by(), subsequent dplyr verbs will be applied to individual groups, basically repeating the code for each group. That means that summarise() will calculate a summary for each group:

# tell dplyr to group the data by Country
dat_new = dat %>% 
            group_by(CountryName)

# apply summarize() to see how many children in each country surveyed
dat_new  %>% 
  summarize(count = n())
## # A tibble: 11 × 2
##    CountryName count
##    <chr>       <int>
##  1 Bangladesh   6664
##  2 Kenya        7076
##  3 Mali        10091
##  4 Mexico       8306
##  5 Mozambique   8255
##  6 Nepal        4801
##  7 Nicaragua    7310
##  8 Pakistan    10510
##  9 Senegal      9117
## 10 Tanzania    14795
## 11 Uganda       9527
# get sorted counts (and present differently)
dat %>% 
group_by(CountryName) %>% 
summarize(count = n()) %>% 
arrange(desc(count)) %>% 
head(10)    # take first ten rows
## # A tibble: 10 × 2
##    CountryName count
##    <chr>       <int>
##  1 Tanzania    14795
##  2 Pakistan    10510
##  3 Mali        10091
##  4 Uganda       9527
##  5 Senegal      9117
##  6 Mexico       8306
##  7 Mozambique   8255
##  8 Nicaragua    7310
##  9 Kenya        7076
## 10 Bangladesh   6664

You can also pass your own summary functions to the pipe:

# my own function, computes the 60% quantile of the Reading IRT Score for each country

compute_60q = function(x){quantile(x, probs = 0.60, na.rm = TRUE)}

# use it in a grouped summary
dat %>%
  group_by(CountryName) %>%
  summarize(count = n(), q60 = compute_60q(ReadingIRTScore)) %>% 
  head(10)
## # A tibble: 10 × 3
##    CountryName count    q60
##    <chr>       <int>  <dbl>
##  1 Bangladesh   6664  0.356
##  2 Kenya        7076  0.451
##  3 Mali        10091 -1.74 
##  4 Mexico       8306  0.629
##  5 Mozambique   8255 -0.991
##  6 Nepal        4801  0.392
##  7 Nicaragua    7310  0.647
##  8 Pakistan    10510  0.439
##  9 Senegal      9117 -0.164
## 10 Tanzania    14795  0.315