We will use different data to illustrate Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). First, we present the three missingness situations; afterwards, we show some tools to analyse the missingness patterns. Some of the plots are redundant, so that the reader can choose the preferred tool.
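To make the three mechanisms concrete before turning to the study data, here is a minimal base-R sketch on simulated data. The variables `x` and `y`, the 20% missingness rate, and the logistic coefficients are illustrative assumptions, not values taken from the study used below.

```r
# Hypothetical toy data: x is always observed, y will receive missing values
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# MCAR: every y has the same 20% chance of being missing
miss_mcar <- runif(n) < 0.2

# MAR: the chance that y is missing depends only on the observed x
miss_mar <- runif(n) < plogis(-1.5 + 1.5 * x)

# MNAR: the chance that y is missing depends on the value of y itself
miss_mnar <- runif(n) < plogis(-1.5 + 1.5 * y)

y_mcar <- ifelse(miss_mcar, NA, y)
y_mar  <- ifelse(miss_mar,  NA, y)
y_mnar <- ifelse(miss_mnar, NA, y)

# Compare the mean of the observed y values with the mean of the complete y
c(complete = mean(y),
  MCAR     = mean(y_mcar, na.rm = TRUE),
  MAR      = mean(y_mar,  na.rm = TRUE),
  MNAR     = mean(y_mnar, na.rm = TRUE))
```

In this sketch, large values of `y` are more likely to be deleted under MNAR, so the mean of the observed values should fall below the mean of the complete data, whereas under MCAR it should stay close to it.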
Here you see a list of the packages we use for these analyses (click on the code button if you don't see the black boxes with the code).
This page is still a work in progress; this is the version from 13-01-2022. If you want to report an error, you can send an e-mail to Roger Hilfiker.
knitr::opts_chunk$set(echo = TRUE)
# Use the ETH Zurich CRAN mirror for package installation
r <- getOption("repos")
r["CRAN"] <- "https://stat.ethz.ch/CRAN/"
options(repos = r)
# Packages needed on this page (sjlabelled and mice are added because they are loaded below)
list.of.packages <- c("bookdown", "rmarkdown", "knitr", "rio", "psych", "janitor",
                      "tidyverse", "jtools", "summarytools", "qgraph", "gtsummary", "viridis", "wesanderson", "missMethods", "ggpubr", "ggrepel", "naniar", "finalfit", "sjlabelled", "mice", "rpart", "rpart.plot")
# Install only the packages that are not yet installed
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
library(summarytools)
library(psych)
library(janitor)
library(sjlabelled)
library(tidyverse)
library(gtsummary)
library(viridis)
library(wesanderson)
library(missMethods)
library(ggpubr)
library(ggrepel)
library(naniar)
library(finalfit)
library(rpart)
library(rpart.plot)
library(mice)
library(knitr)
We use data from a published study and then randomly delete some of the values.
The downloaded dataset has no missing values, so we first present the analysis on the complete data. Then we delete some values. Missing Completely At Random values can be simulated with the package missMethods (see here for a tutorial).
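As a minimal sketch of how such a deletion could look, the following uses `delete_MCAR()` from missMethods on a small simulated data frame. The data frame `toy`, its columns, and the 20% proportion are hypothetical choices for illustration and are not the study data.

```r
library(missMethods)

# Hypothetical toy data frame (not the study data)
set.seed(123)
toy <- data.frame(x = rnorm(50), y = rnorm(50))

# Delete 20% of the values in column "y" completely at random;
# column "x" is left untouched
toy_mcar <- delete_MCAR(toy, p = 0.2, cols_mis = "y")

# Count the missing values per column
colSums(is.na(toy_mcar))
```

The argument `cols_mis` restricts the deletion to the named columns, which is convenient when only the outcome should receive missing values.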
Load the data directly from the web. See the article here:
df1 <- rio::import("https://doi.org/10.1371/journal.pone.0262238.s008", format = "xlsx")
df1 <- df1 %>%
  rename(WalkingDistance_m_6min = Distance30) %>%
  select(Number, ActiveSmoking, Age, Sex, COPDduration, FEV1, WalkingDistance_m_6min)
Below you see a summary of the data, here still without missing data:
options(width = 300)
summary(df1)
##     Number       ActiveSmoking        Age            Sex        COPDduration        FEV1        WalkingDistance_m_6min
##  Min.   : 1.00   Min.   :  0.0   Min.   :53.00   Min.   :1.00   Min.   : 1.00   Min.   :0.440   Min.   :214.7
##  1st Qu.:14.25   1st Qu.:  0.0   1st Qu.:64.00   1st Qu.:1.00   1st Qu.: 7.25   1st Qu.:1.015   1st Qu.:304.8
##  Median :27.50   Median :  0.0   Median :68.50   Median :1.00   Median :16.00   Median :1.540   Median :371.4
##  Mean   :28.10   Mean   :160.1   Mean   :69.08   Mean   :1.08   Mean   :25.40   Mean   :1.496   Mean   :359.8
##  3rd Qu.:41.75   3rd Qu.:  1.0   3rd Qu.:76.00   3rd Qu.:1.00   3rd Qu.:44.25   3rd Qu.:1.805   3rd Qu.:408.0
##  Max.   :56.00   Max.   :999.0   Max.   :80.00   Max.   :2.00   Max.   :72.00   Max.   :2.530   Max.   :555.3
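`summary()` adds an `NA's` row only for columns that contain missing values, so its absence above is consistent with a complete dataset. A more explicit check is to count the missing values per column. The sketch below uses a small hypothetical data frame `toy` rather than `df1`, so that it runs without the download.

```r
# Hypothetical toy data frame standing in for df1
toy <- data.frame(Age  = c(64, 70, NA, 58),
                  FEV1 = c(1.2, NA, 1.8, NA))

# Number of missing values per column
n_missing <- colSums(is.na(toy))
n_missing

# TRUE only if the data frame contains no missing values at all
!anyNA(toy)
```

Applied to the study data, the same two calls would be `colSums(is.na(df1))` and `!anyNA(df1)`.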
We plot the association between FEV1 and six-minute walking distance: