Open RStudio to do the practicals. Note that tasks with * are optional.
In this practical, a number of R packages are used. The packages used (with the versions that were used to generate the solutions) are:

survival (version: 3.2.11)

R version 4.1.0 (2021-05-18)
Explore the help page of the R function mcnemar.test().
Create the matrix Performance.
Use the R function mcnemar.test() to investigate whether the percentages of the two groups are different in a matched setting.
Performance <- matrix(c(794, 86, 150, 570),
                      nrow = 2,
                      dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                                      "2nd Survey" = c("Approve", "Disapprove")))
Performance
## 2nd Survey
## 1st Survey Approve Disapprove
## Approve 794 150
## Disapprove 86 570
Null hypothesis: the percentage of approval in the 1st survey is the same as the percentage of approval in the 2nd survey.
Alternative hypothesis: the percentage of approval in the 1st survey is different from the percentage of approval in the 2nd survey.
test_res_cat_paired <- mcnemar.test(Performance)
test_res_cat_paired
##
## McNemar's Chi-squared test with continuity correction
##
## data: Performance
## McNemar's chi-squared = 16.818, df = 1, p-value = 4.115e-05
If we assume that the significance level is equal to 0.05, the output indicates that we can reject the null hypothesis that the percentages are the same.
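The p-value can also be extracted from the returned htest object and compared to the chosen significance level programmatically; a minimal sketch using the test_res_cat_paired object from above:

test_res_cat_paired$p.value < 0.05
## [1] TRUE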
The test statistic (with continuity correction) is: \(X^2 = \frac{(|b - c| - 1)^2}{b + c}\),
where
|                         | Approve (2nd Survey) | Disapprove (2nd Survey) | Total |
|-------------------------|----------------------|-------------------------|-------|
| Approve (1st Survey)    | a                    | b                       | a + b |
| Disapprove (1st Survey) | c                    | d                       | c + d |
| Total                   | a + c                | b + d                   | n     |
test_statistic <- (abs(150-86)-1)^2/(150+86)
test_statistic
## [1] 16.8178
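As a side note, mcnemar.test() also has a correct argument. Setting correct = FALSE drops the continuity correction, in which case the statistic is \(X^2 = \frac{(b - c)^2}{b + c}\); a quick sketch:

# McNemar test without the continuity correction:
mcnemar.test(Performance, correct = FALSE)

# The corresponding statistic by hand: 4096/236, approximately 17.36
(150-86)^2/(150+86)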
Create the following vectors:

id: a vector that consists of the numbers 1 until 100 (integers), where each number is repeated 2 times
case: a vector that repeats c(0, 1) 100 times
response: a vector of length 200 with randomly selected 0 and 1 values

Use the R function table() to create the paired table, e.g. table(response[case == 0], response[case == 1]).
Use the R function mcnemar.test() to investigate whether the percentages of the two groups are different in a matched setting.
Use the R function clogit() from the survival package to perform a conditional logistic regression.
# Generate the data
set.seed(123)
n <- 100
id <- rep(x = 1:n, each = 2)
case <- rep(x = 0:1, times = n)
response <- rbinom(n = n*2, size = 1, prob = 0.5)
# Create the paired table
table(response[case == 0], response[case == 1])
##
## 0 1
## 0 25 23
## 1 30 22
# Perform the McNemar test
mcnemar.test(table(response[case == 0], response[case == 1]))
##
## McNemar's Chi-squared test with continuity correction
##
## data: table(response[case == 0], response[case == 1])
## McNemar's chi-squared = 0.67925, df = 1, p-value = 0.4098
If we assume that the significance level is equal to 0.05, the output indicates that we cannot reject the null hypothesis that the percentages are the same.
# Fit the conditional logistic model
library(survival)
summary(clogit(response ~ case + strata(id)))
## Call:
## coxph(formula = Surv(rep(1, 200L), response) ~ case + strata(id),
## method = "exact")
##
## n= 200, number of events= 97
##
## coef exp(coef) se(coef) z Pr(>|z|)
## case -0.2657 0.7667 0.2771 -0.959 0.338
##
## exp(coef) exp(-coef) lower .95 upper .95
## case 0.7667 1.304 0.4453 1.32
##
## Concordance= 0.566 (se = 0.096 )
## Likelihood ratio test= 0.93 on 1 df, p=0.3
## Wald test = 0.92 on 1 df, p=0.3
## Score (logrank) test = 0.92 on 1 df, p=0.3
The p-values are similar. You can repeat the analysis using a different seed in set.seed().
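One convenient way to do this is to wrap the simulation in a small function; this is a sketch (the function name run_sim and the seed value 456 are arbitrary choices, not part of the original solution):

# Rerun the simulation and McNemar test for any seed:
run_sim <- function(seed, n = 100) {
  set.seed(seed)
  case <- rep(x = 0:1, times = n)
  response <- rbinom(n = n * 2, size = 1, prob = 0.5)
  mcnemar.test(table(response[case == 0], response[case == 1]))$p.value
}
run_sim(456)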
Use the R function table() to create the paired table.
Use the R function mcnemar.test() to investigate whether the percentages of the two groups are different in a matched setting.
Use the R function p.adjust() to return p-values adjusted for multiple testing.
# Create the data:
set.seed(123)
before <- sample(x = c(0, 1), size = 500, replace = TRUE)
after <- sample(x = c(0, 1), size = 500, replace = TRUE)
sex <- sample(x = c("males", "females"), size = 500, replace = TRUE)

dat <- data.frame(before, after, sex)
# Create the paired table for males:
tab_males <- table(dat$before[dat$sex == "males"], dat$after[dat$sex == "males"])
# Perform the test for males:
test_males <- mcnemar.test(tab_males)
# Create the paired table for females:
tab_females <- table(dat$before[dat$sex == "females"], dat$after[dat$sex == "females"])
# Perform the test for females:
test_females <- mcnemar.test(tab_females)
# Extract the p-value from the tests:
p_males <- test_males$p.value
p_females <- test_females$p.value
# Apply the correction for multiple testing:
p.adjust(p = c(p_males, p_females), method = "bonferroni")
## [1] 1.0000000 0.9624163
# Alternatively
m <- 2
pmin(c(p_males, p_females) * m, 1)
## [1] 1.0000000 0.9624163
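The subgroup analysis can also be written more compactly with split() and sapply(), avoiding the duplicated code; a sketch (note that split() orders the strata alphabetically, so the females p-value comes first here):

# McNemar p-value within each sex stratum, then Bonferroni adjustment:
p_by_sex <- sapply(split(dat, dat$sex),
                   function(d) mcnemar.test(table(d$before, d$after))$p.value)
p.adjust(p = p_by_sex, method = "bonferroni")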
Explore the pre-loaded data set iris.
Investigate whether the mean Sepal.Length is different between the following pairs of species:

setosa and versicolor
setosa and virginica
versicolor and virginica
Since the data look approximately normally distributed and we have 50 observations per group, we can use the t-test. Use the R function t.test() to investigate whether the mean values of two sample groups are different.
Use the R function p.adjust() to return p-values adjusted for multiple testing.
# Explore the data:
head(iris)
boxplot(iris$Sepal.Length ~ iris$Species)
par(mfrow = c(1, 3))
hist(iris$Sepal.Length[iris$Species == "setosa"],
main = "setosa", xlab = "Length")
hist(iris$Sepal.Length[iris$Species == "versicolor"],
main = "versicolor", xlab = "Length")
hist(iris$Sepal.Length[iris$Species == "virginica"],
main = "virginica", xlab = "Length")
# Perform the tests:
test1 <- t.test(x = iris$Sepal.Length[iris$Species == "setosa"],
                y = iris$Sepal.Length[iris$Species == "versicolor"])
test2 <- t.test(x = iris$Sepal.Length[iris$Species == "setosa"],
                y = iris$Sepal.Length[iris$Species == "virginica"])
test3 <- t.test(x = iris$Sepal.Length[iris$Species == "versicolor"],
                y = iris$Sepal.Length[iris$Species == "virginica"])
# Apply the correction for multiple testing:
p.adjust(p = c(test1$p.value, test2$p.value, test3$p.value), method = "holm")
## [1] 7.493485e-17 1.190060e-24 1.866144e-07
# Alternatively
p <- c(test1$p.value, test2$p.value, test3$p.value)
# order the p-values from the smallest to the largest:
p_order <- p[order(p)]
# set the number of tests:
m <- 3
# obtain the 1st adjusted p-value:
i <- 1
p_adjust_1 <- (m-i+1) * p_order[i]

# obtain the 2nd adjusted p-value:
i <- 2
p_adjust_2 <- (m-i+1) * p_order[i]

# obtain the 3rd adjusted p-value:
i <- 3
p_adjust_3 <- (m-i+1) * p_order[i]

# Note: in general, Holm's adjusted p-values are also capped at 1 and made
# non-decreasing with cummax(); here the sorted values are already increasing,
# so these steps change nothing.

# bring the adjusted p-values back into the original order
# (order(order(p)) maps the sorted positions back to the original positions):
c(p_adjust_1, p_adjust_2, p_adjust_3)[order(order(p))]
## [1] 7.493485e-17 1.190060e-24 1.866144e-07
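An alternative worth knowing (not used in the solution above) is pairwise.t.test(), which runs all pairwise comparisons and adjusts the p-values in one call; pool.sd = FALSE makes it use Welch-type tests, matching the separate t.test() calls above:

# All pairwise comparisons with Holm adjustment in one call:
pairwise.t.test(iris$Sepal.Length, iris$Species,
                p.adjust.method = "holm", pool.sd = FALSE)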
© Eleni-Rosalina Andrinopoulou