Statistical tests

| Test | Variable 1 | Variable 2 | Research question example |
|---|---|---|---|
| Pearson correlation test | Quantitative | Quantitative | Is the height of the students in my university linearly associated with their weight? |
| Spearman correlation test | Quantitative | Quantitative | Is the height of the students in my university monotonically associated with their weight? |
| One sample t-test | Quantitative | - | Is the mean BMI of the students in my university different from a fixed value? |
| One-sample Wilcoxon signed rank test | Quantitative | - | Is the median score value of the students in my university different from a fixed value? |
| Two sample t-test (independent samples) | Quantitative | Categorical (2 categories) | Is the mean BMI of the students from group 1 different from the mean BMI of the students of group 2? |
| Two-sample Wilcoxon rank sum test | Quantitative | Categorical (2 categories) | Is the distribution of the score values of the students in my university different from the distribution of the score values of the students from a different university? |
| Two sample t-test (dependent samples) | Quantitative | Categorical (2 categories) | Is the mean BMI of the students before the exams different from the mean BMI of the students after the exams? |
| Two-sample Wilcoxon signed-rank test | Quantitative | Categorical (2 categories) | Is the median score value of the students in my university different this year compared to next year? |
| M sample test (Analysis of variance, F-test) | Quantitative | Categorical (>2 categories) | Is the mean BMI of the students different in groups 1, 2 and 3? |
| M-sample Kruskal-Wallis test | Quantitative | Categorical (>2 categories) | Is the distribution of the score values of the students different in groups 1, 2 and 3? |
| Chi-square test | Categorical | Categorical | Is there a relationship between gender and whether or not someone followed an online course? (tests whether two variables are related or independent) |
| Fisher exact test | Categorical | Categorical | Is there a relationship between gender and whether or not someone followed an online course? (tests whether two variables are related or independent) |
| z-test for proportions (one sample) | Categorical | - | Is the probability of being diagnosed with asthma different from a fixed percentage? |
| Binomial test | Categorical | - | Is the probability of being diagnosed with asthma different from a fixed percentage? |
| z-test for proportions (two sample) | Categorical | Categorical | Is the probability of being diagnosed with asthma in the Netherlands different from that in Belgium? |
| McNemar test | Categorical | Categorical | Is there a difference in the percentage of patients with asthma between the placebo and the drug group (matched data)? |

*Quantitative variables: Continuous, discrete
Categorical variables: Binary, nominal

Continuous data

Correlation test (parametric)

For two continuous variables: investigate the linear association between them

Pearson correlation test

Assumptions

  • The variables must be continuous.
  • There is a linear relationship between the two variables.
  • The data are homoscedastic: the variance of one variable is equal for every value of the other variable.
  • The variables should be approximately normally distributed.
    • statistical tests such as Kolmogorov-Smirnov and Shapiro-Wilk should be used with caution (e.g., always visualize the data).
  • The two variables represent paired observations.
  • The variables should not contain any outliers.
Theory

Scenario I

Is the height of the students in my university linearly associated with their weight?

Connection with linear regression

If both variables are standardized (scaled to have the same standard deviation), the regression slope equals the correlation.

\(scaled(y_i) = \beta_0 + \beta_1scaled(x_i) + \epsilon_i\)

\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)

Alternatively

\(H_0: \rho = 0\)
\(H_1: \rho \neq 0\)

If one-tailed
Is the height of the students in my university linearly increasing with the increase of weight?
\(H_0:\rho = 0\)
\(H_1:\rho > 0\)

or

Is the height of the students in my university linearly decreasing with the increase of weight?
\(H_0: \rho = 0\)
\(H_1: \rho < 0\)

Test statistic

\(t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}}\)

  • Sample correlation: \(r\)
  • Number of subjects: \(n\)
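The test statistic above can be expressed as a small R helper (a minimal sketch; the function name is ours):

```r
# t statistic for testing H0: rho = 0,
# given a sample correlation r and n paired observations
cor_t_stat <- function(r, n) {
  r * sqrt(n - 2) / sqrt(1 - r^2)
}

cor_t_stat(r = 0.83, n = 50)  # about 10.31
```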

Sampling distribution

Under \(H_0\), the test statistic \(t\) follows Student’s \(t\)-distribution.

Degrees of freedom

df = number of subjects - 2

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the t-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the t-distribution.

Draw conclusions

Compare test statistic (\(t\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(t\) falls in the rejection region or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is the height of the students in my university linearly associated with their weight?

Hypothesis

\(H_0: \rho = 0\)
\(H_1: \rho \neq 0\)

If one-tailed
Is the height of the students in my university linearly increasing with the increase of weight?
\(H_0:\rho = 0\)
\(H_1:\rho > 0\)

or

Is the height of the students in my university linearly decreasing with the increase of weight?
\(H_0: \rho = 0\)
\(H_1: \rho < 0\)

Collect and visualize data

Test statistic

Let’s assume that:

  • Sample correlation \(r = 0.83\)
  • Number of subjects \(n = 50\)

\(t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.83 \sqrt{50-2}}{\sqrt{1-0.83^2}} = 10.31\)

Degrees of freedom

df = number of subjects \(− 2 = 50 − 2 = 48\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}:\)

qt(p = 0.05/2, 48, lower.tail = FALSE)
## [1] 2.010635

-critical value\(_{\alpha/2}:\)

qt(p = 0.05/2, 48, lower.tail = TRUE)
## [1] -2.010635

If one-sided
critical value\(_{\alpha}\)
qt(0.05, df, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qt(0.05, df, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: t > critical value\(_{\alpha/2}\) or t < - critical value\(_{\alpha/2}\)
In our example we have 10.31 > 2.01
Therefore, we reject the \(H_0\).

With the help of R we obtain the p-value from the t-distribution:

2 * pt(10.31, df = 48, lower.tail = FALSE)
[1] 9.23435e-14
# The following code will provide the same result
# 2 * pt(-10.31, df = 48, lower.tail = TRUE)
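In practice the entire procedure is a single call to cor.test(). A sketch on simulated data (the heights and weights below are made up for illustration):

```r
set.seed(1)
height <- rnorm(50, mean = 1.75, sd = 0.10)
weight <- 60 + 40 * height + rnorm(50, sd = 5)  # constructed to correlate with height

res <- cor.test(height, weight, method = "pearson", alternative = "two.sided")
res$estimate   # sample correlation r
res$statistic  # t statistic
res$parameter  # degrees of freedom: n - 2 = 48
res$p.value    # two-sided p-value
```

For a one-tailed test, set alternative = "greater" or alternative = "less".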

If one-tailed
We reject the \(H_0\) if: t > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(t, df, lower.tail = FALSE)

or

We reject the \(H_0\) if: t < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution: pt(t, df, lower.tail = TRUE)

Correlation test (non-parametric)

For two continuous/ordinal variables: investigate the monotonic association between them

Spearman correlation test

Assumptions

  • The variables must be continuous/ordinal.
  • There is a monotonic relationship between the two variables. In a monotonic relationship, the variables tend to change together, but not always at a constant rate (as in the linear case).
  • The two variables represent paired observations.
Theory

Scenario I

Is the height of the students in my university monotonically associated with their weight?

Connection with linear regression

The slope becomes the correlation if we use the rank of the two variables of interest.

What is rank?
Ranks are integers indicating each value’s position when the values are sorted in ascending order. E.g. the ranks of 3, 10, 16, 6, 2 are 2, 4, 5, 3, 1:

rank(c(3, 10, 16, 6, 2))
[1] 2 4 5 3 1

\(rank(y_i) = \beta_0 + \beta_1rank(x_i) + \epsilon_i\)

\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)

Alternatively

\(H_0: \rho = 0\)
\(H_1: \rho \neq 0\)

If one-tailed
Is the height of the students in my university monotonically increasing with the increase of weight?
\(H_0:\rho = 0\)
\(H_1:\rho > 0\)

or

Is the height of the students in my university monotonically decreasing with the increase of weight?
\(H_0: \rho = 0\)
\(H_1: \rho < 0\)

Test statistic

\(t = \frac{r_R \sqrt{n-2}}{\sqrt{1-r_R^2}}\)

  • Sample correlation: \(r_R\) (remember that the Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data)
  • Number of subjects: \(n\)

Degrees of freedom

df = number of subjects - 2

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the t-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the t-distribution.

Draw conclusions

Compare test statistic (\(t\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(t\) falls in the rejection region or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is the height of the students in my university monotonically associated with their weight?

Hypothesis

\(H_0: \rho = 0\)
\(H_1: \rho \neq 0\)

If one-tailed
Is the height of the students in my university monotonically increasing with the increase of weight?
\(H_0:\rho = 0\)
\(H_1:\rho > 0\)

or

Is the height of the students in my university monotonically decreasing with the increase of weight?
\(H_0: \rho = 0\)
\(H_1: \rho < 0\)

Collect and visualize data

x y rank_x rank_y
1.72 49 2.0 2.0
1.85 57 7.0 3.0
1.81 62 5.5 5.0
1.81 75 5.5 7.0
1.92 76 8.5 9.5
1.36 42 1.0 1.0
1.79 76 4.0 9.5
1.92 61 8.5 4.0
1.74 75 3.0 7.0
2.09 75 10.0 7.0

Test statistic

Let’s assume that:

  • Sample correlation \(r_R = 0.41\)
  • Number of subjects \(n = 10\)

\(t = \frac{r_R \sqrt{n-2}}{\sqrt{1-r_R^2}} = \frac{0.41 \sqrt{10-2}}{\sqrt{1-0.41^2}} = 1.27\)

Degrees of freedom

df = number of subjects \(− 2 = 10 − 2 = 8\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}:\)

qt(p = 0.05/2, df = 8, lower.tail = FALSE)
## [1] 2.306004

-critical value\(_{\alpha/2}:\)

qt(p = 0.05/2, df = 8, lower.tail = TRUE)
## [1] -2.306004

If one-sided
critical value\(_{\alpha}\)
qt(0.05, df, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qt(0.05, df, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: t > critical value\(_{\alpha/2}\) or t < - critical value\(_{\alpha/2}\)
In our example we have 1.27 < 2.31
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the t-distribution:

2 * pt(q = 1.27, df = 8, lower.tail = FALSE)
[1] 0.2397765
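The worked example can be reproduced with cor.test(..., method = "spearman") on the ten observations listed above; the Spearman coefficient is simply the Pearson correlation of the ranks:

```r
x <- c(1.72, 1.85, 1.81, 1.81, 1.92, 1.36, 1.79, 1.92, 1.74, 2.09)
y <- c(49, 57, 62, 75, 76, 42, 76, 61, 75, 75)

# Spearman's r_R = Pearson correlation of the ranked values
r_R <- cor(rank(x), rank(y))
round(r_R, 2)  # 0.41

# suppressWarnings() because, with ties, R warns that an exact
# p-value cannot be computed
res <- suppressWarnings(cor.test(x, y, method = "spearman"))
res$estimate
```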

If one-tailed
We reject the \(H_0\) if: t > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(t, df, lower.tail = FALSE)

or

We reject the \(H_0\) if: t < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(t, df, lower.tail = TRUE)

Comparison of means (parametric)

The t-test is a popular statistical tool used to investigate the difference between one group’s mean (average) and a standard value, or the differences between the means (averages) of two groups.

One sample t-test

For one continuous variable: investigate the difference between one group’s mean (average) and a standard value

Assumptions

  • The dependent variable(s) must be continuous.
  • The observations are independent of one another.
  • The dependent variable(s) should be approximately normally distributed.
    • statistical tests such as Kolmogorov-Smirnov and Shapiro-Wilk should be used with caution (e.g., always visualize the data).
  • The dependent variable(s) should not contain any outliers.
Theory

Scenario I

Is the mean BMI of the students in my university different from the mean BMI of all students?

Connection with linear regression

\(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\) where \(x_i = 0\)
\(H_0: \beta_0 = 0\)
\(H_1: \beta_0 \neq 0\)

Alternatively

\(H_0: \mu = 0\)
\(H_1: \mu \neq 0\)
where
\(\mu\) is the mean BMI of the students in my university

More generally

\(H_0: \mu = \mu_0\)
\(H_1: \mu \neq \mu_0\)
where
\(\mu_0\) is the mean BMI of all students

If one-tailed
Is the mean BMI of the students in my university larger than the mean BMI of all students?
\(H_0:\mu = \mu_0\)
\(H_1:\mu > \mu_0\)

or

Is the mean BMI of the students in my university smaller than the mean BMI of all students?
\(H_0: \mu = \mu_0\)
\(H_1: \mu < \mu_0\)

Test statistic

\(t = \frac{\bar{x} - \mu_0}{sd(x)/\sqrt{n}}\)

  • Sample mean: \(\bar{x}\) (sample of students in my university)
  • Population mean: \(\mu_0\) (all students)
  • Standard deviation of the sample: \(sd(x)\)
  • Number of subjects: \(n\)

Sampling distribution

Under \(H_0\), the test statistic follows Student’s \(t\)-distribution.

Degrees of freedom

df = number of subjects - 1

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the t-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the t-distribution.

Draw conclusions

Compare test statistic (\(t\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(t\) falls in the rejection region or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is the mean BMI of the students in my university different from the mean BMI of all students?

Hypothesis

\(H_0: \mu = \mu_0\)
\(H_1: \mu \neq \mu_0\)

If one-tailed
Is the mean BMI of the students in my university larger than the mean BMI of all students?
\(H_0:\mu = \mu_0\)
\(H_1:\mu > \mu_0\)

or

Is the mean BMI of the students in my university smaller than the mean BMI of all students?
\(H_0: \mu = \mu_0\)
\(H_1: \mu < \mu_0\)

Collect and visualize data

From a quick look at the histogram, we see that the data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

Test statistic

Let’s assume that:
- Sample mean \(\bar{x} = 24\)
- Mean of all students \(\mu_0 = 20\)
- Standard deviation of the sample \(sd(x) = 6\)
- Number of subjects \(n = 50\)
Then the test statistic will be: \(t = \frac{\bar{x} - \mu_0}{sd(x)/\sqrt{n}} = \frac{24-20}{6/\sqrt{50}} = 4.7\)

Degrees of freedom

df = number of subjects \(- 1 = 50 - 1 = 49\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 49, lower.tail = FALSE)
[1] 2.009575

-critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 49, lower.tail = TRUE)
[1] -2.009575

If one-sided
critical value\(_{\alpha}\)
qt(p = 0.05, df, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qt(p = 0.05, df, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: t > critical value\(_{\alpha/2}\) or t < - critical value\(_{\alpha/2}\)
In our example we have 4.7 > 2.01.
Therefore, we reject the \(H_0\).

With the help of R we obtain the p-value from the t-distribution:

2 * pt(q = 4.7, df = 49, lower.tail = FALSE)
[1] 2.146314e-05
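In R the whole test is one call to t.test(). A sketch on simulated data (the BMI values are hypothetical, generated to roughly match the assumptions above):

```r
set.seed(42)
bmi <- rnorm(50, mean = 24, sd = 6)  # hypothetical sample of 50 students

# Two-sided one-sample t-test against the fixed value mu0 = 20
res <- t.test(bmi, mu = 20, alternative = "two.sided")
res$statistic  # t
res$parameter  # df = n - 1 = 49
res$p.value
```

For a one-tailed test, use alternative = "greater" or alternative = "less".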

If one-tailed
We reject the \(H_0\) if: t > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = FALSE)

or

We reject the \(H_0\) if: t < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = TRUE)

Two sample t-test (independent samples)

For one continuous and one categorical variable: investigate the difference between the means (averages) of two independent groups.

Assumptions

  • The dependent variable(s) must be continuous.
  • The observations are independent of one another.
  • The dependent variable(s) should be approximately normally distributed.
    • statistical tests such as Kolmogorov-Smirnov and Shapiro-Wilk should be used with caution (e.g., always visualize the data).
  • The dependent variable(s) should not contain any outliers.
Theory

Scenario II

Is the mean BMI of the students from group 1 different from the mean BMI of the students of group 2?

Connection with linear regression

\(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\) where \(x_i\) indicates whether a student was in group 1 or in group 2.
\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)

Alternatively

\(H_0: \mu_1 = \mu_2\) or \((\mu_1 - \mu_2 = 0)\)
\(H_1: \mu_1 \neq \mu_2\) or \((\mu_1 - \mu_2 \neq 0)\)
where
\(\mu_1\) is the mean BMI of all students in group 1
\(\mu_2\) is the mean BMI of all students in group 2

If one-tailed
Is the mean BMI of the students from group 1 larger than the mean BMI of the students of group 2?
\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 > \mu_2\)

or

Is the mean BMI of the students from group 1 smaller than the mean BMI of the students of group 2?
\(H_0: \mu_1 = \mu_2\)
\(H_1: \mu_1 < \mu_2\)

Test statistic

If the variances of the two groups are equal, we use the pooled t-statistic:

\(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{sd(x)^2}{n_1} + \frac{sd(x)^2}{n_2}}}\), where

\(sd^2(x) = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}\)

If the variances of the two groups being compared are not equal we use the Welch t-statistic:

\(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{sd_1^2(x)}{n_1} + \frac{sd_2^2(x)}{n_2}}}\)

  • Sample mean of group 1: \(\bar{x}_1\)
  • Sample mean of group 2: \(\bar{x}_2\)
  • Standard deviation of group 1: \(sd(x_1)\)
  • Standard deviation of group 2: \(sd(x_2)\)
  • Number of subjects in group 1: \(n_1\)
  • Number of subjects in group 2: \(n_2\)

Sampling distribution

Under \(H_0\), the test statistic for the mean difference follows Student’s \(t\)-distribution.

Degrees of freedom

If the variances of the two groups are equal:

\(df = n_1 + n_2 - 2\)

If the variances of the two groups are not equal:

\(df = \frac{\bigg(\frac{sd_1^2}{n_1} + \frac{sd_2^2}{n_2} \bigg)^2}{ \frac{(sd_1^2/n_1)^2}{n_1 - 1} + \frac{(sd_2^2/n_2)^2}{n_2 - 1} }\)
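The Welch–Satterthwaite degrees of freedom can be computed with a small helper (a minimal sketch; the function name is ours):

```r
# Welch-Satterthwaite degrees of freedom for two independent samples
welch_df <- function(sd1, sd2, n1, n2) {
  v1 <- sd1^2 / n1  # variance of the mean of group 1
  v2 <- sd2^2 / n2  # variance of the mean of group 2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

welch_df(sd1 = 6, sd2 = 2, n1 = 50, n2 = 50)  # about 59.76
```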

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the t-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the t-distribution.

Draw conclusions

Compare test statistic (\(t\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(t\) falls in the rejection region or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario II

Is the mean BMI of the students from group 1 different from the mean BMI of the students of group 2?

Hypothesis

\(H_0: \mu_1 = \mu_2\) or \((\mu_1 - \mu_2 = 0)\)
\(H_1: \mu_1 \neq \mu_2\) or \((\mu_1 - \mu_2 \neq 0)\)

If one-tailed
Is the mean BMI of the students from group 1 larger than the mean BMI of the students of group 2?
\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 > \mu_2\)

or

Is the mean BMI of the students from group 1 smaller than the mean BMI of the students of group 2?
\(H_0: \mu_1 = \mu_2\)
\(H_1: \mu_1 < \mu_2\)

Collect and visualize data

From a quick look at the histogram, we see that the data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

Test statistic

Let’s assume equal variance and:

  • Sample mean of group 1: \(\bar{x}_1 = 24\)
  • Sample mean of group 2: \(\bar{x}_2 = 23\)
  • Standard deviation of group 1: \(sd(x_1) = 6\)
  • Standard deviation of group 2: \(sd(x_2) = 2\)
  • Number of subjects in group 1: \(n_1 = 50\)
  • Number of subjects in group 2: \(n_2 = 50\)

First, let’s check whether the variances of the two groups can be assumed equal. We test:
\(H_0: \frac{variance_1}{variance_2} = 1\)
\(H_1: \frac{variance_1}{variance_2} \neq 1\)

We can use the F-test for homogeneity of variances. \(F\) statistic: \(\frac{highest \ variance}{lowest \ variance} = \frac{36}{4} = 9\)
df: \(n_1 - 1 = 50 - 1 = 49\), \(n_2 - 1 = 50 - 1 = 49\)
We assume \(\alpha = 0.05\)
We have a two-tailed test. With the help of R we obtain the p-value from the \(F\)-distribution:

2 * pf(9, df1 = 49, df2 = 49, lower.tail = FALSE)
[1] 1.8892e-12

This can be performed in R with the function var.test() as follow:

set.seed(2021)

BMI1 <- rnorm(50, 24, 6)
BMI2 <- rnorm(50, 23, 2)

var.test(BMI1, BMI2)

    F test to compare two variances

data:  BMI1 and BMI2
F = 14.796, num df = 49, denom df = 49, p-value < 2.2e-16
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
  8.396643 26.074166
sample estimates:
ratio of variances 
          14.79647 

Since the variances cannot be assumed equal, we can calculate: \(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{sd^2(x_1)}{n_1} + \frac{sd^2(x_2)}{n_2}}} = \frac{24-23}{\sqrt{\frac{36}{50} + \frac{4}{50}}} = 1.12\)

Degrees of freedom

\(df = \frac{\bigg[\frac{sd^2(x_1)}{n_1} + \frac{sd^2(x_2)}{n_2} \bigg]^2}{ \frac{[sd^2(x_1)/n_1]^2}{n_1 - 1} + \frac{[sd^2(x_2)/n_2]^2}{n_2 - 1} } = \frac{ \bigg( \frac{36}{50} + \frac{4}{50} \bigg)^2 }{ \frac{(36/50)^2}{49} + \frac{(4/50)^2}{49} } = 59.76\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 59.76, lower.tail = FALSE)
[1] 2.000463

-critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 59.76, lower.tail = TRUE)
[1] -2.000463

If one-sided
critical value\(_{\alpha}\)
qt(p = 0.05, df, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qt(p = 0.05, df, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: t > critical value\(_{\alpha/2}\) or t < - critical value\(_{\alpha/2}\)
In our example we have 1.12 < 2.00.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the t-distribution (using the Welch degrees of freedom):

2 * pt(q = 1.12, df = 59.76, lower.tail = FALSE)

The p-value (≈ 0.27) is larger than \(\alpha\), so we do not reject \(H_0\).
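The same comparison can be run directly with t.test(), which performs the Welch version by default (var.equal = FALSE); here on the same simulated data used for var.test() above:

```r
set.seed(2021)
BMI1 <- rnorm(50, 24, 6)
BMI2 <- rnorm(50, 23, 2)

# Welch two-sample t-test (unequal variances is the default)
res <- t.test(BMI1, BMI2, var.equal = FALSE)
res$statistic  # t
res$parameter  # Welch degrees of freedom
res$p.value
```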

If one-tailed
We reject the \(H_0\) if: t > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = FALSE)

or

We reject the \(H_0\) if: t < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = TRUE)

Two sample t-test (dependent samples)

For one continuous and one categorical variable: investigate the difference between the means (averages) of two dependent groups.

Assumptions

  • The dependent variable(s) must be continuous.
  • The observations are dependent.
  • The dependent variable(s) should be approximately normally distributed.
    • statistical tests such as Kolmogorov-Smirnov and Shapiro-Wilk should be used with caution (e.g., always visualize the data).
  • The dependent variable(s) should not contain any outliers.
Theory

Scenario III

Is the mean BMI of the students before the exams different from the mean BMI of the students after the exams?

Connection with linear regression

\(y_{2i} - y_{1i} = \beta_0 + \beta_1 x_i + \epsilon_i\), where \(x_i=0\)
\(H_0: \beta_0 = 0\)
\(H_1: \beta_0 \neq 0\)

Alternatively

\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 \neq 0\)
where
\(\mu_1\) is the mean BMI of all students in group 1
\(\mu_2\) is the mean BMI of all students in group 2

It becomes a one-sample t-test on the pairwise differences.

If one-tailed
Is the mean BMI of the students before the exams larger than the mean BMI of the students after the exams?
\(H_0:\mu_1 - \mu_2 = 0\)
\(H_1:\mu_1 - \mu_2 > 0\)

or

Is the mean BMI of the students before the exams smaller than the mean BMI of the students after the exams?
\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 < 0\)

Test statistic

\(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{sd(x)/\sqrt{n}}\), where

  • Sample mean of group 1: \(\bar{x}_1\)
  • Sample mean of group 2: \(\bar{x}_2\)
  • Standard deviation of the difference: \(sd(x)\)
  • Number of subjects: \(n\)

Degrees of freedom

\(df = n - 1\)

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the t-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the t-distribution.

Draw conclusions

Compare test statistic (\(t\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(t\) falls in the rejection region or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p-value with \(\alpha\).

Application

Scenario III

Is the mean BMI of the students before the exams different from the mean BMI of the students after the exams?

Hypothesis

\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 \neq 0\)

It becomes a one-sample t-test on the pairwise differences.

If one-tailed
Is the mean BMI of the students before the exams larger than the mean BMI of the students after the exams?
\(H_0:\mu_1 - \mu_2 = 0\)
\(H_1:\mu_1 - \mu_2 > 0\)

or

Is the mean BMI of the students before the exams smaller than the mean BMI of the students after the exams?
\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 < 0\)

Collect and visualize data

From a quick look at the histogram, we see that the data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

Test statistic

Let’s assume:

  • Sample mean of group 1: \(\bar{x}_1 = 24\)
  • Sample mean of group 2: \(\bar{x}_2 = 25\)
  • Standard deviation of the differences: \(sd(x) = 4\)
  • Number of subjects: \(n = 50\)

\(t = \frac{\bar{x}_1 - \bar{x}_2}{sd(x)/\sqrt{n}} = \frac{24-25}{4/\sqrt{50}} = -1.77\)

Degrees of freedom

\(df = n - 1 = 49\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 49, lower.tail = FALSE)
[1] 2.009575

-critical value\(_{\alpha/2}\):

qt(p = 0.05/2, df = 49, lower.tail = TRUE)
[1] -2.009575

Draw conclusions

We reject the \(H_0\) if: t > critical value\(_{\alpha/2}\) or t < - critical value\(_{\alpha/2}\)
In our example we have -1.77 > -2.01.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the t-distribution:

2 * pt(q = -1.77, df = 49, lower.tail = TRUE)
[1] 0.08294898
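In R this is t.test(..., paired = TRUE), which is numerically identical to a one-sample t-test on the pairwise differences. A sketch on simulated before/after data (the values are made up):

```r
set.seed(7)
before <- rnorm(50, mean = 24, sd = 4)          # hypothetical BMI before the exams
after  <- before + rnorm(50, mean = 1, sd = 2)  # hypothetical BMI after the exams

res_paired <- t.test(before, after, paired = TRUE)
res_diff   <- t.test(before - after, mu = 0)  # one-sample test on the differences

res_paired$statistic  # identical to res_diff$statistic
res_paired$parameter  # df = n - 1 = 49
```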

If one-tailed
We reject the \(H_0\) if: t > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = FALSE)

or

We reject the \(H_0\) if: t < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the t-distribution:
pt(q = t, df, lower.tail = TRUE)

M sample test (Analysis of variance, F-test)

For one continuous and one categorical variable: investigate the difference between the means (averages) of more than two groups.

Assumptions

  • The dependent variable(s) must be continuous.
  • The dependent variable(s) should be approximately normally distributed.
    • statistical tests such as Kolmogorov-Smirnov and Shapiro-Wilk should be used with caution (e.g., always visualize the data).
  • Samples must be independent.
  • Population variances must be equal.
Theory

Scenario IV

Is the mean BMI of the students different in groups 1, 2 and 3?

Connection with linear regression

\(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i\)
where \(x_{1}\) and \(x_{2}\) indicate whether the subject is in group 2 or 3.
\(H_0: \beta_1 = \beta_2 = 0\)
\(H_1: \beta_1 \neq 0\) or \(\beta_2 \neq 0\)

Alternatively

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \neq \mu_2\) or \(\mu_2 \neq \mu_3\) or \(\mu_1 \neq \mu_3\)

It generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a Type I error.

Limitation: A one-way ANOVA will tell you that at least two groups were different from each other, but it won’t tell you which ones. If your test returns a significant F-statistic, you may need to run a post hoc test to determine exactly which groups had a difference in means.

Test statistic

  • Calculate the within group variation.
  • Calculate the between group variation.
  • Calculate the F statistic: the ratio of between group variation to within group variation.

Under the null hypothesis that the group means are the same, the between-group variability will be similar to the within-group variability. If, however, there are differences between the groups, then the between-group variability will be larger than the within-group variability.

Sum of squares due to differences between groups:
\(SS_{between} = \sum_j n_j (\bar{x}_j - \bar{x})^2\),
where \(n_j\) is the number of observations in group \(j\) and \(\bar{x}_j\) is the mean value of the \(j\)th group.

Sum of squares due to variability within groups:
\(SS_{within} = \sum_j\sum_i(x_{ij} - \bar{x}_j)^2\),
where \(x_{ij}\) is the observation of the \(i\)-th subject in group \(j\).

Total sum of squares:
\(SS_{total} = SS_{between} + SS_{within} = \sum_j\sum_i(x_{ij} - \bar{x})^2 = \sum_j n_j (\bar{x}_j - \bar{x})^2 + \sum_j\sum_i(x_{ij} - \bar{x}_j)^2\)

Mean squares:
\(MS_{between} = \frac{SS_{between}}{m-1}\)
\(MS_{within} = \frac{SS_{within}}{n-m}\),
where \(m\) is the total number of groups and \(n\) the total number of subjects.

\(F = \frac{MS_{between}}{MS_{within}}\)

Degrees of freedom

\(df_1 = m-1\) and \(df_2 = n-m\), where \(n\) is the total number of subjects.

Type I error

Choose the probability of a type I error (\(\alpha\)). A common choice is 5%: under \(H_0\), there is less than a 5% chance of obtaining such an extreme test statistic.

Critical value

Get the critical value for the \(\cal{F}\)-distribution.

Draw conclusions

Compare test statistic (F) with the critical value\(_{\alpha}\) or the \(p-value\) with \(\alpha\)

If F > critical value\(_{\alpha}\) or the \(p-value < \alpha\), we reject \(H_0\).

Application

Scenario IV

Is the mean BMI of the students different in groups 1, 2 and 3?

Hypothesis

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \neq \mu_2\) or \(\mu_2 \neq \mu_3\) or \(\mu_1 \neq \mu_3\)

Collect and visualize the data

From a quick look at the histograms, we see that the data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

Test statistic

Let’s assume:

  • Sample mean of group 1: \(\bar{x}_1 = 25.1633\)
  • Sample mean of group 2: \(\bar{x}_2 = 22.8823\)
  • Sample mean of group 3: \(\bar{x}_3 = 18.1985\)
  • Number of subjects: \(n_1 = n_2 = n_3 = 50\)

\(SS_{between} = \sum_j n_j (\bar{x}_j - \bar{x})^2 = 1260.8\),

\(SS_{within} = \sum_j\sum_i(x_{ij} - \bar{x}_j)^2 = 1721.1\)

Total sum of squares:
\(SS_{total} = SS_{between} + SS_{within} = 2981.9\)

Mean squares:
\(MS_{between} = \frac{SS_{between}}{m-1} = \frac{1260.8}{3-1} = 630.4\)
\(MS_{within} = \frac{SS_{within}}{n-m}= \frac{1721.1}{150-3} = 11.7\),

\(F = \frac{MS_{between}}{MS_{within}} = \frac{630.4}{11.7} = 53.8\)

Degrees of freedom

\(df_1 = m - 1 = 3 - 1 = 2\)
\(df_2 = n - m = 150 - 3 = 147\)

Type I error

We assume to have \(\alpha\) = 0.05.

Critical values

With the help of R we obtain the critical values:

qf(p = 0.05, df1 = 2, df2 = 147, lower.tail = FALSE)
[1] 3.057621

Draw conclusions

We reject the \(H_0\) if: F > critical value\(_\alpha\)
In our example we have 53.8 > 3.06.
Therefore, we reject the \(H_0\).

With the help of R we obtain the p-value from the \(\cal{F}\)-distribution:

pf(q = 53.8, df1 = 2, df2 = 147, lower.tail = FALSE)
[1] 2.932458e-18

Comparison of mean ranks (non-parametric)

This test is the non-parametric equivalent of the t-test. Compared to the t-test, the Wilcoxon test is robust to outliers.

One-sample Wilcoxon signed rank test

For one continuous variable: investigates the location of a population based on a sample of data.

Assumptions

  • Population distribution is symmetric.
  • The observations are independent of one another.

Theory

Scenario I

Is the median score value of the students in my university different from the median score value of all students?

What is signed rank?
Ranks are integers indicating the position of each value when the values are sorted in increasing order. E.g. the rank of 3, -10, 16, 6, 2 is 3, 1, 5, 4, 2:

rank(c(3, -10, 16, 6, 2))
[1] 3 1 5 4 2

The signed rank is similar, but we rank according to the absolute values and then re-attach the sign of each value. E.g. the signed rank of 3, -10, 16, 6, 2 is 2, -4, 5, 3, 1:

signed_rank <- function(x) sign(x) * rank(abs(x))
signed_rank(c(3, -10, 16, 6, 2))
[1]  2 -4  5  3  1

Connection with linear regression

\(signed\_rank(y_i) = \beta_0 + \beta_1 x_i + \epsilon_i\) where \(x_i = 0\)
\(H_0: \beta_0 = 0\)
\(H_1: \beta_0 \neq 0\)

Alternatively

\(H_0: m = 0\)
\(H_1: m \neq 0\)
where
\(m\) is the median score of the students in my university

More general

\(H_0: m = m_0\)
\(H_1: m \neq m_0\)
where
\(m_0\) is the median score of all students

If one-tailed
Is the median score value of the students in my university larger than the median score value of all students?
\(H_0:m=m_0\)
\(H_1:m>m_0\)

or

Is the median score value of the students in my university smaller than the median score value of all students?
\(H_0:m=m_0\)
\(H_1:m<m_0\)

Test statistic

  • Calculate the ranks of the absolute differences.
    • If there are ties you assign the average of the tied ranks.
    • If a difference is zero, the observation is dropped from the analysis and the sample size is reduced.
  • Obtain the sum of those ranks where the difference was positive \(W_+ = \sum R_d^+\) or negative \(W_- = \sum R_d^-\). The test statistic (\(W\)) is the minimum of \(W_+\) and \(W_-\).

If one-tailed: use either \(W_+\) or \(W_-\) for the test statistic (\(W\)) depending on the direction of the alternative hypothesis.

Sampling distribution

For large sample size, we can use the normal approximation, that is, \(W\) is approximately normally distributed.

\(\mu_W = \frac{n(n+1)}{4}\)
\(\sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}\)

\(z = \frac{ \mid max(W_+, W_-) - \mu_W \mid - 1/2}{\sigma_W}\)

For small sample size, we can use the exact distribution (more details in the application).

Type I error

Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Critical value

Get critical values\(_{\alpha/2}\) and p-value either from the z-distribution or the exact distribution of W.

If one-tailed
Get critical value\(_{\alpha}\) and p-value either from the z-distribution or the exact distribution of W.

Draw conclusions

Compare test statistic with the critical values\(_{\alpha/2}\) or the p−value with \(\alpha\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is the median score value of the students in my university different from the median score value of all students?

Hypothesis

\(H_0: m = m_0\)
\(H_1: m \neq m_0\)

If one-tailed
Is the median score value of the students in my university larger than the median score value of all students?
\(H_0:m=m_0\)
\(H_1:m>m_0\)

or

Is the median score value of the students in my university smaller than the median score value of all students?
\(H_0:m=m_0\)
\(H_1:m<m_0\)

Collect the data

x m_0 Difference |Difference| rank
9.75508 10 -0.244920 0.244920 1
11.10491 10 1.104913 1.104913 3
10.69730 10 0.697299 0.697299 2

Test statistic

\(W_- = 1\) and \(W_+ = 5\)
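
These values can be cross-checked with a short Python sketch (standard library only) that ranks the absolute differences and sums the positive and negative ranks:

```python
# Observations and hypothesized median from the table above.
x = [9.75508, 11.10491, 10.69730]
m0 = 10
d = [xi - m0 for xi in x]

# rank the absolute differences (no ties in these data)
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0] * len(d)
for r, i in enumerate(order, start=1):
    ranks[i] = r

w_plus = sum(r for r, di in zip(ranks, d) if di > 0)   # 3 + 2 = 5
w_minus = sum(r for r, di in zip(ranks, d) if di < 0)  # 1
print(w_plus, w_minus)  # 5 1
```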

Type I error

We assume to have \(\alpha\) = 0.05

Exact distribution

Suppose we have 3 observations with no ties. The possible values of \(W\) are \(0, \dots, 6\). Each of the three data points is assigned a rank of \(1, 2,\) or \(3\) according to its absolute difference. Depending on whether the data point falls above or below the hypothesized median, each rank either remains positive or becomes negative.

Signed ranks W Probability
1 2 3 6 0.125
-1 2 3 5 0.125
1 -2 3 4 0.125
1 2 -3 3 0.125
-1 -2 3 3 0.125
-1 2 -3 2 0.125
1 -2 -3 1 0.125
-1 -2 -3 0 0.125

Obtain the probabilities:

W Probability
0 0.125
1 0.125
2 0.125
3 0.250
4 0.125
5 0.125
6 0.125
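
The table above can be reproduced by enumerating all \(2^3\) equally likely sign patterns; a minimal Python sketch:

```python
# Each of the 2^3 sign patterns of the ranks 1, 2, 3 is equally likely
# under H0; W is the sum of the ranks that stay positive.
from itertools import product
from collections import Counter

counts = Counter()
for signs in product([1, -1], repeat=3):
    w = sum(r for r, s in zip([1, 2, 3], signs) if s > 0)
    counts[w] += 1

probs = {w: c / 8 for w, c in sorted(counts.items())}
print(probs)  # {0: 0.125, 1: 0.125, 2: 0.125, 3: 0.25, 4: 0.125, 5: 0.125, 6: 0.125}
```

This is the same distribution that R's psignrank/qsignrank use for \(n = 3\).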

Critical values

With the help of R we obtain the critical values:

low critical value\(_{\alpha/2}\):

qsignrank(p = 0.05/2, n = 3, lower.tail = TRUE)
[1] 0

high critical value\(_{\alpha/2}\):

qsignrank(p = 0.05/2, n = 3, lower.tail = FALSE)
[1] 6

If one-tailed
low critical value\(_{\alpha}\):
qsignrank(p = 0.05, n = n, lower.tail = TRUE)

or

high critical value\(_{\alpha}\):
qsignrank(p = 0.05, n = n, lower.tail = FALSE)

Draw conclusions

We reject the \(H_0\) if: \(W >\) high critical value\(_{\alpha/2}\) or \(W <\) low critical value\(_{\alpha/2}\).
In our example we have \(5 < 6\) and \(1 > 0\), so neither condition holds.
Therefore, we do not reject the \(H_0\).

\(p-value = 2 * Pr(W <= 1) = 2 * (0.125 + 0.125) = 0.5\)
or
\(p-value = 2 * Pr(W >= 5) = 2 * (0.125 + 0.125) = 0.5\)

With the help of R we obtain the p-value from the exact distribution:

\(p-value = 2 * Pr(W <= 1):\)

2 * psignrank(q = 1, n = 3, lower.tail = TRUE)
[1] 0.5

or

\(p-value = 2 * Pr(W >= 5) = 2 * (1 - Pr(W < 5)):\)

2 * (1 - psignrank(q = 5 - 1, n = 3, lower.tail = TRUE))
[1] 0.5
2 * psignrank(q = 5 - 1, n = 3, lower.tail = FALSE)
[1] 0.5

If one-tailed
With the help of R we obtain the p-value from the exact distribution:
psignrank(q = W, n = n, lower.tail = TRUE)

or

With the help of R we obtain the p-value from the exact distribution:
1 - psignrank(q = W - 1, n = n, lower.tail = TRUE)
or
psignrank(q = W - 1, n = n, lower.tail = FALSE)

Two-sample Wilcoxon rank sum test

For one continuous and one categorical variable: investigate whether the two populations (of two groups) are equal.

Assumptions

  • The observations from both groups are independent of one another.
  • The responses are at least ordinal (they can be ranked).

Theory

Scenario II

Is the distribution of the score values of the students in my university different from the distribution of the score values of the students from a different university?

Connection with linear regression

\(rank(y_i) = \beta_0 + \beta_1 x_i + \epsilon_i\)
\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)

Alternatively

\(H_0\): the distributions of both populations are equal
\(H_1\): the distributions are not equal

If one-tailed
Do the score values of the students in my university tend to be larger than the score values of the students from a different university?
\(H_0\): the distributions of both populations are equal
\(H_1\): the distribution of the population of my university is shifted towards larger values than that of the other university

or

Do the score values of the students in my university tend to be smaller than the score values of the students from a different university?
\(H_0\): the distributions of both populations are equal
\(H_1\): the distribution of the population of my university is shifted towards smaller values than that of the other university

Test statistic

  • Calculate the ranks for the two groups (\(r_1\) and \(r_2\)).
  • Obtain the sum of the ranks in each group: \(R_1=\sum r_1\) and \(R_2=\sum r_2\).
  • Calculate \(U_1=n_1n_2+ \frac{n_1(n_1+1)}{2}−R_1\) and \(U_2=n_1n_2+ \frac{n_2(n_2+1)}{2}−R_2\)
  • The test statistic (\(U\)) is the minimum of \(U_1\) and \(U_2\).

Sampling distribution

For large sample size, we can use the normal approximation, that is, \(U\) is approximately normally distributed.
\(\mu_U=\frac{n_1n_2}{2}\)
\(\sigma_U = \sqrt{ \frac{n_1n_2(n_2 + n_1 + 1)}{12}}\)
The formula for the standard deviation is more complicated in the presence of tied ranks. If there are ties in ranks, we should use:

\(\sigma_U = \sqrt{ \frac{n_1n_2}{12}\bigg[(n+1) - \sum_{i=1}^K \frac{t_i^3-t_i}{n(n-1)}\bigg]}\)
where \(n=n_1+n_2\), \(t_i\) is the number of subjects sharing rank \(i\), and \(K\) is the number of distinct ranks.

\(z = \frac{|max(U_1, U_2) - \mu_U| - 1/2}{\sigma_U}\)

For small sample size, we can use the exact distribution.

Type I error
Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Critical value
Get critical values\(_{\alpha/2}\) and p-value either from the z-distribution or the exact distribution of \(U\).

If one-tailed
Get critical value\(_{\alpha}\) and p-value either from the z-distribution or the exact distribution of \(U\).

Draw conclusions
Compare test statistic with the critical values\(_{\alpha/2}\) or the p−value with \(\alpha\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario II

Is the distribution of the score values of the students in my university different from the distribution of the score values of the students from a different university?

Hypothesis

\(H_0\): the distributions of both populations are equal
\(H_1\): the distributions are not equal

If one-tailed
Do the score values of the students in my university tend to be larger than the score values of the students from a different university?
\(H_0\): the distributions of both populations are equal
\(H_1\): the distribution of the population of my university is shifted towards larger values than that of the other university

or

Do the score values of the students in my university tend to be smaller than the score values of the students from a different university?
\(H_0\): the distributions of both populations are equal
\(H_1\): the distribution of the population of my university is shifted towards smaller values than that of the other university

Collect the data

variable value rank
x 9.75508 1
x 11.10491 4
x 10.69730 3
y 14.71926 5
y 15.79611 6
y 10.15486 2

Test statistic

\(R_1=1+4+3=8\)
\(U_1= n_1n_2+ \frac{n_1(n_1+1)}{2}-R_1=7\)
\(R_2=5+6+2=13\)
\(U_2 = n_1n_2+ \frac{n_2(n_2+1)}{2}-R_2=2\)
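
A short Python sketch (standard library only) reproduces the ranks and both U statistics from the data above:

```python
# Data from the table above; x is my university, y the other university.
x = [9.75508, 11.10491, 10.69730]
y = [14.71926, 15.79611, 10.15486]

# rank all observations together (no ties in these data)
rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}

n1, n2 = len(x), len(y)
R1 = sum(rank[v] for v in x)            # 1 + 4 + 3 = 8
R2 = sum(rank[v] for v in y)            # 5 + 6 + 2 = 13
U1 = n1 * n2 + n1 * (n1 + 1) // 2 - R1  # 7
U2 = n1 * n2 + n2 * (n2 + 1) // 2 - R2  # 2
print(R1, R2, U1, U2)  # 8 13 7 2
```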

Type I error

We assume to have \(\alpha\) = 0.05

Critical values

With the help of R we obtain the critical values:

low critical value\(_{\alpha/2}\):

qwilcox(p = 0.05/2, m = 3, n = 3, lower.tail = TRUE) 
[1] 0

high critical value\(_{\alpha/2}\):

qwilcox(p = 0.05/2, m = 3, n = 3, lower.tail = FALSE) 
[1] 9

If one-tailed
low critical value\(_{\alpha}\):
qwilcox(p = 0.05, m = m, n = n, lower.tail = TRUE)

or

high critical value\(_{\alpha}\):
qwilcox(p = 0.05, m = m, n = n, lower.tail = FALSE)

Draw conclusion

We reject the \(H_0\) if: \(U >\) high critical value\(_{\alpha/2}\) or \(U <\) low critical value\(_{\alpha/2}\).
In our example we have \(7 < 9\) and \(2 > 0\), so neither condition holds.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the exact distribution:

\(P-value = 2 * Pr(U <= 2)\)

2 * pwilcox(q = 2, m = 3, n = 3, lower.tail = TRUE) 
[1] 0.4

or

\(P-value = 2 * Pr(U >= 7) = 2 * (1 - Pr(U < 7))\)

2 * (1 - pwilcox(q = 7 - 1, m = 3, n = 3, lower.tail = TRUE))
[1] 0.4
2 * pwilcox(q = 7 - 1, m = 3, n = 3, lower.tail = FALSE)
[1] 0.4
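
The exact p-value can also be verified by brute force: under \(H_0\), every assignment of 3 of the 6 ranks to group 1 is equally likely, so enumerating all \(C(6,3) = 20\) assignments recovers the distribution that pwilcox uses:

```python
# Under H0, each choice of 3 of the ranks 1..6 for group 1 is equally
# likely; enumerate all C(6, 3) = 20 choices to get the exact
# distribution of U.
from itertools import combinations

n1, n2 = 3, 3
u_values = [n1 * n2 + n1 * (n1 + 1) // 2 - sum(c)
            for c in combinations(range(1, 7), n1)]

p_low = sum(u <= 2 for u in u_values) / len(u_values)  # Pr(U <= 2)
print(2 * p_low)  # 0.4
```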

If one-tailed
With the help of R we obtain the p-value from the exact distribution:
\(P-value = Pr[U <= min(U_1,U_2)]\)
pwilcox(q = min(U_1,U_2), m = n_1, n = n_2)

or

With the help of R we obtain the p-value from the exact distribution:
\(P-value = Pr[U >= max(U_1,U_2)] = 1 - Pr[U < max(U_1,U_2)]\)
1 - pwilcox(q = max(U_1,U_2) - 1, m = n_1, n = n_2)

Two-sample Wilcoxon signed-rank test

For one continuous and one categorical variable: investigates the differences in the locations of two populations.

Assumptions

  • The distribution of the paired differences is symmetric.
  • The observations within a pair are dependent; different pairs are independent of one another.

Theory

Scenario III

Is the median score value of the students in my university different this year compared to next year?

Connection with linear regression

\(signed\_rank(y_{2i}−y_{1i}) = \beta_0+\beta_1x_i + \epsilon_i\) where \(x_i=0\)
\(H_0: \beta_0 = 0\)
\(H_1: \beta_0 \neq 0\)

Alternatively

\(H_0: m_1 = m_2\)
\(H_1: m_1 \neq m_2\)
where
\(m_1\) is the median of students in my university this year and \(m_2\) is the median of students in my university next year

If one-tailed
Is the median score value of the students in my university larger this year compared to next year?
\(H_0: m_1 = m_2\)
\(H_1: m_1 > m_2\)

or

Is the median score value of the students in my university smaller this year compared to next year?
\(H_0: m_1 = m_2\)
\(H_1: m_1 < m_2\)

Test statistic

  • Calculate the ranks of the absolute differences.
    • If there are ties you assign the average of the tied ranks.
    • If the two scores of a pair are equal (zero difference), the pair is dropped from the analysis and the sample size is reduced.
  • Obtain the sum of those ranks where the difference was positive \(W_+=\sum R^+_d\) or negative \(W_−=\sum R^−_d\). The test statistic (\(W\)) is the minimum of \(W_+\) and \(W_−\).

Sampling distribution

For large sample size, we can use the normal approximation, that is, \(W\) is approximately normally distributed.

\(\mu_W = \frac{n(n+1)}{4}\)
\(\sigma_w = \sqrt{ \frac{n(n+1)(2n+1)}{24} }\)
\(z = \frac{ |max(W_+, W_-) - \mu_W | - 1/2}{\sigma_W}\)

For small sample size, we can use the exact distribution (more details in the application).

Critical value

Get critical values\(_{\alpha/2}\) and p-value either from the z-distribution or the exact distribution of \(W\).

If one-tailed
Get critical values\(_{\alpha}\) and p-value either from the z-distribution or the exact distribution of \(W\).

Draw conclusions
Compare test statistic with the critical values\(_{\alpha/2}\) or the p−value with \(\alpha\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario III

Is the median score value of the students in my university different this year compared to next year?

Hypothesis

\(H_0: m_1 = m_2\)
\(H_1: m_1 \neq m_2\)
where
\(m_1\) is the median of students in my university this year and \(m_2\) is the median of students in my university next year

If one-tailed
Is the median score value of the students in my university larger this year compared to next year?
\(H_0: m_1 = m_2\)
\(H_1: m_1 > m_2\)

or

Is the median score value of the students in my university smaller this year compared to next year?
\(H_0: m_1 = m_2\)
\(H_1: m_1 < m_2\)

Collect the data

x y Difference |Difference| rank
5 10 -5 5 3
4 5 -1 1 1
4 8 -4 4 2

Test statistic

\(W_- = 6\) and \(W_+ = 0\)

Type I error

We assume to have \(\alpha\) = 0.05

Exact distribution

Suppose we have 3 observations with no ties. The possible values of \(W\) are \(0, \dots, 6\). Each of the three data points is assigned a rank of \(1, 2,\) or \(3\) according to its absolute difference. Depending on whether the data point falls above or below the hypothesized median, each rank either remains positive or becomes negative.

Signed ranks W Probability
1 2 3 6 0.125
-1 2 3 5 0.125
1 -2 3 4 0.125
1 2 -3 3 0.125
-1 -2 3 3 0.125
-1 2 -3 2 0.125
1 -2 -3 1 0.125
-1 -2 -3 0 0.125

Obtain the probabilities:

W Probability
0 0.125
1 0.125
2 0.125
3 0.250
4 0.125
5 0.125
6 0.125

Critical values

With the help of R we obtain the critical values:

low critical value\(_{\alpha/2}\):

qsignrank(p = 0.05/2, n = 3, lower.tail = TRUE)
[1] 0

high critical value\(_{\alpha/2}\):

qsignrank(p = 0.05/2, n = 3, lower.tail = FALSE)
[1] 6

If one-tailed
low critical value\(_{\alpha}\):
qsignrank(p = 0.05, n = n, lower.tail = TRUE)

or

high critical value\(_{\alpha}\):
qsignrank(p = 0.05, n = n, lower.tail = FALSE)

Draw conclusions

We reject the \(H_0\) if: \(W >\) high critical value\(_{\alpha/2}\) or \(W <\) low critical value\(_{\alpha/2}\)
In our example we have \(6 = 6\) and \(0 = 0\), so the strict inequalities do not hold.
Therefore, we do not reject the \(H_0\).

\(p-value = 2 * Pr(W <= 0) = 2 * 0.125 = 0.25\)
or
\(p-value = 2 * Pr(W >= 6) = 2 * 0.125 = 0.25\)

With the help of R we obtain the p-value from the exact distribution:

\(p-value = 2 * Pr(W <= 0):\)

2 * psignrank(q = 0, n = 3)
[1] 0.25

or

\(p-value = 2 * Pr(W >= 6) = 2 * (1-Pr(W<6)):\)

2 * (1 - psignrank(q = 6 - 1, n = 3, lower.tail = TRUE))
[1] 0.25
2 * psignrank(q = 6 - 1, n = 3, lower.tail = FALSE)
[1] 0.25

If one-tailed
With the help of R we obtain the p-value from the exact distribution:
psignrank(q = W, n = n, lower.tail = TRUE)

or

With the help of R we obtain the p-value from the exact distribution:
1 - psignrank(q = W - 1, n = n)
or
psignrank(q = W - 1, n = n, lower.tail = FALSE)

M-sample Kruskal-Wallis test

For one continuous and one categorical variable: investigate whether samples originate from the same distribution.

Assumptions

  • Within and between groups observations are independent of one another.

Theory

Scenario IV

Is the distribution of the score values of the students different in groups 1, 2 and 3?

Connection with linear regression

\(rank(y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i\)
where \(x_{1i}\) and \(x_{2i}\) indicate whether the subject is in group 2 or 3, respectively.
\(H_0: \beta_1 = \beta_2 = 0\) (i.e., \(y = \beta_0\))

Alternatively

\(H_0:\) the samples (groups) are from identical populations
\(H_1:\) at least one of the samples (groups) comes from a different population than the others

Test statistic

  • Rank all data from all groups together; i.e., rank the data from 1 to N ignoring groups. Assign any tied values the average of the ranks they would have received had they not been tied.

  • Calculate the test statistic

    • if we do not have ties:
      \(H = \frac{12}{n(n + 1)} \sum_{j} n_j \bar{r_j}^2 - 3 (n+1)\),
      where

      • \(n\) is the total number of observations in all groups
      • \(n_j\) is the number of observations in group \(j\)
      • \(\bar{r_j} = \frac{\sum_{i=1}^{n_j}r_{ij}}{n_j}\) is the average rank of all observations in group \(j\)
        • \(r_{ij}\) is the rank (among all observations) of observation \(i\) from group \(j\)
    • if we do have ties:
      \(H_{corrected} = \frac{H}{1- \frac{\sum_{j}(T_j^3 - T_j)}{(n^3 - n)}}\),
      where

      • \(T_j\) is the number of tied values in the \(j\)-th group of tied values


For large sample sizes, \(H\) approximately follows a \(\chi^2\)-distribution with \(m-1\) degrees of freedom. For small sample sizes, the exact distribution should be used.

Type I error

Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Degrees of freedom

\(df = m-1\),
where \(m\) is the total number of groups.

Critical value

Get critical values\(_{\alpha}\) and p-value either from the \(\chi^2\)-distribution or the exact distribution.

Draw conclusions
Compare test statistic with the critical values\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario IV

Is the distribution of the score values of the students different in groups 1, 2 and 3?

Hypothesis

\(H_0:\) the samples (groups) are from identical populations
\(H_1:\) at least one of the samples (groups) comes from a different population than the others

Collect the data

Groups values rank mean rank per group
1 8.88 5 9.17
1 9.54 6 9.17
1 13.12 13 9.17
1 10.14 8 9.17
1 10.26 9 9.17
1 13.43 14 9.17
2 9.92 7 3.25
2 6.47 1 3.25
2 7.63 2 3.25
2 8.11 3 3.25
3 15.90 15 10.40
3 12.44 11 10.40
3 12.60 12 10.40
3 11.44 10 10.40
3 8.78 4 10.40

Test statistic

No ties:

\(H = \frac{12}{n(n + 1)} \sum_{j} n_j \bar{r_j}^2 - 3 (n+1) = \frac{12}{15(15+1)} * (9.17^2 * 6 + 3.25^2 * 4 + 10.40^2 * 5) - 3* (15+1) = 6.38\)
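
A quick Python check of \(H\) from the ranks in the table (note that using the unrounded mean ranks gives \(H \approx 6.36\); the 6.38 above comes from rounding the mean ranks to two decimals):

```python
# Ranks per group, copied from the table above (no ties in these data).
groups = [
    [5, 6, 13, 8, 9, 14],  # group 1
    [7, 1, 2, 3],          # group 2
    [15, 11, 12, 10, 4],   # group 3
]
n = sum(len(r) for r in groups)  # 15

H = 12 / (n * (n + 1)) * sum(
    len(r) * (sum(r) / len(r)) ** 2 for r in groups
) - 3 * (n + 1)
print(round(H, 2))  # 6.36
```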

Degrees of freedom

df = number of groups \(− 1 = 3 - 1 = 2\)

Type I error

We assume to have α = 0.05.

Critical values

With the help of R we obtain the critical values:

critical value\({_\alpha}\):

qchisq(p = 0.05, df = 2, lower.tail = FALSE)
[1] 5.991465

Draw conclusions

We reject the \(H_0\) if: H > critical value\(_{\alpha}\)
In our example we have 6.38 > 5.99.
Therefore, we reject the \(H_0\).

With the help of R we obtain the p-value from the \(\chi^2\)-distribution:

pchisq(q = 6.38, df = 2, lower.tail = FALSE)
[1] 0.04117187

Categorical data

Chi-square test

For two categorical variables: statistical significance test used in the analysis of contingency tables.
1) Two variables are related or independent
2) Goodness-of-fit between observed distribution and theoretical distribution of frequencies

Assumptions

  • The study groups must be independent.
  • There are 2 variables, and both are measured as categories, usually at the nominal level. However, ordinal data may also be used.
  • The levels (or categories) of the variables are mutually exclusive.

Theory

Scenario I

Is there a relationship between gender and whether or not someone followed online course? (test whether two variables are related or independent)

\(H_0:\) there is no association between gender and whether someone followed the online course
\(H_1:\) there is an association between gender and whether someone followed the online course

If a chi-square goodness of fit test is performed then:
The null and alternative hypotheses for our goodness of fit test reflect the assumption that we are making about the population, e.g. that the groups (male/female, or online course/no online course) occur in equal proportions.

Connection with linear regression

Let’s assume a 2x2 table with variable \(A\) consisting of the categories \(i\) and variable \(B\) consisting of the categories \(j\). A multiplicative model that reproduces the cell frequencies exactly is:

\(n_{ij} = N *\alpha_i * \beta_j * \alpha\beta_{ij}\)

Each cell count (\(n_{ij}\)) can be obtained as the product of the overall number of observations (\(N\)), the main effect of variable \(A\) at category \(i\) (\(\alpha_i = n_{A_i}/N\)), the main effect of variable \(B\) at category \(j\) (\(\beta_j = n_{B_j}/N\)) and the interaction between the two variables (\(\alpha\beta_{ij} = \frac{n_{ij} N}{n_{A_i} n_{B_j}}\)).

For example,

Yes: online course No: online course Sum
Male 33 14 47
Female 29 24 53
Sum 62 38 100

\(n_{11} = 100 * (47/100) * (62/100) * (33/47/62*100) = 33\)

Because of its multiplicative form, the above model is difficult to work with. Taking the logarithm of both sides, we can rewrite it as:

\(log(n_{ij}) = log(N) + log(\alpha_i) + log(\beta_j) + \log(\alpha\beta_{ij})\)

which is a log-linear model.

Test statistic

  • In order to compute the chi-square test statistic we must know the observed and expected values.
  • The test statistic is:  \(X^2 = \sum_{i=1}^K \frac{(O_i-E_i)^2}{E_i}\),
    where \(K\) is the number of contingency table cells, \(O_i\) is the observed value and \(E_i\) the expected value.

When the values in the contingency table are fairly small a “correction for continuity” known as the “Yates’ correction” may be applied to the test statistic. \(X^2 = \sum_{i=1}^K \frac{(|O_i-E_i| - 1/2)^2}{E_i}\)

Degrees of freedom

df = (number of rows − 1) * (number of columns − 1)

If a chi-square goodness of fit test is performed then:
df = number of categories - 1

Type I error

Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Critical value

Get critical values\(_{\alpha}\) and p-value from the \(\chi^2\)-distribution.

Draw conclusions

Compare test statistic with the critical values\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is there a relationship between gender and whether or not someone followed online course? (test whether two variables are related or independent)

Hypothesis

\(H_0:\) there is no association between gender and whether someone followed the online course
\(H_1:\) there is an association between gender and whether someone followed the online course

Collect the data

Observed:

Yes: online course No: online course Sum
Male 33 14 47
Female 29 24 53
Sum 62 38 100

Expected:

For each cell we calculate:

(total number of obs for the row) * (total number of obs for the column) / (total number of obs)

Yes: online course No: online course
Male 29.1 17.9
Female 32.9 20.1

Test statistic

\(X^2 = \sum_{i=1}^K \frac{(O_i-E_i)^2}{E_i} = \frac{(33-29.1)^2}{29.1} + \frac{(14-17.9)^2}{17.9} + \frac{(29-32.9)^2}{32.9} + \frac{(24-20.1)^2}{20.1} = 2.59\)
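
The computation can be cross-checked in Python; note that with unrounded expected counts (29.14, 17.86, 32.86, 20.14) the statistic is approximately 2.54, while the 2.59 above results from rounding the expected counts to one decimal:

```python
# Observed counts from the table above.
obs = [[33, 14], [29, 24]]
row = [sum(r) for r in obs]        # 47, 53
col = [sum(c) for c in zip(*obs)]  # 62, 38
N = sum(row)                       # 100

# expected count = row total * column total / grand total
exp = [[row[i] * col[j] / N for j in range(2)] for i in range(2)]
X2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
         for i in range(2) for j in range(2))
print(round(X2, 2))  # 2.54
```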

Degrees of freedom

df = (number of rows − 1) * (number of columns − 1) = \((2-1) * (2-1) = 1\)

Type I error

We assume to have \(\alpha\) = 0.05

Critical value

With the help of R we obtain the critical values:

critical value\(_{\alpha}\):

qchisq(p = 0.05, df = 1, lower.tail = FALSE)
[1] 3.841459

Draw conclusions

We reject the \(H_0\) if: \(X^2\) > critical value\(_{\alpha}\)
In our example we have 2.59 < 3.84.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the \(\chi^2\)-distribution:

pchisq(q = 2.59, df = 1, lower.tail = FALSE)
[1] 0.1075403

Fisher exact test

For two categorical variables: statistical significance test used in the analysis of contingency tables.

Assumptions

  • The study groups must be independent.
  • The variables should be dichotomous.
  • Fisher’s test requires the rare condition that both row and column marginal totals are fixed in advance.

Theory

Scenario I

Is there a relationship between gender and whether or not someone followed online course? (test whether two variables are related or independent)

Yes: online course No: online course Total
Male O11 O12 TotalR1
Female O21 O22 TotalR2
Total TotalC1 TotalC2 Total

Fisher's exact test assumes that both the row and column totals (TotalR1, TotalR2, TotalC1 and TotalC2) are known, fixed quantities. It then calculates the probability of obtaining the observed frequencies (O11, O12, O21 and O22) given those totals.

If we assume the marginal totals as given, the value of \(O11\) determines the other three cell counts. Assuming fixed marginals, the distribution of the four cell counts follows the hypergeometric distribution, e.g for \(O11\):

\(Pr(O11)=\frac{( {TotalR1 \atop O11} ) ( {TotalR2 \atop O21} )}{( {Total \atop TotalC1} )} = \frac{\frac{TotalR1!}{O11!O12!} \frac{TotalR2!}{O21!O22!} }{\frac{Total!}{TotalC1!TotalC2!}}= \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!}\),
where \(({TotalR1 \atop O11}) = \frac{TotalR1!}{O11!(TotalR1-O11)!}\) and \(!\) denotes the factorial, e.g:
\(N! = N (N-1) (N-2) (N-3) ... 1\)

  • For all possible tables (given that TotalR1, TotalR2, TotalC1 and TotalC2 are fixed), calculate the relevant hypergeometric probability.
  • The p-value of independence in the 2x2 table is the sum of hypergeometric probabilities for outcomes at least as favourable to the alternative hypothesis as the observed outcome.

Since the number of possible tables can be very large, we often must resort to computer simulation.

Type I error

Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Application

Scenario I

Is there a relationship between gender and whether or not someone followed online course? (test whether two variables are related or independent)

Yes: online course No: online course Sum
Male 1 3 4
Female 3 1 4
Sum 4 4 8

For this table:

\(p = \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!} = \frac{4!4!4!4!}{8!1!3!1!3!} = 0.2285714\)

Other alternatives:

Yes: online course No: online course Sum
Male 0 4 4
Female 4 0 4
Sum 4 4 8

\(p = \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!} = \frac{4!4!4!4!}{8!0!4!0!4!} = 0.01428571\)

Yes: online course No: online course Sum
Male 2 2 4
Female 2 2 4
Sum 4 4 8

\(p = \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!} = \frac{4!4!4!4!}{8!2!2!2!2!} = 0.5142857\)

Yes: online course No: online course Sum
Male 3 1 4
Female 1 3 4
Sum 4 4 8

\(p = \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!} = \frac{4!4!4!4!}{8!3!1!3!1!} = 0.2285714\)

Yes: online course No: online course Sum
Male 4 0 4
Female 0 4 4
Sum 4 4 8

\(p = \frac{TotalR1!TotalR2!TotalC1!TotalC2!}{Total!O11!O12!O21!O22!} = \frac{4!4!4!4!}{8!4!0!4!0!} = 0.01428571\)

For one-tailed
Find extreme cases from the same direction as our data: \(0.2285714 + 0.01428571 = 0.243\)

or

Find extreme cases from the other direction as our data: \(0.2285714 + 0.5142857 + 0.2285714 + 0.01428571 =0.986\)

For a two-tailed test we must also consider tables that are equally extreme in both directions. Therefore, we sum the probabilities that are equal to or smaller than the probability of the observed table: \(0.2285714 + 0.01428571 + 0.2285714 + 0.01428571 = 0.486\)
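
All of the probabilities above can be reproduced by enumerating the possible tables in Python (standard library only):

```python
# Enumerate all tables compatible with the fixed margins (4, 4, 4, 4).
from math import factorial as f

def hyper_p(o11, r1=4, r2=4, c1=4):
    # the remaining three cells are implied by o11 and the margins
    o12, o21 = r1 - o11, c1 - o11
    o22 = r2 - o21
    N = r1 + r2
    return (f(r1) * f(r2) * f(c1) * f(N - c1)) / (
        f(N) * f(o11) * f(o12) * f(o21) * f(o22))

probs = {o11: hyper_p(o11) for o11 in range(5)}
p_obs = probs[1]  # the observed table has O11 = 1
two_sided = sum(p for p in probs.values() if p <= p_obs + 1e-12)
print(round(two_sided, 3))  # 0.486
```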

Type I error

We assume to have \(\alpha\) = 0.05

Draw conclusions

The \(H_0\) is not rejected.

z-test for proportions

One sample test

For one categorical variable: assesses whether or not a sample from a population represents the true proportion of the entire population.

Assumptions

  • The observations are independent of one another.
  • The expected counts of successes and failures are both sufficiently large.
    \(np >10\) and \(n(1-p)>10\), where \(n\) is the number of observations and \(p\) the proportion.

Theory

Scenario I

Is the probability of being diagnosed with asthma different than it was 50 years ago?

\(H_0: \pi = \pi_0\)
\(H_1: \pi \neq \pi_0\)
where
\(\pi_0\) is the probability of being diagnosed with asthma 50 years ago.

If one-tailed
Is the probability of being diagnosed with asthma higher than it was 50 years ago?
\(H_0: \pi=\pi_0\)
\(H_1: \pi>\pi_0\)

or

Is the probability of being diagnosed with asthma lower than it was 50 years ago?
\(H_0:\pi=\pi_0\)
\(H_1:\pi<\pi_0\)

Test statistic

For large sample sizes, the distribution of the test statistic is approximately normal.

\(z = \frac{p-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}\)

  • Sample proportion: \(p\)
  • Population proportion: \(\pi_0\)
  • Number of subjects: \(n\)

If continuity correction is applied: \(z = \frac{p-\pi_0 + c}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}\),

where

  • \(c = -\frac{1}{2n}\) if \(p > \pi_0\)
  • \(c = \frac{1}{2n}\) if \(p < \pi_0\)
  • \(c =0\) if \(|p-\pi_0| < \frac{1}{2n}\)

Sampling distribution

The sampling distribution follows the normal distribution.

Type I error

Choose the probability of the type I error (\(\alpha\)). Common choice 5%. Less than 5% chance of getting such an extreme value by chance.

Critical values

Get critical values\(_{\alpha/2}\) and p-value from the normal-distribution.

If one-tailed
Get critical value\(_{\alpha}\) and p-value from the normal-distribution.

Draw conclusions

Compare test statistic (\(z\)) with the critical values\(_{\alpha/2}\) or the \(p-value\) with \(\alpha\).

If \(z\) > critical value\(_{\alpha/2}\) or \(z\) < -critical value\(_{\alpha/2}\) or the \(p-value\) < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare test statistic with the critical value\(_{\alpha}\) or the p−value with \(\alpha\).

Application

Scenario I

Is the probability of being diagnosed with asthma different than it was 50 years ago?

Hypothesis

\(H_0: \pi = \pi_0\)
\(H_1: \pi \neq \pi_0\)

Collect the data

x Freq
No 47
Yes 53

Fifty years ago we had \(\pi_0 = 0.6\).

Test statistic

(with no continuity correction):
\(z = \frac{p-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} = \frac{0.53-0.6}{\sqrt{\frac{0.6(1-0.6)}{100}}}=-1.43\)
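
A one-line Python check of the test statistic (values from the example above):

```python
from math import sqrt

# p = 53/100 observed, pi0 = 0.6 hypothesized, n = 100 subjects.
p, pi0, n = 0.53, 0.6, 100
z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
print(round(z, 2))  # -1.43
```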

Type I error

We assume to have \(\alpha\) = 0.05

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}\):

qnorm(p = 0.05/2, lower.tail = FALSE)
[1] 1.959964

-critical value\(_{\alpha/2}\):

qnorm(p = 0.05/2, lower.tail = TRUE)
[1] -1.959964

If one-tailed
critical value\(_{\alpha}\)
qnorm(p = 0.05, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qnorm(p = 0.05, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: z > critical value\(_{\alpha/2}\) or z < - critical value\(_{\alpha/2}\)
In our example we have -1.43 > -1.96.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the normal distribution:

2 * pnorm(q = -1.43, lower.tail = TRUE)
[1] 0.152717
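The same two-sided p-value can be reproduced without R, using the standard normal CDF written in terms of the error function. This Python sketch mirrors the `2 * pnorm(...)` call above:

```python
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

z = -1.43
p_value = 2 * norm_cdf(z)  # two-sided: double the lower-tail probability
```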

If one-tailed
We reject the \(H_0\) if: z > critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the normal distribution:
pnorm(q = z, lower.tail = FALSE)

or

We reject the \(H_0\) if: z < -critical value\(_{\alpha}\)
With the help of R we obtain the p-value from the normal distribution:
pnorm(q = z, lower.tail = TRUE)

Two sample test

For two categorical variables: compares the proportions of two different populations.

Assumptions

  • The observations are independent of one another.
  • The expected counts of successes and failures are both sufficiently large.
    \(n_jp_j > 10\) and \(n_j(1-p_j) > 10\) in each group \(j\), where \(n_j\) is the number of observations and \(p_j\) the proportion in group \(j\).
Theory

Scenario II

Is the probability of being diagnosed with asthma in the Netherlands different than in Belgium?

\(H_0: \pi_1 = \pi_2\)
\(H_1: \pi_1 \neq \pi_2\)
where
\(\pi_1\) is the probability of being diagnosed with asthma in the Netherlands and \(\pi_2\) is the probability of being diagnosed with asthma in Belgium.

If one-tailed
Is the probability of being diagnosed with asthma in the Netherlands higher than in Belgium?
\(H_0:\pi_1=\pi_2\)
\(H_1:\pi_1>\pi_2\)

or

Is the probability of being diagnosed with asthma in the Netherlands lower than in Belgium?
\(H_0:\pi_1=\pi_2\)
\(H_1:\pi_1<\pi_2\)

Test statistic

For large sample sizes, the distribution of the test statistic is approximately normal.

Pooled version:

\(z = \frac{(p_1-p_2) - 0}{\sqrt{p(1-p)\big(\frac{1}{n_1} + \frac{1}{n_2}\big)}}\)

Unpooled version:

\(z = \frac{(p_1-p_2) - 0}{\sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} }}\)

  • Sample proportion of group 1: \(p_1\)
  • Sample proportion of group 2: \(p_2\)
  • Number of subjects in each group: \(n_1\) and \(n_2\)
  • Pooled proportion: \(p = \frac{n_1p_1 + n_2p_2}{n_1 + n_2}\)

The pooling refers to the way in which the standard error is estimated. In the pooled version, the two proportions are averaged, and only one proportion is used to estimate the standard error. In the unpooled version, the two proportions are used separately.

If continuity correction is applied:

Pooled version:

\(z = \frac{(p_1-p_2) + \frac{F}{2}\big(\frac{1}{n_1} + \frac{1}{n_2}\big)}{ \sqrt{p(1-p)\big(\frac{1}{n_1} + \frac{1}{n_2}\big)}}\)

Unpooled version:

\(z = \frac{(p_1-p_2) + \frac{F}{2}\big(\frac{1}{n_1} + \frac{1}{n_2}\big)}{\sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} }}\)

where

  • \(F = -1\) if \(p_1 > p_2\)
  • \(F = 1\) if \(p_1 < p_2\)
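The pooled and unpooled statistics (without continuity correction) can be sketched as one function. This Python version is a minimal illustration of the formulas above, not a full implementation (the document's own examples use R):

```python
from math import sqrt

def two_sample_prop_z(x1, n1, x2, n2, pooled=True):
    """z statistic for a two-sample proportion test (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # a single combined proportion estimates the standard error
        p = (x1 + x2) / (n1 + n2)
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # each group's proportion enters the standard error separately
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se
```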

Sampling distribution

The sampling distribution follows the normal distribution.

Type I error

Choose the probability of the type I error (\(\alpha\)). A common choice is 5%: under \(H_0\) there is less than a 5% chance of observing such an extreme value by chance.

Critical values

Get the critical values\(_{\alpha/2}\) and the p-value from the normal distribution.

If one-tailed
Get the critical value\(_{\alpha}\) and the p-value from the normal distribution.

Draw conclusions

Compare the test statistic (\(z\)) with the critical values\(_{\alpha/2}\) or the p-value with \(\alpha\).

If \(z\) > critical value\(_{\alpha/2}\) or \(z\) < -critical value\(_{\alpha/2}\), or the p-value < \(\alpha\), we reject \(H_0\).

If one-tailed
Compare the test statistic with the critical value\(_{\alpha}\) or the p-value with \(\alpha\).

Application

Scenario II

Is the probability of being diagnosed with asthma in the Netherlands different than in Belgium?

Hypothesis

\(H_0: \pi_1 = \pi_2\)
\(H_1: \pi_1 \neq \pi_2\)

Collect the data

the Netherlands
x1 Freq
No 47
Yes 53
Belgium
x2 Freq
No 62
Yes 38

Test statistic

(with no continuity correction and pooled version):

\(p = \frac{n_1p_1 + n_2p_2}{n_1 + n_2} = \frac{100 \cdot 0.53 + 100 \cdot 0.38}{100 + 100} = 0.455\)

\(z = \frac{(p_1-p_2) - 0}{\sqrt{p(1-p)\big(\frac{1}{n_1} + \frac{1}{n_2}\big)}} = \frac{0.53-0.38}{\sqrt{0.455(1-0.455) \big( \frac{1}{100} + \frac{1}{100} \big)}} = 2.13\)

Type I error

We assume to have \(\alpha\) = 0.05

Critical values

With the help of R we obtain the critical values:

critical value\(_{\alpha/2}\):

qnorm(p = 0.05/2, lower.tail = FALSE)
[1] 1.959964

-critical value\(_{\alpha/2}\):

qnorm(p = 0.05/2, lower.tail = TRUE)
[1] -1.959964

If one-tailed
critical value\(_{\alpha}\)
qnorm(p = 0.05, lower.tail = FALSE)

or

-critical value\(_{\alpha}\)
qnorm(p = 0.05, lower.tail = TRUE)

Draw conclusions

We reject the \(H_0\) if: z > critical value\(_{\alpha/2}\) or z < - critical value\(_{\alpha/2}\)
In our example we have 2.13 > 1.96.
Therefore, we do reject the \(H_0\).

With the help of R we obtain the p-value from the normal distribution:

2 * pnorm(q = 2.13, lower.tail = FALSE)
[1] 0.03317161
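The Scenario II result can be cross-checked without R, computing the pooled z statistic from the raw counts and the two-sided p-value via the complementary error function. A Python sketch:

```python
from math import erfc, sqrt

def norm_sf(x):
    # survival function (upper tail) of the standard normal
    return 0.5 * erfc(x / sqrt(2))

# Scenario II: 53/100 asthma cases in the Netherlands, 38/100 in Belgium
p = (53 + 38) / 200                       # pooled proportion
se = sqrt(p * (1 - p) * (1 / 100 + 1 / 100))
z = (0.53 - 0.38) / se
p_value = 2 * norm_sf(z)                  # two-sided
```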

Binomial test

For one categorical variable: investigates deviations from a theoretically expected distribution of observations.

Assumptions

  • Independent observations.

Theory

Scenario

Is the probability of being diagnosed with asthma different than it was 50 years ago?

\(H_0: \pi = \pi_0\)
\(H_1: \pi \neq \pi_0\)
where
\(\pi_0\) is the probability of being diagnosed with asthma 50 years ago.

If one-tailed
Is the probability of being diagnosed with asthma higher than it was 50 years ago?
\(H_0:\pi=\pi_0\)
\(H_1:\pi>\pi_0\)

or

Is the probability of being diagnosed with asthma lower than it was 50 years ago?
\(H_0:\pi=\pi_0\)
\(H_1:\pi<\pi_0\)

If in a sample of size \(n\) there are \(k\) successes, the formula of the binomial distribution is: \(Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\), where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) and \(!\) indicates a factorial.

  • For any possible outcome of the binomial we obtain the corresponding probability.
  • We find the p-value by considering the probability of seeing an outcome as, or more, extreme. For a one-tailed test this is straightforward to calculate: if \(H_1:\pi<\pi_0\), then \(p\text{-value} = \sum_{i=0}^{k}Pr(X = i) = \sum_{i=0}^{k} \binom{n}{i} \pi_0^i (1-\pi_0)^{n-i}\). Calculating a p-value for a two-tailed test is more complicated, since the binomial distribution is not symmetric when \(\pi_{0}\neq 0.5\); this means that we cannot simply double the one-tailed p-value.
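The one-tailed sum above can be sketched directly from the binomial formula. This Python version (the document's own examples use R) is a minimal illustration:

```python
from math import comb

def binom_pvalue_lower(k, n, pi0):
    """One-tailed p-value Pr(X <= k) under H0: pi = pi0 (for H1: pi < pi0)."""
    return sum(comb(n, i) * pi0**i * (1 - pi0)**(n - i) for i in range(k + 1))
```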

Type I error

Choose the probability of the type I error (\(\alpha\)). A common choice is 5%: under \(H_0\) there is less than a 5% chance of observing such an extreme value by chance.

Application

Scenario

Is the probability of being diagnosed with asthma higher than it was 50 years ago?

Hypothesis

\(H_0:\pi=0.4\)
\(H_1:\pi>0.4\)

Collect the data

  • \(n = 10\)
  • \(k = 3\)
  • \(p = 0.3\)
  • \(\pi_0 = 0.4\) the probability of being diagnosed with asthma 50 years ago

The p-value is \(Pr(X \leq 3)\)

With the help of R we obtain the p-value:

pbinom(q = 3, size = 10, prob = 0.4)
[1] 0.3822806

Type I error

We assume to have \(\alpha\) = 0.05

Draw conclusions

Since the p-value (0.38) is larger than \(\alpha = 0.05\), the \(H_0\) is not rejected.

McNemar test

For categorical data: investigates the marginal homogeneity of two dichotomous variables.

Assumptions

  • The variables should be categorical.
  • Matched data (pairs).
  • Pairs are independent of one another.
  • The levels (or categories) of the variables are mutually exclusive.

The \(\chi^2\) test checks for independence, while the McNemar test looks for consistency in results.

Theory

Scenario I

Is there a difference in the percentage of patients with asthma between the placebo and the drug group (matched data)?

Let's consider that the subjects of the two groups are matched, meaning that they have the same characteristics.

The null hypothesis then states that the probability that one subject of a pair is in the placebo group without asthma while the other is in the drug group with asthma equals the probability that one subject is in the placebo group with asthma while the other is in the drug group without asthma.

Placebo (asthma) Placebo (no asthma) Total
Drug (asthma) a b a + b
Drug (no asthma) c d c + d
Total a + c b + d n

\(H_0: p_a + p_b = p_a + p_c\) and \(p_c + p_d = p_d + p_b\)
\(H_1: p_a + p_b \neq p_a + p_c\) and \(p_c + p_d \neq p_d + p_b\)

which simplifies to

\(H_0: p_b = p_c\)
\(H_1: p_b \neq p_c\)

Connection with regression

Most statistical tests are special cases of regression models. The same holds for the McNemar test, which is a special case of conditional logistic regression with one binary covariate.

Since we have one patient in the placebo group and one in the drug group (this is fixed by design), we are conditioning on the number of asthma cases in each stratum.

Then the probability of being an asthma case is

\(\pi = \frac{Pr(Y_{s, drug}= asthma\ \&\ Y_{s, placebo} = no\ asthma)}{Pr(Y_{s, drug}= asthma\ \&\ Y_{s, placebo} = no\ asthma) + Pr(Y_{s, drug}= no \ asthma\ \&\ Y_{s, placebo} = asthma)} = \frac{p_b}{p_b+p_c}\)

where \(s\) represents the strata, i.e. the pairs.

Hypothesis testing:

\(H_0: \beta = 0\)
\(\beta = \texttt{logit}(\pi) = 0\) (recall that \(\texttt{logit}(\pi) = \texttt{log}\big( \frac{\pi}{1-\pi}\big)\)) \(\rightarrow\)
\(\beta = \texttt{log}\bigg(\frac{ \frac{p_b}{p_b+p_c} }{1 - \frac{p_b}{p_b+p_c}}\bigg) = \texttt{log} \bigg( \frac{\frac{p_b}{p_b+p_c}}{\frac{p_b+p_c-p_b}{p_b+p_c}} \bigg) = \texttt{log}\big\{\frac{p_b(p_b+p_c)}{p_c(p_b+p_c)}\big\} = \texttt{log}\big(\frac{p_b}{p_c} \big)\)


\(\rightarrow\) \(\texttt{log}\big(\frac{p_b}{p_c}\big) = 0\) \(\rightarrow\) \(\frac{p_b}{p_c} = 1\) \(\rightarrow\) \(p_b = p_c\)

Test statistic

\(X^2 = \frac{(b-c)^2}{b+c}\)

When the values in the contingency table are fairly small, a correction for continuity known as Yates' correction may be applied to the test statistic: \(X^2 = \frac{(|b-c| - 1)^2}{b+c}\)
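Both versions of the statistic depend only on the discordant cell counts \(b\) and \(c\), so they can be sketched in a few lines. A Python illustration (the document's own examples use R):

```python
def mcnemar_statistic(b, c, yates=False):
    """McNemar chi-square statistic from the discordant cell counts b and c."""
    if yates:
        # Yates' continuity correction for small counts
        return (abs(b - c) - 1) ** 2 / (b + c)
    return (b - c) ** 2 / (b + c)
```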

Degrees of freedom

df = 1

Type I error

Choose the probability of the type I error (\(\alpha\)). A common choice is 5%: under \(H_0\) there is less than a 5% chance of observing such an extreme value by chance.

Critical value

Get the critical value\(_{\alpha}\) and the p-value from the \(\chi^2\)-distribution.

Draw conclusions

Compare the test statistic with the critical value\(_{\alpha}\) or the p-value with \(\alpha\).

Application

Scenario I

Is there a difference in the percentage of patients with asthma between the placebo and the drug group (matched data)?

Hypothesis

\(H_0: p_b = p_c\)
\(H_1: p_b \neq p_c\)

Collect the data

Drug (asthma) Drug (no asthma) Sum
Placebo (asthma) 8 13 21
Placebo (no asthma) 18 11 29
Sum 26 24 50

Test statistic

\(X^2 = \frac{(b-c)^2}{b+c} = \frac{(13-18)^2}{13+18} = 0.81\)

Degrees of freedom

df = 1

Type I error

We assume to have \(\alpha\) = 0.05

Critical value

With the help of R we obtain the critical values: critical value\(_{\alpha}\):

qchisq(p = 0.05, df = 1, lower.tail = FALSE)
[1] 3.841459

Draw conclusions

We reject the H0 if: \(X^2\) > critical value\(_{\alpha}\)
In our example we have 0.81 < 3.84.
Therefore, we do not reject the \(H_0\).

With the help of R we obtain the p-value from the \(\chi^2\)-distribution:

pchisq(q = 0.81, df = 1, lower.tail = FALSE)
[1] 0.3681203
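For one degree of freedom, the \(\chi^2\) upper-tail probability equals \(P(|Z| > \sqrt{x})\) for a standard normal \(Z\), so the `pchisq` result above can be cross-checked with the complementary error function. A Python sketch:

```python
from math import erfc, sqrt

def chi2_sf_df1(x):
    # Survival function of chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x))
    return erfc(sqrt(x / 2))

# McNemar example: X^2 = 0.81 with df = 1
p_value = chi2_sf_df1(0.81)
```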