Open RStudio to do the practicals. Note that tasks with * are optional.
In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:
survival (version: 3.3.1)
memisc (version: 0.99.30.7)
ggplot2 (version: 3.3.6)
R version 4.2.1 (2022-06-23 ucrt)
For this practical, we will use the heart and
retinopathy data sets from the survival
package. More details about the data sets can be found in:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html
Before starting with any statistical analysis it is important to transform and explore your data set.
Using the heart data set:
The variable age is equal to age - 48. Let’s bring age back to the normal scale. Do not overwrite the variable age, but create a new variable with the name age_orig.
Convert the variable surgery into a factor with levels 0: no and 1: yes.
Use the function factor(…) to convert a numeric variable to a factor.
heart$age_orig <- heart$age + 48
heart$surgery <- factor(heart$surgery, levels = c(0, 1), labels = c("no", "yes"))

Categorize the variable age from the retinopathy data set as young: [minimum age until mean age) and old: [mean age until maximum age]. Give this variable the name ageCat. Print the first 6 rows of the data set retinopathy.
To dichotomize a numeric variable combine the function as.numeric(…) with a logical condition (e.g., as.numeric(X > 2)). This logical condition will split the numeric variable into two parts (young and old). Use the function factor(…) to convert a variable into a factor.
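As a minimal sketch of the hint above, the following uses a small made-up vector (the name age and its values are assumptions, not the retinopathy data). It shows the two-step approach and, for comparison, a one-step alternative with ifelse(…).

```r
# Toy ages (made-up values, not the retinopathy data)
age <- c(5, 12, 16, 23, 31, 48)

# Two-step approach: logical condition -> 0/1 -> labelled factor
age_cat <- as.numeric(age >= mean(age))
age_cat <- factor(age_cat, levels = c(0, 1), labels = c("young", "old"))

# One-step alternative with ifelse(); levels fix the order young, old
age_cat2 <- factor(ifelse(age >= mean(age), "old", "young"),
                   levels = c("young", "old"))

table(age_cat)
all(age_cat == age_cat2)  # both approaches give the same categories
```

Setting the levels explicitly keeps young as the first (reference) level instead of relying on alphabetical order.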
retinopathy$ageCat <- as.numeric(retinopathy$age >= mean(retinopathy$age))
retinopathy$ageCat <- factor(retinopathy$ageCat, levels = c(0, 1), labels = c("young", "old"))
head(retinopathy)

Categorize futime from data set retinopathy as follows:
short: [minimum futime until 25).
medium: [25 until 45).
long: [45 until maximum futime].
Give this variable the name futimeCut. Print the first 6 rows of the data.
Create a variable that is identical to the futime variable (use the name futimeCut). Then use indexing (e.g., X[X < 25]) to select the correct subset of the new variable futimeCut and set it to the new category (e.g., “short”).
For example, you can create the short category as:
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
Now continue with the other categories.
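The same three-way split can also be sketched with cut(…), shown here on made-up follow-up times (the vector futime and the name futime_cut are assumptions for illustration). This is an alternative to the indexing approach, not the solution used in this practical. Note that the indexing approach produces a character vector, whereas cut(…) returns a factor.

```r
# Toy follow-up times (made-up values, not the retinopathy data)
futime <- c(3, 24.9, 25, 38, 45, 70)

# right = FALSE gives left-closed intervals [a, b);
# -Inf and Inf as outer breaks cover the minimum and the maximum
futime_cut <- cut(futime,
                  breaks = c(-Inf, 25, 45, Inf),
                  labels = c("short", "medium", "long"),
                  right = FALSE)

table(futime_cut)
```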
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
retinopathy$futimeCut[retinopathy$futime >= 25 & retinopathy$futime < 45] <- "medium"
retinopathy$futimeCut[retinopathy$futime >= 45] <- "long"
head(retinopathy)

Create 2 vectors of size 50 as follows:
Sex: takes 2 values 0 and 1.
Age: takes values from 20 till 80.
Convert the Sex variable into a factor with levels 0: female and 1: male.
Create the variable AgeCat as dichotomous with Age <= 50 to be 0 and 1 otherwise.
Convert the AgeCat variable into a factor with levels 0: young and 1: old.
Standardize the Age variable by \(\frac{Age - mean(Age)}{sd(Age)}\).
To sample a numeric and categorical variable use the function sample(…). To convert a numeric variable to a categorical use the function factor(…). To dichotomize a numeric variable use the function as.numeric(…).
Sex <- sample(0:1, 50, replace = TRUE)
Age <- sample(20:80, 50, replace = TRUE)
Sex <- factor(Sex, levels = 0:1, labels = c("female", "male"))
AgeCat <- as.numeric(Age > 50)
AgeCat <- factor(AgeCat, levels = 0:1, labels = c("young", "old"))
Age <- (Age - mean(Age)) / sd(Age)

Create a data frame with the name DF as follows:
Include the variables Sex, Age, AgeCat from the previous Task.
Name the variables Gender, StandardizedAge, DichotomousAge.
DF <- data.frame(Sex, Age, AgeCat)
DF <- data.frame("Gender" = Sex, "StandardizedAge" = Age, "DichotomousAge" = AgeCat)

Create 2 vectors of size 150 as follows:
Treatment: takes 2 values 1 and 2.
Weight: takes values from 50 till 100.
Convert the Treatment variable into a factor with levels 1: no and 2: yes.
Transform the Weight variable by Weight * 1000.
Create a data frame with Treatment and Weight.
To sample a numeric and categorical variable use the function sample(…). To convert a numeric variable to a categorical use the function factor(…).
Treatment <- sample(1:2, 150, replace = TRUE)
Weight <- sample(50:100, 150, replace = TRUE)
Treatment <- factor(Treatment, levels = 1:2, labels = c("no", "yes"))
Weight <- Weight * 1000
data.frame(Treatment, Weight)

Create a list called my_list with the following:
let: a to i.
sex: factor taking the values males and females and length 50.
mat: matrix
| 1 | 2 |
| 3 | 4 |
To obtain letters use the built-in constant letters (e.g., letters[1:9]). To sample a numeric and categorical variable use the function sample(…). To convert a numeric variable to a categorical use the function factor(…).
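Once a list exists, its elements can be retrieved by name. The sketch below uses a small hypothetical list (demo_list and its contents are assumptions for illustration, not my_list from the task) to show the two common access styles.

```r
# A small list with hypothetical contents
demo_list <- list(let = letters[1:3],
                  mat = matrix(1:4, 2, 2, byrow = TRUE))

names(demo_list)          # element names
demo_list$let             # access by name with $
demo_list[["mat"]]        # access by name with [[ ]]
demo_list[["mat"]][2, 1]  # row 2, column 1 of the matrix
```

With byrow = TRUE the matrix is filled row by row, which matches the layout shown in the task.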
let <- letters[1:9]
sex <- sample(1:2, 50, replace = TRUE)
sex <- factor(sex, levels = 1:2, labels = c("males", "females"))
mat <- matrix(1:4, 2, 2, byrow = TRUE)
my_list <- list(let = let, sex = sex, mat = mat)

Let’s obtain some descriptive statistics.
Obtain the mean and standard deviation for the variable
age using the heart data set.
Use the functions mean(…) and sd(…).
mean(heart$age)
## [1] -2.484027
sd(heart$age)
## [1] 9.419999
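The mean is negative because age in the heart data set is stored as age - 48. Adding a constant shifts the mean by that constant and leaves the standard deviation unchanged, so on the original scale the mean would be -2.484027 + 48 = 45.515973. A minimal sketch with made-up values (age_shifted and age_orig are hypothetical vectors, not the heart data):

```r
# Toy values on the shifted scale (made-up, not the heart data)
age_shifted <- c(-10, -4, 0, 3, 11)
age_orig <- age_shifted + 48

mean(age_orig) - mean(age_shifted)  # the mean moves by the constant
sd(age_orig) - sd(age_shifted)      # the spread is unchanged
```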
Using the retinopathy data set:
Obtain the median and the interquartile range of age.
Obtain the percentages of type.
Check whether there are missing values in age.
Use the functions median(…) and IQR(…) to obtain the median and the interquartile range. Load the package memisc and use the function percent(…) in order to obtain the percentages. To check whether there are missing values combine the functions sum(…) and is.na(…).
median(retinopathy$age)
## [1] 16
IQR(retinopathy$age)
## [1] 20
library(memisc)
## Loading required package: lattice
## Loading required package: MASS
##
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
##
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
##
##     as.array
percent(retinopathy$type)
##  juvenile     adult         N
##  57.86802  42.13198 394.00000
sum(is.na(retinopathy$age)) # any(is.na(retinopathy$age))
## [1] 0
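If memisc is not available, percentages can also be sketched in base R with table(…) and prop.table(…). The example below uses a made-up factor with one missing value (the vector type here is an assumption, not the retinopathy column) and is an alternative to percent(…), not the approach used in the solution above.

```r
# Toy factor with one missing value (made-up data)
type <- factor(c("juvenile", "adult", "juvenile", NA, "juvenile"))

counts <- table(type)            # NA is dropped by default
pct <- 100 * prop.table(counts)  # percentages of the non-missing values
pct

sum(is.na(type))                 # number of missing values
any(is.na(type))                 # TRUE if at least one value is missing
```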
Using the data frame DF from the exercise before (Task 5):
Obtain the mean of StandardizedAge.
Obtain the standard deviation of StandardizedAge.
Obtain the frequencies of Gender.
Obtain the frequencies of DichotomousAge.
Obtain the frequencies of Gender and DichotomousAge (crosstab table).
To calculate the frequencies, use the functions length(…) or table(…). To obtain the dimensions use the function dim(…).
mean(DF$StandardizedAge)
## [1] 1.891608e-16
sd(DF$StandardizedAge)
## [1] 1
length(DF$Gender[DF$Gender == "female"])
## [1] 22
length(DF$Gender[DF$Gender == "male"])
## [1] 28
table(DF$Gender)
##
## female   male
##     22     28
table(DF$Gender, DF$DichotomousAge)
##
##          young old
##   female    15   7
##   male      16  12
dim(DF)
## [1] 50  3
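A crosstab can be checked against the dimensions of the data: the cell counts should add up to the number of rows. The sketch below uses a small made-up data frame (toy and its columns are hypothetical, not DF) and addmargins(…) to append row and column totals.

```r
# Toy data frame (made-up values)
toy <- data.frame(
  Gender = factor(c("female", "male", "male", "female", "male")),
  AgeGroup = factor(c("young", "old", "young", "young", "old"),
                    levels = c("young", "old"))
)

tab <- table(toy$Gender, toy$AgeGroup)
addmargins(tab)        # adds row and column totals

sum(tab) == nrow(toy)  # cell counts add up to the number of rows
```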
Obtain the Pearson and Spearman correlation of the variables
year and age of the heart data
set.
To calculate the correlations, use the function cor(…) and check the argument method.
cor(heart$year, heart$age, method = "pearson")
## [1] -0.1623965
cor(heart$year, heart$age, method = "spearman")
## [1] -0.1770664
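To illustrate how the two methods differ, the sketch below uses made-up data with a monotone but non-linear relationship (x, y and the result names are assumptions for illustration). Spearman's correlation is the Pearson correlation computed on the ranks, so it measures monotone rather than linear association.

```r
# Toy data (made-up values): y increases with x, but not linearly
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman is the Pearson correlation of the ranks
on_ranks <- cor(rank(x), rank(y), method = "pearson")

c(pearson = pearson, spearman = spearman, on_ranks = on_ranks)
```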
Let’s visualize the data.
Using the heart data set:
Create a scatterplot of age and year.
Set the labels Age for the x-axis and Year of acceptance for the y-axis.
Use the function plot(…, xlab, ylab, col). Use the function legend(…) to add a legend to the plot.
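When points are coloured by a factor, it helps to build the legend from the same factor levels and colour vector so that labels and colours cannot drift apart. A minimal sketch on made-up data follows; x, y, grp, cols and the seed are all assumptions for illustration.

```r
# Toy data (made-up values)
set.seed(1)  # arbitrary seed, only for repeatability
x <- rnorm(30)
y <- rnorm(30)
grp <- factor(sample(c("no", "yes"), 30, replace = TRUE),
              levels = c("no", "yes"))

# One colour per factor level; as.integer(grp) is 1 for "no", 2 for "yes"
cols <- c("black", "red")
plot(x, y, xlab = "x", ylab = "y", col = cols[as.integer(grp)])

# Build the legend from the factor levels so labels and colours stay aligned
legend("topright", legend = levels(grp), col = cols, pch = 1)
```

Using a keyword position such as "topright" avoids having to pick legend coordinates by hand.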
plot(heart$age, heart$year)
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance")
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance", col = heart$transplant)
legend(-40, 6, c("no", "yes"), col = c("black", "red"), pch = 1)

Using the retinopathy data set:
Create a boxplot of age per status.
Use the function boxplot(…).
boxplot(retinopathy$age ~ retinopathy$status)
boxplot(retinopathy$age ~ retinopathy$status, col = c("blue", "green"))

Using the retinopathy data set:
Create a smoothed curve of age with risk.
Create a density plot of age per type group.
Use the ggplot2 package and the functions: geom_smooth(…) and geom_density(…).
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:memisc':
##
##     syms
ggplot(retinopathy, aes(age, risk)) +
  geom_smooth(colour = 'black', span = 0.4)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(retinopathy, aes(age, fill = type)) +
  geom_density(alpha = 0.25)

© Eleni-Rosalina Andrinopoulou