Open RStudio to do the practicals. Note that tasks with * are optional.
In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:
survival (version: 3.3.1)
R version 4.2.1 (2022-06-23 ucrt)
For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found in:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html
Calculate the mean of the variables start, stop, event, age, year and surgery of the heart data set.
Calculate the mean of the variables age, futime and risk of the retinopathy data set.
apply(heart[, c("start", "stop", "event", "age", "year", "surgery")], 2, mean)
## start stop event age year surgery
## 15.5145349 201.2936047 0.4360465 -2.4840266 3.4532894 0.1686047
apply(retinopathy[, c("age", "futime", "risk")], 2, mean)
## age futime risk
## 20.78173 35.57929 9.69797
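As a small aside on the calls above, the sketch below shows that apply() with MARGIN = 2 works column by column and that colMeans() gives the same result. The data frame toy and its values are made up for illustration and are not taken from the survival package.

```r
# Toy data frame (hypothetical values, not from the survival package)
toy <- data.frame(age = c(20, 30, 40),
                  futime = c(10, 20, 60),
                  risk = c(6, 9, 12))

# MARGIN = 2 applies the function to each column
means_apply <- apply(toy, 2, mean)

# colMeans() computes the same column means directly
means_col <- colMeans(toy)

print(means_apply)
print(all.equal(means_apply, means_col))
```

For plain means, colMeans() is usually the shorter option; apply() is more general because it accepts any function.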
Create the matrix dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)) and find the row sum of dataset1.
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
apply(dataset1, 1, sum)
## [1] 5 25 75 45 12 56 66 21 96 26 65 13 92 104 110 48 20 95 116 104 90 40 93 113
## [25] 93 36 84 33 58 86
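Because sample() draws random numbers, the row sums above will differ from run to run. A minimal sketch, assuming an arbitrary seed of 123 (not part of the original practical), shows how to make the matrix reproducible and how rowSums() matches apply() with MARGIN = 1.

```r
# Fix the random number generator so that sample() is reproducible
# (the seed value 123 is an arbitrary choice for this illustration)
set.seed(123)
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))

# MARGIN = 1 applies the function to each row
row_sums_apply <- apply(dataset1, 1, sum)

# rowSums() computes the same row sums directly
row_sums_direct <- rowSums(dataset1)

print(all(row_sums_apply == row_sums_direct))
print(length(row_sums_apply))
```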
Create the following function DerivativeFunction <- function(x) { log10(x) + 10 }. Apply the DerivativeFunction to dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)). The output should be a list.
DerivativeFunction <- function(x) { log10(x) + 10 }
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
lapply(dataset1, DerivativeFunction)
## [[1]]
## [1] 10
##
## [[2]]
## [1] 10.30103
##
## [[3]]
## [1] 10.47712
##
## [[4]]
## [1] 10.60206
##
## [[5]]
## [1] 10.69897
##
## [[6]]
## [1] 10.77815
##
## [[7]]
## [1] 10.8451
##
## [[8]]
## [1] 10.90309
##
## [[9]]
## [1] 10.95424
##
## [[10]]
## [1] 11
##
## [[11]]
## [1] 11.04139
##
## [[12]]
## [1] 11.07918
##
## [[13]]
## [1] 11.11394
##
## [[14]]
## [1] 11.14613
##
## [[15]]
## [1] 11.17609
##
## [[16]]
## [1] 11.20412
##
## [[17]]
## [1] 11.23045
##
## [[18]]
## [1] 11.25527
##
## [[19]]
## [1] 11.27875
##
## [[20]]
## [1] 11.30103
##
## [[21]]
## [1] 11.32222
##
## [[22]]
## [1] 11.34242
##
## [[23]]
## [1] 11.36173
##
## [[24]]
## [1] 11.38021
##
## [[25]]
## [1] 11.39794
##
## [[26]]
## [1] 11.41497
##
## [[27]]
## [1] 11.43136
##
## [[28]]
## [1] 11.44716
##
## [[29]]
## [1] 11.4624
##
## [[30]]
## [1] 11.47712
##
## [[31]]
## [1] 11.60206
##
## [[32]]
## [1] 11.86332
##
## [[33]]
## [1] 11.89209
##
## [[34]]
## [1] 11.91908
##
## [[35]]
## [1] 11.17609
##
## [[36]]
## [1] 11.39794
##
## [[37]]
## [1] 11.74036
##
## [[38]]
## [1] 11.57978
##
## [[39]]
## [1] 11.72428
##
## [[40]]
## [1] 11.5682
##
## [[41]]
## [1] 11.9345
##
## [[42]]
## [1] 11.98227
##
## [[43]]
## [1] 10.60206
##
## [[44]]
## [1] 11.91381
##
## [[45]]
## [1] 11.80618
##
## [[46]]
## [1] 10.77815
##
## [[47]]
## [1] 11.36173
##
## [[48]]
## [1] 11.716
##
## [[49]]
## [1] 11.62325
##
## [[50]]
## [1] 11.79934
##
## [[51]]
## [1] 11.70757
##
## [[52]]
## [1] 11.99123
##
## [[53]]
## [1] 11.51851
##
## [[54]]
## [1] 11.86923
##
## [[55]]
## [1] 12
##
## [[56]]
## [1] 11.61278
##
## [[57]]
## [1] 11.83885
##
## [[58]]
## [1] 10.47712
##
## [[59]]
## [1] 11.94939
##
## [[60]]
## [1] 11.77815
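The output above has 60 elements because lapply() treats a matrix as a plain vector and visits every cell. The sketch below, using a small made-up matrix m, contrasts this with one possible way to get one list element per column, by converting the matrix to a data frame first.

```r
DerivativeFunction <- function(x) { log10(x) + 10 }

# Small toy matrix (hypothetical values chosen so the results are easy to check)
m <- cbind(A = c(1, 10, 100), B = c(1000, 10, 1))

# lapply() on a matrix: one list element per cell
per_cell <- lapply(m, DerivativeFunction)
print(length(per_cell))

# lapply() on a data frame: one list element per column
per_column <- lapply(as.data.frame(m), DerivativeFunction)
print(length(per_column))
print(per_column$A)
```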
Create a list with the variables age and year from the heart data set and the variable risk from the retinopathy data set. Give the name list1 to this list. Calculate the median of each element of the list.
list1 <- list(heart$age, heart$year, retinopathy$risk)
lapply(list1, median)
## [[1]]
## [1] -0.1136208
##
## [[2]]
## [1] 3.750856
##
## [[3]]
## [1] 10
Create the following function Function2 <- function(x) { exp(x) + 0.1 }. Apply the Function2 to dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1)). The output should be simplified.
Function2 <- function(x) { exp(x) + 0.1 }
dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1))
sapply(dataset2, Function2)
## [1] 2.818282e+00 7.489056e+00 2.018554e+01 5.469815e+01 1.485132e+02 4.035288e+02 1.096733e+03
## [8] 2.981058e+03 8.103184e+03 2.202657e+04 5.829599e-01 1.706285e+00 1.994492e+00 3.778803e+00
## [15] 1.213039e+00 1.416347e+00 8.806216e-01 7.133495e-01 4.448679e+00 2.392794e+00
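To make the difference between a list result and a simplified result concrete, here is a minimal sketch on a made-up list called toy_list. It also shows vapply(), which behaves like sapply() but requires the expected type and length of each result to be stated.

```r
# Toy list (hypothetical values)
toy_list <- list(a = c(1, 2, 3), b = c(10, 20, 30, 40), c = c(5, 5, 7))

# lapply() always returns a list
res_list <- lapply(toy_list, median)

# sapply() simplifies the result to a named numeric vector when it can
res_simple <- sapply(toy_list, median)

# vapply() does the same but checks that each result is one numeric value
res_checked <- vapply(toy_list, median, numeric(1))

print(is.list(res_list))
print(res_simple)
print(identical(res_simple, res_checked))
```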
Create a list with the variable transplant from the heart data set and the variable status from the retinopathy data set. Give the name list2 to this list. Obtain the percentage of each category for every element of the list.
list2 <- list(heart$transplant, retinopathy$status)
library(memisc)
sapply(list2, function(x) { percent(x) } )
## [,1] [,2]
## 0 59.88372 60.6599
## 1 40.11628 39.3401
## N 172.00000 394.0000
Recall the practical Control_Flow_and_Functions: Writing your own function (Task 1 and 2). Now create the same function again (called summary_df) but avoid the use of a for loop. Apply the function to the retinopathy data set. Use the functions summary_continuous() and summary_categorical().
summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " ( ", round(sd(x), 1), ") ")
}
summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}
# Mean (SD) of a continuous variable, rounded to one decimal
summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}
# Percentage per category of a categorical variable
summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}
summary_df <- function(dat) {
  # Logical vector marking the factor (categorical) columns
  vec_categorical <- sapply(dat, is.factor)
  print(sapply(dat[, vec_categorical], summary_categorical))
  # Logical vector marking the numeric (continuous) columns
  vec_continuous <- sapply(dat, is.numeric)
  print(sapply(dat[, vec_continuous], summary_continuous))
}
summary_df(dat = retinopathy)
## laser eye type
## "50.8% xenon, 49.2% argon" "45.2% right, 54.8% left" "57.9% juvenile, 42.1% adult"
## id age trt futime status risk
## "873.2 (495.5)" "20.8 (14.8)" "0.5 (0.5)" "35.6 (21.4)" "0.4 (0.5)" "9.7 (1.5)"
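One thing to keep in mind with the solution above: when a data frame has only one categorical (or only one continuous) column, dat[, index] returns a plain vector, so sapply() would loop over single values instead of columns. The sketch below is one possible variant, not the author's solution. The name summary_df_safe and the data frame toy are hypothetical; the variant uses drop = FALSE and returns the summaries in a list instead of printing them.

```r
summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}
summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}

# Hypothetical variant of summary_df: drop = FALSE keeps a data frame
# even when only one column is selected
summary_df_safe <- function(dat) {
  is_cat <- sapply(dat, is.factor)
  is_num <- sapply(dat, is.numeric)
  list(
    categorical = sapply(dat[, is_cat, drop = FALSE], summary_categorical),
    continuous  = sapply(dat[, is_num, drop = FALSE], summary_continuous)
  )
}

# Toy data with exactly one factor and one numeric column (made-up values)
toy <- data.frame(group = factor(c("a", "a", "b", "b")),
                  value = c(1, 2, 3, 4))
res <- summary_df_safe(toy)
print(res)
```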
Calculate the median year per transplant group using the heart data set.
Calculate the median futime per status group using the retinopathy data set.
tapply(heart$year, heart$transplant, median)
## 0 1
## 3.47707 3.92334
tapply(retinopathy$futime, retinopathy$status, median)
## 0 1
## 48.53 13.83
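To illustrate what tapply() does in the calls above, the following sketch uses a small made-up data frame toy. It also shows that the same group medians can be obtained by splitting the vector by group and applying the function to each piece.

```r
# Toy data (hypothetical values)
toy <- data.frame(futime = c(10, 20, 30, 40, 50),
                  status = c(0, 0, 1, 1, 1))

# tapply(): apply median to futime within each level of status
med_tapply <- tapply(toy$futime, toy$status, median)

# The same idea written as split() followed by sapply()
med_split <- sapply(split(toy$futime, toy$status), median)

print(med_tapply)
print(med_split)
```

The group labels become the names of the result, so individual groups can be looked up by name.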
Apply the function Fun1 <- function(x) { mean(x)/(length(x) - 2) } to:
year per transplant and surgery group using the heart data set.
futime per status, type and trt group using the retinopathy data set.
Fun1 <- function(x) { mean(x)/(length(x) - 2) }
tapply(heart$year, list(heart$transplant, heart$surgery), Fun1)
## 0 1
## 0 0.03820181 0.2818764
## 1 0.06362334 0.3910915
# The call below uses mean, as in the output shown; replace mean with Fun1 to apply Fun1 here as well.
tapply(retinopathy$futime, list(retinopathy$status, retinopathy$type, retinopathy$trt), mean)
## , , 0
##
## juvenile adult
## 0 45.22127 48.42273
## 1 18.65137 19.25160
##
## , , 1
##
## juvenile adult
## 0 45.62218 47.92323
## 1 16.66944 21.32833
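When tapply() receives a list of grouping variables, it returns an array with one dimension per variable. The sketch below applies Fun1 to a made-up data frame toy with two grouping variables. The values are chosen to show what happens in small or empty cells: a group of size one gives a negative value because length(x) - 2 is -1, and a combination with no observations gives NA.

```r
Fun1 <- function(x) { mean(x)/(length(x) - 2) }

# Toy data (hypothetical values); the combination g1 = 1, g2 = 1 is empty
toy <- data.frame(y  = c(2, 4, 6, 3, 6, 9, 12, 5),
                  g1 = c(0, 0, 0, 1, 1, 1, 1, 0),
                  g2 = c(0, 0, 0, 0, 0, 0, 0, 1))

# Rows of the result correspond to g1, columns to g2
res <- tapply(toy$y, list(toy$g1, toy$g2), Fun1)
print(res)
```

Here the cell for g1 = 0 and g2 = 0 contains 4 (mean 4 divided by 3 - 2), and the cell for g1 = 1 and g2 = 0 contains 3.75 (mean 7.5 divided by 4 - 2).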
© Eleni-Rosalina Andrinopoulou