The Apply Family

Preface

Open RStudio to do the practicals. Tasks marked with * are optional.

R packages

This practical uses the following R packages (with the versions used to generate the solutions):

survival (version: 3.3.1)

R version 4.2.1 (2022-06-23 ucrt)

Datasets

For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found at:

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html

The Apply Family

apply

Task 1

Obtain the mean of the columns start, stop, event, age, year, surgery of the heart data set.
Obtain the mean of the columns age, futime, risk of the retinopathy data set.

Solution 1

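# heart and retinopathy ship with the survival package
library(survival)

# MARGIN = 2 applies mean() to each selected column
apply(heart[, c("start", "stop", "event", "age", "year", "surgery")], 2, mean)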

##       start        stop       event         age        year     surgery 
##  15.5145349 201.2936047   0.4360465  -2.4840266   3.4532894   0.1686047

apply(retinopathy[, c("age", "futime", "risk")], 2, mean)

##      age   futime     risk 
## 20.78173 35.57929  9.69797
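
The MARGIN argument decides the direction: 1 applies the function over rows, 2 over columns. A minimal sketch on a small made-up matrix (the name m and its values are illustrative, not part of the practical) suggests that apply(m, 2, mean) agrees with the built-in colMeans():

```r
# Toy matrix for illustration only (not one of the practical's data sets)
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# MARGIN = 2: apply mean() to each column
col_means <- apply(m, 2, mean)
print(col_means)

# colMeans() is a specialised shortcut for the same computation
print(all.equal(col_means, colMeans(m)))
```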

Task 2*

Create the matrix dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)) and find the row sum of dataset1.

Solution 2*

dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
apply(dataset1, 1, sum)

##  [1]   5  25  75  45  12  56  66  21  96  26  65  13  92 104 110  48  20  95 116 104  90  40  93 113
## [25]  93  36  84  33  58  86
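
Because sample() draws random numbers, the values of column B, and therefore the row sums, change on every run unless a seed is set. The sketch below assumes an arbitrary seed value (123 is not taken from the practical) and also checks the result against rowSums():

```r
# Fixing the seed makes sample() return the same values on every run
set.seed(123)  # the seed value is arbitrary
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))

# MARGIN = 1: apply sum() to each row
row_sums <- apply(dataset1, 1, sum)

# rowSums() gives the same totals
print(all(row_sums == rowSums(dataset1)))
```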

lapply

Task 1

Create the following function DerivativeFunction <- function(x) { log10(x) + 10 }. Apply the DerivativeFunction to dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)). The output should be a list.

Solution 1

DerivativeFunction <- function(x) { log10(x) + 10 }
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
lapply(dataset1, DerivativeFunction)

## [[1]]
## [1] 10
## 
## [[2]]
## [1] 10.30103
## 
## [[3]]
## [1] 10.47712
## 
## [[4]]
## [1] 10.60206
## 
## [[5]]
## [1] 10.69897
## 
## [[6]]
## [1] 10.77815
## 
## [[7]]
## [1] 10.8451
## 
## [[8]]
## [1] 10.90309
## 
## [[9]]
## [1] 10.95424
## 
## [[10]]
## [1] 11
## 
## [[11]]
## [1] 11.04139
## 
## [[12]]
## [1] 11.07918
## 
## [[13]]
## [1] 11.11394
## 
## [[14]]
## [1] 11.14613
## 
## [[15]]
## [1] 11.17609
## 
## [[16]]
## [1] 11.20412
## 
## [[17]]
## [1] 11.23045
## 
## [[18]]
## [1] 11.25527
## 
## [[19]]
## [1] 11.27875
## 
## [[20]]
## [1] 11.30103
## 
## [[21]]
## [1] 11.32222
## 
## [[22]]
## [1] 11.34242
## 
## [[23]]
## [1] 11.36173
## 
## [[24]]
## [1] 11.38021
## 
## [[25]]
## [1] 11.39794
## 
## [[26]]
## [1] 11.41497
## 
## [[27]]
## [1] 11.43136
## 
## [[28]]
## [1] 11.44716
## 
## [[29]]
## [1] 11.4624
## 
## [[30]]
## [1] 11.47712
## 
## [[31]]
## [1] 11.60206
## 
## [[32]]
## [1] 11.86332
## 
## [[33]]
## [1] 11.89209
## 
## [[34]]
## [1] 11.91908
## 
## [[35]]
## [1] 11.17609
## 
## [[36]]
## [1] 11.39794
## 
## [[37]]
## [1] 11.74036
## 
## [[38]]
## [1] 11.57978
## 
## [[39]]
## [1] 11.72428
## 
## [[40]]
## [1] 11.5682
## 
## [[41]]
## [1] 11.9345
## 
## [[42]]
## [1] 11.98227
## 
## [[43]]
## [1] 10.60206
## 
## [[44]]
## [1] 11.91381
## 
## [[45]]
## [1] 11.80618
## 
## [[46]]
## [1] 10.77815
## 
## [[47]]
## [1] 11.36173
## 
## [[48]]
## [1] 11.716
## 
## [[49]]
## [1] 11.62325
## 
## [[50]]
## [1] 11.79934
## 
## [[51]]
## [1] 11.70757
## 
## [[52]]
## [1] 11.99123
## 
## [[53]]
## [1] 11.51851
## 
## [[54]]
## [1] 11.86923
## 
## [[55]]
## [1] 12
## 
## [[56]]
## [1] 11.61278
## 
## [[57]]
## [1] 11.83885
## 
## [[58]]
## [1] 10.47712
## 
## [[59]]
## [1] 11.94939
## 
## [[60]]
## [1] 11.77815
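
The list above has 60 elements because a matrix is a vector with dimensions, so lapply() visits every cell rather than every column. If one result per column is wanted instead, one option is to convert the matrix to a data frame first. A minimal sketch, using a small made-up matrix (small is a hypothetical name):

```r
DerivativeFunction <- function(x) { log10(x) + 10 }

# Small deterministic matrix for illustration (values are made up)
small <- cbind(A = c(1, 10), B = c(100, 1000))

# lapply() on the matrix returns one list element per cell
by_cell <- lapply(small, DerivativeFunction)
print(length(by_cell))

# Converting to a data frame makes lapply() visit each column instead
by_column <- lapply(as.data.frame(small), DerivativeFunction)
str(by_column)
```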

Task 2*

Create a list that consists of the variables age and year from the heart data set and the variable risk from the retinopathy data set. Give the name list1 to this list.
Obtain the median of each element of the list. The output should be a list.

Solution 2*

list1 <- list(heart$age, heart$year, retinopathy$risk)
lapply(list1, median)

## [[1]]
## [1] -0.1136208
## 
## [[2]]
## [1] 3.750856
## 
## [[3]]
## [1] 10
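
The elements of the output are labelled [[1]], [[2]], [[3]] because the input list has no names. Naming the list elements carries the names through to the result. A small sketch with made-up vectors (list_named and its values are hypothetical stand-ins for the data set columns):

```r
# Hypothetical stand-in vectors; in the practical these would be
# heart$age, heart$year and retinopathy$risk
list_named <- list(age = c(3, 1, 2), year = c(10, 30, 20, 40), risk = c(5, 5, 7))

# lapply() keeps the names of the input list
medians <- lapply(list_named, median)
print(names(medians))

# Elements can now be accessed by name
print(medians$year)
```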

sapply

Task 1

Create the following function Function2 <- function(x) { exp(x) + 0.1 }. Apply the Function2 to dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1)). The output should be simplified.

Solution 1

Function2 <- function(x) { exp(x) + 0.1 }
dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1))
sapply(dataset2, Function2)

##  [1] 2.818282e+00 7.489056e+00 2.018554e+01 5.469815e+01 1.485132e+02 4.035288e+02 1.096733e+03
##  [8] 2.981058e+03 8.103184e+03 2.202657e+04 5.829599e-01 1.706285e+00 1.994492e+00 3.778803e+00
## [15] 1.213039e+00 1.416347e+00 8.806216e-01 7.133495e-01 4.448679e+00 2.392794e+00
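
sapply() on a matrix visits each cell and returns a plain vector, which is why the output above has 20 values and no column labels. Since Function2 only uses vectorised operations, calling it directly on the matrix is one way to keep the matrix shape. A sketch on a small made-up matrix (small2 is a hypothetical name):

```r
Function2 <- function(x) { exp(x) + 0.1 }

# Small deterministic matrix for illustration (values are made up)
small2 <- cbind(A = c(0, 1), B = c(0, 2))

# sapply() flattens the matrix into a vector of length 4
flat <- sapply(small2, Function2)

# A direct call keeps the dimensions and column names
shaped <- Function2(small2)

print(dim(shaped))
print(all.equal(flat, as.vector(shaped)))
```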

Task 2

Create a list that consists of the variable transplant from the heart data set and the variable status from the retinopathy data set. Give the name list2 to this list.
Obtain the percentage of 0 cases in each element of the list as simplified output (a vector).
Obtain the percentage of 1 cases in each element of the list as simplified output (a vector).

Solution 2

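list2 <- list(heart$transplant, retinopathy$status)

# percent() comes from the memisc package
library(memisc)

# Row "0" holds the percentage of 0 cases; row "N" is the number of observations.
# Indexing inside the function, e.g. percent(x)["0"], would return only that row as a vector.
sapply(list2, function(x) { percent(x) } )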

##        [,1]     [,2]
## 0  59.88372  60.6599
## 1  40.11628  39.3401
## N 172.00000 394.0000

# The same call also answers the second question: row "1" holds the percentage of 1 cases
sapply(list2, function(x) { percent(x) } )

##        [,1]     [,2]
## 0  59.88372  60.6599
## 1  40.11628  39.3401
## N 172.00000 394.0000
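
The task asks for a vector, whereas sapply() over percent() returns a matrix because each call yields several values. One possible base R alternative, sketched on made-up 0/1 vectors (list_toy is a hypothetical stand-in for list2), computes each percentage directly so that sapply() can simplify to a vector:

```r
# Made-up 0/1 vectors standing in for heart$transplant and retinopathy$status
list_toy <- list(c(0, 0, 1, 1), c(0, 1, 1, 1, 1))

# mean() of a logical vector is the proportion of TRUE values
pct_zero <- sapply(list_toy, function(x) mean(x == 0) * 100)
pct_one  <- sapply(list_toy, function(x) mean(x == 1) * 100)

print(pct_zero)
print(pct_one)
```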

Task 3*

Recall the practical Control_Flow_and_Functions: Writing your own function (Tasks 1 and 2). Re-create the same function (called summary_df), this time without a for loop, and apply it to the retinopathy data set.

Use the functions summary_continuous() and summary_categorical():

summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}

summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}

Solution 3*

summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}

summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}


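summary_df <- function(dat) {
  # Logical vector marking the factor columns
  vec_categorical <- sapply(dat, is.factor)
  # drop = FALSE keeps a data frame even if only one column is selected
  print(sapply(dat[, vec_categorical, drop = FALSE], summary_categorical))
  # Logical vector marking the numeric columns
  vec_continuous <- sapply(dat, is.numeric)
  print(sapply(dat[, vec_continuous, drop = FALSE], summary_continuous))
}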

summary_df(dat = retinopathy)

##                         laser                           eye                          type 
##    "50.8% xenon, 49.2% argon"     "45.2% right, 54.8% left" "57.9% juvenile, 42.1% adult" 
##              id             age             trt          futime          status            risk 
## "873.2 (495.5)"   "20.8 (14.8)"     "0.5 (0.5)"   "35.6 (21.4)"     "0.4 (0.5)"     "9.7 (1.5)"
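
sapply() chooses its return type from whatever the function happens to return. When every result is expected to be a single string, as with summary_continuous(), vapply() can state that expectation explicitly and stop with an error otherwise. A minimal sketch on a made-up data frame (toy and its columns are hypothetical):

```r
summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}

# Made-up data frame for illustration
toy <- data.frame(height = c(1.5, 1.7, 1.9), weight = c(60, 70, 80))

# FUN.VALUE declares that each result must be one character string
out <- vapply(toy, summary_continuous, FUN.VALUE = character(1))
print(out)
```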

tapply

Task 1

Obtain the median year per transplant group using the heart data set.
Obtain the median futime per status group using the retinopathy data set.

Solution 1

tapply(heart$year, heart$transplant, median)

##       0       1 
## 3.47707 3.92334

tapply(retinopathy$futime, retinopathy$status, median)

##     0     1 
## 48.53 13.83
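
tapply() splits its first argument by the grouping vector given as the second argument and applies the function to each piece. A small sketch with made-up values (values and group are hypothetical names) makes the grouping easy to check by hand:

```r
# Made-up measurements and their group labels
values <- c(10, 20, 30, 40, 50)
group  <- c("a", "a", "b", "b", "b")

# One median per group; the result is named by group label
med <- tapply(values, group, median)
print(med)
```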

Task 2

Apply the function Fun1 <- function(x) { mean(x)/(length(x) - 2) } to year per transplant and surgery group using the heart data set.
Obtain the mean futime per status, type and trt group using the retinopathy data set.

Solution 2

Fun1 <- function(x) { mean(x)/(length(x) - 2) }
tapply(heart$year, list(heart$transplant, heart$surgery), Fun1)

##            0         1
## 0 0.03820181 0.2818764
## 1 0.06362334 0.3910915

tapply(retinopathy$futime, list(retinopathy$status, retinopathy$type, retinopathy$trt), mean)

## , , 0
## 
##   juvenile    adult
## 0 45.22127 48.42273
## 1 18.65137 19.25160
## 
## , , 1
## 
##   juvenile    adult
## 0 45.62218 47.92323
## 1 16.66944 21.32833
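
With a list of grouping variables, tapply() returns an array with one dimension per variable, and single cells can be picked out by their labels. A minimal sketch on made-up data (values, g1 and g2 are hypothetical names):

```r
# Made-up values with two grouping variables
values <- c(1, 2, 3, 4, 5, 6)
g1 <- c("x", "x", "y", "y", "x", "y")
g2 <- c("p", "q", "p", "q", "p", "q")

# Rows follow g1, columns follow g2
tab <- tapply(values, list(g1, g2), mean)
print(tab)

# Mean of the values where g1 == "x" and g2 == "p"
print(tab["x", "p"])
```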