Thursday, September 7, 2017

Learn R : Grouping Functions sapply vs lapply vs vapply


    Applying a function to every element of a vector, list, or matrix is very simple in R. In this post I am going to cover some of the basic grouping functions. R has many of them, and it can be confusing to decide which one to use. As with any other function, you can call help() (for example, help(lapply)) to get more information about them at any time.

lapply function

lapply(X, FUN, ...)
 
    Let's say we have a vector, and we need to pass each item in it to a function. Rather than writing a loop that passes each item to the function, you can use one of the apply functions in R. To demo the grouping functions, let's say we have the temperature records of a city. Each temperature is in degrees Fahrenheit, and I need each of them in degrees Celsius too. First I am going to do this the classic way, with a loop.

tempsInf <- c(68, 73, 70, 61, 59, 66, 79)
tempsInc <- c()

Tocelsius <- function(temp,asInt=F){
 if (asInt){
        round(((temp-32) * (5/9)),0)
   } else { 
        (temp-32) * (5/9) 
   }
}

for (temp in tempsInf){
tempsInc <- append(tempsInc, Tocelsius(temp, TRUE))
}

print(tempsInc)

[1] 20 23 21 16 15 19 26
> 

  •    tempsInf contains the temperatures in Fahrenheit.
  •    tempsInc contains the temperatures in Celsius.
  •    We have a function named Tocelsius which takes two arguments. temp is the Fahrenheit value we want to convert, and asInt is an optional parameter whose default value is FALSE. If you want the returned value rounded to a whole number, pass TRUE.
  •    We have a simple for loop which takes each value in the vector and calls the Tocelsius function to get the Celsius value.
  •    append() is a function which adds each Celsius value to the tempsInc vector.

  Now, let's try the grouping function lapply to make this code a little bit better.


tempsInf <- c(68, 73, 70, 61, 59, 66, 79)
tempsInc <- c()

Tocelsius <- function(temp,asInt=F){
 if (asInt){
        round(((temp-32) * (5/9)),0)
   } else { 
        (temp-32) * (5/9) 
   }
}

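# Call Tocelsius once per temperature; lapply always returns a list
tempsInc <- lapply(tempsInf, Tocelsius, asInt = TRUE)
# Flatten the list back into a plain numeric vector
tempsInc <- unlist(tempsInc)

print(tempsInc)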

[1] 20 23 21 16 15 19 26
> 

  
      Now what happened here? I passed my object, the vector of temperatures, to lapply, together with the function I want to run, Tocelsius. Any extra arguments (here asInt) are passed on to that function. lapply then calls the function once for each item and gives me back the degrees in Celsius. The returned numbers are right, but lapply always returns a list rather than a vector. That's why I used the unlist() function to convert the list to a vector.
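
      To make the list-versus-vector point concrete, here is a minimal sketch. The names tempsInF, toC, asList and asVector are made up for this illustration and are not part of the example above.

```r
# Hypothetical names, for illustration only
tempsInF <- c(68, 73, 70)
toC <- function(temp) (temp - 32) * (5 / 9)

asList <- lapply(tempsInF, toC)   # one list element per input value
print(is.list(asList))            # TRUE

asVector <- unlist(asList)        # flatten the list into a numeric vector
print(is.list(asVector))          # FALSE
print(is.numeric(asVector))       # TRUE
```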

sapply function

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
 
     The lapply function is pretty useful, but wait, it gets better. Now let's look at the sapply function. sapply is a wrapper for lapply, and it can do more than lapply. I am going to use the same example, but this time I am going to store the Fahrenheit temperatures as strings to demo the sapply functionality.

 
tempsInf <- c('68', '73', '70', '61', '59', '66', '79')
tempsInc <- c()
Tocelsius <- function(temp,asInt=F){
temp = as.numeric(temp)
 if (asInt){
        round((temp-32) * (5/9),0)
   } else { 
        (temp-32) * (5/9) 
   }
}
sapply(tempsInf, Tocelsius, asInt=T)

68 73 70 61 59 66 79 
20 23 21 16 15 19 26 
> 

     When you run this script, the first thing you will notice is that sapply does not return a list. It tries to simplify the result to a vector (or matrix) when appropriate, which is why I don't have to use the unlist() function. Second, since our source values are strings, sapply uses them as the names of the result. You can turn that off by passing USE.NAMES = FALSE.
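
     As a small sketch of the USE.NAMES behaviour described above (the names tempsInF, toC, withNames and withoutNames are hypothetical, chosen only for this illustration):

```r
# Hypothetical names, for illustration only
tempsInF <- c('68', '73', '70')
toC <- function(temp) round((as.numeric(temp) - 32) * (5 / 9), 0)

withNames <- sapply(tempsInF, toC)                        # default: USE.NAMES = TRUE
withoutNames <- sapply(tempsInF, toC, USE.NAMES = FALSE)  # drop the names

print(names(withNames))     # "68" "73" "70"
print(names(withoutNames))  # NULL
```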

vapply function

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
 
  The advantage of vapply is that you control what it returns. FUN.VALUE is a template: vapply checks that every value returned by the function has the same type and length as that template, and stops with an error otherwise.
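
  One place where this control shows up, as far as I can tell, is empty input. The sketch below assumes a hypothetical helper toC and an empty vector noTemps. sapply has nothing to simplify and hands back an empty list, while vapply still returns the type promised by FUN.VALUE.

```r
# Hypothetical names, for illustration only
toC <- function(temp) (temp - 32) * (5 / 9)
noTemps <- numeric(0)               # no temperatures recorded

fromSapply <- sapply(noTemps, toC)
fromVapply <- vapply(noTemps, toC, FUN.VALUE = numeric(1))

print(is.list(fromSapply))      # TRUE: an empty list
print(is.numeric(fromVapply))   # TRUE: an empty numeric vector
```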
 
tempsInf <- c('68', '73', '70', '61', '59', '66', '79')
tempsInc <- c()
Tocelsius <- function(temp,asInt=F){
temp = as.numeric(temp)
 if (asInt){
        round((temp-32) * (5/9),0)
   } else { 
        (temp-32) * (5/9) 
   }
}
 vapply(tempsInf, FUN = Tocelsius, asInt=T,  
   FUN.VALUE = as.numeric(2))
 

68 73 70 61 59 66 79 
20 23 21 16 15 19 26 
> 

     As you can see, everything is the same as with sapply. The only difference is that we pass FUN.VALUE = as.numeric(2) to vapply. The value 2 itself is not used; only the type and length of the template matter, so this says that each call must return a single numeric value (numeric(1) is the more common way to write it). If we change this to as.character(2), you will receive the following error.

Error : values must be type 'character',
 but FUN(X[[1]]) result is type 'double'
> 
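
     If you want to see this check without stopping your script, one option is to wrap the call in tryCatch. This is only a sketch: the names tempsInF, toC and result are hypothetical, and character(1) is used as the template instead of as.character(2).

```r
# Hypothetical names, for illustration only
tempsInF <- c('68', '73', '70')
toC <- function(temp) round((as.numeric(temp) - 32) * (5 / 9), 0)

# toC returns a double, so a character template makes vapply fail;
# tryCatch turns the error into its message text
result <- tryCatch(
  vapply(tempsInF, toC, FUN.VALUE = character(1)),
  error = function(e) conditionMessage(e)
)
print(result)
```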
