if ... else and ifelse
Let’s make this a quick and fairly basic one. R has an incredibly useful function called ifelse(). It’s essentially a vectorized version of the if … else control structure that every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else:
- It’s super fast.
- It’s more convenient to use.
The basic idea is that you have a vector of values, you test each value against some condition, and depending on the outcome you want a specific value in the corresponding position of another vector. An example follows below. First, let’s load the {rbenchmark} package to measure the speed benefits.
library(rbenchmark)
Now, the toy example: I am creating a vector of half a million random normally distributed values. For each of these values, I want to know whether the value is below or above zero.
x <- rnorm(500000)
ifelse() is used as ifelse(<TEST>, <OUTCOME IF TRUE>, <OUTCOME IF FALSE>), so we need three arguments. My test is x < 0 and I want to have the string "negative" in y whenever the corresponding value in x is smaller than zero. If this is not the case, then y should have a "positive" in this position. ifelse() only needs one line of code for this.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative", "positive")
})$user.self
## [1] 4.215
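To make the three arguments concrete, here is a minimal sketch on a tiny hand-made vector, so the result can be checked by eye. The names x_small, is_negative and y_small are made up for this illustration and are not part of the benchmark.

```r
# Small hand-made vector instead of half a million random values
x_small <- c(-1.5, 0.3, 2, -0.1)

# The test is evaluated for all elements at once and yields a logical vector
is_negative <- x_small < 0

# ifelse() takes "negative" where the test is TRUE and "positive" where it is FALSE
y_small <- ifelse(is_negative, "negative", "positive")
print(y_small)
## [1] "negative" "positive" "positive" "negative"
```

The result has the same length as the test, one label per element of the input.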
We could also solve this with a for loop. But, as you can see, this takes roughly three times as long.
benchmark(replications = 50, {
  y <- c()
  for (i in x) {
    # Append one label per element of x
    if (i < 0) {
      y[length(y) + 1] <- "negative"
    } else {
      y[length(y) + 1] <- "positive"
    }
  }
})$user.self
## [1] 13.021
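Part of the loop's cost may come from growing y one element at a time. A common variant is to reserve the result vector up front and fill it by index. The following is only a sketch of that idea on a smaller vector (x_demo and y_loop are names invented here); I have not benchmarked it, so no timing is claimed.

```r
# Smaller input than in the benchmark, just to show the pattern
x_demo <- rnorm(1000)

# Reserve the full character vector before the loop starts
y_loop <- character(length(x_demo))

# Fill each position by index instead of appending
for (k in seq_along(x_demo)) {
  if (x_demo[k] < 0) {
    y_loop[k] <- "negative"
  } else {
    y_loop[k] <- "positive"
  }
}

# The loop and the vectorized call should agree element by element
print(identical(y_loop, ifelse(x_demo < 0, "negative", "positive")))
```

Even with preallocation, the loop still needs several lines where ifelse() needs one.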
The same is true for an sapply() version. To my surprise, sapply() even consistently takes a little longer than a for loop in this case.
benchmark(replications = 50, {
  # Apply the scalar if ... else to each element of x in turn
  y <- sapply(x, USE.NAMES = FALSE, FUN = function(i) {
    if (i < 0) {
      "negative"
    } else {
      "positive"
    }
  })
})$user.self
## [1] 15.023
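If you prefer the apply style anyway, base R's vapply() is a stricter sibling of sapply() that makes you declare the type of each result. The sketch below uses a small vector and a helper I named classify_one for illustration. It is not benchmarked here, and I would not assume it is faster than ifelse().

```r
# Small input, just to show the call
x_check <- rnorm(1000)

# Scalar helper: classifies a single number
classify_one <- function(i) {
  if (i < 0) "negative" else "positive"
}

# FUN.VALUE = character(1) declares that every call returns one string
y_vapply <- vapply(x_check, classify_one,
                   FUN.VALUE = character(1), USE.NAMES = FALSE)

# Same labels as the vectorized one-liner
print(identical(y_vapply, ifelse(x_check < 0, "negative", "positive")))
```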
It’s highly unlikely that rnorm() produces a value of exactly zero. But we could also check for this by simply nesting calls to ifelse(). If you want to do this, you simply add another ifelse() in the “FALSE” part of the previous ifelse(), as I did below. In this little toy example, this nested test is still considerably faster than the for or sapply() versions of the single test.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative",
              ifelse(x > 0, "positive", "exactly zero"))
})$user.self
## [1] 8.381
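Since rnorm() will practically never hit zero, the third branch is easier to see on a hand-made vector that contains one. This is a small sketch with made-up names (x_zero, y_zero):

```r
# Hand-made vector that includes an exact zero
x_zero <- c(-2, 0, 3.5)

# The inner ifelse() only decides the elements where x_zero < 0 is FALSE
y_zero <- ifelse(x_zero < 0, "negative",
                 ifelse(x_zero > 0, "positive", "exactly zero"))
print(y_zero)
## [1] "negative"     "exactly zero" "positive"
```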