Should we call Cpp from R? Implementing and benchmarking a simple gradient descent procedure with Rcpp

In this post, we’ll implement a simple gradient descent function in c++ directly inside R, employing the package Rcpp. This post intends to show how sometimes c++ can be used to make code faster in some circumstances, so we encourage people to code more in this language.

So let’s get down to brass tacks. Gradient descent is an algorithm for optimization of functions with low peaks. This procedures tries to get to the local minimum of a function by iterating through the the proportional gradient of this function (if you don’t get it, try Calculus 101, or read it on Wikipedia).

Suppose a common logistic regression application. Let’s simulate some binary data:

n <- 1250000
x <- rnorm(n)
y <- 1/(1 + exp(-1.3 - 1.6 * x - rnorm(n)))
y <- rbinom(n,1,y)

For the purpose of generating estimates for this logit regression, we need a cross-entropy function: \[D(S, L) = - \sum_{i=1}^n L_i * log(S_i)\], such that \[S_i = logit(ax_i +b)\]

In the context of logistic regression: \[\begin{equation} D(S,L) = \\ -y log(S_i) - (1 - y) log(1 - S_i) \end{equation}\]
Now we need to minimize the average cross-entropy function: \[L = \frac{1}{N} D(S,L)\]

We need the gradient for this function. We take for granted that \[\frac{\partial L}{\partial b} = \frac{1}{2N} \sum_{i=1}^{n} (S_i - y_i)\] and

\[\frac{\partial L}{\partial a} = \frac{1}{2N} \sum_{i=1}^{n} (S_i - y_i)x_i\]

Implementing a Gradient Descent Function in R

In order to compare, we’re implementing a Gradient Descent Function in R firstly. This function will receive data, a threshold, and values for a and b. Next, those functions will be updated until their difference were lesser than the threshold. Another important information: no for loops here, we’re using recursion.

gd_r <- function(y,x, threshold, a, b, lr) {
  n <- length(y)
  
  # Updating values for a and b
  a_n <- a - lr*(1/(2*n))*sum( (1/(1+exp( -a*x - b)) - y)*x)
  b_n <- b - lr*(1/(2*n))*sum( 1/(1+exp( -a*x - b)) - y)

  # Test if the difference between old and new values 
  #   for a and b reached the threshold
  if( abs(a_n - a) > threshold || abs(b_n - b) > threshold ) {
    # Recursion
    return(gd_r(y,x,threshold,a_n,b_n, lr))
  } 
  return(c(a_n, b_n))
}

Let’s check if it’s working for a threshold of .0001, a learning rate of .9, and initial values for a and b, .5.

  gd_r(y,x,.0001, .5,.5,.9)

## [1] 1.365905 1.107018

Implementing Gradient Descent in C++

Now I’ll implement the same function, but in c++ directly in R. This can be done using the package Rcpp. For more references, check out Hadley Wickham book.

We just need to define our c++ function inside the Rcpp function cppFunction. (If you don’t know c++, don’t bother, Hadley book on R is a good introduction.)

library(Rcpp)
cppFunction('
NumericVector gd_cpp(IntegerVector y, NumericVector x, double threshold, double a, double b, double lr) {
  NumericVector ab(2);

  // n will represent the length of y and x
  int n = y.size();

  // In this case, we re using for loops to calculate sums. That s right, no Rcpp:sum here.
  double sum_a = 0;
  double sum_b = 0;
  for(int i = 0; i < n; i++) {
    sum_a += (1/(1+std::exp( -a*x[i] - b)) - y[i])*x[i];
    sum_b += (1/(1+std::exp( -a*x[i] - b)) - y[i]);
  }

  // Updating values for a and b
  ab[0] = a - lr * sum_a / (2*n);
  ab[1] = b - lr * sum_b / (2*n);

  // Recursion
  if ( std::abs(ab[0] - a) > threshold || std::abs(ab[1] - b) > threshold ) {
    return gd_cpp(y,x,threshold, ab[0],ab[1],lr);
  }
  return ab;
  
}
')

Let’s check if it’s working.

gd_cpp(y,x,.0001, .5,.5,.9)

## [1] 1.365905 1.107018

It looks great!

Comparing perfomance of both functions

Now we have to check if the function we create in c++ outperform R function. For this operation, we just have to use the function microbenchmark from “microbenchmark” package. We will run the same operations 50 times. So, we’ll change the learning rate to 2 and the threshold to .001.

library(microbenchmark)
res <- microbenchmark(gd_cpp(y,x,.001, .5,.5,2) , gd_r(y,x,.001, .5,.5,2), times=50 )

And the winner is:

print(res)

## Unit: seconds
##                              expr      min       lq     mean   median
##  gd_cpp(y, x, 0.001, 0.5, 0.5, 2) 2.886451 2.904519 3.056539 2.994356
##    gd_r(y, x, 0.001, 0.5, 0.5, 2) 4.083224 4.149808 4.657743 4.444425
##        uq     max neval
##  3.105450 3.93215    50
##  5.083942 6.01895    50

As we can see, the function written in cpp was around 1.58 times faster than the same function wwritten in R (3.05 vs 4.83). In other words, if you have an analysis that requires velocity and you don’t bother of writing in a more complex language, you should implement in with Rcpp.