
Lasso Regression with Stata

January 17, 2019

Here comes the time of lasso and elastic net regression with Stata.

While ridge estimators have been available for quite a long time now (ridgereg), the class of estimators developed by Friedman, Hastie and Tibshirani has long been missing in Stata. It looks like it is now available in the elasticregress package (also available on GitHub), at least for linear models.

Here is a toy example, inspired by a previous talk (PDF) I gave on the topic. Basically, the ridge or L2 penalty consists of shrinking parameter estimates toward zero in order to stabilize their variance. The amount of shrinkage is controlled by the penalty parameter, $\lambda$: the larger its value, the closer the estimates get to 0, as demonstrated below using a polynomial model of order $k$. The yellow points are the observed values and the blue ones are predictions from this model with varying amounts of shrinkage.
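
To make the shrinkage effect concrete, here is a minimal sketch along the same lines, fitting a ridge regression to a quartic polynomial with a few fixed penalty values. It assumes the ridgeregress convenience wrapper shipped with the elasticregress package and its lambda() option for setting the penalty by hand rather than cross-validating it.

* Minimal sketch (assumes ridgeregress from elasticregress and its lambda() option)
clear
set seed 101
set obs 30
generate x = runiform(-2, 2)
generate y = 1 - 2*x + 0.5*x^3 + rnormal(0, 1)
forvalues k = 2/4 {
  generate x`k' = x^`k'                     // polynomial terms x^2, x^3, x^4
}
foreach l of numlist 0.01 1 100 {
  ridgeregress y x x2 x3 x4, lambda(`l')    // refit with an increasing penalty
  matrix list e(b)                          // estimates shrink toward 0 as lambda grows
}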

The Lasso (L1) penalty, on the contrary, sets parameter estimates exactly to zero past a certain threshold, which makes it convenient when one thinks in terms of variable selection. This technique does not handle collinear predictors well, however, in which case the elastic net criterion, which mixes the L1 and L2 penalties, is certainly a better option (see the sketch below).
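
As a quick illustration of that last point, here is a hedged sketch assuming the lassoregress wrapper and the alpha() option from the same package: with two nearly collinear predictors, the lasso tends to keep only one of them, whereas the elastic net spreads the weight across both.

clear
set seed 101
set obs 100
generate x1 = rnormal()
generate x2 = x1 + 0.1*rnormal()       // x1 and x2 nearly collinear
generate z  = rnormal()
generate y  = x1 + x2 + z + rnormal()
lassoregress y x1 x2 z                 // pure L1: usually zeroes one of x1/x2
elasticregress y x1 x2 z, alpha(0.5)   // L1 + L2 mixture: shares the weight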

Let us assume we have a sample of $n$ observations generated from the following model:

$$ y = \beta_0 + \sum_{j=1}^{10}\beta_jx_j + u, $$

where the $u$ are random Gaussian perturbations and $n=50$.

clear
set seed 101
set obs 50
forvalues j = 1/10 {
  generate x`j' = rnormal(0,1)   // 10 independent standard normal predictors
}
generate y = 1.1*x1 + 0.8*x2 - 0.7*x5 + 1.4*x6 + rnormal(0,1)
graph matrix y x*                // scatterplot matrix of the outcome and predictors

Note that in the above model we do not control the variance-covariance matrix of the predictors, so we cannot ensure that the partial correlations are exactly zero; more appropriate techniques are available to create multivariate normal observations, as sketched below. In the meantime, it is easy to check visually, from the scatterplot matrix, that the correlations between the outcome $y$ and the predictors $x_j$ behave as expected.
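
For instance, Stata's drawnorm draws from a multivariate normal distribution with a given population covariance, and corr2data goes one step further by constructing variables whose sample correlation matrix matches a target exactly. A minimal sketch with an identity target, which makes the predictors exactly uncorrelated:

clear
set seed 101
matrix C = I(10)                 // target correlation matrix (identity)
corr2data x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, n(50) corr(C) clear
generate y = 1.1*x1 + 0.8*x2 - 0.7*x5 + 1.4*x6 + rnormal(0,1)
correlate y x*                   // correlations among the x's are exactly 0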

And here is what we get when letting elasticregress cross-validate over a mixture of L1 and L2 penalties (the selected $\alpha = 1$ below corresponds to a pure lasso fit):

. elasticregress y x*
Warning: the smallest λ tested was the MSE-minimising λ.
Consider re-running estimation with a smaller epsilon.

Elastic-net regression                 Number of observations     =         50
                                       R-squared                  =     0.8138
                                       alpha                      =     1.0000
                                       lambda                     =     0.0918
                                       Cross-validation MSE       =     1.6373
                                       Number of folds            =         10
                                       Number of alpha tested     =          6
                                       Number of lambda tested    =        100
------------------------------------------------------------------------------
           y |      Coef.
-------------|----------------------------------------------------------------
          x1 |   1.131193
          x2 |   .6925592
          x3 |  -.0984186
          x4 |          0
          x5 |  -.5039949
          x6 |   1.490665
          x7 |          0
          x8 |   .1432863
          x9 |          0
         x10 |  -.1665392
       _cons |  -.1164585
------------------------------------------------------------------------------
. display e(lambda)
.09181307

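Since the penalized coefficients are stored in the e(b) vector, the selected (non-zero) predictors can be listed with a short loop over its columns. A minimal sketch, assuming the columns of e(b) are named after the predictors:

matrix list e(b)
matrix b = e(b)
local selected
foreach v of varlist x* {
  if el(b, 1, colnumb(b, "`v'")) != 0 local selected `selected' `v'
}
display "selected predictors: `selected'"
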
As just shown, the estimates are stored in the e(b) vector, and more options are available. Here is one way to improve on our original estimates, by refining the cross-validation grid (more values of $\alpha$, a smaller minimum $\lambda$) and applying the one-standard-error rule (lambda1se):

. elasticregress y x*, numalpha(10) epsilon(0.00001) lambda1se

Elastic-net regression                 Number of observations     =         50
                                       R-squared                  =     0.7853
                                       alpha                      =     0.6667
                                       lambda                     =     0.2351
                                       Cross-validation MSE       =     1.9336
                                       Number of folds            =         10
                                       Number of alpha tested     =         10
                                       Number of lambda tested    =        100
------------------------------------------------------------------------------
           y |      Coef.
-------------|----------------------------------------------------------------
          x1 |   1.012981
          x2 |   .5975951
          x3 |          0
          x4 |          0
          x5 |  -.4317686
          x6 |    1.34201
          x7 |          0
          x8 |   .0574305
          x9 |          0
         x10 |  -.1385899
       _cons |  -.0239716
------------------------------------------------------------------------------

See Also

» Stata plot of the week #1
» Computer Age Statistical Inference
» Stata and Julia on Atom
» 50 years of Data Science and beyond
» New tutorials