Ridge regression

Table of contents

  • Introduction
  • Theory: Ridge Objective
  • Visualization

Introduction

We are dealing with a regression problem, so we start from the linear model:

\[\mathbf{y}=\beta_0+\beta_1\mathbf{x_1}+ \dots +\beta_p\mathbf{x_p}+\epsilon\]

The goal of this method is to estimate the coefficients by minimizing the Residual Sum of Squares (RSS).

\[RSS = \sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_{i1}-\dots-\beta_px_{ip})^2\]
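As a minimal sketch (on made-up data, not from the text), the RSS-minimizing coefficients can be found via least squares, and any perturbation of them can only increase the RSS:

```python
# Hypothetical toy data; coefficients and seed are illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

# Ordinary least squares: minimizes the RSS over all coefficient vectors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)

# Moving away from the minimizer can only increase the RSS:
perturbed = beta_hat + np.array([0.1, -0.1, 0.1])
assert rss < np.sum((y - X @ perturbed) ** 2)
```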

In ordinary least squares regression the \(\beta\) estimates are unbiased, meaning that no systematic error enters their estimation.

We can decompose the mean squared error (MSE) of an estimate into two parts: one due to the bias and one due to the variance of the model.

\[MSE[\hat{\beta}_j] := E[(\hat{\beta}_j-\beta_j)^2]=(E[\hat{\beta}_j]-\beta_j)^2+Var[\hat{\beta}_j]\]
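This decomposition can be checked numerically. The sketch below (a hypothetical Monte Carlo setup, not from the text) repeatedly re-estimates a single OLS slope and verifies that the empirical MSE equals squared bias plus variance:

```python
# Monte Carlo check of MSE = bias^2 + variance for one OLS slope.
# All data and parameters here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
beta_true = 2.0
n, reps = 50, 2000

estimates = []
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta_true * x + rng.normal(size=n)
    # OLS slope without intercept: beta_hat = (x'y) / (x'x)
    estimates.append(x @ y / (x @ x))
estimates = np.array(estimates)

mse = np.mean((estimates - beta_true) ** 2)
bias_sq = (estimates.mean() - beta_true) ** 2
var = estimates.var()

# The decomposition holds exactly for the sample moments:
assert abs(mse - (bias_sq + var)) < 1e-8
```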

Shrinkage methods aim to add bias to the coefficient estimation in order to reduce the variance of the model and obtain a simpler interpretation of the data set.

One way to do that is to impose sparsity. If the model is linear in the variables, \(f(\mathbf{x})=\langle\boldsymbol\beta,\mathbf{x}\rangle=\sum_{j=1}^{p}\beta_j x_j\), where the \(\beta_j\)'s are the coefficients, we can impose sparsity by forcing most of the coefficients to be zero.

Adding this systematic bias to the estimation decreases the variance of the model, simplifying it.

[Plot: bias-variance tradeoff, showing the MSE decomposed into bias and variance; as the variance decreases the bias increases, with the MSE minimized in between.]

Ridge Objective

Ridge regression addresses this kind of regression problem by shrinking the coefficients toward zero (without setting them exactly to zero, so it does not produce a truly sparse model) through a quadratic penalty:

\[\lambda \sum_{j=1}^{p}\beta_j^2=\lambda||\boldsymbol\beta||_2^2\]

We then want to explore how the \(\beta\)'s change as we tune \(\lambda\). To choose \(\lambda\) we make use of a cross-validation algorithm.
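A minimal sketch of such a cross-validation loop, assuming made-up data, a hypothetical grid of \(\lambda\) values, and the closed-form ridge solution with no intercept for brevity:

```python
# k-fold cross-validation over lambda using the closed-form ridge
# solution. Data, grid, and fold count are illustrative assumptions.
import numpy as np

def ridge_fit(X, y, lam):
    # beta_hat = (X'X + lam*I)^(-1) X'y  (no intercept, for brevity)
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p, k = 120, 5, 4
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
folds = np.array_split(np.arange(n), k)

cv_err = []
for lam in lambdas:
    errs = []
    for idx in folds:
        mask = np.ones(n, dtype=bool)
        mask[idx] = False  # hold out this fold
        beta = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((y[idx] - X[idx] @ beta) ** 2))
    cv_err.append(np.mean(errs))

best_lam = lambdas[int(np.argmin(cv_err))]
print("best lambda:", best_lam)
```

In practice a library routine such as scikit-learn's `RidgeCV` automates this loop; the manual version above just makes the mechanics explicit.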

In practical applications it is good practice to standardize the features before running a ridge regression over a data set, in order to avoid an unwanted scale effect: the penalty is not invariant to the scale of the features.
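The scale effect can be seen in a one-variable sketch (toy data, illustrative scale factor): measuring the same feature in different units changes how strongly the penalty shrinks its coefficient.

```python
# Why scaling matters for ridge: rescaling a feature changes its
# estimate by far more than the inverse scale factor. Toy example.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

def ridge_slope(x, y, lam):
    # Closed-form one-variable ridge slope, no intercept.
    return (x @ y) / (x @ x + lam)

lam = 50.0
b_raw = ridge_slope(x, y, lam)
# Same feature measured in different units (e.g. km instead of m):
b_scaled = ridge_slope(x / 1000.0, y, lam)

# Without a penalty the slope would simply scale by 1000; with the
# penalty the rescaled feature is shrunk almost entirely away:
ratio = b_scaled / (1000 * b_raw)
print("ratio vs. unpenalized scaling:", ratio)
```

Standardizing every feature to unit variance before fitting removes this arbitrary dependence on units.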

The formulation of the Ridge regression can be expressed in two different forms:

  • first form (penalized):
\[\min_{\beta} \left[RSS + \lambda\sum_{j=1}^{p}\beta_j^2\right]\]
  • second form (constrained):
\[\min_{\beta} RSS \quad \textrm{s.t.} \quad \sum_{j=1}^{p}\beta_j^2 \leq C^2\]
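The two forms are equivalent: each \(\lambda\) in the penalized form corresponds to some budget \(C\) in the constrained form. A small sketch (made-up data) of the penalized form, checking that the implicit constraint tightens as \(\lambda\) grows:

```python
# As lambda increases, the L2 norm of the ridge solution shrinks
# monotonically toward zero. Data and grid are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(size=n)

def ridge(X, y, lam):
    # Closed-form solution of min_beta [RSS + lam * ||beta||_2^2]
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

norms = [np.linalg.norm(ridge(X, y, lam))
         for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]]

# Each increase in lambda corresponds to a smaller budget C in the
# constrained form ||beta||_2^2 <= C^2:
assert all(a > b for a, b in zip(norms, norms[1:]))
```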
