Imagine you are preparing for a test and have a big list of topics to study. But not all topics are equally important. Some topics are essential, while others are just there to make the list look long. Wouldn't it be nice if someone told you which topics really matter so you can focus on them? This is exactly what Lasso Regression does when we work with data. It helps us figure out which pieces of information (features) are important and which ones we can ignore.
Let's break it down step by step:
Step 1: What is Regression?
Regression is a way to predict something, that is, to estimate a specific variable. For example:
- If you know how much you studied, can you predict your test score?
- If you know the size of a house, can you predict its price?
For our test-score example, that means:
- Features: Hours studied, number of practice tests.
- Output: Test score.
In general, regression finds an equation of the form:
\begin{equation}
y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
\end{equation}
where:
- \(y\) is the output (like your test score),
- \(x_1, x_2, \dots, x_n\) are the features (like hours studied or practice tests),
- \(w_1, w_2, \dots, w_n\) are the weights (they tell us how important each feature is),
- \(b\) is the bias (a constant number).
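To make this concrete, here is a small worked example with made-up numbers: suppose hours studied has weight \(w_1 = 5\), practice tests have weight \(w_2 = 2\), and the bias is \(b = 50\). A student who studies 4 hours and takes 3 practice tests would be predicted to score:
\begin{equation}
y = 5 \cdot 4 + 2 \cdot 3 + 50 = 76
\end{equation}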
Step 2: Why Can Too Many Features Be a Problem?
Imagine you have a dataset with 100 features (input variables), but only 10 of them actually matter. If we use all 100 features, our model might get confused and give bad predictions. This is called overfitting, and it's like trying to study every topic when only a few are on the test. We need a way to focus on the important features and ignore the unimportant ones. This is where Lasso Regression comes in.
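To see this in action, here is a minimal sketch (the parameter choices, such as n_informative=10 and alpha=1.0, are illustrative assumptions, not values from this article):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
# 100 features, but only 10 of them actually influence the output
X, y = make_regression(n_samples=200, n_features=100, n_informative=10, noise=0.1, random_state=42)
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)
# Count how many weights survived; most of the 90 uninformative ones should be exactly zero
print("Non-zero weights:", np.sum(lasso.coef_ != 0))
If the printed count comes out close to 10, Lasso has recovered the informative features on its own.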
Step 3: What is Lasso Regression?
Lasso Regression is a special type of regression that does two things:
- It finds the best equation for predicting the output.
- It automatically removes the features that are not important.
Step 4: The Math Behind Lasso
When we train a model, we try to minimize something called the loss function. For regular regression, the loss function is:
\begin{equation}
Loss = Error = \sum (Actual - Prediction)^2
\end{equation}
But in Lasso Regression, we add a penalty to the loss function:
\begin{equation}
Loss = Error + \lambda \sum|w_i|
\end{equation}
where:
- \(\sum|w_i|\) is the sum of the absolute values of all the weights,
- \(\lambda\) is a number that controls how strong the penalty is:
- if \(\lambda\) is small, Lasso behaves like regular regression,
- if \(\lambda\) is large, Lasso pushes more weights to exactly zero.
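As a quick worked example (the numbers are made up): suppose the squared-error term equals 10, the weights are \(w = (2, -3, 0)\), and \(\lambda = 0.5\). Then:
\begin{equation}
Loss = 10 + 0.5 \cdot (|2| + |-3| + |0|) = 10 + 2.5 = 12.5
\end{equation}
Because every non-zero weight adds to the penalty, the model can lower its loss by pushing weights it does not really need all the way to zero.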
Step 5: A Simple Example
Let's say you're trying to predict test scores based on these features:
- Hours studied
- Hours spent watching TV
- Number of practice tests
- Favorite color (a silly feature)
After training, Lasso will typically shrink the weight of an irrelevant feature like favorite color to exactly zero, keeping only the features that genuinely help predict the score; a small sketch of this scenario is given below.
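Here is a minimal sketch with made-up data (the true weights 5 and 2, the noise level, and alpha=0.5 are all illustrative assumptions):
import numpy as np
from sklearn.linear_model import Lasso
rng = np.random.default_rng(0)
n = 200
hours_studied = rng.uniform(0, 10, n)
tv_hours = rng.uniform(0, 5, n)
practice_tests = rng.integers(0, 10, n)
favorite_color = rng.integers(0, 5, n)  # encoded as a number; carries no signal
# Made-up ground truth: only studying and practice tests matter
score = 5 * hours_studied + 2 * practice_tests + rng.normal(0, 1, n)
X = np.column_stack([hours_studied, tv_hours, practice_tests, favorite_color])
lasso = Lasso(alpha=0.5)
lasso.fit(X, score)
print(dict(zip(["hours_studied", "tv_hours", "practice_tests", "favorite_color"], lasso.coef_.round(2))))
In this setup, the printed weights for tv_hours and favorite_color typically come out at (or very near) zero, while hours_studied and practice_tests keep weights close to 5 and 2.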
Step 6: Visualizing Lasso
Imagine you’re walking on a mountain. Regular regression tries to find the lowest point (minimum error). Lasso Regression also tries to find the lowest point, but it puts up walls around some features (the penalty). These walls prevent the model from giving too much importance to unimportant features.
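For readers who want the formal picture behind this analogy: an equivalent way to write Lasso is as error minimization under a budget \(t\) on the total absolute weight,
\begin{equation}
\min \sum (Actual - Prediction)^2 \quad \text{subject to} \quad \sum|w_i| \le t
\end{equation}
The allowed region \(\sum|w_i| \le t\) has sharp corners that sit exactly on the axes, where some weights are zero, and the lowest-error point frequently lands on one of those corners. Those corners are the "walls" in the analogy.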
Step 7: Why Use Lasso for Feature Selection?
- Simpler models: Lasso removes unimportant features, making the model easier to understand.
- Prevents overfitting: By ignoring noisy features, Lasso helps the model generalize better to new data.
- Focus on what matters: It tells you which features are actually important.
Step 8: How to Use Lasso in Python?
In this example we will create a sample dataset using the make_regression function from the sklearn.datasets module. The generated dataset will contain 100 samples and 10 features (input variables) with 0.1 noise, and it will be split into train and test sets in an 80:20 ratio. The 80% portion will be used for training the Lasso Regression model.
The first step is to import the required libraries/modules. For this example we need Lasso from sklearn.linear_model, the train_test_split function from sklearn.model_selection, and make_regression from the sklearn.datasets module.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
The next step is to generate the data with make_regression and split it with train_test_split. The dataset will consist of 100 samples and 10 features, with the noise parameter set to 0.1; noise is the standard deviation of the Gaussian noise applied to the output. We then split the data in an 80:20 ratio by setting the test_size parameter to 0.2, and we set random_state to 42 to control the shuffling applied to the data before the split.
X, y = make_regression(n_samples=100, n_features=10, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
The next step is to define the Lasso model with alpha (the penalty term, i.e. the lambda in the math described earlier) equal to 0.1. After defining the Lasso model and assigning it to the lasso variable, we train it on the training data using the fit() function.
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
The next step is to inspect the weights, in other words the feature importances, obtained by Lasso. They are stored in the fitted model's .coef_ attribute.
print("Feature Importance:", lasso.coef_)
The output of the previous command is given below.
Feature Importance: [56.76623236 20.40924724 9.14790695 23.73715764 30.96736664 39.12329335 48.59523207 13.39841335 13.46403202 26.00418041]
Depending on an arbitrarily chosen threshold value, you can remove the features with lower weights (feature importances) and keep only those with higher values. For example, if we require a weight higher than 15, the dataset is reduced to 7 features, since only 7 of the 10 importance values exceed 15.
Finally, we can test the trained Lasso model on the test dataset and compute the score. The score in this case is the \(R^2\) score, which nearly every sklearn regression algorithm provides through the built-in score() function; you simply pass it the X_test and y_test values.
score = lasso.score(X_test, y_test)
print("Model Score:", score)
The output of the previous code is given below.
Model Score: 0.9999862508164981
The obtained \(R^2\) value indicates that the trained model performs very well on unseen data, which shows that overfitting did not occur. The entire code created in this example is given below.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
# Create some sample data
X, y = make_regression(n_samples=100, n_features=10, noise=0.1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Apply Lasso Regression
lasso = Lasso(alpha=0.1) # alpha is the penalty term (lambda in our math)
lasso.fit(X_train, y_train)
# Get the coefficients (weights)
print("Feature Importance:", lasso.coef_)
# Test the model
score = lasso.score(X_test, y_test)
print("Model Score:", score)