
Wednesday, February 26, 2025

Adding Interaction Terms to Linear Regression

Interaction terms in linear regression models are used to capture the effect of two or more predictor variables working together. These terms are crucial when we believe that the relationship between a predictor and the response variable is not just linear, but also dependent on the values of other predictors. In this post, we will explore how to add interaction terms to a linear regression model using Python and scikit-learn.

Understanding Interaction Terms

In a basic linear regression model, the relationship between the predictor variables and the target variable is assumed to be linear. However, in many real-world situations, this assumption may not hold true. For example, the effect of one predictor variable on the target variable might depend on the level of another predictor variable. In such cases, adding interaction terms can help model these relationships more accurately.

Consider the following general linear regression equation:

\begin{equation} y=\beta_0 + \beta_1\cdot x_1 + \beta_2\cdot x_2 + \varepsilon \end{equation}

Where:

  • y is the target variable
  • x1 and x2 are predictor variables
  • β0, β1, and β2 are coefficients
  • ε is the error term

To include an interaction between x1 and x2, the model becomes:

\begin{equation} y=\beta_0 + \beta_1\cdot x_1 + \beta_2\cdot x_2 + \beta_3 \cdot (x_1\cdot x_2)+\varepsilon \end{equation}

Here, β3 represents the coefficient of the interaction term (x1 * x2). This term allows the model to account for the combined effect of x1 and x2 on the target variable y.
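Before turning to scikit-learn, it may help to see that an interaction term is nothing more than an extra column obtained by multiplying two existing ones. Below is a minimal sketch of this idea; the column names x1 and x2 are made up for illustration.

import pandas as pd

# Hypothetical two-feature dataset used only to illustrate the idea
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]})

# The interaction term is simply the element-wise product of the two columns
df["x1_x2"] = df["x1"] * df["x2"]
print(df)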

Adding Interaction Terms in Python

Now let's look at an example where we add interaction terms to a linear regression model using Python and the scikit-learn library. We will use the popular Boston housing dataset to predict the price of houses based on features like crime rate, average number of rooms, and distance to employment centers, among others.

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# from sklearn.datasets import load_boston  # works only with scikit-learn < 1.2
from sklearn.datasets import fetch_openml  # use this to load the Boston housing dataset with scikit-learn >= 1.2
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Load the Boston dataset
# boston = load_boston()  # works only with scikit-learn < 1.2
boston = fetch_openml(name="boston", version=1, as_frame=True)  # loads the Boston housing dataset with scikit-learn >= 1.2
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

# Add interaction terms using PolynomialFeatures
poly = PolynomialFeatures(interaction_only=True, include_bias=False)
X_poly = poly.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.3, random_state=42)

# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
        
It should be noted that this example uses the Boston housing dataset, which is no longer available in scikit-learn versions 1.2.0 or higher. However, you can still download the dataset from the openml.org repository using the fetch_openml function from the sklearn.datasets module.
After importing all the necessary libraries, load the Boston housing dataset using the fetch_openml function:
boston = fetch_openml(name="boston", version=1, as_frame=True)

Explanation of the Code

In this code, we:

  • Import necessary libraries, including scikit-learn's LinearRegression, PolynomialFeatures, and the Boston dataset.
  • Load the Boston housing dataset and separate the features (X) and the target variable (y).
  • Use PolynomialFeatures with interaction_only=True to generate only the interaction terms between the original features (excluding the squared terms).
  • Split the data into training and testing sets, train a linear regression model, and evaluate its performance using the mean squared error (MSE) metric.
When the previous code is executed, the following result is obtained:
Mean Squared Error: 16.320305223697712
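If you want to inspect exactly which interaction columns PolynomialFeatures generated, the fitted transformer can list their names. A small sketch, reusing the poly and X objects from the example above:

# List the generated feature names: the original columns come first,
# followed by pairwise products such as "CRIM RM"
feature_names = poly.get_feature_names_out(X.columns)
print("Number of features after adding interactions:", len(feature_names))
print(feature_names[:20])  # first few names for inspection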

Conclusion

Interaction terms can significantly improve the performance of a linear regression model when the relationship between predictors is not purely additive. By using techniques such as PolynomialFeatures in scikit-learn, you can easily add interaction terms and enhance your model’s predictive power. However, it's essential to avoid overfitting by carefully selecting interaction terms and evaluating the model on a separate test set.

When to Use Interaction Terms

Interaction terms should be used when you believe that the effect of one predictor variable depends on the value of another predictor. However, it's important to be cautious when adding too many interaction terms, as this could lead to overfitting. Always evaluate your model using cross-validation and test data to ensure its generalizability.
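As a quick check of whether the interaction terms actually help on your data, you can compare cross-validated scores of the model with and without them. A minimal sketch, reusing the X, y, and poly objects defined in the example above:

from sklearn.model_selection import cross_val_score

# Cross-validated R^2 without interaction terms
base_scores = cross_val_score(LinearRegression(), X, y, cv=5)

# Cross-validated R^2 with interaction terms
inter_scores = cross_val_score(LinearRegression(), poly.fit_transform(X), y, cv=5)

print("Mean R^2 without interactions:", base_scores.mean())
print("Mean R^2 with interactions:   ", inter_scores.mean())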

Tuesday, January 28, 2025

Comparing Ridge, Lasso, and ElasticNet Regressions

In this post, we will compare three popular regularization techniques used in linear regression models: Ridge, Lasso, and ElasticNet. These methods help prevent overfitting by adding penalties to the coefficients of the model.

What is Regularization?

Regularization is a technique used to improve the generalization ability of a model. It adds a penalty to the loss function based on the size of the coefficients, helping to reduce the complexity of the model and prevent overfitting.

Ridge Regression

Ridge regression, also known as L2 regularization, adds the squared magnitude of the coefficients as a penalty term to the loss function. The goal is to minimize the sum of the squared residuals along with the penalty term.
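In equation form, the objective minimized by Ridge regression can be written as shown below, where \(\lambda\) plays the role of the alpha parameter in scikit-learn:

\begin{equation} Loss = \sum_{i}(y_i - \hat{y}_i)^2 + \lambda \sum_{j} w_j^2 \end{equation}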

Python Example: Ridge Regression

In this example, we will generate synthetic data with two features, split it into training and test sets, and apply Ridge regression using an alpha value of 1.0. The model's coefficients and intercept will be shown as the output.

Let's implement Ridge regression in Python using the Ridge class from sklearn.linear_model.

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
import numpy as np

# Generating synthetic data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Applying Ridge regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Making predictions and evaluating the model
ridge_predictions = ridge.predict(X_test)

print("Ridge Regression Coefficients:", ridge.coef_)
print("Ridge Regression Intercept:", ridge.intercept_)
The previous code block consists of the following sections:
  • Importing Libraries:
    • from sklearn.linear_model import Ridge - Imports the Ridge class from sklearn.linear_model to implement Ridge regression.
    • from sklearn.model_selection import train_test_split - Imports the train_test_split function to split the dataset into training and testing subsets.
    • from sklearn.datasets import make_regression - Imports the make_regression function to generate synthetic regression data.
    • import numpy as np - Imports the numpy library, typically used for numerical operations, although not directly used in this snippet.
  • Generating Synthetic Data:
    • X, y = make_regression(n_samples=100, n_features=2, noise=0.1) - Generates a synthetic regression dataset with 100 samples and 2 features. The noise parameter adds random noise to the output values, making the problem more realistic.
  • Splitting Data into Training and Test Sets:
    • X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) - Splits the generated dataset into training and test sets. 20% of the data is used for testing, and 80% is used for training. The random_state ensures that the split is reproducible.
  • Applying Ridge Regression:
    • ridge = Ridge(alpha=1.0) - Initializes the Ridge regression model with a regularization strength of 1.0 (default value). The alpha parameter controls the amount of regularization applied to the model.
    • ridge.fit(X_train, y_train) - Fits the Ridge regression model to the training data (X_train and y_train). The model learns the relationships between the features and the target variable during this step.
  • Making Predictions:
    • ridge_predictions = ridge.predict(X_test) - Uses the trained Ridge model to make predictions on the test data (X_test), which will be evaluated against the actual target values (y_test).
  • Printing Model Parameters:
    • print("Ridge Regression Coefficients:", ridge.coef_) - Prints the coefficients learned by the Ridge regression model. These coefficients represent the contribution of each feature to the model's predictions.
    • print("Ridge Regression Intercept:", ridge.intercept_) - Prints the intercept value (bias term) of the Ridge regression model. This is the predicted value when all input features are zero.
After executing the code in this example, the following output is obtained.
Ridge Regression Coefficients: [42.32481736 37.27191182]
Ridge Regression Intercept: -0.1079988849518081
  • Ridge Regression Coefficients:
    • [42.32481736 37.27191182] - These are the coefficients (weights) learned by the Ridge regression model for each of the two input features. The model assigns:
      • 42.32481736 to the first feature, meaning for each unit increase in this feature, the output variable is expected to increase by approximately 42.32 units, holding the other feature constant.
      • 37.27191182 to the second feature, meaning for each unit increase in this feature, the output variable is expected to increase by approximately 37.27 units, holding the other feature constant.
  • Ridge Regression Intercept:
    • -0.1079988849518081 - This is the intercept (bias term) of the Ridge regression model. It represents the predicted value of the target variable when both input features are zero. In this case, the model predicts a value of approximately -0.11 when both features are zero.
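The example above stops at printing the coefficients, but the held-out predictions can also be scored. A short sketch, assuming the ridge model and the train/test split from the code above:

from sklearn.metrics import mean_squared_error

# Evaluate the Ridge model on the held-out test set
ridge_mse = mean_squared_error(y_test, ridge_predictions)
print("Ridge test MSE:", ridge_mse)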

Lasso Regression

Lasso regression, or L1 regularization, uses the absolute values of the coefficients as a penalty term. It tends to produce sparse models, where some coefficients are driven to zero, effectively performing feature selection.
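Written out, the Lasso objective replaces the squared (L2) penalty of Ridge with an absolute-value (L1) penalty:

\begin{equation} Loss = \sum_{i}(y_i - \hat{y}_i)^2 + \lambda \sum_{j} |w_j| \end{equation}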

Python Example: Lasso Regression

Here, we apply Lasso regression with an alpha value of 0.1. As with Ridge regression, the coefficients and intercept are printed. However, in Lasso, some coefficients may be zero, leading to a simpler model.

Let's implement Lasso regression using the Lasso class from sklearn.linear_model.

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
import numpy as np

# Generating synthetic data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Applying Lasso regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Making predictions and evaluating the model
lasso_predictions = lasso.predict(X_test)

print("Lasso Regression Coefficients:", lasso.coef_)
print("Lasso Regression Intercept:", lasso.intercept_)
The previous code block consists of the following steps:
  • Importing Libraries:
    • from sklearn.linear_model import Lasso - Imports the Lasso class from sklearn.linear_model to implement Lasso regression.
    • from sklearn.model_selection import train_test_split - Imports the train_test_split function to split the dataset into training and testing subsets.
    • from sklearn.datasets import make_regression - Imports the make_regression function to generate synthetic regression data.
    • import numpy as np - Imports the numpy library for numerical operations, although it's not directly used in this snippet.
  • Generating Synthetic Data and Splitting the Data:
    • X, y = make_regression(n_samples=100, n_features=2, noise=0.1) - Generates a synthetic regression dataset with 100 samples and 2 features. The noise parameter adds random noise to the output values, simulating real-world data.
    • X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) - Splits the generated dataset into training and test sets. 20% of the data is used for testing, and 80% is used for training. The random_state ensures that the split is reproducible.
  • Applying Lasso Regression:
    • lasso = Lasso(alpha=0.1) - Initializes the Lasso regression model with a regularization strength of 0.1 (alpha). The alpha parameter controls the magnitude of the L1 penalty, influencing how much the model shrinks the coefficients.
    • lasso.fit(X_train, y_train) - Fits the Lasso regression model to the training data (X_train and y_train). The model learns the relationships between the features and the target variable during this step.
  • Making Predictions:
    • lasso_predictions = lasso.predict(X_test) - Uses the trained Lasso model to make predictions on the test data (X_test). The model applies the learned relationships to predict the target values for the unseen test set.
  • Printing Model Parameters:
    • print("Lasso Regression Coefficients:", lasso.coef_) - Prints the coefficients learned by the Lasso regression model. These coefficients represent the impact of each feature on the target variable. In Lasso, some coefficients may be zero due to feature selection, making the model sparse.
    • print("Lasso Regression Intercept:", lasso.intercept_) - Prints the intercept (bias term) of the Lasso regression model. This is the predicted value when all input features are zero. In Lasso, the intercept is typically non-zero unless the data is centered.
Executing the previous code block produces the following output.
    Lasso Regression Coefficients: [18.42199406 61.89838269]
    Lasso Regression Intercept: 0.02333834253124545
  • Lasso Regression Coefficients:
    • [18.42199406 61.89838269] - These are the coefficients (weights) learned by the Lasso regression model for each of the two input features. The model assigns:
      • 18.42199406 to the first feature, meaning that for each unit increase in this feature, the output variable is expected to increase by approximately 18.42 units, holding the other feature constant.
      • 61.89838269 to the second feature, meaning that for each unit increase in this feature, the output variable is expected to increase by approximately 61.90 units, holding the other feature constant.
  • Lasso Regression Intercept:
    • 0.02333834253124545 - This is the intercept (bias term) of the Lasso regression model. It represents the predicted value when both input features are zero. In this case, the model predicts a value of approximately 0.02 when both features are zero.

ElasticNet Regression

ElasticNet regression combines both L1 and L2 regularization, making it a compromise between Ridge and Lasso. It is useful when there are many correlated features in the dataset.
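With \(\rho\) denoting the l1_ratio parameter, the ElasticNet objective can be written (up to the scaling constants scikit-learn applies internally) as:

\begin{equation} Loss = \sum_{i}(y_i - \hat{y}_i)^2 + \lambda \left( \rho \sum_{j}|w_j| + \frac{1-\rho}{2}\sum_{j} w_j^2 \right) \end{equation}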

Python Example: ElasticNet Regression

In the ElasticNet regression example, we use both L1 and L2 regularization. The l1_ratio parameter controls the mix of Lasso (L1) and Ridge (L2) regularization, with a value of 0.5 indicating an equal mix.

Now, we will implement ElasticNet regression using the ElasticNet class from sklearn.linear_model.

from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
import numpy as np

# Generating synthetic data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Applying ElasticNet regression
elasticnet = ElasticNet(alpha=0.1, l1_ratio=0.5)
elasticnet.fit(X_train, y_train)

# Making predictions and evaluating the model
elasticnet_predictions = elasticnet.predict(X_test)

print("ElasticNet Regression Coefficients:", elasticnet.coef_)
print("ElasticNet Regression Intercept:", elasticnet.intercept_)
The previous code block consists of the following sections:
  • Importing Libraries:
    • from sklearn.linear_model import ElasticNet - Imports the ElasticNet class from sklearn.linear_model to implement ElasticNet regression.
    • from sklearn.model_selection import train_test_split - Imports the train_test_split function to split the dataset into training and testing subsets.
    • from sklearn.datasets import make_regression - Imports the make_regression function to generate synthetic regression data.
    • import numpy as np - Imports the numpy library for numerical operations, although it's not directly used in this snippet.
  • Generating Synthetic Data:
    • X, y = make_regression(n_samples=100, n_features=2, noise=0.1) - Generates a synthetic regression dataset with 100 samples and 2 features. The noise parameter adds random noise to the output values, simulating real-world data.
  • Splitting the Data:
    • X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) - Splits the data into training and test sets, with 20% of the data allocated for testing. The random_state=42 ensures reproducibility of the split.
  • Applying ElasticNet Regression:
    • elasticnet = ElasticNet(alpha=0.1, l1_ratio=0.5) - Initializes the ElasticNet regression model. The alpha parameter controls the strength of the regularization, while the l1_ratio parameter determines the mix between Lasso (L1) and Ridge (L2) penalties:
      • When l1_ratio=1.0, it behaves like Lasso regression (pure L1 regularization).
      • When l1_ratio=0.0, it behaves like Ridge regression (pure L2 regularization).
      • Here, with l1_ratio=0.5, it combines both Lasso and Ridge penalties in equal measure.
    • elasticnet.fit(X_train, y_train) - Fits the ElasticNet regression model to the training data (X_train and y_train) and learns the coefficients that minimize the residual sum of squares subject to the regularization.
  • Making Predictions:
    • elasticnet_predictions = elasticnet.predict(X_test) - Uses the trained ElasticNet model to make predictions on the test data (X_test). This step applies the learned relationships to predict the target values for the test set.
  • Printing Model Parameters:
    • print("ElasticNet Regression Coefficients:", elasticnet.coef_) - Prints the coefficients learned by the ElasticNet regression model. These coefficients represent how much each feature contributes to the target variable. Both Lasso and Ridge regularization influence these coefficients.
    • print("ElasticNet Regression Intercept:", elasticnet.intercept_) - Prints the intercept (bias term) of the ElasticNet regression model. The intercept is the predicted value when all input features are zero.
The output of the ElasticNet example is shown below.
ElasticNet Regression Coefficients: [ 4.33447528 75.87734055]
ElasticNet Regression Intercept: 0.12560084787858017    
  • ElasticNet Regression Coefficients:
    • [ 4.33447528 75.87734055 ] - These are the coefficients (weights) for the two features used in the ElasticNet regression model:
      • 4.33447528 corresponds to the first feature, indicating that for each unit increase in the first feature, the target variable is expected to increase by approximately 4.33 units, while holding the second feature constant.
      • 75.87734055 corresponds to the second feature, indicating that for each unit increase in the second feature, the target variable is expected to increase by approximately 75.88 units, while holding the first feature constant.
  • ElasticNet Regression Intercept:
    • 0.12560084787858017 - This is the intercept (bias term) of the model. It represents the predicted value when both features are zero. In this case, when both features are zero, the model predicts a value of approximately 0.13 for the target variable.

Comparison

Here is a brief comparison of the three regression methods:

  • Ridge Regression: Suitable when most features contribute to the prediction. It tends to shrink coefficients evenly.
  • Lasso Regression: Useful for feature selection. It can shrink some coefficients to zero, making the model sparse.
  • ElasticNet Regression: Combines Ridge and Lasso, performing well in cases with many correlated features.

Each method has its advantages and should be chosen based on the nature of your dataset and the problem at hand.
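To make the comparison concrete, the three models can be fitted on the same split and scored side by side. A minimal sketch, assuming synthetic data and a train/test split generated exactly as in the examples above:

from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error

models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# Fit each model on the same training data and report its test MSE and coefficients
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.4f}, coefficients = {model.coef_}")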

Thank you for reading the tutorial! Try running the Python code and let me know in the comments if you got the same results. If you have any questions or need further clarification, feel free to leave a comment. Thanks again!

Wednesday, January 1, 2025

Lasso regression for feature selection

Imagine you are preparing for a test and have a big list of topics to study. But not all topics are equally important. Some topics are essential, while others are just there to make the list look long. Wouldn't it be nice if someone told you which topics really matter so you can focus on them? This is exactly what Lasso Regression does when we work with data. It helps us figure out which pieces of information (features) are important and which ones we can ignore. Let's break it down step by step:

Step 1: What is Regression?

Regression is a way to predict something, that is, to estimate a specific variable. For example:
  • If you know how much you studied, can you predict your test score?
  • If you know the size of a house, can you predict its price?
In regression, we take input data (called features) and try to predict an output value. For example:
  • Features: Hours studied, number of practice tests.
  • Output: Test score.
The job of regression is to find the best equation that relates the features to the output. This equation looks like this: \begin{equation} y = b + w_1x_1 + w_2x_2 +\cdots+w_nx_n \end{equation} where:
  • \(y\) is the output (like your test score),
  • \(x_1, x_2, \ldots, x_n\) are the features (like hours studied or practice tests),
  • \(w_1, w_2,...,w_n\) are the weights (they tell us how important each feature is).
  • \(b\) is the bias (a constant number).

Step 2: Too Many Features Can Be a Problem?

Imagine you have a dataset with 100 features (input variables), but only 10 of them actually matter. If we use all 100 features, our model might get confused and give bad predictions. This is called overfitting, and it's like trying to study every topic when only a few are on the test.
We need to find a way to focus on the important features and ignore the unimportant ones. This is where Lasso Regression comes in.

Step 3: What is Lasso Regression?

Lasso regression is a special type of regression that does two things:
  1. It finds the best equation for prediction.
  2. It automatically removes the features that are not important.
Lasso does this by adding a penalty to the equation. This penalty forces some of the weights to become zero. If a weight is zero, it means the corresponding feature is not important and can be ignored.

Step 4: The Math Behind Lasso

When we train a model, we try to minimize something called the loss function. For regular regression, the loss function is: \begin{equation} Loss = Error = \sum (Actual - Prediction)^2 \end{equation} But in Lasso Regression, we add a penalty to the loss function: \begin{equation} Loss = Error + \lambda \sum|w_i| \end{equation} where:
  • \(\sum|w_i|\) is the sum of the absolute values of all the weights,
  • \(\lambda\) is a number that controls how strong the penalty is:
    • If the penalty is small, Lasso will behave like regular regression.
    • If the penalty is large, Lasso will make more weights zero.
By adding this penalty, Lasso automatically removes unimportant features (their weights become zero) and focuses only on the important ones.
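To make the penalty term concrete, the small sketch below computes this loss by hand for a toy set of predictions and weights (all numbers are made up for illustration):

import numpy as np

# Toy values, purely illustrative
actual     = np.array([3.0, 5.0, 7.0])
prediction = np.array([2.5, 5.5, 6.0])
weights    = np.array([1.2, 0.0, -0.8])
lam = 0.1  # penalty strength (lambda)

error   = np.sum((actual - prediction) ** 2)  # squared error term
penalty = lam * np.sum(np.abs(weights))       # L1 penalty term
loss    = error + penalty
print("Lasso loss:", loss)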

Step 5: A Simple Example

Let's say you're trying to predict test scores based on these features:
  1. Hours studied
  2. Hours spent watching TV
  3. Number of practice tests
  4. Favorite color (it is a silly feature)
If you use regular regression, the model might try to assign a weight to all these features, even though "Favorite color" clearly doesn't matter. But with Lasso Regression, the penalty will force the weight of "Favorite color" to become zero, so the final equation might look like this: \begin{equation} Score = b + w_1\cdot \text{Hours studied} + w_2\cdot \text{TV hours} + w_3\cdot \text{Practice tests} + 0\cdot \text{Favorite color} \end{equation} Now we know that "Favorite color" is not important, and we can safely ignore it.
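A tiny simulation of this idea is sketched below; the data is made up, with the target depending only on hours studied and practice tests, while the third column plays the role of the irrelevant "favorite color" feature:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
hours_studied  = rng.uniform(0, 10, 200)
practice_tests = rng.integers(0, 5, 200)
favorite_color = rng.integers(0, 3, 200)  # irrelevant, "silly" feature

# The true score depends only on studying and practice tests
score = 5 * hours_studied + 3 * practice_tests + rng.normal(0, 1, 200)

X = np.column_stack([hours_studied, practice_tests, favorite_color])
lasso = Lasso(alpha=0.5).fit(X, score)
print("Weights:", lasso.coef_)  # the last weight should be (close to) zero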

Step 6: Visualizing Lasso

Imagine you’re walking on a mountain. Regular regression tries to find the lowest point (minimum error). Lasso Regression also tries to find the lowest point, but it puts up walls around some features (the penalty). These walls prevent the model from giving too much importance to unimportant features.

Step 7: Why Use Lasso for Feature Selection?

  • Simpler models: Lasso removes unimportant features, making the model easier to understand.
  • Prevents overfitting: By ignoring noisy features, Lasso helps the model generalize better to new data.
  • Focus on what matters: It tells you which features are actually important.

Step 8: How to Use Lasso in Python?

In this example we will create a sample dataset using the make_regression function from the sklearn.datasets module. The generated dataset will contain 100 samples and 10 features (input variables), with a noise level of 0.1. The dataset will be split into training and test sets in an 80:20 ratio, and the 80% portion will be used to train the Lasso regression.
The first step is to import the required libraries/modules. For this example we will need the Lasso class from sklearn.linear_model, the train_test_split function from sklearn.model_selection, and the make_regression function from the sklearn.datasets module.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
The next step is to generate the data using the make_regression function and split it using the train_test_split function. The dataset created by make_regression will consist of 100 samples and 10 features, with the noise parameter set to 0.1. The noise parameter is the standard deviation of the Gaussian noise applied to the output. We then split the data in an 80:20 ratio by setting the test_size parameter to 0.2. To control the shuffling applied to the data before the split, we set the random_state parameter to 42.
X, y = make_regression(n_samples=100, n_features=10, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
The next step is to define the Lasso model with alpha (the penalty term, lambda in the math described earlier) equal to 0.1. After defining the Lasso regression and assigning it to the lasso variable, the training data is provided using the fit() function.
lasso = Lasso(alpha = 0.1)
lasso.fit(X_train,y_train)
The final step is to show the weights, or in other words the feature importances, obtained with the Lasso regression method. These are stored in the fitted model's .coef_ attribute.
print("Feature Importance:", lasso.coef_)
The output of the previous command line is given below.
Feature Importance: [56.76623236 20.40924724  9.14790695 23.73715764 30.96736664 39.12329335
            48.59523207 13.39841335 13.46403202 26.00418041]
Depending on an arbitrarily defined threshold value, you can remove features with lower weights (feature importances) and keep only those with higher values. For example, if we require the weight to be higher than 15, the dataset is reduced to 7 features, since only 7 features have importance values above 15.
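A short sketch of this thresholding step, assuming the lasso model and the X array from above (the threshold of 15 is arbitrary):

import numpy as np

threshold = 15
selected = np.abs(lasso.coef_) > threshold  # boolean mask of important features
X_reduced = X[:, selected]                  # keep only the selected columns
print("Selected feature indices:", np.where(selected)[0])
print("Reduced dataset shape:", X_reduced.shape)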
Finally, we can test the trained Lasso model on the test dataset and compute its score. The score in this case is the \(R^2\) score, which practically every sklearn regression estimator provides through the built-in score() method. To compute it, pass the X_test and y_test values as arguments.
score = lasso.score(X_test, y_test)
print("Model Score:", score)
The output of the previous code is given below.
Model Score: 0.9999862508164981
The obtained \(R^2\) value indicates that the trained model performs very well on unseen data, which suggests that overfitting did not occur. The entire code created in this example is given below.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
# Create some sample data
X, y = make_regression(n_samples=100, n_features=10, noise=0.1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Apply Lasso Regression
lasso = Lasso(alpha=0.1) # alpha is the penalty term (lambda in our math)
lasso.fit(X_train, y_train)
# Get the coefficients (weights)
print("Feature Importance:", lasso.coef_)
# Test the model
score = lasso.score(X_test, y_test)
print("Model Score:", score)

Step 9: Key Takeaways

Lasso Regression is like a helpful teacher who tells you which topics to focus on. It adds a penalty to remove unimportant features. It’s a great tool for creating simple, accurate models that avoid overfitting. Now you know how Lasso Regression works and why it’s useful for feature selection.