
Wednesday, January 1, 2025

Polynomial regression in Scikit-learn

Imagine you're trying to draw a line to connect a bunch of dots on a piece of paper. If the dots lie roughly along a straight line, you just draw a straight line, right? When a straight line passes through all the dots, or at least most of them, that's linear regression.
But what do you do when the dots form a curve, like the shape of a hill or a rollercoaster? A straight line won't fit very well. Instead, you need a bendy line that can go up and down to match the curve. That's where polynomial regression comes in!

What is Polynomial Regression?

Polynomial regression is like upgrading from a straight ruler to a flexible ruler that can bend. Instead of fitting just a straight line (\(y = mx + c\)), you use a formula of the form: \begin{equation} y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n \end{equation} where:
  • \(x\) - the input (the dots on the paper)
  • \(y\) - the output (the line you're drawing)
  • \(a_0, a_1, a_2, \ldots, a_n\) - the coefficients that the math figures out to make the line fit the dots
  • \(x^2, x^3, \ldots, x^n\) - the terms that make the line bend; the higher the power \(n\), the more bendy the line can be (a short code sketch below shows how to evaluate such a polynomial)
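To make the formula concrete, here is a minimal Python sketch that evaluates a degree-3 polynomial with NumPy. The coefficients are made up purely for illustration; note that numpy.polyval expects them ordered from the highest power down to \(a_0\).

import numpy as np

# Hypothetical polynomial: y = 1 + 0.5x + 2x^2 - 0.1x^3
# np.polyval expects coefficients from the highest power down to a_0.
coeffs = [-0.1, 2.0, 0.5, 1.0]   # a_3, a_2, a_1, a_0

x = np.linspace(0, 5, 6)         # inputs 0, 1, ..., 5
y = np.polyval(coeffs, x)        # evaluate the polynomial at each x
print(y)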
The natural question is: when should you use polynomial regression? Polynomial regression is appropriate when:
  1. The data doesn't fit a straight line but follows a curve.
  2. You notice patterns like ups and downs (e.g. growth trends, hills, valleys).
  3. You want a model that's simple but flexible enough to capture curves.

How to use the polynomial regression?

The application of the polynomial regression will be shown on the following example:
  1. Look at the data - Suppose you're measuring how fast a toy car rolls down a hill over time. The speed might increase slowly at first, then zoom up fast. The graph of this data would look like a curve.
  2. Pick a polynomial degree (\(n\)) - Start from the lowest degree (\(n=2\), a simple bendy line, a parabola). If that's not curvy enough, try \(n=3\), \(n=4\), and so on. But don't make it too bendy, or it will wiggle too much and fit random noise instead of the real pattern.
  3. Fit the equation - Use a computer to calculate the coefficients (\(a_0\), \(a_1\), \(a_2\), ...) that make the line match your data as closely as possible.
  4. Check the fit - Does the line match the dots? If not, adjust the degree of the polynomial. A short code sketch of this workflow follows the list.
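Here is a minimal sketch of that workflow using NumPy's polyfit. The toy-car numbers below are invented for illustration; the idea is simply to fit several degrees and compare how large the leftover errors are.

import numpy as np

# Illustrative data only: time (s) vs. speed of the toy car (m/s)
t = np.array([0, 1, 2, 3, 4, 5])
v = np.array([0.0, 0.3, 1.1, 2.6, 4.8, 7.9])

# Try a few polynomial degrees and compare how well each one fits
for degree in (1, 2, 3):
    coeffs = np.polyfit(t, v, degree)          # least-squares fit of that degree
    residuals = v - np.polyval(coeffs, t)      # errors at the data points
    sse = np.sum(residuals ** 2)               # sum of squared errors
    print(f"degree {degree}: coefficients {np.round(coeffs, 3)}, SSE {sse:.3f}")

A lower SSE means a closer fit, but remember the warning above: past some degree the extra "bendiness" only chases noise.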

Key Things to Remember

  1. Don't overdo it - If you make the polynomial too bendy (\(n\) too high), it will try to fit every single dot perfectly, even the random little bumps (noise). That's bad because the model won't work well on new data; this is called overfitting.
  2. Balance simplicity and accuracy - find the lowest degree \(n\) that fits the curve well.
It’s like building a toy car track. Sometimes a straight ramp is enough, but other times you need to add curves to make it exciting! That’s the magic of polynomial regression.

Example 1 - Estimating plant growth based on exposure to sunlight

You’re trying to figure out the relationship between the number of hours a plant gets sunlight (x) and how tall it grows (y). Your measurements are:
\(x\) (hours of sunlight) | \(y\) (height in cm)
1 | 2
2 | 6
3 | 10
4 | 18
5 | 26
The data from the table is graphically shown in Figure 1.
Figure 1 - Height in cm versus hours of sunlight
From Figure 1 it can be seen that the points cannot be fitted well with a straight line, so we will try polynomial regression of degree 2 (\(y = a_0 + a_1 x + a_2 x^2\)).

Step 1: Set up the equation

For degree 2, the equation in general form can be written as: \begin{equation} y = a_0 + a_1 x + a_2 x^2 \end{equation} In this equation we have to find \(a_0\), \(a_1\), and \(a_2\), which are the intercept, the linear coefficient, and the quadratic coefficient.

Step 2: Organize the data

\(x\) | \(y\) | \(x^2\)
1 | 2 | 1
2 | 6 | 4
3 | 10 | 9
4 | 18 | 16
5 | 26 | 25

Step 3: Write the system of equations

To solve for \(a_0\), \(a_1\), and \(a_2\), we use the normal equations derived from least squares:
  1. Sum of \(y\): \begin{equation} \sum y = na_0 + a_1\sum x + a_2\sum x^2 \end{equation}
  2. Sum of \(xy\): \begin{equation} \sum xy = a_0\sum x + a_1\sum x^2 + a_2\sum x^3 \end{equation}
  3. Sum of \(x^2y\): \begin{equation} \sum x^2y = a_0\sum x^2 + a_1\sum x^3 + a_2\sum x^4 \end{equation}

Step 4: Plug in the data

Now we have to calculate all the sums. \begin{equation} \sum x = 1+2+3+4+5 = 15 \end{equation} \begin{equation} \sum x^2 = 1+4+9+16+25 = 55 \end{equation} \begin{equation} \sum x^3 = 1+8+27+64+125 = 225 \end{equation} \begin{equation} \sum x^4 = 1+16+81+256+625 = 979 \end{equation} \begin{equation} \sum y = 2+6+10+18+26 = 62 \end{equation} \begin{equation} \sum xy = 1\cdot 2 + 2 \cdot 6 + 3 \cdot 10 + 4 \cdot 18 + 5 \cdot 26 = 246 \end{equation} \begin{equation} \sum x^2 y = 1 \cdot 2 + 4 \cdot 6 + 9 \cdot 10 + 16 \cdot 18 + 25 \cdot 26 = 1054 \end{equation} Substituting the obtained sums into the equations for \(\sum y\), \(\sum xy\), and \(\sum x^2 y\) gives the following linear equations: \begin{eqnarray} 62 &=& 5a_0 + 15a_1 + 55a_2\\ \nonumber 246 &=& 15a_0 + 55a_1 + 225 a_2 \\ \nonumber 1054 &=& 55a_0 + 225a_1 + 979 a_2 \end{eqnarray} These three equations can be solved by hand or with a calculator: simplify where possible, then use substitution or elimination to isolate \(a_0\), \(a_1\), and \(a_2\). Solving this system of three equations with three unknowns gives: \begin{equation} a_0 = 0.4, \quad a_1 \approx 0.857, \quad a_2 \approx 0.857 \end{equation}
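If you would rather let the computer do the algebra, the sums and the resulting 3x3 system can be handled with NumPy. This is a minimal sketch of that idea using numpy.linalg.solve, a standard linear-system solver:

import numpy as np

# Data from the table above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 6, 10, 18, 26], dtype=float)

# Sums needed for the degree-2 normal equations
Sx, Sx2, Sx3, Sx4 = x.sum(), (x**2).sum(), (x**3).sum(), (x**4).sum()
Sy, Sxy, Sx2y = y.sum(), (x * y).sum(), (x**2 * y).sum()

# Normal-equation system A @ [a0, a1, a2] = b
A = np.array([[len(x), Sx,  Sx2],
              [Sx,     Sx2, Sx3],
              [Sx2,    Sx3, Sx4]])
b = np.array([Sy, Sxy, Sx2y])

a0, a1, a2 = np.linalg.solve(A, b)
print(a0, a1, a2)   # approximately 0.4, 0.857, 0.857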

Step 5: Write the final equation

The polynomial regression equation can therefore be written as: \begin{equation} y = 0.4 + 0.857x + 0.857x^2 \end{equation} The result is graphically shown in Figure 2.
Figure 2 - Approximation of the data using polynomial regression.
As seen in Figure 2, using polynomial regression we have obtained a function that successfully approximates the data shown with blue points. The model is not overfitted, since the curve does not pass exactly through every single data sample.

Step 6: Use the equation

Now you can predict the plant height for any number of sunlight hours. For example, for \(x = 6\) (6 hours of sunlight) the predicted plant height is: \begin{equation} y = 0.4 + 0.857\cdot 6 + 0.857\cdot 6^2 \approx 36.4 \,[\mathrm{cm}] \end{equation}
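Since the post's title mentions Scikit-learn, here is a minimal sketch of how the same fit could be done there (assuming scikit-learn is installed). PolynomialFeatures expands the input into the columns \(x\) and \(x^2\), and LinearRegression then finds the coefficients by least squares:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hours of sunlight (features) and plant height in cm (labels)
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 6, 10, 18, 26])

# Expand x into [x, x^2] and fit an ordinary least-squares model
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)
model = LinearRegression().fit(x_poly, y)

print(model.intercept_, model.coef_)          # approximately 0.4, [0.857, 0.857]
print(model.predict(poly.transform([[6]])))   # predicted height for 6 hours of sunlight

The printed coefficients should agree with the hand calculation above, and the prediction for 6 hours of sunlight should come out near 36.4 cm.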

Logistic Regression for Binary Classification

Hey there! Let's talk about something called logistic regression. It's a fancy name, but I promise it's not too hard to understand. It's like a magical tool that helps us make decisions with just two options, like "yes" or "no," "cat" or "dog," or even "pass" or "fail."

What Is Logistic Regression?

Imagine you have a magical button. If you press it, it gives you a number between 0 and 1. That number tells you how confident the button is about something being true (like 1 is "yes" and 0 is "no"). Logistic regression is the math behind how this button works!

The Magic Formula

Logistic regression uses this formula:

\begin{equation} P = \frac{1}{1+e^{-z}} \end{equation}

Here:

  • \(P\) is the probability (a number between 0 and 1).
  • \(e\) is a special math number (around 2.718).
  • \(z\) is a score calculated as \(z = b_0 + b_1 x\), where:
    • \(b_0\) is the magic starting number (intercept).
    • \(b_1\) is the weight or importance of \(x\).
    • \(x\) is your input value.

Example Without Python

Let's say we're trying to predict if a person will like ice cream on a hot day (1 = yes, 0 = no). Our formula is:

\begin{equation} z = -1 + 0.5\cdot Temperature \end{equation}

If the temperature is 30°C:

\begin{eqnarray} z &=& -1 + 0.5\cdot 30 = 14\\ \nonumber P &=& \frac{1}{1+e^{-14}} \approx 0.999999 \end{eqnarray}

The probability is almost 1, so the person will most likely like ice cream!

Example With Python

Now, let’s calculate the same thing using Python:

import math

# Logistic regression function
def logistic_regression(temp):
    z = -1 + 0.5 * temp
    P = 1 / (1 + math.exp(-z))
    return P

# Predict for a temperature of 30°C
temperature = 30
probability = logistic_regression(temperature)
print(f"The probability of liking ice cream at {temperature}°C is {probability:.4f}")

When you run this, you'll see:

The probability of liking ice cream at 30°C is 1.0000
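In real projects you usually do not pick \(b_0\) and \(b_1\) by hand; a library such as scikit-learn estimates them from data. Here is a minimal sketch with invented training data (temperatures and yes/no answers made up for illustration) using LogisticRegression:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: temperature in °C and whether the person liked ice cream
temps = np.array([5, 10, 15, 20, 25, 30, 35]).reshape(-1, 1)
liked = np.array([0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(temps, liked)

# Probability of liking ice cream at 30°C (second column = probability of class 1)
print(model.predict_proba([[30]])[0, 1])
print(model.intercept_, model.coef_)   # the learned b0 and b1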

Conclusion

Here’s what we learned:

  • Logistic regression helps us predict yes/no or true/false decisions.
  • It uses a formula to calculate probabilities between 0 and 1.
  • It’s useful for problems like "Will it rain today?" or "Is this email spam?"

Now you know the basics of logistic regression! Keep practicing, and soon you’ll be a pro!

Saturday, December 7, 2024

What are Features and Labels in Machine Learning?

Data is a crucial element of ML, and understanding how to structure and interpret data is essential for building effective ML models. Two components are critical in any dataset used in ML: features (input variables) and labels (output/target variables). These two terms are foundational to the success of ML projects. In this post, we will dive into what features and labels are, how they work, and why they are important for building powerful ML models.

What are features (input variables) in ML?

As the title of this section suggests, features are the input variables, or independent variables, in your dataset that are used to make predictions. Each feature represents a specific measurable property or characteristic of the data you are analyzing. Features can be numerical, categorical, or even derived from raw data such as text, images, or audio.
Examples of features will be shown for four different publicly available datasets: a housing dataset, a weather dataset, an e-commerce dataset, and the combined cycle power plant dataset.
The housing dataset (Boston housing dataset) contains 14 columns in total. The target variable is the median value of owner-occupied homes in $1000's, so the remaining 13 columns are features, or input variables. These features are:
  • crim - per capita crime rate by town
  • zn - proportion of large residential lots (over 25,000 sq. ft.)
  • indus - proportion of non-retail business acres per town
  • chas - binary variable indicating if the property is near the Charles River (1 for yes, 0 for no)
  • nox - concentration of nitrogen oxides in the air
  • rm - average number of rooms per dwelling
  • age - proportion of owner-occupied units built before 1940
  • dis - weighted distances to Boston employment centers
  • rad - index of accessibility to radial highways
  • tax - property tax rate per $10,000
  • ptratio - pupil-teacher ratio by town
  • b - a transformed measure of the proportion of Black residents by town
  • lstat - percentage of the population with lower socioeconomic status
These 13 features are columns in the dataset and are used by the ML model to predict the label (output/target variable), which in this case is the median value of owner-occupied homes in $1000's.

The key characteristics of features

The key characteristics of features are their type, importance, and scaling. Features can be numerical or categorical. Numerical features are things like age and height, while categorical features are things like color or region.
In general, not all features contribute equally to the model's predictions. Some features have higher importance while others have lower importance; those with lower importance are usually irrelevant or redundant. Many algorithms are also sensitive to the scale of features, requiring normalization or standardization (i.e. feature scaling) before the features are used in the ML algorithm.

What are labels in ML?

Labels (output/target variables) are the dependent variables: they represent the answers or target values your model is trying to predict. They are the outcomes you use to evaluate how well your model is performing.
Examples of labels in different datasets are:
  • In a housing dataset - the label is the price of the house
  • In a weather dataset - the label could be whether it will rain tomorrow (yes/no)
  • In an e-commerce dataset - the label is whether the customer will purchase an item (yes/no)
Labels can be continuous values or discrete categories. Continuous labels are used for regression tasks, e.g. predicting house prices in the housing dataset. In classification tasks the label is a discrete category, e.g. predicting whether an email is spam or not.

Features Vs. Labels: Quick Comparison

Aspect | Features | Labels
Definition | Inputs to the model | Outputs the model is trained to predict
Role | Independent variables that explain the data | Dependent variable being explained
Example in Housing | Square footage, number of bedrooms | House price
Example in Weather | Temperature, humidity | Rain (yes/no)
Example in E-commerce | Product rating, time spent on page | Purchase decision (yes/no)

How Features and Labels Work Together?

The relationship between features and labels is at the heart of supervised learning. The process of supervised learning consists of the following steps (a small code illustration follows the list):
  • Data collection - Before training an ML model, a dataset consisting of features and labels must be gathered. For example, in the housing dataset the features are house attributes, while the labels are house prices. In the case of the combined cycle power plant, the features are ambient pressure, ambient temperature, and condenser vacuum, while the label is the generated power output of the CCPP.
  • Model training - Once the data is collected and prepared, the dataset is provided to the ML model. During training, the ML algorithm learns a function or mapping:
    $$f(features) = label$$ In a regression model, this might be learning to predict house prices based on square footage and location.
  • Prediction - Once the ML algorithm is trained, the model can predict labels for new, unseen features. For example, given the features of a house, the model can predict its price.
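As a small illustration of this loop (the house numbers below are invented), the features form a matrix X, the labels a vector y, and the trained model maps new, unseen features to a predicted label:

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: features = [square footage, number of bedrooms], label = price in $1000s
X = np.array([[1400, 3],
              [1600, 3],
              [1700, 4],
              [1875, 4],
              [2350, 5]])
y = np.array([245, 312, 279, 308, 450])

model = LinearRegression()
model.fit(X, y)                     # learn f(features) = label from the data

print(model.predict([[2000, 4]]))   # predicted price for a new, unseen house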

Why are features and labels important?

Features are important because they define the model's inputs, while labels are important because they define the objective. The choice and quality of features directly impact the performance of the model. Features that are irrelevant or redundant can lead to poor results, while well-selected features improve accuracy and efficiency.
Labels determine what the model is trying to predict. Without well-defined and accurate labels, the model cannot learn effectively.
Feature engineering, i.e. crafting and selecting the right features, is very important and can make or break an ML project. Derived features often provide additional predictive power.

Handling Features and Lables in Machine Learning

Feature engineering consists of selection, transformation, and scaling. Selection identifies the most relevant features; techniques like correlation analysis or feature importance scores are used here. Transformation converts categorical features into numerical ones (label encoding, one-hot encoding). Scaling is used to scale or normalize features so they are compatible with certain algorithms.
Label preprocessing consists of encoding, balancing, and cleaning. Encoding converts labels into numerical values for classification tasks (for example, spam and not spam into 1 and 0). Balancing handles imbalanced datasets, typically by applying undersampling or oversampling techniques. Cleaning ensures that labels are accurate and free from errors or bias.
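A brief sketch of some of these preprocessing steps with scikit-learn (the data below is invented; dataset balancing usually relies on a separate library such as imbalanced-learn and is not shown here):

import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder

# Invented example data
numeric_features = np.array([[25, 50000], [32, 64000], [47, 120000]], dtype=float)
categorical_features = np.array([["red"], ["green"], ["red"]])
labels = np.array(["spam", "not spam", "spam"])

# Scaling: put numeric features on a comparable scale
scaled = StandardScaler().fit_transform(numeric_features)

# Transformation: turn categories into numeric (one-hot) columns
encoded = OneHotEncoder().fit_transform(categorical_features).toarray()

# Label encoding: turn text labels into 0/1 for a classifier
y = LabelEncoder().fit_transform(labels)

print(scaled, encoded, y, sep="\n")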
Lable preprocessing consist of encoding, balancing, and cleaning. The econding is used to convert lables into numerical values for classification tasks (for example spam and not spam into 1 and 0). Balancing is used to handle imbalanced datasets. There are two approaches in balancing the dataset and these are applicaitons of undersampling and oversampling techniques. The cleaning is used to ensure labels are accurate and free from erros or bias.