Linear Regression
NYU K12 STEM Education: Machine Learning (Day 2)
Statistics Review
In machine learning, a solid understanding of basic statistical concepts is essential for analyzing data and interpreting model results.
Mean
The mean, or average, is the sum of all the values in a dataset divided by the number of values. It provides a measure of the central tendency of the data.
Formula:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Example:
For the dataset [2, 4, 6, 8, 10]:
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
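As a quick check, the same computation in a few lines of NumPy (a sketch; `np.mean` performs the identical sum-and-divide):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
mean = data.sum() / len(data)  # (2 + 4 + 6 + 8 + 10) / 5
print(mean)  # 6.0
```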
Variance
Variance measures the spread of the data points around the mean. It indicates how much the data varies from the mean.
Formula:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$
Example:
For the dataset [2, 4, 6, 8, 10], with mean $\bar{x} = 6$:
$$\sigma^2 = \frac{(-4)^2 + (-2)^2 + 0^2 + 2^2 + 4^2}{5} = \frac{40}{5} = 8$$
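The variance calculation can be sketched in NumPy as the average squared deviation from the mean (`np.var` uses the same divide-by-$N$ convention by default):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
# Average squared deviation from the mean (population variance).
variance = np.mean((data - data.mean()) ** 2)
print(variance)  # 8.0
```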
Mean and Variance Visualization
Standard Deviation
Standard deviation is the square root of the variance. It provides a measure of the spread of the data points in the same units as the data itself.
Formula:
$$\sigma = \sqrt{\sigma^2}$$
Example:
Using the variance calculated above:
$$\sigma = \sqrt{8} \approx 2.83$$
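And the standard deviation, as the square root of that variance (a sketch; `np.std` gives the same value):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
variance = np.mean((data - data.mean()) ** 2)
std = np.sqrt(variance)  # back in the same units as the data
print(std)  # approximately 2.83
```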
Standard Deviation Visualization
Covariance
Covariance measures the degree to which two variables change together. If the covariance is positive, the variables tend to increase together; if negative, one variable tends to increase when the other decreases.
Formula:
$$\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
Example:
For the datasets X = [2, 4, 6] and Y = [3, 6, 9], with $\bar{x} = 4$ and $\bar{y} = 6$:
$$\mathrm{Cov}(X, Y) = \frac{(-2)(-3) + (0)(0) + (2)(3)}{3} = \frac{12}{3} = 4$$
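A NumPy sketch of the covariance example, as the average product of deviations from each mean:

```python
import numpy as np

X = np.array([2.0, 4.0, 6.0])
Y = np.array([3.0, 6.0, 9.0])
# Population covariance: average product of deviations from each mean.
cov = np.mean((X - X.mean()) * (Y - Y.mean()))
print(cov)  # 4.0
```

Note that `np.cov(X, Y, bias=True)[0, 1]` gives the same value; without `bias=True`, NumPy divides by $N - 1$ instead of $N$.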

Linear Regression
In a Nutshell…
- Consider a function $f(x) = ax$.
- Here we introduce a new notation: $f(x; a)$.
- What this means is that we have a function $f$ which has $x$ as its variable and $a$ as a parameter.
- If we have different $a$ values, we will have different functions $f(x; a)$. Example:
  - For $f(x; a) = ax$ and setting $a = 1$, we have $f(x; 1) = x$.
  - For $f(x; a) = ax$ and setting $a = 2$, we have $f(x; 2) = 2x$.
  - For $f(x; a) = ax$ and setting $a = 3$, we have $f(x; 3) = 3x$.
- We believe that datasets are representations of underlying models, which can be expressed as functions of features.
- For example, to build a model that forecasts the weather, we can use the features humidity, current temperature, and wind speed to estimate what the temperature will be tomorrow.
- Here $y$ represents tomorrow's temperature, and $x$ is a vector containing humidity, current temperature, and wind speed.
- But many times we do not have $f$ available; our task is to figure out what $f$ is using the data available to us.
- Here $f$ is called a model.
- In other words, we want to find a model that fits the data.
- It is easier to have a “framework” of the model ready, and then find the model parameters using the data.
- For a linear framework $f(x; w, b) = wx + b$, the numbers $w$ and $b$ are called model parameters.
- We often write the model as $f(x; \theta)$, stacking all parameters into a vector $\theta$.
Structure of a dataset
- A dataset contains many data points.
- We can represent each data point as a pair $(x_i, y_i)$: $x_i$ is called the feature and $y_i$ is called the label.
- The relationship between $x_i$, $y_i$, and the model is $y_i \approx f(x_i; \theta)$.
- The approximation is enough in practice: if the weather forecast says it will be 21 °C (69.8 °F) and it turns out to be 22 °C (71.6 °F), you won't be yelling at the TV.
How would you fit a line?
Can you find a line that passes through (0, 0) and (1, 1)?
- The “framework” of the model is $f(x) = wx + b$.
- The data is $(0, 0)$ and $(1, 1)$.
- The process of finding a model to fit the data is to find the values of $w$ and $b$.
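The two points above give two equations, $w \cdot 0 + b = 0$ and $w \cdot 1 + b = 1$, which form a small linear system. A sketch of solving it with NumPy:

```python
import numpy as np

# Each row is [x, 1] for one point, so A @ [w, b] gives w*x + b.
A = np.array([[0.0, 1.0],   # point (0, 0)
              [1.0, 1.0]])  # point (1, 1)
y = np.array([0.0, 1.0])
w, b = np.linalg.solve(A, y)
print(w, b)  # w = 1, b = 0, i.e. the line f(x) = x
```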
How would you fit a quadratic curve?
Can you find a quadratic curve that passes through (0, 0), (1, 1) and (−1, 1)?
- The “framework” of the model is $f(x) = ax^2 + bx + c$.
- The data is $(0, 0)$, $(1, 1)$, and $(-1, 1)$.
- The process of finding a model to fit the data is to find the values of $a$, $b$, and $c$.
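Likewise, the three points give three equations in $a$, $b$, $c$. A NumPy sketch:

```python
import numpy as np

# Each row is [x**2, x, 1] for one point, so A @ [a, b, c] gives a*x**2 + b*x + c.
A = np.array([[0.0,  0.0, 1.0],   # point (0, 0)
              [1.0,  1.0, 1.0],   # point (1, 1)
              [1.0, -1.0, 1.0]])  # point (-1, 1)
y = np.array([0.0, 1.0, 1.0])
a, b, c = np.linalg.solve(A, y)
print(a, b, c)  # a = 1, b = 0, c = 0, i.e. the curve f(x) = x**2
```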
Is Your Model a Good Fit?
- How would you determine whether your model is a good fit or not?
- Is there a quantitative way?
- We now introduce a new notation: $\hat{y}$. The hat indicates that $\hat{y}$ is a prediction of $y$.
Error Functions
- An error function quantifies the discrepancy between your model and the data.
- Error functions are non-negative, and they go to zero as the model fits the data better.
- Common error functions:
  - Mean Squared Error: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
  - Mean Absolute Error: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$
  - Root Mean Squared Error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$
- In later units, we will refer to these as cost functions or loss functions.
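A minimal NumPy sketch of MSE and MAE; the labels and predictions here are made-up values for illustration:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average of the squared residuals."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error: average of the absolute residuals."""
    return np.mean(np.abs(y - y_hat))

y = np.array([1.0, 2.0, 3.0])      # true labels (made up)
y_hat = np.array([1.5, 2.0, 2.0])  # model predictions (made up)
print(mse(y, y_hat))  # (0.25 + 0 + 1) / 3, about 0.417
print(mae(y, y_hat))  # (0.5 + 0 + 1) / 3 = 0.5
```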
Linear Regression
- Linear models: for a scalar-valued feature $x$, the model is $f(x; w, b) = wx + b$.
- This is one of the simplest machine learning models, yet it is very powerful.
Least Squares Solution
- Model: $\hat{y}_i = wx_i + b$
- Loss: $J(w, b) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
- Optimization: find $w$ and $b$ such that the loss $J(w, b)$ takes the least possible value (hence the name “least squares”).

Using Pseudo-Inverse
- For $N$ data points $(x_i, y_i)$ we have $\hat{y}_i = wx_i + b$, for $i = 1, \dots, N$.
- In matrix form we have:
$$\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_N \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} w \\ b \end{bmatrix}$$
- We can write this as $\hat{y} = A\theta$. We call $A$ the design matrix.
- We can put the desired labels in matrix form as well: $y = [y_1, \dots, y_N]^T$.
- Our goal is to minimize the error between $\hat{y}$ and $y$, which can be written as $\min_\theta \|y - A\theta\|^2$. The least squares solution uses the pseudo-inverse of $A$: $\theta = A^+ y = (A^T A)^{-1} A^T y$.
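A sketch of the pseudo-inverse solution on made-up noisy data roughly following $y = 2x + 1$ (the data values are assumptions for illustration, not from the lecture):

```python
import numpy as np

# Made-up noisy observations of the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix A: one row [x_i, 1] per data point.
A = np.column_stack([x, np.ones_like(x)])

# theta = pinv(A) @ y implements (A^T A)^{-1} A^T y.
theta = np.linalg.pinv(A) @ y
w, b = theta
print(w, b)  # close to 2 and 1
```

`np.linalg.lstsq(A, y, rcond=None)` computes the same least squares solution and is preferred for large problems.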
Multilinear Regression
- What if we have multivariate data, with $x$ being a vector?
- Example: $x = (x_1, x_2, \dots, x_d)$, e.g. humidity, current temperature, and wind speed.
- The model can be written as: $f(x; w, b) = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$
- In matrix-vector form: $f(x; w, b) = w^T x + b$
- The solution remains the same: stack each feature vector (followed by a 1) as a row of the design matrix $A$ and compute $\theta = A^+ y$.
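The same recipe extends directly to vector features; a sketch on synthetic data with three features (all values here are assumptions chosen so we can check the recovered parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # 100 samples, 3 features (e.g. humidity, temp, wind)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0               # noiseless labels from a known linear model, b = 3

# Design matrix: the features plus a column of ones for the bias term.
A = np.column_stack([X, np.ones(len(X))])
theta = np.linalg.pinv(A) @ y
print(theta)  # approximately [2.0, -1.0, 0.5, 3.0]
```

Because the synthetic labels are noiseless, the least squares solution recovers the true parameters up to floating-point error.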
Demos
- Vectorized Programming
- Plotting Functions
- Ice-breaker Dataset
- Linear Regression
- Multivariable Linear Regression