NYU K12 STEM Education: Machine Learning (Day 2)


Statistics Review

In machine learning, a solid understanding of basic statistical concepts is essential for analyzing data and interpreting model results.

Mean

The mean, or average, is the sum of all the values in a dataset divided by the number of values. It provides a measure of the central tendency of the data.

Formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Example:

For the dataset [2, 4, 6, 8, 10]: $\mu = \frac{2+4+6+8+10}{5} = 6$
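The mean calculation above can be sketched in a few lines of NumPy (a sketch for illustration; variable names are my own):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
mean = data.sum() / len(data)  # sum of values divided by the count
print(mean)                    # 6.0 — same result as data.mean()
```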

Variance

Variance measures the spread of the data points around the mean. It indicates how much the data varies from the mean.

Formula: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

Example:

For the dataset [2, 4, 6, 8, 10]: $\sigma^2 = \frac{(2-6)^2+(4-6)^2+(6-6)^2+(8-6)^2+(10-6)^2}{5} = 8$
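The same variance can be computed directly from the formula (a sketch; note this is the population variance, dividing by $N$):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
mu = data.mean()
variance = ((data - mu) ** 2).mean()  # average squared deviation from the mean
print(variance)                       # 8.0 — same result as data.var()
```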

Mean and Variance Visualization

Figure 1: Mean and Variance Visualized
Figure 2: Wide Spread Dataset
Figure 3: Less Spread Dataset

Standard Deviation

Standard deviation is the square root of the variance. It provides a measure of the spread of the data points in the same units as the data itself.

Formula: $\sigma = \sqrt{\sigma^2}$

Example:

Using the variance calculated above: $\sigma = \sqrt{8} \approx 2.83$
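Since the standard deviation is just the square root of the variance, the computation is a one-liner (sketch):

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10])
sigma = np.sqrt(data.var())  # square root of the population variance
print(round(sigma, 2))       # 2.83 — same result as data.std()
```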

Standard Deviation Visualization

Figure 4: Standard Deviation visualized on wide spread dataset
Figure 5: Standard Deviation visualized on less spread dataset

Covariance

Covariance measures the degree to which two variables change together. If the covariance is positive, the variables tend to increase together; if negative, one variable tends to increase when the other decreases.

Formula: $\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)$

Example: For the datasets X = [2, 4, 6] and Y = [3, 6, 9]: $\mathrm{Cov}(X, Y) = \frac{(2-4)(3-6) + (4-4)(6-6) + (6-4)(9-6)}{3} = \frac{6 + 0 + 6}{3} = 4$

Figure 6: Covariance visualized
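The covariance formula translates directly into vectorized code (a sketch, using the population normalization $1/N$ to match the formula above):

```python
import numpy as np

X = np.array([2, 4, 6])
Y = np.array([3, 6, 9])
# Average of the products of deviations from each mean.
cov = ((X - X.mean()) * (Y - Y.mean())).mean()
print(cov)  # 4.0
# np.cov(X, Y, bias=True)[0, 1] gives the same value.
```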

Linear Regression

In a Nutshell…

  • Consider a function $y = 2x + 1$.
  • Here we introduce a new notation: $f(x) = 2x + 1$.
  • This means we have a function $f(x)$ with $x$ as its variable.
  • Different values of $x$ give different values of $f(x)$. Example:
    • For $f(x) = 2x + 1$, setting $x = 1$ gives $f(x) = 3$.
    • For $f(x) = 2x + 1$, setting $x = 0$ gives $f(x) = 1$.
    • For $f(x) = 2x + 1$, setting $x = -1.5$ gives $f(x) = -2$.
  • We believe that datasets are representations of underlying models, which can be expressed as functions of features.
  • For example, to build a model that forecasts the weather, we can use the features humidity, current temperature, and wind speed to estimate what the temperature will be tomorrow.
  • Here $f(x)$ represents tomorrow's temperature, and $x$ is a vector containing humidity, current temperature, and wind speed.
  • But many times we do not have $f(x)$ available; our task is to figure out what $f(x)$ is using the data available to us.
  • Here $f(x)$ is called a model.
  • In other words, we want to find a model that fits the data.
  • It is easier to have a "framework" for the model ready and find the model parameters using the data:
    • $f(x) = w_1 x + w_0$
    • $f(x) = w_2 x^2 + w_1 x + w_0$
    • $f(x) = \frac{1}{e^{-(w_1 x + w_0)} + 1}$
  • The numbers $w_0$, $w_1$, and $w_2$ are called model parameters.
  • We often write the model as $f(x; w)$, stacking all parameters into a vector $w$.

Structure of a dataset

  • A dataset contains many data points.
  • We can represent each data point as $(x_i, y_i)$, $i = 1, 2, 3, \ldots$
  • $x_i$ is called the feature and $y_i$ is called the label.
  • The relationship between $x_i$, $y_i$, and the model $f$ is $f(x_i) \approx y_i$ — a good model only needs to come close.
  • For example, if the weather forecast says it will be 21°C (69.8°F) and it turns out to be 22°C (71.6°F), you won't be yelling at the TV.

How would you fit a line?

Can you find a line that passes through (0, 0) and (1, 1)?

  • The "framework" of the model is $f(x) = w_1 x + w_0$
  • The data is $(x = 0,\ y = 0)$ and $(x = 1,\ y = 1)$.
  • The process of finding a model to fit the data is to find the values of $w_1$ and $w_0$.
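With two data points and two unknowns, this is just a small linear system. A sketch of solving it with NumPy:

```python
import numpy as np

# Each point (x, y) gives one equation: w0 + w1*x = y.
# Points (0, 0) and (1, 1) give rows [1, x_i] of the system matrix.
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
y = np.array([0.0, 1.0])
w0, w1 = np.linalg.solve(A, y)
print(w0, w1)  # 0.0 1.0, i.e. f(x) = 1*x + 0
```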

How would you fit a quadratic curve?

Can you find a quadratic curve that passes through (0, 0), (1, 1) and (−1, 1)?

  • The "framework" of the model is $f(x) = w_2 x^2 + w_1 x + w_0$
  • The data is $(x = 0,\ y = 0)$, $(x = 1,\ y = 1)$, and $(x = -1,\ y = 1)$.
  • The process of finding a model to fit the data is to find the values of $w_2$, $w_1$, and $w_0$.
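Three points and three unknowns again give a solvable linear system — the model is nonlinear in $x$ but linear in the parameters. A sketch:

```python
import numpy as np

# Points (0, 0), (1, 1), (-1, 1); each gives w0 + w1*x + w2*x^2 = y.
xs = np.array([0.0, 1.0, -1.0])
ys = np.array([0.0, 1.0, 1.0])
A = np.column_stack([np.ones_like(xs), xs, xs ** 2])  # rows: [1, x, x^2]
w0, w1, w2 = np.linalg.solve(A, ys)
print(w0, w1, w2)  # approximately 0, 0, 1 -> f(x) = x^2
```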

Is Your Model a Good Fit?

  • How would you determine if your model is a good fit or not?
    • Is there a quantitative way?
  • We now introduce a new notation: $f(x_i) = \hat{y}_i$, where the hat ($\hat{\ }$) indicates that $f(x_i)$ is a prediction of $y_i$.

Error Functions

  • An error function quantifies the discrepancy between your model and the data.
    • Error functions are non-negative and go to zero as the model gets better.
  • Common error functions:
    • Mean Squared Error: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
    • Mean Absolute Error: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$
  • In later units, we will refer to these as cost functions or loss functions.
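Both error functions are one-liners in vectorized code. A sketch with made-up labels and predictions (the function names and example values are my own):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average of squared differences."""
    return ((y - y_hat) ** 2).mean()

def mae(y, y_hat):
    """Mean absolute error: average of absolute differences."""
    return np.abs(y - y_hat).mean()

y = np.array([1.0, 2.0, 3.0])      # true labels
y_hat = np.array([1.5, 2.0, 2.0])  # model predictions
print(mse(y, y_hat))  # (0.25 + 0 + 1) / 3
print(mae(y, y_hat))  # (0.5 + 0 + 1) / 3 = 0.5
```

Note that MSE penalizes large errors more heavily because the differences are squared.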

Linear Regression

  • Linear models: for a scalar-valued feature $x$, this is $f(x) = w_1 x + w_0$
  • One of the simplest machine learning models, yet very powerful.

Least Squares Solution

  • Model: $f(x) = w_1 x + w_0$
  • Loss: $J(w_0, w_1) = \frac{1}{N}\sum_{i=1}^{N} (y_i - f(x_i))^2$
  • Optimization: find $w_0$, $w_1$ such that $J(w_0, w_1)$ is as small as possible (hence the name "least squares").
Figure 7: Loss Landscape

Using the Pseudo-Inverse

  • For $N$ data points $(x_i, y_i)$ we have:
    $\hat{y}_1 = w_0 + w_1 x_1$
    $\hat{y}_2 = w_0 + w_1 x_2$
    $\vdots$
    $\hat{y}_N = w_0 + w_1 x_N$
  • In matrix form: $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$
  • We can write this as $\hat{Y} = Xw$. We call $X$ the design matrix.
  • We can put the desired labels in vector form as well: $Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$
  • Our goal is to minimize the error between $Y$ and $\hat{Y}$, which can be written as $\|Y - \hat{Y}\|^2$.
  • The least-squares solution is $w = (X^T X)^{-1} X^T Y$, where $(X^T X)^{-1} X^T$ is the pseudo-inverse of $X$.
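A sketch of the normal-equation solution on synthetic data (the data here is made up: a line $y = 2x + 1$ plus noise). In practice `np.linalg.lstsq` or `np.linalg.pinv` is numerically safer than inverting $X^TX$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)  # noisy line

X = np.column_stack([np.ones_like(x), x])  # design matrix: rows [1, x_i]
w = np.linalg.inv(X.T @ X) @ X.T @ y       # w = (X^T X)^{-1} X^T Y
print(w)                                   # close to [1, 2] = [w0, w1]
```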

Multilinear Regression

  • What if we have multivariate data, with $x$ being a vector?
  • Example: $x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix}$

    $\hat{y}_1 = w_0 + w_1 x_{11} + w_2 x_{12}$
    $\hat{y}_2 = w_0 + w_1 x_{21} + w_2 x_{22}$
    $\vdots$
    $\hat{y}_N = w_0 + w_1 x_{N1} + w_2 x_{N2}$

  • The model can be written as: $\hat{y}_i = \begin{bmatrix} 1 & x_{i1} & x_{i2} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}$
  • In matrix-vector form: $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{N1} & x_{N2} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}$
  • The solution remains the same: $w = (X^T X)^{-1} X^T Y$
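The same normal-equation code works for any number of features; only the design matrix gets wider. A sketch on synthetic two-feature data (the true parameters here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
features = rng.uniform(size=(N, 2))                 # two features per sample
y = 0.5 + 2.0 * features[:, 0] - 1.0 * features[:, 1]  # noiseless true model

X = np.column_stack([np.ones(N), features])         # rows: [1, x_i1, x_i2]
w = np.linalg.inv(X.T @ X) @ X.T @ y                # same formula as before
print(w)                                            # recovers [0.5, 2.0, -1.0]
```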

Demos

  1. Vectorized Programming
  2. Plotting Functions
  3. Ice-breaker Dataset
  4. Linear Regression
  5. Multivariable Linear Regression

References