In words, we assume that the data is drawn from a "line" $\mathbf{w}^\top\mathbf{x}$ through the origin (one can always add a bias / offset
through an additional dimension, similar to the
Perceptron). For each data point with
features $\mathbf{x}_i$, the label $y_i$ is drawn from a Gaussian with
mean $\mathbf{w}^\top\mathbf{x}_i$ and variance $\sigma^2$. Our task
is to estimate the slope $\mathbf{w}$ from the data.
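As a rough illustration of this generative story, the sketch below draws labels as a linear function of the features plus Gaussian noise, and shows the "additional dimension" trick for a bias. The function names, the standard-normal feature distribution, and the specific values of the weights and noise level are assumptions made only for this example.

```python
import numpy as np

def sample_linear_gaussian(w, n, sigma, rng):
    """Draw n points with y_i = w^T x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    d = w.shape[0]
    # Assumption: features are standard normal; the model itself does not require this.
    X = rng.normal(size=(d, n))              # columns are the feature vectors x_i
    eps = rng.normal(scale=sigma, size=n)    # Gaussian label noise
    y = w @ X + eps
    return X, y

def add_bias_dimension(X):
    """Append a constant-1 feature so the last weight acts as an offset."""
    return np.vstack([X, np.ones((1, X.shape[1]))])

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])               # hypothetical ground-truth slope
X, y = sample_linear_gaussian(w_true, n=500, sigma=0.1, rng=rng)
X_aug = add_bias_dimension(X)                # shape (3, 500): last row is all ones
```

With the extra constant feature, a weight vector of length 3 describes a line that no longer has to pass through the origin.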
Estimating with MLE
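Maximizing the log-likelihood of the labels under this Gaussian model is equivalent to minimizing a loss function, $l(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2$. This
particular loss function is also known as the squared loss. Linear regression with this loss is also known as Ordinary
Least Squares (OLS). OLS can be optimized with gradient descent or Newton's method. Because the loss is quadratic in $\mathbf{w}$, a single Newton step from any starting point reaches the minimizer, which yields a closed-form solution.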
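Closed Form: $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top)^{-1}\mathbf{X}\mathbf{y}^\top$ (assuming $\mathbf{X}\mathbf{X}^\top$ is invertible), where
$\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ holds one data point per column and
$\mathbf{y} = [y_1, \dots, y_n]$.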
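The three routes to the OLS solution can be compared numerically. The following is a minimal sketch, assuming the data matrix stores one data point per column and the loss is the mean squared error over the training set; the function names, step size, iteration count, and synthetic data are all choices made for this illustration.

```python
import numpy as np

def ols_closed_form(X, y):
    """Solve (X X^T) w = X y for w; X is d x n with data points as columns."""
    # np.linalg.solve avoids forming the explicit inverse.
    return np.linalg.solve(X @ X.T, X @ y)

def ols_gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize (1/n) * sum_i (x_i^T w - y_i)^2 by plain gradient descent."""
    d, n = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * X @ (X.T @ w - y)
        w -= lr * grad
    return w

def ols_newton_step(X, y, w0):
    """Take one Newton step on the same loss, starting from w0."""
    n = X.shape[1]
    grad = (2.0 / n) * X @ (X.T @ w0 - y)
    hessian = (2.0 / n) * X @ X.T            # constant, since the loss is quadratic
    return w0 - np.linalg.solve(hessian, grad)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])               # hypothetical ground-truth slope
X = rng.normal(size=(2, 200))
y = w_true @ X + rng.normal(scale=0.1, size=200)

w_cf = ols_closed_form(X, y)
w_gd = ols_gradient_descent(X, y)
w_nt = ols_newton_step(X, y, w0=np.array([5.0, -3.0]))
```

On this well-conditioned toy problem all three estimates should agree up to numerical precision. For ill-conditioned or very high-dimensional data, the step size for gradient descent would need tuning and the linear solve may become the bottleneck.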
Estimating with MAP
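Additional Model Assumption: $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \tau^2\mathbf{I})$, i.e. a zero-mean Gaussian prior on the weights.
Maximizing the posterior is then equivalent to minimizing $\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2 + \lambda\|\mathbf{w}\|_2^2$ with $\lambda = \frac{\sigma^2}{\tau^2}$.
This objective is known as Ridge Regression. It has a closed form solution
of: $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1}\mathbf{X}\mathbf{y}^\top$, where
$\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ holds one data point per column and
$\mathbf{y} = [y_1, \dots, y_n]$.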
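A minimal sketch of the ridge closed form follows. It assumes the objective is the sum of squared errors plus a regularization strength times the squared Euclidean norm of the weights, with data points stored as columns. The function name, the synthetic data, and the regularization value of 10 are assumptions for illustration only.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X X^T + lam * I) w = X y for w; X is d x n with data points as columns."""
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])               # hypothetical ground-truth slope
X = rng.normal(size=(2, 50))
y = w_true @ X + rng.normal(scale=0.5, size=50)

w_ols = ridge_closed_form(X, y, lam=0.0)     # lam = 0 recovers the OLS solution
w_ridge = ridge_closed_form(X, y, lam=10.0)  # lam > 0 shrinks the weights toward zero

# Half the gradient of the ridge objective at w_ridge; it should vanish at the minimizer.
half_grad = X @ (X.T @ w_ridge - y) + 10.0 * w_ridge
```

Setting the regularization strength to zero recovers OLS, while larger values pull the estimate toward the prior mean at the origin. Adding the scaled identity also makes the linear system solvable when the unregularized matrix is singular.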