
Lecture 3: The Perceptron


Assumptions

  1. Binary classification (i.e. $y_i \in \{-1, +1\}$)
  2. Data is linearly separable

Classifier

$$h(\vec{x}_i) = \textrm{sign}(\vec{w} \cdot \vec{x}_i + b)$$

$b$ is the bias term (without the bias term, the hyperplane that $\vec{w}$ defines would always have to go through the origin). Dealing with $b$ can be a pain, so we 'absorb' it into the feature vector $\vec{w}$ by adding one additional constant dimension. Under this convention,

$$\vec{x}_i \text{ becomes } \begin{bmatrix} \vec{x}_i \\ 1 \end{bmatrix}, \qquad \vec{w} \text{ becomes } \begin{bmatrix} \vec{w} \\ b \end{bmatrix}.$$

We can verify that

$$\begin{bmatrix} \vec{x}_i \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \vec{w} \\ b \end{bmatrix} = \vec{w} \cdot \vec{x}_i + b.$$

Using this, we can simplify the above formulation of $h(\vec{x}_i)$ to

$$h(\vec{x}_i) = \textrm{sign}(\vec{w} \cdot \vec{x}_i)$$
(Left:) The original data is 1-dimensional (top row) or 2-dimensional (bottom row). There is no hyperplane that passes through the origin and separates the red and blue points. (Right:) After a constant dimension is added to all data points, such a hyperplane exists.
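As a quick sanity check, here is a minimal NumPy sketch of the absorption trick; the concrete numbers for `x`, `w`, and `b` are made up for illustration:

```python
import numpy as np

x = np.array([2.0, -1.0])  # original data point
w = np.array([0.5, 1.5])   # original weight vector
b = -0.25                  # bias term

# Absorb the bias: append a constant 1 to x and append b to w.
x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)

# Both formulations produce the same activation.
assert np.isclose(w_aug @ x_aug, w @ x + b)
```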
Observation: Note that $y_i(\vec{w} \cdot \vec{x}_i) > 0 \Longleftrightarrow \vec{x}_i$ is classified correctly, where 'classified correctly' means that $\vec{x}_i$ is on the correct side of the hyperplane defined by $\vec{w}$. Also, note that the left side depends on $y_i \in \{-1, +1\}$ (it wouldn't work if, for example, $y_i \in \{0, +1\}$).
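A short sketch of this correctness check in code, reusing the hypothetical augmented vectors from above:

```python
import numpy as np

w = np.array([0.5, 1.5, -0.25])  # weight vector with the bias absorbed
x = np.array([2.0, -1.0, 1.0])   # data point with the constant 1 appended
y = -1                           # made-up label for illustration

# x is classified correctly iff y agrees in sign with the activation,
# i.e. y * (w . x) > 0. With labels in {0, +1} the product would be 0
# for every y = 0 point, so the test would break down.
print(y * (w @ x) > 0)
```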

Perceptron Algorithm

Now that we know what $\vec{w}$ is supposed to do (define a hyperplane that separates the data), let's look at how we can find such a $\vec{w}$.

Perceptron Algorithm


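A minimal NumPy sketch of the training loop: whenever a point is misclassified, add $y_i \vec{x}_i$ to the weight vector and try again. The function name and the `max_epochs` safeguard are my own additions; `X` is assumed to hold the augmented points row-wise, with labels `y` in $\{-1, +1\}$:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Train a Perceptron.

    X : (n, d) array, one augmented point per row (constant 1 appended)
    y : (n,) array of labels in {-1, +1}
    """
    w = np.zeros(X.shape[1])           # start with the all-zeros vector
    for _ in range(max_epochs):        # safeguard for non-separable data
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:   # x_i misclassified (or on the boundary)
                w += y_i * x_i         # nudge the hyperplane toward/away from x_i
                mistakes += 1
        if mistakes == 0:              # a full clean pass: converged
            return w
    raise RuntimeError("no convergence; data may not be linearly separable")
```

On linearly separable data the loop terminates with a separating $\vec{w}$; the convergence section below bounds how many updates this can take.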
Geometric Intuition

Illustration of a Perceptron update. (Left:) The hyperplane defined by $\vec{w}_t$ misclassifies one red (-1) and one blue (+1) point. (Middle:) The red point $\vec{x}$ is chosen and used for an update. Because its label is -1 we need to subtract $\vec{x}$ from $\vec{w}_t$. (Right:) The updated hyperplane $\vec{w}_{t+1} = \vec{w}_t - \vec{x}$ separates the two classes and the Perceptron algorithm has converged.

Quiz: How often can a Perceptron misclassify a point x repeatedly?

Perceptron Convergence

Suppose that $\exists \vec{w}^*$ such that $y_i(\vec{w}^* \cdot \vec{x}_i) > 0$ $\forall (\vec{x}_i, y_i) \in D$.

Now, suppose that we rescale each data point and $\vec{w}^*$ such that

$$||\vec{w}^*|| = 1 \quad \text{and} \quad ||\vec{x}_i|| \le 1 \;\; \forall \vec{x}_i \in D.$$

The Margin of a hyperplane, $\gamma$, is defined as

$$\gamma = \min_{(\vec{x}_i, y_i) \in D} |\vec{w}^* \cdot \vec{x}_i|.$$

We can visualize this as follows


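A tiny sketch of this definition in code; `w_star` and `X` are assumed to already satisfy the rescaling above:

```python
import numpy as np

def margin(w_star, X):
    """Margin of the hyperplane defined by w_star over the points in X.

    w_star : (d,) array with ||w_star|| = 1
    X      : (n, d) array of points with ||x_i|| <= 1
    """
    return np.min(np.abs(X @ w_star))  # smallest distance to the hyperplane
```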
Theorem: If all of the above holds, then the Perceptron algorithm makes at most $1/\gamma^2$ mistakes.

Proof:
Keeping what we defined above, consider the effect of an update ($\vec{w}$ becomes $\vec{w} + y\vec{x}$) on the two terms $\vec{w} \cdot \vec{w}^*$ and $\vec{w} \cdot \vec{w}$. We will use two facts:

  1. $y(\vec{x} \cdot \vec{w}) \le 0$: this holds because $\vec{x}$ was misclassified by $\vec{w}$, otherwise we wouldn't have made the update.
  2. $y(\vec{x} \cdot \vec{w}^*) > 0$: this holds because $\vec{w}^*$ is a perfect classifier, so every point in $D$ is classified correctly.
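For orientation, here is a sketch of how a single update moves these two terms, using only the two facts above together with the rescaling assumptions:

$$\begin{align*}
(\vec{w} + y\vec{x}) \cdot \vec{w}^* &= \vec{w} \cdot \vec{w}^* + y(\vec{x} \cdot \vec{w}^*) \;\ge\; \vec{w} \cdot \vec{w}^* + \gamma \\
(\vec{w} + y\vec{x}) \cdot (\vec{w} + y\vec{x}) &= \vec{w} \cdot \vec{w} + 2y(\vec{w} \cdot \vec{x}) + y^2(\vec{x} \cdot \vec{x}) \;\le\; \vec{w} \cdot \vec{w} + 1
\end{align*}$$

After $M$ updates (starting from $\vec{w} = \vec{0}$) this gives $M\gamma \le \vec{w} \cdot \vec{w}^* \le ||\vec{w}|| = \sqrt{\vec{w} \cdot \vec{w}} \le \sqrt{M}$, and squaring yields $M \le 1/\gamma^2$.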