We have updated the tests for this project. Please go to the Piazza resources section for the latest version. The autograder is using the latest version. (November 16, 2015)
"Just as we have two eyes and two feet,
duality is part of life."
--Carlos Santana
Files you'll edit:

l2distance.m | Computes the Euclidean distances between two sets of vectors. You can copy this function from previous assignments.
computeK.m | Computes a kernel matrix given input vectors.
generateQP.m | Generates all the right matrices and vectors to call the qp command in Octave.
recoverBias.m | Solves for the hyperplane bias b after the SVM dual has been solved.
trainsvm.m | Trains a kernel SVM on a data set and outputs a classifier.
crossvalidate.m | A function that uses cross validation to find the best kernel parameter and regularization constant.
autosvm.m | Take a look at this function. It takes as input a data set and its labels, cross-validates over the hyper-parameters of your SVM, and outputs a well-tuned classifier. You can "tune" this for the in-class competition.

Files you might want to look at:

visdecision.m | This function visualizes the decision boundary of an SVM in 2d.
extractpars.m | This function extracts parameters. It is a helper function for visdecision.
We provide you with a "spiral" data set. You can load it and visualize it with the following commands:
>> load spiral
>> whos
Variables in the current scope:

  Attr Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
       xTe         2x242                     3872  double
       xTr         2x100                     1600  double
       yTe         1x242                     1936  double
       yTr         1x100                      800  double

Total is 1227 elements using 9808 bytes

>> dummy=@(X) sign(mean(X,1));
>> visdecision(xTr,yTr,dummy);
visdecision visualizes the data set and the decision boundary of a classifier. In this case the dummy classifier is not particularly powerful and only serves as a placeholder.
First implement the kernel function computeK(ktype,X,Z,kpar). It takes as input a kernel type (ktype) and two data sets $\mathbf{X}\in\mathcal{R}^{d\times n}$ and $\mathbf{Z}\in\mathcal{R}^{d\times m}$, and outputs a kernel matrix $\mathbf{K}\in\mathcal{R}^{n\times m}$. The last input, kpar, specifies the kernel parameter (e.g. the inverse kernel width $\gamma$ in the RBF case or the degree $p$ in the polynomial case).

For a linear SVM (ktype='linear') we use $k(\mathbf{x},\mathbf{z})=\mathbf{x}^\top\mathbf{z}$.
For an RBF kernel SVM (ktype='rbf') we use $k(\mathbf{x},\mathbf{z})=\exp(-\gamma \|\mathbf{x}-\mathbf{z}\|^2)$ ($\gamma$ is a hyperparameter, passed as the value of kpar).
For a polynomial kernel (ktype='poly') we use $k(\mathbf{x},\mathbf{z})=(\mathbf{x}^\top\mathbf{z}+1)^p$ ($p$ is the degree of the polynomial, passed as the value of kpar).

You can use l2distance.m as a helper function.
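A minimal sketch of computeK along these lines (assuming here that l2distance(X,Z) returns the matrix of plain, unsquared Euclidean distances) might look as follows:

function K = computeK(ktype, X, Z, kpar)
% COMPUTEK  Kernel matrix between the columns of X (d x n) and Z (d x m).
switch lower(ktype)
    case 'linear'
        K = X' * Z;                 % inner products x_i' * z_j
    case 'rbf'
        D = l2distance(X, Z);       % pairwise Euclidean distances (assumed unsquared)
        K = exp(-kpar .* D.^2);     % kpar plays the role of gamma
    case 'poly'
        K = (X' * Z + 1).^kpar;     % kpar plays the role of the degree p
    otherwise
        error('computeK: unknown kernel type %s', ktype);
end
end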
Read through the specifications of the Octave internal quadratic programming solver, qp.
>> help qp
[...]
 -- Function File: [X, OBJ, INFO, LAMBDA] = qp (X0, H, q, A, b, lb, ub)
[...]
     Solve the quadratic program

          min 0.5 x'*H*x + x'*q
           x

     subject to

          A*x = b
          lb <= x <= ub

     using a null-space active-set method.

Here, X0 is any initial guess for your vector $\alpha$.
The qp solver is not very fast (in fact it is pretty slow), but for our purposes it will do. Write the function [H,q,A,b,lb,ub]=generateQP(K,yTr,C), which takes as input a kernel matrix K, a label vector yTr, and a regularization constant C, and outputs the matrices [H,q,A,b,lb,ub] so that they can be used directly with the qp solver to find a solution to the SVM dual problem. (Be careful: the b here has nothing to do with the SVM hyperplane bias $b$.)
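Recall that the SVM dual can be written as the minimization $\min_{\alpha} \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j k(\mathbf{x}_i,\mathbf{x}_j)-\sum_i \alpha_i$ subject to $\sum_i \alpha_i y_i=0$ and $0\le\alpha_i\le C$. One possible way to map this onto qp's inputs (a sketch, assuming yTr is a $1\times n$ label vector) is:

function [H, q, A, b, lb, ub] = generateQP(K, yTr, C)
% GENERATEQP  Set up the SVM dual  min_a 0.5*a'*H*a + q'*a
%             s.t. A*a = b and lb <= a <= ub  for Octave's qp solver.
n  = length(yTr);
y  = yTr(:);                 % labels as a column vector
H  = (y * y') .* K;          % H_ij = y_i * y_j * K_ij
q  = -ones(n, 1);            % maximizing sum(a) equals minimizing -sum(a)
A  = y';                     % equality constraint: sum_i y_i * a_i = 0
b  = 0;
lb = zeros(n, 1);            % box constraints 0 <= a_i <= C
ub = C * ones(n, 1);
end

With these outputs the dual can then be solved with something like alphas = qp(zeros(n,1), H, q, A, b, lb, ub).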
Implement the function [svmclassify,sv_i,alphas]=trainsvm(xTr,yTr,C,ktype,kpar); It should use your functions computeK.m and generateQP to solve the SVM dual problem of an SVM specified by a training data set (xTr, yTr), a regularization parameter (C), a kernel type (ktype), and a kernel parameter (kpar).

For now ignore the first output svmclassify (you can keep the pre-defined solution that generates random labels), but make sure that sv_i corresponds to all the support vectors (up to numerical accuracy) and alphas returns an n-dimensional vector of alphas.
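Because the QP solver only returns the alphas up to numerical accuracy, the support vectors are usually identified with a small tolerance. A possible sketch (the tolerance value is an assumption):

% after alphas = qp(zeros(n,1), H, q, A, b, lb, ub);
tol  = 1e-8;                  % numerical tolerance (assumed value)
sv_i = find(alphas > tol);    % indices i with alpha_i > 0, i.e. the support vectors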
Now that you can solve the dual correctly, you should have the values for $\alpha_i$. But you are not done yet. You still need to be able to classify new test points. Remember from class that $h(\mathbf{x})=\sum_{i=1}^n \alpha_i y_i k(\mathbf{x}_i,\mathbf{x})+b$. We need to obtain $b$. It is easy to show (and omitted here) that if $C>\alpha_i>0$ (with strict $>$), then we must have that $y_i(\mathbf{w}^\top \phi(\mathbf{x}_i)+b)=1$. Rephrase this equality in terms of $\alpha_i$ and solve for $b$. Implement
bias=recoverBias(K,yTr,alphas,C); where bias is the hyperplane bias $b$. (Hint: This is most stable if you pick an $\alpha_i$ that is furthest from $C$ and $0$.)
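One possible sketch of this computation, assuming the rephrased equality $b=y_i-\sum_j \alpha_j y_j k(\mathbf{x}_j,\mathbf{x}_i)$ for an $\alpha_i$ strictly between $0$ and $C$:

function bias = recoverBias(K, yTr, alphas, C)
% RECOVERBIAS  Recover the hyperplane bias b from the dual solution.
a = alphas(:);
y = yTr(:);
% Pick the alpha that is furthest from both 0 and C (numerically most stable).
[~, i] = max(min(a, C - a));
% For 0 < alpha_i < C we have  y_i * (sum_j alpha_j y_j K_ji + b) = 1,
% and since y_i is +1 or -1 this gives  b = y_i - sum_j alpha_j y_j K(x_j, x_i).
bias = y(i) - sum(a .* y .* K(:, i));
end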
With the recoverBias function in place, you can now finish your function trainsvm.m and implement the remaining output, which returns an actual classifier svmclassify. This function should take a set of inputs of dimensions $d\times m$ as input and output predictions ($1\times m$).
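Inside trainsvm, one way to build this classifier is as an anonymous function that closes over the training data, the alphas, and the bias, and evaluates $h(\mathbf{x})=\sum_{i=1}^n \alpha_i y_i k(\mathbf{x}_i,\mathbf{x})+b$ for every test column (a sketch using the variable names from above):

% Inside trainsvm, after alphas and bias have been computed:
svmclassify = @(X) (alphas(:) .* yTr(:))' * computeK(ktype, xTr, X, kpar) + bias;
% computeK(ktype, xTr, X, kpar) is n x m, so the result is a 1 x m row of predictions.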
You can evaluate it and see if you can match my testing and training error exactly:
>> load spiral
>> svmclassify=trainsvm(xTr,yTr,6,'rbf',0.25);
Generate Kernel ...
Generate QP ...
Solve QP ...
Recovering bias ...
Extracting support vectors ...
>> testerr=sum(sign(svmclassify(xTe))~=yTe(:))/length(yTe)
testerr = 0.29339
>> trainerr=sum(sign(svmclassify(xTr))~=yTr(:))/length(yTr)
trainerr = 0.13000
If you play around with your new SVM solver, you will notice that it is rather sensitive to the kernel and regularization parameters. You therefore need to implement a function [bestC,bestP,bestvalerror]=crossvalidate(xTr,yTr,ktype,Cs,kpars) to automatically sweep over different values of C and kpars and output the best setting on a validation set (xTv, yTv). For example, you could set Cs=2.^[-5:5] to try out different orders of magnitude. Cut off a section of your training data to generate a validation set. (In the example below the function also returns a fourth output, allerrs, containing the full grid of validation errors, which is handy for visualization.)
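A minimal sketch of such a grid search, assuming a simple hold-out split where the last 20% of the training points form the validation set (the split ratio is an assumption; you may also want to shuffle the data first):

function [bestC, bestP, bestval, allerrs] = crossvalidate(xTr, yTr, ktype, Cs, kpars)
% CROSSVALIDATE  Grid search over regularization constants Cs and kernel
%                parameters kpars, evaluated on a hold-out validation set.
n   = size(xTr, 2);
ntr = ceil(0.8 * n);                          % 80/20 split (assumed)
xT  = xTr(:, 1:ntr);      yT  = yTr(1:ntr);
xTv = xTr(:, ntr+1:end);  yTv = yTr(ntr+1:end);
allerrs = zeros(length(Cs), length(kpars));
bestval = inf;  bestC = Cs(1);  bestP = kpars(1);
for i = 1:length(Cs)
    for j = 1:length(kpars)
        svmclassify   = trainsvm(xT, yT, Cs(i), ktype, kpars(j));
        allerrs(i, j) = mean(sign(svmclassify(xTv)) ~= yTv);
        if allerrs(i, j) < bestval
            bestval = allerrs(i, j);  bestC = Cs(i);  bestP = kpars(j);
        end
    end
end
end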
If you did everything correctly, you should now be able to visualize your cross validation map:
>> load spiral
>> [bestC,bestP,bestval,allerrs]=crossvalidate(xTr,yTr,'rbf',2.^[-1:8],2.^[-2:3]);
>> [xx,yy]=meshgrid(-2:3,-1:8);
>> surf(xx,yy,allerrs);
>> xlabel('\gamma'); ylabel('C'); zlabel('Val Error');

The result should look similar to this image:
>> svmclassify=trainsvm(xTr,yTr,bestC,'rbf',bestP);
>> testerr=sum(sign(svmclassify(xTe))~=yTe(:))/length(yTe)
testerr = 0.15289
>> visdecision(xTe,yTe,svmclassify,'vismargin',true);
Implement the function autosvm.m. Currently it takes a data set as input and tunes the hyper-parameters for the RBF kernel to perform best on a hold-out set (with your crossvalidate.m function). Can you improve it? (For example, you may look into adding new kernels to computeK.m, using telescopic cross validation, or using k-fold cross validation instead of a single validation set.)
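As an illustration of the last suggestion, a small sketch of a k-fold split (the value of k and the variable names are assumptions) that could replace the single hold-out set inside crossvalidate:

% Inside the grid-search loop over (C, kpar):
k = 5;                                  % number of folds (assumed)
n = size(xTr, 2);
perm  = randperm(n);                    % shuffle the points before splitting
folds = ceil((1:n) * k / n);            % fold label for each shuffled point
errs  = zeros(1, k);
for f = 1:k
    va = perm(folds == f);              % validation indices for fold f
    tr = perm(folds ~= f);              % training indices for fold f
    svmclassify = trainsvm(xTr(:, tr), yTr(tr), C, ktype, kpar);
    errs(f) = mean(sign(svmclassify(xTr(:, va))) ~= yTr(va));
end
valerror = mean(errs);                  % average validation error over the folds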
Type >> more off whenever you start your Octave console, to see the output of your functions on the screen right away.