Theory
- Show how a single neuron with an appropriate choice of activation, loss function, and regularization can mimic (i) Ridge Regression, (ii) the Lasso, and (iii) a (linear) Support Vector Machine. (A sketch of case (i) follows this list.)
- Show that the Perceptron Learning Algorithm in the Reading is equivalent to Stochastic Gradient Descent with a batch size of 1, where the \(i\)th example incurs loss \( \max(0, -y_i \hat{y}_i) \), with \(y_i \in \{-1, +1\}\) the true label and \(\hat{y}_i = w^\top x_i\) the neuron's pre-threshold output. (A sketch of the subgradient computation follows this list.)
- Answer the problems in the following handout.
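
As a concrete starting point for the first bullet, here is a minimal sketch of case (i): a single neuron with identity activation, squared loss, and an L2 weight penalty, trained by full-batch gradient descent, recovers the ridge closed-form solution. The data and hyperparameters below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
lam = 1.0                                   # L2 penalty strength (illustrative)

# Ridge closed form: w* = (X^T X + lam * I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Single neuron: identity activation, loss 0.5*||Xw - y||^2 + 0.5*lam*||w||^2,
# trained by full-batch gradient descent.
w = np.zeros(5)
for _ in range(20_000):
    grad = X.T @ (X @ w - y) + lam * w      # gradient of the penalized squared loss
    w -= 1e-3 * grad

print(np.max(np.abs(w - w_ridge)))          # ~0 up to numerical precision
```

For case (ii), the L1 penalty is not differentiable at zero, so plain gradient descent is a poor fit; proximal gradient steps (soft-thresholding after each gradient step) are the usual way to train the Lasso-mimicking neuron.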
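For the second bullet, the heart of the argument is the per-example (sub)gradient. A sketch, using the convention above:

\[
\ell_i(w) = \max(0, -y_i \hat{y}_i), \qquad \hat{y}_i = w^\top x_i,
\]

so a subgradient of \(\ell_i\) is \(-y_i x_i\) when \(y_i \hat{y}_i < 0\) (example \(i\) is misclassified) and \(0\) when \(y_i \hat{y}_i > 0\). A batch-size-1 SGD step \(w \leftarrow w - \eta \nabla_w \ell_i(w)\) therefore adds \(\eta\, y_i x_i\) exactly when example \(i\) is misclassified, which with \(\eta = 1\) is precisely the Perceptron update.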
Practice
Implement all of the above. For ordinary least squares (OLS) and linear Support Vector Machines, try to get the trained neuron's weights as close as possible to the optimal weights computed by vanilla OLS and a linear SVM solver, respectively (see the sketch below).
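
A minimal sketch of the comparison, assuming scikit-learn's LinearSVC as the SVM reference. Note that LinearSVC minimizes a squared-hinge objective by default, so the neuron below is trained on that same objective to make the weights comparable; swap in the plain hinge with subgradient steps if you want to match the Theory section exactly. All hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# --- OLS: neuron with identity activation, squared loss, no regularization ---
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)       # vanilla OLS reference

w = np.zeros(5)
for _ in range(20_000):
    w -= 1e-3 * (X.T @ (X @ w - y))                 # squared-loss gradient step
print("OLS gap:", np.max(np.abs(w - w_ols)))

# --- Linear SVM: neuron with squared-hinge loss + L2 penalty ---
yc = np.sign(X @ rng.normal(size=5))                # synthetic labels in {-1, +1}
C = 1.0
ref = LinearSVC(C=C, fit_intercept=False, max_iter=10_000).fit(X, yc)

# Gradient descent on the matching primal objective:
#   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i w.x_i)^2
w = np.zeros(5)
for _ in range(50_000):
    slack = np.maximum(0.0, 1.0 - yc * (X @ w))     # squared-hinge slacks
    grad = w - 2.0 * C * X.T @ (slack * yc)
    w -= 1e-3 * grad
print("SVM gap:", np.max(np.abs(w - ref.coef_.ravel())))
```

Both printed gaps should be small; the SVM gap is only as tight as liblinear's solver tolerance. The intercept is omitted (`fit_intercept=False`) so that the two objectives line up exactly.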