Theory

  1. Show that the distance of any point \( \z_i\in \reals^p \) from a hyperplane \(\w^T\x -b=0 \) (where \(\w\) and \(\x\) are in \(\reals^p\), and \(b\in \reals\) is a number) is given by
\[\frac{ \w^T \z_i -b }{|| \w||}.\]
  1. Find an approach to assign probabilities to the classes using a support vector machine.

Mini-project

Please take a look at the credit risk dataset.

  1. Build a model to predict risk using the base SVM approach by choosing appropriate kernels
  2. An SVM generally treats the two classes the same: support vectors are equidistant (in the lifted space) from the separating hyperplane. In our problem, misclassification of one kind may be worse than the other. Find an approach to use different margins for positive and negative examples.
  3. The approach generally taken by credit rating companies is feature engineering followed by logistic regression. Try to find features/combinations of features derived as a result of feature engineering.