SVM is a popular statistical learning method for both classification and regression. For classification, a linear classifier, or hyperplane, such as $f(x) = w^\top x + b$ with $w$ as the weight vector and $b$ as the bias, labels data into categories. The geometric margin is defined as $1/\lVert w \rVert$. For SVM, the maximum margin approach is therefore equivalent to minimizing $\frac{1}{2}\lVert w \rVert^2$. However, once slack variables $\xi_i$ are introduced to tolerate misclassified points and regularization is introduced to control complexity, the objective becomes minimizing $\frac{1}{2}\lVert w \rVert^2 + C\sum_i \xi_i$, where $C$ is the regularization parameter. The solution for SVM thus turns out to be a quadratic optimization problem over $w$, $b$, and $\xi$ with the constraints $y_i(w^\top x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, where $y_i \in \{-1, +1\}$ is the class label of observation $x_i$.
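The link between the margin and the norm of the weight vector can be sketched in a few lines. This is the usual textbook argument rather than anything specific to SAS; the symbols $H$ (the separating hyperplane), $\gamma$ (the geometric margin), and the canonical scaling of $w$ and $b$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
\operatorname{dist}(x_i, H) &= \frac{|w^\top x_i + b|}{\lVert w \rVert},
  \qquad H = \{x : w^\top x + b = 0\} \\
\min_i \; y_i\,(w^\top x_i + b) &= 1
  \qquad \text{(canonical scaling of } w \text{ and } b\text{)} \\
\gamma = \min_i \operatorname{dist}(x_i, H) &= \frac{1}{\lVert w \rVert} \\
\max_{w,b}\; \gamma \quad &\Longleftrightarrow \quad \min_{w,b}\; \tfrac{1}{2}\lVert w \rVert^2
\end{aligned}
```

Because rescaling $w$ and $b$ together does not move the hyperplane, fixing the closest points at a functional margin of 1 loses no generality, and the squared norm is used only because it gives a smooth quadratic objective.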

Since the `sashelp.class` dataset is extremely simple, I use its variables `age` and `weight` to predict `sex`, purely for demonstration purposes. According to the plot, the data points are not linearly separable. Kernel methods have to be applied to map the input data into a high-dimensional space in which they become linearly separable. To use SVM in SAS, three procedures licensed with SAS Enterprise Miner are commonly used: `PROC DMDB` recodes the categorical data and sets up the working catalog, `PROC SVM` builds the model, and `PROC SVMSCORE` applies the fitted model.

```sas
/* Scatter plot of age against weight, grouped by sex */
proc sgplot data=sashelp.class;
   scatter x = weight y = age / group = sex;
run;

/* Recode the class variable and build the catalog that PROC SVM reads */
proc dmdb batch data=sashelp.class dmdbcat=_cat out=_class;
   var weight age;
   class sex;
run;
```

If we let C be extremely large, every constraint has to be enforced, which approximates a hard margin. As a result, the margin narrows.

```sas
proc svm data=_class dmdbcat=_cat c=1e11 kernel=linear out=_1;
   title 'hard margin';
   ods output restab = restab1;
   var weight age;
   target sex;
run;
```

The training accuracy is 63.16%, since 7 of the 19 observations are misclassified. The full result is below.

Name | Value |
---|---|
Regularization Parameter C | 100000000000 |
Classification Error (Training) | 7.000000 |
Geometric Margin | 1.624447E-10 |
Number of Support Vectors | 17 |
Estimated VC Dim of Classifier | 3.4494098E24 |
Number of Kernel Calls | 74 |
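One way to read the effect of a very large C is through the hinge-loss form of the soft-margin objective. The sketch below uses standard textbook notation ($w$, $b$, $\xi_i$, $x_i$, $y_i$); none of these symbols comes from the `PROC SVM` output, and how the procedure computes its reported margin internally is not assumed here.

```latex
\begin{aligned}
\xi_i &= \max\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr)
  \qquad \text{(slack at the optimum)} \\
\min_{w,b}\;& \tfrac{1}{2}\lVert w \rVert^2
  + C \sum_i \max\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr) \\
C \to \infty:\;& \text{any } \xi_i > 0 \text{ dominates the objective, approximating }
  y_i\,(w^\top x_i + b) \ge 1 \text{ for all } i
\end{aligned}
```

When the data are not linearly separable, those constraints cannot all hold, so the penalty term dominates and the $\tfrac{1}{2}\lVert w \rVert^2$ term does little to keep the margin wide. That is at least consistent with the tiny geometric margin reported in the table above.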

In contrast, a small C allows constraints to be violated at little cost, which leads to the desired large margin.

```sas
proc svm data=_class dmdbcat=_cat kernel=linear out=_2;
   title 'soft margin';
   ods output restab = restab2;
   var weight age;
   target sex;
run;
```

The accuracy, or equivalently the misclassification rate, stays the same because the dataset is so small. When C is not specified, `PROC SVM` solves for a C value that is close to zero, and the margin is huge.

Name | Value |
---|---|
Regularization Parameter C | 0.000098161 |
Classification Error (Training) | 7.000000 |
Geometric Margin | 158.553426 |
Number of Support Vectors | 18 |
Estimated VC Dim of Classifier | 3.850370 |
Number of Kernel Calls | 76 |
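To put the two runs side by side, the values from the result tables can be typed into a small data step. This is only a sketch: the dataset name `svm_compare` and its variable names are hypothetical, the numbers are copied by hand from the tables above, and it assumes that "Classification Error (Training)" is a count of misclassified rows out of the 19 observations in `sashelp.class`.

```sas
/* Values copied by hand from the two result tables above.            */
/* Assumption: the training classification error is a count of rows.  */
data svm_compare;
   length model $ 4;
   input model $ c n_error margin n_sv;
   n_obs    = 19;                      /* rows in sashelp.class        */
   accuracy = 1 - n_error / n_obs;     /* share classified correctly   */
   format accuracy percent8.2 c margin best12.;
   datalines;
hard 1e11 7 1.624447E-10 17
soft 0.000098161 7 158.553426 18
;
run;

proc print data=svm_compare noobs;
run;
```

Both rows yield the same accuracy, while the margin and the number of support vectors differ, which is the contrast the two tables are meant to show.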

- For the SVM procedure, supplying a validation dataset in addition to the training data, through the `testdata` option on the `PROC` statement, could effectively increase the C parameter and decrease the possibility of overfitting.
- SVM has a few advantages over other data mining methods. First, SVM is suitable for high-dimensional data; more importantly, its complexity can be easily controlled by adjusting the regularization parameter C.