Title: Research on Large Scale Data Sets Categorization Based on SVM

Year of Publication: Jul - 2015
Page Numbers: 51-61
Authors: Yongli Li, Liyan Dong, Minghui Sun, Hongjie Wang, Le Huang, Xinxin Wang, Meichen Dong
Conference Name: The Fourth International Conference on Informatics & Applications (ICIA2015), Japan


Support vector machine (SVM) algorithms are poorly suited to large data sets because of their high training complexity. To address this issue, this paper presents a two-stage SVM classification algorithm based on fuzzy clustering. In the first stage, a weighted SVM is trained on the data produced by fuzzy clustering to obtain an approximate decision hyperplane. In the second stage, an SVM is trained only on the data near that approximate hyperplane to obtain the final decision hyperplane. Experimental results demonstrate that the approach achieves good classification accuracy while training significantly faster than the standard SVM, which gives it a distinct advantage on very large data sets.
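
The two-stage idea in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and the abstract does not give the details, so the following are all assumptions made for illustration: a linear SVM fitted by batch subgradient descent, plain fuzzy c-means run separately on each class, cluster centers weighted by their fuzzy cardinality, and a second-stage subset defined as the fraction `keep` of each class closest to the approximate hyperplane. Every function name and parameter (`fuzzy_c_means`, `train_linear_svm`, `two_stage_svm`, `clusters_per_class`, `keep`, and so on) is hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=25, seed=0):
    """Plain fuzzy c-means; returns centers and their fuzzy cardinalities."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)
    dim = len(points[0])
    for _ in range(iters):
        # Membership of every point in every cluster (each row sums to 1).
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            u.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for i in range(c)])
        # Each center is the membership-weighted mean of the points.
        centers = []
        for i in range(c):
            wts = [row[i] ** m for row in u]
            centers.append(tuple(sum(wj * p[k] for wj, p in zip(wts, points)) / sum(wts)
                                 for k in range(dim)))
    # Fuzzy cardinality from the last membership matrix (assumed weighting scheme).
    sizes = [sum(row[i] for row in u) for i in range(c)]
    return centers, sizes

def decision(w, b, x):
    """Signed score of x under the hyperplane (w, b)."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def train_linear_svm(X, y, weights=None, lam=0.01, lr=0.1, iters=300):
    """Weighted linear SVM via batch subgradient descent on the hinge loss."""
    dim = len(X[0])
    weights = weights if weights is not None else [1.0] * len(X)
    total = sum(weights)
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        gw, gb = [lam * wk for wk in w], 0.0
        for xi, yi, si in zip(X, y, weights):
            if yi * decision(w, b, xi) < 1.0:
                # Margin violators contribute in proportion to their weight.
                for k in range(dim):
                    gw[k] -= si * yi * xi[k] / total
                gb -= si * yi / total
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b

def two_stage_svm(X, y, clusters_per_class=3, keep=0.25):
    """Return (w, b, n_used); n_used counts the points used in stage two."""
    # Stage 1: weighted SVM on fuzzy cluster centers gives a rough hyperplane.
    cx, cy, cw = [], [], []
    for label in (-1, 1):
        pts = [x for x, yi in zip(X, y) if yi == label]
        centers, sizes = fuzzy_c_means(pts, clusters_per_class)
        cx += centers
        cy += [label] * len(centers)
        cw += sizes
    w0, b0 = train_linear_svm(cx, cy, cw)
    # Stage 2: retrain on the original points nearest the rough hyperplane.
    sx, sy = [], []
    for label in (-1, 1):
        pts = sorted((x for x, yi in zip(X, y) if yi == label),
                     key=lambda x: abs(decision(w0, b0, x)))
        near = pts[:max(1, int(keep * len(pts)))]
        sx += near
        sy += [label] * len(near)
    w, b = train_linear_svm(sx, sy)
    return w, b, len(sx)

def make_blobs(n_per_class=200, seed=1):
    """Synthetic two-class toy data (not from the paper)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(mu, 0.7), rng.gauss(mu, 0.7)))
            y.append(label)
    return X, y

def accuracy(w, b, X, y):
    return sum((decision(w, b, x) >= 0) == (yi > 0) for x, yi in zip(X, y)) / len(X)

if __name__ == "__main__":
    X, y = make_blobs()
    w, b, n_used = two_stage_svm(X, y)
    print(f"stage-two points: {n_used} of {len(X)}, accuracy: {accuracy(w, b, X, y):.3f}")
```

In this sketch the speed-up comes from the fact that neither stage trains on the full data set: the first stage sees only a handful of weighted cluster centers, and the second sees only the points near the approximate hyperplane. A kernel SVM or a distance threshold instead of a fixed fraction would be equally plausible readings of the abstract.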