Title: Comparison of Machine Learning Algorithms Based on Filipino-Vietnamese Speeches

Year of Publication: Nov - 2014
Page Numbers: 28-36
Authors: Hoa T. Le
Conference Name: The International Conference on Data Mining, Internet Computing, and Big Data (BigData2014), Malaysia


People of different races are characterized by the language they speak, and listeners can often identify a speaker's race just by hearing them in conversation. This paper presents a comparison of machine learning algorithms for tone classification of Filipino and Vietnamese speech using feature parameters. The system was trained on recorded speech samples. The datasets were collected over multiple sessions from 10 respondents, 5 of whom were Filipino and 5 Vietnamese. The respondents were asked to read a set of paragraphs aloud while their voices were recorded. Empirical tests during pre-processing showed that the Vietnamese recordings spanned a longer range of durations than the Filipino recordings, owing to the speakers' manner of reading and the intensity they placed on accent-bearing syllables. Four classification algorithms were used to construct the speech recognition model: KNN (K-Nearest Neighbour), Naïve Bayes, SMO (Support Vector Machine), and MLP (Multilayer Perceptron). The developed system was evaluated on the training set in terms of accuracy, correctly classified instances, and incorrectly classified instances. The results show that MLP and SMO performed better on all the given datasets, with accuracy rates of 99.2694% for MLP and 98.7179% for SMO. KNN had the lowest accuracy rate, at 96.3882%.
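The evaluation measures named in the abstract (correctly classified instances, incorrectly classified instances, and accuracy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `evaluate` helper, the classifier names, and the toy labels `"A"` and `"B"` are all hypothetical and chosen only to show how the three measures relate to one another.

```python
def evaluate(true_labels, predicted_labels):
    """Return (correct, incorrect, accuracy_percent) for one classifier."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    # An instance is correctly classified when prediction and truth match.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    incorrect = len(true_labels) - correct
    accuracy = 100.0 * correct / len(true_labels)
    return correct, incorrect, accuracy


# Hypothetical toy labels and predictions, not data from the paper.
truth = ["A", "A", "B", "B", "A"]
predictions = {
    "classifier_1": ["A", "A", "B", "B", "A"],
    "classifier_2": ["A", "B", "B", "B", "A"],
}

results = {name: evaluate(truth, pred) for name, pred in predictions.items()}

# Report classifiers from highest to lowest accuracy.
for name, (correct, incorrect, accuracy) in sorted(
    results.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"{name}: {correct} correct, {incorrect} incorrect, {accuracy:.4f}% accuracy")
```

Accuracy here is simply the share of correctly classified instances, so the three measures carry the same information once the total number of instances is known.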