Title: BUILDING ACCURATE TEXT CLASSIFIERS

Issue Number: Vol. 6, No. 1
Year of Publication: Jan - 2016
Page Numbers: 1-15
Authors: Arthi Venkataraman
Journal Name: International Journal of Digital Information and Wireless Communications (IJDIWC) - Hong Kong
DOI: http://dx.doi.org/10.17781/P001921

Abstract:


This paper describes the techniques we followed to build a production-ready, scalable classifier system. The system classifies the tickets raised by employees of an organization: end users raise tickets in natural language, and the classifier then classifies each ticket automatically. This is a practical, industry-applied research paper in the area of machine learning. We applied several machine learning and natural language processing techniques, such as active learning and random undersampling, and these techniques helped improve prediction accuracy. We used clustering to handle data issues found in the training data. The core classifier combines the results of multiple machine learning algorithms using suitable scoring techniques. The overall solution architecture ensured that the system met the production-grade software attributes of reliability, availability, and scalability.
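The abstract names random undersampling but does not say how it was applied to the ticket data. As an illustration only, here is a minimal Python sketch of the standard technique. It assumes the training data is held as parallel lists of ticket texts and category labels. The function name `random_undersample` and its `max_per_class` parameter are hypothetical and not taken from the paper. The function randomly drops tickets from over-represented categories so that no category keeps more than a fixed number of examples. By default, that number is the size of the smallest category.

```python
import random
from collections import defaultdict


def random_undersample(texts, labels, max_per_class=None, seed=0):
    """Hypothetical helper: randomly undersample over-represented classes.

    Each class keeps at most `max_per_class` examples, chosen uniformly at
    random without replacement. If `max_per_class` is None, every class is
    reduced to the size of the smallest class.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")

    # Group ticket texts by their category label (insertion order preserved).
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    if max_per_class is None:
        max_per_class = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # seeded so the sample is reproducible
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Classes already at or below the cap are kept whole.
        kept = rng.sample(items, min(len(items), max_per_class))
        out_texts.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_texts, out_labels


if __name__ == "__main__":
    tickets = ["vpn down", "vpn slow", "vpn error", "reset password", "laptop broken"]
    categories = ["network", "network", "network", "access", "hardware"]
    X, y = random_undersample(tickets, categories)
    print(list(zip(X, y)))  # one ticket per category
```

Undersampling throws away majority-class data, so in practice a cap above the minority size (via `max_per_class`) is often a more conservative choice than fully balancing every category.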