Title: Practical Issues of Clustering Relatively Small Text Data Sets for Business Purposes

Year of Publication: June 2014
Page Numbers: 15-22
Authors: Nikita Nikitinsky, Tamara Sokolova and Ekaterina Pshehotskaya
Conference Name: The International Conference on Digital Security and Forensics (DigitalSec2014), Czech Republic


Clustering relatively small sets of documents has become a frequent task for small businesses. Current topic modeling and clustering algorithms can handle this task, but the quality of the cluster analysis can still be improved, for example by combining algorithms. In this paper, we conduct experiments to determine the best clustering approach among LSI (Latent Semantic Indexing), LDA (Latent Dirichlet Allocation) and LDA+GS, each combined with a Gaussian mixture model (GMM), and we look for heuristics that improve the performance of the best-performing approach.
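The abstract names the combinations without detailing them. As a rough illustration only, the sketch below shows one of them: LSI (TF-IDF weighting followed by truncated SVD) with GMM clustering on the resulting topic vectors. It assumes scikit-learn and is not the authors' implementation. The function name `cluster_lsi_gmm`, the toy documents and all parameter values (number of topics, number of clusters, diagonal covariances) are hypothetical choices for the example.

```python
# Minimal sketch: LSI (TF-IDF + truncated SVD) followed by GMM clustering.
# Assumes scikit-learn; the function name, toy data and parameters are hypothetical.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def cluster_lsi_gmm(docs, n_topics=2, n_clusters=2, seed=0):
    """Return one cluster label per document."""
    # Sparse TF-IDF document-term matrix (one row per document).
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # LSI: project onto the top n_topics singular vectors, then L2-normalize
    # so that Euclidean distance in topic space tracks cosine similarity.
    lsi = make_pipeline(
        TruncatedSVD(n_components=n_topics, random_state=seed),
        Normalizer(copy=False),
    )
    topic_vectors = lsi.fit_transform(tfidf)
    # Fit a Gaussian mixture in topic space; diagonal covariances keep the
    # number of parameters small, which helps when there are few documents.
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                          random_state=seed)
    return gmm.fit_predict(topic_vectors)


# Hypothetical toy corpus: three finance documents, then three sports documents.
EXAMPLE_DOCS = [
    "stock market prices rose today",
    "stock market investors sold shares",
    "investors watch stock market prices",
    "football team won the match",
    "the football match ended with a late goal",
    "team scored a goal in football match",
]

if __name__ == "__main__":
    print(cluster_lsi_gmm(EXAMPLE_DOCS))
```

Swapping the SVD step for scikit-learn's `LatentDirichletAllocation` (which uses online variational Bayes) would give an LDA-based variant of the same pipeline; which variant clusters best on real business data is exactly what the paper's experiments address.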