Title: Composite Heuristic Algorithm for Clustering Text Data Sets

Issue Number: Vol. 3, No. 3
Year of Publication: 2014
Page Numbers: 153-162
Authors: Nikita Nikitinsky, Tamara Sokolova, Ekaterina Pshehotskaya
Journal Name: International Journal of Cyber-Security and Digital Forensics (IJCSDF)
- Hong Kong
DOI: http://dx.doi.org/10.17781/P001335


Document clustering has become a frequent task in business. Existing topic modeling and clustering algorithms can handle this task, but the quality of cluster analysis can still be improved, for example by combining algorithms. In this paper, we run experiments to determine the best clustering algorithm among LSI, LDA, and LDA+GS, each combined with GMM, and we identify heuristics that improve the performance of the best algorithm.
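
As an illustration of what one of the combinations named above might look like in practice, the following is a minimal sketch of an LSI step followed by GMM clustering. It is not the authors' implementation: the use of scikit-learn, the TF-IDF weighting, the function name cluster_lsi_gmm, the toy documents, and all parameter values are assumptions made for this example only.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture


def cluster_lsi_gmm(docs, n_topics=2, n_clusters=2, seed=0):
    """Cluster raw text documents with LSI followed by a GMM.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Build a TF-IDF weighted term-document matrix (assumed weighting scheme).
    tfidf = TfidfVectorizer().fit_transform(docs)

    # LSI: project documents into a low-dimensional latent space via truncated SVD.
    lsi = TruncatedSVD(n_components=n_topics, random_state=seed)
    topic_vectors = lsi.fit_transform(tfidf)

    # GMM: fit a Gaussian mixture on the latent vectors and assign each
    # document to its most probable component.
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="diag",  # fewer parameters, safer on tiny corpora
        random_state=seed,
    )
    return gmm.fit_predict(topic_vectors)


# Toy corpus invented for this example.
docs = [
    "bank loan interest rate credit",
    "credit bank mortgage loan payment",
    "interest rate loan bank account",
    "football match goal team player",
    "team player score football league",
    "league match goal score team",
]

labels = cluster_lsi_gmm(docs)
print(labels)
```

Swapping the LSI step for a topic model such as scikit-learn's LatentDirichletAllocation would give a rough analogue of the LDA variant; how closely either matches the configurations evaluated in the paper is not something this sketch can claim.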