Title: SIMPLE RULES MALAY STEMMER

Year of Publication: Jun - 2012
Page Numbers: 28-35
Authors: Syed Abdullah Fadzli, A Khairani Norsalehen, I. Ahmad Syarilla, Hassan Hasni, M Satar Siti Dhalila
Conference Name: The International Conference on Informatics and Applications (ICIA2012), Malaysia

Abstract:


Stemming is a form of morphological analysis that associates variants of the same term with a common root form. It is important for improving recall and precision in information retrieval (IR) systems. Malay word stemming is considered more complicated than stemming in other languages because of the unique morphological structure of Malay. Much of the research on Malay stemming relies heavily on a dictionary, which incurs a higher processing cost and offers lower coverage. This paper presents a stemming approach called the UniSZA stemmer, which attempts to reduce dictionary dependency and lower the processing cost by proposing 7 simple rules. Experimental results show that the approach achieves a higher compression ratio and processing speed than the RAO and RFO methods.
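
As an illustration only, the following minimal sketch shows what dictionary-free, rule-based affix stripping and a compression measure might look like. It does not reproduce the 7 UniSZA rules, which the abstract does not list. The affix lists, the minimum stem length, and the names toy_stem and index_compression are hypothetical. The compression measure is assumed here to be (distinct words - distinct stems) / distinct words, which may differ from the definition used in the paper.

```python
# Hypothetical affix lists for illustration; NOT the UniSZA rule set.
PREFIXES = ("ber", "ter", "di", "ke", "se")
SUFFIXES = ("kan", "nya", "lah", "kah", "an", "i")
MIN_STEM_LEN = 4  # assumed guard so short roots such as "makan" are left intact


def toy_stem(word):
    """Strip at most one listed prefix and one listed suffix, without a dictionary."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def index_compression(words, stem=toy_stem):
    """Assumed definition: fraction of distinct words removed by stemming."""
    distinct_words = {w.lower() for w in words}
    if not distinct_words:
        return 0.0
    distinct_stems = {stem(w) for w in distinct_words}
    return (len(distinct_words) - len(distinct_stems)) / len(distinct_words)


sample = ["makan", "makanan", "dimakan", "jalan", "berjalan", "buku", "bukunya"]
stems = sorted({toy_stem(w) for w in sample})
ratio = index_compression(sample)
```

On this small sample the seven word forms collapse to three stems, so the assumed compression measure is 4/7. A real Malay stemmer also has to handle cases this sketch ignores, such as the me- prefix family with its spelling changes, infixes, and over-stemming of roots that merely resemble affixed words.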