Development of Stemming Algorithm for Wolaytta Text
Author: Lemma Lessa
Available to order, delivery in 2-4 weeks
53.42 €
Regular price: 60.70 €
About the book
This study describes the design of a stemming algorithm for the Wolaytta language. To give the thesis a solid background, the literature on conflation in general and on stemming algorithms in particular was reviewed. The result of the study is a prototype context-sensitive iterative stemmer for Wolaytta. An error-counting technique was used to evaluate the stemmer's performance. The stemmer was trained on 3537 words (80% of the sample text), and the improved version achieved an accuracy of 90.6% on the training set. On the training set, 8.6% (304 words) of the words were overstemmed and 0.8% (28 words) were understemmed. On the unseen sample of 884 words (20% of the sample text), the stemmer achieved an accuracy of 86.9%, with 9% of the words understemmed and 4.1% overstemmed. In addition, a dictionary reduction of 38.92% was attained on the test set. The major sources of error are also reported, along with recommendations for further improving the stemmer and for future research.
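The reported figures fit a simple error-counting model in which every word is either stemmed correctly, overstemmed, or understemmed, so accuracy is whatever remains after the two error rates. The Python sketch below reproduces the training-set percentages from the counts given in the description. The function names are hypothetical, and the dictionary-reduction formula (the share of distinct word forms eliminated by conflating them to stems) is an assumed common definition, not one stated in the book description.

```python
def error_rates(total: int, overstemmed: int, understemmed: int) -> dict[str, float]:
    """Percentages under a simple error-counting evaluation (hypothetical helper).

    Assumes each word falls into exactly one class: correct, overstemmed,
    or understemmed, so accuracy is the remainder after both error types.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    correct = total - overstemmed - understemmed
    return {
        "overstemmed_pct": 100 * overstemmed / total,
        "understemmed_pct": 100 * understemmed / total,
        "accuracy_pct": 100 * correct / total,
    }


def dictionary_reduction(distinct_words: int, distinct_stems: int) -> float:
    """Percentage by which stemming shrinks the vocabulary (assumed definition)."""
    if distinct_words <= 0:
        raise ValueError("distinct_words must be positive")
    return 100 * (distinct_words - distinct_stems) / distinct_words


# Training-set counts reported above: 3537 words, 304 overstemmed, 28 understemmed.
train = error_rates(3537, overstemmed=304, understemmed=28)
for name, value in train.items():
    print(f"{name}: {value:.1f}%")
# prints:
# overstemmed_pct: 8.6%
# understemmed_pct: 0.8%
# accuracy_pct: 90.6%
```

The same relationship holds for the reported test-set figures: 100% - 9% - 4.1% = 86.9%. Only rounded percentages are given for the test set, so the sketch does not attempt to recover word counts from them.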
- Publisher: LAP LAMBERT Academic Publishing
- Year of publication: 2011
- Format: Paperback
- Dimensions: 220 x 150 mm
- Language: English
- ISBN: 9783846548592