• Anglický jazyk

Development of Stemming Algorith for Wolaytta Text

Autor: Lemma Lessa

This study describes the design of a stemming algorithm for Wolaytta language. To give a solid background for the thesis, literatures on conflation in general and stemming algorithms in particular were reviewed. The result of the study is a prototype context... Viac o knihe

Na objednávku, dodanie 2-4 týždne

53.42 €

bežná cena: 60.70 €

O knihe

This study describes the design of a stemming algorithm for Wolaytta language. To give a solid background for the thesis, literatures on conflation in general and stemming algorithms in particular were reviewed. The result of the study is a prototype context sensitive iterative stemmer for Wolaytta language. Error counting technique was employed to evaluate the performance of this stemmer. The stemmer was trained on 3537 words (80% of the sample text) and the improved version reveals an accuracy of 90.6% on the training set. The number of over stemmed and understemmed words on the training set were 8.6% (304 words) and 0.8% (28 words) respectively. When the stemmer runs on the unseen sample of 884 words (20% of the sample text), it performed with an accuracy of 86.9%. The percentage of errors recorded as understemmed and overstemmed on this unseen (test set) were 9% and 4.1%, respectively. Moreover, a dictionary reduction of 38.92% was attained on the test set. The major sources of errors are also reported with possible recommendations to further improve the performance of the stemmer and also for further research.

  • Vydavateľstvo: LAP LAMBERT Academic Publishing
  • Rok vydania: 2011
  • Formát: Paperback
  • Rozmer: 220 x 150 mm
  • Jazyk: Anglický jazyk
  • ISBN: 9783846548592

Generuje redakčný systém BUXUS CMS spoločnosti ui42.