Generation of Text and Speech Corpora

Publisher: LAP LAMBERT Academic Publishing
Year of publication: 2019
Format: Paperback
Dimensions: 220 x 150 mm
Language: English
ISBN: 9783659777127

Available to order

65.12 €

regular price: 74.00 €

About the book

Recent trends in the development of language-related technology create an unavoidable need for language resources and for acquiring knowledge from them. In this context, corpus-based methods for Bangla language processing are receiving a strong push from laboratories around the world. As a continuation of these efforts, a new Bangla text corpus, BdNC01, and several speech corpora were generated in this work. The texts were collected from the web editions of several leading Bangla newspapers over a long period to avoid time dependency of word frequency. More than eleven million word tokens were collected over a period of six years. The corpus was manually checked and error-corrected each time before being stored in the final repository as ASCII and Unicode texts. Using popular words derived from the text corpus, we recorded the largest speech corpora in the Bangla language. They have been specifically designed for various research activities related to HMM-based speaker-independent speech recognition.