Generation of Text and Speech Corpora
Author: Md. Farukuzzaman Khan
Made to order, delivery in 2-4 weeks
66.60 €
Regular price: 74.00 €
About the book
Recent trends in the development of language-related technology show an unavoidable need for language-related resources and for acquiring knowledge from them. In this respect, corpus-based methods in Bangla language processing are receiving a strong push from laboratories throughout the world. As a continuation of these efforts, a new Bangla text corpus, BdNC01, and several speech corpora were generated in this work. The texts were collected from the web editions of several leading Bangla newspapers over a long period to avoid time dependency of word frequency. More than eleven million word tokens were collected over a period of six years. The corpus was manually checked and error-corrected each time before being preserved in the final repository as ASCII and Unicode texts. Using popular words derived from the text corpus, we recorded the largest speech corpora in the Bangla language. They have been specifically designed for various research activities related to HMM-based speaker-independent speech recognition.
- Publisher: LAP LAMBERT Academic Publishing
- Year of publication: 2019
- Format: Paperback
- Dimensions: 220 x 150 mm
- Language: English
- ISBN: 9783659777127