- English
Vision-Based Deep Web Data Extraction For Web Document Clustering
Author: M. Lavanya
Available to order
73.98 €
regular price: 82.20 €
About the book
The VDEC approach comprises two phases: (1) vision-based web data extraction and (2) web document clustering. In phase 1, the web page is segmented into chunks, from which noise and duplicate chunks are removed using three parameters: hyperlink percentage, noise score, and cosine similarity. To identify the relevant chunks, three further parameters are used: title word relevancy, keyword frequency-based chunk selection, and position features. A set of keywords is then extracted from these main chunks. Finally, the extracted keywords are subjected to web document clustering using Fuzzy C-Means (FCM) clustering. The proposed vision-based deep web data extraction is implemented and tested on synthetic data. The results are compared with two existing algorithms: Vision-based Data Record Extraction (ViDE) and Mining Data Region (MDR). In experiments on two different synthetic datasets, the proposed VDEC method achieved stable precision of about 99.2% and 99.1% on the two datasets across the different threshold values tested.
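The chunk-filtering step of phase 1 can be illustrated with a minimal Python sketch. It is not the book's implementation. The function names, the bag-of-words representation, the definition of hyperlink percentage as linked words over total words, and both threshold values are assumptions made for illustration. The noise score is omitted because the description does not define it.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyperlink_percentage(link_words: int, total_words: int) -> float:
    """Assumed definition: share of a chunk's words that sit inside links."""
    return link_words / total_words if total_words else 0.0


def filter_chunks(chunks, link_threshold=0.5, dup_threshold=0.9):
    """Drop link-heavy chunks and near-duplicates of chunks already kept.

    `chunks` is a list of (text, link_word_count) pairs.
    Both thresholds are hypothetical values, not taken from the book.
    """
    kept = []
    for text, link_words in chunks:
        total = len(text.split())
        if hyperlink_percentage(link_words, total) > link_threshold:
            continue  # mostly navigation links: treated as noise
        if any(cosine_similarity(text, k) >= dup_threshold for k in kept):
            continue  # near-duplicate of an earlier chunk
        kept.append(text)
    return kept


chunks = [
    ("Home About Contact Login", 4),
    ("deep web data extraction with vision features", 0),
    ("deep web data extraction with vision features", 0),
    ("fuzzy clustering of web documents", 0),
]
print(filter_chunks(chunks))
```

In this sketch the navigation chunk is dropped by the link filter and the repeated chunk by the similarity filter, leaving two chunks from which keywords would then be extracted for clustering.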
- Publisher: LAP LAMBERT Academic Publishing
- Year of publication: 2022
- Format: Paperback
- Dimensions: 220 x 150 mm
- Language: English
- ISBN: 9786204956060