Title: Knowledge-Based Analysis of Web Data Extraction

Year of Publication: Nov - 2016
Page Numbers: 26-32
Authors: Alaa Hassan, Marwah N. Abdullah and Nadia Naef
Conference Name: The Fifth International Conference on Informatics and Applications (ICIA2016), Japan

Abstract:


Web mining organizes content across the Web by providing models and techniques for integrating knowledge; these models are designed to represent human knowledge in a structured language through the concepts of modeling tools. Obtaining data from different websites may seem complicated at first, and this research studies the exploration of data on the Web. The data are analyzed and then extracted using Web information extraction technology. Information is extracted from pages by a program written in Java, which checks every page of a website and adds the extracted information to a database. Web documents come in many different formats, such as HTML pages and other formats. The extracted Web data are used to detect the state of web page contents, that is, whether or not the pages have been hacked, and the resulting evidence is written to CSV. The data are then tested with web content mining software based on decision tree mining algorithms, implemented in Weka.
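
The extraction step described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' program: the class name PageFeatureExtractor, the chosen features (page title, link count, suspicious keyword count), the keyword list, and the CSV column names are all assumptions made for illustration, and the class label is assumed to be supplied by hand. The sketch works on HTML strings already in memory and leaves out page fetching and database storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: turn HTML pages into CSV rows for a decision tree learner. */
class PageFeatureExtractor {

    // Illustrative keyword list (an assumption; the paper does not list its features).
    private static final String[] SUSPICIOUS = {"hacked by", "defaced", "owned by"};

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*href\\s*=", Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed text of the first title element, or "" if none is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Counts anchor tags that carry an href attribute. */
    static int countLinks(String html) {
        Matcher m = LINK.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Counts case-insensitive occurrences of the illustrative keywords. */
    static int countSuspicious(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        int n = 0;
        for (String keyword : SUSPICIOUS) {
            int from = 0;
            while ((from = lower.indexOf(keyword, from)) != -1) {
                n++;
                from += keyword.length();
            }
        }
        return n;
    }

    /** Quotes a field when it contains a comma, a double quote, or a newline. */
    static String csvEscape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds one CSV row: url, title, link count, keyword count, class label. */
    static String toCsvRow(String url, String html, String label) {
        return String.join(",",
                csvEscape(url),
                csvEscape(extractTitle(html)),
                Integer.toString(countLinks(html)),
                Integer.toString(countSuspicious(html)),
                csvEscape(label));
    }

    /** Builds a CSV document with a header row; each entry is {url, html, label}. */
    static String toCsv(List<String[]> pages) {
        StringBuilder sb = new StringBuilder(
                "url,title,link_count,suspicious_keyword_count,label\n");
        for (String[] page : pages) {
            sb.append(toCsvRow(page[0], page[1], page[2])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> pages = new ArrayList<>();
        pages.add(new String[] {"http://example.org/",
                "<html><head><title>Home</title></head>"
                        + "<body><a href=\"about.html\">About</a></body></html>",
                "normal"});
        pages.add(new String[] {"http://example.org/news",
                "<html><head><title>Hacked by X</title></head>"
                        + "<body>This site was defaced</body></html>",
                "hacked"});
        System.out.print(toCsv(pages));
    }
}
```

A CSV file in this shape, with a header row naming the attributes, could then be loaded into Weka and used to train one of its decision tree classifiers such as J48. Regular expressions are used here only to keep the sketch dependency-free; a real extractor would more likely rely on an HTML parser.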