Title: Search Engine Development: Adaptation from Supervised Learning Methodology

Year of Publication: March - 2014
Page Numbers: 35-42
Authors: Mohd Noah A. Rahman, Afzaal H. Seyal, Siti Aminah Maidin
Conference Name: The Fourth International Conference on Digital Information Processing and Communications (ICDIPC2014), Malaysia


This paper reports the development of a generic search engine using a supervised learning methodology. This learning method has become increasingly important because the volume of data has grown tremendously and challenges our capacity to write software algorithms around it. It was advocated as a means to better understand the flow of the algorithm in a controlled environment. The engine uses a Breadth-First Search (BFS) retrieval strategy to retrieve pages for topic-specific (topical) searching. Additionally, an inverted indexing technique is applied to store the mapping from content to its location in a DBMS. These techniques require a proper approach to avoid a flood of irrelevant links, which can cause a poorly designed and constructed search engine to crash. The main idea of this research is to learn how to crawl, index, search and rank the output in a controlled environment. This contrasts with unsupervised learning conditions, which could lead to information overload.
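
The crawl, index, search and rank pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `PAGES` dictionary stands in for the controlled set of pages, a plain dictionary stands in for the DBMS, and the function names (`bfs_crawl`, `build_index`, `search`) are hypothetical. The abstract does not state the ranking method, so ranking by summed term frequency is an assumption made here for illustration.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# Stands in for a controlled crawl environment; not from the paper.
PAGES = {
    "a": ("search engine crawling basics", ["b", "c"]),
    "b": ("inverted index for search", ["d"]),
    "c": ("cooking recipes and food", ["e"]),
    "d": ("ranking search results", []),
    "e": ("more food recipes", []),
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bfs_crawl(seed, topic_terms, max_pages=100):
    """Visit pages breadth-first, following links only from on-topic pages."""
    visited = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        text, links = PAGES[url]
        if not topic_terms & set(tokenize(text)):
            continue  # off-topic page: do not keep it or follow its links
        order.append(url)
        for link in links:
            if link in PAGES and link not in visited:
                visited.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Inverted index: term -> {url: number of occurrences of the term}."""
    index = defaultdict(dict)
    for url in urls:
        for term in tokenize(PAGES[url][0]):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Rank URLs by summed term frequency (assumed ranking), ties by URL."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda u: (-scores[u], u))


crawled = bfs_crawl("a", {"search"})
index = build_index(crawled)
print(crawled)
print(search(index, "search ranking"))
```

In this sketch the topical filter is what keeps irrelevant links from flooding the queue: page `c` is off-topic, so it is dropped and its outgoing link to `e` is never followed. The `max_pages` cap is a second, assumed safeguard that bounds the crawl.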