Article

Search engine case study: searching the web using genetic programming and MPI

Parallel Computing, 27 (1-2): 71-89 (January 2001)

Abstract

The generation of a Web page follows distinct sources for the incorporation of information. The earliest format of these sources was an organized display of known information determined by the page designers' interests and/or design parameters. The sources may have been published in books or other printed literature, or disseminated as general information about the page designer. Due to the growth in the number of Web pages, several new search engines have been developed in addition to the refinement of the already existing ones. The use of the refined search engines, however, still produces an array of diverse information when the same set of keywords is used in a Web search. Some degree of consistency in the search results can be achieved over a period of time when the same search engine is used, yet most initial Web searches on a given topic are treated as final after some form of refinement/adjustment of the keywords used in the search process. To determine the applicability of a genetic programming (GP) model to the diverse set of Web documents, the search strategies behind the current search engines for the World Wide Web were studied. The development of a GP model resulted in a parallel implementation of a pseudo-search engine indexer simulator. The training sets used in this study provided a small snapshot of the computational effort required to index Web documents accurately and efficiently. Future results will be used to develop and implement Web crawler mechanisms that are capable of assessing the scope of this research effort. The GP model results were generated on a network of SUN workstations and an IBM SP2.
