Analyzing the Web: Are Top Websites Lists a Good Choice for Research?

Proceedings of the International Conference on Theory and Practice of Digital Libraries, pp. 11–25. Cham: Springer, 2022.
DOI: 10.1007/978-3-031-16802-4_2

Abstract

The web has been a subject of research since its beginning, but it is difficult, if not impossible, to analyze the whole web, even if a database of all URLs were freely accessible. Hundreds of studies have used commercial top websites lists as a shortcut, in particular the Alexa One Million Top Sites list. However, apart from the fact that Amazon decided to terminate Alexa, we question the usefulness of such lists for research, as they have several shortcomings. Our analysis shows that top sites lists miss frequently visited websites and offer little value for language-specific research. We present a heuristic-driven alternative based on the Common Crawl host-level web graph that also takes language-specific requirements into account.
