Abstract. The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this should enable a wealth of query
processing and semantic reasoning capabilities using XQuery and logical inference engines. However, we believe that the diversity and uncertainty of terminologies
and schema-like annotations will make precise querying on a Web scale extremely elusive if not hopeless, and the same argument holds for large-scale dynamic federations of Deep Web sources. Therefore, ontology-based reasoning
and querying need to be enhanced by statistical means, leading to relevance-ranked lists as query results.
This paper presents steps towards such a “statistically semantic” Web and outlines technical challenges. We discuss how statistically quantified ontological relations
can be exploited in XML retrieval, how statistics can help in making Web-scale search efficient, and how statistical information extracted from users’ query logs
and click streams can be leveraged for better search result ranking. We believe these are decisive issues for improving the quality of next-generation search engines
for intranets, digital libraries, and the Web, and they are also crucial for peer-to-peer collaborative Web search.