An ontology is a computer-processable collection of knowledge about the world.
This thesis explains how an ontology can be constructed and expanded
automatically. The proposed approach consists of three contributions:
1. A core ontology, YAGO.
YAGO is an ontology that has been constructed automatically. It
combines high accuracy with large coverage and serves as a core that can be
expanded.
2. A tool for information extraction, LEILA.
LEILA is a system that can extract knowledge from natural language
texts. LEILA will be used to find new facts for YAGO.
3. An integration mechanism, SOFIE.
SOFIE is a system that can reason about the plausibility of new
knowledge. SOFIE will assess the facts found by LEILA and integrate them
into YAGO.
Each of these components comes with a fully implemented system. Together,
they form an integrative architecture that not only gathers new facts
but also reconciles them with the existing facts. The result is an ever-growing,
yet highly accurate ontological knowledge base. A survey of applications of the
ontology completes the thesis.
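The division of labour described above (a core ontology, an extractor that proposes facts, and an integration step that checks them against what is already known) can be illustrated with a small sketch. It is not the implementation of YAGO, LEILA, or SOFIE: the names `KnowledgeBase`, `extract_candidates`, and `integrate`, the string pattern, and the rule that `bornIn` admits only one value per subject are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

# A fact is modelled as a (subject, relation, object) triple (an assumption).
Fact = Tuple[str, str, str]


@dataclass
class KnowledgeBase:
    """Toy stand-in for the core ontology."""
    facts: Set[Fact] = field(default_factory=set)
    # Hypothetical constraint: these relations allow one object per subject.
    functional: FrozenSet[str] = frozenset({"bornIn"})

    def conflicts(self, fact: Fact) -> bool:
        """True if a functional relation already has a different object."""
        subj, rel, obj = fact
        if rel not in self.functional:
            return False
        return any(s == subj and r == rel and o != obj
                   for s, r, o in self.facts)


def extract_candidates(sentences: List[str]) -> List[Fact]:
    """Toy stand-in for extraction: match '<X> was born in <Y>.'"""
    marker = " was born in "
    candidates = []
    for sentence in sentences:
        if marker in sentence:
            subj, obj = sentence.rstrip(".").split(marker, 1)
            candidates.append((subj.strip(), "bornIn", obj.strip()))
    return candidates


def integrate(kb: KnowledgeBase, candidates: List[Fact]):
    """Toy stand-in for integration: add only facts that do not conflict."""
    accepted, rejected = [], []
    for fact in candidates:
        if fact in kb.facts:
            continue  # already known, nothing to do
        if kb.conflicts(fact):
            rejected.append(fact)
        else:
            kb.facts.add(fact)
            accepted.append(fact)
    return accepted, rejected


kb = KnowledgeBase(facts={("Albert Einstein", "bornIn", "Ulm")})
texts = ["Albert Einstein was born in Munich.", "Max Planck was born in Kiel."]
accepted, rejected = integrate(kb, extract_candidates(texts))
```

In this toy run the candidate about Max Planck is added, while the candidate about Albert Einstein is rejected because it contradicts a fact already in the knowledge base.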
We present a taxonomy automatically generated from
the system of categories in Wikipedia. Categories in the resource
are identified as either classes or instances and included in a large
subsumption (i.e., isa) hierarchy. The taxonomy is made available in
RDFS format to the research community, e.g. for direct use within AI
applications or to bootstrap the process of manual ontology creation.
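To illustrate what a class/instance-aware isa hierarchy might look like in RDFS, the sketch below serializes a few links as N-Triples. The `rdf:type` and `rdfs:subClassOf` URIs are the standard W3C ones; the `http://example.org/taxonomy/` namespace, the function names, and the sample categories are placeholders and do not reflect the naming used in the released taxonomy.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
BASE = "http://example.org/taxonomy/"  # placeholder namespace (assumption)


def uri(name):
    """Build a URI from a category name (spaces become underscores)."""
    return BASE + quote(name.replace(" ", "_"))


def to_ntriples(isa_links, instances):
    """Serialize (child, parent) isa links as N-Triples lines.

    Children listed in `instances` are linked with rdf:type;
    all other children are treated as classes and linked with
    rdfs:subClassOf.
    """
    lines = []
    for child, parent in isa_links:
        predicate = RDF_TYPE if child in instances else RDFS_SUBCLASS_OF
        lines.append(f"<{uri(child)}> <{predicate}> <{uri(parent)}> .")
    return lines


links = [("Physicists", "Scientists"), ("Albert Einstein", "Physicists")]
triples = to_ntriples(links, instances={"Albert Einstein"})
for line in triples:
    print(line)
```

The class/instance decision is passed in as data here; how that decision is actually made is outside the scope of this sketch.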
This paper presents the process of acquiring a large, domain-independent taxonomy from the German Wikipedia. We build upon a
previously implemented platform that extracts a semantic network and taxonomy from the English version of Wikipedia. We describe
two accomplishments of our work: a semantic network for the German language in which isa links are identified and annotated, and
an extension of the platform that allows easy adaptation to a new language. We identify the platform's strengths as well as its shortcomings, which stem
from the scarcity of free processing resources for languages other than English. We show that the taxonomy induction process is highly
reliable: evaluated against GermaNet, the German version of WordNet, the resulting resource shows an accuracy of 83.34%.
This paper presents an automatic method for differentiating
between instances and classes in a large-scale taxonomy induced from
the Wikipedia category network. The method exploits characteristics
of the category names and the structure of the network. The approach
we present is the first attempt to make this distinction automatically
in a large-scale resource. In contrast, this distinction has been made
in WordNet and Cyc based on manual annotations. The result of the
process is evaluated against ResearchCyc. On the subnetwork shared by
our taxonomy and ResearchCyc we report 84.52% accuracy.
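Reporting accuracy "on the shared subnetwork" means that only concepts present in both resources are scored. A minimal sketch of such an evaluation follows, assuming each resource is reduced to a mapping from concept name to a "class" or "instance" label. The function name and the sample labels are invented for illustration and are not taken from the taxonomy or from ResearchCyc.

```python
def accuracy_on_shared(predicted, gold):
    """Fraction of concepts labelled identically in both resources.

    Only concepts that occur in both mappings are evaluated;
    concepts found in just one resource are ignored.
    """
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no shared concepts to evaluate on")
    correct = sum(1 for concept in shared
                  if predicted[concept] == gold[concept])
    return correct / len(shared)


# Made-up labels, for illustration only.
predicted = {
    "Physicists": "class",
    "Albert Einstein": "instance",
    "Microsoft": "class",
    "Rivers": "class",        # not in the gold standard, so ignored
}
gold = {
    "Physicists": "class",
    "Albert Einstein": "instance",
    "Microsoft": "instance",
    "Cities": "class",        # not in the predictions, so ignored
}

acc = accuracy_on_shared(predicted, gold)
print(f"accuracy on shared concepts: {acc:.2%}")
```

Restricting the score to the overlap keeps the measure well defined, but it says nothing about concepts that the gold standard does not cover.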