Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines

Semantic Web Journal (2023)

Abstract

Data has grown exponentially in recent years, and knowledge graphs have gained momentum as data structures for integrating heterogeneous data and metadata. This explosion of data has created many opportunities to develop innovative technologies. Still, it draws attention to the lack of standardization in how data is made available, raising questions about interoperability and data quality. Data complexities such as large volume, heterogeneity, and high duplicate rates affect knowledge graph creation. This work addresses these issues to scale up knowledge graph creation guided by the RDF Mapping Language (RML). For that purpose, we present the SDM-RDFizer, a two-fold solution to these sources of complexity. First, RML triples maps are reordered so that the most selective maps are evaluated first and non-selective rules are considered last, reducing the number of triples kept in main memory. Second, an RDF compression strategy and novel operators are implemented to avoid generating duplicate RDF triples and to reduce the number of comparisons performed when RML operators are executed between mapping rules. We test our tool on two well-known benchmarks, where it outperforms state-of-the-art RML engines, demonstrating the benefits of the proposed techniques.
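
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the SDM-RDFizer's implementation: the names `TripleDeduplicator`, `order_by_selectivity`, and `generate` are hypothetical, "selectivity" is assumed here to mean a smaller estimated number of output triples, and the compression is reduced to a plain dictionary that maps each RDF term to an integer ID.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class TripleDeduplicator:
    """Stores each RDF term once as an integer ID and remembers emitted triples."""

    def __init__(self) -> None:
        self.term_ids: Dict[str, int] = {}
        self.seen: Set[Tuple[int, int, int]] = set()

    def _encode(self, term: str) -> int:
        # Assign the next free integer to a term the first time it appears.
        return self.term_ids.setdefault(term, len(self.term_ids))

    def add(self, triple: Triple) -> bool:
        """Return True if the triple is new, False if it is a duplicate."""
        s, p, o = triple
        encoded = (self._encode(s), self._encode(p), self._encode(o))
        if encoded in self.seen:
            return False
        self.seen.add(encoded)
        return True


def order_by_selectivity(estimated_sizes: Dict[str, int]) -> List[str]:
    # Assumption: a map with fewer estimated output triples is more selective.
    # Ties are broken by name so the order is deterministic.
    return sorted(estimated_sizes, key=lambda name: (estimated_sizes[name], name))


def generate(
    maps: Dict[str, Iterable[Triple]], estimated_sizes: Dict[str, int]
) -> List[Triple]:
    """Evaluate the maps in selectivity order and emit each triple only once."""
    dedup = TripleDeduplicator()
    output: List[Triple] = []
    for name in order_by_selectivity(estimated_sizes):
        for triple in maps[name]:
            if dedup.add(triple):
                output.append(triple)
    return output
```

In this sketch, duplicate checks compare small integer tuples rather than full IRI strings, which is one simple way to reduce both memory use and comparison cost; the operators described in the paper may work quite differently.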
