Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

L. Meyer, J. Frey, K. Junghanns, F. Brei, K. Bulert, S. Gründer-Fahrer, and M. Martin.
Proceedings of Poster Track of Semantics 2023, volume 3526 of CEUR Workshop Proceedings, pages 16--20 (2023).
DOI: 10.48550/ARXIV.2308.16622

Abstract

As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE), accompanied by three challenges addressing syntax and error correction, fact extraction, and dataset generation. We show that LLMs, while useful tools, are not yet fit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.

BibTeX key: Meyer2023DevelopingScalableBenchmark
entry type: inproceedings
booktitle: Proceedings of Poster Track of Semantics 2023
year: 2023
pages: 16--20
series: CEUR Workshop Proceedings
volume: 3526
issn: 1613-0073
comment: Code: https://github.com/AKSW/LLM-KG-Bench Results: https://github.com/AKSW/LLM-KG-Bench-Results/blob/main/2023-SEMANTICS_LLM-KGE-Bench-Results
DOI: 10.48550/ARXIV.2308.16622
Document: https://ceur-ws.org/Vol-3526/paper-04.pdf
