Efficient Validation of RDF Data using SHACL

The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. However, many RDF data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the W3C Recommendation for defining integrity constraints over RDF data. Corman et al. [1] showed that validating an RDF data source against an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods for validating SHACL shape schemas over RDF data sources that are accessible via SPARQL, the query language for RDF. The implementation part of the thesis will build on an existing prototype that supports simple constraints.
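
To illustrate what validation over a SPARQL-accessible source can look like, the following Python sketch rewrites one simple, non-recursive constraint (a target class with sh:minCount on a single property) into a SPARQL query that returns the violating nodes, and checks the same condition directly on an in-memory set of triples. It is not the existing prototype: the class SimpleShape, the functions violation_query and violations, and the example.org IRIs are all hypothetical. For simplicity it also ignores rdfs:subClassOf inference, which SHACL class targets would take into account.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class SimpleShape:
    """Hypothetical non-recursive shape: every instance of target_class
    needs at least min_count distinct values for the predicate path."""
    target_class: str  # IRI used as sh:targetClass
    path: str          # IRI of a single predicate used as sh:path
    min_count: int     # value of sh:minCount


def violation_query(shape: SimpleShape) -> str:
    """Build a SPARQL SELECT query returning the focus nodes that violate the shape.

    COUNT skips unbound ?v, so targets without any value are counted as 0.
    rdfs:subClassOf inference is ignored (assumption of this sketch).
    """
    return (
        "SELECT ?x WHERE {\n"
        f"  ?x a <{shape.target_class}> .\n"
        f"  OPTIONAL {{ ?x <{shape.path}> ?v }}\n"
        "}\n"
        "GROUP BY ?x\n"
        f"HAVING (COUNT(DISTINCT ?v) < {shape.min_count})"
    )


def violations(shape: SimpleShape, triples: set[tuple[str, str, str]]) -> set[str]:
    """Evaluate the same condition directly on an in-memory set of (s, p, o) triples."""
    targets = {s for s, p, o in triples if p == RDF_TYPE and o == shape.target_class}
    return {
        x for x in targets
        if len({o for s, p, o in triples if s == x and p == shape.path}) < shape.min_count
    }


# Hypothetical example data.
EX = "http://example.org/"
triples = {
    (EX + "alice", RDF_TYPE, EX + "Person"),
    (EX + "alice", EX + "name", '"Alice"'),
    (EX + "bob", RDF_TYPE, EX + "Person"),  # bob has no ex:name
}
shape = SimpleShape(target_class=EX + "Person", path=EX + "name", min_count=1)

if __name__ == "__main__":
    print(violation_query(shape))
    print(violations(shape, triples))  # {'http://example.org/bob'}
```

Sent to an endpoint, the generated query would return every focus node with too few values, so only the violations have to be transferred. Shapes that refer to other shapes, possibly recursively, are not covered by this sketch.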

Prerequisites

  • Enrollment at a German university
  • Good written and spoken English
  • Good programming skills in Python

Helpful Courses

  • Fundamentals of Database Systems
  • Data Structures and Algorithms
  • Knowledge Engineering and Semantic Web
  • Complexity of Algorithms

Topics Covered

  • Big Data
  • Knowledge Graphs
  • Quality Assessment

Literature

[1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018. URL: https://www.inf.unibz.it/krdb/KRDB%20files/tech-reports/KRDB18-01.pdf

[2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019. URL: https://jreutter.sitios.ing.uc.cl/SHACL_19.pdf

[3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. URL: https://arxiv.org/pdf/2101.07136.pdf
