Digital Approaches for the Synthesis of Poorly Accessible Biodiversity Information (BacDive & Semantics)



Dr. Angelina Kraft


Gautam Kishore Shahi


Leibniz SAW


May 2020 – April 2023

The digitalization and integration of biodiversity information can generate substantial added value for existing data and yield novel scientific insights of relevance to bioeconomy, biotechnology, human health, and environmental protection. So far, this potential has rarely been exploited because data sources are heterogeneous and fragmented, documentation is sparse, standards vary, and interoperability is limited. For bacteria, research data are particularly diverse and widely distributed; these organisms will therefore serve as the model group for the current project. The DiASPora project will establish an approach for synthesizing information on bacterial species by applying state-of-the-art data science methodology and genomics and by developing user-centric workflows.


The DiASPora project will establish an approach for the synthesis of information on bacterial species using state-of-the-art data science methods. Phenotypic data will be extracted from the microbiological literature by large-scale text mining, using artificial intelligence (AI) techniques that are trained through feedback from microbiologist curators.

The TIB is working on the following tasks within the framework of the project:

  • Semantification of prokaryotic data: The data will be standardized and converted into a machine-readable format that complies with the FAIR (findable, accessible, interoperable, reusable) and Linked Data principles. This includes the use of semantic formalisms such as the Resource Description Framework (RDF), ontologies and R2RML mappings.
  • Creation of a machine-readable knowledge graph: This task includes the semantic integration of data, metadata and schema. Agile, iterative and community-driven methods for ontology development are worked out by and with all participants. This includes the evaluation of the NCBI taxon ontology and the presentation of quality criteria, including classification schemes, for the microbiological sector.
  • Improving graphical and programmatic access to microbiological data
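
The semantification task above can be illustrated with a minimal sketch of how one tabular strain record might be turned into RDF triples and serialized as N-Triples. This is not the project's actual pipeline or schema: the `https://example.org/` namespaces, the `Strain` class, the field names, and the sample record are all hypothetical placeholders, and in practice a declarative R2RML mapping would typically replace hand-written conversion code like this.

```python
from typing import List

# Hypothetical namespaces; the project's real IRIs and ontology terms are not given in the text.
BASE = "https://example.org/strain/"
VOCAB = "https://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def literal(value) -> str:
    """Render a Python value as a typed or plain N-Triples literal."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    return f'"{escape_literal(str(value))}"'


def record_to_ntriples(record: dict) -> List[str]:
    """Map one record to triples: a type statement plus one triple per field.

    Assumes record["id"] and all field names are already safe to use in an IRI.
    """
    subject = f"<{BASE}{record['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{VOCAB}Strain> ."]
    for key, value in record.items():
        if key == "id" or value is None:
            continue  # the id is used as the subject; missing values yield no triple
        lines.append(f"{subject} <{VOCAB}{key}> {literal(value)} .")
    return lines


# Illustrative record only; not taken from any project database.
record = {
    "id": "example-1",
    "speciesName": "Escherichia coli",
    "growthTemperatureCelsius": 37,
    "gramStainPositive": False,
}

for line in record_to_ntriples(record):
    print(line)
```

Each field becomes one subject-predicate-object statement, so records from different sources that use the same IRIs can be merged into a single graph without further schema negotiation.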

The project is committed to comprehensive community engagement and efficient dissemination of results. DiASPora builds upon the complementary expertise of three participating institutions, covering the fields of microbial databases and diversity research, bacterial genomics, text mining, artificial intelligence, and semantic technologies.


  • Leibniz-Institut DSMZ - Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
  • ZB MED (Informationszentrum Lebenswissenschaften)

