In line with its strategy, TIB pursues the long-term approach "Move beyond text". In this context, TIB considers not only visual and textual information, but also scientific/technical data sources in digital form, such as research data, audiovisual content, 3D information, simulation models and software. The quantities of data in all of these areas are growing exponentially, placing new demands on TIB concerning their indexing, preparation and provision. This involves not only tagging data sets with metainformation and automated content mining, but also linking them in an overarching framework.
To achieve this, new methodological approaches are required both for TIB's internal information mining and processing and for serving external customers through TIB's service portfolio.
What is data science?
The term "data science" was originally coined in 1960 as a synonym for information technology or computer science, but was reinterpreted in the 1990s. The topic is closely connected to the term "big data", which refers to the rapidly growing range of digital information sources available in the age of the internet and the new challenges they present for mining and analysing content. In this connection, four categories in particular are used to characterise these quantities of information: volume, velocity, variety and veracity.
In the current definition, the research area of data science combines established techniques and theories from mathematics, statistics and computer science in an interdisciplinary manner in order to mine, model and analyse information for thematic decision-making purposes. In the process, the original technologies are extended and new technical solutions are derived from them. The thematic areas of machine learning, pattern recognition and statistics play a central role.
TIB's research activities in the field of data science focus on developing and implementing solutions for the library sector so that the growing volumes of data in TIB's collections can be retrieved, searched and archived in a sustainable, future-proof manner.
The focus is currently on the following thematic areas:
- Text (data) mining - an analysis technique to identify structures of meaning in unstructured or weakly structured texts. The aim is to extract core information from the analysed texts, enabling previously unknown correlations within the texts to be derived. These methods are also applied to improve TIB's services, e.g. for TIB's portal infrastructure.
- Knowledge management for deriving hierarchical, thematic classification approaches (ontologies and taxonomies) based on existing document collections. These classifications form the basis for extended services in TIB's portals, such as hierarchical search or search term expansion.
- Multimedia retrieval methods and semantic analysis are combined in the context of TIB's AV Portal. The resulting automatic video analysis includes not only structural analysis (scene recognition), but also text, audio and image analysis.
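To make the idea of search term expansion over a hierarchical classification more concrete, the following is a minimal sketch in Python. It is an illustration only and does not reflect how TIB's portals are implemented: the toy taxonomy, the function name `expand_term` and the `max_depth` parameter are all assumptions introduced here. A search term is expanded with its narrower terms so that a query for a broad topic also matches documents indexed under more specific ones.

```python
from collections import deque

# Hypothetical toy taxonomy (not TIB data): each broader term maps to
# the list of its directly narrower terms.
TAXONOMY = {
    "data science": ["machine learning", "statistics"],
    "machine learning": ["pattern recognition", "neural networks"],
    "statistics": [],
    "pattern recognition": [],
    "neural networks": [],
}


def expand_term(term, taxonomy, max_depth=1):
    """Return the term followed by its narrower terms, up to max_depth levels down."""
    expanded = [term]
    seen = {term}
    queue = deque([(term, 0)])  # breadth-first walk through the hierarchy
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend below the requested depth
        for narrower in taxonomy.get(current, []):
            if narrower not in seen:
                seen.add(narrower)
                expanded.append(narrower)
                queue.append((narrower, depth + 1))
    return expanded


if __name__ == "__main__":
    print(expand_term("data science", TAXONOMY))
    print(expand_term("data science", TAXONOMY, max_depth=2))
```

With the default depth, only the directly narrower terms are added. A larger `max_depth` widens the search further down the hierarchy, which may improve recall at the cost of precision. Terms that do not appear in the taxonomy are returned unchanged.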