English-language extension of the TIB AV Portal
Dr. Sven Strobel
1 December 2013 – 31 March 2014
The aim of the project was to obtain English technical vocabulary for tagging English-language videos in the TIB AV Portal by mapping terms from the GND (Gemeinsame Normdatei, the German Integrated Authority File) to DBpedia and other authority data.
Videos in the TIB AV Portal are automatically tagged with a total of 63,356 GND terms from science and technology. In addition to German-language videos, the portal contains numerous English-language videos. However, the GND provides English labels for only very few of the terms in the TIB AV Portal knowledge base, so there was no English indexing vocabulary with which to tag the English-language videos automatically. The project addressed this by mapping GND terms to other datasets that contain English translations of the terms. The mapping strategies drew on DBpedia, LCSH, MACS and the WTI thesaurus. As a result, at least one English label was identified for 35,025 GND terms. These labels can be used directly to tag English-language videos automatically. A further 11,694 GND terms could not be ‘translated’ into English but were at least linked to a hypernym (broader term) for which an English translation exists. This link helps to expand search results.
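The lookup logic described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the term IDs, labels, table layout and the function names `direct_label` and `resolve` are all hypothetical, and walking up a chain of hypernyms (rather than checking a single one) is an assumption made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data: the IDs and labels are placeholders, not real GND entries.
# Each source maps a GND term ID to an English label obtained via some mapping.
SOURCES: Dict[str, Dict[str, str]] = {
    "dbpedia": {"gnd:term-1": "Thermodynamics"},
    "lcsh": {"gnd:term-2": "Fluid mechanics"},
}

# Hypothetical hypernym links: narrower term ID -> broader term ID.
BROADER: Dict[str, str] = {
    "gnd:term-3": "gnd:term-1",
    "gnd:term-4": "gnd:term-5",
}


def direct_label(term_id: str, sources: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the first English label found for term_id in any source, else None."""
    for table in sources.values():
        label = table.get(term_id)
        if label is not None:
            return label
    return None


def resolve(
    term_id: str,
    sources: Dict[str, Dict[str, str]],
    broader: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (label, kind), where kind is 'direct', 'hypernym' or 'unmapped'."""
    label = direct_label(term_id, sources)
    if label is not None:
        return label, "direct"

    # No direct translation: fall back to the nearest hypernym that has one.
    seen = {term_id}  # guards against cycles in the broader-term links
    current = broader.get(term_id)
    while current is not None and current not in seen:
        seen.add(current)
        label = direct_label(current, sources)
        if label is not None:
            return label, "hypernym"
        current = broader.get(current)

    return None, "unmapped"
```

In this sketch, a term with a 'direct' result would be usable for tagging English-language videos, while a 'hypernym' result would only serve to broaden search results, mirroring the distinction drawn in the summary.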
- English-language extension of the TIB AV Portal. GND-DBpedia mapping to obtain an English term system
- How the TIB AV Portal learned English. An English translation for terms from the AV Portal knowledge base
- Semantic search for scientific videos. Automatic tagging using named-entity recognition