Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-18771
Publication type: Article in scientific journal
Type of review: Peer review (publication)
Title: Enabling semantic queries across federated bioinformatics databases
Authors: Sima, Ana-Claudia
Mendes de Farias, Tarcisio
Zbinden, Erich
Anisimova, Maria
Gil, Manuel
Stockinger, Heinz
Stockinger, Kurt
Robinson-Rechavi, Marc
Dessimoz, Christophe
et al.: No
DOI: 10.1093/database/baz106
10.21256/zhaw-18771
Published in: Database: The Journal of Biological Databases and Curation
Volume(Issue): 2019
Issue: baz106
Issue Date: 2019
Publisher / Ed. Institution: Oxford University Press
ISSN: 1758-0463
Language: English
Subjects: Semantic query; Federated database; Semantic web technology; Data integration; Query processing; Natural language interface
Subject (DDC): 005: Computer programming, programs and data
Abstract: Motivation: Data integration promises to be one of the main catalysts for drawing new insights from the wealth of publicly available biological data. However, the heterogeneity of the different data sources, at both the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 (HDF5) orthology data store; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, are made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These links allow joint queries to be performed across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
URI: https://digitalcollection.zhaw.ch/handle/11475/18771
Fulltext version: Published version
License (according to publishing contract): CC BY 4.0: Attribution 4.0 International
Department: Life Sciences and Facility Management
School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Institute of Applied Simulation (IAS)
Published as part of the ZHAW project: Bio-SODA: Enabling Complex, Semantic Queries to Bioinformatics Databases through Intuitive Searching over Data
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: SemanticQueriesOverFederatedDatabases_DatabaseJournal2019.pdf; Description: SemanticQueriesOverFederatedDatabases_DatabaseJournal2019; Size: 2.27 MB; Format: Adobe PDF

