Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-20229
Publication type: Conference paper
Type of review: Peer review (publication)
Title: Annotating web tables through knowledge bases : a context-based approach (Best Paper Award)
Authors: Eslahi, Yasamin
Stockinger, Kurt
Bhardwaj, Akansha
Cudré-Mauroux, Philippe
Rosso, Paolo
et al.: No
DOI: 10.21256/zhaw-20229
Proceedings: Proceedings of the 7th SDS
Conference details: Swiss Conference on Data Science, Lucerne, Switzerland, 26 June 2020
Issue Date: Jun-2020
Publisher / Ed. Institution: IEEE
Language: English
Subjects: Web table annotation; Knowledge base; Embeddings; Data integration
Subject (DDC): 005: Computer programming, programs and data
Abstract: The Web has a collection of over 150 million tables, which as a whole represents an invaluable source of semi-structured knowledge. Such tables are commonly referred to as Web tables, and are considerably easier to leverage in automated processes than completely unstructured, free-format text. Understanding the semantics of Web tables is important since they are used in various applications like knowledge base augmentation, information retrieval or natural language interfaces for databases. The task of understanding the semantics of a given Web table is known as Web table annotation. In recent years, it has been tackled through methods where the table is enriched using existing knowledge bases containing valuable information on the domain at hand, its entities and their mutual relationships. In this paper, we present two novel and unsupervised Web table annotation methods, which leverage the context of the tables to better capture their semantics. Our first method is lookup-based and exploits text similarity to find reference entities in the knowledge base. The second method uses distributional vector representations – a.k.a. embeddings – of the Web tables to elicit their context and disambiguate their semantics. Experiments show that our proposed approach outperforms the state of the art in Web table annotation by up to 18%. Another contribution of this work is a manually corrected version of one of the popular gold standard datasets, Limaye, with annotations from DBpedia. Our dataset and code are publicly available.
Further description: Best Paper Award. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
URI: https://digitalcollection.zhaw.ch/handle/11475/20229
Fulltext version: Published version
License (according to publishing contract): Licence according to publishing contract
Department: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: 2020_Eslahi_Annotating_web_tables_throughknowledge_bases_SDS_2020.pdf
Size: 322.46 kB
Format: Adobe PDF

