Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-19637
Publication type: Conference paper
Type of review: Peer review (publication)
Title: Entity matching with transformer architectures - a step forward in data integration
Authors: Brunner, Ursin
Stockinger, Kurt
et al.: No
DOI: 10.21256/zhaw-19637
Proceedings: Proceedings of the 23rd EDBT
Conference details: International Conference on Extending Database Technology, Copenhagen, 30 March-2 April 2020
Issue Date: Mar-2020
ISBN: 978-3-89318-083-7
Language: English
Subjects: Entity matching; Data integration; Machine learning; Neural networks; Transformers; BERT
Subject (DDC): 004: Computer science
Abstract: Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture, in combination with pre-training on large amounts of text, led to the recent breakthrough and to a variety of slightly different implementations. In this paper we analyze how well four of the most recent attention-based transformer architectures (BERT, XLNet, RoBERTa and DistilBERT) perform on the task of entity matching - a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is a challenging task if the data instances consist of long textual data or if the data instances are "dirty" due to misplaced values. To evaluate the capability of transformer architectures and transfer learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.
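
Note: The following is a minimal illustrative sketch (not the paper's actual code) of how entity matching can be framed as sequence-pair classification with a pre-trained transformer, here using the Hugging Face transformers library. The model name, record attributes, and serialization scheme are assumptions for illustration; in practice the classifier head must first be fine-tuned on labeled match/non-match pairs.

```python
# Sketch: entity matching as sequence-pair classification with a pre-trained transformer.
# Assumes the Hugging Face "transformers" and "torch" packages are installed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # could equally be roberta-base, distilbert-base-uncased, ...

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def serialize(record: dict) -> str:
    """Flatten an entity record into a single string, e.g. 'title: iPhone 7 price: 649'."""
    return " ".join(f"{k}: {v}" for k, v in record.items())

def match_probability(record_a: dict, record_b: dict) -> float:
    """Return P(match) for two records; only meaningful after fine-tuning on labeled pairs."""
    inputs = tokenizer(serialize(record_a), serialize(record_b),
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Hypothetical pair of "dirty" product records describing the same entity:
a = {"title": "Apple iPhone 7 32GB black", "price": "649"}
b = {"title": "iPhone7 32 GB (black)", "manufacturer": "Apple"}
print(f"match probability: {match_probability(a, b):.3f}")
```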
URI: https://digitalcollection.zhaw.ch/handle/11475/19637
Fulltext version: Published version
License (according to publishing contract): CC BY-NC-ND 4.0: Attribution - Non commercial - No derivatives 4.0 International
Departement: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Appears in Collections:Publikationen School of Engineering

Files in This Item:
File: Entity_Machting_with_Transformers_edbt_2020__Camera_Ready.pdf
Description: Entity Matching with Transformers EDBT 2020
Size: 1.12 MB
Format: Adobe PDF

