Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3197
Full metadata record
DC Field | Value | Language
--- | --- | ---
dc.contributor.author | Brunner, Ursin | -
dc.contributor.author | Stockinger, Kurt | -
dc.date.accessioned | 2019-07-01T12:17:01Z | -
dc.date.available | 2019-07-01T12:17:01Z | -
dc.date.issued | 2019-06-14 | -
dc.identifier.isbn | 978-1-7281-3105-4 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/17388 | -
dc.description | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | de_CH
dc.description.abstract | With the growing number of data sources in enterprises, entity matching becomes a crucial part of every data integration project. In order to reduce the human effort involved in identifying matching entities between different database tables, typically machine learning algorithms are applied. Moreover, active learning is often combined with supervised machine learning methods to further reduce the effort of labeling entities as true or false matches. However, while state-of-the-art active learning algorithms have proven to work well on structured data sets, unstructured data still poses a challenge in entity matching. This paper proposes an end-to-end entity matching pipeline to minimize the human labeling effort for entity matching on unstructured data sets. We use several natural language processing techniques such as soft tf-idf to pre-process the record pairs before we classify them using a novel Active Learning with Uncertainty Sampling (ALWUS) algorithm. We designed our algorithm as a plug-in system to work with any state-of-the-art classifier such as support vector machines, random forests or deep neural networks. Detailed experimental results demonstrate that our end-to-end entity matching pipeline clearly outperforms comparable entity matching approaches on an unstructured real-world data set. Our approach achieves significantly higher F1-scores while requiring 1 to 2 orders of magnitude less human labeling effort than existing state-of-the-art algorithms. | de_CH
dc.language.iso | en | de_CH
dc.publisher | IEEE | de_CH
dc.rights | Not specified | de_CH
dc.subject | Entity matching | de_CH
dc.subject | Active learning | de_CH
dc.subject | Data integration | de_CH
dc.subject | Unstructured data | de_CH
dc.subject.ddc | 005: Computer programming, programs and data | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | Entity matching on unstructured data : an active learning approach | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.1109/SDS.2019.00006 | de_CH
dc.identifier.doi | 10.21256/zhaw-3197 | -
zhaw.conference.details | Swiss Conference on Data Science, Berne, Switzerland, 14 June 2019 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 102 | de_CH
zhaw.pages.start | 97 | de_CH
zhaw.publication.status | acceptedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.title.proceedings | Proceedings of the 6th SDS | de_CH
zhaw.webfeed | Datalab | de_CH
zhaw.webfeed | Information Engineering | de_CH
Appears in collections: Publikationen School of Engineering

Files in This Item:
File | Description | Size | Format
ActiveLearning_Brunner_Stockinger_SDS_2019.pdf | Accepted Version | 221.35 kB | Adobe PDF
Brunner, U., & Stockinger, K. (2019). Entity matching on unstructured data : an active learning approach [Conference paper]. Proceedings of the 6th SDS, 97–102. https://doi.org/10.1109/SDS.2019.00006
Brunner, U. and Stockinger, K. (2019) ‘Entity matching on unstructured data : an active learning approach’, in Proceedings of the 6th SDS. IEEE, pp. 97–102. Available at: https://doi.org/10.1109/SDS.2019.00006.
U. Brunner and K. Stockinger, “Entity matching on unstructured data : an active learning approach,” in Proceedings of the 6th SDS, Jun. 2019, pp. 97–102. doi: 10.1109/SDS.2019.00006.
BRUNNER, Ursin and Kurt STOCKINGER, 2019. Entity matching on unstructured data : an active learning approach. In: Proceedings of the 6th SDS. Conference paper. IEEE. 14 June 2019. pp. 97–102. ISBN 978-1-7281-3105-4
Brunner, Ursin, and Kurt Stockinger. 2019. “Entity Matching on Unstructured Data : An Active Learning Approach.” Conference paper. In Proceedings of the 6th SDS, 97–102. IEEE. https://doi.org/10.1109/SDS.2019.00006.
Brunner, Ursin, and Kurt Stockinger. “Entity Matching on Unstructured Data : An Active Learning Approach.” Proceedings of the 6th SDS, IEEE, 2019, pp. 97–102, https://doi.org/10.1109/SDS.2019.00006.