Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-31238
Publication type: Conference paper
Type of review: Peer review (publication)
Title: GraLMatch : matching groups of entities with graphs and language models
Authors: de Meer Pardo, Fernando
Lehmann, Claude
Gehrig, Dennis
Nagy, Andrea
Nicoli, Stefano
Branka Hadji, Misheva
Braschler, Martin
Stockinger, Kurt
et al.: No
DOI: 10.48786/edbt.2025.01
10.21256/zhaw-31238
Proceedings: Proceedings 28th International Conference on Extending Database Technology (EDBT 2025)
Page(s): 1
Pages to: 12
Conference details: 28th International Conference on Extending Database Technology (EDBT), Barcelona, Spain, 25-28 March 2025
Issue Date: Mar-2025
Series: Advances in Database Technology
Series volume: 28
Publisher / Ed. Institution: Open Proceedings
ISBN: 978-3-98318-097-4
ISSN: 2367-2005
Language: English
Subjects: Data integration; Entity matching; Machine learning; Large language models; Graph processing
Subject (DDC): 005: Computer programming, programs and data
006: Special computer methods
Abstract: In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group all records that originate from multiple data sources but represent the same real-world entity. We focus on the effects of transitively matched records, i.e., records connected by paths in the graph G = (V, E), whose nodes represent the records and whose edges indicate whether two records are a match. We present a real-world instance of this problem, where the challenge is to match records of companies and financial securities originating from different data providers. We also introduce two new multi-source benchmark datasets that present matching challenges similar to those of real-world records. A distinctive characteristic of these records is that they are regularly updated following real-world events, but updates are not applied uniformly across data sources. As a result, certain groups of records can only be matched through the use of transitive information. In our experiments, we illustrate why considering transitively matched records is challenging: a small number of false positive pairwise match predictions can disrupt the group assignment of large numbers of records. Thus, we propose GraLMatch, a method that can partially detect and remove false positive pairwise predictions through graph-based properties. Finally, we show that fine-tuning a Transformer-based model (DistilBERT) on a reduced number of labeled samples yields a better final entity group matching than training on more samples and/or incorporating fine-tuning optimizations, illustrating how precision becomes the deciding factor in the entity group matching of large volumes of records.
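Illustration (not part of the published record): the transitive grouping described in the abstract can be sketched as computing connected components over pairwise match predictions. The sketch below is a minimal example under stated assumptions and is not the authors' GraLMatch implementation; the function name `group_records`, the record identifiers, and the match pairs are hypothetical. It shows how a single false positive pair can merge two otherwise separate groups.

```python
from collections import defaultdict


def group_records(records, matched_pairs):
    """Group records by the connected components of the match graph.

    records: iterable of hashable record ids (the nodes V).
    matched_pairs: iterable of (a, b) pairs predicted as matches (the edges E).
    Returns a sorted list of sorted groups.
    """
    # Union-find: every record starts as its own group representative.
    parent = {r: r for r in records}

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each predicted match merges the groups of its two endpoints.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical records: a1..a3 describe one entity, b1..b2 another.
records = ["a1", "a2", "a3", "b1", "b2"]
true_matches = [("a1", "a2"), ("a2", "a3"), ("b1", "b2")]

# a1 and a3 are never compared directly; they are grouped transitively via a2.
clean = group_records(records, true_matches)

# One false positive edge (a3, b1) collapses both entities into one group.
noisy = group_records(records, true_matches + [("a3", "b1")])
```

With only the true matches, the sketch yields two groups; adding the single hypothetical false positive yields one group of five records, which is the sensitivity to precision that the abstract describes.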
URI: https://openproceedings.org/2025/conf/edbt/paper-10.pdf
https://digitalcollection.zhaw.ch/handle/11475/31238
Fulltext version: Published version
License (according to publishing contract): CC BY-NC-ND 4.0: Attribution - Non commercial - No derivatives 4.0 International
Department: School of Engineering
Organisational Unit: Institute of Data Analysis and Process Design (IDP)
Institute of Computer Science (InIT)
Published as part of the ZHAW project: DataInc – Intelligent Data Integration and Cleaning
Appears in collections: Publikationen School of Engineering

Files in This Item:
File: 2025_DeMeerPardo-etal_GraLMatch-matching-groups-entities-EDBT.pdf
Size: 653.06 kB
Format: Adobe PDF
de Meer Pardo, F., Lehmann, C., Gehrig, D., Nagy, A., Nicoli, S., Branka Hadji, M., Braschler, M., & Stockinger, K. (2025). GraLMatch : matching groups of entities with graphs and language models [Conference paper]. Proceedings 28th International Conference on Extending Database Technology (EDBT 2025), 1–12. https://doi.org/10.48786/edbt.2025.01
de Meer Pardo, F. et al. (2025) ‘GraLMatch : matching groups of entities with graphs and language models’, in Proceedings 28th International Conference on Extending Database Technology (EDBT 2025). Open Proceedings, pp. 1–12. Available at: https://doi.org/10.48786/edbt.2025.01.
F. de Meer Pardo et al., “GraLMatch : matching groups of entities with graphs and language models,” in Proceedings 28th International Conference on Extending Database Technology (EDBT 2025), Mar. 2025, pp. 1–12. doi: 10.48786/edbt.2025.01.
DE MEER PARDO, Fernando, Claude LEHMANN, Dennis GEHRIG, Andrea NAGY, Stefano NICOLI, Misheva BRANKA HADJI, Martin BRASCHLER and Kurt STOCKINGER, 2025. GraLMatch : matching groups of entities with graphs and language models. In: Proceedings 28th International Conference on Extending Database Technology (EDBT 2025) [online]. Conference paper. Open Proceedings. March 2025. pp. 1–12. ISBN 978-3-98318-097-4. Available at: https://openproceedings.org/2025/conf/edbt/paper-10.pdf
de Meer Pardo, Fernando, Claude Lehmann, Dennis Gehrig, Andrea Nagy, Stefano Nicoli, Misheva Branka Hadji, Martin Braschler, and Kurt Stockinger. 2025. “GraLMatch : Matching Groups of Entities with Graphs and Language Models.” Conference paper. In Proceedings 28th International Conference on Extending Database Technology (EDBT 2025), 1–12. Open Proceedings. https://doi.org/10.48786/edbt.2025.01.
de Meer Pardo, Fernando, et al. “GraLMatch : Matching Groups of Entities with Graphs and Language Models.” Proceedings 28th International Conference on Extending Database Technology (EDBT 2025), Open Proceedings, 2025, pp. 1–12, https://doi.org/10.48786/edbt.2025.01.

