ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems

Zhang, Yi; Deriu, Jan Milan; Katsogiannis-Meimarakis, George; Kosten, Catherine; Koutrika, Georgia; Stockinger, Kurt

doi:10.14778/3636218.3636225

Please use this identifier to cite or link to this resource: https://doi.org/10.21256/zhaw-30173

Publication type:	Article in scientific journal
Type of review:	Peer review (publication)
Title:	ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems
Authors:	Zhang, Yi; Deriu, Jan Milan; Katsogiannis-Meimarakis, George; Kosten, Catherine; Koutrika, Georgia; Stockinger, Kurt
et al.:	No
DOI:	10.14778/3636218.3636225; 10.21256/zhaw-30173
Published in:	Proceedings of the VLDB Endowment
Volume:	17
Issue:	4
Page(s):	685
Pages to:	698
Issue date:	Mar-2024
Publisher / Ed. Institution:	Association for Computing Machinery
ISSN:	2150-8097
Language:	English
Subjects:	Database system; Natural language processing; Machine learning; Large language model
Subject (DDC):	005: Computer programming, programs and data; 006: Special computer methods
Abstract:	Natural Language to SQL systems (NL-to-SQL) have recently shown improved accuracy (exceeding 80%) for natural language to SQL query translation due to the emergence of transformer-based language models, and the popularity of the Spider benchmark. However, Spider mainly contains simple databases with few tables, columns, and entries, which do not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL-pairs leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.
URI:	https://digitalcollection.zhaw.ch/handle/11475/30173
Fulltext version:	Published version
License (according to publishing contract):	CC BY-NC-ND 4.0: Attribution - Non commercial - No derivatives 4.0 International
Department:	School of Engineering
Organisational unit:	Centre for Artificial Intelligence (CAI); Institut für Informatik (InIT)
Published as part of the ZHAW project:	INODE – Intelligent Open Data Exploration (EU Horizon 2020)
Appears in collections:	Publikationen School of Engineering

Files in this item:

File	Description	Size	Format
2024_Zhang-etal_ScienceBenchmark-PVLDB2024.pdf		608.6 kB	Adobe PDF

Zhang, Y., Deriu, J. M., Katsogiannis-Meimarakis, G., Kosten, C., Koutrika, G., & Stockinger, K. (2024). ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems. Proceedings of the VLDB Endowment, 17(4), 685–698. https://doi.org/10.14778/3636218.3636225

Zhang, Y. et al. (2024) ‘ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems’, Proceedings of the VLDB Endowment, 17(4), pp. 685–698. Available at: https://doi.org/10.14778/3636218.3636225.

Y. Zhang, J. M. Deriu, G. Katsogiannis-Meimarakis, C. Kosten, G. Koutrika, and K. Stockinger, “ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems,” Proceedings of the VLDB Endowment, vol. 17, no. 4, pp. 685–698, Mar. 2024, doi: 10.14778/3636218.3636225.

ZHANG, Yi, Jan Milan DERIU, George KATSOGIANNIS-MEIMARAKIS, Catherine KOSTEN, Georgia KOUTRIKA and Kurt STOCKINGER, 2024. ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems. Proceedings of the VLDB Endowment. March 2024. Vol. 17, no. 4, pp. 685–698. DOI 10.14778/3636218.3636225

Zhang, Yi, Jan Milan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, and Kurt Stockinger. 2024. “ScienceBenchmark : A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems.” Proceedings of the VLDB Endowment 17 (4): 685–98. https://doi.org/10.14778/3636218.3636225.

Zhang, Yi, et al. “ScienceBenchmark : A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems.” Proceedings of the VLDB Endowment, vol. 17, no. 4, Mar. 2024, pp. 685–98, https://doi.org/10.14778/3636218.3636225.

All resources in this repository are protected by copyright, unless otherwise indicated.