Please use this identifier to cite or link to this resource:
https://doi.org/10.21256/zhaw-30993
Publication type: | Conference: Paper |
Type of review: | Peer review (publication) |
Title: | StatBot.Swiss : bilingual open data exploration in natural language |
Authors: | Nooralahzadeh, Farhad; Zhang, Yi; Smith, Ellery; Maennel, Sabine; Matthey-Doret, Cyril; Raphaël, de Fondville; Stockinger, Kurt |
et al.: | No |
DOI: | 10.21256/zhaw-30993 |
Proceedings: | Findings of the Association for Computational Linguistics: ACL 2024 |
Conference details: | 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Bangkok, Thailand, 11-16 August 2024 |
Publication date: | August 2024 |
Publisher / Editing institution: | Association for Computational Linguistics |
Language: | English |
Keywords: | Natural language processing; Machine learning; Database; Generative AI |
Subject (DDC): | 005: Computer programming, programs and data; 006: Special computer methods |
Abstract: | The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains vastly unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 big databases with varying levels of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/30993 |
Related research data: | https://github.com/dscc-admin-ch/statbot.swiss |
Full-text version: | Accepted version |
License (according to publishing contract): | License according to publishing contract |
Department: | School of Engineering |
Organisational unit: | Institut für Informatik (InIT) |
Published as part of the ZHAW project: | INODE4StatBot.swiss – Application of new algorithms for the automatic translation of natural language into the database query language SQL (NL-to-SQL) |
Appears in collections: | Publications School of Engineering |
Files in this resource:
File | Description | Size | Format |
---|---|---|---|
2024_Nooralahzadeh-etal_StatBot-Swiss_ACL2024.pdf | Accepted Version | 882.9 kB | Adobe PDF |
Nooralahzadeh, F., Zhang, Y., Smith, E., Maennel, S., Matthey-Doret, C., Raphaël, d. F., & Stockinger, K. (2024, August). StatBot.Swiss : bilingual open data exploration in natural language. Findings of the Association for Computational Linguistics: ACL 2024. https://doi.org/10.21256/zhaw-30993
Nooralahzadeh, F. et al. (2024) ‘StatBot.Swiss : bilingual open data exploration in natural language’, in Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics. Available at: https://doi.org/10.21256/zhaw-30993.
F. Nooralahzadeh et al., “StatBot.Swiss : bilingual open data exploration in natural language,” in Findings of the Association for Computational Linguistics: ACL 2024, Aug. 2024. doi: 10.21256/zhaw-30993.
NOORALAHZADEH, Farhad, Yi ZHANG, Ellery SMITH, Sabine MAENNEL, Cyril MATTHEY-DORET, de Fondville RAPHAËL and Kurt STOCKINGER, 2024. StatBot.Swiss : bilingual open data exploration in natural language. In: Findings of the Association for Computational Linguistics: ACL 2024. Conference paper. Association for Computational Linguistics. August 2024.
Nooralahzadeh, Farhad, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, de Fondville Raphaël, and Kurt Stockinger. 2024. “StatBot.Swiss : Bilingual Open Data Exploration in Natural Language.” Conference paper. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics. https://doi.org/10.21256/zhaw-30993.
Nooralahzadeh, Farhad, et al. “StatBot.Swiss : Bilingual Open Data Exploration in Natural Language.” Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics, 2024, https://doi.org/10.21256/zhaw-30993.
All resources in this repository are protected by copyright unless otherwise indicated.