A methodology for creating question answering corpora using inverse data annotation

Deriu, Jan Milan; Mlynchyk, Katsiaryna; Schläpfer, Philippe; Rodrigo, Alvaro; von Grünigen, Dirk; Kaiser, Nicolas; Stockinger, Kurt; Agirre, Eneko; Cieliebak, Mark

doi:10.18653/v1/2020.acl-main.84

Please use this identifier to cite or link to this resource: https://doi.org/10.21256/zhaw-20319

Publication type:	Conference paper
Type of review:	Peer review (publication)
Title:	A methodology for creating question answering corpora using inverse data annotation
Author(s):	Deriu, Jan Milan; Mlynchyk, Katsiaryna; Schläpfer, Philippe; Rodrigo, Alvaro; von Grünigen, Dirk; Kaiser, Nicolas; Stockinger, Kurt; Agirre, Eneko; Cieliebak, Mark
et al.:	No
DOI:	10.18653/v1/2020.acl-main.84; 10.21256/zhaw-20319
Proceedings:	Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Page(s):	897
Pages to:	911
Conference details:	58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), online, 5-10 July 2020
Issue date:	July 2020
Publisher / Ed. institution:	Association for Computational Linguistics
Language:	English
Keywords:	Natural language interface to database; Artificial intelligence; Deep learning; Semantic parsing
Subject (DDC):	006: Special computer methods; 400: Language and linguistics
Abstract:	In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase the performance.
URI:	https://digitalcollection.zhaw.ch/handle/11475/20319
Fulltext version:	Published version
License (according to publishing contract):	CC BY 4.0: Attribution 4.0 International
Department:	School of Engineering
Organisational unit:	Institut für Informatik (InIT)
Published as part of the ZHAW project:	LIHLITH - Learning to Interact with Humans by Lifelong Interaction with Humans; EU Horizon 2020: INODE - Intelligent Open Data Exploration
Appears in collections:	Publications School of Engineering
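
The abstract describes generating Operation Trees at random from a context-free grammar before annotators write the matching question. The following is a minimal sketch of that idea only. The grammar, the operation names (Done, Count, GetData, Filter, Join), and the helper functions are hypothetical illustrations and are not taken from the paper, whose actual grammar and node set are not reproduced in this record.

```python
import random

# Hypothetical toy grammar (an assumption for illustration, not the paper's).
# Each rule is [operation, child_nonterminal, ...]; every child is a
# nonterminal that must itself be a key of GRAMMAR.
GRAMMAR = {
    "Query": [["Done", "Table"], ["Count", "Table"]],
    "Table": [["GetData"], ["Filter", "Table"], ["Join", "Table", "Table"]],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Randomly expand `symbol` into a nested tuple (operation, child, ...)."""
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, force a shortest rule so recursion ends.
        rules = [min(rules, key=len)]
    head, *children = rng.choice(rules)
    return (head, *(generate(c, rng, depth + 1, max_depth) for c in children))


def operations(tree):
    """Return the operation names of `tree` in pre-order."""
    head, *children = tree
    ops = [head]
    for child in children:
        ops.extend(operations(child))
    return ops


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    tree = generate("Query", rng)
    print(tree)
    print(operations(tree))
```

In an inverse annotation workflow of the kind the abstract outlines, a tree sampled this way would be shown to an annotator, who writes a question for it and assigns question tokens to the individual operations. The depth bound is a design choice of this sketch to keep sampled trees small.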

Files in this resource:

File	Description	Size	Format
2020_Deriu-etal_Question-answering-corpora-inverse-data-annotation.pdf		556.6 kB	Adobe PDF

Deriu, J. M., Mlynchyk, K., Schläpfer, P., Rodrigo, A., von Grünigen, D., Kaiser, N., Stockinger, K., Agirre, E., & Cieliebak, M. (2020). A methodology for creating question answering corpora using inverse data annotation [Conference paper]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. https://doi.org/10.18653/v1/2020.acl-main.84

Deriu, J.M. et al. (2020) ‘A methodology for creating question answering corpora using inverse data annotation’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 897–911. Available at: https://doi.org/10.18653/v1/2020.acl-main.84.

J. M. Deriu et al., “A methodology for creating question answering corpora using inverse data annotation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp. 897–911. doi: 10.18653/v1/2020.acl-main.84.

DERIU, Jan Milan, Katsiaryna MLYNCHYK, Philippe SCHLÄPFER, Alvaro RODRIGO, Dirk VON GRÜNIGEN, Nicolas KAISER, Kurt STOCKINGER, Eneko AGIRRE and Mark CIELIEBAK, 2020. A methodology for creating question answering corpora using inverse data annotation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Conference paper. Association for Computational Linguistics. July 2020. pp. 897–911

Deriu, Jan Milan, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, and Mark Cieliebak. 2020. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Conference paper. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.84.

Deriu, Jan Milan, et al. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 897–911, https://doi.org/10.18653/v1/2020.acl-main.84.

All resources in this repository are protected by copyright unless otherwise indicated.