Title: Twist Bytes : German dialect identification with data mining optimization
Authors : Benites de Azevedo e Souza, Fernando
Grubenmann, Ralf
von Däniken, Pius
von Grünigen, Dirk
Deriu, Jan Milan
Cieliebak, Mark
Proceedings: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Pages : 218-227
Conference details: 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, August 20-26, 2018
Publisher / Ed. Institution : VarDial
Issue Date: 2018
License (according to publishing contract) : CC BY 4.0: Attribution 4.0 International
Type of review: Peer review (Publication)
Language : English
Subjects : Dialect recognition; Text classification; Shared task; Swiss German
Subject (DDC) : 410.285: Computational linguistics
430: German
Abstract: We describe the approaches we used in the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018. The goal was to identify which of four dialects spoken in the German-speaking part of Switzerland a sentence belonged to. We adopted two different meta-classifier approaches and used data mining insights to improve the preprocessing and the meta-classifier parameters. In particular, we focused on using different feature extraction methods and on how to combine them, since they influenced the performance of the system very differently. Our system achieved second place out of 8 teams, with a macro-averaged F1 of 64.6%. We also participated in the surprise dialect task with a multi-label approach.
Department: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Publication type: Conference Paper
DOI : 10.21256/zhaw-4850
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: 2018_Benites_Twist_Bytes_German_Dialect_Identification_with_data_mining_optimization.pdf
Size: 250.32 kB
Format: Adobe PDF
