A Twitter corpus and benchmark resources for German sentiment analysis

Cieliebak, Mark; Deriu, Jan Milan; Egger, Dominic; Uzdilli, Fatih

doi:10.21256/zhaw-1530

Please use this identifier to cite or link to this resource: https://doi.org/10.21256/zhaw-1530

Publication type:	Conference: Paper
Type of review:	Peer review (abstract)
Title:	A Twitter corpus and benchmark resources for German sentiment analysis
Authors:	Cieliebak, Mark; Deriu, Jan Milan; Egger, Dominic; Uzdilli, Fatih
DOI:	10.21256/zhaw-1530; 10.18653/v1/W17-1106
Page(s):	45
Pages to:	51
Conference details:	5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017
Publication date:	2017
Publisher / Ed. Institution:	Association for Computational Linguistics
Language:	English
Keywords:	Sentiment Analysis; Corpus; Twitter
Subject (DDC):	006: Special computer methods; 410.285: Computational linguistics
Abstract:	In this paper we present SB10k, a new corpus for sentiment analysis with approximately 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM, and we compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and the source code to re-run the benchmarks are publicly available.
URI:	https://digitalcollection.zhaw.ch/handle/11475/1856
Fulltext version:	Published version
License (according to publishing contract):	License according to publishing contract
Department:	School of Engineering
Organisational unit:	Institut für Informatik (InIT)
Appears in collections:	Publikationen School of Engineering

Files in this item:

File	Description	Size	Format
10_Paper.pdf		516.72 kB	Adobe PDF

Cieliebak, M., Deriu, J. M., Egger, D., & Uzdilli, F. (2017). A Twitter corpus and benchmark resources for German sentiment analysis [Conference paper]. 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 45–51. https://doi.org/10.21256/zhaw-1530

Cieliebak, M. et al. (2017) ‘A Twitter corpus and benchmark resources for German sentiment analysis’, in 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017. Association for Computational Linguistics, pp. 45–51. Available at: https://doi.org/10.21256/zhaw-1530.

M. Cieliebak, J. M. Deriu, D. Egger, and F. Uzdilli, “A Twitter corpus and benchmark resources for German sentiment analysis,” in 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 2017, pp. 45–51. doi: 10.21256/zhaw-1530.

CIELIEBAK, Mark, Jan Milan DERIU, Dominic EGGER and Fatih UZDILLI, 2017. A Twitter corpus and benchmark resources for German sentiment analysis. In: 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017. Conference paper. Association for Computational Linguistics. 2017. pp. 45–51

Cieliebak, Mark, Jan Milan Deriu, Dominic Egger, and Fatih Uzdilli. 2017. “A Twitter Corpus and Benchmark Resources for German Sentiment Analysis.” Conference paper. In 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 45–51. Association for Computational Linguistics. https://doi.org/10.21256/zhaw-1530.

Cieliebak, Mark, et al. “A Twitter Corpus and Benchmark Resources for German Sentiment Analysis.” 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, Association for Computational Linguistics, 2017, pp. 45–51, https://doi.org/10.21256/zhaw-1530.

All resources in this repository are protected by copyright unless otherwise indicated.