Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-1530
Title: A Twitter corpus and benchmark resources for German sentiment analysis
Authors : Cieliebak, Mark
Deriu, Jan Milan
Egger, Dominic
Uzdilli, Fatih
Pages : 45
Pages to: 51
Conference details: 5th International Workshop on Natural Language Processing for Social Media, Boston, MA, USA, December 11, 2017
Publisher / Ed. Institution : Association for Computational Linguistics
Issue Date: 11-Dec-2017
License (according to publishing contract) : Licence according to publishing contract
Type of review: Peer review (Abstract)
Language : English
Subjects : Sentiment Analysis; Corpus; Twitter
Subject (DDC) : 004: Computer science
005: Computer programming, programs and data
410.285: Computational linguistics
Abstract: In this paper we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.
Departement: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Publication type: Conference Paper
DOI : 10.18653/v1/W17-1106
10.21256/zhaw-1530
URI: https://digitalcollection.zhaw.ch/handle/11475/1856
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: 10_Paper.pdf
Size: 516.72 kB
Format: Adobe PDF

