Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-4974
Publication type: | Conference paper |
Type of review: | Peer review (publication) |
Title: | German compound splitting using the compound productivity of morphemes |
Authors: | Sugisaki, Kyoko; Tuggener, Don |
DOI: | 10.21256/zhaw-4974 |
Proceedings: | 14th Conference on Natural Language Processing - KONVENS 2018 |
Editors of the parent work: | Barbaresi, Adrien; Biber, Hanno; Neubarth, Friedrich; Osswald, Rainer |
Pages: | 141 |
Pages to: | 147 |
Conference details: | 14th Conference on Natural Language Processing (KONVENS 2018), Vienna, Austria, 19-21 September 2018 |
Issue Date: | 2018 |
Publisher / Ed. Institution: | Austrian Academy of Sciences Press |
Other identifiers: | 0xc1aa5576 0x003a2438 |
Language: | English |
Subjects: | Compound splitting |
Subject (DDC): | 410.285: Computational linguistics |
Abstract: | In this work, we present a novel compound splitting method for German that captures the compound productivity of morphemes. We use a giga web corpus to create a lexicon and decompose noun compounds by computing the probabilities of compound elements as bound and free morphemes. Furthermore, we provide a uniform evaluation of several unsupervised approaches and morphological analysers for the task. Our method achieved a high F1 score of 0.92, comparable to state-of-the-art methods. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/14372 |
Fulltext version: | Published version |
License (according to publishing contract): | Licence according to publishing contract |
Department: | School of Engineering |
Organisational Unit: | Institute of Applied Information Technology (InIT) |
Appears in collections: | Publikationen School of Engineering |
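The abstract describes splitting noun compounds by treating each compound element either as a bound morpheme (occurring inside a compound) or as a free morpheme (occurring as a standalone word). The sketch below illustrates that general idea only; it is not the scoring defined in the paper. It assumes a hypothetical toy lexicon with separate "bound" and "free" counts (the numbers are invented), a small incomplete set of linking elements, and a minimum part length of 3. Splits are scored with the geometric mean of counts, a heuristic borrowed from Koehn and Knight's frequency-based splitter, here adapted so that modifiers use bound counts and the head uses free counts.

```python
from math import prod

# Hypothetical toy lexicon; counts are invented for illustration only.
# "free": seen as a standalone word; "bound": seen as a non-final compound element.
LEXICON = {
    "arbeit": {"free": 500, "bound": 350},
    "amt": {"free": 200, "bound": 60},
    "arbeitsamt": {"free": 40, "bound": 2},
    "haus": {"free": 900, "bound": 400},
    "tür": {"free": 300, "bound": 120},
}

# A small, incomplete set of German linking elements (Fugenelemente); "" = none.
LINKING = ("", "s", "es", "n", "en")
MIN_PART = 3


def modifier_candidates(segment):
    """Yield (stem, bound_count) for a non-final segment, allowing a linking element."""
    for link in LINKING:
        if not segment.endswith(link):  # endswith("") is always True
            continue
        stem = segment[: len(segment) - len(link)]
        if len(stem) >= MIN_PART and stem in LEXICON:
            yield stem, LEXICON[stem]["bound"]


def segmentations(word):
    """Yield (parts, counts): modifiers carry bound counts, the head its free count."""
    if len(word) >= MIN_PART and word in LEXICON:
        yield [word], [LEXICON[word]["free"]]  # treat the remainder as one free word
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        for stem, count in modifier_candidates(word[:i]):
            for parts, counts in segmentations(word[i:]):
                yield [stem] + parts, [count] + counts


def score(counts):
    """Geometric mean of the counts (Koehn and Knight style heuristic)."""
    return prod(counts) ** (1 / len(counts))


def split(word):
    """Return the best-scoring segmentation, or the word itself if none is found."""
    word = word.lower()
    candidates = list(segmentations(word))
    if not candidates:
        return [word]
    best_parts, _ = max(candidates, key=lambda pc: score(pc[1]))
    return best_parts


if __name__ == "__main__":
    print(split("Arbeitsamt"))  # ['arbeit', 'amt']
    print(split("Haustür"))     # ['haus', 'tür']
```

With these toy counts, "Arbeitsamt" is split because the geometric mean of the bound count of "arbeit" (350, reached via the linking "s") and the free count of "amt" (200) is about 265, well above the free count of the unsplit word (40). A word whose whole-form free count dominates would stay unsplit.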
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2018_Sugisaki_German_compound_splitting_using_the_compound.pdf | | 177.39 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.