Title: Manual and semi-automatic normalization of historical spelling : case studies from Early New High German
Authors: Bollmann, Marcel
Dipper, Stefanie
Krasselt, Julia
Petran, Florian
Proceedings: Proceedings of the 11th Edition of the Conference on Natural Language Processing (KONVENS). Vienna, September 19-21, 2012
Pages: 342–350
Conference details: Conference on Natural Language Processing (KONVENS), First International Workshop on Language Technology for Historical Text(s) (LThist2012), Vienna, September 21, 2012
Publisher / Ed. Institution: Eigenverlag ÖGAI, Vienna
Issue Date: 2012
Series: Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI)
Series volume: 5
Language: English
Subject (DDC): 410.285: Computational linguistics
Abstract: This paper presents work on manual and semi-automatic normalization of historical language data. We first address the guidelines that we use for mapping historical to modern word forms. The guidelines distinguish between normalization (preferring forms close to the original) and modernization (preferring forms close to modern language). Average inter-annotator agreement is 88.38% on a set of data from Early New High German. We then present Norma, a semi-automatic normalization tool. It integrates different modules (lexicon lookup, rewrite rules) for normalizing words in an interactive way. The tool dynamically updates the set of rule entries, given new input. Depending on the text and training settings, normalizing 1,000 tokens results in overall accuracies of 61.78–79.65% (baseline: 24.76–59.53%).
Department: Applied Linguistics
Publication type: Conference paper
Type of review: Peer review (Abstract)
ISBN: 3-85027-005-X
URI: http://www.oegai.at/konvens2012/proceedings/51_bollmann12w/51_bollmann12w.pdf
https://digitalcollection.zhaw.ch/handle/11475/4045
License (according to publishing contract): Licence according to publishing contract
Appears in Collections: Publikationen Angewandte Linguistik