Title: Manual and semi-automatic normalization of historical spelling: case studies from Early New High German
Authors : Bollmann, Marcel
Dipper, Stefanie
Krasselt, Julia
Petran, Florian
Proceedings: Proceedings of the 11th Edition of the Conference on Natural Language Processing (KONVENS). Vienna, September 19-21, 2012
Pages: 342-350
Conference details: Conference on Natural Language Processing (KONVENS), First International Workshop on Language Technology for Historical Text(s) (LThist2012), Vienna, 21 September 2012
Publisher / Ed. Institution : Eigenverlag ÖGAI
Publisher / Ed. Institution: Wien
Issue Date: 2012
License (according to publishing contract) : Licence according to publishing contract
Series : Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI)
Series volume: 5
Type of review: Peer review (Abstract)
Language : English
Subject (DDC) : 410.285: Computational linguistics
Abstract: This paper presents work on manual and semi-automatic normalization of historical language data. We first address the guidelines that we use for mapping historical to modern word forms. The guidelines distinguish between normalization (preferring forms close to the original) and modernization (preferring forms close to modern language). Average inter-annotator agreement is 88.38% on a set of data from Early New High German. We then present Norma, a semi-automatic normalization tool. It integrates different modules (lexicon lookup, rewrite rules) for normalizing words in an interactive way. The tool dynamically updates the set of rule entries, given new input. Depending on the text and training settings, normalizing 1,000 tokens results in overall accuracies of 61.78–79.65% (baseline: 24.76–59.53%).
Department: Angewandte Linguistik
Publication type: Conference Paper
ISBN: 3-85027-005-X
URI: http://www.oegai.at/konvens2012/proceedings/51_bollmann12w/51_bollmann12w.pdf
https://digitalcollection.zhaw.ch/handle/11475/4045
Appears in Collections: Publikationen Angewandte Linguistik