Publication type: Conference paper
Type of review: Peer review (abstract)
Title: Manual and semi-automatic normalization of historical spelling: case studies from Early New High German
Authors: Bollmann, Marcel; Dipper, Stefanie; Krasselt, Julia; Petran, Florian
Proceedings: Proceedings of the 11th Edition of the Conference on Natural Language Processing (KONVENS). Vienna, September 19-21, 2012
Pages: 342–350
Conference details: Conference on Natural Language Processing (KONVENS 2012), Vienna, Austria, 21 September 2012
Issue Date: 2012
Series: Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI)
Series volume: 5
Publisher / Ed. Institution: Eigenverlag ÖGAI, Wien
ISBN: 3-85027-005-X
Language: English
Subject (DDC): 410.285: Computational linguistics
Abstract: This paper presents work on manual and semi-automatic normalization of historical language data. We first address the guidelines that we use for mapping historical to modern word forms. The guidelines distinguish between normalization (preferring forms close to the original) and modernization (preferring forms close to modern language). Average inter-annotator agreement is 88.38% on a set of data from Early New High German. We then present Norma, a semi-automatic normalization tool. It integrates different modules (lexicon lookup, rewrite rules) for normalizing words in an interactive way. The tool dynamically updates the set of rule entries, given new input. Depending on the text and training settings, normalizing 1,000 tokens results in overall accuracies of 61.78–79.65% (baseline: 24.76–59.53%).
Fulltext version: Published version
License (according to publishing contract): Licence according to publishing contract
Departement: Applied Linguistics
Appears in Collections: Publikationen Angewandte Linguistik
