Full metadata record
DC Field | Value | Language
dc.contributor.author | Mildenberger, Thoralf | -
dc.description.abstract | In corpus linguistics, statistical hypothesis tests (e.g. the likelihood-ratio, chi-square or Fisher’s exact test) are used to identify keywords, i.e. words that occur more frequently in one corpus than in another. A problem with these tests is that they are all essentially based on the same, often inappropriate sampling model: corpora are modeled as sets of independently sampled tokens, although in many cases the natural sampling units are whole texts. Occurrences of words tend to cluster in only a few texts, and in the extreme case a word may be identified as a keyword because it appears very often in a single text. We propose the use of permutation tests based on a model that regards corpora as samples of texts instead of samples of tokens, which often seems much more realistic. P-values for assessing keyness can be obtained using Monte Carlo methods, making the method applicable in practice. We outline our approach and contrast the results with those obtained by traditional methods. | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Corpus statistics | de_CH
dc.subject | Keyword analysis | de_CH
dc.subject | Permutation test | de_CH
dc.subject.ddc | 400: Language and linguistics | de_CH
dc.subject.ddc | 500: Natural sciences and mathematics | de_CH
dc.title | Assessing keyness using permutation tests | de_CH
dc.type | Conference: Paper | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Datenanalyse und Prozessdesign (IDP) | de_CH
zhaw.conference.details | Statistical standards for scientific discovery in linguistics, Zurich, 4–6 October 2017 | de_CH
zhaw.publication.review | Not specified | de_CH
zhaw.webfeed | Statistik und Quantitative Finance | de_CH
zhaw.funding.zhaw | Energiediskurs Messen | de_CH
Appears in Collections: Publikationen School of Engineering
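
The text-level permutation test described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the test statistic or the implementation, so everything here is an assumption made for illustration: the difference in relative frequency as the keyness statistic, the one-sided p-value with add-one correction, and the hypothetical names `rel_freq_diff` and `permutation_p_value`. Each corpus is a list of texts and each text is a list of tokens; whole texts, not tokens, are shuffled between the two corpora.

```python
import random


def rel_freq_diff(texts_a, texts_b, word):
    """Assumed keyness statistic: relative frequency of `word` in A minus in B."""
    def rel_freq(texts):
        total = sum(len(t) for t in texts)
        hits = sum(t.count(word) for t in texts)
        return hits / total if total else 0.0
    return rel_freq(texts_a) - rel_freq(texts_b)


def permutation_p_value(texts_a, texts_b, word, n_perm=2000, seed=0):
    """One-sided Monte Carlo p-value, permuting whole texts between corpora."""
    rng = random.Random(seed)
    observed = rel_freq_diff(texts_a, texts_b, word)
    pooled = list(texts_a) + list(texts_b)
    n_a = len(texts_a)  # the number of texts per corpus stays fixed
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign texts (not tokens) to the two corpora
        if rel_freq_diff(pooled[:n_a], pooled[n_a:], word) >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Toy data (invented): all texts have 20 tokens; "energy" occurs 20 times in A.
filler = ["the"] * 20
bursty_a = [["energy"] * 20] + [filler] * 4       # all occurrences in one text
spread_a = [["energy"] * 4 + ["the"] * 16] * 5    # occurrences in every text
corpus_b = [filler] * 5                           # word never occurs

p_bursty = permutation_p_value(bursty_a, corpus_b, "energy")
p_spread = permutation_p_value(spread_a, corpus_b, "energy")
print(p_bursty, p_spread)
```

In this toy setup both versions of corpus A contain the word equally often, so a token-level test would treat them alike. Under text-level permutation, the bursty case yields a p-value of roughly 0.5, because the single text containing the word lands in the first group about half the time, while the evenly spread case yields a small p-value.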
