Publication type: | Conference: Paper |
Type of review: | Not specified |
Title: | Assessing keyness using permutation tests |
Author: | Mildenberger, Thoralf |
et al.: | No |
Conference details: | Statistical standards for scientific discovery in linguistics, Zurich, 4–6 October 2017 |
Publication date: | 6 Oct 2017 |
Language: | English |
Keywords: | Corpus statistics; Keyword analysis; Permutation test |
Subject (DDC): | 400: Language and linguistics; 510: Mathematics |
Abstract: | In corpus linguistics, statistical hypothesis tests (e.g. the likelihood-ratio, chi-square or Fisher’s exact test) are used for identifying keywords, i.e. words that occur more frequently in one corpus than in another. A problem with these tests is that they are all essentially based on the same, often inappropriate sampling model: corpora are modeled as sets of independently sampled tokens, although in many cases the natural sampling units are whole texts. Occurrences of words tend to cluster in only a few texts, and in the extreme case a word may be identified as a keyword because it appears very often in a single text. We propose the use of permutation tests based on a model that regards corpora as samples of texts instead of samples of tokens, which often seems much more realistic. P-values for assessing keyness can be obtained using Monte Carlo methods, making the method applicable in practice. We outline our approach and contrast the results with those obtained by traditional methods. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/20059 |
Full-text version: | Published version |
License (according to publishing contract): | License according to publishing contract |
Department: | School of Engineering |
Organisational unit: | Institut für Datenanalyse und Prozessdesign (IDP) |
Published as part of the ZHAW project: | Energiediskurs Messen |
Appears in collections: | Publikationen School of Engineering |
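The text-level permutation test described in the abstract can be illustrated with a minimal sketch. The record does not state which test statistic the paper uses, so the code below assumes one plausible choice: the absolute difference in the word's relative frequency between the two corpora. The function names, the default of 2000 permutations, and the toy corpora are hypothetical and serve only as illustration; this is not the author's implementation.

```python
import random


def relative_frequency(texts, word):
    """Share of all tokens in `texts` (a list of token lists) equal to `word`."""
    total = sum(len(tokens) for tokens in texts)
    hits = sum(tokens.count(word) for tokens in texts)
    return hits / total if total else 0.0


def keyness_permutation_test(corpus_a, corpus_b, word, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value with whole texts as sampling units.

    Assumed statistic (not taken from the paper): absolute difference in
    the relative frequency of `word` between the two corpora.
    """
    rng = random.Random(seed)
    observed = abs(relative_frequency(corpus_a, word)
                   - relative_frequency(corpus_b, word))
    pooled = list(corpus_a) + list(corpus_b)
    n_a = len(corpus_a)
    extreme = 0
    for _ in range(n_perm):
        # Reassign whole texts (not single tokens) to the two corpora,
        # keeping the number of texts per corpus fixed.
        rng.shuffle(pooled)
        stat = abs(relative_frequency(pooled[:n_a], word)
                   - relative_frequency(pooled[n_a:], word))
        if stat >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Toy data: 5 texts of 20 tokens per corpus, 10 occurrences of "energy" in A.
plain = ["x"] * 20
clustered_a = [["energy"] * 10 + ["x"] * 10] + [plain] * 4  # all hits in one text
spread_a = [["energy"] * 2 + ["x"] * 18 for _ in range(5)]  # hits in every text
corpus_b = [plain] * 5

p_clustered = keyness_permutation_test(clustered_a, corpus_b, "energy")
p_spread = keyness_permutation_test(spread_a, corpus_b, "energy")
print(p_clustered, p_spread)
```

Both toy corpora contain the same token counts, so a token-based test cannot tell them apart. Under this sketch, the word concentrated in a single text receives a large p-value, while the word spread over all texts of corpus A receives a small one.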
Files in this item:
There are no files associated with this item.
Mildenberger, T. (2017, October 6). Assessing keyness using permutation tests. Statistical Standards for Scientific Discovery in Linguistics, Zurich, 4–6 October 2017.
All resources in this repository are protected by copyright unless otherwise indicated.