Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-20125
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ulasik, Malgorzata Anna | -
dc.contributor.author | Hürlimann, Manuela | -
dc.contributor.author | Germann, Fabian | -
dc.contributor.author | Gedik, Esin | -
dc.contributor.author | Benites de Azevedo e Souza, Fernando | -
dc.contributor.author | Cieliebak, Mark | -
dc.date.accessioned | 2020-06-08T08:08:17Z | -
dc.date.available | 2020-06-08T08:08:17Z | -
dc.date.issued | 2020 | -
dc.identifier.isbn | 979-10-95546-34-4 | de_CH
dc.identifier.uri | https://www.aclweb.org/anthology/2020.lrec-1.798 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/20125 | -
dc.description.abstract | In this paper, we present CEASR, a Corpus for Evaluating ASR quality. It is a data set derived from public speech corpora, containing manual transcripts enriched with metadata along with transcripts generated by several modern state-of-the-art ASR systems. CEASR provides this data in a unified structure, consistent across all corpora and systems with normalised transcript texts and metadata. We then use CEASR to evaluate the quality of ASR systems on the basis of their Word Error Rate (WER). Our experiments show, among other results, a substantial difference in quality between commercial versus open-source ASR tools and differences up to a factor of ten for single systems on different corpora. By using CEASR, we could very efficiently and easily obtain these results. This shows that our corpus enables researchers to perform ASR-related evaluations and various in-depth analyses with noticeably reduced effort: without the need to collect, process and transcribe the speech data themselves. | de_CH
dc.language.iso | en | de_CH
dc.publisher | European Language Resources Association | de_CH
dc.rights | http://creativecommons.org/licenses/by-nc/4.0/ | de_CH
dc.subject | Automatic speech recognition | de_CH
dc.subject | Evaluation | de_CH
dc.subject | Speech corpus | de_CH
dc.subject | ASR system | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | CEASR : a corpus for evaluating automatic speech recognition | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.21256/zhaw-20125 | -
zhaw.conference.details | 12th Language Resources and Evaluation Conference (LREC), Marseille, France, 11-16 May 2020 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 6485 | de_CH
zhaw.pages.start | 6477 | de_CH
zhaw.parentwork.editor | Calzolari, Nicoletta | -
zhaw.parentwork.editor | Béchet, Frédéric | -
zhaw.parentwork.editor | Blache, Philippe | -
zhaw.parentwork.editor | Choukri, Khalid | -
zhaw.parentwork.editor | Cieri, Christopher | -
zhaw.parentwork.editor | Declerck, Thierry | -
zhaw.parentwork.editor | Goggi, Sara | -
zhaw.parentwork.editor | Isahara, Hitoshi | -
zhaw.parentwork.editor | Maegaard, Bente | -
zhaw.parentwork.editor | Mariani, Joseph | -
zhaw.parentwork.editor | Mazo, Hélène | -
zhaw.parentwork.editor | Moreno, Asuncion | -
zhaw.parentwork.editor | Odijk, Jan | -
zhaw.parentwork.editor | Piperidis, Stelios | -
zhaw.publication.status | publishedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.title.proceedings | Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) | de_CH
zhaw.webfeed | Software Systems | de_CH
zhaw.webfeed | Natural Language Processing | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
Appears in collections: Publikationen School of Engineering

Files in This Item:
File | Size | Format
2020_Ulasik-etal_CEASR_LREC.pdf | 733.4 kB | Adobe PDF
Ulasik, M. A., Hürlimann, M., Germann, F., Gedik, E., Benites de Azevedo e Souza, F., & Cieliebak, M. (2020). CEASR : a corpus for evaluating automatic speech recognition [Conference paper]. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6477–6485). European Language Resources Association. https://doi.org/10.21256/zhaw-20125
Ulasik, M.A. et al. (2020) ‘CEASR : a corpus for evaluating automatic speech recognition’, in N. Calzolari et al. (eds) Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association, pp. 6477–6485. Available at: https://doi.org/10.21256/zhaw-20125.
M. A. Ulasik, M. Hürlimann, F. Germann, E. Gedik, F. Benites de Azevedo e Souza, and M. Cieliebak, “CEASR : a corpus for evaluating automatic speech recognition,” in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020, pp. 6477–6485. doi: 10.21256/zhaw-20125.
ULASIK, Malgorzata Anna, Manuela HÜRLIMANN, Fabian GERMANN, Esin GEDIK, Fernando BENITES DE AZEVEDO E SOUZA and Mark CIELIEBAK, 2020. CEASR : a corpus for evaluating automatic speech recognition. In: Nicoletta CALZOLARI, Frédéric BÉCHET, Philippe BLACHE, Khalid CHOUKRI, Christopher CIERI, Thierry DECLERCK, Sara GOGGI, Hitoshi ISAHARA, Bente MAEGAARD, Joseph MARIANI, Hélène MAZO, Asuncion MORENO, Jan ODIJK and Stelios PIPERIDIS (eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) [online]. Conference paper. European Language Resources Association. 2020. pp. 6477–6485. ISBN 979-10-95546-34-4. Available at: https://www.aclweb.org/anthology/2020.lrec-1.798
Ulasik, Malgorzata Anna, Manuela Hürlimann, Fabian Germann, Esin Gedik, Fernando Benites de Azevedo e Souza, and Mark Cieliebak. 2020. “CEASR : A Corpus for Evaluating Automatic Speech Recognition.” Conference paper. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 6477–85. European Language Resources Association. https://doi.org/10.21256/zhaw-20125.
Ulasik, Malgorzata Anna, et al. “CEASR : A Corpus for Evaluating Automatic Speech Recognition.” Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), edited by Nicoletta Calzolari et al., European Language Resources Association, 2020, pp. 6477–85, https://doi.org/10.21256/zhaw-20125.