Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Hutter, Hans-Peter | -
dc.contributor.advisor | Cieliebak, Mark | -
dc.contributor.author | Büchi, Matthias | -
dc.date.accessioned | 2020-05-25T08:48:53Z | -
dc.date.available | 2020-05-25T08:48:53Z | -
dc.date.issued | 2020 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/20045 | -
dc.description.abstract | User experience is key to making a computer program successful. If handling a program requires a lot of expertise, people will not use it. In an optimal scenario, the user does not need to learn new procedures to control a new application. Conversational agents try to achieve this by providing a natural-language user interface. Spoken natural language can simplify the interaction even further. A reliable speech recognition system is essential for building a conversational agent that uses spoken natural language. This work explores different aspects of automatic speech recognition (ASR) for use with a conversational agent. The goal of the conversational agent is to support people in the process of legal research. It has to find the correct information based on the user’s input. Training a speech recognition system requires data. In a first step, two different ways to collect text data are explored; the text is needed as prompts for recording speech data. With a grammar-based approach, manually crafted rules are used to generate sentences. Since grammars are restricted in variation, neural question generation was evaluated as a way to produce open questions from specific input texts. In a next step, the performance of ASR systems was tested on task- and domain-specific data, using speech recorded from the generated text. Due to restricted time and resources, data was recorded from only one speaker. Since there was not enough data for further experiments on task-specific scenarios, open source German datasets were used to implement and improve acoustic models for generic speech recognition. Several aspects influence the final result when building a speech recognition component for a conversational agent. Text generation for training language models or collecting speech data still needs grammar-based approaches for reliable results. Neural question generation produces too many invalid samples. Nevertheless, text generated with grammars can be employed to record speech and train language models. With adaptation using specific language models, open source ASR systems achieve results similar to those of commercial systems or even outperform them. For data with a very specific structure, open source systems can outperform commercial systems by about 30% in absolute word error rate. Furthermore, different approaches are feasible for the acoustic model. Hybrid systems and end-to-end systems achieve similar results, but the hybrid system is still slightly better. End-to-end systems make adaptation to domain-specific use cases easier, since no phonetic transcriptions are needed. Going further, an end-to-end system can be trained on character n-grams instead of only single characters. Models trained to predict tokens generated with byte pair encoding perform similarly to models based on single characters. With the integration of complex decoding strategies and language models, character-based models still perform better. | de_CH
dc.format.extent | 62 | de_CH
dc.language.iso | en | de_CH
dc.publisher | ZHAW Zürcher Hochschule für Angewandte Wissenschaften | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Automatic speech recognition | de_CH
dc.subject | Conversational agent | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | Speech recognition component for search-oriented conversational artificial intelligence | de_CH
dc.type | Thesis: Master | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
zhaw.publisher.place | Winterthur | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.webfeed | Human Information Interaction | de_CH
zhaw.webfeed | Natural Language Processing | de_CH
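
The abstract compares systems by word error rate (WER) and reports an absolute difference. As a rough illustration of what that metric measures, the following minimal sketch computes WER as the word-level edit distance divided by the reference length, and shows how an absolute difference differs from a relative one. The function names, the sample sentences, and the WER values 0.5 and 0.2 are hypothetical and are not taken from the thesis; the thesis may use different tooling and text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def wer_difference(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction of `improved` over `baseline`."""
    absolute = baseline - improved
    relative = absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion in four words.
    print(word_error_rate("show me the ruling", "show me a"))

    # Hypothetical WER values chosen only to illustrate the two kinds of difference.
    absolute, relative = wer_difference(baseline=0.5, improved=0.2)
    print(f"absolute: {absolute:.2f}, relative: {relative:.2f}")
```

In this sketch an absolute reduction subtracts the two error rates directly, while a relative reduction divides that gap by the baseline, so the same pair of systems can yield quite different headline numbers depending on which one is reported.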
Appears in collections: Publikationen School of Engineering

Büchi, M. (2020). Speech recognition component for search-oriented conversational artificial intelligence [Master’s thesis]. ZHAW Zürcher Hochschule für Angewandte Wissenschaften.

