Capturing suprasegmental features of a voice with RNNs for improved speaker clustering

Stadelmann, Thilo; Glinski-Haefeli, Sebastian; Gerber, Patrick; Dürr, Oliver

doi:10.1007/978-3-319-99978-4_26

Please use this identifier to cite or link to this resource: https://doi.org/10.21256/zhaw-3784

Publication type:	Conference paper
Type of review:	Peer review (publication)
Title:	Capturing suprasegmental features of a voice with RNNs for improved speaker clustering
Authors:	Stadelmann, Thilo; Glinski-Haefeli, Sebastian; Gerber, Patrick; Dürr, Oliver
DOI:	10.1007/978-3-319-99978-4_26; 10.21256/zhaw-3784
Proceedings:	Artificial Neural Networks in Pattern Recognition
Page(s):	333
Pages to:	345
Conference details:	8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018
Publication date:	2018
Series:	Lecture Notes in Computer Science
Series volume:	11081
Publisher / editing institution:	Springer
ISBN:	978-3-319-99977-7; 978-3-319-99978-4
Language:	English
Keywords:	Speaker clustering; Speaker recognition; Recurrent neural network
Subject (DDC):	006: Special computer methods
Abstract:	Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs) (that are not able to capture arbitrary time dependencies, unlike RNNs) still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state-of-the-art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.
URI:	https://digitalcollection.zhaw.ch/handle/11475/7429
Fulltext version:	Accepted version
License (according to publishing contract):	License according to publishing contract
Department:	School of Engineering
Organisational unit:	Institut für Informatik (InIT)
Appears in collections:	Publications School of Engineering

Files in this resource:

File	Description	Size	Format
ANNPR_2018b.pdf	Accepted Version	692.47 kB	Adobe PDF

Stadelmann, T., Glinski-Haefeli, S., Gerber, P., & Dürr, O. (2018). Capturing suprasegmental features of a voice with RNNs for improved speaker clustering [Conference paper]. Artificial Neural Networks in Pattern Recognition, 333–345. https://doi.org/10.1007/978-3-319-99978-4_26

Stadelmann, T. et al. (2018) ‘Capturing suprasegmental features of a voice with RNNs for improved speaker clustering’, in Artificial Neural Networks in Pattern Recognition. Springer, pp. 333–345. Available at: https://doi.org/10.1007/978-3-319-99978-4_26.

T. Stadelmann, S. Glinski-Haefeli, P. Gerber, and O. Dürr, “Capturing suprasegmental features of a voice with RNNs for improved speaker clustering,” in Artificial Neural Networks in Pattern Recognition, 2018, pp. 333–345. doi: 10.1007/978-3-319-99978-4_26.

STADELMANN, Thilo, Sebastian GLINSKI-HAEFELI, Patrick GERBER and Oliver DÜRR, 2018. Capturing suprasegmental features of a voice with RNNs for improved speaker clustering. In: Artificial Neural Networks in Pattern Recognition. Conference paper. Springer. 2018. pp. 333–345. ISBN 978-3-319-99977-7

Stadelmann, Thilo, Sebastian Glinski-Haefeli, Patrick Gerber, and Oliver Dürr. 2018. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Conference paper. In Artificial Neural Networks in Pattern Recognition, 333–45. Springer. https://doi.org/10.1007/978-3-319-99978-4_26.

Stadelmann, Thilo, et al. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Artificial Neural Networks in Pattern Recognition, Springer, 2018, pp. 333–45, https://doi.org/10.1007/978-3-319-99978-4_26.

All resources in this repository are protected by copyright, unless otherwise indicated.