Capturing suprasegmental features of a voice with RNNs for improved speaker clustering

Stadelmann, Thilo; Glinski-Haefeli, Sebastian; Gerber, Patrick; Dürr, Oliver

doi:10.1007/978-3-319-99978-4_26

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3784

Full metadata record

DC Field	Value	Language
dc.contributor.author	Stadelmann, Thilo	-
dc.contributor.author	Glinski-Haefeli, Sebastian	-
dc.contributor.author	Gerber, Patrick	-
dc.contributor.author	Dürr, Oliver	-
dc.date.accessioned	2018-06-28T09:44:15Z	-
dc.date.available	2018-06-28T09:44:15Z	-
dc.date.issued	2018	-
dc.identifier.isbn	978-3-319-99977-7	de_CH
dc.identifier.isbn	978-3-319-99978-4	de_CH
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/7429	-
dc.description.abstract	Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs), which unlike RNNs cannot capture arbitrary time dependencies, still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state-of-the-art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments for why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Springer	de_CH
dc.relation.ispartofseries	Lecture Notes in Computer Science	de_CH
dc.rights	Licence according to publishing contract	de_CH
dc.subject	Speaker clustering	de_CH
dc.subject	Speaker recognition	de_CH
dc.subject	Recurrent neural network	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.title	Capturing suprasegmental features of a voice with RNNs for improved speaker clustering	de_CH
dc.type	Conference: Paper	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.1007/978-3-319-99978-4_26	de_CH
dc.identifier.doi	10.21256/zhaw-3784	-
zhaw.conference.details	8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018	de_CH
zhaw.funding.eu	No	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.pages.end	345	de_CH
zhaw.pages.start	333	de_CH
zhaw.publication.status	acceptedVersion	de_CH
zhaw.series.number	11081	de_CH
zhaw.publication.review	Peer review (publication)	de_CH
zhaw.title.proceedings	Artificial Neural Networks in Pattern Recognition	de_CH
zhaw.webfeed	Datalab	de_CH
zhaw.webfeed	Information Engineering	de_CH
zhaw.webfeed	Machine Perception and Cognition	de_CH
Appears in collections:	Publikationen School of Engineering

Files in This Item:

File	Description	Size	Format
ANNPR_2018b.pdf	Accepted Version	692.47 kB	Adobe PDF

Stadelmann, T., Glinski-Haefeli, S., Gerber, P., & Dürr, O. (2018). Capturing suprasegmental features of a voice with RNNs for improved speaker clustering [Conference paper]. Artificial Neural Networks in Pattern Recognition, 333–345. https://doi.org/10.1007/978-3-319-99978-4_26

Stadelmann, T. et al. (2018) ‘Capturing suprasegmental features of a voice with RNNs for improved speaker clustering’, in Artificial Neural Networks in Pattern Recognition. Springer, pp. 333–345. Available at: https://doi.org/10.1007/978-3-319-99978-4_26.

T. Stadelmann, S. Glinski-Haefeli, P. Gerber, and O. Dürr, “Capturing suprasegmental features of a voice with RNNs for improved speaker clustering,” in Artificial Neural Networks in Pattern Recognition, 2018, pp. 333–345. doi: 10.1007/978-3-319-99978-4_26.

STADELMANN, Thilo, Sebastian GLINSKI-HAEFELI, Patrick GERBER and Oliver DÜRR, 2018. Capturing suprasegmental features of a voice with RNNs for improved speaker clustering. In: Artificial Neural Networks in Pattern Recognition. Conference paper. Springer. 2018. pp. 333–345. ISBN 978-3-319-99977-7

Stadelmann, Thilo, Sebastian Glinski-Haefeli, Patrick Gerber, and Oliver Dürr. 2018. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Conference paper. In Artificial Neural Networks in Pattern Recognition, 333–45. Springer. https://doi.org/10.1007/978-3-319-99978-4_26.

Stadelmann, Thilo, et al. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Artificial Neural Networks in Pattern Recognition, Springer, 2018, pp. 333–45, https://doi.org/10.1007/978-3-319-99978-4_26.