Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3784
Title: Capturing suprasegmental features of a voice with RNNs for improved speaker clustering
Authors: Stadelmann, Thilo; Glinski-Haefeli, Sebastian; Gerber, Patrick; Dürr, Oliver
Proceedings: Proceedings of the 8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR)
Pages: 333-345
Conference details: 8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, 19-21 September 2018
Publisher / Ed. Institution: IAPR
Issue Date: September 2018
License (according to publishing contract): Licence according to publishing contract
Series: Lecture Notes in Computer Science book series (LNCS)
Series volume: 11081
Type of review: Peer review (Publication)
Language: English
Subjects: Speaker clustering; Speaker recognition; Recurrent neural network
Subject (DDC): 004: Computer science
Abstract: Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, although the speech signal is clearly a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, voice identification has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while convolutional neural networks (CNNs), which, unlike RNNs, cannot capture arbitrary time dependencies, still prevail. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state-of-the-art speaker clustering performance and robustness on the classic TIMIT benchmark. We argue for the superiority of RNNs by experimentally demonstrating a “sweet spot” in segment length for successfully capturing prosodic information, as theoretically predicted in previous work.
Department: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Publication type: Conference Paper
DOI: 10.21256/zhaw-3784; 10.1007/978-3-319-99978-4_26
ISBN: 978-3-319-99977-7; 978-3-319-99978-4
URI: https://digitalcollection.zhaw.ch/handle/11475/7429
Appears in Collections: Publikationen School of Engineering

Files in This Item: ANNPR_2018b.pdf (692.47 kB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.