Speaker identification and clustering using convolutional neural networks

Lukic, Yanick; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo

doi:10.21256/zhaw-3761

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3761

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lukic, Yanick	-
dc.contributor.author	Vogt, Carlo	-
dc.contributor.author	Dürr, Oliver	-
dc.contributor.author	Stadelmann, Thilo	-
dc.date.accessioned	2018-06-19T12:32:11Z	-
dc.date.available	2018-06-19T12:32:11Z	-
dc.date.issued	2016	-
dc.identifier.isbn	978-1-5090-0746-2	de_CH
dc.identifier.other	INSPEC Accession Number: 16449884	de_CH
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/7087	-
dc.description.abstract	Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network, trained for speaker identification, to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features.	de_CH
dc.language.iso	en	de_CH
dc.publisher	IEEE	de_CH
dc.rights	Licence according to publishing contract	de_CH
dc.subject	Datalab	de_CH
dc.subject	Speaker identification	de_CH
dc.subject	Speaker clustering	de_CH
dc.subject	Deep learning	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.title	Speaker identification and clustering using convolutional neural networks	de_CH
dc.type	Conference: Paper	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
zhaw.organisationalunit	Institut für Datenanalyse und Prozessdesign (IDP)	de_CH
dc.identifier.doi	10.21256/zhaw-3761	-
dc.identifier.doi	10.1109/MLSP.2016.7738816	de_CH
zhaw.conference.details	26th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2016), Vietri sul Mare, Italy, 13-16 Sept. 2016	de_CH
zhaw.funding.eu	No	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.publication.status	acceptedVersion	de_CH
zhaw.publication.review	Peer review (publication)	de_CH
zhaw.title.proceedings	2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)	de_CH
zhaw.webfeed	Datalab	de_CH
zhaw.webfeed	Information Engineering	de_CH
zhaw.webfeed	Machine Perception and Cognition	de_CH
Appears in collections:	Publikationen School of Engineering

Files in This Item:

File	Description	Size	Format
MLSP_2016.pdf		897.9 kB	Adobe PDF

Lukic, Y., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). https://doi.org/10.21256/zhaw-3761

Lukic, Y. et al. (2016) ‘Speaker identification and clustering using convolutional neural networks’, in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. Available at: https://doi.org/10.21256/zhaw-3761.

Y. Lukic, C. Vogt, O. Dürr, and T. Stadelmann, “Speaker identification and clustering using convolutional neural networks,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 2016. doi: 10.21256/zhaw-3761.

LUKIC, Yanick, Carlo VOGT, Oliver DÜRR and Thilo STADELMANN, 2016. Speaker identification and clustering using convolutional neural networks. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). Conference paper. IEEE. 2016. ISBN 978-1-5090-0746-2

Lukic, Yanick, Carlo Vogt, Oliver Dürr, and Thilo Stadelmann. 2016. “Speaker Identification and Clustering Using Convolutional Neural Networks.” Conference paper. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. https://doi.org/10.21256/zhaw-3761.

Lukic, Yanick, et al. “Speaker Identification and Clustering Using Convolutional Neural Networks.” 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2016, https://doi.org/10.21256/zhaw-3761.