Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3762
Title: Learning embeddings for speaker clustering based on voice equality
Authors : Lukic, Yanick X.; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo
Proceedings: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)
Conference details: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, 25-28 September 2017
Publisher / Ed. Institution : IEEE
Issue Date: Sep-2017
License (according to publishing contract) : Licence according to publishing contract
Type of review: Peer review (Publication)
Language : English
Subjects : Datalab; Speaker recognition; Deep learning; Speaker clustering
Subject (DDC) : 004: Computer science
Abstract: Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification can extract features from spectrograms that are useful for speaker clustering. These features, called embeddings, are the activations of a particular hidden layer. However, previous approaches require a large amount of additional speaker data to learn the embedding. Moreover, although the resulting clustering performance is on par with more traditional approaches such as those based on MFCC features, there is room for improvement because the embeddings are trained on a surrogate task, identifying a few specific speakers, that is rather far removed from the actual goal of separating unknown voices. We address both problems by training a CNN on weakly labeled data to extract embeddings that are similar for the same speaker, regardless of that speaker's specific identity. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments. We exceed the clustering performance of all previous approaches while requiring just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
Departement: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT); Institute of Data Analysis and Process Design (IDP)
Publication type: Conference Paper
DOI : 10.1109/MLSP.2017.8168166; 10.21256/zhaw-3762
ISBN: 978-1-5090-6341-3
URI: https://digitalcollection.zhaw.ch/handle/11475/7088
Other identifiers : INSPEC Accession Number: 17416144
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: MLSP_2017.pdf
Size: 1.37 MB
Format: Adobe PDF

