Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3762
Publication type: Conference paper
Review type: Peer review (publication)
Title: Learning embeddings for speaker clustering based on voice equality
Authors: Lukic, Yanick X.
Vogt, Carlo
Dürr, Oliver
Stadelmann, Thilo
DOI: 10.21256/zhaw-3762
10.1109/MLSP.2017.8168166
Proceedings: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)
Conference details: 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2017), Tokyo, 25-28 September 2017
Publication date: 2017
Publisher / editing institution: IEEE
ISBN: 978-1-5090-6341-3
Other identifiers: INSPEC Accession Number: 17416144
Language: English
Keywords: Datalab; Speaker recognition; Deep learning; Speaker clustering
Subject area (DDC): 006: Special computer methods
Abstract: Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features etc., room for improvement stems from the fact that these embeddings are trained with a surrogate task that is rather far away from segregating unknown voices, namely identifying a few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
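The core idea described in the abstract, training a network so that embeddings of equal voices end up close together while embeddings of different voices are pushed apart, can be illustrated with a small siamese-style sketch. The Python/PyTorch code below is only an assumption-laden illustration of such pairwise training, not the architecture or loss used in the paper; the layer sizes, embedding dimension, margin, and the assumed pair_loader yielding (spectrogram_a, spectrogram_b, same_speaker) batches are hypothetical placeholders.

# Minimal sketch (assumptions labeled above): a small CNN maps a 1 x F x T
# spectrogram excerpt to a unit-length embedding; a contrastive loss pulls
# same-speaker pairs together and pushes different-speaker pairs apart.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingCNN(nn.Module):
    """Illustrative CNN mapping a spectrogram excerpt to an embedding vector."""

    def __init__(self, emb_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(32 * 4 * 4, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-length embedding


def contrastive_loss(e1, e2, same_speaker, margin: float = 1.0):
    """Pairwise loss: small distance for equal voices, at least `margin` otherwise."""
    d = F.pairwise_distance(e1, e2)
    pos = same_speaker * d.pow(2)
    neg = (1 - same_speaker) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()


# Usage sketch: `pair_loader` is assumed to yield (spec_a, spec_b, same_speaker)
# batches, where same_speaker is 1.0 for equal voices and 0.0 otherwise.
model = EmbeddingCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for spec_a, spec_b, same_speaker in pair_loader:
#     loss = contrastive_loss(model(spec_a), model(spec_b), same_speaker)
#     opt.zero_grad(); loss.backward(); opt.step()

The learned embeddings of unseen voices can then be grouped with any standard clustering algorithm, which is the evaluation setting the abstract describes on TIMIT.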
URI: https://digitalcollection.zhaw.ch/handle/11475/7088
Full text version: Accepted version
License (according to publisher's agreement): License according to publisher's agreement
Department: School of Engineering
Organisational unit: Institut für Informatik (InIT)
Institut für Datenanalyse und Prozessdesign (IDP)
Appears in collections: Publikationen School of Engineering

Files in this item:
File: MLSP_2017.pdf (1.37 MB, Adobe PDF)
Citation: Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2017). Learning embeddings for speaker clustering based on voice equality. 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. https://doi.org/10.21256/zhaw-3762


All items in this repository are protected by copyright unless otherwise indicated.