Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3762
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lukic, Yanick X. | - |
| dc.contributor.author | Vogt, Carlo | - |
| dc.contributor.author | Dürr, Oliver | - |
| dc.contributor.author | Stadelmann, Thilo | - |
| dc.date.accessioned | 2018-06-19T12:33:06Z | - |
| dc.date.available | 2018-06-19T12:33:06Z | - |
| dc.date.issued | 2017 | - |
| dc.identifier.isbn | 978-1-5090-6341-3 | de_CH |
| dc.identifier.other | INSPEC Accession Number: 17416144 | de_CH |
| dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/7088 | - |
| dc.description.abstract | Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features etc., room for improvements stems from the fact that these embeddings are trained with a surrogate task that is rather far away from segregating unknown voices - namely, identifying few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset that has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering. | de_CH |
| dc.language.iso | en | de_CH |
| dc.publisher | IEEE | de_CH |
| dc.rights | Licence according to publishing contract | de_CH |
| dc.subject | Datalab | de_CH |
| dc.subject | Speaker recognition | de_CH |
| dc.subject | Deep learning | de_CH |
| dc.subject | Speaker clustering | de_CH |
| dc.subject.ddc | 006: Special computer methods | de_CH |
| dc.title | Learning embeddings for speaker clustering based on voice equality | de_CH |
| dc.type | Conference: Paper | de_CH |
| dcterms.type | Text | de_CH |
| zhaw.departement | School of Engineering | de_CH |
| zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH |
| zhaw.organisationalunit | Institut für Datenanalyse und Prozessdesign (IDP) | de_CH |
| dc.identifier.doi | 10.21256/zhaw-3762 | - |
| dc.identifier.doi | 10.1109/MLSP.2017.8168166 | de_CH |
| zhaw.conference.details | 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2017), Tokyo, 25-28 September 2017 | de_CH |
| zhaw.funding.eu | No | de_CH |
| zhaw.originated.zhaw | Yes | de_CH |
| zhaw.publication.status | acceptedVersion | de_CH |
| zhaw.publication.review | Peer review (publication) | de_CH |
| zhaw.title.proceedings | 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) | de_CH |
| zhaw.webfeed | Datalab | de_CH |
| zhaw.webfeed | Information Engineering | de_CH |
| zhaw.webfeed | Machine Perception and Cognition | de_CH |
Appears in collections: Publikationen School of Engineering

Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| MLSP_2017.pdf | | 1.37 MB | Adobe PDF |
Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2017). Learning embeddings for speaker clustering based on voice equality. 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). https://doi.org/10.21256/zhaw-3762
Lukic, Y.X. et al. (2017) ‘Learning embeddings for speaker clustering based on voice equality’, in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. Available at: https://doi.org/10.21256/zhaw-3762.
Y. X. Lukic, C. Vogt, O. Dürr, and T. Stadelmann, “Learning embeddings for speaker clustering based on voice equality,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017. doi: 10.21256/zhaw-3762.
LUKIC, Yanick X., Carlo VOGT, Oliver DÜRR and Thilo STADELMANN, 2017. Learning embeddings for speaker clustering based on voice equality. In: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). Conference paper. IEEE. 2017. ISBN 978-1-5090-6341-3
Lukic, Yanick X., Carlo Vogt, Oliver Dürr, and Thilo Stadelmann. 2017. “Learning Embeddings for Speaker Clustering Based on Voice Equality.” Conference paper. In 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. https://doi.org/10.21256/zhaw-3762.
Lukic, Yanick X., et al. “Learning Embeddings for Speaker Clustering Based on Voice Equality.” 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2017, https://doi.org/10.21256/zhaw-3762.