Speaker identification and clustering using convolutional neural networks

Lukic, Yanick; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo

doi:10.21256/zhaw-3761

Please use this identifier to cite or link to this resource: https://doi.org/10.21256/zhaw-3761

Publication type:	Conference: Paper
Type of review:	Peer review (publication)
Title:	Speaker identification and clustering using convolutional neural networks
Author(s):	Lukic, Yanick; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo
DOI:	10.21256/zhaw-3761; 10.1109/MLSP.2016.7738816
Proceedings:	2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)
Conference details:	26th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2016), Vietri sul Mare, Italy, 13-16 Sept. 2016
Publication date:	2016
Publisher / Issuing institution:	IEEE
ISBN:	978-1-5090-0746-2
Other identifiers:	INSPEC Accession Number: 16449884
Language:	English
Keywords:	Datalab; Speaker identification; Speaker clustering; Deep learning
Subject area (DDC):	006: Special computer methods
Abstract:	Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network, trained for speaker identification, to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art without the need for handcrafted features.
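The abstract describes two stages: turning raw audio into a spectrogram that a CNN consumes, and reusing a network trained for speaker identification to produce features for speaker clustering. The NumPy sketch below illustrates both stages under stated assumptions; it is not the authors' implementation. The window length, hop size and log scaling in the hypothetical `log_spectrogram`, and the use of cosine distance with average-linkage agglomerative clustering in the hypothetical `agglomerative_cluster`, are illustrative choices. The CNN itself is omitted: the per-utterance embeddings are synthetic stand-ins for activations one might take from an intermediate layer of a trained identification network.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames).

    Window length, hop size and log scaling are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); transpose so rows are frequency bins.
    return np.log(magnitude + 1e-10).T


def cosine_distance_matrix(x):
    """Pairwise cosine distances between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def agglomerative_cluster(embeddings, n_clusters):
    """Average-linkage agglomerative clustering (assumed, not from the paper)."""
    dist = cosine_distance_matrix(embeddings)
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(embeddings), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Usage: a 1 kHz tone sampled at 8 kHz falls in bin 1000 / (8000 / 256) = 32.
sample_rate = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(1000) / sample_rate)
spec = log_spectrogram(tone)
print(spec.shape, np.argmax(spec, axis=0))

# Hypothetical stand-ins for per-utterance CNN embeddings of two speakers.
rng = np.random.default_rng(0)
speaker_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((4, 3))
speaker_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((4, 3))
labels = agglomerative_cluster(np.vstack([speaker_a, speaker_b]), n_clusters=2)
print(labels)
```

Stopping at a fixed `n_clusters` assumes the number of speakers is known in advance; when it is not, cutting the merge sequence at a distance threshold is a common alternative.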
URI:	https://digitalcollection.zhaw.ch/handle/11475/7087
Full text version:	Accepted version
License (according to publisher's agreement):	License according to publisher's agreement
Department:	School of Engineering
Organisational unit:	Institut für Informatik (InIT); Institut für Datenanalyse und Prozessdesign (IDP)
Appears in collections:	Publications School of Engineering

Files in this item:

File	Description	Size	Format
MLSP_2016.pdf		897.9 kB	Adobe PDF


Lukic, Y., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). https://doi.org/10.21256/zhaw-3761

Lukic, Y. et al. (2016) ‘Speaker identification and clustering using convolutional neural networks’, in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. Available at: https://doi.org/10.21256/zhaw-3761.

Y. Lukic, C. Vogt, O. Dürr, and T. Stadelmann, “Speaker identification and clustering using convolutional neural networks,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 2016. doi: 10.21256/zhaw-3761.

LUKIC, Yanick, Carlo VOGT, Oliver DÜRR and Thilo STADELMANN, 2016. Speaker identification and clustering using convolutional neural networks. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). Conference paper. IEEE. 2016. ISBN 978-1-5090-0746-2

Lukic, Yanick, Carlo Vogt, Oliver Dürr, and Thilo Stadelmann. 2016. “Speaker Identification and Clustering Using Convolutional Neural Networks.” Conference paper. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. https://doi.org/10.21256/zhaw-3761.

Lukic, Yanick, et al. “Speaker Identification and Clustering Using Convolutional Neural Networks.” 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2016, https://doi.org/10.21256/zhaw-3761.

All resources in this repository are protected by copyright, unless otherwise indicated.