Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-26577
Publication type: Conference paper
Type of review: Peer review (publication)
Title: Evaluating pre-trained Sentence-BERT with class embeddings in active learning for multi-label text classification
Authors: Wertz, Lukas; Bogojeska, Jasmina; Mirylenka, Katsiaryna; Kuhn, Jonas
et al.: No
DOI: 10.21256/zhaw-26577
Proceedings: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Pages: 366–372
Conference details: 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP), online, 20–23 November 2022
Issue Date: Nov-2022
Publisher / Ed. Institution: Association for Computational Linguistics
Language: English
Subjects: Multi-label text classification; Active learning; Transformer
Subject (DDC): 410.285: Computational linguistics
Abstract: The Transformer Language Model is a powerful tool that has been shown to excel at various NLP tasks and has become the de-facto standard solution thanks to its versatility. In this study, we employ pre-trained document embeddings in an Active Learning task to group samples with the same labels in the embedding space on a legal document corpus. We find that the calculated class embeddings are not close to the respective samples and consequently do not partition the embedding space in a meaningful way. In addition, we explore using the class embeddings as an Active Learning strategy with dramatically reduced results compared to all baselines.
URI: https://aclanthology.org/2022.aacl-short.45 ; https://digitalcollection.zhaw.ch/handle/11475/26577
Fulltext version: Published version
License (according to publishing contract): CC BY 4.0: Attribution 4.0 International
Department: School of Engineering
Organisational Unit: Centre for Artificial Intelligence (CAI)
Appears in collections: Publikationen School of Engineering
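The abstract above describes building class embeddings in a pre-trained sentence-embedding space and testing them as an Active Learning selection signal. As a minimal, hypothetical sketch of one plausible construction (class centroids as the mean of labeled sample embeddings, cosine similarity for ranking; the paper's exact method may differ, and all names below are illustrative):

```python
import numpy as np

def class_embeddings(doc_embs, labels, n_classes):
    """One plausible class-embedding construction: the centroid (mean)
    of the embeddings of all labeled documents carrying each class.
    doc_embs: (n_docs, dim) array; labels: list of label-id lists."""
    centroids = np.zeros((n_classes, doc_embs.shape[1]))
    for c in range(n_classes):
        mask = np.array([c in doc_labels for doc_labels in labels])
        centroids[c] = doc_embs[mask].mean(axis=0)
    return centroids

def rank_by_class_distance(doc_embs, centroids):
    """Rank documents by cosine similarity to their *nearest* class
    embedding, least similar first -- a simple acquisition heuristic
    treating distance from all classes as uncertainty."""
    a = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    b = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = a @ b.T                 # (n_docs, n_classes) cosine similarities
    nearest = sims.max(axis=1)     # similarity to closest class
    return np.argsort(nearest)     # least similar (most "uncertain") first
```

In practice the document embeddings would come from a pre-trained Sentence-BERT model (e.g. via the `sentence-transformers` library); the arrays above stand in for those vectors.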
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 2022_Wertz-etal_Sentence-BERT-with-class-embeddings-multi-label-text-classification.pdf | | 213.12 kB | Adobe PDF |
Wertz, L., Bogojeska, J., Mirylenka, K., & Kuhn, J. (2022). Evaluating pre-trained Sentence-BERT with class embeddings in active learning for multi-label text classification [Conference paper]. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 366–372. https://doi.org/10.21256/zhaw-26577