Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-30279
Publication type: Conference paper
Type of review: Peer review (publication)
Title: So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs
Authors: Tuggener, Lukas
Sager, Pascal
Taoudi-Benchekroun, Yassine
Grewe, Benjamin F.
Stadelmann, Thilo
et al.: No
DOI: 10.21256/zhaw-30279
Conference details: 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024
Issue Date: 31-May-2024
Publisher / Ed. Institution: ZHAW Zürcher Hochschule für Angewandte Wissenschaften
Language: English
Subjects: Large language model; LlamaV2; Fine-tuning; LLM quantization; LLM deployment
Subject (DDC): 006: Special computer methods
Abstract: At least since the introduction of ChatGPT, the abilities of generative large language models (LLMs), sometimes called GPTs, have been at the center of attention for AI researchers, entrepreneurs, and others. However, for many applications, it is not possible to call an existing LLM service via an API, either due to data protection concerns or because no task-appropriate LLM exists. On the other hand, deploying or training a private LLM is often prohibitively computationally expensive. In this paper, we give an overview of the most important recent methodologies that help reduce the computational footprint of LLMs. We further present extensive benchmarks for seven methods from two of the most important areas of recent progress: model quantization and low-rank adapters, showcasing how it is possible to leverage state-of-the-art LLMs with limited resources. Our benchmarks include resource consumption metrics (e.g., GPU memory usage), a state-of-the-art quantitative performance evaluation, as well as a qualitative performance study conducted by eight individual human raters. Our evaluations show that quantization has a profound effect on GPU memory requirements. However, we also show that these quantization methods, contrary to how they are advertised, cause a noticeable loss in text quality. We further show that low-rank adapters allow effective model fine-tuning with moderate compute resources. For methods that require less than 16 GB of GPU memory, we provide easy-to-use Jupyter notebooks that allow anyone to deploy and fine-tune state-of-the-art LLMs on the Google Colab free tier within minutes without any prior experience or infrastructure.
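To make the two benchmarked technique families concrete, the following is a minimal sketch of loading a 4-bit quantized Llama-2 model and attaching low-rank adapters (LoRA) with the Hugging Face transformers, bitsandbytes, and peft libraries. The model ID, LoRA rank, and target modules are illustrative assumptions, not the configuration of the paper's actual notebooks.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # assumed model; gated on Hugging Face

# 4-bit NF4 quantization via bitsandbytes: stores weights in 4 bits,
# cutting weight memory roughly 4x compared to fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 suits the Colab free-tier T4
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # places layers on the available GPU automatically
)

# LoRA: train small rank-r update matrices instead of the full weights.
# Hyperparameters below are common illustrative choices, not the paper's.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a typical target
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are typically well under 1% of all weights

With this setup, only the adapter weights receive gradients while the quantized base model stays frozen, which is what keeps fine-tuning within a roughly 16 GB GPU memory budget such as the Google Colab free tier mentioned in the abstract.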
URI: https://digitalcollection.zhaw.ch/handle/11475/30279
Fulltext version: Accepted version
License (according to publishing contract): Licence according to publishing contract
Departement: School of Engineering
Organisational Unit: Centre for Artificial Intelligence (CAI)
Published as part of the ZHAW project: Practical data efficient deep learning through contrastive self-supervised learning
Appears in collections: Publikationen School of Engineering

Files in This Item:
File: 2024_Tuggener-etal_Survey-and-benchmark-of-methods-for-efficient-GPTs_SDS.pdf
Size: 1.99 MB
Format: Adobe PDF
Citation formats:
Tuggener, L., Sager, P., Taoudi-Benchekroun, Y., Grewe, B. F., & Stadelmann, T. (2024, May 31). So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs. 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024. https://doi.org/10.21256/zhaw-30279
Tuggener, L. et al. (2024) ‘So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs’, in 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. Available at: https://doi.org/10.21256/zhaw-30279.
L. Tuggener, P. Sager, Y. Taoudi-Benchekroun, B. F. Grewe, and T. Stadelmann, “So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs,” in 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024, May 2024. doi: 10.21256/zhaw-30279.
TUGGENER, Lukas, Pascal SAGER, Yassine TAOUDI-BENCHEKROUN, Benjamin F. GREWE and Thilo STADELMANN, 2024. So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs. In: 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024. Conference paper. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. 31 May 2024
Tuggener, Lukas, Pascal Sager, Yassine Taoudi-Benchekroun, Benjamin F. Grewe, and Thilo Stadelmann. 2024. “So You Want Your Private LLM at Home? : A Survey and Benchmark of Methods for Efficient GPTs.” Conference paper. In 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. https://doi.org/10.21256/zhaw-30279.
Tuggener, Lukas, et al. “So You Want Your Private LLM at Home? : A Survey and Benchmark of Methods for Efficient GPTs.” 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024, ZHAW Zürcher Hochschule für Angewandte Wissenschaften, 2024, https://doi.org/10.21256/zhaw-30279.

