Large language models (LLMs) have garnered significant attention, but the definition of "large" lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion but fewer than 100 billion parameters. The study evaluates MLMs on zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces its own test dataset and presents results from a human evaluation. The results show that combining the best answers from different MLMs yields an overall correct-answer rate of 82.7%, which is better than ChatGPT's 60.9%. The best single MLM achieved 46.4% with only 7B parameters, which highlights the importance of appropriate training data for fine-tuning rather than relying solely on parameter count. More fine-grained feedback should be used to further improve the quality of answers.
Title | Evaluation of medium-large Language Models at zero-shot closed book generative question answering |
---|---|
Venue | 11th International Conference on Artificial Intelligence and Applications (AIAP) |
Publisher | --- |
Issue | --- |
Volume | 2023 |
ISBN | --- |
Authors/Editors | Prof. Dr. René Peinl, Johannes Wirth |
Pages | --- |
Publication date | 2023-05-19 |
Project title | M4-SKI |
Citation | Peinl, René; Wirth, Johannes (2023): Evaluation of medium-large Language Models at zero-shot closed book generative question answering. 11th International Conference on Artificial Intelligence and Applications (AIAP) 2023. |