Evaluation of medium-large Language Models at zero-shot closed book generative question answering

Abstract

Large language models (LLMs) have garnered significant attention, but the definition of "large" lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but less than 100 billion. The study evaluates MLMs on zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces its own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7%, which is better than the 60.9% of ChatGPT. The best MLM achieved 46.4% and has 7B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers.

More about this title

Title Evaluation of medium-large Language Models at zero-shot closed book generative question answering
Media 11th International Conference on Artificial Intelligence and Applications (AIAP)
Publisher ---
Issue ---
Volume 2023
ISBN ---
Author/Editor Prof. Dr. René Peinl, Johannes Wirth
Pages ---
Publication date 19.05.2023
Project title M4-SKI
Citation Peinl, René; Wirth, Johannes (2023): Evaluation of medium-large Language Models at zero-shot closed book generative question answering. 11th International Conference on Artificial Intelligence and Applications (AIAP) 2023.