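FlagEval (天秤) is a large-model evaluation platform built by the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) together with several university teams. It uses a three-dimensional "capability-task-metric" evaluation framework and aims to deliver comprehensive, fine-grained evaluation results. The platform currently covers more than 30 capabilities, 5 tasks, and 4 major categories of metrics, for a total of over 600 evaluation dimensions. On the task dimension, it includes 22 subjective and objective evaluation datasets containing 84,433 questions.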
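One way to picture a three-dimensional evaluation framework is as a grid in which every result is keyed by a (capability, task, metric) triple. The "over 600 dimensions" figure is consistent with crossing 30+ capabilities, 5 tasks, and 4 metric categories (30 × 5 × 4 = 600), although the source does not say that this is how the count is derived. The Python sketch below is purely illustrative: the axis labels and the `evaluation_cells` and `aggregate_by_capability` helpers are hypothetical and are not part of any FlagEval API.

```python
from itertools import product
from statistics import mean

# Hypothetical axis labels: the source gives only the counts, not FlagEval's
# actual capability, task, or metric names.
CAPABILITIES = [f"capability_{i}" for i in range(30)]   # "30+" capabilities
TASKS = [f"task_{j}" for j in range(5)]                 # 5 tasks
METRIC_CATEGORIES = [f"metric_{k}" for k in range(4)]   # 4 metric categories


def evaluation_cells():
    """Enumerate every (capability, task, metric) cell of the 3-D grid."""
    return list(product(CAPABILITIES, TASKS, METRIC_CATEGORIES))


def aggregate_by_capability(scores):
    """Average the recorded scores for each capability across tasks and metrics.

    `scores` maps (capability, task, metric) tuples to floats; cells that
    were not evaluated are simply absent from the dict.
    """
    per_capability = {}
    for (capability, _task, _metric), value in scores.items():
        per_capability.setdefault(capability, []).append(value)
    return {cap: mean(vals) for cap, vals in per_capability.items()}


cells = evaluation_cells()
print(len(cells))  # 30 * 5 * 4 = 600

# Made-up scores for two cells of a single capability.
toy_scores = {
    ("capability_0", "task_0", "metric_0"): 0.75,
    ("capability_0", "task_1", "metric_0"): 0.25,
}
print(aggregate_by_capability(toy_scores))
```

Keying results by a tuple keeps each axis independent, so the same scores can be sliced per capability, per task, or per metric category simply by grouping on a different tuple position.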

