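HELM (Holistic Evaluation of Language Models) is a large language model evaluation framework from Stanford University. The method is built from three modules: scenarios, adaptation, and metrics. Each evaluation run specifies one scenario, one prompt that adapts the model to that scenario, and one or more metrics. Its coverage is mainly English. It measures seven metrics: accuracy, uncertainty/calibration, robustness, fairness, bias, toxicity, and inference efficiency. The tasks include question answering, information retrieval, summarization, and text classification.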
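To make the scenario / adaptation / metrics split concrete, here is a minimal sketch of what one evaluation run might look like. It is not the HELM library's API: `RunSpec`, `evaluate`, `exact_match`, `qa_prompt`, and `toy_model` are all hypothetical names chosen for illustration, and the "model" is a stub that stands in for a real language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; this is NOT the HELM library's API.
Example = Tuple[str, str]  # (input text, reference answer)


@dataclass
class RunSpec:
    """One evaluation run: one scenario, one adaptation, one or more metrics."""
    scenario: List[Example]                          # the task data
    adapt: Callable[[str], str]                      # turns an input into a prompt
    metrics: Dict[str, Callable[[str, str], float]]  # name -> scoring function


def exact_match(prediction: str, reference: str) -> float:
    """A simple accuracy-style metric: 1.0 on a case-insensitive match."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def qa_prompt(text: str) -> str:
    """Adaptation step: wrap the raw input in a question-answering prompt."""
    return f"Answer the question.\nQuestion: {text}\nAnswer:"


def evaluate(model: Callable[[str], str], spec: RunSpec) -> Dict[str, float]:
    """Run the model over the scenario and average each metric."""
    if not spec.scenario or not spec.metrics:
        raise ValueError("a run needs a non-empty scenario and at least one metric")
    totals = {name: 0.0 for name in spec.metrics}
    for text, reference in spec.scenario:
        prediction = model(spec.adapt(text))
        for name, metric in spec.metrics.items():
            totals[name] += metric(prediction, reference)
    return {name: total / len(spec.scenario) for name, total in totals.items()}


def toy_model(prompt: str) -> str:
    """Stub standing in for a language model; it only knows one fact."""
    return "Paris" if "France" in prompt else "unknown"


spec = RunSpec(
    scenario=[
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Peru?", "Lima"),
    ],
    adapt=qa_prompt,
    metrics={"exact_match": exact_match},
)
scores = evaluate(toy_model, spec)
print(scores)  # {'exact_match': 0.5}
```

Keeping the three parts separate means the same scenario can be scored with additional metrics (for example a robustness or toxicity scorer) simply by adding entries to `metrics`, without touching the data or the prompt.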