If there's Intelligent Life out There

Optimizing LLMs to be proficient at specific tests backfires on Meta, Stability.

Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard seeks to be a more challenging uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests that include solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and most daunting of all: high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT