If there's Intelligent Life out There

Optimizing LLMs to be great at particular tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models dominate the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-Pro for all major open LLMs! Some learnings:
- Qwen 72B is the king and Chinese open models are dominating overall
- Previous evaluations have become too easy for recent ...
June 26, 2024

Hugging Face's second leaderboard tests language models across four areas: knowledge, reasoning over very long contexts, complex math, and instruction following. Six benchmarks are used to evaluate these abilities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT