If there's Intelligent Life out There

Optimizing LLMs to be great at specific tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard seeks to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and most daunting of all: high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta's LLM, and a handful of smaller models that managed to outperform the pack. Notably absent is any sign of ChatGPT