Decoder.sh

Exploring ML and LLMs

Meta's Llama3 - The Mistral Killer?

Description

Meta's Llama3 family of models, in 8B and 70B flavors, was just released and is already making waves in the open-source community. With a much larger tokenizer vocabulary, GQA (grouped-query attention) for all model sizes, and 7.7 million GPU hours spent training on 15 TRILLION tokens, Llama3 seems primed to overtake incumbent models like Mistral and Gemini. I review the most important parts of the announcement before testing the new 8B model against my own battery of questions. Let's go!