Open Source AI Takes a Leap Forward

The most eventful week in the history of open source AI (so far):

  1. Mistral (in collaboration with NVIDIA) released the Apache 2.0 licensed NeMo 12B LLM, which outperforms Llama 3 8B and Gemma 2 9B. The models are multilingual, with a 128K context window and a highly efficient tokenizer called Tekken.
  1. Apple released DCLM 7B, a truly open source LLM based on OpenELM, trained on 2.5T tokens, with an MMLU score of 63.72 (better than Mistral 7B).
  1. Hugging Face (HF) shared SmolLM: 135M, 360M, and 1.7B Smol LMs capable of running directly in the browser. They beat Qwen 1.5B, Phi 1.5B, and more, and were trained on just 650B tokens.
  1. Groq released Llama 3 8B and 70B checkpoints fine-tuned for tool use and function calling, with up to 90.76% accuracy on the Berkeley Function Calling Leaderboard (BFCL). They excel at API usage and structured data manipulation!
  1. Salesforce released the xLAM 1.35B and 7B Large Action Models along with a 60K instruction fine-tuning dataset. The 7B model scores 88.24% on BFCL and the 1.35B model scores 78.94%.
  1. DeepSeek changed the game with V2 Chat 0628, the best open LLM on the LMSYS arena right now: a 236B parameter model with 21B active parameters. It also excels at coding (rank #3) and arena hard problems (rank #3).

There’s a lot more: Arcee (mergekit) released a series of LLMs, each better than the last; Numina and Hugging Face released Numina 72B (based on Qwen 2) along with math datasets; Mixedbread released embedding models (English + German); and more!



