Apple Unveils Open-Source LLM DCLM-Baseline 7B with 7 Billion Parameters

Apple releases DCLM-Baseline 7B, a new open-source language model with 7 billion parameters, available on Hugging Face with support for the Transformers library.

Key Points
  • Apple releases DCLM-Baseline 7B, an open-source language model with 7 billion parameters.
  • The model integrates data from DCLM-BASELINE, StarCoder, and ProofPile2, achieving an MMLU score of 0.6372.
  • Available on Hugging Face with Transformers support, licensed under the Apple Sample Code License.
  • Developed using PyTorch with the OpenLM framework, matching the performance of closed-dataset models like Mistral.
  • Follows Apple’s AI advancements, including Apple Intelligence for Siri and previous open-source models like MM1 and ReALM.

Apple has introduced a new open-source language model, DCLM-Baseline 7B, featuring 7 billion parameters. The release includes the model weights, training code, and dataset, and the model was trained on 2.5 trillion tokens drawn from various open datasets. Trained primarily on English data, the model offers a 2048-token context window, making it a significant addition to the field of large language models (LLMs).

DCLM-Baseline 7B integrates data from DCLM-BASELINE, StarCoder, and ProofPile2, achieving an MMLU score of 0.6372, which places it between other notable models such as Mistral and Llama3 on that benchmark. Licensed under the Apple Sample Code License, the model is available on Hugging Face and can be used with the Transformers library. It was trained in PyTorch using the OpenLM framework, reaching performance comparable to models trained on closed datasets, such as Mistral.
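For developers who want to try the model, a minimal sketch along these lines should work with the Transformers library. Note that the repository ID used below is an assumption based on the article rather than a confirmed identifier; the actual ID and any extra dependencies are listed on the model card on Hugging Face.

```python
# Minimal sketch: loading and sampling from the model via Hugging Face
# Transformers. The repo ID "apple/DCLM-7B" is an assumption, not confirmed
# by this article; check the official model card for the exact identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # hypothetical repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The article notes a 2048-token context window, so truncate inputs to fit.
prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```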

This announcement follows Apple’s unveiling of Apple Intelligence at WWDC 2024, aimed at enhancing Siri’s capabilities with generative AI. Apple developed a 3 billion parameter on-device language model and a larger server-based model accessible via Private Cloud Compute on Apple silicon servers. These models were crafted using Apple’s AXLearn framework, an open-source project based on JAX and XLA.

In addition to DCLM-Baseline 7B, Apple has previously open-sourced the MM1 series of multimodal AI models with up to 30 billion parameters, and ReALM, which resolves references to on-screen and conversational entities to improve interaction. Another significant release was Ferret-UI, a multimodal AI model designed for precise task execution on user interfaces and for handling open-ended language instructions. Ferret-UI combines advanced language understanding with visual comprehension tailored specifically to mobile UI screens, incorporating referring, grounding, and reasoning capabilities.

The DCLM-Baseline 7B model continues Apple’s commitment to advancing AI and making powerful tools accessible to the developer community. By open-sourcing these models, Apple is fostering innovation and enabling researchers and developers to build on its cutting-edge technology.