Ollama Integrates Apple's MLX Framework, Boosts Performance on Apple Silicon Macs

31 March 2026 · Technology and Science · 7 sources

Key Takeaways

  • Ollama integrates Apple's MLX framework on Apple Silicon to accelerate local AI models.
  • MLX enables unified memory usage, boosting LLM processing speed on Macs.
  • NVFP4 format support for model compression improves memory efficiency.

MLX Integration

The integration allows Ollama to take advantage of Apple Silicon's unified memory and the Neural Accelerators built into its GPU cores.

The update supports running the 35-billion-parameter variant of Alibaba’s Qwen3.5 model.
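
For readers who want to try a model locally, the sketch below shows one way to query a running Ollama server through its standard HTTP API. The endpoint and payload shape follow Ollama's documented /api/generate route; the model tag "qwen3.5:35b" is an assumption used for illustration and may not match the name Ollama actually publishes.

```python
# Minimal sketch: querying a locally served model through Ollama's HTTP API.
# Assumes `ollama serve` is running on the default port 11434 and that the
# model tag below (hypothetical) has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "qwen3.5:35b") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # return a single JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Summarize what MLX is in one sentence."))
```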

Caching and Compression

Ollama 0.19 improves caching by reusing previously computed prompt cache across conversations, so context that has already been processed does not have to be recomputed.
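
The article does not spell out how the reuse works internally; the sketch below only illustrates the general idea that a new request sharing a prefix with an earlier conversation can skip reprocessing those shared tokens. The class and names here are hypothetical, not Ollama's actual internals.

```python
# Conceptual sketch of prefix-based cache reuse: if a new request shares a
# prefix with a previously processed prompt, the tokens in that prefix do not
# need to be processed again.
from typing import Dict, List, Tuple

class PrefixCache:
    def __init__(self) -> None:
        # Maps a processed token prefix (as a tuple) to its cached state.
        self._cache: Dict[Tuple[str, ...], object] = {}

    def longest_prefix(self, tokens: List[str]) -> int:
        """Return how many leading tokens are already covered by the cache."""
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._cache:
                return i
        return 0

    def store(self, tokens: List[str], state: object) -> None:
        self._cache[tuple(tokens)] = state

cache = PrefixCache()
conversation = ["system:", "You", "are", "helpful.", "user:", "Hi"]
cache.store(conversation, state="cached-state-after-6-tokens")

follow_up = conversation + ["assistant:", "Hello!", "user:", "Tell", "me", "more"]
reused = cache.longest_prefix(follow_up)
print(f"{reused} of {len(follow_up)} tokens reused from cache")  # 6 of 12
```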

The update also adds support for Nvidia’s NVFP4 format, a 4-bit floating-point scheme for compressing model weights.
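
NVFP4 stores weights as 4-bit values that share scale factors over small blocks. The sketch below is a simplified stand-in for that idea, using per-block scaling of 4-bit integer levels rather than Nvidia's exact encoding, and is only meant to show where the memory savings come from.

```python
# Simplified sketch of block-scaled 4-bit quantization, in the spirit of NVFP4
# (4-bit values with a shared scale per small block). Illustrative only; not
# Nvidia's exact format.
import numpy as np

BLOCK = 16  # values per block sharing one scale factor

def quantize_blockwise(weights: np.ndarray):
    """Quantize a 1-D float array to 4-bit signed levels with per-block scales."""
    padded = np.pad(weights, (0, -len(weights) % BLOCK))
    blocks = padded.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # map block max to +/-7
    scales[scales == 0] = 1.0
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).ravel()

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_blockwise(w)
err = np.abs(dequantize(q, s)[: w.size] - w).max()

fp16_bytes = w.size * 2
packed_bytes = q.size // 2 + s.size * 2  # 4 bits per value + fp16 scale per block
print(f"fp16: {fp16_bytes} B, 4-bit packed (est.): {packed_bytes} B, max error: {err:.4f}")
```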

Together, these enhancements aim to make running models locally more practical.

Hardware Requirements

The gains still depend on capable hardware: running larger models such as the 35-billion-parameter Qwen variant locally requires a Mac with a substantial amount of unified memory, which could limit accessibility for many users, as the rough estimate below suggests.
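
As a back-of-envelope illustration (weights only, ignoring the KV cache and runtime overhead), the arithmetic below shows why weight precision dominates the memory requirement:

```python
# Rough memory estimate for a 35-billion-parameter model at different weight
# precisions. Weights only; real usage adds KV cache and runtime overhead.
PARAMS = 35e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit (e.g. NVFP4)", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>20}: ~{gib:.0f} GiB of weights")
# fp16 -> ~65 GiB, 8-bit -> ~33 GiB, 4-bit -> ~16 GiB
```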

Ollama is working to support additional models in future updates.

Implications

Ollama exemplifies the growing viability of local AI model execution.

The update positions Apple Silicon Macs as a capable platform for local AI development.
