Ollama Integrates Apple's MLX Framework, Boosts Performance on Apple Silicon Macs

31 March 2026 · Technology and Science · 7 sources

Key Takeaways

  • Ollama integrates Apple's MLX framework on Apple Silicon to accelerate local AI models.
  • MLX enables unified memory usage, boosting LLM processing speed on Macs.
  • NVFP4 format support for model compression improves memory efficiency.

MLX Integration

The integration allows Ollama to take advantage of Apple Silicon's unified memory and the Neural Accelerators built into its GPU cores.

The update supports running the 35-billion-parameter variant of Alibaba’s Qwen3.5 model.
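
For readers who want to try a model locally, the sketch below shows one way to query a running Ollama server through its standard HTTP API. The endpoint and payload shape follow Ollama's documented /api/generate route; the model tag "qwen3.5:35b" is an assumption used for illustration and may not match the name Ollama actually publishes.

```python
# Minimal sketch: querying a locally served model through Ollama's HTTP API.
# Assumes `ollama serve` is running on the default port 11434 and that the
# model tag below (hypothetical) has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "qwen3.5:35b") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # return a single JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Summarize what MLX is in one sentence."))
```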

Caching and Compression

Ollama 0.19 improves caching by reusing previously computed prompt cache across conversations, so context that has already been processed does not have to be recomputed.
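
The article does not spell out how the reuse works internally; the sketch below only illustrates the general idea that a new request sharing a prefix with an earlier conversation can skip reprocessing those shared tokens. The class and names here are hypothetical, not Ollama's actual internals.

```python
# Conceptual sketch of prefix-based cache reuse: if a new request shares a
# prefix with a previously processed prompt, the tokens in that prefix do not
# need to be processed again.
from typing import Dict, List, Tuple

class PrefixCache:
    def __init__(self) -> None:
        # Maps a processed token prefix (as a tuple) to its cached state.
        self._cache: Dict[Tuple[str, ...], object] = {}

    def longest_prefix(self, tokens: List[str]) -> int:
        """Return how many leading tokens are already covered by the cache."""
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._cache:
                return i
        return 0

    def store(self, tokens: List[str], state: object) -> None:
        self._cache[tuple(tokens)] = state

cache = PrefixCache()
conversation = ["system:", "You", "are", "helpful.", "user:", "Hi"]
cache.store(conversation, state="cached-state-after-6-tokens")

follow_up = conversation + ["assistant:", "Hello!", "user:", "Tell", "me", "more"]
reused = cache.longest_prefix(follow_up)
print(f"{reused} of {len(follow_up)} tokens reused from cache")  # 6 of 12
```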

The update also adds support for Nvidia’s NVFP4 format, a 4-bit floating-point scheme for compressing model weights.
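
NVFP4 stores weights as 4-bit values that share scale factors over small blocks. The sketch below is a simplified stand-in for that idea, using per-block scaling of 4-bit integer levels rather than Nvidia's exact encoding, and is only meant to show where the memory savings come from.

```python
# Simplified sketch of block-scaled 4-bit quantization, in the spirit of NVFP4
# (4-bit values with a shared scale per small block). Illustrative only; not
# Nvidia's exact format.
import numpy as np

BLOCK = 16  # values per block sharing one scale factor

def quantize_blockwise(weights: np.ndarray):
    """Quantize a 1-D float array to 4-bit signed levels with per-block scales."""
    padded = np.pad(weights, (0, -len(weights) % BLOCK))
    blocks = padded.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # map block max to +/-7
    scales[scales == 0] = 1.0
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).ravel()

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_blockwise(w)
err = np.abs(dequantize(q, s)[: w.size] - w).max()

fp16_bytes = w.size * 2
packed_bytes = q.size // 2 + s.size * 2  # 4 bits per value + fp16 scale per block
print(f"fp16: {fp16_bytes} B, 4-bit packed (est.): {packed_bytes} B, max error: {err:.4f}")
```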

Together, these enhancements aim to make running models locally more practical.

Hardware Requirements

The gains still depend on capable hardware: running larger models such as the 35-billion-parameter Qwen variant locally requires a Mac with a substantial amount of unified memory, which could limit accessibility for many users, as the rough estimate below suggests.
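
As a back-of-envelope illustration (weights only, ignoring the KV cache and runtime overhead), the arithmetic below shows why weight precision dominates the memory requirement:

```python
# Rough memory estimate for a 35-billion-parameter model at different weight
# precisions. Weights only; real usage adds KV cache and runtime overhead.
PARAMS = 35e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit (e.g. NVFP4)", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>20}: ~{gib:.0f} GiB of weights")
# fp16 -> ~65 GiB, 8-bit -> ~33 GiB, 4-bit -> ~16 GiB
```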

Ollama is working to support additional models in future updates.

Implications

Ollama exemplifies the growing viability of local AI model execution.

The update positions Apple Silicon Macs as a capable platform for local AI development.
