Microsoft Debuts MAI-Transcribe-1, Voice-1, and Image-2 AI Models

02 April 2026 | Technology and Science | 7 sources

Key Takeaways

  • Microsoft unveils three MAI foundation models covering speech transcription, voice generation, and image generation.
  • MAI-Transcribe-1 converts speech to text, MAI-Voice-1 generates speech audio, and MAI-Image-2 generates images.
  • In-house MAI stack aims to compete with OpenAI and Google.

Microsoft Launches Three AI Models

Microsoft positions the models as faster, more cost-effective alternatives to those from OpenAI and Google.

MAI-Transcribe-1 supports 25 languages and runs 2.5 times faster than Azure Fast.

Advanced Speech Recognition

MAI-Transcribe-1 achieved the lowest average word error rate (WER) on the FLEURS benchmark, at 3.8%.

It beats OpenAI's Whisper on all 25 languages tested.
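For context, word error rate is the word-level edit distance between a model's transcript and a reference transcript (substitutions S, deletions D, and insertions I) divided by the number of reference words N, so WER = (S + D + I) / N and lower is better. The sketch below is a minimal illustration of that standard definition, not Microsoft's evaluation code; the function name `word_error_rate` and the example sentences are hypothetical, and it skips the text normalization (casing, punctuation) that benchmark evaluations typically apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a 6-word reference gives 2 / 6.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 4))
```

A benchmark-level figure such as an average across languages would then be computed from per-language scores like this one; whether the reported 3.8% is a simple mean over languages is not stated in the source.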

Pricing starts at $0.36 per hour of audio for transcription and $22 per 1 million characters for voice generation.
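The two starting prices can be turned into a rough cost estimate. This is a back-of-the-envelope sketch that assumes simple linear billing at exactly those list rates; actual billing granularity, tiers, regional pricing, and minimums are not described in the source, and the function names are hypothetical.

```python
# Starting list prices as reported; assumed linear with no tiers or minimums.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0


def transcription_cost(audio_minutes: float) -> float:
    """Hypothetical estimate: minutes of audio -> hours -> USD."""
    return audio_minutes / 60 * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Hypothetical estimate: characters of input text -> millions -> USD."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


# 90 minutes of audio is 1.5 hours at $0.36/hour; 250,000 characters is 0.25M at $22/M.
print(f"transcription: ${transcription_cost(90):.2f}")
print(f"voice: ${voice_cost(250_000):.2f}")
```

At these rates, a 90-minute recording would cost about $0.54 to transcribe, and generating speech from 250,000 characters of text would cost about $5.50.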

Humanist AI Vision

Microsoft AI CEO Mustafa Suleyman framed the new models as part of a push to build Humanist AI.

The MAI Superintelligence team was formed in November 2025.

Microsoft plans to introduce more models across its products.