
Microsoft Launches 3 New Foundational AI Models

April 2, 2026 (Last updated: April 2, 2026)


Summary

– Microsoft AI released three new foundational AI models for generating text, voice, and images to build its own multimodal AI stack.
– The models are MAI-Transcribe-1 for speech transcription, MAI-Voice-1 for audio generation, and MAI-Image-2 for image generation.
– These models are now available on Microsoft Foundry, with the transcription and voice models also on MAI Playground.
– Microsoft positions the models as cheaper than comparable offerings from Google and OpenAI.
– Microsoft reaffirmed its ongoing partnership with OpenAI despite this push to develop its own competing models.

Microsoft’s AI research division unveiled three new foundational models this week, expanding its in-house multimodal capabilities. The release underscores the company’s ambition to build its own AI model stack while maintaining its partnership with OpenAI. The new offerings are specialized models for transcription, audio generation, and image generation, all positioned as cheaper than competing models from major rivals.

The first model, MAI-Transcribe-1, provides multilingual speech-to-text conversion across 25 languages. Microsoft claims it runs 2.5 times faster than its existing Azure Fast transcription service. For audio generation, MAI-Voice-1 can produce 60 seconds of synthesized speech in one second and supports the creation of custom voices. The third model, MAI-Image-2, generates images. It was first previewed on the MAI Playground testing platform in March and is now part of this broader launch.

All three models are now available on Microsoft Foundry, the company’s platform for AI tools. The transcription and voice models are also accessible through MAI Playground. These developments originated from the MAI Superintelligence team, a research group led by Microsoft AI CEO Mustafa Suleyman that was established in November 2025.

In a blog post, Suleyman outlined the philosophy behind the new models, describing a focus on “Humanist AI.” He emphasized a design approach that centers on human communication patterns and practical utility. “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences,” Suleyman wrote. The company also positions competitive pricing as a key advantage, stating its models are priced lower than comparable offerings from Google and OpenAI.

Pricing starts at $0.36 per hour of audio for MAI-Transcribe-1. The voice generation model, MAI-Voice-1, starts at $22 per million characters. For the image model, MAI-Image-2, pricing is $5 per million text input tokens and $33 per million image output tokens.
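To put those figures in context, the short Python sketch below estimates costs for a few workloads. It assumes the quoted starting prices apply as flat rates with no tiers, minimums, or discounts, which the article does not confirm. The function names and example workloads are hypothetical and are not part of any Microsoft API.

```python
# Starting prices quoted in the article, in USD.
# Assumption: treated as flat rates (no tiers, minimums, or discounts).
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0

MILLION = 1_000_000


def transcription_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a given number of characters."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost from text input and image output tokens."""
    input_part = input_tokens / MILLION * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = output_tokens / MILLION * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to illustrate the arithmetic.
    print(f"100 hours of audio transcribed: ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech:   ${voice_cost(500_000):.2f}")
    print(f"200k input + 1M output tokens:  ${image_cost(200_000, 1_000_000):.2f}")
```

Under these assumptions, 100 hours of transcription comes to about $36, half a million characters of generated speech to about $11, and an image workload of 200,000 input tokens plus one million output tokens to about $34.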

Despite this push into proprietary model development, Microsoft reaffirmed its ongoing commitment to OpenAI. Suleyman noted in an interview that a recent renegotiation of the partnership terms actually facilitated Microsoft’s internal superintelligence research efforts. The company has invested over $13 billion in OpenAI and integrates its models widely across Microsoft’s product ecosystem. This dual strategy mirrors its approach in other areas, such as semiconductor procurement, where it both develops its own chips and sources from external suppliers.

(Source: TechCrunch)
