
Microsoft Launches 3 New Foundational AI Models

Summary

– Microsoft AI released three new foundational models for speech transcription, voice generation, and image generation as it builds out its own multimodal AI stack.
– The models are MAI-Transcribe-1 for speech transcription, MAI-Voice-1 for audio generation, and MAI-Image-2 for image generation.
– These models are now available on Microsoft Foundry, with the transcription and voice models also on MAI Playground.
– A key selling point is price: Microsoft positions the models as cheaper than comparable offerings from Google and OpenAI.
– Microsoft reaffirmed its ongoing partnership with OpenAI despite this push to develop its own competing models.

Microsoft’s AI research division unveiled three new foundational models this week, significantly expanding its in-house multimodal capabilities. The release underscores the company’s ambition to build a comprehensive AI model stack while maintaining its pivotal partnership with OpenAI. The new offerings include specialized models for transcription, voice generation, and image generation, all designed to be more cost-effective than competing solutions from major rivals.

The first model, MAI-Transcribe-1, provides multilingual speech-to-text conversion across 25 languages; Microsoft claims it runs 2.5 times faster than its existing Azure Fast transcription service. For audio generation, MAI-Voice-1 can produce 60 seconds of synthesized speech in one second and supports the creation of custom voices. The third model, MAI-Image-2, is an image generation model. It was first previewed on the MAI Playground testing platform in March and is now part of this broader launch.

All three models are now available on Microsoft Foundry, the company’s platform for AI tools. The transcription and voice models are also accessible through MAI Playground. These developments originated from the MAI Superintelligence team, a research group led by Microsoft AI CEO Mustafa Suleyman that was established in November 2025.

In a blog post, Suleyman outlined the philosophy behind the new models, describing a focus on “Humanist AI.” He emphasized a design approach that centers on human communication patterns and practical utility. “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences,” Suleyman wrote. The company also positions competitive pricing as a key advantage, stating its models are priced lower than comparable offerings from Google and OpenAI.

Pricing for the new services starts at $0.36 per hour of audio for MAI-Transcribe-1. The voice generation model, MAI-Voice-1, starts at $22 per million characters processed. MAI-Image-2 costs $5 per million text input tokens and $33 per million image output tokens.
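Because the three models are billed in different units (audio hours, characters, and tokens), a quick calculation helps compare workloads. The sketch below is a back-of-the-envelope estimator, assuming simple linear billing at the starting list prices quoted above, with no volume tiers, regional differences, or minimum charges. The function names are hypothetical and are not part of any Microsoft SDK.

```python
# Starting list prices as quoted in the article (USD).
# Assumption: billing is linear per unit, with no tiers, discounts, or minimums.
TRANSCRIBE_PER_HOUR = 0.36        # MAI-Transcribe-1, per hour of audio
VOICE_PER_MILLION_CHARS = 22.00   # MAI-Voice-1, per 1M characters
IMAGE_INPUT_PER_MILLION = 5.00    # MAI-Image-2, per 1M text input tokens
IMAGE_OUTPUT_PER_MILLION = 33.00  # MAI-Image-2, per 1M image output tokens


def transcription_cost(audio_hours: float) -> float:
    """Hypothetical helper: estimated MAI-Transcribe-1 cost for some hours of audio."""
    return audio_hours * TRANSCRIBE_PER_HOUR


def voice_cost(characters: int) -> float:
    """Hypothetical helper: estimated MAI-Voice-1 cost for some number of characters."""
    return characters / 1_000_000 * VOICE_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated MAI-Image-2 cost from input and output token counts."""
    return (input_tokens / 1_000_000 * IMAGE_INPUT_PER_MILLION
            + output_tokens / 1_000_000 * IMAGE_OUTPUT_PER_MILLION)


if __name__ == "__main__":
    # Example workloads; the token counts are illustrative, not measured.
    print(f"100 hours of audio:            ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech:  ${voice_cost(500_000):.2f}")
    print(f"200k input / 1M output tokens: ${image_cost(200_000, 1_000_000):.2f}")
```

For example, transcribing 100 hours of audio at the starting rate works out to $36, while synthesizing 500,000 characters of speech comes to $11. Real invoices will depend on Microsoft Foundry's actual billing rules.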

Despite this push into proprietary model development, Microsoft reaffirmed its ongoing commitment to OpenAI. Suleyman noted in an interview that a recent renegotiation of the partnership terms actually made Microsoft’s internal superintelligence research possible. The company has invested more than $13 billion in OpenAI and integrates OpenAI’s models widely across its product ecosystem. This dual strategy mirrors its approach in other areas, such as semiconductors, where it both develops its own chips and buys from external suppliers.

(Source: TechCrunch)
