Topic: AI inference
-
AI Ignites the Cloud-Native Computing Boom
The rapid expansion of artificial intelligence is driving massive growth in cloud-native computing, with AI inference workloads moving from isolated training into widespread enterprise use, creating unprecedented demand for scalable infrastructure. A new generation of cloud-native inference engin...
-
Modal Labs in Talks for $2.5B Valuation Funding Round
Modal Labs is reportedly negotiating a major funding round that would value the AI infrastructure company at roughly $2.5 billion, a dramatic increase from its valuation five months ago. The company specializes in optimizing AI inference to reduce computing costs and latency, placing it in a high...
-
Microsoft Unveils New AI Chip for Faster Inference
Microsoft has launched the Maia 200, a custom AI chip designed to efficiently run AI inference workloads, addressing the high computational demands of using trained models. The chip offers a major performance leap with over 100 billion transistors, delivering over 10 petaflops in 4-bit precision ...
-
Phison's aiDAPTIV+ Boosts PC AI Speed 10X, Expands Model Size 3X
Phison's aiDAPTIV+ technology dramatically accelerates AI on consumer PCs by using high-speed SSD storage as an extension of GPU memory, eliminating redundant computations and enabling up to 10x faster inference. This innovation allows systems with modest hardware, like entry-level graphics and l...
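The idea of treating SSD storage as an overflow tier for scarce GPU memory can be illustrated with a toy two-tier cache. This is only a sketch of the general tiering concept, not Phison's design; the class name, capacities, and eviction policy are all illustrative assumptions.

```python
from collections import OrderedDict

class TieredCache:
    """Toy two-tier store: a small fast tier (stand-in for GPU memory)
    evicts least-recently-used entries to a large slow tier (stand-in
    for an SSD), and promotes entries back on access."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # small, fast tier ("GPU memory")
        self.slow = {}              # large, slow tier ("SSD")
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)              # mark as most recently used
        while len(self.fast) > self.fast_capacity:
            old_key, old_val = self.fast.popitem(last=False)
            self.slow[old_key] = old_val        # spill LRU entry to slow tier

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)              # promote from slow tier on access
        self.put(key, value)
        return value
```

The useful property is that total capacity is the sum of both tiers, while hot data stays in the fast tier; the cost is the promotion latency on a fast-tier miss, which is what the article's "Time to First Token" concern is about.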
-
Elon Musk Predicts the Future of AI Gadgets: 'Devices Will Just Be...'
Elon Musk envisions a future where AI processing shifts to edge computing, with devices handling tasks locally rather than relying on distant servers, potentially making traditional operating systems obsolete. He predicts that everyday electronics like smartphones and laptops will become "edge no...
-
Nvidia DGX Spark Update Slashes Idle Power by 32%
Nvidia's DGX Spark, a compact AI development platform, received a software update that significantly reduces its idle power consumption, enhancing its value for energy-sensitive deployments. The update enables hot-plug detection for the ConnectX 7 NIC, allowing it to enter a low-power state, whic...
-
OpenAI's $10B Deal with Cerebras for AI Compute Power
OpenAI has entered a multi-year, multi-billion dollar partnership with chipmaker Cerebras to secure 750 megawatts of dedicated compute capacity, aiming to power next-generation AI applications. The deal aims to dramatically improve AI performance by using Cerebras's specialized systems to acceler...
-
Clarifai's New AI Engine Boosts Speed, Cuts Costs
Clarifai has launched a reasoning engine that doubles AI processing speeds and cuts operational costs by up to 40%, offering a hardware-agnostic solution for businesses. The engine uses advanced optimizations like CUDA kernel enhancements and speculative decoding to boost performance on existing ...
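Speculative decoding, one of the optimizations the announcement names, can be shown with a toy model pair. Everything here (the texts and the "models") is a made-up stand-in, not Clarifai's implementation: a cheap draft model proposes a block of tokens, and the expensive target model verifies the whole block, accepting the longest correct prefix.

```python
TARGET_TEXT = "the quick brown fox"   # what the expensive model would generate
DRAFT_TEXT  = "the quick brown cat"   # the cheap model's slightly wrong guess

def target_next(pos):
    # Stand-in for one expensive target-model step.
    return TARGET_TEXT[pos]

def draft_guess(pos, k):
    # Stand-in for k cheap draft-model steps.
    return list(DRAFT_TEXT[pos:pos + k])

def speculative_decode(n, k=4):
    out = []
    target_passes = 0
    while len(out) < n:
        draft = draft_guess(len(out), k)
        target_passes += 1              # one batched verification pass per block
        for t in draft:
            truth = target_next(len(out))
            out.append(truth)           # target's token is always kept
            if t != truth:
                break                   # reject the rest of the draft block
    return "".join(out)[:n], target_passes
```

On this toy input, generating 19 tokens takes 7 target passes instead of 19 sequential steps; in a real engine each verification pass is a single batched forward pass over the draft block, which is where the latency win comes from.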
-
Intel Unveils 160GB Energy-Efficient Inference GPU in New Annual Release
Intel has launched the Crescent Island data center GPU, a 160GB model optimized for energy-efficient AI inference, marking the start of an annual GPU release schedule to compete in the AI infrastructure market. The GPU, built with Xe3P microarchitecture, is designed for high performance-per-watt ...
-
Mistral's AI Environmental Audit Reveals Planet Impact
Mistral's environmental audit of its Large 2 AI model revealed that training and running queries account for over 85% of emissions and 91% of water use, highlighting key areas for mitigation. While individual AI interactions have modest impacts (1.14g CO₂ per text page), cumulative effects are su...
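The gap between modest per-query impact and substantial cumulative impact is simple arithmetic. The per-page figure below is from the audit; the daily query volume is a purely hypothetical assumption chosen to show the scaling.

```python
G_PER_PAGE = 1.14            # g CO2 per generated text page, per Mistral's audit
pages_per_day = 10_000_000   # hypothetical fleet-wide daily volume

# grams/day -> tonnes/year (1 tonne = 1e6 g)
tonnes_per_year = G_PER_PAGE * pages_per_day * 365 / 1e6
print(round(tonnes_per_year))  # ~4161 tonnes of CO2 per year at this volume
```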
-
Phison CEO on 244TB SSDs, PLC NAND, and the Problem with High Bandwidth Flash
The primary bottleneck for deploying advanced AI is insufficient memory, not processing power, which can cause system crashes and degrade user experience by creating long delays like a slow Time to First Token. Phison's aiDAPTIV+ technology addresses this by using high-capacity SSDs as an expande...
-
OpenAI's Microsoft Payments Revealed in Leaked Docs
Microsoft received significant revenue share payments from OpenAI, totaling $493.8 million in 2024 and $865.8 million in the first three quarters of 2025, under a 20% revenue-sharing agreement from their $13 billion investment. OpenAI's revenue has grown substantially, with estimates of at least ...
-
Cerebras Raises $1.1B While Still Private a Year After IPO Filing
Cerebras Systems raised $1.1 billion in Series G funding, reaching an $8.1 billion valuation, co-led by Fidelity and Atreides Management, despite previous IPO plans for 2025. The company's growth is driven by strong demand for its AI inference services launched in 2024, leading to workforce expan...
-
Microsoft to Keep Buying Nvidia, AMD AI Chips Despite In-House Designs
Microsoft has deployed its first custom AI accelerator, the Maia 200, designed as an "AI inference powerhouse" for running trained models, with performance reportedly surpassing rival chips. The company's strategy involves parallel innovation and partnership, as CEO Satya Nadella emphasized conti...
-
Qualcomm Repurposes Phone Chips to Challenge Nvidia in AI
Qualcomm is launching two new AI processors, the AI200 and AI250, to directly compete with Nvidia by leveraging its mobile neural processing architecture. The chips are specialized for AI inference tasks, using technology from Qualcomm's Hexagon NPUs to run pre-trained models efficiently and scal...
-
Positron Secures $230M to Challenge Nvidia in AI Chip Race
A Reno-based semiconductor startup, Positron, raised $230 million in Series B funding to accelerate its challenge to Nvidia by deploying specialized, high-speed memory chips for AI hardware. The funding round included Qatar's sovereign wealth fund, aligning with Qatar's strategic ambition to beco...
-
Cloudflare Buys Replicate to Power Global Serverless AI
Cloudflare's acquisition of Replicate integrates a vast AI model library into its Workers platform, enabling developers to deploy sophisticated AI applications globally with minimal code. The move addresses the complexity and high costs of managing AI infrastructure, such as specialized GPU hardw...
-
XCENA's MX1 Memory: Thousands of RISC-V Cores, CXL 3.2 & SSD Tiering
XCENA unveiled the MX1 computational memory platform at FMS 2025, which uses near-data processing and PCIe Gen6/CXL 3.2 to reduce latency and energy by placing compute resources directly alongside DRAM. The MX1 incorporates thousands of custom RISC-V cores for demanding workloads like AI and anal...
-
Microsoft's 132-Core Azure Cobalt 200 CPU Targets Performance Boost
Microsoft has launched the Azure Cobalt 200, a 132-core Arm-based CPU built on TSMC's 3nm process, designed to enhance performance and efficiency for its cloud services. The processor offers over 50% higher performance than its predecessor, integrates hardware accelerators for compression and cry...
-
The Brutal Economics of Orbital AI
The vision for orbital AI data centers is driven by the belief that space will soon be the cheapest location for AI compute, with major companies racing to develop prototypes despite significant technical and economic hurdles. The primary economic barriers are the immense costs of launch and sate...
-
Amazon Unveils New AI Chip, Hints at Nvidia Partnership
AWS launched its new Trainium3 AI training chip and UltraServer system, promising major performance gains and a focus on improved energy efficiency for AI workloads. The Trainium3 chip offers over four times the speed and memory of its predecessor, with systems scalable to clusters of up to 1 mil...
-
AMD to power OpenAI with $10B+ chip deal for 6GW compute
AMD has entered a multi-year agreement to supply OpenAI with six gigawatts of compute capacity using its Instinct GPU accelerators, starting with the MI450 series in late 2026. As part of the deal, OpenAI received an option to purchase up to 160 million AMD shares, vesting with compute delivery a...
-
OpenAI Pauses ChatGPT's Model Router for Most Users
OpenAI has removed the automated model router for its free and $5 Go tier users, reverting them to the default GPT-5.2 Instant model to reduce operational costs and address user retention metrics. The router, which automatically directed complex queries to advanced reasoning models, led to a sign...
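A model router of the kind described can be sketched as a simple dispatch function. OpenAI has not published its routing logic, so the model names and the complexity heuristic below are illustrative assumptions only.

```python
# Keyword and length heuristics standing in for a real complexity classifier.
REASONING_HINTS = ("prove", "derive", "step by step", "debug", "why")

def route(prompt: str) -> str:
    """Send 'complex' prompts to a slow reasoning tier, the rest to the
    fast default tier. Both tier names are hypothetical."""
    p = prompt.lower()
    if len(p.split()) > 40 or any(hint in p for hint in REASONING_HINTS):
        return "reasoning-model"      # expensive tier
    return "fast-instant-model"       # cheap default tier
```

The cost tension in the article falls out of this structure: any heuristic that routes more traffic to the expensive tier raises quality for some users but also raises per-query compute cost, which is the trade-off the rollback addresses.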
-
China Approves Import of Nvidia's High-End AI Chips
China has approved three major tech firms, ByteDance, Alibaba, and Tencent, to import over 400,000 of Nvidia's advanced H200 AI chips, marking a shift from a previous weeks-long suspension of these shipments. The H200 chip represents a significant performance leap, being roughly six times more capa...