Topic: AI inference

  • AI Ignites the Cloud-Native Computing Boom

    The rapid expansion of artificial intelligence is driving massive growth in cloud-native computing, with AI inference workloads moving from isolated training into widespread enterprise use, creating unprecedented demand for scalable infrastructure. A new generation of cloud-native inference engin...

  • Modal Labs in Talks for $2.5B Valuation Funding Round

    Modal Labs is reportedly negotiating a major funding round that would value the AI infrastructure company at roughly $2.5 billion, a dramatic increase from its valuation five months ago. The company specializes in optimizing AI inference to reduce computing costs and latency, placing it in a high...

  • Microsoft Unveils New AI Chip for Faster Inference

    Microsoft has launched the Maia 200, a custom AI chip designed to efficiently run AI inference workloads, addressing the high computational demands of using trained models. The chip offers a major performance leap with over 100 billion transistors, delivering over 10 petaflops in 4-bit precision ...

  • Phison's aiDAPTIV+ Boosts PC AI Speed 10X, Expands Model Size 3X

    Phison's aiDAPTIV+ technology dramatically accelerates AI on consumer PCs by using high-speed SSD storage as an extension of GPU memory, eliminating redundant computations and enabling up to 10x faster inference. This innovation allows systems with modest hardware, like entry-level graphics and l...

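    The tiering idea behind aiDAPTIV+ — keeping hot data in a small fast tier and spilling cold data to larger, slower storage — can be sketched generically. This is an illustrative toy (the `TieredStore` class and its LRU policy are assumptions for the sketch, not Phison's actual design):

```python
# Illustrative two-tier store: a small fast tier stands in for GPU
# memory, a large slow tier stands in for SSD. Least-recently-used
# entries spill down when the fast tier fills; reads promote them back.
from collections import OrderedDict

class TieredStore:
    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # small, fast tier ("GPU memory")
        self.slow = {}              # large, slow tier ("SSD")
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)            # mark as most recent
        while len(self.fast) > self.fast_capacity:
            old_key, old_val = self.fast.popitem(last=False)  # evict LRU
            self.slow[old_key] = old_val      # spill to the slow tier

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)        # refresh recency
            return self.fast[key]
        value = self.slow.pop(key)            # "page in" from the slow tier
        self.put(key, value)                  # promote; may evict another entry
        return value

store = TieredStore(fast_capacity=2)
for i in range(4):
    store.put(i, i * 10)      # keys 0 and 1 spill to the slow tier
assert store.get(0) == 0      # fetched from the slow tier and promoted
```

    The real product does this at the driver/firmware level with model weights and activations, but the access pattern — serve from the fast tier when possible, fault in from flash otherwise — is the same shape.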
  • Elon Musk Predicts the Future of AI Gadgets: 'Devices Will Just Be...'

    Elon Musk envisions a future where AI processing shifts to edge computing, with devices handling tasks locally rather than relying on distant servers, potentially making traditional operating systems obsolete. He predicts that everyday electronics like smartphones and laptops will become "edge no...

  • Reface and Prisma Founders Launch Mirai for On-Device AI

    Mirai, a startup founded by the creators of Reface and Prisma, is developing technology to make AI models run faster and more efficiently on personal devices like smartphones, shifting toward on-device AI to reduce costs and improve privacy. The company's core product is a framework, includin...

  • Nvidia DGX Spark Update Slashes Idle Power by 32%

    Nvidia's DGX Spark, a compact AI development platform, received a software update that significantly reduces its idle power consumption, enhancing its value for energy-sensitive deployments. The update enables hot-plug detection for the ConnectX 7 NIC, allowing it to enter a low-power state, whic...

  • OpenAI's $10B Deal with Cerebras for AI Compute Power

    OpenAI has entered a multi-year, multi-billion dollar partnership with chipmaker Cerebras to secure 750 megawatts of dedicated compute capacity, aiming to power next-generation AI applications. The deal is intended to dramatically improve AI performance by using Cerebras's specialized systems to acceler...

  • Clarifai's New AI Engine Boosts Speed, Cuts Costs

    Clarifai has launched a reasoning engine that doubles AI processing speeds and cuts operational costs by up to 40%, offering a hardware-agnostic solution for businesses. The engine uses advanced optimizations like CUDA kernel enhancements and speculative decoding to boost performance on existing ...

  • Intel Unveils 160GB Energy-Efficient Inference GPU in New Annual Release

    Intel has launched the Crescent Island data center GPU, a 160GB model optimized for energy-efficient AI inference, marking the start of an annual GPU release schedule to compete in the AI infrastructure market. The GPU, built with Xe3P microarchitecture, is designed for high performance-per-watt ...

  • Mistral's AI Environmental Audit Reveals Planet Impact

    Mistral's environmental audit of its Large 2 AI model revealed that training and running queries account for over 85% of emissions and 91% of water use, highlighting key areas for mitigation. While individual AI interactions have modest impacts (1.14g CO₂ per text page), cumulative effects are su...

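    The "modest per-query, substantial in aggregate" point scales linearly from the reported 1.14 g CO₂ per text page. A back-of-envelope calculation (the daily volume below is a made-up example, not a Mistral figure):

```python
# Scale Mistral's reported per-page figure to a sustained query volume.
GRAMS_PER_PAGE = 1.14  # g CO2 per page of generated text (Mistral audit)

def annual_emissions_tonnes(pages_per_day, days=365):
    """Total CO2 in metric tonnes for a sustained daily volume."""
    grams = GRAMS_PER_PAGE * pages_per_day * days
    return grams / 1_000_000  # 1 tonne = 1e6 g

# A hypothetical 100 million pages/day for a year:
print(round(annual_emissions_tonnes(100_000_000)))  # → 41610 tonnes
```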
  • Watch Nvidia GTC 2026: Jensen Huang's Keynote Livestream

    Nvidia's GTC conference keynote, led by CEO Jensen Huang, is the primary venue for announcing major new products and partnerships, focusing this year on the future of AI and computing. The event will highlight AI's impact across sectors like healthcare and robotics, with anticipated announcements...

  • Phison CEO on 244TB SSDs, PLC NAND, and the Problem with High Bandwidth Flash

    The primary bottleneck for deploying advanced AI is insufficient memory, not processing power; memory shortfalls can cause system crashes and degrade the user experience with long delays, such as a slow Time to First Token. Phison's aiDAPTIV+ technology addresses this by using high-capacity SSDs as an expande...

  • OpenAI's Microsoft Payments Revealed in Leaked Docs

    Microsoft received significant revenue share payments from OpenAI, totaling $493.8 million in 2024 and $865.8 million in the first three quarters of 2025, under a 20% revenue-sharing agreement from their $13 billion investment. OpenAI's revenue has grown substantially, with estimates of at least ...

  • Cerebras Raises $1.1B While Still Private a Year After IPO Filing

    Cerebras Systems raised $1.1 billion in Series G funding, reaching an $8.1 billion valuation, co-led by Fidelity and Atreides Management, despite previous IPO plans for 2025. The company's growth is driven by strong demand for its AI inference services launched in 2024, leading to workforce expan...

  • Nvidia and Meta Forge New Era of Computing Power

    Nvidia is expanding its AI strategy beyond GPU-based training to include inference and agentic software, highlighted by a major deal with Meta involving billions in hardware, including its Grace CPU. The partnership marks a strategic shift as agentic AI workloads, which require more logical decis...

  • Mistral AI Acquires Koyeb to Boost Cloud Strategy

    Mistral AI has acquired the Paris-based startup Koyeb, marking its first acquisition to strengthen its position in the AI market. The deal signals a strategic shift for Mistral from pure research to becoming a full-stack AI provider, accelerating its Mistral Compute cloud service. Koyeb's technol...

  • Microsoft to Keep Buying Nvidia, AMD AI Chips Despite In-House Designs

    Microsoft has deployed its first custom AI accelerator, the Maia 200, designed as an "AI inference powerhouse" for running trained models, with performance reportedly surpassing rival chips. The company's strategy involves parallel innovation and partnership, as CEO Satya Nadella emphasized conti...

  • Qualcomm Repurposes Phone Chips to Challenge Nvidia in AI

    Qualcomm is launching two new AI processors, the AI200 and AI250, to directly compete with Nvidia by leveraging its mobile neural processing architecture. The chips are specialized for AI inference tasks, using technology from Qualcomm's Hexagon NPUs to run pre-trained models efficiently and scal...

  • Positron Secures $230M to Challenge Nvidia in AI Chip Race

    A Reno-based semiconductor startup, Positron, raised $230 million in Series B funding to accelerate its challenge to Nvidia by deploying specialized, high-speed memory chips for AI hardware. The funding round included Qatar's sovereign wealth fund, aligning with Qatar's strategic ambition to beco...

  • Cloudflare Buys Replicate to Power Global Serverless AI

    Cloudflare's acquisition of Replicate integrates a vast AI model library into its Workers platform, enabling developers to deploy sophisticated AI applications globally with minimal code. The move addresses the complexity and high costs of managing AI infrastructure, such as specialized GPU hardw...

  • XCENA's MX1 Memory: Thousands of RISC-V Cores, CXL 3.2 & SSD Tiering

    XCENA unveiled the MX1 computational memory platform at FMS 2025, which uses near-data processing and PCIe Gen6/CXL 3.2 to reduce latency and energy by placing compute resources directly alongside DRAM. The MX1 incorporates thousands of custom RISC-V cores for demanding workloads like AI and anal...

  • Microsoft's 132-Core Azure Cobalt 200 CPU Targets Performance Boost

    Microsoft has launched the Azure Cobalt 200, a 132-core Arm-based CPU built on TSMC's 3nm process, designed to enhance performance and efficiency for its cloud services. The processor offers over 50% higher performance than its predecessor, integrates hardware accelerators for compression and cry...

  • Meta's $100B AMD Deal Powers AI 'Superintelligence' Push

    Meta has entered a multiyear agreement to purchase up to $100 billion worth of AMD processors, a strategic move to diversify its chip supply and fuel its AI data centers. The deal includes a performance-based warrant for Meta to acquire up to 10% of AMD stock, with a final tranche contingent on A...

  • The Brutal Economics of Orbital AI

    The vision for orbital AI data centers is driven by the belief that space will soon be the cheapest location for AI compute, with major companies racing to develop prototypes despite significant technical and economic hurdles. The primary economic barriers are the immense costs of launch and sate...

  • Amazon Unveils New AI Chip, Hints at Nvidia Partnership

    AWS launched its new Trainium3 AI training chip and UltraServer system, promising major performance gains and a focus on improved energy efficiency for AI workloads. The Trainium3 chip offers over four times the speed and memory of its predecessor, with systems scalable to clusters of up to 1 mil...

  • AMD to power OpenAI with $10B+ chip deal for 6GW compute

    AMD has entered a multi-year agreement to supply OpenAI with six gigawatts of compute capacity using its Instinct GPU accelerators, starting with the MI450 series in late 2026. As part of the deal, OpenAI received an option to purchase up to 160 million AMD shares, vesting with compute delivery a...

  • OpenAI Pauses ChatGPT's Model Router for Most Users

    OpenAI has removed the automated model router for its free and $5 Go tier users, reverting them to the default GPT-5.2 Instant model to reduce operational costs and address user retention metrics. The router, which automatically directed complex queries to advanced reasoning models, led to a sign...

  • China Approves Import of Nvidia's High-End AI Chips

    China has approved three major tech firms, ByteDance, Alibaba, and Tencent, to import over 400,000 of Nvidia's advanced H200 AI chips, marking a shift from a previous weeks-long suspension of these shipments. The H200 chip represents a significant performance leap, being roughly six times more capa...
