OpenAI's GDPval benchmark evaluates AI performance against human professionals in key economic sectors, showing models like GPT-5 and Claude Opus…
ai benchmarking
Google launched Gemini 2.5 Deep Think, its most advanced AI reasoning model, capable of solving complex problems by evaluating multiple…
But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday,…
Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing…
LM Arena raised $100 million in seed funding at a $600 million valuation, led by Andreessen Horowitz and UC Investments,…
Harvey, a $3B legal AI startup, is expanding its partnerships to include AI models from Anthropic and Google, moving beyond…