Reports

OpenAI's o3 Model Underperforms as Microsoft Reveals AI Ad

By Agentive Studio

April 29, 2025

It's Wednesday, April 28, 2025, and you're reading the Agentive Daily Report.

Busy People Section

OpenAI's released o3 model significantly underperforms its preview version on advanced reasoning benchmarks

Microsoft secretly used AI to create a Surface ad that seamlessly blended with real footage, going unnoticed for months

Australian radio station CADA aired a four-hour AI-hosted show for months without disclosing the host wasn't human

DeepSeek is launching its R2 model, reportedly 97% cheaper than GPT-4 and fully trained on Huawei chips

Google discovered and fixed a critical vulnerability in Gemini's PDF uploads that allowed crafted PDFs to extract data

xAI Holdings is seeking $20 billion in funding, which would value the company at over $120 billion

Today's Top Stories

OpenAI's Released o3 Model Dramatically Underperforms Preview Version

OpenAI's official o3 model has fallen significantly short of expectations in recent evaluations, particularly in reasoning tests. According to the ARC Prize Foundation, the released version scored only 41% on low compute and 53% on medium compute for the ARC-AGI-1 benchmark, far behind the impressive 76% and 88% results demonstrated by the December 2024 preview version.

This performance gap appears to stem from architectural changes, with the new multimodal o3 optimised for chat and product use rather than the advanced reasoning showcased in the preview. Despite outperforming earlier models like o1, this dramatic regression highlights how commercial demands and production constraints can diminish a model's specialised capabilities. The situation raises important questions about the tradeoffs between commercial viability and the pursuit of more advanced reasoning systems.

Microsoft Secretly Used AI to Create a Surface Ad That Fooled Everyone

Microsoft revealed that a one-minute Surface device advertisement released in January on YouTube was partially created using generative AI tools, with virtually no one noticing until the company disclosed this fact in a recent blog post. The ad combined real footage with AI-generated content, using AI for scripting, storyboarding, and some scene creation while reserving real footage for close-ups requiring precise movements.

This represents a significant milestone in the evolution of AI-generated media, demonstrating that AI can now produce content indistinguishable from professionally shot footage when strategically combined with real elements. The revelation points to a future where creative work increasingly involves the skilled editing and direction of AI-generated content rather than replacing human creativity entirely – a powerful demonstration of how imperceptibly AI has already entered commercial production pipelines.

Australian Radio Station Secretly Aired AI Host For Months

CADA Radio in Sydney has been running a four-hour daily music show hosted by an AI personality called "Thy" since November 2024, gaining at least 72,000 listeners who had no idea they were listening to an artificial host. The voice was created using ElevenLabs technology and modeled after a real employee, but the station never disclosed to its audience that the host wasn't human.

This case raises serious ethical concerns about transparency in media. Listeners developed a relationship with a host they believed was a real person, violating the implicit trust between media outlets and their audience. As AI becomes increasingly capable of mimicking human interaction, this incident highlights the urgent need for disclosure standards and transparency requirements before such practices become normalised across the industry.

Other Developments Worth Noting

DeepSeek's Ultra-Affordable R2 Model: DeepSeek is launching its R2 model, reportedly 97% cheaper than GPT-4 and fully trained on Huawei chips with complete vertical integration, potentially disrupting the AI pricing landscape significantly.
Google's Gemini Vulnerability Fix: Google addressed a critical vulnerability in Gemini that allowed specially crafted PDFs to extract data when uploaded, including cached data from previous conversations, highlighting the ongoing security challenges in AI systems.
Voice AI in Healthcare: Synthflow's voice AI helped healthcare provider Medbelle increase answered calls by 60%, double qualified appointments, and reduce no-show rates by 30%, demonstrating real-world benefits of conversational AI in administrative healthcare processes.
Anthropic's AI Consciousness Research: Anthropic researchers studying AI consciousness found deep uncertainty among experts, who estimate current AI models are 0.15% to 15% likely to be conscious, concluding this likelihood will increase as models advance and suggesting preparation for the moral implications of AI welfare.
DeepMind's Lyria 2 Music Generation: Google DeepMind launched Lyria 2, their most controllable AI text-to-music interface yet, advancing the frontier of creative AI capabilities in the audio domain.

New Tools Discovered

Magnitude: An AI-native testing framework for web apps that uses visual AI agents to adapt to UI changes, allowing developers to build test cases with natural language.
Rabbit internOS: A platform that transforms simple requests into functional web tools like math games or interactive quizzes using just a natural language prompt.
PageOn.AI 2.0: A tool for creating rich, interactive visual communications that goes beyond traditional slides, integrating text, images, charts, and 3D models.
RightNow AI: A specialised tool for CUDA engineers that automatically profiles, detects bottlenecks, and optimises GPU kernels without requiring deep expertise.
AI Presentation Narrator: A free tool that automatically generates AI voiceovers for presentations, eliminating the need to record your own narration.

Discover more tools at Agentive.Directory

That's a wrap for today! Thank you for reading this report.

Have thoughts on today's edition? Hit reply and let us know what you're thinking. Or if you've discovered a cool AI tool we should feature, drop us a line.

Until tomorrow,
Hak from Agentive.Studio

Reports

OpenAI's o3 Model Underperforms as Microsoft Reveals AI Ad

Busy People Section

Today's Top Stories

OpenAI's Released o3 Model Dramatically Underperforms Preview Version

Microsoft Secretly Used AI to Create a Surface Ad That Fooled Everyone

Australian Radio Station Secretly Aired AI Host For Months

Other Developments Worth Noting

New Tools Discovered

Read Next

China's AI Giants Unveil Groundbreaking Models Amidst Tension

OpenAI Warns: ChatGPT Chats Could Be Subpoenaed

US AI Action Plan Unveils Major Infrastructure Overhaul

OpenAI Unveils Modular GPT-5 with Memory Upgrades