OpenAI's o3 Model Underperforms as Microsoft Reveals AI Ad

It's Wednesday, April 28, 2025, and you're reading the Agentive Daily Report.
Busy People Section
Today's Top Stories
OpenAI's Released o3 Model Dramatically Underperforms Preview Version
OpenAI's official o3 model has fallen significantly short of expectations in recent evaluations, particularly in reasoning tests. According to the ARC Prize Foundation, the released version scored only 41% at low compute and 53% at medium compute on the ARC-AGI-1 benchmark, far behind the 76% and 88% results demonstrated by the December 2024 preview version.
This performance gap appears to stem from architectural changes, with the new multimodal o3 optimised for chat and product use rather than the advanced reasoning showcased in the preview. Although the released model still outperforms earlier models like o1, the regression highlights how commercial demands and production constraints can diminish a model's specialised capabilities. The situation raises important questions about the tradeoffs between commercial viability and the pursuit of more advanced reasoning systems.
Microsoft Secretly Used AI to Create a Surface Ad That Fooled Everyone
Microsoft revealed that a one-minute Surface device advertisement released in January on YouTube was partially created using generative AI tools, with virtually no one noticing until the company disclosed this fact in a recent blog post. The ad combined real footage with AI-generated content, using AI for scripting, storyboarding, and some scene creation while reserving real footage for close-ups requiring precise movements.
This represents a significant milestone in the evolution of AI-generated media, demonstrating that AI can now produce content indistinguishable from professionally shot footage when strategically combined with real elements. The revelation points to a future where creative work increasingly involves the skilled editing and direction of AI-generated content, with AI assisting human creativity rather than replacing it entirely. It is also a powerful demonstration of how imperceptibly AI has already entered commercial production pipelines.
Australian Radio Station Secretly Aired AI Host For Months
CADA Radio in Sydney has been running a four-hour daily music show hosted by an AI personality called "Thy" since November 2024, gaining at least 72,000 listeners who had no idea they were listening to an artificial host. The voice was created using ElevenLabs technology and modelled after a real employee, but the station never disclosed to its audience that the host wasn't human.
This case raises serious ethical concerns about transparency in media. Listeners developed a relationship with a host they believed was a real person, and by staying silent the station violated the implicit trust between media outlets and their audience. As AI becomes increasingly capable of mimicking human interaction, this incident highlights the urgent need for disclosure standards and transparency requirements before such practices become normalised across the industry.
Other Developments Worth Noting
- DeepSeek's Ultra-Affordable R2 Model: DeepSeek is launching its R2 model, reportedly 97% cheaper than GPT-4 and trained entirely on Huawei chips with complete vertical integration, which could significantly disrupt AI pricing.
- Google's Gemini Vulnerability Fix: Google addressed a critical vulnerability in Gemini that allowed specially crafted PDFs to extract data when uploaded, including cached data from previous conversations, highlighting the ongoing security challenges in AI systems.
- Voice AI in Healthcare: Synthflow's voice AI helped healthcare provider Medbelle increase answered calls by 60%, double qualified appointments, and reduce no-show rates by 30%, demonstrating real-world benefits of conversational AI in administrative healthcare processes.
- Anthropic's AI Consciousness Research: Anthropic researchers studying AI consciousness found deep uncertainty among experts, who estimate current AI models are 0.15% to 15% likely to be conscious. The researchers conclude that this likelihood will increase as models advance and suggest preparing for the moral implications of AI welfare.
- DeepMind's Lyria 2 Music Generation: Google DeepMind launched Lyria 2, their most controllable AI text-to-music interface yet, advancing the frontier of creative AI capabilities in the audio domain.
New Tools Discovered
- Magnitude: An AI-native testing framework for web apps that uses visual AI agents to adapt to UI changes, allowing developers to build test cases with natural language.
- Rabbit internOS: A platform that transforms simple requests into functional web tools like math games or interactive quizzes using just a natural language prompt.
- PageOn.AI 2.0: A tool for creating rich, interactive visual communications that goes beyond traditional slides, integrating text, images, charts, and 3D models.
- RightNow AI: A specialised tool for CUDA engineers that automatically profiles, detects bottlenecks, and optimises GPU kernels without requiring deep expertise.
- AI Presentation Narrator: A free tool that automatically generates AI voiceovers for presentations, eliminating the need to record your own narration.
Discover more tools at Agentive.Directory
That's a wrap for today! Thank you for reading this report.
Have thoughts on today's edition? Hit reply and let us know what you're thinking. Or if you've discovered a cool AI tool we should feature, drop us a line.
Until tomorrow,
Hak from Agentive.Studio