X.AI Launches Grok 3 and Cost-Effective Mini Version

Good morning
It's Monday, April 21, 2025, and you're reading the Agentive Daily Report, where we cut through the noise of the AI sphere to bring you what actually matters. Let's dive into what's caught our eyes the most today.
TL;DR for busy people
- X.AI released Grok 3 and Grok 3-mini via API, with the mini version offering competitive performance at a fraction of the cost
- Google's Gemma 3 QAT models dramatically reduce VRAM requirements from 54GB to 14.1GB while maintaining performance
- A new "VideoGameBench" benchmark testing LLMs on DOS and Game Boy games like DOOM II reveals current limitations in real-time visual reasoning
- Seedream 3.0 has tied with GPT-4o for top position in Artificial Analysis image generation rankings
- OpenAI introduced "Flex" pricing at 50% reduced cost for non-time-sensitive API usage
Today's Top Stories
Grok 3 & 3-mini Join the API Race with Competitive Pricing
X.AI has released Grok 3 and Grok 3-mini through its API, making both models available for developer integration. At 50 cents per million output tokens, Grok 3-mini is positioned as a cost-effective alternative to larger models while claiming competitive performance. The mini variant appears particularly strong at tool use, though some users report it can be overly aggressive in calling tools.
This release marks X.AI's strategic push to compete in the developer ecosystem, with Grok 3-mini offering a pricing advantage at approximately 1/7th the output token cost of comparable models like Gemini 2.5 Flash. As competition in the API space intensifies, we're seeing more deliberate price-performance positioning from providers trying to carve out particular niches.
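To make the price gap concrete, here is a rough back-of-the-envelope sketch. It uses $0.50 per million output tokens for Grok 3-mini and derives the comparison price from the roughly 1/7 ratio quoted above; that derived figure, the function name, and the monthly token volume are illustrative assumptions, not official list prices or real usage data.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million_usd


# USD per 1M output tokens (assumed unit for the 50-cent figure).
GROK_3_MINI_PRICE = 0.50
# Implied by the ~1/7 ratio in the text; an assumption, not a list price.
COMPARISON_PRICE = GROK_3_MINI_PRICE * 7

# Hypothetical workload: 200M output tokens per month.
monthly_tokens = 200_000_000

mini_cost = output_cost_usd(monthly_tokens, GROK_3_MINI_PRICE)
comparison_cost = output_cost_usd(monthly_tokens, COMPARISON_PRICE)

print(f"Grok 3-mini: ${mini_cost:,.2f}")
print(f"Comparison:  ${comparison_cost:,.2f}")
```

Output tokens are only part of a real bill; input tokens and any reasoning tokens would shift the totals.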
Google's Gemma 3 QAT Dramatically Reduces VRAM Requirements
Google has released Quantization-Aware Training (QAT) variants of its Gemma 3 models, cutting VRAM requirements by roughly 74% while preserving model quality. The 27B parameter model's memory footprint drops from 54GB to just 14.1GB, small enough to fit on a single high-end consumer GPU.
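The memory figures line up with simple arithmetic on weight storage. The sketch below is an estimate under stated assumptions: it counts weights only (no KV cache or activations), uses decimal gigabytes, and assumes the 54GB baseline corresponds to 16-bit weights. The gap between the ideal 4-bit figure and the reported 14.1GB is presumably overhead such as quantization scales, though the article does not break that down.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_27B = 27e9

bf16_gb = weight_memory_gb(PARAMS_27B, 16)  # 16-bit baseline (assumed)
int4_gb = weight_memory_gb(PARAMS_27B, 4)   # ideal 4-bit, no overhead
reported_int4_gb = 14.1                     # figure quoted in the article

# Relative saving when going from the baseline to the reported footprint.
reduction = 1 - reported_int4_gb / bf16_gb

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB (ideal)")
print(f"Reduction using the reported figure: {reduction:.1%}")
```

With the reported numbers the saving works out to about 73.9%.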
QAT differs from standard post-training quantization by simulating quantization effects during training, which helps the model retain quality at 4-bit precision. The models are available across major platforms including MLX, llama.cpp, Ollama, and LM Studio, positioning Gemma as a serious contender in the local AI ecosystem with broad compatibility and reduced computational demands.
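For readers curious what "quantization effects during training" means in practice, a common building block is fake quantization: weights are rounded to a low-bit grid and mapped back to floats in the forward pass, so the model learns to tolerate the rounding error. The sketch below shows that rounding step only, using a simple symmetric per-tensor scheme. It is a generic illustration, not Google's actual Gemma 3 recipe, and the function name and sample weights are made up.

```python
def fake_quantize(weights, bits=4):
    """Round weights to a symmetric `bits`-bit grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # largest positive level, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)               # nearest integer level
        q = max(-qmax - 1, min(qmax, q))   # clamp to the signed range
        dequantized.append(q * scale)      # back to float for the forward pass
    return dequantized


# Hypothetical weights, for illustration only.
weights = [0.7, -0.35, 0.12, 0.0, 0.41]
deq = fake_quantize(weights)
for original, approx in zip(weights, deq):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

In a full QAT setup the rounded values feed the forward pass while gradient updates are still applied to the full-precision weights; post-training quantization applies the same rounding only once, after training has finished.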
VideoGameBench Tests LLMs' Ability to Play Classic Games
Researchers have introduced VideoGameBench, a new benchmark that evaluates vision-language models on their ability to play 20 classic DOS and Game Boy games in real time. Leading models including GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro were tested but failed to clear even the first level of DOOM II.
This benchmark reveals significant limitations in current AI systems' ability to handle dynamic, embodied cognition tasks requiring rapid visual processing and decision-making. The results highlight that despite impressive performance on static benchmarks, today's most advanced models struggle with the kind of continuous real-time interaction humans find intuitive, exposing a gap in generalization capabilities.
Other Developments Worth Noting
- Seedream 3.0 Tops Image Generation Rankings: ByteDance's Seedream 3.0 has tied with OpenAI's GPT-4o at the top of Artificial Analysis arena rankings, outperforming Google's Imagen-3 and other competitors. This represents a significant advancement in ByteDance's AI capabilities, though some commenters note the benchmark may not fully reflect current model capabilities in areas like prompt understanding.
- OpenAI Introduces "Flex" Pricing: OpenAI has launched a new "Flex" pricing tier that offers 50% reduced costs for API usage with the trade-off of potentially slower response times. This represents a shift toward more granular pricing options and appears to be a response to competitive pressure in the AI API market, giving developers more cost flexibility for non-time-critical workloads.
- arXiv Moving to Google Cloud: The scientific preprint repository arXiv is migrating from Cornell-managed on-premise servers to Google Cloud Platform, coinciding with a major codebase rewrite. Technical commenters expressed concern about the simultaneous implementation of two major changes and potential vendor lock-in, highlighting tensions between modernizing infrastructure and maintaining independence.
- Microsoft Releases MAI-DS-R1: Microsoft has released MAI-DS-R1, a post-trained version of DeepSeek R1, showing substantial improvements in code completion benchmarks. The model was trained using 110k safety samples and 350k multilingual, bias-focused samples, representing a significant investment in large-scale model enhancement and fine-tuning.
- Perplexity Launches Telegram Bot: Perplexity has released a Telegram bot that allows users to interact with their AI directly within the messaging platform. The company has also eliminated citation token charges in their pricing scheme to make costs more predictable for developers.
New Tools Discovered
- CSM 1B: A real-time Text-to-Speech model that now supports streaming and LoRA-based fine-tuning, enabling fast local voice generation
- VideoGameBench: A benchmarking tool testing vision-language models' ability to play 20 classic games through visual input and text commands
- Clara: A local-first AI assistant desktop app running on Ollama with integrated n8n workflow automation, requiring no cloud services or API keys
- Concall-parser: A Python NLP tool for extracting structured data from earnings call reports, designed to handle the variability of PDF formats
- Embzip: A new Python library for embedding quantization, allowing one-line saving and loading of embeddings with product quantization
Discover more tools at Agentive.Directory
That's a wrap for today! Thank you for reading this report. Have thoughts on today's developments? Hit reply and let me know what you're thinking. Or if you've discovered a cool AI tool we should feature, drop me a line.
Until tomorrow,
Hak from Agentive.Studio