M
Maik'sAIBlog
← Back to Home

Google Gemini 3.1 Pro: The New King of AI Reasoning

Maik Vreeling
GeminiGoogle
Google Gemini 3.1 Pro: The New King of AI Reasoning

1. The ARC-AGI-2 Breakthrough: 77.1%

The most staggering headline from the 3.1 release is its performance on the ARC-AGI-2 (Abstraction and Reasoning Corpus). This benchmark is notoriously difficult because it tests a model's ability to solve entirely new logic puzzles it has never seen in its training data—a true measure of fluid intelligence.

  • Gemini 3.1 Pro: 77.1%

  • Gemini 3 Pro (Legacy): 31.1%

  • GPT-5.2 (High): ~58% (est.)

By more than doubling the reasoning performance of its predecessor, Gemini 3.1 Pro has effectively "cracked" a benchmark that many researchers thought would take years to saturate. It currently leads the Artificial Analysis Intelligence Index in reasoning and knowledge, surfacing gaps in research that even Claude Opus 4.6 has been known to miss.

2. Animated SVG Generation & Creative Coding

One of the most practical additions is Gemini 3.1 Pro’s ability to generate animated SVG graphics and interactive experiences directly from a text prompt. Unlike video generators that export heavy pixel-based files, 3.1 Pro writes the animations in pure code.

  • Website-Ready Artifacts: These visuals stay perfectly sharp at any resolution, have tiny file sizes, and can be dropped directly into web projects.

  • Immersive Design: In one demonstration, the model coded a complex 3D starling murmuration. It didn't just generate the visual; it built an experience where users could manipulate the flock with hand-tracking and listen to a generative audio score.

  • Vibe-Driven UI: When asked to build a portfolio for Emily Brontë's Wuthering Heights, the model reasoned through the novel's atmospheric tone to design a sleek, contemporary interface rather than just summarizing the text.

3. "Thinking Levels": Granular Control

Gemini 3.1 Pro introduces Thinking Levels (Low, Medium, and High). This allows developers and power users to calibrate the model's reasoning depth based on the complexity of the task:

Level

Best For

Reasoning Depth

Cost Impact

Low

Autocomplete, classification

Minimal

Lowest

Medium

Code review, summarization

Balanced

Standard

High

Complex debugging, research

Deep / Recursive

Highest

This feature addresses the "laziness" often found in large models by forcing the AI to expend more "thought tokens" on difficult logic puzzles, while remaining efficient for simple lookups.

4. Technical Specifications & Agentic Performance

Google has maintained its industry-leading 1-million token context window, but the architectural improvements make it significantly more reliable at "needle-in-a-haystack" tasks.

  • Output Limit: Up to 64,000 tokens (roughly 50,000 words) in a single response.

  • Scientific Prowess: It scored a record 94.3% on GPQA Diamond, missing only a handful of expert-level scientific questions.

  • Coding: In the Terminal-Bench Hard benchmark, it achieved a 54% success rate, proving its ability to act as an autonomous engineer navigating a command line.

5. Pricing and Accessibility

Despite the massive jump in intelligence, Google has kept the pricing identical to the standard Gemini 3 Pro, making it the most cost-efficient frontier model in its class.

  • API Pricing: $2.00 per 1M input tokens / $12.00 per 1M output tokens.

  • Availability: Currently in Public Preview via the Gemini API (AI Studio) and Vertex AI.

  • Consumer Access: Rolling out to the Gemini App and NotebookLM for Google AI Pro and Ultra subscribers with higher usage limits.

Share this article