Gemini 3.1 Explained: Gemini 3 Pro/Flash, Nano Banana Pro, and Veo 3.1 (Features + API)

Lisa Ernst · 19.02.2026 · Artificial Intelligence · 10 min

What “Gemini 3.1” Means in Practice

If you’ve seen people talk about “Gemini 3.1”, they often mean a bundle of updates: the Gemini 3 reasoning-first LLM family (Pro / Flash), plus the newest “.1” generative media model Veo 3.1 for video generation. Officially, Google brands the LLM family as Gemini 3 — while 3.1 shows up prominently on the video side (Veo 3.1).

This post focuses on the real, developer-relevant capabilities: thinking levels, media resolution, thought signatures, tool use, and where each model fits (text, images, and video).

Quick summary
Gemini 3 model family (Pro, Flash, Pro Image)
Thinking levels: speed vs. depth
Media resolution: better vision, predictable cost
Thought signatures: the field you can’t ignore
Tool use & agentic workflows
Nano Banana Pro: image generation + editing
Veo 3.1: video generation with native audio
FAQ
Conclusion

Quick summary

Gemini 3 Pro is the flagship reasoning model for complex, high-stakes tasks (1M input / 64k output; Jan 2025 cutoff; preview model ID: gemini-3-pro-preview).
Gemini 3 Flash delivers Pro-class capabilities with lower latency for high-frequency workflows (preview model ID: gemini-3-flash-preview; includes a free tier in the Gemini API).
Nano Banana Pro (aka Gemini 3 Pro Image) is the high-quality image generation/editing model (preview model ID: gemini-3-pro-image-preview).
New API controls: thinking_level (latency vs. reasoning depth) and media_resolution (vision fidelity vs. token cost).
Thought signatures are required for strict workflows (especially function calling and image generation/editing). If your SDK doesn’t handle them, you must round-trip them.
“3.1” highlight: Veo 3.1 is Google’s newest video generation model with native audio and high-end output options (preview model IDs: veo-3.1-generate-preview, veo-3.1-fast-generate-preview).
Where to use it: Gemini API / AI Studio / Vertex AI, plus agentic environments like Google Antigravity and Gemini CLI.

The Gemini 3 model family

Gemini 3 is a reasoning-first model family designed for agentic workflows, autonomous coding, and multimodal tasks. The official developer guide lists these preview models and IDs:

Model	Best for	Gemini API model ID	Context window (In / Out)	Knowledge cutoff
Gemini 3 Pro	Complex reasoning, long-context analysis, agentic coding	`gemini-3-pro-preview`	1M / 64k	Jan 2025
Gemini 3 Flash	Fast, cost-efficient reasoning + multimodal understanding	`gemini-3-flash-preview`	1M / 64k	Jan 2025
Gemini 3 Pro Image (Nano Banana Pro)	High-quality image generation & editing	`gemini-3-pro-image-preview`	65k / 32k	Jan 2025

Source: deepmind.google

Nano Banana Pro (Gemini 3 Pro Image) is built for studio-quality image generation and editing — especially when you need crisp text and controlled layouts.

Thinking levels: speed vs. depth

Gemini 3 introduces thinking_level as a practical control knob for reasoning depth. If you want the fastest possible responses (chat, high-throughput tasks), constrain thinking. If you need deeper reasoning (debugging, planning, complex math), keep it high.

thinking_level	What it optimizes	Typical use cases
`minimal` (Flash only)	Lowest latency	Chat, UI assistants, ultra-fast iteration loops
`low`	Lower latency & cost	Summaries, classification, simple instruction following
`medium` (Flash only)	Balanced	Most everyday dev workflows
`high` (default)	Maximum reasoning depth	Hard debugging, architecture decisions, multi-step reasoning

Example (REST):

thinking_level.json

{
  "contents": [{
    "parts": [{ "text": "Find the race condition in this C++ snippet: [code here]" }]
  }],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "high"
    }
  }
}

Tip: Gemini 3 is tuned around a default temperature of 1.0. If you previously forced low temperature for deterministic outputs, test removing it first — especially for complex reasoning.

Media resolution: better vision, predictable cost

For image/video understanding, media_resolution controls how many tokens the model may spend per image (or per video frame). Higher settings can improve small-text OCR and fine details — at the cost of more tokens and latency.

Setting	When to use	Trade-off
`media_resolution_low`	Basic visual understanding	Cheapest / fastest
`media_resolution_medium`	Documents, common screenshots	Good default for PDFs
`media_resolution_high`	Small text, UI details, dense diagrams	Higher token usage
`media_resolution_ultra_high`	Edge cases (very small details)	Most expensive; use sparingly

Example snippet (per media part):

media_resolution.json

{
  "parts": [
    { "text": "Read the small UI labels and explain what each button does." },
    {
      "inlineData": { "mimeType": "image/png", "data": "..." },
      "mediaResolution": { "level": "media_resolution_high" }
    }
  ]
}

Thought signatures: the field you can’t ignore

Thought signatures (thoughtSignature) are encrypted “reasoning state” blobs used by Gemini 3 to maintain reasoning context across API calls. In strict flows — especially function calling and image generation/editing — missing signatures can trigger 400 errors. If you use the official SDKs and standard history handling, this is usually automatic.

If you need to migrate history from older models or inject custom tool calls (where you don’t have a valid signature), the docs describe a specific dummy string you can use to bypass strict validation in that scenario:

thought_signature.json

"thoughtSignature": "context_engineering_is_the_way_to_go"

Tool use & agentic workflows

Gemini 3 supports built-in tools in the Gemini API (such as Search grounding, URL context, code execution, and file search), plus standard function calling for your own tools. In practice, this enables agent-like workflows: gather info, run code, produce structured outputs, and iterate — without leaving the model loop.

Practical note: built-in tools and custom function calling don’t always combine in a single request (depending on the endpoint/config), so design your orchestration with clear phases (tool step → model step → tool step).

Nano Banana Pro: image generation + editing

Nano Banana Pro (Gemini 3 Pro Image) is the image-focused model that shines when you need: crisp typography, controlled composition, and multi-turn edits. It’s designed for workflows where “make it look professional” isn’t optional — brand assets, UI mockups, posters, diagrams, and localized designs.

Veo 3.1: video generation with native audio

This is the part many people refer to when they say “3.1”: Veo 3.1 is Google’s state-of-the-art video generation model available through the Gemini API (paid tier). It emphasizes cinematic motion, temporal consistency, and native audio generation. There’s also a faster variant (veo-3.1-fast-generate-preview) for lower latency/cost workflows.

SynthID visual mark (used for provenance and authenticity).

Source: ai.google.dev

Veo 3.1 uses provenance tech (including SynthID in Google’s ecosystem) to help identify AI-generated media and support responsible usage.

Example model IDs you’ll see in the Gemini API:

veo-3.1-generate-preview (highest quality)
veo-3.1-fast-generate-preview (faster + cheaper)

Frequently Asked Questions (FAQ)

What’s the knowledge cutoff for Gemini 3 Pro and Flash?

Gemini 3 models list a knowledge cutoff of January 2025. For more recent info, use Search grounding when appropriate.

How big is the context window?

Gemini 3 Pro and Flash support up to 1 million input tokens and up to 64k output tokens (preview).

Is there a free tier?

Gemini 3 Flash (gemini-3-flash-preview) offers a free tier in the Gemini API (rate limits apply). Pro is typically paid in the API, while both can be tried in AI Studio.

Do I need to manually manage thought signatures?

If you use the official SDKs and standard chat history handling, signatures are usually handled automatically. If you manually build request history (or inject tool calls), you must round-trip signatures exactly as received — especially for strict flows.

Can Gemini 3 use Google Maps / Flights / Shopping as built-in tools?

Tool availability depends on the specific Gemini API tool set and endpoint. In the Gemini 3 developer guide, Search grounding, URL context, code execution, and file search are highlighted as built-in tools. Always confirm current tool support in the official docs before building hard dependencies.

Conclusion

The Gemini 3 generation is not just “bigger chat”: it’s a reasoning-first stack built for long context, multimodal inputs, and agentic workflows — with practical controls like thinking_level and media_resolution that let you trade latency/cost for deeper reasoning and better vision fidelity. On top, the “3.1” headline for many creators is Veo 3.1: high-end video generation with native audio and cinematic control.

If you’re building tools, the biggest wins usually come from: (1) choosing the right model per task (Flash vs Pro vs Pro Image), (2) using thinking levels intentionally, and (3) treating thought signatures as “state” that must not be lost.

Source: YouTube