Gemini 3.1 Explained: Gemini 3 Pro/Flash, Nano Banana Pro, and Veo 3.1 (Features + API)

Avatar
Lisa Ernst · 19.02.2026 · Artificial Intelligence · 10 min

What “Gemini 3.1” Means in Practice

If you’ve seen people talk about “Gemini 3.1”, they often mean a bundle of updates: the Gemini 3 reasoning-first LLM family (Pro / Flash), plus the newest “.1” generative media model Veo 3.1 for video generation. Officially, Google brands the LLM family as Gemini 3 — while 3.1 shows up prominently on the video side (Veo 3.1).

This post focuses on the real, developer-relevant capabilities: thinking levels, media resolution, thought signatures, tool use, and where each model fits (text, images, and video).

Table of contents

  1. Quick summary
  2. Gemini 3 model family (Pro, Flash, Pro Image)
  3. Thinking levels: speed vs. depth
  4. Media resolution: better vision, predictable cost
  5. Thought signatures: the field you can’t ignore
  6. Tool use & agentic workflows
  7. Nano Banana Pro: image generation + editing
  8. Veo 3.1: video generation with native audio
  9. FAQ
  10. Conclusion

Quick summary

The Gemini 3 model family

Gemini 3 is a reasoning-first model family designed for agentic workflows, autonomous coding, and multimodal tasks. The official developer guide lists these preview models and IDs:

Model Best for Gemini API model ID Context window (In / Out) Knowledge cutoff
Gemini 3 Pro Complex reasoning, long-context analysis, agentic coding gemini-3-pro-preview 1M / 64k Jan 2025
Gemini 3 Flash Fast, cost-efficient reasoning + multimodal understanding gemini-3-flash-preview 1M / 64k Jan 2025
Gemini 3 Pro Image (Nano Banana Pro) High-quality image generation & editing gemini-3-pro-image-preview 65k / 32k Jan 2025
Nano Banana Pro logo (Gemini 3 Pro Image).

Source: deepmind.google

Nano Banana Pro (Gemini 3 Pro Image) is built for studio-quality image generation and editing — especially when you need crisp text and controlled layouts.

Thinking levels: speed vs. depth

Gemini 3 introduces thinking_level as a practical control knob for reasoning depth. If you want the fastest possible responses (chat, high-throughput tasks), constrain thinking. If you need deeper reasoning (debugging, planning, complex math), keep it high.

thinking_level What it optimizes Typical use cases
minimal (Flash only) Lowest latency Chat, UI assistants, ultra-fast iteration loops
low Lower latency & cost Summaries, classification, simple instruction following
medium (Flash only) Balanced Most everyday dev workflows
high (default) Maximum reasoning depth Hard debugging, architecture decisions, multi-step reasoning

Example (REST):

thinking_level.json
{
  "contents": [{
    "parts": [{ "text": "Find the race condition in this C++ snippet: [code here]" }]
  }],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "high"
    }
  }
}

Tip: Gemini 3 is tuned around a default temperature of 1.0. If you previously forced low temperature for deterministic outputs, test removing it first — especially for complex reasoning.

Media resolution: better vision, predictable cost

For image/video understanding, media_resolution controls how many tokens the model may spend per image (or per video frame). Higher settings can improve small-text OCR and fine details — at the cost of more tokens and latency.

Setting When to use Trade-off
media_resolution_low Basic visual understanding Cheapest / fastest
media_resolution_medium Documents, common screenshots Good default for PDFs
media_resolution_high Small text, UI details, dense diagrams Higher token usage
media_resolution_ultra_high Edge cases (very small details) Most expensive; use sparingly

Example snippet (per media part):

media_resolution.json
{
  "parts": [
    { "text": "Read the small UI labels and explain what each button does." },
    {
      "inlineData": { "mimeType": "image/png", "data": "..." },
      "mediaResolution": { "level": "media_resolution_high" }
    }
  ]
}

Thought signatures: the field you can’t ignore

Thought signatures (thoughtSignature) are encrypted “reasoning state” blobs used by Gemini 3 to maintain reasoning context across API calls. In strict flows — especially function calling and image generation/editing — missing signatures can trigger 400 errors. If you use the official SDKs and standard history handling, this is usually automatic.

If you need to migrate history from older models or inject custom tool calls (where you don’t have a valid signature), the docs describe a specific dummy string you can use to bypass strict validation in that scenario:

thought_signature.json
"thoughtSignature": "context_engineering_is_the_way_to_go"

Tool use & agentic workflows

Gemini 3 supports built-in tools in the Gemini API (such as Search grounding, URL context, code execution, and file search), plus standard function calling for your own tools. In practice, this enables agent-like workflows: gather info, run code, produce structured outputs, and iterate — without leaving the model loop.

Practical note: built-in tools and custom function calling don’t always combine in a single request (depending on the endpoint/config), so design your orchestration with clear phases (tool step → model step → tool step).

Nano Banana Pro: image generation + editing

Nano Banana Pro (Gemini 3 Pro Image) is the image-focused model that shines when you need: crisp typography, controlled composition, and multi-turn edits. It’s designed for workflows where “make it look professional” isn’t optional — brand assets, UI mockups, posters, diagrams, and localized designs.

Veo 3.1: video generation with native audio

This is the part many people refer to when they say “3.1”: Veo 3.1 is Google’s state-of-the-art video generation model available through the Gemini API (paid tier). It emphasizes cinematic motion, temporal consistency, and native audio generation. There’s also a faster variant (veo-3.1-fast-generate-preview) for lower latency/cost workflows.

SynthID visual mark (used for provenance and authenticity).

Source: ai.google.dev

Veo 3.1 uses provenance tech (including SynthID in Google’s ecosystem) to help identify AI-generated media and support responsible usage.

Example model IDs you’ll see in the Gemini API:

Frequently Asked Questions (FAQ)

What’s the knowledge cutoff for Gemini 3 Pro and Flash?

Gemini 3 models list a knowledge cutoff of January 2025. For more recent info, use Search grounding when appropriate.

How big is the context window?

Gemini 3 Pro and Flash support up to 1 million input tokens and up to 64k output tokens (preview).

Is there a free tier?

Gemini 3 Flash (gemini-3-flash-preview) offers a free tier in the Gemini API (rate limits apply). Pro is typically paid in the API, while both can be tried in AI Studio.

Do I need to manually manage thought signatures?

If you use the official SDKs and standard chat history handling, signatures are usually handled automatically. If you manually build request history (or inject tool calls), you must round-trip signatures exactly as received — especially for strict flows.

Can Gemini 3 use Google Maps / Flights / Shopping as built-in tools?

Tool availability depends on the specific Gemini API tool set and endpoint. In the Gemini 3 developer guide, Search grounding, URL context, code execution, and file search are highlighted as built-in tools. Always confirm current tool support in the official docs before building hard dependencies.

Conclusion

The Gemini 3 generation is not just “bigger chat”: it’s a reasoning-first stack built for long context, multimodal inputs, and agentic workflows — with practical controls like thinking_level and media_resolution that let you trade latency/cost for deeper reasoning and better vision fidelity. On top, the “3.1” headline for many creators is Veo 3.1: high-end video generation with native audio and cinematic control.

If you’re building tools, the biggest wins usually come from: (1) choosing the right model per task (Flash vs Pro vs Pro Image), (2) using thinking levels intentionally, and (3) treating thought signatures as “state” that must not be lost.

Source: YouTube

Share our post!
Sources