Gemini 3.1 Explained: Gemini 3 Pro/Flash, Nano Banana Pro, and Veo 3.1 (Features + API)
What “Gemini 3.1” Means in Practice
If you’ve seen people talk about “Gemini 3.1”, they often mean a bundle of updates: the Gemini 3 reasoning-first LLM family (Pro / Flash), plus the newest “.1” generative media model Veo 3.1 for video generation. Officially, Google brands the LLM family as Gemini 3 — while 3.1 shows up prominently on the video side (Veo 3.1).
This post focuses on the real, developer-relevant capabilities: thinking levels, media resolution, thought signatures, tool use, and where each model fits (text, images, and video).
Table of contents
- Quick summary
- Gemini 3 model family (Pro, Flash, Pro Image)
- Thinking levels: speed vs. depth
- Media resolution: better vision, predictable cost
- Thought signatures: the field you can’t ignore
- Tool use & agentic workflows
- Nano Banana Pro: image generation + editing
- Veo 3.1: video generation with native audio
- FAQ
- Conclusion
Quick summary
- Gemini 3 Pro is the flagship reasoning model for complex, high-stakes tasks (1M input / 64k output; Jan 2025 cutoff; preview model ID: gemini-3-pro-preview).
- Gemini 3 Flash delivers Pro-class capabilities with lower latency for high-frequency workflows (preview model ID: gemini-3-flash-preview; includes a free tier in the Gemini API).
- Nano Banana Pro (aka Gemini 3 Pro Image) is the high-quality image generation/editing model (preview model ID: gemini-3-pro-image-preview).
- New API controls: thinking_level (latency vs. reasoning depth) and media_resolution (vision fidelity vs. token cost).
- Thought signatures are required for strict workflows (especially function calling and image generation/editing). If your SDK doesn’t handle them, you must round-trip them.
- “3.1” highlight: Veo 3.1 is Google’s newest video generation model with native audio and high-end output options (preview model IDs: veo-3.1-generate-preview, veo-3.1-fast-generate-preview).
- Where to use it: Gemini API / AI Studio / Vertex AI, plus agentic environments like Google Antigravity and Gemini CLI.
The Gemini 3 model family
Gemini 3 is a reasoning-first model family designed for agentic workflows, autonomous coding, and multimodal tasks. The official developer guide lists these preview models and IDs:
| Model | Best for | Gemini API model ID | Context window (In / Out) | Knowledge cutoff |
|---|---|---|---|---|
| Gemini 3 Pro | Complex reasoning, long-context analysis, agentic coding | gemini-3-pro-preview | 1M / 64k | Jan 2025 |
| Gemini 3 Flash | Fast, cost-efficient reasoning + multimodal understanding | gemini-3-flash-preview | 1M / 64k | Jan 2025 |
| Gemini 3 Pro Image (Nano Banana Pro) | High-quality image generation & editing | gemini-3-pro-image-preview | 65k / 32k | Jan 2025 |

Nano Banana Pro (Gemini 3 Pro Image) is built for studio-quality image generation and editing — especially when you need crisp text and controlled layouts.
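The model table above can be turned into a small lookup so that application code never hard-codes a model ID in more than one place. This is a minimal sketch, not an official helper: the task labels (`reasoning`, `fast`, `image`) and the `pick_model` function are hypothetical names, and the token limits are the rounded figures from the table rather than exact API limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    max_input_tokens: int   # rounded from the "1M" / "65k" figures in the table
    max_output_tokens: int  # rounded from the "64k" / "32k" figures in the table


# Task labels are hypothetical names chosen for this sketch.
MODELS = {
    "reasoning": ModelInfo("gemini-3-pro-preview", 1_000_000, 64_000),
    "fast": ModelInfo("gemini-3-flash-preview", 1_000_000, 64_000),
    "image": ModelInfo("gemini-3-pro-image-preview", 65_000, 32_000),
}


def pick_model(task: str, input_tokens: int = 0) -> str:
    """Return the preview model ID for a task, rejecting inputs that exceed the listed window."""
    info = MODELS.get(task)
    if info is None:
        raise KeyError(f"no model registered for task {task!r}")
    if input_tokens > info.max_input_tokens:
        raise ValueError(
            f"{input_tokens} input tokens exceeds the listed window for {info.model_id}"
        )
    return info.model_id


if __name__ == "__main__":
    print(pick_model("fast"))
    print(pick_model("reasoning", input_tokens=250_000))
```

Keeping the IDs in one table also makes it easier to swap the `-preview` names once stable IDs are published.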
Thinking levels: speed vs. depth
Gemini 3 introduces thinking_level as a practical control knob for reasoning depth. If you want the
fastest possible responses (chat, high-throughput tasks), constrain thinking. If you need deeper reasoning
(debugging, planning, complex math), keep it high.
| thinking_level | What it optimizes | Typical use cases |
|---|---|---|
| minimal (Flash only) | Lowest latency | Chat, UI assistants, ultra-fast iteration loops |
| low | Lower latency & cost | Summaries, classification, simple instruction following |
| medium (Flash only) | Balanced | Most everyday dev workflows |
| high (default) | Maximum reasoning depth | Hard debugging, architecture decisions, multi-step reasoning |
Example (REST):
{
  "contents": [{
    "parts": [{ "text": "Find the race condition in this C++ snippet: [code here]" }]
  }],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "high"
    }
  }
}
Tip: Gemini 3 is tuned around a default temperature of 1.0. If you previously forced low temperature
for deterministic outputs, test removing it first — especially for complex reasoning.
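If you build request bodies by hand, it can help to validate the thinking level before sending anything. The sketch below assembles the same JSON shape as the REST example and checks the level against the table above. The `build_request` helper and the `ALLOWED_LEVELS` mapping are hypothetical names for illustration, and the allowed levels per model are an assumption drawn from that table, so confirm them against the current docs.

```python
import json

# Levels per model, as listed in the table above (assumed, not fetched from the API).
ALLOWED_LEVELS = {
    "gemini-3-pro-preview": {"low", "high"},
    "gemini-3-flash-preview": {"minimal", "low", "medium", "high"},
}


def build_request(model: str, prompt: str, thinking_level: str = "high") -> dict:
    """Return a request body with a thinking level that is valid for the given model."""
    allowed = ALLOWED_LEVELS.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model}")
    if thinking_level not in allowed:
        raise ValueError(f"{thinking_level!r} is not listed for {model}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingLevel": thinking_level}},
    }


if __name__ == "__main__":
    body = build_request("gemini-3-flash-preview", "Summarize this changelog.", "low")
    print(json.dumps(body, indent=2))
```

The function only builds the payload. Sending it, authenticating, and handling errors are left to whatever HTTP client or SDK you already use.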
Media resolution: better vision, predictable cost
For image/video understanding, media_resolution controls how many tokens the model may spend per image
(or per video frame). Higher settings can improve small-text OCR and fine details — at the cost of more tokens and latency.
| Setting | When to use | Trade-off |
|---|---|---|
| media_resolution_low | Basic visual understanding | Cheapest / fastest |
| media_resolution_medium | Documents, common screenshots | Good default for PDFs |
| media_resolution_high | Small text, UI details, dense diagrams | Higher token usage |
| media_resolution_ultra_high | Edge cases (very small details) | Most expensive; use sparingly |
Example snippet (per media part):
{
  "parts": [
    { "text": "Read the small UI labels and explain what each button does." },
    {
      "inlineData": { "mimeType": "image/png", "data": "..." },
      "mediaResolution": { "level": "media_resolution_high" }
    }
  ]
}
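The `"data": "..."` placeholder in the snippet above stands for base64-encoded image bytes. A minimal sketch of how the parts list might be assembled in Python follows. The helper names `image_part` and `build_parts` are hypothetical, and the set of level suffixes is taken from the table above rather than from a live API schema.

```python
import base64

# Suffixes of the media_resolution_* settings listed in the table above.
LEVELS = ("low", "medium", "high", "ultra_high")


def image_part(data: bytes, mime_type: str, level: str = "medium") -> dict:
    """Wrap raw image bytes as an inline part with a per-part media resolution."""
    if level not in LEVELS:
        raise ValueError(f"unknown media resolution level: {level!r}")
    return {
        "inlineData": {
            "mimeType": mime_type,
            # JSON cannot carry raw bytes, so the image is base64-encoded text.
            "data": base64.b64encode(data).decode("ascii"),
        },
        "mediaResolution": {"level": f"media_resolution_{level}"},
    }


def build_parts(prompt: str, image: bytes, mime_type: str = "image/png",
                level: str = "high") -> list:
    """Return a parts list: the text prompt first, then the image part."""
    return [{"text": prompt}, image_part(image, mime_type, level)]


if __name__ == "__main__":
    parts = build_parts("Read the small UI labels.", b"\x89PNG placeholder bytes")
    print(parts[1]["mediaResolution"])
```

Setting the level per part lets you spend extra tokens only on the images that need fine detail, while cheaper images in the same request stay at a lower setting.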
Thought signatures: the field you can’t ignore
Thought signatures (thoughtSignature) are encrypted “reasoning state” blobs used by Gemini 3 to maintain
reasoning context across API calls. In strict flows — especially function calling and image generation/editing —
missing signatures can trigger 400 errors. If you use the official SDKs and standard history handling, this is usually automatic.
If you need to migrate history from older models or inject custom tool calls (where you don’t have a valid signature), the docs describe a specific dummy string you can use to bypass strict validation in that scenario:
"thoughtSignature": "context_engineering_is_the_way_to_go"
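When you manage history yourself, the safest pattern is to append the model's content object exactly as received and to touch signatures only when you inject a call the model never produced. The sketch below illustrates both cases with plain dictionaries. The helper names are hypothetical, and the layout (a `thoughtSignature` field sitting next to `functionCall` inside a part, with role `model`) is an assumption about the response shape that you should verify against the API reference.

```python
import copy

# Dummy value quoted in the text above, for calls that have no real signature.
DUMMY_SIGNATURE = "context_engineering_is_the_way_to_go"


def append_model_turn(history: list, model_content: dict) -> list:
    """Return a new history with the model's content appended verbatim.

    Copying the whole object, instead of rebuilding it field by field,
    keeps every thoughtSignature exactly as it was received.
    """
    return history + [copy.deepcopy(model_content)]


def inject_function_call(history: list, name: str, args: dict) -> list:
    """Return a new history with a hand-written function call using the dummy signature."""
    part = {
        "functionCall": {"name": name, "args": args},
        "thoughtSignature": DUMMY_SIGNATURE,
    }
    return history + [{"role": "model", "parts": [part]}]


if __name__ == "__main__":
    received = {
        "role": "model",
        "parts": [{
            "functionCall": {"name": "get_weather", "args": {"city": "Paris"}},
            "thoughtSignature": "opaque-value-from-the-api",
        }],
    }
    history = append_model_turn([], received)
    history = inject_function_call(history, "lookup_order", {"id": "42"})
    print(len(history))
```

Both helpers return new lists instead of mutating the input, which makes it harder to drop or overwrite a signature by accident when several code paths share one history.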
Tool use & agentic workflows
Gemini 3 supports built-in tools in the Gemini API (such as Search grounding, URL context, code execution, and file search), plus standard function calling for your own tools. In practice, this enables agent-like workflows: gather info, run code, produce structured outputs, and iterate — without leaving the model loop.
Practical note: built-in tools and custom function calling don’t always combine in a single request (depending on the endpoint/config), so design your orchestration with clear phases (tool step → model step → tool step).
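One way to enforce that phase separation is a small guard that runs before each request is sent. The sketch below is an illustration under stated assumptions: the tool key names (`google_search`, `url_context`, `code_execution`, `file_search`, `functionDeclarations`) are assumed spellings of the request fields, and `check_single_phase` is a hypothetical helper, so adjust both to match the endpoint you actually call.

```python
# Key names are assumptions for illustration; check the current API reference.
BUILT_IN_TOOL_KEYS = {"google_search", "url_context", "code_execution", "file_search"}
CUSTOM_TOOL_KEY = "functionDeclarations"


def tool_kinds(request: dict) -> set:
    """Return which kinds of tools ("built_in", "custom") a request body declares."""
    kinds = set()
    for tool in request.get("tools", []):
        if tool.keys() & BUILT_IN_TOOL_KEYS:
            kinds.add("built_in")
        if CUSTOM_TOOL_KEY in tool:
            kinds.add("custom")
    return kinds


def check_single_phase(request: dict) -> dict:
    """Raise if a request mixes built-in tools with custom function declarations."""
    if tool_kinds(request) == {"built_in", "custom"}:
        raise ValueError(
            "split this request into a built-in tool phase and a function-calling phase"
        )
    return request


if __name__ == "__main__":
    search_phase = {"tools": [{"google_search": {}}]}
    function_phase = {"tools": [{"functionDeclarations": [{"name": "save_note"}]}]}
    for phase in (search_phase, function_phase):
        check_single_phase(phase)
    print("both phases are valid on their own")
```

Failing fast in your own code tends to give a clearer error than waiting for the API to reject a mixed request.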
Nano Banana Pro: image generation + editing
Nano Banana Pro (Gemini 3 Pro Image) is the image-focused model that shines when you need: crisp typography, controlled composition, and multi-turn edits. It’s designed for workflows where “make it look professional” isn’t optional — brand assets, UI mockups, posters, diagrams, and localized designs.
Veo 3.1: video generation with native audio
This is the part many people refer to when they say “3.1”: Veo 3.1 is Google’s state-of-the-art video generation model
available through the Gemini API (paid tier). It emphasizes cinematic motion, temporal consistency, and native audio generation.
There’s also a faster variant (veo-3.1-fast-generate-preview) for lower latency/cost workflows.

Veo 3.1 uses provenance tech (including SynthID in Google’s ecosystem) to help identify AI-generated media and support responsible usage.
Example model IDs you’ll see in the Gemini API:
- veo-3.1-generate-preview (highest quality)
- veo-3.1-fast-generate-preview (faster + cheaper)
Frequently Asked Questions (FAQ)
What’s the knowledge cutoff for Gemini 3 Pro and Flash?
Gemini 3 models list a knowledge cutoff of January 2025. For more recent info, use Search grounding when appropriate.
How big is the context window?
Gemini 3 Pro and Flash support up to 1 million input tokens and up to 64k output tokens (preview).
Is there a free tier?
Gemini 3 Flash (gemini-3-flash-preview) offers a free tier in the Gemini API (rate limits apply). Pro is typically paid in the API, while both can be tried in AI Studio.
Do I need to manually manage thought signatures?
If you use the official SDKs and standard chat history handling, signatures are usually handled automatically. If you manually build request history (or inject tool calls), you must round-trip signatures exactly as received — especially for strict flows.
Can Gemini 3 use Google Maps / Flights / Shopping as built-in tools?
Tool availability depends on the specific Gemini API tool set and endpoint. In the Gemini 3 developer guide, Search grounding, URL context, code execution, and file search are highlighted as built-in tools. Always confirm current tool support in the official docs before building hard dependencies.
Conclusion
The Gemini 3 generation is not just “bigger chat”: it’s a reasoning-first stack built for long context, multimodal inputs,
and agentic workflows — with practical controls like thinking_level and media_resolution that let you trade
latency/cost for deeper reasoning and better vision fidelity. On top, the “3.1” headline for many creators is Veo 3.1:
high-end video generation with native audio and cinematic control.
If you’re building tools, the biggest wins usually come from: (1) choosing the right model per task (Flash vs Pro vs Pro Image), (2) using thinking levels intentionally, and (3) treating thought signatures as “state” that must not be lost.