Gemini API Pricing History (2024–2026): How Token Costs Evolved + What You Pay Today
Gemini’s pricing has changed meaningfully since the Gemini API went paid in 2024 — especially for high-volume “Flash” usage and long-context “Pro” workloads. If you build production features (agents, summarization pipelines, search grounding, coding assistants), understanding the price evolution matters as much as understanding today’s rate card.
Official current rates live here: Gemini Developer API pricing. This article focuses on the biggest price milestones (2024 → 2026) and then summarizes what you pay today.
Quick Summary
- 2024 → 2026 = major repricing: Flash became dramatically cheaper (high volume), Pro got large cuts for <128K prompts in 2024, and modern 2.5 / 3 models introduced “thinking-token” accounting in output pricing.
- Best “history view”: follow the timeline + the before/after tables below (official charts + numbers).
- What you pay today: see the “Current Pricing Snapshot” table (2.5 Pro / 2.5 Flash / 2.5 Flash-Lite + Gemini 3 previews).
- Optimization levers: Batch pricing can halve costs for non-urgent workloads; caching reduces repeated-context costs; grounding has separate per-request fees.
Pricing Timeline (2024–2026)
Google publishes current prices on the official pricing page and announces major changes on the Google Developers Blog. The table below highlights the most important milestones that affected real-world API spend.
| Effective | What changed | Impact | Official reference |
|---|---|---|---|
| May 2024 | Paid pricing published | Baseline token prices for Gemini 1.5 Pro and Gemini 1.5 Flash were published for the paid tier. (Long prompts had a higher priced tier.) | Archived pricing page snapshot |
| Aug 12, 2024 | Flash price drop | Massive reduction for Flash: input down to $0.075 / 1M, output to $0.30 / 1M (for prompts under 128K tokens), plus lower caching rates. | Google Developers Blog (Aug 8, 2024) |
| Oct 1, 2024 | Pro price drop | Gemini 1.5 Pro got major reductions for prompts under 128K (and also lower long-context tier pricing). New under-128K: $1.25 input and $5.00 output per 1M tokens. | Google Developers Blog (Sep 24, 2024) |
| 2026 (today) | 2.5 + 3 preview pricing | Current pricing includes 2.5 family and Gemini 3 previews; some tables explicitly state output price (including thinking tokens). | Current pricing page |
Major Price Drops (Before/After Tables)
These two charts are the cleanest way to “see” the price evolution. They come straight from official Google announcements and show the old vs. new rates for the same model.

Source: developers.googleblog.com
Gemini 1.5 Flash: large price drop effective Aug 12, 2024 (official chart). It shows input/output + caching changes for both ≤128K and >128K tiers.

Source: developers.googleblog.com
Gemini 1.5 Pro: price reduction effective Oct 1, 2024 (official chart). Under 128K prompts dropped from $3.50 → $1.25 input and $10.50 → $5.00 output per 1M tokens.
| Model | Effective | Tier | Input | Output | Source |
|---|---|---|---|---|---|
| Gemini 1.5 Flash | May 2024 | ≤128K tokens | $0.35 | $0.53 | Archived pricing page |
| Gemini 1.5 Flash | May 2024 | >128K tokens | $0.70 | $1.05 | Archived pricing page |
| Gemini 1.5 Flash | Aug 12, 2024 | ≤128K tokens | $0.075 | $0.30 | Google Developers Blog |
| Gemini 1.5 Flash | Aug 12, 2024 | >128K tokens | $0.15 | $0.60 | Google Developers Blog |
| Gemini 1.5 Pro | May 2024 | ≤128K tokens | $3.50 | $10.50 | Archived pricing page |
| Gemini 1.5 Pro | May 2024 | >128K tokens | $7.00 | $21.00 | Archived pricing page |
| Gemini 1.5 Pro | Oct 1, 2024 | ≤128K tokens | $1.25 | $5.00 | Google Developers Blog |
| Gemini 1.5 Pro | Oct 1, 2024 | >128K tokens | $2.50 | $10.00 | Google Developers Blog |
Note: Gemini 1.5 models had an even larger context window (up to 2M tokens) and were frequently repriced as the product matured. For current production work, use the pricing page for today’s supported models.
Current Pricing Snapshot (Gemini 2.5 + Gemini 3 Preview)
Below is a “today view” (token prices per 1M tokens, USD) pulled from the official pricing page. Pay attention to the wording “output price (including thinking tokens)” on newer models.
| Model | Use case | Input price | Output price | Notes |
|---|---|---|---|---|
| Gemini 3 Pro Preview gemini-3-pro-preview |
Most powerful multimodal + agentic workloads | $2.00 (≤200k) $4.00 (>200k) |
$12.00 (≤200k) $18.00 (>200k) (incl. thinking tokens) |
Preview model |
| Gemini 3 Flash Preview gemini-3-flash-preview |
Fast “frontier” model with strong grounding/search | $0.50 (text/image/video) $1.00 (audio) |
$3.00 (text) (incl. thinking tokens) |
Preview model |
| Gemini 2.5 Pro gemini-2.5-pro |
Best for coding + complex reasoning | $1.25 (≤200k) $2.50 (>200k) |
$10.00 (≤200k) $15.00 (>200k) (incl. thinking tokens) |
Batch is discounted |
| Gemini 2.5 Flash gemini-2.5-flash |
High volume, low latency, strong price/perf | $0.30 (text/image/video) $1.00 (audio) |
$2.50 (incl. thinking tokens) |
Batch: $0.15 in / $1.25 out |
| Gemini 2.5 Flash-Lite gemini-2.5-flash-lite |
Cheapest at scale (classification, extraction) | $0.10 (text/image/video) $0.30 (audio) |
$0.40 (incl. thinking tokens) |
Great for “utility” jobs |
Source: Gemini Developer API pricing. (This pricing page also lists image token assumptions like 560 tokens per image for certain pricing calculations.)
Cost Examples (Real Numbers)
Costs scale linearly with tokens. A quick mental model: cost ≈ (input_tokens / 1,000,000) × input_rate + (output_tokens / 1,000,000) × output_rate.
| Scenario | Model | Tokens | Estimated cost |
|---|---|---|---|
| Summarize documents at scale | gemini-2.5-flash | 50k input + 10k output | ≈ (0.05 × $0.30) + (0.01 × $2.50) = $0.04 |
| Complex reasoning / coding assist | gemini-2.5-pro | 20k input + 10k output | ≈ (0.02 × $1.25) + (0.01 × $10.00) = $0.13 |
| Extraction / tagging (cheap utility) | gemini-2.5-flash-lite | 30k input + 3k output | ≈ (0.03 × $0.10) + (0.003 × $0.40) = $0.0042 |
Cost-Saving Levers (Batch, Caching, Grounding)
- Batch pricing: Many models offer a discounted batch tier for non-urgent workloads (great for nightly processing).
- Context caching: If you reuse large system prompts or documents, caching can reduce repeated input cost; you pay caching + storage rates.
- Grounding: Search/Maps grounding is billed separately (per grounded prompt / search query), so treat it as a distinct budget line item.
- Token discipline: Shorten outputs (limits), reduce unnecessary context, and chunk long docs intelligently.
FAQ
Why do “output” costs dominate so often?
Output tokens are typically priced higher than input tokens. For newer “thinking” models, the pricing tables may explicitly state that output pricing includes “thinking tokens”, which can increase total output billing if you allow large reasoning budgets.
How do long prompts affect pricing?
Historically, Gemini applied higher prices above certain context thresholds (e.g., 128K for Gemini 1.5; 200K for several current models). If your prompt crosses the threshold, the request can be billed at the higher tier.
Where can I verify today’s prices?
Always verify against the official pricing page: Gemini Developer API pricing. Major changes are usually announced on the Google Developers Blog.
Conclusion
The most useful way to understand Gemini API costs is to combine history (what changed and when) with the current rate card (what you pay today). Flash saw huge price drops for high-volume workloads in 2024, Pro pricing was reduced significantly for typical prompt sizes, and modern 2.5/3 models provide clear pricing tables including “thinking token” output accounting. If you run Gemini at scale, focus on (1) output control, (2) batch for offline workloads, and (3) caching for repeated context.