Solved: Outdated Gemini Pro image pricing? By tile, or by image?

matvei · 09-28-2024 06:07 AM

I'm trying to understand the pricing for image inputs to Gemini Pro, which seems to be contradictory.

On this page, it's written that the token cost per image is calculated by splitting the image into tiles of 258 tokens each, so a larger image is likely to incur a higher token cost than a smaller one. By my rough calculation, a maximum-sized image would require 16 tiles x 258 tokens/tile = 4,128 tokens.

Here's how tokens are calculated for images:

Gemini 1.0 Pro Vision: Each image accounts for 258 tokens.
Gemini 1.5 Flash and Gemini 1.5 Pro:
- If both dimensions of an image are less than or equal to 384 pixels, then 258 tokens are used.
- If one dimension of an image is greater than 384 pixels, then the image is cropped into tiles. Each tile size defaults to the smallest dimension (width or height) divided by 1.5. If necessary, each tile is adjusted so that it's not smaller than 256 pixels and not greater than 768 pixels. Each tile is then resized to 768x768 and uses 258 tokens.
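The tiling rule quoted above can be turned into a rough estimator. This is only a sketch: the function name `estimate_image_tokens` is made up for illustration, and the tile count (a grid of `ceil(width / tile) x ceil(height / tile)`) is an assumption, since the quoted text does not say exactly how the crop is laid out. For an exact figure, use the API's own token counting rather than this approximation.

```python
import math

TOKENS_PER_TILE = 258  # per-tile token count quoted in the documentation excerpt


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image under the quoted Gemini 1.5 rule."""
    # Small images (both dimensions <= 384 px) use a flat 258 tokens.
    if width <= 384 and height <= 384:
        return TOKENS_PER_TILE

    # Tile size defaults to the smallest dimension divided by 1.5,
    # then is clamped to the range [256, 768] pixels.
    tile = min(width, height) / 1.5
    tile = max(256, min(768, tile))

    # ASSUMPTION: tiles form a simple grid covering the image.
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * TOKENS_PER_TILE


if __name__ == "__main__":
    for size in [(300, 200), (768, 768), (3072, 3072)]:
        print(size, estimate_image_tokens(*size))
```

Under these assumptions, a 3072x3072 image comes out at 16 tiles, which matches the 16 x 258 = 4,128 figure in the rough calculation above.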

But here, we see a fixed cost of $0.001315 per image (current Gemini Pro pricing). No mention of tiles or maximum dimensions for that price.

This seems to be contradictory. Anybody know which is the correct info?

AndrewB

Hi @matvei,

Each model has a maximum number of tokens that it can handle in a prompt and response. Knowing the token count of your prompt tells you whether you've exceeded this limit. The token calculations outlined in the first link are useful for calculating or estimating the size of your total prompt, but they are not the cost of the prompt.

Your second link is the pricing of images in the prompt. As an example, your billing report would have a line item that would include:

Gemini 1.5 Pro Image Input - Predictions (the SKU), XX images input, total $$ cost for images.

HansvanDam

Hi Andrew,

On https://ai.google.dev/pricing#1_5flash
it says the pricing of Gemini-flash is calculated per token, while on
https://cloud.google.com/vertex-ai/generative-ai/pricing
it says it is calculated in characters.

Does this mean that if you include images in the prompt (using parts), your textual part input is calculated using characters, while when keeping it purely textual, it is computed using tokens?

I would be so happy if the response just included a cost element that we could read out.
Thanks,
Hans

matvei

Thanks for the clarification, Andrew.

shaikhsharmeen4

The confusion arises from the distinction between token calculation and pricing. Here’s a breakdown:

1. Token Calculation: Tokens are used for processing images internally in Gemini 1.5 Flash and Pro. If either dimension of an image exceeds 384 pixels, the image is split into tiles, and each tile uses 258 tokens (as per your calculation, larger images require more tiles and hence more tokens).

2. Pricing: Despite the internal token count varying with image size (based on tiles), the pricing is flat at $0.001315 per image for Gemini Pro. The pricing doesn't directly reflect token usage, which is why you see a fixed cost per image regardless of the tiling.

So, while token usage scales with image size (due to tiling), the cost you’re billed remains fixed per image in the current pricing model for Gemini Pro.
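To make the distinction concrete, here is a minimal sketch of the billing side. It assumes the flat $0.001315 per-image rate quoted in this thread (rates change, so check the current pricing page), and the function name `image_input_cost` is hypothetical. Note that image dimensions do not appear anywhere in the calculation.

```python
# Flat per-image rate quoted in this thread; ASSUMPTION: it may be outdated.
PRICE_PER_IMAGE_USD = 0.001315


def image_input_cost(num_images: int,
                     price_per_image: float = PRICE_PER_IMAGE_USD) -> float:
    """Image input cost under flat per-image pricing, regardless of image size."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * price_per_image


if __name__ == "__main__":
    # 1,000 images cost the same whether they are thumbnails or large photos.
    print(f"${image_input_cost(1000):.3f}")
```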
