Based on the pricing table here: https://ai.google.dev/gemini-api/docs/pricing#gemini-1.5-pro
Is this the right logic to calculate the cost based on the usage metadata?
export function calculateGeminiCost(
  usage: UsageMetadata,
  // storageDurationSeconds: number,
  modelLevel: ExtractorModelLevel,
): {
  inputTokenCost: number
  outputTokenCost: number
  cachedTokenCost: number
  // contextStorageCost: number
  totalCost: number
} {
  const model = geminiPricing[modelLevel]

  // Safely handle undefined values with defaults
  const cached = usage.cachedContentTokenCount ?? 0
  const promptTokenCount = usage.promptTokenCount ?? 0
  const candidatesTokenCount = usage.candidatesTokenCount ?? 0
  // const storageSeconds = storageDurationSeconds ?? 0 // Assumed new field in UsageMetadata

  // Calculate uncached tokens and storage duration in hours
  const uncachedTokens = Math.max(0, promptTokenCount - cached)
  // const storageHours = storageSeconds / 3_600 // Convert seconds to hours

  // Determine pricing tier based on uncached tokens
  const priceTier = uncachedTokens >= model.contextWindow ? 'high' : 'low'

  // Calculate costs
  const inputTokenCost = (uncachedTokens / 1_000_000) * model.input[priceTier]
  const outputTokenCost =
    (candidatesTokenCount / 1_000_000) * model.output[priceTier]
  const cachedTokenCost = (cached / 1_000_000) * model.cached[priceTier]
  // const contextStorageCost = (cached / 1_000_000) * model.storage * storageHours
  const totalCost = inputTokenCost + outputTokenCost + cachedTokenCost

  return {
    inputTokenCost: Number(inputTokenCost.toFixed(6)),
    outputTokenCost: Number(outputTokenCost.toFixed(6)),
    cachedTokenCost: Number(cachedTokenCost.toFixed(6)),
    // contextStorageCost: Number(contextStorageCost.toFixed(6)),
    totalCost: Number(totalCost.toFixed(6)),
  }
}
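For reference, this is roughly how the function is fed from a generateContent response. The getGenerativeModel / usageMetadata calls are from @google/generative-ai; geminiPricing, ExtractorModelLevel, and the 'pro' level are my own names, shown here just to illustrate the wiring:

import { GoogleGenerativeAI } from '@google/generative-ai'

const genAI = new GoogleGenerativeAI(process.env.API_KEY_GEMINI!)
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' })

async function promptAndCost(prompt: string) {
  const result = await model.generateContent(prompt)
  // usageMetadata is optional on the response, so guard before pricing it
  const usage = result.response.usageMetadata
  if (!usage) return undefined
  // 'pro' stands in for whatever ExtractorModelLevel key my geminiPricing table uses
  return calculateGeminiCost(usage, 'pro')
}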
There is a discrepancy: Google billing reports $0.04 while my calculation comes out to $0.029. It's only about a cent of difference, but at this scale it's significant. I want to make sure I'm calculating this correctly, or at least have a very educated guess.
Also, is there a cost for writing the cache? As in repeatedly calling
const cacheManager = new GoogleAICacheManager(process.env.API_KEY_GEMINI, {})
const cache = await cacheManager.create({
  ...
})
I set the TTL to 3 minutes, and since I'm prompting against the same docs repeatedly, the cache sometimes expires in the middle, in which case I call cacheManager.create again to re-cache the docs before prompting. Or is it better to set a longer duration? Note that latency is not an issue, but cost is.
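My re-caching flow currently looks roughly like this (a simplified sketch; the model name, the docs/prompt shapes, and the helper itself are just illustrative, while GoogleAICacheManager.create, ttlSeconds, and getGenerativeModelFromCachedContent are the SDK pieces I'm using):

import { GoogleAICacheManager } from '@google/generative-ai/server'
import { GoogleGenerativeAI, type Content } from '@google/generative-ai'

const cacheManager = new GoogleAICacheManager(process.env.API_KEY_GEMINI!, {})
const genAI = new GoogleGenerativeAI(process.env.API_KEY_GEMINI!)

let cache: Awaited<ReturnType<typeof cacheManager.create>> | undefined
let cacheExpiresAt = 0 // local estimate of when the cache goes stale

async function promptAgainstDocs(docs: Content[], prompt: string) {
  // Re-create the cache only when the previous one has (probably) expired
  if (!cache || Date.now() >= cacheExpiresAt) {
    cache = await cacheManager.create({
      model: 'models/gemini-1.5-pro-001', // placeholder model name
      contents: docs,
      ttlSeconds: 180, // 3-minute TTL
    })
    cacheExpiresAt = Date.now() + 180_000
  }
  const model = genAI.getGenerativeModelFromCachedContent(cache)
  const result = await model.generateContent(prompt)
  return result.response
}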