Based on the pricing table here: https://ai.google.dev/gemini-api/docs/pricing#gemini-1.5-pro
Is this the right logic to calculate the cost based on the usage metadata?
export function calculateGeminiCost(
  usage: UsageMetadata,
  // storageDurationSeconds: number,
  modelLevel: ExtractorModelLevel,
): {
  inputTokenCost: number
  outputTokenCost: number
  cachedTokenCost: number
  // contextStorageCost: number
  totalCost: number
} {
  const model = geminiPricing[modelLevel]

  // Safely handle undefined values with defaults
  const cached = usage.cachedContentTokenCount ?? 0
  const promptTokenCount = usage.promptTokenCount ?? 0
  const candidatesTokenCount = usage.candidatesTokenCount ?? 0
  // const storageSeconds = storageDurationSeconds ?? 0 // Assumed new field in UsageMetadata

  // Calculate uncached tokens and storage duration in hours
  const uncachedTokens = Math.max(0, promptTokenCount - cached)
  // const storageHours = storageSeconds / 3_600 // Convert seconds to hours

  // Determine pricing tier based on uncached tokens
  const priceTier = uncachedTokens >= model.contextWindow ? 'high' : 'low'

  // Calculate costs
  const inputTokenCost = (uncachedTokens / 1_000_000) * model.input[priceTier]
  const outputTokenCost =
    (candidatesTokenCount / 1_000_000) * model.output[priceTier]
  const cachedTokenCost = (cached / 1_000_000) * model.cached[priceTier]
  // const contextStorageCost = (cached / 1_000_000) * model.storage * storageHours
  const totalCost = inputTokenCost + outputTokenCost + cachedTokenCost

  return {
    inputTokenCost: Number(inputTokenCost.toFixed(6)),
    outputTokenCost: Number(outputTokenCost.toFixed(6)),
    cachedTokenCost: Number(cachedTokenCost.toFixed(6)),
    // contextStorageCost: Number(contextStorageCost.toFixed(6)),
    totalCost: Number(totalCost.toFixed(6)),
  }
}
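For reference, this is roughly how the function is fed from a generateContent response. The getGenerativeModel / usageMetadata calls are from @google/generative-ai; geminiPricing, ExtractorModelLevel, and the 'pro' level are my own names, shown here just to illustrate the wiring:

import { GoogleGenerativeAI } from '@google/generative-ai'

const genAI = new GoogleGenerativeAI(process.env.API_KEY_GEMINI!)
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' })

async function promptAndCost(prompt: string) {
  const result = await model.generateContent(prompt)
  // usageMetadata is optional on the response, so guard before pricing it
  const usage = result.response.usageMetadata
  if (!usage) return undefined
  // 'pro' stands in for whatever ExtractorModelLevel key my geminiPricing table uses
  return calculateGeminiCost(usage, 'pro')
}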
There is a discrepancy: Google billing reports $0.04 while my calculation comes out to $0.029. It's only about a cent of difference, but at this scale it's significant. I want to make sure I'm calculating this correctly, or at least have a very educated guess.
Also, is there a cost for writing the cache? As in repeatedly calling
const cacheManager = new GoogleAICacheManager(process.env.API_KEY_GEMINI, {})
const cache = await cacheManager.create({
  ...
})
I set the TTL to 3 minutes, and since I'm prompting against the same docs repeatedly, the cache sometimes expires in the middle, in which case I call cacheManager.create again to re-cache the docs before prompting. Or is it better to set a longer duration? Note that latency is not an issue, but cost is.
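My re-caching flow currently looks roughly like this (a simplified sketch; the model name, the docs/prompt shapes, and the helper itself are just illustrative, while GoogleAICacheManager.create, ttlSeconds, and getGenerativeModelFromCachedContent are the SDK pieces I'm using):

import { GoogleAICacheManager } from '@google/generative-ai/server'
import { GoogleGenerativeAI, type Content } from '@google/generative-ai'

const cacheManager = new GoogleAICacheManager(process.env.API_KEY_GEMINI!, {})
const genAI = new GoogleGenerativeAI(process.env.API_KEY_GEMINI!)

let cache: Awaited<ReturnType<typeof cacheManager.create>> | undefined
let cacheExpiresAt = 0 // local estimate of when the cache goes stale

async function promptAgainstDocs(docs: Content[], prompt: string) {
  // Re-create the cache only when the previous one has (probably) expired
  if (!cache || Date.now() >= cacheExpiresAt) {
    cache = await cacheManager.create({
      model: 'models/gemini-1.5-pro-001', // placeholder model name
      contents: docs,
      ttlSeconds: 180, // 3-minute TTL
    })
    cacheExpiresAt = Date.now() + 180_000
  }
  const model = genAI.getGenerativeModelFromCachedContent(cache)
  const result = await model.generateContent(prompt)
  return result.response
}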