If I use Context Caching to cache a million tokens, then run n small prompts that reference those cached tokens, am I still limited to 2 million tokens per minute?
For example, if I cache a million tokens, make 3 prompts of 500 tokens each that reference the cache, and then execute all 3 in parallel, does the 3rd prompt get rate limited?
I assume that is how it would work if I _was not_ using the cache; I'm curious whether the cache changes anything.
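A minimal sketch of the token arithmetic behind the question. It hypothetically assumes that each request's full prompt (cached tokens plus new tokens) counts toward a 2,000,000 tokens-per-minute budget exactly as an uncached prompt would. Whether cached tokens are counted this way is the open question here, and the names `TPM_LIMIT` and `cumulative_usage` are illustrative only.

```python
# Hypothetical assumption: every request counts its full prompt size
# (cached tokens + new tokens) toward a per-minute token budget.
TPM_LIMIT = 2_000_000      # tokens-per-minute limit mentioned in the question
CACHED_TOKENS = 1_000_000  # size of the cached context
PROMPT_TOKENS = 500        # new tokens in each small prompt
N_REQUESTS = 3             # number of parallel requests in the example


def cumulative_usage(n, cached, prompt):
    """Return the running token total after each of n requests within one minute."""
    totals = []
    running = 0
    for _ in range(n):
        running += cached + prompt
        totals.append(running)
    return totals


for i, total in enumerate(
    cumulative_usage(N_REQUESTS, CACHED_TOKENS, PROMPT_TOKENS), start=1
):
    status = "over limit" if total > TPM_LIMIT else "within limit"
    print(f"request {i}: {total:,} tokens ({status})")
```

Under that assumption the running total already passes 2,000,000 on the second request (2,001,000 tokens), so the limit would be reached before the third request is sent.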