Hi,
Here's my scenario -
I have a set of system prompts, and each one needs to be run as a separate inference against each element of a set of user prompts. The user prompts consist of large PDFs, so I want to cache those.
So if I have
System Prompt A, B, C
and
UserPrompt 1, 2, 3
then I will run inferences
SP_A + UP_1
SP_A + UP_2
SP_A + UP_3
SP_B + UP_1
SP_B + UP_2
etc.
As each user prompt has a large number of tokens, I want to cache them.
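Conceptually I'm aiming for something like the sketch below: one cache per user prompt, reused with every system prompt. (The prompt texts, bucket URIs and dict layout are just placeholders, and I've used generate_content instead of a chat to keep it short.)

system_prompts = {"A": "...", "B": "...", "C": "..."}
user_prompt_uris = {
    "1": ["gs://my-bucket/doc1.pdf"],
    "2": ["gs://my-bucket/doc2.pdf"],
    "3": ["gs://my-bucket/doc3.pdf"],
}

# Create one cache per user prompt (done once)...
caches = {}
for up_id, uris in user_prompt_uris.items():
    caches[up_id] = client.caches.create(
        model=llm_id,
        config=CreateCachedContentConfig(
            contents=[Part.from_uri(file_uri=u, mime_type="application/pdf") for u in uris],
            ttl=PROMPT_CACHE_TTL,
        ),
    )

# ...then reuse it with each system prompt.
for sp_id, system_prompt in system_prompts.items():
    for up_id, cache in caches.items():
        response = client.models.generate_content(
            model=llm_id,
            contents="Summarize the documents",
            config=GenerateContentConfig(
                system_instruction=system_prompt,  # this is the part the API rejects
                cached_content=cache.name,
            ),
        )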
However, I can't seem to cache just the user prompt. Here's what I have:
from google.genai.types import CreateCachedContentConfig, GenerateContentConfig

# Cache the large PDF user prompt so it can be reused across inferences
cached_content = client.caches.create(
    model=llm_id,
    config=CreateCachedContentConfig(
        contents=[...],  # list of Part.from_uri for the PDFs
        ttl=PROMPT_CACHE_TTL,
    ),
)
...
# Per-inference config: the system prompt plus a reference to the cached PDFs
generation_config = GenerateContentConfig(
    temperature=params.temperature,
    max_output_tokens=params.max_tokens,
    system_instruction="You are a helpful assistant.",
    cached_content=cache_entry_id,  # resource name of the cached content
)
generation_response = client.chats.create(
    model=params.llm_id,
    config=generation_config,
).send_message(message="Summarize the documents")
I get
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {
'error': {
'code': 400,
'message': 'Tool config, tools and system instruction should not be set in therequest when using cached content.',
'status': 'INVALID_ARGUMENT'
}}
(As an aside, there's a typo in the error message itself: "therequest".)
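As far as I can tell, the only place a system instruction is allowed when caching is on the cache itself (CreateCachedContentConfig appears to take a system_instruction field, though I may be wrong about that), which would mean something like this, with a separate cache per system prompt / user prompt combination:

cached_content = client.caches.create(
    model=llm_id,
    config=CreateCachedContentConfig(
        system_instruction=system_prompt_a,  # assuming this field exists; it bakes SP_A into the cache
        contents=[...],                      # the same PDFs duplicated for every system prompt
        ttl=PROMPT_CACHE_TTL,
    ),
)

That defeats the point of caching each set of PDFs once. Is there a way to keep a single cache per user prompt and vary the system instruction per request?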
Thanks
-Darren