
LLM Evaluation

Hi All,

How can we validate the outputs produced by a playbook?

I created a playbook with tools in Agent Builder, and I would like to test it and evaluate metrics such as accuracy, relevance, and response quality, along with other metrics.

I know we can do human validation. Is there any other tool, service, or best/standard practice?

Thanks,


Hi Rajavelu,

Welcome to Google Cloud Community!

You may try example-based validation: provide various example user inputs and expected responses within the playbook to train the agent on different scenarios. The system can then compare generated responses against these examples to assess accuracy.
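As a rough illustration of that comparison step, here is a minimal sketch. `generate_response` is a hypothetical stand-in for however you invoke your deployed playbook (it is not a real API), and the lexical similarity metric is only a placeholder — a semantic (embedding-based) metric would be more robust:

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for invoking the deployed playbook;
# replace with your actual agent call.
def generate_response(user_input: str) -> str:
    canned = {
        "What are your hours?": "We are open 9am-5pm, Monday to Friday.",
    }
    return canned.get(user_input, "")

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; swap in an
    embedding-based metric for semantic comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Example inputs paired with expected responses.
examples = [
    ("What are your hours?", "We are open 9am-5pm, Monday to Friday."),
]

for user_input, expected in examples:
    actual = generate_response(user_input)
    score = similarity(actual, expected)
    print(f"{user_input!r}: score={score:.2f} "
          f"({'PASS' if score >= 0.8 else 'FAIL'})")
```

The 0.8 threshold is arbitrary; tune it against responses you have already judged by hand.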

Another option is to use the Test Cases functionality. In the Test Cases section, you can create scenarios with specific user inputs and expected outcomes, letting you test the playbook under a variety of conditions.
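If you want to drive a similar check from your own scripts, a small harness like the following can complement the console's Test Cases. This is only a sketch: `run_playbook` is a hypothetical placeholder for your agent invocation, and keyword matching is a deliberately simple pass/fail criterion:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    user_input: str
    expected_keywords: list  # keywords the response must contain

# Hypothetical placeholder for the agent invocation;
# replace with your real playbook call.
def run_playbook(user_input: str) -> str:
    return "Your order #123 has shipped and will arrive Friday."

cases = [
    TestCase("order status", "Where is my order?", ["shipped"]),
]

def run_suite(cases):
    """Run each case and record whether every expected keyword
    appears (case-insensitively) in the response."""
    results = []
    for case in cases:
        response = run_playbook(case.user_input)
        passed = all(kw.lower() in response.lower()
                     for kw in case.expected_keywords)
        results.append((case.name, passed))
    return results

for name, passed in run_suite(cases):
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Keeping such a suite in version control lets you re-run it after every playbook change as a regression check.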

Also be mindful that:

When your playbook relies on a tool to inform its response, an empty tool result can lead to unpredictable playbook behavior. Sometimes, the playbook AI Generator will hallucinate information in a response in lieu of a tool result. To prevent this, you can add specific instructions to ensure the playbook AI Generator doesn't attempt to answer on its own.
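The same defensive idea can be applied in code if you orchestrate tool calls yourself. This is a sketch under assumptions: `call_tool` is a hypothetical wrapper around your tool invocation, and the fallback text is illustrative:

```python
FALLBACK = ("I couldn't retrieve that information right now. "
            "Please try again later.")

# Hypothetical wrapper around the playbook's tool call;
# returns an empty dict here to simulate an empty tool result.
def call_tool(query: str) -> dict:
    return {}

def answer(query: str) -> str:
    result = call_tool(query)
    if not result:
        # Return a deterministic fallback instead of letting the
        # generator answer without any grounding data.
        return FALLBACK
    return f"Based on our records: {result}"

print(answer("account balance"))
```

In a playbook itself, the equivalent is an explicit instruction telling the generator to say it cannot answer when the tool returns nothing, rather than improvising.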

For more guidelines on building robust agents, you may refer to the playbook best practices documentation.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.