
Tokenize function/endpoint for PaLM 2 models

Hi all,

I am trying to use the PaLM 2 models with the Vertex AI Python SDK. It works as expected with the text-bison model via the predict method.

I am looking for a method to tokenize the text before calling the predict method. This is useful for truncating the input when it exceeds the model's context size. Other LLM services, such as OpenAI and Cohere, provide tokenizers for this purpose.

Is there a way to do this with the Python SDK, or some other Pythonic approach? I could not find one.

Thanks.


Exact same problem here; I am also looking for a solution.

Hi @aliacar,

Welcome and thank you for reaching out to our community.

I understand that you are trying to tokenize your input text, and we appreciate your eagerness to learn. You are right that you could not find any reference to this: tokenization is not currently offered by the Vertex AI Python SDK for the PaLM 2 models.

You can, however, submit this as a feature request through our Issue Tracker for product feature requests.
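In the meantime, a rough workaround is to truncate by an estimated token budget. This sketch assumes an average of about 4 characters per token for English text; the heuristic, the function name, and the default ratio are all illustrative assumptions, not the model's real tokenizer, so leave some headroom below the actual context limit:

```python
def truncate_by_estimated_tokens(text: str, max_tokens: int,
                                 chars_per_token: int = 4) -> str:
    """Truncate text so its estimated token count stays within max_tokens.

    ASSUMPTION: ~4 characters per token on average; this is only a rough
    heuristic and can under- or over-estimate the true token count.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Cut at the last whitespace before the limit to avoid splitting a word.
    cut = text.rfind(" ", 0, max_chars)
    return text[:cut] if cut > 0 else text[:max_chars]
```

You would then pass the truncated string to the predict method as usual. It is conservative by design: accurate token counts would require the model's own tokenizer.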

Hope this helps.

Hey @lsolatorio thanks for your answer.