I want to translate a long input text using the chat-bison model, but I don't get the complete result in the output. My understanding is that this happens because of the 1k-token output limit of the chat-bison model.
Is there any way to get the complete result through Vertex AI in one shot? I seem to recall that there is some native capability in the Vertex AI API / PaLM 2 model to solve a problem like this.
Yes, there are a couple of ways to get the complete translation result through Vertex AI in one shot.
Using the PaLM 2 model:
One way to get the complete translation result is to use a PaLM 2 text model. It accepts longer inputs than you are getting through with chat-bison (on the order of 8,000 input tokens), although each response is still subject to a per-request output-token limit.
To use the PaLM 2 model, you will need a Vertex AI endpoint that serves it. You can then call the endpoint's predict method to translate text.
Here is an example of how to use the PaLM 2 model to translate a long input text:
import google.cloud.aiplatform as aiplatform

def translate_long_text(input_text):
    # Look up the Vertex AI endpoint that serves the PaLM 2 model
    # (the display name "palm2-endpoint" is a placeholder)
    endpoint = aiplatform.Endpoint.list(
        filter='display_name="palm2-endpoint"',
        project="YOUR_PROJECT",
        location="us-central1",
    )[0]
    # Send the input text to the endpoint; the instance schema depends
    # on how the model was deployed
    response = endpoint.predict(instances=[{"content": input_text}])
    # Extract the translated text from the response
    translated_text = response.predictions[0]
    return translated_text

input_text = "This is a long input text that needs to be translated."
translated_text = translate_long_text(input_text)
print(translated_text)
Using chunking:
Another way to get the complete translation result is to split the input text into smaller pieces and translate each piece separately. Then you can concatenate the translated pieces together to get the complete translation.
You can also send several chunks in one request: the Cloud Translation v2 API accepts a list of texts and returns one translation per input, as in the sketch below. (For very large volumes, Vertex AI additionally offers asynchronous batch prediction via Model.batch_predict.)
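A minimal sketch of the list form, assuming a Cloud Translation API key (the key and the sample strings are placeholders):
import requests

API_KEY = "YOUR_API_KEY"  # Cloud Translation API key (placeholder)

def translate_batch(chunks, target="en"):
    # The v2 endpoint accepts a list of "q" values and returns one
    # translation per input, in the same order
    response = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": API_KEY},
        json={"q": chunks, "target": target, "format": "text"},
    )
    translations = response.json()["data"]["translations"]
    return [t["translatedText"] for t in translations]

print(translate_batch(["Bonjour le monde", "Wie geht es dir?"]))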
Here is an example of how to use chunking to translate a long input text:
import requests

API_KEY = "YOUR_API_KEY"  # Cloud Translation API key (placeholder)

def translate_chunked(input_text, chunk_size=1000):
    translated_lines = []
    for line in input_text.split("\n"):
        pieces = []
        # Break lines longer than chunk_size into pieces so that no
        # single request exceeds the size limit (in practice you would
        # cut at sentence boundaries rather than fixed offsets)
        while len(line) > chunk_size:
            pieces.append(translate(line[:chunk_size]))
            line = line[chunk_size:]
        pieces.append(translate(line))
        translated_lines.append("".join(pieces))
    return "\n".join(translated_lines)

def translate(input_text):
    # Send the input text to the Cloud Translation v2 API
    response = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": API_KEY},
        json={"q": input_text, "target": "en", "format": "text"},
    )
    # Extract the translated text from the response
    return response.json()["data"]["translations"][0]["translatedText"]

input_text = "This is a long input text that needs to be translated."
translated_text = translate_chunked(input_text)
print(translated_text)
I hope this helps!
Please help me create a Vertex AI deployment that uses the PaLM 2 model. I was going through this - https://www.googlecloudcommunity.com/gc/AI-ML/vertex-AI-model-deployment/m-p/640532 - which says this is not possible right now.
Hi,
Prerequisites:
Google Cloud Platform (GCP) Account: You'll need a GCP account with access to the Vertex AI service.
PaLM 2 Model Access: Ensure you have access to PaLM 2 model artifacts, either a pre-trained model or one you have fine-tuned yourself.
Steps:
Create a GCP Project and Enable Vertex AI: Set up a project in the GCP console and enable the Vertex AI API. This will provide the necessary infrastructure for your deployment.
Upload the PaLM 2 Model: Upload the PaLM 2 model to the Vertex AI Model Registry. This can be done using the Vertex AI UI or the Vertex AI SDK (a minimal SDK sketch follows this list).
Create a Vertex AI Endpoint: Create a Vertex AI endpoint to serve the PaLM 2 model. This endpoint will expose the model's capabilities to your applications.
Configure Endpoint Deployment: Configure the endpoint deployment settings, including the model version, machine type, and access control.
Deploy the Endpoint: Deploy the endpoint to make it available for use. This will make the PaLM 2 model accessible through the endpoint.
Test the Deployment: Test the deployment by sending requests to the endpoint. Use the Vertex AI UI or the Vertex AI SDK to interact with the deployed model.
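If you prefer the SDK to the console, here is a minimal sketch of steps 2 through 5. The bucket path, serving container image, display names, and machine type are all placeholders; this flow assumes you have model artifacts you are licensed to serve, since Google does not distribute PaLM 2 weights:
from google.cloud import aiplatform

# Initialize the SDK (project and region are placeholders)
aiplatform.init(project="YOUR_PROJECT", location="us-central1")

# Step 2: upload the model artifacts to the Vertex AI Model Registry
model = aiplatform.Model.upload(
    display_name="palm2-model",
    artifact_uri="gs://YOUR_BUCKET/palm2-model/",
    serving_container_image_uri="YOUR_SERVING_IMAGE_URI",
)

# Steps 3-5: create an endpoint and deploy the uploaded model to it
endpoint = aiplatform.Endpoint.create(display_name="palm2-endpoint")
model.deploy(endpoint=endpoint, machine_type="n1-standard-4")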
Here's an example of how to deploy the PaLM 2 model using the Vertex AI UI:
Navigate to Vertex AI: In the GCP console, go to the "AI & Machine Learning" section and select "Vertex AI."
Import the Model: Open "Model Registry" and click "Import." Point the import at the PaLM 2 model artifacts and their serving container.
Deploy the Model: From the model's "Deploy & Test" tab, select "Deploy to endpoint." Create a new endpoint or reuse an existing one, then configure the machine type, traffic split, and access settings.
Test the Deployment: Use the test form on the same tab to send a sample request to the endpoint. Verify that the model returns the expected response.
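To test from code instead of the console, here is a minimal sketch (the endpoint ID is a placeholder, and the expected instance schema depends on your serving container):
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT", location="us-central1")

# Reference the deployed endpoint by its numeric ID (placeholder)
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

# Send one test instance and inspect the raw predictions
response = endpoint.predict(instances=[{"content": "Hello, world"}])
print(response.predictions)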
Once the deployment is successful, you can integrate the endpoint into your applications to utilize the PaLM 2 model's capabilities.
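Note that, as the community thread linked in the question suggests, the PaLM 2 foundation models themselves are not currently available for self-deployment; Google hosts them, and applications call them through the Vertex AI SDK with no deployment step. A minimal sketch of that managed path (project, region, and the prompt are placeholders):
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="YOUR_PROJECT", location="us-central1")

# Load the Google-hosted PaLM 2 text model; nothing is deployed here
model = TextGenerationModel.from_pretrained("text-bison")

response = model.predict(
    "Translate the following text to French:\n\nThis is a long input text.",
    max_output_tokens=1024,  # per-request output cap
)
print(response.text)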