Using ML.generate_text with tabular data

Hi, 

I am trying to use ML.GENERATE_TEXT to summarize a data table hosted on BigQuery.

Basically, the data table has the following columns :

>labels: total_visitors, total_visits, ...

>countries: nigeria, france, ...

>weekly_delta: numerical weekly variation for every (label, country) pair

I would like an ML.GENERATE_TEXT call (with Gemini Pro as the remote model) to summarize the weekly performance by analyzing the data table.

But I did not find any easy and efficient way to do it with BigQuery ML.

Does BigQuery ML support a whole data table as input?

Do I have to switch to another, more appropriate GCP/AI product?

Thanks


Hi @NotoriousRom,

Welcome to Google Cloud Community!

BigQuery ML's ML.GENERATE_TEXT function isn't designed to take an entire data table as a single input. It generates text from a prompt, and it is evaluated once per input row (each row supplies its own prompt column), rather than analyzing and summarizing structured data as a whole.

Here's how you can approach this problem, along with some potential solutions:

1. Data Preparation:

  • Aggregation: Before using the text generation model, aggregate the weekly delta values. This could involve calculating averages, sums, or medians for each (label, country) pair.
  • Formatting for Text Generation: Organize the aggregated data into an understandable format. For example:
    • Tables: Create tables that show the average weekly delta for each label and country.
    • Textual Summaries: Write descriptive sentences summarizing trends for each label across different countries.
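
As a minimal sketch of the aggregation and formatting steps above, the following Python snippet averages the weekly delta per (label, country) pair and renders the result as a compact text table. The column names (label, country, weekly_delta), the helper names, and the sample rows are assumptions made up for illustration; they are not taken from the original table.

```python
from collections import defaultdict
from statistics import mean


def aggregate_deltas(rows):
    """Average weekly_delta for each (label, country) pair."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["label"], row["country"])].append(row["weekly_delta"])
    # Sort the keys so the output order is stable.
    return {key: mean(values) for key, values in sorted(grouped.items())}


def format_for_prompt(averages):
    """Render one 'label | country | delta' line per pair."""
    return "\n".join(
        f"{label} | {country} | {avg:+.1f}"
        for (label, country), avg in averages.items()
    )


# Made-up sample rows, for illustration only.
rows = [
    {"label": "total_visitors", "country": "france", "weekly_delta": 4.0},
    {"label": "total_visitors", "country": "france", "weekly_delta": 6.0},
    {"label": "total_visits", "country": "nigeria", "weekly_delta": -2.5},
]

table_text = format_for_prompt(aggregate_deltas(rows))
print(table_text)
```

The resulting text block is small enough to be pasted into a prompt; the same aggregation could equally be done in SQL with GROUP BY before the data leaves BigQuery.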

2. Using BigQuery ML with the ML.GENERATE_TEXT Function:

  • Function Limitations: While ML.GENERATE_TEXT can't summarize a data table directly, you can use it to generate text from the aggregated data.
  • Approach: Put the aggregated data into a structured, table-like text block and pass it together with a prompt like: "Summarize the weekly performance of visitors in different countries using the provided data."
  • Manual Input: You'll need to build the prompt yourself and include the data in it; the function does not read the rest of the table on its own.
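
One way to do this without leaving BigQuery is to collapse the aggregated rows into a single prompt string with STRING_AGG and pass that one row to ML.GENERATE_TEXT. The Python helper below only builds the SQL text; it is a sketch, not a tested pipeline. The function name, the model and table paths, the column names (label, country, weekly_delta), and the parameter values are all assumptions for illustration.

```python
def build_summary_query(model, table):
    """Return BigQuery SQL that summarizes `table` with a single prompt.

    `model` is the path of a remote model and `table` the path of the
    source table; both are hypothetical names supplied by the caller.
    """
    return f"""
SELECT ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (
    -- Collapse every (label, country) row into one prompt string.
    SELECT CONCAT(
      'Summarize the weekly performance of visitors in different countries ',
      'using the provided data (label | country | average weekly delta):\\n',
      STRING_AGG(
        FORMAT('%s | %s | %.2f', label, country, avg_delta),
        '\\n' ORDER BY label, country
      )
    ) AS prompt
    FROM (
      SELECT label, country, AVG(weekly_delta) AS avg_delta
      FROM `{table}`
      GROUP BY label, country
    )
  ),
  STRUCT(0.2 AS temperature, 1024 AS max_output_tokens,
         TRUE AS flatten_json_output)
)
""".strip()


# Hypothetical resource names, for illustration only.
sql = build_summary_query(
    "my_project.my_dataset.gemini_model",
    "my_project.my_dataset.weekly_stats",
)
print(sql)
```

The printed statement can be pasted into the BigQuery console or submitted through a client library. Because the inner query returns exactly one row, the model is called once with the whole aggregated table in the prompt; for very large tables the prompt may exceed the model's input limit, so aggregating or filtering first remains important.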

3. Exploring Alternative GCP/AI Products:

  • Vertex AI: This platform offers advanced machine learning capabilities, including:
    • Pre-trained Models: Use pre-trained text generation models such as Gemini or flan-t5-xl (both from Google), which can handle summarization tasks. Call or deploy these models through Vertex AI for inference.
    • Custom Models: Train custom text summarization models using Vertex AI's training tools if you need more precise control.

  • Dataflow: Use Dataflow for batch data processing to convert your BigQuery data into a format suitable for text generation models.
  • Cloud Functions: Deploy functions that interact with other Google Cloud Platform services. For instance, you can create a Cloud Function that triggers text generation from a Vertex AI model once the data has been processed.

To summarize, ML.GENERATE_TEXT is not ideal for summarizing a data table directly. Combining it with the data preparation steps above, or with other Google Cloud services such as Vertex AI, can significantly improve the quality of the generated text.

I hope the above information is helpful.