
Empowering DFCX Chatbots with BigQuery Analytics

aniketagrawal

In our previous blog posts (1, 2, 3 and 4), we explored how Agent Builder can intelligently answer questions from BigQuery and unstructured data stores via chat or search bots. We demonstrated how it can easily handle queries like “Find me a mall in a city” or “Discuss tourism in India”.


However, things get trickier when we need to address analytical queries that involve aggregations, comparisons, and more complex conditions. Queries like “How many malls have an average rating above 4.0 in Delhi?” necessitate a more sophisticated solution beyond simple keyword matching.

To tackle this challenge, we’ll combine a webhook, NL2SQL (Natural Language to SQL), and BigQuery, all while using Cloud Run Functions or NGROK to streamline our development process.

The challenge: Handling analytical queries

As mentioned already, while Dialogflow CX excels at understanding and responding to informational queries, analytical queries present a unique challenge.

For example, consider a user asking, “Which city has the maximum number of malls listed?” This requires understanding concepts like COUNT, MAX, MIN, and comparisons like "less than" or "more than," which go beyond simple LIKE or = operators.

They often involve:

  • Aggregate functions: COUNT(*), MAX(), MIN(), AVG(), SUM()
  • Comparative operators: >, <, >=, <=, !=
  • Grouping and filtering: GROUP BY, HAVING, WHERE clauses beyond simple equality checks.

Basic, traditional Dialogflow CX setups struggle to translate these natural language analytical queries directly into SQL. To address this, we introduce a system that intelligently handles the complexity of analytical queries, providing robust and accurate answers. This is where NL2SQL, a generator, and a webhook come into play.

Here’s the breakdown of our solution:

Since this is a beginner-friendly scenario, we’ll start with the basic building blocks of the intent-matching system.

  1. Dialogflow CX intent recognition: We start by training Dialogflow CX to recognize user intents that indicate an analytical query. This involves adding relevant training phrases, parameters, etc. Dialogflow CX’s machine learning expands on your examples, so you don’t need to list every possibility.

When Dialogflow CX detects such an intent, it transitions to a page designed to handle analytical queries. This page triggers a webhook call.

Intent name: analytical_query

General training phrases:

[Screenshot: sample training phrases for the analytical_query intent]

Parameters:

  • city: (e.g., Mumbai, Delhi, Bengaluru, Ahmedabad)
  • place: (e.g., mall, temple, museum)
  • rating: Any number (e.g., 4.5, 4.0, 3.5)
  • @sys.comparison-operator: (e.g., greater than, less than, equal to)

Important notes:

  • Annotation is key: Make sure to annotate your training phrases with the appropriate parameters. This is crucial for Dialogflow CX to understand the user’s intent and extract the correct information.
  • Entity creation: You might need to create custom entities in Dialogflow CX for things like city and place if they are not already covered by system entities.
  • Regular testing: Test your intent regularly with a variety of phrases to ensure it is performing as expected.

By using these training phrases and parameters, your Dialogflow CX agent will be better equipped to handle a wide range of analytical queries related to your dataset.
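When the intent matches and the webhook fires, Dialogflow CX delivers these parameters under `sessionInfo.parameters` in the request body. A minimal sketch of pulling them out (the payload below is a trimmed, illustrative example, not a full Dialogflow CX request):

```python
# Trimmed, illustrative Dialogflow CX webhook request body.
sample_request = {
    "sessionInfo": {
        "parameters": {
            "city": "Delhi",
            "place": "mall",
            "rating": 4.0,
            "comparison-operator": "greater than",
        }
    }
}

def extract_params(req: dict) -> dict:
    """Pull the analytical-query parameters out of a webhook request."""
    params = req.get("sessionInfo", {}).get("parameters", {})
    return {
        "city": params.get("city"),
        "place": params.get("place"),
        "rating": params.get("rating"),
        "operator": params.get("comparison-operator"),
    }

print(extract_params(sample_request))
# -> {'city': 'Delhi', 'place': 'mall', 'rating': 4.0, 'operator': 'greater than'}
```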

2. NL2SQL translation with Code-Bison:

  • Our DFCX generator utilizes Code-Bison (you can substitute with other NL2SQL libraries or services). Code-Bison receives the user’s natural language query and translates it into the equivalent SQL statement using the following prompt:
    You are a Google BigQuery SQL expert. Given an input question, first create a syntactically correct GoogleSQL BigQuery compatible SQL query to run.
    Only use the following description and insert them in the final query:
    project_id = ''
    dataset_id = '.'
    table_id = '..'
    
    The table has two columns: 'string_field_0' specifying the place and 'string_field_1' containing its description.
    Question: {$last-user-utterance}
    Example:
    '''
    SELECT
    string_field_0 AS place,
    string_field_1 AS description
    FROM
    '..'
    WHERE
    LOWER(string_field_0) LIKE '%mall%'
    '''
    
    • For example, “What’s the average rating of malls in Mumbai?” might translate to:
SELECT AVG(Av_Rating) FROM `your-project.your-dataset.your-table` WHERE City = 'Mumbai' AND LOWER(Place) LIKE '%mall%';
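Note that Code-Bison usually wraps the generated statement in a markdown code fence, which is why the webhook code later in this post strips it with a chain of replace calls. A slightly safer variant (a sketch; the regex approach is my suggestion, since replacing the bare substring "sql" could also alter the query text):

```python
import re

def clean_generated_sql(raw: str) -> str:
    """Strip a markdown code fence (with an optional 'sql' language tag)
    from model output and collapse it to a single line."""
    cleaned = re.sub(r"```(?:sql)?", "", raw, flags=re.IGNORECASE)
    # Collapse whitespace and newlines so the query logs on one line.
    return " ".join(cleaned.split())

# Typical raw model output, fence included.
raw = "```sql\nSELECT AVG(Av_Rating)\nFROM `your-project.your-dataset.your-table`\nWHERE City = 'Mumbai'\n```"
print(clean_generated_sql(raw))
# -> SELECT AVG(Av_Rating) FROM `your-project.your-dataset.your-table` WHERE City = 'Mumbai'
```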

3. Python Flask webhook:

  • A Python Flask application serves as our webhook, acting as the bridge between Dialogflow CX and BigQuery.
  • This webhook receives the generated SQL query from ‘generator’, executes it against BigQuery, and formats the results for delivery back to Dialogflow CX.

4. Cloud Run Functions (CRF) and NGROK for network GROKking:

  • While developing and testing the webhook, NGROK can expose your local Flask application to the internet so that Dialogflow CX (which resides in Google Cloud) can communicate with it; alternatively, CRF can host the webhook directly on Google Cloud.
  • This avoids deploying the webhook to a production server during the development phase and simplifies testing and debugging. Remember to replace the webhook URL with your actual deployment URL once your application is live.


5. BigQuery query execution:

  • The Flask webhook uses the Google Cloud BigQuery client library to connect to your BigQuery database and execute the received SQL query.

6. Response formulation:

  • The webhook processes the BigQuery results, converts them into a user-friendly format (e.g., JSON), and sends the response back to Dialogflow CX.
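Concretely, Dialogflow CX expects the reply text nested under fulfillment_response.messages. A minimal formatter, shown here with dummy rows standing in for real BigQuery results:

```python
def build_fulfillment_response(rows) -> dict:
    """Format query rows as a numbered list inside a
    Dialogflow CX fulfillment response."""
    output_str = "".join(f"{i + 1}. {row}\n\n" for i, row in enumerate(rows))
    return {
        "fulfillment_response": {
            "messages": [{"text": {"text": [output_str]}}]
        }
    }

# Dummy rows standing in for a BigQuery RowIterator.
print(build_fulfillment_response(["Delhi", "Mumbai"]))
```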

BigQuery table structure

Here’s a glimpse of the BigQuery table structure we’re working with:

[Screenshot: BigQuery table structure]

Example scenario

Let’s say a user asks, “Which city has the highest average rating?”

  • Dialogflow CX recognizes the analytical intent and triggers the webhook.
  • Generator (Code-Bison) translates this query into: SELECT City FROM `your-table` GROUP BY City ORDER BY AVG(Av_Rating) DESC LIMIT 1
  • The Flask webhook executes this query in BigQuery.
  • BigQuery client returns the city with the highest average rating to the Webhook.
  • Dialogflow CX responds to the user with the city name.

Benefits of this approach

  • Enhanced user experience: Users can interact with your chatbot using natural language for analytical queries.
  • Powerful analytics: Leverage the full power of BigQuery for complex data analysis.
  • Simplified development: NGROK streamlines webhook development and testing.
  • Scalability: BigQuery handles large datasets efficiently, ensuring your chatbot can scale as your data grows.

This powerful combination of Dialogflow CX, NL2SQL, BigQuery, and NGROK unlocks a new level of analytical capabilities for your chatbot, providing valuable insights to your users in a seamless and conversational way. Stay tuned for future blog posts where we’ll delve deeper into implementation details and advanced techniques!

Chatbot results

The results are shown in the form of GIFs and screenshots:

1. Preliminary results:

[Screenshots: preliminary results]

2. A sample error-message response when an invalid SQL query is generated, with troubleshooting on the right ('list' and 'how many?'):

[Screenshots: error response and troubleshooting]

3. A sample 'what' question and all the detailed results:

[Screenshots: a sample 'what' question and its detailed results]

4. Webhook Configuration:

4(a). Configuration of 'SQL_Webhook' in CRF.

[Screenshot: 'SQL_Webhook' configuration in CRF]

4(b). NGROK Configuration and Troubleshooting when encountering ERROR 50:

[Screenshot: NGROK configuration and ERROR 50 troubleshooting]

5. Generator and prompt configuration:

[Screenshot: generator and prompt configuration]

Ngrok and Cloud Run functions

Ngrok and Cloud Run functions are both tools that can be used to connect and extend cloud services, but they have different purposes:

Ngrok: A programmable network edge that adds connectivity, security, and observability to apps without requiring code changes. You can create a secure tunnel for free with an ngrok account.

Following is the Ngrok code and configuration used:

import logging
import os

from flask import Flask, request, jsonify
from google.cloud import bigquery

# Initialize Flask app
app = Flask(__name__)

# Set up logging (optional)
logging.basicConfig(level=logging.INFO)

@app.route('/', methods=['GET', 'POST'])
def home():
    """Health check endpoint."""
    return 'OK', 200

@app.route('/dialogflow', methods=['POST'])  # Only accepts POST requests
def dialogflow_webhook():
    """
    Webhook for Dialogflow CX to execute SQL queries on BigQuery.
    """
    try:
        req = request.get_json()

        # Extract the generated SQL query from the request
        message = req['sessionInfo']['parameters']['$request.generative.code']
        query = message.replace("sql", "").replace("\n", " ").replace("```", "").strip()
        logging.info(f"Received SQL query: {query}")  # Log the query

        # Execute the query on BigQuery
        client = bigquery.Client(project='aniket-tdcx', location="US")
        query_job = client.query(query)
        results = query_job.result()  # Waits for the query to finish

        # Format results for Dialogflow CX (customize as needed)
        output_str = ""
        for i, row in enumerate(results):
            output_str += f"{i+1}. {str(row)}\n\n"  # Add numbering and format the row
            print(row)

        # Construct the response
        response = {
            "fulfillment_response": {
                "messages": [{"text": {"text": [output_str]}}]
            }
        }
        logging.info(f"Sending response: {response}")  # Log the response
        return jsonify(response)

    except Exception as e:
        logging.error(f"An error occurred: {e}")

        # Construct an error response
        response = {
            "fulfillment_response": {
                "messages": [{"text": {"text": ["Error executing query."]}}]
            }
        }
        return jsonify(response), 500

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Cloud Run functions (CRF): A tool that allows you to write code to connect and extend cloud services. You can use Cloud Run functions to:

  • Respond to events like file uploads, log changes, or incoming messages
  • Connect Google Cloud products to one another or third-party services
  • Run functions across multiple environments, including Cloud Run, local development environments, and on-premises

Here are some other features of CRF:

  • Simplified developer experience: You can write code and let Google Cloud handle the operational infrastructure.
  • Pay only for what you use: You’re only billed for your function’s execution time, and you pay nothing when your function is idle.
  • Seamless authentication: Cloud Run functions have access to service account credentials and are authenticated with most Google Cloud services.

Following is the CRF code and configuration used:

import functions_framework
from google.cloud import bigquery

@functions_framework.http
def hello_http(request):
    """HTTP Cloud Function.
    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
    """
    try:
        req = request.get_json()

        # Extract the generated SQL query from the request
        message = req['sessionInfo']['parameters']['$request.generative.code']
        query = str(message).replace("sql", "").replace("\n", " ").replace("```", "").strip()
        print(f"Received SQL query: {query}")

        # Execute the query on BigQuery
        client = bigquery.Client(project='aniket-tdcx', location="US")
        result = client.query(query)
        rows = result.result()  # Waits for the query to finish

        # Format results for Dialogflow CX
        output_str = ""
        for i, row in enumerate(rows):
            output_str += f"{i+1}. {str(row)}\n\n"  # Add numbering and format the row
            print(row)

        # Construct the response
        response = {
            "fulfillment_response": {
                "messages": [{"text": {"text": [output_str]}}]
            }
        }
        return response

    except Exception as e:
        # Handle errors gracefully
        error_message = f"An error occurred while executing the query, retry!: {e}"
        print(error_message)
        return {"fulfillment_response": {"messages": [{"text": {"text": [error_message]}}]}}

Use cases

Generator + webhook architecture:

  • Trigger a mail to an agent: You can use a generator to summarize the conversation ($conversation) and trigger a mail via webhook under sys.no-match when the agent cannot answer. Alternatively, you can fall back to the internet via the Google Search API.
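A hedged sketch of that escalation mail using only the standard library (the addresses, subject line, and SMTP host below are placeholder assumptions; in the webhook you would pass in the generator's $conversation summary):

```python
import smtplib
from email.message import EmailMessage

def build_escalation_email(summary: str) -> EmailMessage:
    """Package a generator-produced conversation summary as a mail
    for a human agent. Addresses are placeholder assumptions."""
    msg = EmailMessage()
    msg["Subject"] = "Chatbot escalation: unanswered query"
    msg["From"] = "bot@example.com"
    msg["To"] = "agent@example.com"
    msg.set_content(f"Conversation summary:\n\n{summary}")
    return msg

def send_escalation(msg: EmailMessage, host: str = "smtp.example.com") -> None:
    """Send the mail; requires a reachable SMTP server."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)

# Build (but do not send) a sample escalation mail.
msg = build_escalation_email("User asked about refunds; no intent matched.")
print(msg["Subject"])
```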

NL2SQL utility:

  • Real-time sales analytics: A sales executive asks, “What was the total revenue from the Western region in the past quarter?”
  • Proactive inventory management: A warehouse manager asks, “Which products have less than 20 units remaining in stock?”
  • Data-driven customer support: A support agent asks, “Show me all open tickets from high priority customers created in the last 24 hours.”

Challenges and best practices

NL2SQL is an active area of research. Challenges arise with highly complex natural language queries or intricate database structures. Mitigate this by:

  • Starting with a well-defined scope of supported analytical query types.
  • Training your NL2SQL model on your dataset and common question patterns.
  • Iterative refinement: Continuously monitor and evaluate how your NL2SQL model translates user queries. Use this feedback to improve your model and Dialogflow CX generator prompt.
  • Clear user guidance: Set expectations by providing examples of supported query types.
  • Error handling: Implement graceful error handling in your webhook for cases when query translation fails. Offer the user alternative ways to get the needed information.
  • Visualizations: Consider enhancing the user experience by generating simple charts or graphs based on the BigQuery results when appropriate.
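As one concrete piece of that error handling, the webhook can refuse to run anything except a plain SELECT, so a mistranslated query can never modify the table. A minimal allow-list check (a sketch; the keyword list is illustrative, not exhaustive):

```python
import re

# Statements the webhook should never forward to BigQuery.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|MERGE|CREATE)\b",
    re.IGNORECASE,
)

def is_safe_select(query: str) -> bool:
    """Allow only read-only SELECT statements through to BigQuery."""
    stripped = query.strip()
    return stripped.upper().startswith("SELECT") and not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT City FROM `t` LIMIT 1"))  # True
print(is_safe_select("DROP TABLE `t`"))                # False
```

In the webhook, such a check would run right after the fence-stripping step, returning the graceful error message when it fails.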

Conclusion

By integrating Dialogflow CX, Code-Bison, BigQuery, and a Python Flask webhook (using NGROK during development), we’ve created a robust system capable of handling complex analytical queries. This significantly enhances the capabilities of Dialogflow CX, enabling it to provide insightful data-driven responses to users. This solution moves beyond simple question-answering, opening doors to sophisticated conversational analytics within your applications. Remember to adapt this architecture and code to your specific needs and chosen NL2SQL library.

In upcoming blogs, we’ll handle analytical queries using NL2SQL Automated Prompt Design (APD) approaches, an active research area.


Let’s keep the conversation going! Share your thoughts, questions, and ideas in the comments.

Note: Should you have any concerns or queries about this post or my implementation, please feel free to connect with me on LinkedIn! Thanks!
