In our previous blog posts (1, 2, 3 and 4), we explored how Agent Builder can intelligently answer questions from BigQuery and unstructured data stores via chat or search bot. We demonstrated how it can easily handle queries like “Find me a mall in a city” or “Discuss tourism in India”.
However, things get trickier when we need to address analytical queries that involve aggregations, comparisons, and more complex conditions. Queries like “How many malls have an average rating above 4.0 in Delhi?” necessitate a more sophisticated solution beyond simple keyword matching.
To tackle this challenge, we’ll leverage the power of Webhook, NL2SQL (Natural Language to SQL), and BigQuery, all while using Cloud Run Functions or NGROK to streamline our development process.
The challenge: Handling analytical queries
As mentioned already, while Dialogflow CX excels at understanding and responding to informational queries, analytical queries present a unique challenge.
For example, consider a user asking, “Which city has the maximum number of malls listed?” This requires understanding concepts like COUNT
, MAX
, MIN
, and comparisons like "less than" or "more than," which go beyond simple LIKE
or =
operators.
They often involve:
COUNT(*)
, MAX()
, MIN()
, AVG()
, SUM()
>
, <
, >=
, <=
, !=
GROUP BY
, HAVING
, WHERE
clauses beyond simple equality checks.Basic and traditional Dialogflow CX setups struggle to directly translate these natural language analytical queries into SQL. To address this, we introduce a system that intelligently handles the complexity of analytical queries, providing robust and accurate answers. This is where NL2SQL, generator, and a webhook need to come into play.
Here’s the breakdown of our solution:
We are addressing a beginner-friendly scenario, so we’ll initiate the basic building blocks of the intent-matching system.
When Dialogflow CX detects such an intent, it transitions to a page designed to handle analytical queries. This page triggers a webhook call.
Intent name: analytical_query
General training phrases:
Parameters:
city
: (e.g., Mumbai, Delhi, Bengaluru, Ahmedabad)place
: (e.g., mall, temple, museum)rating
: Any number (e.g., 4.5, 4.0, 3.5)@sys.comparison-operator
: (e.g., greater than, less than, equal to)Important notes:
city
and place
if they are not already covered by system entities.By using these training phrases and parameters, your Dialogflow CX agent will be better equipped to handle a wide range of analytical queries related to your dataset.
2. NL2SQL translation with Code-Bison:
You are a Google BigQuery SQL expert . Given an input question, first create a syntactically correct GoogleSQL BigQuery compatible SQL query to run. Only use the following description and insert them in the final query: project_id = '' dataset_id = '.' table_id = '..' The table_ has two columns 'string_field_0' specifying place and 'string_field_1' and it's description. Question: {$last-user-utterance} Example: ''' SELECT string_field_0 AS place, string_field_1 AS description FROM '..' WHERE LOWER(string_field_0) LIKE '%mall%' '''
SELECT AVG(Av_Rating) FROM `your-project.your-dataset.your-table` WHERE City = 'Mumbai' AND LOWER(Place) LIKE '%mall%';
3. Python Flask webhook:
4. Cloud Run Functions (CRF) and NGROK for network GROKking:
5. BigQuery query execution:
6. Response formulation:
Here’s a glimpse of the BigQuery table structure we’re working with:
Example scenario
Let’s say a user asks, “Which city has the highest average rating?”
SELECT City FROM
`your-table` GROUP BY City ORDER BY AVG(Av_Rating) DESC LIMIT 1
This powerful combination of Dialogflow CX, NL2SQL, BigQuery, and NGROK unlocks a new level of analytical capabilities for your chatbot, providing valuable insights to your users in a seamless and conversational way. Stay tuned for future blog posts where we’ll delve deeper into implementation details and advanced techniques!
The results are shown in form of GIF and screenshots:
1. Preliminary results:
2. A sample error message response when an invalid SQL query is created. Troubleshooting on the right ('list' and 'how many?').
3. A sample 'what' question and all the detailed results:
4. Webhook Configuration:
4(a). Configuration of 'SQL_Webhook' in CRF.
4(b). NGROK Configuration and Troubleshooting when encountering ERROR 50:
5. Generator and prompt configuration:
Ngrok and Cloud Run functions
Ngrok and Cloud Run functions are both tools that can be used to connect and extend cloud services, but they have different purposes:
Ngrok: A programmable network edge that adds connectivity, security, and observability to apps without requiring code changes. You can create a secure tunnel for free with an ngrok account.
Following is the used Ngrok code and configuration:
from email.message import EmailMessage import json import logging import os import smtplib import ssl from flask import Flask, request, jsonify from google.cloud import bigquery # Initialize Flask app app = Flask(__name__) # Set up logging (optional) logging.basicConfig(level=logging.INFO) @app.route('/', methods=['GET', 'POST']) def home(): """Health check endpoint.""" return 'OK', 200 @app.route('/dialogflow', methods=['POST']) # Changed to only accept POST requests def dialogflow_webhook(): """ Webhook for Dialogflow CX to execute SQL queries on BigQuery. """ try: req = request.get_json() # Extract SQL query from request message = req['sessionInfo']['parameters']['$request.generative.code'] query = message.replace("sql", "").replace("\n", " ").replace("```", "").strip() logging.info(f"Received SQL query: {query}") # Log the query # Execute query on BigQuery client = bigquery.Client(project='aniket-tdcx', location="US") query_job = client.query(query) results = query_job.result() # Get query results # Format results for Dialogflow CX (customize as needed) results_list = [str(row) for row in results] # Simple string conversion of rows for i, row in enumerate(results_list): output_str += f"{i+1}. {str(row)}\n\n" # Add numbering and format the row print(row) # Construct the response response = { "fulfillment_response": { "messages": [{"text": {"text": [output_str]}}] } } logging.info(f"Sending response: {response}") # Log the response return jsonify(response) except Exception as e: logging.error(f"An error occurred: {e}") # Construct the response response = { "fulfillment_response": { "messages": [{"text": {"text": ["Error executing query."]}}] } } return jsonify(response), 500 if __name__ == '__main__': app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
Cloud Run functions (CRF): A tool that allows you to write code to connect and extend cloud services. You can use Cloud Run functions to:
Here are some other features of CRF:
Following is the used CRF code and configuration:
import functions_framework from google.cloud import bigquery @functions_framework.http def hello_http(request): """HTTP Cloud Function. Args: request (flask.Request): The request object. <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data> Returns: The response text, or any set of values that can be turned into a Response object using `make_response` <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>. """ try: req = request.get_json() message = req['sessionInfo']['parameters']['$request.generative.code'] query = str(str(message).replace("sql","").replace("\n"," ").replace("```","").strip()) print(f"Received SQL query: {query}") # Execute the query on BigQuery client=bigquery.Client(project='aniket-tdcx',location="US") result=client.query(query) rows = next(result.result()) # Waits for query to finish rows = result.result() # Waits for query to finish output_str = "" for i, row in enumerate(rows): output_str += f"{i+1}. {str(row)}\n\n" # Add numbering and format the row print(row) # Construct the response response = { "fulfillment_response": { "messages": [{"text": {"text": [output_str]}}] } } return response except Exception as e: # Handle errors gracefully error_message = f"An error occurred while executing the query, retry!: {e}" print(error_message) return {"fulfillment_response": {"messages": [{"text": {"text": [error_message]}}]}}
Generator + webhook architecture:
NL2SQL utility:
NL2SQL is an active area of research. Challenges arise with highly complex natural language queries or intricate database structures. Mitigate this by:
By integrating Dialogflow CX, Code-Bison, BigQuery, and a Python Flask webhook (using NGROK during development), we’ve created a robust system capable of handling complex analytical queries. This significantly enhances the capabilities of Dialogflow CX, enabling it to provide insightful data-driven responses to users. This solution moves beyond simple question-answering, opening doors to sophisticated conversational analytics within your applications. Remember to adapt this architecture and code to your specific needs and chosen NL2SQL library.
In the upcoming blogs, we’ll handle analytical queries using NL2SQL Automated Prompt Design (APD) approaches which is an active research area.
Let’s keep the conversation going! Share your thoughts, questions, and ideas in the comments.
Note: Should you have any concerns or queries about this post or my implementation, please feel free to connect with me on LinkedIn! Thanks!