Issue with NL2SQL Query Generation for Category-Ba...

Nikita_G · 03-18-2025 10:43 PM

Hi everyone,

I’m building a chatbot that generates SQL queries using an NL2SQL function to retrieve data from BigQuery tables. My table contains multiple columns related to categories (e.g., level1, level2, level3) and commodities (e.g., tag1, tag2, etc.)

Expected Functionality

For example, if "Software Engineering" is a Level 1 category, and a user asks:

"What is the total spend for Software Engineering?"
The query should automatically search across the category columns (level1, level2, level3) and return the relevant results.

Problem

The NL2SQL-generated query does not correctly provide answers based on category columns.

I attempted prompt engineering, explicitly listing the column names and category values in the prompt. While this improved accuracy, it does not scale well when dealing with a large number of columns.
Manually specifying all column details is impractical, so I am looking for alternative approaches to improve NL2SQL performance.

Looking for Solutions

Is there a way to automatically map user queries to relevant columns without explicitly listing them in the prompt?
Would embedding-based approaches (e.g., vector search for column selection) improve accuracy? I tried using Open Data QnA Pipeline, but when retrieving results for a specific category, it incorrectly returns "data not found."
Are there better techniques for handling this type of scenario in NL2SQL models?

Any insights or suggestions would be greatly appreciated!

Thanks in advance.

playz12

You can improve your NL2SQL model's accuracy by using a hybrid approach: embedding-based column mapping (vector search) to dynamically match user queries to relevant table columns, fuzzy matching (e.g., rapidfuzz) as a fallback for quick similarity checks, and schema-aware prompting to structure queries without explicitly listing all columns. If high accuracy is needed, consider fine-tuning an NL2SQL model on your schema. This approach balances scalability and precision. Hope this helps!

dev9

Okay, what can I expect in the results after implementing each of these? Has your implementation lead to a significant increase in accuracy ? Can you please quote your accuracy before and after this implementations? Thanks