A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?

Understanding the Evolution of Text-to-SQL with Large Language Models

The transformation of natural language (NL) queries into executable SQL statements, commonly known as Text-to-SQL, has become a cornerstone in democratizing access to relational databases. With the rise of Large Language Models (LLMs), the field has advanced rapidly, letting non-experts query complex data systems in everyday language. This survey covers the current state and future trajectory of Text-to-SQL technologies, focusing on model design, data synthesis, evaluation frameworks, error analysis, and real-world deployment challenges.

Modern Text-to-SQL systems no longer operate as monolithic end-to-end models. Instead, they follow a modular architecture composed of pre-processing, translation, and post-processing stages. This shift reflects the increasing complexity of real-world database environments and the need for fine-grained control over query generation.
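
The three-stage layout can be sketched as a chain of small functions that pass a shared state object along. This is only an illustration under stated assumptions: `QueryState`, the stage functions, and the toy schema are hypothetical names, and the translation stage is a hard-coded stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """Carries the question plus everything the stages add to it."""
    question: str
    schema: dict                      # table name -> list of column names
    sql: str = ""
    notes: List[str] = field(default_factory=list)


Stage = Callable[[QueryState], QueryState]


def pre_process(state: QueryState) -> QueryState:
    # Stand-in for schema linking: keep tables named in the question.
    words = set(state.question.lower().split())
    state.schema = {t: c for t, c in state.schema.items() if t.lower() in words}
    state.notes.append("pre-processing")
    return state


def translate(state: QueryState) -> QueryState:
    # Stand-in for the model call; a real system would prompt an LLM here.
    table = next(iter(state.schema), "unknown_table")
    state.sql = f"SELECT COUNT(*) FROM {table}"
    state.notes.append("translation")
    return state


def post_process(state: QueryState) -> QueryState:
    # Stand-in for correction: normalize to one trailing semicolon.
    state.sql = state.sql.rstrip("; ") + ";"
    state.notes.append("post-processing")
    return state


def run_pipeline(state: QueryState, stages: List[Stage]) -> QueryState:
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline(
    QueryState("how many orders are there",
               {"orders": ["id", "placed_on"], "users": ["id"]}),
    [pre_process, translate, post_process],
)
print(result.sql)
```

Because each stage has the same signature, a module can be swapped or dropped without touching the others, which is the practical benefit of the modular design.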

Core Components of LLM-Powered Text-to-SQL Systems

Pre-Processing: Enhancing Input Understanding

Before any SQL generation occurs, effective pre-processing modules prepare the input for accurate translation.

Schema Linking

Schema linking identifies relevant database tables and columns based on the natural language query. In the LLM era, three primary strategies dominate:

While in-context learning (ICL) shows strong performance, it struggles with large schemas because the full schema may not fit within the model's context window.
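
To make the idea of schema linking concrete, here is a deliberately simple string-matching sketch. It is not an LLM-based method; `link_schema` and the toy schema are hypothetical, and the only assumption is that identifiers use snake_case so they can be split into word tokens.

```python
import re


def link_schema(question, schema):
    """Return {table: [matched columns]} for schema items whose name
    tokens appear in the question (exact token overlap only)."""
    q_tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    linked = {}
    for table, columns in schema.items():
        table_hit = bool(set(table.lower().split("_")) & q_tokens)
        col_hits = [c for c in columns
                    if set(c.lower().split("_")) & q_tokens]
        if table_hit or col_hits:
            linked[table] = col_hits
    return linked


schema = {
    "customers": ["id", "name", "city"],
    "orders": ["id", "customer_id", "total", "order_date"],
    "suppliers": ["id", "company", "country"],
}
linked = link_schema("What is the total of orders per city?", schema)
print(linked)
```

Only the linked subset of the schema would then be placed in the prompt, which is how linking helps with the context-length limit. Exact token overlap misses synonyms and inflections ("order" vs. "orders"), one reason learned or prompted linkers are used in practice.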

Database Content Retrieval

This module extracts specific cell values referenced in the NL query (e.g., “orders placed on May 1st”). Approaches include:

Efficiency remains a key challenge, especially when dealing with dirty or voluminous data.
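
A minimal sketch of value retrieval, assuming a small SQLite database and fuzzy string matching from Python's standard library. The helper names, the table, and the 0.8 cutoff are illustrative choices, not taken from any particular system.

```python
import sqlite3
from difflib import get_close_matches


def build_value_index(conn, table, columns):
    """Map each distinct lower-cased text value to (table, column, value).
    Table and column names are assumed to come from a trusted schema."""
    index = {}
    for col in columns:
        for (value,) in conn.execute(f"SELECT DISTINCT {col} FROM {table}"):
            if isinstance(value, str):
                index[value.lower()] = (table, col, value)
    return index


def retrieve_values(mention, index, cutoff=0.8):
    """Return stored values that closely match a phrase from the question."""
    matches = get_close_matches(mention.lower(), list(index), n=3, cutoff=cutoff)
    return [index[m] for m in matches]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Alice", "San Francisco"), ("Bob", "Santa Fe"), ("Carol", "Boston")],
)

index = build_value_index(conn, "customers", ["name", "city"])
hits = retrieve_values("san fransisco", index)  # misspelled on purpose
print(hits)
```

Building the index once and querying it many times is what keeps lookups cheap; the cost is that the index has to be refreshed when the underlying data changes.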

Additional Information Acquisition

To improve accuracy, models incorporate domain-specific knowledge such as date conventions ("Labor Day" = May 1 in China), unit conversions, or business rules. Two main approaches exist:

Despite their effectiveness, these methods increase token usage and computational cost.
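
One simple way to supply such knowledge is to look up matching facts and append them to the prompt. The sketch below assumes a hand-written keyword table; `KNOWLEDGE`, `build_prompt`, and the prompt layout are hypothetical, and the revenue entry is an invented placeholder.

```python
KNOWLEDGE = {
    "labor day": "In China, Labor Day falls on May 1.",
    # Invented placeholder for a business rule:
    "revenue": "Revenue is reported in thousands of USD.",
}


def build_prompt(question, schema_text, knowledge=KNOWLEDGE):
    """Assemble a prompt, adding only the facts whose key occurs in the question."""
    hints = [fact for key, fact in knowledge.items() if key in question.lower()]
    parts = ["-- Schema", schema_text]
    if hints:
        parts += ["-- External knowledge"] + hints
    parts += ["-- Question", question, "-- SQL"]
    return "\n".join(parts)


prompt = build_prompt(
    "How many orders were placed on Labor Day?",
    "orders(id, order_date)",
)
print(prompt)
```

Filtering to relevant facts matters because every injected line adds to the token count mentioned above.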

Translation Phase: From Natural Language to SQL

The translation stage is where the actual NL-to-SQL conversion happens. It consists of several interrelated components.

Encoding Strategies

Encoding transforms unstructured text and structured schema into a format suitable for model processing.

Graph-based methods excel in complex joins but require more training data.

Decoding Strategies

Decoding determines how SQL tokens are generated step by step.

The latter is particularly valuable for generating deeply nested queries.

Task-Specific Prompting Techniques

With LLMs, prompt engineering plays a crucial role:

These strategies enhance both accuracy and transparency but can increase latency.
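
As an illustration of chain-of-thought prompting, the sketch below builds a prompt that asks for intermediate reasoning and then extracts the final query from the reply. The template wording and the "SQL:" marker are assumptions made for this example, and no model is actually called.

```python
COT_TEMPLATE = """Given the schema:
{schema}

Question: {question}
Think step by step:
1. Which tables and columns are needed?
2. Which filters, joins, or aggregations apply?
Then give the final query on a line starting with "SQL:".
"""


def build_cot_prompt(question, schema):
    return COT_TEMPLATE.format(schema=schema, question=question)


def extract_sql(response):
    """Return the text after the last 'SQL:' marker, or None if absent."""
    marker = "SQL:"
    pos = response.rfind(marker)
    if pos == -1:
        return None
    return response[pos + len(marker):].strip()


# A hand-written reply standing in for model output.
fake_response = (
    "1. Only the orders table is needed.\n"
    "2. Count all rows.\n"
    "SQL: SELECT COUNT(*) FROM orders"
)
print(extract_sql(fake_response))
```

The reasoning lines are what add latency and tokens; the parsing step exists because the reply is no longer bare SQL.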

Intermediate Representations

To bridge the gap between free-form NL and rigid SQL syntax, researchers use intermediate representations (IR):

IRs reduce ambiguity and allow models to focus on one aspect at a time.

Post-Processing: Refining the Output

Even advanced models produce imperfect SQL. Post-processing techniques help correct and validate outputs.

SQL Correction

Self-correction modules identify and fix syntax errors. For example, DIN-SQL uses zero-shot prompts to repair faulty queries.
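
A correction step needs a way to detect the fault and a way to describe it to the model. The sketch below is not DIN-SQL's implementation; it assumes SQLite as the target database, uses `EXPLAIN` to compile a query without running it, and builds a hypothetical repair prompt from the error message.

```python
import sqlite3


def find_sql_error(conn, sql):
    """Return SQLite's error message for `sql`, or None if it compiles."""
    try:
        conn.execute(f"EXPLAIN {sql}")   # compiles the statement only
        return None
    except sqlite3.Error as exc:
        return str(exc)


def build_repair_prompt(question, sql, error):
    """Hypothetical zero-shot repair prompt; the wording is illustrative."""
    return (
        f"Question: {question}\n"
        f"Faulty SQL: {sql}\n"
        f"Database error: {error}\n"
        "Rewrite the SQL so that it runs and answers the question."
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

bad_sql = "SELEC id FROM orders"
error = find_sql_error(conn, bad_sql)
if error is not None:
    print(build_repair_prompt("List all order ids.", bad_sql, error))
```

The same check also surfaces schema mistakes such as unknown columns, so it catches more than pure syntax errors.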

Output Consistency

Self-consistency sampling generates multiple reasoning paths and selects the most frequent valid output, reducing randomness.
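
One common way to realize this is to vote on execution results rather than on query text, since differently written queries can be equivalent. The sketch below assumes that variant; the candidate list stands in for sampled model outputs and the table is a toy example.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Return an order-insensitive result, or None if the query fails."""
    try:
        return tuple(sorted(conn.execute(sql).fetchall(), key=repr))
    except sqlite3.Error:
        return None


def self_consistent_choice(conn, candidates):
    """Pick the first candidate whose result is the most frequent valid one."""
    results = [execute(conn, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    winner, _ = votes.most_common(1)[0]
    return next(sql for sql, r in zip(candidates, results) if r == winner)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 150.0), (3, 200.0)])

candidates = [
    "SELECT COUNT(*) FROM orders",
    "SELECT COUNT(id) FROM orders",
    "SELECT COUNT(*) FROM orders WHERE total > 100",
    "SELEC COUNT(*) FROM orders",          # invalid, gets no vote
]
chosen = self_consistent_choice(conn, candidates)
print(chosen)
```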

Execution-Guided Refinement

By executing candidate queries and inspecting the outcome (e.g., an execution error or an unexpected NULL result), systems can iteratively refine their output. CHESS and CodeS both use this kind of feedback loop.
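
The feedback loop can be sketched as follows. This is a generic outline, not the CHESS or CodeS implementation: `revise` is a hypothetical callback standing in for a model call, and treating an empty result as suspicious is a heuristic that will be wrong for questions whose correct answer is empty.

```python
import sqlite3


def refine(conn, sql, revise, max_rounds=3):
    """Execute `sql`; on error or empty result, ask `revise` for a new query.
    The query returned after the final round is not re-checked."""
    for _ in range(max_rounds):
        try:
            rows = conn.execute(sql).fetchall()
            feedback = None if rows else "query returned no rows"
        except sqlite3.Error as exc:
            feedback = f"execution error: {exc}"
        if feedback is None:
            return sql
        sql = revise(sql, feedback)
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 50.0)")

seen_feedback = []


def fake_revise(sql, feedback):
    # Stand-in for an LLM: records the feedback and fixes the column name.
    seen_feedback.append(feedback)
    return sql.replace("amount", "total")


final_sql = refine(conn, "SELECT amount FROM orders", fake_revise)
print(final_sql)
```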

N-Best Reranking

Top-k candidates are re-ranked using a secondary model or execution results, improving final selection accuracy.
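
A small sketch of reranking with execution results: candidates that run and return rows are preferred, and the model's own score breaks ties. The scores, the three-level status, and the preference for non-empty results are assumptions made for illustration.

```python
import sqlite3


def rerank(conn, scored_candidates):
    """Sort (sql, model_score) pairs: non-empty results first, then empty
    results, then failing queries; higher model score wins within a group."""
    def key(item):
        sql, score = item
        try:
            rows = conn.execute(sql).fetchall()
            status = 2 if rows else 1
        except sqlite3.Error:
            status = 0
        return (status, score)

    return sorted(scored_candidates, key=key, reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0)])

ranked = rerank(conn, [
    ("SELECT total FROM orders WHERE total > 1000", 0.90),  # runs, empty
    ("SELECT total FROM orders", 0.70),                     # runs, has rows
    ("SELECT amount FROM orders", 0.95),                    # fails
])
print(ranked[0][0])
```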

Evaluating Text-to-SQL Performance: Beyond Accuracy Metrics

Accurate evaluation is essential for guiding development and deployment decisions.

Key Evaluation Metrics

Comprehensive Evaluation Frameworks

These frameworks move beyond static benchmarks, offering scenario-based insights.

Error Analysis: Diagnosing Failures to Improve Models

Understanding why models fail is critical for improvement.

A Two-Level Error Taxonomy

We propose a structured approach:

  1. Error Localization: Identifies which SQL component contains the error (e.g., incorrect JOIN condition).
  2. Cause of Error: Determines the root cause:

    • Schema linking failure
    • Misinterpreted database content
    • Missing domain knowledge
    • Logical reasoning gap
    • Syntax violation

This taxonomy helps developers pinpoint weaknesses and target improvements systematically.
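
The two levels map naturally onto a pair of labels per failed query. The sketch below is one possible encoding: the `Cause` values mirror the list above, while the `Clause` values are an assumed set of SQL components (the text only names the JOIN condition as an example).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Clause(Enum):
    """Level 1: where the error sits (assumed set of components)."""
    SELECT = "SELECT"
    FROM = "FROM"
    JOIN = "JOIN"
    WHERE = "WHERE"
    GROUP_BY = "GROUP BY"
    ORDER_BY = "ORDER BY"


class Cause(Enum):
    """Level 2: why it happened."""
    SCHEMA_LINKING = "schema linking failure"
    DB_CONTENT = "misinterpreted database content"
    DOMAIN_KNOWLEDGE = "missing domain knowledge"
    REASONING = "logical reasoning gap"
    SYNTAX = "syntax violation"


@dataclass(frozen=True)
class ErrorLabel:
    clause: Clause
    cause: Cause


def summarize(labels):
    """Count errors per root cause to show where to invest effort."""
    return Counter(label.cause for label in labels)


labels = [
    ErrorLabel(Clause.JOIN, Cause.SCHEMA_LINKING),
    ErrorLabel(Clause.WHERE, Cause.DB_CONTENT),
    ErrorLabel(Clause.JOIN, Cause.SCHEMA_LINKING),
]
summary = summarize(labels)
print(summary.most_common(1))
```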

Practical Guidance for Building Text-to-SQL Solutions

Roadmap for Optimizing LLMs in Text-to-SQL

Your optimization strategy should depend on two key factors:

Data Privacy

Data Volume

Hardware availability and API budget also influence choices.

Decision Flow for Module Selection

Choose components based on your use case:

| Scenario | Recommended Module | Benefit | Trade-off |
|---|---|---|---|
| Complex schema | Schema linking | Reduces noise and token cost | Increases latency |
| Dirty or large DB | Index-based content retrieval | Improves speed | Requires index maintenance |
| Ambiguous queries | Chain-of-thought prompting | Enhances reasoning | Higher token cost |
| High accuracy needed | Execution-guided refinement | Filters invalid queries | Slower response time |

Balancing performance, cost, and reliability is key to successful deployment.

Open Challenges and Future Directions

Despite rapid progress, significant hurdles remain:

Open-Domain Text-to-SQL

Current systems assume a single known database. Real-world applications often require querying multiple databases across domains. Challenges include:

Cost-Efficient Solutions

LLM inference consumes a large number of tokens, which drives up cost. Hybrid approaches that combine lightweight pre-trained language models (PLMs) with selective LLM calls show promise for reducing costs without sacrificing quality.
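
A hybrid setup can be sketched as a cascade: try the cheap model first and escalate only when it is unsure. Everything here is hypothetical, including the confidence score, the 0.8 threshold, and the two toy model functions that stand in for a small PLM and an LLM.

```python
from typing import Callable, Tuple

SmallModel = Callable[[str], Tuple[str, float]]   # returns (sql, confidence)
LargeModel = Callable[[str], str]                 # returns sql


def cascade(question: str, small_model: SmallModel,
            large_model: LargeModel, threshold: float = 0.8):
    """Return (sql, which_model_answered)."""
    sql, confidence = small_model(question)
    if confidence >= threshold:
        return sql, "small"
    return large_model(question), "large"


def toy_small(question):
    # Pretend the small model is confident only on simple counting questions.
    if question.lower().startswith("how many"):
        return "SELECT COUNT(*) FROM orders", 0.95
    return "SELECT * FROM orders", 0.30


def toy_large(question):
    return ("SELECT c.city, SUM(o.total) FROM orders o "
            "JOIN customers c ON o.customer_id = c.id GROUP BY c.city")


easy = cascade("How many orders are there?", toy_small, toy_large)
hard = cascade("Total order value per city?", toy_small, toy_large)
print(easy[1], hard[1])
```

The saving depends on how many questions stay below the escalation threshold and on how well the small model's confidence tracks its actual correctness.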

Trustworthiness and Debuggability

Users need confidence in generated SQL. Future systems must offer:

These features will be essential for enterprise adoption.

Frequently Asked Questions

What is Text-to-SQL?

Text-to-SQL is the process of converting natural language questions into executable SQL queries. It allows users without technical expertise to retrieve data from relational databases using plain English (or other languages).

Why are Large Language Models important for Text-to-SQL?

LLMs bring emergent reasoning capabilities that enable them to understand complex queries, handle ambiguity, and generate syntactically correct SQL—even with minimal training data—through in-context learning and chain-of-thought prompting.

How do you evaluate a Text-to-SQL system?

Evaluation goes beyond simple accuracy. Use execution accuracy to check result correctness, exact match for structural fidelity, the Valid Efficiency Score (VES) for efficiency, and Query Variance Testing (QVT) for robustness across paraphrased inputs. Tools like NL2SQL360 provide comprehensive multi-angle assessments.
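
The difference between the first two metrics is easiest to see side by side. The sketch below is a simplified version under stated assumptions: results are compared as unordered row sets, and "exact match" is approximated by comparing normalized query strings, whereas benchmark implementations typically compare parsed SQL components.

```python
import sqlite3


def execution_match(conn, predicted, gold):
    """True if both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


def normalize(sql):
    # Lower-cases everything, including string literals; acceptable only
    # because both sides are normalized the same way for comparison.
    return " ".join(sql.lower().rstrip(";").split())


def exact_match(predicted, gold):
    return normalize(predicted) == normalize(gold)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Alice", "Boston"), ("Bob", "Austin")])

pred = "SELECT name FROM customers WHERE city = 'Boston'"
gold = "SELECT name FROM customers WHERE city IN ('Boston')"
print(execution_match(conn, pred, gold), exact_match(pred, gold))
```

Here the two queries are written differently but return the same rows, so they pass on execution and fail on exact match, which is why the metrics are usually reported together.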

What causes most errors in Text-to-SQL models?

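Common error sources, following the two-level taxonomy described earlier, include:

    • Schema linking failures
    • Misinterpreted database content
    • Missing domain knowledge
    • Logical reasoning gaps
    • Syntax violations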

Can Text-to-SQL work without training data?

Yes—through zero-shot prompting with powerful LLMs like GPT-4. However, performance improves significantly with even small amounts of fine-tuning data or well-designed few-shot examples.

What are the limitations of current Text-to-SQL systems?

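Key limitations include:

    • Reliance on a single, known database rather than open-domain querying
    • High token cost during LLM inference
    • Limited support for verifying and debugging generated SQL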

Conclusion

Text-to-SQL has evolved from rule-based parsers to sophisticated LLM-powered systems capable of handling complex, real-world queries. While significant progress has been made, challenges around scalability, efficiency, trustworthiness, and open-domain applicability remain active areas of research. By adopting modular architectures, leveraging hybrid PLM/LLM strategies, and implementing robust evaluation and error analysis practices, developers can build more reliable and accessible data interfaces for the future.