In this article, we will explore the application of Vanna.ai, a Python library specifically designed for training a model capable of processing natural language questions and generating SQL queries as responses.
In this article, we will explore the application of Vanna.ai, a 🐍 Python library specifically designed for training a model capable of processing natural language questions and generating SQL queries as responses. This implementation will be integrated into a ✨Streamlit application, creating a 🤖chatbot that facilitates posing questions and provides explanations for the returned queries.
Text-to-SQL
is a tool that utilizes models to translate natural language queries into SQL queries, aiming to make it easy for users to generate SQL queries and interact with databases seamlessly.
The implementation of Text-to-SQL
can be achieved through the use of Vanna.AI, an open-source 🐍Python library that allows the training of an RAG model with queries, DDL, and documentation from a database.
The following are the possible outputs that can be generated with Vanna methods, facilitating diverse ways of interacting with the database using natural language.
Output | Description |
---|---|
📄 SQL | Vanna.AI can generate SQL queries from natural language questions. These SQL queries can be used to interact with a database. |
📁 DataFrame | After generating an SQL query, Vanna.AI can execute it in the database and return the results as a pandas DataFrame. |
📊 Charts | Vanna.AI can generate plots using Plotly from the results of the SQL query. |
❓ Follow-up questions | Vanna.AI can generate follow-up questions based on the generated SQL query. These follow-up questions can help users refine their queries or explore the data in more detail. |
🔍 Explanations queries | Vanna.AI can provide explanations for the generated SQL queries. These explanations can help users understand how their natural language question is interpreted into an SQL query. |
The following table contains the key features available with Vanna.AI, enriching data interaction capabilities:
Feature | Description |
---|---|
🚀 Model Training | Train the RAG model on data sources, including Data Definition Language (DDL) statements, documentation, and SQL queries during the training process. |
🤖 User Query Handling | Users can pose questions in natural language, and Vanna.AI responds by generating SQL queries. |
📚 Documentation | Extensive documentation, a dedicated website, and a support community on Discord are available to offer comprehensive assistance. |
🔌 Database Connections | Vanna allows connecting to multiple databases, enabling users not only to retrieve SQL queries but also to execute them by establishing connections to their respective databases. |
🤔 AI-Generated Recommendation Questions | This framework includes a feature of generating AI-driven questions, offering suggestions for additional queries that the user could explore. |
For this initial implementation, we will leverage an example provided by vanna.ai, using the Chinook database. This database includes tables and data relevant to a music store, encompassing information about artists, albums, tracks, customers, orders, and various aspects associated with the management of an online music store.
!pip install vanna
Follow these steps to implement a straightforward example of text-to-SQL:
API Key Retrieval: To initiate this example, acquire an API key by registering at https://vanna.ai/. Utilize your registered email to obtain the key.
Setting API Key and Model: Configure the API key obtained and specify the model to be used, in this instance, the pre-existing “chinook” model
Connecting to the Database: Establish a connection with the database, utilizing an SQLite file available at https://vanna.ai/Chinook.sqlite.
Asking a Question: Finally, pose a natural language question to extract specific information from the database. In this step, the result includes the SQL query, the DataFrame from the query execution, and a representative chart. The system also generates follow-up questions based on the dataset.
import vanna as vn
# STEP 01: This is a simple example of how to use the Vanna API
= vn.get_api_key('your_email')
api_key
# Set the API key and the model
vn.set_api_key(api_key)
# STEP 02: Set the model
'chinook')
vn.set_model(
# STEP 03: Connect with the database
'https://vanna.ai/Chinook.sqlite')
vn.connect_to_sqlite(
# STEP 04: Ask a question
'What are the top 10 artists by sales?') vn.ask(
📃sql query
SELECT a.name,
sum(il.quantity) as totalsales
FROM artist a
INNER JOIN album al
ON a.artistid = al.artistid
INNER JOIN track t
ON al.albumid = t.albumid
INNER JOIN invoiceline il
ON t.trackid = il.trackid
GROUP BY a.name
ORDER BY totalsales desc limit 10;
📊plotly chart
❓New Questions
AI-generated follow-up questions:
* Who is the artist with the highest sales?
* What are the total sales for each artist?
* Which genre has the highest sales?
* Can you provide a breakdown of sales by album?
* Which artist has the lowest sales?
* What is the average sales per artist?
* Can you provide a list of the top-selling albums?
* Which genre has the least sales?
* Can you provide a breakdown of sales by country?
* What is the total sales for each genre?
To train your own model follow the following steps: 1. Log in to your account https://vanna.ai/ and create a new model. 2. Next we will define how to train the model. In our next example we will use ddl (data definition language), documentation and queries.
# Check the models available in the account
vn.get_models()
['ecommerce-test', 'demo-tpc-h', 'tpc', 'chinook', 'thelook']
# Set the model
"ecommerce-test") vn.set_model(
# Get the ddl for training the model
# Train the model with the ddl
= """
ddl CREATE TABLE if not exists stage.customers(
customer_id INT NOT NULL PRIMARY KEY,
email_address VARCHAR(50) NOT NULL,
name VARCHAR(50) NOT NULL,
business_type_id INT NOT NULL,
site_code VARCHAR(10) NOT NULL,
archived BOOLEAN NOT NULL,
is_key_account BOOLEAN NOT NULL,
date_updated TIMESTAMP NOT NULL,
date_created TIMESTAMP NOT NULL,
job_created_date TIMESTAMP WITH TIME ZONE DEFAULT
CURRENT_TIMESTAMP,
job_created_user varchar(50) default null,
job_updated_date TIMESTAMP default null,
job_updated_user varchar(50) default null,
CONSTRAINT fk_business_type_id FOREIGN KEY(business_type_id) REFERENCES stage.business_types (business_type_id)
);
"""
=ddl) vn.train(ddl
In my repository, you can find all the scripts, documentation, and queries to train the model and answer questions such as the following
# Ask a question for generating the SQL
= """What is the total count of new clients who registered between October 1, 2020, and
question January 1, 2022, and have made more than 10 purchases, each exceeding $20? Additionally,
could you provide their email addresses, the number of purchases made, and the date of their
most recent purchase?"""
print(vn.generate_sql(question=question))
SELECT COUNT(*) AS total_count,
c.email_address,
COUNT(o.order_id) AS num_purchases,
MAX(o.order_date) AS most_recent_purchase_date
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2020-10-01' AND c.registration_date <= '2022-01-01'
AND o.order_value > 20
GROUP BY c.email_address
HAVING COUNT(o.order_id) > 10;
In this section, we will implement a 🤖chatbot application using text-to-SQL capabilities with ✨Streamlit. This app will be developed through the integration of Vanna.AI and ✨Streamlit, providing a user-friendly interface for entering your username, selecting an avatar, and initiating a 💬chat.
src\db\
src\.streamlit\secrets.toml
Detailed instructions on how to run the application and add credentials can be found in the repository’s
README.md
.
The application, crafted with Vanna.AI and ✨Streamlit, and you can see below a video of how it works, remember that all the explanations are in the readme.md
file of the repository.
Feel free to check it out and give it a star if you find it helpful!
⭐️
If you want to learn…
Other references:
⚠️ Disclaimer: Before proceeding with the use of your own data in the Vanna.AI application, it is recommended to review and validate the data protection policies established by Vanna.AI. Ensure a thorough understanding of how your personal data is handled and protected within the application framework. It is crucial to comply with privacy and security regulations before providing any sensitive information.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Mendez (2024, Jan. 13). Romina Mendez: SQL Assistant: Text-to-SQL Application in Streamlit 🤖. Retrieved from https://r0mymendez.github.io/posts_en/2024-01-13-sql-assistant-text-to-sql-application-in-streamlit/
BibTeX citation
@misc{mendez2024sql, author = {Mendez, Romina}, title = {Romina Mendez: SQL Assistant: Text-to-SQL Application in Streamlit 🤖}, url = {https://r0mymendez.github.io/posts_en/2024-01-13-sql-assistant-text-to-sql-application-in-streamlit/}, year = {2024} }