If it looks like a table, and quacks like SQL...
TL;DR: Allow LLMs to interface (safely) with arbitrary data via SQLite syntax
LLMs are great at writing code, including SQL, to accomplish tasks. Using SQL as a data description and querying format allows a perfect blend of flexibility to service novel requests, while still being able to enforce "bounds" on the LLM.
There are problems with opening up an existing API or app to this kind of interface though:
- LLMs are susceptible to prompt injection, so if you're allowing 1:1 access to a database, you could run into a data-loss or data-injection scenario.
- You may need to restrict access to a slice of data (accounts matching some ID, for example)
- You may not have direct access to a database, but perhaps you have some rest apis, and want the power of SQL
This project is an attempt to solve the above generally in a way that is easy to "glue in" to existing APIs or data layers.
DuckQL operates as the glue between a BackingStore
interface and raw SQLite queries. Depending on the
kind of data you are accessing, you might choose to use a different type of backing store. Below, we can
demonstrate the concept with a SliceFilter
, which allows querying a slice of structs as if it were a
SQLite database.
First, point duckql
at your "schema" structs, and export the resulting DDL to your llm of choice:
type User struct {
ID string
Email string
Name string `ddl:"comment='First and last name of the user (space separated)'"`
PasswordHash []byte `ddl:"-"`
}
...
s := duckql.Initialize(&User{})
s.SetPermissions(duckql.AllowSelectStatements)
ddl := s.DDL()
// Should result in:
//
// CREATE TABLE users
// (
// id TEXT,
// email TEXT,
// name TEXT, -- First and last name of the user (space separated)
// )
In the above snippet, we call SetPermissions()
to restrict which types of SQL queries are allowed.
Fields which should not be accessible by an LLM can be marked as private with a 'ddl' tag value of "-".
Comments can be set via the comment field. These are purely to help the LLM understand the schema.
Construct your prompt with the DDL to explain to the model how it can query information, and have it
write a query. The next step is running the query against your data. We initialize our SliceFilter
backing store with some hardcoded data, which we can then execute against:
...
s.SetBacking(duckql.NewSliceFilter(
s, []any{
&User{
ID: "a",
Email: "a@gmail.com",
Name: "Cindy Lou",
},
&User{
ID: "b",
Email: "b@aol.com",
Name: "Bob Bert",
},
}
))
result, err := s.Execute("select name from users where email like '%aol.com")
if err != nil {
panic(err)
}
fmt.Println(result.String())
// Bob Bert
In the above, duckql
will do the following:
- Check that users is a valid table that it knows about
- Check that the "name" field is valid on the users table
- Match the where clause
- Only copy selected fields into return value
go get github.com/dburkart/duckql
You can find more examples in the ./examples directory. Examples so far:
- ./examples/slice: Uses a SliceFilter backing store to demonstrate using DuckQL to "emulate" a SQLite database using preset data.
- ./examples/sqlite: Uses the SQLite backing store to demonstrate how to use DuckQL as a "permission" layer on top of a SQLite database.
- ./examples/sheets: Use a Google Sheets backing store to demonstrate how to use DuckQL to "emulate" a SQL database on top of a google sheet.