Vector database plugin for Postgres, written in Rust, specifically designed for LLM

silver-ymz/pgvecto.rs

pgvecto.rs

pgvecto.rs is a Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx.

Comparison with pgvector

Check out pgvecto.rs vs pgvector for more details.

| Feature | pgvecto.rs | pgvector |
| --- | --- | --- |
| Filtering | Introduces the VBASE method for vector search and relational query (e.g. Single-Vector TopK + Filter + Join). | When filters are applied, the results may be incomplete. For example, if you intended to limit the results to 10, you might end up with only 5 results once filters are applied. |
| Vector Dimensions | Supports up to 65535 dimensions. | Supports up to 2000 dimensions. |
| SIMD | SIMD instructions are dynamically dispatched at runtime to maximize performance based on the capabilities of the specific machine. | Added CPU dispatching for distance functions on Linux x86-64 in 0.7.0. |
| Data Types | Introduces additional data types: binary vectors, FP16 (16-bit floating point), and INT8 (8-bit integer). | - |
| Indexing | Handles the storage and memory of indexes separately from PostgreSQL. | Relies on the native storage engine of PostgreSQL. |
| WAL Support | Provides Write-Ahead Logging (WAL) support for data; index support is a work in progress. | Provides Write-Ahead Logging (WAL) support for index and data. |

Quick start

For new users, we recommend using the Docker image to get started quickly.

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.2.1

Then you can connect to the database using the psql command line tool. The default username is postgres, and the default password is mysecretpassword.

psql -h localhost -p 5432 -U postgres

Run the following SQL to ensure the extension is enabled.

DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;

pgvecto.rs introduces a new data type vector(n) denoting an n-dimensional vector. The n within the brackets signifies the dimensions of the vector.

You can create a table with the following SQL.

-- create table with a vector column

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3) NOT NULL -- 3 dimensions
);

Tip

vector(n) is a valid data type only if $1 \leq n \leq 65535$. Due to limitations of PostgreSQL, it is possible to create a value of type vector(3) that has $5$ dimensions, and vector (with no dimension) is also a valid data type. However, a vector still cannot hold $0$ scalars or more than $65535$ scalars. If a column is declared as plain vector, or if some of its values do not match the dimension declared for the column, you won't be able to create an index on it.
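
As a point of reference for the indexing caveat above, here is a minimal sketch of creating an index on the embedding column. The `vectors` access method and the `vector_l2_ops` operator class are assumptions based on pgvecto.rs 0.2.x; neither appears in this README, so check the indexing documentation for the version you run.

```sql
-- Assumption: pgvecto.rs 0.2.x exposes an index access method named "vectors"
-- and an operator class "vector_l2_ops" for squared Euclidean distance.
-- The column must be declared with a dimension (e.g. vector(3)), not plain vector,
-- and every stored value must match that dimension.
CREATE INDEX ON items USING vectors (embedding vector_l2_ops);
```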

You can then populate the table with vector data as follows.

-- insert values

INSERT INTO items (embedding)
VALUES ('[1,2,3]'), ('[4,5,6]');

-- or insert values using a casting from array to vector

INSERT INTO items (embedding)
VALUES (ARRAY[1, 2, 3]::real[]), (ARRAY[4, 5, 6]::real[]);

We support three operators to calculate the distance between two vectors.

  • <->: squared Euclidean distance, defined as $\Sigma (x_i - y_i) ^ 2$.
  • <#>: negative dot product, defined as $- \Sigma x_iy_i$.
  • <=>: cosine distance, defined as $1 - \frac{\Sigma x_iy_i}{\sqrt{\Sigma x_i^2 \Sigma y_i^2}}$.

-- call the distance function through operators

-- squared Euclidean distance
SELECT '[1, 2, 3]'::vector <-> '[3, 2, 1]'::vector;
-- negative dot product
SELECT '[1, 2, 3]'::vector <#> '[3, 2, 1]'::vector;
-- cosine distance
SELECT '[1, 2, 3]'::vector <=> '[3, 2, 1]'::vector;
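
To cross-check the three definitions, the same distances can be recomputed by hand with plain PostgreSQL arrays, without the extension. This is only an illustrative sketch; the CTE name `pairs` and the column aliases are made up for the example.

```sql
-- Recompute the distances between [1,2,3] and [3,2,1] from the formulas above.
WITH pairs AS (
  -- unnest with two arrays zips them element by element
  SELECT x, y
  FROM unnest(ARRAY[1, 2, 3]::real[], ARRAY[3, 2, 1]::real[]) AS t(x, y)
)
SELECT
  sum((x - y) ^ 2) AS squared_euclidean,                       -- 4 + 0 + 4 = 8
  -sum(x * y) AS negative_dot_product,                         -- -(3 + 4 + 3) = -10
  1 - sum(x * y) / sqrt(sum(x ^ 2) * sum(y ^ 2)) AS cosine_distance  -- 1 - 10/14
FROM pairs;
```

The operators should agree with these values up to floating-point rounding.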

You can search for a vector simply like this.

-- query the similar embeddings
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
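
To see the distance next to each row, the ordering expression can also be selected. This is a small sketch against the items table created above; the alias `distance` is only illustrative.

```sql
-- return each row's squared Euclidean distance to the query vector
SELECT id, embedding <-> '[3,2,1]' AS distance
FROM items
ORDER BY distance
LIMIT 5;
```

By the definition of <->, [1,2,3] is at distance 8 from [3,2,1] and [4,5,6] is at distance 35, so rows holding [1,2,3] sort first.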

A simple Question-Answering application

Please check out the Question-Answering application tutorial.

Half-precision floating-point

The vecf16 type is identical to vector except for its scalar type: it stores 16-bit floating-point numbers. If you want to reduce memory usage to get better performance, you can try replacing the vector type with vecf16.
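
A minimal sketch of the swap, assuming vecf16 accepts the same dimension modifier, text literals, and distance operators as vector (which the description above implies). The table name `items_f16` is hypothetical.

```sql
-- same layout as the items table, but each scalar is stored as a 16-bit float
CREATE TABLE items_f16 (
  id bigserial PRIMARY KEY,
  embedding vecf16(3) NOT NULL -- 3 dimensions
);

INSERT INTO items_f16 (embedding)
VALUES ('[1,2,3]'), ('[4,5,6]');

-- query the similar embeddings, as with vector
SELECT * FROM items_f16 ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
```

Half precision trades accuracy for space, so distances may differ slightly from those computed on vector.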

Roadmap 🗂️

Please check out ROADMAP. Want to jump in? Discussions and contributions are welcome!

Contribute 😊

We welcome all kinds of contributions from the open-source community, individuals, and partners.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Alex Chi 💻
  • AuruTus 💻
  • Avery 💻 🤔
  • Ben Ye 📖
  • Ce Gao 💼 🖋 📖
  • Jinjing Zhou 🎨 🤔 📆
  • Joe Passanante 💻
  • Keming 🐛 💻 📖 🤔 🚇
  • Mingzhuo Yin 💻 ⚠️ 🚇
  • Usamoi 💻 🤔
  • cutecutecat 💻
  • odysa 📖 💻
  • yi wang 💻
  • yihong 💻
  • 盐粒 Yanli 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Acknowledgements

Thanks to the following projects:

  • pgrx - Postgres extension framework in Rust
  • pgvector - Postgres extension for vector similarity search written in C
