Vector Database Operations
The TigerGraph database stores and indexes vectors efficiently, so it can serve as a multimodal graph + vector database for AI applications. Storage and indexing are distributed, so vector operations are fast and scalable.
Each vector is stored as a specialty attribute of a vertex. The vertex holds the vector’s ID and any other properties or labels to be associated with the vector.
Vector search operations are implemented as functions that can be invoked from within GSQL graph queries. This enables a rich set of hybrid graph+vector queries.
This document is intended as a quick reference to the available functionality. For a tutorial and examples, see our Vector Search tutorial.
Vector Data and Search Specifications
Characteristic | Description |
---|---|
Element data type | FLOAT (32-bit floating point) |
Max. dimensions (vector length) | 4096 |
Similarity metrics | cosine, L2 (Euclidean), IP (inner product) |
Approximate search algorithms | HNSW |
Indexing | Automatic, incremental |
Vector operations | Load, Read, Exact Similarity Search, Approximate Similarity Search, Top-K Search, Hybrid Graph+Vector Search |
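The three similarity metrics in the table do not all rank results in the same direction. With L2, a lower distance means two vectors are more similar. With cosine and inner product, a higher value means more similar.

The following is a minimal sketch in Python, not GSQL, that computes each metric for plain lists of floats. The function names are hypothetical. The sketch makes no assumption about how TigerGraph internally converts each metric into the distance it reports.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration only; these are not TigerGraph functions.

def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 1.0 means same direction."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; 0.0 means identical vectors."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; larger values mean more similar."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    u, v = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
    print(round(cosine_similarity(u, v), 4))  # 0.7071
    print(l2_distance(u, v))                  # 1.0
    print(inner_product(u, v))                # 1.0
```

Choose the metric when defining the vector attribute. The index is built for that metric.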
Vector Operations Cheatsheet
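For details on each task, see the corresponding section below.
Task | Command Cheatsheet |
---|---|
Creating a vector schema | In schema change job: ALTER VERTEX vertex_name ADD VECTOR ATTRIBUTE vec_name(DIMENSION=dim, METRIC=metr); |
Loading vectors | Clause in loading job: LOAD file_object TO VECTOR ATTRIBUTE vec_name ON VERTEX vertex_name VALUES (id_col, SPLIT(vec_col, elem_sep)) USING SEPARATOR=field_sep; |
Vector search | Function call in a GSQL query; returns a vertex set: result = vectorSearch(vectorAttributes, queryVector, k, optionalParam); |
Printing vectors | In GSQL query: PRINT v WITH VECTOR; |
Adding and updating vectors | REST: Upsert API. GSQL: assignment in ACCUM or POST-ACCUM clause |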
There are several other functions available for working with vector data.
Creating a Vector Schema
A vector attribute is always added to an existing vertex type, using the ALTER VERTEX statement in a SCHEMA_CHANGE job.
The ADD VECTOR ATTRIBUTE clause adds a vector attribute, and the DROP VECTOR ATTRIBUTE clause removes it.
ALTER VERTEX vertex_name ADD VECTOR ATTRIBUTE vec_name(vector_params);
where vector_params =
DIMENSION=dim,
METRIC=metr
[, INDEXTYPE=i_type]
[, DATATYPE=d_type]
ALTER VERTEX vertex_name DROP VECTOR ATTRIBUTE vec_name;
Vector parameter* | Description | Values |
---|---|---|
DIMENSION | Dimensions (length) of the vector | Positive integer, up to 4096 |
METRIC | Metric used for measuring similarity | cosine, L2, or IP |
INDEXTYPE | Similarity indexing scheme | HNSW |
DATATYPE | Element data type | FLOAT |
* Vector parameter names are case-insensitive.
Example:
CREATE GLOBAL SCHEMA_CHANGE JOB add_emb2 {
ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb2(DIMENSION=3, METRIC="L2");
}
RUN GLOBAL SCHEMA_CHANGE JOB add_emb2
Loading Vectors
Vectors can be loaded in bulk using GSQL loading jobs. They can be loaded either to existing vertices or together with new vertices as those vertices are created.
The vector data is assumed to be in a delimited format such as CSV. One separator, field_sep, separates fields such as the vector ID and the vector itself. A second separator, elem_sep, separates the vector’s elements from one another.
LOAD file_object TO VECTOR ATTRIBUTE vec_name ON VERTEX vertex_name \
VALUES (id_col, SPLIT(vec_col, elem_sep)) USING SEPARATOR=field_sep;
Example:
CREATE LOADING JOB load_s3_file {
DEFINE FILENAME account;
DEFINE FILENAME accountEmb;
LOAD account TO VERTEX Account VALUES ($"name", gsql_to_bool(gsql_trim($"isBlocked"))) USING header="true", separator=",";
LOAD accountEmb TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($0, SPLIT($1, ",")) USING SEPARATOR="|";
}
$0 is the first column, holding the vector IDs.
$1 is the second column, holding the vector. The two columns are separated by "|".
Within the vector, the elements are separated by ",".
Note the use of SPLIT() to parse the vector into its elements.
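The two-level split performed by the loading job can be illustrated outside TigerGraph. The line is first split on the field separator "|", and the vector column is then split on the element separator ",".

The following is a minimal Python sketch. The function name, its defaults, and the dimension check are assumptions for illustration. Whether TigerGraph rejects or skips a line whose element count differs from DIMENSION is not assumed here.

```python
from typing import List, Tuple

def parse_vector_line(line: str, field_sep: str = "|", elem_sep: str = ",",
                      dimension: int = 3) -> Tuple[str, List[float]]:
    """Split one input line into (vector ID, list of float elements).

    Mirrors the $0 / SPLIT($1, ",") / SEPARATOR="|" pattern of the loading job.
    The dimension check is a hypothetical addition for illustration.
    """
    # Field level: split only on the first field_sep -> ID column and vector column.
    id_col, vec_col = line.rstrip("\n").split(field_sep, 1)
    # Element level: split the vector column into its numeric elements.
    elements = [float(x) for x in vec_col.split(elem_sep)]
    if len(elements) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(elements)}")
    return id_col, elements

if __name__ == "__main__":
    print(parse_vector_line("Scott|0.1,0.2,0.3"))  # ('Scott', [0.1, 0.2, 0.3])
```

Because the field and element separators differ, the commas inside the vector column are never mistaken for field boundaries.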
Search Indexing
TigerGraph automatically builds an HNSW similarity index for the vectors as they are loaded, using the similarity metric that was specified when the vector schema was defined.
The search index is used when the vectorSearch function is invoked.
The index is automatically updated in response to changes in the vector data. Because updating the index takes time, the index can lag behind the data updates.
vectorSearch Function
Once vectors are inserted, the database can be searched for vertices or vectors.
vectorSearch is a built-in function for approximate vector similarity search.
result = vectorSearch(vectorAttributes, queryVector, k, optionalParam)
where result is a vertex set variable storing the top-k most similar vertices.
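Parameter | Description |
---|---|
vectorAttributes | A set of vector attributes to search. Each attribute should be in the format VertexType.vectorAttribute, e.g., {Account.emb1}. |
queryVector | The query (input) vector from which similarity is measured. |
k | The number of vectors to return. E.g., if k = 10, the vertices of the 10 vectors most similar to queryVector are returned. |
optionalParam | A map of optional parameters. E.g., { distance_map: @@distances } stores the distance of each result in the MapAccum @@distances. |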
The meaning of the optional parameters is best explained by example. See the Vector Search tutorial.
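vectorSearch uses the HNSW index, so its results are approximate. It helps to picture the exact, brute-force search that the index approximates. That search computes the distance from the query vector to every stored vector and keeps the k nearest.

The following is a minimal Python sketch of that exact search, using L2 distance. All names are hypothetical. The returned ID list and distance dictionary loosely mirror the result vertex set and the distance_map option. The sketch does not reproduce how vectorSearch works internally.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(vectors: Dict[str, Sequence[float]], query: Sequence[float],
                k: int) -> Tuple[List[str], Dict[str, float]]:
    """Brute-force top-k search by L2 distance (hypothetical helper).

    Returns the IDs of the k nearest vectors (nearest first) and a map of
    ID -> distance for those IDs, analogous to a distance_map.
    """
    # Score every stored vector against the query: O(n * dimension) work,
    # which is what an approximate index such as HNSW avoids.
    distances = {vid: l2_distance(vec, query) for vid, vec in vectors.items()}
    # Keep the k IDs with the smallest distance.
    nearest = heapq.nsmallest(k, distances, key=distances.get)
    return nearest, {vid: distances[vid] for vid in nearest}

if __name__ == "__main__":
    accounts = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
    ids, dist = exact_top_k(accounts, [0.0, 0.0], 2)
    print(ids)   # ['a', 'b']
    print(dist)  # {'a': 0.0, 'b': 1.0}
```

An exact search like this can also serve as a reference when checking how closely approximate results match on a small sample.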
Printing Vectors
Note that the vectorSearch function returns the set of vertices that are associated with the selected vectors, rather than the vectors themselves.
To output the vector values, PRINT the vertices and include the WITH VECTOR clause.
Due to the large size of vectors, this option is usually used when writing the output directly to a file or object store.
Here is a simple example of a query with a WITH VECTOR clause:
CREATE OR REPLACE QUERY q1 (LIST<float> query_vector) SYNTAX v3 {
MapAccum<Vertex, Float> @@distances;
// Find top-5 similar vectors from Account's vector attribute emb1
// Store the distance of each result in @@distances
v = vectorSearch({Account.emb1}, query_vector, 5, { distance_map: @@distances});
PRINT v WITH VECTOR; //show the embeddings
PRINT @@distances; //show the distance map
}
For more examples, please see the Vector Search tutorial.
Add and Update Vectors
Upsert API
The existing REST API for updating data can also be used for vectors. A vector attribute is treated like a standard attribute: the upsert data for a vertex can include zero or more standard attributes and zero or more vector attributes in the same JSON structure.
For example:
curl -X POST "http://localhost:14240/restpp/graph/financialGraph" -d ' { "vertices": { "Account": { "9350352": { "name": { "value": "Paul Gray" }, "emb1": { "value": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557] } } } } }'
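The upsert payload nests vertex type, vertex ID, and attribute name, and wraps every attribute value in a {"value": ...} object. Standard and vector attributes use the same wrapping.

The following is a minimal Python sketch that builds the same payload as the curl example and sends it with the standard library. The helper names are hypothetical. Like the curl example, the sketch does not handle authentication.

```python
import json
import urllib.request
from typing import Dict

def build_upsert_payload(vertex_type: str, vertex_id: str,
                         attributes: Dict[str, object]) -> dict:
    """Wrap each attribute value as {"value": ...}, as in the curl example.

    Vector attributes are passed as plain lists of floats, like any other value.
    """
    return {"vertices": {vertex_type: {vertex_id: {
        name: {"value": value} for name, value in attributes.items()}}}}

def upsert(url: str, payload: dict) -> bytes:
    """POST the JSON payload to url and return the raw response body."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    payload = build_upsert_payload("Account", "9350352", {
        "name": "Paul Gray",
        "emb1": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
    })
    print(json.dumps(payload))
    # Uncomment to send against a running server (URL from the curl example):
    # print(upsert("http://localhost:14240/restpp/graph/financialGraph", payload))
```

Building the payload in a separate function lets it be inspected or tested without a running server.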
Assignment in GSQL
A vector attribute can be updated like any other attribute, by assignment in the ACCUM or POST-ACCUM clause of a GSQL query.
This example assigns new vector values in the POST-ACCUM clause:
CREATE OR REPLACE QUERY similarToSame (LIST<float> query_vector) SYNTAX v3 {
// Find top-5 similar vectors from Account's vector attribute emb1
// Then update those vectors to be exactly the same as the query vector
sim = vectorSearch({Account.emb1}, query_vector, 5);
same = SELECT s FROM sim:s
POST-ACCUM s.emb1 = query_vector;
}
Other Vector Functions
The Vector Search tutorial has a list of built-in vector functions.