Vector Database Operations

The TigerGraph database stores and indexes vectors efficiently, effectively serving as a multi-modal graph + vector database for AI applications. The storage and indexing are distributed, so vector operations are fast and scalable.

Each vector is associated with a vertex, serving as a special type of attribute. The vertex holds the vector’s ID and any other properties or labels to be associated with the vector.

The vector search operations are implemented as functions which can be invoked from within GSQL graph queries. This enables an extremely rich set of possibilities for hybrid graph+vector queries.

This document is intended to be a quick reference to the available functionality. For a tutorial and examples, see our Vector Search tutorial.

Vector Data and Search Specifications

| Characteristic | Description |
| --- | --- |
| Element data type | FLOAT (32-bit floating point) |
| Max. dimensions (vector length) | 4096 |
| Similarity metrics | cosine, L2 (Euclidean), IP (inner product) |
| Approximate search algorithms | HNSW |
| Indexing | Automatic, incremental |
| Vector operations | Load, Read, Exact Similarity Search, Approximate Similarity Search, Top-K Search, Hybrid Graph+Vector Search |
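To make the three similarity metrics concrete, the following Python sketch computes each one for small vectors using their standard textbook definitions. It is an illustration only: the function names are hypothetical, and it does not show how TigerGraph computes or reports scores internally.

```python
import math

def inner_product(a, b):
    """IP: sum of element-wise products (larger means more similar)."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """L2: Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
print(inner_product(a, b))      # 0.0 (orthogonal vectors)
print(l2_distance(a, b))        # square root of 2
print(cosine_similarity(a, a))  # 1.0 (identical direction)
```

The metric is fixed per vector attribute when the schema is defined, so the choice should match how the embeddings were trained.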

Vector Operations Cheatsheet

Each task is described in more detail in the sections that follow.

| Task | Command Cheatsheet |
| --- | --- |
| Add vectors to schema | In schema change job: ALTER VERTEX vertex_name ADD VECTOR ATTRIBUTE vec_name(vertex_params); |
| Load vectors | Clause in loading job: LOAD file_object TO VECTOR ATTRIBUTE vec_name ON VERTEX vertex_name VALUES (id_col, SPLIT(vec_col, elem_sep)) USING SEPARATOR = field_sep; |
| Search for similar vertices | Function call in ACCUM clause returns a vertex set: vectorSearch(vectorAttributes, queryVector, k, optionalParam) |
| Output vectors | In GSQL query: PRINT v WITH VECTOR; |
| Update vectors | REST: Upsert API. GSQL: assignment in ACCUM clause |

There are several other functions available for working with vector data.

Creating a Vector Schema

A vector attribute is always added to an existing vertex type, using the ALTER VERTEX statement within a SCHEMA_CHANGE job. The ADD VECTOR ATTRIBUTE clause adds a vector attribute, and DROP VECTOR ATTRIBUTE removes it.

Vector schema syntax
ALTER VERTEX vertex_name ADD VECTOR ATTRIBUTE vec_name(vertex_params);

where vertex_params =
    DIMENSION=dim,
    METRIC=metr
    [, INDEXTYPE=i_type]
    [, DATATYPE=d_type]

ALTER VERTEX vertex_name DROP VECTOR ATTRIBUTE vec_name;

| Vector parameter* | Description | Values |
| --- | --- | --- |
| DIMENSION | Dimensions (length) of vector | INT from 1 to 4096 |
| METRIC | Metric used for measuring similarity | STRING: "COSINE", "L2" (Euclidean), "IP" (inner product) |
| INDEXTYPE | Similarity indexing scheme | STRING: "HNSW" (default) |
| DATATYPE | Element data type | STRING: "FLOAT" (default) |

* Vector parameter names are case-insensitive.

Example:

CREATE GLOBAL SCHEMA_CHANGE JOB add_emb2 {
  ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb2(DIMENSION=3, METRIC="L2");
}
RUN GLOBAL SCHEMA_CHANGE JOB add_emb2
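When schema change jobs are generated from a script, it can help to check the parameters against the documented limits before emitting the statement. The Python sketch below is a hypothetical helper, not part of any TigerGraph client library. It only assembles the ALTER VERTEX text shown above and assumes metric values must match the listed strings exactly.

```python
ALLOWED_METRICS = {"COSINE", "L2", "IP"}  # values listed in the parameter table
MAX_DIMENSION = 4096                      # documented upper bound

def add_vector_attribute_stmt(vertex_name, vec_name, dimension, metric,
                              index_type=None, data_type=None):
    """Build an ALTER VERTEX ... ADD VECTOR ATTRIBUTE statement as a string."""
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(f"DIMENSION must be between 1 and {MAX_DIMENSION}")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"METRIC must be one of {sorted(ALLOWED_METRICS)}")
    params = [f"DIMENSION={dimension}", f'METRIC="{metric}"']
    # Optional parameters are emitted only when given, so defaults apply otherwise.
    if index_type is not None:
        params.append(f'INDEXTYPE="{index_type}"')
    if data_type is not None:
        params.append(f'DATATYPE="{data_type}"')
    return (f"ALTER VERTEX {vertex_name} ADD VECTOR ATTRIBUTE "
            f"{vec_name}({', '.join(params)});")

print(add_vector_attribute_stmt("Account", "emb2", 3, "L2"))
# ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb2(DIMENSION=3, METRIC="L2");
```

The generated statement still has to be placed inside a schema change job and run, as in the example above.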

Loading Vectors

Vectors can be loaded in bulk using GSQL loading jobs. They can either be loaded to existing vertices or included when new vertices are being created.

The vector data is presumed to be in some format such as CSV where one separator field_sep is used between fields like the vector ID and the vector itself. Another separator elem_sep separates the vector’s elements from one another.

Vector load clause syntax
LOAD file_object TO VECTOR ATTRIBUTE vec_name ON VERTEX vertex_name \
VALUES (id_col, SPLIT(vec_col, elem_sep)) USING SEPARATOR = field_sep;

Example:

CREATE LOADING JOB load_s3_file  {

 DEFINE FILENAME account;
 DEFINE FILENAME accountEmb;

 LOAD account TO VERTEX Account VALUES ($"name", gsql_to_bool(gsql_trim($"isBlocked"))) USING header="true", separator=",";
 LOAD accountEmb TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($0, SPLIT($1, ",")) USING SEPARATOR="|";
}

$0 is the first column, holding the vector IDs, and $1 is the second column, holding the vector; "|" separates the two columns. The vector itself is subdivided into elements, with "," as the element separator. Note the use of SPLIT() to parse the vector.
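The two-level separator scheme can be checked outside the database before running a loading job. The following Python sketch parses one line the same way the clause describes: first on the field separator, then on the element separator. The function name and the sample line are hypothetical and shown only for illustration.

```python
def parse_vector_line(line, field_sep="|", elem_sep=","):
    """Split one input line into (id, vector) using the two separators."""
    # First split: field separator between the ID column and the vector column.
    id_col, vec_col = line.rstrip("\n").split(field_sep, 1)
    # Second split: element separator between the vector's elements.
    vector = [float(x) for x in vec_col.split(elem_sep)]
    return id_col, vector

sample = "9350352|-0.0177,-0.0102,-0.0166"  # made-up sample line
print(parse_vector_line(sample))
# ('9350352', [-0.0177, -0.0102, -0.0166])
```

A quick pass like this over the input file can also confirm that every vector has the number of elements declared in DIMENSION.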

Search Indexing

TigerGraph automatically builds an (HNSW) similarity index for the vectors as they are loaded, using the similarity metric that was specified when the vector schema was defined. The search index is used when the vectorSearch function is invoked.

The index is automatically updated in response to changes in the vector data. Because similarity calculations take some time, the index can lag behind the data updates.

vectorSearch Function

Once vectors are inserted, the database can be searched for vertices or vectors. vectorSearch is a built-in function for vector (approximate) similarity search.

vectorSearch function syntax
result = vectorSearch(vectorAttributes, queryVector, k, optionalParam)

where result is a vertex set variable, storing the top-k most similar vertices.

| Parameter | Description |
| --- | --- |
| vectorAttributes | A set of vector attributes to search. Each attribute should be in the format VertexType.VectorName, e.g., {Account.eb1, Phone.eb1} |
| queryVector | The query (input) vector from which similarity is measured. |
| k | The number of vectors to return. E.g., if k = 10, vectorSearch finds the 10 vectors closest to queryVector. |
| optionalParam | A map of optional parameters: a vertex candidate set; EF, the exploration factor in the HNSW algorithm; a global MapAccum storing top-k (vertex, distance score) pairs. E.g., {candidate_set: vset1, ef: 20, distance_map: @@distmap}. |

The meaning of the optional parameters is best explained by example. See the Vector Search tutorial.

Note that the vectorSearch function returns the set of vertices that are associated with the selected vectors, rather than the vectors themselves.
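As a mental model for the parameters, the Python sketch below performs an exact brute-force top-k search over an in-memory dictionary. It is not how vectorSearch works internally (the database uses an approximate HNSW index), and it assumes the candidate set simply restricts which vertices are considered. All names are hypothetical.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(vectors, query_vector, k, candidate_set=None):
    """Return (vertex_ids, distance_map) for the k closest vectors.

    vectors:       dict mapping vertex ID -> vector
    candidate_set: optional iterable of vertex IDs to restrict the search to
    """
    ids = vectors.keys() if candidate_set is None else candidate_set
    scored = [(l2_distance(vectors[v], query_vector), v) for v in ids]
    top = heapq.nsmallest(k, scored)          # k smallest distances, sorted
    distance_map = {v: d for d, v in top}     # analogous to the distance map
    return [v for _, v in top], distance_map

vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
result, distances = exact_top_k(vectors, [0.9, 0.0], k=2)
print(result)  # ['b', 'a']
```

Like vectorSearch, the sketch returns the matching vertex IDs rather than the vectors, with distances available separately.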

To output the vector values, PRINT the vertices and include the WITH VECTOR clause. Due to the large size of vectors, this option is usually used when writing the output directly to a file or object store.

Here is a simple example of a query with a WITH VECTOR clause:

CREATE OR REPLACE QUERY q1 (LIST<float> query_vector) SYNTAX v3 {
  MapAccum<Vertex, Float> @@distances;

  // Find top-5 similar vectors from Account's vector attribute emb1
  // Store the distances in @@distances
  v = vectorSearch({Account.emb1}, query_vector, 5, { distance_map: @@distances});

  PRINT v WITH VECTOR; //show the embeddings
  PRINT @@distances; //show the distance map
}

For more examples, please see the Vector Search tutorial.

Add and Update Vectors

Upsert API

The existing REST API for upserting data can also be used for vectors. A vector attribute is treated like a standard attribute: the upsert data for a vertex can include zero or more standard attributes and zero or more vector attributes, in the same JSON structure.

For example:

curl -X POST "http://localhost:14240/restpp/graph/financialGraph" -d '
{
  "vertices": {
    "Account": {
      "9350352": {
        "name": {
          "value": "Paul Gray"
        },
        "emb1": {
          "value": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]
        }
      }
    }
  }
}'
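When the upsert body is built programmatically, the nesting is easy to get wrong by hand. The Python sketch below assembles the same JSON structure as the curl example using only the standard library. The helper name is hypothetical, and the sketch stops at producing the JSON text; sending it to the endpoint is left to whichever HTTP client is in use.

```python
import json

def build_upsert_payload(vertex_type, vertex_id, attributes):
    """Wrap each attribute as {"value": ...} under vertices/type/id."""
    wrapped = {name: {"value": value} for name, value in attributes.items()}
    return {"vertices": {vertex_type: {vertex_id: wrapped}}}

payload = build_upsert_payload(
    "Account",
    "9350352",
    {
        "name": "Paul Gray",  # standard attribute
        "emb1": [-0.017733968794345856,  # vector attribute, same structure
                 -0.01019224338233471,
                 -0.016571875661611557],
    },
)
print(json.dumps(payload, indent=2))
```

Standard and vector attributes go through the same wrapper, which mirrors the way the upsert treats them alike.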

Assignment in GSQL

A vector attribute can be updated like other attributes, in the ACCUM or POST-ACCUM clause of a GSQL query.

This example assigns new vector values in the last clause:

CREATE OR REPLACE QUERY similarToSame (LIST<float> query_vector) SYNTAX v3 {

  // Find top-5 similar vectors from Account's vector attribute emb1
  // Then update those vectors to be exactly the same as the query vector
  sim = vectorSearch({Account.emb1}, query_vector, 5);

  same = SELECT s FROM sim:s
      POST-ACCUM s.emb1 = query_vector;
}

Other Vector Functions

The Vector Search tutorial has a list of built-in vector functions.