Vector Functions

This page lists the functions in the GSQL query language that operate on vector data. In general, a vector can be either a `LIST<>` type attribute or a `ListAccum<>` variable.

The current vector functions are implemented as user-defined functions (UDFs) that are preloaded as part of the Graph Data Science Library, so they are not yet a permanent part of the GSQL language. They work with `ListAccum<>` inputs but not directly with `LIST<>` attributes.

tg_similarity_accum()

Returns a similarity (or distance) measure between two vectors. This function supports multiple similarity measures; the user selects which measure by setting one of the input parameters.

• The first two parameters are `ListAccum<>` for the two vectors being compared.

• The third parameter is a string and indicates which variation of the function to use.

| Value | Function |
| --- | --- |
| `"COSINE"` | cosine similarity |
| `"EUCLIDEAN"` | Euclidean distance |
| `"JACCARD"` | Jaccard similarity |
| `"OVERLAP"` | overlap similarity |
| `"PEARSON"` | Pearson correlation coefficient |

Euclidean Distance

Syntax

`tg_similarity_accum(VectorA, VectorB, "EUCLIDEAN")`

Description

The Euclidean distance between two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

`double`

Parameters

| Parameter | Description | Data type |
| --- | --- | --- |
| `VectorA` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `VectorB` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `"EUCLIDEAN"` | A string that identifies the similarity function type. | `const string` |

Example

Query:

``````
CREATE QUERY euclidean_example() FOR GRAPH social {
    // Two 3-dimensional vectors held in global list accumulators
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [4, 5, 6];
    double distance = tg_similarity_accum(@@a, @@b, "EUCLIDEAN");
    PRINT distance;
}
``````

Result:

``````
{
    "distance": 5.19615
}
``````
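To check the Euclidean result independently of TigerGraph, the standard formula can be sketched in plain Python. This is only a reference calculation, not the library's implementation; the function name `euclidean_distance` is hypothetical and was chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Square root of the sum of squared element-wise differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same vectors as the GSQL example: sqrt(9 + 9 + 9) = sqrt(27)
distance = euclidean_distance([1, 2, 3], [4, 5, 6])
print(round(distance, 5))  # 5.19615
```

The rounded value agrees with the `distance` printed by the query in the example.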

Overlap

Syntax

`tg_similarity_accum(VectorA, VectorB, "OVERLAP")`

Description

The overlap coefficient between two vectors, that is, the number of items appearing in both lists, divided by the number of items in the shorter list. Overlap similarity is intended for vectors which are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.

Return type

`double`

Parameters

| Parameter | Description | Data type |
| --- | --- | --- |
| `VectorA` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/STRING>` |
| `VectorB` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/STRING>` |
| `"OVERLAP"` | A string that identifies the similarity function type. | `const string` |

Example

Query:

``````
CREATE QUERY overlap_example() FOR GRAPH social {
    // The lists are treated as unordered collections of categorical items
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 2, 3];
    double overlap_similarity = tg_similarity_accum(@@a, @@b, "OVERLAP");
    PRINT overlap_similarity;
}
``````

Result:

``````
[
    {
        "overlap_similarity": 0.66667
    }
]
``````
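The overlap description can be read literally as "distinct items found in both lists, divided by the length of the shorter list". The Python sketch below follows that reading. It is not the library's implementation, and the function name `overlap_coefficient` is hypothetical. How duplicates are handled is an assumption: the denominator here counts list length including duplicates, which reproduces the 0.66667 shown in the example, whereas a strictly set-based definition would divide by the smaller set size and give 1.0 for the same inputs.

```python
def overlap_coefficient(a, b):
    """Distinct items present in both lists, divided by the shorter list's length."""
    if not a or not b:
        raise ValueError("both lists must be non-empty")
    common = set(a) & set(b)  # distinct items appearing in both lists
    # Assumption: the denominator is the raw length of the shorter list
    return len(common) / min(len(a), len(b))


# Same lists as the GSQL example: items 2 and 3 are shared, shorter length is 3
overlap = overlap_coefficient([1, 2, 3], [2, 2, 3])
print(round(overlap, 5))  # 0.66667
```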

Pearson

Syntax

`tg_similarity_accum(VectorA, VectorB, "PEARSON")`

Description

The Pearson correlation coefficient between two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

`double`

Parameters

| Parameter | Description | Data type |
| --- | --- | --- |
| `VectorA` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `VectorB` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `"PEARSON"` | A string that identifies the similarity function type. | `const string` |

Example

Query:

``````
CREATE QUERY pearson_example() FOR GRAPH social {
    // Two equal-length numeric vectors
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 2, 3];
    double pearson_similarity = tg_similarity_accum(@@a, @@b, "PEARSON");
    PRINT pearson_similarity;
}
``````

Result:

``````
{
    "pearson_similarity": 0.86603
}
``````
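As a cross-check on the Pearson result, the textbook formula (covariance divided by the product of the standard deviations) can be sketched in plain Python. This is a reference calculation under that standard definition, not the library's implementation; the function name `pearson_correlation` is hypothetical.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


# Same vectors as the GSQL example: the exact value is sqrt(3) / 2
r = pearson_correlation([1, 2, 3], [2, 2, 3])
print(round(r, 5))  # 0.86603
```

The rounded value agrees with the `pearson_similarity` printed in the example.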

Cosine

Syntax

`tg_similarity_accum(VectorA, VectorB, "COSINE")`

Description

The cosine similarity between the two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

`double`

Parameters

| Parameter | Description | Data type |
| --- | --- | --- |
| `VectorA` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `VectorB` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/FLOAT/DOUBLE>` |
| `"COSINE"` | A string that identifies the similarity function type. | `const string` |

Example

Query:

``````
CREATE QUERY cosine_similarity_example() FOR GRAPH social {
    // Two equal-length numeric vectors
    ListAccum<INT> @@a = [1, 0, 3];
    ListAccum<INT> @@b = [0, 2, 6];
    double similarity = tg_similarity_accum(@@a, @@b, "COSINE");
    PRINT similarity;
}
``````

Result:

``````
{
    "similarity": 0.9
}
``````
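The standard cosine similarity formula (dot product divided by the product of the vector norms) can be sketched in plain Python to work through the example vectors by hand. This is a reference calculation, not the library's implementation; the function name `cosine_similarity` is hypothetical. For `[1, 0, 3]` and `[0, 2, 6]` the dot product is 18 and the product of the norms is sqrt(10 * 40) = 20, so the formula gives 0.9.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq_a = sum(x * x for x in a)
    norm_sq_b = sum(y * y for y in b)
    if norm_sq_a == 0 or norm_sq_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


# Same vectors as the GSQL example: 18 / sqrt(10 * 40) = 18 / 20
similarity = cosine_similarity([1, 0, 3], [0, 2, 6])
print(round(similarity, 6))  # 0.9
```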

Jaccard

Syntax

`tg_similarity_accum(VectorA, VectorB, "JACCARD")`

Description

The Jaccard similarity between the two sets, that is, the number of items appearing in both lists, divided by the number of unique items across the two lists combined. Jaccard similarity, like overlap similarity, is intended for vectors that are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.

Return type

`double`

Parameters

| Parameter | Description | Data type |
| --- | --- | --- |
| `VectorA` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/STRING>` |
| `VectorB` | An n-dimensional vector denoted by a `ListAccum` of length `n`. | `ListAccum<INT/UINT/STRING>` |
| `"JACCARD"` | A string that identifies the similarity function type. | `const string` |

Example

Query:

``````
CREATE QUERY jaccard_similarity_example() FOR GRAPH social {
    // The lists are treated as unordered sets of categorical items
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 3, 4];
    double similarity = tg_similarity_accum(@@a, @@b, "JACCARD");
    PRINT similarity;
}
``````

Result:

``````
{
    "similarity": 0.5
}
``````
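The Jaccard definition (size of the intersection divided by size of the union) maps directly onto set operations, as the plain Python sketch below shows. It is a reference calculation, not the library's implementation, and the function name `jaccard_similarity` is hypothetical. Converting the lists to sets first is an assumption about how duplicates are treated.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two item lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        raise ValueError("at least one list must be non-empty")
    return len(set_a & set_b) / len(union)


# Same lists as the GSQL example: {2, 3} shared out of {1, 2, 3, 4}
jaccard = jaccard_similarity([1, 2, 3], [2, 3, 4])
print(jaccard)  # 0.5
```

The value agrees with the 0.5 reported in the example.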