Vector Functions

This page lists the functions in the GSQL query language that operate on vector data. In general, a vector can be either a LIST<> type attribute or a ListAccum<> variable.

Similarity Algorithms

The current vector functions are implemented as UDFs and preloaded as part of the Graph Data Science Library, so they are not yet a permanent part of the GSQL language. They work with ListAccum<> inputs but not directly with LIST<> attributes.

tg_similarity_accum()

Returns a similarity (or distance) measure between two vectors. This function supports multiple similarity measures; the user selects which measure by setting one of the input parameters.

  • The first two parameters are ListAccum<> for the two vectors being compared.

  • The third parameter is a string and indicates which variation of the function to use.

    Value          Function
    "COSINE"       cosine similarity
    "EUCLIDEAN"    Euclidean distance
    "JACCARD"      Jaccard similarity
    "OVERLAP"      overlap similarity
    "PEARSON"      Pearson correlation coefficient

Euclidean Distance

Syntax

tg_similarity_accum(VectorA, VectorB, "EUCLIDEAN")

Description

The Euclidean distance between two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

double

Parameters

Parameter      Description                                                    Data type
VectorA        An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
VectorB        An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
"EUCLIDEAN"    A string that identifies the similarity function type.         const string

Example

Query

CREATE QUERY euclidean_example() FOR GRAPH social {
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [4, 5, 6];
    double distance = tg_similarity_accum(@@a, @@b, "EUCLIDEAN");
    PRINT distance;
}

Result

{
    "distance": 5.19615
}
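
The Euclidean distance is the square root of the summed squared differences between corresponding elements. As a cross-check on the example result above, here is a minimal Python sketch of that formula. The function name `euclidean_distance` is hypothetical, and this is a reference calculation only, not the library's implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences.

    Hypothetical helper for illustration; not part of GSQL or the
    Graph Data Science Library.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Sum the squared element-wise differences, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same vectors as the GSQL example: sqrt(3^2 + 3^2 + 3^2) = sqrt(27)
print(round(euclidean_distance([1, 2, 3], [4, 5, 6]), 5))  # 5.19615
```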

Overlap

Syntax

tg_similarity_accum(VectorA, VectorB, "OVERLAP")

Description

The overlap coefficient between two vectors, that is, the number of items appearing in both lists, divided by the number of items in the shorter list. Overlap similarity is intended for vectors which are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.

Return type

double

Parameters

Parameter    Description                                                    Data type
VectorA      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/STRING>
VectorB      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/STRING>
"OVERLAP"    A string that identifies the similarity function type.         const string

Example

Query

CREATE QUERY overlap_example() FOR GRAPH social {
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 2, 3];
    double overlap_similarity = tg_similarity_accum(@@a, @@b, "OVERLAP");
    PRINT overlap_similarity;
}

Result

[
  {
    "overlap_similarity": 0.66667
  }
]
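
To make the arithmetic behind the example result concrete, here is a minimal Python sketch of the overlap coefficient as described above: the number of items appearing in both lists, divided by the length of the shorter list. The function name `overlap_coefficient` is hypothetical. How the library itself treats duplicate elements is not stated on this page; the sketch assumes that common items are counted once each while list lengths are taken as given, which is one reading that agrees with the example.

```python
def overlap_coefficient(a, b):
    """Overlap coefficient of two lists of categorical items.

    Hypothetical helper for illustration. Assumption: each common item
    is counted once, and the denominator is the raw length of the
    shorter list (duplicates included).
    """
    if not a or not b:
        raise ValueError("both lists must be non-empty")
    common = set(a) & set(b)  # distinct items present in both lists
    return len(common) / min(len(a), len(b))


# Same lists as the GSQL example: items 2 and 3 are shared, shorter length is 3
print(round(overlap_coefficient([1, 2, 3], [2, 2, 3]), 5))  # 0.66667
```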

Pearson

Syntax

tg_similarity_accum(VectorA, VectorB, "PEARSON")

Description

The Pearson correlation coefficient between two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

double

Parameters

Parameter    Description                                                    Data type
VectorA      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
VectorB      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
"PEARSON"    A string that identifies the similarity function type.         const string

Example

Query

CREATE QUERY pearson_example() FOR GRAPH social {
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 2, 3];
    double pearson_similarity = tg_similarity_accum(@@a, @@b, "PEARSON");
    PRINT pearson_similarity;
}

Result

{
    "pearson_similarity": 0.86603
}
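
The Pearson correlation coefficient is the covariance of the two vectors divided by the product of their standard deviations. The following Python sketch applies that textbook formula to the example vectors as a cross-check on the result above. The function name `pearson_correlation` is hypothetical, and the error handling for constant vectors is an assumption of this sketch rather than documented library behavior.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # The 1/n factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


# Same vectors as the GSQL example
print(round(pearson_correlation([1, 2, 3], [2, 2, 3]), 5))  # 0.86603
```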

Cosine

Syntax

tg_similarity_accum(VectorA, VectorB, "COSINE")

Description

The cosine similarity between the two vectors. The vectors must have the same length and have corresponding numeric elements.

Return type

double

Parameters

Parameter    Description                                                    Data type
VectorA      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
VectorB      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/FLOAT/DOUBLE>
"COSINE"     A string that identifies the similarity function type.         const string

Example

Query

CREATE QUERY cosine_similarity_example() FOR GRAPH social {
    ListAccum<INT> @@a = [1, 0, 3];
    ListAccum<INT> @@b = [0, 2, 6];
    double similarity = tg_similarity_accum(@@a, @@b, "COSINE");
    PRINT similarity;
}

Result

{
    "similarity": 0.9
}
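
Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The Python sketch below is a minimal reference calculation of that formula, offered so the arithmetic can be reproduced by hand or in a script. The function name `cosine_similarity` is hypothetical, and rejecting zero-length vectors is an assumption of this sketch, not documented library behavior.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Orthogonal vectors have no component in common
print(cosine_similarity([1, 0], [0, 1]))  # 0.0

# Same vectors as the GSQL example: dot = 18, norms = sqrt(10) and sqrt(40)
print(cosine_similarity([1, 0, 3], [0, 2, 6]))
```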

Jaccard

Syntax

tg_similarity_accum(VectorA, VectorB, "JACCARD")

Description

The Jaccard similarity between the two vectors treated as sets, that is, the number of distinct items appearing in both lists, divided by the number of distinct items appearing in either list. Jaccard similarity, like overlap similarity, is intended for vectors that are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.

Return type

double

Parameters

Parameter    Description                                                    Data type
VectorA      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/STRING>
VectorB      An n-dimensional vector denoted by a ListAccum of length n.    ListAccum<INT/UINT/STRING>
"JACCARD"    A string that identifies the similarity function type.         const string

Example

Query

CREATE QUERY jaccard_similarity_example() FOR GRAPH social {
    ListAccum<INT> @@a = [1, 2, 3];
    ListAccum<INT> @@b = [2, 3, 4];
    double similarity = tg_similarity_accum(@@a, @@b, "JACCARD");
    PRINT similarity;
}

Result

{
    "similarity": 0.5
}
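
As a cross-check on the example, here is a minimal Python sketch of Jaccard similarity: the size of the intersection of the two item sets divided by the size of their union. The function name `jaccard_similarity` is hypothetical, and collapsing duplicates by converting each list to a set is an assumption of this sketch rather than documented library behavior.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two lists treated as sets of items.

    Hypothetical helper for illustration; duplicates in either list
    are collapsed before comparison.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        raise ValueError("at least one list must be non-empty")
    return len(set_a & set_b) / len(union)


# Same lists as the GSQL example: intersection {2, 3}, union {1, 2, 3, 4}
print(jaccard_similarity([1, 2, 3], [2, 3, 4]))  # 0.5
```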