Vector Functions
This page lists all the functions in the GSQL query language that operate on vector data. In general, a vector could be either a `LIST<>` type attribute or a `ListAccum<>` variable.
Similarity Algorithms
The current vector functions are implemented as user-defined functions (UDFs) and preloaded as part of the Graph Data Science Library, so they are not yet a permanent part of the GSQL language.
The current vector functions work with `ListAccum<>` inputs but not directly with `LIST<>` type attributes.
tg_similarity_accum()
Returns a similarity (or distance) measure between two vectors. This function supports multiple similarity measures; the caller selects which measure to use through the third parameter.
- The first two parameters are `ListAccum<>` values holding the two vectors being compared.
- The third parameter is a string that indicates which variation of the function to use:
Value | Function |
---|---|
"COSINE" | Cosine similarity |
"EUCLIDEAN" | Euclidean distance |
"JACCARD" | Jaccard similarity |
"OVERLAP" | Overlap similarity |
"PEARSON" | Pearson correlation coefficient |
Euclidean Distance
Description
The Euclidean distance between two vectors. The vectors must have the same length and have corresponding numeric elements.
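The formula can be sketched in Python: the square root of the sum of squared element-wise differences. The function name `euclidean_distance` is hypothetical. This is an illustration of the math, not the library's code. Because this is a distance, 0 means the vectors are identical, and larger values mean they are less similar.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: sqrt(sum((a_i - b_i)^2)) over corresponding elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `euclidean_distance([0.0, 0.0], [3.0, 4.0])` returns 5.0.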
Parameters
Parameter | Description | Data type |
---|---|---|
VectorA | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
VectorB | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
"EUCLIDEAN" | A string that identifies the similarity function type | STRING |
Overlap
Description
The overlap coefficient between two vectors: the number of items appearing in both lists, divided by the number of items in the shorter list. Overlap similarity is intended for vectors that are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.
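The coefficient can be sketched in Python, treating each list as an unordered set. The function name `overlap_coefficient` is hypothetical. Two choices are assumptions of this sketch rather than documented behavior: duplicates are collapsed before counting, and empty input raises an error.

```python
def overlap_coefficient(a: list[str], b: list[str]) -> float:
    """Hypothetical helper: len(A & B) / min(len(A), len(B)) over de-duplicated items."""
    set_a, set_b = set(a), set(b)
    shorter = min(len(set_a), len(set_b))
    if shorter == 0:
        raise ValueError("both inputs must be non-empty")
    return len(set_a & set_b) / shorter
```

For example, comparing the cities `["Paris", "Rome", "Oslo"]` and `["Rome", "Oslo", "Lima", "Kyiv"]` gives 2/3, about 0.667. Two cities are shared, and the shorter list has three. If one set is contained in the other, the result is 1.0.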
Parameters
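Parameter | Description | Data type |
---|---|---|
VectorA | A list of items, treated as an unordered set, given as a `ListAccum<>` | `ListAccum<>` |
VectorB | A list of items, treated as an unordered set, given as a `ListAccum<>` | `ListAccum<>` |
"OVERLAP" | A string that identifies the similarity function type | STRING |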
Pearson
Description
The Pearson correlation coefficient between two vectors. The vectors must have the same length and have corresponding numeric elements.
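The coefficient can be sketched in Python as the covariance of the two vectors divided by the product of their standard deviations. The result lies between -1 and 1. The function name `pearson` is hypothetical. The sketch assumes at least two elements and raises an error for a constant vector, where the coefficient is undefined.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: covariance(a, b) / (stddev(a) * stddev(b))."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Center both vectors; the 1/n factors cancel between numerator and denominator.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(da, db))
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / denom
```

For example, `pearson([1, 2, 3], [2, 4, 6])` returns 1.0 and `pearson([1, 2, 3], [3, 2, 1])` returns -1.0.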
Parameters
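Parameter | Description | Data type |
---|---|---|
VectorA | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
VectorB | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
"PEARSON" | A string that identifies the similarity function type | STRING |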
Cosine
Description
The cosine similarity between the two vectors. The vectors must have the same length and have corresponding numeric elements.
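The measure can be sketched in Python as the dot product of the two vectors divided by the product of their Euclidean norms. The result ranges from -1 (opposite directions) to 1 (same direction). The function name `cosine_similarity` is hypothetical. Raising an error for a zero vector is an assumption of this sketch; this page does not say how the library handles that case.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: dot(a, b) / (norm(a) * norm(b))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1.0, 0.0], [0.0, 1.0])` returns 0.0 because the vectors are orthogonal.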
Parameters
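Parameter | Description | Data type |
---|---|---|
VectorA | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
VectorB | An n-dimensional vector, given as a `ListAccum<>` with n numeric elements | `ListAccum<>` |
"COSINE" | A string that identifies the similarity function type | STRING |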
Jaccard
Description
The Jaccard similarity between two sets: the number of items appearing in both sets, divided by the number of unique items appearing in either set. Like overlap similarity, Jaccard similarity is intended for vectors that are unordered sets of categorical data, such as the sets of city names mentioned in travel blogs.
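The measure can be sketched in Python as the size of the intersection divided by the size of the union. The function name `jaccard_similarity` is hypothetical. As in the overlap sketch, two choices are assumptions here: duplicates are collapsed, and two empty inputs raise an error.

```python
def jaccard_similarity(a: list[str], b: list[str]) -> float:
    """Hypothetical helper: len(A & B) / len(A | B) over de-duplicated items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        raise ValueError("at least one input must be non-empty")
    return len(set_a & set_b) / len(union)
```

For example, comparing the cities `["Paris", "Rome", "Oslo"]` and `["Rome", "Oslo", "Lima", "Kyiv"]` gives 0.4: two shared cities out of five distinct ones. Overlap similarity gives 2/3 for the same input, because Jaccard divides by the union rather than the shorter list.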
Parameters
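Parameter | Description | Data type |
---|---|---|
VectorA | A list of items, treated as an unordered set, given as a `ListAccum<>` | `ListAccum<>` |
VectorB | A list of items, treated as an unordered set, given as a `ListAccum<>` | `ListAccum<>` |
"JACCARD" | A string that identifies the similarity function type | STRING |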