Overlap Similarity

The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure that measures the overlap between two finite sets.

\[{overlap} (X,Y)={\frac {|X\cap Y|}{\min(|X|,|Y|)}}\]

The algorithm takes two vectors denoted by ListAccum and returns the overlap coefficient between them.

Notes

This algorithm is implemented as a user-defined function. You need to follow the steps in Add a User-Defined Function to add the function to GSQL. After adding the function, you can call it in any GSQL query in the same way as a built-in GSQL function.

Specifications

tg_overlap_similarity_accum( A, B )

Parameters

Name Description Data type

A

An n-dimensional vector denoted by a ListAccum of length n

ListAccum<INT/UINT/FLOAT/DOUBLE>

B

An n-dimensional vector denoted by a ListAccum of length n

ListAccum<INT/UINT/FLOAT/DOUBLE>

Output

The overlap coefficient between the two vectors.

Time complexity

The algorithm has a complexity of \(O(n)\), where \(n\) is the number of dimensions of the vectors.

Example

  • Query

  • Result

CREATE QUERY overlap_example(/* Parameters here */) FOR GRAPH social {
  ListAccum<INT> @@a = [1, 2, 3];
  ListAccum<INT> @@b = [2, 2, 3];
  double overlap_similarity = tg_overlap_similarity_accum(@@a, @@b);
  PRINT overlap_similarity;
}
[
  {
    "overlap_similarity": 0.66667
  }
]