The Jaccard index measures the relative overlap between two sets. To compare two vertices by Jaccard similarity, first select a set of values for each vertex. For example, a set of values for a Person could be the cities the Person has lived in. Then the Jaccard index is computed for the two vectors.
The Jaccard index of two sets A and B is defined as follows:
The value ranges from 0 to 1. If A and B are identical, then Jaccard(A, B) = 1. If both A and B are empty, we define the value to be 0.
In the current
The algorithm will not output more than K vertices, so the algorithm may arbitrarily choose to output one vertex over another if there are tied similarity scores.
Using the movie graph, we run jaccard_nbor_ss("Neil", 5)
:
If the source vertex (person) doesn't have any common neighbors (movies) with any other vertex (person), such as Elena in our example, the result will be an empty list:
Characteristic
Value
Result
The top k vertices in the graph that have the highest similarity scores, along with their scores.
The result is available in three forms:
streamed out in JSON format
written to a file in tabular format, or
stored as a vertex attribute value.
Input Parameters
SET<STRING> v_type
: Vertex type to calculate similarity score for
SET<STRING> e_type
: Edge type to traverse
SET<STRING> re_type
: Reverse edge type to traverse
INT top_k
: the number of vertex pairs with the highest similarity scores to return
BOOL print_accum
: Boolean value that decides whether to output to console
STRING similarity_edge
: If provided, the similarity score will be saved to this edge
STRING filepath:
If provided, the algorithm will output to the file path in CSV format
Result Size
top_k
Time Complexity
O(D^2), D = outdegree of vertex v
Graph Types
Undirected or directed edges, unweighted edges