Jaccard Similarity of Neighborhoods (Single Source)
The Jaccard index measures the relative overlap between two sets. To compare two vertices by Jaccard similarity, first select a set of values for each vertex. For example, a set of values for a Person could be the cities the Person has lived in. Then the Jaccard index is computed for the two vectors.
The Jaccard index of two sets A and B is defined as follows:
The value ranges from 0 to 1. If A and B are identical, then Jaccard(A, B) = 1. If both A and B are empty, we define the value to be 0.
Specifications
In the current
Characteristic | Value |
Result | The top k vertices in the graph that have the highest similarity scores, along with their scores. The result is available in three forms:
|
Input Parameters |
|
Result Size |
|
Time Complexity | O(D^2), D = outdegree of vertex v |
Graph Types | Undirected or directed edges, unweighted edges |
The algorithm will not output more than K vertices, so the algorithm may arbitrarily choose to output one vertex over another if there are tied similarity scores.
Example
Using the movie graph, we run jaccard_nbor_ss("Neil", 5)
:
If the source vertex (person) doesn't have any common neighbors (movies) with any other vertex (person), such as Elena in our example, the result will be an empty list:
Last updated