# Jaccard Similarity of Neighborhoods (Single Source)

The Jaccard index measures the relative overlap between two sets. To compare two vertices by Jaccard similarity, first select a set of values for each vertex. For example, a set of values for a Person could be the cities the Person has lived in. Then the Jaccard index is computed for the two vectors.

The Jaccard index of two sets A and B is defined as follows:

$Jaccard(A,B)=\frac{|A \cap B|}{|A \cup B|}$

The value ranges from 0 to 1. If A and B are identical, then Jaccard(A, B) = 1. If both A and B are empty, we define the value to be 0.

## Specifications

``````tg_jaccard_nbor_ss (VERTEX source, STRING e_type, STRING rev_e_type,  INT top_k = 100,
BOOL print_accum = TRUE, STRING similarity_edge_type = "",STRING file_path = "")``````

### Time complexity

This algorithm has a time complexity of $O(D^2)$, where $D$ is equal to the outdegree of the vertex.

Characteristic Value

Result

The top k vertices in the graph that have the highest similarity scores, along with their scores.

The result is available in three forms:

• streamed out in JSON format

• written to a file in tabular format, or

• stored as a vertex attribute value.

Input Parameters

• `SET<STRING> v_type`: Vertex type to calculate similarity score for

• `SET<STRING> e_type`: Edge type to traverse

• `SET<STRING> re_type`: Reverse edge type to traverse

• `INT top_k`: the number of vertex pairs with the highest similarity scores to return

• `BOOL print_accum`: Boolean value that decides whether to output to console

• `STRING similarity_edge`: If provided, the similarity score will be saved to this edge

• `STRING filepath:`If provided, the algorithm will output to the file path in CSV format

Result Size

`top_k`

Graph Types

Undirected or directed edges, unweighted edges

The algorithm will not output more than K vertices, so the algorithm may arbitrarily choose to output one vertex over another if there are tied similarity scores.

## Example

Using the movie graph, we run `jaccard_nbor_ss("Neil", 5)`:

``````[
{
"@@result_topK": [
{
"vertex1": "Neil",
"vertex2": "Kat",
"score": 0.5
},
{
"vertex1": "Neil",
"vertex2": "Kevin",
"score": 0.4
},
{
"vertex1": "Neil",
"vertex2": "Jing",
"score": 0.2
}
]
}
]``````

If the source vertex (person) doesn’t have any common neighbors (movies) with any other vertex (person), such as Elena in our example, the result will be an empty list:

``````[
{
"@@result_topK": []
}
]``````