1 of 1

Jaccard Similarity of Neighborhoods (All Pairs)

This algorithm computes the same similarity scores as the Jaccard similarity of neighborhoods, single-source algorithm, except that it considers ALL pairs of vertices in the graph (for the vertex and edge types selected by the user). Naturally, this algorithm will take longer to run. For very large and very dense graphs, this algorithm may not be a practical choice.

Specifications

tg_jaccard_nbor_ap(STRING v_type, STRING e_type, STRING re_type, INT top_k,
  BOOL print_accum = TRUE, STRING similarity_edge = "", STRING file_path = "")

The algorithm will not output more than K vertex pairs, so the algorithm may arbitrarily chose to output one vertex pair over another, if there are tied similarity scores.

Example

For the movie graph, calculate the Jaccard similarity between all pairs and show the 5 most similar pairs: jaccard_nbor_ap(5). This is the JSON output :

[
  {
    "@@total_result": [
      {
        "vertex1": "Kat",
        "vertex2": "Neil",
        "score": 0.5
      },
      {
        "vertex1": "Kevin",
        "vertex2": "Neil",
        "score": 0.4
      },
      {
        "vertex1": "Jing",
        "vertex2": "Alex",
        "score": 0.25
      },
      {
        "vertex1": "Kat",
        "vertex2": "Kevin",
        "score": 0.25
      },
      {
        "vertex1": "Jing",
        "vertex2": "Neil",
        "score": 0.2
      }
    ]
  }
]

Jaccard Similarity of Neighborhoods (All Pairs)

Specifications

tg_jaccard_nbor_ap(STRING v_type, STRING e_type, STRING re_type, INT top_k,
  BOOL print_accum = TRUE, STRING similarity_edge = "", STRING file_path = "")

The algorithm will not output more than K vertex pairs, so the algorithm may arbitrarily chose to output one vertex pair over another, if there are tied similarity scores.

Example

For the movie graph, calculate the Jaccard similarity between all pairs and show the 5 most similar pairs: jaccard_nbor_ap(5). This is the JSON output :

[
  {
    "@@total_result": [
      {
        "vertex1": "Kat",
        "vertex2": "Neil",
        "score": 0.5
      },
      {
        "vertex1": "Kevin",
        "vertex2": "Neil",
        "score": 0.4
      },
      {
        "vertex1": "Jing",
        "vertex2": "Alex",
        "score": 0.25
      },
      {
        "vertex1": "Kat",
        "vertex2": "Kevin",
        "score": 0.25
      },
      {
        "vertex1": "Jing",
        "vertex2": "Neil",
        "score": 0.2
      }
    ]
  }
]

The top k vertex pairs in the graph that have the highest similarity scores, along with their scores.

The result is available in three forms:

streamed out in JSON format
written to a file in tabular format, or
stored as a vertex attribute value.

SET<STRING> v_type: Vertex types to calculate similarity score for
SET<STRING> e_type: Edge types to traverse
SET<STRING> re_type: Reverse edge types to traverse
INT top_k: the number of vertex pairs with the highest similarity scores to return
BOOL print_accum: Boolean value that decides whether to output to console
STRING similarity_edge: If provided, the similarity score will be saved to this edge
STRING file_path: If provided, the algorithm will output to the file path in CSV format