Jaccard Similarity of Neighborhoods (Batch)

".Supported Graph Characteristics

Unweighted edges

Homogeneous vertex types

Heterogeneous vertex types

This algorithm computes the same similarity scores as the Jaccard similarity of neighborhoods, single source.

Instead of selecting a single source vertex, however, it calculates similarity scores for all vertex pairs in the graph in parallel.

Since this is a memory-intensive operation, it is split into batches to reduce peak memory usage. The user can specify how many batches it is to be split into.

Specifications

CREATE QUERY tg_jaccard_nbor_ap_batch ( INT top_k = 10,  SET<STRING> v_type_set,
    SET<STRING> feat_v_type,  SET<STRING> e_type_set, SET<STRING> reverse_e_type_set,
    STRING similarity_edge, INT src_batch_num = 50, INT nbor_batch_num = 10,
    BOOL print_results = true, INT print_limit = 50, STRING file_path = "")

Parameters

Name Description

top_k

Number of top scores to report for each vertex

v_type

Vertex type to calculate similarity for

feat_v_type

Feature vertex type

e_type

Directed edge type to traverse

reverse_e_type

Reverse edge type to traverse

similarity_edge

If provided, the similarity scores will be saved to this edge type

src_batch_num

Number of batches to split the source vertices into

nbor_batch_num

Number of batches to split the 2-hop neighbor vertices into

print_results

If true, output JSON to standard output.

print_limit

Number of source vertices to print, -1 to print all

file_path

If a file path is provided, the algorithm will output to a file specified by the file path in CSV format

Time complexity

This algorithm has a time complexity of \$O(E)\$, where \$E\$ is the number of edges.

Output

The result contains the top k Jaccard similarity scores for each vertex and its corresponding pair. A pair is only included if its similarity is greater than 0, meaning there is at least one common neighbor between the pair. The result is available in JSON format, or can be output to a file in CSV, or it can be saved as an edge on the graph itself. A JSON formatted result could look like this:

// Run jaccard_batch on social10 graph traversing through Friend edges
[
  {
    "Start": [
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.33333,
              "ver": "Howard"
            },
            {
              "val": 0.25,
              "ver": "Ivy"
            },
            {
              "val": 0.25,
              "ver": "George"
            }
          ]
        },
        "v_id": "Fiona",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": []
        },
        "v_id": "Justin",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": []
        },
        "v_id": "Bob",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.5,
              "ver": "Damon"
            }
          ]
        },
        "v_id": "Chase",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.5,
              "ver": "Chase"
            }
          ]
        },
        "v_id": "Damon",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.33333,
              "ver": "Ivy"
            }
          ]
        },
        "v_id": "Alex",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.5,
              "ver": "Howard"
            },
            {
              "val": 0.25,
              "ver": "Fiona"
            }
          ]
        },
        "v_id": "George",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": []
        },
        "v_id": "Eddie",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.33333,
              "ver": "Alex"
            },
            {
              "val": 0.25,
              "ver": "Fiona"
            }
          ]
        },
        "v_id": "Ivy",
        "v_type": "Person"
      },
      {
        "attributes": {
          "Start.@heap": [
            {
              "val": 0.5,
              "ver": "George"
            },
            {
              "val": 0.33333,
              "ver": "Fiona"
            }
          ]
        },
        "v_id": "Howard",
        "v_type": "Person"
      }
    ]
  }
]