Cosine Similarity of Neighborhoods (All Pairs, Batch)

Supported Graph Characteristics

Weighted edges

Homogeneous vertex types

Algorithm link: Cosine Similarity of Neighborhoods (All Pairs, Batch)

This algorithm computes the same similarity scores as the Cosine similarity of neighborhoods, single source algorithm.

Instead of selecting a single source vertex, however, it calculates similarity scores for all vertex pairs in the graph in parallel.

Since this is a memory-intensive operation, it is split into batches to reduce peak memory usage. The user can specify how many batches it is to be split into.

Specifications

CREATE QUERY tg_cosine_nbor_ap_batch(STRING vertex_type, STRING edge_type,
    STRING edge_attribute, INT top_k, BOOL print_results = true,
    STRING file_path, STRING similarity_edge, INT num_of_batches = 1)

Parameters

Name Description

Name	Description
`v_type`	Vertex type to calculate similarity for
`e_type`	Directed edge type to traverse
`edge_attribute`	Name of the attribute on the edge type to use as the weight
`top_k`	Number of top scores to report for each vertex
`print_results`	If `true`, output JSON to standard output.
`similarity_edge`	If provided, the similarity score will be saved to this edge.
`file_path`	If not empty, write output to this file in CSV.
`num_of_batches`	Number of batches to divide the query into

v_type

Vertex type to calculate similarity for

e_type

Directed edge type to traverse

edge_attribute

Name of the attribute on the edge type to use as the weight

top_k

Number of top scores to report for each vertex

print_results

If true, output JSON to standard output.

similarity_edge

If provided, the similarity score will be saved to this edge.

file_path

If not empty, write output to this file in CSV.

num_of_batches

Number of batches to divide the query into

Output

The result of this algorithm is the top k cosine similarity scores and their corresponding pair for each vertex. The score is only included if it is greater than 0.

The result can be output in JSON format, in CSV to a file, or saved as a similarity edge in the graph itself.

Time complexity

This algorithm has a time complexity of \$O(E)\$, where \$E\$ is the number of edges.

Example

Using the social10 graph, we can calculate the cosine similarity of every person to every other person connected by the Friend edge, and print out the top k most similar pairs for each vertex.

We run tg_cosine_batch("Person", "Friend", "weight", 5, true, "", "", 1):

[
  {
    "start": [
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 0.49903,
              "ver": "Howard"
            },
            {
              "val": 0.43938,
              "ver": "George"
            },
            {
              "val": 0.05918,
              "ver": "Alex"
            },
            {
              "val": 0.05579,
              "ver": "Ivy"
            }
          ]
        },
        "v_id": "Fiona",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": []
        },
        "v_id": "Justin",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": []
        },
        "v_id": "Bob",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 0.22361,
              "ver": "Bob"
            },
            {
              "val": 0.21213,
              "ver": "Alex"
            }
          ]
        },
        "v_id": "Chase",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 0.57143,
              "ver": "Bob"
            },
            {
              "val": 0.12778,
              "ver": "Chase"
            }
          ]
        },
        "v_id": "Damon",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": []
        },
        "v_id": "Alex",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 0.64253,
              "ver": "Alex"
            },
            {
              "val": 0.63607,
              "ver": "Ivy"
            },
            {
              "val": 0.27091,
              "ver": "Howard"
            },
            {
              "val": 0.14364,
              "ver": "Fiona"
            }
          ]
        },
        "v_id": "George",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": []
        },
        "v_id": "Eddie",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 0.94848,
              "ver": "Fiona"
            },
            {
              "val": 0.6364,
              "ver": "Alex"
            },
            {
              "val": 0.31046,
              "ver": "George"
            },
            {
              "val": 0.1118,
              "ver": "Howard"
            }
          ]
        },
        "v_id": "Ivy",
        "v_type": "Person"
      },
      {
        "attributes": {
          "start.@heap": [
            {
              "val": 1.09162,
              "ver": "Fiona"
            },
            {
              "val": 0.78262,
              "ver": "Ivy"
            },
            {
              "val": 0.11852,
              "ver": "George"
            }
          ]
        },
        "v_id": "Howard",
        "v_type": "Person"
      }
    ]
  }
]