Jaccard Similarity of Neighborhoods (Batch)
Algorithm link: Jaccard Similarity of Neighborhoods (Batch)
This algorithm computes the same similarity scores as the Jaccard similarity of neighborhoods, all pairs except that it starts from all vertices as the source vertex and computes its similarity scores with its neighbors for all the vertices in parallel.
Since this is a memory-intensive operation, it is split into batches to reduce peak memory usage. The user can specify how many batches it is to be split into.
This algorithm runs on graphs with unweighted edges (directed or undirected).
Specifications
CREATE QUERY tg_jaccard_nbor_ap_batch ( INT top_k = 10, SET<STRING> v_type,
SET<STRING> feat_v_type, SET<STRING> e_type, SET<STRING> re_type,
STRING similarity_edge, INT src_batch_num = 50, INT nbor_batch_num = 10,
BOOL print_accum = true, INT print_limit = 50, STRING file_path = "")
Parameters
Name | Description |
---|---|
|
Number of top scores to report for each vertex |
|
Vertex type to calculate similarity for |
|
Feature vertex type |
|
Directed edge type to traverse |
|
Reverse edge type to traverse |
|
If provided, the similarity scores will be saved to this edge type |
|
Number of batches to split the source vertices into |
|
Number of batches to split the 2-hop neighbor vertices into |
|
If |
|
Number of source vertices to print, -1 to print all |
|
If a file path is provided, the algorithm will output to a file specified by the file path in CSV format |
Result
The result contains the top k Jaccard similarity scores for each vertex and its corresponding pair. A pair is only included if its similarity is greater than 0, meaning there is at least one common neighbor between the pair. The result is available in JSON format, or can be output to a file in CSV, or it can be saved as an edge on the graph itself. A JSON formatted result could look like this:
// Run jaccard_batch on social10 graph traversing through Friend edges
[
{
"Start": [
{
"attributes": {
"Start.@heap": [
{
"val": 0.33333,
"ver": "Howard"
},
{
"val": 0.25,
"ver": "Ivy"
},
{
"val": 0.25,
"ver": "George"
}
]
},
"v_id": "Fiona",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": []
},
"v_id": "Justin",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": []
},
"v_id": "Bob",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.5,
"ver": "Damon"
}
]
},
"v_id": "Chase",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.5,
"ver": "Chase"
}
]
},
"v_id": "Damon",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.33333,
"ver": "Ivy"
}
]
},
"v_id": "Alex",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.5,
"ver": "Howard"
},
{
"val": 0.25,
"ver": "Fiona"
}
]
},
"v_id": "George",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": []
},
"v_id": "Eddie",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.33333,
"ver": "Alex"
},
{
"val": 0.25,
"ver": "Fiona"
}
]
},
"v_id": "Ivy",
"v_type": "Person"
},
{
"attributes": {
"Start.@heap": [
{
"val": 0.5,
"ver": "George"
},
{
"val": 0.33333,
"ver": "Fiona"
}
]
},
"v_id": "Howard",
"v_type": "Person"
}
]
}
]