Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex's influence is based on the influence of the vertices which refer to it. A vertex's influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear.
A common way of interpreting PageRank value is through the Random Network Surfer model. A vertex's PageRank score is proportional to the probability that a random network surfer will be at that vertex at any given time. A vertex with a high PageRank score is a vertex that is frequently visited, assuming that vertices are visited according to the following Random Surfer scheme:
Assume a person travels or surfs across a network's structure, moving from vertex to vertex in a long series of rounds.
The surfer can start anywhere. This start-anywhere property is part of the magic of PageRank, meaning the score is a truly fundamental property of the graph structure itself.
Each round, the surfer randomly picks one of the outward connections from the surfer's current location. The surfer repeats this random walk for a long time.
But wait. The surfer doesn't always follow the network's connection structure. There is a probability (1-damping, to be precise), that the surfer will ignore the structure and will magically teleport to a random vertex.
For more information, see the Google paper on PageRank.
We ran pageRank on our test10 graph (using Friend edges) with the following parameter values: damping=0.85, max_change=0.001, and max_iter=25. We see that Ivy (center bottom) has the highest pageRank score (1.12). This makes sense since there are 3 neighboring persons who point to Ivy, more than for any other person. Eddie and Justin have scores of exactly 1 because they do not have any out-edges. This is an artifact of our particular version pageRank. Likewise, Alex has a score of 0.15, which is (1-damping), because Alex has no in-edges.
Characteristic
Value
Result
Computes a PageRank value (FLOAT type) for each vertex.
Input Parameters
STRING v_type
: Names of vertex type to use
STRING e_type
: Names of edge type to use
FLOAT max_change
: PageRank will stop iterating when the largest difference between any vertex's current score and its previous score ≤ max_change.
That is, the scores have become very stable and are changing by less than max_change
from one iteration to the next.
INT max_iter
: Maximum number of iterations.
FLOAT damping
: Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.
INT top_k
: Sort the scores highest first and output only this many scores
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store PageRank values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E*k), E = number of edges, k = number of iterations.
The number of iterations is data-dependent, but the user can set a maximum. Parallel processing reduces the time needed for computation.
Graph Types
Directed edges
ArticleRank is an algorithm that has been derived from the PageRank algorithm to measure the influence of journal articles.
Page Rank assumes that relationships originating from low-degree nodes have a higher influence than relationships from high-degree nodes. Article Rank modifies the formula in such a way that it retains the basic PageRank methodology but lowers the influence of low-degree nodes.
The Article Rank of a node v at iteration i is defined as:
Within the formula:
Nin(v) are the incoming neighbors and Nout(v) are the outgoing neighbors of node v.
d is a damping factor in [0, 1], usually set to 0.85.
Nout is the average out-degree
For more information, see ArticleRank: a PageRank‐based alternative to numbers of citations for analysing citation network.
The article rank score for each vertex.
Suppose we have the following graph:
By running Article Rank on the graph, we will see that the vertex with the highest score is Dan:
Eigenvector centrality (also called eigencentrality or prestige score) is a measure of the influence of a vertex in a network. Relative scores are assigned to all vertices in the network based on the concept that connections to high-scoring vertices contribute more to the score of the vertex in question than equal connections to low-scoring vertices. A high eigenvector score means that a vertex is connected to many vertices who themselves have high scores.
The vertices with the highest Eigenvector centrality scores along with their score.
Suppose we have the following graph:
Running the algorithm on the graph will show that Dan has the highest centrality score.
For more information, see .
Name
Description
Data type
v_type
A vertex type.
STRING
e_type
An edge type.
STRING
max_change
Article Rank will stop iterating when the largest difference between any vertex's current score and its previous score ≤ max_change.
That is, the scores have become very stable and are changing by less than max_change
from one iteration to the next.
FLOAT
max_iter
Maximum number of iterations.
INT
damping
The damping factor. Usually set to 0.85.
FLOAT
top_k
The number of results with the highest scores to return.
INT
print_accum
If true, print JSON output.
BOOL
result_attr
If true, store the article rank score of each vertex in this attribute.
STRING
file_path
If true, output CSV to this file.
STRING
Name | Description | Data type |
| Vertex types to assign scores to. |
|
| Edge types to traverse. |
|
| Maximum number of iteration. |
|
| The convergence limit. |
|
| The number of vertices with the highest scores to return. |
|
| If true, print results to JSON output. |
|
| If not empty, save the score of each vertex to this attribute. |
|
| If not empty, print results in CSV to this file. |
|
The Harmonic Centrality algorithm calculates the harmonic centrality of each vertex in the graph. Harmonic Centrality is a variant of Closeness Centrality. In a (not necessarily connected) graph, the harmonic centrality reverses the sum and reciprocal operations in the definition of closeness centrality:
If your graph has many unconnected clusters, the harmonic centrality could be a better indicator of centrality than closeness centrality.
For more information, see Harmonic Centrality.
If we have the following graph, we can see that Ivy is the most central of the five vertices. Running the algorithm on the graph shows that Ivy has the highest centrality score:
Name
Description
Data type
v_type
Vertex types to calculate harmonic centrality for
SET<STRING>
e_type
Edge types to traverse
SET<STRING>
re_type
Reverse edge types. For undirected edges, fill in the edge type name in this parameter as well as the e_type
parameter.
SET<STRING>
max_hops
The maximum number of hops the algorithm would consider for each vertex. If set to a non-positive value, the limit is ignored.
INT
top_k
Sort the scores high to low and output the highest k
scores
INT
wf
If true
, use the Wasserman-Faust normalization for multi-component graphs
BOOL
print_accum
If true
, output JSON to standard output
BOOL
result_attr
If not empty, store centrality values (FLOAT) to this attribute
STRING
file_path
If not empty, write output to this file in CSV.
STRING
display_edges
If true
, include the graph's edges in the JSON output, so that the full graph can be displayed.
BOOL
The Betweenness Centrality of a vertex is defined as the number of shortest paths that pass through this vertex, divided by the total number of shortest paths. That is
where is called the pair dependency, is the total number of shortest paths from node s
to node t
and is the number of those paths that pass through v
.
The TigerGraph implementation is based on A Faster Algorithm for Betweenness Centrality by Ulrik Brandes, Journal of Mathematical Sociology 25(2):163-177, (2001). For every vertex s
in the graph, the pair dependency starting from vertex s
to all other vertices t
via all other vertices v
is computed first,
.
Then betweenness centrality is computed as
.
According to Brandes, the accumulated pair dependency can be calculated as
where, the set of predecessors of vertex w
on shortest paths from s
, is defined as
For every vertex, the algorithm works in two phases. The first phase calculates the number of shortest paths passing through each vertex. Then starting from the vertex on the most outside layer in a non-incremental order with pair dependency initial value of 0, traverse back to the starting vertex
This algorithm query employs a subquery called bc_subquery. Both queries are needed to run the algorithm.
In the example below, Claire is in the very center of the graph and has the highest betweenness centrality. Six shortest paths pass through Sam (i.e. paths from Victor to all other 6 people except for Sam and Victor), so the score of Sam is 6. David also has a score of 6, since Brian has 6 paths to other people that pass through David.
In the following example, both Charles and David have 9 shortest paths passing through them. Ellen is in a similar position as Charles, but her centrality is weakened due to the path between Frank and Jack.
Characteristic
Value
Result
Computes a Betweenness Centrality value (FLOAT type) for each vertex.
Required Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
SET<STRING> re_type:
Names of reverse edge types to use
INT max_hops
: If >=0, look only this far from each vertex
INT top_k
: Sort the scores highest first and output only this many scores
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store centrality values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E*V), E = number of edges, V = number of vertices.
Considering the high time cost of running this algorithm on a big graph, the users can set a maximum number of iterations. Parallel processing reduces the time needed for computation.
Graph Types
Undirected edges, Unweighted edges
Degree centrality is defined as the number of edges incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information).
The vertices with the highest degree centrality scores along with their scores.
Suppose we have the following graph:
Running the query on the graph will show that Dan has the highest degree centrality
In the original PageRank, the damping factor is the probability of the surfer continues browsing at each step. The surfer may also stop browsing and start again from a random vertex. In personalized PageRank, the surfer can only start browsing from a given set of source vertices both at the beginning and after stopping.
We ran Personalized PageRank on the graph social10
using Friend edges with the following parameter values:
In this case, the random walker can only start or restart walking from Fiona. In the figure below, we see that Fiona has the highest PageRank score in the result. Ivy and George have the next highest scores because they are direct out-neighbors of Ivy and there are looping paths that lead back to them again. Half of the vertices have a score of 0 since they can not be reached from Fiona.
In the Closeness Centrality algorithm, to obtain the closeness centrality score for a vertex, we measure the distance from the source vertex to every single vertex in the graph. In large graphs, running this calculation for every vertex can be highly time-consuming.
The Approximate Closeness Centrality algorithm (based on Cohen et al. 2014) calculates the approximate closeness centrality score for each vertex by combining two estimation approaches - sampling and pivoting. This hybrid estimation approach offers near-linear time processing and linear space overhead within a small relative error. It runs on graphs with unweighted edges (directed or undirected).
This query uses another subquerycloseness_cent_approx_sub
, which needs to be installed before closeness_approx
can be installed.
The result is a list of all vertices in the graph with their approximate closeness centrality score. It is available both in JSON and CSV format.
Below is an example of running the algorithm on the social10 test graph and an excerpt of the response.
We all have an intuitive understanding when we say a home, an office, or a store is "centrally located." Closeness Centrality provides a precise measure of how "centrally located" a vertex is. The steps below show the steps for one vertex v
:
TigerGraph's closeness centrality algorithm uses multi-source breadth-first search (MS-BFS) to traverse the graph and calculate the sum of a vertex's distance to every other vertex in the graph, which vastly improves the performance of the algorithm. The algorithm's implementation of MS-BFS is based on the paper The More the Merrier: Efficient Multi-source Graph Traversal by Then et al.
This algorithm query employs a subquery called cc_subquery
. Both queries are needed to run the algorithm.
Closeness centrality can be measured for either directed edges (from v
to others) or for undirected edges. Directed graphs may seem less intuitive, however, because if the distance from Alex to Bob is 1, it does not mean the distance from Bob to Alex is also 1.
For our example, we wanted to use the topology of the Likes graph, but to have undirected edges. We emulated an undirected graph by using both Friend
and Also_Friend
(reverse-direction) edges.
Influence maximization is the problem of finding a small subset of vertices in a social network that could maximize the spread of influence.
There are two versions of the Influence Maximization algorithm. Both versions find k
vertices that maximize the expected spread of influence in the network. The CELF version improves upon the efficiency of the greedy version and should be preferred in analyzing large networks.
The two versions of the algorithm are implemented on the following papers:
The CELF version and the greedy version of the algorithms share the same set of parameters.
The ID of the vertices with the highest influence scores along with their scores.
CELF:
Greedy:
Characteristic
Value
Result
Computes a weighted PageRank value (FLOAT type) for each vertex.
Input Parameters
<b></b>
STRING v_type
: Names of vertex type to use
STRING e_type
: Names of edge type to use
STRING wt_attr
: Name of edge weight attribute
FLOAT max_change
: PageRank will stop iterating when the largest difference between any vertex's current score and its previous score ≤ max_change
. That is, the scores have become very stable and are changing by less than max_change
from one iteration to the next.
INT max_iter
: Maximum number of iterations.
FLOAT damping
: Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.
INT top_k
: Sort the scores highest first and output only this many scores
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store PageRank values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E*k), E = number of edges, k = number of iterations.
The number of iterations is data-dependent, but the user can set a maximum. Parallel processing reduces the time needed for computation.
Graph Types
Directed edges
Name
Description
Data type
v_type
A set of vertex types.
SET<STRING>
e_type
A set of edge types.
SET<STRING>
re_type
A set of reverse edge types. If an edge is undirected, put the edge name in the set as well.
`SET<STRING>
in_degree
Boolean value that indicates whether to count the incoming edges as part of a vertex's degree centrality.
BOOL
out_degree
Boolean value that indicates whether to count the outgoing edges as part of a vertex's degree centrality.
BOOL
top_k
The number of vertices with the highest scores to return.
INT
print_accum
If true, print results to JSON output.
BOOL
result_attr
If not empty, save the degree centrality score of each vertex to this attribute.
STRING
file_attr
If not empty, save results in CSV to this file.
STRING
Characteristic
Value
Result
Computes a personalized PageRank value (FLOAT type) for each vertex.
Input Parameters
SET<VERTEX> source
: Set of seed vertices
STRING e_type
: Names of edge type to use
FLOAT max_change
: PageRank will stop iterating when the largest difference between any vertex's current score and its previous score ≤ max_change
. That is, the scores have become very stable and are changing by less than max_change
from one iteration to the next.
INT max_iter
: Maximum number of iterations.
FLOAT damping
: Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.
INT top_k
: Sort the scores highest first and output only this many scores
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store PageRank values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file.
Result Size
V = number of vertices
Time Complexity
O(E*k), E = number of edges, k = number of iterations.
The number of iterations is data-dependent, but the user can set a maximum. Parallel processing reduces the time needed for computation.
Graph Types
Directed edges
Name
Description
v_type
Vertex types to calculate approximate closeness centrality for.
e_type
Edge types to traverse.
k
Size of the sample.
max_hops
Upper limit of how many jumps the algorithm will perform from each vertex.
epsilon
The maximum relative error, between 0 and 1. Setting a lower value produces a more accurate estimate but increases run time.
print_accum
Boolean value that indicates whether or not to output to console in JSON format.
file_path
If provided, the algorithm will output to this file path in CSV format
debug
There are many conditional logging statements inside the query. If the input is 0, nothing will be logged. If the input is 1, everything else but the breadth-first-search process from the sample-node. If the input is 2, everything will be logged.
sample_index
The algorithm will partition the graph based on the sample size. This index indicates which partition to use when estimating closeness centrality.
maxsize
If the number of vertices in the graph is lower than maxsize
, the exact closeness centrality is calculated instead and nothing will be approximated.
wf
Boolean value that indicates whether to use the Wasserman and Faust formula to calculate closeness centrality rather than the classic formula.
Step
Mathematical Formula
1. Compute the average distance from vertex v to every other vertex:
2. Invert the average distance, so we have average closeness of v:
Characteristic
Value
Result
Computes a Closeness Centrality value (FLOAT type) for each vertex.
Required Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
SET<STRING>
re_type
: Names of reverse edge types to use
INT max_hops
: If >=0, look only this far from each vertex
INT top_k
: Sort the scores highest first and output only this many scores
BOOL wf
: If True, use Wasserman-Faust normalization for multi-component graphs
BOOL print_accum
: If true, output JSON to standard output
STRING result_attr
: If not empty, store centrality values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file in CSV.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E), E = number of edges.
Parallel processing reduces the time needed for computation.
Graph Types
Directed or Undirected edges, Unweighted edges
Name | Description | Data type |
| A vertex type |
|
| An edge type |
|
| The name of the weight attribute on the edge type |
|
| The number of vertices with the highest influence score to return |
|
| If true, print results to JSON output. |
|
| If not empty, save results in CSV to this file. |
|