The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex's influence is based on the influence of the vertices which refer to it. A vertex's influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear.
A common way of interpreting PageRank value is through the Random Network Surfer model. A vertex's PageRank score is proportional to the probability that a random network surfer will be at that vertex at any given time. A vertex with a high PageRank score is a vertex that is frequently visited, assuming that vertices are visited according to the following Random Surfer scheme:
Assume a person travels or surfs across a network's structure, moving from vertex to vertex in a long series of rounds.
The surfer can start anywhere. This start-anywhere property is part of the magic of PageRank, meaning the score is a truly fundamental property of the graph structure itself.
Each round, the surfer randomly picks one of the outward connections from the surfer's current location. The surfer repeats this random walk for a long time.
But wait. The surfer doesn't always follow the network's connection structure. There is a probability (1-damping, to be precise), that the surfer will ignore the structure and will magically teleport to a random vertex.
For more information, see the Google paper on PageRank.
We ran pageRank on our test10 graph (using Friend edges) with the following parameter values: damping=0.85, max_change=0.001, and max_iter=25. We see that Ivy (center bottom) has the highest pageRank score (1.12). This makes sense since there are 3 neighboring persons who point to Ivy, more than for any other person. Eddie and Justin have scores of exactly 1 because they do not have any out-edges. This is an artifact of our particular version pageRank. Likewise, Alex has a score of 0.15, which is (1-damping), because Alex has no in-edges.
Characteristic
Value
Result
Computes a PageRank value (FLOAT type) for each vertex.
Input Parameters
STRING v_type
: Names of vertex type to use
STRING e_type
: Names of edge type to use
FLOAT max_change
: PageRank will stop iterating when the largest difference between any vertex's current score and its previous score ≤ max_change.
That is, the scores have become very stable and are changing by less than max_change
from one iteration to the next.
INT max_iter
: Maximum number of iterations.
FLOAT damping
: Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.
INT top_k
: Sort the scores highest first and output only this many scores
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store PageRank values (FLOAT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E*k), E = number of edges, k = number of iterations.
The number of iterations is data-dependent, but the user can set a maximum. Parallel processing reduces the time needed for computation.
Graph Types
Directed edges