Personalized PageRank

Supported Graph Characteristics

Unweighted edges

Directed edges

Homogeneous vertex types

Heterogeneous vertex types

In the original PageRank, the damping factor is the probability of the surfer continues browsing at each step. The surfer may also stop browsing and start again from a random vertex. In personalized PageRank, the surfer can only start browsing from a given set of source vertices both at the beginning and after stopping.

Specifications

tg_pagerank_pers(SET<VERTEX> source, STRING e_type,
FLOAT max_change=0.001, INT maximum_iteration=25, FLOAT damping = 0.85, INT top_k = 100
BOOL print_results = TRUE, STRING result_attribute = "", STRING file_path = "")

Parameters

Parameter Description Default

Parameter	Description	Default
`SET<VERTEX> source`	IDs and types of vertices to use. Provide each vertex ID and type as a tuple: `("id","type")`	(empty set of string tuples)
`STRING e_type`	Name of edge types to use	(empty string)
`FLOAT max_change`	PageRank will stop iterating when the largest difference between any vertex’s current score and its previous score ≤ `max_change`. That is, the scores have become very stable and are changing by less than `max_change` from one iteration to the next.	0.001
`INT maximum_iteration`	Maximum number of iterations.	25
`FLOAT damping`	Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.	0.85
`INT top_k`	Sort the scores highest first and output only this many scores	100
`BOOL print_results`	If True, output JSON to standard output	True
`STRING result_attribute`	If not empty, store PageRank values in FLOAT format to this vertex attribute	(empty string)
`STRING file_path+`	If not empty, write output to this file.	(empty string)

SET<VERTEX> source

IDs and types of vertices to use. Provide each vertex ID and type as a tuple: ("id","type")

(empty set of string tuples)

STRING e_type

Name of edge types to use

(empty string)

FLOAT max_change

PageRank will stop iterating when the largest difference between any vertex’s current score and its previous score ≤ max_change. That is, the scores have become very stable and are changing by less than max_change from one iteration to the next.

0.001

INT maximum_iteration

Maximum number of iterations.

FLOAT damping

Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives.

0.85

INT top_k

Sort the scores highest first and output only this many scores

100

BOOL print_results

If True, output JSON to standard output

True

STRING result_attribute

If not empty, store PageRank values in FLOAT format to this vertex attribute

(empty string)

STRING file_path+

If not empty, write output to this file.

(empty string)

Example

We ran Personalized PageRank on the graph social10 using Friend edges with the following parameter values:

# Using "_" to use default values
RUN QUERY tg_pagerank_pers([("Fiona","Person")], "Friend", _, _, _, _, _, _,
_)

In this case, the random walker can only start or restart walking from Fiona. In the figure below, we see that Fiona has the highest PageRank score in the result. Ivy and George have the next highest scores because they are direct out-neighbors of Ivy and there are looping paths that lead back to them again. Half of the vertices have a score of 0 since they can not be reached from Fiona.

Visualized results of example query on social10 graph