Graph algorithms are functions for measuring characteristics of graphs, vertices, or relationships. Graph algorithms can provide insights into the role or relevance of individual entities in a graph. For example: How centrally located is this vertex? How much influence does this vertex exert over the others?
Some graph algorithms measure or identify global characteristics: What are the natural community groupings in the graph? What is the density of connections?
The GSQL Graph Algorithm Library is a collection of highperformance GSQL queries, each of which implements a standard graph algorithm. Each algorithm is ready to be installed and used, either as a standalone query or as a building block of a larger analytics application.
GSQL running on the TigerGraph platform is particularly wellsuited for graph algorithms:
Turingcomplete with full support for imperative and procedural programming, ideal for algorithmic computation.
Parallel and Distributed Processing, enabling computations on larger graphs.
UserExtensible. Because the algorithms are written in standard GSQL and compiled by the user, they are easy to modify and customize.
OpenSource. Users can study the GSQL implementations to learn by example, and they can develop and submit additions to the library.
You can download the library from github: https://github.com/tigergraph/ecosys/tree/master/graph_algorithms
The library contains two main sections: Algorithms and Tests. The Algorithms folder contains template algorithms and scripts to help you customize and install them. The Tests folder contains small sample graphs that you can use to experiment with the algorithms. In this document, we use the test graphs to show you the expected result for each algorithm. The graphs are small enough that you can manually calculate and sometimes intuitively see what the answers should be.
Remember that GSQL graph algorithms are simply GSQL queries. However, since we do not know what vertices or edges you want to analyze, or how you want to receive output, the algorithms are in a template format. You need to run a script to personalize your algorithm and then to install it.
Within the Algorithms folder is a script install.sh. When you run the script, it will first ask you which graph schema you wish to work on. (The TigerGraph platform supports multiple concurrent graphs.)
It then asks you to choose from a menu of available algorithms.
After knowing your graph schema and your algorithm, the installer can give you some guidance in selecting appropriate vertex types and edges types. Note this does not have to be all the vertex or edge types in your graph. For example, you may have a social graph with three categories of persons and five types of relationships. You might decide to compute PageRank using Member and Guest vertices and Recommended edges.
It will then proceed to create up to three versions of your algorithm, based on the three ways of receiving the algorithm's output:
Stream the output in JSON format, the default behavior for most GSQL queries.
Save the output value(s) in CSV format to a file. For some algorithms, this option will add an input parameter to the query, to let the user specify how many total values to output.
Store the results as vertex or edge attribute values. The attributes must already exist in the graph schema, and the installer will ask you which attributes to use.
After creating queries for one algorithm, the installer will loop back to let you choose another algorithm (returning to step 2 above).
If you choose the exit, the installer makes a last request: Do you want to install your queries? Installation is when the code is compiled and bound into the query engine. It takes a few minutes, so it is best to create all your personalized queries at once and then install them as a group.
Example:
$ bash install.sh*** GSQL Graph Algorithm Installer ***Available graphs: Graph social(Person:v, Friend:e, Also_Friend:e, Coworker:e)Graph name? socialPlease enter the number of the algorithm to install:1) EXIT2) Closeness Centrality3) Connected Components4) Label Propagation5) Community detection: Louvain6) PageRank7) Shortest Path, SingleSource, Any Weight8) Triangle Counting(minimal memory)9) Triangle Counting(fast, more memory)#? 6pageRank() works on directed edgesAvailable vertex and edge types: VERTEX Person(PRIMARY_ID id STRING, name STRING, score FLOAT, tag STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE" DIRECTED EDGE Friend(FROM Person, TO Person, weight FLOAT, tag STRING) WITH REVERSE_EDGE="Also_Friend" DIRECTED EDGE Also_Friend(FROM Person, TO Person, weight FLOAT, tag STRING) WITH REVERSE_EDGE="Friend" UNDIRECTED EDGE Coworker(FROM Person, TO Person, weight FLOAT, tag STRING)Please enter the vertex type(s) and edge type(s) for running PageRank.Use commas to separate multiple types [ex: type1, type2]Leaving this blank will select all available typesVertex types: PersonEdge types: FriendThe query pageRank is dropped.The query pageRank_file is dropped.The query pageRank_attr is dropped.gsql g social ./templates/pageRank.gsqlThe query pageRank has been added!gsql g social ./templates/pageRank_file.gsqlThe query pageRank_file has been added!If your graph schema has appropriate vertex or edge attributes,you can update the graph with your results.Do you want to update the graph [yn]? yVertex attribute to store FLOAT result (e.g. pageRank): scoregsql g social ./templates/pageRank_attr.gsqlThe query pageRank_attr has been added!Created the following algorithms: pageRank(float maxChange, int maxIter, float damping, bool display, int outputLimit) pageRank_attr(float maxChange, int maxIter, float damping, bool display) pageRank_file(float maxChange, int maxIter, float damping, bool display, file f)Please enter the number of the algorithm to install:1) EXIT2) Closeness Centrality3) Connected Components4) Label Propagation5) Community detection: Louvain6) PageRank7) Shortest Path, SingleSource, Any Weight8) Triangle Counting(minimal memory)9) Triangle Counting(fast, more memory)#? 1ExitingAlgorithm files have been created. Do want to install them now [yn]? yStart installing queries, about 1 minute ...cpageRank query: curl X GET 'http://127.0.0.1:9000/query/social/pageRank?maxChange=VALUE&maxIter=VALUE&damping=VALUE&display=VALUE&outputLimit=VALUE'. Add H "Authorization: Bearer TOKEN" if authentication is enabled.pageRank_file query: curl X GET 'http://127.0.0.1:9000/query/social/pageRank_file?maxChange=VALUE&maxIter=VALUE&damping=VALUE&display=VALUE&f=VALUE'. Add H "Authorization: Bearer TOKEN" if authentication is enabled.pageRank_attr query: curl X GET 'http://127.0.0.1:9000/query/social/pageRank_attr?maxChange=VALUE&maxIter=VALUE&damping=VALUE&display=VALUE'. Add H "Authorization: Bearer TOKEN" if authentication is enabled.[======================================================================================================] 100% (3/3)$
After the algorithms are installed, you will see them listed among the rest of your GSQL queries.
GSQL > ls...Queries: cc_subquery(vertex v, int numVert, int maxHops) (installed v2) closeness_cent(bool display, int outputLimit) (installed v2) closeness_cent_attr(bool display) (installed v2) closeness_cent_file(bool display, file f) (installed v2) conn_comp() (installed v2) conn_comp_attr() (installed v2) conn_comp_file(file f) (installed v2) label_prop(int maxIter) (installed v2) label_prop_attr(int maxIter) (installed v2) label_prop_file(int maxIter, file f) (installed v2) louvain() (installed v2) louvain_attr() (installed v2) louvain_file(file f) (installed v2) pageRank(float maxChange, int maxIter, float damping, bool display, int outputLimit) (installed v2) pageRank_attr(float maxChange, int maxIter, float damping, bool display) (installed v2) pageRank_file(float maxChange, int maxIter, float damping, bool display, file f) (installed v2) tri_count() (installed v2) tri_count_fast() (installed v2)
We will soon update the library so that most of schema choices can be made when running the algorithm, rather than when installing the algorithm.
Running an algorithm is the same as running a GSQL query. For example, if you selected the JSON option for pageRank, you could run it from GSQL as below:
GSQL > RUN QUERY pageRank(0.001, 25, 0.85, 10)
Installing a query also creates a REST endpoint. The same query could be run thus:
curl X GET 'http://127.0.0.1:9000/query/alg_graph/pageRank?maxChange=0.001&maxIter=25&damping=0.85&outputSize=10'
GSQL lets you run queries from within other queries. This means you can use a library algorithm as a building block for more complex analytics.
The following algorithms are currently available. The algorithms are grouped into three classes:
Path
Centrality
Community
Moreover, each algorithm may or may be be currently available for a graph with Undirected Edges, Directed Edges, and Weighted Edges.
Coming soon means that TigerGraph plans to release this variant of the algorithm soon.
n/a means that this variant of the algorithm is typically not used
Algorithm  Class  Undirected Edges  Directed Edges  Weighted Edges 
SingleSource Shortest Path  Path  Yes  Yes  Yes 
All Pairs Shortest Path  Path  Yes  Yes  Yes 
PageRank  Centrality  n/a  Yes  Coming soon 
Closeness Centrality  Centrality  Yes  n/a  Coming soon 
Betweenness Centrality  Centrality  Coming soon  n/a  Coming soon 
Weakly Connected Components  Community  Yes  n/a  n/a 
Strongly Connected Components  Community  n/a  Coming soon  n/a 
Label Propagation  Community  Yes  n/a  n/a 
Louvain Modularity  Community  Yes  n/a  n/a 
Triangle Counting  Community  Yes  n/a  n/a 
Computational Complexity is a formal mathematical term, referring to how an algorithm's time requirements scale according to the size of the data or other key parameters. For graphs, there are two data parameters:
V (or sometimes n), the number of vertices
E (or sometimes m), the number of edges
The notation O(V^2) (read "big O V squared") means that when V is large, the computational time is proportional to V^2.
These algorithms help find the shortest path or evaluate the availability and quality of routes.
The algorithm we are discussing here finds an unweighted shortest path from one source vertex to each possible destination vertex in the graph. That is, it finds n paths.
If you just want to know the shortest path between two particular vertices, S and T in a graph with unweighted edges, we have described that query in detail in our tutorial document GSQL Demo Examples.
If your graph has weighted edges, see the next algorithm.
If a graph has unweighted edges, then finding the shortest path from one vertex to another is the same as finding the path with the fewest hops. Think of Six Degrees of Separation and Friend of a Friend. Unweighted Shortest Path answers the question "How are you two related?" The two entities do not have to be persons. Shortest Path is useful in a host of applications, from estimating influences or knowledge transfer, to criminal investigation.
When the graph is unweighted, we can use a "greedy" approach to find the shortest path. In computer science, a greedy algorithm makes intermediate choices based on the data being considered at the moment, and then does not revisit those choices later on. In this case, once the algorithm finds any path to a vertex T, it is certain that that is a shortest path.
shortest_ss(VERTEX v, INT maxDepth, BOOL display)shortest_ss_file(VERTEX v, INT maxDepth, BOOL display,STRING filepath)shortest_ss_attr(VERTEX v, INT maxDepth, BOOL display)
Characteristic  Value 
Result  Computes a shortest distance (INT) and shortest path (STRING) from vertex v to each other vertex T. The result is available in 3 forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(E), E = number of edges 
Graph Types  Directed or Undirected edges, Unweighted edges 
Coming Soon
Finding shortest paths in a graph with weighted edges is algorithmically harder than in an unweighted graph because just because you find a path to a vertex T, you cannot be certain that it is a shortest path. If edge weights are always positive, then you must keep trying until you have considered every inedge to T. If edge weights can be negative, then it's even harder. You must consider all possible paths.
A classic application for weighted shortest path is finding the shortest route to get from A to B. (Think of route planning "GPS" apps.) In general, any application where you are looking for the cheapest route is a possible fit.
shortest_path_any_wt(VERTEX v, INT maxDepth, BOOL display)shortest_path_any_wt_file(VERTEX v, INT maxDepth, BOOL display, STRING filepath)shortest_path_any_wt_attr(VERTEX v, INT maxDepth, BOOL display)
Characteristic  Value 
Result  Computes a shortest distance (INT) and shortest path (STRING) from vertex v to each other vertex T. The result is available in 3 forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(V*E), V = number of vertices, E = number of edges 
Graph Types  Directed or Undirected edges, Weighted edges 
The shortest_path_wt library query is an implementation of the BellmanFord algorithm. If there is more than one path with the same total weight, the algorithm returns one of them.
The graph below has both positive and negative edge weights. Using vertex A has the source vertex, the algorithm discovers that the shortest weighted path from A to E is ADCBE, with a cumulative score of 7  3  2  4 = 2.
The SinglePair Shortest Path task seeks the shortest path between a source vertex S and a target vertex T. If the edges are unweighted, then use the query in our tutorial document GSQL Demo Examples.
If the edges are weighted, then use the SingleSource Shortest Path algorithm. In the worst case, it takes the same computational effort to find the shortest path for one pair as to find the shortest paths for all pairs from the same source S. The reason is that you cannot know whether you have found the shortest (least weight) path until you have explored the full graph. If the weights are always positive, however, then a more efficient algorithm is possible. You can stop searching when you have found paths that use each of the inedges to T.
The AllPairs Shortest Path algorithm is costly for large graphs, because the computation time is O(V^3) and the output size is O(V^2). Be cautious about running this on very large graphs.
The AllPairs Shortest Path (APSP) task seeks to find shortest paths between every pair of vertices in the entire graph. In principle, this task can be handled by running the SingleSource Shortest Path (SSSP) algorithm for each input vertex, e.g.,
CREATE QUERY all_pairs_shortest(INT maxDepth, BOOL display, STRING fileBase){Start = {Node.*};Result = SELECT s FROM Start:sPOSTACCUMshortest_ss_any_wt_file(s, maxDepth, display, fileBase+s);}
This example highlights one of the strengths of GSQL: treating queries as stored procedures which can be called from within other queries.
For large graphs (with millions of vertices or more), however, this is an enormous task. While the massively parallel processing of the TigerGraph platform can speed up the computation by 10x or 100x, consider what it takes just to store or report the results. If there are 1 million vertices, then there are nearly 1 trillion output values.
There are more efficient methods than calling the singlesource shortest path algorithm n times, such as the FloydWarshall algorithm, which computes APSP in O(V^3) time.
Our recommendation:
If you have a smaller graph (perhaps thousands or tens of thousands of vertices), the APSP task may be tractable.
If you have a large graph, avoid using APSP.
Centrality algorithms determine the importance of each vertex within a network. Typical applications:
PageRank is designed for directed edges. The classic interpretation is to find the most "important" web pages, based on hyperlink referrals, but it can be used for another network where entities make positive referrals of one another.
Closeness Centrality and Betweenness Centrality both deal with the idea of "centrally located."
The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex's influence is based on the influence of the vertices which refer to it. A vertex's influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear.
A common way of interpreting PageRank value is through the Random Network Surfer model. A vertex's pageRank score is proportional to the probability that a random network surfer will be at that vertex at any given time. A vertex with a high pageRank score is a vertex that is frequently visited, assuming that vertices are visited according to the following Random Surfer scheme:
Assume a person travels or surfs across a network's structure, moving from vertex to vertex in a long series of rounds.
The surfer can start anywhere. This startanywhere property is part of the magic of PageRank, meaning the score is a truly fundamental property of the graph structure itself.
Each round, the surfer randomly picks one of the outward connections from the surfer's current location. The surfer repeats this random walk for a long time.
But wait. The surfer doesn't always follow the network's connection structure. There is a probability (1damping, to be precise), that the surfer will ignore the structure and will magically teleport to a random vertex.
pageRank(FLOAT maxChange, INT maxIter, FLOAT damping, BOOL display, INT maxOutput)pageRank_file(FLOAT maxChange, INT maxIter, FLOAT damping, BOOL display, STRING filepath)pageRank_attr(FLOAT maxChange, INT maxIter, FLOAT damping, BOOL display)
Characteristic  Value 
Result  Computes a PageRank value (FLOAT type) for each vertex. The result is available in 3 forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(E*k), E = number of edges, k = number of iterations. The number of iterations is datadependent, but the user can set a maximum. Parallel processing reduces the time needed for computation. 
Graph Types  Directed edges 
We ran pageRank on our test10 graph (using Friend edges) with the following parameter values: damping=0.85, maxChange=0.001, and maxIter=25. We see that Ivy (center bottom) has the highest pageRank score (1.12). This makes sense, since there are 3 neighboring persons who point to Ivy, more than for any other person. Eddie and Justin have scores have exactly 1, because they do not have any outedges. This is an artifact of our particular version pageRank. Likewise, Alex has a score of 0.15, which is (1damping), because Alex has no inedges.
We all have an intuitive understanding when we say a home, an office, or a store is "centrally located." Closeness Centrality provides a precise measure of how "centrally located" is a vertex. The steps below show the steps for one vertex v.
Description of Steps  Mathematical Formulation 
1. Compute the average distance from vertex v to every other vertex:  $d_{avg}(v) = \sum_{u \ne v} dist(v,u)/(n1)$ 
2. Invert the average distance, so we have average closeness of v:  $CC(v) = 1/d_{avg}(v)$ 
These steps are repeated for every vertex in the graph.
closeness_cent(BOOL display, INT maxOutput)closeness_cent_file(BOOL display, STRING filepath)closeness_cent_attr(BOOL display)
Parameters
Characteristic  Value 
Result  Computes a Closeness Centrality value (FLOAT type) for each vertex. The result is available in 3 forms:

Required Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(E*k), E = number of edges, k = number of iterations. The number of iterations is datadependent, but the user can set a maximum. Parallel processing reduces the time needed for computation. 
Graph Types  Directed or Undirected edges, Unweighted edges 
Closeness centrality can be measured for either directed edges (from v to others) or for undirected edges. Directed graphs may seem less intuitive, however. because if the distance from Alex to Bob is 1, it does not mean the distance from Bob to Alex is also 1.
For our example, we wanted to use the topology of the Likes graph, but to have undirected edges. We emulated an undirected graph by using both Friend and Also_Friend (reverse direction) edges.
These algorithms evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart.
A component is the maximal set of vertices, plus their connecting edges, which are interconnected. That is, you can reach each vertex from each other vertex. In the example figure below, there are three components.
This particular algorithm deals with undirected edges. If the same definition (each vertex can reach each other vertex) is applied to directed edges, then the components are called Strongly Connected Components. If you have directed edges but ignore the direction (permitting traversal in either direction), then the algorithm finds Weakly Connected Components.
conn_comp()conn_comp_file(STRING filepath)conn_comp_attr()
Characteristic  Value 
Result  Assigns a component id (INT) to each vertex, such that members of the same component have the same id value. The result is available in three forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(E*d), E = number of edges, d = max(diameter of components) 
Graph Types  Undirected edges 
It is easy to see in this small graph that the algorithm correctly groups the vertices:
Alex, Bob and Justin all have Community ID = 2097152
Chase, Damon, and Eddie all have Community ID = 5242880
Fiona, George, Howard, and Ivy all have Community ID = 0
Our algorithm uses the TigerGraph engine's internal vertex ID numbers; they cannot be predicted.
Label Propagation is a heuristic method for determining communities. The idea is simple: If the plurality of your neighbors all bear the label X, then you should label yourself as also a member of X. The algorithm begins with each vertex having its own unique label. Then we iteratively update labels based on the neighbor influence described above. It is important that they the order for updating the vertices be random. The algorithm is favored for its efficiency and simplicity, but it is not guaranteed to produce the same results every time.
In a variant version, some vertices could initially be known to belong to the same community,. If they are wellconnected to one another, they are likely to preserve their common membership and influence their neighbors,
label_prop(INT maxIter)label_prop_file(INT maxIter, FILE filepath)label_prop_attr(INT maxIter)
Characteristic  Value 
Result  Assigns a component id (INT) to each vertex, such that members of the same component have the same id value. The result is available in three forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(E*k), E = number of edges, k = number of iterations. 
Graph Types  Undirected edges 
This is the same graph that was used in the Connected Component example. The results are different, though. The quartet of Fiona, George, Howard, and Ivy have been split into 2 groups. See can see the symmetry:
(George & Ivy) each connect to (Fiona & Howard) and to one another.
(Fiona & Howard) each connect to (George & Ivy) but not to one another.
Label Propagation tries to find natural clusters and separations within connected components. That is, it looks at the quality and pattern of connections. The Component Component algorithm simply asks the Yes or No question: Are these two vertices connected?
We set maxIter to 10, but the algorithm reached steady state after 3 iterations.
The modularity score for a partitioned graph assesses the difference in density of links within a partition vs. the density of links crossing from one partition to another. The assumption is that a good partitioning (that is, dividing up the graph into communities or clusters), then the withindensity should be high and the interdensity should be low.
Also, we use changes in modularity to guide optimization of the partitioning. That is, we begin with a candidate partitioning and measure its modularity. Then we make an incremental change and confirm that the modularity has improved.
The most most efficient and empirically effective method for calculating modularity was published by a team of researchers at the University of Louvain. The Louvain method uses agglomeration and hierarchical optimization:
Optimize modularity for small local communities.
Treat each optimized local group as one unit, and repeat the modularity operation for groups of these condensed units.
louvain()louvain_file(FILE filepath)louvain_attr()
Characteristic  Value 
Result  Assigns a component id (INT) to each vertex, such that members of the same component have the same id value. The result is available in three forms:

Input Parameters 

Result Size  V = number of vertices 
Computational Complexity  O(V log V), V = number of vertices 
Graph Types  Undirected edges 
The results are the same as those from the Label Propagation example. This is not surprising, as they have the same highlevel goal: to find the natural communities in a graph. A larger and more complex graph would likely show some differences.
Why triangles? Think of it in terms of a social network:
If A knows B, and A also knows C, then we complete the triangle if B knows C. If this situation is common, it indicates a community with a lot of interaction.
The triangle is in fact the smallest multiedge "complete subgraph," where every vertex connects to every other vertex.
Triangle count (or density) is a measure of community and connectedness. In particular, it addresses the question of transitive relationships: If A> B and B>C, then what is the likelihood of A> C?
Note that it is computing a single number: How many triangles are in this graph? It is not finding communities within a graph.
It is not common to count triangles in directed graphs, though it is certainly possible. If you choose to do so, you need to be very specific about the direction of interest: In a directed graph, If A> B and B> C, then
if A>C, we have a "shortcut".
if C>A, then we have a feedback loop.
We present two different algorithms for counting triangles. The first, tri_count(), is the classic edgeiterator algorithm. For each edge and its two endpoint vertices S and T, count the overlap between S's neighbors and T's neighbors.
tri_count()tri_count_file(FILE filepath)tri_count_attr()
One side effect of the simple edgeiterator algorithm is that it ends up considering each of the three sides of a triangle. The count needs to be divided by 3, meaning we did 3 times more work than a smaller algorithm would have.
tri_count_fast() is a smarter algorithm which does two passes over the edges. In the first pass we mark which of the two endpoint vertices has fewer neighbors. In the second pass, we count the overlap only between marked vertices. The result is that we eliminate 1/3 of the neighborhood matching, the slowest 1/3, but at the cost of some additional memory.
tri_count_fast()tri_count_fast_file(FILE filepath)tri_count_fast_attr()
Characteristic  Value 
Result  Returns the number of triangles in the graph. 
Input Parameters  None 
Result Size  1 integer 
Computational Complexity  O(V * E), V = number of vertices, E = number of edges 
Graph Types  Undirected edges 
In the social10 graph with Coworker edges, there are clearly 4 triangles.
{"error": false,"message": "","version": {"schema": 0,"api": "v2"},"results": [{"num_triangles": 4}]}