Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Breadth-first Search (BFS) is an algorithm used to explore the vertexes of a graph layer by layer. It starts at the given vertex and explores all vertices at the present depth prior to moving on to the vertices at the next depth level.
In the example below, we run tg_bfs
algorithm from the source vertex alex
on the social10 graph.
Below is the visualized result of the query:
Characteristic
Value
Result
Returns all the nodes that are accessible from the source vertex
Required Input Parameters
v_type
: vertex types to traverse
e_type
: edge types to traverse
max_hops
: look only this far from each vertex
v_start
: source vertex for traverse
print_accum
: print JSON output
result_attr
: INT attr to store results to
file_path
: file to write CSV output to
display_edges
:output edges for visualization
Result Size
V = number of vertices
Time Complexity
O(E+V), E = number of edges, V = number of vertices.since every vertex and every edge will be explored in the worst case.
Graph Types
Directed or Undirected edges, Weighted or Unweighted edges
Finding shortest paths in a graph with weighted edges is algorithmically harder than in an unweighted graph because even after you find a path to a vertex T, you cannot be certain that it is a shortest path. If edge weights are always positive, then you must keep trying until you have considered every in-edge to T. If edge weights can be negative, then it's even harder. You must consider all possible paths.
A classic application for weighted shortest path is finding the shortest travel route to get from A to B. (Think of route planning "GPS" apps.) In general, any application where you are looking for the cheapest route is a possible fit.
The shortest path algorithm can be optimized if we know all the weights are nonnegative. If there can be negative weights, then sometimes a longer path will have a lower cumulative weight. Therefore, we have two versions of this algorithm
The shortest_path_any_wt query is an implementation of the Bellman-Ford algorithm. If there is more than one path with the same total weight, the algorithm returns one of them.
Currently, shortest_path_pos_wt also uses Bellman-Ford. The well-known Dijsktra's algorithm is designed for serial computation and cannot work with GSQL's parallel processing.
The graph below has only positive edge weights. Using vertex A as the source vertex, the algorithm discovers that the shortest weighted path from A to B is A-D-B, with distance 8. The shortest weighted path from A to C is A-D-B-C with distance 9.
The graph below has both positive and negative edge weights. Using vertex A as the source vertex, the algorithm discovers that the shortest weighted path from A to E is A-D-C-B-E, with a cumulative score of 7 - 3 - 2 - 4 = -2.
A* (pronounced "A-star") is a graph traversal and path search algorithm, which achieves better performance by using heuristics to guide its search.
The algorithm terminates when the path it chooses to extend is a path from start to goal or if there are no paths eligible to be extended. The heuristic function is problem-specific. If the heuristic function is admissible, meaning that it never overestimates the actual cost to get to the target, the algorithm is guaranteed to return a least-cost path from source to target.
Below is the visualized result of the query:
The algorithm starts from a source node, and at each iteration of its main loop, it selects the path that minimizes where n is the next node on the path, g(n) is the cost of the path from the source node to n, and h(n) is a heuristic function that estimates the cost of the cheapest path from n to the target node.
In the example below, we run the tg_astar
algorithm to find the shortest path between the source vertex "Kings Cross" and the target vertex "Kentish Town" on a graph which is a transport network of stations. Each station has its geometric attibutes(i.e.latitude and longitude) and the edge weight represents the distance between stations in kilometers. The heuristic function used to guide the search is the , which computes the distance between two points on a sphere given their longitudes and latitudes
Characteristic
Value
Result
Computes a shortest distance (INT) and shortest path (STRING) from vertex source to each other vertex.
Input Parameters
VERTEX source
: Id of the source vertex
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
STRING wt_attr
: Name of edge weight attribute
STRING wt_type
: Data type of edge weight attribute: "INT", "FLOAT", or "DOUBLE"
INT output_limit
: If >=0, max number of vertices to output to JSON.
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store distance values (INT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output, so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(V*E), V = number of vertices, E = number of edges
Graph Types
Directed or Undirected edges, Weighted edges
Characteristic | Value |
Result | Computes a shortest distance (INT) and shortest path (STRING) from vertex source to target vertex. |
Required Input Parameters |
|
Result Size | 1 |
Time Complexity | O(V^2), V = number of vertices. |
Graph Types | Directed or Undirected edges, Weighted edges |
Random Walk is an algorithm that generate random paths in a graph. A random walk starts at every vertex that has an outgoing edge, and moves to another vertex at random. The random walk algorithm will print all possible paths of the specified size from the walks that were performed onto a file in the CSV format.
Suppose we have the following social graph:
If we run the random walk algorithm on the graph and supply a path size of 3 and a step of 2, and a sample number of 1. Then from each vertex there is a two-step random walk, and a total of 6 three-vertex paths:
We can also perform a 3 step walk and still ask for three-vertex paths, in which case the number of paths would double, because in each 3 step walk, there are two possible three-vertex paths:
Name
Description
Data type
step
The number of hops performed in each random walk.
INT
path_size
The length of the paths to output. The length refers to the number of vertices in a path. For example, if a walk has two steps: A -> B -> C, then there are paths of both lengths 2 and lengths 3 that can be output from this walk. If a path size of 2 is supplied, then the algorithm outputs two paths: A -> B and B -> C. If the path size is 3, then there is one path: A -> B -> C.
INT
filepath
The filepath to output the paths to.
STRING
edge_types
The edge types that the random walk will traverse.
SET<STRING>
sample_num
At each possible step, the number of sample walks to perform.
INT
If a graph has unweighted edges, then finding the shortest path from one vertex to another is the same as finding the path with the fewest hops. Think of Six Degrees of Separation and Friend of a Friend. Unweighted Shortest Path answers the question "How are you two related?" The two entities do not have to be persons. Shortest Path is useful in a host of applications, from estimating influences or knowledge transfer, to criminal investigation.
When the graph is unweighted, we can use a "greedy" approach to find the shortest path. In computer science, a greedy algorithm makes intermediate choices based on the data being considered at the moment, and then does not revisit those choices later on. In this case, once the algorithm finds any path to a vertex T, it is certain that that is a shortest path.
This algorithm finds an unweighted shortest path from one source vertex to each possible destination vertex in the graph. That is, it finds n paths.
If your graph has weighted edges, see the next algorithm. With weighted edges, it is necessary to search the whole graph, whether you want the path for just one destination or for all destinations.
In the below graph, we do not consider the weight on edges. Using vertex A as the source vertex, the algorithm discovers that the shortest path from A to B is A-B, and the shortest path from A to C is A-D-C, etc.
Characteristic
Value
Result
Computes a shortest distance (INT) and shortest path (STRING) from vertex source to each other vertex.
Input Parameters
VERTEX source
: ID of the source vertex
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
INT output_limit
: If >=0, max number of vertices to output to JSON.
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store distance values (INT) to this attribute
STRING file_path
: If not empty, write output to this file.
BOOL display_edges
: If true, include the graph's edges in the JSON output,
so that the full graph can be displayed.
Result Size
V = number of vertices
Time Complexity
O(E), E = number of edges
Graph Types
Directed or Undirected edges, Unweighted edges
The All-Pairs Shortest Path algorithm is costly for large graphs because the computation time is O(V^3) and the output size is O(V^2). Be cautious about running this on very large graphs.
The All-Pairs Shortest Path (APSP) task seeks to find the shortest paths between every pair of vertices in the entire graph. In principle, this task can be handled by running the Single-Source Shortest Path (SSSP) algorithm for each input vertex, e.g.,
This example highlights one of the strengths of GSQL: treating queries as stored procedures that can be called from within other queries. We only show the result_attr and file_path options, because subqueries cannot send their JSON output.
For large graphs (with millions of vertices or more), however, this is an enormous task. While the massively parallel processing of the TigerGraph platform can speed up the computation by 10x or 100x, consider what it takes just to store or report the results. If there are 1 million vertices, then there are nearly 1 trillion output values.
There are more efficient methods than calling the single-source shortest path algorithm n times, such as the Floyd-Warshall algorithm, which computes APSP in O(V^3) time.
Our recommendation:
If you have a smaller graph (perhaps thousands or tens of thousands of vertices), the APSP task may be tractable.
If you have a large graph, avoid using APSP.
The Single-Pair Shortest Path task seeks the shortest path between a source vertex S and a target vertex T. If the edges are unweighted, then use the query in our tutorial document GSQL Demo Examples.
If the edges are weighted, then use the Single-Source Shortest Path algorithm. In the worst case, it takes the same computational effort to find the shortest path for one pair as to find the shortest paths for all pairs from the same source S. The reason is that you cannot know whether you have found the shortest (least weight) path until you have explored the full graph. If the weights are always positive, however, then a more efficient algorithm is possible. You can stop searching when you have found paths that use each of the in-edges to T.
Given an undirected graph with one or more connected components, a minimum spanning forest is a set of minimum spanning trees, one for each component. The library implements the algorithm in section 6.2 of Qin et al. 2014: http://www-std1.se.cuhk.edu.hk/~hcheng/paper/SIGMOD2014qin.pdf.
Refer to the example for the MST algorithm. This graph has 3 components. MSF will find an MST for each of the three components.
An independent set of vertices does not contain any pair of vertices that are neighbors, i.e., ones which have an edge between them. A maximal independent set is the largest independent set that contains those vertices; you cannot improve upon it unless you start over with a different independent set. However, the search for the largest possible independent set (the maximum independent set, as opposed to the maximal independent set) is an NP-hard problem: there is no known algorithm that can find that answer in polynomial time. So we settle for the maximal independent set.
This algorithm finds use in applications wanting to find the most efficient configuration which "covers" all the necessary cases. For example, it has been used to optimize delivery or transit routes, where each vertex is one transit segment and each edge connects two segments that can NOT be covered by the same vehicle.
Consider our social10 graph, with three components.
It is clear that for each of the two triangles -- (Alex, Bob, Justin) and (Chase, Damon, Eddie) -- we can select one vertex from each triangle to be part of the MIS. For the 4-vertex component (Fiona, George, Howard, Ivy), it is less clear what will happen. If the algorithm selects either George or Ivy, then no other independent vertices remain in the component. However, the algorithm could select both Fiona and Howard; they are independent of one another.
This demonstrates the uncertainty of the Maximal Independent Set algorithm and how it differs from Maximum Independent Set. A maximum independent set algorithm would always select Fiona and Howard, plus 2 others, for a total of 4 vertices. The maximal independent set algorithm relies on chance. It could return either 3 or 4 vertices.
Given an undirected and connected graph, a minimum spanning tree is a set of edges that can connect all the vertices in the graph with the minimal sum of edge weights. The library implements a parallel version of Prim's algorithm:
Start with a set A = { a single vertex seed }
For all vertices in A, select a vertex y such that
y is not in A, and
There is an edge from y to a vertex x in A, and
The weight of the edge e(x,y) is the smallest among all eligible pairs (x,y).
Add y to A, and add the edge (x,y) to MST.
Repeat steps 2 and 3 until A has all vertices in the graph.
If the user specifies a source vertex, this will be used as the seed. Otherwise, the algorithm will select a random seed vertex.
If the graph contains multiple components (i.e., some vertices are disconnected from the rest of the graph, then the algorithm will span only the component of the seed vertex.
If you do not have a preferred vertex, and the graph might have more than one component, then you should use the Minimum Spanning Forest (MDF) algorithm instead.
In the graph social10
, we consider only the undirected Coworker edges.
This graph has 3 components. Minimum Spanning Tree finds a tree for one component, so which component it will work on depends on what vertex we give as the starting point. If we select Fiona, George, Howard, or Ivy as the start vertex, then it works on the 4-vertex component on the left. You can start from any vertex in the component and get the same or an equivalent MST result.
The figure below shows the result of
Note that the value for the one vertex is ("Ivy", "Person")
. In GSQL, this 2-tuple format which explicitly gives the vertex type is used when the query is written to accept a vertex of any type.
File output:
The attribute version requires a boolean attribute on the edge, and it will assign the attribute to "true" if that edge is selected in the MST:
The diameter of a graph is the worst-case length of a shortest path between any pair of vertices in a graph. It is the farthest distance to travel, to get from one vertex to another, if you always take the shortest path. Finding the diameter requires calculating (the lengths of) all shortest paths, which can be quite slow.
This algorithm uses a simple heuristic to estimate the diameter. rather than calculating the distance from each vertex to every other vertex, it selects K vertices randomly, where K is a user-provided parameter. It calculates the distances from each of these K vertices to all other vertices. So, instead of calculating V*(V-1) distances, this algorithm only calculates K*(V-1) distances. The higher the value of K, the greater the likelihood of hitting the actual longest shortest path.
The current version only computes unweighted distances.
This algorithm query employs a subquery called max_BFS_depth. Both queries are needed to run the algorithm.
The Cycle Detection problem seeks to find all the cycles (loops) in a graph. We apply the usual restriction that the cycles must be "simple cycles", that is, they are paths that start and end at the same vertex but otherwise never visit any vertex twice.
There are two versions of the task: for directed graphs and undirected graphs. The GSQL algorithm library currently supports only directed cycle detection. The Rocha–Thatte algorithm is an efficient distributed algorithm, which detects all the cycles in a directed graph. The algorithm will self-terminate, but it is also possible to stop at k iterations, which finds all the cycles having lengths up to k edges.
The basic idea of the algorithm is to (potentially) traverse every edge in parallel, again and again, forming all possible paths. At each step, if a path forms a cycle, it records it and stops extending it. More specifically:
Initialization For each vertex, record one path consisting of its own id. Mark the vertex as Active.
Iteration steps: For each Active vertex v:
Send its list of paths to each of its out-neighbors.
Inspect each path P in the list of the paths received:
If the first id in P is also id(v), a cycle has been found:
Remove P from its list.
If id(v) is the least id of any id in P, then add P to the Cycle List. (The purpose is to count each cycle only once.)
Else, if id(v) is somewhere else in the path, then remove P from the path list (because this cycle must have been counted already).
Else, append id(v) to the end of each of the remaining paths in its list.
In the social10 graph, there are 5 cycles, all with the Fiona-George-Howard-Ivy cluster.
Characteristic
Value
Result
Computes a minimum spanning forest. If the JSON or file output selected, the output is the set of edges that form the MSF. If the result_attr option is selected, the edges which are part of the MSF are tagged True; other edges are tagged False.
Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
STRING wt_attr
: Name of edge weight attribute
STRING wt_type
: Data type of edge weight attribute: "INT", "FLOAT", or "DOUBLE"
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store result values (BOOL) to this attribute
STRING file_path
: If not empty, write output to this file.
Result Size
V - c,
V = number of vertices, c = number of components
Time Complexity
O((V+E) * logV)
Graph Types
Undirected edges
Characteristic
Value
Result
A set of vertices that form a maximal independent set.
Input Parameters
STRING v_type
: Name of vertex type to use
STRING e_type
: Name of edge type to use
INT max_iter
: maximum number of iterations for the search
BOOL print_accum
: If True, output JSON to standard output
STRING file_path
: If not empty, write output to this file.
Result Size
Size of the MIS: unknown. Worst case: If the graph is a set of N unconnected vertices, then the MIS is all N vertices.
Time Complexity
O(E), E = number of edges
Graph Types
Undirected edges
Characteristic
Value
Result
Computes a minimum spanning tree. If the JSON or file output selected, the output is the set of edges that form the MST. If the result_attr option is selected, the edges which are part of the MST are tagged True; other edges are tagged False.
Input Parameters
VERTEX opt_source
: ID of a source vertex (optional)
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
STRING wt_attr
: Name of edge weight attribute
STRING wt_type
: Data type of edge weight attribute: "INT", "FLOAT", or "DOUBLE"
INT max_iter
: Maximum of edges to include. If less than (V-1), then the result is not a true spanning tree.
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store result values (BOOL) to this attribute
STRING file_path
: If not empty, write output to this file.
Result Size
V - 1 = number of vertices - 1
Time Complexity
O(V^2)
Graph Types
Undirected edges and connected
Characteristic
Value
Result
Returns the estimated value for the diameter of the graph
Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
INT seed_set_length
: The number (K) of random seed vertices to use
BOOL print_accum
: If True, output JSON to standard output
STRING file_path
: If not empty, write output to this file.
Result Size
one integer
Time Complexity
O(k*E), E = number of edges, k = number of seed vertices
Graph Types
Directed
Characteristic
Value
Result
Computes a list of vertex id lists, each of which is a cycle. The result is available in 2 forms:
streamed out in JSON format
written to a file in tabular format
Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
INT depth
: Maximum cycle length to search for = maximum number of iterations
BOOL print_accum
: If True, output JSON to standard output
STRING file_path
: If not empty, write output to this file.
Result Size
Number of cycles * average cycle length
Both of these measures are not known in advance.
Time Complexity
O(E *k), E = number of edges.
k = min(max. cycle length, depth parameter)
Graph Types
Directed