In addition to the regular weakly connected component algorithm, we also provide a version that is optimized for small-world graphs. A small world graph means the graph has a hub community, where the vast majority of the vertices in the graph are weakly connected.
This version improves upon the performance of the original algorithm when dealing with small-world graphs by combining several different methods used to find connected components in a multi-step process proposed by Slota et al. in BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems.
The algorithm starts by selecting an initial pivot vertex v with a high product of indegree and outdegree. From the initial pivot vertex , the algorithm uses Breadth-First Search to determine the massive weakly connected component. The vertices that are not included in this SCC are passed off to the next step.
After identifying the first WCC, the algorithm uses the coloring method to idenify the WCCs in the remaining vertices.
For more details, see Slota et al., BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems.
If to_show_cc_count
is set to true, the algorithm will return the number of vertices in each weakly connected component.
Suppose we have the following graph. We can see that there are three connected components. The first one has 5 vertices, while the two others have 3 vertices.
Running the algorithm on the graph will show that there are three weakly connected components, and have 5, 3, and 3 vertices respectively.
Name
Description
Data type
v_type
The vertex type to count as part of a connected component
STRING
e_type
The edge type to traverse
STRING
threshold
The threshold used to choose initial pivot vertices. Only vertices whose product of indegree and outdegree exceed this threshold will be considered candidates for the pivot vertex. This is an attempt to increase the chances that the initial pivot is contained within the largest WCC.
The default value for this parameter is 100000. It is suggested that you keep this default value when running the algorithm.
UINT
to_show_cc_count
If true, the algorithm will return the number of vertices in each connected component.
BOOL
A component is the maximal set of vertices, plus their connecting edges, which are interconnected. That is, you can reach each vertex from each other vertex. In the example figure below, there are three components.
This particular algorithm deals with undirected edges. If the same definition (each vertex can reach each other vertex) is applied to directed edges, then the components are called Strongly Connected Components. If you have directed edges but ignore the direction (permitting traversal in either direction), then the algorithm finds Weakly Connected Components.
It is easy to see in this small graph that the algorithm correctly groups the vertices:
Alex, Bob, and Justin all have Community ID = 2097152
Chase, Damon, and Eddie all have Community ID = 5242880
Fiona, George, Howard, and Ivy all have Community ID = 0
Our algorithm uses the TigerGraph engine's internal vertex ID numbers; they cannot be predicted.
Characteristic
Value
Result
Assigns a component id (INT) to each vertex, such that members of the same component have the same id value.
Input Parameters
SET<STRING> v_type
: Names of vertex types to use
SET<STRING> e_type
: Names of edge types to use
INT output_limit
: If >=0, max number of vertices to output to JSON.
BOOL print_accum
: If True, output JSON to standard output
STRING result_attr
: If not empty, store community ID values (INT) to this attribute
STRING file_path
: If not empty, write output to this file.
Result Size
V = number of vertices
Time Complexity
O(E*d), E = number of edges, d = max(diameter of components)
Graph Types
Undirected edges