Strongly Connected Components
A strongly connected component (SCC) is a subgraph such that there is a path from any vertex to every other vertex. A graph can contain more than one separate SCC. An SCC algorithm finds the maximal SCCs within a graph.
Our implementation is based on the DivideandConquer Strong Components (DCSC) algorithm[1]. In each iteration, pick a pivot vertex v
randomly, and find its descendant and predecessor sets, where descendant set D_v
is the vertex reachable from v
, and predecessor set P_v
is the vertices which can reach v
(stated another way, reachable from v
through reverse edges). The intersection of these two sets is a strongly connected component SCC_v
.
The graph can be partitioned into 4 sets: SCC_v
, descendants D_v
excluding SCC_v
, predecessors P_v
excluding SCC
, and the remainders R_v
. It is proved that any SCC is a subset of one of the 4 sets [1]. Thus, we can divide the graph into different subsets and detect the SCCs independently and iteratively.
The problem of this algorithm is unbalanced load and slow convergence when there are a lot of small SCCs, which is often the case in realworld use cases [3]. We added two trimming stages to improve the performance: size1 SCC trimming[2] and weakly connected components[3].
The implementation of this algorithm requires reverse edges for all directed edges considered in the graph.
[1] Fleischer, Lisa K., Bruce Hendrickson, and Ali Pınar. "On identifying strongly connected components in parallel." International Parallel and Distributed Processing Symposium. Springer, Berlin, Heidelberg, 2000.
[2] Mclendon Iii, William, et al. "Finding strongly connected components in distributed graphs." Journal of Parallel and Distributed Computing 65.8 (2005): 901910.
[3] Hong, Sungpack, Nicole C. Rodia, and Kunle Olukotun. "On fast parallel detection of strongly connected components (SCC) in smallworld graphs." Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 2013.
Specifications
tg_scc (SET<STRING> v_type, SET<STRING> e_type, SET<STRING> rev_e_type,
INT top_k_dist, INT output_limit, INT max_iter = 500, INT iter_wcc = 5,
BOOL print_accum = TRUE, STRING attr= "", STRING file_path="")
Time complexity
This algorithm has a time complexity of \$O(k * d)\$, where \$k\$ is equal to the number of iterations and \$d\$ is equal to the maximum component diameter.
Characteristic  Value 

Result 
Assigns a component id (INT) to each vertex, such that members of the same component have the same id value. 
Input Parameters 

Result Size 
V = number of vertices 
Graph Types 
Directed edges with reverse direction edges as well 
Example
We ran scc
on the social26 graph. A portion of the JSON result is shown below.
[
{
"i": 1
},
{
"trim_set.size()": 8
},
{
"trim_set.size()": 5
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 0
},
{
"@@cluster_dist_heap": [
{
"csize": 9,
"num": 1
},
{
"csize": 1,
"num": 17
}
]
},
The first element "i"=1
means the whole graph is processed in just one iteration. The 5 "trim_set.size()"
elements mean there were 5 rounds of size1 SCC trimming. The final "@@.cluster_dist_heap" object"
reports on the size distribution of SCCs.
There is one SCC with 9 vertices, and 17 SCCs with only 1 vertex in the graph.