Cycle Detection

Algorithm link: Cycle Detection

The Cycle Detection problem seeks to find all the cycles (loops) in a graph. We apply the usual restriction that the cycles must be "simple cycles", that is, they are paths that start and end at the same vertex but otherwise never visit any vertex twice.

There are two versions of the task: for directed graphs and undirected graphs. The GSQL algorithm library currently supports only directed cycle detection. The Rocha—​Thatte algorithm is an efficient distributed algorithm, which detects all the cycles in a directed graph. The algorithm will self-terminate, but it is also possible to stop at k iterations, which finds all the cycles having lengths up to k edges.

The basic idea of the algorithm is to (potentially) traverse every edge in parallel, again and again, forming all possible paths. At each step, if a path forms a cycle, it records it and stops extending it. More specifically:

Initialization
For each vertex, record one path consisting of its own id. Mark the vertex as Active.

Iteration steps:
For each Active vertex v:

  1. Send its list of paths to each of its out-neighbors.

  2. Inspect each path P in the list of the paths received:

    • If the first id in P is also id(v), a cycle has been found:

      • Remove P from its list.

      • If id(v) is the least id of any id in P, then add P to the Cycle List. (The purpose is to count each cycle only once.)

    • Else, if id(v) is somewhere else in the path, then remove P from the path list (because this cycle must have been counted already).

    • Else, append id(v) to the end of each of the remaining paths in its list.

Specifications

tg_cycle_detection (SET<STRING> v_type, SET<STRING> e_type, INT depth,
  BOOL print_accum = TRUE, STRING file_path = "")

Time complexity

The algorithm has a time complexity of \$O(E * k)\$, where \$E\$ is the number of edges and \$k\$ is the lesser of either the maximum cycle length or the depth parameter.

Characteristic Value

Result

Computes a list of vertex id lists, each of which is a cycle. The result is available in 2 forms:

  • streamed out in JSON format

  • written to a file in tabular format

Input Parameters

  • SET<STRING> v_type: Names of vertex types to use

  • SET<STRING> e_type: Names of edge types to use

  • INT depth: Maximum cycle length to search for = maximum number of iterations

  • BOOL print_accum: If True, output JSON to standard output

  • STRING file_path: If not empty, write output to this file.

Result Size

Number of cycles * average cycle length

Both of these measures are not known in advance.

Graph Types

Directed

Example

In the social10 graph, there are 5 cycles, all with the Fiona-George-Howard-Ivy cluster.

Visualized results of cycle_detection("Person"
[
  {
    "@@cycles": [
      [
        "Fiona",
        "Ivy"
      ],
      [
        "George",
        "Ivy"
      ],
      [
        "Fiona",
        "George",
        "Ivy"
      ],
      [
        "George",
        "Howard",
        "Ivy"
      ],
      [
        "Fiona",
        "George",
        "Howard",
        "Ivy"
      ]
    ]
  }
]