Multiple Hop Patterns

Multiple Hop Pattern Shortest Path Semantics

Repeating the same hop is useful sometimes, but the real power of pattern matching comes from expressing multi-hop patterns, with specific characteristics for each hop. For example, the well-known product recommendation phrase "People who bought this product also bought this other product", is expressed by the following 2-hop pattern:

As you see, a 2-hop pattern is a simple concatenation and merging of two 1-hop patterns where the two patterns share a common endpoint. Below, Y:y is the connecting end point.

2-hop pattern

Similarly, a 3-hop pattern concatenates three 1-hop patterns in sequence, each pair of adjacent hops sharing one end point. Below, Y:y and Z:z are the connecting end points.

3-hop pattern

In general, we can connect n 1-hop patterns into a n-hop pattern. The database will search the graph topology to find subgraphs that match this n-hop pattern.

Unnamed Intermediate Vertex Set

A multi-hop pattern has two endpoint vertex sets and one or more intermediate vertex sets. If the query does not need to express any conditions for an intermediate vertex set, then the vertex set can be omitted and the two surrounding edge sets can be joined with a simple "." For example, in the 2-hop pattern example above, if we did not need to specify that the intermediate vertex type is Y, nor need to refer to that vertex set in any of the query's other clauses (such as WHERE or ACCUM), then the pattern can be reduced as follows:

Shortest paths Only for Variable Length Patterns

If a pattern has a Kleene star to repeat an edge, GSQL pattern matching selects only the shortest paths which match the pattern. If we did not apply this restriction, computer science theory tells us that the computation time could be unbounded or extreme (NP, to be technical). If we instead matched ALL paths regardless of length when a Kleene star is used without an upper bound, there could be an infinite number of matches, if there are loops in the graph. Even without loops or with an upper bound, the number of paths to check grows exponentially with the number of hops.

For the pattern 1 - (_*) - 5 in Figure 3 above, you can see the following:

  • There are TWO shortest paths: 1-2-3-4-5 and 1-2-6-4-5

    • These have 4 hops, so we can stop searching after 4 hops. This makes the task tractable.

  • If we search for ALL paths which do not repeat any vertices:

    • There are THREE non-repeated-vertex paths: 1-2-3-4-5, 1-2-6-4-5, and 1-2-9-10-11-12-4-5

    • The actual number of matches is small, but number of paths to consider is NP.

  • If we search for ALL paths which do not repeat any edges:

    • There are FOUR non-repeated-edge paths: 1-2-3-4-5, 1-2-6-4-5, 1-2-9-10-11-12-4-5, and 1-2-3-7-8-3-4-5

    • The actual number of matches is small, but number of paths to consider is NP.

  • If we search for ALL paths with no restrictions:

    • There are infinite matches, because we can go around the 3-7-8-3 cycle any number of times.

Additional Details about Pattern Matching

Each vertex set or edge set in a pattern (except edges with Kleene stars) can have an alias variable associated with it. When the query runs and finds matches, it associates, or binds, each alias to the matching vertices or edges in the graph.

SELECT Clause

The SELECT clause specifies the output vertex set of a SELECT statement. For a multiple-hop pattern, we can only select one of the two endpoints of the pattern. None of the intermediate aliases can be selected. The example below shows the two possible choices for the given pattern:

FROM Clause

For a multiple-hop pattern, if you don't need to refer to the intermediate vertex points, you can just use "." to connect the edge patterns, giving a more succinct representation. For example, below we remove y and z, and connect E2, <E3 and E4 using the period symbol. Note that you cannot have an alias for a multi-hop sequence like E2>.<E3.E4.

WHERE Clause

In a multiple-hop pattern, the WHERE clause is a conjunction of local hop predicates (conditions), with the exception that the starting end point can appear in the last hop local predicate.

Consider a pattern

  • Local hop predicate means a predicate can refer to a single hop’s alias (x_i, e_i, x_(i+1)) only.

  • Last hop local predicate can refer to the pattern's starting end point. I.e. (x1, x3, e3, x4).

  • A WHERE clause is a conjunction (AND) of local hop predicates.

  • Kleene Star breaks local hop predicate. When a local hop's edge has kleene star, we cannot compose a local predicate using the local alias'.

ACCUM and POST-ACCUM Clauses

In the current version of GSQL, only certain parts of a pattern (and their corresponding aliases) are available in ACCUM and POST-ACCUM clauses. Refer to the example pattern and its highlighted parts in Figure 4 below.

  1. Accumulators can only be attached to the pattern's endpoints (x and u in the figure). The accumulation statements may only access data from either the left endpoint (x) or the rightmost hop (z, e4, and u). Below is an example of a valid ACCUM clause.

  2. In the POST-ACCUM clause, only the pattern's endpoints can be accessed.

  3. For queries in Distributed mode, accumulators may only be on the right endpoint (x in the figure).

The example below shows a valid ACCUM clause:

Examples of Multiple Hop Pattern Match

Example 1. Find the 3rd superclass of the Tag class whose name is "TennisPlayer".

You can copy the above GSQL script to a file named example1.gsql, and invoke this script file in a Linux shell.

Example 2. Find in which continents were the 3 most recent messages in Jan 2011 created.

You can copy the above GSQL script to a file named example2.gsql, and invoke this script file in a Linux shell.

Example 3. Find Viktor Akhiezer's favorite author of 2012 whose last name begins with character 'S', and how many LIKES Viktor give to the author's post or comment.

You can copy the above GSQL script to a file named example3.gsql, and invoke this script file in a Linux shell.