Multiple Hop Patterns

Multiple Hop Pattern Shortest Path Semantics

Repeating the same hop is useful sometimes, but the real power of pattern matching comes from expressing multi-hop patterns, with specific characteristics for each hop. For example, the well-known product recommendation phrase "People who bought this product also bought this other product", is expressed by the following 2-hop pattern:

FROM This_Product:p -(<Bought:b1)- Customer:c -(Bought>:b2)- Product:p2
WHERE p2 != p

As you see, a 2-hop pattern is a simple concatenation and merging of two 1-hop patterns where the two patterns share a common endpoint. Below, Y:y is the connecting end point.

2-hop pattern

FROM X:x - (E1:e1) - Y:y - (E2>:e2) - Z:z

Similarly, a 3-hop pattern concatenates three 1-hop patterns in sequence, each pair of adjacent hops sharing one end point. Below, Y:y and Z:z are the connecting end points.

3-hop pattern

FROM X:x - (E2>:e2) -Y:y - (<E3:e3) - Z:z - (E4:e4) - U:u

In general, we can connect n 1-hop patterns into a n-hop pattern. The database will search the graph topology to find subgraphs that match this n-hop pattern.

Unnamed Intermediate Vertex Set

A multi-hop pattern has two endpoint vertex sets and one or more intermediate vertex sets. If the query does not need to express any conditions for an intermediate vertex set, then the vertex set can be omitted and the two surrounding edge sets can be joined with a simple "." For example, in the 2-hop pattern example above, if we did not need to specify that the intermediate vertex type is Y, nor need to refer to that vertex set in any of the query's other clauses (such as WHERE or ACCUM), then the pattern can be reduced as follows:

FROM X:x - (E1:e1.E2>:e2) - Z:z

Shortest paths Only for Variable Length Patterns

If a pattern has a Kleene star to repeat an edge, GSQL pattern matching selects only the shortest paths which match the pattern. If we did not apply this restriction, computer science theory tells us that the computation time could be unbounded or extreme (NP, to be technical). If we instead matched ALL paths regardless of length when a Kleene star is used without an upper bound, there could be an infinite number of matches, if there are loops in the graph. Even without loops or with an upper bound, the number of paths to check grows exponentially with the number of hops.

For the pattern 1 - (_*) - 5 in Figure 3 above, you can see the following:

  • There are TWO shortest paths: 1-2-3-4-5 and 1-2-6-4-5

    • These have 4 hops, so we can stop searching after 4 hops. This makes the task tractable.

  • If we search for ALL paths which do not repeat any vertices:

    • There are THREE non-repeated-vertex paths: 1-2-3-4-5, 1-2-6-4-5, and 1-2-9-10-11-12-4-5

    • The actual number of matches is small, but number of paths to consider is NP.

  • If we search for ALL paths which do not repeat any edges:

    • There are FOUR non-repeated-edge paths: 1-2-3-4-5, 1-2-6-4-5, 1-2-9-10-11-12-4-5, and 1-2-3-7-8-3-4-5

    • The actual number of matches is small, but number of paths to consider is NP.

  • If we search for ALL paths with no restrictions:

    • There are infinite matches, because we can go around the 3-7-8-3 cycle any number of times.

Additional Details about Pattern Matching

Each vertex set or edge set in a pattern (except edges with Kleene stars) can have an alias variable associated with it. When the query runs and finds matches, it associates, or binds, each alias to the matching vertices or edges in the graph.

TigerGraph 2.4 has certain restrictions on how accumulators and aliases can be used. Some or all of these restrictions will be lifted in future releases. We elaborate on them below.

SELECT Clause

The SELECT clause specifies the output vertex set of a SELECT statement. For a multiple-hop pattern, we can only select one of the two endpoints of the pattern. None of the intermediate aliases can be selected. The example below shows the two possible choices for the given pattern:

SELECT Clause Can Select End Points Only
#select starting end point x
SELECT x
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#select ending end point u
SELECT u
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

FROM Clause

For a multiple-hop pattern, if you don't need to refer to the intermediate vertex points, you can just use "." to connect the edge patterns, giving a more succinct representation. For example, below we remove y and z, and connect E2, <E3 and E4 using the period symbol. Note that you cannot have an alias for a multi-hop sequence like E2>.<E3.E4.

Omitting
#select starting end point x
SELECT x
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#if we don't need to access y, z, we can write
SELECT u
FROM X:x-(E2>.<E3.E4)-U:u;

WHERE Clause

In a multiple-hop pattern, the WHERE clause is a conjunction of local hop predicates (conditions), with the exception that the starting end point can appear in the last hop local predicate.

Consider a pattern

X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3-(E3:e3)-X4:x4
  • Local hop predicate means a predicate can refer to a single hop’s alias (x_i, e_i, x_(i+1)) only.

  • Last hop local predicate can refer to the pattern's starting end point. I.e. (x1, x3, e3, x4).

  • A WHERE clause is a conjunction (AND) of local hop predicates.

WHERE Clause Support "AND" of Local Predicate
# (x, e2, y) belongs to the 1-hop
# (y, e3, z) belongs to the 2-hop
# (x, z, e4, u) is the last hop local predicate
SELECT x
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u
WHERE x.age > y.age AND y.name != z.name AND (x.salary + z.salary) < u.salary 
  • Kleene Star breaks local hop predicate. When a local hop's edge has kleene star, we cannot compose a local predicate using the local alias'.

Kleene Star Break Local Predicate
# (x,y) belongs to the 1-hop 
# which has *, then semantic error will be given
SELECT x
FROM X:x-(E2>*:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u
WHERE x.age > y.age AND y.name != z.name AND (x.salary + z.salary) < u.salary 

ACCUM and POST-ACCUM Clauses

In the current version of GSQL, only certain parts of a pattern (and their corresponding aliases) are available in ACCUM and POST-ACCUM clauses. Refer to the example pattern and its highlighted parts in Figure 4 below.

  1. Accumulators can only be attached to the pattern's endpoints (x and u in the figure). The accumulation statements may only access data from either the left endpoint (x) or the rightmost hop (z, e4, and u). Below is an example of a valid ACCUM clause.

  2. In the POST-ACCUM clause, only the pattern's endpoints can be accessed.

  3. For queries in Distributed mode, accumulators may only be on the right endpoint (x in the figure).

The example below shows a valid ACCUM clause:

ACCUM To The Two End Points of A Pattern
# (z,e4, u) belongs to the last-hop aliass 
# (x, u) are the end points of the pattern
SELECT x
FROM X:x-(E2>*:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u
ACCUM x.@cnt += z.id, u.@cnt += e4.id

Examples of Multiple Hop Pattern Match

Example 1. Find the 3rd superclass of the Tag class whose name is "TennisPlayer".

Example1. Succict Representation Of Multiple-hop Pattern
USE GRAPH ldbc_snb
SET syntax_version="v2"

INTERPRET QUERY () FOR GRAPH ldbc_snb {

  TagClass1 = {TagClass.*};

  TagClass2 =  SELECT t
    FROM TagClass1:s - (TagClass_IS_SUBCLASS_OF_TagClass>.TagClass_IS_SUBCLASS_OF_TagClass>.TagClass_IS_SUBCLASS_OF_TagClass>)-TagClass:t
    WHERE s.name == "TennisPlayer";

  PRINT  TagClass2;
}

You can copy the above GSQL script to a file named example1.gsql, and invoke this script file in a Linux shell.

Linux Bash
gsql example1.gsql
Output of Example 1
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [{
    "v_id": "239",
    "attributes": {
      "name": "Agent",
      "id": 239,
      "url": "http://dbpedia.org/ontology/Agent"
    },
    "v_type": "TagClass"
  }]}]
}

Example 2. Find in which continents were the 3 most recent messages in Jan 2011 created.

Example1. Disjunction In A Succict Representation Of Multiple-hop Pattern
USE GRAPH ldbc_snb
SET syntax_version="v2"

INTERPRET QUERY () FOR GRAPH ldbc_snb {

  SumAccum<String> @continentName;

  msg = {Comment.*, Post.*};

  accMsgContinent =
    SELECT s
    FROM msg:s-((Post_IS_LOCATED_IN_Country>|Comment_IS_LOCATED_IN_Country>).Country_IS_PART_OF_Continent>)-Continent:t
    WHERE  year(s.creationDate) == 2011 AND month(s.creationDate) == 1
    ACCUM s.@continentName = t.name
    ORDER BY s.creationDate DESC
    LIMIT 3;

  PRINT  accMsgContinent;
}

You can copy the above GSQL script to a file named example2.gsql, and invoke this script file in a Linux shell.

Linux Bash
gsql example2.gsql
Output of Example 2
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"accMsgContinent": [
    {
      "v_id": "824640012997",
      "attributes": {
        "browserUsed": "Firefox",
        "length": 7,
        "locationIP": "27.112.21.246",
        "@continentName": "Asia",
        "id": 824640012997,
        "creationDate": "2011-01-31 23:54:28",
        "content": "no way!"
      },
      "v_type": "Comment"
    },
    {
      "v_id": "824636727408",
      "attributes": {
        "browserUsed": "Firefox",
        "length": 3,
        "locationIP": "31.2.225.17",
        "@continentName": "Europe",
        "id": 824636727408,
        "creationDate": "2011-01-31 23:57:46",
        "content": "thx"
      },
      "v_type": "Comment"
    },
    {
      "v_id": "824634837528",
      "attributes": {
        "imageFile": "",
        "browserUsed": "Internet Explorer",
        "length": 115,
        "locationIP": "87.251.6.121",
        "@continentName": "Asia",
        "id": 824634837528,
        "creationDate": "2011-01-31 23:58:03",
        "lang": "tk",
        "content": "About Adolf Hitler, iews. His writings and methods were often adapted to need and circumstance, although there were"
      },
      "v_type": "Post"
    }
  ]}]
}

Example 3. Find Viktor Akhiezer's favorite author of 2012 whose last name begins with character 'S', and how many LIKES Viktor give to the author's post or comment.

Example 3. Multiple-hop Pattern With Accumulator Applied To All Matched Paths
USE GRAPH ldbc_snb
SET syntax_version="v2"

INTERPRET QUERY () FOR GRAPH ldbc_snb {
  SumAccum<int> @likesCnt;
  PersonAll = {Person.*};

  FavoriteAuthors =  SELECT t
               FROM PersonAll:s-((Person_LIKES_Comment>|Person_LIKES_Post>)) -:msg - ((Comment_HAS_CREATOR_Person>|Post_HAS_CREATOR_Person>))-Person:t
               WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" AND t.lastName LIKE "S%" AND year(msg.creationDate) == 2012
               ACCUM t.@likesCnt +=1;

  PRINT  FavoriteAuthors[FavoriteAuthors.firstName, FavoriteAuthors.lastName, FavoriteAuthors.@likesCnt];
}

You can copy the above GSQL script to a file named example3.gsql, and invoke this script file in a Linux shell.

Linux Bash
gsql example3.gsql
Output of Example 3
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"FavoriteAuthors": [
    {
      "v_id": "8796093025410",
      "attributes": {
        "FavoriteAuthors.firstName": "Priyanka",
        "FavoriteAuthors.lastName": "Singh",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    },
    {
      "v_id": "2199023260091",
      "attributes": {
        "FavoriteAuthors.firstName": "Janne",
        "FavoriteAuthors.lastName": "Seppala",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    },
    {
      "v_id": "15393162796846",
      "attributes": {
        "FavoriteAuthors.firstName": "Mario",
        "FavoriteAuthors.lastName": "Santos",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    }
  ]}]
}

Last updated