Conjunctive Pattern Matching (Beta)

What is Conjunctive Pattern Matching?

So far, we have described pattern matching as one path pattern in a FROM clause. In this section, we introduce GSQL’s capability to match multiple patterns in one FROM clause. This extension is called Conjunctive Pattern Matching (CPM), because the query asks for the conjunction (logical AND) of the patterns. To get a match, all of the patterns must be satisfied, and the patterns can interrelate. Visually, you can think of patterns formed by a set of intersecting line segments. This feature, introduced as a Beta feature in TigerGraph 3.0, enables you to express complex patterns concisely in a single query block.

In general, a CPM query block consists of multiple patterns in the FROM clause. It has a structure illustrated below.

# Conjunctive Pattern Matching Syntax

SelectBlock := SELECT alias
               FROM pattern
               [sampleClause]
               [whereClause]
               [ [perClause] accumClause]
               [postAccumClause]*
                    ...

pattern := vertexPattern | edgePattern | (pathPattern ["," pathPattern])
// vertexPattern and edgePattern are from classic GSQL

We elaborate on each of the clause.

SELECT Clause

The SELECT clause selects only one vertex alias from all the patterns in the FROM clause.

FROM Clause - Conjunctive Matching

This is where the conjunctive matching is expressed. The FROM clause consists of a list of path patterns, which are separated by commas. Evaluating each pattern against the underlying graph data produces a match table. If two patterns share a vertex alias, then we form the natural join of the two match tables.

For example, consider this CPM:

FROM X:x - (E1:e1) - Y:y - (E2>:e2) - Z:z,
     Z:z - (E3:e3) - U:u - (E4>:e4) - V:v

The first pattern’s variables are x, e1, y, e2, and z; the second pattern’s variables are z, e3, u, e4, and v. Considering the two patterns independently would yield the follwing match table schemas:

#first match table
(x, e1, y, e2, z)
#second match table
(z, e3, u, e4, v)

Natural Join of Match Tables

Natural joining two match tables compares all the shared vertex aliases between the two tables, and the resulting joined table contains all non-shared variables plus one copy of each of the shared vertex variables. Here is the match table for the CPM above:

#natural join result; the shared vertex variable z appears once.
(x, e1, y, e2, z, e3, u, e4, v)

Valid Conjunctive Patterns

The match table of the conjunctive pattern match is the natural join of all the patterns' match tables. By design, a row in the CPM match table must simultaneously satisfy all the match tables.

If the match tables of the patterns in a FROM clause can be naturally joined into one match table, then the FROM clause has a valid CPM input. Otherwise, the FROM clause has an invalid pattern input list.

For example, below we show two valid CPM inputs and one invalid CPM input.

// a valid CPM, since the two patterns natrually join on :tgt
SELECT
FROM Person:p - (KNOWS) - :tgt,
     Post:s -(<LIKES) - :tgt

// a valid CPM, since the two patterns naturally join on :f
SELECT
FROM Person:p - (KNOWS) - :f - (LIKES>) - Post:tgt,
     :f - (LIKES>) - Comment:c

// an invalid CPM, since the two patterns do not share any vertex variables, they cannot be naturally joined. One pattern has (p, tgt); the other has (s, t).
SELECT
FROM Person:p - (KNOWS) - :tgt,
     Post:s - (<LIKES) - Person:t

WHERE Clause

The predicates in the WHERE clause can use any of the vertex or edge aliases in any of the patterns, including predicates which combine variables from different constituent paths. CPM queries do not have any special restrictions on the WHERE predicate. Distance matters, however, for performance. Conditions that are local, measured both cross-path and within-path, can be resolved earlier and therefore are faster.

In the example below, x2.age > x4.age is a cross-pattern predicate, e1.timestamp < e3.timestamp is a cross-pattern predicate, and x1.gender == x4.gender is a local predicate of the second pattern.

FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
     X1:x1-(E3:e3)-X4:x4
WHERE x2.age > x4.age AND e1.timestamp < e3.timestamp AND x1.gender == x4.gender

ACCUM Clause

You can ACCUM to any vertex variable in a CPM block.

The ACCUM clause by default will execute *as many times* *as* the row (match) count of the CPM match table; each execution uses one row from the match table.

ACCUM To The Three Vertex Variables of A CPM Pattern

//accum to x1, x2 and x4.
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
     X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity

POST-ACCUM Clause

POST-ACCUM for CPM behaves the same as POST-ACCUM for single path patterns. That is, each POST-ACCUM clause can refer to one vertex alias, and the clause executes iteratively over that vertex set (e.g. one vertex column in the matching table). Its statements can access the aggregated accumulator result computed in the ACCUM clause. The query can have multiple POST-ACCUM clauses, one for each vertex alias you wish to work on. The multiple POST-ACCUM clauses are processed in parallel; it doesn’t matter in what order you write them. (For each binding, the statements within a clause are executed in order.)

For example, below we have three POST-ACCUM clauses. The first one iterates throughx1, and for each x1, we do @@cnt += x1.@cnt. The second and third POST-ACCUMs iterate through x2 and x3 respectively, and accumulates their @cnt accumulator value into @@cnt.

POST-ACCUM to a global accumulator @@cnt, using three CPM Vertex Variables

FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
     X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity
POST-ACCUM @@cnt += x1.@cnt
POST-ACCUM @@cnt += x2.@cnt
POST-ACCUm @@cnt += x3.@cnt;

Examples

Example 1. Find Viktor Akhiezer’s liked messages (100+ days after their creation) whose author’s last name begin with letter S. Output the message’s forum.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT f
        FROM Person:s - (Likes>:e1) - :msg - (Has_Creator>) - Person:t,
             Forum:f - (Container_Of>:e2) - :msg
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
              AND t.last_name LIKE "S%"
              AND e1.creation_date >DATETIME_ADD(msg.creation_date, INTERVAL 100 DAY);

  PRINT F;
}

// results
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"F": [{
    "v_id": "962072688797",
    "attributes": {
      "id": 962072688797,
      "creation_date": "2011-04-12 09:36:50",
      "title": "Album 12 of Mario Santos"
    },
    "v_type": "Forum"
  }]}]
}

Example 2. Find any authors who wrote posts that Viktor Akhiezer’s liked and whose last name begins with S. Find the country for each of these authors and report on the countries.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

    SumAccum<int> @@cnt;

    C  =  SELECT ctry
        FROM Person:s - (Likes>:e1) - Post:msg - (Has_Creator>) - Person:t,
             :t - (Work_At>:e2) - Company:c,
             :c - (Is_Located_In>) - Country:ctry
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
              AND t.last_name LIKE "S%" ;

    PRINT C;
}
// results
{
    "error": false,
    "message": "",
    "version": {
        "schema": 0,
        "edition": "enterprise",
        "api": "v2"
    },
    "results": [{"C": [{
        "v_id": "93",
        "attributes": {
            "name": "Portugal",
            "id": 93,
            "url": "http://dbpedia.org/resource/Portugal"
        },
        "v_type": "Country"
    }]}]
}

Example 3. Given a TagClass and a Country, find all the Forums created in the given Country, containing at least one Post with Tags belonging directly to the given TagClass. The location of a Forum is identified by the location of the Forum’s moderator.

CREATE QUERY bi_4(STRING tc_name, STRING c_name) FOR GRAPH ldbc_snb SYNTAX v2 {
  SetAccum<vertex<Post>> @post_set;
  SumAccum<int> @person_id, @post_count;

  forum_set =
    SELECT f
    FROM Forum:f -(Has_Moderator>)- Person:a -(Is_Located_In>.Is_Part_Of>)- Country:c,
         :f -(Container_Of>)- Post:p -(Has_Tag>.Has_Type>)- Tag_Class:tc
    WHERE c.name == c_name AND tc.name == tc_name
    ACCUM f.@person_id = a.id, f.@post_set += p
    POST-ACCUM f.@post_count = f.@post_set.size(), f.@post_set.clear()
    ORDER BY f.@post_count DESC, f.id ASC
    LIMIT 3;

  PRINT forum_set[forum_set.id, forum_set.title, forum_set.creation_date,
                 forum_set.@person_id, forum_set.@post_count];
}

RUN QUERY bi_4("MusicalArtist", "Burma")
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"forum_set": [
    {
      "v_id": "81903",
      "attributes": {
        "forum_set.@person_id": 5226,
        "forum_set.@post_count": 65,
        "forum_set.title": "Wall of Donald Steele-Perkins",
        "forum_set.id": 81903,
        "forum_set.creation_date": "2010-02-15 06:48:04"
      },
      "v_type": "Forum"
    },
    {
      "v_id": "137438953686",
      "attributes": {
        "forum_set.@person_id": 2199023262994,
        "forum_set.@post_count": 65,
        "forum_set.title": "Wall of Eric Law-Yone",
        "forum_set.id": 137438953686,
        "forum_set.creation_date": "2010-04-25 22:10:32"
      },
      "v_type": "Forum"
    },
    {
      "v_id": "687194810508",
      "attributes": {
        "forum_set.@person_id": 10995116283784,
        "forum_set.@post_count": 39,
        "forum_set.title": "Wall of Hector Hugh Michie",
        "forum_set.id": 687194810508,
        "forum_set.creation_date": "2010-12-19 15:33:30"
      },
      "v_type": "Forum"
    }
  ]}]
}

Example 4. For a given country, count all the distinct triples of Persons such that:

a is a friend of b.
b is a friend of c
c is a friend of a.

Distinct means that if a certain 3 vertices appear once in the results, it will not be repeated: it will appear only once. KNOWS is an undirected relationship, so it doesn’t matter in what order we list the 3 vertices.

CREATE QUERY bi_17(STRING c_name) FOR GRAPH ldbc_snb SYNTAX v2 {
  TYPEDEF TUPLE <uint a, uint b, uint c> triplet;
  SetAccum<triplet> @@triplet_set;
  SumAccum<int> @@triplet_count;

  C =
    SELECT c
    FROM Country:c -(<Is_Part_Of.<Is_Located_In)- Person:p1,
         :c -(<Is_Part_Of.<Is_Located_In)- Person:p2,
         :c -(<Is_Part_Of.<Is_Located_In)- Person:p3,
         :p1 -(Knows)- :p2 -(Knows)- :p3 -(Knows)- :p1
    WHERE c.name == c_name AND p1.id < p2.id AND p2.id < p3.id
    ACCUM @@triplet_set += triplet(p1.id, p2.id, p3.id);

  @@triplet_count = @@triplet_set.size();
  @@triplet_set.clear();
  PRINT @@triplet_count;
}
// results of RUN QUERY bi_17("Spain")
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@triplet_count": 242}]
}

More Examples. We translated LDBC-SNB BI and IC queries using CPM, and shared the translation in github. Please refer to the query translation here. Most of the queries are installed as functions, you can find sample parameter(s) of the functions from here.

Source Vertex Set Flexibility

As mentioned when we first described pattern matching, in One-hop patterns, the source (leftmost) vertex set can be a vertex type, an alternation of types, or even omitted.

Example 1. Find Viktor Akhiezer’s favorite messages' creators whose last name begins with letter S. Count them.

USE GRAPH ldbc_snb

// start from a vertex type "Person"
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT t
        FROM Person:s -(Likes>:e1)- :msg -(Has_Creator>)- Person:t
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
              AND t.last_name LIKE "S%"
        POST-ACCUM @@cnt+=1;

  PRINT  @@cnt;

}

// results
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 8}]
}

Example 2. Same query as example 1, but without beginning with vertex types. GSQL compiler can infer the types of :s.

USE GRAPH ldbc_snb

// both end points of the pattern do not have vertex types.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT t
        FROM :s -(Likes>:e1)- :msg -(Has_Creator>)- :t
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer" AND t.last_name LIKE "S%"
        POST-ACCUM @@cnt+=1;

  PRINT  @@cnt;

}

// results
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 2190095}]
}

Example 3. Count the LIKES edge.

USE GRAPH ldbc_snb

// a pattern starts without any information.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT msg
        FROM  -(Likes>:e1)- :msg
        ACCUM @@cnt+=1;

  PRINT  @@cnt;

}

// results
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 2190095}]
}