Conjunctive Pattern Matching (Beta)
What is Conjunctive Pattern Matching?
So far, we have described pattern matching as one path pattern in a FROM clause. In this section, we introduce GSQL’s capability to match multiple patterns in one FROM clause. This extension is called Conjunctive Pattern Matching (CPM), because the query asks for the conjunction (logical AND) of the patterns. To get a match, all of the patterns must be satisfied, and the patterns can interrelate. Visually, you can think of patterns formed by a set of intersecting line segments. This feature, introduced as a Beta feature in TigerGraph 3.0, enables you to express complex patterns concisely in a single query block.
In general, a CPM query block consists of multiple patterns in the FROM clause. It has a structure illustrated below.
SelectBlock := SELECT alias
FROM pattern
[sampleClause]
[whereClause]
[ [perClause] accumClause]
[postAccumClause]*
...
pattern := vertexPattern | edgePattern | (pathPattern ["," pathPattern])
// vertexPattern and edgePattern are from classic GSQL
We elaborate on each of the clause.
SELECT Clause
The SELECT clause selects only one vertex alias from all the patterns in the FROM clause.
FROM Clause - Conjunctive Matching
This is where the conjunctive matching is expressed. The FROM clause consists of a list of path patterns, which are separated by commas. Evaluating each pattern against the underlying graph data produces a match table. If two patterns share a vertex alias, then we form the natural join of the two match tables.
For example, consider this CPM:
FROM X:x - (E1:e1) - Y:y - (E2>:e2) - Z:z,
Z:z - (E3:e3) - U:u - (E4>:e4) - V:v
The first pattern’s variables are x, e1, y, e2, and z; the second pattern’s variables are z, e3, u, e4, and v. Considering the two patterns independently would yield the follwing match table schemas:
#first match table
(x, e1, y, e2, z)
#second match table
(z, e3, u, e4, v)
Natural Join of Match Tables
Natural joining two match tables compares all the shared vertex aliases between the two tables, and the resulting joined table contains all non-shared variables plus one copy of each of the shared vertex variables. Here is the match table for the CPM above:
#natural join result; the shared vertex variable z appears once.
(x, e1, y, e2, z, e3, u, e4, v)
Valid Conjunctive Patterns
The match table of the conjunctive pattern match is the natural join of all the patterns' match tables. By design, a row in the CPM match table must simultaneously satisfy all the match tables.
If the match tables of the patterns in a FROM clause can be naturally joined into one match table, then the FROM clause has a valid CPM input. Otherwise, the FROM clause has an invalid pattern input list.
For example, below we show two valid CPM inputs and one invalid CPM input.
// a valid CPM, since the two patterns natrually join on :tgt
SELECT
FROM Person:p - (KNOWS) - :tgt,
Post:s -(<LIKES) - :tgt
// a valid CPM, since the two patterns naturally join on :f
SELECT
FROM Person:p - (KNOWS) - :f - (LIKES>) - Post:tgt,
:f - (LIKES>) - Comment:c
// an invalid CPM, since the two patterns do not share any vertex variables, they cannot be naturally joined. One pattern has (p, tgt); the other has (s, t).
SELECT
FROM Person:p - (KNOWS) - :tgt,
Post:s - (<LIKES) - Person:t
WHERE Clause
The predicates in the WHERE clause can use any of the vertex or edge aliases in any of the patterns, including predicates which combine variables from different constituent paths. CPM queries do not have any special restrictions on the WHERE predicate. Distance matters, however, for performance. Conditions that are local, measured both cross-path and within-path, can be resolved earlier and therefore are faster.
In the example below, x2.age > x4.age
is a cross-pattern predicate, e1.timestamp < e3.timestamp
is a cross-pattern predicate, and x1.gender == x4.gender
is a local predicate of the second pattern.
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
X1:x1-(E3:e3)-X4:x4
WHERE x2.age > x4.age AND e1.timestamp < e3.timestamp AND x1.gender == x4.gender
ACCUM Clause
You can ACCUM to any vertex variable in a CPM block.
The ACCUM clause by default will execute *as many times* *as* the row (match) count of the CPM match table; each execution uses one row from the match table.
//accum to x1, x2 and x4.
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity
POST-ACCUM Clause
POST-ACCUM for CPM behaves the same as POST-ACCUM for single path patterns. That is, each POST-ACCUM clause can refer to one vertex alias, and the clause executes iteratively over that vertex set (e.g. one vertex column in the matching table). Its statements can access the aggregated accumulator result computed in the ACCUM clause. The query can have multiple POST-ACCUM clauses, one for each vertex alias you wish to work on. The multiple POST-ACCUM clauses are processed in parallel; it doesn’t matter in what order you write them. (For each binding, the statements within a clause are executed in order.)
For example, below we have three POST-ACCUM clauses. The first one iterates throughx1
, and for each x1, we do @@cnt += x1.@cnt
. The second and third POST-ACCUMs iterate through x2
and x3
respectively, and accumulates their @cnt
accumulator value into @@cnt
.
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity
POST-ACCUM @@cnt += x1.@cnt
POST-ACCUM @@cnt += x2.@cnt
POST-ACCUm @@cnt += x3.@cnt;
Examples
Example 1. Find Viktor Akhiezer’s liked messages (100+ days after their creation) whose author’s last name begin with letter S. Output the message’s forum.
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
F = SELECT f
FROM Person:s - (Likes>:e1) - :msg - (Has_Creator>) - Person:t,
Forum:f - (Container_Of>:e2) - :msg
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
AND t.last_name LIKE "S%"
AND e1.creation_date >DATETIME_ADD(msg.creation_date, INTERVAL 100 DAY);
PRINT F;
}
// results
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"F": [{
"v_id": "962072688797",
"attributes": {
"id": 962072688797,
"creation_date": "2011-04-12 09:36:50",
"title": "Album 12 of Mario Santos"
},
"v_type": "Forum"
}]}]
}
Example 2. Find any authors who wrote posts that Viktor Akhiezer’s liked and whose last name begins with S. Find the country for each of these authors and report on the countries.
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
C = SELECT ctry
FROM Person:s - (Likes>:e1) - Post:msg - (Has_Creator>) - Person:t,
:t - (Work_At>:e2) - Company:c,
:c - (Is_Located_In>) - Country:ctry
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
AND t.last_name LIKE "S%" ;
PRINT C;
}
// results
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"C": [{
"v_id": "93",
"attributes": {
"name": "Portugal",
"id": 93,
"url": "http://dbpedia.org/resource/Portugal"
},
"v_type": "Country"
}]}]
}
Example 3. Given a TagClass and a Country, find all the Forums created in the given Country, containing at least one Post with Tags belonging directly to the given TagClass. The location of a Forum is identified by the location of the Forum’s moderator.
CREATE QUERY bi_4(STRING tc_name, STRING c_name) FOR GRAPH ldbc_snb SYNTAX v2 {
SetAccum<vertex<Post>> @post_set;
SumAccum<int> @person_id, @post_count;
forum_set =
SELECT f
FROM Forum:f -(Has_Moderator>)- Person:a -(Is_Located_In>.Is_Part_Of>)- Country:c,
:f -(Container_Of>)- Post:p -(Has_Tag>.Has_Type>)- Tag_Class:tc
WHERE c.name == c_name AND tc.name == tc_name
ACCUM f.@person_id = a.id, f.@post_set += p
POST-ACCUM f.@post_count = f.@post_set.size(), f.@post_set.clear()
ORDER BY f.@post_count DESC, f.id ASC
LIMIT 3;
PRINT forum_set[forum_set.id, forum_set.title, forum_set.creation_date,
forum_set.@person_id, forum_set.@post_count];
}
RUN QUERY bi_4("MusicalArtist", "Burma")
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"forum_set": [
{
"v_id": "81903",
"attributes": {
"forum_set.@person_id": 5226,
"forum_set.@post_count": 65,
"forum_set.title": "Wall of Donald Steele-Perkins",
"forum_set.id": 81903,
"forum_set.creation_date": "2010-02-15 06:48:04"
},
"v_type": "Forum"
},
{
"v_id": "137438953686",
"attributes": {
"forum_set.@person_id": 2199023262994,
"forum_set.@post_count": 65,
"forum_set.title": "Wall of Eric Law-Yone",
"forum_set.id": 137438953686,
"forum_set.creation_date": "2010-04-25 22:10:32"
},
"v_type": "Forum"
},
{
"v_id": "687194810508",
"attributes": {
"forum_set.@person_id": 10995116283784,
"forum_set.@post_count": 39,
"forum_set.title": "Wall of Hector Hugh Michie",
"forum_set.id": 687194810508,
"forum_set.creation_date": "2010-12-19 15:33:30"
},
"v_type": "Forum"
}
]}]
}
Example 4. For a given country, count all the distinct triples of Persons such that:
-
a is a friend of b.
-
b is a friend of c
-
c is a friend of a.
Distinct means that if a certain 3 vertices appear once in the results, it will not be repeated: it will appear only once. KNOWS is an undirected relationship, so it doesn’t matter in what order we list the 3 vertices.
CREATE QUERY bi_17(STRING c_name) FOR GRAPH ldbc_snb SYNTAX v2 {
TYPEDEF TUPLE <uint a, uint b, uint c> triplet;
SetAccum<triplet> @@triplet_set;
SumAccum<int> @@triplet_count;
C =
SELECT c
FROM Country:c -(<Is_Part_Of.<Is_Located_In)- Person:p1,
:c -(<Is_Part_Of.<Is_Located_In)- Person:p2,
:c -(<Is_Part_Of.<Is_Located_In)- Person:p3,
:p1 -(Knows)- :p2 -(Knows)- :p3 -(Knows)- :p1
WHERE c.name == c_name AND p1.id < p2.id AND p2.id < p3.id
ACCUM @@triplet_set += triplet(p1.id, p2.id, p3.id);
@@triplet_count = @@triplet_set.size();
@@triplet_set.clear();
PRINT @@triplet_count;
}
// results of RUN QUERY bi_17("Spain")
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@triplet_count": 242}]
}
Source Vertex Set Flexibility
As mentioned when we first described pattern matching, in One-hop patterns, the source (leftmost) vertex set can be a vertex type, an alternation of types, or even omitted.
Example 1. Find Viktor Akhiezer’s favorite messages' creators whose last name begins with letter S. Count them.
USE GRAPH ldbc_snb
// start from a vertex type "Person"
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
F = SELECT t
FROM Person:s -(Likes>:e1)- :msg -(Has_Creator>)- Person:t
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
AND t.last_name LIKE "S%"
POST-ACCUM @@cnt+=1;
PRINT @@cnt;
}
// results
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@cnt": 8}]
}
Example 2. Same query as example 1, but without beginning with vertex types. GSQL compiler can infer the types of :s.
USE GRAPH ldbc_snb
// both end points of the pattern do not have vertex types.
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
F = SELECT t
FROM :s -(Likes>:e1)- :msg -(Has_Creator>)- :t
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer" AND t.last_name LIKE "S%"
POST-ACCUM @@cnt+=1;
PRINT @@cnt;
}
// results
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@cnt": 2190095}]
}
Example 3. Count the LIKES edge.
USE GRAPH ldbc_snb
// a pattern starts without any information.
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
F = SELECT msg
FROM -(Likes>:e1)- :msg
ACCUM @@cnt+=1;
PRINT @@cnt;
}
// results
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@cnt": 2190095}]
}