Pattern matching produces a virtual match table, and the ACCUM clause acts like a FOREACH loop, executing the clause's statement once for each row of the match table.
Patterns are paths in the graphs, and each row in the match table is a distinct path. However, paths may share some vertices or edges. Some applications do not want to do aggregations per path. Instead, they want to execute the ACCUM clause per distinct group of vertex aliases.
For example, consider the following query which counts the number of paths in a simple 2-hop pattern:
Suppose the query produces the following match table.
By default, the ACCUM clause will execute the @@cnt += 1
statement 4 times, for each row in the match table. The result will be @@cnt = 4
.
For the same query, what if the user wants to
count the number of distinct path endings in the match table? For this case, we would want to iterate on the alias t
.
count the number of distinct (start, end) pairs in the match table? For that case, we would want to iterate on distinct pairs of the aliases (s, t)
.
To provide users with this added flexibility and finer control over ACCUM iteration, TigerGraph 3.0 adds the PER clause to pattern matching (V2) syntax.
The PER Clause is an optional clause that comes at the start of the ACCUM clause in a SELECT statement. As illustrated below, it starts with the keyword PER, and followed by a pair of parenthesis, in which user can put one or more distinct vertex aliases found in the FROM pattern.
Examples. Below are multiple examples of the PER Clause using the same FROM clause.
The PER Clause specifies a list of vertex aliases, which are used to group the rows in the match table, one group per distinct value of the alias or of the alias list. If there are N distinct groups, we will execute the ACCUM clause N times, once per distinct vertex aliases' binding. Note that the PER clause has no effect on POST-ACCUM clauses semantic, except confining the POST-ACCUM vertex alias.
Suppose s, m, and t are vertex aliases in a pattern. Below are some interpretations of the PER Clause based on the graph element bindings found in the match table.
PER (s) ACCUM
means that per each distinct s vertex, execute the ACCUM clause once.
PER (s,t) ACCUM
means that per each distinct (s, t) pair, execute the ACCUM clause once.
PER (s,m,t) ACCUM
means that per each distinct (s, m, t) tuple, execute the ACCUM clause once.
Examples to show PER clause semantics.
If the PER Clause is used in a SELECT query block, then the vertex aliases used in the SELECT, ACCUM , and POST-ACCUM clauses must be confined to the aliases that appear in the PER clause.
Below are some illegal cases.
Example 1. Count the number of Countries that has a City which has a resident that likes a post.
Example 2. Count the number of posts liked by a person who is located in a city that belongs to a country. (All cities are in a country, but humor us. We are reusing the same FROM pattern in several examples.)
Example 3. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country.
Example 4. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country. Use local accumulators this time.
The PER Clause not only helps users to control the semantics of the ACCUM clause, it also boosts the performance of the pattern match query, as it uses the PER clause to optimize the query execution.
To get the best performance, we recommend three guidelines for writing efficient queries.
Per target is in general faster than Per source. In the example below, query q2 is faster than q1. The only difference between these two queries is q2's FROM pattern is the flip of q1's FROM pattern.
The match table is built by traversing the pattern from left to right. Follow the basic principle of pruning early rather than late by orienting the query the smaller cardinality set on the left. This practice will result in producing the least number of candidate matches during the query computation. For example, if there are fewer distinct tags than persons, then query q4 is faster than q3.
Specifying complete type information improves performance. For example, query q6 is faster than q5 even though they are known to be logically identical. Forum
is the CONTAINER_OF
Post
, so it does not need to be specified in q5, but explicitly saying Forum
in q6 speeds up performance.
Using the PER clause and linear regular path pattern, we have translated all of the LDBC-SNB queries. You can find them on github at https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries_linear/queries. Most of the queries are installed as functions. You can find sample parameter(s) of the functions from https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries/seeds.
Pattern Matching GSQL supports Insert, Update, and Delete operations. The syntax is identical to that in classic GSQL (v1), though the full range of data modification operations are not yet support.
In general, data modification can be at two levels in GSQL:
Top level. The statement does not need to within any other statement.
Within a SELECT query statement. The FROM-WHERE clauses define a match table, and the data modification is performed based on the vertex and edge information in the match table. The GSQL specifications calls these within-SELECT statements DML-sub statements.
Insert, Update, and Delete currently work in compiled mode only (e.g., you must run INSTALL QUERY before RUN QUERY.) Data Modification in interpreted mode is not yet available.
SELECT queries with data modification may only have one POST-ACCUM clause.
Pattern matching Insert is supported at both the top-level and within-SELECT levels, using the same syntax as in classic GSQL. You can insert vertices and edges.
Example 1. Create a Person vertex, whose name is Tiger Woods. Next, find Viktor's favorite 2012 posts' authors, whose last name is prefixed with S. Finally, insert KNOWS edges connecting Tiger Woods with Viktor's favorite authors.
You can verify the result by running a simple built-in REST endpoint.
Check the inserted vertex.
Check the inserted edges.
To update vertex attributes, use assignment statements in a POST-ACCUM
clause. To update edge attributes, use assignment statements in an ACCUM
clause. In addition, data updates can only be performed if the FROM
statement only contains a single-hop and fix-length pattern.
Query-body level UPDATE
statements are not yet supported in syntax v2.
Example 2. For all KNOWS
edges that connect Viktor Akhiezer and his friends whose lastName
begins with "S", update the edge creationDate
to "2020-10-01". Also, for the Person vertex (Tiger Woods) update the vertex's creationDate and language he speaks.
To verify the update, we can use REST calls.
Check Tiger Woods' creationDate and language he speaks.
Check KNOWS edges whose source is tiger woods.
You can use delete () function to delete edges and vertices in ACCUM and POST-ACCUM clauses.
Top-levels DELETE statements are not yet supported in SYNTAX v2.
Edges can only be deleted in the ACCUM clause.
For best performance, vertices should be deleted in the POST-ACCUM clause.
To perform within-SELECT deletes, the FROM pattern can only be a single hop, fixed length pattern.
Example 3. Delete vertex Tiger Woods and its KNOWS edges.
To verify the result, you can use built-in REST calls.
For a top-level statement, use ,
Inside an ACCUM or POST-ACCUM clause, use the statement.
So far, we have described pattern matching as one path pattern in a FROM clause. In this section, we introduce GSQL's capability to match multiple patterns in one FROM clause. This extension is called Conjunctive Pattern Matching (CPM), because the query asks for the conjunction (logical AND) of the patterns. To get a match, all of the patterns must be satisfied, and the patterns can interrelate. Visually, you can think of patterns formed by a set of intersecting line segments. This feature, introduced as a Beta feature in TigerGraph 3.0, enables you to express complex patterns concisely in a single query block.
In general, a CPM query block consists of multiple patterns in the FROM clause. It has a structure illustrated below.
We elaborate on each of the clause.
The SELECT clause selects only one vertex alias from all the patterns in the FROM clause.
This is where the conjunctive matching is expressed. The FROM clause consists of a list of path patterns, which are separated by commas. Evaluating each pattern against the underlying graph data produces a match table. If two patterns share a vertex alias, then we form the natural join of the two match tables.
For example, consider this CPM:
The first pattern's variables are x, e1, y, e2, and z; the second pattern's variables are z, e3, u, e4, and v. Considering the two patterns independently would yield the follwing match table schemas:
Natural joining two match tables compares all the shared vertex aliases between the two tables, and the resulting joined table contains all non-shared variables plus one copy of each of the shared vertex variables. Here is the match table for the CPM above:
The match table of the conjunctive pattern match is the natural join of all the patterns' match tables. By design, a row in the CPM match table must simultaneously satisfy all the match tables.
If the match tables of the patterns in a FROM clause can be naturally joined into one match table, then the FROM clause has a valid CPM input. Otherwise, the FROM clause has an invalid pattern input list.
For example, below we show two valid CPM inputs and one invalid CPM input.
The predicates in the WHERE clause can use any of the vertex or edge aliases in any of the patterns, including predicates which combine variables from different constituent paths. CPM queries do not have any special restrictions on the WHERE predicate. Distance matters, however, for performance. Conditions that are local, measured both cross-path and within-path, can be resolved earlier and therefore are faster.
In the example below, x2.age > x4.age
is a cross-pattern predicate, e1.timestamp < e3.timestamp
is a cross-pattern predicate, and x1.gender == x4.gender
is a local predicate of the second pattern.
You can ACCUM to any vertex variable in a CPM block.
The ACCUM clause by default will execute as many times as the row (match) count of the CPM match table; each execution uses one row from the match table.
POST-ACCUM for CPM behaves the same as POST-ACCUM for single path patterns. That is, each POST-ACCUM clause can refer to one vertex alias, and the clause executes iteratively over that vertex set (e.g. one vertex column in the matching table). Its statements can access the aggregated accumulator result computed in the ACCUM clause. The query can have multiple POST-ACCUM clauses, one for each vertex alias you wish to work on. The multiple POST-ACCUM clauses are processed in parallel; it doesn't matter in what order you write them. (For each binding, the statements within a clause are executed in order.)
For example, below we have three POST-ACCUM clauses. The first one iterates throughx1
, and for each x1, we do @@cnt += x1.@cnt
. The second and third POST-ACCUMs iterate through x2
and x3
respectively, and accumulates their @cnt
accumulator value into @@cnt
.
Example 1. Find Viktor Akhiezer's liked messages (100+ days after their creation) whose author's last name begin with letter S. Output the message's forum.
Example 2. Find any authors who wrote posts that Viktor Akhiezer's liked and whose last name begins with S. Find the country for each of these authors and report on the countries.
Example 3. Given a TagClass and a Country, find all the Forums created in the given Country, containing at least one Post with Tags belonging directly to the given TagClass. The location of a Forum is identified by the location of the Forum’s moderator.
Example 4. For a given country, count all the distinct triples of Persons such that:
a is a friend of b.
b is a friend of c
c is a friend of a.
Distinct means that if a certain 3 vertices appear once in the results, it will not be repeated: it will appear only once. KNOWS is an undirected relationship, so it doesn't matter in what order we list the 3 vertices.
More Examples. We translated LDBC-SNB BI and IC queries using CPM, and shared the translation in github. Please refer to the query translation here. Most of the queries are installed as functions, you can find sample parameter(s) of the functions from here.
As mentioned when we first described pattern matching, in One-hop patterns, the source (leftmost) vertex set can be a vertex type, an alternation of types, or even omitted.
Example 1. Find Viktor Akhiezer's favorite messages' creators whose last name begins with letter S. Count them.
Example 2. Same query as example 1, but without beginning with vertex types. GSQL compiler can infer the types of :s.
Example 3. Count the LIKES edge.
This section includes some advanced features related to pattern match. It includes using the PER clause to fine control the ACCUM execution, DML support in pattern match, and the conjunctive pattern match syntax which allows multiple patterns in one FROM clause. We dedicate a subsection for each topic.