SELECT Statement
This page discusses the SELECT statement in GSQL Syntax V2, which is the default syntax for TigerGraph version 3.5 and above.
To learn about the SELECT statement in the legacy V1 syntax, see SELECT Statement (Syntax V1)
|
Overview
The SELECT
statement is an assignment statement with a SELECT
block on the right-hand side.
The SELECT
block uses a path pattern to select some of the graph’s vertices and edges.
There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set.
The final output of a query is either a vertex set known as the result set or a table.
There is a maximum size limit of 2 GB for the result set of a SELECT
block.
If the result of the SELECT
block is larger than 2 GB, the system will return no data.
No error message is produced.
gsqlSelectBlock := gsqlSelectClause
fromClause
[sampleClause]
[whereClause]
[accumClause]
[postAccumClause]*
[havingClause]
[orderClause]
[limitClause]
gsqlSelectClause := vertexSetName "=" SELECT vertexAlias
The initial clause is the SELECT
clause: SELECT vertexAlias
.
Its purpose is to specify which set of vertices from the FROM
clause is to become the output.
The SELECT
clause may contain only one item: a vertex alias defined in the FROM
clause.
The vertex alias may be from anywhere in a multi-hop pattern, not only an endpoint.
The FROM
clause defines a path pattern to traverse in the graph, and each vertex in the path pattern can be given a vertex alias.
Thus, the SELECT
clause picks the set of vertices at one of these points in the pattern — the source vertices, the target vertices, or those from an interior point in a multi-hop path — to be the output vertices.
The SELECT
block has many optional clauses, which fit together in a logical flow.
Overall, the SELECT
block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices.
Along the way, computations can be performed on the selected vertices and edges.
FROM Clause (Syntax V3 - GQL Compatible)
The FROM clause in GSQL is used to define path patterns for traversing the graph structure. It is analogous to the MATCH clause in OpenCypher.
It has the form
FROM path_pattern
A path is a chain of vertices, linked by edges.
A path pattern is a template expression for a path.
Formally, it is a chain of vertex patterns, linked by edge patterns.
The FROM
clause searches for all paths that satisfy the path pattern.
Here is an example of a FROM clause:
FROM (s:Student) -[e:visited]-> (c:Country)
(s:Student)
and (c:Country)
are vertex patterns.
-[e:visited]->
is an edge pattern.
GSQL V3 supports the path pattern syntax of GQL as well as the GSQL V2 syntax. The language parser automatically detects which syntax you are using. |
Pattern Matching and Aliases
The path pattern in preceding section means find the students who visited at least one country.
When a query is executed, a FROM clause generates a binding table. In this example, the columns of the table are s
, e
, and c
, the three aliases. Each row of the table is a full set of aliases values that satisfies the pattern.
For example, if
(Maria) -[visited]-> (Germany) (Ahmed) -[visited]-> (Japan)
Aliases enable the subsequent parts of the query to refer to the pattern matches. For example s
is the set (Maria, Ahmed). Likewise c
is (Germany, Japan)
However, if a clause refers to multiple aliases, it will maintain the row-wise integrity of the table.
For example, if an expression refers to (s, c)
, it will yield (Maria, Germany) and (Ahmed, Japan).
It will not yield (Maria, Japan) or (Ahmed, Germany).
Vertex Patterns
A vertex pattern expresses a set of candidate vertices.
The fundamental notation is a pair of parentheses ( )
.
Within the parentheses is an alias, following by a colon, following by one or more vertex sets or vertex types, separated by vertical bars. This is optionally followed by a pattern filter.
(c:company)
Strictly speaking, the alias and vertex sets are optional, as shown in the table below.
Example | Explanation |
---|---|
|
All Account vertices, aliased with |
|
All Account vertices. No alias, so the matching vertices cannot be referred to later in the query. |
|
All vertices, aliased with |
|
All Account and Post vertices, aliased with |
|
All Post vertices which satisfy the filter of attribute |
Edge Patterns
An edge pattern expresses a set of candidate edges. Edge patterns are more complicated than vertex patterns because edges have directionality. The fundamental notation is a pair of square brackets [ ], which are themselves bracketed by directionality markers.
Within the square brackets, the syntax is analogous to that of vertex patterns: an alias, a set of edge types or edge sets, and an optional filter.
The table below lists the directionality markers.
Orientation | Example |
---|---|
Directed Left |
|
Directed Right |
|
Directed Left or Right |
|
Undirected |
|
Undirected or Directed Left |
|
Undirected or Directed Right |
|
Undirect, Directed Left, or Directed Right |
|
Pattern Quantifiers
GSQL has a special syntax for a repeated sequence of a pattern (usually repeated edges). This shorthand is useful when the pattern is focusing only on the relationships (e.g., number of hops) and does not need to refer to the vertices amidst the repeated edges. For example,
(s:Person{name:"Paul Erdos"}) -[:coauthor]-{1:3} (t:Person)
means find all the Persons (alias t
) who are within 3 degrees of coauthor connection with Paul Erdos.
GSQL V3 supports two notations for edge quantifiers, GQL style and the GSQL style originating in GSQL V2.
Orientation | Description | Example |
---|---|---|
|
from m to n repetitions |
|
|
m or more repetitions |
|
|
from 0 to n repetitions |
|
|
0 or more repetitions |
|
|
1 or more repetitions |
|
Orientation | Description | Example |
---|---|---|
|
from m to n repetitions |
|
|
m or more repetitions |
|
|
0 to n repetitions |
|
|
exactly n repetitions |
|
|
1 or more times |
|
Conjunctive Pattern Matching
The pattern we have described so far are all linear path patterns. You can combine two or more linear path patterns, simply by forming them as a comma-separated list. They form a conjunction, meaning all of them must be satisfied in order to have a valid match result.
The aliases used among them are shared.
So, if you use s
in one pattern and s
in another pattern, the are the same s
.
This enables you to combine the linear patterns to for a "V"-shaped or "X"-shaped pattern
For example:
(p:Person{name:"Paul Erdos"}) -[:coauthor]-{1:3} (t:Person), (k:Person{name:"Kevin Bacon"}) -[:worked_with]-{1:3} (t:Person)
Find Persons (alias t
) who both are within 3 coauthorship hops of Paul Erdos and within 3 coworker hops of Kevin Bacon.
SAMPLE
The SAMPLE
clause is an optional clause that selects a uniform random sample from the population of edges or target vertices specified in the FROM
argument.
Known issue: The SAMPLE clause is not supported in syntax V2 in 4.2.
To use SAMPLE clauses, write your query in syntax V1.
|
If you want to sample from a set of vertices directly, not from edges or from neighboring (target) vertices, then the following technique is simpler and faster: Select k random vertices from a vertex set S
|
The SAMPLE
clause draws from the edge population consisting of those edges which satisfy all three parts — source set, edge type, and target type — of the FROM
clause.
The SAMPLE
clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree.
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition (1)
| SAMPLE expr TARGET WHEN condition (2)
| SAMPLE expr "%" TARGET PINNED WHEN condition (3)
1 | Sample an absolute number (or a percentage) of edges for each source vertex. |
2 | Sample an absolute number of edges incident to each target vertex. |
3 | Sample a percentage of edges incident to each target vertex. |
The expression following SAMPLE
specifies the sample size, either an absolute number or a percentage of the population.
The expression in a SAMPLE
clause must evaluate to a positive integer.
There are two sampling methods:
-
Sampling based on edge ID
-
Sampling based on target vertex ID: if a target vertex ID is sampled, all edges from this source vertex to the sampled target vertex are sampled.
Currently, the |
Given that the sampling is random, some details of each of the example queries may change each time they are run.
The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges from a source vertex. We use the Computer_Net graph (see Appendix D). In Computer_Net, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub nodes, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.
CREATE QUERY sample_ex_3() FOR GRAPH Computer_Net {
// record each selected edge as (src->tgt)
MapAccum<STRING,ListAccum<STRING>> @@abs_edges;
SumAccum<INT> @@total_abs;
// record each selected edge as (src->tgt)
MapAccum<STRING,ListAccum<STRING>> @@pct_edges;
SumAccum<INT> @@total_pct;
start = {Computer.*};
// Sample one outgoing edge per source vertex = Random Walk
abs_sample = SELECT v FROM (s:start) -[e]- (v)
SAMPLE 1 EDGE WHEN s.outdegree() >= 1 // sample 1 target vertex from each source vertex
ACCUM
@@abs_edges += (s.id -> v.id),
@@total_abs += 1;
PRINT @@total_abs, @@abs_edges;
pct_sample = SELECT v FROM start:s -(:e)- :v
SAMPLE 33% EDGE WHEN s.outdegree() >= 3 # select ~1/3 of edges when outdegree >= 3
ACCUM
@@pct_edges += (s.id -> v.id),
@@total_pct += 1;
PRINT @@total_pct, @@pct_edges;
}
RUN QUERY sample_ex_3()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [
{
"@@total_abs": 64,
"@@abs_edges": {
"c31": ["c23"],
"c30": ["c23"],
"c11": [
"c12",
"c10"
],
"c10": [
"c11",
"c1"
],
"c13": ["c12"],
"c12": [
"c13",
"c14",
"c21",
"c16",
"c19",
"c17",
"c18",
"c22",
"c20",
"c15",
"c11"
],
"c15": ["c12"],
"c14": [
"c23",
"c24",
"c12"
],
"c17": ["c12"],
"c16": ["c12"],
"c19": ["c12"],
"c18": ["c12"],
"c20": ["c12"],
"c22": ["c12"],
"c21": ["c12"],
"c24": [
"c14",
"c23"
],
"c23": [
"c29",
"c31",
"c27",
"c26",
"c25",
"c24",
"c28",
"c30",
"c14",
"c4"
],
"c26": ["c23"],
"c25": ["c23"],
"c28": ["c23"],
"c27": ["c23"],
"c29": ["c23"],
"c1": [
"c7",
"c4",
"c3",
"c2",
"c10",
"c5",
"c9",
"c8",
"c6"
],
"c2": ["c1"],
"c3": ["c1"],
"c4": [
"c23",
"c1"
],
"c5": ["c1"],
"c6": ["c1"],
"c7": ["c1"],
"c8": ["c1"],
"c9": ["c1"]
}
},
{
"@@total_pct": 64,
"@@pct_edges": {
"c31": ["c23"],
"c30": ["c23"],
"c11": [
"c12",
"c10"
],
"c10": [
"c11",
"c1"
],
"c13": ["c12"],
"c12": [
"c13",
"c14",
"c21",
"c16",
"c19",
"c17",
"c18",
"c22",
"c20",
"c15",
"c11"
],
"c15": ["c12"],
"c14": [
"c23",
"c24",
"c12"
],
"c17": ["c12"],
"c16": ["c12"],
"c19": ["c12"],
"c18": ["c12"],
"c20": ["c12"],
"c22": ["c12"],
"c21": ["c12"],
"c24": [
"c14",
"c23"
],
"c23": [
"c29",
"c31",
"c27",
"c26",
"c25",
"c24",
"c28",
"c30",
"c14",
"c4"
],
"c26": ["c23"],
"c25": ["c23"],
"c28": ["c23"],
"c27": ["c23"],
"c29": ["c23"],
"c1": [
"c7",
"c4",
"c3",
"c2",
"c10",
"c5",
"c9",
"c8",
"c6"
],
"c2": ["c1"],
"c3": ["c1"],
"c4": [
"c23",
"c1"
],
"c5": ["c1"],
"c6": ["c1"],
"c7": ["c1"],
"c8": ["c1"],
"c9": ["c1"]
}
}
]
}
Below is an example of using SELECT
to only traverse one edge for each source vertex.
The vertex-attached accumulators @times_traversed_no_sample
and @times_traversed_with_sample
are used to keep track of the number of times an edge is traversed to reach the target vertex.
Without using sampling, this occurs once for each edge; thus @times_traversed_no_sample
has the same number as the in-degree of the vertex.
With sampling edges, the number of edges is restricted.
This is reflected in the @times_traversed_with_sample
accumulator.
Notice the difference in the result set.
Because only one edge per source vertex is traversed when the SAMPLE
clause is used, not all target vertices are reached.
The vertex company3
has 3 incident edges, but in one instance of the query execution, it is never reached.
Additionally, company2
has 6 incident edges, but only 4 source vertices sampled an edge incident to company2
.
SAMPLE
using an absolute number of edgesCREATE QUERY sample_ex_1() FOR GRAPH Work_Net {
SumAccum<INT> @times_traversed_no_sample;
SumAccum<INT> @times_traversed_with_sample;
workers = {Person.*};
// the 'before_sample' result set encapsulates the normal functionality of
// a SELECT statement, where 'times_traversed_no_sample' vertex accumulator is increased for
// each edge incident to the vertex.
before_sample = SELECT v FROM (t:workers) -[e]- (v)
ACCUM v.@times_traversed_no_sample += 1;
// The 'after_sample' result set is formed by those vertices which can be
// reached when for each source vertex, only one edge is used for traversal.
// This is demonstrated by the values of 'times_traversed_with_sample' vertex accumulator, which
// is increased for each edge incident to the vertex which is used in the
// sample.
after_sample = SELECT v FROM (t:workers) -[e]- (v)
SAMPLE 1 EDGE WHEN t.outdegree() >= 1 // only use 1 edge from the source vertex
ACCUM v.@times_traversed_with_sample += 1;
PRINT before_sample;
PRINT after_sample;
}
GSQL > RUN QUERY sample_ex_1()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [
{"before_sample": [
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 6,
"id": "company2"
},
"v_type": "Company"
},
{
"v_id": "company4",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company4"
},
"v_type": "Company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 3,
"id": "company3"
},
"v_type": "Company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 6,
"@times_traversed_no_sample": 6,
"id": "company1"
},
"v_type": "Company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company5"
},
"v_type": "Company"
}
]},
{"after_sample": [
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 6,
"id": "company2"
},
"v_type": "Company"
},
{
"v_id": "company4",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company4"
},
"v_type": "Company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 3,
"id": "company3"
},
"v_type": "Company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company5"
},
"v_type": "Company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 6,
"@times_traversed_no_sample": 6,
"id": "company1"
},
"v_type": "Company"
}
]}
]
}
Since the |
WHERE
The WHERE
clause is an optional clause that constrains edges and vertices specified in the FROM
and SAMPLE
clauses.
WHERE
ClausewhereClause := WHERE condition
The WHERE
clause uses a boolean condition to test each vertex or edge in the FROM
set (or the sampled vertex and edge sets, if the SAMPLE
clause was used).
If the expression evaluates to false for vertex/edge X, then X is excluded from further consideration in the result set.
The expression may use constants or any variables or parameters within the scope of the SELECT
block.
The expression may use arithmetic operators, comparison operators, boolean operators, set operators and parentheses to enforce precedence.
The WHERE
conditional expression may use any of the variables within its scope (global accumulators, vertex set variables, query input parameters, the FROM
clause’s vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets.)
For a more formal explanation of condition, see the EBNF definitions of condition
and expr
.
Using built-in vertex and edge attributes and functions, such as .type
and .neighbors()
, the WHERE
clause can be used to implement sophisticated selection rules for the edge traversal.
In the following example, the selection conditions are completely specified in the WHERE
clause, with no edge types or vertex types mentioned in the FROM
clause.
WHERE
used as a filterresult_set1 = SELECT v FROM (v:S) -[e:E1|E2|E3]- (t:V1|V2);
result_set2 = SELECT v FROM (v:S) -[e:]- (t)
WHERE t.type IN ("V1", "V2") AND t IN v.neighbors("E1|E2|E3");
The following examples demonstrate using the WHERE
clause to limit the resulting vertex set based on a vertex attribute.
SELECT WHERE
CREATE QUERY print_cat_posts() FOR GRAPH Social_Net {
cat_posts = SELECT v FROM (v:Post) // select only those post vertices
WHERE v.subject == "cats"; // which have a subset of 'cats'
PRINT cat_posts;
}
GSQL > RUN QUERY print_cat_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"cat_posts": [
{
"v_id": "8",
"attributes": {
"subject": "cats",
"post_time": "2011-02-03 17:05:52"
},
"v_type": "Post"
},
{
"v_id": "3",
"attributes": {
"subject": "cats",
"post_time": "2011-02-05 01:02:44"
},
"v_type": "Post"
},
{
"v_id": "9",
"attributes": {
"subject": "cats",
"post_time": "2011-02-05 23:12:42"
},
"v_type": "Post"
},
{
"v_id": "10",
"attributes": {
"subject": "cats",
"post_time": "2011-02-04 03:02:31"
},
"v_type": "Post"
},
{
"v_id": "11",
"attributes": {
"subject": "cats",
"post_time": "2011-02-03 01:02:21"
},
"v_type": "Post"
}
]}]
}
CREATE QUERY find_graph_focused_posts() FOR GRAPH Social_Net {
results = SELECT v FROM (v:Post) // select only post vertices
WHERE v.subject IN ("Graph", "tigergraph"); // which have a subject of either 'Graph' or 'tigergraph'
PRINT results;
}
GSQL > RUN QUERY find_graph_focused_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"results": [
{
"v_id": "1",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-03-03 23:02:00"
},
"v_type": "Post"
},
{
"v_id": "6",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-02-05 02:02:05"
},
"v_type": "Post"
},
{
"v_id": "5",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-02-06 01:02:02"
},
"v_type": "Post"
}
]}]
}
The |
The following example shows the equivalence of using WHERE
as a type filter as well as its limitations.
SELECT
statement with WHERE
clause using AND
/OR
CREATE QUERY find_female_members() FOR GRAPH Social_Net
{
// Finds female person in the social network. all of the following statements
// are equivalent (i.e., produce the same results)
all_vertices = {ANY}; # includes all posts and person
females = SELECT v FROM (v:all_vertices)
WHERE v.type == "Person" AND v.gender != "Male";
females = SELECT v FROM (v:all_vertices)
WHERE v.type == "Person" AND v.gender == "Female";
females = SELECT v FROM (v:all_vertices)
WHERE v.type == "Person" AND
NOT v.gender == "Male";
females = SELECT v FROM (v:all_vertices)
WHERE v.type != "Post" AND
NOT v.gender == "Male";
/* does not compile. cannot use NOT operator in combination with type attribute
females = SELECT v FROM (v:all_vertices)
WHERE NOT v.type != "Person" AND
NOT v.gender == "Male";
does not compile. cannot use NOT operator in combination with type attribute
females = SELECT v FROM (v:all_vertices)
WHERE NOT v.type == "Post" AND
NOT v.gender == "Male";
person_vertices = {Person.*};
females = SELECT v FROM (person_vertices:v)
WHERE NOT v.gender == "Male";
females = SELECT v FROM (person_vertices:v)
WHERE v.gender != "Male";
females = SELECT v FROM (person_vertices:v)
WHERE v.gender != "Male" AND true;
females = SELECT v FROM (person_vertices:v)
WHERE v.gender != "Male" OR false;
PRINT females;
}
GSQL > RUN QUERY find_female_members()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"females": [
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"id": "person2"
},
"v_type": "Person"
}
]}]
}
The following example uses edge attributes to determine which workers are registered as full time for any company.
WHERE
using edge attributesCREATE QUERY full_time_workers() FOR GRAPH Work_Net {
// find all workers who are full time at some company
start = {Person.*};
full_time_workers = SELECT v FROM (v:start) -[e:Works_For]- (t:Company)
WHERE e.full_time; // full_time is a boolean attribute on the edge
PRINT full_time_workers;
}
full_time_workers
ResultsGSQL > RUN QUERY full_time_workers()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"full_time_workers": [
{
"v_id": "person4",
"attributes": {
"interestList": ["football"],
"skillSet": [ 10, 1, 4 ],
"skillList": [ 4, 1, 10 ],
"locationId": "us",
"interestSet": ["football"],
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"interestList": [ "sport", "football" ],
"skillSet": [10],
"skillList": [10],
"locationId": "can",
"interestSet": [ "football", "sport" ],
"id": "person11"
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"interestList": [ "football", "sport" ],
"skillSet": [3],
"skillList": [3],
"locationId": "us",
"interestSet": [ "sport", "football" ],
"id": "person10"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"interestList": [ "management", "financial" ],
"skillSet": [ 3, 2, 1 ],
"skillList": [ 1, 2, 3 ],
"locationId": "us",
"interestSet": [ "financial", "management" ],
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"interestList": [ "music", "art" ],
"skillSet": [ 10, 7 ],
"skillList": [ 7, 10 ],
"locationId": "jp",
"interestSet": [ "art", "music" ],
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"interestList": ["engineering"],
"skillSet": [ 6, 5, 3, 2 ],
"skillList": [ 2, 3, 5, 6 ],
"locationId": "chn",
"interestSet": ["engineering"],
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"interestList": ["management"],
"skillSet": [ 2, 5, 1 ],
"skillList": [ 1, 5, 2 ],
"locationId": "chn",
"interestSet": ["management"],
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {
"interestList": [
"music",
"engineering",
"teaching",
"teaching",
"teaching"
],
"skillSet": [ 2, 5, 1 ],
"skillList": [ 1, 5, 2, 2, 2 ],
"locationId": "jp",
"interestSet": [ "teaching", "engineering", "music" ],
"id": "person12"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"interestList": ["teaching"],
"skillSet": [ 6, 1, 4 ],
"skillList": [ 4, 1, 6 ],
"locationId": "jp",
"interestSet": ["teaching"],
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person9",
"attributes": {
"interestList": [ "financial", "teaching" ],
"skillSet": [ 2, 7, 4 ],
"skillList": [ 4, 7, 2 ],
"locationId": "us",
"interestSet": [ "teaching", "financial" ],
"id": "person9"
},
"v_type": "Person"
}
]}]
}
If multiple edge types are specified in edge-induced selection, the Multiple Edge Type
WHERE clause
The above query is compilable.
However, if we use line 5 as the |
ACCUM
The ACCUM
clause enables sophisticated aggregation and other computations across the set of vertices or edges selected by the preceding FROM
, SAMPLE
, and WHERE
clauses.
Syntax
The primary purpose of the ACCUM
clause is to collect information about the graph by updating accumulators (via +=
or =
).
However, other kinds of statements (e.g., branching, iteration, local assignments) are permitted to support more complex computations or to log activity.
The EBNF syntax below defines the allowable kinds of statements that can occur within an ACCUM
clause.
ACCUM
clauseaccumClause := [perClauseV2] ACCUM dmlSubStmtList (1)
dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*
dmlSubStmt := assignStmt // Assignment (2) (3)
| funcCallStmt // Function Call
| gAccumAccumStmt // Assignment
| lAccumAccumStmt // Assignment
| attrAccumStmt // Assignment
| vAccumFuncCall // Function Call
| localVarDeclStmt // Declaration
| dmlSubCaseStmt // Control Flow
| dmlSubIfStmt // Control Flow
| dmlSubWhileStmt // Control Flow
| dmlSubForEachStmt // Control Flow
| BREAK // Control Flow
| CONTINUE // Control Flow
| insertStmt // Data Modification
| dmlSubDeleteStmt // Data Modification
| printlnStmt // Output
| logStmt // Output
1 | DML-sub-statements do not include global accumulator assignment statement (gAccumAssignStmt) but global accumulator accumulation statement (gAccumAccumStmt).
Global accumulators may perform accumulation += but not assignment = within an ACCUM clause. |
2 | Global variable assignment is permitted in an ACCUM clause, but the change in value will not take place until the query completes.
Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect. |
3 | Vertex attribute assignment = is not permitted in an ACCUM clause. However, edge attribute assignment is permitted.
This is because the ACCUM clause iterates over an edge set.
Vertex attribute assignment is permitted in the POST-ACCUM clause.
Like all updates, the change in value does not take place until the query completes. |
Iteration model
The ACCUM
clause is executed once for each set of vertices and edges in the graph which match the pattern and constraints given in the FROM
and WHERE
clauses.
You can think of FROM-WHERE
as producing a virtual table.
The columns of this matching table are the alias variables from the FROM
clause pattern, and the rows are each possible set of vertex and edge aliases (e.g. a path) which fit the pattern.
For a simple 1-hop pattern below:
FROM Person:A -(IS_LOCATED_IN:B)- City:C
The above FROM
clause produces a match table with 3 columns: A, B, and C.
Each row is a tuple (A,B,C) where there is a has_lived_in
edge B from a Person
vertex A to a City
vertex C.
We say that the match table provides a binding between the pattern aliases and graph’s vertices and edges.
A multi-hop pattern simply has more columns than a 1-hop pattern.
Since the ACCUM
clause iterates over edges, and often two edges will connect to the same source vertex or to the same target vertex, the ACCUM
clause can be repeated multiple times for one vertex.
Operations that are to be performed exactly once per vertex should be performed in the POST-ACCUM
clause.
Parallelism in ACCUM
clause
TigerGraph uses parallelism to improve performance.
The statements within the ACCUM
clause are executed sequentially for a given vertex or edge.
However, there is no fixed order in which a vertex set or edge set is processed.
Any inspection of an intermediate result within the ACCUM
clause is incomplete and may not be that meaningful.
Due to the parallel nature of the statements within an ACCUM
clause, you cannot reference updated variables within an ACCUM
clause since there is no guarantee that the variable was indeed updated when it’s referenced.
For example, in the following query, the second statement in the ACCUM
clause does not necessarily run after the first statement.
CREATE QUERY accum_wrong_example() SYNTAX v3 {
SumAccum<INT> @@count_total;
SumAccum<INT> @active_flag = 0;
result = SELECT p
FROM (p:Person) -[:KNOWS]- (w:Person)
WHERE w.lastName == "Wang" AND p.firstName == "Peter"
ACCUM p.@active_flag += 1,
@@count_total += p.@active_flag; (1)
PRINT @@count_total, result[result.@active_flag];
}
1 | This is the un-updated value of @active_flag. The read value of an accumulator does not change until the ACCUM phase ends. |
The correct way to execute the logic is to put the second statement in the POST-ACCUM
clause.
This guarantees that the second statement executes after the first statement:
CREATE QUERY accum_example() SYNTAX v3 {
SumAccum<INT> @@count_total;
SumAccum<INT> @active_flag = 0;
result = SELECT p
FROM (p:Person) -[:KNOWS]- (w:Person)
WHERE w.lastName == "Wang" AND p.firstName == "Peter"
ACCUM p.@active_flag += 1
POST-ACCUM @@count_total += p.@active_flag;
PRINT @@count_total, result[result.@active_flag];
}
The For example, consider the following clauses:
This finds the friends of the friends of |
Edge/Vertex type inference and conflict
If multiple edge types are specified in an ACCUM
clause, each ACCUM
statement in the ACCUM
clause checks whether edge types are conflicted.
If only a subset of edge types are effective in an ACCUM
statement, this statement is not executed on other edge types.
For example:
ACCUM
statement checkCREATE QUERY multiple_edge_type_check_ex (VERTEX<Person> m1) FOR GRAPH Social_Net {
ListAccum<STRING> @@test_list_1, @@test_list_2, @@test_list_3;
all_user = {m1};
all_user = SELECT s
FROM (s:all_user) -[e:Posted|Liked|Friend]- (t:Post|Person)
ACCUM @@test_list_1 += to_string(datetime_to_epoch(e.action_time)),
@@test_list_2 += t.gender,
@@test_list_3 += to_string(datetime_to_epoch(e.action_time)) + t.gender (1)
;
PRINT @@test_list_1, @@test_list_2, @@test_list_3;
}
1 | This statement causes a compilation error. |
In the above example, line 6 is only executed on Liked
edges, because action_time
is the attribute of liked
edge only.
Similarly, line 7 is only executed on Friend
edges, because gender
is the attribute of person
only, and only Friend
edge uses person
as target vertex.
However, line 8 causes a compilation error, because it uses multiple edges where some edges cannot be supported in a part of the statement, i.e., liked
edges doesn’t have t.gender
, Friend
edges doesn’t have e.action_time
.
We strongly suggest that if multiple edge types are specified in edge-induced selection, Multiple Edge Type
ACCUM statement check 2
The above query is compilable.
However, if we switch line 8 and line 10, the edge-type conflict checking generates errors because |
Examples
This example uses ACCUM
to find all the subjects a user posted about.
CREATE QUERY user_posts() FOR GRAPH Social_Net {
// For each person, make a list of all their post subjects
ListAccum<STRING> @person_posts;
start = {Person.*};
// Find all user post topics and append them to the vertex list accum
user_postings = SELECT s FROM (s:start) -[:Posted]- (g)
ACCUM s.@person_posts += g.subject;
PRINT user_postings;
}
GSQL > RUN QUERY user_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"user_postings": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@person_posts": [
"tigergraph",
"cats"
],
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@person_posts": [
"coffee",
"cats"
],
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@person_posts": ["Graphs"],
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@person_posts": ["cats"],
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@person_posts": ["tigergraph"],
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@person_posts": [
"cats",
"Graphs"
],
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@person_posts": [
"tigergraph",
"cats"
],
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@person_posts": ["query languages"],
"id": "person3"
},
"v_type": "Person"
}
]}]
}
This example shows each person’s posted vertices and each person’s like behaviors (liked edges).
ACCUM<VERTEX>
and ACCUM<EDGE>
ExampleCREATE QUERY user_posts_2() FOR GRAPH Social_Net {
// Show each user's post and liked post time
ListAccum<VERTEX> @person_posts;
ListAccum<EDGE> @person_liked_info;
start = {Person.*};
// Find all user post topics and append them to the vertex list accum
user_postings = SELECT s FROM (s:start) -[:Posted]- (g)
ACCUM s.@person_posts += g;
user_postings = SELECT s FROM (s:start) -[e:Liked]- (g)
ACCUM s.@person_liked_info += e;
PRINT start;
}
GSQL > RUN QUERY user_posts2()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"start": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@person_posts": [
"6",
"9"
],
"id": "person7",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person7",
"to_id": "10",
"attributes": {"action_time": "2010-01-12 11:22:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@person_posts": [
"4",
"11"
],
"id": "person5",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person5",
"to_id": "6",
"attributes": {"action_time": "2010-01-12 21:12:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@person_posts": ["0"],
"id": "person1",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person1",
"to_id": "0",
"attributes": {"action_time": "2010-01-11 11:32:00"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@person_posts": ["3"],
"id": "person4",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person4",
"to_id": "4",
"attributes": {"action_time": "2010-01-13 03:16:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@person_posts": ["1"],
"id": "person2",
"@person_liked_info": [
{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person2",
"to_id": "0",
"attributes": {"action_time": "2010-01-12 10:52:15"},
"e_type": "Liked"
},
{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person2",
"to_id": "3",
"attributes": {"action_time": "2010-01-11 16:02:26"},
"e_type": "Liked"
}
]
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@person_posts": [
"5",
"10"
],
"id": "person6",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person6",
"to_id": "8",
"attributes": {"action_time": "2010-01-14 11:23:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@person_posts": [
"8",
"7"
],
"id": "person8",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person8",
"to_id": "4",
"attributes": {"action_time": "2010-01-11 03:26:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@person_posts": ["2"],
"id": "person3",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person3",
"to_id": "0",
"attributes": {"action_time": "2010-01-16 05:15:53"},
"e_type": "Liked"
}]
},
"v_type": "Person"
}
]}]
}
This example counts the total number of times each topic is used.
CREATE QUERY user_posts_by_topic() FOR GRAPH Social_Net {
//Show number of total posts by topic
MapAccum<STRING, INT> @@post_topic_counts;
start = {Person.*};
// Append subject and update the appearance count in the global map accum
posts = SELECT g FROM (:start) -[:Posted]- (g)
ACCUM @@post_topic_counts += (g.subject -> 1);
PRINT @@post_topic_counts;
}
GSQL > RUN QUERY user_posts_by_topic()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@post_topic_counts": {
"cats": 5,
"coffee": 1,
"Graphs": 2,
"query languages": 1,
"tigergraph": 3
}}]
}
POST-ACCUM
The optional POST-ACCUM
clause enables aggregation and other computations across the set of vertices (but not edges) selected by the preceding clauses.
POST-ACCUM
can be used without ACCUM
.
If it is preceded by an ACCUM
clause, then it can be used for 2-stage accumulative computation: a first stage in ACCUM
followed by a second stage in POST-ACCUM
.
When you reference a vertex alias in a DML-sub statement in a POST-ACCUM
statement, you bind that vertex alias to the POST-ACCUM
clause implicitly.
You can also explicitly bind a vertex alias with a POST-ACCUM
clause by putting the vertex alias in parentheses immediately after the keyword POST-ACCUM
.
Each POST-ACCUM
clause must be bound with one and only one vertex alias.
A SELECT
statement can have multiple POST-ACCUM
clauses.
If you need to run aggregation and other computations by referencing more than one vertex alias, you can use more than one POST-ACCUM
clause.
Each POST-ACCUM
clauses are processed in parallel; it doesn’t matter in what order you write them.
Syntax
POST-ACCUM
clausepostAccumClause := POST-ACCUM ["(" vertexAlias ")"] dmlSubStmtList
dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*
dmlSubStmt := assignStmt // Assignment
| funcCallStmt // Function Call
| gAccumAccumStmt // Assignment
| lAccumAccumStmt // Assignment
| attrAccumStmt // Assignment
| vAccumFuncCall // Function Call
| localVarDeclStmt // Declaration
| dmlSubCaseStmt // Control Flow
| dmlSubIfStmt // Control Flow
| dmlSubWhileStmt // Control Flow
| dmlSubForEachStmt // Control Flow
| BREAK // Control Flow
| CONTINUE // Control Flow
| insertStmt // Data Modification
| dmlSubDeleteStmt // Data Modification
| printlnStmt // Output
| logStmt // Output
Iteration model
The ACCUM
clause executes for each full path that matches the pattern in the FROM
clause.
In contrast, the POST-ACCUM
clause executes for each vertex in one vertex set; its statements can access the aggregated accumulator result computed in the ACCUM
clause.
You can think of the matching result of the FROM
clause and the WHERE
clause as a virtual table.
The columns of this matching table are the alias variables from the FROM
clause pattern, and the rows are each possible set of vertex and edge aliases (e.g. a path) which fit the pattern.
A POST-ACCUM
clause acts like a FOREACH
loop on the columns of the vertex result set specified in the SELECT
clause and only occurs once for each vertex.
To perform per-vertex updates for more than one vertex alias, use a separate POST-ACCUM
clause for each vertex alias.
For example, below we have three POST-ACCUM
clauses.
INTERPRET QUERY () {
SumAccum<INT> @cnt1;
SumAccum<INT> @cnt2;
SumAccum<INT> @@global_t_count;
R = SELECT s
FROM (s:Person) -[:Likes]\-> (:msg) -[:Has_Creator]\-> (t:Person)
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
AND t.last_name LIKE "S%" AND year(msg.creation_date) == 2012
ACCUM s.@cnt1 +=1 //execute this per match of the FROM pattern.
POST-ACCUM s.@cnt2 += s.@cnt1 (1)
POST-ACCUM t.@cnt2 +=1 (2)
POST-ACCUM(t) @@global_t_count += 1; (3)
PRINT R [R.first_name, R.last_name, R.@cnt1, R.@cnt2];
}
1 | The first one iterates through s , and for each s , we do s.@cnt2 += s.@cnt1 . |
2 | The second POST-ACCUM iterates through t . |
3 | The third POST-ACCUM also iterates through t , but its DML-sub statements do not reference t .
Rather, the POST-ACCUM is bound to t explicitly. |
However, the following is not allowed, since it involves two aliases (t and s) in one POST-ACCUM
clause.
POST-ACCUM t.@cnt1 += 1,
s.@cnt1 += 1
Also, you may not use more than one alias in a single assignment. The following is not allowed:
POST-ACCUM t.@cnt1 += s.@cnt + 1
Multiple POST-ACCUM
clauses
A SELECT
statement can have multiple POST-ACCUM
clauses. Each POST-ACCUM
may refer to only one vertex alias.
See the POST-ACCUM
section in the Pattern Matching tutorial for more details.
Examples
This is an example of using ACCUM
and POST-ACCUM
in conjunction.
The ACCUM
traverses the graph and finds all people who live and work in the same country.
After this is determined, POST-ACCUM
examines each vertex (person) to see if they work where they live.
POST-ACCUM
ExampleCREATE QUERY resident_employees() FOR GRAPH Work_Net {
// Show all persons who both work and live in the same country
ListAccum<STRING> @company;
OrAccum @works_and_lives;
start = {Person.*};
employees = SELECT s FROM (s:start) -[:Works_For]- (c)
// If a person works for a company in the same country where they live, add the company to the list
ACCUM CASE
WHEN (s.location_id == c.country) THEN
s.@company += c.id
END
// Check each vertex and see if a person works where they live
POST-ACCUM CASE
WHEN (s.@company.size() > 0) THEN
s.@works_and_lives += TRUE
ELSE
s.@works_and_lives += FALSE
END;
PRINT employees WHERE (employees.@works_and_lives == TRUE);
}
GSQL > RUN QUERY resident_employees()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"employees": [
{
"v_id": "person1",
"attributes": {
"skill_set": [
1,
2,
3
],
"skill_list": [
1,
2,
3
],
"@works_and_lives": true,
"interest_list": [
"management",
"financial"
],
"id": "person1",
"interest_set": [
"financial",
"management"
],
"location_id": "us",
"@company": ["company1"]
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"skill_set": [
2,
3,
5,
6
],
"skill_list": [
2,
3,
5,
6
],
"@works_and_lives": true,
"interest_list": ["engineering"],
"id": "person2",
"interest_set": ["engineering"],
"location_id": "chn",
"@company": ["company2"]
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"skill_set": [10],
"skill_list": [10],
"@works_and_lives": true,
"interest_list": [
"sport",
"football"
],
"id": "person11",
"interest_set": [
"football",
"sport"
],
"location_id": "can",
"@company": ["company5"]
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"skill_set": [3],
"skill_list": [3],
"@works_and_lives": true,
"interest_list": [
"football",
"sport"
],
"id": "person10",
"interest_set": [
"sport",
"football"
],
"location_id": "us",
"@company": ["company1"]
},
"v_type": "Person"
}
]}]
}
This is an example of a POST-ACCUM
only that counts the number of people with a particular gender.
CREATE QUERY person_gender(STRING gender) FOR GRAPH Social_Net {
// Count the number of persons of a given gender
SumAccum<INT> @@gender_count;
start = {ANY};
// Select all person vertices and check the gender attribute
friends = SELECT v FROM (v:start)
WHERE v.type == "Person"
POST-ACCUM (v) CASE
WHEN (v.gender == gender) THEN
@@gender_count += 1
END;
PRINT @@gender_count;
}
GSQL > RUN QUERY person_gender("Female")
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@gender_count": 3}]
}
Updating vertex-attached accumulators
Vertices referenced via a vertex-attached accumulator of a selected vertex may have their vertex-attached accumulators updated in the ACCUM
clause (but not in the POST-ACCUM
clause).
That is, a vertex referenced by a selected vertex can be updated, with some limitations explained below.
Some examples will help to illustrate this more complex condition.
-
Suppose a query declares a vertex-attached accumulator which holds vertex information. We call this a vertex-holding accumulator. This could take several forms:
-
A scalar accumulator, e.g.,
MaxAccum< VERTEX > @maxV
; -
A collection accumulator: e.g.,
ListAccum< VERTEX > @listV
; -
An accumulator containing tuple(s), where the tuple type contains a
VERTEX
field.
-
-
If a vertex
V
is selected, then not only canV’s accumulators be updated, but the vertices stored in its vertex-holding accumulators can also be updated, in the `ACCUM
clause. -
Before these indirectly referenced vertices can be used, they need to be activated. There are two ways to activate an indirect vertex:
-
A vertex from a vertex-holding accumulator is first assigned to a local vertex variable. The vertex can now be updated through the local vertex variable.
-
ACCUM
vertex<Person> mx = tgt.@max_v, # assign to local variable
mx.@cur_id += src.id # access via local variable
-
A FOREACH loop can iterate on a vertex-holding collection accumulator. The vertices can now be updated through the loop variable.
ACCUM
FOREACH vtx IN src.@set_ids DO # iterate on collection accumulator
vtx.@cur_id += tgt.id # access via loop variable
END
The following uses are NOT supported:
|
The following query demonstrates updates to indirectly activated vertices.
CREATE QUERY v_update_indirect_accum() FOR GRAPH Social_Net {
SetAccum<vertex<Person>> @posters;
SetAccum<vertex<Person>> @fellows;
persons = {Person.*};
// To each post, attach a list of persons who liked the post
liked_posts = SELECT p
FROM (src:persons) -[e:Liked]\-> (p:Post)
ACCUM
p.@posters += src;
// To each person who liked a post, attach a list of everyone
// who also liked one of this person's liked posts.
liked_posts = SELECT src
FROM (src:liked_posts)
ACCUM
FOREACH v IN src.@posters DO
v.@fellows += src.@posters
END
ORDER BY src.subject;
PRINT persons[persons.@fellows];
}
GSQL > RUN QUERY v_update_indirect_accums()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"persons": [
{
"v_id": "person7",
"attributes": {"persons.@fellows": ["person7"]},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {"persons.@fellows": ["person5"]},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {"persons.@fellows": [
"person4",
"person8"
]},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {"persons.@fellows": [
"person4",
"person8"
]},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {"persons.@fellows": ["person6"]},
"v_type": "Person"
}
]}]
}
PER
The PER
clause is an optional prefix to an ACCUM
clause, affecting only that clause.
The FROM
clause of a SELECT
statement produces a match table.
The PER
clause allows the user to specify that they wish to aggregate the match table, so that there is one row per alias.
For more information see the PER
Clause section in the Pattern Matching tutorial.
HAVING
The optional HAVING
clause provides constraints on the result set of the SELECT
statement.
The constraints are applied after ACCUM
and POST-ACCUM
actions.
This differs from the WHERE
clause, which is applied before the ACCUM
and POST-ACCUM
actions.
HAVING
ClausehavingClause := HAVING condition
The condition in a HAVING
clause is applied to each vertex in the SELECT
set (either source or target vertices) which also fulfilled the FROM
and WHERE
conditions.
The HAVING
clause is intended to test one or more of the accumulator variables that were updated in the ACCUM
or POST-ACCUM
clause, though the condition may be anything that equates to a boolean value.
If the condition is false for a particular vertex, then that vertex is excluded from the result set.
The following example demonstrates using the HAVING
clause to constrain a result set based on the vertex accumulator variable which was updated during the ACCUM
clause.
The following query finds all persons meeting a given activity threshold, based on how many posts or likes a person has made.
CREATE QUERY active_members (INT activity_threshold) FOR GRAPH Social_Net {
SumAccum<INT> @activity_amount;
start = {Person.*};
result = SELECT v FROM (v:start) -[e]- (tgt:Post)
ACCUM v.@activity_amount +=1
HAVING v.@activity_amount >= activity_threshold;
PRINT result;
}
If the activity_threshold
parameter is set to 3, the query returns 5 vertices:
GSQL > RUN QUERY active_members(3)
CREATE QUERY active_members (INT activity_threshold) FOR GRAPH Social_Net {
SumAccum<INT> @activity_amount;
start = {Person.*};
result = SELECT v FROM (v:start) -[e]- (tgt:Post)
ACCUM v.@activity_amount +=1
HAVING v.@activity_amount >= activity_threshold;
PRINT result;
}
If the threshold is set to 2, the query would return 8 vertices. If the threshold is set to 4, the query would return no vertices.
The following example demonstrates the equivalence of a SELECT
statement in which the condition for the HAVING
clause is always true.
The following query finds all person meeting a given activity threshold, based on how many posts or likes a person has made
CREATE QUERY print_member_activity() FOR GRAPH Social_Net
{
SumAccum<INT> @activity_amount;
start = {Person.*};
/* --- equivalent statements -----
result = SELECT v FROM (v:start) -[e]- (tgt:Post)
ACCUM v.@activity_amount +=1
HAVING true; */
result = SELECT v FROM (v:start) -[e]- (tgt:Post)
ACCUM v.@activity_amount +=1;
PRINT result;
}
GSQL > RUN QUERY print_member_activity()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@activity_amount": 3,
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@activity_amount": 2,
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@activity_amount": 2,
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@activity_amount": 2,
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@activity_amount": 3,
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person8"
},
"v_type": "Person"
}
]}]
}
The following shows an example of equivalent result sets from using WHERE
vs. HAVING
.
Recall that the WHERE
clause is evaluated before the ACCUM
and that the HAVING
clause is evaluated after the ACCUM
.
Both constrain the result set based on a condition that vertices must meet.
The query below computes the total post activity for each male person.
Because the gender of the vertex does not change, evaluating whether the person vertex is male before the ACCUM
clause with WHERE
or after the ACCUM
clause with HAVING
does not change the result.
However, if the condition in the HAVING
clause could change within the ACCUM
clause, these statements would produce different results.
HAVING
vs. WHERE
CREATE QUERY active_male_members() FOR GRAPH Social_Net
{
SumAccum<INT> @activity_amount;
start = {Person.*};
// The two statements produce equivalent results
result1 = SELECT v FROM (v:start) -[e:]-> (tgt:Post)
WHERE v.gender == "Male"
ACCUM v.@activity_amount +=1;
result2 = SELECT v FROM (v:start) -[e:]-> (tgt:Post)
ACCUM v.@activity_amount +=1
HAVING v.gender == "Male";
PRINT result1[result1.@activity_amount];
PRINT result2[result2.@activity_amount];
}
- Results
The following example has a compilation error because the result set is taken from the source vertices, but the HAVING
condition is checking the target vertices.
This query does not compile because the having condition is testing the wrong vertex set
HAVING
the wrong vertex setCREATE QUERY print_member_about_cats() FOR GRAPH Social_Net
{
start = {Person.*};
result = SELECT v FROM (v:start) -[e]- (tgt:Post)
HAVING tgt.subject == "cats";
PRINT result;
}
$ gsql printMemberAboutCats.gsql
Semantic Check Error in query printMemberAboutCats (SEM-50): line 8, col 33
The SELECT block selects src, but the HAVING clause uses tgt
ORDER BY
The optional ORDER BY
clause sorts the result set.
orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*
ASC
specifies ascending order, and DESC
specifies descending order.
If neither is specified, then ascending order is used.
Each expression must refer to the attributes or accumulators of a member of the result set, and the expression must evaluate to a sortable value (e.g., a number or a string).
ORDER BY
offers hierarchical sorting by allowing a comma-separated list of expressions, sorting first by the leftmost expr.
It uses the next expression only to sort items where the current sort expression results in identical values.
Any items in the result set which cannot be sorted (because the sort expressions do not pertain to them) will appear at the end of the set, after the sorted items.
The following example demonstrates the use of ORDER BY
with multiple expressions.
The returned vertex set is first ordered by the number of friends of the vertex, and then ordered by the number of coworkers of that vertex.
This query finds the most popular people, sorting first based on the number of friends, then (in case of a tie) by the number of coworkers.
CREATE QUERY top_popular() FOR GRAPH Friend_Net {
SumAccum<INT> @num_friends;
SumAccum<INT> @num_coworkers;
start = {Person.*};
result = SELECT v FROM (:start) -[e:Friend|Coworker]- (v:Person)
ACCUM CASE
WHEN e.type == "Friend" THEN v.@num_friends += 1
WHEN e.type == "Coworker" THEN v.@num_coworkers += 1
END
ORDER BY v.@num_friends DESC, v.@num_coworkers DESC;
PRINT result;
}
RUN QUERY top_popular()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result": [
{
"v_id": "person9",
"attributes": {
"@num_friends": 5,
"@num_coworkers": 3,
"id": "person9"
},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {
"@num_friends": 4,
"@num_coworkers": 1,
"id": "person12"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"@num_friends": 4,
"@num_coworkers": 1,
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"@num_friends": 3,
"@num_coworkers": 4,
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"@num_friends": 3,
"@num_coworkers": 3,
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 5,
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 3,
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 3,
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 1,
"id": "person10"
},
"v_type": "Person"
},
{
"v_id": "person7",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 6,
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 5,
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 1,
"id": "person11"
},
"v_type": "Person"
}
]}]
}
LIMIT
The optional LIMIT
clause sets constraints on the number and ranking of items included in the final result set.
LIMIT
ClauselimitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )
Each of the expressions must evaluate to a non-negative integer.
To understand LIMIT
, note that the tentative result set is held in the computer as a list of vertices.
If the query has an ORDER BY
clause, the order is specified; otherwise the list order is unknown.
Assume we number the vertices as v_1
, v_2
, …, v_n
.
The LIMIT
clause specifies a range of vertices, starting from a lower position in the list to an upper position.
There are three forms:
LIMIT
scenariosresult = SELECT v FROM (:S) -[e]- (v) LIMIT k; (1)
result = SELECT v FROM (:S) -[e]- (v) LIMIT j, k; (2)
result = SELECT v FROM (:S) -[e]- (v) LIMIT k OFFSET j; (3)
1 | Case 1: k = Count |
2 | Case 2: j = Offset from the start of the list, k = Count |
3 | Case 3: k = Count, j = Offset from the start of the list |
Case 1: LIMIT k
-
When a single expression is provided,
LIMIT
returns the first k elements from the tentative result set. If there are fewer than k elements available, then all elements will be returned in the result set. If k=5 and the tentative result set has at least 5 items, then the final result list is[v_1, v_2, v_3, v_4, v_5]
.
Case 2: LIMIT j, k
-
When a comma separates two expressions,
LIMIT
treats the first expression j as an offset. That is, it skips the first j items in the list. The second expression k tells the maximum number of items to include. If the list has at least 7 items, thenLIMIT 2, 5
returns[v_3, v_4, v_5, v_6, v_7]
.
Case 3: LIMIT k OFFSET j
-
The behavior of Case 3 is the same as that of Case 2, except that the syntax is different. The keyword
OFFSET
separates the two expressions, and the count comes before the offset, rather than vice versa. If the list has at least 7 items, thenLIMIT 5 OFFSET 2
returns[v_3, v_4, v_5, v_6, v_7]
.
If any of the expressions evaluate to a negative integer, the results are undefined.
|
The following examples demonstrate the various forms of the LIMIT
clause.
The first example shows the LIMIT
clause when used as an upper limit. It returns a result set with a maximum size of 4 elements in the set.
CREATE QUERY limit_ex_1 (INT k) FOR GRAPH Friend_Net {
start = {Person.*};
result1 = SELECT v FROM (v:start)
ORDER BY v.id
LIMIT k;
PRINT result1[result1.id]; // api v2
}
RUN QUERY limit_ex_1(4)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result1": [
{
"v_id": "person1",
"attributes": {"result1.id": "person1"},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {"result1.id": "person10"},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {"result1.id": "person11"},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {"result1.id": "person12"},
"v_type": "Person"
}
]}]
}
The following example shows how to use the LIMIT
clause with an offset.
LIMIT
with lower-bound and sizeCREATE QUERY limit_ex_2 (INT j, INT k) FOR GRAPH Friend_Net
{
start = {Person.*};
result2 = SELECT v FROM (v:start)
ORDER BY v.id
LIMIT j, k;
PRINT result2[result2.id]; // api v2
}
RUN QUERY limit_ex_2(2,3)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result2": [
{
"v_id": "person11",
"attributes": {"result2.id": "person11"},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {"result2.id": "person12"},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {"result2.id": "person2"},
"v_type": "Person"
}
]}]
}
The following example shows the alternative syntax for a result size limit with an offset.
This time we try larger values for offset and size.
In a large data set, limitTest(5,20)
might return 20 vertices, but since we don’t have 20 vertices in the original data, the output is fewer than 20 vertices.
LIMIT
with OFFSET
CREATE QUERY limit_ex_3 (INT j, INT k) FOR GRAPH Friend_Net {
start = {Person.*};
result3 = SELECT v FROM (v:start)
ORDER BY v.id
LIMIT k OFFSET j;
PRINT result3[result3.id]; // api v2
}
RUN QUERY limit_ex_3(5,20)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result3": [
{
"v_id": "person3",
"attributes": {"result3.id": "person3"},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {"result3.id": "person4"},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {"result3.id": "person5"},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {"result3.id": "person6"},
"v_type": "Person"
},
{
"v_id": "person7",
"attributes": {"result3.id": "person7"},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {"result3.id": "person8"},
"v_type": "Person"
},
{
"v_id": "person9",
"attributes": {"result3.id": "person9"},
"v_type": "Person"
}
]}]
}