Develop Parameterized Queries - GSQL 101
We just saw how easy and quick it is to run simple built-in queries. However, you’ll undoubtedly want to create more customized or complex queries. GSQL puts maximum power in your hands through parameterized vertex set queries.
Parameterized queries let you traverse the graph from one vertex set to an adjacent set of vertices, again and again, performing computations along the way, with built-in parallel execution and handy aggregation operations. You can even have one query call another query. But we’ll start simple.
A GSQL parameterized query has three steps.
-
Define your query in GSQL. This query will be added to the GSQL catalog.
-
Install one or more queries in the catalog, generating a REST endpoint for each query.
-
Run an installed query, supplying appropriate parameters, either as a GSQL command or by sending an HTTP request to the REST endpoint.
A simple 1-hop query
Now, let’s write our first GSQL query. We’ll display all the direct (1-hop) neighbors of a person, given as an input parameter.
If you run this query in interactive mode, you’ll need to add BEGIN
and END
statements.
However, just examine it for now. You will save this query to a file and run it from the GSQL shell.
USE GRAPH Social
SET syntax_version="v2"
CREATE QUERY hello (VERTEX<Person> p) {
start = {p};
result = SELECT tgt
FROM start:s -(Friendship:e)- Person:tgt;
PRINT result;
}
This query features one SELECT
statement.
The query starts by seeding a vertex set start
with the person vertex identified by parameter p passed in from the query call.
The curly braces tell GSQL to construct a set containing the enclosed items.
Next, the SELECT
statement describes a 1-hop traversal according to the pattern described in the FROM
clause:
start:s -(Friendship:e)- Person:tgt
The pattern means we select all edges beginning from the given source set (start
), which have the given undirected edge type (Friendship
) and which end at the given vertex type (Person
).
The FROM
clause defines vertex and edge set aliases using :<alias>
:
-
s
is the alias for the source vertex, -
e
is the edge alias -
tgt
is the target vertex alias.
Refer back to the initial clause and the assignment (result = SELECT tgt
).
Here we see the target set’s alias tgt
.
This means that the SELECT
statement returns the target vertex set (as filtered and processed by the full set of clauses in the SELECT
query block) and assign that output set to the variable called result
.
Last, we print out the result
vertex set, in JSON format.
Create a query
Rather than defining our query in interactive mode, we can store the query in a file and invoke the file from within the GSQL shell, using the @filename
syntax. Copy and paste the above query into a file /home/tigergraph/hello.gsql
. Then, enter the GSQL shell and invoke the file using @hello.qsql
.
If you are not in the /home/tigergraph folder when you start GSQL, you can use the absolute path to invoke a .gsql file. e.g., @/home/tigergraph/hello.gsql
|
Run the ls
command to see that the query is now in the catalog.
GSQL > @hello.gsql
Using graph 'Social'
Successfully created queries: [hello].
GSQL > ls
---- Graph Social
Vertex Types:
- VERTEX Person(PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
Edge Types:
- UNDIRECTED EDGE Friendship(from Person, to Person, connect_day DATETIME)
Graphs:
- Graph Social(Person:v, Friendship:e)
Jobs:
- CREATE LOADING JOB load_Social FOR GRAPH Social {
DEFINE FILENAME file2 = "/home/tigergraph/friendship.csv";
DEFINE FILENAME file1 = "/home/tigergraph/person.csv";
LOAD file1 TO VERTEX Person VALUES($"name", $"name", $"age", $"gender", $"state") USING SEPARATOR=",", HEADER="true", EOL="\n";
LOAD file2 TO EDGE Friendship VALUES($0, $1, $2) USING SEPARATOR=",", HEADER="true", EOL="\n";
}
Queries:
- hello(vertex<Person> p)
SessionParameters:
- syntax_version: v2
Install a query
However, the query is not installed yet. In the GSQL shell, type the following command to install the newly-added query "hello".
INSTALL QUERY hello
GSQL > INSTALL QUERY hello
Start installing queries, about 1 minute ...
hello query: curl -X GET 'http://127.0.0.1:14240/restpp/query/social/hello?p=VALUE'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.
Select 'm1' as compile server, now connecting ...
Node 'm1' is prepared as compile server.
[========================================================================================================] 100% (1/1)
It takes about 1 minute for the database to install this new query. Be patient! For queries on large datasets, this small investment pays off many times over in faster query execution, particularly if you will run the query many times, with different parameters. The installation will generate machine instructions and a REST endpoint. After the progress bar reaches 100%, we are ready to run this query.
Run a query in GSQL
To run a query in GSQL, use RUN QUERY
followed by the query name and a set of parameter values.
RUN QUERY hello("Tom")
The result is presented in JSON format. Tom has two 1-hop neighbors, namely Dan and Jenny.
GSQL > RUN QUERY hello("Tom")
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"Result": [
{
"v_id": "Dan",
"attributes": {
"gender": "male",
"name": "Dan",
"state": "ny",
"age": 34
},
"v_type": "Person"
},
{
"v_id": "Jenny",
"attributes": {
"gender": "female",
"name": "Jenny",
"state": "tx",
"age": 25
},
"v_type": "Person"
}
]}]
}
Run a query as a REST endpoint
Under the hood, installing a query will also generate a REST endpoint, so that the parameterized query can be invoked by an HTTP call. In Linux, the curl
command is the most popular way to submit an HTTP request.
The JSON result will be returned to the Linux shell’s standard output. Our parameterized query thus becomes an HTTP service.
curl -X GET 'http://localhost:14240/restpp/query/Social/hello?p=Tom'
Finally, to see the GSQL text of a query in the catalog, you can use SHOW QUERY <query_name>
.
Congratulations! At this point, you have gone through the whole process of defining, installing, and running a query.
Running anonymous queries without installing
Installing a query will give the fastest query speed, but the user needs to wait for the installation overhead.
The Interpreted Mode for GSQL lets us skip the INSTALL
step, and even run a query as soon as we create it, to offer a more interactive experience.
These one-step interpreted queries are unnamed (anonymous) and parameterless, just like SQL.
Please refer to Pattern Matching for this mode.
Query with accumulators
Now, let’s write a more advanced query. This time, we are going to learn to use the powerful built-in accumulators, which serve as the runtime attributes (properties) attachable to each vertex visited during our traversal on the graph.
"Runtime" means they exist only while the query is running; they are called accumulators because they are specially designed to gather (accumulate) data during the implicitly parallel processing of the query.
Say we need to write a query to find all the persons which are exactly 2 hops away from the parameterized input Person. Just for fun, let’s also compute the average age of those 2-hop neighbors.
Method
To get all Person
vertices that are two hops away from the starting vertex, we can use the following logic:
-
From the starting vertex, visit vertices that are one stop away from the starting vertex, and mark every vertex we visit as visited, including the starting vertex. This gives us the neighbors that are 1 hop away from the starting vertex.
-
Now that we have the 1-hop neighbors, we can find vertices that are 1-hop away from those neighbors, excluding all vertices that we have already visited in the previous step. This gives us all vertices that are exactly two hops away from the original vertex. As we visit the 2-hop neighbors, we can also calculate their average age.
Query
We will use a SELECT
statement to perform the first hop from the starting vertex.
In order to keep track of vertices that have been visited, we declare a vertex-attached accumulator of the type OrAccum
, also called a boolean accumulator.
In addition, we also need to declare an AvgAccum
to calculate the average age of the final vertex set.
To declare a vertex-attached accumulator, prefix an identifier name with a single @
symbol.
After you declare the accumulator, they are attached to vertices in the query, and you can access its value in an ACCUM
or POST-ACCUM
clause of the SELECT
statement by using the dot.
operator following a vertex alias as if they are an attribute of the vertices.
After the first hop, we use another SELECT
statement to perform a second hop, and filter out all vertices that we have visited previously.
This gives us the neighbors that are exactly two hops away, and we can use the AvgAccum
to calculate the average age of the 2-hop neighbors.
CREATE QUERY hello2 (VERTEX<Person> p) {
OrAccum @visited = FALSE;
AvgAccum @@avg_age;
start = {p};
first_neighbors = SELECT tgt
FROM start:s -(Friendship:e)- Person:tgt
ACCUM tgt.@visited += TRUE, s.@visited += TRUE; (1) (2)
second_neighbors = SELECT tgt (3)
FROM first_neighbors -(:e)- :tgt (4)
WHERE tgt.@visited == FALSE (5)
POST-ACCUM @@avg_age += tgt.age; (6)
PRINT second_neighbors;
PRINT @@avg_age;
}
1 | After defining the Start set, we then have our first one 1-hop traversal.
The SELECT and FROM clauses are the same as in our first example, but there is an additional ACCUM clause. |
2 | The statements in the ACCUM clause are executed once for each edge matching the FROM clause.
The += operator within an ACCUM clause means that for each edge matching the FROM clause pattern, we accumulate the right-hand-side expression (true ) to the left-hand-accumulator (tgt.@visited as well as s.@visited ). |
3 | The second SELECT block will do one hop further, starting from the first_neighbors vertex set variable, and reaching the 2-hop neighbors. |
4 | Types are omitted in this FROM clause, which is interpreted to be all types. |
5 | The WHERE clause filters out the vertices which have been marked as visited before (the 1-hop neighbors and the starting vertex p ). |
6 | POST_ACCUM traverses the vertex sets instead of the edge sets, guaranteeing that we do not double-count any vertices.
Here, we accumulate the ages of the 2-hop neighbors to get their average. |
Note that a source vertex or target vertex may be visited multiple times in the first SELECT
statement.
Referring to Figure 1, if we start at vertex Tom, there are two edges incidental to Tom, so the ACCUM
clause in the first SELECT
statement will visit Tom twice.
Since the accumulator type is OrAccum
, the cumulative effect of the two traversals is the following:
Tom.@visited = (initial value: false) OR (true) OR (true)
We can see it does not matter which of the two edges was processed first.
The net effect is that as long as a vertex is visited at least once, it will end up with @visited = true
.
Run query from a GSQL script
This time, we put all following GSQL commands into one file hello2.gsql:
USE GRAPH Social
CREATE QUERY hello2 (VERTEX<Person> p) {
OrAccum @visited = FALSE;
AvgAccum @@avg_age;
start = {p};
first_neighbors = SELECT tgt
FROM start:s -(Friendship:e)- Person:tgt
ACCUM tgt.@visited += TRUE, s.@visited += TRUE; (1) (2)
second_neighbors = SELECT tgt (3)
FROM first_neighbors -(:e)- :tgt (4)
WHERE tgt.@visited == FALSE (5)
POST-ACCUM @@avg_age += tgt.age;
PRINT second_neighbors;
PRINT @@avg_age;
}
INSTALL QUERY hello2
RUN QUERY hello2("Tom")
We can execute this full set of commands without entering the GSQL shell.
Please copy and paste the above GSQL commands into a Linux file named /home/tigergraph/hello2.gsql
.
In a Linux shell, under /home/tigergraph
, type the following to create, install, and execute the query:
gsql hello2.gsql
GSQL query summary
-
Queries are installed in the catalog and can have one or more input parameters, enabling reuse of queries.
-
A GSQL query consists of a series of SELECT query blocks, each generating a named vertex set.
-
Each SELECT query block can start traversing the graph from any of the previously defined vertex sets (that is, the sequence does not have to form a linear chain).
-
Accumulators are runtime variables with built-in accumulation operations, for efficient multithreaded computation.
-
Query can call another query.
-
Output is in JSON format.