Develop Parameterized Queries - GSQL 101

We just saw how easy and quick it is to run simple built-in queries. However, you’ll undoubtedly want to create more customized or complex queries. GSQL puts maximum power in your hands through parameterized vertex set queries.

Parameterized queries let you traverse the graph from one vertex set to an adjacent set of vertices, again and again, performing computations along the way, with built-in parallel execution and handy aggregation operations. You can even have one query call another query. But we’ll start simple.

A GSQL parameterized query has three steps.

  1. Define your query in GSQL. This query will be added to the GSQL catalog.

  2. Install one or more queries in the catalog, generating a REST endpoint for each query.

  3. Run an installed query, supplying appropriate parameters, either as a GSQL command or by sending an HTTP request to the REST endpoint.

A simple 1-hop query

Now, let’s write our first GSQL query. We’ll display all the direct (1-hop) neighbors of a person, given as an input parameter.

If you run this query in interactive mode, you’ll need to add BEGIN and END statements. However, just examine it for now. You will save this query to a file and run it from the GSQL shell.

GSQL command
USE GRAPH Social

SET syntax_version="v2"

CREATE QUERY hello (VERTEX<Person> p) {
  start = {p};
  result = SELECT tgt
           FROM start:s -(Friendship:e)- Person:tgt;
  PRINT result;
}

This query features one SELECT statement. The query starts by seeding a vertex set start with the person vertex identified by parameter p passed in from the query call. The curly braces tell GSQL to construct a set containing the enclosed items.

Next, the SELECT statement describes a 1-hop traversal according to the pattern described in the FROM clause:

start:s -(Friendship:e)- Person:tgt

The pattern means we select all edges beginning from the given source set (start), which have the given undirected edge type (Friendship) and which end at the given vertex type (Person). The FROM clause defines vertex and edge set aliases using :<alias>:

  • s is the alias for the source vertex,

  • e is the edge alias

  • tgt is the target vertex alias.

Refer back to the initial clause and the assignment (result = SELECT tgt). Here we see the target set’s alias tgt. This means that the SELECT statement returns the target vertex set (as filtered and processed by the full set of clauses in the SELECT query block) and assign that output set to the variable called result.

Last, we print out the result vertex set, in JSON format.

Create a query

Rather than defining our query in interactive mode, we can store the query in a file and invoke the file from within the GSQL shell, using the @filename syntax. Copy and paste the above query into a file /home/tigergraph/hello.gsql. Then, enter the GSQL shell and invoke the file using @hello.qsql.

If you are not in the /home/tigergraph folder when you start GSQL, you can use the absolute path to invoke a .gsql file. e.g., @/home/tigergraph/hello.gsql

Run the ls command to see that the query is now in the catalog.

GSQL shell
GSQL > @hello.gsql
Using graph 'Social'
Successfully created queries: [hello].
GSQL > ls
---- Graph Social
Vertex Types:
  - VERTEX Person(PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
Edge Types:
  - UNDIRECTED EDGE Friendship(from Person, to Person, connect_day DATETIME)

Graphs:
  - Graph Social(Person:v, Friendship:e)
Jobs:
  - CREATE LOADING JOB load_Social FOR GRAPH Social {
      DEFINE FILENAME file2 = "/home/tigergraph/friendship.csv";
      DEFINE FILENAME file1 = "/home/tigergraph/person.csv";

      LOAD file1 TO VERTEX Person VALUES($"name", $"name", $"age", $"gender", $"state") USING SEPARATOR=",", HEADER="true", EOL="\n";
      LOAD file2 TO EDGE Friendship VALUES($0, $1, $2) USING SEPARATOR=",", HEADER="true", EOL="\n";
    }

Queries:
  - hello(vertex<Person> p)

SessionParameters:
  - syntax_version: v2

Install a query

However, the query is not installed yet. In the GSQL shell, type the following command to install the newly-added query "hello".

GSQL command
INSTALL QUERY hello
GSQL shell
GSQL > INSTALL QUERY hello
Start installing queries, about 1 minute ...
hello query: curl -X GET 'http://127.0.0.1:14240/restpp/query/social/hello?p=VALUE'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.
Select 'm1' as compile server, now connecting ...
Node 'm1' is prepared as compile server.

[========================================================================================================] 100% (1/1)

It takes about 1 minute for the database to install this new query. Be patient! For queries on large datasets, this small investment pays off many times over in faster query execution, particularly if you will run the query many times, with different parameters. The installation will generate machine instructions and a REST endpoint. After the progress bar reaches 100%, we are ready to run this query.

Run a query in GSQL

To run a query in GSQL, use RUN QUERY followed by the query name and a set of parameter values.

GSQL command - run query examples
RUN QUERY hello("Tom")

The result is presented in JSON format. Tom has two 1-hop neighbors, namely Dan and Jenny.

GSQL shell
GSQL > RUN QUERY hello("Tom")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"Result": [
    {
      "v_id": "Dan",
      "attributes": {
        "gender": "male",
        "name": "Dan",
        "state": "ny",
        "age": 34
      },
      "v_type": "Person"
    },
    {
      "v_id": "Jenny",
      "attributes": {
        "gender": "female",
        "name": "Jenny",
        "state": "tx",
        "age": 25
      },
      "v_type": "Person"
    }
  ]}]
}

Run a query as a REST endpoint

Under the hood, installing a query will also generate a REST endpoint, so that the parameterized query can be invoked by an HTTP call. In Linux, the curl command is the most popular way to submit an HTTP request.

The JSON result will be returned to the Linux shell’s standard output. Our parameterized query thus becomes an HTTP service.

Linux shell
curl -X GET 'http://localhost:14240/restpp/query/Social/hello?p=Tom'

Finally, to see the GSQL text of a query in the catalog, you can use SHOW QUERY <query_name>.

Congratulations! At this point, you have gone through the whole process of defining, installing, and running a query.

Running anonymous queries without installing

Installing a query will give the fastest query speed, but the user needs to wait for the installation overhead.

The Interpreted Mode for GSQL lets us skip the INSTALL step, and even run a query as soon as we create it, to offer a more interactive experience. These one-step interpreted queries are unnamed (anonymous) and parameterless, just like SQL. Please refer to Pattern Matching for this mode.

Query with accumulators

Now, let’s write a more advanced query. This time, we are going to learn to use the powerful built-in accumulators, which serve as the runtime attributes (properties) attachable to each vertex visited during our traversal on the graph.

"Runtime" means they exist only while the query is running; they are called accumulators because they are specially designed to gather (accumulate) data during the implicitly parallel processing of the query.

Say we need to write a query to find all the persons which are exactly 2 hops away from the parameterized input Person. Just for fun, let’s also compute the average age of those 2-hop neighbors.

Method

To get all Person vertices that are two hops away from the starting vertex, we can use the following logic:

  1. From the starting vertex, visit vertices that are one stop away from the starting vertex, and mark every vertex we visit as visited, including the starting vertex. This gives us the neighbors that are 1 hop away from the starting vertex.

  2. Now that we have the 1-hop neighbors, we can find vertices that are 1-hop away from those neighbors, excluding all vertices that we have already visited in the previous step. This gives us all vertices that are exactly two hops away from the original vertex. As we visit the 2-hop neighbors, we can also calculate their average age.

Query

We will use a SELECT statement to perform the first hop from the starting vertex. In order to keep track of vertices that have been visited, we declare a vertex-attached accumulator of the type OrAccum, also called a boolean accumulator. In addition, we also need to declare an AvgAccum to calculate the average age of the final vertex set.

To declare a vertex-attached accumulator, prefix an identifier name with a single @ symbol. After you declare the accumulator, they are attached to vertices in the query, and you can access its value in an ACCUM or POST-ACCUM clause of the SELECT statement by using the dot. operator following a vertex alias as if they are an attribute of the vertices.

After the first hop, we use another SELECT statement to perform a second hop, and filter out all vertices that we have visited previously. This gives us the neighbors that are exactly two hops away, and we can use the AvgAccum to calculate the average age of the 2-hop neighbors.

CREATE QUERY hello2 (VERTEX<Person> p) {
    OrAccum  @visited = FALSE;
    AvgAccum @@avg_age;
    start = {p};

    first_neighbors = SELECT tgt
        FROM start:s -(Friendship:e)- Person:tgt
        ACCUM tgt.@visited += TRUE, s.@visited += TRUE; (1) (2)

    second_neighbors = SELECT tgt (3)
        FROM first_neighbors -(:e)- :tgt (4)
        WHERE tgt.@visited == FALSE (5)
        POST-ACCUM @@avg_age += tgt.age; (6)

    PRINT second_neighbors;
    PRINT @@avg_age;
}
1 After defining the Start set, we then have our first one 1-hop traversal. The SELECT and FROM clauses are the same as in our first example, but there is an additional ACCUM clause.
2 The statements in the ACCUM clause are executed once for each edge matching the FROM clause. The += operator within an ACCUM clause means that for each edge matching the FROM clause pattern, we accumulate the right-hand-side expression (true) to the left-hand-accumulator (tgt.@visited as well as s.@visited).
3 The second SELECT block will do one hop further, starting from the first_neighbors vertex set variable, and reaching the 2-hop neighbors.
4 Types are omitted in this FROM clause, which is interpreted to be all types.
5 The WHERE clause filters out the vertices which have been marked as visited before (the 1-hop neighbors and the starting vertex p ).
6 POST_ACCUM traverses the vertex sets instead of the edge sets, guaranteeing that we do not double-count any vertices. Here, we accumulate the ages of the 2-hop neighbors to get their average.

Note that a source vertex or target vertex may be visited multiple times in the first SELECT statement. Referring to Figure 1, if we start at vertex Tom, there are two edges incidental to Tom, so the ACCUM clause in the first SELECT statement will visit Tom twice. Since the accumulator type is OrAccum, the cumulative effect of the two traversals is the following:

Tom.@visited = (initial value: false) OR (true) OR (true)

We can see it does not matter which of the two edges was processed first. The net effect is that as long as a vertex is visited at least once, it will end up with @visited = true.

Run query from a GSQL script

This time, we put all following GSQL commands into one file hello2.gsql:

GSQL command file - hello2.gsql
USE GRAPH Social
CREATE QUERY hello2 (VERTEX<Person> p) {
    OrAccum  @visited = FALSE;
    AvgAccum @@avg_age;
    start = {p};

    first_neighbors = SELECT tgt
        FROM start:s -(Friendship:e)- Person:tgt
        ACCUM tgt.@visited += TRUE, s.@visited += TRUE; (1) (2)

    second_neighbors = SELECT tgt (3)
        FROM first_neighbors -(:e)- :tgt (4)
        WHERE tgt.@visited == FALSE (5)
        POST-ACCUM @@avg_age += tgt.age;

    PRINT second_neighbors;
    PRINT @@avg_age;
}
INSTALL QUERY hello2
RUN QUERY hello2("Tom")

We can execute this full set of commands without entering the GSQL shell. Please copy and paste the above GSQL commands into a Linux file named /home/tigergraph/hello2.gsql.

In a Linux shell, under /home/tigergraph, type the following to create, install, and execute the query:

Linux shell
gsql hello2.gsql

GSQL query summary

  • Queries are installed in the catalog and can have one or more input parameters, enabling reuse of queries.

  • A GSQL query consists of a series of SELECT query blocks, each generating a named vertex set.

  • Each SELECT query block can start traversing the graph from any of the previously defined vertex sets (that is, the sequence does not have to form a linear chain).

  • Accumulators are runtime variables with built-in accumulation operations, for efficient multithreaded computation.

  • Query can call another query.

  • Output is in JSON format.