Develop, install, and run parameterized GSQL queries
We just saw how easy and quick it is to run simple built-in queries. However, you'll undoubtedly want to create more customized or complex queries. GSQL puts maximum power in your hands through parameterized vertex set queries. Parameterized queries let you traverse the graph from one vertex set to an adjacent set of vertices, again and again, performing computations along the way, with built-in parallel execution and handy aggregation operations. You can even have one query call another query. But we'll start simple.
A GSQL parameterized query has three steps.
Define your query in GSQL. This query will be added to the GSQL catalog.
Install one or more queries in the catalog, generating a REST endpoint for each query.
Run an installed query, supplying appropriate parameters, either as a GSQL command or by sending an HTTP request to the REST endpoint.
Now, let's write our first GSQL query. We'll display all the direct (1-hop) neighbors of a person, given as an input parameter.
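Based on the discussion that follows, the query can be sketched as below (the query name hello and the exact formatting are illustrative):

```gsql
CREATE QUERY hello (VERTEX<person> p) FOR GRAPH social {
  // Seed a vertex set with the input person
  Start = {p};
  // 1-hop traversal along friendship edges to person vertices
  Result = SELECT tgt
           FROM Start:s -(friendship:e)-> person:tgt;
  PRINT Result;
}
```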
This query features one SELECT statement. SELECT statements in parameterized queries are much more powerful than the ones in built-in queries. The query starts by seeding a vertex set "Start" with the person vertex identified by the parameter p passed in from the query call. The curly braces tell GSQL to construct a set containing the enclosed items.
Next, the SELECT statement describes a 1-hop traversal according to the pattern described in the FROM clause:
Start:s -(friendship:e)-> person:tgt
This is basically the same syntax we used for the built-in select edges query. Namely, we select all edges beginning from the given source set (Start), which have the given edge type (friendship) and which end at the given vertex type (person). A feature we haven't seen before is the use of vertex and edge set aliases defined by ":alias": "s" is the alias for the source vertex set, "e" is the edge set alias, and "tgt" is the target vertex set alias.
Refer back to the initial clause and the assignment ("Result = SELECT tgt"). Here we see the target set's alias tgt. This means that the SELECT statement should return the target vertex set (as filtered and processed by the full set of clauses in the SELECT query block) and assign that output set to the variable called Result.
Last, we print out the Result vertex set, in JSON format.
Rather than defining our query in interactive mode, we can store the query in a file and invoke the file from within the GSQL shell, using the @filename syntax. Copy and paste the above query into a file /home/tigergraph/hello.gsql
. Then, enter the GSQL shell and invoke the file using @hello.gsql. (Note that if you are not in the /home/tigergraph folder when you start gsql, you can use the absolute path to invoke a gsql file, e.g., @/home/tigergraph/hello.gsql.) Then run the "ls" command to see that the query is now in the catalog.
However, the query is not installed yet; it is not ready to run. In the GSQL shell, type the following command to install the newly added query "hello".
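A minimal sketch of the command:

```gsql
INSTALL QUERY hello
```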
It takes about 1 minute for the database to install this new query. Be patient! For queries on large datasets, this small investment pays off many times over in faster query execution, particularly if you will run the query many times, with different parameters. The installation will generate machine instructions and a REST endpoint. After the progress bar reaches 100%, we are ready to run this query.
To run a query in GSQL, use "RUN QUERY" followed by the query name and a set of parameter values.
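For example, using the person Tom from the sample data:

```gsql
RUN QUERY hello("Tom")
```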
The result is presented in JSON format. Tom has two 1-hop neighbors, namely Dan and Jenny.
Under the hood, installing a query will also generate a REST endpoint, so that the parameterized query can be invoked by an HTTP call. In Linux, the curl command is the most popular way to submit an HTTP request. In the example below, the portion that is standard for all queries is shown in bold; the portion in normal weight pertains to this particular query and parameter value. The JSON result will be returned to the Linux shell's standard output. So, our parameterized query becomes an HTTP service!
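A sketch of such a call, assuming the default RESTPP port 9000 and the graph name social:

```bash
curl -X GET 'http://localhost:9000/query/social/hello?p=Tom'
```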
Finally, to see the GSQL text of a query in the catalog, you can use
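For example:

```gsql
SHOW QUERY hello
```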
Congratulations! At this point, you have gone through the whole process of defining, installing, and running a query.
Installing a query gives the fastest query speed, but the user must wait through the installation overhead.
The Interpreted Mode for GSQL, introduced in TigerGraph 2.4, lets us skip the INSTALL step and even run a query as soon as we create it, offering a more interactive experience. These one-step interpreted queries are unnamed (anonymous) and parameterless, just like SQL. Please refer to GSQL 102 for this mode.
Now, let's do a more advanced query. This time, we are going to learn to use the powerful built-in accumulators, which serve as runtime attributes (properties) attachable to each vertex visited during our traversal of the graph. Runtime means they exist only while the query is running; they are called accumulators because they are specially designed to gather (accumulate) data during the implicitly parallel processing of the query.
In this query we will find all the persons who are exactly 2 hops away from the parameterized input person. Just for fun, let's also compute the average age of those 2-hop neighbors.
In the standard approach for this kind of graph traversal algorithm, you use a boolean variable to mark the first time that the algorithm "visits" a vertex, so that it knows not to count it again. To fit this need, we'll define a local accumulator of the type OrAccum. To declare a local accumulator, we prefix an identifier name with a single "@" symbol. Each accumulator type has a default initial value; the default value for boolean accumulators is false. Optionally, you can specify an initial value.
We also want to compute one average, so we will define a global AvgAccum. The identifier for a global accumulator begins with two "@"s.
After defining the Start set, we then have our first 1-hop traversal. The SELECT and FROM clauses are the same as in our first example, but there is an additional ACCUM clause. The += operator within an ACCUM clause means that for each edge matching the FROM clause pattern, we accumulate the right-hand-side expression (true) into the left-hand-side accumulator (tgt.@visited as well as s.@visited). Note that a source vertex or target vertex may be visited multiple times. Referring to Figure 1, if we start at vertex Tom, there are two edges incident to Tom, so the ACCUM clause in the first SELECT statement will visit Tom two times. Since the accumulator type is OrAccum, the cumulative effect of the two traversals is the following:
Tom.@visited <== (initial value: false) OR (true) OR (true)
Note that it does not matter which of the two edges was processed first, so this operation is suitable for multithreaded parallel processing. The net effect is that as long as a vertex is visited at least once, it will end up with @visited = true. The result of this first SELECT statement is assigned to the variable FirstNeighbors.
The second SELECT block will go one hop further, starting from the FirstNeighbors vertex set variable and reaching the 2-hop neighbors. Note that this time, we have omitted the edge type friendship and the target vertex type person from the FROM clause, but we retained the aliases. If no type is mentioned for an alias, then it is interpreted as ALL types. Since our graph has only one vertex type and one edge type, it is logically the same as if we had specified the types. The WHERE clause filters out the vertices which have been marked as visited before (the 1-hop neighbors and the starting vertex p). This SELECT statement uses POST-ACCUM instead of ACCUM. The reason is that POST-ACCUM iterates over the vertex sets instead of the edge sets, guaranteeing that we do not double-count any vertices. Here, we accumulate the ages of the 2-hop neighbors to get their average.
Finally, the SecondNeighbors of p are printed out.
This time, we put all of the following GSQL commands into one file hello2.gsql:
USE GRAPH social
The query definition
Installing the query
Running the query
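Putting the pieces together, hello2.gsql can be sketched as follows (the accumulator names @visited and @@aveAge are illustrative):

```gsql
USE GRAPH social

CREATE QUERY hello2 (VERTEX<person> p) FOR GRAPH social {
  OrAccum  @visited = false;   // local: marks each vertex the traversal reaches
  AvgAccum @@aveAge;           // global: average age of the 2-hop neighbors
  Start = {p};

  // 1st hop: mark the start vertex and its direct neighbors as visited
  FirstNeighbors = SELECT tgt
                   FROM Start:s -(friendship:e)-> person:tgt
                   ACCUM tgt.@visited += true, s.@visited += true;

  // 2nd hop: keep only vertices not seen in hop 1, then average their ages
  SecondNeighbors = SELECT tgt
                    FROM FirstNeighbors -(:e)-> :tgt
                    WHERE tgt.@visited == false
                    POST-ACCUM @@aveAge += tgt.age;

  PRINT SecondNeighbors;
  PRINT @@aveAge;
}

INSTALL QUERY hello2

RUN QUERY hello2("Tom")
```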
We can execute this full set of commands without entering the GSQL shell. Please copy and paste the above GSQL commands into a Linux file named /home/tigergraph/hello2.gsql.
In a Linux shell, under /home/tigergraph, type the following:
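Assuming the file was saved as /home/tigergraph/hello2.gsql, the invocation is simply:

```bash
gsql hello2.gsql
```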
Queries are installed in the catalog and can have one or more input parameters, enabling reuse of queries.
A GSQL query consists of a series of SELECT query blocks, each generating a named vertex set.
Each SELECT query block can start traversing the graph from any of the previously defined vertex sets (that is, the sequence does not have to form a linear chain).
Accumulators are runtime variables with built-in accumulation operations, for efficient multithreaded computation.
A query can call another query.
Output is in JSON format.
For this tutorial, we will work mostly in the GSQL shell, in interactive mode. A few commands will be from a Linux shell. The first step in creating a GSQL graph is to define its schema. GSQL provides a set of DDL (Data Definition Language) commands, similar to SQL DDL commands, to model vertex types, edge types and a graph.
Use CREATE VERTEX command to define a vertex type named person. Here, PRIMARY_ID is required: each person must have a unique identifier. The rest is the optional list of attributes which characterize each person vertex, in the format attribute_name data_type, attribute_name data_type, ...
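For example, a person vertex type keyed on name, with a few attributes matching this tutorial's sample data (the attribute list is illustrative):

```gsql
CREATE VERTEX person (PRIMARY_ID name STRING, name STRING,
                      age INT, gender STRING, state STRING)
```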
We show GSQL keywords in ALL CAPS to highlight them, but they are case-insensitive.
GSQL will confirm the creation of the vertex type.
You can create as many vertex types as you need.
Next, use the CREATE ... EDGE command to create an edge type named friendship. The keyword UNDIRECTED indicates this edge is a bidirectional edge, meaning that information can flow starting from either vertex. If you'd rather have a unidirectional connection where information flows only from the FROM vertex, use the DIRECTED keyword in place of UNDIRECTED. Here, FROM and TO are required to specify which two vertex types the edge type connects. An individual edge is specified by giving the primary_ids of its source (FROM) vertex and target (TO) vertex. These are followed by an optional list of attributes, just as in the vertex definition.
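For example (the connect_day attribute is illustrative):

```gsql
CREATE UNDIRECTED EDGE friendship (FROM person, TO person, connect_day DATETIME)
```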
GSQL will confirm the creation of the edge type.
You can create as many edge types as you need.
Next, use the CREATE GRAPH command to create a graph named social. Here, we just list the vertex types and edge types that we want to include in this graph.
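A minimal sketch:

```gsql
CREATE GRAPH social (person, friendship)
```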
GSQL will confirm the creation of the first graph after several seconds, during which it pushes the catalog information to all services, such as the GSE, GPE and RESTPP.
At this point, we have created a person vertex type, a friendship edge type, and a social graph that includes them. You've now built your first graph schema! Let's take a look at what's in the catalog by typing the ls command in the GSQL shell.
You now have a graph with data! You can run some queries using the built-in REST endpoint calls.
Below we call two functions, stat_vertex_number and stat_edge_number to return the cardinality of each vertex and edge type.
REST endpoints return results in JSON format. JSON is convenient for programs, but raw JSON is hard to read in a bash terminal. The jq tool pretty-prints and filters JSON on the command line. We recommend you install jq and pipe the REST call result through jq before it is output.
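A sketch of the two statistics calls, piped through jq (assuming the default RESTPP port 9000):

```bash
# Count vertices and edges of every type ("*") in the social graph
curl -X POST 'http://localhost:9000/builtins/social' \
     -d '{"function":"stat_vertex_number","type":"*"}' | jq .
curl -X POST 'http://localhost:9000/builtins/social' \
     -d '{"function":"stat_edge_number","type":"*"}' | jq .
```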
If you want to lookup the details about a vertex with its primary_id, you can use the following REST call.
Example. Find a person vertex whose primary_id is "Tom".
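A sketch of the lookup, assuming the default RESTPP port 9000:

```bash
# URL shape: /graph/{graph_name}/vertices/{vertex_type}/{primary_id}
curl -X GET 'http://localhost:9000/graph/social/vertices/person/Tom' | jq .
```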
In similar fashion, we can see details about edges. To describe an edge, you name the vertex and edge types in the two or three path segments of the URL.
Example. Find all friendship edges whose source vertex's primary_id is "Tom".
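A sketch of the edge lookup, again assuming the default RESTPP port 9000:

```bash
# URL shape: /graph/{graph_name}/edges/{source_vertex_type}/{source_id}/{edge_type}
curl -X GET 'http://localhost:9000/graph/social/edges/person/Tom/friendship' | jq .
```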
For more built-in REST endpoints, you can refer to the Built-in Endpoints page.
After creating a graph schema, the next step is to load data into it. The task here is to instruct the GSQL loader how to associate ("map") the fields in a set of data files to the attributes in your vertex types and edge types of the graph schema we just defined.
You should have the two data files person.csv and friendship.csv on your local disk. They do not need to be in the same folder where you run GSQL.
If you need to exit the GSQL shell for any reason, you can do so by typing "quit" without the quotes. Type gsql to enter again.
The loading job below assumes that your data files are in the folder /home/tigergraph. If they are elsewhere, then in the loading job script below replace /home/tigergraph/person.csv and /home/tigergraph/friendship.csv with the corresponding file paths. Assuming you're (back) in the GSQL shell, enter the following set of commands.
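A sketch of the loading job consistent with the walkthrough that follows (the job name load_social and the exact attribute order are assumptions based on this tutorial's schema):

```gsql
USE GRAPH social
BEGIN
CREATE LOADING JOB load_social FOR GRAPH social {
  DEFINE FILENAME file1="/home/tigergraph/person.csv";
  DEFINE FILENAME file2="/home/tigergraph/friendship.csv";

  // By-name column references (requires a header line in the file)
  LOAD file1 TO VERTEX person VALUES ($"name", $"name", $"age", $"gender", $"state")
    USING header="true", separator=",";

  // By-position column references: columns 0, 1, 2, ...
  LOAD file2 TO EDGE friendship VALUES ($0, $1, $2)
    USING header="true", separator=",";
}
END
```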
Let's walk through the commands:
USE GRAPH social
:
Tells GSQL which graph you want to work with.
BEGIN ... END
:
Indicates multiple-line mode. The GSQL shell will treat everything between these markers as a single statement. These markers are only needed in interactive mode. If you run GSQL statements that are stored in a command file, the command interpreter reads your whole file, so it doesn't need the BEGIN and END hints.
CREATE LOADING JOB
:
One loading job can describe the mappings from multiple files to multiple graph objects. Each file must be assigned to a filename variable. The field labels can be either by name or by position. By-name labelling requires a header line in the source file. By-position labelling uses integers to indicate source column position 0, 1,... In the example above, the first LOAD statement refers to the source file columns by name, whereas the second LOAD statement refers to the source file columns by position. Note the following details:
The column "name" in file1 gets mapped to two fields, both the PRIMARY_ID and the "name" attribute of the person vertex.
In file1, gender comes before age. In the person vertex, gender comes after age. When loading, state your attributes in the order needed by the target object (in this case, the person vertex).
Each LOAD statement has a USING clause. Here it tells GSQL that both files contain a header (whether we choose to use the names or not, GSQL still needs to know whether to consider the first line as data or not). It also says the column separator is comma. GSQL can handle any single-character separator, not just commas.
When you run the CREATE LOADING JOB statement, GSQL checks for syntax errors and checks that you have data files in the locations specified. If it detects no errors, it compiles and saves your job.
You can now run your loading job to load data into your graph:
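Assuming the job was named load_social:

```gsql
RUN LOADING JOB load_social
```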
The result is shown below.
Notice the location of the loading log file. The example assumes that you installed TigerGraph in the default location, /home/tigergraph/. In your installation folder is the main product folder, tigergraph. Within the tigergraph folder are several subfolders, such as logs, document, config, bin, and gstore. If you installed in a different location, say /usr/local/, then you would find the product folder at /usr/local/tigergraph.
In this tutorial, we will show you how to create a graph schema, load data in your graph, write simple parameterized queries, and run your queries. Before you start, you need to have installed the TigerGraph system, verified that it is working, and cleared out any previous data. It'll also help to become familiar with our graph terminology.
A graph is a collection of data entities and the connections between them. That is, it's a network of data entities.
Many people call a data entity a node; at TigerGraph we call it a vertex. The plural is vertices. We call a connection an edge. Both vertices and edges can have properties or attributes. The figure below is a visual representation of a graph containing 7 vertices (shown as circles) and 7 edges (the lines).
A graph schema is the model which describes the types of vertices (nodes) and edges (connections) which can appear in your graph. The graph above has one type of vertex (person) and one type of edge (friendship).
A schema diagram looks like a small graph, except each node represents one type of vertex, and each link represents one type of edge.
The friendship loop shows that a friendship is between a person and another person.
For this tutorial, we will create and query the simple friendship social graph shown in Figure Friendship Social Graph. The data for this graph consists of two files in csv (comma-separated values) format. To follow along with this tutorial, please save these two files, person.csv and friendship.csv, to your TigerGraph local disk. In our running example, we use the /home/tigergraph/ folder to store the two csv files.
First, let's check that you can access GSQL.
Open a Linux shell.
Type gsql as below. A GSQL shell prompt should appear as below.
If the GSQL shell does not launch, try resetting the system with "gadmin start all". If you need further help, please see manage TigerGraph with gadmin and TigerGraph Knowledge Base and FAQs .
If this is your first time using GSQL, the TigerGraph data store is probably empty. However, if you or someone else has already been working on the system, there may already be a database. You can check by listing out the database catalog with the "ls" command. This is what it should look like if it is empty:
If the data catalog is not empty, you will need to empty it to start this tutorial. We'll assume you have your coworkers' permission. Use the command DROP ALL to delete all the database data, its schema, and all related definitions. This command takes about a minute to run.
Restarting TigerGraph
If you need to restart TigerGraph for any reason, use the following command sequence:
Running GSQL commands from Linux
You can also run GSQL commands from a Linux shell. To run a single command, just use "gsql" followed by the command line enclosed in single quotes. (The quotes aren't necessary if there is no parsing ambiguity; it's safer to just use them.) For example,
You can also execute a series of commands which you have stored in a file, by simply invoking "gsql" followed by the name of the file.
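For example:

```bash
# Run a single GSQL command from the Linux shell
gsql 'ls'
# Run all the commands stored in a file
gsql hello.gsql
```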
When you are done, you can exit the GSQL shell with the command "quit" (without the quotes).
You have learned a lot in GSQL 101!
With just the knowledge from GSQL 101 and a little practice, you should be able to do the following:
Create a graph schema containing multiple vertex types and edge types.
Define a loading job that takes one or more CSV files and maps the data directly to the vertices and edges of your graph.
Write and run simple parameterized queries which start at one vertex and then traverse one or more hops to generate a final vertex set. Make a simple additive computation and return the results.
Want to learn more?
To learn to do the same types of operations using the GraphStudio Visual SDK and UI, see the TigerGraph GraphStudio UI Guide.
Continue learning Accumulators and Pattern Matching.
To get answers to common questions, see the TigerGraph Knowledge Base and FAQs .
To see the full GSQL specification (whose table of contents will give you an idea of what is available), see the GSQL Language Reference.
To discuss and get help with fellow GSQL users and GSQL developers, join GSQL community forum.
In this exercise, we will go through the 3-step process of writing GSQL: define a schema, load data, and write a query.
This tutorial is written so that you can follow along and perform the steps on your TigerGraph system as your read.