1 of 13

GSQL102 Pattern Matching

Get Set

Introduction

In this tutorial, we will show you how to write and run Pattern Matching queries. Pattern Matching is available in TigerGraph v2.4+.

We assume you have finished GSQL 101. If not, please complete GSQL 101 first.

This tutorial was updated for TigerGraph 3.0. If you are using an older version, please change to the documentation for that version.

What is a Graph Pattern?

A graph pattern is a traversal trace on the graph schema. A pattern can contain repeated steps. A pattern can be a linear trace, or a non-linear trace (tree, circle etc.). For example, imagine a simple schema consisting of a Person vertex type and a Friendship edge type. A pattern could be a trace on this simple schema,

Person - (Friendship) - Person - (Friendship) - Person

or, use *2 to denote the two consecutive Friendship edges,

Person - (Friendship*2) - Person

What is Pattern Matching?

Pattern matching is the process of finding subgraphs in a data graph that conform to a given query pattern.

Prepare Your TigerGraph Environment

We assume you are running your own TigerGraph instance as the sole user with full privileges. If you are on a multiuser Enterprise Edition, consult with your DB administrator. You need to have Designer or Admin privilege on an empty graph. At various points in this tutorial, there are links to download files. Most are small, but the graph data file is 1GB when uncompressed.

First, let's check that you can access GSQL, and that your version is 3.0 or higher.

Open a Linux shell.
Type gsql as below. A GSQL shell prompt should appear as below.
Type version in GSQL shell. It should show 2.4 or higher as below. If not, please download and install the latest version from https://www.tigergraph.com/download/
Linux Shell
```
$ gsql 
GSQL > version
GSQL version: 3.0
```
If the GSQL shell does not launch, try resetting the system with "gadmin start all". This will launch each service if they have not been started yet. If you need further help, please see the manage TigerGraph with gadmin, and TigerGraph Knowledge Base and FAQs.
You need to start from an empty data catalog. If necessary, run drop all to clear the catalog first.

Cheatsheet

The following general use commands were introduced in GSQL 101.

The % prefix indicates Linux shell commands. You need TigerGraph admin privilege to run most gadmin commands.
The GSQL> prefix indicates GSQL shell commands.

Define the Schema

Data Set

We will use the LDBC Social Network Benchmark (LDBC SNB) data set. This data set models a typical social forum, where communities of persons can post messages on a forum to discuss a topic. It comes with a data generator, which allows you to generate data at different scale factors. Scale factor 1 generates roughly 1GB of raw data, scale factor 10 generates roughly 10GB of raw data, etc.

Figure 1 shows the schema (from the LDBC SNB specification). It models the activities and relationships of social forum participants. For example, a forum Member can publish Posts on a Forum, and other Members of the Forum can make a Comment on the Post or on someone else's Comment. A Person's home location is a hierarchy (Continent>Country>City), and a person can be affiliated with a University or a Company. A Tag can be used to classify a Forum, a Message, or a Person's interests. Tags can further be classified by TagClass. The relationships between entities are modeled as directed edges, except person KNOWS person is modeled as an undirected edge. For example, Person connects to Tag by the hasInterest edge. Forum connects to Person by two different edges, hasMember and hasModerator.

LDBC SNB schema uses inheritance to model certain relationships:

Message is the superclass of Post and Comment.
Place is the superclass of City, Country, and Continent.
Organization is the superclass of University and Company.

We do not use the superclasses in our graph model. When there is an edge type connecting an entity to a superclass, we instead create an edge type from the entity to each of the subclasses of the superclass. For example, Message has an isLocatedIn relationship to Country. Since Message has two subclasses, Post and Comment, both connected with Country by the isLocatedIn relationship, we create an edge type IS_LOCATED_IN, connecting both vertex type pairs using the compound edge type available in TigerGraph 3.0.

CREATE DIRECTED EDGE IS_LOCATED_IN ( FROM Comment, TO Country | FROM Post, TO Country )

This new DDL syntax allows a general edge type to be defined over multiple vertex pairs. For example, there are many relationships in the LDBC schema all called isLocatedIn which connect something to a geographical entity. We can model them all as a single edge type IS_LOCATED_IN. The result is a more succinct graph model and less GSQL code when expressing pattern matching queries.

CREATE DIRECTED EDGE IS_LOCATED_IN (FROM Comment, TO Country | FROM Post, TO Country | FROM Company, TO Country | FROM Person, TO City | FROM University, TO City) WITH REVERSE_EDGE="IS_LOCATED_IN_REVERSE"

Schema Naming Conventions

Vertex Type

For each entity in Figure 1 (the rectangular boxes), we create a vertex type with the entity's name in UpperCamelCase.

Person is a person who participates in a forum.
Forum is a place where persons discuss topics.
City, Country, and Continent are geographic locations of other entities.
Company and University are organizations with which a person can be affiliated.
Comment and Post are the interaction messages created by persons in a forum.
Tag is a topic or a concept.
TagClass is a class or a category. TagClass can form a hierarchy of tags.

Edge Type

For each relationship type, we create an edge type with the relationship name (all capitalized and words are separated by an underscore). For example,

CONTAINER_OF: Forum is the container of posts.
HAS_INTERESTS: Person has interest in tag(s).
IS_SUBCLASS_OF: Tag is a subclass of another Tag.

When multiple relationships share the same semantics in Figure 1, we merge them into a single compound edge type. For example:

IS_LOCATED_IN: Comment or Post is located in Country, Company is located in Country, Person is located in City, and University is located in City.

GSQL Schema DDL

Two GSQL scripts for defining the LDBC-SNB schema are shown below. Choose the one that serve your needs. They are not equivalent if you have different organizations using the same graph database instance.

Method 1. Bottom-up DDL

In this method, global vertex/edge type containers are created first. Next, graphs are created to group them. In other words, the global vertex/edge type containers can be shared across graphs.

#LDBC-SNB schema

//clear the current catalog. 
// It may take a while since it restarts the subsystem services. 
DROP ALL

//create vertex types

## Post and Comment
CREATE VERTEX Comment (PRIMARY_ID id UINT, creationDate DATETIME, locationIP STRING,
    browserUsed STRING, content STRING, length UINT) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Post (PRIMARY_ID id UINT, imageFile STRING, creationDate DATETIME,
    locationIP STRING, browserUsed STRING, lang STRING, content STRING,
    length UINT) WITH primary_id_as_attribute="TRUE"
## organisation
CREATE VERTEX Company (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX University (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
## place
CREATE VERTEX City (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Country (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Continent (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
## etc
CREATE VERTEX Forum (PRIMARY_ID id UINT, title STRING, creationDate DATETIME) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Person (PRIMARY_ID id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME, 
   creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>) 
   WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Tag (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX TagClass (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"

// create edge types
CREATE DIRECTED EDGE CONTAINER_OF (FROM Forum, TO Post) WITH REVERSE_EDGE="CONTAINER_OF_REVERSE"
CREATE DIRECTED EDGE HAS_CREATOR (FROM Comment|Post, TO Person) WITH REVERSE_EDGE="HAS_CREATOR_REVERSE"
CREATE DIRECTED EDGE HAS_INTEREST (FROM Person, TO Tag) WITH REVERSE_EDGE="HAS_INTEREST_REVERSE"
CREATE DIRECTED EDGE HAS_MEMBER (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="HAS_MEMBER_REVERSE"
CREATE DIRECTED EDGE HAS_MODERATOR (FROM Forum, TO Person) WITH REVERSE_EDGE="HAS_MODERATOR_REVERSE"
CREATE DIRECTED EDGE HAS_TAG (FROM Comment|Post|Forum, TO Tag) WITH REVERSE_EDGE="HAS_TAG_REVERSE"
CREATE DIRECTED EDGE HAS_TYPE (FROM Tag, TO TagClass) WITH REVERSE_EDGE="HAS_TYPE_REVERSE"
CREATE DIRECTED EDGE IS_LOCATED_IN (FROM Comment, TO Country
                                  | FROM Post, TO Country
                                  | FROM Company, TO Country
                                  | FROM Person, TO City
                                  | FROM University, TO City) WITH REVERSE_EDGE="IS_LOCATED_IN_REVERSE"
CREATE DIRECTED EDGE IS_PART_OF (FROM City, TO Country
                               | FROM Country, TO Continent) WITH REVERSE_EDGE="IS_PART_OF_REVERSE"
CREATE DIRECTED EDGE IS_SUBCLASS_OF (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="IS_SUBCLASS_OF_REVERSE"
CREATE UNDIRECTED EDGE KNOWS (FROM Person, TO Person, creationDate DATETIME)
CREATE DIRECTED EDGE LIKES (FROM Person, TO Comment|Post, creationDate DATETIME) WITH REVERSE_EDGE="LIKES_REVERSE"
CREATE DIRECTED EDGE REPLY_OF (FROM Comment, TO Comment|Post) WITH REVERSE_EDGE="REPLY_OF_REVERSE"
CREATE DIRECTED EDGE STUDY_AT (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="STUDY_AT_REVERSE"
CREATE DIRECTED EDGE WORK_AT (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="WORK_AT_REVERSE"

//create graph type
CREATE GRAPH ldbc_snb (*)

Method 2. Top-down DDL

In this method, an empty graph is created first. Next, local vertex/edge type containers are added to the empty graph via a schema change job. The vertex/egde type containers added this way will be private to the graph, no other graph can see them.

#LDBC-SNB schema

//clear the current catalog. 
// It may take a while since it restarts the subsystem services. 
DROP ALL

# 1. Create graph
CREATE GRAPH ldbc_snb ()

# 2. Create schema_change job to include all vertex/edge types
CREATE SCHEMA_CHANGE JOB change_schema_of_ldbc  FOR GRAPH ldbc_snb {

  ## Post and Comment
  ADD VERTEX Comment (PRIMARY_ID id UINT, creationDate DATETIME, locationIP STRING,
    browserUsed STRING, content STRING, length UINT) WITH primary_id_as_attribute="TRUE";
    
  ADD VERTEX Post (PRIMARY_ID id UINT, imageFile STRING, creationDate DATETIME,
    locationIP STRING, browserUsed STRING, lang STRING, content STRING,
    length UINT) WITH primary_id_as_attribute="TRUE";
  ## organisation
  ADD VERTEX Company (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX University (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ## place
  ADD VERTEX City (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Country (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Continent (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ## etc
  ADD  VERTEX Forum (PRIMARY_ID id UINT, title STRING, creationDate DATETIME) WITH primary_id_as_attribute="TRUE";
  ADD  VERTEX Person (PRIMARY_ID id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME, 
   creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>) 
   WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Tag (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX TagClass (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";

  // create edge types
  ADD DIRECTED EDGE CONTAINER_OF (FROM Forum, TO Post) WITH REVERSE_EDGE="CONTAINER_OF_REVERSE";
  ADD  DIRECTED EDGE HAS_CREATOR (FROM Comment|Post, TO Person) WITH REVERSE_EDGE="HAS_CREATOR_REVERSE";
  ADD  DIRECTED EDGE HAS_INTEREST (FROM Person, TO Tag) WITH REVERSE_EDGE="HAS_INTEREST_REVERSE";
  ADD DIRECTED EDGE HAS_MEMBER (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="HAS_MEMBER_REVERSE";
  ADD DIRECTED EDGE HAS_MODERATOR (FROM Forum, TO Person) WITH REVERSE_EDGE="HAS_MODERATOR_REVERSE";
  ADD DIRECTED EDGE HAS_TAG (FROM Comment|Post|Forum, TO Tag) WITH REVERSE_EDGE="HAS_TAG_REVERSE";
  ADD DIRECTED EDGE HAS_TYPE (FROM Tag, TO TagClass) WITH REVERSE_EDGE="HAS_TYPE_REVERSE";
  ADD  DIRECTED EDGE IS_LOCATED_IN (FROM Comment, TO Country
                                  | FROM Post, TO Country
                                  | FROM Company, TO Country
                                  | FROM Person, TO City
                                  | FROM University, TO City) WITH REVERSE_EDGE="IS_LOCATED_IN_REVERSE";
  ADD DIRECTED EDGE IS_PART_OF (FROM City, TO Country
                               | FROM Country, TO Continent) WITH REVERSE_EDGE="IS_PART_OF_REVERSE";
  ADD DIRECTED EDGE IS_SUBCLASS_OF (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="IS_SUBCLASS_OF_REVERSE";
  ADD UNDIRECTED EDGE KNOWS (FROM Person, TO Person, creationDate DATETIME)
  ADD DIRECTED EDGE LIKES (FROM Person, TO Comment|Post, creationDate DATETIME) WITH REVERSE_EDGE="LIKES_REVERSE";
  ADD DIRECTED EDGE REPLY_OF (FROM Comment, TO Comment|Post) WITH REVERSE_EDGE="REPLY_OF_REVERSE";
  ADD DIRECTED EDGE STUDY_AT (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="STUDY_AT_REVERSE";
  ADD DIRECTED EDGE WORK_AT (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="WORK_AT_REVERSE";
}

# 3. Run schema_change job
RUN SCHEMA_CHANGE JOB change_schema_of_ldbc

# 4. Drop schema_change job
DROP JOB change_schema_of_ldbc

Load Data

Define the Loading Job

Below, we use the GSQL loading language to define a loading job script, which encodes all the mappings from the source CSV file (generated by the LDBC SNB benchmark data generator) to our schema.

The script to load the LDBC-SNB data is below.

# GSQL script for loading LDBC-SNB data

USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  // define vertex
  DEFINE FILENAME v_comment_file;
  DEFINE FILENAME v_post_file;
  DEFINE FILENAME v_organisation_file;
  DEFINE FILENAME v_place_file;
  DEFINE FILENAME v_forum_file;
  DEFINE FILENAME v_person_file;
  DEFINE FILENAME v_tag_file;
  DEFINE FILENAME v_tagclass_file;

  // define edge
  DEFINE FILENAME forum_containerOf_post_file;
  DEFINE FILENAME comment_hasCreator_person_file;
  DEFINE FILENAME post_hasCreator_person_file;
  DEFINE FILENAME person_hasInterest_tag_file;
  DEFINE FILENAME forum_hasMember_person_file;
  DEFINE FILENAME forum_hasModerator_person_file;
  DEFINE FILENAME comment_hasTag_tag_file;
  DEFINE FILENAME post_hasTag_tag_file;
  DEFINE FILENAME forum_hasTag_tag_file;
  DEFINE FILENAME tag_hasType_tagclass_file;
  DEFINE FILENAME organisation_isLocatedIn_place_file;
  DEFINE FILENAME comment_isLocatedIn_place_file;
  DEFINE FILENAME post_isLocatedIn_place_file;
  DEFINE FILENAME person_isLocatedIn_place_file;
  DEFINE FILENAME place_isPartOf_place_file;
  DEFINE FILENAME tagclass_isSubclassOf_tagclass_file;
  DEFINE FILENAME person_knows_person_file;
  DEFINE FILENAME person_likes_comment_file;
  DEFINE FILENAME person_likes_post_file;
  DEFINE FILENAME comment_replyOf_comment_file;
  DEFINE FILENAME comment_replyOf_post_file;
  DEFINE FILENAME person_studyAt_organisation_file;
  DEFINE FILENAME person_workAt_organisation_file;
  
  // load vertex
  LOAD v_comment_file
    TO VERTEX Comment VALUES ($0, $1, $2, $3, $4, $5) USING header="true", separator="|";
  LOAD v_post_file
    TO VERTEX Post VALUES ($0, $1, $2, $3, $4, $5, $6, $7) USING header="true", separator="|";
  LOAD v_organisation_file
    TO VERTEX Company VALUES ($0, $2, $3) WHERE $1=="company",
    TO VERTEX University VALUES ($0, $2, $3) WHERE $1=="university" USING header="true", separator="|";
  LOAD v_place_file
    TO VERTEX City VALUES ($0, $1, $2) WHERE $3=="city",
    TO VERTEX Country VALUES ($0, $1, $2) WHERE $3=="country",
    TO VERTEX Continent VALUES ($0, $1, $2) WHERE $3=="continent" USING header="true", separator="|";
  LOAD v_forum_file
    TO VERTEX Forum VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_person_file
    TO VERTEX Person VALUES ($0, $1, $2, $3, $4, $5, $6, $7, SPLIT($8,";"), SPLIT($9,";")) USING header="true", separator="|";
  LOAD v_tag_file
    TO VERTEX Tag VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_tagclass_file
    TO VERTEX TagClass VALUES ($0, $1, $2) USING header="true", separator="|";
    
  // load edge
  LOAD forum_containerOf_post_file
    TO EDGE CONTAINER_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD person_hasInterest_tag_file
    TO EDGE HAS_INTEREST VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasMember_person_file
    TO EDGE HAS_MEMBER VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD forum_hasModerator_person_file
    TO EDGE HAS_MODERATOR VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD forum_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Forum, $1) USING header="true", separator="|";
  LOAD tag_hasType_tagclass_file
    TO EDGE HAS_TYPE VALUES ($0, $1) USING header="true", separator="|";
  LOAD organisation_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Company, $1 Country) WHERE to_int($1) < 111,
    TO EDGE IS_LOCATED_IN VALUES ($0 University, $1 City) WHERE to_int($1) > 110 USING header="true", separator="|";
  LOAD comment_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Comment, $1 Country) USING header="true", separator="|";
  LOAD post_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Post, $1 Country) USING header="true", separator="|";
  LOAD person_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Person, $1 City) USING header="true", separator="|";
  LOAD place_isPartOf_place_file
    TO EDGE IS_PART_OF VALUES ($0 Country, $1 Continent) WHERE to_int($0) < 111,
    TO EDGE IS_PART_OF VALUES ($0 City, $1 Country) WHERE to_int($0) > 110 USING header="true", separator="|";
  LOAD tagclass_isSubclassOf_tagclass_file
    TO EDGE IS_SUBCLASS_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_knows_person_file
    TO EDGE KNOWS VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_comment_file
    TO EDGE LIKES VALUES ($0, $1 Comment, $2) USING header="true", separator="|";
  LOAD person_likes_post_file
    TO EDGE LIKES VALUES ($0, $1 Post, $2) USING header="true", separator="|";
  LOAD comment_replyOf_comment_file
    TO EDGE REPLY_OF VALUES ($0, $1 Comment) USING header="true", separator="|";
  LOAD comment_replyOf_post_file
    TO EDGE REPLY_OF VALUES ($0, $1 Post) USING header="true", separator="|";
  LOAD person_studyAt_organisation_file
    TO EDGE STUDY_AT VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_workAt_organisation_file
    TO EDGE WORK_AT VALUES ($0, $1, $2) USING header="true", separator="|";
}

Prepare The Raw Data

We have generated a data set with scale factor 1 (approximate 1GB). You can download it from https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

# Linux Bash

wget https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

After downloading the raw file, run the tar command below to decompress the downloaded file.

# Linux Bash

tar -xzf  ldbc_snb_data-sf1.tar.gz

After decompressing the file, you will see a folder named "ldbc_snb_data". Within it, you will see two subfolders

social_network
substitution_parameters

The raw data is in the social_network folder.

Run The Loading Job

Download setup_schema.gsql which combines the schema script and loading job script shown before.

Specify the environment variable LDBC_SNB_DATA_DIR to point to your raw file folder un-tarred in the previous section. In our example below, the raw data is in /home/tigergraph/ldbc_snb_data/social_network, so we use the export shell command to specify its location. Then, start your TigerGraph services if needed. Finally, run the setup_schema.gsql script to create your LDBC Social Network graph.

# Linux Bash

#change the directory to your raw file directory
export LDBC_SNB_DATA_DIR=/home/tigergraph/ldbc_snb_data/social_network/

#start all TigerGraph services
gadmin start all

#setup schema and loading job
gsql setup_schema.gsql

Download the loading job script and invoke it on the command line. #

# Linux Bash

./load_data.sh

# Sample Loading Progress Output

tigergraph/gsql_102$ ./load_data.sh
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
  JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1558053156447
  Loading log: '/mnt/data/tigergraph/logs/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1558053156447.log'

Job "ldbc_snb.load_ldbc_snb.file.m1.1558053156447" loading status
[FINISHED] m1 ( Finished: 31 / Total: 31 )
  [LOADED]
  +----------------------------------------------------------------------------------------------------------------------------------+
  |                                                                              FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  |                       /mnt/data/download/ldbc_snb_data/social_network/comment_0_0.csv |        2052170 |    281 kl/s |     7.28 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_hasCreator_person_0_0.csv |        2052170 |    251 kl/s |     8.17 s|
  |            /mnt/data/download/ldbc_snb_data/social_network/comment_hasTag_tag_0_0.csv |        2698394 |    422 kl/s |     6.38 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_isLocatedIn_place_0_0.csv |        2052170 |    291 kl/s |     7.04 s|
  |       /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_comment_0_0.csv |        1040750 |    253 kl/s |     4.11 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_post_0_0.csv |        1011421 |    248 kl/s |     4.07 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/forum_0_0.csv |          90493 |     87 kl/s |     1.03 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_containerOf_post_0_0.csv |        1003606 |    240 kl/s |     4.18 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_hasMember_person_0_0.csv |        1611870 |    431 kl/s |     3.74 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/forum_hasModerator_person_0_0.csv |          90493 |     89 kl/s |     1.01 s|
  |              /mnt/data/download/ldbc_snb_data/social_network/forum_hasTag_tag_0_0.csv |         309767 |    297 kl/s |     1.04 s|
  |                  /mnt/data/download/ldbc_snb_data/social_network/organisation_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/organisation_isLocatedIn_place_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |                        /mnt/data/download/ldbc_snb_data/social_network/person_0_0.csv |           9893 |      9 kl/s |     1.05 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/person_hasInterest_tag_0_0.csv |         229167 |    223 kl/s |     1.03 s|
  |      /mnt/data/download/ldbc_snb_data/social_network/person_isLocatedIn_place_0_0.csv |           9893 |      9 kl/s |     1.00 s|
  |           /mnt/data/download/ldbc_snb_data/social_network/person_knows_person_0_0.csv |         180624 |    169 kl/s |     1.06 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/person_likes_comment_0_0.csv |        1438419 |    449 kl/s |     3.20 s|
  |             /mnt/data/download/ldbc_snb_data/social_network/person_likes_post_0_0.csv |         751678 |    331 kl/s |     2.27 s|
  |   /mnt/data/download/ldbc_snb_data/social_network/person_studyAt_organisation_0_0.csv |           7950 |      7 kl/s |     1.00 s|
  |    /mnt/data/download/ldbc_snb_data/social_network/person_workAt_organisation_0_0.csv |          21655 |     21 kl/s |     1.00 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/place_0_0.csv |           1461 |      1 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/place_isPartOf_place_0_0.csv |           1455 |      1 kl/s |     1.00 s|
  |                          /mnt/data/download/ldbc_snb_data/social_network/post_0_0.csv |        1003606 |    195 kl/s |     5.14 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_hasCreator_person_0_0.csv |        1003606 |    320 kl/s |     3.13 s|
  |               /mnt/data/download/ldbc_snb_data/social_network/post_hasTag_tag_0_0.csv |         713259 |    341 kl/s |     2.09 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_isLocatedIn_place_0_0.csv |        1003606 |    327 kl/s |     3.07 s|
  |                           /mnt/data/download/ldbc_snb_data/social_network/tag_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/tag_hasType_tagclass_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |                      /mnt/data/download/ldbc_snb_data/social_network/tagclass_0_0.csv |             72 |      71 l/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/tagclass_isSubclassOf_tagclass_0_0.csv |             71 |      70 l/s |     1.00 s|
  +----------------------------------------------------------------------------------------------------------------------------------+

After loading, you can check the graph's size using built-in REST endpoint calls.

Below we call two functions, stat_vertex_number and stat_edge_number to return the cardinality of each vertex and edge type.

Linux shell

curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d  '{"function":"stat_vertex_number","type":"*"}'  | jq .
curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d  '{"function":"stat_edge_number","type":"*"}' | jq .

One-hop patterns

Introduction

‌Pattern matching by nature is declarative. It enables users to focus on specifying what they want from a query without worrying about the underlying query processing.‌

A pattern usually appears in the FROM clause, the most fundamental part of the query structure. The pattern specifies sets of vertex types and how they are connected by edge types. A pattern can be refined further with conditions in the WHERE clause. In this tutorial, we'll start with simple one-hop path patterns, and then extend it multi-hop patterns and finally multiple-path patterns.

Currently, pattern matching may only be used in read-only queries. DML support will be added in the near future release.

Pattern matching queries support nested queries

1-Hop Pattern

The easiest way to understand patterns is to start with a simple 1-Hop pattern. Even a single hop has several options. After we've tackled single hops, then we'll see how to add repetition to make variable length patterns and how to connect single hops to form bigger patterns.

In pattern matching, we use the punctuation -( )- to denote a 1-hop pattern, where the edge type(s) is enclosed in the parentheses () and the hyphens - symbolize connection without specifying direction. Instead, directionality is explicitly stated for each edge type.

For an undirected edge E, no added decoration: E
For a directed edge E from left to right, use a suffix: E>
For a directed edge E from right to left, use a prefix: <E

‌For example, in the LDBC SNB schema, there are two directed relationships between Person and Message: person LIKES message, and message HAS_CREATOR person. Despite the fact that these relationships are in opposite directions, we can include both of them in the same pattern very concisely using an alternation separator | :

Edge Type Wildcards

The underscore _ is a wildcard meaning any edge type. Arrowheads are still used to indicate direction, e.g., _> or <_ or _ The empty parentheses () means any edge, directed or undirected.

Vertex Type Wildcards and Path Symmetry

Prior to TigerGraph 3.0, the source (leftmost) vertex set needed to be defined as an explicit set, prior to the SELECT statement. A typical approach is shown here.

Beginning in TigerGraph 3.0, SYNTAX V2 treats the source vertex set the same as the target vertex set. That is, the source or the target vertex set may be:

a vertex type SELECT t FROM Person:s -(IS_LOCATED_IN>:e) - City:t
an alternation of vertex types SELECT t FROM (Post|Comment):s -(IS_LOCATED_IN>:e) - Country:t
omitted, with only an alias, meaning any vertex type SELECT s FROM :s -(IS_LOCATED_IN>:e) - Country:t
omitted, without an alias, meaning any vertex type SELECT t FROM -(IS_LOCATED_IN>:e) - Country:t

Performance may be better when types are explicitly provided.

Examples of 1-Hop Patterns

FROM X:x - (E1:e1) - Y:y
- E1 is an undirected edge, x and y bind to the end points of E1, and e1 is the alias of E1.
FROM X:x - (E2>:e2) - Y:y
- Right directed edge x binds to the source of E2; y binds to the target of E2.
FROM X:x - (<E3:e3) - Y:y
- Left directed edge; y binds to the source of E3; x binds to the target of E3.
FROM X:x - (_:e) - Y:y
- Any undirected edge between a member of X and a member of Y.
FROM X:x - (_>:e) - Y:y
- Any right directed edge with source in X and target in Y.
FROM X:x - (<_:e) - Y:y
- Any left directed edge with source in Y and target in X.
FROM X:x - ((<_|_):e) - Y:y
- Any left directed or any undirected; "|" means OR, and parentheses enclose the group of edge descriptors; e is the alias for the edge pattern (<_|_).
FROM X:x - ((E1|E2>|<E3):e) - Y:y
- Any one of the three edge patterns.
FROM X:x - () - Y:y
- any edge (directed or undirected)
- Same as (<_|_>|_)

Entering Pattern Match Syntax Mode

To use pattern matching, you need to either set a session parameter or specify it in the query. There are currently two syntax versions for queries:

"v1" is the classic syntax, traversing one hop per SELECT statement. This is the default mode.
"v2" enhances the v1 syntax with pattern matching.

syntax_version Session Parameter

You can use the SET command to assign a value to the syntax_version session parameter: v1 for classic syntax; v2 for pattern matching. If the parameter is never set, the classic v1 syntax is enabled. Once set, the selection remains valid for the duration of the GSQL client session, or until it is changed with another SET command.

Query-Level SYNTAX option

You can also select the syntax by using the SYNTAX clause in the CREATE QUERY statement: v1 for classic syntax (default); v2 for pattern matching. The query-level SYNTAX option overrides the syntax_version session parameter.

Running Anonymous Queries Without Installing

In this tutorial, we will use Interpreted Mode for GSQL, introduced in TigerGraph 2.4. Interpreted mode lets us skip the INSTALL step, and even run a query as soon as we create it, to offer a more interactive experience. These one-step interpreted queries are unnamed (anonymous) and parameterless, just like SQL.

To run an anonymous query, replace the keyword CREATE with INTERPRET. Remember, no parameters:

Recommendation: Increase the query timeout threshold.

Interpreted queries may run slower than installed queries, so we recommend increasing the query timeout threshold:

Examples of 1-Hop Fixed Length Query

Example 1. Find persons who know the person named "Viktor Akhiezer" and return the top 3 oldest such persons.

Syntax Enhancement in TigerGraph 3.0+

In Example 1, "FOR GRAPH ldbc_snb" is not used after () in the query signature. It's an optional component in 3.0+ when "USE GRAPH graphName" is used; Or from the command line, "gsql -g graphName " precedes any query invocation.
In the FROM clause, we directly use vertex type Person as the starting vertex set. This syntax enhancement is available in syntax V2 only.

You can copy the above GSQL script to a file named example1.gsql and invoke this script file in Linux.

Example 2. Find the total number of comments and total number of posts liked by Viktor. A Person can reach Comments or Posts via a directed edge LIKES.

You can copy the above GSQL script to a file named example2.gsql, and invoke this script file in Linux.

Example 3. Solve the same problem as in Example 2, but use a left-directed edge pattern.

Note below (line 8) that the source vertex set are now Comment and Post, and the target is Person.

You can copy the above GSQL script to a file named example3.gsql, and invoke this script file in linux command line. The output should be the same as in Example 2.

Example 4. Find Viktor Akhiezer's total number of related comments and total number of related posts. That is, a comment or post is either created by Viktor or is liked by Viktor. Note that the HAS_CREATOR edge type starts from Comment|Post, and the LIKES edge type starts from Person.

You can copy the above GSQL script to a file named example4.gsql, and invoke this script file in Linux:

Example 5. Find the total number of comments or posts related to "Viktor Akhiezer". This time, we count them together and, we use wildcard "_" to represent the two types of edges: HAS_CREATOR and LIKES_REVERSE. Both are following the same direction.

You can copy the above GSQL script to a file named example5.gsql, and invoke this script file in Linux:

Repeating a 1-Hop Pattern

A common pattern is the two-step "Friend of a Friend". Or, how many entities might receive a message if it is passed up to three times? Do you have any known change of connections to a celebrity?

GSQL pattern matching makes it easy to express such variable-length patterns which repeat a single-hop. Everything else stays the same as introduced in the previous section, except we append an asterisk (or Kleene star for you regular expressionists) and an optional min..max range to an edge pattern.

(E*) means edge type E repeats any number of times (including zero!)
(E*1..3) means edge type E occurs one to three times.

Below are more illustrative examples:

1-hop star pattern — repetition of an edge pattern 0 or more times
1. FROM X:x - (E*) - Y:y
2. FROM X:x - (F>*) - Y:y
3. FROM X:x - (<G*) - Y:y
4. FROM X:x - (_*) - Y:y
  - Any undirected edge can be chosen at each repetition.
5. FROM X:x - (_>*) - Y:y
  - Any right-directed edge can be chosen at each repetition.
6. FROM X:x - (<_*) - Y:y
  - Any left-directed edge can be chosen at each repetition.
7. FROM X:x - ((E|F>|<G)*) - Y:y
  - Either E, F> or <G can be chosen at each repetition.
1-hop star pattern with bounds
1. FROM X:x - (E*2..) - Y:y
  - Lower bounds only. There is a chain of at least 2 E edges.
2. FROM X:x - (F>*..3) - Y:y
  - Upper bounds only. There is a chain of between 0 and 3 F edges.
3. FROM X:x - (<G*3..5) - Y:y
  - Both Lower and Upper bounds. There is a chain of 3 to 5 G edges.
4. FROM X:x - ((E|F>|<G)*3) - Y:y
  - Exact bound. There is a chain of exactly 3 edges, where each edge is either E, F>, or <G.

Remarks

No alias allowed for edge with Kleene star An edge alias may not be used when a Kleene star is used. The reason is that when there are a variable number of edges, we cannot associate or bind the alias to a specific edge in the pattern.
Shortest path semantics When an edge is repeated with a Kleene star, only the shortest matching occurrences are selected. See the example below:

In Figure 2, or Pattern 1 - (E>*) - 4, any of the following paths reach 4 from 1.

1->2->3->4
1->2->3->5->6->2->3->4
any path that goes through the cycle 2->3->5->6->2 two or more times and jumps out at 3.

The first path is shorter than the rest; it is considered the only match.

Examples of Variable Hop Queries

In this tutorial, we will use the Interpreted Mode for GSQL, introduced in TigerGraph 2.4. Interpreted mode lets us skip the INSTALL step, and even run a query as soon as we create it, to offer a more interactive experience. These one-step interpreted queries are unnamed (anonymous) and parameterless, just like SQL.

Example 1. Find the direct or indirect superclass (including the self class) of the TagClass whose name is "TennisPlayer".

Example 1. Directed Edge Pattern Unconstrained Repetition

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*) - TagClass:t
               WHERE s.name == "TennisPlayer";

    PRINT  TagClass1;
}

You can copy the above GSQL script to a file named example1.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example1.gsql

Note below that the starting vertex s, whose name is TennisPlayer, is also a match, using a path with zero hops.

Output of Example 1

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [
    {
      "v_id": "211",
      "attributes": {
        "name": "Person",
        "id": 211,
        "url": "http://dbpedia.org/ontology/Person"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "0",
      "attributes": {
        "name": "Thing",
        "id": 0,
        "url": "http://www.w3.org/2002/07/owl#Thing"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "149",
      "attributes": {
        "name": "Athlete",
        "id": 149,
        "url": "http://dbpedia.org/ontology/Athlete"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "59",
      "attributes": {
        "name": "TennisPlayer",
        "id": 59,
        "url": "http://dbpedia.org/ontology/TennisPlayer"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "239",
      "attributes": {
        "name": "Agent",
        "id": 239,
        "url": "http://dbpedia.org/ontology/Agent"
      },
      "v_type": "TagClass"
    }
  ]}]
}

Example 2. Find the immediate superclass of the TagClass whose name is "TennisPlayer". (This is equivalent to a 1-hop non-repeating pattern.)

Exmaple 2. Exactly 1 Repetition of A Directed Edge

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*1) - TagClass:t
               WHERE s.name == "TennisPlayer";

  PRINT TagClass1;
}

You can copy the above GSQL script to a file named example2.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example2.gsql

Output of Example 2

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [{
    "v_id": "149",
    "attributes": {
      "name": "Athlete",
      "id": 149,
      "url": "http://dbpedia.org/ontology/Athlete"
    },
    "v_type": "TagClass"
  }]}]
}

Example 3. Find the 1 to 2 hops direct and indirect superclasses of the TagClass whose name is "TennisPlayer".

Example 3. 1 to 2 Repetition Of A Directed Edge.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*1..2) - TagClass:t
               WHERE s.name == "TennisPlayer";

    PRINT  TagClass1;
}

You can copy the above GSQL script to a file named example3.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example3.gsql

Output of Example 3

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [
    {
      "v_id": "149",
      "attributes": {
        "name": "Athlete",
        "id": 149,
        "url": "http://dbpedia.org/ontology/Athlete"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "211",
      "attributes": {
        "name": "Person",
        "id": 211,
        "url": "http://dbpedia.org/ontology/Person"
      },
      "v_type": "TagClass"
    }
  ]}]
}

Example 4. Find the superclasses within 2 hops of the TagClass whose name is "TennisPlayer".

Example 4. Up-to 2 Repetition Of A Directed Edge.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*..2) - TagClass:t
               WHERE s.name == "TennisPlayer";

    PRINT  TagClass1;
}

You can copy the above GSQL script to a file named example4.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example4.gsql

Output of Example 4

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [
    {
      "v_id": "211",
      "attributes": {
        "name": "Person",
        "id": 211,
        "url": "http://dbpedia.org/ontology/Person"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "149",
      "attributes": {
        "name": "Athlete",
        "id": 149,
        "url": "http://dbpedia.org/ontology/Athlete"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "59",
      "attributes": {
        "name": "TennisPlayer",
        "id": 59,
        "url": "http://dbpedia.org/ontology/TennisPlayer"
      },
      "v_type": "TagClass"
    }
  ]}]
}

Example 5. Find the superclasses at least one hop from the TagClass whose name is "TennisPlayer".

Example 5. At Least 1 Repetition Of A Directed Edge.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*1..) - TagClass:t
               WHERE s.name == "TennisPlayer";

    PRINT  TagClass1;
}

You can copy the above GSQL script to a file named example5.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example5.gsql

Output of Example 5

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [
    {
      "v_id": "211",
      "attributes": {
        "name": "Person",
        "id": 211,
        "url": "http://dbpedia.org/ontology/Person"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "0",
      "attributes": {
        "name": "Thing",
        "id": 0,
        "url": "http://www.w3.org/2002/07/owl#Thing"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "149",
      "attributes": {
        "name": "Athlete",
        "id": 149,
        "url": "http://dbpedia.org/ontology/Athlete"
      },
      "v_type": "TagClass"
    },
    {
      "v_id": "239",
      "attributes": {
        "name": "Agent",
        "id": 239,
        "url": "http://dbpedia.org/ontology/Agent"
      },
      "v_type": "TagClass"
    }
  ]}]
}

Example 6. Find the 3 most recent comments that are liked or created by Viktor Akhiezer, and the total number of comments related to (created or liked by) Viktor Akhiezer.

Example 6. Disjunctive 1-Repetition Directed Edge.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2{
  SumAccum<INT> @@commentCnt = 0;

  # find top 3 latest comments that is liked or created by Viktor Akhiezer
  # and the total number of comments related to Viktor Akhiezer
  Top3Comments = SELECT p
                 FROM Person:s - ((<HAS_CREATOR|LIKES>)*1) - Comment:p
                 WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
                 ACCUM @@commentCnt += 1
                 ORDER BY p.creationDate DESC
                 LIMIT 3;

  PRINT Top3Comments;
  # total number of comments related to Viktor Akhiezer
  PRINT  @@commentCnt;
}

You can copy the above GSQL script to a file named example6.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example6.gsql

Output of Example 6

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [
    {"Top3Comments": [
      {
        "v_id": "2061584720640",
        "attributes": {
          "browserUsed": "Chrome",
          "length": 4,
          "locationIP": "194.62.64.117",
          "id": 2061584720640,
          "creationDate": "2012-09-06 06:46:31",
          "content": "fine"
        },
        "v_type": "Comment"
      },
      {
        "v_id": "2061586872389",
        "attributes": {
          "browserUsed": "Chrome",
          "length": 90,
          "locationIP": "31.216.177.175",
          "id": 2061586872389,
          "creationDate": "2012-08-28 14:54:46",
          "content": "About Hector Berlioz, his compositions Symphonie fantastique and GraAbout Who Knew, the gu"
        },
        "v_type": "Comment"
      },
      {
        "v_id": "2061590804929",
        "attributes": {
          "browserUsed": "Chrome",
          "length": 83,
          "locationIP": "194.62.64.117",
          "id": 2061590804929,
          "creationDate": "2012-09-04 16:16:56",
          "content": "About Muttiah Muralitharan, mit by nine degrees, five degrees being thAbout Steve M"
        },
        "v_type": "Comment"
      }
    ]},
    {"@@commentCnt": 152}
  ]
}

Example - A Recommender

We have demonstrated the basic pattern match syntax. You should have mastered the basics by this point. In this section, we show two end-to-end solutions using the pattern match syntax.

A Recommendation Application

In this example, we want to recommend some messages (comments or posts) to the person Viktor Akhiezer.

How do we do this?

One way is to find Others who likes the same messages Viktor likes. And then recommend the messages that Others like but Viktor have not seen. The pattern is roughly like below

Viktor - (Likes>) - Message - (<Likes) - Others
Others - (Likes>) - NewMessage
Recommend NewMessage to Viktor

However, this is too fine granularity, and we are overfitting the message level data with a collaborative filtering algorithm. The intuition is that two persons are similar to each other when their "liked" messages fall into the same category (tag). This makes more sense and common than finding two persons that "likes" the same set of messages. As a result, one way to avoid this overfitting is to go one level above. That is, instead of finding common messages as a similarity base, we find common messages' tags as a similarity base. Person A and Person B are similar if they like messages that belong to the same tag. This scheme fixes the overfitting problem. In pattern match vocabulary, we have

Viktor - (Likes>) - Message - (Has>) - Tag - (<Has) - Message - (<Likes) - Others
Others - (Likes>) - NewMessage
Recommend NewMessage to Viktor

GSQL. RecommendMessage Application.

This time, we create the query first, and interpret the query by calling the query name with parameters. If we are satisfied with this query, we can use "install query queryName" to get the compiled query installed which has the best performance.

You can copy the above GSQL script to a file named app1.gsql, and invoke this script file in linux command line.

Install the query

When you are satisfied with your query in the GSQL interpret mode, you can now install it as a generic service which has a much faster speed. Since we have been using "CREATE QUERY .." syntax, the query is added into the catalog, we can set the syntax version and install it.

The above use log-cosine as a similarity measurement. We can also use cosine similarity by using two persons liked messages.

Advanced Features

This section includes some advanced features related to pattern match. It includes using the PER clause to fine control the ACCUM execution, DML support in pattern match, and the conjunctive pattern match syntax which allows multiple patterns in one FROM clause. We dedicate a subsection for each topic.

Per Clause (Beta)

Introduction

Pattern matching produces a virtual match table, and the ACCUM clause acts like a FOREACH loop, executing the clause's statement once for each row of the match table.

Patterns are paths in the graphs, and each row in the match table is a distinct path. However, paths may share some vertices or edges. Some applications do not want to do aggregations per path. Instead, they want to execute the ACCUM clause per distinct group of vertex aliases.

For example, consider the following query which counts the number of paths in a simple 2-hop pattern:

SumAccum<int> @@cnt; 

S = SELECT t 
    FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
    ACCUM @@cnt += 1;

Suppose the query produces the following match table.

(s, edge1, m , edge2, t)//match table schema
v1, e1, v3, e2, v2 //match 1 
v1, e3, v4, e4, v2 //match 2 
v5, e5, v6, e6, v7 //match 3
v8, e7, v9, e8, v7 //match 4

By default, the ACCUM clause will execute the @@cnt += 1statement 4 times, for each row in the match table. The result will be @@cnt = 4.

For the same query, what if the user wants to

count the number of distinct path endings in the match table? For this case, we would want to iterate on the alias t.
count the number of distinct (start, end) pairs in the match table? For that case, we would want to iterate on distinct pairs of the aliases (s, t).

To provide users with this added flexibility and finer control over ACCUM iteration, TigerGraph 3.0 adds the PER clause to pattern matching (V2) syntax.

Syntax

The PER Clause is an optional clause that comes at the start of the ACCUM clause in a SELECT statement. As illustrated below, it starts with the keyword PER, and followed by a pair of parenthesis, in which user can put one or more distinct vertex aliases found in the FROM pattern.

selectBlock := SELECT alias 
               FROM pattern
               [sampleClause]
               [whereClause]
               [[perClause] accumClause]
               [postAccumClause]*
               [havingClause]
               [orderClause]
               [limitClause]
                    
perClause := PER (vertex_alias_1, vertex_alias_2, ...)

Examples. Below are multiple examples of the PER Clause using the same FROM clause.

S1 = SELECT s 
     FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
     PER (s) 
     ACCUM @@cnt += 1;
    
S2 = SELECT t 
     FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
     PER (t)
     ACCUM @@cnt += 1;

S3 = SELECT m 
     FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
     PER (m)
     ACCUM @@cnt += 1;
    
S4 = SELECT t 
     FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
     PER (s, t)
     ACCUM @@cnt += 1;
    
S5 = SELECT t 
     FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
     PER (s, m, t)
     ACCUM @@cnt += 1;

Semantics

The PER Clause specifies a list of vertex aliases, which are used to group the rows in the match table, one group per distinct value of the alias or of the alias list. If there are N distinct groups, we will execute the ACCUM clause N times, once per distinct vertex aliases' binding. Note that the PER clause has no effect on POST-ACCUM clauses semantic, except confining the POST-ACCUM vertex alias.

Suppose s, m, and t are vertex aliases in a pattern. Below are some interpretations of the PER Clause based on the graph element bindings found in the match table.

PER (s) ACCUM means that per each distinct s vertex, execute the ACCUM clause once.
PER (s,t) ACCUM means that per each distinct (s, t) pair, execute the ACCUM clause once.
PER (s,m,t) ACCUM means that per each distinct (s, m, t) tuple, execute the ACCUM clause once.

Examples to show PER clause semantics.

//match table
(s, edge1, m , edge2, t)//schema
v1, e1, v3, e2, v2 //match 1 
v1, e3, v4, e4, v2 //match 2 
v5, e5, v6, e6, v7 //match 3
v8, e7, v9, e8, v7 //match 4

//since we have v1, v5, and v8 three distinct vertices bind to s, 
//we execute ACCUM clause 3 times.  
S1 = SELECT s 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s) 
     ACCUM @@cnt += 1;
  
//since we have v2, v7 two distinct vertices bind to t, 
//we execute ACCUM clause twice.     
S2 = SELECT t 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (t)
     ACCUM @@cnt += 1;
     
//since we have v3, v4, v6, v9 four distinct vertices bind to m, 
//we execute ACCUM clause 4 times. 
S3 = SELECT m
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (m)
     ACCUM @@cnt += 1;
     
//since we have (v1, v2), (v5, v7) and (v8, v7) three distinct vertex pairs 
//bind to (s,t), we execute ACCUM clause 3 times.     
S4 = SELECT t 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s, t)
     ACCUM @@cnt += 1;

    
//since we have (v1, v3, v2), (v1, v4, v2), (v5, v6, v7) and (v8, v9, v7) four
//distinct vertex groups bind to (s,m,t), we execute ACCUM clause 4 times.      
S5 = SELECT t 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s, m, t)
     ACCUM @@cnt += 1;

If the PER Clause is used in a SELECT query block, then the vertex aliases used in the SELECT, ACCUM , and POST-ACCUM clauses must be confined to the aliases that appear in the PER clause.

Below are some illegal cases.

//semantic error. SELECT t, but t doesn't appear in PER clause. 
S1 = SELECT t 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s, m)
     ACCUM @@cnt += 1;
     
//semantic error. ACCUM t.@cnt, but t doesn't appear in PER clause.
S2 = SELECT t 
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s, m)
     ACCUM t.@cnt += 1;
     
//semantic error. POST-ACCUM t.@cnt, but t doesn't appear in PER clause.
S3 = SELECT s
     FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
     PER (s)
     ACCUM s.@cnt += 1
     POST-ACCUM t.@cnt =1;

PER Clause Examples

Example 1. Count the number of Countries that has a City which has a resident that likes a post.

//Example 1. 
USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {
  SumAccum<int> @@cnt;

 R =   SELECT c
       FROM   Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
       PER    (c)
       ACCUM  @@cnt +=1;

 PRINT @@cnt;
}

//results
Using graph 'ldbc_snb'
The query AA is dropped.
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 111}]
}

Example 2. Count the number of posts liked by a person who is located in a city that belongs to a country. (All cities are in a country, but humor us. We are reusing the same FROM pattern in several examples.)

//Example 2. 
USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {
  SumAccum<int> @@cnt;

 R =   SELECT p
       FROM   Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
       PER    (p)
       ACCUM  @@cnt +=1;

 PRINT @@cnt;

//result
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 70668}]
}

Example 3. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country.

//Exmaple 3
USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2{

 MapAccum<string, SumAccum<int>> @@postPerCountry;

 R =   SELECT p
       FROM   Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
       WHERE  c.name in  ("Dominican_Republic","Angola", "Cambodia")
       PER    (c, p)
       ACCUM  @@postPerCountry += (c.name -> 1);

 PRINT @@postPerCountry;
}

//results
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@postPerCountry": {
    "Dominican_Republic": 395,
    "Angola": 12,
    "Cambodia": 4002
  }}]
}

Example 4. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country. Use local accumulators this time.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2{

 SumAccum<int> @postCnt;

 R =   SELECT c
       FROM   Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
       WHERE  c.name in  ("Dominican_Republic","Angola", "Cambodia")
       PER    (c, p) //per (country, post), add 1 to c.@postCnt
       ACCUM  c.@postCnt += 1;

 PRINT R;
}

//results
Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"R": [
    {
      "v_id": "2",
      "attributes": {
        "@postCnt": 12,
        "name": "Angola",
        "id": 2,
        "url": "http://dbpedia.org/resource/Angola"
      },
      "v_type": "Country"
    },
    {
      "v_id": "67",
      "attributes": {
        "@postCnt": 4002,
        "name": "Cambodia",
        "id": 67,
        "url": "http://dbpedia.org/resource/Cambodia"
      },
      "v_type": "Country"
    },
    {
      "v_id": "11",
      "attributes": {
        "@postCnt": 395,
        "name": "Dominican_Republic",
        "id": 11,
        "url": "http://dbpedia.org/resource/Dominican_Republic"
      },
      "v_type": "Country"
    }
  ]}]
}

Performance and Best Practices

The PER Clause not only helps users to control the semantics of the ACCUM clause, it also boosts the performance of the pattern match query, as it uses the PER clause to optimize the query execution.

To get the best performance, we recommend three guidelines for writing efficient queries.

Use PER (target) If Possible

Per target is in general faster than Per source. In the example below, query q2 is faster than q1. The only difference between these two queries is q2's FROM pattern is the flip of q1's FROM pattern.

USE GRAPH ldbc_snb

# not recommended, since it does per (src).
CREATE QUERY q1 () SYNTAX v2 {

  SumAccum<int> @@cnt ;

  T = SELECT c
      FROM Comment:c - (<LIKES) - Person:ps - (IS_LOCATED_IN>) - City:city
      WHERE year(c.creationDate) >= 2006
      PER (c)
      ACCUM @@cnt += 1;

  PRINT @@cnt;
}

# recommended, since it does per (tgt)
CREATE QUERY q2 () SYNTAX v2 {

  SumAccum<int> @@cnt ;

  T = SELECT c
      FROM City:city - (<IS_LOCATED_IN) - Person:ps - (LIKES>) - Comment:c
      WHERE year(c.creationDate) >= 2006
      PER (c)
      ACCUM @@cnt += 1;

  PRINT @@cnt;
}

Write Patterns with Smallest Expected Vertex Set on the Left

The match table is built by traversing the pattern from left to right. Follow the basic principle of pruning early rather than late by orienting the query the smaller cardinality set on the left. This practice will result in producing the least number of candidate matches during the query computation. For example, if there are fewer distinct tags than persons, then query q4 is faster than q3.

USE GRAPH ldbc_snb

# not recommended, since the pattern starts from a large cardinality vertex type
# (Person), and ends at a small cardinality vertex type (Tag). 
CREATE QUERY q3 () SYNTAX v2 {

  SumAccum<int> @cnt;

  V = SELECT s
      FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-:f - (HAS_TAG>) - :t
      PER (s)
      ACCUM s.@cnt += 1;

  PRINT V.size();
}

# recommended, start from small cardinality end (Tag), and use per tgt
CREATE QUERY q4 () SYNTAX v2 {

  SumAccum<int> @cnt;

  V = SELECT s
      FROM Tag:t-(<HAS_TAG)-Forum:f -(CONTAINER_OF>)-Post:p  - (<LIKES)- Person:s
      PER (s)
      ACCUM s.@cnt += 1;

  PRINT V.size();
}

Specify Complete Type Information

Specifying complete type information improves performance. For example, query q6 is faster than q5 even though they are known to be logically identical. Forum is the CONTAINER_OF Post, so it does not need to be specified in q5, but explicitly saying Forum in q6 speeds up performance.

USE GRAPH ldbc_snb

#we do not put Forum befoe :f, even if we know it.
CREATE QUERY q5 () SYNTAX v2 {

  SumAccum<int> @@person_cnt;

  V = SELECT s
      FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-:f - (HAS_TAG>) - :t
      PER (s)
      ACCUM @@person_cnt += 1;

  PRINT @@person_cnt;
}


#recommended: we put Forum as the type info. 
CREATE QUERY q6 () SYNTAX v2 {

  SumAccum<int> @@person_cnt;

  V = SELECT s
      FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-Forum:f - (HAS_TAG>) - :t
      PER (s)
      ACCUM @@person_cnt += 1;

  PRINT @@person_cnt;
}

LDBC Benchmark Queries

Using the PER clause and linear regular path pattern, we have translated all of the LDBC-SNB queries. You can find them on github at https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries_linear/queries. Most of the queries are installed as functions. You can find sample parameter(s) of the functions from https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries/seeds.

Data Modification

Pattern Matching GSQL supports Insert, Update, and Delete operations. The syntax is identical to that in classic GSQL (v1), though the full range of data modification operations are not yet support.

In general, data modification can be at two levels in GSQL:

Top level. The statement does not need to within any other statement.
Within a SELECT query statement. The FROM-WHERE clauses define a match table, and the data modification is performed based on the vertex and edge information in the match table. The GSQL specifications calls these within-SELECT statements DML-sub statements.

Insert, Update, and Delete currently work in compiled mode only (e.g., you must run INSTALL QUERY before RUN QUERY.) Data Modification in interpreted mode is not yet available.
SELECT queries with data modification may only have one POST-ACCUM clause.

INSERT

Pattern matching Insert is supported at both the top-level and within-SELECT levels, using the same syntax as in classic GSQL. You can insert vertices and edges.

For a top-level statement, use INSERT INTO,
Inside an ACCUM or POST-ACCUM clause, use the DML-sub INSERT statement.

Example 1. Create a Person vertex, whose name is Tiger Woods. Next, find Viktor's favorite 2012 posts' authors, whose last name is prefixed with S. Finally, insert KNOWS edges connecting Tiger Woods with Viktor's favorite authors.

USE GRAPH ldbc_snb

#find Viktor's 2012 favorite posts' authors, whose lastName prefix with S.
INTERPRET QUERY() SYNTAX V2 {

  R  =  SELECT t
        FROM Person:s -(LIKES>)- :msg -(HAS_CREATOR>)- Person:t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
         AND t.lastName LIKE "S%" AND year(msg.creationDate) == 2012;

  PRINT R[R.id, R.firstName, R.lastName];
}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"R": [
    {
      "v_id": "8796093025410",
      "attributes": {
        "R.id": 8796093025410,
        "R.firstName": "Priyanka",
        "R.lastName": "Singh"
      },
      "v_type": "Person"
    },
    {
      "v_id": "2199023260091",
      "attributes": {
        "R.id": 2199023260091,
        "R.firstName": "Janne",
        "R.lastName": "Seppala"
      },
      "v_type": "Person"
    },
    {
      "v_id": "15393162796846",
      "attributes": {
        "R.id": 15393162796846,
        "R.firstName": "Mario",
        "R.lastName": "Santos"
      },
      "v_type": "Person"
    }
  ]}]
}

# create a Person node, whose name is tiger,
# and connect this Person with above Victor's favorite authors
CREATE QUERY InsertEdgeAndVertex () SYNTAX v2{

  #add a celebrity person node using INSERT INTO statement.
  INSERT INTO Person VALUES (100000000,"Tiger", "Woods", "m", _, _,_,_,_,_); 
  
  R  =  SELECT t
        FROM Person:s -(LIKES>)- :msg -(HAS_CREATOR>)- Person:t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
         AND t.lastName LIKE "S%" AND year(msg.creationDate) == 2012
        PER (s, t)
        ACCUM
           #add edges connecting "tiger" and t with a 6/1/2020 time stamp
          INSERT INTO KNOWS VALUES(100000000, t, to_datetime("2020-06-01"));
        
  PRINT R [R.id, R.firstName, R.lastName];
}

INSTALL QUERY InsertEdgeAndVertex
RUN QUERY InsertEdgeAndVertex()

You can verify the result by running a simple built-in REST endpoint.

Check the inserted vertex.

Linux Shell

#check the inserted vertex
curl -X GET "http://localhost:9000/graph/ldbc_snb/vertices/Person/100000000" | jq .

#result
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 1
  },
  "error": false,
  "message": "",
  "results": [
    {
      "v_id": "100000000",
      "v_type": "Person",
      "attributes": {
        "id": 100000000,
        "firstName": "Tiger",
        "lastName": "Woods",
        "gender": "m",
        "birthday": "1970-01-01 00:00:00",
        "creationDate": "1970-01-01 00:00:00",
        "locationIP": "",
        "browserUsed": "",
        "speaks": [],
        "email": []
      }
    }
  ]
}

Check the inserted edges.

Linux file

#check the inserted edges using tiger's id (100,000,000)
curl -X GET "http://localhost:9000/graph/ldbc_snb/edges/Person/100000000/KNOWS" | jq .
#result 
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": false,
  "message": "",
  "results": [
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "8796093025410",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-06-01 00:00:00"
      }
    },
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "2199023260091",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-06-01 00:00:00"
      }
    },
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "15393162796846",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-06-01 00:00:00"
      }
    }
  ]
}
#note you can use the vertex lookup API to verify the three connected authors. E.g
curl -X GET "http://localhost:9000/graph/ldbc_snb/vertices/Person/8796093025410" | jq .

UPDATE

Top-level UPDATE statements are not yet supported in syntax v2.
Vertex attributes can only be updated in POST-ACCUM clause, and edge attributes can only be updated in the ACCUM clause.
To perform within-SELECT updates, the FROM pattern can only be a single hop, fixed length pattern.

Example 2. For all KNOWS edges that connect Viktor Akhiezer and his friends whose lastName begins with "S", update the edge creationDate to "2020-10-01". Also, for the Person vertex (Tiger Woods) update the vertex's creationDate and language he speaks.

USE GRAPH ldbc_snb

DROP QUERY UpdateKnowsTS

CREATE QUERY UpdateKnowsTS () SYNTAX v2 {

  # update the vertex tiger's attributes
  # creationDate and languages spoken in POST-ACCUM
  R = SELECT p
      FROM Person:p
      WHERE p.firstName == "Tiger" AND p.lastName == "Woods"
      POST-ACCUM
            # update simple base type attribute
            p.creationDate = to_datetime("2020-6-1"),
            # update  collection-type attribute
            p.speaks = ("english", "golf");

  #DML-sub level, update KNOWS edge attribute "creationDate" in ACCUM
  R  =  SELECT t
        FROM Person:s-(KNOWS:e) -:t
        WHERE s.firstName == "Tiger" and s.lastName == "Woods"
        #update the KNOWS edge time stamp
        ACCUM e.creationDate = to_datetime("2020-10-01");
}

INSTALL QUERY UpdateKnowsTS
RUN QUERY UpdateKnowsTS()

To verify the update, we can use REST calls.

Check Tiger Woods' creationDate and language he speaks.

Linux Shell

curl -X GET "http://localhost:9000/graph/ldbc_snb/vertices/Person/100000000" | jq .
#result
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": false,
  "message": "",
  "results": [
    {
      "v_id": "100000000",
      "v_type": "Person",
      "attributes": {
        "id": 100000000,
        "firstName": "Tiger",
        "lastName": "Woods",
        "gender": "m",
        "birthday": "1970-01-01 00:00:00",
        "creationDate": "2020-06-01 00:00:00",
        "locationIP": "",
        "browserUsed": "",
        "speaks": [
          "english",
          "golf"
        ],
        "email": []
      }
    }
  ]
}

Check KNOWS edges whose source is tiger woods.

Linux Shell

curl -X GET "http://localhost:9000/graph/ldbc_snb/edges/Person/100000000/KNOWS" | jq .

#result
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": false,
  "message": "",
  "results": [
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "8796093025410",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-10-01 00:00:00"
      }
    },
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "2199023260091",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-10-01 00:00:00"
      }
    },
    {
      "e_type": "KNOWS",
      "directed": false,
      "from_id": "100000000",
      "from_type": "Person",
      "to_id": "15393162796846",
      "to_type": "Person",
      "attributes": {
        "creationDate": "2020-10-01 00:00:00"
      }
    }
  ]
}

DELETE

You can use delete () function to delete edges and vertices in ACCUM and POST-ACCUM clauses.

Top-levels DELETE statements are not yet supported in SYNTAX v2.
Edges can only be deleted in the ACCUM clause.
For best performance, vertices should be deleted in the POST-ACCUM clause.
To perform within-SELECT deletes, the FROM pattern can only be a single hop, fixed length pattern.

Example 3. Delete vertex Tiger Woods and its KNOWS edges.

USE GRAPH ldbc_snb

DROP QUERY  DeleteEdgeAndVertex

CREATE QUERY DeleteEdgeAndVertex () SYNTAX v2{

  R  =  SELECT t
        FROM Person:s -(KNOWS:e)- Person:t
        WHERE s.firstName == "Tiger" AND s.lastName == "Woods"
        ACCUM
           //delete edges
           DELETE(e)
        POST-ACCUM DELETE(s); //delete src vertex


  PRINT  R [R.id, R.firstName, R.lastName];
}

INSTALL QUERY DeleteEdgeAndVertex
RUN QUERY DeleteEdgeAndVertex()

To verify the result, you can use built-in REST calls.

curl -X GET "http://localhost:9000/graph/ldbc_snb/vertices/Person/100000000" | jq .
#vertexresults
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": true,
  "message": "The input vertex id '100000000' is not a valid vertex id for vertex type = Person.",
  "code": "601"
}

curl -X GET "http://localhost:9000/graph/ldbc_snb/edges/Person/100000000/KNOWS" | jq .
#edge results
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": true,
  "message": "The input source_vertex_id '100000000' is not a valid vertex id for vertex type = Person.",
  "code": "601"
}

Conjunctive Pattern Matching (Beta)

What is Conjunctive Pattern Matching?

So far, we have described pattern matching as one path pattern in a FROM clause. In this section, we introduce GSQL's capability to match multiple patterns in one FROM clause. This extension is called Conjunctive Pattern Matching (CPM), because the query asks for the conjunction (logical AND) of the patterns. To get a match, all of the patterns must be satisfied, and the patterns can interrelate. Visually, you can think of patterns formed by a set of intersecting line segments. This feature, introduced as a Beta feature in TigerGraph 3.0, enables you to express complex patterns concisely in a single query block.

In general, a CPM query block consists of multiple patterns in the FROM clause. It has a structure illustrated below.

# Conjunctive Pattern Matching Syntax

SelectBlock := SELECT alias 
               FROM pattern
               [sampleClause]
               [whereClause]
               [ [perClause] accumClause]
               [postAccumClause]*
                    ...
                    
pattern := vertexPattern | edgePattern | (pathPattern ["," pathPattern])
# vertexPattern and edgePattern are from classic GSQL

We elaborate on each of the clause.

SELECT Clause

The SELECT clause selects only one vertex alias from all the patterns in the FROM clause.

FROM Clause - Conjunctive Matching

This is where the conjunctive matching is expressed. The FROM clause consists of a list of path patterns, which are separated by commas. Evaluating each pattern against the underlying graph data produces a match table. If two patterns share a vertex alias, then we form the natural join of the two match tables.

For example, consider this CPM:

FROM X:x - (E1:e1) - Y:y - (E2>:e2) - Z:z,
     Z:z - (E3:e3) - U:u - (E4>:e4) - V:v

The first pattern's variables are x, e1, y, e2, and z; the second pattern's variables are z, e3, u, e4, and v. Considering the two patterns independently would yield the follwing match table schemas:

#first match table
(x, e1, y, e2, z)
#second match table
(z, e3, u, e4, v)

Natural Join of Match Tables

Natural joining two match tables compares all the shared vertex aliases between the two tables, and the resulting joined table contains all non-shared variables plus one copy of each of the shared vertex variables. Here is the match table for the CPM above:

#natural join result; the shared vertex variable z appears once.
(x, e1, y, e2, z, e3, u, e4, v)

Valid Conjunctive Patterns

The match table of the conjunctive pattern match is the natural join of all the patterns' match tables. By design, a row in the CPM match table must simultaneously satisfy all the match tables.

If the match tables of the patterns in a FROM clause can be naturally joined into one match table, then the FROM clause has a valid CPM input. Otherwise, the FROM clause has an invalid pattern input list.

For example, below we show two valid CPM inputs and one invalid CPM input.

# a valid CPM, since the two patterns natrually join on :tgt
SELECT
FROM Person:p - (KNOWS) - :tgt, 
     Post:s -(<LIKES) - :tgt 

# a valid CPM, since the two patterns naturally join on :f
SELECT
FROM Person:p - (KNOWS) - :f - (LIKES>) - Post:tgt, 
     :f - (LIKES>) - Comment:c 

# an invalid CPM, since the two patterns do not share any vertex variables.
# they cannot be naturally joined. 
# One pattern has (p, tgt); the other has (s, t). 
SELECT
FROM Person:p - (KNOWS) - :tgt, 
     Post:s - (<LIKES) - Person:t

WHERE Clause

The predicates in the WHERE clause can use any of the vertex or edge aliases in any of the patterns, including predicates which combine variables from different constituent paths. CPM queries do not have any special restrictions on the WHERE predicate. Distance matters, however, for performance. Conditions that are local, measured both cross-path and within-path, can be resolved earlier and therefore are faster.

In the example below, x2.age > x4.age is a cross-pattern predicate, e1.timestamp < e3.timestamp is a cross-pattern predicate, and x1.gender == x4.gender is a local predicate of the second pattern.

FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3,
     X1:x1-(E3:e3)-X4:x4
WHERE x2.age > x4.age AND e1.timestamp < e3.timestamp AND x1.gender == x4.gender

ACCUM Clause

You can ACCUM to any vertex variable in a CPM block.

The ACCUM clause by default will execute as many times as the row (match) count of the CPM match table; each execution uses one row from the match table.

ACCUM To The Three Vertex Variables of A CPM Pattern

#accum to x1, x2 and x4. 
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3, 
     X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity

POST-ACCUM Clause

POST-ACCUM for CPM behaves the same as POST-ACCUM for single path patterns. That is, each POST-ACCUM clause can refer to one vertex alias, and the clause executes iteratively over that vertex set (e.g. one vertex column in the matching table). Its statements can access the aggregated accumulator result computed in the ACCUM clause. The query can have multiple POST-ACCUM clauses, one for each vertex alias you wish to work on. The multiple POST-ACCUM clauses are processed in parallel; it doesn't matter in what order you write them. (For each binding, the statements within a clause are executed in order.)

For example, below we have three POST-ACCUM clauses. The first one iterates throughx1, and for each x1, we do @@cnt += x1.@cnt. The second and third POST-ACCUMs iterate through x2 and x3 respectively, and accumulates their @cnt accumulator value into @@cnt.

POST-ACCUM to a global accumulator @@cnt, using three CPM Vertex Variables


FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3, 
     X1:x1-(E3:e3)-X4:x4
ACCUM x1.@cnt +=1, x2.@cnt += x3.quantity, x4.@cnt += x3.quantity
POST-ACCUM @@cnt += x1.@cnt
POST-ACCUM @@cnt += x2.@cnt
POST-ACCUm @@cnt += x3.@cnt;

Examples

Example 1. Find Viktor Akhiezer's liked messages (100+ days after their creation) whose author's last name begin with letter S. Output the message's forum.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT f
        FROM Person:s - (LIKES>:e1) - :msg - (HAS_CREATOR>) - Person:t, 
             Forum:f - (CONTAINER_OF>:e2) - :msg
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" 
              AND t.lastName LIKE "S%" 
              AND e1.creationDate >DATETIME_ADD(msg.creationDate, INTERVAL 100 DAY);

  PRINT F;
}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"F": [{
    "v_id": "962072688797",
    "attributes": {
      "id": 962072688797,
      "title": "Album 12 of Mario Santos",
      "creationDate": "2011-04-12 09:36:50"
    },
    "v_type": "Forum"
  }]}]
}

Example 2. Find any authors who wrote posts that Viktor Akhiezer's liked and whose last name begins with S. Find the country for each of these authors and report on the countries.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  C  =  SELECT ctry
        FROM Person:s - (LIKES>:e1) - Post:msg - (HAS_CREATOR>) - Person:t,
             :t - (WORK_AT>:e2) - Company:c, 
             :c - (IS_LOCATED_IN>) - Country:ctry
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
              AND t.lastName LIKE "S%" ;

  PRINT C;
}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"C": [{
    "v_id": "93",
    "attributes": {
      "name": "Portugal",
      "id": 93,
      "url": "http://dbpedia.org/resource/Portugal"
    },
    "v_type": "Country"
  }]}]
}

Example 3. Given a TagClass and a Country, find all the Forums created in the given Country, containing at least one Post with Tags belonging directly to the given TagClass. The location of a Forum is identified by the location of the Forum’s moderator.

USE GRAPH ldbc_snb

DROP QUERY bi_4

CREATE QUERY bi_4(string tcName, string cName) for graph ldbc_snb syntax v2 {
  SetAccum<vertex<Post>> @postSet;
  SumAccum<int> @personId, @postCount;

  ForumSet =
    SELECT f
    FROM Forum:f -(HAS_MODERATOR>)- Person:a -(IS_LOCATED_IN>.IS_PART_OF>)- Country:c,
         :f -(CONTAINER_OF>)- Post:p -(HAS_TAG>.HAS_TYPE>)- TagClass:tc
    WHERE c.name == cName and tc.name == tcName
    ACCUM f.@personId = a.id, f.@postSet += p
    POST-ACCUM f.@postCount = f.@postSet.size(), f.@postSet.clear()
    ORDER BY f.@postCount DESC, f.id ASC
    LIMIT 3;

  PRINT ForumSet[ForumSet.id, ForumSet.title, ForumSet.creationDate,
                 ForumSet.@personId, ForumSet.@postCount];
}

INSTALL QUERY bi_4

RUN QUERY bi_4("MusicalArtist", "Burma")

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"ForumSet": [
    {
      "v_id": "81903",
      "attributes": {
        "ForumSet.title": "Wall of Donald Steele-Perkins",
        "ForumSet.@personId": 5226,
        "ForumSet.id": 81903,
        "ForumSet.@postCount": 65,
        "ForumSet.creationDate": "2010-02-15 06:48:04"
      },
      "v_type": "Forum"
    },
    {
      "v_id": "137438953686",
      "attributes": {
        "ForumSet.title": "Wall of Eric Law-Yone",
        "ForumSet.@personId": 2199023262994,
        "ForumSet.id": 137438953686,
        "ForumSet.@postCount": 65,
        "ForumSet.creationDate": "2010-04-25 22:10:32"
      },
      "v_type": "Forum"
    },
    {
      "v_id": "687194810508",
      "attributes": {
        "ForumSet.title": "Wall of Hector Hugh Michie",
        "ForumSet.@personId": 10995116283784,
        "ForumSet.id": 687194810508,
        "ForumSet.@postCount": 39,
        "ForumSet.creationDate": "2010-12-19 15:33:30"
      },
      "v_type": "Forum"
    }
  ]}]
}

Example 4. For a given country, count all the distinct triples of Persons such that:

a is a friend of b.
b is a friend of c
c is a friend of a.

Distinct means that if a certain 3 vertices appear once in the results, it will not be repeated: it will appear only once. KNOWS is an undirected relationship, so it doesn't matter in what order we list the 3 vertices.


USE GRAPH ldbc_snb

CREATE QUERY bi_17(string cName) FOR GRAPH ldbc_snb SYNTAX v2 {
  TYPEDEF TUPLE <uint a, uint b, uint c> triplet;
  SetAccum<triplet> @@tripletSet;
  SumAccum<int> @@tripletCount;

  C =
    SELECT c
    FROM Country:c -(<IS_PART_OF.<IS_LOCATED_IN)- Person:p1,
         :c -(<IS_PART_OF.<IS_LOCATED_IN)- Person:p2,
         :c -(<IS_PART_OF.<IS_LOCATED_IN)- Person:p3,
         :p1 -(KNOWS)- :p2 -(KNOWS)- :p3 -(KNOWS)- :p1
    WHERE c.name == cName AND p1.id < p2.id AND p2.id < p3.id
    ACCUM @@tripletSet += triplet(p1.id, p2.id, p3.id);

  @@tripletCount = @@tripletSet.size();
  @@tripletSet.clear();
  PRINT @@tripletCount;
}


INSTALL QUERY bi_17

RUN QUERY bi_17("Spain")

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@tripletCount": 242}]
}

More Examples. We translated LDBC-SNB BI and IC queries using CPM, and shared the translation in github. Please refer to the query translation here. Most of the queries are installed as functions, you can find sample parameter(s) of the functions from here.

Source Vertex Set Flexibility

As mentioned when we first described pattern matching, in One-hop patterns, the source (leftmost) vertex set can be a vertex type, an alternation of types, or even omitted.

Example 1. Find Viktor Akhiezer's favorite messages' creators whose last name begins with letter S. Count them.


USE GRAPH ldbc_snb

#start from a vertex type "Person"
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT t
        FROM Person:s -(LIKES>:e1)- :msg -(HAS_CREATOR>)- Person:t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" 
              AND t.lastName LIKE "S%"
        POST-ACCUM @@cnt+=1;

  PRINT  @@cnt;

}
#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 8}]
}

Example 2. Same query as example 1, but without beginning with vertex types. GSQL compiler can infer the types of :s.

USE GRAPH ldbc_snb

#both end points of the pattern do not have vertex types.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT t
        FROM :s -(LIKES>:e1)- :msg -(HAS_CREATOR>)- :t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" AND t.lastName LIKE "S%"
        POST-ACCUM @@cnt+=1;

  PRINT  @@cnt;

}
#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 8}]
}

Example 3. Count the LIKES edge.

USE GRAPH ldbc_snb

# a pattern starts without any information.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT msg
        FROM  -(LIKES>:e1)- :msg
        ACCUM @@cnt+=1;

  PRINT  @@cnt;

}
#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 2190095}]
}

Summary

We have covered a lot of territory in GSQL 102:

Showed how to invoke GSQL Pattern Matching syntax
Explained how Pattern Matching extends the classic FROM clause grammar:
- Each hop can be a choice of multiple, individually directed edge types
- The Kleene star and a min...max range enable each hop to be repeated.
- GSQL automatically finds the shortest paths that satisfy a variable length path.
- A virtual match table has a column for each vertex or edge alias in a multi-hop path, and a row for each graph path that satisfies the pattern.
- The ACCUM clause iterates on each row in the match table.
- A POST-ACCUM clause iterates on one vertex alias; a query can have multiple POST-ACCUM clauses.
Described the improvements to Pattern Matching in TigerGraph 3.0:
- The source (leftmost) vertex set can be specified with the same flexibility as the other vertex sets: a vertex type, an alteration of types, or omitted. Explicit seed sets are no longer needed
- Restrictions on which vertex aliases may be used in the ACCUM clause have been lifted.
- Described three major advanced options:
  - The PER <vertex alias> clause enables users to fine tune the ACCUM iteration.
  - Data modification (insert, update, delete) are now supported.
  - Conjunctive Pattern Matching let users express a complex pattern as a set of path patterns which must all be satisfied.
Provided best practices for writing queries, especially pattern matching queries:
- Put the smaller vertex set on the left end.
- Specify all vertex and edge types explicitly.
- Use the PER clause to reduce the match table size
Provided numerous examples and the full set of LDBC Social Network benchmark queries.

With a little practice, you will be writing GSQL pattern matching queries to efficiently solve real-world problems. You can post your feedback and questions on the GSQL community forum. Our community members and developers love to hear any feedback from your graph journey of using GSQL and are ready to help clarifying any doubts.

Multiple Hop Patterns and Accumulation

Multiple Hop Pattern Shortest Path Semantics

Repeating the same hop is useful sometimes, but the real power of pattern matching comes from expressing multi-hop patterns, with specific characteristics for each hop. For example, the well-known product recommendation phrase "People who bought this product also bought this other product", is expressed by the following 2-hop pattern:

FROM This_Product:p -(<Bought:b1)- Customer:c -(Bought>:b2)- Product:p2
WHERE p2 != p

As you see, a 2-hop pattern is a simple concatenation and merging of two 1-hop patterns where the two patterns share a common endpoint. Below, Y:y is the connecting end point.

2-hop pattern

FROM X:x - (E1:e1) - Y:y - (E2>:e2) - Z:z

Similarly, a 3-hop pattern concatenates three 1-hop patterns in sequence, each pair of adjacent hops sharing one end point. Below, Y:y and Z:z are the connecting end points.

3-hop pattern

FROM X:x - (E2>:e2) - Y:y - (<E3:e3) - Z:z - (E4:e4) - U:u

In general, we can connect N 1-hop patterns into a N-hop pattern. The database will search the graph topology to find subgraphs that match this N-hop pattern.

Unnamed Intermediate Vertex Set

A multi-hop pattern has two endpoint vertex sets and one or more intermediate vertex sets. If the query does not need to express any conditions for an intermediate vertex set, then the vertex set can be omitted and the two surrounding edge sets can be joined with a simple ".". For example, in the 2-hop pattern example above, if we do not need to specify the type of the intermediate vertex Y, nor need to refer to it in any of the query's other clauses (such as WHERE or ACCUM), then the pattern can be reduced as follows:

FROM X:x - (E1.E2>) - Z:z

Note that when we abbreviate that path in this way, we do not support aliases for the edges or intermediate vertices in the abbreviated section.

Shortest Paths Only for Variable Length Patterns

If a pattern has a Kleene star to repeat an edge, GSQL pattern matching selects only the shortest paths which match the pattern. If we did not apply this restriction, computer science theory tells us that the computation time could be unbounded or extreme (NP = non-polynomial, to be technical). If we instead matched ALL paths regardless of length when a Kleene star is used without an upper bound, there could be an infinite number of matches, if there are loops in the graph. Even without loops or with an upper bound, the number of paths to check grows exponentially with the number of hops.

For the pattern 1 - (_*) - 5 in Figure 3 above, you can see the following:

There are TWO shortest paths: 1-2-3-4-5 and 1-2-6-4-5
- These have 4 hops, so we can stop searching after 4 hops. This makes the task tractable.
If we search for ALL paths which do not repeat any vertices:
- There are THREE non-repeated-vertex paths: 1-2-3-4-5, 1-2-6-4-5, and 1-2-9-10-11-12-4-5
- The actual number of matches is small, but the number of paths is theoretically very large.
If we search for ALL paths which do not repeat any edges:
- There are FOUR non-repeated-edge paths: 1-2-3-4-5, 1-2-6-4-5, 1-2-9-10-11-12-4-5, and 1-2-3-7-8-3-4-5
- The actual number of matches is small, but number of paths to consider is NP.
If we search for ALL paths with no restrictions:
- There are an infinite number of matches, because we can go around the 3-7-8-3 cycle any number of times.

Additional Details about Pattern Matching

In the early version of Pattern Matching (TigerGraph v2.4 to v2.6), there were a number of restrictions on the WHERE, ACCUM and POST-ACCUM clauses In TigerGraph 3.0, most of these restrictions are lifted.

Each vertex set or edge set in a pattern (except edges with Kleene stars) can have an alias variable associated with it. When the query runs and finds matches, it associates, or binds, each alias to the matching vertices or edges in the graph.

SELECT Clause

The SELECT clause specifies the output vertex set of a SELECT statement. For a multiple-hop pattern, we can select any vertex alias in the pattern. The example below shows the 4 possible choices for the given pattern:

SELECT Clause Can Select Any Vertex Alias

#select starting end point x
SELECT x
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#select y
SELECT y
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#select z
SELECT z
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#select ending end point u
SELECT u
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

FROM Clause

For a multiple-hop pattern, if you don't need to refer to the intermediate vertex points, you can just use "." to connect the edge patterns, giving a more succinct representation. For example, below we remove y and z, and connect E2>, <E3 and E4 using the period symbol. Note that you cannot have an alias for a multi-hop sequence like E2>.<E3.E4.

Omitting intermediate vertex y and z.

#select starting end point x
SELECT x
FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u;

#if we don't need to access y, z, we can write
SELECT u
FROM X:x-(E2>.<E3.E4)-U:u;

WHERE Clause

Beginning with TigerGraph v3.0, each predicate (simple true/false condition) can refer to any of the aliases in the path. As with any database query, more complex conditions may not be as performant as simpler queries with simpler, more local predicate conditions. Consider the pattern and query below:

FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3-(E3:e3)-X4:x4

WHERE Clause Support both Local Predicate and Cross-hop predicates.

# (x1, e1, x2) belongs to the 1st-hop
# (x2, e2, x3) belongs to the 2nd-hop
# (x3, e3, x4) belongs to the last-hop
# below x1.age > x2.age is a local predicate
# x2.@cnt != x4.@cnt is a cross-hop predicate
# (x1.salary + x3.salary) < x4.salary is a cross-hop predicate
SELECT x
FROM X1:x1-(E1:e1)-X2:x2-(E2:e2)-X3:x3-(E3:e3)-X4:x4
WHERE x1.age>x2.age AND x2.@cnt!=x4.@cnt AND (x1.salary+x3.salary)<x4.salary

Path Patterns as a Regular Expression Language

GSQL's pattern matching syntax provides the essentials for a regular expression language for paths in graphs. Consider the three basic requirements for a regular expression language:

The empty set --> A path of length zero (no match)
Concatenation --> Form a path by adding one on two another. You can write an N-hop pattern, and M-hop pattern, and then combine them to have a (N+M)-hop pattern.
Alternation (either-or) --> You can use alternation for both vertex sets and edge sets, e.g. FROM (Source1 | Source2) -(Edge1> | <Edge 2)- (Target1 | Target2) Note: This is not the same as FROM (Source1 -(Edge1>)- Target 1) | (Source2 -(<Edge2)- Target 2) The latter can be achieved by writing two SELECT query blocks and getting the UNION of their results.

Working with Your Pattern Matches

The point of pattern matching is to identity sets of graph entities that match your input pattern. Once you've done that, GSQL enables you to do advanced and efficient computation on that data, from simply counting the matches to advanced algorithms and analytics. This section compares accumulation in the current Pattern Matching syntax to earlier versions, but it does not attempt to explain accumulators in full. You may want to consult the Accumulators Tutorial and and the GSQL Language Reference's section on the ACCUM and POST-ACCUM clauses.

ACCUM Clause

TigerGraph 3.0 removes the Pattern Matching (SYNTAX v2)-related restrictions on the ACCUM and POST-ACCUM clause.

Just as in classic GSQL syntax, the ACCUM clause it executed once (in parallel) for each set of vertices and edges in the graph which match the pattern and constraints given in the FROM and WHERE clauses. You can think of FROM-WHERE as producing a virtual table. The columns of this matching table are the alias variables from the FROM clause pattern, and the rows are each possible set of vertex and edge aliases (e.g. a path) which fit the pattern.

A simple pattern 1-hop pattern, which could be syntax v1 or v2, like this:

FROM Person:A -(IS_LOCATED_IN:B)- City:C

produces a match table with 3 columns: A, B, and C. Each row is a tuple (A,B,C) where there is a has_lived_in edge B from a Person vertex A to a City vertex C. We say that the match table provides a binding between the pattern aliases and graph's vertices and edges. A multi-hop pattern simply has more columns than a 1-hop pattern.

The ACCUM clause iterates through ALL matches. If you do not have an alias on every vertex in the pattern, then the number of distinct matches may be less than that number of matches.

For, example, consider

FROM Person:A -(KNOWS.KNOWS)- Person.C 
WHERE C.email = "Andy@www.com"
ACCUM C.@patternCount += 1

This asks who are the friends of friends of Andy@www.com. Suppose Andy knows 3 persons (Larry, Moe, and Curly) who know Wendy. The accumulator C.@patternCount will be incremented 3 times for C = Wendy. This is similar to a SQL SELECT C, COUNT(*) ... GROUP BY C query. There is no alias for the vertex in the middle of KNOWS.KNOWS so the identities of Larry, Moe, and Curly cannot be reported.

POST-ACCUM Clause

As of TigerGraph 3.0, Pattern Matching (V2) syntax supports multiple POST-ACCUM clauses.

At the end of the ACCUM clause, all the requested accumulation (+=) operators are processed in bulk, and the updated values are now visible. You can now use POST-ACCUM clauses to perform a second, different round of computation on the results of your pattern matching.

The ACCUM clause executes for each full path that matches the pattern in the FROM clause. In contrast, the POST-ACCUM clause executes for each vertex in one vertex set (e.g. one vertex column in the matching table); its statements can access the aggregated accumulator result computed in the ACCUM clause. New for v3.0, if you want to perform per-vertex updates for more than one vertex alias, you should use a separate POST-ACCUM clause for each vertex alias. The multiple POST-ACCUM clauses are processed in parallel; it doesn't matter in what order you write them. (For each binding, the statements within a clause are executed in order.)

For example, below we have two POST-ACCUM clauses. The first one iterates through s, and for each s, we do s.@cnt2 += s.@cnt1. The second POST-ACCUM iterations through t.

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @cnt1;
  SumAccum<int> @cnt2;

  R   =  SELECT s
         FROM Person:s-(LIKES>) -:msg - (HAS_CREATOR>)-Person:t
         WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
               AND t.lastName LIKE "S%" AND year(msg.creationDate) == 2012
         ACCUM s.@cnt1 +=1 //execute this per match of the FROM pattern.
         POST-ACCUM s.@cnt2 += s.@cnt1 //execute once per s. 
         POST-ACCUM t.@cnt2 +=1;//execute once per t

  PRINT R [R.firstName, R.lastName, R.@cnt1, R.@cnt2];
}

which produces the result

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [
    {"R": [{
      "v_id": "28587302323577",
      "attributes": {
        "R.firstName": "Viktor",
        "R.@cnt1": 3,
        "R.lastName": "Akhiezer",
        "R.@cnt2": 3
      },
      "v_type": "Person"
    }]},
  ]
}

However, the following is not allowed, since it involves two aliases (t and s) in one POST-ACCUM clause.

 POST-ACCUM t.@cnt1 += 1, 
            s.@cnt1 += 1

Also, you may not use more than one alias in a single assignment. The following is not allowed:

 POST-ACCUM t.@cnt1 += s.@cnt + 1

Examples of Multiple Hop Pattern Match

Example 1. Find the 3rd superclass of the Tag class whose name is "TennisPlayer".

Example1. Succinct Representation Of Multiple-hop Pattern

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 = 
       SELECT t
       FROM TagClass:s-(IS_SUBCLASS_OF>.IS_SUBCLASS_OF>.IS_SUBCLASS_OF>)-TagClass:t
       WHERE s.name == "TennisPlayer";

  PRINT TagClass1;
}

You can copy the above GSQL script to a file named example1.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example1.gsql

Output of Example 1

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"TagClass2": [{
    "v_id": "239",
    "attributes": {
      "name": "Agent",
      "id": 239,
      "url": "http://dbpedia.org/ontology/Agent"
    },
    "v_type": "TagClass"
  }]}]
}

Example 2. Find in which continents were the 3 most recent messages in Jan 2011 created.

Example1. Disjunction In A Succinct Representation Of Multiple-hop Pattern

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2{

  SumAccum<String> @continentName;

  accMsgContinent = 
                 SELECT s
                 FROM (Comment|Post):s-(IS_LOCATED_IN>.IS_PART_OF>)-Continent:t
                 WHERE year(s.creationDate) == 2011 AND month(s.creationDate) == 1
                 ACCUM s.@continentName = t.name
                 ORDER BY s.creationDate DESC
                 LIMIT 3;

  PRINT accMsgContinent;
}

You can copy the above GSQL script to a file named example2.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example2.gsql

Output of Example 2

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"accMsgContinent": [
    {
      "v_id": "824640012997",
      "attributes": {
        "browserUsed": "Firefox",
        "length": 7,
        "locationIP": "27.112.21.246",
        "@continentName": "Asia",
        "id": 824640012997,
        "creationDate": "2011-01-31 23:54:28",
        "content": "no way!"
      },
      "v_type": "Comment"
    },
    {
      "v_id": "824636727408",
      "attributes": {
        "browserUsed": "Firefox",
        "length": 3,
        "locationIP": "31.2.225.17",
        "@continentName": "Europe",
        "id": 824636727408,
        "creationDate": "2011-01-31 23:57:46",
        "content": "thx"
      },
      "v_type": "Comment"
    },
    {
      "v_id": "824634837528",
      "attributes": {
        "imageFile": "",
        "browserUsed": "Internet Explorer",
        "length": 115,
        "locationIP": "87.251.6.121",
        "@continentName": "Asia",
        "id": 824634837528,
        "creationDate": "2011-01-31 23:58:03",
        "lang": "tk",
        "content": "About Adolf Hitler, iews. His writings and methods were often adapted to need and circumstance, although there were"
      },
      "v_type": "Post"
    }
  ]}]
}

Example 3. Find Viktor Akhiezer's favorite author of 2012 whose last name begins with character 'S'. Also find how many LIKES Viktor has given to the author's post or comment.

Example 3. Multiple-hop Pattern With Accumulator Applied To All Matched Paths

USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2{
  SumAccum<int> @likesCnt;

  FavoriteAuthors = 
            SELECT t
            FROM Person:s-(LIKES>) -:msg - (HAS_CREATOR>)-Person:t
            WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer"
                      AND t.lastName LIKE "S%" AND year(msg.creationDate) == 2012
            ACCUM t.@likesCnt +=1;

  PRINT FavoriteAuthors[FavoriteAuthors.firstName, FavoriteAuthors.lastName, FavoriteAuthors.@likesCnt];
}

You can copy the above GSQL script to a file named example3.gsql, and invoke this script file in a Linux shell.

Linux Bash

gsql example3.gsql

Output of Example 3

Using graph 'ldbc_snb'
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"FavoriteAuthors": [
    {
      "v_id": "8796093025410",
      "attributes": {
        "FavoriteAuthors.firstName": "Priyanka",
        "FavoriteAuthors.lastName": "Singh",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    },
    {
      "v_id": "2199023260091",
      "attributes": {
        "FavoriteAuthors.firstName": "Janne",
        "FavoriteAuthors.lastName": "Seppala",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    },
    {
      "v_id": "15393162796846",
      "attributes": {
        "FavoriteAuthors.firstName": "Mario",
        "FavoriteAuthors.lastName": "Santos",
        "FavoriteAuthors.@likesCnt": 1
      },
      "v_type": "Person"
    }
  ]}]
}

Multi-Block Queries

We have shown how complex multi-hop patterns, containing even a conjunctive of patterns, can be expressed in a single FROM clause of a single SELECT query. There are times, however, when it is better or necessary to write query as more than one SELECT block. This could be because of the need to do computation and decision matching in stages, to make the query easier to read, or to optimize performance.

Regardless of the reason, GSQL has always supported writing procedural queries containing multiple SELECT query blocks. Moreover, each SELECT statement outputs a vertex set. This vertex set can be used in the FROM clause of an subsequence SELECT block.

For example, if Set1, Set2, and Set3 were the outputs of three previous SELECT blocks in this query, then each of these FROM clauses can take place later in the query:

FROM Set1:x1 -(mh1)- :x2 -(mh2)- Set3:x3
FROM :x1 -(mh1)- :x2 -(mh2)- Set3:x3
FROM Set2:x1 -(mh1)- :x2 -(mh2)- Set2:x3

Example 1. Find Viktor Akhiezer's liked messages' authors, whose last name starts with letter S. Find these authors alumni count.

USE GRAPH ldbc_snb

# a computed vertex set F is used to constrain the second pattern.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  F  =  SELECT t
        FROM :s -(LIKES>:e1)- :msg -(HAS_CREATOR>)- :t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" AND t.lastName LIKE "S%";

  Alumni = SELECT p
           FROM Person:p -(STUDY_AT>) -:u - (<STUDY_AT)- F:s
           WHERE s != p
           Per (p)
           POST-ACCUM @@cnt+=1;


  PRINT @@cnt;

}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 216}]
}

Example 2. Find Viktor Akhiezer's liked posts' authors A, and his liked comments' authors B. Count the common universities that both A and B have members studied at.

USE GRAPH ldbc_snb

#A and B are used to constraint the third pattern.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  A  =  SELECT t
        FROM :s -(LIKES>:e1)- Post:msg -(HAS_CREATOR>)- :t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" ;


  B  =  SELECT t
        FROM :s -(LIKES>:e1)- Comment:msg -(HAS_CREATOR>)- :t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" ;

  Univ = SELECT u
         FROM A:p -(STUDY_AT>) -:u - (<STUDY_AT)- B:s
         WHERE s != p
         Per (u)
         POST-ACCUM @@cnt+=1;


  PRINT @@cnt;

}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 4}]
}

Example 3. Find Viktor Akhiezer's liked posts' authors A. See how many pair of persons in A that one person likes a message authored by another person.

USE GRAPH ldbc_snb

# a computed vertex set A is used twice in the second pattern.
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  A  =  SELECT t
        FROM :s -(LIKES>:e1)- Post:msg -(HAS_CREATOR>)- :t
        WHERE s.firstName == "Viktor" AND s.lastName == "Akhiezer" ;

  A = SELECT p
      FROM A:p -(LIKES>) -:msg - (HAS_CREATOR>) - A:p2
      WHERE p2 != p
      Per (p, p2)
      ACCUM @@cnt +=1;


  PRINT @@cnt;

}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 14833}]
}

Example 4. Find how many messages are created and liked by the same person whose first name begins with letter T.

USE GRAPH ldbc_snb

# the same alias is used twice in a pattern
INTERPRET QUERY () SYNTAX v2 {

  SumAccum<int> @@cnt;

  A  =  SELECT msg
        FROM :s -(LIKES>:e1)- :msg -(HAS_CREATOR>)- :s
        WHERE s.firstName LIKE "T%"
        PER (msg)
        ACCUM @@cnt +=1;


  PRINT @@cnt;

}
#result 
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [{"@@cnt": 207}]
}

#to further verify, we picked one message from the above query result.
#see if there exists a person who like her own message.
INTERPRET QUERY () SYNTAX v2 {

    R = SELECT s
        FROM :msg -(HAS_CREATOR>)- :s
        WHERE msg.id == 1374390714042;

    T =  SELECT s
         FROM R:s -(LIKES>)- :msg
         WHERE msg.id == 1374390714042;

  PRINT R;
  PRINT T;

}

#result
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "edition": "enterprise",
    "api": "v2"
  },
  "results": [
    {"R": [{
      "v_id": "13194139533433",
      "attributes": {
        "birthday": "1985-11-26 00:00:00",
        "firstName": "Taras",
        "lastName": "Kofler",
        "gender": "female",
        "speaks": [
          "uk",
          "ro",
          "en"
        ],
        "browserUsed": "Internet Explorer",
        "locationIP": "31.131.28.133",
        "id": 13194139533433,
        "creationDate": "2011-01-29 01:14:27",
        "email": [
          "Taras13194139533433@gmail.com",
          "Taras13194139533433@yahoo.com"
        ]
      },
      "v_type": "Person"
    }]},
    {"T": [{
      "v_id": "13194139533433",
      "attributes": {
        "birthday": "1985-11-26 00:00:00",
        "firstName": "Taras",
        "lastName": "Kofler",
        "gender": "female",
        "speaks": [
          "uk",
          "ro",
          "en"
        ],
        "browserUsed": "Internet Explorer",
        "locationIP": "31.131.28.133",
        "id": 13194139533433,
        "creationDate": "2011-01-29 01:14:27",
        "email": [
          "Taras13194139533433@gmail.com",
          "Taras13194139533433@yahoo.com"
        ]
      },
      "v_type": "Person"
    }]}
  ]
}