Workflow and Notation

The GSQL™ software program is the TigerGraph comprehensive environment for designing graph schemas, loading and managing data to build a graph, and querying the graph to perform data analysis. In short, TigerGraph users do most of their work via the GSQL program. This document presents the syntax and features of the GSQL language.

This document is a reference manual, not a tutorial. The user should read GSQL Demo Examples prior to using this document. There are also User Guides or Tutorials for particular aspects of the GSQL environment. This document is best used when the reader already has some basic familiarity with running GSQL and then wants a more detailed understanding of a particular topic.

This document is Part 1 of the GSQL Language Reference, which describes system basics, defining a graph schema, and loading data. Part 2 describes querying.

A handy GSQL Reference Card lists the syntax for the most commonly used GSQL commands for graph definition and data loading. Look for the reference card on our User Document home page.

GSQL Workflow

The GSQL workflow has four major steps:

  1. Define a graph schema or model.

  2. Load data into the TigerGraph system.

  3. Create and install queries.

  4. Run queries.

After initial data and queries have been installed, the user can run queries or go back to load more data and create additional queries. This document provides specifications and details for steps 1 and 2. The Appendix contains flowcharts which provide a visual understanding of the required and allowed sequence of commands to proceed through the workflow.

Language Basics

  • Identifiers

    Identifiers are user-defined names. An identifier consists of letters, digits, and the underscore. Identifiers may not begin with a digit. Identifiers are case sensitive.

  • Keywords and Reserved Words

    Keywords are words with a predefined semantic meaning in the language. Keywords are not case sensitive. Reserved words are set aside for use by the language, either now or in the future. Reserved words may not be reused as user-defined identifiers. In most cases, a keyword is also a reserved word. For example, VERTEX is a keyword. It is also a reserved word, so VERTEX may not be used as an identifier.

  • Statements

    Each line corresponds to one statement (except in multi-line mode). Usually, there is no punctuation at the end of a top-level statement. Some statements, such as CREATE LOADING JOB, are block statements which enclose a set of statements within themselves. Some punctuation may be needed to separate the statements within a block.

  • Comments

    Within a command file, comments are text that is ignored by the language interpreter. Single-line comments begin with either # or //. A comment may be on the same line as interpreted code. Text to the left of the comment marker is interpreted, and text to the right of the marker is ignored. Multi-line comment blocks begin with /* and end with */.
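
The identifier and single-line comment rules above can be checked mechanically. The following Python sketch is an illustration only and is not part of GSQL. The helper names is_valid_identifier and strip_line_comment are hypothetical, "letters" is assumed to mean ASCII letters, and the comment stripper is deliberately naive: it does not handle multi-line /* ... */ blocks, and it does not recognize a # or // that appears inside a quoted string.

```python
import re

# Letters, digits, and the underscore; the first character may not be a digit.
# Assumption: "letters" means ASCII letters only.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if name satisfies the lexical identifier rule stated above."""
    return IDENTIFIER_RE.fullmatch(name) is not None


def strip_line_comment(line: str) -> str:
    """Drop everything from the first # or // marker to the end of the line.

    Naive by design: a marker inside a quoted string is treated as a comment.
    """
    cut = len(line)
    for marker in ("#", "//"):
        idx = line.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return line[:cut].rstrip()


print(is_valid_identifier("vertex_type_name1"))   # satisfies the rule
print(is_valid_identifier("1st_vertex"))          # begins with a digit
print(strip_line_comment("RUN JOB job_name  # text after the marker is ignored"))
```

Because identifiers are case sensitive, a check like this should compare names exactly as typed; it does not test whether a name collides with a reserved word.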

Documentation Notation

In the documentation, code examples are either template code (formally describing the syntax of part of the language) or actual code examples. Actual code examples show code that can be run exactly as shown, e.g., copy-and-paste. Template code, on the other hand, cannot be run exactly as shown because it uses placeholder names and additional symbols to explain the syntax. It should be clear from context whether an example is template code or actual code.

This guide uses conventional notation for software documentation. In particular, note the following:

  • Shell prompts

    Most of the examples in this document take place within the GSQL shell. When clarity is needed, the GSQL shell prompt is represented by a greater-than arrow:

>

    When a command is to be issued from the operating system, outside of the GSQL shell, the prompt is the following:

os$

  • Keywords

    In the GSQL language, keywords are not case sensitive, but user-defined identifiers are case sensitive. In code examples, keywords are in ALL CAPS to make clear the distinction between keywords and user-defined identifiers.

In a very few cases, option keywords are case sensitive. For example, in the command to delete all data from the graph store,

clear graph store -HARD

the option -HARD must be in all capital letters.

  • Placeholder identifiers and values

    In template code, any token that is not a keyword, a literal value, or punctuation is a placeholder identifier or a placeholder value. Example:

CREATE UNDIRECTED EDGE edge_type_name (FROM vertex_type_name1, TO vertex_type_name2,
attribute_name type [DEFAULT default_value],...)

The placeholders are edge_type_name, vertex_type_name1, vertex_type_name2, and attribute_name (identifiers) and default_value (a value). As explained in the Create Vertex section, type is one of the attribute data types.

  • Quotation Marks

    When quotation marks are shown, they are to be typed as shown (unless stated otherwise). A placeholder for a string value will not have quotation marks in the template code, but if a template is converted to actual code, quotation marks should be used around string values.

  • Choices

    The vertical bar | separates the choices when the syntax requires the user to choose one of a set of values. Example: Either the keyword VERTEX or EDGE is to be used. Also, note the inclusion of quotation marks.

    Template:

LOAD "file_path" TO VERTEX|EDGE object_type_name VALUES (id_expr, attr_expr1, attr_expr2,...)

    Possible actual values:

LOAD "data/users.csv" TO VERTEX user VALUES ($0, $1, $2)

  • Optional content

    Square brackets are used to enclose a portion that is optional. Options can be nested. Square brackets themselves are rarely used as part of the GSQL language itself. Example: In the RUN JOB statement, the -n flag is optional. If used, -n is to be followed by a value.

RUN JOB [-n count] job_name

Sometimes, options are nested, which means that an inner option can only be used if the outer option is used:

RUN JOB [-n [first_line_num,] last_line_num] job_name

means that first_line_num may be specified only if last_line_num is also specified. These options provide three possible forms for this statement:

RUN JOB job_name
RUN JOB -n last_line_num job_name
RUN JOB -n first_line_num, last_line_num job_name

  • Repeated zero or more times

    In template code, it is sometimes desirable to show that a term is repeated an arbitrary number of times. For example, a vertex definition contains zero or more user-defined attributes. A loading job contains one or more LOAD statements. In formal template code, if an asterisk (Kleene star) immediately follows option brackets, then the bracketed term can be repeated zero or more times. For example:

TO VERTEX|EDGE object_name VALUES (id_expr [, attr_expr]*)

means that the VALUES list contains at least one expression, id_expr. It may be followed by any number of attribute expressions. Each attribute expression must be preceded by a comma.

  • Long lines

    For more convenient display, long statements in this guide may sometimes be displayed on multiple lines. This is for display purposes only; the actual code must be entered as a single line (unless the multi-line mode is used). When necessary, the examples may show a shell prompt before the start of a statement, to clearly mark where each statement begins. Example: A SELECT query is grammatically a single statement, so GSQL requires that it be entered as a single line.

Long statement displayed as one line
SELECT *|attribute_name FROM vertex_type_name [WHERE conditions] [ORDER BY attribute1,attribute2,...] [LIMIT k]

However, the statement is easier to read and to understand when displayed one clause per line:

Long statement displayed on multiple lines but with only one prompt
SELECT *|attribute_name 
    FROM vertex_type_name
    [WHERE conditions]
    [ORDER BY attribute1,attribute2,...]
    [LIMIT k]
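
The optional-content notation described above can be expanded mechanically into every concrete form of a statement. The following Python sketch is an illustration only and is not part of GSQL. The helper name expand_optional is hypothetical, and the sketch assumes that square brackets appear in the template only as option markers and are not followed by an asterisk.

```python
def expand_optional(template: str) -> list[str]:
    """Return every concrete form of a template whose optional parts are in [ ]."""

    def parse(pos: int, inside_brackets: bool) -> tuple[list[str], int]:
        forms = [""]
        while pos < len(template):
            ch = template[pos]
            if ch == "[":
                inner, pos = parse(pos + 1, True)
                # An optional part is either omitted or replaced by one of its inner forms.
                forms = [f + opt for f in forms for opt in [""] + inner]
            elif ch == "]":
                if not inside_brackets:
                    raise ValueError("unmatched ']' in template")
                return forms, pos + 1
            else:
                forms = [f + ch for f in forms]
                pos += 1
        if inside_brackets:
            raise ValueError("unmatched '[' in template")
        return forms, pos

    forms, _ = parse(0, False)
    # Collapse runs of whitespace left behind by omitted options, then
    # remove duplicates while keeping the original order.
    normalized = [" ".join(f.split()) for f in forms]
    return list(dict.fromkeys(normalized))


for form in expand_optional("RUN JOB [-n [first_line_num,] last_line_num] job_name"):
    print(form)
```

Applied to the nested RUN JOB template, the sketch produces the same three forms listed under Optional content: the statement without -n, with -n and last_line_num only, and with both line numbers.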
