1 of 8

Part 1 - Data Definition & Loading

This work is licensed under a Creative Commons Attribution 4.0 International License.

Table of Contents

Defining a Graph Schema

Before data can be loaded into the graph store, the user must define a graph schema. A graph schema is a "dictionary" that defines the types of entities, vertices and edges , in the graph and how those types of entities are related to one another. Each vertex or edge type has a name and a set of attributes (properties) associated with it. For example, a Book vertex could have title, author, publication year, genre, and language attributes.

In the figure below, circles represent vertex types, and lines represent edge types. The labeling text shows the name of each type. This example has four types of vertices: User, Occupation, Book, and Genre . Also, the example has 3 types of edges: user_occupation, user_book_rating, and book_genre . Note that this diagram does not say anything about how many users or books are in the graph database. It also does not indicate the cardinality of the relationship. For example, it does not specify whether a User may connect to multiple occupations.

An edge connects two vertices; in TigerGraph terminology these two vertices are the source vertex and the target vertex . An edge type can be either directed or undirected . A directed edge has a clear semantic direction, from the source vertex to the target vertex. For example, if there is an edge type that represents a plane flight segment, each segment needs to distinguish which airport is the origin (source vertex) and which airport is the destination (target vertex). In the example schema below, all of the edges are undirected. A useful test to decide whether an edge should be directed or undirected is the following: "An edge type is directed if knowing there is a relationship from A to B does not tell me whether there is a relationship from B to A." Having nonstop service from Chicago to Shanghai does not automatically imply there is nonstop service from Shanghai to Chicago.

An expanded schema is shown below, containing all the original vertex and edge types plus three additional edge types: friend_of, sequel_of, and user_book_read . Note that friend_of joins a User to a User. The friendship is assumed to be bidirectional, so the edge type is undirected. Sequel_of joins a Book to a Book but it is directed, as evidenced by the arrowhead. The Two Towers is the sequel of The Fellowship of the Ring , but the reverse is not true. User_book_read is added to illustrate that there may be more than one edge type between a pair of vertex types.

The TigerGraph system user designs a graph schema to fit the source data and the user's needs and interests. The TigerGraph system user should consider what type of relationships are of interest and what type of analysis is needed. The TigerGraph system lets the user modify an existing schema, so the user is not locked into the initial design decision.

In the first schema diagram above, there are seven entities: four vertex types and three edge types.You may wonder why it was decided to make Occupation a separate vertex type instead of an attribute of User. Likewise, why is Genre a vertex type instead of an attribute of Book? These are examples of design choices. Occupation and Genre were separated out as vertex types because in graph analysis, if an attribute will be used as a query variable, it is often easier to work with as a vertex type.

Once the graph designer has chosen a graph schema, the schema is ready to be formalized into a series of GSQL statements.

Graph Creation and Modification Privileges

Only superuser and globaldesigner roles can define global vertex types. global edge types, and graphs, using CREATE VERTEX / EDGE / GRAPH. However, once a graph has been created, its admin and designer roles can customize its schema, including adding new local vertex types and local edge types, by using a SCHEMA_CHANGE JOB, described in the next section.

CREATE VERTEX

Available to superuser and globaldesigner roles only.

The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. At a high level of abstraction, the format is

CREATE VERTEX vertex_type_name (id_and_attribute_list) [ vertex_options ]

More specifically, the syntax is as follows, assuming that the vertex ID is listed first:

CREATE VERTEX Syntax

CREATE VERTEX vertex_type_name (primary_id_name_type
    [, attribute_name type [DEFAULT default_value] ]*)
    [WITH [STATS="none"|"outdegree_by_edgetype"][primary_id_as_attribute="true"]]

Keys and Attributes

Beginning with v2.3, there are two syntaxes for specifying the primary id/key:

Legacy PRIMARY_ID syntax: The legacy syntax remains valid, but there are additional options and additional flexibility:

PRIMARY KEY syntax. This syntax is modeled after SQL.

PRIMARY_ID and WITH primary_id_as_attribute

The primary_id is a required field whose purpose is to uniquely identify each vertex instance. GSQL creates a hash index on the primary id with O(1) time complexity. Its data type may be STRING, INT, or UINT. The syntax for the primary_id_name_type term is as follows:

primary_id_name_type := PRIMARY_ID id_name id_type

NOTE: In default mode, the primary_id field is not one of the attribute fields. The purpose of this distinction is to minimize storage space for vertices. The functional consequence of this difference is that a query cannot read the primary_id or use it as part of an expression.

Beginning with v2.3:

The Primary_id can be treated as an attribute, if the clause WITH primary_id_as_attribute="true" is used with the CREATE VERTEX statement.
The primary_id designation can be used with any one of the attributes; it is not restricted to the first attribute.

Example:

CREATE VERTEX movie (PRIMARY_ID id UINT, name STRING, year UINT)
    WITH primary_id_as_attribute="true"

PRIMARY KEY

Instead of the legacy PRIMARY_ID syntax, starting with v2.3, GSQL now offers another option for specifying the primary key. The keyword phrase PRIMARY KEY may be appended to any one of the attributes in the attribute list, though it is conventional for it to be the first attribute. Each vertex instance must have a unique value for the primary key attribute. GSQL creates a hash index on the PRIMARY KEY attribute with O(1) time complexity. It is recommended that the primary key data type be STRING, INT, or UINT.

primary_id_name_type := id_name_id_type PRIMARY KEY

Note the differences between PRIMARY_ID and PRIMARY KEY:

"PRIMARY_ID" precedes the (name, type) pair. "PRIMARY KEY" follows the (name, type) pair.
In default mode, a PRIMARY_ID is not an attribute, but the WITH primary_id_as_attribute="true" clause can be used to make it an attribute. Alternately, the PRIMARY KEY is always an attribute; the WITH option is unneeded.

Example:

CREATE VERTEX movie (id UINT PRIMARY KEY, name STRING, year UINT)

PRIMARY KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

COMPOSITE KEY

Beginning with v2.4, GSQL PRIMARY KEY supports composite keys - grouping multiple attributes to create a primary key for a specific vertex. Composite Key usage is similar to a single PRIMARY KEY, but rather than appending "PRIMARY KEY" after an attribute, the syntax is a bit different.

composite_id_name_type := PRIMARY KEY "(" attribute_name ("," attribute_name)* ")"

Example:

CREATE VERTEX movie (id UINT, title STRING, year UINT, PRIMARY KEY (title,year,id) )

COMPOSITE KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

Vertex Attribute List

The attribute list, enclosed in parentheses, is a list of one or more id definitions and attribute descriptions separated by commas:

primary_id_name_type
[, attribute_name type [DEFAULT default_value ] ]*

The available attribute types, including user-defined tuples, are listed in the section Attribute Data Types.

Every attribute data type has a built-in default value (e.g., the default value for INT type is 0). The DEFAULT default_value option overrides the built-in value.
Any number of additional attributes may be listed after the primary_id attribute. Each attribute has a name, type, and optional default value (for primitive-type, DATETIME, or STRING COMPRESS attributes only)

Example:

Create vertex types for the graph schema of Figure 1.

Vertex definitions for User-Book-Rating graph

CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
    WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book  (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
    WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)

Unlike the tables in a relational database, vertex types do not need to have a foreign key attribute for one vertex type to have a relationship to another vertex type. Such relationships are handled by edge types.

WITH STATS

By default, when the loader stores a vertex and its attributes in the graph store, it also stores some statistics about the vertex's outdegree – how many connections it has to other vertices. The optional WITH STATS clause lets the user control how much information is recorded. Recording the information in the graph store will speed up queries which need degree information, but it increases the memory usage. There are two* options. If "outdegree_by_edgetype" is chosen, then each vertex records a list of degree count values, one value for each type of edge in the schema. If "none" is chosen, then no degree statistics are recorded with each vertex. If the WITH STATS clause is not used, the loader acts as if "outdegree_by_edgetype" were selected.

The graph below has two types of edges between persons: phone_call and text. For Bobby, the "outdegree_by_edgetype" option records how many phone calls Bobby made (1) and how many text messages Bobby sent (2). This information can be retrieved using the built-in vertex function outdegree(). To get the outdegree of a specific edge type, provide the edgetype name as a string parameter. To get the total outdegree, omit the parameter.

CREATE EDGE

Available to superuser and globaldesigner roles only.

The CREATE EDGE statement defines a new global edge type. There are two forms of the CREATE EDGE statement, one for directed edges and one for undirected edges. Each edge type must specify that it connects FROM one vertex type TO another vertex type. Then additional attributes may be added. Each attribute follows the same requirements as described in the Attribute List subsection for the "CREATE VERTEX" section.

CREATE UNDIRECTED EDGE

CREATE UNDIRECTED EDGE edge_type_name (FROM vertex_type_name, TO vertex_type_name
  [, attribute_name type [DEFAULT default_value]]* )

CREATE DIRECTED EDGE

CREATE DIRECTED EDGE edge_type_name (FROM vertex_type_name, TO vertex_type_name
  [, attribute_name type [DEFAULT default_value]]* )
  [WITH REVERSE_EDGE="rev_name"]

Viewed at a higher level of abstraction, the format is

CREATE UNDIRECTED|DIRECTED EDGE edge_type_name (FROM vertex_type_name,
  TO vertex_type_name,
  edge_attribute_list ) [ edge_options ]

Note that edges do not have a PRIMARY_ID field. Instead, each edge is uniquely identified by a FROM vertex, a TO vertex, and optionally other attributes. The edge type may also be a distinguishing characteristic. For example, as shown in Figure 2 above, there are two types of edges between User and Book. Therefore, both types would have attribute lists which begin (FROM User, To Book,...).

Creating an Edge from or to Any Vertex Type

An edge type can be defined which connects FROM any type of vertex and/or TO any type of vertex. Use the wildcard symbol * to indicate "any vertex type". For example, the any_edge type below can connect from any vertex to any other vertex:

Wildcard edge type

CREATE DIRECTED EDGE any_edge (FROM *, TO *, label STRING)

WITH REVERSE_EDGE

If a CREATE DIRECTED EDGE statement includes the WITH REVERSE_EDGE=" rev_name " optional clause, then an additional directed edge type called " rev_name " is automatically created, with the FROM and TO vertices swapped. Moreover, whenever a new edge is created, a reverse edge is also created. The reverse edge will have the same attributes, and whenever the principal edge is updated, the corresponding reverse edge is also updated.

In a TigerGraph system, reverse edges provide the most efficient way to perform graph queries and searches that need to look "backwards". For example, referring to the schema of Figure 2, the query "What is the sequel of Book X, if it has one?" is a forward search, usingsequel_of edges. However, the query "Is Book X a sequel? If so, what Book came before X?" requires examining reverse edges.

Example:

Create undirected edges for the three edge types in Figure 1.

Edge definitions for User-Book-Rating graph

CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)

The user_occupation and book_genre edges have no attributes. A user_book_rating edge symbolizes that a user has assigned a rating to a book. Therefore it includes an additional attribute rating . In this case the rating attribute is defined to be an integer, but it could just as easily have been set to be a float attribute.

Example :

Create the additional edges depicted in Figure 2.

Additional Edge definitions for Expanded-User-Book-Rating graph

CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"

Every time the GSQL loader creates a sequel_of edge, it will also automatically create a preceded_by edge, pointing in the opposite direction.

Special Options

The STRING COMPRESS and STRING_SET COMPRESS data types achieve compression by mapping each unique attribute value to a small integer. The mapping table ("this string" = "this integer") is called the dictionary. If two such attributes have the same or similar sets of possible values, then it is desirable to have them share one dictionary because it uses less storage space.

When a STRING COMPRESS attribute is declared in a vertex or edge, the user can optionally provide a name for the dictionary. Any attributes which share the same dictionary name will share the same dictionary. For example, v1.attr1, v1.attr2, and e.attr1 below share the same dictionary named "e1".

Shared STRING COMPRESS dictionaries

CREATE VERTEX v1 (PRIMARY_ID main_id STRING, att1 STRING COMPRESS e1, att2 STRING COMPRESS e1)
CREATE UNDIRECTED EDGE e (FROM v1, TO v2, att1 STRING COMPRESS e1)

CREATE GRAPH

Available to superuser and globaldesigner roles only.

Multiple Graph support

If the optional MultiGraph service is enabled, CREATE GRAPH can be invoked multiple times to define multiple graphs, and vertex types and edge types may be re-used (shared) among multiple graphs. There is an option to assign an admin use for the new graph.

After all the required vertex and edge types are created, the CREATE GRAPH command defines a graph schema which contains the given vertex types and edge types, and prepares the graph store to accept data. The vertex types and edge types may be listed in any order.

CREATE GRAPH syntax

CREATE GRAPH gname (vertex_or_edge_type, vertex_or_edge_type...) [WITH ADMIN username]

The optional WITH ADMIN clause sets the named user to be the admin for the new graph.

As a convenience, executing CREATE GRAPH will set the new graph to be the working graph.

Instead of providing a list of specific vertex types and edge types, it is also possible to define a graph type which includes all the available vertex types and edge types. It is also legal to create a graph with an empty domain. A SCHEMA_CHANGE can be used later to add vertex and edge types.

Examples of CREATE GRAPH with all vertex & edge types and with an empty domain.

CREATE GRAPH everythingGraph (*)
CREATE GRAPH emptyGraph ()

Examples :

Create graph Book_rating for the edge and vertex types defined for Figure 1:

Graph definition for User-Book-Rating graph

CREATE GRAPH Book_rating (*)

The following code example shows the full set of statements to define the expanded user-book-rating graph:

Full definition for the Expanded User-Book-Rating graph

CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
    WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book  (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
    WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)
CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"
CREATE GRAPH Book_rating (*)

USE GRAPH

New requirement for MultiGraph support. Applies even if only one graph exists.

Before a user can make use of a graph, first the user must be granted a role on that graph by an admin user of that graph or by a superuser. (Superusers are automatically granted the admin role on every graph). Second, for each GSQL session, the user must set a working graph. The USE GRAPH command sets or changes the user's working graph, for the current session.

For more about roles and privileges, see the document Managing User Privileges and Authentication.

USE GRAPH syntax

USE GRAPH gname

Instead of the USE GRAPH command, gsql can be invoked with the -g <graph_name> option.

DROP GRAPH

Available to superuser and globaldesigner roles only. The effect of this command takes into account shared domains.

DROP GRAPH syntax

DROP GRAPH gname

The DROP GRAPH deletes the logical definition of the named graph. Furthermore, it will also delete all local vertex or edge types. Local vertex and edge types are created by an ADD VERTEX/EDGE statement within a SCHEMA_CHANGE JOB and so belong only to that graph. To delete only selected vertex types or edge types, see DROP VERTEX | EDGE in the Section "Modifying a Graph Schema".

SHOW - View Parts of the Catalog

The SHOW command can be used to show certain aspects of the graph, instead of manually filtering through the entire graph schema when using the ls command. You can either type the exact identifier or use regular expression / Linux globbing to search.

SHOW <VERTEX> | <EDGE> | <JOB> | <QUERY> | <GRAPH> [ <name> | <glob> | -r <regex> ]

This feature supports the ? and * from linux globbing operations, and also regular expression matching. Usage of the feature is limited to the scope of the graph the user is currently in - if you are using a global graph, you will not be able to see vertices that are not included in your current graph.

Regular expression searching will not work with escaping characters.

To use regular expressions, you will need to use the -r flag after the part of the schema you wish to show. If you wish to dive deeper into regular expressions, visit "Java Patterns". The following are a few examples of what is supported by the SHOW command.

Linux Globbing examples
SHOW VERTEX us*            //shows all vertices that start with the letters "Us"
SHOW VERTEX co?*y          //shows the vertex that starts with co and ends with y      
SHOW VERTEX ?????          //shows all vertices that are 5 letters long

Regular Expression Examples
SHOW VERTEX -r "skil{2}"    //match the pattern "skill"
SHOW EDGE -r "test[1][13579]*"    //match pattern that only contains odd numbers after "test"
SHOW JOB -r "[a-zA-Z]*"     //match all jobs that contain only letters

Modifying a Graph Schema

After a graph schema has been created , it can be modified. Data already stored in the graph and which is not logically part of the change will be retained. For example, if you had 100 Book vertices and then added an attribute to the Book schema, you would still have 100 Books, with default values for the new attribute. If you dropped a Book attribute, you still would have all your books, but one attribute would be gone.

To safely update the graph schema, the user should follow this procedure:

Create a SCHEMA_CHANGE JOB, which defines a sequence of ADD, ALTER and/or DROP statements.
Run the SCHEMA_CHANGE JOB (i.e. RUN SCHEMA_CHANGE JOB job_name), which will do the following:
- Attempt the schema change.
- If the change is successful, invalidate any loading job or query definitions which are incompatible with the new schema.
- if the change is unsuccessful, report the failure and return to the state before the attempt.

A schema change will invalidate any loading jobs or query jobs which relate to an altered part of the schema. Specifically:

A loading job becomes invalid if it refers to a vertex or and an edge which has been dropped (deleted) or altered .
A query becomes invalid if it refers to a vertex, and edge, or an attribute which has been dropped .

Invalid loading jobs are dropped, and invalid queries are uninstalled. After the schema update, the user will need to create and install new load and query jobs based on the new schema.

Jobs and queries for unaltered parts of the schema will still be available and do not need to be reinstalled. However, even though these jobs are valid (e.g., they can be run), the user may wish to examine whether they still perform the preferred operations (e.g., do you want to run them?)

Load or query operations which begin before the schema change will be completed based on the pre-change schema. Load or query operations which begin after the schema change, and which have not been invalidated, will be completed based on the post-change schema.

Global vs. Local Schema Changes

Only a superuser or globaldesigner can add, alter, or drop global vertex types or global edge types, which are those that are created using CREATE VERTEX or CREATE ... EDGE. This rule applies even if the vertex or edge type is used in only one graph. To make these changes, the user uses a GLOBAL SCHEMA_CHANGE JOB.

An admin or designer user can add, alter, or drop local vertex types or local edge types which are created in the context of that graph. Local vertex and edge types are created using an ADD statement inside a SCHEMA_CHANGE JOB. To alter or drop any of these local types, the admin user uses a regular SCHEMA_CHANGE JOB.

Local graphs can define vertex and edge types independently of the vertex and edge types in other graph. That is, the same name can be used in different graphs for (different) vertex or edge types.

It is even permitted for a local graph and the global graph to use the same name for their own vertex or edge types, as long as the global vertex/edge type is not used within the local graph.

The two types of schema change jobs are described below.

CREATE SCHEMA_CHANGE JOB (local)

The CREATE SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements for changing a particular graph. It does not perform the schema change.

CREATE SCHEMA_CHANGE JOB syntax

CREATE SCHEMA_CHANGE JOB job_name FOR GRAPH graph_name {
    [sequence of DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

One use of CREATE SCHEMA_CHANGE JOB is to define an additional vertex type and edge type to be the structure for a secondary index. For example, if you wanted to index the postalCode attribute of the User vertex, you could create a postalCode_idx (PRIMARY_ID id string, code string) vertex type and hasPostalCode (FROM User, TO postalCode_idx) edge type. Then create an index structure having one edge from each User to a postalCode_idx vertex.

By its nature, a SCHEMA_CHANGE JOB may contain multiple statements. If the job block is used in the interactive GSQL shell, then the BEGIN and END commands should be used to permit the SCHEMA_CHANGE JOB to be entered on several lines. if the job is stored in a command file to be read in batch mode, then BEGIN and END are not needed.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

If a SCHEMA_CHANGE JOB defines a new edge type which connects to a new vertex type, the ADD VERTEX statement should precede the related ADD EDGE statement. However, the ADD EDGE and ADD VERTEX statements can be in the same SCHEMA_CHANGE JOB.

ADD VERTEX | EDGE (local)

The ADD statement defines a new type of vertex or edge and automatically adds it to a graph schema. The syntax for the ADD VERTEX | EDGE statement is analogous to that of the CREATE VERTEX | EDGE | GRAPH statements. It may only be used within a SCHEMA_CHANGE JOB.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE

ADD VERTEX v_type_name (PRIMARY_ID id type [',' attribute_list]) [WITH STATS '=' "none"|"outdegree_by_edgetype"]; 
ADD UNDIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
ADD DIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
    [WITH REVERSE_EDGE '=' "rev_name"];

ALTER VERTEX | EDGE

The ALTER statement is used to add attributes to or remove attributes from an existing vertex type or edge type. It can also be used to add or remove source (FROM) vertex types or destination (TO) vertex types of an edge type. It may only be used within a SCHEMA_CHANGE JOB. The basic format is as follows:

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

ALTER ... ADD

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

ALTER ... DROP

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

DROP VERTEX | EDGE (local)

The DROP statement removes the specified vertex type or edge type from the database dictionary. The DROP statement should only be used when graph operations are not in progress.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name]*
DROP EDGE e_type_name [',' e_type_name]*

DROP TUPLE

For tuples that are defined within a graph schema, you can drop them by using the following command.

drop tuple

DROP TUPLE tuple_name [',' tuple_name]*

RUN SCHEMA_CHANGE JOB

RUN SCHEMA_CHANGE JOB job_name performs the schema change job. After the schema has been changed, the GSQL system checks all existing GSQL queries (described in "GSQL Language Reference, Part 2: Querying"). If an existing GSQL query uses a dropped vertex, edge, or attribute, the query becomes invalid, and GSQL will show the message "Query query_name becomes invalid after schema update, please update it.".

Below is an example. The schema change job add_reviews adds a Review vertex type and two edge types to connect reviews to users and books, respectively.

SCHEMA_CHANGE JOB example

USE GRAPH Book_rating
CREATE SCHEMA_CHANGE JOB add_reviews FOR GRAPH Book_rating {
    ADD VERTEX Review (PRIMARY_ID id UINT, review_date DATETIME, url STRING);
    ADD UNDIRECTED EDGE wrote_review (FROM User, TO Review);
    ADD UNDIRECTED EDGE review_of_book (FROM Review, TO Book);
}
RUN SCHEMA_CHANGE JOB add_reviews

DROP SCHEMA_CHANGE JOB

To drop (remove) a schema change job, run DROP JOB schema_change_job name from the GSQL shell. The specific schema change job will be removed from GSQL. Refer to the Creating a Loading Job page for more information about dropping jobs.

GSQL > USE GRAPH Book_rating
GSQL > drop job local_schema_change123
The job local_schema_change_change123 is dropped!

USE GLOBAL

The USE GLOBAL command changes a superuser's mode to Global mode. In global mode, a superuser can define or modify global vertex and edge types, as well as specifying which graphs use those global types. For example, the user should run USE GLOBAL before creating or running a GLOBAL SCHEMA_CHANGE JOB.

CREATE GLOBAL SCHEMA_CHANGE JOB

The CREATE GLOBAL SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements which modify either the attributes or the graph membership of global vertex or edge types. Unlike the non-global schema_change job, the header does not include a graph name. However, the ADD/ALTER/DROP statements in the body do mention graphs.

CREATE GLOBAL SCHEMA_CHANGE JOB syntax

CREATE GLOBAL SCHEMA_CHANGE JOB job_name {
    [sequence of global DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Those both global and local schema change jobs have ADD and DROP statements, they have different meanings. The table below outlines the differences.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

ADD VERTEX | EDGE (global)

The ADD statement adds existing global vertex or edge types to one of the graphs.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE (Global)

ADD VERTEX v_type_name [',' v_type_name...] TO GRAPH gname;
ADD EDGE e_type_name [',' e_type_name...] TO GRAPH gname;

ALTER VERTEX | EDGE

The ALTER statement is used to add attributes to or remove attributes from an existing global vertex type or edge type. The ALTER VERTEX / EDGE syntax for global schema changes is the same as that for local schema change jobs.

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

ALTER ... ADD

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [FROM (vertex_type_name [','vertex_type_name])]
    [TO (vertex_type_name [','vertex_type_name])]
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

ALTER ... DROP

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

DROP VERTEX | EDGE (global)

The DROP statement removes specified global vertex or edge types from one of the graphs. The command does not delete any data.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name...] FROM GRAPH gname;
DROP EDGE e_type_name   [',' e_type_name...] FROM GRAPH gname;

RUN GLOBAL SCHEMA_CHANGE JOB

RUN GLOBAL SCHEMA_CHANGE JOB job_name performs the global schema change job. After the schema has been changed, the GSQL system checks all existing GSQL queries (described in "GSQL Language Reference, Part 2: Querying"). If an existing GSQL query uses a dropped vertex, edge, or attribute, the query becomes invalid, and GSQL will show the message "Query query_name becomes invalid after schema update, please update it.".

Below is an example. The schema change alter_friendship_make_library drops the on_date attribute from the friend_of edge and adds Book type to the library graph.

GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
CREATE GRAPH library()
CREATE GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library {
    ALTER EDGE friend_of DROP ATTRIBUTE (on_date);
    ADD VERTEX Book TO GRAPH library;
}
RUN GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library

DROP GLOBAL SCHEMA_CHANGE JOB

Global schema change jobs can be dropped by using the DROP JOB command. Refer to the Creating a Loading Job page for more information about dropping jobs.

DROP GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
DROP JOB alter_friendship_make_library

Running a Loading Job

Clearing and Initializing the Graph Store

There are two aspects to clearing the system: flushing the data and clearing the schema definitions in the catalog. Two different commands are available.

CLEAR GRAPH STORE

Available only to superusers.

The CLEAR GRAPH STORE command flushes all the data out of the graph store (database). By default, the system will ask the user to confirm that you really want to discard all the graph data. To force the clear operation and bypass the confirmation question, use the -HARD option, e.g.,

Clearing the graph store does not affect the schema.

Use the -HARD option with extreme caution. There is no undo option. -HARD must be in all capital letters.
CLEAR GRAPH STORE stops all the TigerGraph servers (GPE, GSE, RESTPP, Kafka, and Zookeeper).
Loading jobs and queries are aborted.

DROP ALL

Available only to superusers.

The DROP ALL statement clears the graph store and removes all definitions from the catalog: vertex types, edge types, graph types, jobs, and queries.

Running a Loading Job

Running a loading job executes a previously installed loading job. The job reads lines from an input source, parses each line into data tokens, and applies loading rules and conditions to create new vertex and edge instances to store in the graph data store.

TigerGraph 2.0 introduces enhanced data loading with slightly modified syntax for CREATE and RUN statements. The previous RUN JOB syntaxes for v1.x online loading and offline loading and still supported for backward compatibility. Additionally, loading jobs can also be run by directly submitted a HTTP request to the REST++ server.

RUN LOADING JOB

pre-v2.0 RUN JOB syntax is deprecated

As of v2.0, RUN LOADING JOB is the preferred syntax for running all loading jobs. The pre-v2.0 syntaxes for running online post jobs and offline loading jobs are still support for now but are deprecated.

Note that the keyword LOADING is included. This makes it more clear to users and to GSQL that the job is a loading job and not some other type of job ( such as a SCHEMA_CHANGE JOB).

When a concurrent loading job is submitted, it is assigned a job ID number, which is displayed on the GSQL console. The user can use this job ID to refer to the job, for a status update, to abort the job, or to re-start the job. These operations are described later in this section.

Options

-noprint

By default, the command will print several several lines of status information while the loading is running. If the -noprint option is included, the job will run omit the progress and summary details, but it will still display the job id and the location of the log file.

-dryrun

If -dryrun is used, the system will read the data files and process the data as instructed by the job, but will NOT load any data into the graph. This option can be a useful diagnostic tool.

-n [i], j

The -n option limits the loading job to processing only a range of lines of each input data file. The -n flag accepts one or two arguments. For example, -n 50 means read lines 1 to 50. -n 10, 50 means read lines 10 to 50. The special symbol $ is interpreted as "last line", so -n 10,$ means reads from line 10 to the end.

filevar list

The optional USING clause may contain a list of file variables. Each file variable may optionally be assigned a filepath_string , obeying the same format as in the CREATE LOADING JOB. This list of file variables determines which parts of a loading job are run and what data files are used.

When a loading job is compiled, it generates one RESTPP endpoint for each filevar and filepath_string . As a consequence, a loading job can be run in parts. When RUN LOADING JOB is executed, only those endpoints whose filevar or file identifier (" __GSQL_FILENAME_n__") is mentioned in the USING clause will be used. However, if the USING clause is omitted, then the entire loading job will be run.
If a filepath_string is given, it overrides the filepath_string defined in the loading job. If a particular filevar is not assigned a filepath_string either in the loading job or in the RUN LOADING JOB statement, then an error is reported and the job exits.

CONCURRENCY

The CONCURRENCY parameter sets the maximum number of concurrent requests that the loading job may send to the GPE. The default is 256.

BATCH_SIZE

The BATCH_SIZE parameter sets the number of data lines included in each concurrent request sent to the GPE. The default is 1024.

Running Loading Jobs as REST Requests

Another way to run a loading job is to submit an HTTP request to the POST /ddl/<graph_name> endpoint of the REST++ server. Since the REST++ server has more direct access to the graph processing engine, this can execute more quickly than a RUN LOADING JOB statement in GSQL.

When a CREATE LOADING JOB block is executed, the GSQL system creates one REST endpoint for each file source. Therefore, one REST request can invoke loading for one file source at a time. Running an entire loading job may take more than one REST request.

The Linux curl command is a handy way to make HTTP requests. If the data size is small, it can be included directly in the command line by using the -d flag with a data string:

If the data size is large, it is better to reference the data filename, using the --data-binary flag:

<filepath> should be replaced with either a file variable (from a DEFINE FILENAME statement) or a position-based file identifier ("__GSQL_FILENAME_n__") for an explicit filepath_string.

Example : The code block below shows three equivalent commands for the same loading job. The first uses the gsql command RUN JOB. The second uses the Linux curl command to support a HTTP request, placing the parameter values in the URL's query string. T he third gives the parameter values through the curl command's data payload -d option.

Inspecting and Managing Loading Jobs

Starting with v2.0, there are now commands to checking loading job status, to abort a loading job and to restart a loading job.

Job ID and Status

When a loading job starts, the GSQL server assigns it a job id and displays it for the user to see. The job id format is typically the name of the graph, followed by the machine alias, following by a code number, e.g., gsql_demo_m1.1525091090494

By default, an active loading job will display periodic updates of its progress. There are two ways to inhibit these automatic output displays:

Run the loading job with the -noprint option.
After the loading job has started, enter CTRL+C. This will abort the output display process, but the loading job will continue.

SHOW LOADING STATUS

The command SHOW LOADING JOB shows the current status of either a specified loading job or all current jobs:

The display format is the same as that displayed during the periodic progress updates of the RUN LOADING JOB command. If you do not know the job id, but you know the job name and possibly the machine, then the ALL option is a handy way to see a list of active job ids.

ABORT LOADING JOB

The command ABORT LOADING JOB aborts either a specified load job or all active loading jobs:

The output will show a summary of aborted loading jobs.

RESUME LOADING JOB

The command RESUME LOADING JOB will restart a previously-run job which ended for some reason before completion.

If the job is finished, this command will do nothing. The RESUME command should pick up where the previous run ended; that is, it should not load the same data twice.

Verifying and Debugging a Loading Job

Every loading job creates a log file. When the job starts, it will display the location of the log file. Typically, the file is located at

<TigerGraph.root.dir>/logs/restpp/restpp_loader_logs/<graph_name>/<job_id>.log

This file contains the following information which most users will find useful:

A list of all the parameter and option settings for the loading job
A copy of the status information that is printed
Statistics report on the number of lines successfully read and parsed

The statistics report include how many objects of each type is created, and how many lines are invalid due to different reasons. This report also shows which lines cause the errors. Here is the list of statistics shown in the report. There are two types of statistics. One is file level (the number of lines), and the other is data object level (the number of objects). If an file level error occurs, e.g., a line does not have enough columns, this line of data is skipped for all LOAD statements in this loading job. If an object level error or failed condition occurs, only the corresponding object is not created, i.e., all other objects in the same loading job are still created if no object level error or failed condition for each corresponding object.

Note that failing a WHERE clause is not necessarily a bad result. If the user's intent for the WHERE clause is to select only certain lines, then it is natural for some lines to pass and some lines to fail.

Below is an example.

The above loading job and data generate the following report

There are a total of 7 data lines. The report shows that

Six of the lines are valid data lines
One line (Line 7) does not have enough tokens.

Of the 6 valid lines,

Three of the 6 valid lines generate valid movie vertices.
One line has an invalid attribute (Line 1: year)
Two lines (Lines 4 and 5) do not pass the WHERE clause.

Appendix

Keywords & Reserved Words

The following words are reserved for use by the Data Definition Language. That is, a graph schema or loading job may not use any of these words for a user-defined identifier, for the name of a vertex type, edge type, graph, or attribute. A separate list of reserved keywords exists for the Query language here. The compiler will reject the use of a Reserved Word as a user-defined identifier.

_SUBSTRING         ABORT              ACCESS             ADD                
ADMIN              AFTER              ALL                ALLOCATE           
ALTER              ANALYZE            AND                ANY                
APPROX_COUNT       ARCHIVE            ARE                ARRANGE            
ARRAY              AS                 ASC                ASENSITIVE         
ASYMMETRIC         AT                 ATOMIC             ATTRIBUTE          
AUTHORIZATION      AV                 AVG                BAG                
BEFORE             BEGIN              BETWEEN            BIGINT             
BINARY             BINSTORAGE         BLOB               BOOL               
BOOLEAN            BOTH               BUCKET             BUCKETS            
BY                 BYTEARRAY          CACHE              CALL               
CALLED             CASCADE            CASCADED           CASE               
CAST               CAT                CD                 CHANGE             
CHAR               CHARACTER          CHARARRAY          CHECK              
CLEAR              CLOB               CLOSE              CLUSTER            
CLUSTERED          CLUSTERSTATUS      COGROUP            COLLATE            
COLLECTION         COLUMN             COLUMNS            COMMIT             
COMPACT            COMPACTIONS        COMPRESS           COMPUTE            
CONCAT             CONCATENATE                           CONF               
CONNECT            CONST              CONSTRAINT         CONTINUE           
COPYFROMLOCAL      COPYTOLOCAL        CORRESPONDING      COUNT              
CP                 CREATE             CROSS              CUBE               
CURRENT            CURRENT_DATE       CURRENT_PATH       CURRENT_ROLE       
CURRENT_TIME       CURRENT_TIMESTAMP  CURRENT_USER       CURSOR             
CYCLE              DATA               DATA_SOURCE        DATABASE           
DATABASES          DATE               DATETIME           DAY                
DBPROPERTIES       DD                 DEALLOCATE         DEC                
DECIMAL            DECLARE            DECRYPT            DEFAULT            
DEFERRED           DEFINE             DEFINED            DELETE             
DELIMITED          DEPENDENCY         DEREF              DESC               
DESCRIBE           DETERMINISTIC      DIFF               DIRECTED           
DIRECTORIES        DIRECTORY          DISABLE            DISCONNECT         
DISTINCT           DISTRIBUTE         DISTRIBUTED        DM                 
DO                 DOUBLE             DROP               DRYRUN             
DU                 DUMP               DYNAMIC            EACH               
EDGE               ELEMENT            ELEM_TYPE          ELSE               
ELSEIF             EMPTY              ENABLE             END                
EOL                ESCAPE             ESCAPED            EVAL               
EXCEPT             EXCHANGE           EXCLUSIVE          EXEC               
EXECUTE            EXISTS             EXIT               EXPLAIN            
EXPORT             EXTENDED           EXTERN             EXTERNAL           
FALSE              FETCH              FIELDS             FILE               
FILEFORMAT         FILENAME           FILTER             FIRST              
FIXED_BINARY       FLATTEN            FLATTEN_JSON_ARRAY FLOAT              
FOLLOWING          FOR                FOREACH            FOREIGN            
FORMAT             FORMATTED          FREE               FROM               
FULL               FUNCTION           FUNCTIONS          GENERATE           
GET                GLOBAL             GPATH              GPATH_QUERY        
GQL                GQUERY             GRANT              GRAPH              
GROUP              GROUPING           GSHELL             HANDLER            
HARD               HASH_PARTITION     HAVING             HEADER             
HELP               HOLD               HOLD_DDLTIME       HOST_GRAPH         
HOUR               ICON               IDENTIFIED         IDENTITY           
IDXPROPERTIES      IF                 IGNORE             ILLUSTRATE         
IMMEDIATE          IMPORT             IN                 INCREMENTAL        
INDEX              INDEXES            INDICATOR          INIT               
INNER              INOUT              INPATH             INPUT              
INPUTDRIVER        INPUTFORMAT        INPUT_LINE_FILTER  INSENSITIVE        
INSERT             INSTALL            INT                INT16              
INT32              INT32_T            INT64_T            INT8               
INTEGER            INTERPRET          INTERSECT          INTERVAL           
INTO               INT_LIST           INT_SET            IS                 
ITEMS              ITERATE            JAR                JOB                
JOIN               JSON               KAFKA              KEY                
KEYS               KEY_TYPE           KILL               LANGUAGE           
LARGE              LATERAL            LEADING            LEAVE              
LEFT               LESS               LIKE               LIMIT              
LINES              LISTLOAD           LOADING            LOCAL              
LOCALTIME          LOCALTIMESTAMP                        LOCK               
LOCKS              LOGICAL            LONG               LOOP               
LS                 MACRO              MAP                MAPJOIN           
MATCH              MATCHES            MATERIALIZED       MAX                
MEMBER             MERGE              METHOD             MIN                
MINUS              MINUTE             MKDIR              MODIFIES           
MODULE             MONTH              MSCK               MULTISET           
MV                 NATIONAL           NATURAL            NCHAR              
NCLOB              NEW                NO                 NOPRINT                 
NONE               NOSCAN             NOT                NO_DROP            
NULL               NUMERIC            OF                 OFFLINE
OFFLINE2ONLINE     OLD                ON                 ONLINE_POST        
ONLY               ONSCHEMA           OPEN               OPTIMIZE           
OPTION             OR                 ORDER              OUT                
OUTER              OUTPUT             OUTPUTDRIVER       OUTPUTFORMAT       
OVER               OVERLAPS           OVERWRITE          OWNER              
PARALLEL           PARAMETER          PARTIALSCAN        PARTITION          
PARTITIONED        PARTITIONS         PASSWORD           PERCENT            
PIG                PIGDUMP            PIGSTORAGE         PLUS               
PRECEDING          PRECISION          PREPARE            PRESERVE           
PRETTY             PRIMARY            PRIMARY_ID         PRINCIPALS         
PROCEDURE          PROTECTION         PROXY              PURGE              
PWD                QUERY              QUIT               QUOTE              
RANGE              RANGE_PARTITION    READ               READONLY           
READS              REAL               REBUILD            RECOMPILE            
RECORDREADER       RECORDWRITER       RECURSIVE          REDUCE             
REF                REFERENCES         REFERENCING        REFRESH            
REGEXP             REGISTER           RELEASE            RENAME             
REPAIR             REPEAT             REPLACE            RESIGNAL           
RESTRICT           RESULT             RESUME             RETURN             
RETURNS            REVERSE_EDGE       REVOKE             REWRITE            
RIGHT              RLIKE              RM                 RMF                
ROLE               ROLES              ROLLBACK           ROLLUP             
ROW                ROWS               RUN                S3                 
SAMPLE             SAVEPOINT          SCHEMA             SCHEMAS            
SCHEMA_CHANGE      SCOPE              SCROLL             SEARCH             
SECOND             SECONDARY_ID       SECRET             SELECT             
SEMI               SENSITIVE          SEPARATOR          SERDE              
SERDEPROPERTIES    SERVER             SESSION_USER       SET                
SETS               SHARED             SHIP               SHOW               
SHOW_DATABASE      SIGNAL             SIMILAR            SIZE               
SKEWED             SMALLINT           SOME               SORT               
SORTED             SPECIFIC           SPECIFICTYPE       SPLIT              
SQL                SQLEXCEPTION       SQLSTATE           SQLWARNING         
SSL                START              START_ID           STATIC             
STATISTICS         STATUS             STATS              STDERR             
STDIN              STDOUT             STORE              STORED             
STREAM             STREAMTABLE        STRING             STRING_LIST        
STRING_SET         STRUCT             SUBMULTISET        SUM                
SYMMETRIC          SYSTEM             SYSTEM_USER        SYS.FILE_NAME
SYS.INTERNAL_ID    TABLE              TABLES             TABLESAMPLE        
TBLPROPERTIES      TEMPORARY          TEMP_TABLE         TERMINATED         
TEXTLOADER         THEN               THROUGH            TIME               
TIMESTAMP          TIMEZONE_HOUR      TIMEZONE_MINUTE    TINYINT            
TO                 TOKEN              TOKENIZE           TOKEN_LEN          
TOUCH              TO_FLOAT           TO_INT             TRAILING           
TRANSACTION        TRANSACTIONS       TRANSFORM          TRANSLATION        
TREAT              TRIGGER            TRUE               TRUNCATE           
TUPLE              TYPE               TYPEDEF            UDF_PARTITION      
UINT               UINT16             UINT32             UINT32_T           
UINT64_T           UINT8              UINT8_T            UINT_SET           
UNARCHIVE          UNBOUNDED          UNDIRECTED         UNDO               
UNION              UNIONTYPE          UNIQUE             UNIQUEJOIN         
UNKNOWN            UNLOCK             UNNEST             UNSET              
UNSIGNED           UNTIL              UPDATE             UPSERT             
URI                USE                USER               USING              
UTC                UTCTIMESTAMP       VAL                VALUE              
VALUES             VALUE_TYPE         VARCHAR            VARYING            
VECTOR             VERSION            VERTEX             VIEW               
VOID               WHEN               WHENEVER           WHERE              
WHILE              WINDOW             WITH               WITHIN             
WITHOUT            YEAR               CURRENT_DEFAULT_TRANSFORM_GROUP       
CURRENT_TRANSFORM_GROUP_FOR_TYPE      INT32_INT32_KV_LIST                   
UINT32_UDT_KV_LIST                    UINT32_UINT32_KV_LIST

GSQL Start-to-End Process and Data Flow

The figures below illustrates the sequence of steps and the dependencies to progress from no graph to a loaded graph and a query result, for TigerGraph platform version 0.8 and higher. Note that online and offline follow the same flow.

Modifying a Graph Schema

To safely update the graph schema, the user should follow this procedure:

Create a SCHEMA_CHANGE JOB, which defines a sequence of ADD, ALTER and/or DROP statements.
Run the SCHEMA_CHANGE JOB (i.e. RUN SCHEMA_CHANGE JOB job_name), which will do the following:
- Attempt the schema change.
- If the change is successful, invalidate any loading job or query definitions which are incompatible with the new schema.
- if the change is unsuccessful, report the failure and return to the state before the attempt.

A schema change will invalidate any loading jobs or query jobs which relate to an altered part of the schema. Specifically:

A loading job becomes invalid if it refers to a vertex or and an edge which has been dropped (deleted) or altered .
A query becomes invalid if it refers to a vertex, and edge, or an attribute which has been dropped .

Invalid loading jobs are dropped, and invalid queries are uninstalled. After the schema update, the user will need to create and install new load and query jobs based on the new schema.

Global vs. Local Schema Changes

Local graphs can define vertex and edge types independently of the vertex and edge types in other graph. That is, the same name can be used in different graphs for (different) vertex or edge types.

It is even permitted for a local graph and the global graph to use the same name for their own vertex or edge types, as long as the global vertex/edge type is not used within the local graph.

The two types of schema change jobs are described below.

CREATE SCHEMA_CHANGE JOB (local)

The CREATE SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements for changing a particular graph. It does not perform the schema change.

CREATE SCHEMA_CHANGE JOB syntax

CREATE SCHEMA_CHANGE JOB job_name FOR GRAPH graph_name {
    [sequence of DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

ADD VERTEX | EDGE (local)

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE

ADD VERTEX v_type_name (PRIMARY_ID id type [',' attribute_list]) [WITH STATS '=' "none"|"outdegree_by_edgetype"]; 
ADD UNDIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
ADD DIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
    [WITH REVERSE_EDGE '=' "rev_name"];

ALTER VERTEX | EDGE

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

ALTER ... ADD

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

ALTER ... DROP

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

DROP VERTEX | EDGE (local)

The DROP statement removes the specified vertex type or edge type from the database dictionary. The DROP statement should only be used when graph operations are not in progress.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name]*
DROP EDGE e_type_name [',' e_type_name]*

DROP TUPLE

For tuples that are defined within a graph schema, you can drop them by using the following command.

drop tuple

DROP TUPLE tuple_name [',' tuple_name]*

RUN SCHEMA_CHANGE JOB

Below is an example. The schema change job add_reviews adds a Review vertex type and two edge types to connect reviews to users and books, respectively.

SCHEMA_CHANGE JOB example

USE GRAPH Book_rating
CREATE SCHEMA_CHANGE JOB add_reviews FOR GRAPH Book_rating {
    ADD VERTEX Review (PRIMARY_ID id UINT, review_date DATETIME, url STRING);
    ADD UNDIRECTED EDGE wrote_review (FROM User, TO Review);
    ADD UNDIRECTED EDGE review_of_book (FROM Review, TO Book);
}
RUN SCHEMA_CHANGE JOB add_reviews

DROP SCHEMA_CHANGE JOB

GSQL > USE GRAPH Book_rating
GSQL > drop job local_schema_change123
The job local_schema_change_change123 is dropped!

USE GLOBAL

CREATE GLOBAL SCHEMA_CHANGE JOB

CREATE GLOBAL SCHEMA_CHANGE JOB syntax

CREATE GLOBAL SCHEMA_CHANGE JOB job_name {
    [sequence of global DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Those both global and local schema change jobs have ADD and DROP statements, they have different meanings. The table below outlines the differences.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

ADD VERTEX | EDGE (global)

The ADD statement adds existing global vertex or edge types to one of the graphs.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE (Global)

ADD VERTEX v_type_name [',' v_type_name...] TO GRAPH gname;
ADD EDGE e_type_name [',' e_type_name...] TO GRAPH gname;

ALTER VERTEX | EDGE

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

ALTER ... ADD

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [FROM (vertex_type_name [','vertex_type_name])]
    [TO (vertex_type_name [','vertex_type_name])]
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

ALTER ... DROP

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

DROP VERTEX | EDGE (global)

The DROP statement removes specified global vertex or edge types from one of the graphs. The command does not delete any data.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name...] FROM GRAPH gname;
DROP EDGE e_type_name   [',' e_type_name...] FROM GRAPH gname;

RUN GLOBAL SCHEMA_CHANGE JOB

Below is an example. The schema change alter_friendship_make_library drops the on_date attribute from the friend_of edge and adds Book type to the library graph.

GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
CREATE GRAPH library()
CREATE GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library {
    ALTER EDGE friend_of DROP ATTRIBUTE (on_date);
    ADD VERTEX Book TO GRAPH library;
}
RUN GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library

DROP GLOBAL SCHEMA_CHANGE JOB

Global schema change jobs can be dropped by using the DROP JOB command. Refer to the Creating a Loading Job page for more information about dropping jobs.

DROP GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
DROP JOB alter_friendship_make_library

Defining a Graph Schema

Once the graph designer has chosen a graph schema, the schema is ready to be formalized into a series of GSQL statements.

Graph Creation and Modification Privileges

CREATE VERTEX

Available to superuser and globaldesigner roles only.

The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. At a high level of abstraction, the format is

CREATE VERTEX vertex_type_name (id_and_attribute_list) [ vertex_options ]

More specifically, the syntax is as follows, assuming that the vertex ID is listed first:

CREATE VERTEX Syntax

CREATE VERTEX vertex_type_name (primary_id_name_type
    [, attribute_name type [DEFAULT default_value] ]*)
    [WITH [STATS="none"|"outdegree_by_edgetype"][primary_id_as_attribute="true"]]

Keys and Attributes

Beginning with v2.3, there are two syntaxes for specifying the primary id/key:

Legacy PRIMARY_ID syntax: The legacy syntax remains valid, but there are additional options and additional flexibility:

PRIMARY KEY syntax. This syntax is modeled after SQL.

PRIMARY_ID and WITH primary_id_as_attribute

primary_id_name_type := PRIMARY_ID id_name id_type

Beginning with v2.3:

The Primary_id can be treated as an attribute, if the clause WITH primary_id_as_attribute="true" is used with the CREATE VERTEX statement.
The primary_id designation can be used with any one of the attributes; it is not restricted to the first attribute.

Example:

CREATE VERTEX movie (PRIMARY_ID id UINT, name STRING, year UINT)
    WITH primary_id_as_attribute="true"

PRIMARY KEY

primary_id_name_type := id_name_id_type PRIMARY KEY

Note the differences between PRIMARY_ID and PRIMARY KEY:

"PRIMARY_ID" precedes the (name, type) pair. "PRIMARY KEY" follows the (name, type) pair.
In default mode, a PRIMARY_ID is not an attribute, but the WITH primary_id_as_attribute="true" clause can be used to make it an attribute. Alternately, the PRIMARY KEY is always an attribute; the WITH option is unneeded.

Example:

CREATE VERTEX movie (id UINT PRIMARY KEY, name STRING, year UINT)

PRIMARY KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

COMPOSITE KEY

composite_id_name_type := PRIMARY KEY "(" attribute_name ("," attribute_name)* ")"

Example:

CREATE VERTEX movie (id UINT, title STRING, year UINT, PRIMARY KEY (title,year,id) )

COMPOSITE KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

Vertex Attribute List

The attribute list, enclosed in parentheses, is a list of one or more id definitions and attribute descriptions separated by commas:

primary_id_name_type
[, attribute_name type [DEFAULT default_value ] ]*

The available attribute types, including user-defined tuples, are listed in the section Attribute Data Types.

Every attribute data type has a built-in default value (e.g., the default value for INT type is 0). The DEFAULT default_value option overrides the built-in value.
Any number of additional attributes may be listed after the primary_id attribute. Each attribute has a name, type, and optional default value (for primitive-type, DATETIME, or STRING COMPRESS attributes only)

Example:

Create vertex types for the graph schema of Figure 1.

Vertex definitions for User-Book-Rating graph

CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
    WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book  (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
    WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)

WITH STATS

CREATE EDGE

Available to superuser and globaldesigner roles only.

CREATE UNDIRECTED EDGE

CREATE UNDIRECTED EDGE edge_type_name (FROM vertex_type_name, TO vertex_type_name
  [, attribute_name type [DEFAULT default_value]]* )

CREATE DIRECTED EDGE

CREATE DIRECTED EDGE edge_type_name (FROM vertex_type_name, TO vertex_type_name
  [, attribute_name type [DEFAULT default_value]]* )
  [WITH REVERSE_EDGE="rev_name"]

Viewed at a higher level of abstraction, the format is

CREATE UNDIRECTED|DIRECTED EDGE edge_type_name (FROM vertex_type_name,
  TO vertex_type_name,
  edge_attribute_list ) [ edge_options ]

Creating an Edge from or to Any Vertex Type

Wildcard edge type

CREATE DIRECTED EDGE any_edge (FROM *, TO *, label STRING)

WITH REVERSE_EDGE

Example:

Create undirected edges for the three edge types in Figure 1.

Edge definitions for User-Book-Rating graph

CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)

Example :

Create the additional edges depicted in Figure 2.

Additional Edge definitions for Expanded-User-Book-Rating graph

CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"

Every time the GSQL loader creates a sequel_of edge, it will also automatically create a preceded_by edge, pointing in the opposite direction.

Special Options

Shared STRING COMPRESS dictionaries

CREATE VERTEX v1 (PRIMARY_ID main_id STRING, att1 STRING COMPRESS e1, att2 STRING COMPRESS e1)
CREATE UNDIRECTED EDGE e (FROM v1, TO v2, att1 STRING COMPRESS e1)

CREATE GRAPH

Available to superuser and globaldesigner roles only.

Multiple Graph support

CREATE GRAPH syntax

CREATE GRAPH gname (vertex_or_edge_type, vertex_or_edge_type...) [WITH ADMIN username]

The optional WITH ADMIN clause sets the named user to be the admin for the new graph.

As a convenience, executing CREATE GRAPH will set the new graph to be the working graph.

Examples of CREATE GRAPH with all vertex & edge types and with an empty domain.

CREATE GRAPH everythingGraph (*)
CREATE GRAPH emptyGraph ()

Examples :

Create graph Book_rating for the edge and vertex types defined for Figure 1:

Graph definition for User-Book-Rating graph

CREATE GRAPH Book_rating (*)

The following code example shows the full set of statements to define the expanded user-book-rating graph:

Full definition for the Expanded User-Book-Rating graph

CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
    WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book  (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
    WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)
CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"
CREATE GRAPH Book_rating (*)

USE GRAPH

New requirement for MultiGraph support. Applies even if only one graph exists.

For more about roles and privileges, see the document Managing User Privileges and Authentication.

USE GRAPH syntax

USE GRAPH gname

Instead of the USE GRAPH command, gsql can be invoked with the -g <graph_name> option.

DROP GRAPH

Available to superuser and globaldesigner roles only. The effect of this command takes into account shared domains.

DROP GRAPH syntax

DROP GRAPH gname

SHOW - View Parts of the Catalog

SHOW <VERTEX> | <EDGE> | <JOB> | <QUERY> | <GRAPH> [ <name> | <glob> | -r <regex> ]

Regular expression searching will not work with escaping characters.

Linux Globbing examples
SHOW VERTEX us*            //shows all vertices that start with the letters "Us"
SHOW VERTEX co?*y          //shows the vertex that starts with co and ends with y      
SHOW VERTEX ?????          //shows all vertices that are 5 letters long

Regular Expression Examples
SHOW VERTEX -r "skil{2}"    //match the pattern "skill"
SHOW EDGE -r "test[1][13579]*"    //match pattern that only contains odd numbers after "test"
SHOW JOB -r "[a-zA-Z]*"     //match all jobs that contain only letters

Creating a Loading Job

After a graph schema has been created, the system is ready to load data into the graph store. The GSQL language offers easy-to-understand and easy-to-use commands for data loading which perform many of the same data conversion, mapping, filtering, and merging operations which are found in enterprise ETL (Extract,Transform, and Load) systems.

The GSQL system can read structured or semistructured data from text files. The loading language syntax is geared towards tabular or JSON data, but conditional clauses and data manipulation functions allow for reading data that is structured in a more complex or irregular way. For tabular data, each line in the data file contains a series of data values, separated by commas, tabs, spaces, or any other designated ASCII characters (only single character separators are supported). A line should contain only data values and separators, without extra whitespace. From a tabular view, each line of data is a row, and each row consists of a series of column values.

Loading data is a two-step process. First, a loading job is defined. Next, the job is executed with the RUN statement. These two statements, and the components with the loading job, are detailed below.

The structure of a loading job will be presented hierarchically, top-down:

CREATE ... JOB, which may contain a set of DEFINE and LOAD statements

DEFINE statements
LOAD statements, which can have several clauses

All blank spaces are meaningful in string fields in CSV and JSON. That is, both Andy,Knows, Pat and Andy, Knows,Pat are legal but describe different relationships: "Knows" vs. " Knows" and different persons " Pat" vs. "Pat". Either pre-process your data files to remove extra spaces, or use GSQL's token processing functions gsql_trim, gsql_ltrim, and gsql_rtrim (Built-in Loader Token Functions).

New LOADING JOB Capabilities

Beginning with v2.0, the TigerGraph platform introduces an extended syntax for defining and running loading jobs which offers several advantages:

The TigerGraph platform can handle concurrent loading jobs, which can greatly increase throughput.
The data file locations can be specified at compile time or at run time. Run-time settings override compile-time settings.
A loading job definition can include several input files. When running the job, the user can choose to run only part of the job by specifying only some of the input files.
Loading jobs can be monitored, aborted, and restarted.

Concurrent Loading

Among its several duties, the RESTPP component manages loading jobs. Previously, RESTPP could manage only one loading job at a time. In v2.0, there can be multiple RESTPP-LOADER subcomponents, each of which can handle a loading job independently. The maximum number of concurrent loading jobs is set by the configuration parameter RESTPP-LOADER.Replicas.

Furthermore, if the TigerGraph graph is distributed (partitioned) across multiple machine nodes, each machine's RESTPP-LOADER(s) can be put into action. Each RESTPP-LOADER only reads local input data files, but the resulting graph data can be stored on any machine in the cluster.

To maximize loading performance in a cluster, use at least two loaders per machine, and assign each loader approximately the same amount of data.

To provide this added capability for loading, there is an expanded syntax for creating loading jobs and running loading jobs. Below is a summary of changes and additions. Full details are then presented, in the remainder of this document (GSQL Language Reference Part 1).

A loading job begins with CREATE LOADING JOB. (Note that the keyword "LOADING" is included.)
A new statement type, DEFINE FILENAME, is added, to define filename variables.
The file locations can refer either to the local machine, to specific machines, or to all machines.
When a job starts, it is assigned a job_id. Using the job_id, you can check status, abort a job, or restart a job.

Below is a simple example:

CREATE LOADING JOB job1 FOR GRAPH graph1 {

   DEFINE FILENAME file1 = "/data/v1.csv";
   DEFINE FILENAME file2;

   LOAD file1 TO VERTEX v1 VALUES ($0, $1, $2);
   LOAD file2 TO EDGE e2 VALUES ($0, $1);
}
RUN LOADING JOB job1 USING file1="m1:/data/v1_1.csv", file2="m2:/data/e2.csv"

A concurrent-capable loading job can logically be separated into parts according to each file variable. When a concurrent-capable loading job is compiled, a RESTPP endpoint is generated for each file variable. Then, the job can be run in portions, according to each file variable.

pre-v2.0 CREATE JOB syntax is deprecated

If the new CREATE LOADING JOB syntax with DEFINE FILENAME is used, the user can take advantage of concurrent loading.

Pre-v2.0 loading syntax will still be supported for v2.x but is deprecated. Pre-v2.0 loading syntax does not offer concurrent loading.

Example loading jobs and data files for the book_rating schema defined earlier in the document are available in the /doc/examples/gsql_ref folder in your TigerGraph platform installation.

CREATE LOADING JOB Block

The v2.0 CREATE LOADING JOB can be distinguished from the pre-v2.0 loading jobs first by its header, and then by whether its contains DEFINE FILENAME statements or not. Once the loading type has been determined, there are subsequent rules for the format of the individual LOAD statements and then the RUN statement.

The CREATE LOADING JOB and DROP LOADING JOB privileges are reserved for the designer, admin, and superuser roles.

CREATE LOADING JOB

The CREATE LOADING JOB statement is used to define a block of DEFINE, LOAD, and DELETE statements for loading data to or removing data from a particular graph. The sequence of statements is enclosed in curly braces. Each statement in the block, including the last one, should end with a semicolon.

CREATE LOADING JOB job_name FOR GRAPH graph_name {
   [zero or more DEFINE statements;]
   [zero or more LOAD statements;] | [zero or more DELETE statements;]
}

LOAD or DELETE Statements As of version 2.2, a LOADING JOB may contain either LOAD or DELETE statements but not both. A JOB which includes both will be rejected when the CREATE statement is executed.

DROP JOB statement

To drop (remove) a job, run "DROP JOB job_name". The job will be removed from GSQL. To drop all jobs, run either of the following commands: DROP JOB ALL DROP JOB *

The scope of ALL depends on the user's current scope. If the user has set a working graph, then DROP ALL removes all the jobs for that graph. If a superuser has set their scope to be global, then DROP ALL removes all jobs across all graph spaces.

DEFINE statements

A DEFINE statement is used to define a local variable or expression to be used by the subsequent LOAD statements in the loading job.

DEFINE FILENAME

The DEFINE FILENAME statement defines a filename variable. The variable can then be used later in the JOB block by a LOAD statement to identify its data source. Every concurrent loading job must have at least one DEFINE FILENAME statement.

DEFINE FILENAME filevar ["=" filepath_string ]; 
filepath_string = ( path | " all :" path | " any :" path | mach_aliases " :" path ["," mach_aliases ":" path ]* ) 
mach_aliases = name["|"name]*

The filevar is optionally followed by a filepath_string , which tells the job where to find input data. As the name suggests, filepath_string is a string value. Therefore, it should start and end with double quotes.

filepath_string

There are four options for filepath_string :

path : either an absolute path or relative path for either a file or a folder on the machine where the job is run. If it is a folder, then the loader will attempt to load each non-hidden file in the folder.

"/data/graph.csv"

If this path is not valid when CREATE LOADING JOB is executed, GSQL will report an error.

An absolute path may begin with the session variable $sys.data_root.

CREATE LOADING JOB filePathEx FOR GRAPH gsql_demo {
  LOAD "$sys.data_root/persons.csv" TO ...
}

Then, when running this loading job, first set a value for the parameter, and then run the job:

SET sys.data_root="/data/mydata"
RUN JOB filePathEx

As the name implies, session parameters only retain their value for the duration of the current GSQL session. If the user exits GSQL, the settings are lost.

"all:" path : If the path is prefixed with all: , then the loading job will attempt to run on every machine in the cluster which has a RESTPP component, and each machine will look locally for data at path . I f the path is not valid on any of the machines, the job will be aborted . Also, the session parameter $sys.data_root may not be used.

"ALL:/data/graph.csv"

"any:" path : If the path is prefixed with any: , then the loading job will attempt to run on every machine in the cluster which has a RESTPP component, and each machine will look locally for data at path . If the path is not valid on any of the machines, those machines are skipped. Also, the session parameter $sys.data_root may not be used.

"ANY:/data/graph.csv"

A list of machine-specific paths : A machine_alias is a name such as m1, m2, etc. which is defined when the cluster configuration is set. For this option, the filepath_string may include a list of paths, separated by commas. If several machines have the same path, the paths can be grouped together by using a list of machine aliases, with the vertical bar "|" as a separator. The loading job will run on whichever machines are named; each RESTPP-LOADER will work on its local files.

"m1:/data1.csv, m2|m3|m5:/data/data2.csv"

DEFINE HEADER

The DEFINE HEADER statement defines a sequence of column names for an input data file. The first column name maps to the first column, the second column name maps to the second column, etc.

DEFINE HEADER header_name = " column_name "[," column_name "]*;

DEFINE INPUT_LINE_FILTER

The DEFINE INPUT_LINE_FILTER statement defines a named Boolean expression whose value depends on column attributes from a row of input data. When combined with a USING reject_line_rule clause in a LOAD statement, the filter determines whether an input line is ignored or not.

DEFINE INPUT_LINE_FILTER filter_name = boolean_expression_using_column_variables ;

LOAD statements

A LOAD statement tells the GSQL loader how to parse a data line into column values (tokens), and then describes how the values should be used to create a new vertex or edge instance. One LOAD statement can be used to generate multiple vertices or edges, each vertex or edge having its own Destination_Clause , as shown below. Additionally, two or more LOAD statements may refer to the same input data file. In this case, the GSQL loader will merge their operations so that both of their operations are executed in a single pass through the data file.

The LOAD statement has many options. This reference guide provides examples of key features and options. The Platform Knowledge Base / FAQs and the tutorials, such as Get Started with TigerGraph , provide additional solution- and application-oriented examples.

Different LOAD statement types have different rules for the USING clause; see the USING clause section below for specifics.

LOAD statement

LOAD [filepath_string|filevar|TEMP_TABLE table_name ] Destination_Clause [, Destination_Clause ]* [USING clause ];

The filevar must have been previously defined in a DEFINE FILENAME statement.

The filepath_string must satisfy the same rules given above in the DEFINE FILENAME section.

"__GSQL_FILENAME_n__": Position-based File Identifiers

When a CREATE LOADING JOB block is processed, the GSQL system will count the number of unique filepath_strings and assign them position-based index numbers 0, 1, 2, etc. starting from the top. A filepath_string is considered one item, even if it has multiple machine indexes and file locations. These index numbers can then be used as an alternate naming scheme for the filespath_strings:

When running a loading job, the nth filepath_string can be referred as "__GSQL_FILENAME_n__", where n is replaced with the index number. Note that the string has double underscores at both the left and right ends.

The remainder of this section of the document will provide details on the format and use of the file_path, Destination_Clause, its subclauses. USING clause is introduced later in Section "Other Optional LOAD Clauses".

Destination Clause

A Destination_Clause describes how the tokens from a data source should be used to construct one of three types of data objects : a vertex, an edge, or a row in a temporary table (TEMP_TABLE). The destination clause formats for the three types are very similar, but we show them separately for clarity:

TO VERTEX vertex_type_name VALUES (id_expr [, attr_expr]*)
    [WHERE conditions] [OPTION (options)]

TO EDGE edge_type_name VALUES (source_id_expr [source_type_expr],
                               target_id_expr [target_type_expr]
                               [, attr_expr]*)
    [WHERE conditions] [OPTION (options)]

TO TEMP_TABLE table_name (id_name [, attr_name]*)
    VALUES (id_expr [, attr_expr]*)
    [WHERE conditions] [OPTION (options)]

For the TO VERTEX and TO EDGE destination clauses, the vertex_type_name or edge_type_name must match the name of a vertex or edge type previously defined in a CREATE VERTEX or CREATE UNDIRECTED|DIRECTED EDGE statement. The values in the VALUE list(id_expr, attr_expr1, attr_expr2,...) are assigned to the id(s) and attributes of a new vertex or edge instance, in the same order in which they are listed in the CREATE statement. id_expr obeys the same attribute rules as attr_expr , except that only attr_expr can use the reducer function, which is introduced later.

The TO TEMP_TABLE clause defines a new, temporary data structure. Its unique characteristics will be described in a separate subsection. For now, we focus on TO VERTEX and TO EDGE.

For edge clauses, the source_id_expr and target_id_expr can each optionally be followed by a source_type_expr and target_type_expr, respectively. The source_type_expr and target_type_expr must evaluate to one of the allowed endpoint vertex types for the given edge type. By specifying the vertex type, this tells the loader what id types to expect. This may be important when the edge type is defined to accept more than one type of source/target vertex.

For fast loading of edge data, referential integrity checking is disabled by default. For an edge to be valid, it must refer to endpoint vertices which exist. To support fast, out-of-order loading, if one or both of the endpoint vertices do not yet exist, the loader will create vertices with the necessary IDs and default attribute values. Due to the loader's UPSERT semantics, if the vertices data are loader later, it will be automatically merged with the dummy vertices. The user can disable this feature and perform regular referential integrity checking by setting the VERTEXMUSTEXIST=true option.

Example: Suppose we have the following vertex and edge types:

CREATE VERTEX Person (pid STRING PRIMARY KEY, birthdate DATETIME)
CREATE VERTEX Company (cid INT PRIMARY KEY, industry STRING)
CREATE DIRECTED EDGE Visit (FROM Person, TO Person
                          | FROM Person, TO Company, year INT)

A Visit edge can connect two Persons or a Person to a Company. A Person has a string id, while a Company has an INT id. Then suppose the Visits source data comes from a single CSV file, containing both variants of edges. Note that the 2nd column ($1) contains either Person or Company, and that the 3rd column ($2) contains either a string or an integer.

Sam,Person,Joe,2012
Sam,Company,4057,2017
Chris,Company,9401,2016
Pat,Person,Taylor,2020

Using the optional target_type_expr field, we can load both variants of the Visit edge with a single clause.

LOAD file1 TO EDGE Visit VALUES ($0, $2 $1, $3)

Attributes and Attribute Expressions

A LOAD statement processes each line of an input file, splitting each line (according to the SEPARATOR character, see Section "Other Optional LOAD Clauses" for more details) into a sequence of tokens. Each destination clause provides a token-to-attribute mapping which defines how to construct a new vertex, an edge, or a temp table row instance (e.g., one data object). The tokens can also be thought of as the column values in a table. There are two ways to refer to a column, by position or by name. Assuming a column has a name, either method may be used, and both methods may be used within one expression.

By Position : The columns (tokens) are numbered from left to right, starting with $0. The next column is $1, and so on.

By Name : Columns can be named, either through a header line in the input file, or through a DEFINE HEADER statement. If a header line is used, then the first line of the input file should be structured like a data line, using the same separator characters, except that each column contains a column name string instead of a data value. Names are enclosed in double quotes, e.g. $"age".

Data file name: $sys.file_name refers to the current input data file.

In a simple case, a token value is copied directly to an attribute. For example, in the following LOAD statement,

LOAD "xx/yy/a.csv" TO VERTEX person VALUES ($0, $1, $sys.file_name)

The PRIMARY_ID of a person vertex comes from column $0 of the file "xx/yy/a.csv".
The next attribute of a person vertex comes from column $1.
The next attribute of a person vertex is given the value "xx/y/a.csv" (the filename itself).

Users do not need to explicitly define a primary_id. Given the attributes, one will be selected as the primary key.

Cumulative Loading

A basic principle in the GSQL Loader is cumulative loading. Cumulative loading means that a particular data object might be written to (i.e., loaded) multiple times, and the result of the multiple loads may depend on the full sequence of writes. This usually means that If a data line provides a valid data object, and the WHERE clause and OPTION clause are satisfied, then the data object is loaded.

Valid input : For each input data line, each destination clause constructs one or more new data objects. To be a valid data object, it must have an ID value of the correct type, have correctly typed attribute values, and satisfy the optional WHERE clause. If the data object is not valid, the object is rejected (skipped) and counted as an error in the log file. The rules for invalid attributes values are summarized below:

UINT: Any non-digit character. (Out-of-range values cause overflow instead of rejection)
INT: Any non-digit or non-sign character. (Out-of-range values cause overflow instead of rejection)
FLOAT and DOUBLE: Any wrong format
STRING, STRING COMPRESS, FIXED_BINARY: N/A
DATETIME: Wrong format, invalid date time, or out of range.
Complex type: Depends on the field type or element type. Any invalid field (in UDT), element (in LIST or SET), key or value (in MAP) causes rejection.

New data objects: If a valid data object has a new ID value, then the data object is added to the graph store. Any attributes which are missing are assigned the default value for that data type or for that attribute.
Overwriting existing data objects : If a valid data object has a ID value for an existing object, then the new object overwrites the existing data object, with the following clarifications and exceptions:

The attribute values of the new object overwrite the attribute values of the existing data object.
Missing tokens : If a token is missing from the input line so that the generated attribute is missing, then that attribute retains its previous value.

A STRING token is never considered missing; if there are no characters, then the string is the empty string

Skipping an attribute : A LOAD statement can specify that a particular attribute should NOT be loaded by using the special character _ (underscore) as its attribute expression (attr_expr). For example,

LOAD TO VERTEX person VALUES ($0, $1, _, $2)

means to skip the next-to-last attribute. This technique is used when it is known that the input data file does not contain data for every attribute.

If the LOAD is creating a new vertex or edge, then the skipped attribute will be assigned the default value.
If the LOAD is overwriting an existing vertex or edge, then the skipped attribute will retain its existing value.

More Complex Attribute Expressions

An attribute expression may use column tokens (e.g., $0), literals (constant numeric or string values), any of the built-in loader token functions, or a user-defined token function. Attribute expressions may not contain mathematical or boolean operators (such as +, *, AND). The rules for attribute expressions are the same as those for id expressions, but an attribute expression can additionally use a reducer function:

id_expr := $column_number | $"column_name" | constant | $sys.file_name | token_function_name( id_expr [, id_expr ]* )
attr_expr := id_expr | REDUCE(reducer_function_name(id _expr ))

Note that token functions can be nested, that is, a token function can be used as an input parameter for another token function. The built-in loader token/reducer functions and user-defined token functions are described in the section "Built-In Loader Token Functions".

The subsections below describe details about loading particular data types.

Loading a DOUBLE or FLOAT Attribute

A floating point value has the basic format

[sign][digits].[digits](e|E)[sign][digits]

In the first case, the decimal point and following digits are required. In the second case, some digits are required (looking like an integer), and the following decimal point and digits are optional.

In both cases, the leading sign ( "+" or "-") is optional. The exponent, using "e" or "E", is optional. Commas and extra spaces are not allowed.

# Valid floating point values
-198256.03
+16.
-.00036
7.14285e15
9.99E-22


# Invalid floating point values
-198,256.03
9.99 E-22

Loading a DATETIME Attribute

When loading data into a DATETIME attribute, the GSQL loader will automatically read a string representation of datetime information and convert it to internal datetime representation. The loader accepts any of the following string formats:

%Y-%m-%d %H:%M:%S (e.g., 2011-02-03 01:02:03)
%Y/%m/%d %H:%M:%S (e.g., 2011/02/03 01:02:03)
%Y-%m-%dT%H:%M:%S.000z (e.g., 2011-02-03T01:02:03.123z, 123 will be ignored)
%Y-%m-%d (only date, no time, e.g., 2011-02-03 )
%Y/%m/%d (only date, no time, e.g., 2011/02/03)
Any integer value (Unix Epoch time, where Jan 1, 1970 at 00:00:00 is integer 0)

Format notation:

%Y is a 4-digit year. A 2-digit year is not a valid value.

%m and %d are a month (1 to 12) and a day (1 to 31), respectively. Leading zeroes are optional.

%H, %M, %S are hours (0 to 23), minutes (0 to 59) and seconds (0 to 59), respectively. Leading zeroes are optional.

When loading data, the loader checks whether the values of year, month, day, hour, minute, second are out of the valid range. If any invalid value is present, e.g. '2010-13-05' or '2004-04-31 00:00:00', the attribute is invalid and the object (vertex or edge) is not created.

Loading a User-Defined Type (UDT) Attribute

To load a UDT attribute, state the name of the UDT type, followed by the list of attribute expressions for the UDT's fields, in parentheses. See the example below.

TYPEDEF TUPLE <f1 INT (1), f2 UINT, f3 STRING (10), f4 DOUBLE > myTuple   # define a UDT
CREATE VERTEX v_udt  (id STRING PRIMARY KEY, att_udt myTuple)
CREATE GRAPH test_graph (v_udt)
CREATE LOADING JOB load_udt FOR GRAPH test_graph {
    DEFINE FILENAME f;
    LOAD f TO VERTEX v_udt VALUES ($0, myTuple($1, $2, $3, $4) ); 
    # $1 is loaded as f1, $2 is loaded as f2, and so on
}
RUN LOADING JOB load_udt USING f="./udt.csv"

Loading a LIST or SET Attribute

There are three methods to load a LIST or a SET.

The first method is to load multiple rows of data which share the same id values and append the individual attribute values to form a collection of values. The collections are formed incrementally by reading one value from each eligible data line and appending the new value into the collection. When the loading job processes a line, it checks to see whether a vertex or edge with that id value(s) already exists or not. If the id value(s) is new, then a new vertex or edge is created with a new list/set containing the single value. If the id(s) has been used before, then the value from the new line is appended to the existing list/set. Below shows an example:

CREATE VERTEX test_vertex (PRIMARY_ID id STRING, iset SET<INT>, ilist LIST<INT>)
CREATE UNDIRECTED EDGE test_edge(FROM test_vertex, TO test_vertex)
CREATE GRAPH test_set_list (*)

CREATE LOADING JOB load_set_list FOR GRAPH test_set_list {
  DEFINE FILENAME f;
  LOAD f TO VERTEX test_vertex VALUES ($0, $1, $1);                                                                                  
}
RUN LOADING JOB load_set_list USING f="./list_set_vertex.csv"

1,10
3,30
1,20
3,30
3,40
1,20

The job load_set_list will load two test_vertex vertices because there are two unique id values in the data file. Vertex 1 has attribute values with iset = [10,20] and ilist = [10,20,20]. Vertex 3 has values iset = [30,40] and ilist = [30, 30, 40]. Note that a set doesn't contain duplicate values, while a list can contain duplicate values.

Because GSQL loading is multi-threaded, the order of values loaded into a LIST might not match the input order.

If the input file contains multiple columns which should be all added to the LIST or SET, then a second method is available. Use the LIST() or SET() function as in the example below:

CREATE VERTEX v_set  (PRIMARY_ID id STRING, nick_names SET<STRING>)
CREATE VERTEX v_list (PRIMARY_ID id STRING, lucky_nums LIST<INT>)
CREATE GRAPH test_graph (*)
CREATE LOADING JOB load_set_list FOR GRAPH test_graph {
    DEFINE FILENAME f;
    LOAD f TO VERTEX v_set  VALUES ($0, SET($1,$2,$3) );
    LOAD f TO VERTEX v_list VALUES ($0, LIST($2,$4) );
}

The third method is to use the SPLIT () function to read a compound token and split it into a collection of elements, to form a LIST or SET collection. The SPLIT() function takes two arguments: the column index and the element separator. The element separator should be distinct from the separator through the whole file. Below shows an example:

CREATE VERTEX test_vertex (PRIMARY_ID id STRING, ustrset SET<STRING>, ilist LIST<INT>)
CREATE UNDIRECTED EDGE test_edge(FROM test_vertex, TO test_vertex)
CREATE GRAPH test_split (*)

CREATE LOADING JOB set_list_job FOR GRAPH test_split {
  DEFINE FILENAME f;
  LOAD f TO VERTEX test_vertex VALUES ($0, SPLIT($1,"|") , SPLIT($2,"#") );                                                                                  
}
RUN LOADING JOB set_list_job USING f="./split_list_set.csv"

vid,names,numbers 
v1,mike|tom|jack, 1 # 2 # 3 
v2,john, 5 # 4 # 8

The SPLIT() function cannot be used for UDT type elements.

Loading a MAP Attribute

There are three methods to load a MAP.

The first method is to load multiple rows of data which share the same id values. The maps are formed incrementally by reading one key-value pair from each eligible data line. When the loading job processes a line, it checks to see whether a vertex or edge with that id value(s) already exists or not. If the id value(s) is new, then a new vertex or edge is created with a new map containing the single key-value pair. If the id(s) has been used before, then the loading job checks whether the key exists in the map or not. If the key doesn't exist in the map, the new key-value pair is inserted. Otherwise, the value will be replaced by the new value.

The loading order might not be the same as the order in the raw data. If a data file contains multiple lines with the same id and same key but different values, loading them together results in a nondeterministic final value for that key.

Method 1 : Below is the syntax to load a MAP by the first method: Use an arrow (->) to separate the map's key and value.

CREATE VERTEX v_map  (PRIMARY_ID id STRING, att_map MAP<INT, STRING>)
CREATE GRAPH test_graph (*)
CREATE LOADING JOB load_map FOR GRAPH test_graph {
    DEFINE FILENAME f;
    LOAD f TO VERTEX v_map  VALUES ($0, ($1 -> $2) );
}

Method 2 : The second method is to use the MAP() function. If there are multiple key-value pairs among multiple columns, MAP() can load them together. Below is an example:

CREATE VERTEX v_map  (PRIMARY_ID id STRING, att_map MAP<INT, STRING>)
CREATE GRAPH test_graph (*)
CREATE LOADING JOB load_map FOR GRAPH test_graph {
    DEFINE FILENAME f;
    LOAD f TO VERTEX v_map  VALUES ($0, MAP( ($1 -> $2), ($3 -> $4) ) );  # $1 and $3 are keys and $2 and $4 are the corresponding values.
}

Method 3 : The third method is to use the SPLIT() function. Similar to the SPLIT() in loading LIST or SET, the SPLIT() function can be used when the key-value pair is in one column and separated by a key-value separator, or multiple key-value pairs are in one column and separated by element separators and key-value separators. SPLIT() here has three parameters: The first is the column index, the second is the key-value separator, and the third is the element separator. The third parameter is optional. If one row of raw data only has one key-value pair, the third parameter can be skipped. Below are the examples without and with the given element separator.

vid,key_value
v1,1:mike
v2,2:tom
v1,3:lucy

vid,key_value_list
v1,1:mike#4:lin
v2,2:tom
v1,3:lucy#1:john#6:jack

CREATE VERTEX v_map  (PRIMARY_ID id STRING, att_map MAP<INT, STRING>)
CREATE GRAPH test_graph (*)
CREATE LOADING JOB load_map FOR GRAPH test_graph {
    DEFINE FILENAME f;
    LOAD f TO VERTEX v_map  VALUES ($0, SPLIT($1, ":", "#") );
}

The SPLIT() function cannot be used for UDT type elements.

Loading Composite Key Attributes

Loading a Composite Key for a vertex works no differently that normal loading. Simply load all the attributes as you would for a vertex with a single-attribute primary key. The primary key will automatically be constructed from the appropriate attributes.

When loading to an edge where either TO_VERTEX or FROM_VERTEX contains a composite key, the composite set of attributes must be enclosed in parameters. See the example below.

#schema setup
CREATE VERTEX compositePerson (id uint, name string, PRIMARY KEY (name,id))
CREATE VERTEX compositeMovie (id uint, title string, country string, year uint, primary key (title,year,id))
CREATE DIRECTED EDGE compositeRoles (from compositePerson,to compositeMovie, role string) with reverse_edge="composite_roles_reverse"
CREATE GRAPH MyGraph(*)

#loading job
CREATE LOADING JOB composite_load FOR GRAPH MyGraph {
  LOAD "$sys.data_root/movies.csv" TO VERTEX compositeMovie VALUES 
       ($"id", $"title", $"country" ,$"year") USING header ="true", separator=",";
       
  LOAD "$sys.data_root/persons.csv" TO VERTEX compositePerson VALUES 
       ($"id",$"name") USING header = "true", separator =",";
       
  LOAD "$sys.data_root/compositeroles.csv" TO EDGE compositeRoles VALUES 
       (($"personName", $"personId"),($"movieTitle",$"movieYear",$"movieId"),$"role") 
       USING header="true", separator = ",";
}

Loading Wildcard Type Edges

If an edge has been defined using a wildcard vertex type, a vertex type name must be specified, following the vertex id, in a load statement for the edge. An example is shown below:

#schema setup
CREATE VERTEX user(PRIMARY_ID id UINT)
CREATE VERTEX product(PRIMARY_ID id UINT)
CREATE VERTEX picture(PRIMARY_ID id UINT)
CREATE UNDIRECTED EDGE purchase (FROM *, TO *)
CREATE GRAPH test_graph(*)
 
#loading job
CREATE LOADING JOB test2 FOR GRAPH test_graph {
  DEFINE FILENAME f;
  LOAD f
     TO EDGE purchase VALUES ($0 user, $1 product),
     TO EDGE purchase VALUES ($0 user, $2 picture);
  }

Built-in Loader Token Functions

The GSQL Loader provides several built-in functions which operate on tokens. Some may be used to construct attribute expressions and some may be used for conditional expressions in the WHERE clause.

Token Functions for Attribute Expressions

The following token functions can be used in an id or attribute expression

Timestamp Input Format

The timestamp parameter should be in one of the following formats: "%Y-%m-%d %H:%M:%S" "%Y/%m/%d %H:%M:%S" "%Y-%m-%dT%H:%M:%S.000z" // text after the dot . is ignored

Reducer Functions

A reducer function aggregates multiple values of a non-id attribute into one attribute value of a single vertex or edge. Reducer functions are computed incrementally; that is, each time a new input token is applied, a new resulting value is computed.

To reduce and load aggregate data to an attribute, the attribute expression has the form

REDUCE( reducer_function ( input_expr ) )

where reducer_function is one of the functions in the table below. input_expr can include non-reducer functions, but reducer functions cannot be nested.

Each reducer function is overloaded so that one function can be used for several different data types. For primitive data types, the output type is the same as the input_expr type. For LIST, SET, and MAP containers, the input_expr type is one of the allowed element types for these containers (see "Complex Types" in the Attribute Data Types section). The output is the entire container.

Each function supports a certain set of attribute types. Calling a reducer function with an incompatible type crashes the service. In order to prevent that, use the WHERE clause (introduced below) together with IS NUMERIC or other operators, functions, predicates for type checking if necessary.

WHERE Clause

The WHERE clause is an optional clause. The WHERE clause's condition is a boolean expression. The expression may use column token variables, token functions, and operators which are described below. The expression is evaluated for each input data line. If the condition is true, then the vertex or edge instance is loaded into the graph store. If the condition is false, then this instance is skipped. Note that all attribute values are treated as string values in the expression, so the type conversion functions to_int() and to_float(), which are described below, are provided to enable numerical conditions.

Operators in the WHERE Clause

The GSQL Loader language supports most of the standard arithmetic, relational, and boolean operators found in C++. Standard operator precedence applies, and parentheses provide the usual override of precedence.

Arithmetic Operators: +, -, *, /, ^ Numeric operation can be used to express complex operation between numeric types. Just as in ordinary mathematical expressions, parentheses can be used to define a group and to modify the order of precedence.

Because computers necessarily can only store approximations for most DOUBLE and FLOAT type values, it is not recommended to test these data types for exact equality or inequality. Instead, allow for an acceptable amount of error. The following example checks if $0 = 5, with an error of 0.00001 permitted:

WHERE to_float($0) BETWEEN 5-0.00001 AND 5+0.00001

Relational Operators: <, >, ==, !=, <=, >= Comparisons can be performed between two numeric values or between two string values.
Predicate Operators:
- AND, OR, NOT operators are the same as in SQL. They can be used to combine multiple conditions together. E.g., $0 < "abc" AND $1 > "abc" selects the rows with the first token less than "abc" and the second token greater than "abc". E.g., NOT $1 < "abc" selects the rows with the second token greater than or equal to "abc".
- IS NUMERIC token IS NUMERIC returns true if token is in numeric format. Numeric format include integers, decimal notation, and exponential notation. Specifically, IS NUMERIC is true if token matches the following regular expression: (+/-) ? [0-9] + (.[0-9]) ? [0-9] * ((e/E)(+/-) ? [0-9] +) ? . Any leading space and trailing space is skipped, but no other spaces are allowed. E.g., $0 IS NUMERIC checks whether the first token is in numeric format.
- IS EMPTY token IS EMPTY returns true if token is an empty string. E.g., $1 IS EMPTY checks whether the second token is empty.
- IN token IN ( set_of_values ) returns true if token is equal to one member of a set of specified values. The values may be string or numeric types. E.g., $2 IN ("abc", "def", "lhm") tests whether the third token equals one of the three strings in the given set. E.g., to_int($3) IN (10, 1, 12, 13, 19) tests whether the fourth token equals one of the specified five numbers.
- BETWEEN ... AND token BETWEEN lowerVal AND upperVal returns true if token is within the specified range, inclusive of the endpoints. The values may be string or numeric types. E.g., $4 BETWEEN "abc" AND "def" checks whether the fifth token is greater than or equal to "abc" and also less than or equal to "def" E.g., to_float($5) BETWEEN 1 AND 100.5 checks whether the sixth token is greater than or equal to 1.0 and less than or equal to 100.5.

Token functions in the WHERE clause

The GSQL loading language provides several built-in functions for the WHERE clause.

The token functions in the WHERE clause and those token functions used for attribute expression are different. They cannot be used exchangeably.

User-Defined Token Functions

Users can write their own token functions in C++ and install them in the GSQL system. The system installation already contains a source code file containing sample functions. Users simply add their customized token functions to this file. The file for user-defined token functions for attribute expressions or WHERE clauses is at <tigergraph.root.dir>/dev/gdk/gsql/src/TokenBank/TokenBank.cpp. There are a few examples in this file, and details are presented below .

Testing your functions is simple. In the same directory with the TokenBank.cpp file is a command script called compile.

1. To test that your function compiles:

./compile

2. To test that your function works correctly, write your own test and add it to the main() procedure in the TokenBank.cpp. Then, compile the file and run it. Note that files located in ../TokenLib need to be included:

g++ -I../TokenLib TokenBank.cpp
./a.out

User-defined Token Functions for Attribute Expressions

The parameters are as follows: iToken is the array of string tokens, iTokenLen is the array of the length of the string tokens, and iTokenNum is the number of tokens. Note that the input tokens are always in string (char*) format.

If the attribute type is not string nor string compress, the return type should be the corresponding type: bool for bool; uint64_t for uint; int64_t for int; float for float double for double. If the attribute type is string or string compress, the return type should be void, and use the extra parameters ( char *const oToken, uint32_t& oTokenLen) for storing the return string. oToken is the returned string value, and oTokenLen is the length of this string.

The built-in token function gsql_concat is used as an example below. It takes multiple-token parameter and returns a string.

extern "C" void gsql_concat(const char* const iToken[], uint32_t iTokenLen[], uint32_t iTokenNum, char* const oToken, uint32_t& oTokenLen) {
  int k = 0;
  for (int i=0; i < iTokenNum; i++) {
    for (int j =0; j < iTokenLen[i]; j++) {
           oToken[k++] = iToken[i][j];
    }
  }
  oTokenLen = k;
}

User-defined Token Functions for WHERE Clause

User-defined token functions (described above) can also be used to construct the boolean conditional expression in the WHERE clause. However, there are some restrictions in the WHERE clause:

In the clause "WHERE conditions ",

The only type of user-defined token function allowed are those that return a boolean value.
If a user-defined token function is used in a WHERE Clause, then it must constitute the entire condition; it cannot be combined with another function or operator to produce a subsequent value. However, the arguments of the UDF can include other functions.

The source code for the built-in token function gsql_token_equal is used as an example for how to write a user-defined token function.

extern "C" bool gsql_token_equal(const char* const iToken[], uint32_t iTokenLen[], uint32_t iTokenNum) {
  if (iTokenNum != 2) {
    return false;
  }
  if (iTokenLen[0] != iTokenLen[1]) {
    return false;
  }
  for (int i =0; i < iTokenLen[0]; i++) {
    if (iToken[0][i] != iToken[1][i]) {
      return false;
    }
  }
  return true;
}

Other Optional LOAD Clauses

OPTION clause

There are no supported options for the OPTION clause at this time.

USING clause

A USING clause contains one or more optional parameter value pairs:

USING parameter=value [,parameter=value]*

In the v2.0 loading syntax, the USING clause only appears at the end of a LOAD statement.

In earlier versions, the location of the USING clause and which parameters were valid depending the whether the job was a v1.x online loading job or v1.x offline loading job.

If multiple LOAD statements use the same source (the same file path, the same TEMP_TABLE, or the same file variable), the USING clauses in these LOAD statements must be the same. Therefore, we recommend that if multiple destination clauses share the same source, put all of these destination clauses into the same LOAD statement.

QUOTE parameter

The parser will not treat separator characters found within a pair of quotation marks as a separator. For example, if the parsing conditions are QUOTE="double", SEPARATOR=",", the comma in "Leonard,Euler" will not separate Leonard and Euler into separate tokens.

If QUOTE is not declared, quotation marks are treated as ordinary characters.
If QUOTE is declared, but a string does not contain a matching pair of quotation marks, then the string is treated as if QUOTE is not declared.
Only the string inside the first pair of quote (from left to right) marks are loaded. For example QUOTE="double", the string a"b"c"d"e will be loaded as b.
There is no escape character in the loader, so the only way to include quotation marks within a string is for the string body to use one type of quote (single or double) and to declare the other type as the string boundary marker.

Previously, ill-formatted strings such as a"a,b"ac,d would be parsed as a,b,d ignoring a,a,c. The expected input string should be a,"a,b",ac,d. In v2.4, incorrectly formatted strings such as this example will be parsed normally, giving you this result: a"a,b"ac and d.

Loading JSON Data

When the USING option JSON_FILE="true" is used, the loader loads JSON objects instead of tabular data. A JSON object is an unordered set of key/value pairs, where each value may itself be an array or object, leading to nested structures. A colon separates each key from its value, and a comma separates items in a collection. A more complete description of JSON format is available at www.json.org . The JSON loader requires that each input line has exactly one JSON object. Instead of using column values as tokens, the JSON loader uses JSON values as tokens, that is, the second part of each JSON key/value pair. In a GSQL loading job, a JSON field is identified by a dollar sign $ followed by the colon-separated sequence of nested key names to reach the value from the top level. For example, given the JSON object {"abc":{"def": "this_value"}}, the identifier $"abc":"def" is used to access "this_value". The double quotes are mandatory.

An example is shown below:

CREATE VERTEX encoding (PRIMARY_ID id STRING, length FLOAT default 10)
CREATE UNDIRECTED EDGE encoding_edge (FROM encoding, TO encoding)
CREATE GRAPH encoding_graph (*)

CREATE LOADING JOB json_load FOR GRAPH encoding_graph {
  LOAD "encoding.json" TO VERTEX encoding
    VALUES ($"encoding", $"indent":"length") USING JSON_FILE="true";
}
RUN JOB json_load

To specify an end-of-line character other than the standard one, use the EOL option, as shown below.

CREATE LOADING JOB json_load2 FOR GRAPH companyGraph {

  LOAD "/tmp/data.json"
    TO VERTEX company VALUES($"company":"name":"value",$"company":"name":"value"),
    TO VERTEX members VALUES($"members",$"members") USING JSON_FILE="true", EOL="\03";
}

{"encoding": "UTF-7","plug-ins":["c"],"indent" : { "length" : 30, "use_space": true }}
{"encoding":"UTF-1","indent":{"use_space": "dontloadme"}, "plug-ins" : [null, true, false] }
{"plug-ins":["C","c++"],"indent":{"length" : 3, "use_space": false},"encoding":"UTF-6"}

In the above data encoding.json, the order of fields are not fixed and some fields are missing. The JSON loader ignores the order and accesses the fields by the nested key names. The missing fields are loaded with default values. The result vertices are:

VertexMustExist Parameter

Normally, if vertices do not exist when loading data to edges, a vertex will be created for the connecting edge, using default values for all attributes. Using the VERTEXMUSTEXIST="true" option will load data only if the vertices on both sides of an edge already exist, therefore no longer creating extra vertices.

CREATE LOADING JOB load_edge FOR GRAPH MyGraph {
    DEFINE FILENAME f;
    LOAD f
        TO EDGE MyEdge VALUES ($1, $2, $3,) USING VERTEXMUSTEXIST="true";
}

TEMP_TABLE and Flatten Functions

The keyword TEMP_TABLE triggers the use of a temporary data table which is used to store data generated by one LOAD statement, for use by a later LOAD statement. Earlier we introduced the syntax for loading data to a TEMP_TABLE:

TO TEMP_TABLE table_name (id_name [, attr_name]*) VALUES (id_expr [, attr_expr]*)
    [WHERE conditions] [OPTION (options)]

This clause is designed to be used in conjunction with the flatten or flatten_json_array function in one of the attr_expr expressions. The flatten function splits a multi-value field into a set of records. Those records can first be stored into a temporary table, and then the temporary table can be loaded into vertices and/or edges. Only one flatten function is allowed in one temp table destination clause.

There are two versions of the flatten function: One parses single-level groups and the other parses two-level groups. There are also two versions of the flatten_json_array function: One splits an array of primitive values, and the other splits an array of JSON objects.

One-Level Flatten Function

flatten( column_to_be_split, separator, 1 ) is used to parse a one-level group into individual elements. An example is shown below:

101|"Harry Potter and the Philosopher's Stone"|"fiction,fantasy,young adult"
102|"The Three-Body Problem"|"fiction,science fiction,Chinese"

CREATE LOADING JOB load_books_flatten1 FOR GRAPH Book_rating {
  DEFINE FILENAME f;
  LOAD f
      TO VERTEX Book VALUES ($0, $1, _),
      TO TEMP_TABLE t1(bookcode,genre) VALUES ($0, flatten($2,",",1))
      USING QUOTE="double", SEPARATOR="|";

  LOAD TEMP_TABLE t1
      TO VERTEX Genre VALUES($"genre", $"genre"),
      TO EDGE book_genre VALUES($"bookcode", $"genre");
}
RUN LOADING JOB load_books_flatten1 USING f="../data/book1.dat"

The loading job contains two LOAD statements. The first one loads input data to Book vertices and to a TEMP_TABLE. The second one loads the TEMP_TABLE data to Genre vertices and book_genre edges.

Line 5 says that the third column ($2) of each input line should be split into separate tokens, with comma "," as the separator. Each token will have its own row in table t1. The first column is labeled "bookcode" with value $0 and the second column is "genre" with one of the $2 tokens. The contents of TEMP_TABLE t1 are shown below:

Then, lines 8 to 10 say to read TEMP_TABLE t1 and to do the following for each row:

Create a Genre vertex for each new value of "genre".
Create a book_genre edge from "bookcode" to "genre". In this case, each row of TEMP_TABLE t1 generates one book_genre edge.

The final graph will contain two Book vertices (101 and 102), five Genre vertices, and six book_genre edges.

{
  "results": [{"@@edgeSet": [
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "101",
      "to_id": "fiction",
      "attributes": {},
      "e_type": "book_genre"
    },
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "101",
      "to_id": "fantasy",
      "attributes": {},
      "e_type": "book_genre"
    },
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "102",
      "to_id": "sciencevfiction",
      "attributes": {},
      "e_type": "book_genre"
    },
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "101",
      "to_id": "young adult",
      "attributes": {},
      "e_type": "book_genre"
    },
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "102",
      "to_id": "fiction",
      "attributes": {},
      "e_type": "book_genre"
    },
    {
      "from_type": "Book",
      "to_type": "Genre",
      "directed": false,
      "from_id": "102",
      "to_id": "Chinese",
      "attributes": {},
      "e_type": "book_genre"
    }
  ]}]
}

Two-Level Flatten Function

flatten( column_to_be_split, group_separator, sub_field_separator, number_of_sub_fields_in_one_group ) is used for parse a two-level group into individual elements. Each token in the main group may itself be a group, so there are two separators: one for the top level and one for the second level. An example is shown below.

101|"Harry Potter and the Philosopher's Stone"|"FIC:fiction,FTS:fantasy,YA:young adult"
102|"The Three-Body Problem"|"FIC:fiction,SF:science fiction,CHN:Chinese"

The flatten function now has four parameters instead of three. The additional parameter is used to record the genre_name in the Genre vertices.

CREATE LOADING JOB load_books_flatten2 FOR GRAPH Book_rating {
  DEFINE FILENAME f;
  LOAD f
      TO VERTEX Book VALUES ($0, $1, _),
      TO TEMP_TABLE t2(bookcode,genre_id,genre_name) VALUES ($0, flatten($2,",",":",2))
      USING QUOTE="double", SEPARATOR="|";
  
  LOAD TEMP_TABLE t2
      TO VERTEX Genre VALUES($"genre_id", $"genre_name"),
      TO EDGE book_genre VALUES($"bookcode", $"genre_id");
}
RUN LOADING JOB load_books_flatten2 USING f="book2.dat"

In this example, in the genres column ($2), there are multiple groups, and each group has two sub-fields, genre_id and genre_name. After running the loading job, the file book2.dat will be loaded into the TEMP_TABLE t2 as shown below.

Flatten a JSON Array of Primitive Values

flatten_json_array($" array_name ") parses a JSON array of primitive (string, numberic, or bool) values, where "array_name" is the name of the array. Each value in the array creates a record. Below is an example:

CREATE VERTEX encoding (PRIMARY_ID id STRING, length FLOAT default 10)
CREATE UNDIRECTED EDGE encoding_edge (FROM encoding, TO encoding)
CREATE GRAPH encoding_graph (*)

CREATE LOADING JOB json_flatten FOR GRAPH encoding_graph {
  LOAD "encoding2.json" TO TEMP_TABLE t2 (name, length)
    VALUES (flatten_json_array($"plug-ins"), $"indent":"length") USING JSON_FILE ="true";
  LOAD TEMP_TABLE t2
    TO VERTEX encoding VALUES ($"name", $"length");
}
RUN LOADING JOB json_flatten

{"plug-ins" : ["C", "c++"],"encoding" : "UTF-6","indent" : { "length" : 3, "use_space": false}}

The above data and loading job creates the following temporary table:

Flatten a JSON Array of JSON Objects

flatten_json_array ( $"array_name", $"sub_obj_1", $"sub_obj_2", ..., $"sub_obj_n" ) parses a JSON array of JSON objects. "array_name" is the name of the array, and the following parameters $"sub_obj_1", $"sub_obj_2", ..., $"sub_obj_n" are the field key names in each object in the array. See complete example below:

{"encoding":"UTF-1","indent":{"use_space": "dontloadme"}, "plug-ins" : [null, true, false, {"lang":"golang","prop":{"age":"noidea"}}]}
{"encoding": "UTF-8", "plug-ins" : [{"lang": "pascal", "score":"1.0", "prop":{"age":"old"}}, {"lang":"c++", "score":2.0}],"indent":{"length" :12,"use_space": true}}
{"encoding": "UTF-7",  "plug-ins" : [{"lang":"java", "score":2.22}, {"lang":"python", "score":3.0},{"lang":"go", "score":4.0, "prop":{"age":"new"}}],"indent" : { "length" : 30, "use_space": true }}
{"plug-ins" : ["C", "c++"],"encoding" : "UTF-6","indent" : { "length" : 3, "use_space": false}}

CREATE VERTEX encoding3 (PRIMARY_ID id STRING, score FLOAT default -1.0, age STRING default "Unknown", length INT default -1)
CREATE UNDIRECTED EDGE encoding3_edge (FROM encoding3, TO encoding3)
CREATE GRAPH encoding_graph (*)

CREATE LOADING JOB json_flatten_array FOR GRAPH encoding_graph {
  LOAD "encoding3.json" TO TEMP_TABLE t3 (name, score, prop_age, indent_length )
    VALUES (flatten_json_array($"plug-ins", $"lang", $"score", $"prop":"age"), $"indent":"length")
    USING JSON_FILE="true";
  LOAD TEMP_TABLE t3
    TO VERTEX encoding3 VALUES ($"name", $"score", $"prop_age", $"indent_length");
}
RUN LOADING JOB json_flatten_array

When splitting a JSON array of JSON objects, the primitive values are skipped and only JSON objects are processed. As in the example above, the 4th line's "plug-ins" field will not generate any record because its "plug-ins" array doesn't contain any JSON object. Any field which does not exist in the object will be loaded with default value. The above example generates the temporary table shown below:

Flatten a JSON Array in a CSV file

flatten_json_array() can also be used to split a column of a tabular file, where the column contains JSON arrays. An example is given below:

golang|{"prop":{"age":"noidea"}}
pascal|{"score":"1.0", "prop":{"age":"old"}}
c++|{"score":2.0, "indent":{"length":12, "use_space": true}}
java|{"score":2.22, "prop":{"age":"new"}, "indent":{"use_space":"true", "length":2}}
python|{ "prop":{"compiled":"false"}, "indent":{"length":4}, "score":3.0}
go|{"score":4.0, "prop":{"age":"new"}}

The second column in the csv file is a JSON array which we want to split. flatten_json_array() can be used in this case without the USING JSON_FILE="true" clause:

CREATE VERTEX encoding3 (PRIMARY_ID id STRING, score FLOAT default -1.0, age STRING default "Unknown", length INT default -1)
CREATE UNDIRECTED EDGE encoding3_edge (FROM encoding3, TO encoding3)
CREATE GRAPH encoding_graph (*)

CREATE LOADING JOB json_flatten_csv FOR GRAPH encoding_graph {
  LOAD "encoding.csv" TO TEMP_TABLE t4 (name, score, prop_age, indent_length )
    VALUES ($0,flatten_json_array($1, $"score", $"prop":"age", $"indent":"length"))
    USING SEPARATOR="|";
  LOAD TEMP_TABLE t4
    TO VERTEX encoding3 VALUES ($"name", $"score", $"prop_age", $"indent_length");
}
RUN LOADING JOB json_flatten_csv

The above example generates the temporary table shown below:

flatten_json_array in csv

flatten_json_array() does not work if the separator appears also within the json array column. For example, if the separator is comma, the csv loader will erroneously divide the json array into multiple columns. Therefore, it is recommended that the csv file use a special column separator, such as "|" in the above example .

DELETE statement

In addition to loading data, a LOADING JOB can be used to perform the opposite operation: deleting vertices and edges, using the DELETE statement. DELETE cannot be used in offline loading. Just as a LOAD statement uses the tokens from each input line to set the id and attribute values of a vertex or edge to be created, a DELETE statement uses the tokens from each input line to specify the id value of the item(s) to be deleted.

In the v2.0 syntax, there is now a " FROM (filepath_string | filevar) " clause just before the WHERE clause.

There are four variations of the DELETE statement. The syntax of the four cases is shown below.

CREATE LOADING JOB abc FOR GRAPH graph_name {
  DEFINE FILENAME f;
  # 1. Delete each vertex which has the given vertex type and primary id.
  DELETE VERTEX vertex_type_name (PRIMARY_ID id_expr) FROM f [WHERE condition] ;

  # 2. Delete each edge which has the given edge type, source vertex id, and destination vertex id.
  DELETE EDGE edge_type_name (FROM id_expr, TO id_expr) FROM f [WHERE condition] ;

  # 3. Delete all edges which have the given edge type and source vertex id. (Destination vertex id is left open.)
  DELETE EDGE edge_type_name (FROM id_expr) FROM f [WHERE condition] ;

  # 4. Delete all edges which have the given source vertex id. (Edge type and destination vertex id are left open.)
  DELETE EDGE * (FROM id_expr vertex_type_name) FROM f [WHERE condition] ;
}

An example using book_rating data is shown below:

# Delete all user occupation edges if the user is in the new files, then load the new files
CREATE LOADING JOB clean_user_occupation FOR GRAPH Book_rating {
  DEFINE FILENAME f;
  DELETE EDGE user_occupation (FROM $0) FROM f;
}
CREATE LOADING JOB load_user_occupation FOR GRAPH Book_rating {
  DEFINE FILENAME f;
  LOAD f TO EDGE user_occupation VALUES ($0,$1);
}
RUN LOADING JOB clean_user_occupation USING f="./data/user_occupation_update.dat"
RUN LOADING JOB load_user_occupation USING f="./data/user_occupation_update.dat"

There is a separate DELETE statement in the GSQL Query Language. The query delete statement can leverage the query language's ability to explore the graph and to use complex conditions to determine which items to delete. In contrast, the loading job delete statement requires that the id values of the items to be deleted must be specified in advance in an input file.

offline2online Job Conversion (DEPRECATED)

offline2online <offline_job_name>

The gsql command offline2online converts an installed offline loading job to an equivalent online loading job or set of jobs.

Online Job Names

An offline loading job contains one or more LOAD statements, each one specifying the name of an input data file. The offline2online will convert each LOAD statement into a separate online loading job. The data filename will be appended to the offline job name, to create the new online job name. For example, if the offline job has this format:

CREATE LOADING JOB loadEx FOR GRAPH graphEx {
  LOAD "fileA" TO ...
  LOAD "fileB" TO ...
}

then running the GSQL command offline2online loadEx will create two new online loading jobs, called loadEx_fileA and loadEx_fileB . The converted loading jobs are installed in the GSQL system; they are not available as text files. However, if there are already jobs with these names, then a version number will be appended: first "_1", then "_2", etc.

For example, if you were to execute offline2online loadEx three times, this would generate the following online jobs:

1st time: loadEx_fileA, loadEx_fileB
2nd time: loadEx_fileA_1, loadEx_fileB_1
3rd time: loadEx_fileA_2, loadEx_fileB_2

Conversion and RUN JOB Details

Some parameters of a loading job which are built in to offline loading jobs instead cannot be included in online jobs:

input data filename
SEPARATOR
HEADER

Instead, they should be provided when running the loading job. However, online jobs do not have full support for HEADER.

When running any online loading job, the input data filename and the separator character must be provided. See sections on the USING clause and Running a Loading Job for more details.

If an online loading job is run with the HEADER="true" option, it will skip the first line in the data file, but it will not read that line to get the column names. Therefore, offline jobs which read and use column header names must be manually converted to online jobs.

The following example is taken from the Social Network case in the GSQL Tutorial with Real-Life Examples . In version 0.2 of the tutorial, we used offline loading. The job below uses the same syntax as v0.2, but some names have been updated:

CREATE LOADING JOB load_social FOR GRAPH gsql_demo
{
  LOAD "data/social_users.csv"
    TO VERTEX SocialUser VALUES ($0,$1,$2,$3)
  USING QUOTE="double", SEPARATOR=",", HEADER="true";

  LOAD "data/social_connection.csv"
    TO EDGE SocialConn VALUES ($0, $1)
    USING SEPARATOR=",", HEADER="false";
}

To run, this job:

RUN LOADING JOB load_social

Note that the first LOAD statement has HEADER="true", but is does not make use of column names. It simply uses column indices $0, $1, $2, and $3. Therefore, the HEADER option can still be used with the converted job. Running offline2online load_social1 , creates two new jobs called load_social_social_users.csv and load_social_social_connection.csv.

The equivalent run commands for the jobs are the following:

RUN LOADING JOB load_social_social_users.csv USING FILENAME="data/social_users.csv", SEPARATOR=",", EOL="\n", HEADER="true"
RUN LOADING JOB load_social_social_connection.csv USING FILENAME="data/social_connection.csv", SEPARATOR=",", EOL="\n"

For comparison, here is the online loading job in the current version of the Tutorial and its loading commands:

CREATE LOADING JOB load_social1 FOR GRAPH gsql_demo
{
  LOAD
    TO VERTEX SocialUser VALUES ($0,$1,$2,$3) USING QUOTE="double";
}
CREATE LOADING JOB load_social2 FOR GRAPH gsql_demo {
  LOAD
    TO EDGE SocialConn VALUES ($0, $1);
}
# load the data
RUN JOB load_social1 USING FILENAME="../social/data/social_users.csv", SEPARATOR=",", EOL="\n", HEADER="true"
RUN JOB load_social2 USING FILENAME="../social/data/social_connection.csv", SEPARATOR=",", EOL="\n"

Part 1 - Data Definition & Loading

Defining a Graph Schema

CREATE VERTEX

Keys and Attributes

PRIMARY_ID and WITH primary_id_as_attribute

PRIMARY KEY

COMPOSITE KEY

Vertex Attribute List

WITH STATS

CREATE EDGE

Creating an Edge from or to Any Vertex Type

WITH REVERSE_EDGE

Special Options

Sharing a Compression Dictionary

CREATE GRAPH

USE GRAPH

DROP GRAPH

SHOW - View Parts of the Catalog

Modifying a Graph Schema

Global vs. Local Schema Changes

CREATE SCHEMA_CHANGE JOB (local)

ADD VERTEX | EDGE (local)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

DROP VERTEX | EDGE (local)

DROP TUPLE

RUN SCHEMA_CHANGE JOB

DROP SCHEMA_CHANGE JOB

USE GLOBAL

CREATE GLOBAL SCHEMA_CHANGE JOB

ADD VERTEX | EDGE (global)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

DROP VERTEX | EDGE (global)

RUN GLOBAL SCHEMA_CHANGE JOB

DROP GLOBAL SCHEMA_CHANGE JOB

Running a Loading Job

Clearing and Initializing the Graph Store

CLEAR GRAPH STORE

DROP ALL

Running a Loading Job

RUN LOADING JOB

Options

Running Loading Jobs as REST Requests

Inspecting and Managing Loading Jobs

Job ID and Status

SHOW LOADING STATUS

ABORT LOADING JOB

RESUME LOADING JOB

Verifying and Debugging a Loading Job

Appendix

Keywords & Reserved Words

GSQL Start-to-End Process and Data Flow

Keywords & Reserved Words

GSQL Start-to-End Process and Data Flow

Modifying a Graph Schema

Global vs. Local Schema Changes

CREATE SCHEMA_CHANGE JOB (local)

ADD VERTEX | EDGE (local)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

DROP VERTEX | EDGE (local)

DROP TUPLE

RUN SCHEMA_CHANGE JOB

DROP SCHEMA_CHANGE JOB

USE GLOBAL

CREATE GLOBAL SCHEMA_CHANGE JOB

ADD VERTEX | EDGE (global)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

DROP VERTEX | EDGE (global)

RUN GLOBAL SCHEMA_CHANGE JOB

DROP GLOBAL SCHEMA_CHANGE JOB

Defining a Graph Schema

CREATE VERTEX

Keys and Attributes