1 of 10

Part 1 - Data Definition & Loading

This work is licensed under a Creative Commons Attribution 4.0 International License.

Modifying a Graph Schema

After a graph schema has been created , it can be modified. Data already stored in the graph and which is not logically part of the change will be retained. For example, if you had 100 Book vertices and then added an attribute to the Book schema, you would still have 100 Books, with default values for the new attribute. If you dropped a Book attribute, you still would have all your books, but one attribute would be gone.

To safely update the graph schema, the user should follow this procedure:

Create a SCHEMA_CHANGE JOB, which defines a sequence of ADD, ALTER and/or DROP statements.
Run the SCHEMA_CHANGE JOB (i.e. RUN SCHEMA_CHANGE JOB job_name), which will do the following:
- Attempt the schema change.
- If the change is successful, invalidate any loading job or query definitions which are incompatible with the new schema.
- if the change is unsuccessful, report the failure and return to the state before the attempt.

A schema change will invalidate any loading jobs or query jobs which relate to an altered part of the schema. Specifically:

A loading job becomes invalid if it refers to a vertex or and an edge which has been dropped (deleted) or altered .
A query becomes invalid if it refers to a vertex, and edge, or an attribute which has been dropped .

Invalid loading jobs are dropped, and invalid queries are uninstalled. After the schema update, the user will need to create and install new load and query jobs based on the new schema.

Jobs and queries for unaltered parts of the schema will still be available and do not need to be reinstalled. However, even though these jobs are valid (e.g., they can be run), the user may wish to examine whether they still perform the preferred operations (e.g., do you want to run them?)

Load or query operations which begin before the schema change will be completed based on the pre-change schema. Load or query operations which begin after the schema change, and which have not been invalidated, will be completed based on the post-change schema.

Global vs. Local Schema Changes

Only a superuser or globaldesigner can add, alter, or drop global vertex types or global edge types, which are those that are created using CREATE VERTEX or CREATE ... EDGE. This rule applies even if the vertex or edge type is used in only one graph. To make these changes, the user uses a GLOBAL SCHEMA_CHANGE JOB.

An admin or designer user can add, alter, or drop local vertex types or local edge types which are created in the context of that graph. Local vertex and edge types are created using an ADD statement inside a SCHEMA_CHANGE JOB. To alter or drop any of these local types, the admin user uses a regular SCHEMA_CHANGE JOB.

Local graphs can define vertex and edge types independently of the vertex and edge types in other graph. That is, the same name can be used in different graphs for (different) vertex or edge types.

It is even permitted for a local graph and the global graph to use the same name for their own vertex or edge types, as long as the global vertex/edge type is not used within the local graph.

The two types of schema change jobs are described below.

`CREATE SCHEMA_CHANGE JOB` (local)

The CREATE SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements for changing a particular graph. It does not perform the schema change.

CREATE SCHEMA_CHANGE JOB syntax

CREATE SCHEMA_CHANGE JOB job_name FOR GRAPH graph_name {
    [sequence of DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

One use of CREATE SCHEMA_CHANGE JOB is to define an additional vertex type and edge type to be the structure for a secondary index. For example, if you wanted to index the postalCode attribute of the User vertex, you could create a postalCode_idx (PRIMARY_ID id string, code string) vertex type and hasPostalCode (FROM User, TO postalCode_idx) edge type. Then create an index structure having one edge from each User to a postalCode_idx vertex.

By its nature, a SCHEMA_CHANGE JOB may contain multiple statements. If the job block is used in the interactive GSQL shell, then the BEGIN and END commands should be used to permit the SCHEMA_CHANGE JOB to be entered on several lines. if the job is stored in a command file to be read in batch mode, then BEGIN and END are not needed.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

If a SCHEMA_CHANGE JOB defines a new edge type which connects to a new vertex type, the ADD VERTEX statement should precede the related ADD EDGE statement. However, the ADD EDGE and ADD VERTEX statements can be in the same SCHEMA_CHANGE JOB.

`ADD VERTEX | EDGE` (local)

The ADD statement defines a new type of vertex or edge and automatically adds it to a graph schema. The syntax for the ADD VERTEX | EDGE statement is analogous to that of the CREATE VERTEX | EDGE | GRAPH statements. It may only be used within a SCHEMA_CHANGE JOB.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE

ADD VERTEX v_type_name (PRIMARY_ID id type [',' attribute_list]) [WITH STATS '=' "none"|"outdegree_by_edgetype"]; 
ADD UNDIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
ADD DIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
    [WITH REVERSE_EDGE '=' "rev_name"];

`ALTER VERTEX | EDGE`

The ALTER statement is used to add attributes to or remove attributes from an existing vertex type or edge type. It can also be used to add or remove source (FROM) vertex types or destination (TO) vertex types of an edge type. It may only be used within a SCHEMA_CHANGE JOB. The basic format is as follows:

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

`ALTER ... ADD`

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [FROM (vertex_type_name [','vertex_type_name])]
    [TO (vertex_type_name [','vertex_type_name])]
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

`ALTER ... DROP`

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

`ALTER VERTEX ... WITH` (Beta)

The statement ALTER VERTEX WITH TAGGABLE is used to mark a vertex type as taggable or untaggable. Vertex types are untaggable by default. When a vertex type is marked as taggable, the vertex type can be used to create a tag-based graph. Additionally, users with the tag-access privilege can tag vertices whose vertex type is marked as taggable.

ALTER VERTEX WITH TAGGABLE

ALTER VERTEX v_type_name WITH TAGGABLE = ("true" | "false")

`DROP VERTEX | EDGE` (local)

The DROP statement removes the specified vertex type or edge type from the database dictionary. The DROP statement should only be used when graph operations are not in progress.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name]*
DROP EDGE e_type_name [',' e_type_name]*

`DROP TUPLE`

For tuples that are defined within a graph schema, you can drop them by using the following command.

drop tuple

DROP TUPLE tuple_name [',' tuple_name]*

`ADD TAG`

ADD TAG defines a tag for the graph. Tags can be used to create tag-based graphs, allowing for finer grain access control.

Syntax for ADD TAG

ADD TAG <tag_name> [DESCRIPTION <tag_description>]

`DROP TAG`

DROP TAG drops a tag or multiple tags from the schema, and deletes the tag from each vertex to which it is attached. DROP TAG cannot be run if the tag to be dropped is used in the definition of a tag-based graph; the graph must be dropped first.

Syntax for DROP TAG

DROP TAG <tag_name> ["," <tag_name>]*

`RUN SCHEMA_CHANGE JOB`

RUN SCHEMA_CHANGE JOB job_name performs the schema change job. After the schema has been changed, the GSQL system checks all existing GSQL queries. If an existing GSQL query uses a dropped vertex, edge, or attribute, the query becomes invalid, and GSQL will show the message "Query query_name becomes invalid after schema update, please update it.".

Below is an example. The schema change job add_reviews adds a Review vertex type and two edge types to connect reviews to users and books, respectively.

SCHEMA_CHANGE JOB example

USE GRAPH Book_rating
CREATE SCHEMA_CHANGE JOB add_reviews FOR GRAPH Book_rating {
    ADD VERTEX Review (PRIMARY_ID id UINT, review_date DATETIME, url STRING);
    ADD UNDIRECTED EDGE wrote_review (FROM User, TO Review);
    ADD UNDIRECTED EDGE review_of_book (FROM Review, TO Book);
}
RUN SCHEMA_CHANGE JOB add_reviews

`DROP SCHEMA_CHANGE JOB`

To drop (remove) a schema change job, run DROP JOB schema_change_job name from the GSQL shell. The specific schema change job will be removed from GSQL. Refer to the Creating a Loading Job page for more information about dropping jobs.

GSQL > USE GRAPH Book_rating
GSQL > DROP JOB local_schema_change123
The job local_schema_change_change123 is dropped!

`USE GLOBAL`

The USE GLOBAL command changes a superuser's mode to Global mode. In global mode, a superuser can define or modify global vertex and edge types, as well as specifying which graphs use those global types. For example, the user should run USE GLOBAL before creating or running a GLOBAL SCHEMA_CHANGE JOB.

`CREATE GLOBAL SCHEMA_CHANGE JOB`

The CREATE GLOBAL SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements that modify either the attributes or the graph membership of global vertex or edge types. Unlike the non-global schema change job, the header does not include a graph name. However, the ADD/ALTER/DROP statements in the body do mention graphs.

CREATE GLOBAL SCHEMA_CHANGE JOB syntax

CREATE GLOBAL SCHEMA_CHANGE JOB job_name {
    [sequence of global DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Although both global and local schema change jobs have ADD and DROP statements, they have different meanings. The table below outlines the differences.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

`ADD VERTEX | EDGE` (global)

The ADD statement adds existing global vertex or edge types to one of the graphs.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE (Global)

ADD VERTEX v_type_name [',' v_type_name...] TO GRAPH gname;
ADD EDGE e_type_name [',' e_type_name...] TO GRAPH gname;

`ALTER VERTEX | EDGE`

The ALTER statement adds attributes to or remove attributes from an existing vertex type or edge type. It may only be used within a schema change job. The basic format is as follows:

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

`ALTER ... ADD`

Added attributes are appended to the end of the schema. The new attributes may include default fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

`ALTER ... DROP`

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

`ALTER VERTEX ... WITH` (Beta)

The statement ALTER VERTEX WITH TAGGABLE is used to mark a vertex type as taggable or untaggable. Vertex types are untaggable by default. When a vertex type is marked as taggable, the vertex type can be used to create a tag-based graph. Additionally, users with the tag-access privilege can tag vertices whose vertex type is marked as taggable.

ALTER VERTEX WITH TAGGABLE

ALTER VERTEX v_type_name WITH TAGGABLE = ("true" | "false")

`DROP VERTEX | EDGE` (global)

The DROP statement removes specified global vertex or edge types from one of the graphs. The command does not delete any data.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name...] FROM GRAPH gname;
DROP EDGE e_type_name   [',' e_type_name...] FROM GRAPH gname;

`RUN GLOBAL SCHEMA_CHANGE JOB`

RUN GLOBAL SCHEMA_CHANGE JOB job_name performs the global schema change job. After the schema has been changed, the GSQL system checks all existing GSQL queries. If an existing GSQL query uses a dropped vertex, edge, or attribute, the query becomes invalid, and GSQL will show the message "Query query_name becomes invalid after schema update, please update it.".

Below is an example. The schema change alter_friendship_make_library drops the on_date attribute from the friend_of edge and adds Book type to the library graph.

GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
CREATE GRAPH library()
CREATE GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library {
    ALTER EDGE friend_of DROP ATTRIBUTE (on_date);
    ADD VERTEX Book TO GRAPH library;
}
RUN GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library

`DROP GLOBAL SCHEMA_CHANGE JOB`

Global schema change jobs can be dropped by using the DROP JOB command. Refer to the Creating a Loading Job page for more information about dropping jobs.

DROP GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
DROP JOB alter_friendship_make_library

`DROP ALL`

The DROP ALL command clears all graph data, all graph schemas, all loading jobs, and all queries. It should only be used when the intent is to erase an entire database design and to start over.

This command is only available to superusers and only when they are in global mode.

Running a Loading Job

Clear and initialize graph store

There are two aspects to clearing the system: flushing the data and clearing the schema definitions in the catalog. Two different commands are available.

CLEAR GRAPH STORE

Available only to superusers.

The CLEAR GRAPH STORE command flushes all the data out of the graph store (database). By default, the system will ask the user to confirm that you really want to discard all the graph data. To force the clear operation and bypass the confirmation question, use the -HARD option, e.g.,

CLEAR GRAPH STORE -HARD

Clearing the graph store does not affect the schema.

Use the -HARD option with extreme caution. There is no undo option. -HARD must be in all capital letters.
CLEAR GRAPH STORE stops all the TigerGraph servers (GPE, GSE, RESTPP, Kafka, and Zookeeper).
Loading jobs and queries are aborted.

DROP ALL clears both the data and the schema.

Run a loading job

Running a loading job executes a previously installed loading job. The job reads lines from an input source, parses each line into data tokens, and applies loading rules and conditions to create new vertex and edge instances to store in the graph data store. The input sources could be defined in the load job or could be provided when running the job. Additionally, loading jobs can also be run by submitted an HTTP request to the REST++ server.

RUN LOADING JOB

RUN LOADING JOB [-noprint] [-dryrun] [-n [i],j] jobname [
    USING filevar [="filepath_string"][, filevar [="filepath_string"]]*
    [, CONCURRENCY="cnum"][,BATCH_SIZE="bnum"]
]

When a concurrent loading job is submitted, it is assigned a job ID number, which is displayed on the GSQL console. The user can use this job ID to refer to the job, for a status update, to abort the job, or to re-start the job. These operations are described later in this section.

Options

`-noprint`

By default, the command will print several lines of status information while the loading is running. If the -noprint option is included, the output will omit the progress and summary details, but it will still display the job id and the location of the log file.

Example of minimal output when -noprint option is used

Kick off the following job:
  JobName: load_videoE, jobid: gsql_demo_m1.1525091090494
  Loading log: '/usr/local/tigergraph/logs/restpp/restpp_loader_logs/gsql_demo/gsql_demo_m1.1525091090494.log'

`-dryrun`

If -dryrun is used, the system will read the data files and process the data as instructed by the job, but will NOT load any data into the graph. This option can be a useful diagnostic tool.

`-n [i], j`

The -n option limits the loading job to processing only a range of lines of each input data file. The -n flag accepts one or two arguments. For example, -n 50 means read lines 1 to 50. -n 10, 50 means read lines 10 to 50. The special symbol $ is interpreted as "last line", so -n 10,$ means reads from line 10 to the end.

Parameters

Below are the parameters available for the RUN QUERY command introduced by the USING clause.

`filevar` list

The optional USING clause may contain a list of file variables. Each file variable may optionally be assigned a filepath_string, obeying the same format as in the CREATE LOADING JOB. This list of file variables determines which parts of a loading job are run and what data files are used.

When a loading job is compiled, it generates one RESTPP endpoint for each filevar and filepath_string. As a consequence, a loading job can be run in parts. When RUN LOADING JOB is executed, only those endpoints whose filevar or file identifier (" __GSQL_FILENAME_n__") is mentioned in the USING clause will be used. However, if the USING clause is omitted, then the entire loading job will be run.
If a filepath_string is given, it overrides the filepath_string defined in the loading job. If a particular filevar is not assigned a filepath_string either in the loading job or in the RUN LOADING JOB statement, then an error is reported and the job exits.

`CONCURRENCY`

The CONCURRENCY parameter sets the maximum number of outstanding requests that the loading job may send to the graph processing engine (GPE). The default value is 256.

For example if CONCURRENCY is set to 256, when the loader sends 256 requests to the GPE for processing and the GPE finishes processing the first one before all 256 arrive, the loader will send a new batch for processing. If all 256 of those requests are received and none are finished processing, then the Kafka loader will stop sending additional batches until one of them is processed.

`BATCH_SIZE`

The BATCH_SIZE parameter sets the number of data lines included in each request sent to the GPE. The default value is 8192.

Running Loading Jobs as REST Requests

Another way to run a loading job is through the POST /ddl/{graph_name} endpoint of the REST++ server. Since the REST++ server has more direct access to the graph processing engine, this can execute more quickly than a RUN LOADING JOB statement in GSQL. For details on how to use the endpoint, please see Run a loading job.

Inspect and manage loading jobs

Starting with v2.0, there are now commands to check loading job status, abort a loading job and, restart a loading job.

Job ID and Status

When a loading job starts, the GSQL server assigns it a job id and displays it for the user to see. The job id format is typically the name of the graph, followed by the machine alias, following by a code number, e.g., gsql_demo_m1.1525091090494

Example of SHOW LOADING STATUS output

Kick off the following job, i.e.
  JobName: load_test1, jobid: demo_graph_m1.1523663024967
  Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/demo_graph/demo_graph_m1.1523663024967.log'

Job "demo_graph_m1.1523663024967" loading status
 
[RUNNING] m1 ( Finished: 3 / Total: 4 )
  [LOADING] /data/output/company.data
  [=============                        ]  20%, 200 kl/s
  [LOADED]
  +-------------------------------------------------------------------+
  |               FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  | /data/output/movie.dat |            100 |     100 l/s |     1.00 s|
  |/data/output/person.dat |            100 |     100 l/s |     1.00 s|
  | /data/output/roles.dat |            200 |     200 l/s |     1.00 s|
  +-------------------------------------------------------------------+
[RUNNING] m2 ( Finished: 1 / Total: 2 )
  [LOADING] /data/output/company.data
  [==========================           ]  60%, 200 kl/s
  [LOADED]
  +-------------------------------------------------------------------+
  |               FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  | /data/output/movie.dat |            100 |     100 l/s |     1.00 s|
  +-------------------------------------------------------------------+

By default, an active loading job will display periodic updates of its progress. There are two ways to inhibit these automatic output displays:

Run the loading job with the -noprint option.
After the loading job has started, enter CTRL+C. This will abort the output display process, but the loading job will continue.

SHOW LOADING STATUS

The command SHOW LOADING JOB shows the current status of either a specified loading job or all current jobs:

SHOW LOADING JOB syntax

SHOW LOADING STATUS job_id|ALL

The display format is the same as that displayed during the periodic progress updates of the RUN LOADING JOB command. If you do not know the job id, but you know the job name and possibly the machine, then the ALL option is a handy way to see a list of active job ids.

ABORT LOADING JOB

The command ABORT LOADING JOB aborts either a specified load job or all active loading jobs:

ABORT LOADING JOB syntax

ABORT LOADING JOB job_id|ALL

The output will show a summary of aborted loading jobs.

ABORT LOADING JOB example

gsql -g demo_graph "abort loading job all"

Job "demo_graph_m1.1519111662589" loading status
[ABORT_SUCCESS] m1
[SUMMARY] Finished: 0 / Total: 2
  +--------------------------------------------------------------------------------------+
  |                  FILENAME |   LOADED LINES |   AVG SPEED  |   DURATION |   PERCENTAGE|
  | /home/tigergraph/data.csv |       23901701 |     174 kl/s |   136.83 s |         65 %|
  |/home/tigergraph/data1.csv |              0 |        0 l/s |     0.00 s |          0 %|
  +--------------------------------------------------------------------------------------+
 
Job "demo_graph_m2.1519111662615" loading status
[ABORT_SUCCESS] m2
[SUMMARY] Finished: 0 / Total: 2
  +--------------------------------------------------------------------------------------+
  |                  FILENAME |   LOADED LINES |   AVG  SPEED |   DURATION |   PERCENTAGE|
  | /home/tigergraph/data.csv |       23860559 |     175 kl/s |   136.23 s |         65 %|
  |/home/tigergraph/data1.csv |              0 |        0 l/s |     0.00 s |          0 %|
  +--------------------------------------------------------------------------------------+

RESUME LOADING JOB

The command RESUME LOADING JOB will restart a previously-run job which ended for some reason before completion.

RESUME LOADING JOB syntax

RESUME LOADING JOB job_id

If the job is finished, this command will do nothing. The RESUME command should pick up where the previous run ended; that is, it should not load the same data twice.

RESUME LOADING JOB example

gsql -g demo_graph "RESUME LOADING JOB demo_graph_m1.1519111662589"
[RESUME_SUCCESS] m1
[MESSAGE] The current job got resummed

Verify and debug a loading job

Every loading job creates a log file. When the job starts, it will display the location of the log file. Typically, the file is located at

<TigerGraph.root.dir>/logs/restpp/restpp_loader_logs/<graph_name>/<job_id>.log

This file contains the following information which most users will find useful:

A list of all the parameter and option settings for the loading job
A copy of the status information that is printed
Statistics report on the number of lines successfully read and parsed

The statistics report include how many objects of each type is created, and how many lines are invalid due to different reasons. This report also shows which lines cause the errors. Here is the list of statistics shown in the report. There are two types of statistics. One is file level (the number of lines), and the other is data object level (the number of objects). If an file level error occurs, e.g., a line does not have enough columns, this line of data is skipped for all LOAD statements in this loading job. If an object level error or failed condition occurs, only the corresponding object is not created, i.e., all other objects in the same loading job are still created if no object level error or failed condition for each corresponding object.

Note that failing a WHERE clause is not necessarily a bad result. If the user's intent for the WHERE clause is to select only certain lines, then it is natural for some lines to pass and some lines to fail.

Below is an example.

CREATE VERTEX movie (PRIMARY_ID id UINT, title STRING, country STRING COMPRESS, year UINT)
CREATE DIRECTED EDGE sequel_of (FROM movie, TO movie)
CREATE GRAPH movie_graph(*)
CREATE LOADING JOB load_movie FOR GRAPH movie_graph{
  DEFINE FILENAME f
  LOAD f TO VERTEX movie VALUES ($0, $1, $2, $3) WHERE to_int($3) < 2000;
}
RUN LOADING JOB load_movie USING f="movie.dat"

movie.dat

0,abc,USA,-1990
1,abc,CHN,1990
2,abc,CHN,1990
3,abc,FRA,2015
4,abc,FRA,2005
5,abc,USA,1990
6,abc,1990

The above loading job and data generate the following report

load_output.log (tail)

--------------------Statistics------------------------------
Valid lines:             6
Reject lines:            0
Invalid Json format:     0
Not enough token:        1 [ERROR] (e.g. 7)
Oversize token:          0

Vertex:                  movie
Valid Object:            3
No ID found:             0
Invalid Attributes:      1 [ERROR] (e.g. 1:year)
Invalid primary id:      0
Incorrect fixed
binary length:           0
Passed condition lines:  4
Failed condition lines:  2 (e.g. 4,5)

There are a total of 7 data lines. The report shows that

Six of the lines are valid data lines
One line (Line 7) does not have enough tokens.

Of the 6 valid lines,

Three of the 6 valid lines generate valid movie vertices.
One line has an invalid attribute (Line 1: year)
Two lines (Lines 4 and 5) do not pass the WHERE clause.

Appendix

GSQL Start-to-End Process and Data Flow

The figures below illustrates the sequence of steps and the dependencies to progress from no graph to a loaded graph and a query result, for TigerGraph platform version 0.8 and higher. Note that online and offline follow the same flow.

Modifying a Graph Schema

To safely update the graph schema, the user should follow this procedure:

Create a SCHEMA_CHANGE JOB, which defines a sequence of ADD, ALTER and/or DROP statements.
Run the SCHEMA_CHANGE JOB (i.e. RUN SCHEMA_CHANGE JOB job_name), which will do the following:
- Attempt the schema change.
- If the change is successful, invalidate any loading job or query definitions which are incompatible with the new schema.
- if the change is unsuccessful, report the failure and return to the state before the attempt.

A schema change will invalidate any loading jobs or query jobs which relate to an altered part of the schema. Specifically:

A loading job becomes invalid if it refers to a vertex or and an edge which has been dropped (deleted) or altered .
A query becomes invalid if it refers to a vertex, and edge, or an attribute which has been dropped .

Invalid loading jobs are dropped, and invalid queries are uninstalled. After the schema update, the user will need to create and install new load and query jobs based on the new schema.

Global vs. Local Schema Changes

Local graphs can define vertex and edge types independently of the vertex and edge types in other graph. That is, the same name can be used in different graphs for (different) vertex or edge types.

It is even permitted for a local graph and the global graph to use the same name for their own vertex or edge types, as long as the global vertex/edge type is not used within the local graph.

The two types of schema change jobs are described below.

`CREATE SCHEMA_CHANGE JOB` (local)

The CREATE SCHEMA_CHANGE JOB block defines a sequence of ADD, ALTER, and DROP statements for changing a particular graph. It does not perform the schema change.

CREATE SCHEMA_CHANGE JOB syntax

CREATE SCHEMA_CHANGE JOB job_name FOR GRAPH graph_name {
    [sequence of DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

`ADD VERTEX | EDGE` (local)

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE

ADD VERTEX v_type_name (PRIMARY_ID id type [',' attribute_list]) [WITH STATS '=' "none"|"outdegree_by_edgetype"]; 
ADD UNDIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
ADD DIRECTED EDGE e_type_name (FROM v_type_name',' TO v_type_name [',' edge_attribute_list])
    [WITH REVERSE_EDGE '=' "rev_name"];

`ALTER VERTEX | EDGE`

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

`ALTER ... ADD`

Added attributes are appended to the end of the schema. The new attributes may include DEFAULT fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [FROM (vertex_type_name [','vertex_type_name])]
    [TO (vertex_type_name [','vertex_type_name])]
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

For example:

ALTER EDGE Like ADD TO (Animal) ATTRIBUTE (suggested_by STRING)

`ALTER ... DROP`

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

`ALTER VERTEX ... WITH` (Beta)

ALTER VERTEX WITH TAGGABLE

ALTER VERTEX v_type_name WITH TAGGABLE = ("true" | "false")

`DROP VERTEX | EDGE` (local)

The DROP statement removes the specified vertex type or edge type from the database dictionary. The DROP statement should only be used when graph operations are not in progress.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name]*
DROP EDGE e_type_name [',' e_type_name]*

`DROP TUPLE`

For tuples that are defined within a graph schema, you can drop them by using the following command.

drop tuple

DROP TUPLE tuple_name [',' tuple_name]*

`ADD TAG`

ADD TAG defines a tag for the graph. Tags can be used to create tag-based graphs, allowing for finer grain access control.

Syntax for ADD TAG

ADD TAG <tag_name> [DESCRIPTION <tag_description>]

`DROP TAG`

Syntax for DROP TAG

DROP TAG <tag_name> ["," <tag_name>]*

`RUN SCHEMA_CHANGE JOB`

Below is an example. The schema change job add_reviews adds a Review vertex type and two edge types to connect reviews to users and books, respectively.

SCHEMA_CHANGE JOB example

USE GRAPH Book_rating
CREATE SCHEMA_CHANGE JOB add_reviews FOR GRAPH Book_rating {
    ADD VERTEX Review (PRIMARY_ID id UINT, review_date DATETIME, url STRING);
    ADD UNDIRECTED EDGE wrote_review (FROM User, TO Review);
    ADD UNDIRECTED EDGE review_of_book (FROM Review, TO Book);
}
RUN SCHEMA_CHANGE JOB add_reviews

`DROP SCHEMA_CHANGE JOB`

GSQL > USE GRAPH Book_rating
GSQL > DROP JOB local_schema_change123
The job local_schema_change_change123 is dropped!

`USE GLOBAL`

`CREATE GLOBAL SCHEMA_CHANGE JOB`

CREATE GLOBAL SCHEMA_CHANGE JOB syntax

CREATE GLOBAL SCHEMA_CHANGE JOB job_name {
    [sequence of global DROP, ALTER, and ADD statements, each line ending with a semicolon]
}

Although both global and local schema change jobs have ADD and DROP statements, they have different meanings. The table below outlines the differences.

Remember to include a semicolon at the end of each DROP, ALTER, or ADD statement within the JOB block.

`ADD VERTEX | EDGE` (global)

The ADD statement adds existing global vertex or edge types to one of the graphs.

ADD VERTEX / UNDIRECTED EDGE / DIRECTED EDGE (Global)

ADD VERTEX v_type_name [',' v_type_name...] TO GRAPH gname;
ADD EDGE e_type_name [',' e_type_name...] TO GRAPH gname;

`ALTER VERTEX | EDGE`

The ALTER statement adds attributes to or remove attributes from an existing vertex type or edge type. It may only be used within a schema change job. The basic format is as follows:

ALTER VERTEX / EDGE

ALTER VERTEX|EDGE object_type_name ADD|DROP (attribute_list);

`ALTER ... ADD`

Added attributes are appended to the end of the schema. The new attributes may include default fields. To add attributes to a vertex type, the syntax is as follows:

ALTER VERTEX vertex_type_name ADD
    ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* );

For example:

ALTER VERTEX Company ADD ATTRIBUTE (industry
STRING, marketcap DOUBLE)

To add to an edge's endpoint vertex types or attributes, the syntax is as follows:

ALTER EDGE edge_type_name ADD 
    [ATTRIBUTE (attribute_name type [DEFAULT default_value]
    [',' attribute_name type [DEFAULT default_value]]* )];

`ALTER ... DROP`

The syntax for ALTER ... DROP is analogous to that of ALTER ... ADD.

ALTER ... DROP

ALTER VERTEX|EDGE object_type_name DROP ATTRIBUTE (
      attribute_name [',' attribute_name]* );

`ALTER VERTEX ... WITH` (Beta)

ALTER VERTEX WITH TAGGABLE

ALTER VERTEX v_type_name WITH TAGGABLE = ("true" | "false")

`DROP VERTEX | EDGE` (global)

The DROP statement removes specified global vertex or edge types from one of the graphs. The command does not delete any data.

drop vertex / edge

DROP VERTEX v_type_name [',' v_type_name...] FROM GRAPH gname;
DROP EDGE e_type_name   [',' e_type_name...] FROM GRAPH gname;

`RUN GLOBAL SCHEMA_CHANGE JOB`

Below is an example. The schema change alter_friendship_make_library drops the on_date attribute from the friend_of edge and adds Book type to the library graph.

GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
CREATE GRAPH library()
CREATE GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library {
    ALTER EDGE friend_of DROP ATTRIBUTE (on_date);
    ADD VERTEX Book TO GRAPH library;
}
RUN GLOBAL SCHEMA_CHANGE JOB alter_friendship_make_library

`DROP GLOBAL SCHEMA_CHANGE JOB`

Global schema change jobs can be dropped by using the DROP JOB command. Refer to the Creating a Loading Job page for more information about dropping jobs.

DROP GLOBAL SCHEMA_CHANGE JOB example

USE GLOBAL
DROP JOB alter_friendship_make_library

`DROP ALL`

The DROP ALL command clears all graph data, all graph schemas, all loading jobs, and all queries. It should only be used when the intent is to erase an entire database design and to start over.

This command is only available to superusers and only when they are in global mode.

Running a Loading Job

Clear and initialize graph store

There are two aspects to clearing the system: flushing the data and clearing the schema definitions in the catalog. Two different commands are available.

CLEAR GRAPH STORE

Available only to superusers.

CLEAR GRAPH STORE -HARD

Clearing the graph store does not affect the schema.

Use the -HARD option with extreme caution. There is no undo option. -HARD must be in all capital letters.
CLEAR GRAPH STORE stops all the TigerGraph servers (GPE, GSE, RESTPP, Kafka, and Zookeeper).
Loading jobs and queries are aborted.

DROP ALL clears both the data and the schema.

Run a loading job

RUN LOADING JOB

RUN LOADING JOB [-noprint] [-dryrun] [-n [i],j] jobname [
    USING filevar [="filepath_string"][, filevar [="filepath_string"]]*
    [, CONCURRENCY="cnum"][,BATCH_SIZE="bnum"]
]

Options

`-noprint`

Example of minimal output when -noprint option is used

Kick off the following job:
  JobName: load_videoE, jobid: gsql_demo_m1.1525091090494
  Loading log: '/usr/local/tigergraph/logs/restpp/restpp_loader_logs/gsql_demo/gsql_demo_m1.1525091090494.log'

`-dryrun`

If -dryrun is used, the system will read the data files and process the data as instructed by the job, but will NOT load any data into the graph. This option can be a useful diagnostic tool.

`-n [i], j`

Parameters

Below are the parameters available for the RUN QUERY command introduced by the USING clause.

`filevar` list

When a loading job is compiled, it generates one RESTPP endpoint for each filevar and filepath_string. As a consequence, a loading job can be run in parts. When RUN LOADING JOB is executed, only those endpoints whose filevar or file identifier (" __GSQL_FILENAME_n__") is mentioned in the USING clause will be used. However, if the USING clause is omitted, then the entire loading job will be run.
If a filepath_string is given, it overrides the filepath_string defined in the loading job. If a particular filevar is not assigned a filepath_string either in the loading job or in the RUN LOADING JOB statement, then an error is reported and the job exits.

`CONCURRENCY`

The CONCURRENCY parameter sets the maximum number of outstanding requests that the loading job may send to the graph processing engine (GPE). The default value is 256.

`BATCH_SIZE`

The BATCH_SIZE parameter sets the number of data lines included in each request sent to the GPE. The default value is 8192.

Running Loading Jobs as REST Requests

Inspect and manage loading jobs

Starting with v2.0, there are now commands to check loading job status, abort a loading job and, restart a loading job.

Job ID and Status

Example of SHOW LOADING STATUS output

Kick off the following job, i.e.
  JobName: load_test1, jobid: demo_graph_m1.1523663024967
  Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/demo_graph/demo_graph_m1.1523663024967.log'

Job "demo_graph_m1.1523663024967" loading status
 
[RUNNING] m1 ( Finished: 3 / Total: 4 )
  [LOADING] /data/output/company.data
  [=============                        ]  20%, 200 kl/s
  [LOADED]
  +-------------------------------------------------------------------+
  |               FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  | /data/output/movie.dat |            100 |     100 l/s |     1.00 s|
  |/data/output/person.dat |            100 |     100 l/s |     1.00 s|
  | /data/output/roles.dat |            200 |     200 l/s |     1.00 s|
  +-------------------------------------------------------------------+
[RUNNING] m2 ( Finished: 1 / Total: 2 )
  [LOADING] /data/output/company.data
  [==========================           ]  60%, 200 kl/s
  [LOADED]
  +-------------------------------------------------------------------+
  |               FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  | /data/output/movie.dat |            100 |     100 l/s |     1.00 s|
  +-------------------------------------------------------------------+

By default, an active loading job will display periodic updates of its progress. There are two ways to inhibit these automatic output displays:

Run the loading job with the -noprint option.
After the loading job has started, enter CTRL+C. This will abort the output display process, but the loading job will continue.

SHOW LOADING STATUS

The command SHOW LOADING JOB shows the current status of either a specified loading job or all current jobs:

SHOW LOADING JOB syntax

SHOW LOADING STATUS job_id|ALL

ABORT LOADING JOB

The command ABORT LOADING JOB aborts either a specified load job or all active loading jobs:

ABORT LOADING JOB syntax

ABORT LOADING JOB job_id|ALL

The output will show a summary of aborted loading jobs.

ABORT LOADING JOB example

gsql -g demo_graph "abort loading job all"

Job "demo_graph_m1.1519111662589" loading status
[ABORT_SUCCESS] m1
[SUMMARY] Finished: 0 / Total: 2
  +--------------------------------------------------------------------------------------+
  |                  FILENAME |   LOADED LINES |   AVG SPEED  |   DURATION |   PERCENTAGE|
  | /home/tigergraph/data.csv |       23901701 |     174 kl/s |   136.83 s |         65 %|
  |/home/tigergraph/data1.csv |              0 |        0 l/s |     0.00 s |          0 %|
  +--------------------------------------------------------------------------------------+
 
Job "demo_graph_m2.1519111662615" loading status
[ABORT_SUCCESS] m2
[SUMMARY] Finished: 0 / Total: 2
  +--------------------------------------------------------------------------------------+
  |                  FILENAME |   LOADED LINES |   AVG  SPEED |   DURATION |   PERCENTAGE|
  | /home/tigergraph/data.csv |       23860559 |     175 kl/s |   136.23 s |         65 %|
  |/home/tigergraph/data1.csv |              0 |        0 l/s |     0.00 s |          0 %|
  +--------------------------------------------------------------------------------------+

RESUME LOADING JOB

The command RESUME LOADING JOB will restart a previously-run job which ended for some reason before completion.

RESUME LOADING JOB syntax

RESUME LOADING JOB job_id

If the job is finished, this command will do nothing. The RESUME command should pick up where the previous run ended; that is, it should not load the same data twice.

RESUME LOADING JOB example

gsql -g demo_graph "RESUME LOADING JOB demo_graph_m1.1519111662589"
[RESUME_SUCCESS] m1
[MESSAGE] The current job got resummed

Verify and debug a loading job

Every loading job creates a log file. When the job starts, it will display the location of the log file. Typically, the file is located at

<TigerGraph.root.dir>/logs/restpp/restpp_loader_logs/<graph_name>/<job_id>.log

This file contains the following information which most users will find useful:

A list of all the parameter and option settings for the loading job
A copy of the status information that is printed
Statistics report on the number of lines successfully read and parsed

Below is an example.

CREATE VERTEX movie (PRIMARY_ID id UINT, title STRING, country STRING COMPRESS, year UINT)
CREATE DIRECTED EDGE sequel_of (FROM movie, TO movie)
CREATE GRAPH movie_graph(*)
CREATE LOADING JOB load_movie FOR GRAPH movie_graph{
  DEFINE FILENAME f
  LOAD f TO VERTEX movie VALUES ($0, $1, $2, $3) WHERE to_int($3) < 2000;
}
RUN LOADING JOB load_movie USING f="movie.dat"

movie.dat

0,abc,USA,-1990
1,abc,CHN,1990
2,abc,CHN,1990
3,abc,FRA,2015
4,abc,FRA,2005
5,abc,USA,1990
6,abc,1990

The above loading job and data generate the following report

load_output.log (tail)

--------------------Statistics------------------------------
Valid lines:             6
Reject lines:            0
Invalid Json format:     0
Not enough token:        1 [ERROR] (e.g. 7)
Oversize token:          0

Vertex:                  movie
Valid Object:            3
No ID found:             0
Invalid Attributes:      1 [ERROR] (e.g. 1:year)
Invalid primary id:      0
Incorrect fixed
binary length:           0
Passed condition lines:  4
Failed condition lines:  2 (e.g. 4,5)

There are a total of 7 data lines. The report shows that

Six of the lines are valid data lines
One line (Line 7) does not have enough tokens.

Of the 6 valid lines,

Three of the 6 valid lines generate valid movie vertices.
One line has an invalid attribute (Line 1: year)
Two lines (Lines 4 and 5) do not pass the WHERE clause.

Part 1 - Data Definition & Loading

Modifying a Graph Schema

Global vs. Local Schema Changes

CREATE SCHEMA_CHANGE JOB (local)

ADD VERTEX | EDGE (local)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

ALTER VERTEX ... WITH (Beta)

DROP VERTEX | EDGE (local)

DROP TUPLE

ADD TAG

DROP TAG

RUN SCHEMA_CHANGE JOB

DROP SCHEMA_CHANGE JOB

USE GLOBAL

CREATE GLOBAL SCHEMA_CHANGE JOB

ADD VERTEX | EDGE (global)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

ALTER VERTEX ... WITH (Beta)

DROP VERTEX | EDGE (global)

RUN GLOBAL SCHEMA_CHANGE JOB

DROP GLOBAL SCHEMA_CHANGE JOB

DROP ALL

Running a Loading Job

Clear and initialize graph store

CLEAR GRAPH STORE

Run a loading job

RUN LOADING JOB

Options

-noprint

-dryrun

-n [i], j

Parameters

filevar list

CONCURRENCY

BATCH_SIZE

Running Loading Jobs as REST Requests

Inspect and manage loading jobs

Job ID and Status

SHOW LOADING STATUS

ABORT LOADING JOB

RESUME LOADING JOB

Verify and debug a loading job

Appendix

GSQL Start-to-End Process and Data Flow

Modifying a Graph Schema

Global vs. Local Schema Changes

CREATE SCHEMA_CHANGE JOB (local)

ADD VERTEX | EDGE (local)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

ALTER VERTEX ... WITH (Beta)

DROP VERTEX | EDGE (local)

DROP TUPLE

ADD TAG

DROP TAG

RUN SCHEMA_CHANGE JOB

DROP SCHEMA_CHANGE JOB

USE GLOBAL

CREATE GLOBAL SCHEMA_CHANGE JOB

ADD VERTEX | EDGE (global)

ALTER VERTEX | EDGE

ALTER ... ADD

ALTER ... DROP

ALTER VERTEX ... WITH (Beta)

DROP VERTEX | EDGE (global)

RUN GLOBAL SCHEMA_CHANGE JOB

DROP GLOBAL SCHEMA_CHANGE JOB

DROP ALL

GSQL Start-to-End Process and Data Flow

Part 1 - Data Definition & Loading

Running a Loading Job

Clear and initialize graph store

CLEAR GRAPH STORE

Run a loading job

RUN LOADING JOB

`CREATE SCHEMA_CHANGE JOB` (local)

`ADD VERTEX | EDGE` (local)

`ALTER VERTEX | EDGE`

`ALTER ... ADD`

`ALTER ... DROP`

`ALTER VERTEX ... WITH` (Beta)

`DROP VERTEX | EDGE` (local)

`DROP TUPLE`

`ADD TAG`

`DROP TAG`

`RUN SCHEMA_CHANGE JOB`

`DROP SCHEMA_CHANGE JOB`

`USE GLOBAL`

`CREATE GLOBAL SCHEMA_CHANGE JOB`

`ADD VERTEX | EDGE` (global)

`ALTER VERTEX | EDGE`

`ALTER ... ADD`

`ALTER ... DROP`

`ALTER VERTEX ... WITH` (Beta)

`DROP VERTEX | EDGE` (global)

`RUN GLOBAL SCHEMA_CHANGE JOB`

`DROP GLOBAL SCHEMA_CHANGE JOB`

`DROP ALL`

`-noprint`

`-dryrun`

`-n [i], j`

`filevar` list

`CONCURRENCY`

`BATCH_SIZE`

`CREATE SCHEMA_CHANGE JOB` (local)

`ADD VERTEX | EDGE` (local)

`ALTER VERTEX | EDGE`

`ALTER ... ADD`

`ALTER ... DROP`

`ALTER VERTEX ... WITH` (Beta)

`DROP VERTEX | EDGE` (local)

`DROP TUPLE`

`ADD TAG`

`DROP TAG`

`RUN SCHEMA_CHANGE JOB`

`DROP SCHEMA_CHANGE JOB`

`USE GLOBAL`

`CREATE GLOBAL SCHEMA_CHANGE JOB`

`ADD VERTEX | EDGE` (global)

`ALTER VERTEX | EDGE`

`ALTER ... ADD`

`ALTER ... DROP`

`ALTER VERTEX ... WITH` (Beta)

`DROP VERTEX | EDGE` (global)

`RUN GLOBAL SCHEMA_CHANGE JOB`

`DROP GLOBAL SCHEMA_CHANGE JOB`

`DROP ALL`

`-noprint`

`-dryrun`

`-n [i], j`

`filevar` list

`CONCURRENCY`

`BATCH_SIZE`