Load Data - GSQL 101

After creating a graph schema, the next step is to load data into it. The task here is to instruct the GSQL loader how to associate ("map") the fields in a set of data files to the attributes in your vertex types and edge types of the graph schema we just defined.

You should have the two data files person.csv and friendship.csv on your local disk. It’s not necessary that they are in the same folder with you.

If you need to exit the GSQL shell for any reason, you can do so by typing "quit" without the quotes. Type gsql to enter again.

Define a loading job

The loading job below assumes that your data files are in the folder /home/tigergraph. If they are elsewhere, then in the loading job script below replace /home/tigergraph/person.csv and /home/tigergraph/friendship.csv with their corresponding file paths respectively. Enter the following set of commands in the GSQL shell.

In the default Docker installation described in Get Started with Docker, these files are in the folder /home/tigergraph/tutorial/3.x/gsql101/
GSQL commands to define a loading job
USE GRAPH Social
BEGIN
CREATE LOADING JOB load_social FOR GRAPH Social {
   DEFINE FILENAME file1="/home/tigergraph/tutorial/3.x/gsql101/person.csv";
   DEFINE FILENAME file2="/home/tigergraph/tutorial/3.x/gsql101/friendship.csv";

   LOAD file1 TO VERTEX Person VALUES ($"name", $"name", $"age", $"gender", $"state") USING header="true", separator=",";
   LOAD file2 TO EDGE Friendship VALUES ($0, $1, $2) USING header="true", separator=",";
}
END

Let’s walk through the commands:

  • USE GRAPH Social: Tells GSQL which graph you want to work with.

  • BEGIN …​ END: Indicates multiple-line mode. The GSQL shell will treat everything between these markers as a single statement. These are only needed for interactive mode. If you run GSQL statements that are stored in a command file, the command interpreter will study your whole file, so it doesn’t need the BEGIN and END hints.

  • CREATE LOADING JOB: One loading job can describe the mappings from multiple files to multiple graph objects. Each file must be assigned to a filename variable. The field labels can be either by name or by position. By-name labelling requires a header line in the source file. By-position labelling uses integers to indicate source column position 0, 1,…​ In the example above, the first LOAD statement refers to the source file columns by name, whereas the second LOAD statement refers to the source file columns by position. Note the following details:

    • The column "name" in file1 gets mapped to two fields, both the PRIMARY_ID and the "name" attribute of the Person vertex.

    • In file1, gender comes before age. In the Person vertex, gender comes after age. When loading, state your attributes in the order needed by the target object (in this case, the Person vertex).

    • Each LOAD statement has a USING clause. Here it tells GSQL that both files contain a header (whether we choose to use the names or not, GSQL still needs to know whether to consider the first line as data or not). It also says the column separator is a comma. GSQL can handle any single-character separator, not just commas.

When you run the CREATE LOADING JOB statement, GSQL checks for syntax errors and checks that you have data files in the locations specified. If it detects no errors, it compiles and saves your job.

GSQL shell
GSQL > USE GRAPH Social
Using graph 'Social'
GSQL > BEGIN
GSQL > CREATE LOADING JOB load_social FOR GRAPH Social {
GSQL >    DEFINE FILENAME file1="/home/tigergraph/person.csv";
GSQL >    DEFINE FILENAME file2="/home/tigergraph/friendship.csv";
GSQL >
GSQL >    LOAD file1 TO VERTEX Person VALUES ($"name", $"name", $"age", $"gender", $"state") USING header="true", separator=",";
GSQL >    LOAD file2 TO EDGE Friendship VALUES ($0, $1, $2) USING header="true", separator=",";
GSQL > }
GSQL > END
Successfully created loading jobs: [load_social].

Run a loading job

You can now run your loading job to load data into your graph:

GSQL command
RUN LOADING JOB load_social

The result is shown below.

GSQL shell
GSQL > run loading job load_social
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
  JobName: load_social, jobid: social_m1.1528095850854
  Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/social/social_m1.1528095850854.log'

Job "social_m1.1528095850854" loading status
[FINISHED] m1 ( Finished: 2 / Total: 2 )
  [LOADED]
  +---------------------------------------------------------------------------+
  |                       FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  |/home/tigergraph/friendship.csv |              8 |       8 l/s |     1.00 s|
  |    /home/tigergraph/person.csv |              8 |       7 l/s |     1.00 s|
  +---------------------------------------------------------------------------+

Notice the location of the loading log file. The example assumes that you installed TigerGraph in the default location, /home/tigergraph/. In your installation folder is the main product folder, tigergraph. Within the tigergraph folder are several subfolders, such as logs, document, config, bin, and gstore. If you installed in a different location, say /usr/local/, then you would find the product folder at /usr/local/tigergraph.