Load Data - GSQL 101
After creating a graph schema, the next step is to load data into it. The task here is to instruct the GSQL loader how to associate ("map") the fields in a set of data files to the attributes in your vertex types and edge types of the graph schema we just defined.
You should have the two data files person.csv and friendship.csv on your local disk. It’s not necessary that they are in the same folder with you.
If you need to exit the GSQL shell for any reason, you can do so by typing "quit" without the quotes. Type gsql to enter again.
Define a loading job
The loading job below assumes that your data files are in the folder /home/tigergraph
.
If they are elsewhere, then in the loading job script below replace /home/tigergraph/person.csv
and /home/tigergraph/friendship.csv
with their corresponding file paths respectively.
Enter the following set of commands in the GSQL shell.
In the default Docker installation described in Get Started with Docker, these files are in the folder /home/tigergraph/tutorial/3.x/gsql101/
|
USE GRAPH Social
BEGIN
CREATE LOADING JOB load_social FOR GRAPH Social {
DEFINE FILENAME file1="/home/tigergraph/tutorial/3.x/gsql101/person.csv";
DEFINE FILENAME file2="/home/tigergraph/tutorial/3.x/gsql101/friendship.csv";
LOAD file1 TO VERTEX Person VALUES ($"name", $"name", $"age", $"gender", $"state") USING header="true", separator=",";
LOAD file2 TO EDGE Friendship VALUES ($0, $1, $2) USING header="true", separator=",";
}
END
Let’s walk through the commands:
-
USE GRAPH Social
: Tells GSQL which graph you want to work with. -
BEGIN … END
: Indicates multiple-line mode. The GSQL shell will treat everything between these markers as a single statement. These are only needed for interactive mode. If you run GSQL statements that are stored in a command file, the command interpreter will study your whole file, so it doesn’t need theBEGIN
andEND
hints. -
CREATE LOADING JOB
: One loading job can describe the mappings from multiple files to multiple graph objects. Each file must be assigned to a filename variable. The field labels can be either by name or by position. By-name labelling requires a header line in the source file. By-position labelling uses integers to indicate source column position 0, 1,… In the example above, the firstLOAD
statement refers to the source file columns by name, whereas the secondLOAD
statement refers to the source file columns by position. Note the following details:-
The column "name" in
file1
gets mapped to two fields, both thePRIMARY_ID
and the "name" attribute of thePerson
vertex. -
In
file1
, gender comes before age. In thePerson
vertex, gender comes after age. When loading, state your attributes in the order needed by the target object (in this case, thePerson
vertex). -
Each
LOAD
statement has aUSING
clause. Here it tells GSQL that both files contain a header (whether we choose to use the names or not, GSQL still needs to know whether to consider the first line as data or not). It also says the column separator is a comma. GSQL can handle any single-character separator, not just commas.
-
When you run the CREATE LOADING JOB
statement, GSQL checks for syntax errors and checks that you have data files in the locations specified. If it detects no errors, it compiles and saves your job.
GSQL > USE GRAPH Social
Using graph 'Social'
GSQL > BEGIN
GSQL > CREATE LOADING JOB load_social FOR GRAPH Social {
GSQL > DEFINE FILENAME file1="/home/tigergraph/person.csv";
GSQL > DEFINE FILENAME file2="/home/tigergraph/friendship.csv";
GSQL >
GSQL > LOAD file1 TO VERTEX Person VALUES ($"name", $"name", $"age", $"gender", $"state") USING header="true", separator=",";
GSQL > LOAD file2 TO EDGE Friendship VALUES ($0, $1, $2) USING header="true", separator=",";
GSQL > }
GSQL > END
Successfully created loading jobs: [load_social].
Run a loading job
You can now run your loading job to load data into your graph:
RUN LOADING JOB load_social
The result is shown below.
GSQL > run loading job load_social
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
JobName: load_social, jobid: social_m1.1528095850854
Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/social/social_m1.1528095850854.log'
Job "social_m1.1528095850854" loading status
[FINISHED] m1 ( Finished: 2 / Total: 2 )
[LOADED]
+---------------------------------------------------------------------------+
| FILENAME | LOADED LINES | AVG SPEED | DURATION|
|/home/tigergraph/friendship.csv | 8 | 8 l/s | 1.00 s|
| /home/tigergraph/person.csv | 8 | 7 l/s | 1.00 s|
+---------------------------------------------------------------------------+
Notice the location of the loading log file. The example assumes that you installed TigerGraph in the default location, /home/tigergraph/
. In your installation folder is the main product folder, tigergraph. Within the tigergraph folder are several subfolders, such as logs, document, config, bin, and gstore. If you installed in a different location, say /usr/local/
, then you would find the product folder at /usr/local/tigergraph
.