Load Data

Define the Loading Job

Below, we use the GSQL loading language to define a loading job script, which encodes all the mappings from the source CSV file (generated by the LDBC SNB benchmark data generator) to our schema.

The script to load the LDBC-SNB data is below:

GSQL script for loading LDBC-SNB data
USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  // define vertex
  DEFINE FILENAME v_comment_file;
  DEFINE FILENAME v_post_file;
  DEFINE FILENAME v_organisation_file;
  DEFINE FILENAME v_place_file;
  DEFINE FILENAME v_forum_file;
  DEFINE FILENAME v_person_file;
  DEFINE FILENAME v_tag_file;
  DEFINE FILENAME v_tagclass_file;

  // define edge
  DEFINE FILENAME forum_containerOf_post_file;
  DEFINE FILENAME comment_hasCreator_person_file;
  DEFINE FILENAME post_hasCreator_person_file;
  DEFINE FILENAME person_hasInterest_tag_file;
  DEFINE FILENAME forum_hasMember_person_file;
  DEFINE FILENAME forum_hasModerator_person_file;
  DEFINE FILENAME comment_hasTag_tag_file;
  DEFINE FILENAME post_hasTag_tag_file;
  DEFINE FILENAME forum_hasTag_tag_file;
  DEFINE FILENAME tag_hasType_tagclass_file;
  DEFINE FILENAME organisation_isLocatedIn_place_file;
  DEFINE FILENAME comment_isLocatedIn_place_file;
  DEFINE FILENAME post_isLocatedIn_place_file;
  DEFINE FILENAME person_isLocatedIn_place_file;
  DEFINE FILENAME place_isPartOf_place_file;
  DEFINE FILENAME tagclass_isSubclassOf_tagclass_file;
  DEFINE FILENAME person_knows_person_file;
  DEFINE FILENAME person_likes_comment_file;
  DEFINE FILENAME person_likes_post_file;
  DEFINE FILENAME comment_replyOf_comment_file;
  DEFINE FILENAME comment_replyOf_post_file;
  DEFINE FILENAME person_studyAt_organisation_file;
  DEFINE FILENAME person_workAt_organisation_file;

  // load vertex
  LOAD v_comment_file
    TO VERTEX Comment VALUES ($0, $1, $2, $3, $4, $5) USING header="true", separator="|";
  LOAD v_post_file
    TO VERTEX Post VALUES ($0, $1, $2, $3, $4, $5, $6, $7) USING header="true", separator="|";
  LOAD v_organisation_file
    TO VERTEX Company VALUES ($0, $2, $3) WHERE $1=="company",
    TO VERTEX University VALUES ($0, $2, $3) WHERE $1=="university" USING header="true", separator="|";
  LOAD v_place_file
    TO VERTEX City VALUES ($0, $1, $2) WHERE $3=="city",
    TO VERTEX Country VALUES ($0, $1, $2) WHERE $3=="country",
    TO VERTEX Continent VALUES ($0, $1, $2) WHERE $3=="continent" USING header="true", separator="|";
  LOAD v_forum_file
    TO VERTEX Forum VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_person_file
    TO VERTEX Person VALUES ($0, $1, $2, $3, $4, $5, $6, $7, SPLIT($8,";"), SPLIT($9,";")) USING header="true", separator="|";
  LOAD v_tag_file
    TO VERTEX Tag VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_tagclass_file
    TO VERTEX TagClass VALUES ($0, $1, $2) USING header="true", separator="|";

  // load edge
  LOAD forum_containerOf_post_file
    TO EDGE CONTAINER_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD person_hasInterest_tag_file
    TO EDGE HAS_INTEREST VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasMember_person_file
    TO EDGE HAS_MEMBER VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD forum_hasModerator_person_file
    TO EDGE HAS_MODERATOR VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD forum_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Forum, $1) USING header="true", separator="|";
  LOAD tag_hasType_tagclass_file
    TO EDGE HAS_TYPE VALUES ($0, $1) USING header="true", separator="|";
  LOAD organisation_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Company, $1 Country) WHERE to_int($1) < 111,
    TO EDGE IS_LOCATED_IN VALUES ($0 University, $1 City) WHERE to_int($1) > 110 USING header="true", separator="|";
  LOAD comment_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Comment, $1 Country) USING header="true", separator="|";
  LOAD post_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Post, $1 Country) USING header="true", separator="|";
  LOAD person_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Person, $1 City) USING header="true", separator="|";
  LOAD place_isPartOf_place_file
    TO EDGE IS_PART_OF VALUES ($0 Country, $1 Continent) WHERE to_int($0) < 111,
    TO EDGE IS_PART_OF VALUES ($0 City, $1 Country) WHERE to_int($0) > 110 USING header="true", separator="|";
  LOAD tagclass_isSubclassOf_tagclass_file
    TO EDGE IS_SUBCLASS_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_knows_person_file
    TO EDGE KNOWS VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_comment_file
    TO EDGE LIKES VALUES ($0, $1 Comment, $2) USING header="true", separator="|";
  LOAD person_likes_post_file
    TO EDGE LIKES VALUES ($0, $1 Post, $2) USING header="true", separator="|";
  LOAD comment_replyOf_comment_file
    TO EDGE REPLY_OF VALUES ($0, $1 Comment) USING header="true", separator="|";
  LOAD comment_replyOf_post_file
    TO EDGE REPLY_OF VALUES ($0, $1 Post) USING header="true", separator="|";
  LOAD person_studyAt_organisation_file
    TO EDGE STUDY_AT VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_workAt_organisation_file
    TO EDGE WORK_AT VALUES ($0, $1, $2) USING header="true", separator="|";
}

Prepare The Raw Data

We have generated a data set with scale factor 1 (approximately 1GB). You can download it from the following link: https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

Linux Bash
wget https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

After downloading the raw file, run the tar command below to decompress the downloaded file.

Linux Bash
tar -xzf  ldbc_snb_data-sf1.tar.gz

After decompressing the file, you see a folder named ldbc_snb_data. Inside are two sub-folders:

  • social_network

  • substitution_parameters

The raw data is in the social_network folder.

Run The Loading Job

Download setup_schema.gsql which combines the schema script and loading job script shown before.

Specify the environment variable LDBC_SNB_DATA_DIR to point to your raw file folder un-tarred in the previous section. In our example below, the raw data is in /home/tigergraph/ldbc_snb_data/social_network, so we use the export shell command to specify its location. Then, start your TigerGraph services if needed. Finally, run the setup_schema.gsql script to create your LDBC Social Network graph.

Linux Bash
#change the directory to your raw file directory
export LDBC_SNB_DATA_DIR=/home/tigergraph/ldbc_snb_data/social_network/

#start all TigerGraph services
gadmin start all

#setup schema and loading job
gsql setup_schema.gsql

Download the loading job script and invoke it on the command line.

Linux Bash
./load_data.sh
# Sample Loading Progress Output
tigergraph/gsql_102$ ./load_data.sh
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
  JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1558053156447
  Loading log: '/mnt/data/tigergraph/logs/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1558053156447.log'

Job "ldbc_snb.load_ldbc_snb.file.m1.1558053156447" loading status
[FINISHED] m1 ( Finished: 31 / Total: 31 )
  [LOADED]
  +----------------------------------------------------------------------------------------------------------------------------------+
  |                                                                              FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  |                       /mnt/data/download/ldbc_snb_data/social_network/comment_0_0.csv |        2052170 |    281 kl/s |     7.28 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_hasCreator_person_0_0.csv |        2052170 |    251 kl/s |     8.17 s|
  |            /mnt/data/download/ldbc_snb_data/social_network/comment_hasTag_tag_0_0.csv |        2698394 |    422 kl/s |     6.38 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_isLocatedIn_place_0_0.csv |        2052170 |    291 kl/s |     7.04 s|
  |       /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_comment_0_0.csv |        1040750 |    253 kl/s |     4.11 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_post_0_0.csv |        1011421 |    248 kl/s |     4.07 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/forum_0_0.csv |          90493 |     87 kl/s |     1.03 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_containerOf_post_0_0.csv |        1003606 |    240 kl/s |     4.18 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_hasMember_person_0_0.csv |        1611870 |    431 kl/s |     3.74 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/forum_hasModerator_person_0_0.csv |          90493 |     89 kl/s |     1.01 s|
  |              /mnt/data/download/ldbc_snb_data/social_network/forum_hasTag_tag_0_0.csv |         309767 |    297 kl/s |     1.04 s|
  |                  /mnt/data/download/ldbc_snb_data/social_network/organisation_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/organisation_isLocatedIn_place_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |                        /mnt/data/download/ldbc_snb_data/social_network/person_0_0.csv |           9893 |      9 kl/s |     1.05 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/person_hasInterest_tag_0_0.csv |         229167 |    223 kl/s |     1.03 s|
  |      /mnt/data/download/ldbc_snb_data/social_network/person_isLocatedIn_place_0_0.csv |           9893 |      9 kl/s |     1.00 s|
  |           /mnt/data/download/ldbc_snb_data/social_network/person_knows_person_0_0.csv |         180624 |    169 kl/s |     1.06 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/person_likes_comment_0_0.csv |        1438419 |    449 kl/s |     3.20 s|
  |             /mnt/data/download/ldbc_snb_data/social_network/person_likes_post_0_0.csv |         751678 |    331 kl/s |     2.27 s|
  |   /mnt/data/download/ldbc_snb_data/social_network/person_studyAt_organisation_0_0.csv |           7950 |      7 kl/s |     1.00 s|
  |    /mnt/data/download/ldbc_snb_data/social_network/person_workAt_organisation_0_0.csv |          21655 |     21 kl/s |     1.00 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/place_0_0.csv |           1461 |      1 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/place_isPartOf_place_0_0.csv |           1455 |      1 kl/s |     1.00 s|
  |                          /mnt/data/download/ldbc_snb_data/social_network/post_0_0.csv |        1003606 |    195 kl/s |     5.14 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_hasCreator_person_0_0.csv |        1003606 |    320 kl/s |     3.13 s|
  |               /mnt/data/download/ldbc_snb_data/social_network/post_hasTag_tag_0_0.csv |         713259 |    341 kl/s |     2.09 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_isLocatedIn_place_0_0.csv |        1003606 |    327 kl/s |     3.07 s|
  |                           /mnt/data/download/ldbc_snb_data/social_network/tag_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/tag_hasType_tagclass_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |                      /mnt/data/download/ldbc_snb_data/social_network/tagclass_0_0.csv |             72 |      71 l/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/tagclass_isSubclassOf_tagclass_0_0.csv |             71 |      70 l/s |     1.00 s|
  +----------------------------------------------------------------------------------------------------------------------------------+

After loading, you can check the graph’s size using built-in REST endpoint calls.

Below we call two functions, stat_vertex_number and stat_edge_number to return the cardinality of each vertex and edge type.

Linux shell
curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d  '{"function":"stat_vertex_number","type":"*"}'  | jq .
curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d  '{"function":"stat_edge_number","type":"*"}' | jq .