Load Data

Define the Loading Job

Below, we use the GSQL loading language to define a loading job script, which encodes all of the mappings from the source CSV files produced by the LDBC SNB benchmark data generator to our graph schema.

You can download the loading script below from here.

GSQL Loading Script
USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  // define vertex source files
  DEFINE FILENAME v_comment_file;
  DEFINE FILENAME v_post_file;
  DEFINE FILENAME v_organisation_file;
  DEFINE FILENAME v_place_file;
  DEFINE FILENAME v_forum_file;
  DEFINE FILENAME v_person_file;
  DEFINE FILENAME v_tag_file;
  DEFINE FILENAME v_tagclass_file;
  
  // define edge source files
  DEFINE FILENAME forum_containerOf_post_file;
  DEFINE FILENAME comment_hasCreator_person_file;
  DEFINE FILENAME post_hasCreator_person_file;
  DEFINE FILENAME person_hasInterest_tag_file;
  DEFINE FILENAME forum_hasMember_person_file;
  DEFINE FILENAME forum_hasModerator_person_file;
  DEFINE FILENAME comment_hasTag_tag_file;
  DEFINE FILENAME post_hasTag_tag_file;
  DEFINE FILENAME forum_hasTag_tag_file;
  DEFINE FILENAME tag_hasType_tagclass_file;
  DEFINE FILENAME organisation_isLocatedIn_place_file;
  DEFINE FILENAME comment_isLocatedIn_place_file;
  DEFINE FILENAME post_isLocatedIn_place_file;
  DEFINE FILENAME person_isLocatedIn_place_file;
  DEFINE FILENAME place_isPartOf_place_file;
  DEFINE FILENAME tagclass_isSubclassOf_tagclass_file;
  DEFINE FILENAME person_knows_person_file;
  DEFINE FILENAME person_likes_comment_file;
  DEFINE FILENAME person_likes_post_file;
  DEFINE FILENAME comment_replyOf_comment_file;
  DEFINE FILENAME comment_replyOf_post_file;
  DEFINE FILENAME person_studyAt_organisation_file;
  DEFINE FILENAME person_workAt_organisation_file;

  // load vertices
  LOAD v_comment_file 
    TO VERTEX Comment VALUES ($0, $0, $1, $2, $3, $4, $5) USING header="true", separator="|";
  LOAD v_post_file
    TO VERTEX Post VALUES ($0, $0, $1, $2, $3, $4, $5, $6, $7) USING header="true", separator="|";
  LOAD v_organisation_file
    TO VERTEX Company VALUES ($0, $0, $2, $3) WHERE $1=="company",
    TO VERTEX University VALUES ($0, $0, $2, $3) WHERE $1=="university" USING header="true", separator="|";
  LOAD v_place_file
    TO VERTEX City VALUES ($0, $0, $1, $2) WHERE $3=="city",
    TO VERTEX Country VALUES ($0, $0, $1, $2) WHERE $3=="country",
    TO VERTEX Continent VALUES ($0, $0, $1, $2) WHERE $3=="continent" USING header="true", separator="|";
  LOAD v_forum_file
    TO VERTEX Forum VALUES ($0, $0, $1, $2) USING header="true", separator="|";
  LOAD v_person_file
    TO VERTEX Person VALUES ($0, $0, $1, $2, $3, $4, $5, $6, $7, SPLIT($8,";"), SPLIT($9,";")) USING header="true", separator="|";
  LOAD v_tag_file
    TO VERTEX Tag VALUES ($0, $0, $1, $2) USING header="true", separator="|";
  LOAD v_tagclass_file
    TO VERTEX TagClass VALUES ($0, $0, $1, $2) USING header="true", separator="|";

  // load edges
  LOAD forum_containerOf_post_file
    TO EDGE Forum_CONTAINER_OF_Post VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasCreator_person_file
    TO EDGE Comment_HAS_CREATOR_Person VALUES ($0, $1) USING header="true", separator="|";
  LOAD post_hasCreator_person_file
    TO EDGE Post_HAS_CREATOR_Person VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_hasInterest_tag_file
    TO EDGE Person_HAS_INTEREST_Tag VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasMember_person_file
    TO EDGE Forum_HAS_MEMBER_Person VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD forum_hasModerator_person_file
    TO EDGE Forum_HAS_MODERATOR_Person VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasTag_tag_file
    TO EDGE Comment_HAS_TAG_Tag VALUES ($0, $1) USING header="true", separator="|";
  LOAD post_hasTag_tag_file
    TO EDGE Post_HAS_TAG_Tag VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasTag_tag_file
    TO EDGE Forum_HAS_TAG_Tag VALUES ($0, $1) USING header="true", separator="|";
  LOAD tag_hasType_tagclass_file
    TO EDGE Tag_HAS_TYPE_TagClass VALUES ($0, $1) USING header="true", separator="|";
  // the id ranges encode the place hierarchy: a location id below 111 is a
  // country (company edges); an id above 110 is a city (university edges)
  LOAD organisation_isLocatedIn_place_file
    TO EDGE Company_IS_LOCATED_IN_Country VALUES ($0, $1) WHERE to_int($1) < 111,
    TO EDGE University_IS_LOCATED_IN_City VALUES ($0, $1) WHERE to_int($1) > 110 USING header="true", separator="|";
  LOAD comment_isLocatedIn_place_file
    TO EDGE Comment_IS_LOCATED_IN_Country VALUES ($0, $1) USING header="true", separator="|";
  LOAD post_isLocatedIn_place_file
    TO EDGE Post_IS_LOCATED_IN_Country VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_isLocatedIn_place_file
    TO EDGE Person_IS_LOCATED_IN_City VALUES ($0, $1) USING header="true", separator="|";
  // same id threshold: a source place id below 111 is a country, above 110 a city
  LOAD place_isPartOf_place_file
    TO EDGE Country_IS_PART_OF_Continent VALUES ($0, $1) WHERE to_int($0) < 111,
    TO EDGE City_IS_PART_OF_Country VALUES ($0, $1) WHERE to_int($0) > 110 USING header="true", separator="|";
  LOAD tagclass_isSubclassOf_tagclass_file
    TO EDGE TagClass_IS_SUBCLASS_OF_TagClass VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_knows_person_file
    TO EDGE Person_KNOWS_Person VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_comment_file
    TO EDGE Person_LIKES_Comment VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_post_file
    TO EDGE Person_LIKES_Post VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD comment_replyOf_comment_file
    TO EDGE Comment_REPLY_OF_Comment VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_replyOf_post_file
    TO EDGE Comment_REPLY_OF_Post VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_studyAt_organisation_file
    TO EDGE Person_STUDY_AT_University VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_workAt_organisation_file
    TO EDGE Person_WORK_AT_Company VALUES ($0, $1, $2) USING header="true", separator="|";
}
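For reference, each FILENAME variable declared in the job is bound to a concrete file path at run time via a USING clause. A hypothetical invocation binding just two of the variables might look like the sketch below (the paths are placeholders; in practice every FILENAME variable must be bound):

```
USE GRAPH ldbc_snb
RUN LOADING JOB load_ldbc_snb USING v_person_file="/home/tigergraph/ldbc_snb_data/social_network/person_0_0.csv", v_comment_file="/home/tigergraph/ldbc_snb_data/social_network/comment_0_0.csv"
```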

Prepare The Raw Data

We have generated a scale factor 1 data set (approximately 1 GB). You can download it from https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

Linux Bash
wget https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

After downloading the raw data file, run the tar command below to decompress it.

Linux Bash
tar -xzf ldbc_snb_data-sf1.tar.gz

After decompressing the file, you will see a folder named "ldbc_snb_data". Inside it are two subfolders:

  • social_network

  • substitution_parameters

The raw data is under the social_network folder.
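As an optional sanity check (illustrative; the directory path is an assumption, and only a few of the files are checked), you can confirm that the per-entity CSV files the loading job expects are present:

```shell
# Illustrative check: file names follow the <entity>_0_0.csv pattern
# produced by the LDBC SNB data generator. The default path below is
# an assumption; adjust it to wherever you un-tarred the data.
DATA_DIR=${LDBC_SNB_DATA_DIR:-/home/tigergraph/ldbc_snb_data/social_network}
for f in person_0_0.csv comment_0_0.csv post_0_0.csv forum_0_0.csv; do
  if [ -f "$DATA_DIR/$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```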

Run The Loading Job

Download the setup_schema.gsql file and run it from the shell command line to set up the schema and the loading job.

Linux Bash
gsql setup_schema.gsql

Set the environment variable LDBC_SNB_DATA_DIR to point to the raw data folder you un-tarred in the previous section. In the example below, the raw data is in /home/tigergraph/ldbc_snb_data/social_network. Note that the folder must be named social_network.

Linux Bash
#point LDBC_SNB_DATA_DIR at your raw data directory
export LDBC_SNB_DATA_DIR=/home/tigergraph/ldbc_snb_data/social_network/

#start all TigerGraph services
gadmin start

#setup schema and loading job
gsql setup_schema.gsql

Download the load_data.sh loading job script and invoke it on the command line.

Linux Bash
./load_data.sh
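The contents of load_data.sh are not reproduced here; a minimal sketch of what such a driver script might do, assuming LDBC_SNB_DATA_DIR is exported as above and showing only two of the FILENAME bindings, is:

```shell
#!/bin/bash
# Minimal sketch of a loading-job driver script (not the actual
# load_data.sh). It assembles the USING clause that binds FILENAME
# variables from the loading job to concrete CSV paths, then would
# hand the resulting statement to gsql.
DATA_DIR=${LDBC_SNB_DATA_DIR:-/home/tigergraph/ldbc_snb_data/social_network}

# Only two bindings shown; the real script binds every FILENAME
# variable defined in load_ldbc_snb.
USING_CLAUSE="v_person_file=\"${DATA_DIR}/person_0_0.csv\""
USING_CLAUSE="${USING_CLAUSE}, v_comment_file=\"${DATA_DIR}/comment_0_0.csv\""

echo "RUN LOADING JOB load_ldbc_snb USING ${USING_CLAUSE}"
# The real script would run something like:
#   gsql -g ldbc_snb "RUN LOADING JOB load_ldbc_snb USING ${USING_CLAUSE}"
```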
Sample Loading Progress Output
tigergraph/gsql102$ ./load_data.sh
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
  JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1558053156447
  Loading log: '/mnt/data/tigergraph/logs/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1558053156447.log'

Job "ldbc_snb.load_ldbc_snb.file.m1.1558053156447" loading status
[FINISHED] m1 ( Finished: 31 / Total: 31 )
  [LOADED]
  +----------------------------------------------------------------------------------------------------------------------------------+
  |                                                                              FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  |                       /mnt/data/download/ldbc_snb_data/social_network/comment_0_0.csv |        2052170 |    281 kl/s |     7.28 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_hasCreator_person_0_0.csv |        2052170 |    251 kl/s |     8.17 s|
  |            /mnt/data/download/ldbc_snb_data/social_network/comment_hasTag_tag_0_0.csv |        2698394 |    422 kl/s |     6.38 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/comment_isLocatedIn_place_0_0.csv |        2052170 |    291 kl/s |     7.04 s|
  |       /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_comment_0_0.csv |        1040750 |    253 kl/s |     4.11 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_post_0_0.csv |        1011421 |    248 kl/s |     4.07 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/forum_0_0.csv |          90493 |     87 kl/s |     1.03 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_containerOf_post_0_0.csv |        1003606 |    240 kl/s |     4.18 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/forum_hasMember_person_0_0.csv |        1611870 |    431 kl/s |     3.74 s|
  |     /mnt/data/download/ldbc_snb_data/social_network/forum_hasModerator_person_0_0.csv |          90493 |     89 kl/s |     1.01 s|
  |              /mnt/data/download/ldbc_snb_data/social_network/forum_hasTag_tag_0_0.csv |         309767 |    297 kl/s |     1.04 s|
  |                  /mnt/data/download/ldbc_snb_data/social_network/organisation_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/organisation_isLocatedIn_place_0_0.csv |           7956 |      7 kl/s |     1.00 s|
  |                        /mnt/data/download/ldbc_snb_data/social_network/person_0_0.csv |           9893 |      9 kl/s |     1.05 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/person_hasInterest_tag_0_0.csv |         229167 |    223 kl/s |     1.03 s|
  |      /mnt/data/download/ldbc_snb_data/social_network/person_isLocatedIn_place_0_0.csv |           9893 |      9 kl/s |     1.00 s|
  |           /mnt/data/download/ldbc_snb_data/social_network/person_knows_person_0_0.csv |         180624 |    169 kl/s |     1.06 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/person_likes_comment_0_0.csv |        1438419 |    449 kl/s |     3.20 s|
  |             /mnt/data/download/ldbc_snb_data/social_network/person_likes_post_0_0.csv |         751678 |    331 kl/s |     2.27 s|
  |   /mnt/data/download/ldbc_snb_data/social_network/person_studyAt_organisation_0_0.csv |           7950 |      7 kl/s |     1.00 s|
  |    /mnt/data/download/ldbc_snb_data/social_network/person_workAt_organisation_0_0.csv |          21655 |     21 kl/s |     1.00 s|
  |                         /mnt/data/download/ldbc_snb_data/social_network/place_0_0.csv |           1461 |      1 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/place_isPartOf_place_0_0.csv |           1455 |      1 kl/s |     1.00 s|
  |                          /mnt/data/download/ldbc_snb_data/social_network/post_0_0.csv |        1003606 |    195 kl/s |     5.14 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_hasCreator_person_0_0.csv |        1003606 |    320 kl/s |     3.13 s|
  |               /mnt/data/download/ldbc_snb_data/social_network/post_hasTag_tag_0_0.csv |         713259 |    341 kl/s |     2.09 s|
  |        /mnt/data/download/ldbc_snb_data/social_network/post_isLocatedIn_place_0_0.csv |        1003606 |    327 kl/s |     3.07 s|
  |                           /mnt/data/download/ldbc_snb_data/social_network/tag_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |          /mnt/data/download/ldbc_snb_data/social_network/tag_hasType_tagclass_0_0.csv |          16081 |     16 kl/s |     1.00 s|
  |                      /mnt/data/download/ldbc_snb_data/social_network/tagclass_0_0.csv |             72 |      71 l/s |     1.00 s|
  |/mnt/data/download/ldbc_snb_data/social_network/tagclass_isSubclassOf_tagclass_0_0.csv |             71 |      70 l/s |     1.00 s|
  +----------------------------------------------------------------------------------------------------------------------------------+

After loading, you can check the graph's size with the administration tool, gadmin. From a Linux shell, enter the following command:

Linux Bash
gadmin status graph -v
verbose is ON
=== graph ===
[m1     ][GRAPH][MSG ] Graph was loaded (/mnt/data/tigergraph/gstore/0/part/): partition size is 437.20MiB, IDS size is 102.30MiB, SchemaVersion: 0, VertexCount: 3181724, NumOfSkippedVertices: 0, NumOfDeletedVertices: 0, EdgeCount: 34512076
[m1     ][GRAPH][INIT] True
[INFO   ][GRAPH][MSG ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information
[SUMMARY][GRAPH] graph is ready

You should see VertexCount: 3,181,724 and EdgeCount: 34,512,076.
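If you want the counts programmatically (for example, in a scripted check), you can parse them out of the gadmin output. The sketch below hard-codes the status line shown above for illustration; in practice you would pipe the output of gadmin status graph -v into the same sed commands:

```shell
# Illustrative: extract VertexCount and EdgeCount from a gadmin status
# line. The line is hard-coded here for demonstration; normally it
# would come from: gadmin status graph -v
LINE='[m1     ][GRAPH][MSG ] Graph was loaded: VertexCount: 3181724, EdgeCount: 34512076'
VERTICES=$(echo "$LINE" | sed -n 's/.*VertexCount: \([0-9]*\).*/\1/p')
EDGES=$(echo "$LINE" | sed -n 's/.*EdgeCount: \([0-9]*\).*/\1/p')
echo "vertices=${VERTICES} edges=${EDGES}"
# prints: vertices=3181724 edges=34512076
```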
