Load Data

Define the Loading Job

Below, we use the GSQL loading language to define a loading job, which encodes all the mappings from the source CSV files (generated by the LDBC SNB benchmark data generator) to our schema.

The full script to load the LDBC-SNB data is below. Note that a single source file can feed multiple vertex or edge types: WHERE clauses route each row, either by a type column (the organisation and place files) or by an id range (the location and containment edges).

# GSQL script for loading LDBC-SNB data
USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  // define vertex input files
  DEFINE FILENAME v_comment_file;
  DEFINE FILENAME v_post_file;
  DEFINE FILENAME v_organisation_file;
  DEFINE FILENAME v_place_file;
  DEFINE FILENAME v_forum_file;
  DEFINE FILENAME v_person_file;
  DEFINE FILENAME v_tag_file;
  DEFINE FILENAME v_tagclass_file;
  // define edge input files
  DEFINE FILENAME forum_containerOf_post_file;
  DEFINE FILENAME comment_hasCreator_person_file;
  DEFINE FILENAME post_hasCreator_person_file;
  DEFINE FILENAME person_hasInterest_tag_file;
  DEFINE FILENAME forum_hasMember_person_file;
  DEFINE FILENAME forum_hasModerator_person_file;
  DEFINE FILENAME comment_hasTag_tag_file;
  DEFINE FILENAME post_hasTag_tag_file;
  DEFINE FILENAME forum_hasTag_tag_file;
  DEFINE FILENAME tag_hasType_tagclass_file;
  DEFINE FILENAME organisation_isLocatedIn_place_file;
  DEFINE FILENAME comment_isLocatedIn_place_file;
  DEFINE FILENAME post_isLocatedIn_place_file;
  DEFINE FILENAME person_isLocatedIn_place_file;
  DEFINE FILENAME place_isPartOf_place_file;
  DEFINE FILENAME tagclass_isSubclassOf_tagclass_file;
  DEFINE FILENAME person_knows_person_file;
  DEFINE FILENAME person_likes_comment_file;
  DEFINE FILENAME person_likes_post_file;
  DEFINE FILENAME comment_replyOf_comment_file;
  DEFINE FILENAME comment_replyOf_post_file;
  DEFINE FILENAME person_studyAt_organisation_file;
  DEFINE FILENAME person_workAt_organisation_file;
  // load vertices
  LOAD v_comment_file
    TO VERTEX Comment VALUES ($0, $1, $2, $3, $4, $5) USING header="true", separator="|";
  LOAD v_post_file
    TO VERTEX Post VALUES ($0, $1, $2, $3, $4, $5, $6, $7) USING header="true", separator="|";
  // the organisation file mixes companies and universities; route each row by its type column ($1)
  LOAD v_organisation_file
    TO VERTEX Company VALUES ($0, $2, $3) WHERE $1=="company",
    TO VERTEX University VALUES ($0, $2, $3) WHERE $1=="university" USING header="true", separator="|";
  // the place file mixes cities, countries, and continents; route each row by its type column ($3)
  LOAD v_place_file
    TO VERTEX City VALUES ($0, $1, $2) WHERE $3=="city",
    TO VERTEX Country VALUES ($0, $1, $2) WHERE $3=="country",
    TO VERTEX Continent VALUES ($0, $1, $2) WHERE $3=="continent" USING header="true", separator="|";
  LOAD v_forum_file
    TO VERTEX Forum VALUES ($0, $1, $2) USING header="true", separator="|";
  // SPLIT breaks the two multi-valued, ";"-separated columns into collection attributes
  LOAD v_person_file
    TO VERTEX Person VALUES ($0, $1, $2, $3, $4, $5, $6, $7, SPLIT($8,";"), SPLIT($9,";")) USING header="true", separator="|";
  LOAD v_tag_file
    TO VERTEX Tag VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_tagclass_file
    TO VERTEX TagClass VALUES ($0, $1, $2) USING header="true", separator="|";
  // load edges
  LOAD forum_containerOf_post_file
    TO EDGE CONTAINER_OF VALUES ($0, $1) USING header="true", separator="|";
  // HAS_CREATOR has two possible source types, so each load names the FROM type explicitly
  LOAD comment_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD person_hasInterest_tag_file
    TO EDGE HAS_INTEREST VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasMember_person_file
    TO EDGE HAS_MEMBER VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD forum_hasModerator_person_file
    TO EDGE HAS_MODERATOR VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD forum_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Forum, $1) USING header="true", separator="|";
  LOAD tag_hasType_tagclass_file
    TO EDGE HAS_TYPE VALUES ($0, $1) USING header="true", separator="|";
  // place ids below 111 are countries and ids above 110 are cities, so the id range
  // decides whether an organisation is a company (in a country) or a university (in a city)
  LOAD organisation_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Company, $1 Country) WHERE to_int($1) < 111,
    TO EDGE IS_LOCATED_IN VALUES ($0 University, $1 City) WHERE to_int($1) > 110 USING header="true", separator="|";
  LOAD comment_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Comment, $1 Country) USING header="true", separator="|";
  LOAD post_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Post, $1 Country) USING header="true", separator="|";
  LOAD person_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Person, $1 City) USING header="true", separator="|";
  // the same id-range convention splits containment into Country->Continent and City->Country
  LOAD place_isPartOf_place_file
    TO EDGE IS_PART_OF VALUES ($0 Country, $1 Continent) WHERE to_int($0) < 111,
    TO EDGE IS_PART_OF VALUES ($0 City, $1 Country) WHERE to_int($0) > 110 USING header="true", separator="|";
  LOAD tagclass_isSubclassOf_tagclass_file
    TO EDGE IS_SUBCLASS_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_knows_person_file
    TO EDGE KNOWS VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_comment_file
    TO EDGE LIKES VALUES ($0, $1 Comment, $2) USING header="true", separator="|";
  LOAD person_likes_post_file
    TO EDGE LIKES VALUES ($0, $1 Post, $2) USING header="true", separator="|";
  LOAD comment_replyOf_comment_file
    TO EDGE REPLY_OF VALUES ($0, $1 Comment) USING header="true", separator="|";
  LOAD comment_replyOf_post_file
    TO EDGE REPLY_OF VALUES ($0, $1 Post) USING header="true", separator="|";
  LOAD person_studyAt_organisation_file
    TO EDGE STUDY_AT VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_workAt_organisation_file
    TO EDGE WORK_AT VALUES ($0, $1, $2) USING header="true", separator="|";
}
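
Once created, the job can also be run directly from GSQL by binding a DEFINE FILENAME variable to a concrete path with the USING clause; LOAD statements whose filename variables are left unbound are simply not run. A minimal sketch for a single file (the path below is illustrative; the full data set is downloaded in the next section):

# Linux Bash
# sketch: run the loading job for just the comment vertex file
gsql -g ldbc_snb 'RUN LOADING JOB load_ldbc_snb USING v_comment_file="/home/tigergraph/ldbc_snb_data/social_network/comment_0_0.csv"'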

Prepare The Raw Data

We have generated a data set with scale factor 1 (approximately 1 GB). You can download it from https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

# Linux Bash
wget https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

After downloading the archive, run the tar command below to decompress it.

# Linux Bash
tar -xzf ldbc_snb_data-sf1.tar.gz

After decompressing the file, you will see a folder named "ldbc_snb_data". Within it, you will see two subfolders:

  • social_network

  • substitution_parameters

The raw data is in the social_network folder.
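
A quick way to confirm the layout is to list a few of the raw files; each entity and relationship has its own pipe-separated CSV file named <name>_0_0.csv (the same names appear in the loading output later in this section):

# Linux Bash
# peek at the first few raw CSV files
ls ldbc_snb_data/social_network | head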

Run The Loading Job

Download setup_schema.gsql, which combines the schema script and the loading job script shown above.

Set the environment variable LDBC_SNB_DATA_DIR to point to the raw data folder you un-tarred in the previous section. In our example below, the raw data is in /home/tigergraph/ldbc_snb_data/social_network, so we use the export shell command to specify its location. Then, start your TigerGraph services if needed. Finally, run the setup_schema.gsql script to create your LDBC Social Network graph.

# Linux Bash
# point LDBC_SNB_DATA_DIR at the un-tarred raw data directory
export LDBC_SNB_DATA_DIR=/home/tigergraph/ldbc_snb_data/social_network/
# start all TigerGraph services
gadmin start all
# set up the schema and the loading job
gsql setup_schema.gsql
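
Before running the script, you can sanity-check that the variable points at the data; for example, person_0_0.csv is one of the raw files and should be found under the directory:

# Linux Bash
# sanity check: a raw vertex file should exist under the data directory
ls "$LDBC_SNB_DATA_DIR"/person_0_0.csv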

Download the load_data.sh script, which runs the loading job, and invoke it on the command line.

# Linux Bash
./load_data.sh
# Sample Loading Progress Output
tigergraph/gsql_102$ ./load_data.sh
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1558053156447
Loading log: '/mnt/data/tigergraph/logs/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1558053156447.log'
Job "ldbc_snb.load_ldbc_snb.file.m1.1558053156447" loading status
[FINISHED] m1 ( Finished: 31 / Total: 31 )
[LOADED]
+----------------------------------------------------------------------------------------------------------------------------------+
| FILENAME | LOADED LINES | AVG SPEED | DURATION |
| /mnt/data/download/ldbc_snb_data/social_network/comment_0_0.csv | 2052170 | 281 kl/s | 7.28 s |
| /mnt/data/download/ldbc_snb_data/social_network/comment_hasCreator_person_0_0.csv | 2052170 | 251 kl/s | 8.17 s |
| /mnt/data/download/ldbc_snb_data/social_network/comment_hasTag_tag_0_0.csv | 2698394 | 422 kl/s | 6.38 s |
| /mnt/data/download/ldbc_snb_data/social_network/comment_isLocatedIn_place_0_0.csv | 2052170 | 291 kl/s | 7.04 s |
| /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_comment_0_0.csv | 1040750 | 253 kl/s | 4.11 s |
| /mnt/data/download/ldbc_snb_data/social_network/comment_replyOf_post_0_0.csv | 1011421 | 248 kl/s | 4.07 s |
| /mnt/data/download/ldbc_snb_data/social_network/forum_0_0.csv | 90493 | 87 kl/s | 1.03 s |
| /mnt/data/download/ldbc_snb_data/social_network/forum_containerOf_post_0_0.csv | 1003606 | 240 kl/s | 4.18 s |
| /mnt/data/download/ldbc_snb_data/social_network/forum_hasMember_person_0_0.csv | 1611870 | 431 kl/s | 3.74 s |
| /mnt/data/download/ldbc_snb_data/social_network/forum_hasModerator_person_0_0.csv | 90493 | 89 kl/s | 1.01 s |
| /mnt/data/download/ldbc_snb_data/social_network/forum_hasTag_tag_0_0.csv | 309767 | 297 kl/s | 1.04 s |
| /mnt/data/download/ldbc_snb_data/social_network/organisation_0_0.csv | 7956 | 7 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/organisation_isLocatedIn_place_0_0.csv | 7956 | 7 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_0_0.csv | 9893 | 9 kl/s | 1.05 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_hasInterest_tag_0_0.csv | 229167 | 223 kl/s | 1.03 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_isLocatedIn_place_0_0.csv | 9893 | 9 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_knows_person_0_0.csv | 180624 | 169 kl/s | 1.06 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_likes_comment_0_0.csv | 1438419 | 449 kl/s | 3.20 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_likes_post_0_0.csv | 751678 | 331 kl/s | 2.27 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_studyAt_organisation_0_0.csv | 7950 | 7 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/person_workAt_organisation_0_0.csv | 21655 | 21 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/place_0_0.csv | 1461 | 1 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/place_isPartOf_place_0_0.csv | 1455 | 1 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/post_0_0.csv | 1003606 | 195 kl/s | 5.14 s |
| /mnt/data/download/ldbc_snb_data/social_network/post_hasCreator_person_0_0.csv | 1003606 | 320 kl/s | 3.13 s |
| /mnt/data/download/ldbc_snb_data/social_network/post_hasTag_tag_0_0.csv | 713259 | 341 kl/s | 2.09 s |
| /mnt/data/download/ldbc_snb_data/social_network/post_isLocatedIn_place_0_0.csv | 1003606 | 327 kl/s | 3.07 s |
| /mnt/data/download/ldbc_snb_data/social_network/tag_0_0.csv | 16081 | 16 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/tag_hasType_tagclass_0_0.csv | 16081 | 16 kl/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/tagclass_0_0.csv | 72 | 71 l/s | 1.00 s |
| /mnt/data/download/ldbc_snb_data/social_network/tagclass_isSubclassOf_tagclass_0_0.csv | 71 | 70 l/s | 1.00 s |
+----------------------------------------------------------------------------------------------------------------------------------+
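
If you stop the live status display with CTRL + C, you can re-attach to the job later with SHOW LOADING STATUS, as the tips above suggest. For example, using the job id printed above:

# Linux Bash
# re-check the progress of the loading job by its job id
gsql -g ldbc_snb 'SHOW LOADING STATUS ldbc_snb.load_ldbc_snb.file.m1.1558053156447'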

After loading, you can check the graph's size using built-in REST endpoint calls.

Below, we call two built-in functions, stat_vertex_number and stat_edge_number, to return the cardinality of each vertex type and each edge type.

# Linux Bash
curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d '{"function":"stat_vertex_number","type":"*"}' | jq .
curl -X POST 'http://localhost:9000/builtins/ldbc_snb' -d '{"function":"stat_edge_number","type":"*"}' | jq .
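
Each endpoint returns a JSON document whose results array holds one count per type, so you can use jq to pull out just the per-type numbers. A small sketch (the v_type and count field names follow TigerGraph's built-in response format):

# Linux Bash
# extract just the per-type vertex counts from the response
curl -s -X POST 'http://localhost:9000/builtins/ldbc_snb' -d '{"function":"stat_vertex_number","type":"*"}' \
  | jq '.results[] | "\(.v_type): \(.count)"'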