Load Data

Define the Loading Job

We use the GSQL loading language to define a loading job script that encodes all the mappings from the source CSV files, produced by the LDBC SNB benchmark data generator, to our schema.

You can download the below loading script from here.

Prepare the Raw Data

We have generated a scale-factor 1 data set (approximately 1 GB). You can download it from https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz

After downloading the archive, use the tar command to decompress it.
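
The extraction step can be sketched as follows. This is a minimal sketch, assuming the archive was saved in the current directory under the file name from the download URL; the helper name extract_ldbc is illustrative and not part of any TigerGraph tooling.

```shell
# Hypothetical helper: unpack the LDBC SNB archive into the current directory.
# Assumes the archive keeps the file name from the download URL.
extract_ldbc() {
  archive="${1:-ldbc_snb_data-sf1.tar.gz}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -x extract, -z decompress gzip, -f read from the named file
  tar -xzf "$archive"
}

# Run only when the archive is present, so the snippet is safe to paste.
if [ -f ldbc_snb_data-sf1.tar.gz ]; then
  extract_ldbc
fi
```

Without the helper, the same step is the single command tar -xzf ldbc_snb_data-sf1.tar.gz.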

After decompressing the file, you will see a folder named "ldbc_snb_data". Inside it are two subfolders:

  • social_network

  • substitution_parameters

The raw data is under the social_network folder.
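
Before moving on, it can help to confirm that the extracted folder has the layout described above. The check below is a sketch only; the helper name check_ldbc_layout is illustrative, and it assumes ldbc_snb_data sits in the current directory unless another path is passed.

```shell
# Hypothetical helper: confirm the extracted folder has the expected layout.
check_ldbc_layout() {
  root="${1:-ldbc_snb_data}"
  for sub in social_network substitution_parameters; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing subfolder: $root/$sub" >&2
      return 1
    fi
  done
  echo "layout ok: $root"
}

# Run the check only if the folder exists in the current directory.
if [ -d ldbc_snb_data ]; then
  check_ldbc_layout ldbc_snb_data
fi
```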

Run the Loading Job

Download the setup_schema.gsql file and run it from the shell command line to set up the schema and the loading job.
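
Running the script might look like the sketch below. It assumes setup_schema.gsql was saved in the current directory and that the gsql client is on your PATH; the wrapper name run_schema_setup is illustrative and only adds two sanity checks around the call.

```shell
# Hypothetical wrapper: run the schema script through the gsql client.
run_schema_setup() {
  script="${1:-setup_schema.gsql}"
  if ! command -v gsql >/dev/null 2>&1; then
    echo "gsql not found on PATH" >&2
    return 1
  fi
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  # gsql executes the statements in the file it is given
  gsql "$script"
}
```

Calling run_schema_setup with no arguments is then equivalent to running gsql setup_schema.gsql directly.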

Set the environment variable LDBC_SNB_DATA_DIR to point to the raw data folder you extracted in the previous section. In the example below, the raw data is in /home/tigergraph/ldbc_snb_data/social_network. Note that the folder should be named social_network.
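
With the example path from the text, setting the variable could look like this. The path is the one quoted above; the basename check is an added convenience, not part of the original instructions.

```shell
# Path taken from the example in the text; adjust it to where you extracted the data.
export LDBC_SNB_DATA_DIR=/home/tigergraph/ldbc_snb_data/social_network

# The folder is expected to be named social_network, so check the last path component.
if [ "$(basename "$LDBC_SNB_DATA_DIR")" = "social_network" ]; then
  echo "LDBC_SNB_DATA_DIR looks right: $LDBC_SNB_DATA_DIR"
else
  echo "LDBC_SNB_DATA_DIR should end in social_network" >&2
fi
```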

Download the loading job script and invoke it on the command line.

After loading, you can check the graph's size with gadmin, the administrator tool. From a Linux shell, enter the command:

gadmin status graph -v

You should see VertexCount: 3,181,724 and EdgeCount: 34,512,076.