Data Loading

Once you have defined a graph schema, you can load data into the graph store. The data loading process varies based on where the data files are located relative to the TigerGraph server.

If you are loading from files that exist locally on a TigerGraph server, create and run a loading job in GSQL to load the files.

Loading data from an outside source, such as cloud storage, requires one additional step of first defining the data source for GSQL to work with. This process is described in this documentation section. We refer to this as "data streaming" because TigerGraph can be configured to automatically load new data as it appears in the outside source.

Set up a data source for a data streaming loading job

GSQL uses a user-provided configuration file to automatically set up a streaming data connection and a loading job for data in these external cloud data hosts:

  • Google Cloud Storage (GCS)

  • AWS S3

  • Azure Blob Storage (ABS)

  • Google BigQuery

Go to the GSQL Data Streaming Connector main page for instructions on setting up the loading job.

The data streaming will stage temporary data files on the database server’s disk. You should have free disk space of at least 2 times the size of your total (uncompressed) input data.

Manual connector setup

For data stored in an external Kafka cluster, you need to perform a few more steps to set up data streaming. Using gadmin server commands, you first create a connector to interpret the data source, then define the data source, create the loading job, and run it.

See the Kafka cluster streaming page for more information.

This method relies on the TigerGraph Kafka Loader.