Map Data To Graph

After you have created a graph schema, the next major step is to map your data to the schema. Click "Map Data To Graph" on the left side menu bar.

The working panel is split into a left panel and a right panel. Adjust the panel sizes with the Expand Panels buttons at the top.

Before any data is mapped, the left panel will display only the graph schema.

map data to graph overview

The main steps are:

  1. Add a data source

  2. Add data file(s)

  3. Map data file(s) to vertex/edge types

  4. Map data file columns to vertex/edge fields

  5. Publish data mapping

Data Source

The GraphStudio term "data source" is separate from the GSQL DATA_SOURCE keyword.

In GraphStudio, a data source is a connection to where one or more data files are stored, either locally on the TigerGraph server or remotely in cloud storage (Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage (ABS)). A data file is a file containing structured data to be loaded into the graph, creating vertex and/or edge instances.
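For comparison, a GSQL DATA_SOURCE is declared from the GSQL shell rather than in GraphStudio. A minimal sketch, assuming TigerGraph 3.x syntax, a hypothetical graph name, and a configuration file holding the S3 credentials (the exact syntax varies by TigerGraph version):

    CREATE DATA_SOURCE S3 s3_ds = "/home/tigergraph/s3_config.json" FOR GRAPH My_Graph

You do not need to run any GSQL statement to follow the steps on this page; GraphStudio manages its own data source connections.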

Visibility of Local Files

When you use the Local File data source option, uploaded files are visible to any user with access to GraphStudio for this database. For sensitive data, we strongly recommend using one of the cloud storage options together with their cloud access security features.

In the data mapping process, you are not loading data from files directly, but rather creating links to set up the process of loading the files in the next step. For example, if you connect to remote storage, you provide the metadata and access credentials necessary to access the files during the loading process. If you upload a local file, you store that file on the TigerGraph server.

Click the Add Data File button in the toolbar to begin choosing a data source and adding data files.

choose a data source type

Data File

A data file contains the actual structured data that is loaded into your graph to create vertices and edges.

Folders

Remote data sources support selecting entire folders at the same time. Local file upload does not currently support uploading a folder from your browser to the server.

If you select a folder, all of its data files must share the same data schema.

Compressed files

TigerGraph supports loading from archived and compressed files only through remote storage connections. Currently supported file extensions are zip, tar.gz, tgz, and tar. GraphStudio detects the file extension and automatically chooses the corresponding file format. If the file is compressed in one of these formats but has a non-standard file extension, you can manually specify the file format.

Data Loading

The following sections contain instructions for each data source type:

  • Local File

  • Amazon S3

  • Google Cloud Storage

  • Azure Blob Storage

Local File

files on server browse

After clicking Local File, you will be prompted to choose one or more local files to upload through your browser to the TigerGraph server. If you have uploaded local files to the server previously, they appear in the file list in the left panel.

Supported files:

  • CSV

  • TSV

  • JSON Lines

Unsupported files:

  • Zip archives

  • Tar archives

Compressed files are only supported through remote storage connections.
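For reference, a minimal CSV data file with a header line and hypothetical columns looks like this (a TSV file is the same except that columns are separated by tabs):

    id,name,age
    p1,Alice,34
    p2,Bob,28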

A success message appears for each file once it has been successfully uploaded to the TigerGraph server. When the uploads have finished, choose the file or files you want to work with from the left panel and click Next.

Confirm that the data has parsed correctly in the next step.

Amazon S3

aws add data source

After you click the S3 data source icon, you will be prompted for your:

  • AWS access key ID

  • AWS secret access key

  • Connection alias (a name for you to give to the connection)

Identify your file as CSV or JSON format.

load data csv or json

Enter the S3 URI. Follow the instructions here to retrieve it: Accessing a bucket using S3 URI

enter s3 uri
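An S3 URI has the form s3://<bucket-name>/<path>/<file>. For example, with a hypothetical bucket and file name:

    s3://my-company-data/graph/person.csv

Since remote data sources support folders, you can also enter a folder path such as s3://my-company-data/graph/ to select all of the data files it contains.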

Confirm that the data has parsed correctly in the next step.

Google Cloud Storage

add data source from gcs

Browse your computer or drag and drop to upload your GCS account key file. Google provides a guide to generating and downloading key files at this link: Getting a service account key.
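The downloaded key is a JSON file. Its general shape looks roughly like the following (values redacted, project and account names hypothetical; your file may contain additional fields):

    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "<redacted>",
      "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
      "client_email": "loader@my-project.iam.gserviceaccount.com",
      "client_id": "<redacted>",
      "token_uri": "https://oauth2.googleapis.com/token"
    }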

After you enter your key, identify your file as CSV or JSON format.

load data csv or json

Enter the gsutil URI for your data file in your Google Cloud Storage bucket.

gcs data source
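A gsutil URI has the form gs://<bucket-name>/<path>/<file>. For example, with hypothetical bucket and file names:

    gs://my-company-data/graph/person.csv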

Confirm that the data has parsed correctly in the next step.

Azure Blob Storage

azure add data source

After you click the ABS data source icon, you will be prompted for your Connection String and a custom alias for the connection (required). See View Account Access Keys for instructions.
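A storage account connection string generally follows this format (account name hypothetical, key redacted):

    DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<redacted>;EndpointSuffix=core.windows.net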

After you enter your connection string, identify your file as CSV or JSON format.

load data csv or json

Enter the Blob URL.

azure blob url
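A Blob URL has the form https://<storage-account>.blob.core.windows.net/<container>/<path-to-blob>. For example, with hypothetical names:

    https://mystorageaccount.blob.core.windows.net/graph-data/person.csv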

Confirm that the data has parsed correctly in the next step.

Confirm data parsing

Whether you are loading from a local file on the server or from a file in remote storage, the last step is to review a preview of the parsed data. In this example, the parser is working with a local file, but the process is the same for remote files.

examine csv

CSV file parsing

If your data file is in tabular format, the parser splits each line into a series of tokens. If the parsing is not correct, choose a different option for the file format, delimiter, or end of line character.

The enclosing character is used to mark the boundaries of a token, overriding the delimiter character. For example, if your delimiter is a comma, but you have commas in some strings, then you can define single or double quotes as the enclosing character to mark the endpoints of your string tokens.

It is not necessary for every token to have enclosing characters. The parser uses enclosing characters when it encounters them.
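For example, with a comma delimiter and double quotes as the enclosing character, the quoted token below stays a single value even though it contains a comma (hypothetical data):

    id,name,address
    p1,Alice,"12 Main St, Springfield"
    p2,Bob,42 Oak Ave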

You can edit the header line of the parsing result to give each column a more intuitive name, since you will refer to these names when mapping data to the graph. The header names themselves do not affect how the data is loaded.

JSON file parsing

GraphStudio supports loading files in JSON format as well as in CSV or TSV format. Each line in the uploaded file must contain exactly one JSON object.
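For example, a valid file for loading contains one JSON object per line (hypothetical keys and values):

    {"id": "p1", "name": "Alice", "age": 34}
    {"id": "p2", "name": "Bob", "age": 28}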

Similar to loading a CSV or TSV, you will first see a preview of the JSON file so that you can check the parsing.

After looking at the preview, you may edit the data key and data type for each of the JSON fields.

json data types

In this stage, you specify the data type used to interpret each JSON key so that its value can be loaded to a vertex or edge attribute. Here, you can also delete any keys that you do not want to load.

Once you are satisfied with the file parsing configuration, click the ADD button to add the data file into the left working panel.

Folder parsing

The folder preview, like the file preview, is limited to the first ten lines of uploaded data. If a folder contains more than one file and the first file has more than ten lines, only the first ten lines of the first file will appear in the preview.

Map data files to vertex type or edge type

In this step, you link (map) a data file to a target vertex type or edge type. The mapping can be many-to-many, which means one data file can map to multiple vertex and/or edge types, and multiple data files can map to the same vertex or edge type. Click the map data file to vertex or edge button in the toolbar to enter mapping mode.

First, click the data file icon.


Next, click the target vertex type circle or edge type link to create a dashed link representing the mapping:


A red hint appears if the target type has not yet received a mapping for its primary id(s).

Map data columns to vertex or edge attributes

In this step, you link particular columns of a data file to particular ids or attributes of a vertex type or edge type.

First, choose one data mapping from one data file to one vertex or edge type (represented as a dashed green link on the left working panel).

When selected, the dashed line becomes orange (active), and the right working panel shows two tables: the data file columns and the target vertex or edge fields.


Drag and drop from the left table to the right table to map each column to a target field. The left table contains the CSV columns or JSON keys. The target field is either an attribute of the vertex/edge, the primary ID of a vertex, or the source or target ID of an edge.

A green arrow appears to show the mapping.


Repeat as needed to create all the mappings for this file-to-vertex/edge pair. Since multiple data files can map to the same vertex or edge type, it is not necessary for one file to provide a mapping for every field in the target vertex/edge.

Data must be loaded for all Discriminator attributes on an edge; an edge cannot have a Discriminator attribute with no data loaded to it.
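For context, the column-to-field mappings you draw here correspond roughly to the VALUES clause of a GSQL loading job. A minimal hand-written sketch, with a hypothetical graph name, Person vertex type, file path, and column names (GraphStudio builds the equivalent for you when you publish the mapping, so this is only an illustration):

    CREATE LOADING JOB load_person FOR GRAPH My_Graph {
      DEFINE FILENAME f1 = "/data/person.csv";
      LOAD f1
        TO VERTEX Person VALUES ($"id", $"name", $"age")
        USING SEPARATOR=",", HEADER="true", EOL="\n";
    }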

Advanced data transformation

See the page on Data Transformation for information about making changes to the data during the loading process.

Data transformation includes token functions, data filtering (equivalent to a WHERE clause during data loading), and mapping data to Map type attributes.
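As a rough illustration, in the hand-written loading job sketched earlier on this page, a data filter and a token function would appear in the LOAD statement like this (again with hypothetical names; in GraphStudio you configure these through the mapping UI instead):

    LOAD f1
      TO VERTEX Person VALUES ($"id", gsql_upper($"name"), $"age")
      WHERE $"age" != ""
      USING SEPARATOR=",", HEADER="true";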

Auto mapping

If the data file columns and the vertex/edge attributes have very similar names (differing only in capitalization or hyphens), click the auto mapping button. All matching or similarly named columns are mapped automatically.

Undo and redo

You can undo or redo changes by clicking the Back or Forward buttons in the toolbar. The whole history since you entered the Map Data To Graph page is recorded.

Delete options

In the Map Data To Graph page, you can delete anything that you added, including data files, mappings between files and vertex/edge types, mappings between data columns and vertex/edge attributes, and token functions. Choose what you want to delete, then click the delete button. Hold the Shift key to select multiple items to delete. Note that you cannot delete vertex or edge types on this page.

For example, to delete a data file mapping, select the dashed green link(s) between the data file and the vertex/edge type, then click the delete button.


If you remove a file from the server, you also need to manually remove any data mappings that use that file. Otherwise, a "file not on server" error is triggered when loading data.

Publish data mapping

Once you are satisfied with the data mapping, click the publish button to publish it to the TigerGraph system. It takes a few seconds to publish each data file mapping.