Load Data

After mapping data files to the graph schema, you can start loading data. Click "Load Data" on the left side menu bar to go to the Load Data page.

The "Load Data" interface is separated into three parts:

  • Data Mapping Overview

    • Provides a general view of the graph and the data mapping.

    • Shows the loading progress of each data file.

  • Toolbar (above Data Mapping)

    • Buttons to start, pause, resume, or stop data loading, and to clear graph data.

  • Statistics

    • Graph statistics: displays the numbers of vertices and edges in total and per type, with real-time loading progress.

    • Loading statistics: displays the total number of vertices and edges loaded over time.

To display real-time graph statistics, this page checks the number of vertices and edges every 10 seconds, which adds overhead. To maximize loading performance, move to a different page after starting loading, and only come back here occasionally to check the progress.

Start Loading

GraphStudio provides two types of loading:

  • Partial Loading: load a subset of the data files which the user selects.

  • Full Loading: load all of the data files.

Load Some Data Files

Load All Data Sources

Pause Loading

As with Start Loading, you can pause some or all of the data files that are currently loading.

Resume Loading

You can resume loading some or all of the data files that have been paused.

Stop Loading

Statistics Panel

The Statistics panel contains two tabs: Graph Statistics (1st tab) and Data Loading Statistics (2nd tab).

Graph Statistics

By default, if no data file is selected, the Statistics panel shows Graph Statistics.

The table at the top shows the total number of vertices and edges in the current graph, as well as the count for each vertex type and edge type. While loading is in progress, the line chart at the bottom shows the number of vertices and edges over time.

Data Loading Statistics

If you click on one data file, the Statistics panel will change to show Data Loading Statistics:

The table at the top shows the detailed loading information of the selected data file, including:

  • Status (RUNNING, PAUSED, STOPPED, etc.)

  • Loaded percentage (for files on the server) or loaded size (for S3 files)

  • Loading speed

  • Average loading speed

  • Number of loaded lines

  • Number of missing token lines

  • Number of oversize lines

  • Loading start time

  • Loading duration

The area chart in the middle shows the real-time loading speed (lines per second) for this data file.
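The relationship between the real-time loading speed and the average loading speed can be illustrated with a minimal sketch. It assumes the speed is derived from periodic samples of the cumulative loaded-line count; the sample format and the function names `loading_speeds` and `average_speed` are hypothetical and are not part of GraphStudio.

```python
def loading_speeds(samples):
    """Return the lines-per-second rate between consecutive samples.

    `samples` is a time-ordered list of (elapsed_seconds, cumulative_lines)
    pairs. This format is an assumption made for illustration.
    """
    speeds = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # Guard against duplicate timestamps to avoid division by zero.
        speeds.append((n1 - n0) / dt if dt > 0 else 0.0)
    return speeds


def average_speed(samples):
    """Return the overall lines-per-second rate from first to last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    duration = t1 - t0
    return (n1 - n0) / duration if duration > 0 else 0.0


# Hypothetical samples taken every 10 seconds.
samples = [(0, 0), (10, 5000), (20, 8000), (30, 8000)]
speeds = loading_speeds(samples)
avg = average_speed(samples)
```

In this sketch the per-interval speeds correspond to the points on the area chart, while the average is the total number of loaded lines divided by the loading duration.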

The pie chart at the bottom shows the distribution of data lines, among three categories:

  • Loaded lines

  • Missing token lines (the lines contain fewer tokens than required by the data mapping)

  • Oversize lines (some tokens are too large)

The number of loaded lines does not guarantee that all of those lines were loaded successfully. Data mapping issues (such as mapping a non-numeric column to an integer attribute) or dirty data may prevent some of these lines from being loaded.

If loading a data file encounters an issue and returns an error message, the error message is shown at the bottom.
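The three categories can be pictured with a small sketch. It is not GraphStudio's implementation: the separator, the required token count, the size limit, the order of the checks, and the names `classify_line` and `line_distribution` are all assumptions chosen for illustration.

```python
from collections import Counter

CATEGORIES = ("loaded", "missing_token", "oversize")


def classify_line(line, required_tokens, max_token_bytes, separator=","):
    """Assign one data line to one of the three pie-chart categories.

    `required_tokens` and `max_token_bytes` are hypothetical parameters
    standing in for what the data mapping and the loader would require.
    """
    tokens = line.rstrip("\n").split(separator)
    if len(tokens) < required_tokens:
        return "missing_token"
    if any(len(token.encode("utf-8")) > max_token_bytes for token in tokens):
        return "oversize"
    # Passing both checks does not validate token contents or types.
    return "loaded"


def line_distribution(lines, required_tokens, max_token_bytes, separator=","):
    """Count how many lines fall into each category."""
    counts = Counter(
        classify_line(line, required_tokens, max_token_bytes, separator)
        for line in lines
    )
    return {category: counts.get(category, 0) for category in CATEGORIES}


# Hypothetical rows for a mapping that needs three tokens per line.
sample = [
    "1,alice,30",
    "2,bob",                   # fewer tokens than required
    "3," + "x" * 20 + ",41",   # one token exceeds the size limit
    "4,dave,notanumber",       # counted as loaded despite the bad integer
]
distribution = line_distribution(sample, required_tokens=3, max_token_bytes=10)
```

The last row shows why a line counted as loaded may still fail to load: it has enough tokens of acceptable size, but its third token cannot be converted to an integer attribute.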

Clear Graph Data

Caution: Clear Graph Data deletes all data from your database. The schema and queries remain. This deletion is irreversible, so confirm the impact before you clear graph data.

Tip: Only users with the superuser role can clear graph data. Consider assigning other roles to your team members to avoid accidental data deletion.

After the clear operation, the vertex and edge counts in the graph statistics both drop to 0.

After data has been loaded, you can go to the Explore Graph or Write Queries pages.
