Data Loaders

Data loaders are classes in the pyTigerGraph Graph Data Science (GDS) module. You can define an instance of each data loader class through a factory function.

Requires querywriters user permissions for full functionality.

NeighborLoader

A data loader that performs neighbor sampling. You can declare a NeighborLoader instance with the factory function neighborLoader().

A neighbor loader is an iterable. When you loop through a neighbor loader instance, it loads one batch of data from the graph to which you established a connection.

In every iteration, it first chooses a specified number of vertices as seeds, then picks a specified number of neighbors of each seed at random, then the same number of neighbors of each neighbor, and repeat for a specified number of hops. It loads both the vertices and the edges connecting them to their neighbors. The vertices sampled this way along with their edges form one subgraph and is contained in one batch.

You can iterate on the instance until every vertex has been picked as seed.

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself

Examples:

The following example iterates over a neighbor loader instance.

for i, batch in enumerate(neighbor_loader):
print("----Batch {}----".format(i))
print(batch)

See the ML Workbench tutorial notebook for examples. See more details about the specific sampling method in Inductive Representation Learning on Large Graphs.

fetch()

fetch(vertices: list) → None

Fetch neighborhood subgraphs for specific vertices.

Parameter:

vertices (list of dict): Vertices to fetch with their neighborhood subgraphs. Each vertex corresponds to a dict with two mandatory keys {"primary_id": …, "type": …}

EdgeLoader

Data loader that loads all edges from the graph in batches. You can define an edge loader using the edgeLoader() factory function.

An edge loader instance is an iterable. When you loop through an edge loader instance, it loads one batch of data from the graph to which you established a connection in each iteration. The size and total number of batches are specified when you define the edge loader instance.

The boolean attribute provided to filter_by indicates which edges are included. If you need random batches, set shuffle to True.

data

data → Any

A property of the instance. The data property stores all edges if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself.

Examples:

The following for loop prints every edge in batches.

Input
Output

edge_loader = conn.gds.edgeLoader(
num_batches=10,
attributes=["time", "is_train"],
shuffle=True,
filter_by=None
)
for i, batch in enumerate(edge_loader):
print("----Batch {}: Shape {}----".format(i, batch.shape))
print(batch.head(1))

----Batch 0: Shape (1129, 4)----
source    target  time  is_train
0  3145728  22020185     0         1
----Batch 1: Shape (1002, 4)----
source    target  time  is_train
0  1048577  20971586     0         1
----Batch 2: Shape (1124, 4)----
source   target  time  is_train
0       4  9437199     0         1
----Batch 3: Shape (1071, 4)----
source    target  time  is_train
0  11534340  32505859     0         1
----Batch 4: Shape (978, 4)----
source    target  time  is_train
0  11534341  16777293     0         1
----Batch 5: Shape (1149, 4)----
source   target  time  is_train
0  5242882  2097158     0         1
----Batch 6: Shape (1013, 4)----
source    target  time  is_train
0  4194305  23068698     0         1
----Batch 7: Shape (1037, 4)----
source   target  time  is_train
0  7340035  4194337     0         0
----Batch 8: Shape (1067, 4)----
source   target  time  is_train
0       3  1048595     0         1
----Batch 9: Shape (986, 4)----
source    target  time  is_train
0  9437185  13631508     0         1

See the ML Workbench edge loader tutorial notebook for examples.

VertexLoader

Data loader that loads all vertices from the graph in batches.

A vertex loader instance is an iterable. When you loop through a vertex loader instance, it loads one batch of data from the graph to which you established a connection in each iteration. The size and total number of batches are specified when you define the vertex loader instance.

The boolean attribute provided to filter_by indicates which vertices are included. If you need random batches, set shuffle to True.

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself.

Examples:

The following for loop loads all vertices in the graph and prints one from each batch:

Input
Output

vertex_loader = conn.gds.vertexLoader(
num_batches=10,
attributes=["time", "is_train"],
shuffle=True,
filter_by=None
)

for i, batch in enumerate(edge_loader):
print("----Batch {}: Shape {}----".format(i, batch.shape))
print(batch.head(1)) (1)

1	Since the example does not provide an output format, the output format defaults to panda frames, have access to the methods of panda frame instances.

----Batch 0: Shape (1129, 4)----
source    target  time  is_train
0  3145728  22020185     0         1
----Batch 1: Shape (1002, 4)----
source    target  time  is_train
0  1048577  20971586     0         1
----Batch 2: Shape (1124, 4)----
source   target  time  is_train
0       4  9437199     0         1
----Batch 3: Shape (1071, 4)----
source    target  time  is_train
0  11534340  32505859     0         1
----Batch 4: Shape (978, 4)----
source    target  time  is_train
0  11534341  16777293     0         1
----Batch 5: Shape (1149, 4)----
source   target  time  is_train
0  5242882  2097158     0         1
----Batch 6: Shape (1013, 4)----
source    target  time  is_train
0  4194305  23068698     0         1
----Batch 7: Shape (1037, 4)----
source   target  time  is_train
0  7340035  4194337     0         0
----Batch 8: Shape (1067, 4)----
source   target  time  is_train
0       3  1048595     0         1
----Batch 9: Shape (986, 4)----
source    target  time  is_train
0  9437185  13631508     0         1

See the ML Workbench tutorial notebook for more examples.

GraphLoader

Data loader that loads all edges from the graph in batches, along with the vertices that are connected with each edge.

Different from NeighborLoader which produces connected subgraphs, this loader loads all edges by batches and vertices attached to those edges.

There are two ways to use the data loader:

It can be used as an iterable, which means you can loop through it to get every batch of data. If you load all data at once (num_batches=1), there will be only one batch (of all the data) in the iterator.
You can access the data property of the class directly. If there is only one batch of data to load, it will give you the batch directly instead of an iterator, which might make more sense in that case. If there are multiple batches of data to load, it will return the loader itself.

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself

Examples:

The following for loop prints all edges and their connected vertices in batches. The output format is PyG:

Input
Output

graph_loader = conn.gds.graphLoader(
num_batches=10,
v_in_feats = ["x"],
v_out_labels = ["y"],
v_extra_feats = ["train_mask", "val_mask", "test_mask"],
e_in_feats=["time"],
e_out_labels=[],
e_extra_feats=["is_train", "is_val"],
output_format = "PyG",
shuffle=True,
filter_by=None
)
for i, batch in enumerate(graph_loader):
print("----Batch {}----".format(i))
print(batch)

----Batch 0----
Data(edge_index=[2, 1128], edge_feat=[1128], is_train=[1128], is_val=[1128], x=[1061, 1433], y=[1061], train_mask=[1061], val_mask=[1061], test_mask=[1061])
----Batch 1----
Data(edge_index=[2, 997], edge_feat=[997], is_train=[997], is_val=[997], x=[1207, 1433], y=[1207], train_mask=[1207], val_mask=[1207], test_mask=[1207])
----Batch 2----
Data(edge_index=[2, 1040], edge_feat=[1040], is_train=[1040], is_val=[1040], x=[1218, 1433], y=[1218], train_mask=[1218], val_mask=[1218], test_mask=[1218])
----Batch 3----
Data(edge_index=[2, 1071], edge_feat=[1071], is_train=[1071], is_val=[1071], x=[1261, 1433], y=[1261], train_mask=[1261], val_mask=[1261], test_mask=[1261])
----Batch 4----
Data(edge_index=[2, 1091], edge_feat=[1091], is_train=[1091], is_val=[1091], x=[1163, 1433], y=[1163], train_mask=[1163], val_mask=[1163], test_mask=[1163])
----Batch 5----
Data(edge_index=[2, 1076], edge_feat=[1076], is_train=[1076], is_val=[1076], x=[1018, 1433], y=[1018], train_mask=[1018], val_mask=[1018], test_mask=[1018])
----Batch 6----
Data(edge_index=[2, 1054], edge_feat=[1054], is_train=[1054], is_val=[1054], x=[1249, 1433], y=[1249], train_mask=[1249], val_mask=[1249], test_mask=[1249])
----Batch 7----
Data(edge_index=[2, 1006], edge_feat=[1006], is_train=[1006], is_val=[1006], x=[1185, 1433], y=[1185], train_mask=[1185], val_mask=[1185], test_mask=[1185])
----Batch 8----
Data(edge_index=[2, 1061], edge_feat=[1061], is_train=[1061], is_val=[1061], x=[1250, 1433], y=[1250], train_mask=[1250], val_mask=[1250], test_mask=[1250])
----Batch 9----
Data(edge_index=[2, 1032], edge_feat=[1032], is_train=[1032], is_val=[1032], x=[1125, 1433], y=[1125], train_mask=[1125], val_mask=[1125], test_mask=[1125])

See the ML Workbench tutorial notebook for graph loaders for examples.

EdgeNeighborLoader

A data loader that performs neighbor sampling from seed edges. You can declare a EdgeNeighborLoader instance with the factory function edgeNeighborLoader().

An edge neighbor loader is an iterable. When you loop through a loader instance, it loads one batch of data from the graph to which you established a connection.

In every iteration, it first chooses a specified number of edges as seeds, then starting from the vertices attached to those seed edges, it picks a specified number of neighbors of each vertex at random, then the same number of neighbors of each neighbor, and repeat for a specified number of hops. It loads both the vertices and the edges connecting them to their neighbors. The edges and vertices sampled this way form one subgraph and is contained in one batch.

You can iterate on the instance until every edge has been picked as seed.

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself

Examples:

The following example iterates over an edge neighbor loader instance.

for i, batch in enumerate(edge_neighbor_loader):
print("----Batch {}----".format(i))
print(batch)

See the ML Workbench tutorial notebook for examples.

NodePieceLoader

A data loader that performs NodePiece sampling from the graph. You can declare a NodePieceLoader instance with the factory function nodePieceLoader().

A NodePiece loader is an iterable. When you loop through a loader instance, it loads one batch of data from the graph to which you established a connection.

In every iteration, the NodePiece loader selects a group of seed vertices of size batch size. For each vertex in the batch, it will produce a set of the k closest "anchor" vertices in the graph, as well as up to j edge types. For more information on the NodePiece data loading scheme, the blog article and paper are good places to start.

You can iterate on the instance until every vertex has been picked as seed.

Examples:

The following example iterates over an NodePiece loader instance.

for i, batch in enumerate(node_piece_loader):
print("----Batch {}----".format(i))
print(batch)

saveTokens()

saveTokens(filename) → None

Save tokens to pickle file

Parameter:

filename (str): Filename to save the tokens to.

reinstall_query()

reinstall_query() → str

Reinstall the dataloader query.

Returns:

The name of the query installed (str)

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself.

fetch()

fetch(vertices: list) → None

Fetch NodePiece results (anchors, distances, and relational context) for specific vertices.

Parameter:

vertices (list of dict): Vertices to fetch with their NodePiece results. Each vertex corresponds to a dict with two mandatory keys {"primary_id": …, "type": …}

precompute()

precompute() → None

Compute NodePiece results (anchors and their distances) to cache attribute.

HGTLoader

A data loader that performs stratified neighbor sampling as in Heterogeneous Graph Transformer. You can declare a HGTLoader instance with the factory function hgtLoader().

A HGT loader is an iterable. When you loop through a HGT loader instance, it loads one batch of data at a time from the graph.

In every iteration, it first chooses a specified number of vertices as seeds, then picks a specified number of neighbors of each type at random, then the specified number of neighbors of every type of each neighbor, and repeat for a specified number of hops. It loads both the vertices and the edges connecting them to their neighbors. The vertices sampled this way along with their edges form one subgraph and is contained in one batch.

You can iterate on the instance until every vertex has been picked as seed.

Examples:

The following example iterates over a HGT loader instance.

for i, batch in enumerate(hgt_loader):
print("----Batch {}----".format(i))
print(batch)

See more details about the specific sampling method in Heterogeneous Graph Transformer.

data

data → Any

A property of the instance. The data property stores all data if all data is loaded in a single batch. If there are multiple batches of data, the data property returns the instance itself

fetch()

fetch(vertices: list) → None

Fetch neighborhood subgraphs for specific vertices.

Parameter:

vertices (list of dict): Vertices to fetch with their neighborhood subgraphs. Each vertex corresponds to a dict with two mandatory keys {"primary_id": …, "type": …}