Data Splitting Functions

This class contains functions for data splitting.

RandomVertexSplitter

The split results are stored in the provided vertex attributes. Each boolean attribute indicates which part a vertex belongs to.

Usage:

  • A random 60% of vertices will have their attribute "attr_name" set to True, and others False. attr_name can be any attribute that exists in the database (same below). Example:

    conn = TigerGraphConnection(...)
    splitter = RandomVertexSplitter(conn, timeout, attr_name=0.6)
    splitter.run()
  • A random 60% of vertices will have their attribute "attr_name" set to True, and a random 20% of vertices will have their attribute "attr_name2" set to True. The two parts are disjoint. Example:

    conn = TigerGraphConnection(...)
    splitter = RandomVertexSplitter(conn, timeout, attr_name=0.6, attr_name2=0.2)
    splitter.run()
  • A random 60% of vertices will have their attribute "attr_name" set to True, a random 20% of vertices will have their attribute "attr_name2" set to True, and another random 20% of vertices will have their attribute "attr_name3" set to True. The three parts are disjoint. Example:

    conn = TigerGraphConnection(...)
    splitter = RandomVertexSplitter(conn, timeout, attr_name=0.6, attr_name2=0.2, attr_name3=0.2)
    splitter.run()

Parameters:

  • conn (TigerGraphConnection): Connection to TigerGraph database.

  • timeout (int, optional): Timeout value for the operation. Defaults to 600000.

run()

run() → None

Perform the split.

The split ratios set in initialization can be overridden here.

For example:

splitter = RandomVertexSplitter(conn, timeout, attr_name=0.6);
splitter.run(attr_name=0.3)

The spliter above uses the ratio 0.3 instead of 0.6.

RandomEdgeSplitter

The split results are stored in the provided edge attributes. Each boolean attribute indicates which part an edge belongs to.

Usage:

  • A random 60% of edges will have their attribute "attr_name" set to True, and others False. attr_name can be any attribute that exists in the database (same below). Example:

    conn = TigerGraphConnection(...)
    splitter = conn.gds.edgeSplitter(timeout, attr_name=0.6)
    splitter.run()
  • A random 60% of edges will have their attribute "attr_name" set to True, and a random 20% of edges will have their attribute "attr_name2" set to True. The two parts are disjoint. Example:

    conn = TigerGraphConnection(...)
    splitter = conn.gds.edgeSplitter(timeout, attr_name=0.6, attr_name2=0.2)
    splitter.run()
  • A random 60% of edges will have their attribute "attr_name" set to True, a random 20% of edges will have their attribute "attr_name2" set to True, and another random 20% of edges will have their attribute "attr_name3" set to True. The three parts are disjoint. Example:

    conn = TigerGraphConnection(...)
    splitter = conn.gds.edgeSplitter(timeout, attr_name=0.6, attr_name2=0.2, attr_name3=0.2)
    splitter.run()

Parameters:

  • conn (TigerGraphConnection): Connection to TigerGraph database.

  • timeout (int, optional): Timeout value for the operation. Defaults to 600000.

run()

run() → None

Perform the split.

The split ratios set in initialization can be overridden here. For example:

splitter = RandomVertexSplitter(conn, timeout, attr_name=0.6);
splitter.run(attr_name=0.3)

The splitter above uses the ratio 0.3 instead of 0.6.