Workload Management

Certain TigerGraph operations, such as running online analytical processing (OLAP) queries that touch a lot of data, can be memory-intensive. TigerGraph provides the following mechanisms for you to manage workload in your TigerGraph instances.

Workload Queue

You can configure workload queues so that queries are routed to the appropriate queue at runtime. Each queue has a few properties, such as the maximum number of concurrent queries allowed and the maximum number of queries that can wait in the queue, which help prevent system overload. You can grant workload queues to users based on their roles so that users submit queries to the appropriate workload queue.

What APIs are managed by the workload queue?

The following types of requests will be routed to either the default workload queue or the one specified by the user:

  • Run installed queries.

  • Interpret queries.

  • Run heavy built-in queries, mostly used to "Explore Graph" in GraphStudio.

Configurations

You can toggle the workload queue feature on and off, and add, update, or delete workload queues as you need.

Put Workload Queue

POST /gsqlserver/gsql/workload-manager/configs

Upload the workload queue configs.

Request Body

The request body expects a JSON object with the following schema:

{
  "isEnabled": true,
  "queues": {
    "OLTP": {
      "description": "OLTP queries",
      "isDefault": false,
      "maxConcurrentQueries": 100,
      "maxDelayQueueSize": 200
    },
    "scheduled_jobs": {
      "description": "Scheduled jobs",
      "maxConcurrentQueries": 10,
      "maxDelayQueueSize": 20
    },
    "AdHoc": {
      "description": "Ad-hoc queries",
      "isDefault": true,
      "maxConcurrentQueries": 1,
      "maxDelayQueueSize": 2
    }
  }
}

The request body must have the following fields at the top level:

Field | Description | Data type
isEnabled | The feature flag to enable or disable the workload queue. | BOOL
queues | The map of the available workload queues. | OBJECT

Each entry under queues maps a queue ID (key) to its properties (value).

The queue ID must be a string of fewer than 64 characters, consisting only of alphanumeric characters and underscores.

Each queue has the following properties:

Field | Description | Data type
description | The description of the queue. | STRING (< 256 characters)
isDefault (optional) | The flag to indicate if the queue is the default queue. Must be set to true for exactly one queue. | BOOL
maxConcurrentQueries | The maximum number of concurrent queries allowed in the queue. | UINT in the range of (0, 131072)
maxDelayQueueSize | The maximum number of queries that can be queued in the delay queue. | UINT in the range of [0, 131072)
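
The constraints above can be checked locally before uploading a payload. The following is a minimal sketch, not part of TigerGraph: the function name validate_workload_config is hypothetical, it assumes a queue ID needs at least one character, and it treats description as required because only isDefault is marked optional.

```python
import re

# Exclusive upper bound for both limits, taken from the property table above.
LIMIT_UPPER_BOUND = 131072
# Assumption: IDs are 1 to 63 characters of letters, digits, or underscores.
QUEUE_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,63}")


def _is_uint(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def validate_workload_config(config):
    """Return a list of problems found in the payload (empty if none)."""
    problems = []
    if "isEnabled" in config and not isinstance(config["isEnabled"], bool):
        problems.append("isEnabled must be a boolean")
    queues = config.get("queues")
    if queues is None:
        return problems  # payload only toggles the feature flag
    default_count = 0
    for queue_id, props in queues.items():
        if not QUEUE_ID_PATTERN.fullmatch(queue_id):
            problems.append(f"{queue_id}: invalid queue ID")
        description = props.get("description")
        if not isinstance(description, str) or len(description) >= 256:
            problems.append(f"{queue_id}: description must be a string under 256 characters")
        if props.get("isDefault", False):
            default_count += 1
        concurrent = props.get("maxConcurrentQueries")
        if not (_is_uint(concurrent) and 0 < concurrent < LIMIT_UPPER_BOUND):
            problems.append(f"{queue_id}: maxConcurrentQueries out of range")
        delay = props.get("maxDelayQueueSize")
        if not (_is_uint(delay) and delay < LIMIT_UPPER_BOUND):
            problems.append(f"{queue_id}: maxDelayQueueSize out of range")
    if default_count != 1:
        problems.append("exactly one queue must set isDefault to true")
    return problems
```

An empty list means the payload satisfies the documented constraints; the server may still apply checks that this sketch does not cover.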

maxConcurrentQueries and maxDelayQueueSize

maxConcurrentQueries and maxDelayQueueSize are enforced at the per-machine level. More specifically, they limit how many requests a single GPE process can handle.

For example, in a TigerGraph cluster with 4 nodes (and therefore 4 GPE processes), the total number of queries allowed for a workload queue is 4 * maxConcurrentQueries. Similarly, the total number of queries that can be put into the corresponding delay queue is 4 * maxDelayQueueSize.

TigerGraph internally tries to distribute queries evenly among the nodes, so the workload queue on each GPE fills at a similar pace.

Query concurrency is also limited by the number of physical cores on the machine. For this reason, avoid setting maxConcurrentQueries of a workload queue too high; keeping it below 3 * the number of the machine's physical cores is recommended.
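
The arithmetic above can be sketched as two small helpers. The function names are hypothetical and used only for illustration; the formulas are the ones stated in this section.

```python
def cluster_capacity(num_gpe_processes, max_concurrent_queries, max_delay_queue_size):
    """Cluster-wide totals for one workload queue, given per-GPE limits."""
    return {
        "running": num_gpe_processes * max_concurrent_queries,
        "delayed": num_gpe_processes * max_delay_queue_size,
    }


def within_recommended_concurrency(max_concurrent_queries, physical_cores):
    """True if the limit stays below 3 * the machine's physical cores."""
    return max_concurrent_queries < 3 * physical_cores
```

For instance, cluster_capacity(4, 100, 200) describes the OLTP queue from the schema above on a 4-node cluster.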

After the configurations change, GPE must be restarted for the changes to take effect.

Examples

To modify the whole config:

curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/configs \
  -d '{"isEnabled":true,"queues":{"OLTP":{"description":"OLTP queries","isDefault":false,"maxConcurrentQueries":100,"maxDelayQueueSize":200},"scheduled_jobs":{"description":"Scheduled jobs","maxConcurrentQueries":10,"maxDelayQueueSize":20},"AdHoc":{"description":"Ad-hoc queries","isDefault":true,"maxConcurrentQueries":1,"maxDelayQueueSize":2}}}'

To just toggle the feature flag, simply skip queues:

curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/configs \
  -d '{"isEnabled":true}'

To add, delete, or update the queues while keeping the feature flag untouched, simply skip isEnabled:

curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/configs \
  -d '{"queues":{"OLTP":{"description":"OLTP queries","isDefault":false,"maxConcurrentQueries":100,"maxDelayQueueSize":200},"scheduled_jobs":{"description":"Scheduled jobs","maxConcurrentQueries":10,"maxDelayQueueSize":20},"AdHoc":{"description":"Ad-hoc queries","isDefault":true,"maxConcurrentQueries":1,"maxDelayQueueSize":2}}}'
Response Status Codes

Status Code | Description
200 | The queue configs have been uploaded successfully.
400 | The payload is ill-formed.
403 | The user doesn’t have the privilege WRITE_WORKLOAD_QUEUE.

GSQL Command

From a local file:

PUT WORKLOAD QUEUE FROM "/path/to/queue.json"

From a raw string:

PUT WORKLOAD QUEUE FROM "{\"queues\":{\"OLTP\":{\"description\":\"OLTP queries\",\"isDefault\":false,\"maxConcurrentQueries\":100,\"maxDelayQueueSize\":200},\"scheduled_jobs\":{\"description\":\"Scheduled jobs\",\"maxConcurrentQueries\":10,\"maxDelayQueueSize\":20},\"AdHoc\":{\"description\":\"Ad-hoc queries\",\"isDefault\":true,\"maxConcurrentQueries\":1,\"maxDelayQueueSize\":2}}}"

Get Workload Queue

GET /gsqlserver/gsql/workload-manager/configs

Dump the queue configs. The response is equivalent to the payload for POST, so you can retrieve the active configs, modify them, and upload the result. For purposes other than administration, you can use GET WORKLOAD QUEUE instead.

Example Request
curl -X GET -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/configs
Response Status Codes

Status Code | Description
200 | The queue configs have been retrieved successfully.
403 | The user doesn’t have the privilege READ_WORKLOAD_QUEUE.

GSQL Command
GET WORKLOAD QUEUE
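
The retrieve, modify, and upload workflow can be sketched with two pure helpers that operate on the retrieved JSON before it is posted back. The names upsert_queue and remove_queue are hypothetical, and the sketch assumes that the uploaded queues map replaces the existing one as a whole, which is how the POST examples above read.

```python
import copy


def upsert_queue(config, queue_id, props):
    """Return a copy of the retrieved config with one queue added or updated."""
    updated = copy.deepcopy(config)
    queues = updated.setdefault("queues", {})
    if props.get("isDefault"):
        # Keep exactly one default queue by clearing the flag elsewhere.
        for other in queues.values():
            other["isDefault"] = False
    queues[queue_id] = dict(props)
    return updated


def remove_queue(config, queue_id):
    """Return a copy of the retrieved config without the given queue."""
    updated = copy.deepcopy(config)
    updated.get("queues", {}).pop(queue_id, None)
    return updated
```

Both helpers leave the input untouched, so the original response can be kept for comparison or rollback.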

Permissions

You can grant workload queues to users, or revoke them, based on user name, group, and/or role.

Grant/Revoke Workload Queue

POST /gsqlserver/gsql/workload-manager/permission

Grant a workload queue to users, groups, and/or roles.

Request Body

The request body expects a JSON object with the following schema:

{
  "action": "GRANT",
  "queue": "OLTP",
  "group": ["*"],
  "role": ["r1", "r2"]
}

The request body must have the following fields at the top level:

Field | Description | Data type
action | GRANT or REVOKE (case insensitive) | STRING
queue | The ID of the queue to be granted or revoked. | STRING
user (optional) | The list of the user names to be granted/revoked. | STRING or STRING[]
group (optional) | The list of the group names to be granted/revoked. | STRING or STRING[]
role (optional) | The list of the role names to be granted/revoked. | STRING or STRING[]

TIP: You can use the wildcard "*" to grant the queue to, or revoke it from, all users, groups, or roles. Note that "*" must be the only entry in the list when it is used.
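
The field rules and the wildcard rule above can be enforced when the payload is assembled. This is a minimal sketch; build_permission_payload is a hypothetical helper name, and the server may validate more than is shown here.

```python
def build_permission_payload(action, queue, user=None, group=None, role=None):
    """Build the JSON body for the grant/revoke endpoint as a dict."""
    normalized = action.upper()  # the action is case insensitive
    if normalized not in ("GRANT", "REVOKE"):
        raise ValueError("action must be GRANT or REVOKE")
    payload = {"action": normalized, "queue": queue}
    for field, value in (("user", user), ("group", group), ("role", role)):
        if value is None:
            continue  # optional field left out of the payload
        # Each field accepts a single string or a list of strings.
        names = [value] if isinstance(value, str) else list(value)
        if "*" in names and len(names) != 1:
            raise ValueError(f"{field}: '*' must be the only entry in the list")
        payload[field] = names
    return payload
```

The returned dict can be serialized with json.dumps and sent as the request body.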

Example Request

Grant the queue OLTP to the users u1 and u2:

curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/permission \
  -d '{"action": "grant", "queue": "OLTP", "user": ["u1", "u2"]}'

Revoke the queue scheduled_jobs from all users and the role r1:

curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/gsqlserver/gsql/workload-manager/permission \
  -d '{"action": "REVOKE", "queue": "scheduled_jobs", "user": "*", "role": ["r1"]}'
Response Status Codes

Status Code | Description
200 | The queue has been granted/revoked successfully.
400 | The payload is ill-formed so none of the given entities could be granted/revoked.
403 | The user doesn’t have the privilege WRITE_WORKLOAD_QUEUE.

GSQL Command
# GRANT
GRANT WORKLOAD QUEUE OLTP TO USER u1, u2
GRANT WORKLOAD QUEUE OLTP TO GROUP g1, g2
GRANT WORKLOAD QUEUE OLTP TO ROLE r1, r2
GRANT WORKLOAD QUEUE OLTP TO ALL USERS
GRANT WORKLOAD QUEUE OLTP TO ALL GROUPS
GRANT WORKLOAD QUEUE OLTP TO ALL ROLES

# REVOKE
REVOKE WORKLOAD QUEUE OLTP FROM USER u1, u2
REVOKE WORKLOAD QUEUE OLTP FROM GROUP g1, g2
REVOKE WORKLOAD QUEUE OLTP FROM ROLE r1, r2
REVOKE WORKLOAD QUEUE OLTP FROM ALL USERS
REVOKE WORKLOAD QUEUE OLTP FROM ALL GROUPS
REVOKE WORKLOAD QUEUE OLTP FROM ALL ROLES
Unlike the REST API, the GSQL commands don’t allow you to specify USER, GROUP, and ROLE in a single command. You must use a separate command for each entity type.
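
Because each entity type needs its own GSQL command, one REST-style payload may map to several commands. The sketch below illustrates that mapping using the command forms listed above; to_gsql_commands is a hypothetical helper, not a TigerGraph utility.

```python
def to_gsql_commands(payload):
    """Translate a grant/revoke payload into one GSQL command per entity type."""
    action = payload["action"].upper()
    if action not in ("GRANT", "REVOKE"):
        raise ValueError("action must be GRANT or REVOKE")
    preposition = "TO" if action == "GRANT" else "FROM"
    entity_types = (
        ("user", "USER", "USERS"),
        ("group", "GROUP", "GROUPS"),
        ("role", "ROLE", "ROLES"),
    )
    commands = []
    for field, singular, plural in entity_types:
        value = payload.get(field)
        if not value:
            continue  # field absent or empty: nothing to emit
        names = [value] if isinstance(value, str) else list(value)
        if names == ["*"]:
            target = f"ALL {plural}"
        else:
            target = f"{singular} {', '.join(names)}"
        commands.append(
            f"{action} WORKLOAD QUEUE {payload['queue']} {preposition} {target}"
        )
    return commands
```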

Show Workload Queue

GET /gsqlserver/gsql/workload-manager/permission

Show info on a specific workload queue or all.

Query Parameters
Parameter | Description | Data type
id (optional) | The ID of the queue to be shown. If not specified, all queues will be shown. | STRING

Example Request

To retrieve the permission info of the queue OLTP:

curl -X GET -u tigergraph:tigergraph \
  "localhost:14240/gsqlserver/gsql/workload-manager/permission?id=OLTP"
Example Response

The response will be the combination of configs and permission, e.g.

{
  "OLTP": {
    "description": "OLTP queries",
    "isDefault": false,
    "maxConcurrentQueries": 100,
    "maxDelayQueueSize": 200,
    "granted": {
      "USER": [],
      "GROUP": ["*"],
      "ROLE": ["r1", "r2"]
    }
  }
}
Response Status Codes
Status Code | Description
200 | The queue info has been retrieved successfully.
403 | The user doesn’t have the privilege READ_WORKLOAD_QUEUE.

GSQL Command

To show the permission info of all queues:

SHOW WORKLOAD QUEUE

To show the permission info of a specific queue, for example OLTP:

SHOW WORKLOAD QUEUE OLTP

List Workload Queue

GET /restpp/workload-manager/queue

List all workload queues granted to the current user so that the user can choose the appropriate queue from the list.

Example Request
curl -X GET -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/restpp/workload-manager/queue
Example Response

The response will include the information available to the general users.

[
  {
    "id": "AdHoc",
    "description": "Ad-hoc queries",
    "isDefault": true
  },
  {
    "id": "OLTP",
    "description": "OLTP queries"
  }
]
Response Status Codes
Status Code | Description
200 | The queue info has been retrieved successfully.
403 | The user doesn’t have the privilege READ_DATA.

Use Cases

Suppose the SHOW WORKLOAD QUEUE command returns the following workload queue configuration:

{
  "OLTP": {
    "description": "OLTP queries",
    "isDefault": true,
    "maxConcurrentQueries": 100,
    "maxDelayQueueSize": 100,
    "granted": {
      "USER": [],
      "GROUP": ["g1", "g2"],
      "ROLE": []
    }
  },
  "scheduled_jobs": {
    "description": "Scheduled jobs",
    "maxConcurrentQueries": 5,
    "maxDelayQueueSize": 0,
    "granted": {
      "USER": ["u1"],
      "GROUP": [],
      "ROLE": ["r1"]
    }
  },
  "AdHoc": {
    "description": "Ad-hoc queries",
    "isDefault": false,
    "maxConcurrentQueries": 10,
    "maxDelayQueueSize": 10,
    "granted": {
      "USER": [],
      "GROUP": ["g3"],
      "ROLE": ["r2"]
    }
  }
}
Running a Query

When running a query, you can specify the workload queue to run the query on. If the queue is not specified, the query will be routed to the default queue. To specify the queue in the GSQL shell, you can use the -queue option, e.g.

RUN QUERY -queue AdHoc q1()

or you can use the HTTP header Workload-Queue:

curl -X POST -u tigergraph:tigergraph \
  -H "Workload-Queue: AdHoc" \
  "<hostname>:14240/restpp/query/ldbc_snb/q1"

If the given queue is not granted to the current user, the query is rejected with the error code REST-14000 and the response status is HTTP 422 Unprocessable Entity.

For example, if the user tigergraph, who neither belongs to the group g3 nor holds the role r2, tries to run a query on the queue AdHoc, the query will be rejected.

If the queue is at capacity, the query will be rejected.
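
The routing and permission rules above can be illustrated against the configuration from this use case. This is a sketch of the described behavior, not TigerGraph's implementation: resolve_queue and is_granted are hypothetical names, and the sketch assumes that a "*" entry matches any user, or any group or role the user actually has.

```python
def resolve_queue(queues, requested=None):
    """Return the requested queue ID, or the default queue if none is given."""
    if requested is not None:
        if requested not in queues:
            raise KeyError(requested)
        return requested
    for queue_id, props in queues.items():
        if props.get("isDefault"):
            return queue_id
    raise LookupError("no default queue configured")


def is_granted(queue, user, groups=(), roles=()):
    """Check the queue's granted lists against a user's name, groups, and roles."""
    granted = queue.get("granted", {})

    def matches(allowed, candidates):
        # Assumption: "*" matches every candidate the user actually has.
        return any(c in allowed or "*" in allowed for c in candidates)

    return (
        matches(granted.get("USER", []), [user])
        or matches(granted.get("GROUP", []), groups)
        or matches(granted.get("ROLE", []), roles)
    )
```

With the configuration above, a user in group g3 passes the check for AdHoc, while the user tigergraph with no matching group or role does not.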

Monitoring

You can use the following API to check the status of the workload queues for monitoring purposes.

Check Running Queries
POST /restpp/workload-manager/queuestatus

Return the status of the given workload queue on each GPE instance.

Request Body

Field | Description | Data type
queuelist (optional) | The list of workload queue IDs. If not specified, all queues will be shown. | STRING[]
mode (optional) | stats or verbose (case-sensitive). If not specified, stats will be used. | STRING

For the mode field, stats returns only the numbers of waiting and running queries, while verbose also includes the request IDs of the queries that are waiting and running.

If no request body is provided, the response is generated as if both fields used their default values.

Example Request
curl -X POST -u tigergraph:tigergraph \
  <hostname>:<nginx-port>/restpp/workload-manager/queuestatus \
   -d '{"queuelist": ["AdHoc"], "mode": "verbose"}'
Example Response
{
  "version": {
    "edition": "enterprise",
    "api": "v2",
    "schema": 0
  },
  "error": false,
  "message": "Completes",
  "WorkloadQueueStatusByInstances": [
    {
      "version": {
        "edition": "enterprise",
        "api": "v2",
        "schema": 0
      },
      "error": false,
      "message": "",
      "results": {
        "GPE_2_1": [
          {
            "WorkloadQueueName": "AdHoc",
            "maxConcurrentQueries": 1,
            "maxDelayQueueSize": 2,
            "runningQueries": [
              "196702.RESTPP_1_1.1707799387957.N"
            ],
            "delayQueries": [
              "65630.RESTPP_1_1.1707799387958.N"
            ]
          }
        ]
      }
    },
    {
      "version": {
        "edition": "enterprise",
        "api": "v2",
        "schema": 0
      },
      "error": false,
      "message": "",
      "results": {
        "GPE_1_1": [
          {
            "WorkloadQueueName": "AdHoc",
            "maxConcurrentQueries": 1,
            "maxDelayQueueSize": 2,
            "runningQueries": [
              "94.RESTPP_1_1.1707799387957.N"
            ],
            "delayQueries": [
              "131167.RESTPP_1_1.1707799387959.N"
            ]
          }
        ]
      }
    }
  ],
  "code": "REST-0000"
}
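
A verbose response like the one above can be condensed into per-GPE counts for dashboards or alerts. The sketch below is illustrative only: summarize_queue_status is a hypothetical name, and it relies only on the field names visible in this example response, so it does not handle the stats mode, whose response shape is not shown here.

```python
def summarize_queue_status(response):
    """Map (GPE instance, queue name) to running and delayed query counts."""
    summary = {}
    for instance in response.get("WorkloadQueueStatusByInstances", []):
        for gpe_name, queues in instance.get("results", {}).items():
            for queue in queues:
                key = (gpe_name, queue["WorkloadQueueName"])
                summary[key] = {
                    "running": len(queue.get("runningQueries", [])),
                    "delayed": len(queue.get("delayQueries", [])),
                    "maxConcurrentQueries": queue.get("maxConcurrentQueries"),
                    "maxDelayQueueSize": queue.get("maxDelayQueueSize"),
                }
    return summary
```

Comparing the running and delayed counts with the two limits shows how close each GPE is to rejecting new queries for that queue.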

Other Query Concurrency Control Methods

Limit the number of concurrent built-in heavy queries

This configuration is deprecated as of TG 3.10.0 and will be removed in a future release. This is ignored once the workload queue feature is enabled.

TigerGraph has a few built-in queries that are memory-intensive, here referred to as "heavy". These queries tend to be invoked by applications such as GraphStudio. You can set a limit of how many of these heavy queries are allowed to run concurrently by configuring the parameter RESTPP.WorkLoadManager.MaxHeavyBuiltinQueries with the gadmin config command.

For example, to set the maximum number of heavy built-in queries to 10, run the following command:

$ gadmin config set RESTPP.WorkLoadManager.MaxHeavyBuiltinQueries 10

You must restart the RESTPP service for the change to take effect.

Limit number of concurrent queries

This configuration is deprecated as of TG 3.10.0 and will be removed in a future release. This is ignored once the workload queue feature is enabled.

You can use the RESTPP.WorkLoadManager.MaxConcurrentQueries parameter to set a limit of how many queries are allowed to be running concurrently. The count of these queries does not include the built-in heavy queries.

For example, to specify that there can only be 50 concurrent queries at a time, excluding the heavy built-in queries, change the value of the configuration parameter to 50 with the gadmin config command:

$ gadmin config set RESTPP.WorkLoadManager.MaxConcurrentQueries 50

If the maximum number of concurrent queries is reached, newly submitted queries are placed in a delay queue and begin to run as the currently running queries finish. If the delay queue is at capacity, newly submitted queries are rejected, and you need to wait until there is capacity before running the query again. You can adjust the size of the queue with the configuration parameter RESTPP.WorkLoadManager.MaxDelayQueueSize.

For example, to specify that a maximum of 20 queries may remain in the queue, run the following command:

$ gadmin config set RESTPP.WorkLoadManager.MaxDelayQueueSize 20

You must restart the RESTPP service for the change to take effect.
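
The run, delay, and reject behavior described above can be modeled with a small simulation. This is an illustrative sketch rather than RESTPP's actual code: the class AdmissionQueue is hypothetical, and it assumes that delayed queries start in first-in, first-out order.

```python
from collections import deque


class AdmissionQueue:
    """Toy model of a concurrency limit backed by a bounded delay queue."""

    def __init__(self, max_concurrent, max_delay):
        self.max_concurrent = max_concurrent
        self.max_delay = max_delay
        self.running = set()
        self.delayed = deque()

    def submit(self, query_id):
        """Return 'running', 'delayed', or 'rejected' for a new query."""
        if len(self.running) < self.max_concurrent:
            self.running.add(query_id)
            return "running"
        if len(self.delayed) < self.max_delay:
            self.delayed.append(query_id)
            return "delayed"
        return "rejected"

    def finish(self, query_id):
        """Mark a query as done; return the delayed query that starts, if any."""
        self.running.discard(query_id)
        if self.delayed and len(self.running) < self.max_concurrent:
            promoted = self.delayed.popleft()  # assumption: FIFO order
            self.running.add(promoted)
            return promoted
        return None
```

With AdmissionQueue(50, 20), the 51st concurrent query is delayed and the 71st is rejected until a running query finishes.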

Specify number of threads used by a query

You can specify the limit of the number of threads that can be used by one query through the Run Query REST endpoint.

For example, to specify a limit of four threads that can be used by a query, use the GSQL-THREAD-LIMIT header and set its value to 4:

Specify that the query run with a limit of 4 threads
curl -X POST -H "GSQL-THREAD-LIMIT: 4" -d '{"p":{"id":"Tom","type":"person"}}' "http://localhost:9000/query/social/hello"
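
The same request can be prepared from Python with the standard library. The sketch below only builds the request object and does not send it; build_run_query_request is a hypothetical helper, and the header names are the ones used in this document (GSQL-THREAD-LIMIT and Workload-Queue).

```python
import json
import urllib.request


def build_run_query_request(url, params, thread_limit=None, workload_queue=None):
    """Prepare a POST request for the Run Query endpoint with optional headers."""
    headers = {}
    if thread_limit is not None:
        headers["GSQL-THREAD-LIMIT"] = str(thread_limit)
    if workload_queue is not None:
        headers["Workload-Queue"] = workload_queue
    # Like curl -d, no explicit Content-Type is set here.
    return urllib.request.Request(
        url,
        data=json.dumps(params).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_run_query_request(
    "http://localhost:9000/query/social/hello",
    {"p": {"id": "Tom", "type": "person"}},
    thread_limit=4,
)
```

Passing the prepared object to urllib.request.urlopen sends it; authentication, if the instance requires it, is left out of this sketch.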

Specify replica to run query on

On a distributed cluster, you can specify on which replica you want a query to be run through the Run Query REST endpoint.

For example, to run the query on the primary cluster, use the GSQL-REPLICA header when running a query and set its value to 1:

Specify that the query run on the primary cluster
curl -X POST -H "GSQL-REPLICA: 1" -d '{"p":{"id":"Tom","type":"person"}}' \
  "http://localhost:9000/query/social/hello"

Query Routing Schemes

In a distributed or replicated cluster, REST++ automatically routes queries to different GPEs, in order to spread the workload.

If the GSQL-REPLICA header is used when invoking a query, it overrides the routing scheme for that query.

Round Robin routing

The default query routing scheme is round-robin. The first query is managed by GPE 0, the next query by GPE 1, and so on. After the last GPE, the cycle returns to GPE 0.

Version 3.9.3 adds a system configuration parameter RESTPP.CPULoadAware.Mode to enable system administrators to select other query routing schemes:

  • Mode = 0 (default): Round-Robin

  • Mode = 1: CPU Load Aware

CPU Load Aware Query Routing

When this query routing mode is selected, REST++ tries to direct incoming queries to the GPEs that are currently less busy.

Specifically, the system periodically polls CPU usage data to find a GPE whose CPU usage percentage is below RESTPP.QueryRouting.TargetSelectionCPUThreshold (default 50).

If no GPE satisfies the CPU threshold condition, REST++ falls back to the default behavior (round-robin selection).
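
The two routing schemes and the fallback between them can be sketched as follows. This is an illustration of the behavior described above, not REST++ code: QueryRouter is a hypothetical class, and choosing the least loaded GPE among those below the threshold is an assumption, since the text only says that a GPE below the threshold is selected.

```python
import itertools


class QueryRouter:
    """Toy router: CPU-aware selection with round-robin fallback."""

    def __init__(self, num_gpes, cpu_threshold=50.0):
        self._round_robin = itertools.cycle(range(num_gpes))
        self.cpu_threshold = cpu_threshold

    def pick_round_robin(self):
        """Return GPE 0, then GPE 1, and so on, wrapping after the last GPE."""
        return next(self._round_robin)

    def pick_cpu_aware(self, cpu_usage):
        """cpu_usage holds the latest polled CPU percentage for each GPE index."""
        below = [i for i, usage in enumerate(cpu_usage) if usage < self.cpu_threshold]
        if below:
            # Assumption: prefer the least busy GPE under the threshold.
            return min(below, key=lambda i: cpu_usage[i])
        # No GPE is under the threshold: fall back to round-robin.
        return self.pick_round_robin()
```

Lowering cpu_threshold makes the fallback to round-robin more frequent, because fewer GPEs qualify as less busy.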

Example: Change CPU Load Threshold and Enable CPU Load Aware routing
$ gadmin config entry RESTPP.QueryRouting.TargetSelectionCPUThreshold 40
$ gadmin config entry RESTPP.QueryRouting.Mode 1