Statistics REST APIs (Preview Feature)

The endpoints on this page calculate cardinality and histogram statistics of vertex and edge types and attributes. The statistics are required for effective optimization.

Query Optimizer is currently a Preview Feature. Preview Features give users an early look at future production-level features.

Preview Features should not be used for production deployments.

Cardinality

The following endpoints compute, fetch and delete vertex and edge counts (cardinality) of a graph.

Compute cardinality statistics

POST :14240/gsql/v1/stats/cardinality

This endpoint computes the cardinality statistics for the vertex and edge types specified in the request. The cardinality statistics describe the number of vertices of a specified type, and the number of edges of a specified type between a specified pair of vertex types.

The following header is also required: -H 'Content-Type: application/json'

Parameters

Parameter Description Type

graph

Name of the graph. Required.

STRING

vertex

Vertex type to compute statistics for. You must provide either the vertex or edge parameter.

STRING

edge

Edge type to compute statistics for. You must provide either the vertex or edge parameter, but not both. If you provided it an edge type, you also need to provide the from and to parameters.

STRING

from

The source vertex type of an edge. Required if edge is provided.

STRING

to

The target vertex type of an edge. Required if edge is provided.

STRING

Examples

The following request computes cardinality for vertex type Person in the graph ldbc_snb:

curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb&vertex=Person"

The following request computes cardinality for edge type LIKES from Person to Post in the graph ldbc_snb:

curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb&edge=LIKES&from=Person&to=Post"

Retrieve cardinality statistics

GET :14240/gsql/v1/stats/cardinality

This endpoint retrieves the cardinality statistics of a graph.

Parameters

Parameter Description Data type

graph

Name of the graph. Required.

STRING

Examples

The following request retrieves the cardinality data for graph ldbc_snb:

curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X GET "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb"

Delete cardinality statistics

DELETE :14240/gsql/v1/stats/cardinality

This endpoint deletes the cardinality statistics of a graph.

Parameters

Parameter Description Data type

graph

Name of the graph. Required.

STRING

Examples

The following request deletes the cardinality data for graph ldbc_snb:

curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X DELETE "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb"

Histogram

Histograms store information about the distribution of attribute values. They are computed using an Equi-Width computation method. This means, given a target bucket count, a histogram will be computed with up to that many buckets (see bucket parameter for more details).

Bucket Count: Increasing the number of buckets will increase the granularity of the computed histogram, while decreasing the number of buckets will increase compaction. Choosing the right bucket count should be done based on the use case first, then adjusted afterwards if necessary. Please see recommendations below for best practices.

NDV (Number of Distinct Values): This value is approximated for each bucket.

  • Integral datatypes (UINT, INT, DATETIME): the NDV value represents an upper bound (the actual NDV may be lower).

  • Floating point numeric values (FLOAT, DOUBLE): the NDV is approximated by assuming all values are distinct.

  • Primary id’s: the NDV represents the actual NDV and no approximation is used.

Supported Data Types: UINT, INT, DATETIME, FLOAT, DOUBLE

Compute histogram statistics

POST :14240/gsql/v1/stats/histogram

This endpoint computes histograms for a specified attribute of a vertex or edge type.

READ_SCHEMA privilege is required for the graph that the histogram attribute belongs to. When gathering for multiple graphs at once (see Compute Multiple Histograms in one request), READ_SCHEMA is required on ALL of those graphs.

The following header is also required: -H 'Content-Type: application/json'

Parameters

Additional options can be specified in the header using -H: GSQL-Timeout indicates timeout for the request. In the case of multiple computations, this timeout covers the overall timeout, including all the histogram computations. GSQL-Thread-Limit indicates the thread limit per histogram computation. In the case of multiple computations this thread limit will apply to each histogram computation individually.

Note that multiple headers can be included, for example: -H 'Content-Type: application/json' -H 'GSQL-Timeout: 1000000' …​

Request body

The request body expects a JSON object with the following schema:

  • Vertex attribute

  • Edge attribute

{
    "graph": <graph_name>,
    "vertex": <vertex_type_name>
    "attribute": <attribute_name>
    "bucket" <bucket_number>
}
{
    "graph": <graph_name>,
    "edge": <edge_type_name>,
    "attribute": <attribute_name>,
    "bucket" <bucket_number>,
    "from`: <source_vertex_type>,
    "to": <target_vertex_type>
}
Parameter Description Data type

graph

Name of the graph.

STRING

vertex

Name of the vertex type.

STRING

edge

Name of the refined edge type.

STRING

attribute

Name of the attribute.

STRING

bucket

The number of intervals (buckets) to divide the range of attribute values by.

The default value is 256. With integral datatypes, it may be possible that the provided bucket count is too high for the range of values. In such cases, the highest number of buckets that can be used for that range of values will be used instead.

INT

from

Source vertex type of a refined edge type.

STRING

to

Target edge type of a refined edge type.

STRING

Examples

The following request computes the histogram data for attribute length of vertex type Comment on graph test_graph:

  • Histogram Computation Request

  • Results

Vertex Attribute Histogram Computation
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length","bucket":10}'
{
  "error": false,
  "message": "Histogram stats gathering completed",
  "results": [
    {
      "success": "[1/1]",
      "failure": []
    }
  ]
}

The following request computes the histogram data for attribute joinDate from edge type COMP_MEMBER between from vertex compForum and to vertex compPerson on graph test_graph:

  • Histogram Computation Request

  • Results

Edge Attribute Histogram Computation
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate","bucket":10}'
{
  "error": false,
  "message": "Histogram stats gathering completed",
  "results": [
    {
      "success": "[1/1]",
      "failure": []
    }
  ]
}

Compute Multiple Histograms together

In addition to requesting individual histograms through the POST endpoint, it is also possible to request all valid histograms on a per schema object basis. The`bucket` parameter can still be specified and will be applied to every histogram being computed in the request.

Respecting the dependency of lower level schema objects, this can be done as follows:

// Compute Histograms for ALL valid attributes (default bucket value)
-d '{}'
// Compute Histograms for ALL valid attributes (bucket specified)
-d '{"bucket":"10"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// (default bucket value)
-d '{"graph":"test_graph"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// vertex Comment
// (default bucket value)
-d '{"graph":"test_graph", "vertex":"Comment"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// vertex Comment
// (default bucket value)
-d '{"graph":"test_graph", "vertex":"Comment"}'
// Compute Histograms for ALL valid attributes and from/to vertices under:
// graph test_graph
// edge COMP_MEMBER
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER"}'
// Compute Histograms for ALL from/to vertices for attribute joinDate:
// graph test_graph
// edge COMP_MEMBER
// attribute joinDate
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "attribute":"joinDate"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// edge COMP_MEMBER, from vertex compForum, to vertex compPerson
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "from":"compForum", "to":"compPerson"}'

The output generated is in the same format as individual computation, with the success field tracking the total number of successful computations and failure tracking all of the failures encountered:

  • Histogram Computation Request

  • Results

Gather ALL Histogram Computation
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{}'
{
  "error": false,
  "message": "Histogram stats gathering completed",
  "results": [
    {
      "failure": [],
      "success": "[94/94]"
    }
  ]
}

A more condensed error message will be provided under failure if any of the histogram computations encounter an error. This allows the request to log the error and continue with the next computation. For more information on the specific error refer to the logs or run the particular histogram computation individually for more verbose feedback.

Retrieve histogram statistics

READ_DATA privilege is required to retrieve a histogram for a particular attribute. This can be at the attribute level or a higher level. Note that for edge-based histograms, the 'from' and 'to' vertex are not read and therefore only the edge requires the privilege.

GET :14240/gsql/v1/stats/histogram

This endpoint retrieves histogram data for a specified attribute of a vertex or edge type. Empty buckets in the histogram are denoted by a totalCount, upperCount, and ndv of zero. The upperBound will have the max theoretical value for an empty bucket.

Request Body

Parameter Description Data type

graph

Name of the graph.

STRING

vertex

Name of the vertex type.

STRING

edge

Name of the refined edge type.

STRING

from

Source vertex type of a refined edge type.

STRING

to

Target edge type of a refined edge type.

STRING

attribute

Name of the attribute.

STRING

Examples

  • Histogram Retrieval Request

  • Results Histogram Exists

  • Results Histogram Does Not Exist

Retrieve Vertex Attribute Histogram
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X GET "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length","bucket":10}'
{
  "error": false,
  "message": "",
  "results": [
    {
      "histogram": [
        {
          "ndv": 1,
          "upperBound": 1988,
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 0,
          "upperBound": 1789,
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 1,
          "upperBound": 1402,
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 1,
          "upperBound": 1211,
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 1,
          "upperBound": 1119,
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 1,
          "upperBound": 908,
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 116,
          "upperBound": 183,
          "rowsUpper": 90,
          "rowsTotal": 151037
        },
        {
          "ndv": 0,
          "upperBound": 399,
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 0,
          "upperBound": 598,
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 1,
          "upperBound": 710,
          "rowsUpper": 1,
          "rowsTotal": 1
        }
      ],
      "gatherEnd": "2024-05-07 19:09:03",
      "gatherStart": "2024-05-07 19:09:03"
    }
  ]
}
{
  "error": true,
  "message": "Histogram does not exist for test_graph.Comment.length",
  "results": ""
}
  • Histogram Retrieval Request

  • Results Histogram Exists

  • Results Histogram Does Not Exist

Retrieve Edge Attribute Histogram
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X GET "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate","bucket":10}'
{
  "error": false,
  "message": "",
  "results": [
    {
      "histogram": [
        {
          "ndv": 1,
          "upperBound": "2012-09-03 16:44:14",
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 2,
          "upperBound": "2012-04-15 18:07:22",
          "rowsUpper": 1,
          "rowsTotal": 2
        },
        {
          "ndv": 0,
          "upperBound": "2012-03-14 15:53:37",
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 3,
          "upperBound": "2011-12-15 02:34:53",
          "rowsUpper": 1,
          "rowsTotal": 3
        },
        {
          "ndv": 1,
          "upperBound": "2011-09-11 17:20:51",
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 0,
          "upperBound": "2011-06-29 02:37:41",
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 1,
          "upperBound": "2010-04-22 12:31:07",
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 0,
          "upperBound": "2010-10-12 13:21:44",
          "rowsUpper": 0,
          "rowsTotal": 0
        },
        {
          "ndv": 1,
          "upperBound": "2010-11-15 07:23:59",
          "rowsUpper": 1,
          "rowsTotal": 1
        },
        {
          "ndv": 0,
          "upperBound": "2011-04-03 14:12:22",
          "rowsUpper": 0,
          "rowsTotal": 0
        }
      ],
      "gatherEnd": "2024-05-07 20:44:34",
      "gatherStart": "2024-05-07 20:44:34"
    }
  ]
}
{
  "error": true,
  "message": "Histogram does not exist for test_graph.COMP_MEMBER.compForum.compPerson.joinDate",
  "results": ""
}

Delete histogram statistics

DELETE :14240/gsql/v1/stats/histogram

This endpoint deletes histogram data for a graph. This includes histogram data on all vertex attributes in that graph.

Request Body

Parameter Description Data type

graph

Name of the graph.

STRING

vertex

Name of the vertex type.

STRING

edge

Name of the refined edge type.

STRING

from

Source vertex type of a refined edge type.

STRING

to

Target edge type of a refined edge type.

STRING

attribute

Name of the attribute.

STRING

Examples

  • Histogram Deletion Request

  • Results

Delete Vertex Attribute Histogram
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X DELETE "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length"}'
{
  "error": false,
  "message": "",
  "results": "Deleting any histogram(s) for test_graph.Comment.length"
}
  • Histogram Deletion Request

  • Results

Delete Edge Attribute Histogram
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X DELETE "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate"}'
{
  "error": false,
  "message": "",
  "results": "Deleting any histogram(s) for test_graph.compForum.COMP_MEMBER.compPerson.joinDate"
}

Deleting Multiple Histograms together

It is possible to request multiple histograms be deleted in the same request. The graph name is always required, but it is possible to delete all histograms under a given type as follows:

// Delete Histograms for ALL attributes under:
// graph test_graph
-d '{"graph":"test_graph"}'
// Delete Histograms for ALL attributes under:
// graph test_graph
// vertex Comment
-d '{"graph":"test_graph", "vertex":"Comment"}'
// Delete Histograms for ALL attributes under:
// graph test_graph
// edge COMP_MEMBER, from vertex compForum, to vertex compPerson
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "from":"compForum", "to":"compPerson"}'

Current Limitations

If leader switch or some other interruption takes place in GSQL the histogram operation will not be able to resume. Statistics are stored as they are computed along with their start and end timestamps so there is no progress lost. See recommendations for some suggestions on how to handle such situations.

Recommendations

  1. If timeout or some other event interrupts the collection of all the desired statistics, specifying a smaller subset of histograms based on what still has yet to be gathered will reduce the amount of regathering. For example, if executing a histogram computation request for attributes under all graphs, and after 2/4 graphs are fully gathered a timeout occurs, you can issue two requests at the graph level for graphs 3 and 4.

  2. When possible, it is best to schedule histogram computations after schema change jobs and data loading, not before. This is to ensure the statistics best reflect the current state of the database.

  3. When choosing the right bucket value, the first step is to determine the requirements of the use case. Then, choose the fewer number of buckets that provides enough granularity.