Statistics REST APIs (Preview Feature)
The endpoints on this page calculate cardinality and histogram statistics of vertex and edge types and attributes. The statistics are required for effective optimization.
Query Optimizer is currently a Preview Feature. Preview Features give users an early look at future production-level features. Preview Features should not be used for production deployments.
Cardinality
The following endpoints compute, fetch and delete vertex and edge counts (cardinality) of a graph.
Compute cardinality statistics
POST :14240/gsql/v1/stats/cardinality
This endpoint computes the cardinality statistics for the vertex and edge types specified in the request. The cardinality statistics describe the number of vertices of a specified type, and the number of edges of a specified type between a specified pair of vertex types.
The following header is also required: -H 'Content-Type: application/json'
Parameters
Parameter | Description | Type |
---|---|---|
graph | Name of the graph. Required. | |
vertex | Vertex type to compute statistics for. You must provide either the vertex or the edge parameter. | |
edge | Edge type to compute statistics for. You must provide either the vertex or the edge parameter. | |
from | The source vertex type of an edge. Required if edge is provided. | |
to | The target vertex type of an edge. Required if edge is provided. | |
Examples
The following request computes cardinality for vertex type Person in the graph ldbc_snb:
curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb&vertex=Person"
The following request computes cardinality for edge type LIKES from Person to Post in the graph ldbc_snb:
curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/cardinality?graph=ldbc_snb&edge=LIKES&from=Person&to=Post"
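The same request URLs can be assembled programmatically. The following is a minimal Python sketch, assuming the query parameters used in the curl examples above (graph, vertex, edge, from, to). The helper name cardinality_url, the BASE_URL constant, and the validation it performs are illustrative and not part of the API. It only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumption: same host and port as the curl examples on this page.
BASE_URL = "http://localhost:14240"


def cardinality_url(graph, vertex=None, edge=None, from_type=None, to_type=None):
    """Build the URL for POST /gsql/v1/stats/cardinality (hypothetical helper)."""
    if vertex is None and edge is None:
        raise ValueError("provide a vertex type or an edge type")
    params = {"graph": graph}
    if vertex is not None:
        params["vertex"] = vertex
    if edge is not None:
        # Assumption: an edge request names both endpoint vertex types,
        # as in the LIKES example.
        if from_type is None or to_type is None:
            raise ValueError("an edge request needs both from and to vertex types")
        params.update({"edge": edge, "from": from_type, "to": to_type})
    return f"{BASE_URL}/gsql/v1/stats/cardinality?{urlencode(params)}"


print(cardinality_url("ldbc_snb", vertex="Person"))
print(cardinality_url("ldbc_snb", edge="LIKES", from_type="Person", to_type="Post"))
```

The two printed URLs match the ones passed to curl in the examples above.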
Retrieve cardinality statistics
GET :14240/gsql/v1/stats/cardinality
This endpoint retrieves the cardinality statistics of a graph.
Histogram
Histograms store information about the distribution of attribute values. They are computed using an equi-width method: the range of attribute values is divided into buckets of equal width. Given a target bucket count, a histogram is computed with up to that many buckets (see the bucket parameter for more details).
Bucket Count: Increasing the number of buckets will increase the granularity of the computed histogram, while decreasing the number of buckets will increase compaction. Choosing the right bucket count should be done based on the use case first, then adjusted afterwards if necessary. Please see recommendations below for best practices.
NDV (Number of Distinct Values): This value is approximated for each bucket.
- Integral datatypes (UINT, INT, DATETIME): the NDV value represents an upper bound (the actual NDV may be lower).
- Floating point numeric values (FLOAT, DOUBLE): the NDV is approximated by assuming all values are distinct.
- Primary IDs: the NDV represents the actual NDV and no approximation is used.
Supported Data Types: UINT, INT, DATETIME, FLOAT, DOUBLE
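To make the equi-width idea concrete, the following Python sketch buckets a list of integer values. It is an illustration only and does not reproduce TigerGraph's implementation or its NDV approximation. The function name equi_width_histogram and the (upper bound, count) output shape are assumptions. It also shows why an integer range can yield fewer buckets than requested: a range with only three possible values cannot fill ten buckets.

```python
def equi_width_histogram(values, bucket_count):
    """Return (inclusive upper bound, row count) per bucket for integer values."""
    lo, hi = min(values), max(values)
    span = hi - lo + 1                 # number of possible integer values
    buckets = min(bucket_count, span)  # cap: at most one bucket per possible value
    counts = [0] * buckets
    for v in values:
        # Integer arithmetic keeps the bucket index exact (no float rounding).
        counts[(v - lo) * buckets // span] += 1
    # Largest integer that still maps to bucket i.
    return [(lo + ((i + 1) * span - 1) // buckets, counts[i]) for i in range(buckets)]


print(equi_width_histogram([1, 2, 2, 3, 10], 3))  # [(4, 4), (7, 0), (10, 1)]
print(equi_width_histogram([5, 6, 7], 10))        # [(5, 1), (6, 1), (7, 1)]
```

In the first call the buckets cover 1 to 4, 5 to 7, and 8 to 10, which is as close to equal width as an integer range of ten values allows; the middle bucket is empty.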
Compute histogram statistics
POST :14240/gsql/v1/stats/histogram
This endpoint computes histograms for a specified attribute of a vertex or edge type.
READ_SCHEMA privilege is required for the graph that the histogram attribute belongs to. When gathering for multiple graphs at once (see Compute Multiple Histograms together), READ_SCHEMA is required on ALL of those graphs.
The following header is also required: -H 'Content-Type: application/json'
Parameters
Additional options can be specified in the header using -H:
- GSQL-Timeout: the timeout for the request. When a request computes multiple histograms, this timeout covers the whole request, including all the histogram computations.
- GSQL-Thread-Limit: the thread limit per histogram computation. When a request computes multiple histograms, this limit applies to each histogram computation individually.
Note that multiple headers can be included, for example: -H 'Content-Type: application/json' -H 'GSQL-Timeout: 1000000' …
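When the request is issued from a script rather than curl, the same headers can be collected in a dictionary. This is a small Python sketch; the helper name histogram_headers is hypothetical. The header names come from this page, and the unit of the timeout value is not stated here, so the sketch passes the value through unchanged.

```python
def histogram_headers(timeout=None, thread_limit=None):
    """Collect the headers for a histogram request (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}  # always required
    if timeout is not None:
        headers["GSQL-Timeout"] = str(timeout)  # covers the whole request
    if thread_limit is not None:
        headers["GSQL-Thread-Limit"] = str(thread_limit)  # per histogram computation
    return headers


print(histogram_headers(timeout=1000000))
```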
Request body
The request body expects a JSON object with the following schema:
{ "graph": <graph_name>, "vertex": <vertex_type_name>, "attribute": <attribute_name>, "bucket": <bucket_number> }
{ "graph": <graph_name>, "edge": <edge_type_name>, "attribute": <attribute_name>, "bucket": <bucket_number>, "from": <source_vertex_type>, "to": <target_vertex_type> }
Parameter | Description | Data type |
---|---|---|
graph | Name of the graph. | |
vertex | Name of the vertex type. | |
edge | Name of the refined edge type. | |
attribute | Name of the attribute. | |
bucket | The number of intervals (buckets) into which the range of attribute values is divided. The default value is 256. With integral datatypes, the provided bucket count may be too high for the range of values. In such cases, the highest number of buckets that can be used for that range of values is used instead. | |
from | Source vertex type of a refined edge type. | |
to | Target vertex type of a refined edge type. | |
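The two request-body shapes can be produced by one small builder. The following Python sketch assumes the single-histogram form described above (one attribute of one vertex type, or of one edge type with its source and target vertex types). The function name histogram_body and its validation rules are illustrative, not part of the API.

```python
import json


def histogram_body(graph, attribute, vertex=None, edge=None,
                   from_type=None, to_type=None, bucket=None):
    """Serialize the JSON body for a single histogram request (hypothetical helper)."""
    if (vertex is None) == (edge is None):
        raise ValueError("specify exactly one of vertex or edge")
    body = {"graph": graph}
    if vertex is not None:
        body["vertex"] = vertex
    else:
        # Assumption: the single-histogram edge form names both vertex types.
        if from_type is None or to_type is None:
            raise ValueError("an edge histogram needs from and to vertex types")
        body.update({"edge": edge, "from": from_type, "to": to_type})
    body["attribute"] = attribute
    if bucket is not None:
        body["bucket"] = bucket  # omitted means the server default applies
    return json.dumps(body, separators=(",", ":"))


print(histogram_body("test_graph", "length", vertex="Comment", bucket=10))
```

The returned string can be passed to curl with -d or used as the body of an HTTP client request.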
Examples
The following request computes the histogram data for attribute length of vertex type Comment on graph test_graph:
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length","bucket":10}'
{
"error": false,
"message": "Histogram stats gathering completed",
"results": [
{
"success": "[1/1]",
"failure": []
}
]
}
The following request computes the histogram data for attribute joinDate of edge type COMP_MEMBER between source vertex type compForum and target vertex type compPerson on graph test_graph:
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate","bucket":10}'
{
"error": false,
"message": "Histogram stats gathering completed",
"results": [
{
"success": "[1/1]",
"failure": []
}
]
}
Compute Multiple Histograms together
In addition to requesting individual histograms through the POST endpoint, it is also possible to request all valid histograms on a per-schema-object basis. The bucket parameter can still be specified and is applied to every histogram computed in the request.
Because lower-level schema objects depend on higher-level ones (an attribute belongs to a vertex or edge type, which belongs to a graph), specify the scope from the top down, as follows:
// Compute Histograms for ALL valid attributes (default bucket value)
-d '{}'
// Compute Histograms for ALL valid attributes (bucket specified)
-d '{"bucket":"10"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// (default bucket value)
-d '{"graph":"test_graph"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// vertex Comment
// (default bucket value)
-d '{"graph":"test_graph", "vertex":"Comment"}'
// Compute Histograms for ALL valid attributes and from/to vertices under:
// graph test_graph
// edge COMP_MEMBER
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER"}'
// Compute Histograms for ALL from/to vertices for attribute joinDate:
// graph test_graph
// edge COMP_MEMBER
// attribute joinDate
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "attribute":"joinDate"}'
// Compute Histograms for ALL valid attributes under:
// graph test_graph
// edge COMP_MEMBER, from vertex compForum, to vertex compPerson
// (default bucket value)
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "from":"compForum", "to":"compPerson"}'
The output is in the same format as for an individual computation, with the success field tracking the total number of successful computations and the failure field tracking all of the failures encountered:
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{}'
{
"error": false,
"message": "Histogram stats gathering completed",
"results": [
{
"failure": [],
"success": "[94/94]"
}
]
}
If any of the histogram computations encounters an error, a condensed error message is provided under failure. This allows the request to log the error and continue with the next computation. For more information on a specific error, refer to the logs or run that histogram computation individually for more verbose feedback.
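A script that issues a multi-histogram request usually wants the success count as numbers. The following Python sketch parses the response shown above. It assumes the success field always has the "[done/total]" form seen in the examples, and it treats the failure entries as opaque because their exact shape is not documented on this page. The function name summarize_gathering is hypothetical.

```python
import json
import re


def summarize_gathering(response_text):
    """Return (done, total, failures) from a histogram computation response."""
    payload = json.loads(response_text)
    result = payload["results"][0]
    # Assumption: success is formatted as "[done/total]", as in the examples.
    match = re.fullmatch(r"\[(\d+)/(\d+)\]", result["success"])
    if match is None:
        raise ValueError(f"unexpected success value: {result['success']!r}")
    return int(match.group(1)), int(match.group(2)), result["failure"]


response = """{"error": false, "message": "Histogram stats gathering completed",
 "results": [{"failure": [], "success": "[94/94]"}]}"""
done, total, failures = summarize_gathering(response)
print(done, total, failures)  # 94 94 []
```

If done is smaller than total, the entries in failures indicate which computations to rerun individually.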
Retrieve histogram statistics
READ_DATA privilege is required to retrieve a histogram for a particular attribute. The privilege can be granted at the attribute level or at a higher level. Note that for edge-based histograms, the from and to vertex types are not read, so only the edge type requires the privilege.
GET :14240/gsql/v1/stats/histogram
This endpoint retrieves histogram data for a specified attribute of a vertex or edge type. Empty buckets in the histogram are denoted by a rowsTotal, rowsUpper, and ndv of zero. For an empty bucket, upperBound holds the bucket's maximum theoretical value.
Request Body
Parameter | Description | Data type |
---|---|---|
graph | Name of the graph. | |
vertex | Name of the vertex type. | |
edge | Name of the refined edge type. | |
from | Source vertex type of a refined edge type. | |
to | Target vertex type of a refined edge type. | |
attribute | Name of the attribute. | |
Examples
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X GET "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length","bucket":10}'
{
"error": false,
"message": "",
"results": [
{
"histogram": [
{
"ndv": 1,
"upperBound": 1988,
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 0,
"upperBound": 1789,
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 1,
"upperBound": 1402,
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 1,
"upperBound": 1211,
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 1,
"upperBound": 1119,
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 1,
"upperBound": 908,
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 116,
"upperBound": 183,
"rowsUpper": 90,
"rowsTotal": 151037
},
{
"ndv": 0,
"upperBound": 399,
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 0,
"upperBound": 598,
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 1,
"upperBound": 710,
"rowsUpper": 1,
"rowsTotal": 1
}
],
"gatherEnd": "2024-05-07 19:09:03",
"gatherStart": "2024-05-07 19:09:03"
}
]
}
If the histogram does not exist, the response reports an error:
{
"error": true,
"message": "Histogram does not exist for test_graph.Comment.length",
"results": ""
}
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X GET "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate","bucket":10}'
{
"error": false,
"message": "",
"results": [
{
"histogram": [
{
"ndv": 1,
"upperBound": "2012-09-03 16:44:14",
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 2,
"upperBound": "2012-04-15 18:07:22",
"rowsUpper": 1,
"rowsTotal": 2
},
{
"ndv": 0,
"upperBound": "2012-03-14 15:53:37",
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 3,
"upperBound": "2011-12-15 02:34:53",
"rowsUpper": 1,
"rowsTotal": 3
},
{
"ndv": 1,
"upperBound": "2011-09-11 17:20:51",
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 0,
"upperBound": "2011-06-29 02:37:41",
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 1,
"upperBound": "2010-04-22 12:31:07",
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 0,
"upperBound": "2010-10-12 13:21:44",
"rowsUpper": 0,
"rowsTotal": 0
},
{
"ndv": 1,
"upperBound": "2010-11-15 07:23:59",
"rowsUpper": 1,
"rowsTotal": 1
},
{
"ndv": 0,
"upperBound": "2011-04-03 14:12:22",
"rowsUpper": 0,
"rowsTotal": 0
}
],
"gatherEnd": "2024-05-07 20:44:34",
"gatherStart": "2024-05-07 20:44:34"
}
]
}
If the histogram does not exist, the response reports an error:
{
"error": true,
"message": "Histogram does not exist for test_graph.COMP_MEMBER.compForum.compPerson.joinDate",
"results": ""
}
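The retrieved histogram can be summarized in a few lines of code. The following Python sketch counts the buckets, identifies the empty ones, and totals the rows. It assumes the field names shown in the example responses above (histogram, ndv, rowsUpper, rowsTotal); the function name bucket_summary is hypothetical, and SAMPLE is a three-bucket excerpt of the edge example rather than a full response.

```python
import json

# Trimmed to three buckets from the COMP_MEMBER example above.
SAMPLE = """{"error": false, "message": "", "results": [{"histogram": [
 {"ndv": 1, "upperBound": "2012-09-03 16:44:14", "rowsUpper": 1, "rowsTotal": 1},
 {"ndv": 2, "upperBound": "2012-04-15 18:07:22", "rowsUpper": 1, "rowsTotal": 2},
 {"ndv": 0, "upperBound": "2012-03-14 15:53:37", "rowsUpper": 0, "rowsTotal": 0}],
 "gatherEnd": "2024-05-07 20:44:34", "gatherStart": "2024-05-07 20:44:34"}]}"""


def bucket_summary(response_text):
    """Summarize a histogram retrieval response (hypothetical helper)."""
    payload = json.loads(response_text)
    if payload["error"]:
        # For example: "Histogram does not exist for ..."
        raise LookupError(payload["message"])
    buckets = payload["results"][0]["histogram"]
    empty = [b for b in buckets
             if b["rowsTotal"] == 0 and b["rowsUpper"] == 0 and b["ndv"] == 0]
    return {
        "buckets": len(buckets),
        "empty": len(empty),
        "rows": sum(b["rowsTotal"] for b in buckets),
    }


print(bucket_summary(SAMPLE))  # {'buckets': 3, 'empty': 1, 'rows': 3}
```

Raising an exception on "error": true keeps a missing histogram from being mistaken for an empty one.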
Delete histogram statistics
DELETE :14240/gsql/v1/stats/histogram
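This endpoint deletes histogram data. Depending on the fields in the request body, it deletes the histogram of a single attribute, the histograms of all attributes of a vertex or edge type, or the histograms of all attributes in a graph.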
Request Body
Parameter | Description | Data type |
---|---|---|
graph | Name of the graph. | |
vertex | Name of the vertex type. | |
edge | Name of the refined edge type. | |
from | Source vertex type of a refined edge type. | |
to | Target vertex type of a refined edge type. | |
attribute | Name of the attribute. | |
Examples
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X DELETE "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","vertex":"Comment","attribute":"length"}'
{
"error": false,
"message": "",
"results": "Deleting any histogram(s) for test_graph.Comment.length"
}
$ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X DELETE "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"test_graph","edge":"COMP_MEMBER","from":"compForum","to":"compPerson","attribute":"joinDate"}'
{
"error": false,
"message": "",
"results": "Deleting any histogram(s) for test_graph.compForum.COMP_MEMBER.compPerson.joinDate"
}
Delete Multiple Histograms together
It is possible to request multiple histograms be deleted in the same request. The graph name is always required, but it is possible to delete all histograms under a given type as follows:
// Delete Histograms for ALL attributes under:
// graph test_graph
-d '{"graph":"test_graph"}'
// Delete Histograms for ALL attributes under:
// graph test_graph
// vertex Comment
-d '{"graph":"test_graph", "vertex":"Comment"}'
// Delete Histograms for ALL attributes under:
// graph test_graph
// edge COMP_MEMBER, from vertex compForum, to vertex compPerson
-d '{"graph":"test_graph", "edge":"COMP_MEMBER", "from":"compForum", "to":"compPerson"}'
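Some HTTP clients need the DELETE method and the JSON body set explicitly. The following Python sketch builds such a request with the standard library and does not send it. The function name delete_histograms_request and the base_url default are assumptions taken from the curl examples, and authentication (the --user option in curl) is left out.

```python
import json
import urllib.request


def delete_histograms_request(body, base_url="http://localhost:14240"):
    """Build, but do not send, a DELETE request for histogram statistics."""
    if "graph" not in body:
        # The graph name is always required for deletion.
        raise ValueError("the request body must name a graph")
    return urllib.request.Request(
        url=f"{base_url}/gsql/v1/stats/histogram",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


req = delete_histograms_request({"graph": "test_graph", "vertex": "Comment"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; credentials would have to be added first.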
Current Limitations
If a leader switch or some other interruption takes place in GSQL, the histogram operation cannot resume. However, statistics are stored as they are computed, along with their start and end timestamps, so the histograms already gathered are not lost. See the recommendations below for suggestions on how to handle such situations.
Recommendations
- If a timeout or some other event interrupts the collection of all the desired statistics, requesting only the histograms that have yet to be gathered reduces the amount of regathering. For example, if a request to compute histograms for all graphs times out after 2 of 4 graphs are fully gathered, you can issue two graph-level requests for graphs 3 and 4.
- When possible, schedule histogram computations after schema change jobs and data loading, not before. This ensures that the statistics best reflect the current state of the database.
- When choosing the bucket value, first determine the requirements of the use case. Then choose the smallest number of buckets that provides enough granularity.
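The first recommendation can be sketched as a small planning step. The following Python example derives the graph-level request bodies that still need to be issued after an interrupted run. The function name remaining_graph_bodies and the graph names are hypothetical, and how the caller determines which graphs finished is left open.

```python
def remaining_graph_bodies(all_graphs, completed_graphs, bucket=None):
    """Return one graph-level request body per graph not yet fully gathered."""
    done = set(completed_graphs)
    bodies = []
    for graph in all_graphs:
        if graph in done:
            continue  # already gathered, no need to recompute
        body = {"graph": graph}
        if bucket is not None:
            body["bucket"] = bucket  # reuse the bucket count of the original request
        bodies.append(body)
    return bodies


# Hypothetical graph names: 2 of 4 graphs finished before the timeout.
print(remaining_graph_bodies(["g1", "g2", "g3", "g4"], ["g1", "g2"]))
# [{'graph': 'g3'}, {'graph': 'g4'}]
```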