Enabling Cost-Based Optimization (Preview Feature)
This page walks you through the steps to enable cost-based optimization.
Query Optimizer is currently a Preview Feature. Preview Features give users an early look at future production-level features. Preview Features should not be used for production deployments. |
1. Before you begin
-
You have created a query that contains a
SELECT
statement that uses a path pattern in itsFROM
clause with one or more hops. -
You have the privilege to install queries on the graph where you are operating.
2. Procedure
The following is the procedure to enable cost-based optimization when installing a query.
2.1. Identify referenced types and attributes
You need to identify the vertex and edge types for which you need to compute the following statistics:
-
Cardinality: the number of instances for each type
-
Histogram: the distribution of values for an attribute
2.1.1. Cardinality
You need to compute cardinality data for the following vertex and edge types:
-
Any vertex type referenced in the path pattern of a
FROM
clause. -
Any refined edge type. A refined edge type means an edge type with a specified pair of
from
andto
vertex types as well as their reverse edge types if they exist.
2.1.2. Histogram
You need to compute histograms for the following attributes:
-
Any vertex attribute referenced in a
WHERE
clause in aSELECT
statement in the query. -
Any refined edge type attribute referenced in a
WHERE
clause in aSELECT
statement in the query.-
A refined edge type means an edge type with a specified pair of
from
andto
vertex types as well as their reverse edge types if they exist.
-
If you are not sure which types and attributes you need to compute histogram data for a query, you can try installing that query with the cost-based optimizer enabled first. The optimizer produces a helpful recommendation if it sees any types/attributes that don’t have statistics computed for them.
For example, to optimize the following query:
CREATE OR REPLACE QUERY example3() FOR GRAPH ldbc_snb SYNTAX V2 {
vSet = SELECT p
FROM Tag:t - (<HAS_TAG) - Comment:m - (<LIKES:e) - Person:p
WHERE p.gender == "female" AND t.id != 100 AND e.creationDate > to_datetime("2010-10-01 16:00")
LIMIT 50;
PRINT vSet;
}
You need the following histogram statistics:
-
gender
attribute of typePerson
-
id
attribute of typeTag
-
creationDate
attribute of typeLIKES
fromPerson
toComment
.
2.2. Compute cardinality data
To compute cardinality data (the count of each type), use the POST :14240/gsql/v1/stats/cardinality
endpoint.
For example, to compute the cardinality data of the vertex Person
in graph ldbc_snb
, make the following request:
curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/card?graph=ldbc_snb&vertex=Person"
For another example, to compute the cardinality data of the refined edge type LIKES
from a Person
vertex to a Comment
vertex, make the following request:
curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/card?graph=ldbc_snb&edge=LIKES&from=Person&to=Comment"
For best results, compute the cardinality data for every type you identified in the previous step, each with its own request.
2.3. Compute histogram data
To compute histogram data, use the POST :14240/gsql/v1/stats/histogram
endpoint.
For example, to compute the histogram data for attribute gender
of vertex type Person
on graph ldbc_snb
, make the following request:
$ $ curl --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"ldbc_snb", "vertex":"Person", "attribute":"gender", "bucket":10}'
To compute the histogram data for attribute creationDate
of refined edge type LIKES
from vertex type Person
to vertex type Comment
, make the following request:
$ curl -s --user tigergraph:tigergraph \
-H 'Content-Type: application/json' \
-X POST "http://localhost:14240/gsql/v1/stats/histogram" \
-d '{"graph":"ldbc_snb", "edge":"LIKES", "from":"Person", "to":"Comment", "attribute":"creationDate", "bucket":10}' | jq .
For best results, compute the histogram data for every attribute referenced in a WHERE
clause of a SELECT
statement.
2.4. Install query with -COST
option
Now that the relevant statistics are available to the query optimizer, you can install the query with cost-based optimization.
To use the cost-based optimization during installation, use the -COST
option when installing the query.
For example, to install the query example3
, run INSTALL QUERY -COST example3
.
Alternatively, set the cost_opt
session parameter to true by running set cost_opt = true
in the GSQL shell.
This enables cost-based optimization for the rest of your GSQL session.
You no longer need to supply the -COST
option when installing queries if the cost_opt
parameter is set to true.
If you missed any cardinality or histogram data, the optimizer warns you about which types are missing, so you can compute the data for any missing type or attribute.