Enabling Cost-Based Optimization (Preview Feature)
This page walks you through the steps to enable cost-based optimization.
|
Query Optimizer is currently a Preview Feature. Preview Features give users an early look at future production-level features. Preview Features should not be used for production deployments. |
1. Before you begin
-
You have created a query that contains a
SELECTstatement that uses a path pattern in itsFROMclause with one or more hops. -
You have the privilege to install queries on the graph where you are operating.
2. Procedure
The following is the procedure to enable cost-based optimization when installing a query.
2.1. Identify referenced types and attributes
You need to identify the vertex and edge types for which you need to compute the following statistics:
-
Cardinality: the number of instances for each type
-
Histogram: the distribution of values for an attribute
2.1.1. Cardinality
You need to compute cardinality data for the following vertex and edge types:
-
Any vertex type referenced in the path pattern of a
FROMclause. -
Any refined edge type. A refined edge type means an edge type with a specified pair of
fromandtovertex types as well as their reverse edge types if they exist.
2.1.2. Histogram
You need to compute histograms for the following attributes:
-
Any vertex attribute referenced in a
WHEREclause in aSELECTstatement in the query. -
Any refined edge type attribute referenced in a
WHEREclause in aSELECTstatement in the query.-
A refined edge type means an edge type with a specified pair of
fromandtovertex types as well as their reverse edge types if they exist.
-
If you are not sure which types and attributes you need to compute histogram data for a query, you can try installing that query with the cost-based optimizer enabled first. The optimizer produces a helpful recommendation if it sees any types/attributes that don’t have statistics computed for them.
For example, to optimize the following query:
CREATE OR REPLACE QUERY example3() FOR GRAPH ldbc_snb SYNTAX V2 {
vSet = SELECT p
FROM Tag:t - (<HAS_TAG) - Comment:m - (<LIKES:e) - Person:p
WHERE p.gender == "female" AND t.id != 100 AND e.creationDate > to_datetime("2010-10-01 16:00")
LIMIT 50;
PRINT vSet;
}
You need the following histogram statistics:
-
genderattribute of typePerson -
idattribute of typeTag -
creationDateattribute of typeLIKESfromPersontoComment.
2.2. Compute cardinality data
To compute cardinality data (the count of each type), use the POST :14240/gsqlserver/gsql/stats/card endpoint.
For example, to compute the cardinality data of the vertex Person in graph ldbc_snb, make the following request:
curl -s --user tigergraph:tigergraph -X POST "http://localhost:14240/gsqlserver/gsql/stats/card?graph=ldbc_snb&vertex=Person"
For another example, to compute the cardinality data of the refined edge type LIKES from a Person vertex to a Comment vertex, make the following request:
curl -s --user tigergraph:tigergraph -X POST "http://localhost:14240/gsqlserver/gsql/stats/card?graph=ldbc_snb&edge=LIKES&from=Person&to=Comment"
For best results, compute the cardinality data for every type you identified in the previous step, each with its own request.
2.3. Compute histogram data
To compute histogram data, use the POST :14240/gsqlserver/gsql/stats/histogram endpoint.
For example, to compute the histogram data for attribute gender of vertex type Person on graph ldbc_snb, make the following request:
$ $ curl --user tigergraph:tigergraph -X POST \
"http://localhost:14240/gsqlserver/gsql/stats/histogram" \
-d '{"graph":"ldbc_snb", "vertex":"Person", "attribute":"gender", "buckets":10}'
To compute the histogram data for attribute creationDate of refined edge type LIKES from vertex type Person to vertex type Comment, make the following request:
$ curl -s --user tigergraph:tigergraph -X POST \
"http://localhost:14240/gsqlserver/gsql/stats/histogram" \
-d '{"graph":"ldbc_snb", "edge":"LIKES", "from":"Person", "to":"Comment", "attribute":"creationDate", "buckets":10}' | jq .
For best results, compute the histogram data for every attribute referenced in a WHERE clause of a SELECT statement.
2.4. Install query with -COST option
Now that the relevant statistics are available to the query optimizer, you can install the query with cost-based optimization.
To use the cost-based optimization during installation, use the -COST option when installing the query.
For example, to install the query example3, run INSTALL QUERY -COST example3.
Alternatively, set the cost_opt session parameter to true by running set cost_opt = true in the GSQL shell.
This enables cost-based optimization for the rest of your GSQL session.
You no longer need to supply the -COST option when installing queries if the cost_opt parameter is set to true.
If you missed any cardinality or histogram data, the optimizer warns you about which types are missing, so you can compute the data for any missing type or attribute.