# Enabling Cost-Based Optimization

This page walks you through the steps to enable cost-based optimization.

## 1. Before you begin

• You have created a query that contains a `SELECT` statement that uses a path pattern in its `FROM` clause with one or more hops.

• You have the privilege to install queries on the graph where you are operating.

## 2. Procedure

The following is the procedure to enable cost-based optimization when installing a query.

### 2.1. Identify referenced types and attributes

You need to identify the vertex and edge types for which you need to compute the following statistics:

• Cardinality: the number of instances for each type

• Histogram: the distribution of values for an attribute

#### 2.1.1. Cardinality

You need to compute cardinality data for the following vertex and edge types:

• Any vertex type referenced in the path pattern of a `FROM` clause.

• Any refined edge type. A refined edge type means an edge type with a specified pair of `from` and `to` vertex types as well as their reverse edge types if they exist.

#### 2.1.2. Histogram

You need to compute histograms for the following attributes:

• Any vertex attribute referenced in a `WHERE` clause in a `SELECT` statement in the query.

• Any refined edge type attribute referenced in a `WHERE` clause in a `SELECT` statement in the query.

• A refined edge type means an edge type with a specified pair of `from` and `to` vertex types as well as their reverse edge types if they exist.

If you are not sure which types and attributes you need to compute histogram data for a query, you can try installing that query with the cost-based optimizer enabled first. The optimizer produces a helpful recommendation if it sees any types/attributes that don’t have statistics computed for them.

For example, to optimize the following query:

``````CREATE OR REPLACE QUERY example3() FOR GRAPH ldbc_snb SYNTAX V2 {

vSet = SELECT p
FROM Tag:t - (<HAS_TAG) - Comment:m - (<LIKES:e) - Person:p
WHERE p.gender == "female" AND t.id != 100 AND e.creationDate >  to_datetime("2010-10-01 16:00")
LIMIT 50;
PRINT vSet;
}``````

You need the following histogram statistics:

• `gender` attribute of type `Person`

• `id` attribute of type `Tag`

• `creationDate` attribute of type `LIKES` from `Person` to `Comment`.

### 2.2. Compute cardinality data

To compute cardinality data (the count of each type), use the `POST :14240/gsqlserver/gsql/stats/card` endpoint.

For example, to compute the cardinality data of the vertex `Person` in graph `ldbc_snb`, make the following request:

``curl -s --user tigergraph:tigergraph -X POST "http://localhost:14240/gsqlserver/gsql/stats/card?graph=ldbc_snb&vertex=Person"``

For another example, to compute the cardinality data of the refined edge type `LIKES` from a `Person` vertex to a `Comment` vertex, make the following request:

``curl -s --user tigergraph:tigergraph -X POST "http://localhost:14240/gsqlserver/gsql/stats/card?graph=ldbc_snb&edge=LIKES&from=Person&to=Comment"``

For best results, compute the cardinality data for every type you identified in the previous step, each with its own request.

### 2.3. Compute histogram data

For example, to compute the histogram data for attribute `gender` of vertex type `Person` on graph `ldbc_snb`, make the following request:

`````` curl --user tigergraph:tigergraph -X POST \
"http://localhost:14240/gsqlserver/gsql/stats/histogram" \
-d '{"graph":"ldbc_snb", "vertex":"Person", "attribute":"gender", "buckets":10}'``````

To compute the histogram data for attribute `creationDate` of refined edge type `LIKES` from vertex type `Person` to vertex type `Comment`, make the following request:

``````\$ curl -s --user tigergraph:tigergraph -X POST \
"http://localhost:14240/gsqlserver/gsql/stats/histogram" \
-d '{"graph":"ldbc_snb", "edge":"LIKES", "from":"Person", "to":"Comment", "attribute":"creationDate", "buckets":10}' | jq .``````

For best results, compute the histogram data for every attribute referenced in a `WHERE` clause of a `SELECT` statement.

### 2.4. Install query with `-COST` option

Now that the relevant statistics are available to the query optimizer, you can install the query with cost-based optimization. To use the cost-based optimization during installation, use the `-COST` option when installing the query.

For example, to install the query `example3`, run `INSTALL QUERY -COST example3`.

Alternatively, set the `cost_opt` session parameter to true by running `set cost_opt = true` in the GSQL shell. This enables cost-based optimization for the rest of your GSQL session. You no longer need to supply the `-COST` option when installing queries if the `cost_opt` parameter is set to true.

If you missed any cardinality or histogram data, the optimizer warns you about which types are missing, so you can compute the data for any missing type or attribute.