CREATE/INTERPRET/INSTALL/RUN QUERY
Query Privileges
Users with the WRITE_QUERY
privilege can create, install, and drop queries.
Users with the READ_QUERY
privilege on the global scope or on a given graph can run the queries for that graph.
To implement fine-grained control over which queries can be executed by which sets of users:
Group your queries into your desired privilege groups.
Define a graph for each privilege group. These graphs can all have the same domain if you wish.
Create your queries, assigning each to its appropriate privilege group.
A GSQL query is a sequence of data retrieval-and-computation statements executed as a single operation. Users can write queries to explore a data graph however they like, to read and make computations on the graph data along the way, to update the graph, and to deliver resulting data. A query is analogous to a user-defined procedure or function: it can have one or more input parameters, and it can produce output in two ways: by returning a value or by printing. A query can be run in one of three ways:
Define and run an unnamed query immediately:
INTERPRET QUERY
: execute the query's statements
Alternately, there is also a built-in REST++ endpoint to interpret a query string:
POST /gsqlserver/interpreted_query
See the RESTPP API User Guide for details.Define a named query and then run it.
CREATE QUERY
: define the functionality of the queryINTERPRET QUERY
: execute the query with input values
Define a named query, compile it to optimize performance, and then run it.
CREATE QUERY
: define the functionality of the queryINSTALL QUERY
: compile the queryRUN QUERY
: execute the query with input values
There are some limitations to Interpreted Mode. See the section on INTERPRET QUERY
and the appendix section Interpreted GSQL Limitations.
CREATE QUERY
CREATE QUERY
CREATE QUERY
defines the functionality of a query on a given graph schema.
A query has a name, a parameter list, the name of the graph being queried, an optional RETURNS
type (see Section "RETURN
Statement" for more details), optional specifiers for the output API and the language syntax version, and a body. The body consists of an optional sequence of typedefs
, followed by an optional sequence of declarations, then followed by one or more statements. The body defines the behavior of the query.
Dynamic Query Support
As of TigerGraph 3.0+, FOR GRAPH graphName
is optional, as long as the graph has been specified already, either when entering gsql:
GSQL -g graphName [<gsql_command>]
or once inside the GSQL shell, by using the USE GRAPH graphName
command.
This is one aspect of Dynamic Querying.
If the optional keywords OR REPLACE
are included, then this query definition, if error-free, will replace a previous definition with the same query name. The new query will not be installed. That is,
CREATE OR REPLACE QUERY name
acts like
DROP QUERY name
CREATE QUERY name
However, if there are any errors in this query definition, then the previous query definition will be maintained. If the OR REPLACE
option is not used, then GSQL will reject a CREATE QUERY
command that uses an existing name.
The DISTRIBUTED
option applies only to installations where the graph has been distributed across a cluster. If specified, the query will run with a different execution model which may give better performance for queries that traverse a large portion of the cluster. For details, see Distributed Query Mode.
Typedefs allow the definition of custom types for use within the body. The declarations support the definition of accumulators (see Chapter "Accumulators" for more details) and global/local variables. All accumulators and global variables must be declared before any statements. There are various types of statements that can be used within the body. Typically, the core statement(s) in the body of a query is one or more SELECT
, UPDATE
, INSERT
, DELETE
statements. The language supports conditional statements such as an IF
statement as well as looping constructs such as WHILE
and FOREACH
. It also supports calling functions, assigning variables, printing, and modifying the graph data.
The query body may include calls to other queries. That is, the other queries are treated as subquery functions. See the subsection on "Queries as Functions".
Required privilege
WRITE_QUERY
Query parameter and return types
This table lists the supported data types for input parameters and return values.
Dynamic querying
TigerGraph 3.0+ supports Dynamic Querying. This means the query can be written and installed as a saved procedure without referencing a particular graph. Schema details -- the name of the graph, vertex types, edge types, and attributes -- can all be parameterized and only need to be specified at run time.
Here are the ingredients for a dynamic query:
Graph name: When creating a query,
FOR GRAPH graphName
is optional, as long as the graph has been specified already, either when entering gsql:GSQL -g graphName [<gsql_command>]
or once inside the GSQL shell, by using theUSE GRAPH graphName
command.Vertex type and edge type in
SELECT
statements. Typically, theFROM
clause mentions the name of specific vertex types and edge types. String or string set parameters can be used for edge and target types instead.Attribute names. The
getAttr
and setAttr functions, which take attribute name and data type as string parameters, can be used to parameterize attribute access.INSERT
statements: If you are usingINSERT
to add data to your graph, you need to specify what type of vertex or edge you want to add. This can also be parameterized.
Here is a simple example to demonstrate how to apply Dynamic GSQL Query techniques. Here is the PageRank algorithm from our GSQL Graph Algorithm library. Here is it written with schema information embedded statically in the query:
graph name = social
vertex type = Page
edge type = Link
vertex attribute = Score
Here is the same algorithm written in Dynamic Querying style:
Statement types
A statement is a standalone instruction that expresses an action to be carried out. The most common statements are data manipulation language (DML) statements. DML statements include the SELECT
, UPDATE
, INSERT INTO
, DELETE FROM
, and DELETE
statements.
A GSQL query has two levels of statements. The upper-level statement type is called query-body-level statement, or query-body statement for short. This statement type is part of either the top-level block or a query-body control flow block. For example, each of the statements at the top level directly under CREATE QUERY
is a query-body statement. If one of the statements is a CASE
statement with several THEN
blocks, each of the statements in the THEN
blocks is also a query-body statement. Each query-body statement ends with a semicolon.
The lower-level statement type is called DML-sub-level statement or DML-sub-statement for short. This statement type is used inside certain query-body DML statements, to define particular data manipulation actions. DML-sub-statements are comma-separated. There is no comma or semicolon after the last DML-sub-statement in a block. For example, one of the top-level statements is a SELECT
statement, each of the statements in its ACCUM
clause is a DML-sub-statement. If one of those DML-sub-statements is a CASE
statement, each of the statement in the THEN
blocks is a DML-sub-statement.
There is some overlap in the types. For example, an assignment statement can be used either at the query-body level or the DML-sub-level.
Guidelines for understanding statement type hierarchy:
Top-level statements are Query-Body type (each statement ending with a semicolon).
The statements within a DML statement are DML-sub statements (comma-separated list).
The blocks within a Control Flow statement have the same type as the entire Control Flow statement itself.
Here is a descriptive list of query-body statements:
Here is a descriptive list of DML-sub-statements:
INTERPRET QUERY
INTERPRET QUERY
INTERPRET QUERY
runs a query by translating it line-by-line. This is in contrast to the 2-step flow: (1) INSTALL
to pre-translate and optimize a query, then (2) RUN
to execute the installed query. The basic trade-off between INTERPRET QUERY
and INSTALL/RUN QUERY
is as follows:
INTERPRET
:Starts running immediately but may take longer to finish than running an installed query.
Suitable for ad hoc exploration of a graph or when developing and debugging an application, and rapid experimentation is desired.
Supports most but not all of the features of the full GSQL query language. See the Appendix section Interpreted GSQL Limitations.
INSTALL
/RUN
:Takes up to a minute to
INSTALL
.Runs faster than
INTERPRET
, from only a few percent faster to twice as fast.Should always be used for production environments with fixed queries.
There are two GSQL syntax options for Interpreted GSQL: Immediate mode and Saved-query mode. In addition there is also a predefined RESTful endpoint for running interpreted GSQL: POST /gsqlserver/interpreted_query
. The query body is sent as the payload of the request. The syntax is like the Immediate query option, except that it is possible to provide parameters, using the query string of the endpoint's request URL. The example below shows a parameterized query using the POST /gsqlserver/interpreted_query
endpoint. For more details, see the RESTPP API User Guide.
Required privilege
READ_DATA
if the query does not include any updates to the graph.WRITE_DATA
if the query includes updates to the graph.
Immediate mode: define and interpret
This syntax is similar in concept to SQL queries. Queries are not named, do not accept parameters, and are not saved after being run. Syntax differences from compiled GSQL:
The keyword
CREATE
is replaced withINTERPRET
.The query is executed immediately by the
INTERPRET
statement. TheINSTALL
andRUN
statements are not used.Parameters are not accepted.
Compare the example below to the example in the Create Query
section:
No query name, no parameters, no
RETURN
statement.Because no parameter is allowed, the parameter
uid
is set within the query.
Interpret a created query
This syntax is like RUN
query, except
The keyword
RUN
is replaced withINTERPRET
.Some options may not be supported.
INSTALL QUERY
INSTALL QUERY
INSTALL QUERY
installs a query or multiple queries on a graph. Installing a query compiles the procedures described by the query as well as generates a REST endpoint for running the query. Installing a query will enhance the performance of the query.
A query must be installed before it can be executed through RUN QUERY
. The INSTALL QUERY
command will install the queries specified, with query names separated by a comma.
If a query calls a subquery, the subquery must be installed before the query that calls the subquery or installed in the same INSTALL QUERY
command as the query that calls the subquery.
When a single INSTALL QUERY
command installs multiple queries, the command takes effect only if all queries are installed successfully. If one query fails to be installed, the rest of the queries specified in the command will not be installed either.
To install a query, the user needs to have the role globaldesigner
or superuser
globally, or have one of the following roles on the graph where the query is to be installed:
querywriter
designer
admin
Users can also install all uninstalled queries with INSTALL QUERY
, using either of the following commands:
INSTALL QUERY *
INSTALL QUERY ALL
Installing takes several seconds for each query. The current version does not support concurrent installation and running of queries. Other concurrent graph operations will be delayed until the installation finishes.
Required privilege
WRITE_QUERY
Options for INSTALL QUERY
INSTALL QUERY
The following options are available:
Optimize installed queries
INSTALL QUERY -OPTIMIZE
Users can run INSTALL QUERY -OPTIMIZE
to optimize all installed queries. The names of the individual queries are not needed. This operation optimizes all previously installed queries, reducing their run times by about 20%. Optimize a query if query run time is more important to you than query installation time.
RUN QUERY
Installing a query creates a REST++ endpoint. Once a query is installed, there are two ways of executing a query: through the GSQL RUN QUERY
command or an HTTP request.
Required privilege
READ_DATA
for queries that don't update the graphWRITE_DATA
for queries that update the graph
Running a query as a REST endpoint
Application developers may find it more convenient to run queries by sending an HTTP request to a REST endpoint. Installed queries have an endpoint:
https://server_ip:9000/query/<graphName>/<queryName>
If the REST++ server is local, then server_ip
is localhost. The request can use either the GET
or POST
method. The query parameter values are either included directly in the query string of the HTTP request's URL or supplied using a data payload. The basic format for the parameters as a query string is:
param1=value¶m2=value¶m3=value
The following two curl commands are each equivalent to the RUN QUERY
command above. The first gives the parameter values in the query string in a URL. This example illustrates the simple format for primitive data types such as INT
, DOUBLE
, and STRING
. The second gives the parameter values through the curl command's data payload -d
option.
where RunQueryExPara.dat
has the exact string as the query string in the first URL.
To see a list of the parameter names and types for the user-installed GSQL queries, run the following REST++ request:
curl -X GET "http://localhost:9000/endpoints?dynamic=true"
By using the data payload option, the user can avoid using a long and complex URL, as well as keep their parameters more secure. In fact, to call the same query but with different parameters, only the data payload file contents need to be changed; the HTTP request can be the same. The file loader loads the entire file, appends multiple lines into one, and uses the resulting string as the URL query string. If both a query string and a data payload are given (which we strongly discourage), both are included, where the URL query string's parameter values overwrite the values given in the data payload.
Some curl options may be either required or recommended for security, user authentication, or error messages. Please see RESTPP Requests for general advice on using TigerGraph REST endpoints.
Complex Type Parameter Passing
This subsection describes how to format the complex type parameter values when executing a query by RUN QUERY
or HTTP requests. More details about all parameter types are described in Section "Query Parameter Types".
When square brackets are used in a curl URL, the -g option or escape characters must be adopted. If the parameters are given by data payload (either by file or data payload string), the -g
option is not needed and escape characters should not be used.
Below are examples.
Payload size limit
This data payload option can accept a file up to 128MB by default. To increase this limit to xxx MB, use the following command:
The upper limit of this setting is 1024 MB. Raising the size limit for the data payload buffer reduces the memory available for other operations, so be cautious about increasing this limit.
For more detailed information about REST++ endpoints and requests, see the RESTPP API User Guide.
The following options are available when running a query:
All-vertex mode -av option
Some queries run with all or almost all vertices in a SELECT statement s, e.g. PageRank algorithm. In this case, the graph processing engine can run much more efficiently in all-vertex mode. In the all-vertex mode, all vertices are always selected, and the following actions become ineffective:
Filtering with selected vertices or vertex types. The source vertex set must be all vertices.
Filtering with the WHERE clause.
Filtering with the HAVING clause.
Assigning designated vertex or designated type of vertexes. E.g. X = { vertex_type .*}
To run the query in all-vertex mode, use the -av option in shell mode or include __GQUERY__USING_ALL_ACTIVE_MODE=true
in the query string of an HTTP request.
Detached mode -async option
Typically, the GSQL RUN QUERY
command runs in the foreground and does not produce output until the query completes, which is inconvenient in the case of long-running queries. Starting with TigerGraph 3.1, you can run queries in Detached Mode to enable background execution of long-running queries.
Queries executed in Detached mode are still subject to the system timeout limit. The default timeout limit is 16 seconds and can be set using theGSQL-TIMEOUT
header.
To run a query in Detached Mode from the command line, use the-async
option for theRUN QUERY
command:
You will receive a JSON response immediately containing a query ID (request_id
):
To run queries in Detached Mode via RESTPP endpoint call, use the GSQL-ASYNC
header and set its value to true. If the query takes parameters, put them in the query string:
To check the status and results of the queries executed in Detached Mode, use the /query_status
and the /query_result
RESTPP endpoints.
Diagnose -d option
The Diagnose -d Option is currently not supported in TigerGraph 3.x.
The diagnose option can be turned on in order to produce a diagnostic monitoring log, which contains the processing time of each SELECT block. To turn on the monitoring log, use the -d option in shell mode or __GQUERY__monitor=true
in the query string of an HTTP request.
The path of the generated log file will be shown as a part of output message. An example log is shown below:
GSQL Query output format
The standard output of GSQL queries is in industry-standard JSON format. A JSON object is an unordered set of key-value pairs, enclosed in curly braces. Among the acceptable data types for a JSON value are array and object. A JSON array is an ordered list of values, enclosed in square brackets. Since values can be objects or arrays, JSON supports hierarchical, nested structures. Strings are enclosed in double quotation marks. We also use the term field to refer to a key (or a key-value pair) of a given object.
At the top level of the JSON structure are four required fields ("version", "error", "message", and "results") and one dependent field ("code"). If a query is successful, the value of "error" will be "false", the "message" value will be empty, and the "results" value will be the intended output of the query. If an error or exception occurred during query execution, the "error" value will be "true", the "message" value will be a string message describing the error condition, and the "results" field will be empty. Also, the "code" field will contain an error code.
Beginning with version 2 (v2) of the output specification, an additional top-level field is required: "version"
. The "version"
value is an object with the following fields:
Other top-level objects, such as "code" may appear in certain circumstances. Note that the top-level objects are enclosed in curly braces, meaning that they form an unordered set. They may appear in any order.
Below is an example of the output of a successful query:
The value of the "results" key-value pair is a sequential list of the data objects specified by the PRINT statements of the query. The list order follows the order of PRINT execution. The detailed format of the PRINT statement results is described in Output Statements and FILE Objects.
The following REST response misspells the name of the endpoint
and generates the following output:
Changing the default output API
The following GSQL statement can be used to set the JSON output API configuration.
This statement sets a persistent system parameter. Each version of the TigerGraph platform is pre-configured to what was the latest output API that at the time of release. For example, platform version 1.1 is configured so that each query will produce v2 output by default.
As of TigerGraph v3.0, the only supported JSON API is "v2".
SHOW QUERY
To show the GSQL text of a query, run SHOW QUERY query_name
. The query_name
argument can use *
or ?
wildcards from Linux globbing, or it can be a regular expression when preceded by -r
. See SHOW: View Parts of the Catalog
Additionally, the ls
GSQL command lists all created queries and identifies which queries have been installed.
DROP QUERY
To drop a query, run DROP QUERY query_name
. The query will be uninstalled (if it has been installed) and removed from the dictionary. The GSQL language will refuse to drop an installed query if another query is installed which calls query Q. That is, all calling queries must be dropped before or at the same time that their called subqueries are dropped.
To drop all queries, either of the following commands can be used:
DROP QUERY ALL
DROP QUERY *
The scope of ALL depends on the user's current scope. If the user has set a working graph, then DROP ALL removes all the jobs for that graph. If a superuser has set their scope to be global, then DROP ALL removes all jobs across all graph spaces.
Last updated