Algorithms as GSQL Queries

Table of Contents

Check for data or schema constraints
Create query
- Create a distributed mode algorithm
Install query
Run query

TigerGraph provides an open-source GitHub repository https://github.com/tigergraph/gsql-graph-algorithms with the full text of each query algorithm written in GSQL. Therefore, the three-step process (CREATE,INSTALL,RUN) to run any query will also work for GDS algorithms.

GraphStudio lets you select the algorithms you’re interested in from a menu and then combines the CREATE and INSTALL steps.
The pyTigerGraph Python library offers even more automated: when you RUN, if will CREATE and INSTALL as needed. See https://github.com/tigergraph/graph-ml-notebooks/tree/main/algos for example scripts.

The algorithms are divided into several categories. Each algorithm has its own folder, where you can find the implementation of the algorithm, any user-defined functions required to install the algorithm, or different versions of the algorithm if they exist.

Some query algorithms rely on subqueries, which need to be created as separate queries and must be installed together.

Any version of TigerGraph can use non-packaged queries.

Check for data or schema constraints

Most algorithm queries in the GDS Library are schema-free, meaning that you are able to run the query on any schema. However, some algorithms have certain schema or data constraints by nature. Make sure to read the documentation for the algorithm to determine the following:

Does the algorithm require edges to be directed/undirected?
Does the algorithm require edges to be weighted/unweighted?
Does the algorithm require any vertex type to have an attribute of a certain data type?
Does the algorithm require your data to have been processed in a certain way before it runs?

For example, k-Nearest Neighbors runs on graphs with either directed or undirected edges, but the edges must have a weight attribute.

Another example is the Fast Random Projection algorithm, which expects the vertex type to have an attribute of type LIST<DOUBLE> if you want to store the embedding results to your graph data.

Create query

If the algorithm you want is not yet installed in your TigerGraph instance, and if you do not use the GraphStudio simplified installation process, then you can install the algorithm as you would add any query to the database.

You can create the query in the following ways:

GSQL Shell
GraphStudio Library Import
GraphStudio Source Code

Locate the query in the GDS Library GitHub repository. It is a .gsql file named after the query.
Copy the entire contents of the query file, which is the command to create the query, and paste it into a file on the machine running TigerGraph.
Log in to the GSQL shell as a user with query writing privileges for the graph on which you want to create the query.
Run @<file path> from the GSQL shell, and replace <file path> with the absolute path to the file where you copied the query. For example, if your filepath is /home/tigergraph/query/pagerank.gsql, run @/home/tigergraph/query/pagerank.gsql from the GSQL shell.

This method is easier than copying and pasting the source code, but not all queries, subqueries, and query variants are available using this method.

In the Query Editing Panel in GraphStudio, click the New GSQL Query button (a circular green button with a plus sign +) and click Choose from library in the next window that pops up. Select any queries from the list that you would like to install.

This method is more complex than the library import, but is suitable for all queries, subqueries, and query variants, including any library algorithms that may not yet be available via the library import method.

Saving a query in GraphStudio does not create the query in GSQL.

Locate the query in the GDS Library GitHub repository. It is a .gsql file named after the query.
Copy the entire content of the query file. This is the command to create the query.
Log in to GraphStudio as a user with query writing privileges for the graph on which you want to create the query.
Click Global View in the top-left corner and choose the graph to use.
Click Write Queries on the left side navigation. Click + to add a new query and enter the query name. This name must be the same as the name in the CREATE QUERY command.
Paste the CREATE QUERY command into the query and save the query.

Create a distributed mode algorithm

If you are running the query on a multi-node cluster, you should consider modifying the query to be in distributed mode, before you execute CREATE QUERY. In general, distributed mode is likely to improve the performance of a query if the query meets the following conditions:

The query starts at a very large set of starting point vertices.
The query performs many hops.

For example, algorithms that compute a value for every vertex or one value for the entire graph should use distributed mode. This includes PageRank, Centrality, and Connected Component algorithms.

To change a query definition from single-server (default) to distribution, edit the first line of the query. Change the CREATE QUERY at the beginning to CREATE DISTRIBUTED QUERY.

Install query

Installing a query allows the algorithm query to access all features offered by the GSQL Query language. It also increases the performance of the query.

While CREATE QUERY can only be run as a GSQL command or with the GraphStudio visual interface, you have four options for installing query.

Suppose we want to install tg_pagerank.

GSQL Shell
GraphStudio Library Import
GraphStudio Source Code
REST endpoint
pyTigerGraph python

INSTALL QUERY tg_pagerank

If you used this method to select algorithms, then they shold already be install. Check the list of queries in GraphStudio to see if they indeed got added and installed.

If you created the query by copying and pasting the code into the query editing panel, then use GraphStudio’s Save Query and Install Query buttons.

  curl <user_credentials> GET 'http://locahost:8123/gsql/queries/install?graph=MyGraph&queries=tgPageRank

See Built-in Endpoints: Install a query for more details.

The install step is usually not needed as a separate step when using pyTigerGraph, because when you run runAlgorithm(), the Featurizer will automatically install the algorithm if needed. Moreover, if the algorithm has not been added to TigerGraph yet, the runAlgorithm() will first fetch the latest version of the algorithm from TigerGraph’s public GitHub repository that is compatible with your database.

You only need to use installAlgorithm() if you have a customized algorithm or what to use a different version of the algorithm.

Locate the algorithm’s .gsql in the GDS Library GitHub repository.
Copy the file to a location assessible to your Python code.
Assuming you have already created a database connection object called conn, instantiate a Featurizer object and then invoke its installAlgorithm() method.

f = conn.gds.featurizer()
f.installAlgorithm("tg_pagerank", query_path="<path_to_query_file>")

See Featurizer for more details.

Run query

Once the query has been installed, you can run the query on your graph data. Installing a query also creates a REST endpoint you can use to call the query.

GSQL Shell
GraphStudio
REST endpoint
pyTigerGraph python

RUN QUERY tg_pagerank({"v_type":"Person", "e_type":"Likes"})

Run the algorithm as you would run any query in GraphStudio.

Create a file containing the input parameters in JSON format:

pagerank_param.txt

{"v_type":"Person", "e_type":"Likes"}

  curl <user_credentials> POST --data-binary @./pagerank_param.txt "http://localhost:9000/query/MyGraph/tg_pagerank"

See Built-in Endpoints: Run an installed query for more details.

runAlgorithm() not only runs an algorithm. It will first fetch an algorithm from TigerGraph’s public GitHub repository, if needed, and install an algorithm, if needed.

f = conn.gds.featurizer()
f.runAlgorithm("tg_pagerank", {"v_type":"Person", "e_type":"Likes"})

See Featurizer for more details.