Writing Output to Cloud Storage
This guide shows you how to configure TigerGraph to write query results to AWS S3 or S3-compatible storage systems, such as Ceph, MinIO, or Wasabi.
To write to an S3 bucket, you need:
- Access Key ID
- Secret Access Key
- Endpoint URL (for S3-compatible systems)
Configure the S3 Connection
You can configure S3 access in three ways:
- Cluster-wide using gadmin
- Per-session using GSQL parameters
- Via RESTPP headers
Choose the method based on whether you want settings to apply to all users or just your session.
Use gadmin Config (Cluster-Wide)
Set S3 credentials for the entire TigerGraph cluster.
- Open your terminal.
- Run these commands:
    gadmin config set GPE.QueryOutputS3AWSAccessKeyID <your-access-key-id>
    gadmin config set GPE.QueryOutputS3AWSSecretAccessKey <your-secret-access-key>
    gadmin config apply -y
    gadmin restart gpe -y
- Replace <your-access-key-id> and <your-secret-access-key> with your S3 credentials.
Note: gadmin supports only the Access Key ID and Secret Access Key. It doesn’t support an endpoint URL, so for S3-compatible systems like Ceph, set the endpoint using GSQL session parameters or RESTPP headers.
Use GSQL Session Parameters
Set credentials and the endpoint URL for your current GSQL session. This method suits users working with their own buckets or testing different S3-compatible systems.
- Open the GSQL shell.
- Run these commands:
    SET s3_aws_access_key_id = "<your-access-key-id>"
    SET s3_aws_secret_access_key = "<your-secret-access-key>"
    SET s3_endpoint = "http://<your-s3-host>"
- Replace the placeholders:
  - <your-access-key-id>: Your S3 access key.
  - <your-secret-access-key>: Your S3 secret key.
  - <your-s3-host>: Your S3-compatible server address. The full s3_endpoint value must include http:// or https:// (for example, http://192.168.99.1:8080 for Ceph).
Use RESTPP Headers (API Calls)
Set S3 parameters in HTTP headers for queries run via RESTPP, such as with curl. This method does not require setting parameters in a GSQL session and is ideal for API integrations or one-off queries.
- Open your terminal.
- Run a curl command:
    curl -X GET \
      -H "GSQL-S3AWSAccessKeyId: <your-access-key-id>" \
      -H "GSQL-S3AWSSecretAccessKey: <your-secret-access-key>" \
      -H "GSQL-S3Endpoint: http://<your-s3-host>" \
      'http://<tigergraph-host>:14240/restpp/query/<graph-name>/<query-name>?param=<s3-path>'
- Replace the placeholders:
  - <your-access-key-id>: Your S3 access key.
  - <your-secret-access-key>: Your S3 secret key.
  - <your-s3-host>: Your S3-compatible server address. The full GSQL-S3Endpoint value must include http:// or https://.
  - <tigergraph-host>: Your TigerGraph server address (for example, 10.244.106.233).
  - <graph-name>: Your graph name.
  - <query-name>: Your query name.
  - <s3-path>: The S3 path.
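For repeated API calls, one option is to keep the credentials in environment variables rather than typing them into each command. The following is a minimal sketch under stated assumptions, not an official client: the function names restpp_url and run_query and the variable names S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, and S3_ENDPOINT are hypothetical, and it assumes RESTPP accepts a URL-encoded query parameter (curl's -G and --data-urlencode options do the encoding).

```shell
#!/bin/sh
# Hypothetical helpers; the function and variable names are assumptions,
# only the header names, port, and URL layout come from the guide.

# Build the RESTPP query URL from host, graph name, and query name.
restpp_url() {
    printf 'http://%s:14240/restpp/query/%s/%s\n' "$1" "$2" "$3"
}

# Run a query with the S3 settings passed as HTTP headers.
# $1 = TigerGraph host, $2 = graph name, $3 = query name,
# $4 = query parameter in the form "param=<s3-path>"
# The ${VAR:?message} form stops with an error if a variable is unset.
run_query() {
    curl -G \
        -H "GSQL-S3AWSAccessKeyId: ${S3_ACCESS_KEY_ID:?set S3_ACCESS_KEY_ID}" \
        -H "GSQL-S3AWSSecretAccessKey: ${S3_SECRET_ACCESS_KEY:?set S3_SECRET_ACCESS_KEY}" \
        -H "GSQL-S3Endpoint: ${S3_ENDPOINT:?set S3_ENDPOINT}" \
        --data-urlencode "$4" \
        "$(restpp_url "$1" "$2" "$3")"
}
```

With the three variables exported, a call such as run_query <tigergraph-host> <graph-name> <query-name> "param=<s3-path>" sends the same headers as the curl command above.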
Unique File Paths to Avoid Conflicts
Since S3 is a shared storage system, multiple nodes in a cluster can upload to the same S3 bucket. To avoid naming conflicts, where multiple nodes try to write to the same file, the S3 path includes a prefix based on the instance name and, for distributed queries, a role suffix:
- Instance Name: A prefix like GPE_{PartitionId}_{ReplicaId} ensures uniqueness by identifying the instance that generated the output.
- Role: For distributed queries, an additional suffix differentiates the coordinator and worker roles on the same GPE:
  - Coordinator: The node managing the query (written as .coordinator).
  - Worker: The node processing the query (written as .worker).
So, your S3 file paths might look like:
- GPE_{PartitionId}_{ReplicaId}.coordinator
- GPE_{PartitionId}_{ReplicaId}.worker
Example
Consider a scenario where a 3 x 2 cluster is executing a distributed query, and the results are being saved to a file called queryResults.
In this case, the cluster has 3 partitions, each with 2 replicas, and the query is distributed across multiple nodes.
The output files generated by the query would be named as follows:
- GPE_0_0.worker.queryResults.csv: the output from the worker node in partition 0, replica 0.
- GPE_0_1.coordinator.queryResults.csv: the output from the coordinator node in partition 0, replica 1.
These unique file names help ensure that no conflicts occur when multiple nodes are writing query outputs at the same time.
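When looking for output in a bucket, it can help to list the names that might appear. The sketch below applies the GPE_{PartitionId}_{ReplicaId}.{role}.{file} pattern from the example above; the helpers output_name and list_worker_names are hypothetical, and the sketch assumes the file name already carries its .csv extension. Which instances actually write a file depends on which replicas run the query, so treat the list as candidates, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: builds the name one GPE instance would use.
# $1 = partition id, $2 = replica id, $3 = role (coordinator or worker),
# $4 = output file name (assumed to include its extension)
output_name() {
    printf 'GPE_%s_%s.%s.%s\n' "$1" "$2" "$3" "$4"
}

# Print every candidate worker file name for a cluster.
# $1 = number of partitions, $2 = replicas per partition, $3 = file name
list_worker_names() {
    p=0
    while [ "$p" -lt "$1" ]; do
        r=0
        while [ "$r" -lt "$2" ]; do
            output_name "$p" "$r" worker "$3"
            r=$((r + 1))
        done
        p=$((p + 1))
    done
}

# Candidate worker names for the 3 x 2 cluster in the example.
list_worker_names 3 2 queryResults.csv
```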
Troubleshoot Common Issues
- “Cannot determine target” error: You didn’t set s3_endpoint for an S3-compatible system. Add it in session parameters or RESTPP headers.
- Connection fails on Ceph, MinIO, or Wasabi: Verify the endpoint URL includes http:// or https://. Check your bucket permissions and credentials.
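The endpoint check can be done before running a query. The following is a minimal sketch of such a pre-flight check; has_scheme is a hypothetical helper, not part of TigerGraph, and it only confirms that the value starts with http:// or https:// followed by at least one character. It does not test connectivity, permissions, or credentials.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only when the endpoint
# starts with http:// or https:// and has something after the scheme.
has_scheme() {
    case "$1" in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try one endpoint with a scheme and one without.
for endpoint in "http://192.168.99.1:8080" "192.168.99.1:8080"; do
    if has_scheme "$endpoint"; then
        echo "ok: $endpoint"
    else
        echo "missing http:// or https://: $endpoint"
    fi
done
```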