Integrate Workbench with AWS SageMaker
This guide walks you through integrating the ML Workbench with a notebook on AWS SageMaker.
1. Prerequisites
- A running AWS SageMaker notebook instance.
- Port 8000 on the TigerGraph instance is accessible from your notebook instance.
2. Install kernel
- From the Notebook instances page, click Open JupyterLab to navigate to JupyterLab and open a terminal.
- From the terminal, run the following command to install the tigergraph-torch Python kernel. Choose the appropriate command depending on whether you are using a CPU or GPU for training:
- After the installation finishes, refresh your browser. You should see a new Python kernel named "conda_tigergraph-torch-cpu" or "conda_tigergraph-torch-gpu" on the launch page. This Python kernel includes all packages required for machine learning with TigerGraph.
After installing the kernel, the next step is to deploy GDPS on your TigerGraph instance. Because AWS SageMaker runs an older version of JupyterLab, if GDPS is not already installed you need to install it manually on the TigerGraph server so that the Workbench can communicate with the database.
3. Install GDPS
Before you install GDPS, make sure that port 8000 is open on your machine or Docker container. For Docker containers, make sure that port 8000 is mapped to an appropriate port that the Workbench can connect to.
To install GDPS, follow these steps:
- On the machine that hosts the TigerGraph server (or the Docker container running the TigerGraph Server image), create a directory for GDPS and navigate to it: mkdir -p tg_gdps && cd tg_gdps
- Download GDPS for your operating system to the folder you just created.
- Run chmod +x on the file you downloaded to make it executable.
- Run the executable to start GDPS with the default configurations.
:~$ mkdir -p tg_gdps && cd tg_gdps
:~/tg_gdps$ wget https://tg-mlworkbench.s3.us-west-1.amazonaws.com/gdps/start_gdps_linux --no-check-certificate
--2022-03-29 22:08:02-- https://tg-mlworkbench.s3.us-west-1.amazonaws.com/gdps/start_gdps_linux
Resolving tg-mlworkbench.s3.us-west-1.amazonaws.com (tg-mlworkbench.s3.us-west-1.amazonaws.com)... 52.219.193.58
Connecting to tg-mlworkbench.s3.us-west-1.amazonaws.com (tg-mlworkbench.s3.us-west-1.amazonaws.com)|52.219.193.58|:443... connected.
WARNING: cannot verify tg-mlworkbench.s3.us-west-1.amazonaws.com's certificate, issued by ‘CN=Amazon,OU=Server CA 1B,O=Amazon,C=US’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 49905264 (48M) [binary/octet-stream]
Saving to: ‘start_gdps_linux’
start_gdps_linux 100%[===================>] 47.59M 7.40MB/s in 6.7s
2022-03-29 22:08:09 (7.12 MB/s) - ‘start_gdps_linux’ saved [49905264/49905264]
:~/tg_gdps$ chmod +x start_gdps_linux
:~/tg_gdps$ ./start_gdps_linux
INFO: Started server process [36]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO:uvicorn.error:Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:uvicorn.error:Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
To test the installation, open a new terminal instance and run the following command:
curl http://0.0.0.0:8000/ping
A working installation responds with {"message":"GDPS is running"}.
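The same check can be repeated automatically, for instance while GDPS is still starting up or when verifying from the notebook instance that port 8000 is reachable. The following is a minimal sketch, not part of GDPS itself: the function name wait_for_gdps, the retry count, and the delay are illustrative assumptions, and only the /ping endpoint and its response text come from the steps above.

```shell
#!/bin/sh
# Poll the GDPS /ping endpoint until it reports that the service is running.
# Usage: wait_for_gdps [base_url] [max_attempts] [delay_seconds]
# All three parameters are illustrative; adjust them to your environment.
wait_for_gdps() {
    base_url="${1:-http://127.0.0.1:8000}"  # assumption: replace with your TigerGraph host
    max_attempts="${2:-10}"
    delay="${3:-2}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # --silent hides the progress meter; --max-time bounds each request.
        response=$(curl --silent --max-time 5 "$base_url/ping") || response=""
        case "$response" in
            *"GDPS is running"*)
                echo "GDPS is reachable at $base_url"
                return 0
                ;;
        esac
        echo "Attempt $attempt/$max_attempts: no valid response yet" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "GDPS did not respond at $base_url" >&2
    return 1
}
```

After defining the function, calling wait_for_gdps with the address of your TigerGraph host and port 8000 from the notebook terminal confirms both that GDPS is up and that the port is open between the two machines.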
3.1. GDPS configurations
Here is a list of configurable options for the GDPS service:
| Option | Definition | Default |
|---|---|---|
| | The IP address of the database host. | Normally it is If the database is running on an HTTPS server, use the full address of the server. |
| | The port for TigerGraph’s RESTPP server. | |
| | The port for the GSQL server. | |
| | Whether the TigerGraph database is running on a cluster. | |
| | Where GDPS can read the output files from the database. | |
| | Where to generate temporary output files from the database. | Normally this is the same as |
| | Whether to keep temporary files. Temporary files are generated in the temporary output folder while the service is running. We do not recommend keeping these files unless you are debugging. | |
| | Whether to use the default TigerGraph username and password ( | |
To configure GDPS, set the configuration values through environment variables before starting GDPS.
For example, the following command sets tg_host to http://127.0.0.1 and local_output_path to /home/tigergraph/tmp when it starts GDPS:
tg_host=http://127.0.0.1 local_output_path=/home/tigergraph/tmp ./start_gdps_linux
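If you restart GDPS often, it may be convenient to keep these settings in a small wrapper so they are not retyped each time. The sketch below is one possible way to do that: the function name start_gdps is hypothetical, and the two values are simply the ones from the example above, not recommended settings.

```shell
#!/bin/sh
# Start GDPS with its configuration collected in one place.
# Usage: start_gdps [path_to_executable]
start_gdps() {
    # Defaults to the executable downloaded in the installation steps.
    gdps_bin="${1:-./start_gdps_linux}"
    # env sets the variables only for the GDPS process, so the calling
    # shell's own environment is left unchanged.
    env \
        tg_host="http://127.0.0.1" \
        local_output_path="/home/tigergraph/tmp" \
        "$gdps_bin"
}
```

Because the variables are passed through env, they behave like the inline assignments shown earlier: they apply to that one GDPS process and do not persist in the terminal session.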
4. Next steps
Go to the Git extension on the left navigation panel and clone the repository that contains the tutorials. You are now ready to start following the tutorials and train your first model.