Integrate Workbench with AWS SageMaker

This guide walks you through integrating the ML Workbench with a notebook on AWS SageMaker.

1. Prerequisites

  • A running AWS SageMaker notebook instance.

  • Port 8000 on the TigerGraph instance is accessible from your notebook instance.
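After GDPS is installed (section 3 below), you can verify this connectivity from a SageMaker terminal with a quick check. The address below is a placeholder for your TigerGraph instance; a timeout usually points to a security group or firewall rule blocking port 8000:

$ curl http://<tigergraph-instance-ip>:8000/ping
{"message":"GDPS is running"}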

2. Install kernel

  1. From the Notebook instances page, click Open JupyterLab. Once JupyterLab loads, open a terminal.

  2. From the terminal, run the following command to install the tigergraph-torch Python kernel. Choose the appropriate command depending on whether you are using a CPU or GPU for training:

    • For CPU:

    $ conda env create -f https://raw.githubusercontent.com/TigerGraph-DevLabs/mlworkbench-docs/main/conda_envs/tigergraph-torch-cpu.yml

    • For GPU:

    $ conda env create -f https://raw.githubusercontent.com/TigerGraph-DevLabs/mlworkbench-docs/main/conda_envs/tigergraph-torch-gpu.yml
  3. After the installation finishes, refresh your browser. You should see a new Python kernel named "conda_tigergraph-torch-cpu" or "conda_tigergraph-torch-gpu" on the launch page. This Python kernel includes all packages required for machine learning with TigerGraph.
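If the kernel does not appear after refreshing, a quick sanity check from the terminal is to confirm that the conda environment was created (this assumes conda is available on your PATH, which it normally is in a SageMaker terminal):

$ conda env list | grep tigergraph-torch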

With the kernel installed, the next step is to deploy GDPS on your TigerGraph instance. Because AWS SageMaker runs an older version of JupyterLab, if GDPS is not already installed, you need to install it manually on the TigerGraph server so that the Workbench can communicate with the database.

3. Install GDPS

Before you install GDPS, make sure that port 8000 is open on your machine or Docker container. For Docker containers, make sure that port 8000 is mapped to an appropriate port that the Workbench can connect to.
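For reference, here is a minimal sketch of publishing port 8000 when starting the TigerGraph container; the image name is a placeholder, and the other published ports (RESTPP on 9000, GSQL on 14240) should match however you normally run your container:

$ docker run -d --name tigergraph -p 8000:8000 -p 9000:9000 -p 14240:14240 <tigergraph-server-image>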

To install GDPS, follow these steps:

  1. On the machine that hosts the TigerGraph server (or the Docker container running the TigerGraph Server image), create a directory for GDPS and navigate to it:

mkdir -p tg_gdps && cd tg_gdps

  2. Download GDPS for your operating system to the folder you just created.

  3. Run chmod +x on the file you downloaded to make it executable.

  4. Run the executable to start GDPS with the default configurations.

Install process on Docker container
:~$ mkdir -p tg_gdps && cd tg_gdps

:~/tg_gdps$ wget https://tg-mlworkbench.s3.us-west-1.amazonaws.com/gdps/start_gdps_linux --no-check-certificate

--2022-03-29 22:08:02--  https://tg-mlworkbench.s3.us-west-1.amazonaws.com/gdps/start_gdps_linux
Resolving tg-mlworkbench.s3.us-west-1.amazonaws.com (tg-mlworkbench.s3.us-west-1.amazonaws.com)... 52.219.193.58
Connecting to tg-mlworkbench.s3.us-west-1.amazonaws.com (tg-mlworkbench.s3.us-west-1.amazonaws.com)|52.219.193.58|:443... connected.
WARNING: cannot verify tg-mlworkbench.s3.us-west-1.amazonaws.com's certificate, issued by ‘CN=Amazon,OU=Server CA 1B,O=Amazon,C=US’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 49905264 (48M) [binary/octet-stream]
Saving to: ‘start_gdps_linux’

start_gdps_linux    100%[===================>]  47.59M  7.40MB/s    in 6.7s

2022-03-29 22:08:09 (7.12 MB/s) - ‘start_gdps_linux’ saved [49905264/49905264]

:~/tg_gdps$ chmod +x start_gdps_linux

:~/tg_gdps$ ./start_gdps_linux

INFO:     Started server process [36]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:uvicorn.error:Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:uvicorn.error:Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Test that the installation is working by opening a new terminal and running curl http://0.0.0.0:8000/ping. On a working installation, the response is {"message":"GDPS is running"}.

3.1. GDPS configurations

Here is a list of configurable options for the GDPS service:

tg_host
  Definition: The IP address of the database host. Normally this is http://localhost, since GDPS should run on the same machine as the database. If the database is running on an HTTPS server, use the full address of the server; https://localhost will not work in this case.
  Default: http://localhost

tg_rest_port
  Definition: The port for TigerGraph's RESTPP server.
  Default: 9000

tg_gs_port
  Definition: The port for the GSQL server.
  Default: 14240

tg_cluster_mode
  Definition: Whether the TigerGraph database is running on a cluster.
  Default: False

local_output_path
  Definition: Where GDPS reads the output files from the database.
  Default: /home/tigergraph/output

tg_output_path
  Definition: Where the database generates temporary output files. Normally this is the same as local_output_path. However, if the database is running in a container, this should be the path inside the container that is mounted to local_output_path.
  Default: Same as local_output_path

keep_tmp_files
  Definition: Whether to keep the temporary files that are generated in the temporary output folder while the service is running. We do not recommend keeping these files unless you are debugging.
  Default: False

tg_default_credentials
  Definition: Whether to use the default TigerGraph username and password (tigergraph:tigergraph) to authenticate with the database. If true, any authentication header is ignored.
  Default: False

To configure GDPS, set these options as environment variables on the command that starts GDPS.

For example, if you run the following line:

tg_host=http://127.0.0.1 local_output_path=/home/tigergraph/tmp ./start_gdps_linux

you set tg_host to http://127.0.0.1 and local_output_path to /home/tigergraph/tmp when you start GDPS.
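As a fuller sketch for a containerized database, the line below assumes GDPS runs on the host while the database runs in a container, with the container's /home/tigergraph/output directory mounted to /home/ubuntu/tg_output on the host; both paths are placeholders for your own volume mapping:

local_output_path=/home/ubuntu/tg_output tg_output_path=/home/tigergraph/output ./start_gdps_linux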

4. Next steps

Go to the Git extension on the left navigation panel and clone the repository that contains the tutorials. You are now ready to start following the tutorials and train your first model.
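If you prefer the terminal over the Git extension, cloning with git works the same way. This assumes the tutorials live in the mlworkbench-docs repository referenced in the kernel installation step; substitute your own repository URL if it differs:

$ git clone https://github.com/TigerGraph-DevLabs/mlworkbench-docs.git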