TigerGraph ML Workbench

TigerGraph Machine Learning (ML) Workbench is a Jupyter-based Python development framework offered either as a fully-managed, on-cloud offering or as an integrated on-prem solution for direct integration with existing machine learning infrastructure.

It is designed for data scientists and AI/ML practitioners to easily develop graph-based machine learning models with production-scale graph data stored in TigerGraph. It provides robust and efficient data pipelines at the Python level to interact with the TigerGraph database to perform common data processing functions such as:

  • data partitioning

  • subgraph sampling

  • graph feature generation

  • data batching and/or streaming for ML model development, training, and inference purposes.
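To make two of these operations concrete, the following is a minimal pure-Python sketch of subgraph sampling and batching on a toy in-memory graph. It does not use the pyTigerGraph API: the names `GRAPH`, `sample_neighborhood`, and `batches`, and the adjacency-list layout, are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, Iterator, List, Set

# Toy adjacency list standing in for graph data held in a database (assumption).
GRAPH: Dict[int, List[int]] = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3, 4],
    3: [1, 2],
    4: [2],
}


def sample_neighborhood(
    graph: Dict[int, List[int]],
    seeds: List[int],
    num_hops: int,
    fanout: int,
    rng: random.Random,
) -> Set[int]:
    """Return the seeds plus up to `fanout` sampled neighbors per vertex per hop."""
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for v in frontier:
            neighbors = graph.get(v, [])
            k = min(fanout, len(neighbors))
            for u in rng.sample(neighbors, k):
                if u not in visited:
                    visited.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return visited


def batches(vertices: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `vertices` with at most `batch_size` items."""
    for start in range(0, len(vertices), batch_size):
        yield vertices[start:start + batch_size]


rng = random.Random(0)
for seed_batch in batches(sorted(GRAPH), batch_size=2):
    subgraph = sample_neighborhood(GRAPH, seed_batch, num_hops=2, fanout=2, rng=rng)
    print(seed_batch, "->", sorted(subgraph))
```

Capping the number of sampled neighbors per hop keeps each batch's subgraph bounded in size, which is the usual reason for sampling when the full graph is too large to process at once.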

TigerGraph ML Workbench on Cloud also provides an end-to-end solution from development to deployment including a fully-managed Jupyter notebook instance with:

  • on-demand computation

  • distributed AutoML for hyperparameter tuning

  • ML pipeline management & orchestration

  • model deployment and serverless production serving.

TigerGraph ML Workbench is designed to be infrastructure- and ML framework-agnostic. It is compatible with popular ML frameworks such as PyTorch Geometric, DGL, and TensorFlow. It can be plugged into your existing on-prem or private cloud infrastructure, or used in the cloud with Amazon SageMaker, Google Vertex AI, or Microsoft Azure.

High-level architecture

The TigerGraph ML Workbench contains three major components:

  • TigerLab, a JupyterLab-based IDE

  • pyTigerGraph, the Python client for GDPS

  • User Defined Functions (UDFs) on TigerGraph DB

Figure: High-level architecture

ML Workbench Jupyter Plugin

The ML Workbench Jupyter Plugin is a JupyterLab-based development environment with TigerGraph-specific utilities and components, such as a server manager and a link to GraphStudio. In addition, commonly used Python libraries such as PyTorch Geometric, DGL, and TGML come pre-installed, so you don’t have to worry about setting up the right Python environment.

The plugin is included in the sandbox option for getting started with ML Workbench.

pyTigerGraph

pyTigerGraph is a Python package installed on the computer or server that runs the machine learning training. The package provides utilities such as vertex set splitting for training, validation, and testing, as well as graph data loaders for both PyTorch Geometric (PyG) and Deep Graph Library (DGL). Because it is a Python package, it can be installed anywhere Python is used.
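As an illustration of what vertex set splitting involves, here is a minimal pure-Python sketch that partitions vertex IDs into training, validation, and test sets. It is not the package's implementation or API: the function name `split_vertices`, its parameters, and the fixed seed are assumptions made for this example.

```python
import random
from typing import Dict, Hashable, Iterable, List


def split_vertices(
    vertex_ids: Iterable[Hashable],
    train_fraction: float,
    validation_fraction: float,
    seed: int = 42,
) -> Dict[str, List[Hashable]]:
    """Randomly partition vertex IDs; whatever is left over becomes the test set."""
    if train_fraction < 0 or validation_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if train_fraction + validation_fraction > 1:
        raise ValueError("fractions must sum to at most 1")

    shuffled = list(vertex_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_train = int(len(shuffled) * train_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_validation],
        "test": shuffled[n_train + n_validation:],
    }


parts = split_vertices(range(10), train_fraction=0.8, validation_fraction=0.1)
print({name: len(ids) for name, ids in parts.items()})
```

Each vertex lands in exactly one partition, so a model is never evaluated on vertices it was trained on.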

UDF on TigerGraph DB

For pyTigerGraph to work with the TigerGraph Database, some of its functionality, such as data loading and the featurizer, requires certain user-defined functions (UDFs) to be installed on the TigerGraph Database instance. After activation, these UDFs are deployed to your database, and you can use pyTigerGraph with it.

Graph neural networks and their applications

Graph neural networks (GNNs) tend to outperform other machine learning techniques when there are well-defined relationships in the data, because they directly model the connectivity of your graph data. Recent research has shown GNNs to be successful across various business domains and applications. With TigerGraph ML Workbench, you can easily explore the potential of GNNs for your own domain. Below are some papers and resources to spark ideas in a range of applications and industries:

Recommendation Engines

Pinterest introduced PinSAGE[1], an architecture that can serve real-time recommendations to their users, resulting in a 10-30% improvement compared to other deep learning methods when evaluated in A/B testing.

Supply Chain

Amazon released a GNN architecture[2] that incorporates temporal information with GNNs for demand forecasting. The method models interactions between products and their sellers on Amazon in a graph, resulting in a 16% improvement over other state-of-the-art forecasting methods.

Healthcare

AstraZeneca has used graph neural networks such as GraphSAGE to generate knowledge graph embeddings for predicting drug-drug interactions, such as possible synergies between drugs and polypharmacy side effects[3]. Additionally, the possibility of repurposing drugs to treat COVID-19 has been studied using a drug repurposing knowledge graph and GNNs[4].

Financial Institutions

Graph convolutional networks (GCNs) have been studied for predicting money-laundering behavior in Bitcoin transaction networks, and have been shown to perform well compared to other approaches[5].
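The architectures mentioned above (PinSAGE, GraphSAGE, and GCNs) share a common building block: each vertex updates its representation by aggregating the features of its neighbors. The following is a simplified pure-Python sketch of one such aggregation round using a plain mean. It omits the learned weights and nonlinearities that real GNN layers apply, and the function name `mean_aggregate` and the toy data are hypothetical.

```python
from typing import Dict, List


def mean_aggregate(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Replace each vertex's feature vector with the mean over itself and its neighbors."""
    updated = {}
    for vertex, own in features.items():
        # Include the vertex's own features alongside those of its neighbors.
        group = [own] + [features[u] for u in neighbors.get(vertex, [])]
        updated[vertex] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(own))
        ]
    return updated


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(mean_aggregate(features, neighbors))
```

After one round, each vertex's vector reflects its immediate neighborhood; repeating the step lets information travel across more hops of the graph.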

If you are interested in learning more about the fundamental research on different variations of graph neural networks, here is a list of helpful publications:


1. Ying, Rex et al. “Graph Convolutional Neural Networks for Web-Scale Recommender Systems”, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
2. Ankit Gandhi, Aakankasha, Sivaramakrishnan Kaveri, Vineet Chaoji, “Spatio-temporal multi-graph networks for demand forecasting in online marketplaces”.
3. Benedek Rozemberczki, Stephen Bonner, Andriy Nikolov, Michael Ughetto, Sebastian Nilsson, Eliseo Papa, “A Unified View of Relational Deep Learning for Drug Pair Scoring”, CoRR, November 2021.
4. Hsieh, K., Wang, Y., Chen, L. et al. “Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence”, Sci Rep 11, 23179, 2021.
5. Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, Charles E. Leiserson, “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics”, Proceedings of ACM Conference (KDD ’19 Workshop on Anomaly Detection in Finance), 2019.