Spark Connection Via JDBC Driver

Apache Spark is a popular big data distributed processing system which is frequently used in data management ETL process and Machine Learning applications.

Using the open-source type 4 JDBC Driver for TigerGraph, you can read and write data between Spark and TigerGraph. This is a two-way data connection.

Diagram showing streaming and static data sources using Apache Spark and the TigerGraph JDBC server to perform two-way communication with TigerGraph services. The TigerGraph services shown are Graph Insights and applications, graph business intelligence, and graph visualization, all under the theme of Real-Time Graph Analytics.
Figure 1. Apache Spark, containing streaming and static data sources, uses the JDBC driver to communicate back and forth with TigerGraph in order to perform graph analytics.

The Github Link to the JDBC Driver is https://github.com/tigergraph/ecosys/tree/master/tools/etl/tg-jdbc-driver
The README file there provides more details.

TigerGraph JDBC connector is streaming in data via REST endpoints. No data throttle mechanism is in place yet. When the incoming concurrent JDBC connection number exceeds the configured hardware capacity limit, the overload may cause the system to stop responding.

If you use spark job to connect TigerGraph via JDBC, we recommend your concurrent spark loading jobs be capped at 10 with the following per job configuration. This limits the concurrent JDBC connections to 40.

—num-executors 2  /* 2 executors per job */
—executor-cores 2 /* 1 executor take 2 cores */

Note that the TigerGraph JDBC Driver has more use cases than just as a Spark Connection. You can use TigerGraph’s JDBC Driver for your Java and Python applications. This is an open source project.