CDC Setup
When the CDC feature is in use, a “CDC service” running on the GPE nodes processes data updates ("deltas") and writes CDC messages to an external Kafka service maintained by the user.
Users must set up and manage the external Kafka service independently of TigerGraph. The TigerGraph CDC service then writes CDC messages to that Kafka service. For guidance on setting up the external Kafka service, refer to the official Apache Kafka documentation.
Setup Configuration
TigerGraph uses librdkafka 1.1.0 for the Kafka producer in the CDC service. For properties not mentioned in this guide, refer to the Global configuration properties and Topic configuration properties sections of the librdkafka documentation. The applicable properties are those marked “P” (Producer) or “*” (both Producer and Consumer).
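To configure the CDC producer and topic settings in TigerGraph, use the gadmin commands below.
After modifying the configuration, run the following commands (the same ones used in the Setup Tutorial below) to apply the changes:
gadmin config apply
gadmin restart gpe restpp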
CDC Configuration Parameters
System.CDC.Enable
gadmin config set System.CDC.Enable true
gadmin config set System.CDC.Enable false
CDC messages are generated only after CDC has been enabled, the configuration has been applied, and the services have been restarted.
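The enable, apply, and restart sequence described above can be wrapped in a small helper. This is only a sketch: the function name cdc_set_enabled and the GADMIN override variable are hypothetical, and the gadmin subcommands are the ones used elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: toggle CDC, then apply the config and restart services.
# GADMIN can be overridden (for example GADMIN=echo) to preview the commands
# without touching a real cluster.
GADMIN="${GADMIN:-gadmin}"

cdc_set_enabled() {
    case "$1" in
        true|false) ;;
        *) echo "usage: cdc_set_enabled true|false" >&2; return 2 ;;
    esac
    # Each step runs only if the previous one succeeded.
    "$GADMIN" config set System.CDC.Enable "$1" &&
    "$GADMIN" config apply &&
    "$GADMIN" restart gpe restpp
}
```

Calling cdc_set_enabled true turns CDC on, and cdc_set_enabled false turns it off again. Setting GADMIN=echo first prints the three commands instead of running them.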
System.CDC.ProducerConfig
gadmin config entry System.CDC.ProducerConfig
This configuration entry is designated for the CDC producer.
Properties are passed through a file, one per line, with each line in the format <property name>=<property value>.
The property bootstrap.servers is mandatory. It specifies the IP address and port of the broker(s) that the CDC producer connects to, as in the example below:
mkdir -p /home/tigergraph/test_cdc
# The ip:port is bound to the target external Kafka server
echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
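As a variation on the example above, the producer file can be generated by a small function that also accepts extra librdkafka producer properties. This is a sketch under assumptions: the function name write_producer_config and its argument order are hypothetical, and printf is used in place of echo -e only because its handling of \n is more predictable across shells.

```shell
#!/bin/sh
# Hypothetical helper: write a CDC producer properties file, one
# <property name>=<property value> pair per line.
# $1 = output file, $2 = broker host, $3 = broker port,
# remaining arguments = extra producer properties in name=value form.
write_producer_config() {
    if [ "$#" -lt 3 ]; then
        echo "usage: write_producer_config FILE HOST PORT [name=value ...]" >&2
        return 2
    fi
    out=$1 host=$2 port=$3
    shift 3
    {
        # bootstrap.servers is the mandatory property.
        printf 'bootstrap.servers=%s:%s\n' "$host" "$port"
        printf 'enable.idempotence=true\n'
        # Any further properties are written through unchanged.
        for prop in "$@"; do
            printf '%s\n' "$prop"
        done
    } > "$out"
}
```

For instance, write_producer_config /home/tigergraph/test_cdc/cdc_producer_config 127.0.0.1 9092 linger.ms=50 would produce a three-line file. The linger.ms value is purely illustrative. The resulting file is then passed to gadmin config set System.CDC.ProducerConfig with the @ prefix, as shown above.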
System.CDC.TopicConfig
gadmin config entry System.CDC.TopicConfig
Properties are passed through a file, one per line.
Each line has the format <property name>=<property value>.
The property name is required. It designates the CDC topic, in the form name=<CDC topic name>, as in the following example:
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
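Because both the producer and topic files share the <property name>=<property value> line format and each has one required property, a quick sanity check before handing a file to gadmin may catch typos early. The following is a sketch only: check_properties is a hypothetical name, and the check is purely syntactic (it does not know which property names librdkafka accepts).

```shell
#!/bin/sh
# Hypothetical check for a CDC properties file.
# $1 = file, $2 = required property name (e.g. name or bootstrap.servers)
# Exit status: 0 = OK, 1 = malformed line found, 2 = required property missing.
check_properties() {
    awk -F '=' -v key="$2" '
        /^[[:space:]]*$/ { next }          # skip blank lines
        NF < 2 || $1 == "" { bad = 1 }     # no "=" on the line, or empty name
        $1 == key && length($0) > length(key) + 1 { found = 1 }
        END { exit (bad ? 1 : (found ? 0 : 2)) }
    ' "$1"
}
```

For example, check_properties /home/tigergraph/test_cdc/cdc_topic_config name succeeds for the file created above, while the same call on a file without a name= line returns 2.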
For other configuration settings please see the Other Configuration Settings Table.
Setup Tutorial
This tutorial will walk you through how to set up a TigerGraph CDC service.
- Set up an external Kafka service for CDC messages
First, create the folder that Kafka will be downloaded to:
mkdir -p /home/tigergraph/test_cdc/download_kafka
Use this package: `https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz`
Run this command to download and extract Kafka into the folder that was just created:
curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
Check that it was downloaded and extracted successfully with this command:
ls -l /home/tigergraph/test_cdc/download_kafka
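Beyond listing the folder, it can help to confirm that the scripts and config files used in the remaining steps are actually present. This is a minimal sketch: kafka_tree_ok is a hypothetical name, and the file list is simply the set of paths this tutorial references.

```shell
#!/bin/sh
# Hypothetical check: confirm the extracted Kafka tree contains the scripts
# and config files that the later steps of this tutorial rely on.
# $1 = Kafka root directory (the tutorial's KAFKA_ROOT)
kafka_tree_ok() {
    root=$1
    for f in bin/zookeeper-server-start.sh bin/kafka-server-start.sh \
             bin/kafka-topics.sh bin/kafka-console-consumer.sh \
             config/zookeeper.properties config/server.properties; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            return 1
        fi
    done
}
```

A typical call would be kafka_tree_ok /home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1, which returns non-zero and names the first missing file if the download or extraction was incomplete.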
Next, start a ZooKeeper server
Open a new terminal to start the ZooKeeper service. Use the default configuration zookeeper.properties, which uses the default port 2181:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
Now, start a Kafka server
Use the default configuration server.properties, which uses the default port 9092:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties
To listen to messages produced from remote servers, edit server.properties to add listeners=PLAINTEXT://<my ip>:9092. For the value of <my ip>, use the command ifconfig or ip addr show, and find the IP address after inet.
(Optional) Clear the Kafka topic
Run this command to clear old messages from Kafka by deleting the CDC topic:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
MYIP=127.0.0.1
$KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
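The listeners edit to server.properties described earlier in this step, needed when producers connect from remote servers, can also be scripted. This is a sketch under assumptions: set_listener is a hypothetical name, and it assumes the file ends with a newline. It leaves commented-out lines such as #listeners=... and the separate advertised.listeners setting untouched.

```shell
#!/bin/sh
# Hypothetical helper: set listeners=PLAINTEXT://<ip>:<port> in a Kafka
# server.properties file. An existing uncommented listeners line is
# replaced; otherwise the setting is appended.
# $1 = path to server.properties, $2 = IP address, $3 = port (default 9092)
set_listener() {
    file=$1 ip=$2 port=${3:-9092}
    line="listeners=PLAINTEXT://$ip:$port"
    if grep -q '^listeners=' "$file"; then
        # Rewrite via a temp file, then move it into place (avoids sed -i,
        # whose syntax differs between platforms).
        sed "s|^listeners=.*|$line|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, set_listener "$KAFKA_ROOT/config/server.properties" <my ip> would be run before starting the Kafka server, with <my ip> replaced by the address found via ifconfig or ip addr show.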
- Set up the TigerGraph CDC service
Now, start the CDC service in TigerGraph.
Use the configuration entries System.CDC.ProducerConfig, System.CDC.TopicConfig, and System.CDC.Enable as follows:
MYIP=127.0.0.1
echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.Enable true
gadmin config apply
gadmin restart gpe restpp
- Test the TigerGraph CDC service
Once the service is up and running, test it by updating an existing graph with data modification statements, for example by:
  - Running a custom or built-in query
  - Running a loading job
If an existing graph is not available, create a new graph by following TigerGraph’s GSQL 101 tutorial documentation and using the provided Example Graphs data.
- Lastly, check the CDC messages.
To consume and display CDC messages, run:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
MYIP=127.0.0.1
$KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092
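The consumer command above keeps running until interrupted. For a quick check that messages are arriving, a bounded read may be more convenient. This is a sketch: peek_cdc_messages and the RUN variable are hypothetical, the 10-second timeout and default count are arbitrary choices, and --max-messages and --timeout-ms are options of kafka-console-consumer.sh.

```shell
#!/bin/sh
# Hypothetical helper: read at most N messages from the CDC topic and stop
# if nothing arrives within the timeout, so the check does not block forever.
# KAFKA_ROOT must point at the extracted Kafka directory.
# RUN can be set to "echo" to print the command instead of running it.
# $1 = bootstrap server, $2 = topic, $3 = maximum number of messages
peek_cdc_messages() {
    server=${1:-127.0.0.1:9092} topic=${2:-cdc_topic} count=${3:-5}
    ${RUN:-} "$KAFKA_ROOT/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$server" --topic "$topic" \
        --from-beginning --max-messages "$count" --timeout-ms 10000
}
```

With KAFKA_ROOT set as in the tutorial, peek_cdc_messages 127.0.0.1:9092 cdc_topic 3 prints up to three messages and then exits.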
Other Configuration Settings Table
| Command | Name | Description | Default (Unit: Value) |
| | | When a GPE service shuts down, CDC tries to flush all generated CDC messages to the external Kafka. | ms: -1. When set to -1, the timeout is infinite, which may slow the GPE shutdown. |
| | | In-memory buffer limit for delta messages in the CDC service. | megabytes: 10. |
| | | In-memory buffer limit for “vertex-deletion” delta messages in the deleted id map service. | megabytes: 100. |
| | | In-memory cache limit for the deleted id map. | megabytes: 1024. |
| | | Interval for purging outdated entries in the deleted id map. | minutes: 30. |