CDC Setup
When using the CDC feature, a “CDC service” running in GPE nodes will process data updates ("deltas") and write CDC messages to the external Kafka maintained by the user.
Setup Configuration
TigerGraph 4.2+ uses librdkafka 2.5.3 for the Kafka producer in the CDC service. For properties not covered in this guide, refer to the Global configuration properties and Topic configuration properties sections of the librdkafka documentation. Only the properties marked “P” (Producer) or “*” (both Producer and Consumer) apply.
CDC Configuration Parameters
System.CDC.Enable
Controls whether CDC is enabled or disabled.
Enable or disable CDC using:
gadmin config set System.CDC.Enable true   # To enable CDC
gadmin config set System.CDC.Enable false  # To disable CDC
CDC messages are generated only after enabling the CDC service and restarting the system.
System.CDC.ProducerConfig
Specifies properties for the CDC producer.
To update properties non-interactively, create a file (e.g., cdc_producer_config) in which each line has the format <property name>=<property value>, with one property per line. Then use
gadmin config set System.CDC.ProducerConfig @<filename>
to read in the settings.
It is mandatory to include the property bootstrap.servers, which specifies the IP and port of the external Kafka cluster for CDC.
Example:
mkdir -p /home/tigergraph/test_cdc
# The ip:port must point to the target external Kafka server
echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
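A malformed line in the properties file is easier to catch before it is handed to gadmin. The following is a minimal sketch of such a check, assuming only the file format described above; `check_cdc_producer_config` is a hypothetical helper written for illustration and is not part of gadmin or TigerGraph.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gadmin): sanity-check a properties file
# before passing it to `gadmin config set System.CDC.ProducerConfig @<file>`.
check_cdc_producer_config() {
  local file="$1" line lineno=0 status=0
  local re='^[A-Za-z0-9._]+=.+$'          # <property name>=<property value>
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    [ -z "$line" ] && continue            # skip empty lines
    if ! [[ "$line" =~ $re ]]; then
      echo "$file:$lineno: not in name=value form: $line" >&2
      status=1
    fi
  done < "$file"
  # bootstrap.servers is the one property this guide marks as mandatory.
  if ! grep -q '^bootstrap\.servers=' "$file"; then
    echo "$file: missing mandatory property bootstrap.servers" >&2
    status=1
  fi
  return "$status"
}

# Possible usage, with the file from the example above:
#   f=/home/tigergraph/test_cdc/cdc_producer_config
#   check_cdc_producer_config "$f" && gadmin config set System.CDC.ProducerConfig "@$f"
```

The check only validates the shape of each line. It does not verify that a property name is one librdkafka recognizes.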
If you prefer to enter the property values interactively, use
gadmin config entry System.CDC.ProducerConfig
This will walk you through the full set of properties for this component, with a description and the current value for each item.
The full list of entries for the System.CDC.ProducerConfig configuration file can be viewed in the table "Global configuration properties": https://github.com/confluentinc/librdkafka/blob/v2.5.3/CONFIGURATION.md#global-configuration-properties
Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.
Kafka Security
To secure communication with the external Kafka cluster for CDC, configure authentication settings in System.CDC.ProducerConfig.
Example 1: Authenticating with SASL/PLAIN
security.protocol=SASL_PLAINTEXT
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>
Example 2: Authenticating with SASL/GSSAPI
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.kerberos.principal=<user@EXAMPLE.COM>
sasl.kerberos.keytab=</path/to/user.keytab>
Example 3: Authenticating with SASL/PLAIN and Encrypted with SSL
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>
ssl.ca.location=<path/to/ca.pem>
ssl.certificate.location=<path/to/cert.pem>
ssl.key.location=<path/to/key.pem>
For more details on SASL with librdkafka, refer to https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka
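Because a SASL configuration file holds a password, it is reasonable to create it with owner-only permissions. The sketch below writes the Example 3 settings to a file in that way. The variable names (`CDC_DIR`, `KAFKA_BOOTSTRAP`, `KAFKA_USER`, `KAFKA_PASSWORD`, `KAFKA_CA`, `KAFKA_CERT`, `KAFKA_KEY`) and the file name `cdc_producer_config_sasl_ssl` are assumptions made for illustration, not TigerGraph settings. Unset variables fall back to the placeholders used in the examples above.

```shell
#!/usr/bin/env bash
# Sketch: write a SASL_SSL producer config readable only by its owner.
# All variable names and the file name are illustrative assumptions.
CDC_DIR="${CDC_DIR:-$HOME/test_cdc}"
CONFIG_FILE="$CDC_DIR/cdc_producer_config_sasl_ssl"
mkdir -p "$CDC_DIR"

# umask 077 in a subshell: a newly created file gets mode 600.
(
  umask 077
  cat > "$CONFIG_FILE" <<EOF
bootstrap.servers=${KAFKA_BOOTSTRAP:-127.0.0.1:9092}
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=${KAFKA_USER:-<username>}
sasl.password=${KAFKA_PASSWORD:-<password>}
ssl.ca.location=${KAFKA_CA:-<path/to/ca.pem>}
ssl.certificate.location=${KAFKA_CERT:-<path/to/cert.pem>}
ssl.key.location=${KAFKA_KEY:-<path/to/key.pem>}
EOF
)
# Also tighten permissions if the file already existed with a looser mode.
chmod 600 "$CONFIG_FILE"

# Next step, once real values are filled in:
#   gadmin config set System.CDC.ProducerConfig "@$CONFIG_FILE"
```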
System.CDC.TopicConfig
To update properties non-interactively, create a file (e.g., cdc_topic_config) in which each line has the format <property name>=<property value>, with one property per line. Then use
gadmin config set System.CDC.TopicConfig @<filename>
to read in the settings.
It is mandatory to include the property name, which designates the CDC topic, as in the following example:
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
The full list of entries for the System.CDC.TopicConfig configuration file can be viewed in the table "Topic configuration properties": https://github.com/edenhill/librdkafka/blob/v2.5.3/CONFIGURATION.md#topic-configuration-properties
Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.
For other configuration settings please see the Other Configuration Settings Table.
Setup Tutorial
This tutorial will walk you through how to set up a TigerGraph CDC service.
- Set Up the External Kafka Cluster for CDC
If you already have a running external Kafka cluster for CDC, this step can be skipped.
Create the directory where Kafka will be downloaded:
mkdir -p /home/tigergraph/test_cdc/download_kafka
Use this package: https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz
Download Kafka:
curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
Verify the download:
ls -l /home/tigergraph/test_cdc/download_kafka
- Start the External Zookeeper and Kafka Cluster for CDC
  - Start the External Zookeeper Instance.
Use the default configuration file zookeeper.properties, which uses the default port 2181:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
  - Start the External Kafka Cluster for CDC.
Use the configuration file server.properties, which uses the default port 9092.
If you have a cluster environment, or if the external Kafka server is not local to the GPE servers, add the following line to server.properties so that the broker accepts connections from remote servers:
listeners=PLAINTEXT://<my_ip>:9092
To determine the value for <my_ip>, run ifconfig or ip addr show and use the IP address that follows inet.
Command to start a Kafka server:
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties
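Before configuring TigerGraph, it can help to confirm that Zookeeper and Kafka are actually accepting connections on the ports mentioned above. The following is a rough sketch that uses bash's built-in `/dev/tcp` redirection together with the coreutils `timeout` command. `port_open` is a hypothetical helper, and `MYIP` should be set to the address used in `listeners` if the broker is not local.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device and on coreutils `timeout`.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

MYIP="${MYIP:-127.0.0.1}"
for port in 2181 9092; do   # default Zookeeper and Kafka ports in this guide
  if port_open "$MYIP" "$port"; then
    echo "port $port on $MYIP: reachable"
  else
    echo "port $port on $MYIP: NOT reachable"
  fi
done
```

A reachable port only shows that something is listening. It does not prove that the listener is a healthy Kafka broker.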
(Optional) Clear the Kafka topic.
Run this command to delete the CDC topic, which clears any old messages it holds in Kafka:
MYIP=127.0.0.1
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
- Set Up the TigerGraph CDC Service
After configuring the external Kafka cluster for CDC, set up the TigerGraph CDC service.
Configure the CDC producer and topic settings (System.CDC.ProducerConfig, System.CDC.TopicConfig, and System.CDC.Enable), then apply the configuration and restart the services:
MYIP=127.0.0.1
echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
gadmin config set System.CDC.Enable true
gadmin config apply
gadmin restart gpe restpp
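After the restart, the stored values can be read back to confirm that the three settings took effect. The sketch below wraps `gadmin config get` and `gadmin status` in a hypothetical function, `verify_cdc_setup`. The exact output format of these commands is not shown here and may differ between TigerGraph versions.

```shell
#!/usr/bin/env bash
# Sketch: read back the CDC settings and check the restarted services.
# verify_cdc_setup is a hypothetical helper, not a TigerGraph command.
verify_cdc_setup() {
  if ! command -v gadmin >/dev/null 2>&1; then
    echo "gadmin not found; run this as the TigerGraph user on a TigerGraph node" >&2
    return 2
  fi
  local key
  for key in System.CDC.Enable System.CDC.ProducerConfig System.CDC.TopicConfig; do
    echo "== $key =="
    gadmin config get "$key" || return 1
  done
  # The steps above restart gpe and restpp, so check that both are back up.
  gadmin status gpe restpp
}

# Possible usage:
#   verify_cdc_setup
```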
- Test the TigerGraph CDC Service
Once the TigerGraph CDC service is running, test it by making an update to an existing graph, such as:
  - Run an Update or Insert statement.
  - Run a loading job.
If an existing graph is not available, create a new graph by following TigerGraph’s GSQL V3 Tutorial.
- Check CDC Messages in the External Kafka Cluster.
To consume and view CDC messages from the external Kafka cluster for CDC, run:
MYIP=127.0.0.1
KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
$KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092
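The console consumer above keeps running until interrupted. For a scripted check, a bounded variant may be more convenient. The sketch below adds the `--max-messages` and `--timeout-ms` options of `kafka-console-consumer.sh`. The limits of 5 messages and 10 seconds are arbitrary values chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read a few CDC messages and exit instead of tailing the topic.
MYIP="${MYIP:-127.0.0.1}"
KAFKA_ROOT="${KAFKA_ROOT:-/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1}"
CONSUMER="$KAFKA_ROOT/bin/kafka-console-consumer.sh"

CONSUMER_ARGS=(
  --bootstrap-server "$MYIP:9092"
  --topic cdc_topic
  --from-beginning
  --max-messages 5      # stop after five messages (illustrative limit)
  --timeout-ms 10000    # exit if no message arrives within 10 seconds
)

if [ -x "$CONSUMER" ]; then
  "$CONSUMER" "${CONSUMER_ARGS[@]}"
else
  echo "kafka-console-consumer.sh not found under $KAFKA_ROOT" >&2
fi
```

If no messages appear, check that an update or loading job was run after CDC was enabled and the services were restarted.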
Other Configuration Settings Table
Command | Name | Description | Default (Unit: Value)
| | When a GPE service shuts down, CDC tries to flush all generated CDC messages to the external Kafka. | ms: -1. When set to -1, the timeout is infinite, which may slow the GPE shutdown.
| | In-memory buffer limit for delta messages in the CDC service. | megabytes: 10.
| | In-memory buffer limit for “vertex-deletion” delta messages in the deleted id map service. | megabytes: 100.
| | In-memory cache limit for the deleted id map. | megabytes: 1024.
| | Interval for purging outdated entries in the deleted id map. | minutes: 30.