CDC Setup
When the CDC feature is in use, a CDC service running on the GPE nodes processes data updates ("deltas") and writes CDC messages to an external Kafka cluster maintained by the user.
Important
- External Kafka Cluster: Users must set up and manage their own Kafka cluster. The TigerGraph CDC service sends CDC messages to this external Kafka cluster.
- For guidance on setting up the external Kafka service, refer to the official Apache Kafka documentation.
Setup Configuration
TigerGraph uses librdkafka 1.1.0 for the Kafka producer in the CDC service. For properties not covered in this guide, refer to the "Global configuration properties" and "Topic configuration properties" sections of the librdkafka documentation. Only properties marked "P" (Producer) or "*" (both Producer and Consumer) apply.
CDC Configuration Parameters
System.CDC.Enable
Controls whether CDC is enabled or disabled.
gadmin config entry System.CDC.Enable
Enable or disable CDC using:
    gadmin config set System.CDC.Enable true   # enable CDC
    gadmin config set System.CDC.Enable false  # disable CDC
[NOTE]: CDC messages are generated only after CDC is enabled and the system is restarted.
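Because the setting only takes effect after a restart, enabling CDC is really a three-command sequence. The sketch below strings together the `gadmin config apply` and `gadmin restart gpe restpp` commands used in the tutorial later on this page. The `run` helper and the `DRY_RUN` variable are illustrative additions, not part of gadmin; by default the script only prints the commands so they can be reviewed first.

```shell
# Dry-run by default: print the commands instead of executing them.
# DRY_RUN and run() are hypothetical helpers for this sketch only.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_cdc() {
    run gadmin config set System.CDC.Enable true   # stage the setting
    run gadmin config apply                        # apply staged configuration
    run gadmin restart gpe restpp                  # restart so CDC takes effect
}

enable_cdc
```

Running the script with `DRY_RUN=0` executes the commands for real on a TigerGraph node.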
System.CDC.ProducerConfig
Specifies properties for the CDC producer. Properties are passed through a file, one per line, in the format <property name>=<property value>.
gadmin config entry System.CDC.ProducerConfig
[NOTE]: The property bootstrap.servers is mandatory. It specifies the IP address and port of the external Kafka cluster for CDC:
    mkdir -p /home/tigergraph/test_cdc
    # ip:port must point to the target external Kafka server
    echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
    gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
The full list of entries for the System.CDC.ProducerConfig configuration file is in the "Global configuration properties" table: https://github.com/confluentinc/librdkafka/blob/v1.1.0/CONFIGURATION.md#global-configuration-properties
[NOTE]: Only properties marked P (Producer) or * (both Producer and Consumer) are applicable.
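Since a malformed file is easy to produce with `echo -e`, it can help to check the file before handing it to gadmin. The following is a minimal sketch, not a TigerGraph tool: `check_producer_config` is a hypothetical helper, and its notion of a valid property name (letters, digits, dots, underscores) and of a non-empty value is an assumption. It would also reject comment lines, since this guide does not say whether they are allowed.

```shell
# check_producer_config FILE  (hypothetical helper)
# Returns 0 when every non-blank line looks like name=value and
# bootstrap.servers is present; prints the problem and returns 1 otherwise.
check_producer_config() {
    file="$1"
    # Collect non-blank lines that do not match name=value (assumed pattern).
    bad="$(grep -v -e '^[[:space:]]*$' "$file" | grep -v -E '^[A-Za-z0-9._]+=.+$' || true)"
    if [ -n "$bad" ]; then
        echo "malformed line(s): $bad" >&2
        return 1
    fi
    if ! grep -q -E '^bootstrap\.servers=.+$' "$file"; then
        echo "missing required property: bootstrap.servers" >&2
        return 1
    fi
    return 0
}

# Example with a throwaway file; 127.0.0.1:9092 is a placeholder address.
tmpdir="$(mktemp -d)"
printf 'bootstrap.servers=127.0.0.1:9092\nenable.idempotence=true\n' > "$tmpdir/cdc_producer_config"
check_producer_config "$tmpdir/cdc_producer_config" && echo "producer config looks well-formed"
```

The check only covers the file format described above; it does not verify that the broker address is reachable.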
Kafka Security
To secure communication with the external Kafka cluster for CDC, configure authentication settings in System.CDC.ProducerConfig.
Example 1: Authenticating with SASL/PLAIN
    security.protocol=SASL_PLAINTEXT
    sasl.mechanisms=PLAIN
    sasl.username=<username>
    sasl.password=<password>
Example 2: Authenticating with SASL/GSSAPI
    security.protocol=SASL_PLAINTEXT
    sasl.mechanism=GSSAPI
    sasl.kerberos.service.name=kafka
    sasl.kerberos.principal=<user@EXAMPLE.COM>
    sasl.kerberos.keytab=</path/to/user.keytab>
Example 3: Authenticating with SASL/PLAIN and Encrypted with SSL
    security.protocol=SASL_SSL
    sasl.mechanisms=PLAIN
    sasl.username=<username>
    sasl.password=<password>
    ssl.ca.location=<path/to/ca.pem>
    ssl.certificate.location=<path/to/cert.pem>
    ssl.key.location=<path/to/key.pem>
For more details on SASL with librdkafka, refer to: https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka
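All three examples hinge on the `security.protocol` line, so a typo there is worth catching before a restart. The sketch below is one way to check it, assuming the four values librdkafka documents for this property (plaintext, ssl, sasl_plaintext, sasl_ssl, with plaintext as the default). `check_security_protocol` is a hypothetical helper, and if the property appears more than once the sketch simply looks at the first occurrence.

```shell
# check_security_protocol FILE  (hypothetical helper)
# Prints the configured security.protocol (or "plaintext" when the property
# is absent) and returns 0 if it is a value librdkafka documents.
check_security_protocol() {
    file="$1"
    proto="$(sed -n 's/^security\.protocol=//p' "$file" | head -n 1)"
    proto="${proto:-plaintext}"
    # Compare case-insensitively against the documented values.
    case "$(printf '%s' "$proto" | tr '[:upper:]' '[:lower:]')" in
        plaintext|ssl|sasl_plaintext|sasl_ssl)
            echo "$proto"
            ;;
        *)
            echo "unsupported security.protocol: $proto" >&2
            return 1
            ;;
    esac
}

# Example with a throwaway file using the SASL_SSL settings from Example 3.
secdir="$(mktemp -d)"
printf 'security.protocol=SASL_SSL\nsasl.mechanisms=PLAIN\n' > "$secdir/cdc_producer_config"
check_security_protocol "$secdir/cdc_producer_config"
```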
System.CDC.TopicConfig
gadmin config entry System.CDC.TopicConfig
Properties are passed through a file, one per line, in the format <property name>=<property value>.
The property name is required and designates the CDC topic (name=<CDC topic name>), as in the following example:
    echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
The full list of entries for the System.CDC.TopicConfig configuration file is in the "Topic configuration properties" table: https://github.com/edenhill/librdkafka/blob/v1.1.0/CONFIGURATION.md#topic-configuration-properties
[NOTE]: Only properties marked P (Producer) or * (both Producer and Consumer) are applicable.
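The topic name set here is needed again when consuming messages, so it can be convenient to read it back from the file rather than retyping it. A small sketch, where `cdc_topic_name` is a hypothetical helper and the file layout is the one described above:

```shell
# cdc_topic_name FILE  (hypothetical helper)
# Prints the value of the name= line, or fails if the file has none.
cdc_topic_name() {
    topic="$(sed -n 's/^name=//p' "$1" | head -n 1)"
    if [ -z "$topic" ]; then
        echo "no name=<CDC topic name> line in $1" >&2
        return 1
    fi
    printf '%s\n' "$topic"
}

# Example with a throwaway copy of the topic config from this section.
topicdir="$(mktemp -d)"
printf 'name=cdc_topic\n' > "$topicdir/cdc_topic_config"
TOPIC="$(cdc_topic_name "$topicdir/cdc_topic_config")"
echo "CDC topic: $TOPIC"
```

The resulting `$TOPIC` value can then be passed to Kafka command-line tools through their `--topic` option.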
For other configuration settings, see the Other Configuration Settings Table.
Setup Tutorial
This tutorial will walk you through how to set up a TigerGraph CDC service.
- Setting Up the External Kafka Cluster for CDC
If you already have a running external Kafka cluster for CDC, this step can be skipped.
Create the directory where Kafka will be downloaded:
    mkdir -p /home/tigergraph/test_cdc/download_kafka
Use this package: https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz
Download and unpack Kafka:
    curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
Verify the download:
    ls -l /home/tigergraph/test_cdc/download_kafka
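A directory listing confirms that something was unpacked, but not that the scripts used in the rest of this tutorial are present. The sketch below checks for exactly those files. `check_kafka_root` is a hypothetical helper, and the file list is taken from the commands that appear later in this tutorial.

```shell
# check_kafka_root DIR  (hypothetical helper)
# Returns 0 if DIR contains the scripts and config files this tutorial uses.
check_kafka_root() {
    root="$1"
    for f in bin/zookeeper-server-start.sh bin/kafka-server-start.sh \
             bin/kafka-topics.sh bin/kafka-console-consumer.sh \
             config/zookeeper.properties config/server.properties; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            return 1
        fi
    done
    echo "Kafka layout OK under $root"
}

KAFKA_ROOT="${KAFKA_ROOT:-/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1}"
check_kafka_root "$KAFKA_ROOT" || echo "download or unpack step did not complete" >&2
```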
Starting the External Zookeeper and Kafka Cluster for CDC
Start the external Zookeeper instance. Use the default configuration zookeeper.properties, which uses the default port 2181:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    $KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
Start the external Kafka broker. Use the default configuration server.properties, which uses the default port 9092:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    $KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties
Both scripts run in the foreground, so start each one in its own terminal.
[NOTE]: To accept messages produced from remote servers, edit server.properties to add listeners=PLAINTEXT://<my ip>:9092. For the value of <my ip>, run ifconfig or ip addr show and use the IP address that follows inet.
(Optional) Clear the Kafka topic
Run this command to delete the CDC topic, which clears any old messages it holds:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    MYIP=127.0.0.1
    $KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
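The listener change described in the note earlier in this step can be scripted rather than edited by hand. The sketch below is one possible approach: `set_listener` is a hypothetical helper that replaces an active `listeners=` line if one exists and appends one otherwise, leaving commented-out lines alone. The address 192.0.2.10 is a documentation placeholder, and the example works on a throwaway file rather than the real server.properties.

```shell
# set_listener FILE IP [PORT]  (hypothetical helper)
# Ensures FILE has exactly one active line listeners=PLAINTEXT://IP:PORT.
set_listener() {
    file="$1"; ip="$2"; port="${3:-9092}"
    line="listeners=PLAINTEXT://$ip:$port"
    if grep -q '^listeners=' "$file"; then
        # Rewrite through a temp file instead of relying on sed -i.
        sed "s|^listeners=.*|$line|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example on a throwaway file that mimics a commented-out default.
props="$(mktemp)"
printf 'broker.id=0\n#listeners=PLAINTEXT://:9092\n' > "$props"
set_listener "$props" 192.0.2.10
grep '^listeners=' "$props"   # prints listeners=PLAINTEXT://192.0.2.10:9092
```

Restart the Kafka broker after changing server.properties so the new listener is picked up.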
- Setting Up the TigerGraph CDC Service
After configuring the external Kafka cluster for CDC, set up the TigerGraph CDC service.
Configure System.CDC.ProducerConfig, System.CDC.TopicConfig, and System.CDC.Enable, then apply the configuration and restart the GPE and RESTPP services:
    MYIP=127.0.0.1
    echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
    echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
    gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.Enable true
    gadmin config apply
    gadmin restart gpe restpp
- Testing the TigerGraph CDC Service
Once the TigerGraph CDC service is running, test it by updating an existing graph with data modification statements, for example by:
  - Running a custom or built-in query.
  - Running a loading job.
If an existing graph is not available, create a new graph by following TigerGraph’s GSQL 101 tutorial documentation and using the provided Example Graphs data.
- Checking CDC Messages in the External Kafka Cluster
To consume and view CDC messages from the external Kafka cluster for CDC, run:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    MYIP=127.0.0.1
    $KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092
Other Configuration Settings Table
| Command | Name | Description | Default (Unit: Value) |
| | | When a GPE service shuts down, CDC tries to flush all generated CDC messages to the external Kafka cluster. | ms: -1. When set to -1, the timeout is infinite, which may slow the GPE shutdown. |
| | | In-memory buffer limit for delta messages in the CDC service. | megabytes: 10. |
| | | In-memory buffer limit for "vertex-deletion" delta messages in the deleted id map service. | megabytes: 100. |
| | | In-memory cache limit for the deleted id map. | megabytes: 1024. |
| | | Interval for purging outdated entries in the deleted id map. | minutes: 30. |