CDC Setup

When the CDC feature is enabled, a “CDC service” running on the GPE nodes processes data updates ("deltas") and writes CDC messages to an external Kafka cluster maintained by the user.

  • External Kafka Cluster: Users must set up and manage their own Kafka cluster. The TigerGraph CDC service sends CDC messages to this external Kafka cluster.

  • For guidance on setting up the external Kafka service, refer to the official Apache Kafka documentation.

Setup Configuration

TigerGraph 4.2+ uses librdkafka 2.5.3 for the Kafka producer in the CDC service. For properties not covered in this guide, refer to the Global configuration properties and Topic configuration properties sections of the librdkafka documentation. Only properties marked “P” (Producer) or “*” (both Producer and Consumer) apply.

Configuring CDC Producer and Topic Settings

Use the following gadmin commands to configure the CDC producer and topic settings in TigerGraph.

Applying Configuration Changes

  • If you modify any configuration using gadmin config, be sure to run the following commands to apply the changes:

gadmin config apply
gadmin restart gpe restpp

CDC Configuration Parameters

System.CDC.Enable

Controls whether CDC is enabled or disabled.

Enable or disable CDC using:

gadmin config set System.CDC.Enable true   # enable CDC
gadmin config set System.CDC.Enable false  # disable CDC

CDC messages are generated only after CDC is enabled and the services are restarted (see Applying Configuration Changes).

System.CDC.ProducerConfig

Specifies properties for the CDC producer.

To update properties non-interactively, create a file (e.g., cdc_producer_config) containing one property per line, each in the format <property name>=<property value>. Then use

gadmin config set System.CDC.ProducerConfig @<filename>

to read in the settings.

The property bootstrap.servers is mandatory. It specifies the IP address and port of the external Kafka cluster for CDC.

Example:

mkdir -p /home/tigergraph/test_cdc

# bootstrap.servers must point to the ip:port of the target external Kafka server
echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config

gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
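
Because a missing bootstrap.servers entry is easy to overlook, a small pre-check can be run on the file before it is loaded. The following is a minimal sketch, not part of TigerGraph or gadmin: the function name check_cdc_producer_config is hypothetical, and it only verifies that every non-empty line has the <property name>=<property value> form and that bootstrap.servers is present. It does not validate property names against the librdkafka list.

```shell
# Hypothetical helper (not part of gadmin): sanity-check a CDC producer
# config file before passing it to `gadmin config set`.
check_cdc_producer_config() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # Every non-empty line must look like <property name>=<property value>.
  if grep -v '^$' "$file" | grep -qEv '^[^=[:space:]]+=.+$'; then
    echo "malformed line(s) in $file:" >&2
    grep -v '^$' "$file" | grep -Ev '^[^=[:space:]]+=.+$' >&2
    return 1
  fi
  # bootstrap.servers is mandatory and must have a value.
  if ! grep -qE '^bootstrap\.servers=.+' "$file"; then
    echo "bootstrap.servers is missing in $file" >&2
    return 1
  fi
  return 0
}
```

One way to use it is to chain the check with the load, for example: check_cdc_producer_config /home/tigergraph/test_cdc/cdc_producer_config && gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config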

If you prefer to enter the property values interactively, use

gadmin config entry System.CDC.ProducerConfig

This will walk you through the full set of properties for this component, with a description and the current value for each item.

The full list of properties that can appear in the System.CDC.ProducerConfig file is in the table "Global configuration properties" at https://github.com/confluentinc/librdkafka/blob/v2.5.3/CONFIGURATION.md#global-configuration-properties.

Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.

Kafka Security

To secure communication with the external Kafka cluster for CDC, configure authentication settings in System.CDC.ProducerConfig.

  1. When using a local path in any entry (e.g., sasl.kerberos.keytab or ssl.ca.location), the local file must exist and be consistent across all nodes in the TigerGraph cluster.

  2. The entry sasl.jaas.config is not applicable because it is specific to Java-based Kafka clients, whereas librdkafka, which the TigerGraph engine uses, is a C/C++ library.

Example 1: Authenticating with SASL/PLAIN

security.protocol=SASL_PLAINTEXT
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>

Example 2: Authenticating with SASL/GSSAPI

security.protocol=SASL_PLAINTEXT
sasl.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka
sasl.kerberos.principal=<user@EXAMPLE.COM>
sasl.kerberos.keytab=</path/to/user.keytab>

Example 3: Authenticating with SASL/PLAIN and Encrypted with SSL

security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>
ssl.ca.location=<path/to/ca.pem>
ssl.certificate.location=<path/to/cert.pem>
ssl.key.location=<path/to/key.pem>

For more details on using SASL with librdkafka, see https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka
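
Since any local path used in the producer config must exist on every node, it can help to check the paths before applying the configuration. The following is a minimal sketch under stated assumptions: the function name check_cdc_local_files is hypothetical, it inspects only the four path-valued properties that appear in this section, and it checks the current node only, so it has to be run on each node. It prints a cksum line for every file found so the output can be compared across nodes.

```shell
# Hypothetical helper: verify that path-valued entries in a CDC producer
# config file point to existing files on this node. Run it on every node and
# compare the printed checksums to confirm the files are identical.
check_cdc_local_files() {
  local file="$1" key path status=0
  # Split each line at the first '='; the rest of the line is the value.
  while IFS='=' read -r key path || [ -n "$key" ]; do
    case "$key" in
      ssl.ca.location|ssl.certificate.location|ssl.key.location|sasl.kerberos.keytab)
        if [ -f "$path" ]; then
          cksum "$path"
        else
          echo "missing on this node: $key=$path" >&2
          status=1
        fi
        ;;
    esac
  done < "$file"
  return "$status"
}
```

The function returns a non-zero status if any referenced file is missing, so it can gate the gadmin config set call in a script.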

System.CDC.TopicConfig

Specifies properties for the CDC topic.

To update properties non-interactively, create a file (e.g., cdc_topic_config) containing one property per line, each in the format <property name>=<property value>. Then use

gadmin config set System.CDC.TopicConfig @<filename>

to read in the settings.

The property name is mandatory. It designates the CDC topic, as in the following example:
echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config

gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
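
Additional topic-level properties go in the same file, one per line. As a sketch, the hypothetical helper write_cdc_topic_config below writes a topic config containing the mandatory name plus message.timeout.ms, a producer-side topic property from the librdkafka table. The value 300000 is librdkafka's documented default and is shown only for illustration; whether a different value suits a given deployment is an assumption left to the reader.

```shell
# Hypothetical helper: write a CDC topic config file, one property per line.
# Arguments: <output file> <topic name>
write_cdc_topic_config() {
  local file="$1" topic="$2"
  cat > "$file" <<EOF
name=$topic
message.timeout.ms=300000
EOF
}
```

For example, write_cdc_topic_config /home/tigergraph/test_cdc/cdc_topic_config cdc_topic produces a file that can then be loaded with gadmin config set System.CDC.TopicConfig as shown above.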

The full list of properties that can appear in the System.CDC.TopicConfig file is in the table "Topic configuration properties" at https://github.com/confluentinc/librdkafka/blob/v2.5.3/CONFIGURATION.md#topic-configuration-properties.

Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.

For other configuration settings, see the Other Configuration Settings Table.

Setup Tutorial

This tutorial walks you through setting up a TigerGraph CDC service.

  1. Set Up the External Kafka Cluster for CDC

    If you already have a running external Kafka cluster for CDC, this step can be skipped.

    Create the directory where Kafka will be downloaded:
    mkdir -p /home/tigergraph/test_cdc/download_kafka
    Use this package:
    https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz
    Download Kafka:
    curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
    Verify the download:
    ls -l /home/tigergraph/test_cdc/download_kafka
  2. Start the External Zookeeper and Kafka Cluster for CDC

    1. Start the External Zookeeper Instance.

      Use the default configuration file zookeeper.properties, which uses the default port 2181:

      KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
      
      $KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
    2. Start the External Kafka Cluster for CDC.

      Use the configuration file server.properties, which uses the default port 9092.

      If you have a cluster environment, or if the external Kafka server is not local to the GPE servers, add the following line to the server.properties file so that Kafka accepts connections from remote servers:

      listeners=PLAINTEXT://<my_ip>:9092

      To determine the value for <my_ip>, run ifconfig or ip addr show and use the IP address shown after inet.

      Command to start a Kafka server:
      KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
      
      $KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties

    (Optional) Clear the Kafka topic

    Run this command to delete the topic and clear any old CDC messages from Kafka.

    MYIP=127.0.0.1
    
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    $KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
  3. Set Up the TigerGraph CDC Service

    After configuring the external Kafka cluster for CDC, set up the TigerGraph CDC service.

    Configure the following settings, then apply the changes:

      • System.CDC.ProducerConfig
      • System.CDC.TopicConfig
      • System.CDC.Enable
    
    MYIP=127.0.0.1
    echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
    
    echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
    
    gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
    gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.Enable true
    gadmin config apply
    gadmin restart gpe restpp
  4. Test the TigerGraph CDC Service

    Once the TigerGraph CDC service is running, test it by making an update to an existing graph, such as inserting or updating a vertex.

  5. Check CDC Messages in the External Kafka Cluster

    To consume and view CDC messages from the external Kafka cluster for CDC, run:
    MYIP=127.0.0.1
    
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    $KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092
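
If the consumer prints nothing, one thing worth ruling out is basic network reachability of the broker from the TigerGraph nodes. The following is a minimal sketch, not a TigerGraph or Kafka tool: the function name kafka_port_open is hypothetical, and it assumes that bash and the GNU coreutils timeout command are available. It only confirms that a TCP connection can be opened. It says nothing about authentication or about whether the topic contains messages.

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port can be
# opened within two seconds, non-zero otherwise.
# Relies on bash's /dev/tcp pseudo-device and on the `timeout` command.
kafka_port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, running kafka_port_open 127.0.0.1 9092 && echo reachable on a GPE node, with the same address used in bootstrap.servers, shows whether the broker port accepts connections from that node.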

Other Configuration Settings Table

Command (applies to every setting below): gadmin config entry GPE.BasicConfig.Env

  • CDCKafkaFlushTimeoutMs
    Description: When a GPE service shuts down, CDC tries to flush all generated CDC messages to the external Kafka cluster. This setting is the timeout for that flush.
    Default: -1 (ms). When set to -1, the timeout is infinite, which may slow the GPE shutdown.

  • CDCDeltaBufferCapInMB
    Description: In-memory buffer limit for delta messages in the CDC service.
    Default: 10 (megabytes).

  • DIMDeltaBufferCapInMB
    Description: In-memory buffer limit for “vertex-deletion” delta messages in the deleted ID map service.
    Default: 100 (megabytes).

  • DIMCacheLimitInMB
    Description: In-memory cache limit for the deleted ID map.
    Default: 1024 (megabytes).

  • DIMPurgeIntervalInMin
    Description: Interval for purging outdated entries in the deleted ID map.
    Default: 30 (minutes).