CDC Setup

When CDC is enabled, a “CDC service” running on the GPE nodes processes data updates ("deltas") and writes CDC messages to an external Kafka cluster maintained by the user.

Important

  • External Kafka Cluster: Users must set up and manage their own Kafka cluster. The TigerGraph CDC service sends CDC messages to this external Kafka cluster.

  • For guidance on setting up the external Kafka service, refer to the official Apache Kafka documentation.

Setup Configuration

TigerGraph uses librdkafka 1.1.0 for the Kafka producer in the CDC service. For properties not covered in this guide, refer to the Global configuration properties and Topic configuration properties sections of the librdkafka documentation. Only properties marked “P” (Producer) or “*” (both Producer and Consumer) apply.

Configuring CDC Producer and Topic Settings

Use the following gadmin commands to configure the CDC producer and topic settings in TigerGraph.

Applying Configuration Changes

  • If you modify any configuration using gadmin config, run the following commands to apply the changes and restart the affected services:

gadmin config apply
gadmin restart gpe restpp
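
As a rough sketch, the two commands above can be wrapped in a small helper that restarts the services only if the apply step succeeded. The function name apply_cdc_config is hypothetical and not part of gadmin; the helper uses only the two gadmin commands shown above.

```shell
# Hypothetical helper: apply pending gadmin changes, then restart GPE and RESTPP.
apply_cdc_config() {
  if ! command -v gadmin >/dev/null 2>&1; then
    echo "gadmin not found on PATH; run this as the TigerGraph user" >&2
    return 1
  fi
  # Restart only if the apply step succeeded.
  gadmin config apply && gadmin restart gpe restpp
}
```

Call apply_cdc_config after any gadmin config set command in this guide.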

CDC Configuration Parameters

System.CDC.Enable

Controls whether CDC is enabled or disabled.

This is the CDC enable config entry:
gadmin config entry System.CDC.Enable

Enable or disable CDC using:

gadmin config set System.CDC.Enable true   # To enable CDC
gadmin config set System.CDC.Enable false  # To disable CDC

[NOTE]: CDC messages are generated only after enabling the CDC service and restarting the system.

System.CDC.ProducerConfig

Specifies properties for the CDC producer. Properties are passed through a file, one per line, in the format <property name>=<property value>.

This is the CDC producer config entry:
gadmin config entry System.CDC.ProducerConfig

[NOTE]: The property bootstrap.servers is mandatory. It specifies the IP and port of the external Kafka cluster for CDC:

mkdir -p /home/tigergraph/test_cdc

# ip:port must point to the target external Kafka server
echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config

gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
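
Before passing the file to gadmin, it can help to confirm that it follows the format described above. The following is a minimal sketch, not a TigerGraph tool: the function name check_producer_config is hypothetical, and it checks only the line format and the presence of bootstrap.servers, not whether each property is valid for librdkafka.

```shell
# Hypothetical pre-flight check for a System.CDC.ProducerConfig file.
check_producer_config() {
  file="$1"
  # Every non-empty line must look like <property name>=<property value>.
  if grep -v '^$' "$file" | grep -qv '^[^=][^=]*=.'; then
    echo "malformed line in $file" >&2
    return 1
  fi
  # bootstrap.servers is mandatory.
  if ! grep -q '^bootstrap\.servers=.' "$file"; then
    echo "bootstrap.servers missing in $file" >&2
    return 1
  fi
  echo "ok"
}
```

For example, check_producer_config /home/tigergraph/test_cdc/cdc_producer_config prints ok for the file created above.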

The full list of properties accepted in the System.CDC.ProducerConfig file is in the "Global configuration properties" table here: https://github.com/confluentinc/librdkafka/blob/v1.1.0/CONFIGURATION.md#global-configuration-properties.

[NOTE]: Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.

Kafka Security

To secure communication with the external Kafka cluster for CDC, configure authentication settings in System.CDC.ProducerConfig.

  1. When an entry uses a local path, for example sasl.kerberos.keytab or ssl.ca.location, the local file must exist and be identical on all nodes in the TigerGraph cluster.

  2. The entry sasl.jaas.config is not applicable because it applies only to Java-based Kafka clients, whereas librdkafka, used by the TigerGraph engine, is a C/C++ library.
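
One way to check the first requirement is to compare checksums. The sketch below assumes you have already gathered a copy of the file from each node into paths readable on one machine (how you collect them is up to you); the function name files_identical is hypothetical.

```shell
# Hypothetical helper: succeed only if all given files have identical content.
files_identical() {
  for f in "$@"; do
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
  done
  # cksum on stdin prints only checksum and size, so identical files give identical lines.
  distinct=$(for f in "$@"; do cksum < "$f"; done | sort -u | wc -l | tr -d ' ')
  [ "$distinct" -eq 1 ]
}
```

For example, files_identical node1/ca.pem node2/ca.pem node3/ca.pem returns success only when all three copies match; the directory names here are placeholders.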

Example 1: Authenticating with SASL/PLAIN

security.protocol=SASL_PLAINTEXT
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>

Example 2: Authenticating with SASL/GSSAPI

security.protocol=SASL_PLAINTEXT
sasl.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka
sasl.kerberos.principal=<user@EXAMPLE.COM>
sasl.kerberos.keytab=</path/to/user.keytab>

Example 3: Authenticating with SASL/PLAIN and Encrypting with SSL

security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>
ssl.ca.location=<path/to/ca.pem>
ssl.certificate.location=<path/to/cert.pem>
ssl.key.location=<path/to/key.pem>

For more details on SASL with librdkafka, refer to: https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka
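
The security properties go in the same file as bootstrap.servers. As an illustration only, the sketch below writes a complete producer config that combines bootstrap.servers with the Example 3 properties. The function name write_sasl_ssl_config is hypothetical, and every <...> value is a placeholder to replace before use.

```shell
# Hypothetical helper: write a SASL_SSL producer config into the given directory.
write_sasl_ssl_config() {
  dir="$1"
  mkdir -p "$dir" || return 1
  # The quoted 'EOF' stops the shell from expanding anything in the file body.
  cat > "$dir/cdc_producer_config" <<'EOF'
bootstrap.servers=<kafka host>:<port>
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<username>
sasl.password=<password>
ssl.ca.location=<path/to/ca.pem>
ssl.certificate.location=<path/to/cert.pem>
ssl.key.location=<path/to/key.pem>
EOF
}
```

After running write_sasl_ssl_config /home/tigergraph/test_cdc and filling in the placeholders, pass the file to gadmin config set System.CDC.ProducerConfig with the @ prefix, as shown earlier.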

System.CDC.TopicConfig

This is the CDC topic config entry:
gadmin config entry System.CDC.TopicConfig

Properties are passed through a file, one per line, in the format <property name>=<property value>. The name property is required and sets the CDC topic, as in name=<CDC topic name>. For example:

echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config

gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
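
Kafka itself restricts topic names to ASCII letters, digits, '.', '_' and '-', with a maximum length of 249 characters, and rejects the names "." and "..". As a minimal sketch, the value chosen for name can be checked against those rules before it is written to the file. The function name valid_topic_name is hypothetical, and whether TigerGraph performs its own validation is not covered here.

```shell
# Hypothetical check of a topic name against Kafka's naming rules.
valid_topic_name() {
  name="$1"
  case "$name" in
    ''|.|..) return 1 ;;                  # empty, "." and ".." are not allowed
    *[!A-Za-z0-9._-]*) return 1 ;;        # any character outside the legal set
  esac
  [ "${#name}" -le 249 ]                  # length limit
}
```

For example, valid_topic_name cdc_topic succeeds, while valid_topic_name 'cdc topic' fails because of the space.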

The full list of properties accepted in the System.CDC.TopicConfig file is in the "Topic configuration properties" table here: https://github.com/edenhill/librdkafka/blob/v1.1.0/CONFIGURATION.md#topic-configuration-properties.

[NOTE]: Only properties marked with P (Producer) or * (both Producer and Consumer) are applicable.

For other configuration settings, see the Other Configuration Settings Table.

Setup Tutorial

This tutorial will walk you through how to set up a TigerGraph CDC service.

  1. Setting Up the External Kafka Cluster for CDC

    If you already have a running external Kafka cluster for CDC, this step can be skipped.

    Create the directory where Kafka will be downloaded:
    mkdir -p /home/tigergraph/test_cdc/download_kafka
    Use this package:
    `https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz`
    Download Kafka:
    curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
    Verify the download:
    ls -l /home/tigergraph/test_cdc/download_kafka

    Starting the External Zookeeper and Kafka Cluster for CDC

    Start the external Zookeeper instance with the default configuration file zookeeper.properties, which uses the default port 2181:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties
    In a second terminal, start the external Kafka broker with the default configuration file server.properties, which uses the default port 9092:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties

    [NOTE]: To accept messages produced from remote servers, edit server.properties and add listeners=PLAINTEXT://<my ip>:9092. To find the value of <my ip>, run ifconfig or ip addr show and use the address that follows inet.

    (Optional) Clear the Kafka Topic

    Run this command to delete the topic cdc_topic, which removes any old messages it holds:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1 MYIP=127.0.0.1
    
    $KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
  2. Setting Up the TigerGraph CDC Service

    After configuring the external Kafka cluster for CDC, set up the TigerGraph CDC service.

    Configure the CDC producer and topic settings through the entries System.CDC.ProducerConfig, System.CDC.TopicConfig, and System.CDC.Enable:
    
    MYIP=127.0.0.1
    echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
    
    echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
    
    gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
    gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.Enable true
    gadmin config apply
    gadmin restart gpe restpp
  3. Testing the TigerGraph CDC Service

    Once the TigerGraph CDC service is running, test it by updating an existing graph with Data Modification Statements, such as GSQL INSERT, UPDATE, or DELETE statements.

  4. Checking CDC Messages in the External Kafka Cluster

    To consume and view CDC messages from the external Kafka cluster for CDC, run:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1 MYIP=127.0.0.1
    
    $KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092
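
The consumer command in step 4 runs until interrupted. For a quick check, a bounded variant can be sketched as follows. The function name peek_cdc_messages and the 10-second timeout are illustrative choices; --max-messages and --timeout-ms are standard options of kafka-console-consumer.sh, and the paths assume the tutorial layout above.

```shell
KAFKA_ROOT="${KAFKA_ROOT:-/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1}"
MYIP="${MYIP:-127.0.0.1}"

# Hypothetical helper: read at most N messages (default 5) from cdc_topic, then exit.
peek_cdc_messages() {
  max="${1:-5}"
  consumer="$KAFKA_ROOT/bin/kafka-console-consumer.sh"
  if [ ! -x "$consumer" ]; then
    echo "kafka-console-consumer.sh not found under $KAFKA_ROOT" >&2
    return 1
  fi
  # --max-messages stops after N messages; --timeout-ms gives up if none arrive in time.
  "$consumer" --topic cdc_topic --from-beginning \
    --bootstrap-server "$MYIP:9092" \
    --max-messages "$max" --timeout-ms 10000
}
```

For example, peek_cdc_messages 3 prints up to three CDC messages and then returns.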

Other Configuration Settings Table

| Command | Name | Description | Default (Unit: Value) |
|---|---|---|---|
| gadmin config entry GPE.BasicConfig.Env | CDCKafkaFlushTimeoutMs | When a GPE service shuts down, CDC will try to flush all generated CDC messages to the external Kafka. When set to -1, there is an infinite timeout, which may slow the GPE shutdown. | ms: -1 |
| gadmin config entry GPE.BasicConfig.Env | CDCDeltaBufferCapInMB | In-memory buffer limit for delta messages in the CDC service. | megabytes: 10 |
| gadmin config entry GPE.BasicConfig.Env | DIMDeltaBufferCapInMB | In-memory buffer limit for “vertex-deletion” delta messages in the deleted id map service. | megabytes: 100 |
| gadmin config entry GPE.BasicConfig.Env | DIMCacheLimitInMB | In-memory cache limit for the deleted id map. | megabytes: 1024 |
| gadmin config entry GPE.BasicConfig.Env | DIMPurgeIntervalInMin | Interval for purging outdated entries in the deleted id map. | minutes: 30 |