CDC Setup

When using the CDC feature, a “CDC service” running in GPE nodes will process data updates ("deltas") and write CDC messages to the external Kafka maintained by the user.

Users are required to establish and manage an external Kafka service independently from TigerGraph. The TigerGraph CDC service will then generate CDC messages directed to the external Kafka service. For guidance on setting up the external Kafka service, refer to the Official Apache Kafka documentation.

Setup Configuration

TigerGraph employs librdkafka 1.1.0 for the Kafka producer in the CDC service. Refer to the Global configuration properties and the Topic configuration properties sections of the librdkafka documentation for all other properties not mentioned in this guide, noting the applicable ones marked with “P”(Producer) or with “*” for both Producer and Consumer.

To configure CDC Producer and Topic settings in TigerGraph, utilize gadmin commands below.

If you use gadmin config to change any of these parameters, you need to run gadmin config apply -y for it to take effect. You can change multiple parameters and then run gadmin config apply for all of them together

After modifying, run the following command to apply the changes:
gadmin config apply
gadmin restart gpe restpp

CDC Configuration Parameters

System.CDC.Enable

This is the CDC enable config entry:
gadmin config entry

System.CDC.Enable

Set it to true or false, to enable CDC or not as in these examples:
gadmin config set System.CDC.Enable true
gadmin config set System.CDC.Enable false

CDC messages will only generate after the CDC is enabled and services have been applied and restarted.

System.CDC.ProducerConfig

This is the CDC producer config entry:
gadmin config entry System.CDC.ProducerConfig

This configuration entry is designated for the CDC producer.

Properties are passed through a file, with each line adhering to the format <property name>=<property value> separated by “new line”.

It is mandatory to include the property bootstrap.servers, specifying the IP and port for the broker(s) that the CDC producer connects to as in the example below:

mkdir -p /home/tigergraph/test_cdc

# The ip:port is bind to target external kafka server
echo -e "bootstrap.servers=$(gmyip):9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config

gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config

System.CDC.TopicConfig

This is the CDC topic config entry:
gadmin config entry System.CDC.TopicConfig

Utilize a file to pass properties, separating them with a "new line." The prescribed format for each line is <property name>=<property value>. It is imperative to employ the property name to designate the CDC topic, such as name=<CDC topic name>, as in the following example:

echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config

gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config

For other configuration settings please see the Other Configuration Settings Table.

Setup Tutorial

This tutorial will walk you through how to set up a TigerGraph CDC service.

  1. Setup external Kafka service for CDC messages

    First make the folder it will be downloaded to:
    mkdir -p /home/tigergraph/test_cdc/download_kafka
    Use this package:
    `https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz`
    And run this command to download Kafka to the folder that was just created:
    curl https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz | tar -xzf - -C "/home/tigergraph/test_cdc/download_kafka"
    Check if it’s successfully downloaded and extracted with this command:
    ls -l /home/tigergraph/test_cdc/download_kafka

    Next, start a Zookeeper server.

    Open a new terminal to start the Zookeeper service. Use the default configuration Zookeeper.properties, where it is using default port 2181:
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/zookeeper-server-start.sh $KAFKA_ROOT/config/zookeeper.properties

    Now, start a Kafka server.

    Use the configuration file server.properties, where it is using default port 9092.

    If you have a cluster environment or if the external Kafka server is not local to the GPE servers, you need to add the following line to server.properties file, to enable listening to messages from remote servers:

    listeners=PLAINTEXT://<my_ip>:9092

    To determine the value for <my ip>, use the command ifconfig or ip addr show, and find the ip after inet.

    Command to start a Kafka server:

    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/kafka-server-start.sh $KAFKA_ROOT/config/server.properties

    (Optional) clear Kafka topic

    Run this command to clear existing old Kafka messages in the Kafka.
    MYIP=127.0.0.1
    
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/kafka-topics.sh --bootstrap-server $MYIP:9092 --delete --topic cdc_topic
  2. Setup TigerGraph CDC service

    Now, start the CDC service in TigerGraph.

    Use the setup configuration commands as followed.
    System.CDC.ProducerConfig
    System.CDC.TopicConfig
    System.CDC.Enable
    
    MYIP=127.0.0.1
    echo -e "bootstrap.servers=$MYIP:9092\nenable.idempotence=true" > /home/tigergraph/test_cdc/cdc_producer_config
    
    echo -e "name=cdc_topic" > /home/tigergraph/test_cdc/cdc_topic_config
    
    gadmin config set System.CDC.ProducerConfig @/home/tigergraph/test_cdc/cdc_producer_config
    gadmin config set System.CDC.TopicConfig @/home/tigergraph/test_cdc/cdc_topic_config
    gadmin config set System.CDC.Enable true
    gadmin config apply
    gadmin restart gpe restpp
  3. Test TigerGraph CDC service

    Once the service is up and running, test it, by making an update to an existing graph, such as

  4. Lastly, check CDC messages.

    To consume and display CDC messages, run:
    MYIP=127.0.0.1
    
    KAFKA_ROOT=/home/tigergraph/test_cdc/download_kafka/kafka_2.13-3.3.1
    
    $KAFKA_ROOT/bin/kafka-console-consumer.sh --topic cdc_topic --from-beginning --bootstrap-server $MYIP:9092

Other Configuration Settings Table

Command

Name

Description

Default (Unit: Value)

gadmin config entry GPE.BasicConfig.Env

CDCKafkaFlushTimeoutMs

When a GPE service shuts down, CDC will try to flush all generated cdc messages to external kafka.

ms: -1.

When set to -1, there is an infinite timeout, which may slow the GPE shutdown.

CDCDeltaBufferCapInMB

In-memory buffer limit for delta message in CDC service.

megabytes: 10.

DIMDeltaBufferCapInMB

In-memory buffer limit for “vertex-deletion“ delta message in deleted id map service.

megabytes: 100.

DIMCacheLimitInMB

In-memory cache limit for deleted id map.

megabytes: 1024.

DIMPurgeIntervalInMin

Interval for purging outdated entries in deleted id map.

minutes: 30.