Change Data Capture (CDC) Overview
Change Data Capture (CDC) lets TigerGraph users automatically capture data changes and stream them to external Kafka systems.
Key Features
- Captures change log data and publishes it to external Kafka topics.
- Pauses publishing while the external Kafka is down and, once it recovers, resumes from the last successfully published point.
- Preserves the order of changes, so data updates can be reproduced for debugging.
- Formats messages as JSON for readability and compatibility with third-party tools.
CDC Setup
Learn about setup configurations and get started with the setup tutorial.
CDC Message Examples
A deep dive into the CDC message format, with example messages.
CDC Monitoring and Reset
Learn about CDC state monitoring, including DIM state monitoring with the CDC service.
CDC Reaction to Other Features
When GPE is reset
When the GPE is reset (gadmin reset gpe), all deltas that have not yet been rebuilt into the snapshot are lost. TigerGraph CDC is reset at the same time and skips all historical data updates.
Some commands call gadmin reset gpe implicitly, so CDC is also reset whenever these commands run:
CDC HA
High Availability (HA) support for the CDC service was introduced in version 4.1.0. Before this version, the CDC service (manager) operated exclusively on replica 1 of each GPE partition.
As of version 4.1.0, the CDC service operates on the GPE leader within each GPE partition. In each GPE partition, as long as there is at least one live node, a leader will be elected, and the CDC service should function normally.
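For instance, in a 2x2 cluster (two partitions with two replicas each), if only GPE_1#1 and GPE_2#2 are online among all GPE servers, the CDC service should still operate without issues, because each partition still has a live node that can be elected leader.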
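The availability rule above can be modeled with a small Python sketch. It is a simplification, not TigerGraph's actual election logic: it assumes server names follow the GPE_<partition>#<replica> pattern used in the example and treats "at least one live replica per partition" as the only condition. The helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical helper: server names are assumed to look like GPE_<partition>#<replica>.
_GPE_NAME = re.compile(r"^GPE_(\d+)#(\d+)$")


def live_replicas_by_partition(live_servers: Iterable[str], partitions: int) -> Dict[int, List[int]]:
    """Group the live GPE servers by partition number (partitions are 1-based)."""
    groups: Dict[int, List[int]] = {p: [] for p in range(1, partitions + 1)}
    for name in live_servers:
        match = _GPE_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected GPE server name: {name!r}")
        partition, replica = int(match.group(1)), int(match.group(2))
        if partition not in groups:
            raise ValueError(f"partition {partition} is outside 1..{partitions}")
        groups[partition].append(replica)
    return groups


def cdc_can_run(live_servers: Iterable[str], partitions: int) -> bool:
    """Simplified model: CDC runs if every partition has a live node to elect as leader."""
    groups = live_replicas_by_partition(live_servers, partitions)
    # An empty replica list means that partition cannot elect a leader.
    return all(groups.values())
```

For the 2x2 example, cdc_can_run(["GPE_1#1", "GPE_2#2"], partitions=2) returns True, while cdc_can_run(["GPE_1#1", "GPE_1#2"], partitions=2) returns False because partition 2 has no live node.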
Note that when the GPE leader switches, duplicate CDC messages may be produced and sent to the external Kafka, because the new GPE leader does not know how many CDC messages the previous leader already produced. Use the "mid" field in each message to deduplicate them.
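A downstream consumer can drop these duplicates by remembering the "mid" values it has already processed. The Python sketch below assumes each CDC message is a JSON object with a top-level "mid" field and that a repeated "mid" marks a duplicate; the class and function names are hypothetical. It takes raw payloads as strings so it can sit behind any Kafka client.

```python
import json
from collections import OrderedDict
from typing import Iterable, Iterator


class MidDeduplicator:
    """Drops CDC messages whose "mid" has already been seen.

    Assumption: every message is a JSON object with a top-level "mid" field.
    Only the most recent `max_tracked` mids are remembered, to bound memory.
    """

    def __init__(self, max_tracked: int = 100_000) -> None:
        self.max_tracked = max_tracked
        self._seen: "OrderedDict[object, None]" = OrderedDict()

    def is_new(self, message: dict) -> bool:
        mid = message["mid"]
        if mid in self._seen:
            self._seen.move_to_end(mid)  # refresh recency of a repeated mid
            return False
        self._seen[mid] = None
        if len(self._seen) > self.max_tracked:
            self._seen.popitem(last=False)  # evict the oldest tracked mid
        return True


def deduplicate(raw_messages: Iterable[str], max_tracked: int = 100_000) -> Iterator[dict]:
    """Parse raw JSON payloads (e.g. Kafka record values) and yield each unique message once."""
    dedup = MidDeduplicator(max_tracked)
    for raw in raw_messages:
        message = json.loads(raw)
        if dedup.is_new(message):
            yield message
```

Bounding the tracked set is a trade-off: a duplicate that arrives after more than max_tracked newer messages would slip through, so the bound should comfortably cover the number of messages a leader switch might replay.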
To disable CDC High Availability (HA) in a multi-replica cluster and keep the CDC service running only on replica 1, edit the GPE feature list: open it with gadmin config entry GPE.BasicConfig.Features and remove the "CDC_HA" entry from the list.
CDC Limitations
Limitation on CDC Message
CDC messages do not distinguish attribute modification from insertion. A message for a vertex or edge attribute modification has the "operator": "insert" key-value pair, just like a vertex or edge insertion; the difference is that its "content" contains only the field for the modified attribute.
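Because a modification arrives as an "insert" whose "content" holds only the changed attribute, a consumer that replaces the stored record on every "insert" would lose the other attributes. Applying each "insert" as a merge handles both cases. This Python sketch assumes "content" is a flat mapping of attribute names to values and that the caller has already extracted the vertex or edge key from the message; the function name and store layout are hypothetical.

```python
from typing import Any, Dict, Hashable

# Hypothetical downstream store: vertex/edge key -> attribute values.
Store = Dict[Hashable, Dict[str, Any]]


def apply_insert_message(store: Store, key: Hashable, message: Dict[str, Any]) -> None:
    """Apply an "insert" CDC message as an upsert that merges attributes."""
    if message["operator"] != "insert":
        raise ValueError(f"expected an insert message, got {message['operator']!r}")
    # Merge, don't replace: for an attribute modification, "content" holds only
    # the modified attribute, so replacing would drop every other attribute.
    store.setdefault(key, {}).update(message["content"])
```

For example, applying an insert with content {"name": "Ann", "age": 30} and then one with content {"age": 31} for the same key leaves {"name": "Ann", "age": 31}.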
No CDC message for implicit edge deletion
When a vertex is deleted, any edge that uses the vertex as source or target will be implicitly deleted. However, TigerGraph CDC currently does not generate a CDC message for such “implicit edge deletion”.
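A consumer that mirrors the graph can compensate by cascading vertex deletions itself. The Python sketch below uses hypothetical in-memory sets and assumes the consumer receives the vertex deletion message and stores edges as (source, target, edge type) tuples; it is not part of TigerGraph.

```python
from typing import Hashable, Set, Tuple

# Hypothetical edge layout: (source vertex, target vertex, edge type).
Edge = Tuple[Hashable, Hashable, str]


def delete_vertex(vertices: Set[Hashable], edges: Set[Edge], vertex: Hashable) -> Set[Edge]:
    """Delete a vertex from a downstream copy and cascade to its edges.

    TigerGraph CDC emits no message for edges implicitly deleted with a vertex,
    so remove every edge whose source or target is that vertex. Returns the
    edges that were removed.
    """
    vertices.discard(vertex)
    removed = {e for e in edges if e[0] == vertex or e[1] == vertex}
    edges.difference_update(removed)  # mutate the caller's set in place
    return removed
```

Checking both endpoints also covers undirected edges, whichever orientation the consumer happened to store.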
No CDC message for implicit source vertex insertion
When an undirected edge, or a directed edge with a reverse edge type, is inserted or modified, the TigerGraph database implicitly inserts the source and target vertices if they do not exist. (This behavior can be configured via VERTEX_MUST_EXIST in a loading job and in the POST data API.)
In this case, TigerGraph CDC generates a CDC message with "operator": "insert-only" for the target vertex (see the section “Extra CDC message for Edge Update”), but no CDC message for the source vertex.
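When implicit vertex creation is enabled, a consumer mirroring the graph can create the missing source vertex itself when it applies an edge message. This Python sketch uses a hypothetical store layout and function name. The placeholder vertex has no attributes, so it does not reflect any default attribute values TigerGraph assigned.

```python
from typing import Any, Dict, Hashable, Tuple

# Hypothetical downstream layout.
Vertices = Dict[Hashable, Dict[str, Any]]                 # vertex id -> attributes
Edges = Dict[Tuple[Hashable, Hashable], Dict[str, Any]]   # (source, target) -> attributes


def apply_edge_upsert(vertices: Vertices, edges: Edges, source: Hashable,
                      target: Hashable, attrs: Dict[str, Any]) -> None:
    """Apply an edge insertion/modification to a downstream copy."""
    # TigerGraph may have created the source vertex implicitly without a CDC
    # message, so add an attribute-less placeholder if it is missing.
    vertices.setdefault(source, {})
    # The target vertex normally arrives as its own "insert-only" message;
    # setdefault leaves it untouched if that message was already applied.
    vertices.setdefault(target, {})
    edges.setdefault((source, target), {}).update(attrs)
```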