Externalizing Kafka Configurations

Users can utilize external sources, including files, vault, and environment variables, to provide configurations for Kafka connectors when Loading data from Kafka.

Sensitive information such as credentials and security setups are kept secure. For more information on Kafka security see Kafka Connect Security Basics.

Disallowed Configurations

Disallowed

Some configs are not allowed to be externalized because there is a dependency in TigerGraph:

  • task.max

  • topic

  • connector.class

  • mode

  • name (connector name)

  • source.cluster.alias

  • topics (in MirrorMaker connector)

  • num.partitions

Config Providers

Users can specify config provider type and class from the Kafka ConfigProvider API along with the parameters config providers need.

The format ${<config-provider-type>:<path>:<key>} is used to replace a config value in the config file. The config provider will automatically look up the referred source and find the value corresponding to the <path> and <key>.

Config Provider Config Provider Type config.providers Config Provider class config.providers.<type>.class Config Provider Reference ${<config-provider-type>:<path>:<key>}

File Config Provider

file

org.apache.kafka.common.config.provider.FileConfigProvider

${file:<path-to-file>:<key>}

Environment Variable Config Provider

env

org.apache.kafka.common.config.provider.EnvVarConfigProvider

${env:<key>} (no path needed)

Vault Config Provider

vault

org.apache.kafka.common.config.provider.EnvVarConfigProvider

${vault:<vault-path>:<key>}

File Config Provider

Allows users to put secrets and other config items in a separate file.

The file must exist in the same path on all nodes of the cluster.

Example usage

Example secret file secret.properties:
S3_ACCESS_KEY=thisisakey
S3_SECRET_KEY=anotherkey
Example stream loading job config file:
{
"type": "s3",
"access.key":"${file:/tmp/secret.properties:S3_ACCESS_KEY}",
"secret.key":"${file:/tmp/secret.properties:S3_SECRET_KEY}",
config.providers=file,
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
}

Environment Variable Config Provider

Allows users to use an environment variable as values in configuration.

The environment variable should be added to KafkaConnect using gadmin config set KafkaConnect.BasicConfig.Env.

Example Usage

Set environment variable (bash)
# Format details:
# The secrets should be ';' separated and in format "key=value".
# The values will be read as is and no escaping (\n for example) or quotes (' or ") are needed.
# Key names should follow bash variable name standards, ie, no special characters other than '_'.

secrets="S3_ACCESS_KEY=password; S3_SECRET_KEY=password;"
gadmin config set KafkaConnect.BasicConfig.Env "$(gadmin config get KafkaConnect.BasicConfig.Env); $secrets"
gadmin config apply -y
gadmin restart kafkaconn -y
Stream loading job config
{
"type": "s3",
"access.key":"${env:S3_ACCESS_KEY}",
"secret.key":"${env:S3_SECRET_KEY}",
"config.providers":"env",
"config.providers.env.class":"org.apache.kafka.common.config.provider.EnvVarConfigProvider"
}

Vault Config Provider

Allows users to use vault as a source for config values. Please follow vault setup guide on their the Official Documentations.

There are additional config parameters for vault config provider.

Table 1. Additional Vault Configurations:
Configuration Key Default Value Description

vault.max.retries

5

The number of times that API operations will be retried when a failure occurs.

vault.retry.interval.ms

2000

The number of milliseconds that the driver will wait in between retries.

vault.address

Sets the address (URL) of the Vault server instance to which API calls should be sent. If not explicitly set, it will look to the VAULT_ADDR environment variable. If not found, initialization may fail.

vault.prefix

Sets a prefix that will be added to all paths, allowing the same configuration settings to be used across multiple environments.

vault.namespace

Sets a global namespace to the Vault server instance, if desired.

vault.token

Sets the token used to access Vault. If not explicitly set, the VAULT_TOKEN environment variable will be used.

vault.login.by

Token

The login method to use. Currently, only supports Token.

vault.ssl.verify.enabled

true

Flag to determine if the configuration provider should verify the SSL Certificate of the Vault server. Outside of development, this should never be disabled.

vault.ssl.cert

The path of a classpath resource containing an X.509 certificate, in unencrypted PEM format with UTF-8 encoding. Must be available on all nodes in the cluster.

Example Usage

Example vault server config with HTTPS enabled:
storage "file" {
    path    = "<vault_data_root>"
}

listener "tcp" {
    address     = "<ip>:<vault_port>"
    tls_disable = "false"
    tls_cert_file = "<cert>"
    tls_key_file = "<key>"
}

api_addr = "<VAULT_ADDR>"
cluster_addr = "https://<ip>:<vault_cluster_port>"
ui = true
disable_mlock = true
Example stream loading job config file
{
    "type" : "abs",
    "account.key" : "${vault:tg/vault-test:accountKey}",
    "config.providers":"vault",
    "config.providers.vault.class":"com.tigergraph.kafka.connect.config.providers.VaultConfigProvider",
    "config.providers.vault.param.vault.token":"<root_token>",
    "config.providers.vault.param.vault.address":"http://<ip>:8200"
}