How to pass MongoDB TLS certificates when creating a Debezium MongoDB Kafka connector?

We have a MongoDB cluster with three replicas. TLS is enabled in preferred mode and the authentication mechanism is MONGODB-X509.
We also have a three-broker Strimzi Kafka cluster and a Kafka Connect cluster with all required plugins (i.e. the MongoDB connector provided by Debezium) up and running.
Below is the partial connect.yaml file used for the Kafka Connect deployment:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect
spec:
  config:
    config.providers: directory
    config.providers.directory.class: org.apache.kafka.common.config.provider.DirectoryConfigProvider
  externalConfiguration:
    volumes:
      - name: connector-config
        secret:
          secretName: mysecret
The deployment works fine and I can see the ca.pem and mongo-server.pem files in the /opt/kafka/external-configuration/connector-config directory.
After that I tried to create the MongoDB connector with the configuration given below, but I am not sure of the exact way to pass the certificates, as there is no sample configuration file available for MongoDB connectors. Could you please help by providing a sample configuration?
I tried the following configuration file:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.mongodb.MongoDbConnector
  tasksMax: 2
  config:
    ssl.truststore.type: PEM
    ssl.truststore.location: "${directory:/opt/kafka/external-configuration/connector-config:ca.pem}"
    ssl.keystore.type: PEM
    ssl.keystore.location: "${directory:/opt/kafka/external-configuration/connector-config:mongo-server.pem}"
    mongodb.hosts: "rs0/192.168.99.100:27017"
    mongodb.name: "fullfillment"
    collection.include.list: "inventory[.]*"
    mongodb.ssl.enabled: true
    mongodb.ssl.invalid.hostname.allowed: true
but it throws a syntax error. Please help by providing a sample MongoDB connector.yaml.

As for Strimzi, you can use the external configuration to mount Secrets or Config Maps into the Strimzi Kafka Connect deployment: https://strimzi.io/docs/operators/latest/full/using.html#type-ExternalConfiguration-reference. Once it is loaded into the Pods, you can either just refer to it using the file path, or you can use the Kafka configuration providers to load the data into the configuration options.
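For example, a connector sketch built on top of the Secret mounted earlier could look like the one below. Keep in mind that the DirectoryConfigProvider substitutes the contents of a file, not its path, so it only makes sense for options that expect the certificate data itself; options that expect a path can simply point at the mounted file. The mongodb.* options are taken from the question; the some.option.* keys are placeholders, because the exact property names the Debezium MongoDB connector expects for key and trust material depend on the connector version and should be checked against the Debezium documentation.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector
  labels:
    strimzi.io/cluster: my-connect   # must match the name of the KafkaConnect resource above
spec:
  class: io.debezium.connector.mongodb.MongoDbConnector
  tasksMax: 2
  config:
    mongodb.hosts: "rs0/192.168.99.100:27017"
    mongodb.name: "fullfillment"
    collection.include.list: "inventory[.]*"
    mongodb.ssl.enabled: true
    mongodb.ssl.invalid.hostname.allowed: true
    # Option 1: an option that expects a *file path* can point directly at the mounted file:
    # some.option.expecting.a.path: /opt/kafka/external-configuration/connector-config/ca.pem
    # Option 2: an option that expects the *PEM data itself* can use the DirectoryConfigProvider,
    # which gets replaced with the contents of the file:
    # some.option.expecting.pem.data: "${directory:/opt/kafka/external-configuration/connector-config:mongo-server.pem}"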

Related

no matches for kind "KafkaConnect" in version "kafka.strimzi.io/v1beta2"

I'm trying to work with Strimzi to create a Kafka Connect cluster and I am encountering the following error:
unable to recognize "kafka-connect.yaml": no matches for kind "KafkaConnect" in
version "kafka.strimzi.io/v1beta2"
Here's the kafka-connect.yaml I have:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect
  namespace: connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.4.0
  replicas: 1
  bootstrapServers: host:port
  tls:
    trustedCertificates:
      - secretName: connectorsecret
        certificate: cert
  config:
    group.id: o
    offset.storage.topic: strimzi-connect-cluster-offsets
    config.storage.topic: strimzi-connect-cluster-configs
    status.storage.topic: strimzi-connect-cluster-status
    sasl.mechanism: scram-sha-256
    security.protocol: SASL_SSL
    secretName: connectorsecret
    sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule required username=username password=password
Then I tried to apply the config via kubectl apply -f kafka-connect.yaml.
Are there any prerequisites for creating resources with Strimzi, or is there something I'm doing wrong?
I think there are two possibilities:
You did not install the CRD resources
You are using Strimzi version which is too old and does not support the v1beta2 API
Judging by the fact that you are trying to use Kafka 2.4.0, I guess the second option is more likely. If you really want to do that, you should make sure to use the documentation, examples and everything else from the Strimzi version you are running; they should be using one of the older APIs (v1alpha1 or v1beta1).
But in general, I would strongly recommend using the latest version of Strimzi rather than a version which is several years old.
One more note: If you want to configure the SASL authentication for your Kafka Connect cluster, you should do it in the .spec.authentication section of the custom resource and not in .spec.config.
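A minimal sketch of that, assuming a recent Strimzi version (older ones only offer scram-sha-512) and assuming the password sits in the connectorsecret Secret under a key called password - adjust both to your setup. The Kafka version field is omitted here so the operator default is used. With authentication and tls set like this, Strimzi generates the sasl.jaas.config and security.protocol settings itself, so they can be dropped from .spec.config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect
  namespace: connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: host:port
  tls:
    trustedCertificates:
      - secretName: connectorsecret
        certificate: cert
  authentication:
    type: scram-sha-256          # or scram-sha-512, depending on the broker configuration
    username: username
    passwordSecret:
      secretName: connectorsecret   # assumed Secret holding the password
      password: password            # key inside that Secret
  config:
    group.id: o
    offset.storage.topic: strimzi-connect-cluster-offsets
    config.storage.topic: strimzi-connect-cluster-configs
    status.storage.topic: strimzi-connect-cluster-status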

Flink kubernetes deployment - how to provide S3 credentials from Hashicorp Vault?

I'm trying to deploy a Flink stream processor to a Kubernetes cluster with the help of the official Flink kubernetes operator.
The Flink app also uses Minio as its state backend. Everything worked fine until I tried to provide the credentials from Hashicorp Vault in the following way:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-app
  namespace: default
spec:
  serviceAccount: sa-example
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccountName: default:sa-example
      containers:
        - name: flink-main-container
          # ....
  flinkVersion: v1_14
  flinkConfiguration:
    presto.s3.endpoint: https://s3-example-api.dev.net
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3p://example-flink/example-1/high-availability/
    high-availability.cluster-id: example-1
    high-availability.namespace: example
    high-availability.service-account: default:sa-example
    # presto.s3.access-key: *
    # presto.s3.secret-key: *
    presto.s3.path-style-access: "true"
    web.upload.dir: /opt/flink
  jobManager:
    podTemplate:
      apiVersion: v1
      kind: Pod
      metadata:
        name: job-manager-pod-template
        annotations:
          vault.hashicorp.com/namespace: "/example/dev"
          vault.hashicorp.com/agent-inject: "true"
          vault.hashicorp.com/agent-init-first: "true"
          vault.hashicorp.com/agent-inject-secret-appsecrets.yaml: "example/Minio"
          vault.hashicorp.com/role: "example-serviceaccount"
          vault.hashicorp.com/auth-path: auth/example
          vault.hashicorp.com/agent-inject-template-appsecrets.yaml: |
            {{- with secret "example/Minio" -}}
            presto.s3.access-key: {{ .Data.data.accessKey }}
            presto.s3.secret-key: {{ .Data.data.secretKey }}
            {{- end }}
When I comment out the presto.s3.access-key and presto.s3.secret-key config values in the flinkConfiguration, replace them with the Hashicorp Vault annotations listed above, and try to provide the credentials programmatically at runtime:
val configuration: Configuration = getSecretsFromFile("/vault/secrets/appsecrets.yaml")
val env = org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment(configuration)
I receive the following error message:
java.io.IOException: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider#5331f738: profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper#bc0353f: Failed to connect to service endpoint: ]
at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.uploadObject(PrestoS3FileSystem.java:1278) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:1226) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:80) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:72) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobUtils.moveTempFileToStore(BlobUtils.java:385) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServer.moveTempFileToStore(BlobServer.java:680) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:350) [flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:110) [flink-dist_2.12-1.14.2.jar:1.14.2]
I also initially tried to append the secrets to flink-conf.yaml in docker-entrypoint.sh, based on this documentation - Configure Access Credentials:
if [ -f '/vault/secrets/appsecrets.yaml' ]; then
  (echo && cat '/vault/secrets/appsecrets.yaml') >> $FLINK_HOME/conf/flink-conf.yaml
fi
The question is how to provide the S3 credentials at runtime, given that the Flink operator mounts flink-conf.yaml from a ConfigMap and the mounted file is read-only (flink-conf.yaml: Read-only file system).
Thank you
There is no support for this in the Kubernetes operator. In fact, this is not a limitation of the Flink Kubernetes operator itself; it stems from the lack of support in Flink's native Kubernetes integration. There is a separate story for this on the Kubernetes operator side - FLINK-27491.
As a workaround, you can set up an init container and have it update the ConfigMap through the Kubernetes API after reading the secrets from Vault. The updated ConfigMap then has the secrets filled in by the init container, and they become visible to the job manager and all of its task managers. The Flink cluster only starts after the init container has updated the ConfigMap, so the change is visible to the whole Flink cluster.
A simple example of updating the ConfigMap from an init container can be found here. In that example, the ConfigMap is updated with a simple curl command. In theory, you can use any lightweight client to update the ConfigMap like this.
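A rough sketch of such an init container for the job manager pod template is below. Everything here is an assumption to adapt: the image, the volume name the Vault agent writes to, the ConfigMap name (the operator derives it from the deployment name), and the fact that the pod's service account must be allowed to get and patch ConfigMaps. The merge step is deliberately left as a comment because it depends on the tooling available in the image.
initContainers:
  - name: patch-flink-conf
    image: curlimages/curl:8.5.0            # any small image with curl and a shell
    volumeMounts:
      - name: vault-secrets                 # assumed: volume where the Vault agent writes appsecrets.yaml
        mountPath: /vault/secrets
    command:
      - sh
      - -c
      - |
        TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
        NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
        CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # assumed ConfigMap name - check what the operator actually created in your namespace
        API=https://kubernetes.default.svc/api/v1/namespaces/${NS}/configmaps/flink-config-flink-app
        # 1. fetch the current ConfigMap
        curl -sS --cacert ${CA} -H "Authorization: Bearer ${TOKEN}" ${API} > /tmp/current.json
        # 2. append /vault/secrets/appsecrets.yaml to the flink-conf.yaml entry and write the
        #    result as a merge patch to /tmp/patch.json (tool-dependent, omitted here)
        # 3. patch the ConfigMap back
        curl -sS --cacert ${CA} -H "Authorization: Bearer ${TOKEN}" \
          -H "Content-Type: application/merge-patch+json" \
          -X PATCH -d @/tmp/patch.json ${API}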
A side note: if possible, I would suggest using an AWS IAM role rather than plain IAM secrets, as a role is more secure than static credentials.

Confluent Kafka Connect on Kubernetes - cannot override config properties

I run Confluent Kafka on Kubernetes and I am unable to override the Kafka Connect connector properties.
I have added the following to the Kafka Connect CR, but to no avail:
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 3
  image:
    application: confluentinc/cp-server-connect:7.0.0
    init: confluentinc/confluent-init-container:2.2.0
  configOverrides:
    server:
      - connector.client.config.override.policy=All
What is required to set this property in the context of a Kubernetes deployment?
Thanks

How to use Kafka connect in Strimzi

I am using Kafka with the Strimzi operator. I created a Kafka cluster and also deployed Kafka Connect using a YAML file, but after this I am totally blank on what to do next. I read that Kafka Connect is used to copy data from a source into a Kafka cluster or from a Kafka cluster to another destination.
I want to use Kafka Connect to copy data from a file into any topic of the Kafka cluster.
Can anyone please help me with how to do that? I am sharing the YAML file I used to create my Kafka Connect cluster.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  # annotations:
  #   # use-connector-resources configures this KafkaConnect
  #   # to use KafkaConnector resources to avoid
  #   # needing to call the Connect REST API directly
  #   strimzi.io/use-connector-resources: "true"
spec:
  version: 2.6.0
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: my-cluster-cluster-ca-cert
        certificate: ca.crt
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
# kubectl create -f kafka-connect.yml -n strimzi
After that the Kafka Connect pod is in Running status, but I don't know what to do next. Please help me.
Kafka Connect exposes a REST API, so you need to expose that HTTP endpoint from the Connect pods
I read that Kafka connect is used to copy data from a source to Kafka cluster or from Kafka cluster to another destination.
That is one application, but it sounds like you want MirrorMaker 2 instead for that.
If you don't want to use the REST API, then uncomment this line
# strimzi.io/use-connector-resources: "true"
and use another YAML file to configure the Connect resources, as shown here for Debezium. See kind: "KafkaConnector".
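For the file-to-topic case asked about here, a KafkaConnector resource using the FileStreamSource connector that ships with Kafka could look roughly like the sketch below. It assumes the FileStream connector plugin is actually present in your Connect image and that the use-connector-resources annotation is enabled on the KafkaConnect resource; the file path and topic name are just examples.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnector
metadata:
  name: file-source-connector
  labels:
    strimzi.io/cluster: my-connect-cluster   # must match the KafkaConnect name
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 1
  config:
    file: /opt/kafka/data/source.txt         # example path inside the Connect pod
    topic: my-topic                          # example target topic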
Look at this simple example from scratch. Not really what you want to do, but pretty close. We are sending messages to a topic using the kafka-console-producer.sh and consuming them using a file sink connector.
The example also shows how to include additional connectors by creating your own custom Connect image, based on the Strimzi one. This step would be needed for more complex examples involving external systems.
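If you are on a newer Strimzi version, an alternative to hand-building the image is the declarative spec.build section of the KafkaConnect resource, which downloads the listed plugin artifacts and builds and pushes the image for you. A rough sketch - the registry, image name, push Secret and artifact URL are all placeholders:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9093
  build:
    output:
      type: docker
      image: my-registry.example.com/my-org/my-connect:latest   # placeholder registry/image
      pushSecret: my-registry-credentials                       # placeholder Secret with registry credentials
    plugins:
      - name: my-extra-connector
        artifacts:
          - type: tgz
            url: https://example.com/my-extra-connector.tgz     # placeholder artifact URL
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status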

Datetime substitution in Apache - Camel Kubernetes Yaml file

I wanted to deploy an S3 Kafka sink connector (using Apache Camel) in Kubernetes, following the reference document https://ibm-cloud-architecture.github.io/refarch-eda/scenarios/connect-s3/
The DateTime placeholder (${date:now:yyyyMMdd-HHmmssSSS}) used in this Kubernetes deployment file is not resolved, whereas the ${file:} placeholders are resolved.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: s3-sink-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.camel.kafkaconnector.CamelSinkConnector
  tasksMax: 1
  config:
    key.converter: org.apache.kafka.connect.storage.StringConverter
    value.converter: org.apache.kafka.connect.storage.StringConverter
    topics: my-replicated-topic
    camel.sink.url: aws-s3://kafka-s3?keyName=s3-connect/${date:now:yyyyMMdd-HHmmssSSS}   # <-- NOT RESOLVED
    camel.component.aws-s3.configuration.autocloseBody: false
    camel.component.aws-s3.accessKey: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_access_key_id}   # <-- RESOLVED
    camel.component.aws-s3.secretKey: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_secret_access_key}   # <-- RESOLVED
I see both ${date:now:yyyyMMdd} and exchangeId in this document (https://camel.apache.org/components/latest/languages/simple-language.html)
As described in this issue: https://github.com/apache/camel-kafka-connector/issues/251, DateTime placeholders are not supported yet.
They probably will be, according to https://github.com/apache/camel-kafka-connector/issues/252.