Stop topic auto-creation in Confluent Cloud

In Confluent Cloud we don't want anyone to be able to create a topic without any kind of control, and we would like to automate this somehow.
However, I don't see anywhere that the creation of topics can be controlled in any way in Confluent Cloud.
Is it possible to do this?

auto.create.topics.enable is disabled by default on Confluent Cloud, and it cannot be enabled on Basic or Standard clusters, only on Dedicated clusters.


How to enable confluent.value.schema.validation for cp-helm-charts

I am using Helm to deploy Kafka with cp-helm-charts.
I have enabled the ZooKeeper, Kafka, Schema Registry and Control Center components. In the Control Center UI I am able to create a topic and set a schema for the topic. However, schema validation is not enabled, and it is still possible to write arbitrary text to the topic.
I am trying to enable schema validation as described here
by adding these options to my helm values:
cp-control-center:
  configurationOverrides:
    "confluent.schema.registry.url": http://data-cp-schema-registry:8081
    "confluent.value.schema.validation": true
But it has no effect.
QUESTION:
How to enable schema validation for cp-helm-charts kafka?
The idea is to reject any content that does not match the specified schema.
Schema validation is enforced by Confluent Server (the broker) and enabled on topics, not by the Control Center container, so you'll need to move that override to the Kafka (broker) configuration instead (and verify it's using the cp-server image).
It's also worth mentioning that this is a paid feature of Confluent Enterprise.
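For illustration, a minimal sketch of what moving that override might look like in the Helm values, assuming the chart's broker section is named cp-kafka and allows the image to be overridden (treat the exact keys and image name as assumptions to verify against your chart version):

cp-kafka:
  # assumption: switch the broker to the cp-server image, which ships schema validation
  image: confluentinc/cp-server
  configurationOverrides:
    # tell the broker where to find Schema Registry
    "confluent.schema.registry.url": http://data-cp-schema-registry:8081

Validation is then switched on per topic, for example by creating (or altering) the topic with the topic-level setting confluent.value.schema.validation=true.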

What are the different ways to get Kafka Cluster Audit log to GCP Logging?

What are the different ways to get Kafka Cluster Audit log to GCP Logging?
Can anyone share more information on how I can achieve this?
Thank you!
Assuming you have access to the necessary topic (from what I understand, the audit topic is not stored on your own cluster): to get data out of Kafka, you need a consumer. This could be written in any language.
To get data into Cloud Logging, you need to use its API.
That being said, you could use any compatible pair of Kafka client and Cloud Logging client libraries that you are comfortable with.
For example, you could write or find a Kafka Connect Sink connector that wraps the Java Cloud Logging client.

How are AWS MSK, Confluent Schema Registry and Confluent Kafka Connect recommended to be used together?

We are planning to use the AWS MSK service for managed Kafka, together with the Schema Registry and Kafka Connect services from Confluent, to run our connectors (Elasticsearch Sink Connector). We plan to run Schema Registry and the connectors on EC2.
As per the Confluent team, they cannot officially support Confluent Schema Registry and Kafka Connect if we use MSK for Kafka.
So, can anyone share their experience? For example:
Has anybody used a combination of MSK and Confluent services together in a production environment?
Is there any risk in using this kind of combination?
Is it recommended or not to use this combination?
How is Confluent community support if we face any issues with the connectors?
Any other suggestions, comments, or alternatives?
We already have a Confluent Platform corporate license, but we want a managed Kafka service; that's why we have chosen AWS MSK, as it is more cost-effective than Confluent Cloud as per our analysis.
Kindly share your thoughts. Thanks in advance.
Objectively answering your question: this is doable, but it depends on where your major pain is.
From the licensing perspective, there is nothing that forces you to have a Confluent subscription just to use Kafka Connect or Schema Registry, as they are based on the Apache License 2.0 and the Confluent Community License, respectively.
From the technical perspective, you can run both Kafka Connect and Schema Registry on EC2, and as long as they are running in the same VPC as the MSK cluster, they will work flawlessly.
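As a rough sketch of how little of this is MSK-specific, the worker and registry configurations essentially boil down to pointing both at the MSK bootstrap brokers; the endpoints, port and converter choice below are placeholder assumptions, not values from the question:

# connect-distributed.properties (Kafka Connect worker on EC2)
bootstrap.servers=b-1.example-msk.abc123.kafka.us-east-1.amazonaws.com:9092
group.id=connect-cluster
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry.internal:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry.internal:8081
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
plugin.path=/usr/share/java,/opt/connectors

# schema-registry.properties (Schema Registry on EC2)
listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=b-1.example-msk.abc123.kafka.us-east-1.amazonaws.com:9092

If the MSK cluster only accepts TLS or IAM authentication, the corresponding security settings would need to be added on both sides as well.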
From the cost perspective, you will have to evaluate how much it costs to have Kafka Connect and Schema Registry managed by you and/or your team. Think not only about the install and setup phase but about the manage and evolve phase as well. The software might not have any licensing cost, but the effort to operate these components translates into cost.
How is Confluent community support if we face any issues with the connectors?
The Kafka community is usually very helpful, whether you ask for help in the Apache Kafka users mailing list or in the Slack community that Confluent runs. Of course, it is all best effort, and you can't rely on it for guaranteed support. It may take several days until some good Samaritan decides to help you. This also translates into cost: how much does being down and/or waiting for a resolution cost you?
I am no longer a Confluent employee, and therefore I won't even try to convince you to buy from them. But you should evaluate this component of cost and check whether using Confluent Cloud wouldn't provide you a more cost-effective solution, since it includes a managed version of Kafka, Kafka Connect, and Schema Registry. In my experience, managed Kafka on Confluent Cloud is not that costly and the managed Schema Registry is "free", but using a managed connector can be very costly, and it gets worse depending on the number of tasks that you configure in the managed connector. This is the only gotcha that you ought to watch out for.
AWS MSK now supports a fully managed, free schema registry (the AWS Glue Schema Registry) that easily integrates with Kafka and other AWS services such as Kinesis. It's much easier to get started with.

Kafka and IIDR CDC

I am trying to build a CDC pipeline using: DB2 -> IBM CDC -> Kafka,
and I am trying to figure out the right way to set this up.
I have tried the following:
1. Set up a 3-node Kafka cluster on Linux, on-prem.
2. Installed the IIDR CDC software on Linux, on-prem, using the setup-iidr-11.4.0.1-5085-linux-x86.bin file. The CDC instance is up and running.
The various online docs suggest installing the 'IIDR Management Console' to configure the source datastore and the CDC server configuration, as well as the Kafka subscription configuration, to build the pipeline.
Currently I do not have the Management Console installed.
A few questions on this:
1. Is there any alternative to the IBM CDC Management Console for setting up the Kafka-CDC pipeline?
2. How can I get the IIDR Management Console? And if we install it on our local Windows desktop and try to connect to CDC/Kafka running on remote Linux servers, will it work?
3. Is there any other method to set up data ingestion from IIDR CDC to Kafka?
I am fairly new to CDC/IIDR, please help!
I own the development of the IIDR Kafka target for our CDC Replication product.
The Management Console is the best way to set up the subscription initially. You can install it on a Windows box.
Technically, I believe you can use our scripting language, called CHCCLP, to set up a subscription as well, but I recommend using the GUI.
Here are links to our resources on the IIDR (CDC) Kafka target; search for the "Kafka" section:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W8d78486eafb9_4a06_a482_7e7962f5ac59/page/IIDR%20Wiki
An example of setting up a subscription and replicating is shown in this video:
https://ibm.box.com/s/ur8jokg6tclsx5fcav5g86a3n57mqtd5
The Management Console and Access Server can be obtained from IBM Fix Central.
I have installed the Management Console/Access Server on my VM and on my personal Windows box to use them against my Linux VMs. You will need network connectivity, of course.
You can definitely follow up with our Support team and they'll be able to sort you out. Plus, we have docs on the Management Console in our Knowledge Center, starting here: https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.mcadminguide.doc/concepts/overview_of_cdc.html
You'll find our Kafka target is very flexible: it comes with five different formats for writing data into Kafka, and you can choose to capture data in an audit format or use the Kafka compaction-compatible format (key plus a null value for deletes).
Additionally, you can use the product to write several records, to several different topics, in several formats, from a single insert operation. This is useful if some of your consumer apps want JSON and others want Avro binary. You can also use this to put all the data on more secure topics and write out only some of the data to topics that more people have access to.
We even have customers who encrypt columns in flight when replicating.
Finally, the product's transformations can be parallelized even if you choose to use only one producer to write out data.
Actually, one more "finally": we additionally provide the option to use a special consumer which reproduces database ACID semantics for data that has been written into Kafka and sharded across topics and partitions. It re-orders the data; we call it the transactionally consistent consumer. It provides operation order, bookmarks for restarting applications, and parallel yet ordered, exactly-once, deduplicated consumption of data.
From my talk at the Kafka Summit...
https://www.confluent.io/kafka-summit-sf18/a-solution-for-leveraging-kafka-to-provide-end-to-end-acid-transactions

How to set replication factor in librdkafka?

I'm using librdkafka to develop a Kafka message producer in C++.
Is there a way to create a topic with a custom replication factor, different from the default one?
CONFIGURATION.md does not explicitly mention any such parameter, but the Kafka tools allow for this.
While auto topic creation is currently supported by librdkafka, it merely uses the broker's default topic configuration.
What you need is manual topic creation from the client. Broker support for this was added with KIP-4, and it is also exposed through librdkafka's Admin API.
See the rd_kafka_CreateTopics() API.
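A minimal sketch (plain C, which also compiles as C++) of creating a topic with an explicit replication factor through the Admin API; the broker address, topic name, partition count and replication factor are made-up example values:

#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
    char errstr[512];

    /* Create a client handle (a producer handle works fine for admin calls). */
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092", errstr, sizeof(errstr));
    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
    if (!rk) { fprintf(stderr, "rd_kafka_new failed: %s\n", errstr); return 1; }

    /* Describe the topic to create: 3 partitions, replication factor 2 (example values). */
    rd_kafka_NewTopic_t *newt =
        rd_kafka_NewTopic_new("my-topic", 3, 2, errstr, sizeof(errstr));

    /* Issue the CreateTopics request and wait for the result on a temporary queue. */
    rd_kafka_queue_t *queue = rd_kafka_queue_new(rk);
    rd_kafka_AdminOptions_t *options =
        rd_kafka_AdminOptions_new(rk, RD_KAFKA_ADMIN_OP_CREATETOPICS);
    rd_kafka_CreateTopics(rk, &newt, 1, options, queue);

    rd_kafka_event_t *event = rd_kafka_queue_poll(queue, 10000 /* ms */);
    if (!event)
        fprintf(stderr, "CreateTopics timed out\n");
    else if (rd_kafka_event_error(event))
        fprintf(stderr, "CreateTopics failed: %s\n", rd_kafka_event_error_string(event));
    else
        printf("Topic created\n");

    if (event) rd_kafka_event_destroy(event);
    rd_kafka_AdminOptions_destroy(options);
    rd_kafka_NewTopic_destroy(newt);
    rd_kafka_queue_destroy(queue);
    rd_kafka_destroy(rk);
    return 0;
}

A production version would also inspect the per-topic results via rd_kafka_event_CreateTopics_result() rather than only the event-level error.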