Kafka Confluent 4.1.1 restart issue -- previously created topics not displayed - apache-kafka

I am using Kafka Confluent 4.1.1. I created a few topics and everything worked well, but if I restart Confluent I no longer see the previously created topics.
I tried the suggestions mentioned in the post:
Kafka topic no longer exists after restart
But no luck. Did anybody face the same issue? Do I need to change any configuration?
Thanks in advance.

What configuration changes do I need to make in order to persist the topics?
confluent start will use the CONFLUENT_CURRENT environment variable for all its data storage. If you export this to a static location, data should, in theory, persist across reboots.
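For example, a minimal sketch (the path is illustrative; any writable, non-temporary directory works):
export CONFLUENT_CURRENT=/var/lib/confluent   # keep CLI data out of the default temp directory
confluent start                               # ZooKeeper, Kafka, Schema Registry, etc. now write under this directory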
Otherwise, the standard way is to run each component individually, as you would in a production environment (e.g. zookeeper-server-start, kafka-server-start, schema-registry-start, etc.), which will persist data according to whatever settings you've given in their respective configuration files.
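A rough sketch of that approach (file locations are the Confluent Platform defaults and may differ on your install):
bin/zookeeper-server-start etc/kafka/zookeeper.properties                   # dataDir should point somewhere persistent
bin/kafka-server-start etc/kafka/server.properties                          # likewise for log.dirs
bin/schema-registry-start etc/schema-registry/schema-registry.properties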

Related

Can I tell spring to use compatibility=NONE when auto-registering schemas for a kafka topic

We are using kafka topics for our micro-services to communicate with each other. We are also using schemas to define the content of our topics.
On our various stages we explicitly deploy the schemas for each topic as part of the deployment process. However, on our local developer laptops (where we have a Docker container running a local Kafka and schema-registry instance) we do not want to do this.
We are using Spring-Boot and spring-kafka.
Accordingly, we have the following two config files:
application.yml
spring.kafka.producer.properties.auto.register.schemas=false
application-local.yml
spring.kafka.producer.properties.auto.register.schemas=true
This works well, our schemas are automatically registered with the local schema-registry when we write to a kafka-topic for the first time.
However, after we've made some schema changes, our publish now fails telling us that the new schema is not compatible with the previously installed schema. Checking the local schema registry, we see that the auto-registered schema was registered with compatibility=BACKWARD whereas on our staged registries we work with compatibility=NONE (we're well aware of the issues this may bring with regard to breaking changes -> this is handled in the way we work with our data).
Is there any way to make the auto-registration use NONE instead of BACKWARD?
Any new subject will inherit the global compatibility level of the Registry; you cannot set it as part of registering a schema. You would have to make a secondary, out-of-band HTTP request to the compatibility endpoint (in other words, before actually producing any data, which may register the schema on its own).
During local development, I would suggest deleting the schema from your registry until you are content with the schema rather than worrying about local compatibility changes.
You could also set the default compatibility level of the container to NONE.
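For reference, a sketch of that out-of-band request against the Schema Registry REST API (host/port and subject name are illustrative):
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"compatibility": "NONE"}' http://localhost:8081/config
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"compatibility": "NONE"}' http://localhost:8081/config/my-topic-value   # single subject only
For the Docker container, the same default can usually be supplied as an environment variable at startup (check the image's documentation for the exact variable name).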

Create kafka topic using predefined config files

Is there any way to define Kafka topics in the Kafka/ZooKeeper configuration files before I run the services, so that once they start, the topics will already be in place?
I have looked inside the bin/kafka-topics.sh script and found that, in the end, it executes a command against a live server. But since the server is here, its config files are here, and ZooKeeper with its configs is also here, is there a way to predefine topics in advance?
Unfortunately, I haven't found any existing config keys for this.
The servers need to be running in order to allocate metadata and log directories for the topics, so no.
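A common workaround is a small bootstrap step that runs once the broker is up and creates the topics idempotently; roughly (topic name, partition count, and port are illustrative):
bin/kafka-topics.sh --zookeeper localhost:2181 --create --if-not-exists --topic my-topic --partitions 3 --replication-factor 1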

two kafka versions running on same cluster

I am trying to configure two Kafka servers on a cluster of 3 nodes, while there is already one Kafka broker (version 0.8) running with the application. There is a dependency on that Kafka 0.8 broker, so it cannot be disturbed or upgraded.
Now, for a POC, I need to configure 1.0.0, since my new code is compatible with this version and above...
My task is to push data from Oracle to Hive tables. For this I am using JDBC connect to fetch data from Oracle and Hive JDBC to push data to the Hive tables. It should be a fast and easy way...
I need help with the following:
Can I use spark-submit to run this data push to Hive?
Can I simply copy kafka_2.12-1.0.0 onto one of the nodes of my Linux cluster and run my code on it? I think I need to configure my zookeeper.properties and server.properties with ports that are not in use and start the new ZooKeeper and Kafka services separately. Please note that I cannot disturb the existing ZooKeeper and Kafka that are already running.
Kindly help me achieve this.
I'm not sure running two very memory-intensive applications (Kafka and/or Kafka Connect) on the same machines is considered very safe, especially if you do not want to disturb existing applications. Realistically, a rolling restart with an upgrade will be best for performance and feature reasons. And no, two Kafka versions should not be part of the same cluster unless you are in the middle of a rolling-upgrade scenario.
If at all possible, please use new hardware... I assume Kafka 0.8 is running on machines that could be old and out of warranty? In that case, there's no significant reason I know of not to use an even newer version of Kafka. But yes, extract it on any machine you'd like; perhaps use something like Ansible, or whatever config management tool you prefer, to do it for you.
You can actually share the same ZooKeeper cluster; just make sure each Kafka cluster uses different settings (e.g. its own chroot path). For example,
Cluster 0.8
zookeeper.connect=zoo.example.com:2181/kafka08
Cluster 1.x
zookeeper.connect=zoo.example.com:2181/kafka10
Also, it's not clear where Spark fits into this architecture. Please don't use a JDBC sink for Hive; use the proper HDFS Kafka Connect sink, which has direct Hive support via the metastore. And while the JDBC source might work for Oracle, chances are you might already be able to afford a license for GoldenGate.
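To illustrate, a rough sketch of an HDFS sink connector config with Hive integration enabled (hostnames, topic, and database names are illustrative; adjust to your environment):
name=oracle-to-hive-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=oracle-table-topic
hdfs.url=hdfs://namenode:8020
flush.size=1000
hive.integration=true
hive.metastore.uris=thrift://hive-metastore:9083
hive.database=default
schema.compatibility=BACKWARD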
I was able to get both Kafka versions, 0.8 and 1.0, running on the same server with their respective ZooKeepers.
Steps followed:
1. Copy the version's package folder to the desired location on the server.
2. Change the configuration settings in zookeeper.properties and server.properties (here you need to set ports that are not already in use on that particular server; see the example settings below).
3. Start the services and push data to the Kafka topics.
Note: this requirement is only for a POC and not an ideal production setup. As answered above, we should upgrade to the next version rather than doing what is described above.
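For reference, a sketch of the kind of port/path overrides step 2 refers to (all values are illustrative; pick ports and directories that are free on your node):
In zookeeper.properties for the new cluster:
clientPort=2182
dataDir=/var/lib/zookeeper-kafka10
In server.properties for the new broker:
broker.id=0
listeners=PLAINTEXT://:9093
log.dirs=/var/lib/kafka-logs-kafka10
zookeeper.connect=localhost:2182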

Schema Registry persistence after reboot

I just finished this tutorial on using Kafka and Schema Registry: http://cloudurable.com/blog/kafka-avro-schema-registry/index.html
I also played with the Confluent Platform: https://docs.confluent.io/current/installation/installing_cp.html
Everything worked fine until I rebooted my virtual machine (VMBOX):
All schemas/subjects were gone (or disappeared) after the reboot.
I read that Schema Registry does not store the data itself but uses Kafka to do that. Of course, since for the moment I only work on my laptop, Kafka was also shut down during the machine reboot.
Is this normal behavior? Do we have to expect to re-register all schemas every time we reboot (maybe only the latest versions, then)?
Does anybody have good best practices for this?
How can the persistence of schemas be managed to avoid this problem?
Environment: Ubuntu 16..., Kafka 1.0.0 (Scala 2.11), Confluent Platform 4.0
Thanks a lot
Note: I already read this topic, which discusses keeping schema IDs, but as I don't recover any schemas at all, it's not a problem of IDs: Confluent Schema Registry Persistence
Schema Registry persists its data in Kafka.
Therefore your question becomes: why did you lose your data from Kafka on reboot?
My guess would be that you've inadvertently used /tmp as the data folder. Are you using the Confluent CLI in your experiments?
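If that is the case, pointing the data directories at something persistent should be enough; a sketch (paths are illustrative):
In server.properties: log.dirs=/var/lib/kafka-logs instead of the /tmp/kafka-logs default
In zookeeper.properties: dataDir=/var/lib/zookeeper instead of the /tmp/zookeeper default
With the Confluent CLI, export CONFLUENT_CURRENT to a static directory, as described in the first answer above.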

Kafka sink connector: No tasks assigned, even after restart

I am using Confluent 3.2 in a set of Docker containers, one of which is running a kafka-connect worker.
For reasons yet unclear to me, two of my four connectors - to be specific, hpgraphsl's MongoDB sink connector - stopped working. I was able to identify the main problem: The connectors did not have any tasks assigned, as could be seen by calling GET /connectors/{my_connector}/status. The other two connectors (of the same type) were not affected and were happily producing output.
I tried three different methods to get my connectors running again via the REST API (roughly the calls sketched after this list):
Pausing and resuming the connectors
Restarting the connectors
Deleting and then creating the connector under the same name, using the same config
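For reference, those correspond roughly to the following Connect REST calls (worker URL and connector name are illustrative):
curl -X PUT http://localhost:8083/connectors/my_connector/pause
curl -X PUT http://localhost:8083/connectors/my_connector/resume
curl -X POST http://localhost:8083/connectors/my_connector/restart
curl -X DELETE http://localhost:8083/connectors/my_connector
curl -X POST -H "Content-Type: application/json" --data @my_connector.json http://localhost:8083/connectors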
None of the methods worked. I finally got my connectors working again by:
Deleting and creating the connector under a different name, say my_connector_v2 instead of my_connector
What is going on here? Why am I not able to restart my existing connector and get it to start an actual task? Is there any stale data on the kafka-connect worker or in some kafka-connect-related topic on the Kafka brokers that needs to be cleaned?
I have filed an issue on the specific connector's GitHub repo, but I feel like this might actually be a general bug related to the internals of kafka-connect. Any ideas?
I have faced this issue. It can happen when there are not enough resources for a SinkTask or SourceTask to start.
The memory allocated to the worker may sometimes be too low. By default, workers are allocated 250MB. Please increase this. Below is an example of allocating 2GB of memory for a worker running in distributed mode.
KAFKA_HEAP_OPTS="-Xmx2G" sh $KAFKA_SERVICE_HOME/connect-distributed $KAFKA_CONFIG_HOME/connect-avro-distributed.properties