I have a few integration tests for my application that connect to a local Kafka instance. I am using the Java KafkaServer API to create the local instance on demand when the test runs, in a way similar to the accepted answer to this question:
How can I instanciate a Mock Kafka Topic for junit tests?
Each of my tests passes when run in isolation. The problem is that my tests use the same Kafka topics, and I would like each test to start with topics containing no messages. However, when I run the tests in series, every test after the first fails with this error when it tries to recreate the topics it needs:
kafka.common.TopicExistsException: Topic "test_topic" already exists.
at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:187)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:172)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:93)
Each test creates and shuts down its own EmbeddedZookeeper and KafkaServer. I have also tried deleting the 'brokers/topics' path from ZK as well as the KafkaServer's logDirs at the end of each test. Somehow the topics from the first test are still surviving to the second.
What can I do at the end of each test to make sure that the topics it uses do not interfere with tests that run after it?
I was eventually able to get it to work.
Instead of cleaning up after each test, I changed the tests to clean up before they ran.
There were two cleanup steps that I needed to do.
The first was to delete the broker's data directory before starting KafkaServer.
String dataDirectory = 'tmp/kafka'
FileUtils.deleteDirectory(FileUtils.getFile(dataDirectory))
Properties props = TestUtils.createBrokerConfig(BROKER_ID, port, true)
props.put('log.dir', dataDirectory)
props.put('delete.topic.enable', 'true')
KafkaConfig config = new KafkaConfig(props)
Time mock = new MockTime()
kafkaServer = TestUtils.createServer(config, mock)
The second was to delete the topic path recursively in Zookeeper before sending the createTopic command.
zkClient.deleteRecursive(ZkUtils.getTopicPath(topicName))
List<String> arguments = ['--topic', topicName, '--partitions', '1', '--replication-factor', '1']
TopicCommand.createTopic(zkClient, new TopicCommand.TopicCommandOptions(arguments as String[]))
I tried a number of similar approaches but couldn't get it working with anything except exactly this.
Note that the code is Groovy and not Java.
We have a requirement to showcase the resiliency of a Kafka cluster. To prove this, we need to run a producer and a consumer (I am thinking kafka-console-producer and kafka-console-consumer), preferably via CLI commands and/or scripts, continuously for 24 hours. We are not concerned with message size or contents; preferably the messages are as small as possible and can be any value, say the current timestamp.
How can I achieve this?
There's nothing preventing you from doing this, and the problem isn't unique to Kafka.
You can use nohup to run a script as a daemon; otherwise, the commands will terminate when the console session ends. You could also use cron to schedule any script, at a minimum of every minute.
Or you can write your own app with a simple while(true) loop.
Regardless, you will want a process supervisor to truly ensure the command remains running at all times.
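For the while(true) approach, here is a minimal Java sketch that sends the current timestamp once per second using the standard producer client; the topic name resiliency-test and the localhost:9092 bootstrap address are assumptions to adjust for your cluster.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Instant;
import java.util.Properties;

public class TimestampProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Contents don't matter for the resiliency test, so just send the current timestamp
                producer.send(new ProducerRecord<>("resiliency-test", Instant.now().toString()));
                Thread.sleep(1000); // roughly one small message per second
            }
        }
    }
}
Run it under nohup or a process supervisor, and pair it with kafka-console-consumer reading the same topic for the consuming side.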
Is there any way to define Kafka topics in the Kafka/ZooKeeper configuration files before I run the services, so that once they start the topics will be in place?
I have looked inside the bin/kafka-topics.sh script and found that, in the end, it executes a command against the live server. But since the server is here, its config files are here, and ZooKeeper with its configs is also here, is there a way to predefine topics in advance?
Unfortunately I haven't found any existing config keys for this.
The servers need to be running in order to allocate metadata and log directories for the topics, so no, there is no way to predefine topics in the configuration files.
I am trying to launch the Storm UI for a streaming application, but I constantly get this error:
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [localhost]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:250)
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:179)
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:138)
at org.apache.storm.daemon.ui.resources.StormApiResource.getClusterConfiguration(StormApiResource.java:116)
I have launched Storm locally, using the storm script to start Nimbus, submit the jar, and poll the UI. What could be the reason for this?
Here is the code with connection setup:
val cluster = new LocalCluster()
val bootstrapServers = "localhost:9092"
val spoutConfig = KafkaTridentSpoutConfig.builder(bootstrapServers, "tweets")
.setProp(props)
.setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST)
.build()
val config = new Config()
cluster.submitTopology("kafkaTest", config, tridentTopology.build())
When you submit to a real cluster using storm jar, you should not use LocalCluster. Instead, use the StormSubmitter class.
The error you're getting says that Storm can't find Nimbus at localhost. Are you sure Nimbus is running on the machine you're running storm jar from? If so, please post the commands you're running, and maybe also check the Nimbus log.
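As a rough sketch, in Java rather than the Scala from the question, the submission side would look something like this; the class name is just for illustration, and the StormTopology is the one produced by tridentTopology.build() in your code.
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;

public class SubmitKafkaTest {
    // topology is the StormTopology returned by tridentTopology.build() in the question's code
    static void submit(StormTopology topology) throws Exception {
        Config config = new Config();
        // StormSubmitter resolves Nimbus from nimbus.seeds in the storm.yaml on the machine
        // where you run `storm jar`, so that list must name the host actually running Nimbus.
        StormSubmitter.submitTopology("kafkaTest", config, topology);
    }
}
Package this into a jar and submit it with storm jar; LocalCluster is only meant for in-process testing.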
I am running a development environment for Confluent Kafka, Community edition, on Windows, version 3.0.1-2.11.
I am trying to achieve load balancing of tasks between 2 instances of a connector. I am running Kafka ZooKeeper, the Kafka server, REST services, and 2 instances of Connect distributed on the same machine.
The only difference between the properties files for the Connect instances is the REST port, since they are running on the same machine.
I don't create topics for connector offsets, config, status. Should I?
I have custom code for sink connector.
When I create a worker for my sink connector, I do it by executing a POST request
POST http://localhost:8083/connectors
against either of the running Connect instances. I check whether the worker is loaded at the URL
GET http://localhost:8083/connectors
My sink connector has System.out.println() lines in its code, so I can follow its output in the console log.
When my worker is running, I can see that only one connector instance is executing the code. If I terminate that instance, the other one takes over the worker and execution resumes. However, this is not what I want.
My goal is for both connector instances to run the worker code so that they can share the load between them.
I've tried going over some open-source connectors to see whether there are specifics to how connector code should be written, but with no success.
I've made several different attempts to tackle this problem, but with no success.
I could rewrite my business code to work around this, but I'm pretty sure I'm missing something that isn't obvious to me.
Recently I commented on Robin Moffatt's answer to this question.
From the sounds of it, your custom code is not correctly spawning the number of tasks that you are expecting.
Make sure that you've set tasks.max > 1 in your connector config.
Make sure that your connector is correctly returning the appropriate number of task configurations from taskConfigs (see the sketch below).
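As a rough illustration (not your actual connector, and with a hypothetical task class name), a connector's taskConfigs is expected to return one config map per task, up to maxTasks; returning a single map is a common reason only one task ever runs.
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MySinkConnector extends SinkConnector {
    private Map<String, String> connectorProps;

    @Override
    public void start(Map<String, String> props) {
        connectorProps = new HashMap<>(props);
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySinkTask.class; // placeholder for your existing SinkTask implementation
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // maxTasks is capped by tasks.max; return up to that many config maps so the
        // framework has several tasks it can spread across the two workers.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(connectorProps));
        }
        return configs;
    }

    @Override
    public void stop() { }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "1.0";
    }
}
Also keep in mind that a sink task only receives records if there are topic partitions for it to consume, so with a single-partition topic the extra tasks will sit idle even after they are created.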
References:
https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
https://docs.confluent.io/current/connect/devguide.html
https://enfuse.io/a-diy-guide-to-kafka-connectors/
I am using Confluent 3.2 in a set of Docker containers, one of which is running a kafka-connect worker.
For reasons yet unclear to me, two of my four connectors (to be specific, instances of hpgraphsl's MongoDB sink connector) stopped working. I was able to identify the main problem: the connectors did not have any tasks assigned, as could be seen by calling GET /connectors/{my_connector}/status. The other two connectors (of the same type) were not affected and were happily producing output.
I tried three different methods to get my connectors running again via the REST API:
Pausing and resuming the connectors
Restarting the connectors
Deleting and then creating the connector under the same name, using the same config
None of the methods worked. I finally got my connectors working again by:
Deleting and creating the connector under a different name, say my_connector_v2 instead of my_connector
What is going on here? Why am I not able to restart my existing connector and get it to start an actual task? Is there any stale data on the kafka-connect worker or in some kafka-connect-related topic on the Kafka brokers that needs to be cleaned?
I have filed an issue on the specific connector's GitHub repo, but I feel like this might actually be a general bug related to the intrinsics of kafka-connect. Any ideas?
I have faced this issue. It can happen if there are not enough resources for a SinkTask or SourceTask to start.
The memory allocated to the worker may sometimes be too low; by default, workers are allocated 250MB. Please increase this. Below is an example that allocates 2GB of memory to a worker running in distributed mode.
KAFKA_HEAP_OPTS="-Xmx2G" sh $KAFKA_SERVICE_HOME/connect-distributed $KAFKA_CONFIG_HOME/connect-avro-distributed.properties