python KafkaConsumer and Producer at the same time - apache-kafka

In a python program I would like to write some messages to Kafka, then read the response with the same number of messages from a remote app on a different topic. The problem is that by the time I am done with sending messages, the other end already start responding and when I start reading, I only get the tailing part of the message batch, or no messages at all, depending on timing. This contradict to my understanding of the package, i.e. I thought if I create a consumer with auto_offset_reset='latest', and subscribe for a topic, then it remembers the offset at subscription time and when I get to iterate over the consumer object, it will start reading messages from that offset.
Here is what I do:
I create a consumer first and subscribe for the out topic:
consumer = KafkaConsumer(
bootstrap_servers=host+':'+broker_port,
group_id = "0",
auto_offset_reset='latest',
consumer_timeout_ms=10000
)
consumer.subscribe(topics=(topic_out))
then create a producer and send messages to topic_in:
producer = KafkaProducer(
bootstrap_servers=host+':'+broker_port
)
future = producer.send(topic,json.dumps(record).encode('utf-8'))
future.get(timeout=5)
Then I start reading from the consumer:
results = []
for msg in consumer:
message = json.loads(msg.value)
results.append(message)
I tried consumer.pause() before sending, consumer.resume() after sending - does not help.
Is there something I am missing in the configuration, or I misunderstand how Consumer works?
Thanks in advance!

Sounds like you have a race condition.
One solution would be to store a local dictionary or sqlite table that's built from consuming this "lookup topic", then when you consume from the "action topic", you are doing lookups locally rather than starting a consumer to scan over the "lookup topic" for the data you need.

Related

Kafka Consumer writing to the same queue from where it is reading

I have a use case where I want to consume from a kafka topic and depending on some logic if I am not able to process the message right now, I want to enqueue the message back to the same topic from where it had been read
Something like this
Topic1 ---> Consumer ---> Can't process now
^
|Re-enqueues________|
Is it possible ?
Yes, this is possible.
However, be aware that depending on your retention settings the re-ingested message might exist in the topic multiple times. Also, the consumer will consume all messages as long as it is running which could lead to the case that it has consumed all valid messages but keeps on re-ingesting the other messages over and over again.
The typical pattern to deal with messages that should be re-ingested into your pipeline is to send them to a dedicated Kafka topic. Once your consumer is fixed to be able to process those messages you can then have your consumer read that dedicated topic just once.

Removing one message from a topic in Kafka

I'm new at using Kafka and I have one question. Can I delete only ONE message from a topic if I know the topic, the offset and the partition? And if not is there any alternative?
It is not possible to remove a single message from a Kafka topic, even though you know its partition and offset.
Keep in mind, that Kafka is not a key/value store but a topic is rather an append-only(!) log that represents a stream of data.
If you are looking for alternatives to remove a single message you may
Have your consumer clients ignore that message
Enable log compaction and send a tompstone message
Write a simple job (KafkaStreams) to consume the data, filter out that one message and produce all messages to a new topic.

How to send Kafka message from one topic to another topic?

suppose my producer is writing the message to Topic A...once the message is in Topic A, i want to copy the same message to Topic B. Is this possible in kafka?
If I understand correctly, you just want stream.to("topic-b"), although, that seems strange without doing something to the data.
Note:
The specified topic should be manually created before it is used
I am not clear about what use case you are exactly trying to achieve by simply copying data from one topic to another topic. If both the topics are in the same Kafka cluster then it is never a good idea to have two topics with the same message/content.
I believe the gap here is that probably you are not clear about the concept of the Consumer group in Kafka. Probably you have two action items to do by consuming the message from the Kafka topic. And you are believing that if the first application consumes the message from the Kafka topic, will it be available for the second application to consume the same message or not. Kafka allows you to solve this kind of common use case with the help of the consumer group.
Let's try to differentiate between other message queue and Kafka and you will understand that you do not need to copy the same data/message between two topics.
In other message queues, like SQS(Simple Queue Service) where if the message is consumed by a consumer, the same message is not available to get consumed by other consumers. It is the responsibility of the consumer to delete the message safely once it has processed the message. By doing this we guarantee that the same message should not get processed by two consumers leading to inconsistency.
But, In Kafka, it is totally fine to have multiple sets of consumers consuming from the same topic. The set of consumers form a group commonly termed as the consumer group. Here one of the consumers from the consumer group can process the message based on the partition of the Kafka topic the message is getting consumed from.
Now the catch here is that we can have multiple consumer groups consuming from the same Kafka topic. Each consumer group will process the message in the way they want to do. There is no interference between consumers of two different consumer groups.
To fulfill your use case I believe you might need two consumer groups that can simply process the message in the way they want. You do not essentially have to copy the data between two topics.
Hope this helps.
There are two immediate options to forward the contents of one topic to another:
by using the stream feature of Kafka to create a forwarding link
between the two topics.
by creating a consumer / producer pair
and using those to receive and then forward on messages
I have a short piece of code that shows both (in Scala):
def topologyPlan(): StreamsBuilder = {
val builder = new StreamsBuilder
val inputTopic: KStream[String, String] = builder.stream[String, String]("topic2")
inputTopic.to("topic3")
builder
}
def run() = {
val kafkaStreams = createStreams(topologyPlan())
kafkaStreams.start()
val kafkaConsumer = createConsumer()
val kafkaProducer = createProducer()
kafkaConsumer.subscribe(List("topic1").asJava)
while (true) {
val record = kafkaConsumer.poll(Duration.ofSeconds(5)).asScala
for (data <- record.iterator) {
kafkaProducer.send(new ProducerRecord[String, String]("topic2", data.value()))
}
}
}
Looking at the run method, the first two lines set up a streams object to that uses the topologyPlan() to listen for messages in 'topic2' and forward then to 'topic3'.
The remaining lines show how a consumer can listen to a 'topic1' and use a producer to send them onward to 'topic2'.
The final point of the example here is Kafka is flexible enough to let you mix options depending on what you need, so the code above will take messages in 'topic1', and send them to 'topic3' via 'topic2'.
If you want to see the code that sets up consumer, producer and streams, see the full class here.

Questions about Kafka Consumer of Transient messages via Akka-Streams on Multiple Nodes

We are using Kafka to store messages that are produced by a node in our cluster and to be distributed to all nodes in the cluster and I have it mostly working with akka-streams but there is a couple of questions I have to tie this up. There are some constraints to this.
First of all the message has to be consumed by every node in the cluster but produced by only one node. I understand I can assign each node a group id that is probably its node ID which means each node will get the message. That sorted. But here are the questions.
The data is extremely transient and fairly large (just under a meg) and cannot be compressed further or broken up. If there is a new message on the topic the old one is pretty much trash. How can I limit the topic to basically just one message currently maximum?
Given that the data is necessary for the node to start, I need to consume the latest message on the topic no matter whether I have consumed it before and, hopefully without creating a unique group id every time I start the server. Is this possible and if so, how can it be done.
Finally, the data is usually on the topic but on occasion it is not there and I, ideally, need to be able to check if there is a message there and if not ask the producer to create the message. Is this possible?
This is the code I am currently using to start the consumer:
private Control startMatrixConsumer() {
final ConsumerSettings<Long, byte[]> matrixConsumerSettings = ConsumerSettings
.create(services.actorSystem(), new LongDeserializer(), new ByteArrayDeserializer())
.withBootstrapServers(services.config().getString("kafka.bootstrapServers"))
.withGroupId("group1") // todo put in the conf ??
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
final String topicName = Matrix.class.getSimpleName() + '-' + eventId;
final AutoSubscription subscription = Subscriptions.topics(topicName);
return Consumer.plainSource(MatrixConsumerSettings, subscription)
.named(Matrix.class.getSimpleName() + "-Kafka-Consumer-" + eventId)
.map(data -> {
final Matrix matrix = services.kryoDeserialize(data.value(), Matrix.class);
log.debug(format("Received %s for event %d from Kafka", Matrix.class.getSimpleName(), matrix.getEventId()));
return matrix;
})
.filter(Objects::nonNull)
.to(Sink.actorRef(getSelf(), NotUsed.getInstance()))
.run(ActorMaterializer.create(getContext()));
}
Thanks a bunch.
All the message has to be consumed by every node in the cluster but
produced by only one.
You are correct, you can achieve this by having an unique group id per node.
How can I limit the topic to basically just one message currently
maximum?
Kafka provides compacted topics.
Compacted topic maintains only the most recent message of a given key. For instance, Kafka consumers store their offsets in compacted topic.
In your case, produce every message with the same key, and Kafka Log Cleaner will delete old messages. Please be aware that compaction is performed periodically, so you can end up with two (or more) messages with the same key for a short period of time (depends on your Log Cleaner configuration.
I need to consume the latest message on the topic no matter whether I
have consumed it before.
You can achieve this by not committing the consumer offset (enable.auto.commit set to false) and setting auto.offset.reset to earliest. By having one message in your compacted topic and consumer that starts from the beginning of the topic, that message is always consumed after node starts.
I need to be able to check if there is a message there and if not ask
the producer to create the message.
Unfortunately, I am not aware of any Kafka functionality that could help you with that. Most of the time Kafka is used to decouple producers and consumers.

Can i consume based on specific condition in Kafka?

I'm writing a msg in to Kafka and consuming it in the other end.
Doing some process in it and writing it back to another Kafka topic.
I want to know which message response is for which request..
currently decided to capture the offset id from consumer side then write in the response and read the response payload and decide the same.
For this approach we need to read each message is there any other way we can consume based on consumer config condition?
Consumers can only read the whole topic. You can only skip messages via seek() but there is no conditions that you can evaluate on the broker to filter messages.
You will need to consume the whole topic an process/filter in the client.