How can I get messages without missing in Kafka? - apache-kafka

i'm a newbie in Kafka. I've been testing Kafka for sending messages.
This is my situation, now.
add.java in my local VM is sending messages to kafka in my local VM regularly.
relay.java in another server is polling from kafka in my local VM and producing to kafka in another server.
While I was sending messages from kafka in my local VM to kafka in another server,
I pulled LAN cable out from my lap top. Few seconds later, I connected LAN cable to it again.
And then I found that some messages were lost while LAN cable was disconnected.
However, When the network is reconnected, I want to get all messages which are in disconnection without
missing.
Are there any suggestions?
Any help would be highly appreciated.

First of all, I suggest you use MirrorMaker (1 or 2) because it supports exactly this use case of consuming and producing to another cluster.
Secondly, add.java should not be dropping messages if your LAN is disconnected.
Whether you end up with dropped messages on the way from relay.java depends on your consumer and producer settings within there. For example, you should definitely disable auto offset commits and only commit after you have gotten a completion event and acknowledgement from its producer action. This will result in at least once delivery.
You can find multiple posts about processing guarantees in Kafka

Related

Kafka has multiple bootstrap server but we will connect only one server. Can we still publish data to it?

Kafka has multiple bootstrap server like b1.com, b2.com, b3.com. While Producer Configuration, we are passing only b1.com as bootstrap server. What will happen once we will publish data to kafka?
As of my knowledge, it should not allow to publish the data if b1.com is not leader as kafka allow publishing data through leader only. Please guide me.
Even if b1.com is not the leader, you would still be able to publish data successfully. The reason being once you connect to a server, you can get the complete metadata of your topic (partitions, their respective leaders etc).
That being said, it is still recommended to provide all servers. Reason for this is the scenario where b1.com goes down. Now since you provided only one server to your producer, it will not be able to connect to kafka and your system effectively goes down.
On the other hand, if you had provided all the servers and assuming your topic was replicated - the system would still be functional even if b1.com had gone down.

Client Local Queue in Red Hat AMQ

We have a network of Red Hat AMQ 7.2 brokers with Master/Slave configuration. The client application publish / subscribe to topics on the broker cluster.
How do we handle the situation wherein the network connectivity between the client application and the broker cluster goes down? Does Red Hat AMQ have a native solution like client local queue and a jms to jms bridge between local queue and remote broker so that network connectivity failure will not result in loss of messages.
It would be possible for you to craft a solution where your clients use a local broker and that local broker bridges messages to the remote broker. The local broker will, of course, never lose network connectivity with the local clients since everything is local. However, if the local broker loses connectivity with the remote broker it will act as a buffer and store messages until connectivity with the remote broker is restored. Once connectivity is restored then the local broker will forward the stored messages to the remote broker. This will allow the producers to keep working as if nothing has actually failed. However, you would need to configure all this manually.
That said, even if you don't implement such a solution there is absolutely no need for any message loss even when clients encounter a loss of network connectivity. If you send durable (i.e. persistent) messages then by default the client will wait for a response from the broker telling the client that the broker successfully received and persisted the message to disk. More complex interactions might require local JMS transactions and even more complex interactions may require XA transactions. In any event, there are ways to eliminate the possibility of message loss without implementing some kind of local broker solution.

Is it always necessary to restart streams after a Kafka broker outage/failover

We are using Kafka streams (0.11.0.1) api to consume events from topic. But whenever there is a Kafka Broker outage/failover, we need to restart all Kafka streamers to recover from following error:
"Connection to node 39366 could not be established. Broker may not be available."
Just wondering if it is really required for streamers to do streams close and restart? Why streamers are not able recover from this issue automatically? Or are we missing any configuration in client/Broker?
Now we are planning to introduce code changes to handle all stream exception and trigger an automated restart of streams. But am really worried if that is the right way to handle this scenario.
If you think in a real world use case where hundreds of clients connected to brokers and restarting each of them, its not making any sense.

Standalone Kafka Producer

I'm thinking about creating a stand alone Kafka producer that runs as a daemon and takes messages via a socket and send them reliable to Kafka.
But, I must not be the first one to think about this idea. The idea is to avoid writing a Kafka producer in for example PHP or Node but just deliver messages via a socket to a stand alone daemon from these languages that takes care of the delivery while the main applications keeps doing its thing.
This daemon should take care of retry delivery in case of outages and acts as a delivery point for all programs that run on the server.
Is this something that is a good idea, or is writing producers in every used language the common approach? That mmust not be the case right?
You should have a look at Kafka connectors.
Here is one of the them:
Kafka Connect Socket Source
Here you can find how to use it:
https://www.baeldung.com/kafka-connectors-guide
Sample Configuration connect-socket-source.properties:
name=socket-connector
connector.class=org.apache.kafka.connect.socket.SocketSourceConnector
tasks.max=1
topic=topic
schema.name=socketschema
port=12345
batch.size=100

Kafka: What happens when the entire Kafka Cluster is down?

We're testing out the Producer and Consumer using Kafka. A few questions:
What happens when all the brokers are down and they're not responding at all?
Does the Producer need to keep pinging the Kafka brokers to know when it is back up online? Or is there a more elegant way for the Producer application to know?
How does Zookeeper help in all this? What if the ZK is down as well?
If one or more brokers are down, the producer will re-try for a certain period of time (based on the settings). And during this time one or more of the consumers will not be able to read anything until the respective brokers are up.
But if the cluster is down for a longer period than your total re-try period, then probably you need to find a way to resend those failed messages again.
This is the one scenario where Kafka Mirroring(MirrorMaker tool) comes into picture.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
Producer will fail because cluster will be unavailable, this means they will get a non retriable error from kafka client implementation and depending on your client process, message will buffer on the local send queue of your application.
I'm sure that if zookeeper is down your system will not work anymore. This is one of the weakness of Kafka, he need zookeeper to work.