Kafka producer should retry three times in case of failure - apache-kafka

I want the Kafka producer to retry three times in case of any failure, and I also want to test manually whether the producer is actually retrying. Can you suggest how to test this functionality by hand? The configuration below is added to the producer configuration so it retries on failure. Thank you.
props.put("retries", 3);
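For context, a minimal sketch of where this setting sits in a full producer configuration (the broker address, topic name, and backoff value below are illustrative assumptions, not part of the question):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");            // wait for full acknowledgement so failures surface
        props.put("retries", 3);             // the setting from the question
        props.put("retry.backoff.ms", 1000); // illustrative: wait 1s between retry attempts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value")); // "test-topic" is a placeholder
        }
    }
}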

You should be able to trust this core functionality of Kafka, but you can verify it by capturing the producer's packets.
You can use tcpdump to sniff packets on the producer machine and check how many times they are sent:
tcpdump -i any port 9092
I also recommend looking at this answer about using tshark to capture Kafka traffic.
If you want to investigate the protocol even more deeply, you can use Wireshark.
Check out this guide on how to install Wireshark on Linux.
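As an alternative to packet capture, the Java producer itself exposes retry counters among its metrics. A minimal sketch that prints them after sending, assuming an already-created KafkaProducer named producer and that the standard record-retry-total / record-error-total producer metrics are available:

// Print the producer's retry and error counters; a non-zero retry total means retries happened.
producer.metrics().forEach((name, metric) -> {
    if (name.name().equals("record-retry-total") || name.name().equals("record-error-total")) {
        System.out.println(name.group() + "/" + name.name() + " = " + metric.metricValue());
    }
});

To actually force a retry during a manual test, one common approach is to briefly stop the broker (or block port 9092 with a firewall rule) while the producer is sending, then check the metric or the tcpdump capture.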

Related

How can I get messages without missing any in Kafka?

I'm a newbie to Kafka. I've been testing Kafka for sending messages.
This is my situation now.
add.java on my local VM sends messages to Kafka on my local VM regularly.
relay.java on another server polls from Kafka on my local VM and produces to Kafka on that other server.
While messages were being relayed from the Kafka on my local VM to the Kafka on the other server,
I pulled the LAN cable out of my laptop. A few seconds later, I plugged it back in.
I then found that some messages were lost while the LAN cable was disconnected.
However, when the network is reconnected, I want to receive all the messages produced during the disconnection without missing any.
Are there any suggestions?
Any help would be highly appreciated.
First of all, I suggest you use MirrorMaker (1 or 2) because it supports exactly this use case of consuming and producing to another cluster.
Secondly, add.java should not be dropping messages if your LAN is disconnected.
Whether you end up with dropped messages on the way from relay.java depends on the consumer and producer settings within it. For example, you should definitely disable auto offset commits and only commit after you have received a completion event and acknowledgement from the corresponding producer action. This gives you at-least-once delivery; a sketch of the pattern follows below.
You can find multiple posts about processing guarantees in Kafka.
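A minimal sketch of that at-least-once relay, assuming string messages (the broker addresses and topic names are placeholders, not taken from the question):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Relay {
    public static void main(String[] args) throws Exception {
        Properties cons = new Properties();
        cons.put("bootstrap.servers", "local-vm:9092");      // placeholder: source cluster
        cons.put("group.id", "relay");
        cons.put("enable.auto.commit", "false");              // commit manually, as described above
        cons.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cons.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties prod = new Properties();
        prod.put("bootstrap.servers", "other-server:9092");  // placeholder: target cluster
        prod.put("acks", "all");                               // wait for the target broker to acknowledge
        prod.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prod.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cons);
             KafkaProducer<String, String> producer = new KafkaProducer<>(prod)) {
            consumer.subscribe(Collections.singletonList("source-topic"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // send() returns a Future; get() blocks until the write is acknowledged
                    producer.send(new ProducerRecord<>("target-topic", record.key(), record.value())).get();
                }
                consumer.commitSync(); // only commit offsets once the whole batch is acknowledged
            }
        }
    }
}

Blocking on get() for every record is slow but keeps the sketch simple; in practice you would send asynchronously and commit after flush() or in the send callback.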

Standalone Kafka Producer

I'm thinking about creating a standalone Kafka producer that runs as a daemon, takes messages via a socket, and sends them reliably to Kafka.
But I can't be the first one to think of this idea. The point is to avoid writing a Kafka producer in, for example, PHP or Node, and instead just deliver messages via a socket from those languages to a standalone daemon that takes care of the delivery while the main application keeps doing its thing.
This daemon should take care of retrying delivery in case of outages and act as a single delivery point for all programs that run on the server.
Is this a good idea, or is writing producers in every language used the common approach? That can't be the case, right?
You should have a look at Kafka connectors.
Here is one of them:
Kafka Connect Socket Source
Here you can find how to use it:
https://www.baeldung.com/kafka-connectors-guide
Sample Configuration connect-socket-source.properties:
name=socket-connector
connector.class=org.apache.kafka.connect.socket.SocketSourceConnector
tasks.max=1
topic=topic
schema.name=socketschema
port=12345
batch.size=100

getting "org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1195725856 larger than 104857600)"

I have installed ZooKeeper and Kafka.
First step: run ZooKeeper with the following commands:
bin/zkServer.sh start
bin/zkCli.sh
Second step: run the Kafka server:
bin/kafka-server-start.sh config/server.properties
Kafka should run at localhost:9092,
but I am getting the following error:
WARN Unexpected error from /0:0:0:0:0:0:0:1; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1195725856 larger than 104857600)
I am following these links:
Link1
Link2
I am new to Kafka, please help me to set it up.
1195725856 is GET[space] encoded as a big-endian, four-byte integer (see here for more information on how that works). This indicates that HTTP traffic is being sent to Kafka port 9092, but Kafka doesn't accept HTTP traffic; it only accepts its own protocol (which takes the first four bytes as the receive size, hence the error).
Since the error is received on startup, it is likely benign and may indicate that a scanning service or something similar on your network is probing ports with protocols that Kafka doesn't understand.
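As a quick sanity check, you can decode that reported size back into ASCII yourself. A small sketch (the constant is just the number from the error message):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeReceiveSize {
    public static void main(String[] args) {
        int size = 1195725856; // the "size" reported in the error
        byte[] bytes = ByteBuffer.allocate(4).putInt(size).array(); // big-endian, like the wire format
        System.out.println("'" + new String(bytes, StandardCharsets.US_ASCII) + "'"); // prints 'GET '
    }
}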
In order to find the cause, you can find where the HTTP traffic is coming from using tcpdump:
tcpdump -i any -w trap.pcap dst port 9092
# ...wait for logs to appear again, then ^C...
tcpdump -qX -r trap.pcap | less +/HEAD
Overall though, this is probably annoying but harmless. At least Kafka isn't actually allocating/dirtying the memory. :-)
Try setting the socket.request.max.bytes value in the $KAFKA_HOME/config/server.properties file to more than your packet size and restarting the Kafka server.
My initial guess would be that you might be trying to receive a request that is too large. The maximum size is the default for socket.request.max.bytes, which is 100MB. So if you have a message bigger than 100MB, try increasing the value of this setting in server.properties, and make sure to restart the cluster before trying again.
If the above doesn't work, then most probably you are trying to connect to a non-SSL listener.
If you are using the default broker port, you need to verify that :9092 is the SSL listener port on that broker.
For example,
listeners=SSL://:9092
advertised.listeners=SSL://:9092
inter.broker.listener.name=SSL
should do the trick for you (Make sure you restart Kafka after re-configuring these properties).
This is how I resolved this issue after installing a Kafka, ELK and Kafdrop setup:
First, stop every application that interfaces with Kafka, one by one, to track down the offending service.
Then resolve the issue with that application.
In my setup it was Metricbeat.
It was resolved by editing the Metricbeat kafka.yml settings file located in the modules.d subfolder:
ensuring the host from advertised.listeners in Kafka's server.properties was referenced in the hosts property,
and uncommenting the metricsets and client_id properties.
The resulting kafka.yml looks like:
# Module: kafka
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.6/metricbeat-module-kafka.html
# Kafka metrics collected using the Kafka protocol
- module: kafka
  metricsets:
    - partition
    - consumergroup
  period: 10s
  hosts: ["[your advertised.listener]:9092"]
  client_id: metricbeat
The answer is most likely in one of these two areas:
a. socket.request.max.bytes is too small, or
b. you are using a non-SSL endpoint to connect the producer and the consumer to.
Note: the port you run it on really does not matter. If you have an ELB in front, make sure the ELB reports all health checks as successful.
In my case I had an AWS ELB fronting Kafka. I had specified the listener protocol as TCP instead of Secure TCP, which caused the issue.
#listeners=PLAINTEXT://:9092
inter.broker.listener.name=INTERNAL
listeners=INTERNAL://:9093,EXTERNAL://:9092
advertised.listeners=EXTERNAL://<AWS-ELB>:9092,INTERNAL://<EC2-PRIVATE-DNS>:9093
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.inter.broker.protocol=PLAIN
Here is a snippet of my producer.properties and consumer.properties for testing externally:
bootstrap.servers=<AWS-ELB>:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
In my case, some other application was already sending data to port 9092, so starting the server failed. Closing that application resolved the issue.
Also make sure that you use security.protocol=PLAINTEXT where appropriate; otherwise you have a mismatch between the broker's security settings and the clients trying to connect.

Kafka producer: produce data to a topic from a port

I'm new to Kafka.
I have a Linux machine on which port 2552 is receiving a data stream from an external server.
I want to use a Kafka producer to listen on that port and get the stream of data into a topic.
This is a complete hack, but would work for a sandbox example:
nc -l 2552 | ./bin/kafka-console-producer --broker-list localhost:9092 --topic test_topic
It uses netcat to listen on the TCP port, and pipe anything received to a Kafka topic.
A quick Google also turned up this https://github.com/dhanuka84/kafka-connect-tcp which looks to do a similar thing but more robustly, using the Kafka Connect API.
You don't say whether the traffic on port 2552 is TCP or UDP, but in general you can easily write a program that listens on that port, parses the data received into discrete messages, and then publishes the data to a Kafka topic as Kafka messages (with or without keys) using the Kafka Producer API.
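A rough sketch of that approach, assuming the traffic is line-delimited text over TCP (the port and topic match the other answer here; the broker address is an assumption):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PortToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             ServerSocket server = new ServerSocket(2552)) {           // the port from the question
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // one line from the socket becomes one Kafka message (no key)
                        producer.send(new ProducerRecord<>("test_topic", line));
                    }
                }
            }
        }
    }
}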
In some cases there is existing open source code that might already do this for you so you do not need to write it from scratch. If the port 2552 protocol is a well known protocol like for example the TCP or UDP call-logging protocol registered in IANA ( see ftp://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt) then there might even be an existing Kafka Connector or Proxy that supports it. Search on GitHub for kafka-connect-[protocol] or take a look at the curated Connector list at https://www.confluent.io/product/connectors/
There may even be a generic TCP or UDP connector that you can use as a reference to configure or build your own for the specific protocol you are trying to ingest.

Storm cluster not working in Production mode

I have a Storm topology which runs on two nodes: one is the nimbus and the other is the supervisor.
A proxy which is not part of storm accepts an HTTP request from a client and passes it to the storm topology.
The topology is like this:
1. The proxy passes data to a storm spout.
2. The spout passes data to multiple bolts.
3. The result is passed back to the proxy by the last bolt.
I am running the proxy and passing data to Storm. I am able to connect a socket to the listener on the topology side, but the data emitted by the spout is shown as 0 in the UI. The same topology works fine in local mode.
I thought it was a problem with the supervisor, but the supervisor seems to be running fine because I am able to see the supervisor description and the individual spouts and bolts. However, none of them emit anything.
Now I am wondering whether the problem is the data being passed to the wrong machine, or something else. In order to communicate with the spout, I'm creating the socket from the proxy as follows:
InetAddress stormInetAddr=InetAddress.getByName("198.18.17.16");
int stormPort=4321;
Socket stormSocket=new Socket(stormInetAddr,stormPort);
Here 198.18.17.16 is the nimbus IP, and 4321 is the port where data is expected.
I tried giving the supervisor IP here, and it didn't connect; however, this does.
Now the proxy waits for the output on a specific port.
On the other side, after processing, data is read from the last bolt, and there seems to be no activity from the cluster. However, I am getting a response, which is basically the same request I had sent with some jumbled-up data; this response is supposed to be sent by the last bolt to a specific port I had defined. So I do get data back, but the cluster shows no activity. I know this is very vague, but does anyone have any idea what's happening?
It sounds like Storm is working fine, but your proxy/network settings are not. If it were a Storm error, you should see exceptions in the Nimbus UI and/or in the Storm supervisor logs.
Consider temporarily shutting down Storm and using nc -l 4321 on the supervisor machines to verify your proxy is working as expected.
However...
You may have a fundamental flaw in your model. Storm's spouts are pull-based, so it seems odd to have incoming requests pushed to them. This is possible, of course, if you have your spouts start listening when they spin up and simply queue the requests. However, this presents another challenge for your model: you will likely have multiple spouts running on a single machine and they cannot share the same port (4321).
If you want to meld these two worlds of push and pull, then consider using a Kafka spout.
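If you go that route, a minimal sketch of wiring a Kafka spout into a topology with the storm-kafka-client module might look roughly like this (the broker address, topic, and group id are placeholders, and the exact builder API depends on your Storm version):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaSpoutTopology {
    public static void main(String[] args) throws Exception {
        // The proxy would produce requests to this Kafka topic instead of pushing to a socket;
        // the spout then pulls them into the topology at its own pace.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "requests")   // placeholder broker and topic
                        .setProp("group.id", "storm-requests")           // placeholder consumer group
                        .build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        // ...wire your existing bolts here, e.g.:
        // builder.setBolt("process-bolt", new YourBolt(), 2).shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("request-topology", new Config(), builder.createTopology());
    }
}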