I'm trying to write a simple Avro message producer for Kafka, using Scala.
The problem I am having is that the sending is very slow.
I am doing this:
val message: GenericRecord = getRandomMessage()
val serializedMessage: Array[Byte] = serializeMessage(message)
val queueMessage = new ProducerRecord[String, Array[Byte]](topic, message.get("id").toString, serializedMessage)
producer.send(queueMessage)
println("Sent Message: "+ message)
Sending the messages is extremely slow, both when deployed to my cluster and when running from my IDE.
From what I've read, sending should be asynchronous and much faster than this.
Is there something obvious that I'm missing?
Thanks!
Are you sure that the message has been sent?
It may be that the send failed (or timed out) and you print the message anyway; printing it is, of course, no guarantee that it was sent successfully.
To check, block on the send and print the result:
println(producer.send(queueMessage).get)
I'm working with a Camel flow that uses a Netty TCP socket consumer to receive messages from a client program (which is outside of my control). The client should be opening a socket, sending us one message, then closing the socket, but we've been seeing cases where instead of one message Camel is "splitting" the text stream into two parts and trying to process them separately.
Since the same socket can be reused for multiple Camel messages, but TCP sockets have no built-in concept of "frames" and no standard message delimiter, how does Camel decide that a complete message has been received and is ready to process? I haven't found a documented answer in the Netty component docs (https://camel.apache.org/components/3.15.x/netty-component.html), although maybe I'm missing something.
From experimenting with a test script, one answer seems to be "Camel assumes a message is complete and should be processed if it goes more than 1ms without receiving any input on the socket". Is this correct, and if so, is this behavior documented anywhere? Is there any way to change or configure it? What I would really prefer is for Camel to wait for an ETX character (or a much longer timeout) before processing a message. Is that possible to set up?
Here's my test setup:
Camel flow:
from("netty:tcp://localhost:3003")
.log("Received: ${body}");
Python snippet:
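import argparse
import socket
import time

DELAY_MS = 3  # pause after each partial send, in milliseconds

def send_msg(sock, msg):
    print("Sending message: <{}>".format(msg))
    try:
        # sendall() returns None on success and raises OSError on failure
        sock.sendall(msg.encode())
    except OSError:
        print("Message failed to send")
        raise
    time.sleep(DELAY_MS / 1000.0)

# Assumed CLI, inferred from the snippet's use of args.hostname/port/msg
parser = argparse.ArgumentParser()
parser.add_argument("hostname")
parser.add_argument("port", type=int)
parser.add_argument("msg")
args = parser.parse_args()

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    print("Using DELAY_MS: {}".format(DELAY_MS))
    s.connect((args.hostname, args.port))
    # Send the message in two halves, pausing after each one
    cutoff = len(args.msg) // 2
    send_msg(s, args.msg[:cutoff])
    send_msg(s, args.msg[cutoff:])
    response = s.recv(1024)
except Exception as e:
    print(e)
finally:
    s.close()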
With DELAY_MS=1, Camel logs a single message:
2022-02-21 16:54:40.689 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sent over the socket
But with DELAY_MS=2 it logs two separate messages:
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sen
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: t over the socket
After doing some more research, it seems like what I need to do is add a delimiter-based FrameDecoder to the decoders list.
Setting it up like this:
from("netty:tcp://localhost:3003?sync=true"
+ "&decoders=#frameDecoder,#stringDecoder"
+ "&encoders=#stringEncoder")
where frameDecoder is provided by
@Bean
ChannelHandlerFactory frameDecoder() {
    // ETX (0x03) marks the end of each message
    ByteBuf[] ETX_DELIM = new ByteBuf[] { Unpooled.wrappedBuffer(new byte[] { (byte) 3 }) };
    return ChannelHandlerFactories.newDelimiterBasedFrameDecoder(1024, ETX_DELIM,
            false, "tcp");
}
seems to do the trick.
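The delimiter-based decoder works because it, rather than read timing, decides where a message ends: incoming bytes are buffered until the delimiter arrives, however the TCP stream happened to be split. The Python sketch below illustrates only that idea and is not Netty's or Camel's implementation; DelimiterFrameDecoder and its parameters are hypothetical names chosen to mirror 1024, ETX_DELIM and false (strip delimiter) from the bean above.

```python
ETX = b"\x03"


class DelimiterFrameDecoder:
    """Hypothetical sketch: buffer raw bytes and emit complete delimiter-terminated frames."""

    def __init__(self, max_frame_length=1024, delimiter=ETX, strip_delimiter=False):
        self.max_frame_length = max_frame_length
        self.delimiter = delimiter
        self.strip_delimiter = strip_delimiter
        self._buffer = bytearray()

    def feed(self, data):
        """Append newly received bytes and return every frame that is now complete."""
        self._buffer.extend(data)
        frames = []
        while True:
            idx = self._buffer.find(self.delimiter)
            if idx < 0:
                break  # no delimiter yet: keep waiting, regardless of how long it takes
            end = idx + len(self.delimiter)
            frame = self._buffer[:idx] if self.strip_delimiter else self._buffer[:end]
            frames.append(bytes(frame))
            del self._buffer[:end]
        # A partial frame that already exceeds the limit can never become valid
        if len(self._buffer) > self.max_frame_length:
            self._buffer.clear()
            raise ValueError("frame exceeds max_frame_length without a delimiter")
        return frames
```

Feeding it the two halves from the test script yields nothing for the first half and one complete frame once the chunk containing ETX arrives, which is the single-message behavior the route needs.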
On the other hand, this seems to hang indefinitely (or until lower-level TCP timeouts kick in?) if no ETX delimiter is received. I can't find any way to set a timeout on the decoder, so I'd still welcome input if anyone knows how to do that.
I suspect the default "timeout" behavior I was seeing was just an artifact of how fast Netty's read loop runs (see the question "How does netty determine when a read is complete?").
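The behavior asked for above (wait for ETX, but give up after a deadline instead of hanging) can be sketched at the plain-socket level. This is not a way to configure the Camel decoder; read_frame, timeout_s and max_frame_length are hypothetical names, and the sketch assumes one message per connection, as in the question.

```python
import socket
import time

ETX = b"\x03"


def read_frame(sock, timeout_s=5.0, max_frame_length=1024, delimiter=ETX):
    """Read until delimiter arrives; raise if it doesn't arrive before the deadline."""
    deadline = time.monotonic() + timeout_s
    buf = bytearray()
    while True:
        idx = buf.find(delimiter)
        if idx >= 0:
            # Frame without the delimiter; trailing bytes are discarded
            # (assumes one message per connection)
            return bytes(buf[:idx])
        if len(buf) > max_frame_length:
            raise ValueError("no delimiter within max_frame_length bytes")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("no delimiter before deadline")
        sock.settimeout(remaining)  # recv() raises socket.timeout when this elapses
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before sending a delimiter")
        buf.extend(chunk)
```

Using one overall deadline, rather than a per-recv() timeout, keeps a client that trickles bytes slowly from extending the wait indefinitely.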
I'm trying to create an SCDF Task to handle errors, but I can't figure out how to get the full Kafka message, with payload and headers.
The idea is to route messages to a DLQ in my streams when a service is not responding, for example when an HTTP service is down and the httpclient app is failing.
When the HTTP service is back up, I would like to run a task that takes the messages in the DLQ and resends them to the proper Kafka topic, no matter what the message is.
I'm trying to make a generic task so the DLQ and target topic are Kafka consumer and producer properties.
I would also like to use the generic org.springframework.messaging.Message.
When I use KafkaItemReader<String, String> and KafkaItemWriter<String, String>, it works fine with the payload as a String, but all headers are lost. When I use KafkaItemReader<String, Message<?>> and KafkaItemWriter<String, Message<?>> to also get the headers, I get a ClassCastException: java.lang.String cannot be cast to org.springframework.messaging.Message
2020-11-13T14:27:03.472446462+01:00 stdout F java.lang.ClassCastException: java.lang.String cannot be cast to org.springframework.messaging.Message
2020-11-13T14:27:03.472450493+01:00 stdout F at org.springframework.batch.core.step.item.SimpleChunkProcessor.doProcess(SimpleChunkProcessor.java:134) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
2020-11-13T14:27:03.47245463+01:00 stdout F at org.springframework.batch.core.step.item.SimpleChunkProcessor.transform(SimpleChunkProcessor.java:319) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
2020-11-13T14:27:03.472457814+01:00 stdout F at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:210) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
2020-11-13T14:27:03.472460712+01:00 stdout F at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:77) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
2020-11-13T14:27:03.472463956+01:00 stdout F at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
2020-11-13T14:27:03.472468765+01:00 stdout F at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
Is there a way to do this?
In fact, it seems there is no way to get message headers with KafkaItemReader and KafkaItemWriter. The serializer/deserializer handle only the key and the payload, and I can't find a way to get the headers.
I solved this issue by using a Tasklet instead of KafkaItemReader and KafkaItemWriter. In my Tasklet, I use KafkaConsumer and KafkaProducer to deal with ConsumerRecord and ProducerRecord, which allow me to copy headers.
Moreover, I can handle commits more carefully (no auto-commit): consumer offsets are committed only after the producer has sent the messages.
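The commit-after-send ordering described above can be sketched without Kafka at all. In this Python sketch, Record, replay, send and commit are hypothetical stand-ins for ConsumerRecord/ProducerRecord, a blocking KafkaProducer send, and a manual KafkaConsumer offset commit; it illustrates the ordering only and is not the poster's Tasklet.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    value: bytes
    headers: list = field(default_factory=list)  # list of (name, bytes) pairs


def replay(dlq, start_offset, send, commit):
    """Resend DLQ records from start_offset, copying key, value and headers.

    `send` is assumed to return only once the target has acknowledged the record
    (like blocking on the producer's Future) and to raise otherwise, so a failed
    send leaves its offset uncommitted and the record is replayed next run.
    """
    for offset in range(start_offset, len(dlq)):
        rec = dlq[offset]
        send(Record(rec.key, rec.value, list(rec.headers)))  # headers copied, not dropped
        commit(offset + 1)  # Kafka commits the offset of the next record to read
```

Because a crash between send and commit causes a resend, this gives at-least-once delivery; consumers of the target topic should tolerate duplicates.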
I'm using Kafka 0.11.0.0. I have a test program that publishes to a Kafka topic; if the zookeeper and Kafka servers are down (which is normal in my development environment; I bring them up as needed) then the call to KafkaProducer<>.send() hangs indefinitely.
I either need to have send() return, preferably indicating the error; or I need a way to check whether the servers are up or down. Basically, I want my test tool to be able tell me, "Hey, dummy, start up Kafka!" instead of hanging.
Is there a way for my producer task to determine whether the servers are up or down?
I'm calling send() like this:
kafkaProducer.send(new ProducerRecord<>(KAFKA_TOPIC, KAFKA_KEY,
message), (rm, ex) -> {
System.out.println("**** " + rm + "\n**** " +ex);
});
I have linger.ms = 1; I've tried retries=0, 1, and 2, and send() still blocks. I've never seen the callback called.
Older posts suggest setting metadata.fetch.timeout.ms to a small value, but that setting is gone in 0.11. Others suggest calling command-line utilities to check whether the servers are OK, but the referenced utilities also seem to be gone.
What's the graceful way to get this done?
We can send messages to the broker in three ways:
Fire-and-forget
We send a message to the server and don’t really care whether it arrives successfully. Most of the time it will, since Kafka is highly available and the producer retries sending automatically. However, some messages will be lost with this method.
Asynchronous send
We call the send() method with a callback function, which gets triggered when it receives a response from the Kafka broker.
Synchronous send
We send a message, the send() method returns a Future object, and we use get() to wait on the future and see if the send() was successful or not.
The simplest way to send a message synchronously is as follows:
ProducerRecord<String, String> record =
new ProducerRecord<>(KAFKA_TOPIC, KEY, message);
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
Here, we are using Future.get() to wait for a reply from Kafka. This method will throw an exception if the record is not sent successfully to Kafka. If there were no errors, we will get a RecordMetadata object that we can use to retrieve the offset the message was written to.
Hope this helps.
That is strange. It should return with an error saying either "Failed to update metadata" or "Expiring x number of records".
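Check the request.timeout.ms and max.block.ms settings for your producer. By default, max.block.ms is 60 seconds; it bounds how long send() can block waiting for metadata, which is what happens when no broker is reachable. request.timeout.ms defaults to 30 seconds.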
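For the question's other option (checking whether the servers are up before sending), one rough approach is to probe the broker's TCP port with a short timeout before creating the producer. A minimal Python sketch, assuming the broker listens on localhost:9092 (Kafka's conventional default port; adjust to your listener); broker_reachable is a hypothetical helper, not part of any Kafka client.

```python
import socket


def broker_reachable(host="localhost", port=9092, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    This only shows that something is listening on the port; it does not prove
    the Kafka broker is healthy or that the topic exists.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A test tool could call this first and print "start up Kafka!" when it returns False. A successful probe is no guarantee, so keeping max.block.ms small is still worthwhile.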
I have the following code to create a client socket that sends and receives data:
import java.io.PrintStream
import java.net.{InetAddress, Socket}
import scala.io.BufferedSource

val socket: Socket = new Socket(InetAddress.getByName("127.0.0.1"), 7777)
val inputStream = socket.getInputStream()
val bufferSource = new BufferedSource(inputStream)
val out = new PrintStream(socket.getOutputStream())
val data = "Hello Everyone"
out.println(data)
out.flush()
socket.shutdownOutput() // signals end of input to the server; the output can't be reopened
val in = bufferSource.getLines()
if (in.hasNext) {
  println(in.next())
}
If I don't call socket.shutdownOutput(), I get no data back from the server, because the server side is still waiting for input. So I have to shut down the output stream.
But once the output is shut down, it can't be reopened, so I have to create a new socket to send more data.
That means sending each record requires a new socket, which is really awkward.
Is there any other way to tell the server that the output is finished, without shutting down the output?
Thanks in advance!
The problem is that the server doesn't know when to stop reading, process the request, and reply.
What you need here is an application-level protocol that dictates how the server and clients communicate: what a command is, how a response is formatted, etc.
This could be a line-oriented protocol, where each new line represents a message (in general, the message delimiter could be any character sequence that doesn't appear in the messages).
Or it could use fixed-length messages, or messages prefixed with their length (or type) to let the other side know how much data to expect.
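The length-prefix option can be sketched in a few lines. This Python version is illustrative only (the question's client is Scala), and the 4-byte big-endian length header is an assumed convention that both sides would have to agree on. It shows why neither side needs to shut down its output: the reader knows exactly how many bytes make up each message.

```python
import socket
import struct


def send_framed(sock, payload: bytes) -> None:
    """Send one message prefixed by its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_framed(sock) -> bytes:
    """Read one length-prefixed message; the socket stays open for the next one."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

On the JVM side, DataOutputStream.writeInt and DataInputStream.readInt/readFully would be natural counterparts for the same 4-byte big-endian header.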
We are using hornetq-core 2.2.21.Final stand-alone. After reading a non-transactional message, the message still remains in the queue even though it is acknowledged.
The session is created using
sessionFactory.createSession(true, true, 0)
Locator settings:
val transConf = new TransportConfiguration(classOf[NettyConnectorFactory].getName,map)
val locator = HornetQClient.createServerLocatorWithoutHA(transConf)
locator.setBlockOnDurableSend(false)
locator.setBlockOnNonDurableSend(false)
locator.setAckBatchSize(0) // also tried without this setting
locator.setConsumerWindowSize(0)// also tried without this setting
The message is acknowledged using message.acknowledge().
I think the problem might be that there are two queues on the same address.
I also tried setting the message expiration, but it didn't help; messages are still piling up in the queue.
Please advise.
It seems you are using the core API. Are you explicitly calling acknowledge on the messages?
If you have two queues on the same address, an acknowledgement only acknowledges the messages on the queue you are consuming from. In that case, the system is behaving normally.