I'm testing the async send() in my kafka producer.
The cluster I want to connect to is offline.
My assumption would be that I send 10000 individual requests (lenght of listToSend) quickly.
Next the timeout (60s) would kick in and after 60 seconds I would see the callbacks hit me with logger.error(s"failed to send record ${x._2}", e)
However it seems to take forever for the method to finish.
That's why I added in the logger.debug("test: am I sending data") line.
It prints, then nothing happens for 60 seconds. I see the failed callback for the 1st record. And only then will it move on.
Is this normal behavior or am I missing something fundamental?
listToSend.foreach { x =>
logger.debug("test: am I sending data")
// note: I added this 'val future =' in an attempt to fix this, to no avail
val future = producer.send(new ProducerRecord[String, String](topic, x._2), new Callback {
override def onCompletion(metadata: RecordMetadata, e: Exception) {
if (e != null) {
//todo: handle failed sends, timeouts, ...
logger.error(s"failed to send record ${x._2}", e)
}
else { //nice to have: implement logic here, or call another method to process metadata
logger.debug("~Callback success~")
}
}
}
)
}
note: I do not want to block this code, I want to keep it async. However it seems to be blocking on the send() regardless.
The parallelism I never figured out completely.
However it seems like my topic name (I had named it '[projectname here]_connection') was the issue.
Even though I didn't know of any reserved keywords in topic names, this behavior popped up.
Some further experimenting also brought up that a topic name with a trailing space can also cause this behavior. The producer will try to send it to this topic, but the Kafka cluster doesn't seem to know how to deal with it, causing these timeouts.
So for all of you who come across this issue, check/change your topic name before proceeding your troubleshooting.
Related
I have a load test where three sets of users create something and a different set of users perform some actions on them.
What is the recommended way to co-ordinate this behaviour in Gatling?
I'm currently using an object which contains a LinkedBlockingQueue which the "producers" put the ID and consumers take, see below.
However, it causes the test to hang after ~20s (targeting 1tps).
I've also tried using poll with a timeout, but instead of hanging the poll almost always fails (after 30s) or causes a hang if the timeout is larger (1m+).
This seems to be because all the threads are blocked waiting for something from the queue so isn't compatible with the way Gatling tests run (i.e. not 1 thread per user). Is there a non-blocking way to wait in the Gatling DSL?
Producer.scala
// ...
scenario("Produce stuff")
.exec(/* HTTP call which extracts an ID*/)
.exec(session => Queue.ids.put(session("my-id").as[String])
// ...
Consumer.scala
// ...
scenario("Consume stuff")
.exec(session => session.set("my-id", Queue.ids.take()))
.exec(/* HTTP call which users ID*/)
// ...
Queue.scala
object Queue {
val ids = new LinkedBlockingQueue[String]()
}
As an alternative I've tried to use the application functionality but it seems a harder problem to ensure that each user picks a unique item from the app.
Acknowledging this is all a hack, my current solution in Consumer.scala is:
doIf(_ => Queue.ids.size() < MIN_COUNT)(
pause(30) // wait for 30s if queue is initially too small
)
.doWhile(_ => Queue.ids.size() >= MIN_COUNT)(
exec(session => session.set("my-id", Queue.ids.take()))
.exec(...)
.pause(30)
)
I need to catch the exceptions in case of Async send to Kafka. The Kafka producer Api comes with a fuction send(ProducerRecord record, Callback callback). But when I tested this against following two scenarios :
Kafka Broker Down
Topic not pre created
The callbacks are not getting called. Rather I am getting warning in the code for unsuccessful send (as shown below).
Questions :
So are the callbacks called only for specific exceptions ?
When does Kafka Client try to connect to Kafka broker while async send : on every batch send or periodically ?
Kafka Warning Image
Note : I am also using linger.ms setting of 25 sec to batch send my records.
public class ProducerDemo {
static KafkaProducer<String, String> producer;
public static void main(String[] args) throws IOException {
final Logger logger = LoggerFactory.getLogger(ProducerDemo.class);
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.ACKS_CONFIG, "1");
properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "30000");
producer = new KafkaProducer<String, String>(properties);
String topic = "first_topic";
for (int i = 0; i < 5; i++) {
String value = "hello world " + Integer.toString(i);
String key = "id_" + Integer.toString(i);
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
//execute everytime a record is successfully sent or exception is thrown
if(e == null){
// No Exception
}else{
//Exception Handling
}
}
});
}
producer.close();
}
You will get those warning for non-existing topic as a resilience mechanism provided with KafkaProducer. If you wait a bit longer(should be 60 seconds by default), the callback will be called eventually:
Here's my snippet:
So, when something goes wrong and async send is not successful, it will eventually fail with a failed future or/and a callback with exception.
If you are not running it transactionally, it can still mean that some messages from the batch have found their way to the broker, while others haven't.
It will most certainly be a problem if you need a blocking-style acknowledgement to the upstream system(like http ingestion interface, etc.) per every message that is sent to Kafka. The only way to do that is by blocking every message with the future's get, as described in the documentation:
In general, I've noticed a lot of question related to KafkaProducer delivery semantics and guarantees. It can definitely be documented better.
One more thing, since you mentioned linger.ms:
Note that records that arrive close together in time will generally
batch together even with linger.ms=0 so under heavy load batching will
occur regardless of the linger configuration
For the first question, here is the answer.
As per the apache kafka documentation, you can capture below exceptions using onCompletion method when you are implementing Callback interface
https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/Callback.html
For the second question, the combination of below properties control when to send the records and as far as i understand, it's same for synchronous or asynchronous call.
linger.ms
max.block.ms
https://kafka.apache.org/documentation/#linger.ms
So are the callbacks called only for specific exceptions ?
Yes, that's how it works. From documentation (2.5.0):
* Fully non-blocking usage can make use of the {#link Callback} parameter to provide a callback that
* will be invoked when the request is complete.
Notice the important part: when the request is complete, what means that the producer must have accepted the record and sent the ProduceRequest to Kafka Broker. Without digging too deep into internals, this means that broker metadata must be present and the partition must exist.
When it comes to formal specification, you'd need to take a good look at send()'s Javadoc and possibly at KafkaProducer's implementation of doSend method. Out there you're going to see that multiple exceptions can be thrown at the in submitting call (instead of returning a future and invoking callback), e.g. :
if broker metadata is not available in timeout given,
if data could not be serialized,
if serialized form was too large, etc.
In my project, I created UserRepositoryActor which create their own router with 10 UserRepositoryWorkerActor instances as routee, see hierarchy below:
As you see, if any error occur while fetching data from database, it will occur at worker.
Once I want to fetch user from database, I send message to UserRepositoryActor with this command:
val resultFuture = userRepository ? FindUserById(1)
and I set 10 seconds for timeout.
In case of network connection has problem, UserRepositoryWorkerActor immediately get ConnectionException from underlying database driver and then (what I think) router will restart current worker and send FindUserById(1) command to other worker that available and resultFuture will get AskTimeoutException after 10 seconds passed. Then some time later, once connection back to normal, UserRepositoryWorkerActor successfully fetch data from database and then try to send result back to the caller and found that resultFuture was timed out.
I want to propagate error from UserRepositoryWorkerActor up to the caller immediately after exception occur, so that will prevent resultFuture to wait for 10 seconds and stop UserRepositoryWorkerActor to try to fetch data again and again.
How can I do that?
By the way, if you have any suggestions to my current design, please suggest me. I'm very new to Akka.
Your assumption about Router resending the message is wrong. Router has already passed the message to routee and it doesnt have it any more.
As far as ConnectionException is concerned, you could wrap in a scala.util.Try and send response to sender(). Something like,
Try(SomeDAO.getSomeObjectById(id)) match {
case Success(s) => sender() ! s
case Failure(e) => sender() ! e
}
You design looks correct. Having a router allows you to distribute work and also to limit number of concurrent workers accessing the database.
Option 1
You can make your router watch its children and act accordingly when they are terminated. For example (taken from here):
import akka.routing.{ ActorRefRoutee, RoundRobinRoutingLogic, Router }
class Master extends Actor {
var router = {
val routees = Vector.fill(5) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router(RoundRobinRoutingLogic(), routees)
}
def receive = {
case w: Work =>
router.route(w, sender())
case Terminated(a) =>
router = router.removeRoutee(a)
val r = context.actorOf(Props[Worker])
context watch r
router = router.addRoutee(r)
}
}
In your case you can send some sort of a failed message from the repository actor to the client. Repository actor can maintain a map of worker ref to request id to know which request failed when worker terminates. It can also record the time between the start of the request and actor termination to decide whether it's worth retrying it with another worker.
Option 2
Simply catch all non-fatal exceptions in your worker actor and reply with appropriate success/failed messages. This is much simpler but you might still want to restart the worker to make sure it's in a good state.
p.s. Router will not restart failed workers, neither it will try to resend messages to them by default. You can take a look at supervisor strategy and Option 1 above on how to achieve that.
In Java, let's say I have a transaction that once it is committed, I want to do another action, in this case, send an email.
Therefore,
try {
transaction.begin()
...
transaction.commit()
// point 1
Utility.sendEmail(...)
} catch (Exception e) {
transaction.rollback();
}
Is it possible for the thread to die/get killed at point 1 and not send an email ?
Any way to minimize this ? JTA + JMS perhaps, where the action of sending a message is part of a transaction ?
I'm investigating an issue and exploring whether this is possible. JVM is still alive (no OOM). I do not know the inner working of the app server so not sure if this is possible.
I can't say for sure if the rollback() in the catch clause has any effect if the commit() was OK and sendEmail() threw an exception. The quickest way to test this is to throw an exception from the sendEmail() method and see if the transaction was actually committed.
The way I would put it though, is to move the sendEmail() call away from your try block:
try {
transaction.begin()
...
transaction.commit()
} catch (Exception e) {
transaction.rollback();
}
try {
// point 1
Utility.sendEmail(...)
} catch (Exception e) {
// handle it
}
This way you can control what will happen if a rollback was made.
Also, I think that sending the email to a JMS queue is in most cases a good idea. Doing it like that will give your DB code permission to continue and supposedly give feedback to your user that everything went OK and the email will be sent whenever it fits the email controller's schedule. For example, there might be a connection issue with your email server and the email sending would hang for say 30 seconds before throwing an exception and your user would see this as a very long button click.
I am new to scala and actors. I need to implement such hypothetical situation:
Server wait for messages, if it does not get any in say 10s period of time, it sends message to the Client. Otherwise it receives messages incoming. If it is inside processing some message and another message comes, it needs to be queued (I suppose that is done automatically by scala actors).
The second problem I am encountering is Sleeping. I need the actor to sleep for some constant period of time when it receives the message. But on the other hand I can't block, as I want incoming messages to be queued for further processing.
How about this?
loop {
reactWithin(10000) {
case TIMEOUT => // send message to client
case work => // do work
}
}
Daniel has provided a better answer to the no-input condition part of the question. So I've edited out my inferior solution.
As to the delayed response part of the question, the message queue doesn't block while an actor sleeps. It can just sleep and messages will still accumulate.
However, if you want a fixed delay from when you receive a message to when you process it, you can, for example, create an actor that works immediately but wraps the message in a request for a delay:
case class Delay(when: Long, what: Any) { }
// Inside class DelayingActor(workingActor: Actor)
case msg => workingActor ! Delay(delayValue + System.currentTimeMillis , msg)
Then, the working actor would
case Delay(t,msg) =>
val t0 = System.currentTimeMillis
if (t>t0) Thread.sleep( t - t0 )
msg match {
// Handle message
}