How to skip retry in vertx circuit breaker based on condition - vert.x

I am currently using the Vert.x CircuitBreaker to retry requests on the event bus. Based on the ReplyException.ReplyFailure I want to skip or avoid retries.
For example, I don't want to retry when the event bus responds with ReplyFailure.RECIPIENT_FAILURE, because this kind of error is not an application failure but a logic/validation failure.
Below is my code,
CircuitBreaker breaker = CircuitBreaker.create("my-circuit-breaker", vertx,
    new CircuitBreakerOptions()
        .setMaxFailures(2)
        .setTimeout(5 * 1000)
        .setFallbackOnFailure(true)
        .setResetTimeout(5000)
        .setMaxRetries(2)
).retryPolicy(retryCount -> retryCount * 5000L);
Using the circuit breaker as below,
breaker.execute(promise -> {
    vertx.eventBus().<JsonObject>request(address, jsonObject, deliveryOptions)
        .onComplete(event -> {
            // decide here whether to complete, fail, or give up retrying
        });
});
Based on the event I want the circuit breaker to give up on retrying. I tried checking the event status and then calling the promise.complete method with the exception, but the circuit breaker identifies that as a failure and starts retrying. How can I tell the circuit breaker to stop retrying?
I think the circuit breaker object could have:
A method to shut it down
A method to stop retrying
A method to increment the failure count
Update 1 ---------------------
I have implemented the retry part on my own. For timing out the operation, I set a timer and check the status of the operation; if it has not completed, I start the retry logic.
But I have no clue how to cancel the operation before retrying the next one. For example, how can I cancel the event bus request?
Update 2 -----------------
The Vert.x circuit breaker does not handle timed-out operations either. I can see an older response from the event bus arriving while a retry is in progress. This might confuse the application logic if not handled properly.
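For reference, here is a minimal sketch (my own interpretation, not the actual code) of the timer-plus-status-flag approach described in Update 1, using java.util.concurrent.atomic.AtomicBoolean. The same flag also lets you drop the stale replies mentioned in Update 2, since the event bus request itself cannot really be cancelled:
AtomicBoolean completed = new AtomicBoolean(false);

vertx.eventBus().<JsonObject>request(address, jsonObject, deliveryOptions)
    .onComplete(reply -> {
        if (completed.compareAndSet(false, true)) {
            // handle the reply normally
        }
        // otherwise the operation already timed out; ignore this late reply
    });

vertx.setTimer(5000, timerId -> {
    if (completed.compareAndSet(false, true)) {
        // the operation did not finish in time: start the retry logic here
    }
});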

This is not currently supported by the Vert.x Circuit Breaker API. There is an open issue about it here. There's also a PR that attempts to fix it, but it has been sitting around for a year now. Not sure why it never got reviewed.
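Until that is resolved, one workaround (only a sketch, not part of the official API; handleValidationError is a placeholder and the Vert.x 4 Future-based event bus API is assumed) is to complete the breaker's promise instead of failing it when the reply is a RECIPIENT_FAILURE, so the breaker records no failure and does not retry:
breaker.<Message<JsonObject>>execute(promise ->
    vertx.eventBus().<JsonObject>request(address, jsonObject, deliveryOptions)
        .onSuccess(promise::complete)
        .onFailure(err -> {
            if (err instanceof ReplyException
                    && ((ReplyException) err).failureType() == ReplyFailure.RECIPIENT_FAILURE) {
                // validation/business failure: don't count it against the breaker, don't retry
                promise.complete();
                handleValidationError((ReplyException) err); // placeholder for your own handling
            } else {
                promise.fail(err); // genuine failure: let the breaker count it and retry
            }
        })
);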

Related

How to handle client response during Transient Exception retrying?

Context
I'm developing a REST API that, as you might expect, is backed by multiple external cross-network services, APIs, and databases. It's very possible that a transient failure will be encountered at any point, for which the operation should be retried. My question is: during that retry operation, how should my API respond to the client?
Suppose a client is POSTing a resource and my server encounters a transient exception when attempting to write to the database. Using a combination of the Retry Pattern, perhaps with the Circuit Breaker Pattern, my server-side code should attempt to retry the operation, following a randomized linear/exponential back-off implementation. The client would obviously be left waiting during that time, which is not something we want.
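(As an aside, the randomized exponential back-off mentioned above can be as small as the sketch below; the base delay, cap and jitter strategy are arbitrary illustrative choices, not a recommendation.)
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {
    private static final long BASE_MS = 200;    // delay before the first retry
    private static final long CAP_MS = 30_000;  // never wait longer than 30 s

    static long delayMillis(int attempt) {
        long exp = Math.min(CAP_MS, BASE_MS << Math.min(attempt, 20)); // exponential growth
        return ThreadLocalRandom.current().nextLong(exp / 2, exp + 1); // randomized jitter
    }
}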
Questions
Where does the client fit into the retry operation?
Should I perhaps provide an isTransient: true indicator in the JSON response and leave the client to retry?
Should I leave retrying to the server and respond with a message and status code indicating that the server is actively retrying the request, and then have the client poll for updates? How would you determine the polling interval in that case without overloading the server? Or should the server respond via a web socket instead, so the client need not poll?
What happens if there is an unexpected server crash during the retry operation? Obviously, when the server recovers, it won't "remember" the fact that it was retrying an operation unless that fact was persisted somewhere. I suppose that's a non-critical issue that would just cause further unnecessary complexity if I attempted to solve it.
I'm probably over-thinking the issue, but while there is a lot of documentation about implementing transient exception retry logic, seldom have I come across resources that discuss how to leave the client "pending" during that time.
Note: I realize that similar questions have been asked, but my queries are more specific, for I'm specifically interested in the different options for where the client fits into a given retry operation, how the client should react in those cases, and what happens should a crash occur that interrupts a retry sequence.
Thank you very much.
There are some rules for retry:
Always create an idempotency key so you can tell that a request is a retry of an earlier operation.
If your operation is complex and you want to wrap a REST call with retry, you must ensure that duplicate requests cause no side effects (resume from the failure point and don't re-execute the code that already succeeded).
Personally, I think the client should not know that you retry something, and of course isTransient: true should not be part of the resource.
Warning: before adding a retry policy to something you must check its side effects; putting a retry policy everywhere is bad practice.
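To make the idempotency-key rule concrete, here is a toy sketch (the in-memory map and all names are hypothetical; a real system would persist the keys):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentHandler {
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    // On a retry with the same Idempotency-Key, return the stored result instead of
    // executing the side effects again.
    String handle(String idempotencyKey, String payload) {
        return processed.computeIfAbsent(idempotencyKey, key -> executeSideEffects(payload));
    }

    private String executeSideEffects(String payload) {
        return "created"; // placeholder for the actual database write / external call
    }
}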

How to do Kafka stream transformations (map / flatMap) taking into account values in a Key/Value store?

My task is the following:
I am monitoring time synchronization events from a third-party measuring device. This time synchronization is a bit flaky so I want to detect when synchronization stops and issue an alarm.
For this, I am producing the synchronization events to a Kafka topic. I have three different events going on:
Synchronization request
Synchronization successful
Synchronization failed because other device did not respond
So, what I want to do:
When a request is received and nothing is received after a certain amount of time, I want to issue a "timeout" alarm
When a request is received and a success event arrives within the timeout period, I want to issue a "timeout" if no further request arrives within the timeout time
When a failure event arrives, I want to issue the "other device did not respond" alarm
I am currently in the process of setting up a Kafka Streams application, and I need to store the state in case this application crashes (it should not, but I want to be sure), so I set this up as follows:
val builder = new StreamsBuilder
val storeBuilder = Stores.keyValueStoreBuilder(
  Stores.persistentKeyValueStore("timesync-alarms"),
  Serdes.String(),
  logEntrySerde)
builder.addStateStore(storeBuilder)
val eventStream = builder.stream(sourceTopic, Consumed.`with`(Serdes.String(), logEntrySerde))
Now I am stuck. What I basically think I need is a flatMap function on the eventStream that, whenever an event arrives:
Queries the store for the last event that was processed
Decides if an alarm is to be raised
Updates the store with the currently-received event
Produces the alarm, if any
So, how do I achieve steps 1 and 3 here? Or am I conceptually wrong and have to do it differently?
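For what it's worth, the steps above are usually wired up with transform() rather than flatMap(), because stateless operators such as flatMap cannot access state stores. A rough Java sketch of that idea (LogEntry, Alarm and decideAlarm are assumed names, not from the question; the store name matches the one declared above):
eventStream.transform(() -> new Transformer<String, LogEntry, KeyValue<String, Alarm>>() {
    private KeyValueStore<String, LogEntry> store;

    @Override
    public void init(ProcessorContext context) {
        store = (KeyValueStore<String, LogEntry>) context.getStateStore("timesync-alarms");
    }

    @Override
    public KeyValue<String, Alarm> transform(String key, LogEntry event) {
        LogEntry previous = store.get(key);          // step 1: last processed event
        Alarm alarm = decideAlarm(previous, event);  // step 2: placeholder decision logic
        store.put(key, event);                       // step 3: remember the current event
        return alarm == null ? null : KeyValue.pair(key, alarm); // step 4: emit the alarm, if any
    }

    @Override
    public void close() { }
}, "timesync-alarms");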
I think you don't need to use a state store directly. You can create two streams, one with sync request events and the second one with sync responses (success, fail), and join them:
requestStream.outerJoin(responseStream, (leftVal, rightVal) -> ...,
JoinWindows.of(timeout), ...);
In the case of timeout rightVal is null.
If you want to send alarms to a separate topic, you can simply filter the joined stream and write all failures (error responses and timeouts) to the topic. Otherwise you can use peek() method and trigger some action inside. Here is a simple example: https://github.com/djarza/football-events/blob/master/football-ui/src/main/java/org/djar/football/ui/projection/StatisticsPublisher.java
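A rough Java sketch of that join approach (the topic names, the LogEntry type, its isFailure() method and the 30-second window are my assumptions, not from the question):
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, LogEntry> requests =
        builder.stream("sync-requests", Consumed.with(Serdes.String(), logEntrySerde));
KStream<String, LogEntry> responses =
        builder.stream("sync-responses", Consumed.with(Serdes.String(), logEntrySerde));

KStream<String, String> alarms = requests.outerJoin(
        responses,
        (request, response) -> response == null
                ? "timeout"                                       // no response within the window
                : (response.isFailure() ? "other device did not respond" : null),
        JoinWindows.of(Duration.ofSeconds(30)));                  // assumed timeout

// Keep only the failures/timeouts and publish them as alarms.
alarms.filter((key, alarm) -> alarm != null).to("timesync-alarms-topic");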

Retry logic in kafka consumer

I have a use case where I consume certain logs from a queue and hit some third-party APIs with some info from that log. In case the third-party system is not responding properly, I wish to implement retry logic for that particular log.
I can add a time field and re-push the message to the same queue, and this message will get consumed again if its time field is valid, i.e. less than the current time; if not, it gets pushed back into the queue.
But this logic will add the same log again and again until the retry time is reached, and the queue will grow unnecessarily.
Is there a better way to implement retry logic in Kafka?
You can create several retry topics and push failed tasks there. For instance, you can create 3 topics with different delays in minutes and rotate a single failed task through them until the max attempt limit is reached.
‘retry_5m_topic’ — for retry in 5 minutes
‘retry_30m_topic’ — for retry in 30 minutes
‘retry_1h_topic’ — for retry in 1 hour
See more for details: https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
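A rough sketch of that rotation in plain consumer/producer code (the "attempt" header, the dead-letter topic and the String serdes are my assumptions; each retry topic's consumer is expected to wait out its delay before processing, as described in the linked article):
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

class RetryRouter {
    private static final String[] RETRY_TOPICS = {"retry_5m_topic", "retry_30m_topic", "retry_1h_topic"};
    private final KafkaProducer<String, String> producer;

    RetryRouter(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    // Forward a failed record to the next retry topic, or to a dead-letter topic
    // once all retry topics have been used.
    void routeFailure(ConsumerRecord<String, String> record) {
        int attempt = 0;
        Header header = record.headers().lastHeader("attempt");
        if (header != null) {
            attempt = Integer.parseInt(new String(header.value(), StandardCharsets.UTF_8));
        }
        String target = attempt < RETRY_TOPICS.length ? RETRY_TOPICS[attempt] : "dead_letter_topic";
        ProducerRecord<String, String> out = new ProducerRecord<>(target, record.key(), record.value());
        out.headers().add("attempt", Integer.toString(attempt + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(out);
    }
}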
In the consumer, if it throws an exception, produce another message with attempt number 1, so the next time it is consumed it carries attempt no. 1. Handle it in the producer: if the attempt count exceeds your retry limit, stop producing it.
Yes, this could be one straightforward solution that I also thought of. But with this we will end up creating many topics, as it is possible that message processing will fail again.
I solved this problem by mapping this use case to RabbitMQ. In RabbitMQ we have the concept of a retry exchange: if message processing fails, you can send the message to a retry exchange with a TTL. Once the TTL expires, the message moves back to the main exchange and is ready to be processed again.
I can post some examples explaining how we can implement exponential back-off message processing using RabbitMQ.
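For illustration, a minimal sketch of that RabbitMQ topology with the Java client (all exchange/queue names and the 30-second TTL are made up):
import java.util.HashMap;
import java.util.Map;
import com.rabbitmq.client.Channel;

void declareRetryTopology(Channel channel) throws Exception {
    channel.exchangeDeclare("main.exchange", "direct", true);
    channel.exchangeDeclare("retry.exchange", "direct", true);

    // Messages parked here expire after the TTL and are dead-lettered back to the main exchange.
    Map<String, Object> retryArgs = new HashMap<>();
    retryArgs.put("x-message-ttl", 30_000);                  // 30 s delay before retry
    retryArgs.put("x-dead-letter-exchange", "main.exchange");
    channel.queueDeclare("retry.queue", true, false, false, retryArgs);
    channel.queueBind("retry.queue", "retry.exchange", "work");

    // Normal work queue bound to the main exchange.
    channel.queueDeclare("work.queue", true, false, false, null);
    channel.queueBind("work.queue", "main.exchange", "work");
}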

Timeout configurations in Curator

I create a Curator client as follows:
RetryPolicy retryPolicy = new RetryNTimes(3, 1000);
CuratorFramework client = CuratorFrameworkFactory.newClient(zkConnectString,
    15000, // sessionTimeoutMs
    15000, // connectionTimeoutMs
    retryPolicy);
When running my client program I simulate a network partition by bringing down the NIC that Curator is using to communicate with Zookeeper. I have a few questions based on the behavior that I am seeing:
I see a ConnectionStateManager - State change: SUSPENDED message after 10 seconds. Is the amount of time until Curator enters the SUSPENDED state configurable, based on a percentage of the other timeout values, or is it always 10 seconds?
I do not receive any notification after the configured 15-second session timeout has passed since the last successful heartbeat. I do see a ZooKeeper - Session: 0x14adf3f01ef0001 closed message in the log; however, this does not appear to trickle up as an event that I can capture or listen for. Am I missing something here?
I eventually receive a ConnectionStateManager - State change: LOST message almost two minutes after the connection loss. Why so long?
If my goal is to use an InterProcessMutex as a means of preventing split-brain in an HA scenario, it seems that the safest approach is for the lock holder to assume that it has lost the lock when the SUSPENDED message is received, since it is entirely possible that Zookeeper has released the lock unbeknownst to it on the other side of the network partition. Is this a typical/sane approach?
It depends which version of Curator you're using (note: I'm the main author of Curator)...
In Curator 2.x, the LOST state means that a retry policy has been exhausted. It does not mean that the Session has been lost. In ZooKeeper the session is only determined to be lost once the connection to the ensemble is repaired. So, you get SUSPENDED when Curator sees the first "Disconnected" message. Then, when an operation fails due to the retry policy giving up you get LOST.
In Curator 3.x the meaning of LOST was changed. In 3.x when the "Disconnected" is received Curator starts an internal timer. When the timer passes the negotiated session timeout Curator calls getTestable().injectSessionExpiration() and posts a LOST state change.
Correct. Assume leadership has been lost on SUSPENDED and LOST.
This is the way the Apache Curator recipes work.
You may want to use Apache Curator rather than implementing your own algorithm.
https://curator.apache.org/curator-recipes/index.html
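For the "can I capture session loss" part, the usual way is to register a ConnectionStateListener and treat SUSPENDED/LOST as "lock gone", as the answer above says. A small sketch (stopProtectedWork and tryReacquireLock are placeholders for your own code):
client.getConnectionStateListenable().addListener((curatorClient, newState) -> {
    switch (newState) {
        case SUSPENDED:
        case LOST:
            // Assume the InterProcessMutex / leadership is gone; stop the protected work.
            stopProtectedWork();
            break;
        case RECONNECTED:
            // Safe to try to take the lock again.
            tryReacquireLock();
            break;
        default:
            break;
    }
});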
Regarding the first question: ZooKeeper has a variable called MAX_SEND_PING_INTERVAL, which is 10 seconds, so it is always 10 seconds in your case. The code is in the ClientCnxn class.
// 1000 (1 second) is to prevent race condition missing to send the second ping
// also make sure not to send too many pings when readTimeout is small
int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() -
        ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
// send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
    sendPing();
    clientCnxnSocket.updateLastSend();
} else {
    if (timeToNextPing < to) {
        to = timeToNextPing;
    }
}

Gatling synchronous Http request/response chain

I have implemented a chain of executions, and each execution sends an HTTP request to the server and checks whether the response status is 2XX. I need to implement a synchronous model in which the next execution in the chain is only triggered when the previous execution is successful, i.e. the response status is 2xx.
Below is a snapshot of the execution chain.
feed(postcodeFeeder).
  exec(Seq(LocateStock.locateStockExecution, ReserveStock.reserveStockExecution, CancelOrder.cancelStockExecution,
    ReserveStock.reserveStockExecution, ConfirmOrder.confirmStockExecution, CancelOrder.cancelStockExecution))
Since Gatling has an asynchronous IO model, what I am currently observing is that the HTTP requests are sent to the server asynchronously by a number of users, and there is no real dependency between the executions with respect to a single user.
Also, I wanted to know: for a given actor/user, if an execution in the chain fails its check, does the user not proceed with the next execution in the chain?
there is no real dependency between the executions with respect to a single user
No, you are wrong. Except when using "resources", requests are sequential for a given user. If you want to stop the flow for a given user when it encounters an error, you can use exitBlockOnFail.
Gatling does not consider the failure response from the previous request before firing the next one in the chain. You may need to wrap the entire block with exitBlockOnFail {} to stop Gatling from firing the next request.