How to evict an instance immediately from the local registry cache (client side) when a remote service is down? - spring-cloud

When a service deregisters itself, the Eureka server notices immediately, but clients do not, because of the multi-level caches (Eureka client cache, Ribbon cache).
Can I manually remove the deregistered instance instead of waiting for the caches to update? This matters because some requests will fail during the window before the offline instance is removed.
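One hedged option, assuming Spring Cloud Netflix with Ribbon as the client-side load balancer: when you already know an instance is gone, you can mark it down in Ribbon's local server list instead of waiting for the scheduled refresh. The DeadInstanceEvictor class below is only an illustrative sketch; the next scheduled server-list refresh may bring the instance back if the Eureka client cache is still stale, so shortening eureka.client.registryFetchIntervalSeconds and ribbon.ServerListRefreshInterval is usually needed as well.

import com.netflix.loadbalancer.ILoadBalancer;
import com.netflix.loadbalancer.Server;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

// Illustrative sketch only: marks a known-dead instance as DOWN in Ribbon's
// local server list instead of waiting for the periodic cache refresh.
@Component
public class DeadInstanceEvictor {

    private final SpringClientFactory clientFactory;

    public DeadInstanceEvictor(SpringClientFactory clientFactory) {
        this.clientFactory = clientFactory;
    }

    // serviceId is the Eureka application name; host/port identify the dead instance
    public void evict(String serviceId, String host, int port) {
        ILoadBalancer lb = clientFactory.getLoadBalancer(serviceId);
        for (Server server : lb.getAllServers()) {
            if (server.getHost().equals(host) && server.getPort() == port) {
                // Removes the server from this client's pool of reachable servers
                lb.markServerDown(server);
            }
        }
    }
}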

Related

Why am I experiencing endless connection timeouts using the Quarkus MicroProfile reactive REST client

At some point in my Quarkus app's life (under Kubernetes) it begins getting endless connection timeouts from multiple different hosts (the timeout is configured to be 1 second). From that point on the app never recovers until I restart the k8s pod.
These endless connection timeouts are not caused by the hosts, since other apps in the cluster do not suffer from this; also, a restart of my app fixes the problem.
I am declaring multiple hosts (base-uri) through the Quarkus application.properties. (Maybe it is using a single Vert.x/Netty event loop and that is the problem?)
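For context, this is roughly how multiple hosts are usually declared with the MicroProfile REST client in Quarkus; the interface name, config key, and URL below are made-up placeholders, and depending on the Quarkus version the JAX-RS imports may be jakarta.ws.rs instead of javax.ws.rs.

import java.util.concurrent.CompletionStage;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

// Hypothetical reactive REST client: one interface (and one config key) per remote host.
// In application.properties each client gets its own base URI and timeouts, e.g.:
//   service-a/mp-rest/url=http://service-a:8080
//   service-a/mp-rest/connectTimeout=1000
@RegisterRestClient(configKey = "service-a")
@Path("/api")
public interface ServiceAClient {

    @GET
    @Path("/items")
    CompletionStage<String> fetchItems();
}

Each such interface is then injected with @Inject @RestClient, so there is one client instance (and one set of timeout settings) per configured host.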

Stateless Worker service in Service Fabric restarted in the same process

I have a stateless service that pulls messages from an Azure queue and processes them. The service also starts some threads in charge of cleanup operations. We recently ran into an issue where these threads, which ideally should have been killed when the service shut down, remained active (definitely a bug in our service shutdown process).
Looking further at the logs, it seemed that the RunAsync method's cancellation token received a cancellation request, and later, within the same process, a new instance of the stateless service that was registered in ServiceRuntime.RegisterServiceAsync was created.
Is it expected behavior that Service Fabric can reuse the same process to start a new instance of the stateless service after shutting down the current instance?
The Service Fabric documentation (https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-hosting-model) does suggest that different services can be hosted in the same process, but I could not find the above behavior mentioned there.
In the shared process model, there's one process per ServicePackage instance on every node. Adding or replacing services will reuse the existing process.
This is done to save some (process-level) overhead on the node that runs the service, so hosting is more cost-efficient. Also, it enables port sharing for services.
You can change this (default) behavior by configuring the 'exclusive process' mode in the application manifest, so that every replica/instance runs in a separate process:
<Service Name="VotingWeb" ServicePackageActivationMode="ExclusiveProcess">
As you mentioned, you can monitor the CancellationToken to abort the separate worker threads, so they stop when the service stops.
More info is in the documentation about the application model and about configuring the process activation mode.
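The Service Fabric SDK in question is .NET, but the pattern the answer describes, tying background worker threads to the service instance's cancellation signal, is language-agnostic. Below is a hedged, illustrative Java sketch of that pattern only, not of the Service Fabric API; all names are invented.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: the general pattern of tying background cleanup threads
// to the lifetime of the owning service instance.
public class CleanupWorkers implements AutoCloseable {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void start() {
        executor.submit(() -> {
            // Exit promptly when the owning service instance is shutting down.
            while (!Thread.currentThread().isInterrupted()) {
                // ... perform one round of cleanup ...
                try {
                    TimeUnit.SECONDS.sleep(30);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag and stop
                }
            }
        });
    }

    // Call this from the shutdown path (e.g. when the cancellation token fires),
    // so the worker cannot outlive its service instance inside a shared process.
    @Override
    public void close() throws InterruptedException {
        executor.shutdownNow();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}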

What happens when Eureka instance skips a heartbeat against a Eureka server with self preservation turned off?

Consider this set-up:
Eureka server with self-preservation mode disabled, i.e. enableSelfPreservation: false
2 Eureka instances each for 2 services (say service#1 and service#2), 4 instances in total.
One of the instances (say srv#1inst#1, an instance of service#1) sent a heartbeat, but it did not reach the Eureka server.
AFAIK, the following actions take place in sequence on the server side:
ServerStep1: Server observes that a particular instance has missed a heartbeat.
ServerStep2: Server marks the instance for eviction.
ServerStep3: Server's eviction scheduler (which runs periodically) evicts the instance from registry.
Now on the instance (srv#1inst#1) side:
InstanceStep1: It skips a heartbeat.
InstanceStep2: It realizes the heartbeat did not reach the Eureka server. It retries with exponential back-off.
AFAIK, eviction and registration do not happen immediately; the Eureka server runs a separate periodic scheduler for each task.
I have some questions related to this process:
Are the sequences correct? If not, what did I miss?
Is the assumption about the eviction and registration schedulers correct?
An instance of service#2 requests a fresh registry copy from the server right after ServerStep2.
Will srv#1inst#1 be in the fresh registry copy, because it has not been evicted yet?
If yes, will srv#1inst#1 be marked UP or DOWN?
The retry request from InstanceStep2 of srv#1inst#1 reaches the server right after ServerStep2.
Will there be an immediate change in the registry?
How will that affect the response to the service#2 instance's request for a fresh registry copy? How will it affect the eviction scheduler?
This question was answered by qiangdavidliu in one of the issues of eureka's GitHub repository.
I'm adding his explanations here for sake of completeness.
Before I answer the questions specifically, here's some high level information regarding heartbeats and evictions (based on default configs):
instances are only evicted if they miss 3 consecutive heartbeats
(most) heartbeats do not retry; they are best-effort every 30s. The only time a heartbeat will retry is if there is a thread-level error on the heartbeating thread (i.e. Timeout or RejectedExecution), but this should be very rare.
Let me try to answer your questions:
Are the sequences correct? If not, what did I miss?
A: The sequences are correct, with the above clarifications.
Is the assumption about the eviction and registration schedulers correct?
A: The eviction is handled by an internal scheduler. The registration is processed by the handler thread for the registration request.
An instance of service#2 requests a fresh registry copy from the server right after ServerStep2.
Will srv#1inst#1 be in the fresh registry copy, because it has not been evicted yet?
If yes, will srv#1inst#1 be marked UP or DOWN?
A: There are a few things here:
until the instance is actually evicted, it will be part of the result
eviction does not involve changing the instance's status, it merely removes the instance from the registry
the server holds a 30s cache of the state of the world, and it is this cache that is returned. So the exact result as seen by the client, in an eviction scenario, still depends on where it falls within the cache's update cycle.
The retry request from InstanceStep2 of srv#1inst#1 reaches the server right after ServerStep2.
Will there be an immediate change in the registry?
How will that affect the response to the service#2 instance's request for a fresh registry copy? How will it affect the eviction scheduler?
A: Again, a few things:
When the actual eviction happens, we check each evictee's timestamp to see whether it is eligible to be evicted. If an instance manages to renew its heartbeat before this event, it is no longer a target for eviction.
The 3 events in question (evaluation of eviction eligibility at eviction time, updating the heartbeat status of an instance, and generation of the result returned to read operations) all happen asynchronously, and their outcome depends on evaluating the criteria described above at execution time.
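To put rough numbers on the timing discussed above, here is a back-of-the-envelope sketch using commonly cited Eureka defaults (30s heartbeats, a 90s lease expiration, a periodic eviction task, a 30s server response cache, a 30s client registry fetch, and a 30s Ribbon server-list refresh). The property names and default values in the comments are assumptions to check against your version, and the figure is a worst-case bound rather than a typical value.

// Back-of-the-envelope worst case for how long a dead instance can remain
// visible to other clients, using commonly cited Eureka defaults (values assumed):
public class EurekaStalenessEstimate {

    public static void main(String[] args) {
        int leaseExpirationSec = 90; // eureka.instance.leaseExpirationDurationInSeconds (3 x 30s heartbeats)
        int evictionTimerSec   = 60; // eureka.server.evictionIntervalTimerInMs (assumed default 60000 ms)
        int responseCacheSec   = 30; // eureka.server.responseCacheUpdateIntervalMs (assumed default 30000 ms)
        int clientFetchSec     = 30; // eureka.client.registryFetchIntervalSeconds
        int ribbonRefreshSec   = 30; // ribbon.ServerListRefreshInterval (assumed default 30000 ms)

        int worstCaseSec = leaseExpirationSec + evictionTimerSec
                + responseCacheSec + clientFetchSec + ribbonRefreshSec;

        System.out.println("Worst-case staleness is roughly " + worstCaseSec + "s");
    }
}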

Azure Service Fabric

Please help me understand: is there any option in Azure Service Fabric to delay deprovisioning? I have a microservice application hosted in Service Fabric, with instances distributed across different nodes. If I disengage/deprovision the service from the portal, can Service Fabric internally check whether any transaction is in progress on any of the instances and, if one is, wait for it to complete? I would also like to know: if Microsoft does not provide such a feature, is there any PowerShell command to check the instance status?
Thanks
I assume that by "disengage/deprovision the service from portal" you are referring to deleting the service via the Service Fabric Explorer web app (perhaps via a link followed from the portal). Please correct me if this is wrong.
To answer your question directly, the framework will not wait for in-flight operations to complete during a service delete. Every replica for the service will lose its read and write permissions, causing all in-flight operations to fail. We do not offer a way to stall during this step in order to, for example, allow currently open transactions to be completed.
The reason we do not offer this semantic is that service deletion is expected to be rare or permanent, and delaying deletion for the final operation doesn't enable any additional scenarios. In either case, if a client is attempting operations on a service being deleted, either:
The last client operation may fail due to delete racing and revoking read/write permissions
Every subsequent client operation will fail due to the service no longer existing
or
The last client operation will succeed due to deletion being delayed
Every subsequent client operation will fail due to the service no longer existing
The expectation is that any client or dependent service should have already been updated or deleted prior to deleting the service they depend on, as you are making the permanent decision that this service should no longer exist.

Clustered, HA Distributed Transaction Manager

I'm looking for specific product/technology or any proposed solution for the following problem:
I need a JTA-compliant transaction manager that can enlist XAResources via resource adapters and perform two-phase commit
It should be transparently available in JBoss AS/WildFly
It should be clustered with high availability for:
Transaction manager itself
Application server (JBoss) with applications as clients for TM deployed at AS
As "clustered" I mean not TM clustering, but client clustering sharing the same transaction: e.g. transaction begins on one JBoss server, then continues on second and is committed/rolled back on third. So the underlying resource (Database, enterprise bus, messaging) see all the requests from several app-servers as ONE transaction
As "high-availability" I mean that any component involved in transaction work execution could have a standby/hot-active instance that could complete/rollback work in case of main instance out of order. This include:
Transaction manager itself (it should not rely on one instance running, all transaction info should be replicated on-line within cluster)
Transaction clients (application running on JBoss instance which is processing transactonal call should fail-over on other JBoss instance in case of server outage)
I can't get the JTS catch in terms of work with XA resources (not in terms of work with saved transactional objects) and have not yet achieved any success in setting up JTS in cluster/HA. May be there is an issue that transaction could be managed by only one instance of TM and if it fails the transaction is buried until server restarted.
I don't know whether what I'm looking for is an utopia or whether I an not on the right way at all :)
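For reference, the single-node JTA/XA programming model the question builds on looks roughly like the sketch below; the JNDI name and the XAResource arguments are placeholders, and error handling is simplified. The part the question is actually asking about, sharing and recovering this transaction across several JBoss nodes, is exactly what this sketch does not cover.

import javax.naming.InitialContext;
import javax.transaction.Status;
import javax.transaction.Transaction;
import javax.transaction.TransactionManager;
import javax.transaction.xa.XAResource;

// Minimal sketch of the single-node JTA programming model referred to above:
// the transaction manager coordinates two-phase commit over the enlisted XAResources.
public class TwoPhaseCommitSketch {

    public void doWork(XAResource dbResource, XAResource jmsResource) throws Exception {
        // Traditional JNDI name for the transaction manager in JBoss/WildFly (may vary by version)
        TransactionManager tm = (TransactionManager)
                new InitialContext().lookup("java:/TransactionManager");

        tm.begin();
        boolean committed = false;
        try {
            Transaction tx = tm.getTransaction();
            tx.enlistResource(dbResource);  // e.g. from an XADataSource connection
            tx.enlistResource(jmsResource); // e.g. from an XA-capable JMS session

            // ... do transactional work against both resources ...

            tm.commit();                    // the TM drives prepare + commit (2PC)
            committed = true;
        } finally {
            if (!committed && tm.getStatus() != Status.STATUS_NO_TRANSACTION) {
                tm.rollback();
            }
        }
    }
}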