What do 'Renews' and 'Renews threshold' mean in Eureka - netflix-eureka

I'm new to Eureka and I see this information on the home page of my Eureka server (localhost:8761/). I didn't find any explanation of 'Renews' and 'Renews threshold' in the official docs. Could anyone please explain these terms? Thanks!

Hope it helps:
Renews: the number of heartbeats (lease renewals) the server received from clients in the last minute.
Renews threshold: the minimum number of renewals per minute the server expects to receive. It acts as the switch for Eureka's "self-preservation mode": if "Renews" falls below "Renews threshold", self-preservation mode is turned on.
Self-preservation mode:
When the Eureka server comes up, it tries to get all of the instance registry information from a neighboring node. If there is a problem getting the information from a node, the server tries all of the peers before it gives up. If the server is able to successfully get all of the instances, it sets the renewal threshold that it should be receiving based on that information. If at any time the renewals fall below the percentage configured for that value (below 85% within 15 minutes), the server stops expiring instances to protect the current instance registry information.
At Netflix, the above safeguard is called self-preservation mode and is primarily used as a protection in scenarios where there is a network partition between a group of clients and the Eureka server. In these scenarios, the server tries to protect the information it already has. In the case of a mass outage, this may cause clients to get instances that do not exist anymore. Clients must make sure they are resilient to the Eureka server returning an instance that is non-existent or unresponsive. The best protection in these scenarios is to time out quickly and try other servers.
For more details, please refer to the Eureka wiki.
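As a rough illustration of how the two dashboard numbers relate, here is a minimal Python sketch. It assumes the classic Eureka behaviour of expecting two heartbeats per instance per minute (a 30-second renewal interval) and the 85% figure quoted above. The function names are made up for illustration and are not part of Eureka, and newer server versions may compute the threshold slightly differently.

```python
import math

def renews_threshold(registered_instances, percent=0.85, renews_per_instance_per_min=2):
    """Minimum renewals per minute the server expects (assumed formula)."""
    return math.floor(registered_instances * renews_per_instance_per_min * percent)

def self_preservation_on(renews_last_min, threshold):
    """Self-preservation is triggered when actual renewals drop below the threshold."""
    return renews_last_min < threshold

threshold = renews_threshold(10)              # 10 instances * 2 * 0.85
print(threshold)                              # 17
print(self_preservation_on(20, threshold))    # all 10 instances heartbeating -> False
print(self_preservation_on(14, threshold))    # 3 instances stopped renewing -> True
```

So with 10 registered instances, losing the heartbeats of two of them at once (16 renewals per minute) would already be enough to cross the assumed threshold of 17.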

Related

How to tune Netflix eureka self preservation to handle autoscaling?

The self-preservation feature, which never expires, does not look friendly to cluster auto-scaling.
When we scale down our services after the load drops, those shut-down instances could trigger self-preservation.
As I understand it, self-preservation tries to tolerate short-term network issues. But there are already settings that let us tune a tolerance window:
eureka.instance.lease-expiration-duration-in-seconds = 90
eureka.instance.lease-renewal-interval-in-seconds = 30
I have come across advice not to turn self-preservation off, but it seems to bring more pain than gain. Am I missing something?
First, you need to distinguish between a normal shutdown and an unclean termination of a Eureka client. Self-preservation mode only cares about unclean termination.
Namely, when you scale down your servers, if your application shuts down normally (unregisters), self-preservation mode will not be activated.
If you're using a Spring Cloud based Eureka client, this unregistration happens when the application shuts down. The problem is that some Spring Cloud releases have an issue with sending the shutdown (Eureka unregister) message. So, to be sure, send unregister messages for the scaled-down instances to the Eureka server via its REST API right after scaling down.
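For the REST-based unregistration mentioned above, a minimal Python sketch might look like this. It assumes the Eureka REST operation `DELETE {base}/apps/{appID}/{instanceID}`, with a Spring Cloud style base URL ending in `/eureka` (plain Netflix Eureka deployments typically use a `/eureka/v2` prefix instead). The server URL, application name and instance ID are hypothetical placeholders.

```python
import urllib.request
from urllib.parse import quote

def deregister_url(eureka_base, app_id, instance_id):
    """Build the URL of the instance resource to delete (path layout is an assumption)."""
    base = eureka_base.rstrip("/")
    return f"{base}/apps/{quote(app_id, safe=':')}/{quote(instance_id, safe=':')}"

def deregister(eureka_base, app_id, instance_id, timeout=5):
    """Send the DELETE request and return the HTTP status code."""
    request = urllib.request.Request(
        deregister_url(eureka_base, app_id, instance_id), method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Hypothetical values for illustration only; nothing is sent here.
print(deregister_url("http://localhost:8761/eureka", "MY-SERVICE", "host1:my-service:8080"))
```

Calling `deregister(...)` for each terminated instance right after the scale-down would remove it from the registry instead of leaving it to miss heartbeats.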
Another possible approach is simply to decrease the threshold for self-preservation:
eureka:
  server:
    renewal-percent-threshold: 0.50
One more thing.
You need to be careful when changing the eureka.instance.leaseRenewalIntervalInSeconds value. The original Eureka server source code assumes this value is 30 seconds when it calculates the threshold for self-preservation mode. I'm not sure whether this hard-coded part still exists in the latest Spring Cloud release, so you should double-check.
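To see why the renewal interval matters, here is a small Python sketch. It assumes, as described above, that the server computes its threshold as if every instance renewed every 30 seconds (two renewals per minute) and that the default percentage is 0.85. The helper names are illustrative only.

```python
def expected_threshold(instances, percent=0.85):
    # Server side: assumes 2 renewals per instance per minute (30 s interval).
    return int(instances * 2 * percent)

def actual_renews_per_min(instances, renewal_interval_seconds):
    # Client side: what healthy instances really send with the configured interval.
    return instances * 60 // renewal_interval_seconds

for interval in (30, 60):
    renews = actual_renews_per_min(10, interval)
    threshold = expected_threshold(10)
    print(interval, renews, threshold, renews < threshold)
```

Under these assumptions, ten perfectly healthy instances renewing every 60 seconds deliver only 10 renewals per minute against a threshold of 17, so the server would sit in self-preservation mode permanently.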

Asterisk HA and SIP registration

I set up an Active/Passive cluster with Pacemaker/Corosync/DRBD to make an Asterisk server highly available. The solution works perfectly, but when the service fails on one server and starts on the other, all SIP clients registered with the active server are lost. And the passive server shows nothing in the output of:
sip show peers
until clients make a call or register again. One solution is to set the registration interval on the clients to 1 minute or so. Are there other options? For example, would integrating Asterisk with a DBMS help to save this kind of state in a database?
First of all, building clusters without expert knowledge is a bad idea.
You can use the realtime SIP architecture, which saves state in a database. Complexity: average. Note that "sip show peers" also shows nothing for realtime peers.
You can use a memory-duplicating cluster (some solutions exist for Xen), which copies the memory state from one server to the other. Complexity: very high.

BizTalk not tracking send/receive ports

It seems that any new send or receive ports that I create do not display any tracking even if I tick all the tracking boxes. I have an existing application and the receive port and orchestration tracking work, but the send port tracking doesn't.
On the same machine I also tried creating a new application. I created a send and a receive port and got no tracking at all. I did the same thing on a fresh install of BizTalk on another machine and I got tracking, so I'm not crazy.
I've tried ...
ticking every box in tracking for the receive, orch, send ports.
creating a new host specifically for tracking
recreating the original host with a different name
sql service is running
reboot system
reboot host instances
restart biztalk services
nothing shows in event logs
all SQL jobs OK except for 'monitor biztalk', which complains about 7 orphaned dta.
can't see anything in particular that stands out from MBV except for the above-mentioned orphaned dta.
In addition to Mike's answer:
You need to ensure that at least one of your hosts is enabled for tracking. In BizTalk Administrator, under Platform Settings > Hosts, select the host and enable tracking (the list of hosts also shows which hosts currently have tracking enabled).
You can also verify that the tracking SQL Agent job is running by looking directly at the database:
select count(*) from BizTalkMsgBoxDb.dbo.Spool (NOLOCK)
select count(*) from BizTalkDTADb.dbo.Tracking_Parts1 (NOLOCK)
Basically, spool should be a fairly low number (< 10 000), and should come back to a static level after a spike in messages, unless your suspended orchs are growing.
And new messages should be copied across from the MessageBox to DtaDb.TrackingParts every minute, so Tracking_Parts1 should grow a few records every 60-120 seconds after processing new messages, although they will be eventually purged / archived in line with your tracking archiving / purge strategy.
In a Dev environment, the more tracking the merrier, as HAT (the orchestration debugger) will give you more information the more you track. However, in a PROD environment, you would typically want to minimize tracking to improve performance and reduce disk overhead. We just track one copy, viz 'before processing' on the receive and 'after processing' on the send ports to our partners, and nothing at all on internal Ports and Orchs. This allows us to provide sufficient evidence of data received and sent.
This post might help some people: http://learningcenter2.eworldtree.net:7090/Lists/Posts/Post.aspx?ID=78
For message tracking to work, among other factors, make sure that the "Message send and receive events" checkbox in the corresponding pipeline is enabled.
Please take a look at these two articles, What is Message Tracking? and Insight into BizTalk Server message tracking. The first article has an item of interest for you and I'll quote it below and the second should just solidify what you're trying to do.
The SQL Server Agent service must be running on all MessageBox databases. The TrackedMessages_Copy_ job makes message bodies available to tracking queries and WMI. To efficiently copy the message bodies, they remain in the MessageBox database and are periodically copied to the BizTalk Tracking (BizTalkDTADb) database by the TrackedMessages_Copy_ job. Having the SQL Server Agent service running is also a prerequisite for the archiving and purging process to work correctly.
Are you using a default pipeline? Have you checked the tracking check boxes on them? There is some bug where the pipeline tracking is disabled for default pipelines.
More info here:
http://blog.ibiz-solutions.se/integration/biztalk-global-pipeline-tracking-disabled-unexpectedly/
Please ensure that the required tracking is enabled in the properties of the send pipeline used by your send port. If message body tracking is disabled on the send pipeline, nothing is tracked on the send port either.

Improve on this pattern for handling how a mobile app knows how to find its server

I'm considering how to identify server(s) to an app on a mobile device that utilises a wcf/web service.
(1) I anticipate that, all going well, I will need to migrate the server between hosts from time to time to handle load. I'd like to be able to do this without service disruption.
(2) I also anticipate that, all going well, I will want to improve scalability by separating website hosting from wcf/web service hosting, which requires an addressing change on the client. Until the app proves to get traction, the server deployment will be shared on the same domain.
Rereleasing the client for this purpose seems complicated at a glance, as you can't force updates on consumers, and it's non-trivial to distinguish between no data connection/server down and a server that has moved.
I was thinking this would be a solved problem, so thought to bounce it off the community for better ideas.
What I've come up with so far is as follows.
The client needs a primary and a secondary URL to reference the wcf/web service. This caters for host provider changes. The old host can continue to run the service during the handover period. Once the service is successfully deployed to the new host, the secondary/old host can be disabled. The wcf/web service is essentially stateless, so that simplifies matters somewhat.
At periodic intervals the client will initiate a service to request the primary and secondary URLs be supplied and then caches these. This future proofs the client to be able to (for a period of time) be instructed by the service behind the secondary URL to accept a new URL for future use while the primary URL services requests at the new URL. Once the secondary service has pointed to a new primary, the following periodic update requested from the primary will update the cached secondary URL.
What have I missed? Perhaps it can be simpler?
DNS is probably the way to hide your server moving to a new address. When you change your server, you update your DNS record. Put the new server in place, and once it is up and running, update the DNS record. That's all there is to do, unless I am missing something.
Edited to reflect comments :
To be sure that the change is seamless, you can reduce the TTL of your DNS entries to a very low value so that propagation is fast. For example, if you change your TTL to 5 minutes before switching servers, you can switch your first server off 5 minutes after updating your DNS entries.
So you only need one URL, and this URL never needs to change. It answers both your first and second requirements. No need to reinvent the wheel.
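The timing of such a cutover can be sketched as follows, in Python. This assumes resolvers honour TTLs, and it spells out one step the answer only implies: the lowered TTL takes effect only once records cached under the old, longer TTL have expired. The function name and the example TTL values are illustrative.

```python
from datetime import datetime, timedelta

def cutover_plan(ttl_lowered_at, old_ttl, new_ttl):
    """Return (earliest DNS switch time, earliest time the old server can be retired)."""
    # Caches may still hold the record with the old TTL until this moment.
    earliest_switch = ttl_lowered_at + old_ttl
    # After the switch, caches may keep serving the old address for up to new_ttl.
    earliest_retire = earliest_switch + new_ttl
    return earliest_switch, earliest_retire

switch, retire = cutover_plan(
    ttl_lowered_at=datetime(2024, 1, 1, 12, 0),
    old_ttl=timedelta(hours=24),
    new_ttl=timedelta(minutes=5),
)
print(switch)  # 2024-01-02 12:00:00
print(retire)  # 2024-01-02 12:05:00
```

In practice it is safer to keep the old server running a while longer than the computed minimum, since some resolvers and clients cache addresses beyond the TTL.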

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the dividing up of the input data and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well established heart-beat protocols or that nobody knows about them (which would imply that they aren't so well established after all) in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically I'm going to vote for Matt Davis's answer since it was the closest and he pointed out a good idea to use multicast.
Thank you all for your time~
Distributed Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead-reckoning to provide higher frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it follows that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast since it will span the broadcast domains, provided the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
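A minimal Python sketch of the multicast approach, using only the standard `socket` module, could look like the following. The group address, port, payload and default TTL are arbitrary choices for illustration, and whether multicast actually crosses your routers also depends on the network being configured to route it.

```python
import socket
import struct

GROUP = "239.1.2.3"   # hypothetical administratively scoped multicast group
PORT = 5007           # hypothetical port

def membership_request(group):
    """Build the ip_mreq structure: group address plus 'any' local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))

def make_sender(ttl=4):
    """UDP socket for sending; a TTL above 1 lets packets leave the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    return sock

def make_receiver(group=GROUP, port=PORT):
    """UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock

def send_heartbeat(sock, payload=b"alive", group=GROUP, port=PORT):
    """Send one heartbeat datagram to every member of the group."""
    sock.sendto(payload, (group, port))
```

Each instance would call `send_heartbeat(make_sender())` on a timer and read from `make_receiver()` with `recvfrom()`; the sender never needs to know the receivers' individual addresses.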
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
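The "k*t" rule and the machine ID / reboot count fields can be sketched in Python as below. This is only an illustration: the JSON message layout, the class name and the injectable clock are assumptions, and the transport (UDP broadcast or a list of IPs) is left out.

```python
import json
import time

def encode_heartbeat(machine_id, reboot_count):
    """Serialize a heartbeat carrying the sender's identity and reboot counter."""
    return json.dumps({"machine_id": machine_id, "reboot_count": reboot_count}).encode()

class FailureDetector:
    """Treats a peer as down when no heartbeat arrived for more than k * t seconds."""

    def __init__(self, interval_t, k=3, clock=time.monotonic):
        self.timeout = k * interval_t
        self.clock = clock
        self.peers = {}  # machine_id -> (reboot_count, last_seen)

    def on_heartbeat(self, packet):
        """Record a heartbeat; return True if the peer rebooted since it was last seen."""
        msg = json.loads(packet.decode())
        previous = self.peers.get(msg["machine_id"])
        rebooted = previous is not None and msg["reboot_count"] > previous[0]
        self.peers[msg["machine_id"]] = (msg["reboot_count"], self.clock())
        return rebooted

    def alive(self):
        """Return the sorted IDs of peers heard from within the timeout."""
        now = self.clock()
        return sorted(m for m, (_, seen) in self.peers.items() if now - seen <= self.timeout)

# Usage with a fake clock so the example is deterministic.
now = [0.0]
detector = FailureDetector(interval_t=5, k=3, clock=lambda: now[0])
detector.on_heartbeat(encode_heartbeat("node-a", 1))
detector.on_heartbeat(encode_heartbeat("node-b", 1))
now[0] = 10.0
detector.on_heartbeat(encode_heartbeat("node-b", 1))
now[0] = 20.0
print(detector.alive())  # ['node-b']  (node-a has been silent for 20 s > 15 s)
```

When `on_heartbeat` returns True, the receiver knows the peer restarted and that any work state it previously held for that peer should be considered lost.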
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question, but you might be interested in the way WebLogic Server clustering works under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Redhat includes a built-in port configuration so that you could implement your own Cisco switch using a cheap RedHat box. Google for "virtual ip address" and "cisco content router" for more information.
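The keepalive-plus-round-robin behaviour described above can be imitated in software. The following Python sketch is not how the Cisco switch is implemented; it only illustrates the idea, probing a server with an HTTP HEAD request at selection time (a real device probes periodically in the background). The server addresses are hypothetical.

```python
import http.client
import itertools

def head_ok(host, port=80, path="/", timeout=2):
    """Keepalive probe: True if an HTTP HEAD request returns a status below 500."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status < 500
    except (OSError, http.client.HTTPException):
        return False

class RoundRobinBalancer:
    """Hands out servers in rotation, skipping those whose probe fails."""

    def __init__(self, servers, probe=head_ok):
        self.servers = list(servers)
        self.probe = probe
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        """Return the next healthy server, or None if every probe fails."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.probe(server):
                return server
        return None

# Usage with a fake probe so nothing touches the network.
down = {"10.0.0.2"}   # pretend this server fails its keepalive
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                              probe=lambda s: s not in down)
print([balancer.next_server() for _ in range(4)])
```

With the default `probe=head_ok`, the same class would issue real HEAD requests against the listed hosts.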
In addition to trying hardware load-balancers, you can also try a free-open-source load-balancing software application such as HAProxy, available for Linux and the BSDs.