LiveRebel Update Strategy - deployment

I am trying to use LiveRebel in my production environment. After configuring most parts, I tried to update my application from, let's say, version 1.1 to 1.3.
Does this update strategy mean that LiveRebel requires two server installations on two physical IP addresses? Can I have two servers on two virtual IP addresses?

Rolling restarts use request routing to achieve zero downtime for the users. Sessions are first drained by waiting for old sessions to expire and routing new ones to an identical application on another server. When all sessions are drained, the application is updated while the other server handles the requests.
So, as you can see, for zero downtime you need an additional server to handle the requests while the application is updated. A full restart doesn't have that requirement, but it results in downtime for users.
As for the question about IPs: as long as the two servers (or virtual machines) can see each other, it doesn't really make much difference.
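The drain-then-update sequence described above can be sketched as a small simulation. This is only an illustration of the idea, not LiveRebel's actual implementation: the `Server` class, `route`, `rolling_update` and the `expire_sessions` callback are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model of one application server behind the router.
    name: str
    version: str
    sessions: set = field(default_factory=set)
    accepting: bool = True  # False while the server is being drained


def route(servers, session_id):
    """Return the server for a session: sticky if known, else least loaded."""
    for server in servers:
        if session_id in server.sessions:
            return server
    candidates = [s for s in servers if s.accepting]
    if not candidates:
        # With a single server this is the downtime a full restart causes.
        raise RuntimeError("no server is accepting new sessions")
    target = min(candidates, key=lambda s: len(s.sessions))
    target.sessions.add(session_id)
    return target


def rolling_update(servers, new_version, expire_sessions):
    """Update servers one at a time, draining each before touching it."""
    for server in servers:
        server.accepting = False   # new sessions now go to the other server
        expire_sessions(server)    # stand-in for waiting on session expiry
        if server.sessions:
            raise RuntimeError(f"{server.name} still has live sessions")
        server.version = new_version
        server.accepting = True


if __name__ == "__main__":
    farm = [Server("a", "1.1"), Server("b", "1.1")]
    route(farm, "user-1")
    route(farm, "user-2")
    rolling_update(farm, "1.3", lambda s: s.sessions.clear())
    print([(s.name, s.version) for s in farm])
```

With only one server in the list, `route` has nowhere to send new sessions during the drain, which is why zero downtime needs the second server.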

Related

restarting a multi tier server architecture

My project has 4 servers, 2 on one layer and 2 on another. I use a context switch to load balance each layer so that requests are shared between its two servers. Two servers sit in the presentation tier and the other two sit in the application tier (which we also call the business tier). The presentation tier depends on the application tier. My question is: if one of the servers in the application tier fails to start but the other three start up correctly, can you restart just the application server that failed, or do you have to restart all 4 servers? We are using JBoss on these servers, if that helps. If more info is needed, please ask.
I did some tests and, to reiterate what alpha mentioned in a comment: yes, you can restart just one application tier server without restarting all of the others. I did notice (I don't know whether it's a configuration thing or a JBoss thing) that after you restart an application tier server, most transactions tend to hit the other application tier server that wasn't restarted. I don't know why this happens, but it isn't a problem: the few transactions that do reach the restarted server work fine, and after some time the balance returns to 50-50.

JMeter throughput drops when hitting Amazon ELB

I am hosting a web application on Amazon's AWS servers. I am currently load testing the application with JMeter. My main problem is that when I go through an Elastic Load Balancer (ELB) rather than hitting the servers directly, I seem to hit a cap on my throughput.
If I hit my web application directly, I am able to achieve a throughput of 50 RPS per server.
If I hit my web application via Amazon's ELB, I am only able to achieve a maximum throughput of 50 RPS in total.
I was wondering if anyone else has experienced similar behavior when load testing with JMeter via Amazon's ELB.
For more context my web application is a REST application which allows users to download content (~150 kb) via HTTP requests.
I am running JMeter with the flag "-Dsun.net.inetaddr.ttl=0" and with 10 threads. I have tried running these tests with multiple clients on different machines.
Thanks for any help in advance.
Load balancers can be tricky to test because they may route traffic differently depending on its origin. The most common way to identify the origin of a request and send it to the same host that served the previous request is a cookie. Look into the HTTP Cookie Manager to handle your cookies correctly, and make sure that you have different ones for each testing thread or thread group (depending on your use case). Another flaky area is the origin host IP. You may need to bind each testing thread to a different IP address in order to hit different servers behind the load balancer. There can also be DNS issues with Amazon load balancers. See the useful guide on how to test Amazon ELBs.
The most probable cause is DNS caching by JMeter. The ELB returns the IPs of additional servers depending on how autoscaling is set up, but JMeter does not use these additional servers. This problem can be solved by ensuring that JMeter does not cache DNS results...
The ELB endpoint is a name, not an IP address, and can suffer from DNS caching. Make sure you use "-Dsun.net.inetaddr.ttl=0" when starting JMeter.
http://wiki.apache.org/jmeter/JMeterAndAmazon
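To check whether DNS caching could be the issue, it can help to see how many addresses the ELB name resolves to right now. The following is a minimal sketch using Python's standard `socket.getaddrinfo`; the helper name `resolve_all` is made up for this example, and the demo resolves `localhost` only so that it runs offline. In practice you would pass your own ELB hostname.

```python
import socket


def resolve_all(hostname, port=80):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Replace "localhost" with the DNS name AWS gave your ELB.
    for address in resolve_all("localhost"):
        print(address)
```

If the name resolves to several addresses but a packet capture shows the load generator talking to only one of them, a cached DNS result is a likely suspect.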
A really late response, and slightly different than the original question, but I hope this can help others as it took me a while to get it all straight. My original problem was not reduced throughput as a result of the ELB, but the introduction of HTTP 503 errors. Actually, the ELB increased my throughput as compared to querying the web application directly, though even with 1 hour tests, the results were sporadic to say the least.
First, the ELB has two-stage load balancing going on. The first stage balances across the ELBs themselves. That's done by associating multiple IP addresses with the hostname provided by AWS for the ELB you provision. The second stage is then, of course, across your application instances behind the ELB.
Without trying to offend the SO gods, this is a really helpful article.
https://blazemeter.com/blog/dns-cache-manager-right-way-test-load-balanced-apps
The most helpful information in there was to use the DNS Cache Manager module in JMeter. This will query multiple DNS servers, and wipe out your DNS cache.
I implemented that module and then set up Wireshark, filtering on the two IP addresses belonging to the ELB hostname. Sure enough, it was querying both IP addresses, though it clearly favored one over the other.
That didn't make a big difference, at least not over short tests.
The real difference (2-3 times more throughput) came when I tweaked the ELB health check settings. I initially had a high error rate; however, after reducing the unhealthy threshold and the interval between health checks, my error rates dropped dramatically.
Additionally, whereas all my other tests had been 60 - 90 minutes in duration, this one was 8 hours. I started out with decent throughput and it then quickly dropped (by about 2/3). After about 20 minutes or more, the throughput then started ticking back up and by the end of the test, it had sustained throughput of about 5 times what I was getting without the ELB (which was similar to what the throughput was when it dropped shortly after beginning this test).

How to make a RESTful service truly highly available with a hardware load balancer

When we have a cluster of machines behind a load balancer (LB), hardware load balancers generally use persistent connections.
Now, when we need to deploy an update to all machines (a rolling update), the way to do it is to take one machine out of rotation and watch until no more requests are sent to that server via the LB. Once the app reaches a no-request state, we update it manually.
With 70-80 servers in the picture, this becomes very painful.
Does anyone have a better way of doing it?
70-80 servers is a very horizontally scaled implementation... good job! Better is a very relative term; hopefully one of these suggestions counts as "better".
1. Implement an intelligent health check for the application, with the ability to adjust the health check while the application is running. What we do is have the health check start failing while the application is still running just fine. This allows the load balancer to automatically take the system out of rotation. Our stop scripts query the load balancer to make sure that the system is out of rotation and then shut it down normally, which allows the existing connections to drain.
2. Batch multiple groups of systems together. I am assuming that you have 70 servers to handle peak load. This means that you should be able to restart several at a time. A standard way to do this is to implement a simple token granting service with a maximum of 10 tokens. Have your shutdown scripts check out a token before continuing.
3. Another way to do this is with blue/green deploys. That means that you have an entire second server farm, and once the second server farm is updated you switch load balancing to point to the new server farm.
4. This is an alternative to option 3. Install both versions of the app on the same servers and then have an internal proxy service (like haproxy) switch the connections between the versions of the app that are deployed. For example:
   haproxy listening on 8080
   app version 0.1 listening on 9001
   app version 0.2 listening on 9002
   Once you are happy with the deploy of app version 0.2, switch haproxy to send traffic to 9002. When you release version 0.3, switch load balancing back to 9001, and so on.
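The token idea from the batching suggestion can be sketched in a few lines. This is an in-process stand-in only: a real token granting service would be a small network service shared by all shutdown scripts. The names `TokenService`, `restart_server` and `rolling_restart` are hypothetical, and the `time.sleep` call stands in for the real work of failing the health check, draining, updating and restarting.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_TOKENS = 10  # the answer's suggestion: at most 10 servers down at once


class TokenService:
    """Hands out a limited number of tokens and records peak usage."""

    def __init__(self, max_tokens):
        self._sem = threading.BoundedSemaphore(max_tokens)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def __enter__(self):
        self._sem.acquire()          # blocks until a token is free
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.in_use -= 1
        self._sem.release()          # give the token back
        return False


def restart_server(name, tokens, updated):
    with tokens:                     # check out a token before continuing
        time.sleep(0.01)             # placeholder for drain + update + restart
        updated.append(name)


def rolling_restart(names, max_tokens=MAX_TOKENS):
    tokens = TokenService(max_tokens)
    updated = []
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # list() forces completion and surfaces any worker exception.
        list(pool.map(lambda n: restart_server(n, tokens, updated), names))
    return updated, tokens.peak


if __name__ == "__main__":
    done, peak = rolling_restart([f"srv{i:02d}" for i in range(70)])
    print(len(done), "servers updated, peak concurrency", peak)
```

All 70 restarts are requested at once, but the semaphore ensures that no more than `MAX_TOKENS` of them are in progress at any moment.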

JBoss multiple instances of a server, multiple ports in production environment not recommended?

The following document says:
This is easier to do and does not require a sysadmin. However, it is not the preferred approach for production systems for the reasons listed above. This approach is usually used in development to try out clustering behavior.
What are the risks of this approach in a production environment? In WebLogic it is pretty common, and I have seen a few production environments running with multiple ports (managed servers).
https://community.jboss.org/wiki/ConfiguringMultipleJBossInstancesOnOnemachine
The wiki clearly answers that question. Here is the text from the wiki for your reference:
Where possible, it is advised to use a different ip address for each instance of JBoss rather than changing the ports or using the Service Binding Manager for the following reasons:
When you have a port conflict, it makes it very difficult to troubleshoot, given a large amount of ports and app servers.
Too many ports makes firewall rules too difficult to maintain.
Isolating the IP addresses gives you a guarantee that no other app server will be using the ports.
Each upgrade requires that you go in and reset the binding manager again. Most upgrades will upgrade the conf/jboss-service.xml file, which has the Service Binding Manager configuration in it.
The configuration is much simpler. When defining new ports (either through the Service Binding Manager or by going in and changing all the ports in the configuration), it's always a headache trying to figure out which ports aren't taken already. If you use a NIC per JBoss instance, all you have to change is the IP address binding argument when executing the run.sh or run.bat. (-b )
Once you get 3 or 4 applications using different ports, the chances really increase that you will step on another one of your application's ports. It just gets more difficult to keep ports from conflicting.
JGroups will pick random ports within a cluster to communicate. Sometimes when clustering, if you are using the same ip address, two random ports may get picked in two different app servers(using the binding manager) that conflict. You can configure around this, but it's better not to run into this situation at all.
On the whole, having an individual IP address for each instance of an app server causes fewer problems (some of those problems are mentioned here, some aren't).

MSMQ redundancy

I'm looking into WCF/MSMQ.
Does anyone know how to handle redundancy with MSMQ? It is my understanding that the queue sits on the server, but what if the server goes down and is not recoverable? How does one prevent the messages from being lost?
Any good articles on this topic?
There is a good article on using MSMQ in the enterprise here.
Tip 8 is the one you should read.
"Using Microsoft's Windows Clustering tool, queues will failover from one machine to another if one of the queue server machines stops functioning normally. The failover process moves the queue and its contents from the failed machine to the backup machine. Microsoft's clustering works, but in my experience, it is difficult to configure correctly and malfunctions often. In addition, to run Microsoft's Cluster Server you must also run Windows Server Enterprise Edition—a costly operating system to license. Together, these problems warrant searching for a replacement.
One alternative to using Microsoft's Cluster Server is to use a third-party IP load-balancing solution, of which several are commercially available. These devices attach to your network like a standard network switch, and once configured, load balance IP sessions among the configured devices. To load-balance MSMQ, you simply need to setup a virtual IP address on the load-balancing device and configure it to load balance port 1801. To connect to an MSMQ queue, sending applications specify the virtual IP address hosted by the load-balancing device, which then distributes the load efficiently across the configured machines hosting the receiving applications. Not only does this increase the capacity of the messages you can process (by letting you just add more machines to the server farm) but it also protects you from downtime events caused by failed servers.
To use a hardware load balancer, you need to create identical queues on each of the servers configured to be used in load balancing, letting the load balancer connect the sending application to any one of the machines in the group. To add an additional layer of robustness, you can also configure all of the receiving applications to monitor the queues of all the other machines in the group, which helps prevent problems when one or more machines is unavailable. The cost for such queue-monitoring on remote machines is high (it's almost always more efficient to read messages from a local queue) but the additional level of availability may be worth the cost."
Not to be snide, but you kind of answered your own question. If the server is unrecoverable, then you can't recover the messages.
That being said, you might want to back up the message folder regularly. This TechNet article will tell you how to do it:
http://technet.microsoft.com/en-us/library/cc773213.aspx
Also, it will not back up express messages, so that is something you have to be aware of.
If you prefer, you might want to store the actual messages for processing in a database upon receipt, and have the service be the consumer in a producer/consumer pattern.
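The last suggestion, persisting each message on receipt and processing it later, can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` rather than WCF/MSMQ, and the table layout and the function names `open_store`, `store_on_receipt` and `process_pending` are assumptions made for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message table. Use a file path for durability."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               body TEXT NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn


def store_on_receipt(conn, body):
    """Producer side: persist the message before acknowledging it."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    return cursor.lastrowid


def process_pending(conn, handler):
    """Consumer side: handle unprocessed messages in arrival order."""
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE processed = 0 ORDER BY id"
    ).fetchall()
    handled = 0
    for message_id, body in rows:
        handler(body)  # if this raises, the row stays unprocessed
        with conn:
            conn.execute(
                "UPDATE messages SET processed = 1 WHERE id = ?", (message_id,)
            )
        handled += 1
    return handled


if __name__ == "__main__":
    store = open_store()
    store_on_receipt(store, "order-1")
    store_on_receipt(store, "order-2")
    print(process_pending(store, print), "messages processed")
```

A message is marked as processed only after its handler returns, so a crash in the middle leaves it in the table for the next run. The database then becomes the component that has to be backed up or replicated.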