I am quite new to load balancing scenarios, especially in the cloud. We use load balancers to keep any single server from being overloaded and to avoid a single point of failure. But then how does the load balancer itself scale? Isn't it a single point of failure? Say we have 1000 connections from clients and we distribute them equally across 4 servers using a load balancer; each server then handles 250 connections, but the load balancer has to handle all 1000. So how do we avoid a single point of failure, or distribute that load across machines?

I know that a load balancer may not need to do as much processing as the application servers, since it does no functional processing. But modern load balancers do take on a lot of functionality, such as SSL offload and intelligent routing. How does this work in modern, highly scalable scenarios where we need to handle millions of connections?
Related
I read this article about the API Gateway pattern. I realize that API gateways typically serve as reverse proxies, but this creates a bottleneck: if all requests to an application's public services go through a single gateway, or even a single load balancer in front of multiple replicas of a gateway (perhaps a hardware load balancer, which can handle large amounts of bandwidth more easily than an API gateway), then that single access point is the bottleneck.
I also understand that it is a wide bottleneck, since the gateway simply proxies messages and the gateways and load balancers themselves are not responsible for any processing or querying. However, for a very large application with many users, one would need extremely powerful hardware to absorb the bandwidth passing over the gateway or load balancer, given that every request to every microservice exposed by the gateway travels through that single access point.
If the API gateway instead simply redirected the client to publicly exposed microservices (sort of like a custom DNS lookup), the hardware requirements would be much lower. This is because the messages traveling to and from the API Gateway would be very small, the requests consisting only of a microservice name, and the responses consisting only of the associated public IP address.
I recognize that this pattern would involve greater latency due to increased external requests. It would also be more difficult to secure, as every microservice is publicly exposed, rather than providing authentication at a single entrypoint. However, it would allow for bandwidth to be distributed much more evenly, and provide a much wider bottleneck, thus making the application much more scalable. Is this a valid strategy?
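To make the idea concrete, here is a minimal sketch (Python, purely illustrative, not from the linked article) of such a lookup-style gateway: instead of proxying, it answers a service-name query with that service's public address, and the client then connects directly. The service names and addresses in the registry are made up, and a real deployment would still need authentication, TLS, and caching.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry mapping microservice names to public endpoints.
SERVICE_REGISTRY = {
    "orders":  "203.0.113.10:8443",
    "catalog": "203.0.113.11:8443",
}

class LookupGateway(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests of the form /lookup/<service-name>.
        name = self.path.rsplit("/", 1)[-1]
        endpoint = SERVICE_REGISTRY.get(name)
        if endpoint is None:
            self.send_error(404, "unknown service")
            return
        # The response is tiny: just the name and where to reach it.
        body = json.dumps({"service": name, "endpoint": endpoint}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), LookupGateway).serve_forever()
```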
A DNS/public-IP based approach is not good from a lot of perspectives:
Higher attack surface area, since you have many exposed points and each one needs to be protected
The number of public IPs needed is higher
You may need more DNS entries, with subdomains or separate domains for these APIs
A lot of the time your APIs will run on a root path, but you may want to expose them on a folder path such as example.com/service1, which requires some kind of gateway anyway
Handling SSL certificates for each of these publicly exposed services
Securing every publicly exposed service, rather than a focused set of nodes, becomes a very difficult task
While theoretically it is possible to redirect clients directly to the nodes, there are a few pitfalls.
Security, certificate, and DNS management have been covered by #Tarun.
Issues with High Availability
DNS resolvers cache domain-to-IP mappings fairly aggressively, because these seldom change. If we use DNS to expose multiple instances of services publicly and one of the servers goes down, or we're doing a deployment, resolvers will keep routing requests to the nodes that are down for a fair amount of time. We have no control over external DNS resolvers and their caching policies.
With a reverse proxy, health checks tell us which nodes are down, so we simply stop routing traffic to them.
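As a rough illustration of what those health checks buy you (a toy sketch, not how any particular proxy is implemented; the backend addresses and the /health path are assumptions):

```python
import itertools
import urllib.request

# Backend addresses and the /health path are illustrative assumptions.
BACKENDS = [
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
    "http://10.0.0.13:8080",
]

def healthy_backends(timeout=0.5):
    """Return the subset of backends whose /health endpoint answers 200."""
    alive = []
    for base in BACKENDS:
        try:
            with urllib.request.urlopen(base + "/health", timeout=timeout) as resp:
                if resp.status == 200:
                    alive.append(base)
        except OSError:
            # Down, unreachable, or timed out: skip it this round.
            pass
    return alive

def pick_backend(_counter=itertools.count()):
    """Round-robin over whichever backends are currently healthy."""
    alive = healthy_backends()
    if not alive:
        raise RuntimeError("no healthy backends")
    return alive[next(_counter) % len(alive)]
```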
If a monolithic back-end application gets billions of requests, can we add a load balancer?
If so, how does it work to reduce the load?
In order for a load balancer to be useful, it must be possible for your application to be spread across more than one "backend" server. The "purest" version of this setup is one where the backend servers are totally stateless, have no concept of a "connection" or "session", and every request requires approximately the same amount of work/resources. In this case, you can configure the load balancer to just randomly proxy requests to a pool of backend servers. An example of an application like this would be a static web server.
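A minimal sketch of that "purest" case, assuming two interchangeable stateless backends at made-up addresses; a real load balancer would also forward headers, handle errors, and support methods other than GET:

```python
import random
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Two interchangeable, stateless backends at made-up addresses.
BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]

class RandomProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Any backend will do, because no state ties this client to a server.
        target = random.choice(BACKENDS) + self.path
        with urllib.request.urlopen(target) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RandomProxy).serve_forever()
```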
Next, slightly less pure, would be those applications where the backend server doesn't need any particular state at the beginning of a "connection" or "session", but needs to maintain state while that session continues, so each client must be assigned to the same server for the duration of that session. This slightly complicates things: you then need "sticky" connections, and probably some way to pick the least-loaded server to route new connections to, rather than choosing at random (since sessions will be of different lengths). An SMTP server is an example of this type.
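A sketch of the sticky, least-loaded idea in a few lines; the server names are placeholders and "load" here is simply the count of open sessions:

```python
# Server names are placeholders; "load" is simply the count of open sessions.
SERVERS = ["smtp-1", "smtp-2", "smtp-3"]

session_to_server = {}                   # session id -> assigned server
open_sessions = {s: 0 for s in SERVERS}  # server -> number of open sessions

def route(session_id):
    """Pin a session to one server; new sessions go to the least-loaded one."""
    server = session_to_server.get(session_id)
    if server is None:
        server = min(SERVERS, key=lambda s: open_sessions[s])
        session_to_server[session_id] = server
        open_sessions[server] += 1
    return server

def end_session(session_id):
    """Forget the session so its slot stops counting against the server."""
    server = session_to_server.pop(session_id, None)
    if server is not None:
        open_sessions[server] -= 1
```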
The worst kind of application in this sense is one in which the backend server needs to maintain global state in order to be useful. A database server is the classic example. This kind of application is essentially impossible to load-balance without lots of trade-offs, and these are usually the biggest, baddest servers that typical applications use, because it is often cheaper and easier, in engineering terms, to simply buy the meanest, most expensive hardware available than to deal with the harsh realities of distributed systems, particularly when there are dependent systems (years of accumulated application code) that implicitly make assumptions about data integrity and the like which cannot be met under, for example, the CAP theorem.
So let's say I have the following (obviously simplified) architecture: 10 servers running the same REST API endpoint, and an intermediate API which fields requests and then forwards each one to one of those servers (a load balancer).
Now let's imagine that this is a really big, streaming response. I obviously don't want the data to have to go back through the load balancer, because wouldn't that bog it down and defeat the purpose of the load balancing server? What would be the proper way to implement a load balancing system that delegates a request to a node but does not force the response back through the load balancing server?
Further, are there any REST frameworks on the JVM that implement this?
What you are looking for is called DSR (direct server return); you can Google it a bit. AFAIK most hardware load balancers have this option.
The question is what load balancer are you using? Is it hardware, ELB on AWS, HAProxy?
For example:
http://blog.haproxy.com/2011/07/29/layer-4-load-balancing-direct-server-return-mode/
If you're not really into load balancers, you could try to set this up in two stages: first, the client hits the API and gets the IP of a server; second, the client talks to that server directly. The hard part will be not overloading some servers while leaving others idle (both in the initial setup and when rebalancing workloads as time goes by).
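A sketch of that two-stage lookup, with made-up server addresses; the naive "load" measure (a count of addresses handed out) is exactly the weak spot mentioned above, since it never notices slow or abandoned clients unless they report back:

```python
# Server addresses are placeholders; "load" is just a count of handed-out
# assignments, which is the naive part: it never notices slow or abandoned
# clients unless they call release_server().
API_SERVERS = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]
handed_out = {ip: 0 for ip in API_SERVERS}

def assign_server():
    """Stage one: tell the client which API server to talk to directly."""
    ip = min(API_SERVERS, key=lambda s: handed_out[s])
    handed_out[ip] += 1
    return ip

def release_server(ip):
    """Optional: clients (or a reaper job) report back when they are done."""
    handed_out[ip] = max(0, handed_out[ip] - 1)
```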
I'm wondering, is it possible to have many servers with the same web service deployed and have them communicate with each other?
Can you make a distributed system (with transparency, fault tolerance, and such things) on top of web services, instead of TCP?
Is this a poor idea?
This is very possible to do. You can either purchase a hardware load balancer that directs traffic to one of its configured hosts or, if you are on Windows, use Network Load Balancing.
There are many hardware load balancers on the market, the one I have used and have been pleased with is made by Coyote Point: http://www.coyotepoint.com/
It's not a poor idea, and it is a commonly used way to distribute traffic.
When it comes to streaming big files, it seems like the 1 Gbps link on the nginx box is too small to handle transferring data from 3-4 storage servers.
The cause of my problem is a bottleneck at the proxy: all data goes through the load balancer.
The web servers download files from storage and stream them over HTTP (currently through a single nginx instance acting as the load balancer).
Is it possible to configure nginx so that it does not mediate the data streaming (static and dynamic data would come directly from the web servers) and only balances the requests? If not, what should I use?
A commonly used solution is to add a DNS load-balancing layer; a second solution is to balance in the application layer by handing out links that point directly at the web servers. Neither fully solves the problem, because the per-server load is not monitored, but both add bandwidth capacity by partitioning traffic across these layers.
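As an illustration of the "balancing in the application layer with links" option (this is a Python sketch with placeholder hostnames, not an nginx configuration): the front end answers each download request with an HTTP 302 pointing at one of the web servers, so the file bytes flow directly between client and web server and never cross the balancer's 1 Gbps link.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder hostnames for the web servers that actually hold the files.
WEB_SERVERS = ["http://ws1.example.com", "http://ws2.example.com"]
_next = itertools.count()

class RedirectBalancer(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pick a web server round-robin and send the client there.
        target = WEB_SERVERS[next(_next) % len(WEB_SERVERS)] + self.path
        # Only this tiny 302 response crosses the balancer's link;
        # the actual file is streamed directly by the chosen web server.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RedirectBalancer).serve_forever()
```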