How to balance bandwidth load in file streaming?

When it comes to streaming big files, it looks like the 1 Gbps uplink on our nginx box is too small to handle the combined transfer from 3-4 storage servers.
The cause of my problem is a bottleneck at the proxy: all data goes through the load balancer.
The web servers download files from storage and stream them over HTTP (currently through a single nginx acting as the load balancer).
Is it possible to configure nginx so it doesn't mediate the data streaming (static and dynamic data would come directly from the web servers) and only balances requests? If not, what should I use?

A commonly used solution is an additional layer of DNS load balancing; a second solution is balancing at the application layer, by generating links that point directly at different servers. Neither fully solves the problem, since the load isn't monitored, but both can add bandwidth capacity by partitioning traffic across these layers.
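For the link-based approach, here is a minimal sketch (the hostnames and the hashing scheme are assumptions, not a known setup): the application hashes each file path to pick a web server and emits a direct URL, so the file bytes never pass through the balancer.

```python
import hashlib

# Hypothetical pool of web servers that can serve files directly.
WEB_SERVERS = [
    "http://ws1.example.com",
    "http://ws2.example.com",
    "http://ws3.example.com",
]

def direct_url(file_path: str) -> str:
    """Deterministically map a file path to one server and build a direct link.

    Hashing the path means the same file always resolves to the same server,
    which keeps any per-server caching effective.
    """
    digest = hashlib.sha1(file_path.encode("utf-8")).digest()
    server = WEB_SERVERS[digest[0] % len(WEB_SERVERS)]
    return f"{server}/{file_path.lstrip('/')}"

# The page embeds this URL instead of one pointing at the load balancer.
print(direct_url("/videos/big-file.mp4"))
```

The balancer then only serves the small HTML/API responses containing the links, while the heavy file transfers go straight from the web servers to the clients.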


Splitting incoming traffic between multiple servers?

I currently have a single server (480 GB storage and 400 Mbps bandwidth) for a client's project, and we are quickly running out of storage space, as well as bandwidth when we do HLS video streaming through an nginx server once or twice a week.
We have considered upgrading to two servers with 2 TB storage and 1 Gbps bandwidth each (a bit of future-proofing) to store all their data and to start compensating for the glitchy streams.
As I am not a systems admin, I don't know much about load balancing or the correct procedure for the database and storage. Do I clone the contents of one server to the other and split the traffic? Do I dedicate one server to the database and another to storage?
Any help on which services to use to split traffic, and any best practices, would be much appreciated.
Ideally, you would distribute your video streams from a CDN. That way, the only practical limitation on scaling would be cost. Clients would be able to stream directly from nodes near them without having to hit your origin servers directly or very often. The CDN would cache the HLS segments.
At a minimum, I'd definitely separate your application servers from your video serving. They have different types of load, so you would be wasting money by providing too much CPU to host videos, or too much bandwidth to host an API. Split them up and you can scale independently as required.
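To illustrate letting the CDN carry the segment traffic, here is a minimal sketch (the CDN hostname is a placeholder; a real CDN would be configured to pull segments from the origin on a cache miss). It rewrites the segment URIs in an HLS media playlist so clients fetch the .ts files from the CDN edge instead of the origin:

```python
# Minimal sketch: point HLS segment URIs at a CDN edge instead of the origin.
CDN_BASE = "https://cdn.example.com/hls"  # placeholder hostname

def rewrite_playlist(m3u8_text: str) -> str:
    """Prefix every segment line in a media playlist with the CDN base URL.

    Lines starting with '#' are HLS tags and stay untouched; every other
    non-empty line in a media playlist is a segment URI.
    """
    out = []
    for line in m3u8_text.splitlines():
        if line and not line.startswith("#"):
            out.append(f"{CDN_BASE}/{line}")
        else:
            out.append(line)
    return "\n".join(out)

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts"""
print(rewrite_playlist(playlist))
```

With that in place, the origin only serves each segment once per CDN edge, and the server's uplink stops being the limiting factor for concurrent viewers.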

How to identify a network performance issue?

I am a little confused about my message server's network bottleneck. I can clearly see that the problem is caused by the large number of network operations, but I am not sure why, or how to identify it.
Currently we run our message server on a GCP VM with 4 cores and 8 GB RAM. Redis and Cassandra run on other servers in the same location. The problem happens on the network operations to the Redis server and the Cassandra server.
I need to handle 3000+ requests at once to save data to Redis, and 12000+ requests to the Cassandra server.
My task consumes all my CPU power, and CPU usage drops as soon as I merge the Redis and Cassandra requests into a kind of batch request. The penalty is that I have to delay saving the data.
What I want to know is how to determine the network capability of my system. How many requests per second is a reasonable load? From my testing it is obviously true that the bottleneck is the network operations, but I can't prove it, and I don't know how to estimate a reasonable network usage for my system. Are there tools or anything else that can help me confirm the network problem? Or is this just a misconfiguration of my GCP setup?
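(For reference, the "kind of batch request" described above maps naturally onto Redis pipelining. A minimal sketch with redis-py, where the host and key names are placeholders, coalescing thousands of small writes into one network round trip:)

```python
import redis

# Placeholder address for the Redis server on the internal network.
r = redis.Redis(host="10.0.0.5", port=6379)

def save_batch(messages: dict) -> None:
    """Queue all SETs locally, then flush them in a single round trip."""
    pipe = r.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    for key, payload in messages.items():
        pipe.set(key, payload)
    pipe.execute()  # one network exchange for the whole batch

save_batch({f"msg:{i}": b"payload" for i in range(3000)})
```

Each un-pipelined SET costs a full round-trip time, so 3000 individual requests pay that latency 3000 times; the pipeline pays it roughly once.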
There is a "Monitoring" tab on each instance where you can check graphs of values like CPU, network, and RAM usage.
To check the performance of your instance in more depth, you should use Stackdriver Logging and Monitoring. They store a lot of information about the internal servers and system performance; for that you will need to install the agent on the instance. Stackdriver also stores information about your load balancer, in case you are using one with your web application, which is advisable since it can scale your resources up or down with intelligent autoscaling.
But in order to actually test your network, you will need a third-party tool to put load on it. There are multiple tools for this, such as JMeter.

Is there a way to have a proxied request respond straight to the requestor?

So let's say I have the following (obviously simplified) architecture: I have 10 servers running the same REST API endpoint, and an intermediate API which fields requests and forwards each one to one of the servers (a load balancer).
Now let's imagine that this is a really big, streaming response. I obviously don't want the data to have to go back through the load balancer, since that would bog it down and defeat its purpose. What would be the proper way to implement a load-balancing system that delegates a request to a node but does not force the response back through the load balancer?
Further, are there any REST frameworks on the JVM that implement this?
What you are looking for is called DSR (Direct Server Return); you can Google it a bit. AFAIK most hardware load balancers have this option.
The question is: what load balancer are you using? Is it hardware, ELB on AWS, HAProxy?
For example:
http://blog.haproxy.com/2011/07/29/layer-4-load-balancing-direct-server-return-mode/
If you're not really into load balancers, you could set this up in two stages: first, the client hits the API and gets the IP of a server; second, the client talks to that server directly. The hard part will be avoiding overloading some servers while leaving others idle (both at initial setup and when rebalancing workloads over time).
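The two-stage approach can be sketched at the HTTP level as a plain redirect (Python rather than the JVM, and the backend addresses are placeholders; this isn't DSR, just the application-level fallback described above):

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pool of API servers that can stream responses directly.
BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]

class RedirectBalancer(BaseHTTPRequestHandler):
    """Stage one: answer every request with a redirect to a chosen backend.

    The large streaming response then flows directly from the backend to
    the client (stage two) and never passes through this process.
    """

    def do_GET(self):
        backend = random.choice(BACKENDS)  # naive choice; see the caveat above
        self.send_response(307)  # 307 preserves the HTTP method on redirect
        self.send_header("Location", backend + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RedirectBalancer).serve_forever()
```

The redirector only ever handles tiny responses, so its bandwidth no longer bounds the system; the caveat about uneven backend load still applies.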

Image Uploads - CDN, MongoDB, or NFS?

I have an admin-type system for a website with multiple web servers where users can configure pages and upload images to appear on the page (kind of similar to a CMS). If you already have a MongoDB instance with replica sets set up, what is the preferred way to store these uploads so that failover exists, and why?
1. A CDN, such as Amazon S3 / CloudFront.
2. Store the images in MongoDB? I do this now, and I don't use GridFS because our images are all under 1 MB.
3. Use some type of NFS with some sort of failover setup. If #3, then how do you configure this failover?
I use #2 just fine right now and have used #3 without the failover before. If I use MongoDB as the data store for my website and for serving images, could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
Well, more image requests = more HTTP connections to your web servers = more requests for images from MongoDB = more network traffic.
So, yes, getting more image data from the DB could, in theory, impact getting non-image data. All you need to do is request 1000 images/sec at 1 MB per image (about 1 GB/s, or roughly 8 Gbps) and you'll start seeing lots of network traffic between your MongoDB servers and your web servers.
Note that this isn't a MongoDB limitation, this is a limitation of network throughput.
If you start getting lots of traffic, then the CDN is definitely recommended. If you already have an HTTP page that outputs the image, this should be pretty straightforward.
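To make such an image endpoint CDN-friendly, roughly all that's needed is a cacheable response. A minimal sketch with Flask and pymongo, where the collection name and document schema are assumptions:

```python
from flask import Flask, Response, abort
from pymongo import MongoClient

app = Flask(__name__)
# Assumed schema: one document per image, raw bytes in "data",
# an optional MIME type in "content_type".
images = MongoClient()["mysite"]["images"]

@app.route("/images/<image_id>")
def serve_image(image_id):
    doc = images.find_one({"_id": image_id})
    if doc is None:
        abort(404)
    # A long-lived Cache-Control header lets a CDN (e.g. CloudFront) cache
    # the image, so repeat GETs never reach MongoDB at all.
    return Response(
        doc["data"],
        mimetype=doc.get("content_type", "image/jpeg"),
        headers={"Cache-Control": "public, max-age=86400"},
    )
```

Only the first request per image per CDN edge touches the database; everything after that is served from the cache.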
Why not a CDN in front of MongoDB?
Red Hat or CentOS clustering with a shared filesystem can provide a failover mechanism for NFS (option #3).

Backing up a streaming server

I want to set up a new streaming server for my website, which mainly holds video and audio files. But how do we maintain a backup of the streaming server when its storage size is increasing day by day?
Generally, a database server backup (e.g., SQL Server) can be taken and restored very easily, as it does not occupy much space for a mid-range application.
On the other hand, how can we back up a streaming server? If the server fails, there should be an alternative server or solution that reduces the downtime.
How is the back-end architecture of YouTube built to handle this?
The back-end architecture of YouTube probably uses Google's Bigtable, which stores objects redundantly across several different servers. If you are using a single-server solution, your only real options are backing up to an attached disk, backing up to another server, or using an offsite storage system like Amazon S3 (which you could then use with their CDN to do basic HTTP streaming of content in the case of a failure).
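For the S3 option, a minimal sketch with boto3, where the bucket name and media directory are placeholders; it uploads any media file whose key is not already in the bucket, so the backup grows incrementally with the library:

```python
import os
import boto3

# Placeholders: adjust the bucket and local media root to your setup.
BUCKET = "my-stream-backup"
MEDIA_ROOT = "/var/media"

s3 = boto3.client("s3")

def backup_new_files() -> None:
    """Upload every local media file whose key is not yet in the bucket."""
    existing = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            existing.add(obj["Key"])

    for dirpath, _, filenames in os.walk(MEDIA_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, MEDIA_ROOT)
            if key not in existing:
                s3.upload_file(path, BUCKET, key)

if __name__ == "__main__":
    backup_new_files()
```

Run it from cron; since media files are typically written once and never modified, skipping keys that already exist is enough to keep the backup current.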