I have an admin-type system for a website with multiple web servers where users can configure pages and upload images to appear on those pages (kind of similar to a CMS). If you already have a MongoDB instance with replica sets set up, what is the preferred way to store these uploads so that failover exists, and why?
1. A CDN, such as Amazon S3 / CloudFront.
2. Store the images in MongoDB? I do this now and don't use GridFS because our images are all under 1 MB.
3. Use some type of NFS with some sort of failover setup. If #3, then how do you configure this failover?
I use #2 just fine right now and have used #3 without the failover before. If I use MongoDB as the data store for my website and for serving images, could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
Well, more image requests = more HTTP connections to your web servers = more requests for images from MongoDB = more network traffic.
So, yes, getting more image data from the DB could, in theory, impact getting non-image data. Request 1,000 images per second at 1 MB per image and that's roughly 1 GB/s (about 8 Gbps) of traffic between your MongoDB servers and your web servers.
Note that this isn't a MongoDB limitation, this is a limitation of network throughput.
If you start getting lots of traffic, then a CDN is definitely recommended. If you already have an HTTP page that outputs the image, putting a CDN in front of it should be pretty straightforward.
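As a rough sketch (assuming Flask and pymongo; the connection string, collection, and field names are hypothetical), the origin endpoint that the CDN caches could look something like this:

```python
# Sketch: an HTTP endpoint that serves image bytes stored in MongoDB and marks
# them cacheable, so a CDN in front of it rarely has to hit MongoDB at all.
# The connection string, collection, and field names are hypothetical.
from bson import ObjectId
from bson.errors import InvalidId
from flask import Flask, Response, abort
from pymongo import MongoClient

app = Flask(__name__)
images = MongoClient("mongodb://localhost:27017")["cms"]["images"]

@app.route("/images/<image_id>")
def serve_image(image_id):
    try:
        oid = ObjectId(image_id)
    except InvalidId:
        abort(404)
    doc = images.find_one({"_id": oid})
    if doc is None:
        abort(404)
    # A long max-age lets the CDN absorb repeated GETs for the same image.
    return Response(
        doc["data"],  # raw image bytes, assumed to be under 1 MB per the question
        mimetype=doc.get("content_type", "image/jpeg"),
        headers={"Cache-Control": "public, max-age=86400"},
    )

if __name__ == "__main__":
    app.run(port=5000)
```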
Why not a CDN in front of MongoDB?
Red Hat or CentOS clustering with a shared filesystem can provide a failover mechanism for NFS.
Related
I currently have a single server (480 GB storage and 400 Mbps bandwidth) for a client's project, and we are quickly running out of storage space, as well as bandwidth when we do HLS video streaming through an Nginx server once or twice a week.
We have considered upgrading to two servers with 2 TB storage and 1 Gbps bandwidth each (a bit of future-proofing), to be able to store all their data and to start compensating for the glitchy streams.
As I am not a systems admin, I don't know much about load balancing or what the correct approach is for the database and storage. Do I clone the contents of one server to the other and split the traffic? Do I dedicate one to the database and another to storage?
Any help on which services to use to split the traffic, and on best practices in general, would be much appreciated.
Ideally, you would distribute your video streams from a CDN. That way, the only practical limitation on scaling would be cost. Clients would be able to stream directly from nodes near them without having to hit your origin servers directly or very often. The CDN would cache the HLS segments.
At a minimum, I'd definitely separate your application servers from your video serving. They have different types of load, so you would be wasting money by providing too much CPU to host videos, or too much bandwidth to host an API. Split them up and you can scale independently as required.
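As a rough illustration (assuming S3 as the CDN origin, e.g. with CloudFront in front; the bucket name, paths, and boto3 setup are assumptions, not something from your current stack), the encoder box could push HLS output to object storage with cache headers so the CDN, not your origin, serves most segment requests:

```python
# Sketch: push HLS output (playlist + segments) to S3 so a CDN such as
# CloudFront can cache and serve them. The bucket name and local paths are
# hypothetical; assumes boto3 with AWS credentials already configured.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "example-hls-origin"  # hypothetical bucket behind the CDN

CONTENT_TYPES = {
    ".m3u8": "application/vnd.apple.mpegurl",
    ".ts": "video/mp2t",
}

def upload_stream_dir(local_dir: str, prefix: str) -> None:
    """Upload every playlist/segment in local_dir under the given key prefix."""
    for name in os.listdir(local_dir):
        ext = os.path.splitext(name)[1]
        if ext not in CONTENT_TYPES:
            continue
        # Segments are immutable, so let the CDN cache them for a long time;
        # the playlist changes while the stream is live, so keep its TTL short.
        cache = "max-age=60" if ext == ".m3u8" else "max-age=31536000"
        s3.upload_file(
            os.path.join(local_dir, name),
            BUCKET,
            f"{prefix}/{name}",
            ExtraArgs={"ContentType": CONTENT_TYPES[ext], "CacheControl": cache},
        )

upload_stream_dir("/var/hls/event-01", "streams/event-01")
```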
I currently have a small website hosted on AWS.
The server is a micro-instance.
On this micro-instance:
I am running nginx to serve static files and error pages
I am running my node server
I am running my MongoDB database
As the website is getting more traffic, I have reached the point where I need to scale, and I am not sure what the best practices are or what the implications of each option would be.
I would love any referrals to reading materials
I was thinking of having:
2 dedicated micro-instances to run the website
1 micro-instance running nginx
1 micro-instance storing the db
questions:
Would having the db stored on a separate machine make the queries significantly slower?
Should I in fact store the db on S3 instead?
Is it justified to have an entire instance for nginx alone?
How would you go about scaling from 1 machine to multiple ones? I am guessing moving from one to two is harder than moving from two to 50.
Any advice will be greatly appreciated!
Would having the db stored on a separate machine make the queries significantly slower?
No, the speed impact would be very minimal, and this would be needed for scalability anyway. Just make sure you use the private IP addresses of your instances for any inter-instance communication so that the traffic stays inside your VPC (for both security and performance reasons).
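For instance (a minimal sketch with pymongo; the private IP, port, and database name are placeholders), the web tier would connect with something like:

```python
# Sketch: connect over the DB instance's private address so traffic stays
# inside the VPC. The IP, port, and database/collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://10.0.1.25:27017")  # private IP of the DB instance
db = client["app"]
print(db["pages"].count_documents({}))
```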
Should I in fact store the db on S3 instead?
No, that wouldn't work at all. You can't store a DB on S3, only DB backups.
Is it justified to have an entire instance for nginx alone?
If you are getting enough traffic, then yes absolutely.
How would you go about scaling from 1 machine to multiple ones?
In general, you need to move your DB to a separate server, create multiple instances of your web server, and place a load balancer in front of them. If you want automatic scaling based on traffic, then you would also place the web servers in an auto-scaling group. If all this sounds difficult, then I would recommend looking into moving your web servers to Elastic Beanstalk, which will manage much of this for you.
If your database is a bottleneck, then you might also need to set up a MongoDB cluster and balance the load across it. You could also move your DB to something like mLab, which would greatly ease the management of that as well.
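As a sketch of what the application side of that looks like (assuming pymongo; the hostnames and the replica set name are hypothetical), you connect to the whole replica set and let reads spread to the secondaries:

```python
# Sketch: connect to a MongoDB replica set and prefer secondaries for reads,
# so query load is spread across the cluster. Hostnames and the replica set
# name ("rs0") are hypothetical.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://db1.internal:27017,db2.internal:27017,db3.internal:27017",
    replicaSet="rs0",
    readPreference="secondaryPreferred",
)
db = client["app"]
# Writes still go to the primary; this read may be served by a secondary.
latest = db["pages"].find_one(sort=[("updated_at", -1)])
print(latest)
```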
I have a bunch of web servers (frontends) behind a balancer. Each Apache process runs as its own user for each virtual host. The code Apache runs is PHP, and it's not trusted code.
I need session storage that is shared between the web servers, and I need to limit each user (vhost) to accessing only its own session storage, so that one tenant can't purge or corrupt another tenant's memcached data.
So basically I am looking for a solution that authenticates users and provides private buckets.
I know there is always the MySQL route available, but I want to avoid the performance penalty introduced by the SQL layer.
Any solutions come to mind so far?
I found a product called Couchbase which fully complies with my requirements. It has buckets along with a memcached caching layer and access protocol, it has SASL authentication, and as a bonus it gives you load balancing and fault tolerance.
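As a rough sketch of the per-tenant isolation (assuming the Couchbase Python SDK 4.x; the hostname, bucket name, and credentials are hypothetical), each vhost gets its own bucket and its own credentials:

```python
# Sketch: one bucket per tenant, each with its own credentials, so a vhost can
# only touch its own session data. Assumes the Couchbase Python SDK 4.x; the
# hostname, bucket name, and credentials are hypothetical.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://cb.internal",
    ClusterOptions(PasswordAuthenticator("tenant_a", "tenant_a_secret")),
)
sessions = cluster.bucket("tenant_a_sessions").default_collection()

# Store and read back a session document for this tenant only.
sessions.upsert("sess:4f3c2a", {"user_id": 42, "cart": []})
print(sessions.get("sess:4f3c2a").content_as[dict])
```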
When it comes to streaming big files, it seems like the nginx load balancer's 1 Gbps link is too small to handle the data being transferred from 3-4 storage servers.
The cause of my problem is a bottleneck in the proxy: all data goes through the load balancer.
The web servers download files from storage and stream them over HTTP (currently through a single nginx instance acting as the load balancer).
Is it possible to configure nginx so that it doesn't mediate the data streaming (static and dynamic data would be served directly from the web servers) and only balances the requests? If not, what should I use?
A commonly used solution is an additional layer of DNS load balancing; a second solution is balancing in the application layer, in the links themselves, so clients fetch the data directly from the web servers. This doesn't fully solve the problem, because the load isn't monitored, but partitioning with these layers can give you extra bandwidth capacity.
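To illustrate the application-layer, in-the-links approach (a minimal sketch; the hostnames and URL scheme are hypothetical, and Flask stands in for whatever your application actually uses), the balancer only hands out redirects, so the file bytes never cross its 1 Gbps link:

```python
# Sketch of "balancing in the links": the balancer/app only issues a redirect,
# and the client fetches the file straight from a web/storage server, so the
# payload bypasses the balancer entirely. Hostnames are hypothetical.
import random
from flask import Flask, redirect

app = Flask(__name__)

# Directly reachable web servers that can serve the files themselves.
BACKENDS = [
    "http://web1.example.com",
    "http://web2.example.com",
    "http://web3.example.com",
]

@app.route("/files/<path:name>")
def locate_file(name):
    # Pick a backend (random here; could be weighted by monitored load) and
    # send the client there with a temporary redirect.
    backend = random.choice(BACKENDS)
    return redirect(f"{backend}/files/{name}", code=302)

if __name__ == "__main__":
    app.run(port=8080)
```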
I want to get a new streaming server for my website, which mostly holds video and audio files. But how do we maintain a backup of the streaming server when its storage use is increasing day by day?
Generally a database server, like SQL Server, can be backed up and restored very easily, as the backup does not occupy much space for a medium-sized application.
On the other hand, how can we take a backup of a streaming server? If the server fails, there should be an alternative server / solution that reduces the downtime.
How is the back-end architecture of YouTube built to handle this?
The backend architecture of YouTube probably uses Google's Bigtable, which stores objects redundantly across several different servers. If you are using a single-server solution, your only real options are backing up to an attached disk, backing up to another server, or using an offsite storage system like Amazon S3 (which you could then use with their CDN to do basic HTTP streaming of the content in the case of a failure).
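For the S3 option, here is a minimal sketch of an incremental offsite backup (assuming boto3 with credentials configured; the bucket name and media path are hypothetical):

```python
# Sketch: incremental offsite backup of a media directory to S3, so the
# content can be re-served (e.g. via a CDN) if the streaming server fails.
# The bucket name and media path are hypothetical; assumes boto3 is configured.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "example-media-backup"
MEDIA_DIR = "/srv/media"

# Keys already present in the bucket, so only new files get uploaded.
existing = set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        existing.add(obj["Key"])

# Walk the local media tree and upload anything not yet backed up.
for root, _, files in os.walk(MEDIA_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, MEDIA_DIR)
        if key not in existing:
            s3.upload_file(path, BUCKET, key)
```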