I want to set up a new streaming server for my website, which mostly holds video and audio files. But how do we maintain a backup of the streaming server when its storage is growing day by day?
A database server such as SQL Server is generally easy to back up and restore, since the backup does not occupy much space for a medium-sized application.
On the other hand, how can we take a backup of a streaming server? If the server fails, there should be an alternative server or solution that keeps downtime to a minimum.
How is the back-end architecture of YouTube built to handle this?
The backend architecture of YouTube probably uses Google's BigTable, which stores objects redundantly across several different servers. If you are using a single-server solution, your only real options are backing up to an attached disk, backing up to another server, or using an offsite storage system like Amazon S3 (which you could then pair with their CDN to do basic HTTP streaming of the content in the case of a failure).
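As a rough illustration of the offsite-storage option, here is a minimal Python sketch that mirrors a local media directory into an S3 bucket; boto3 is assumed, and the bucket name and directory are placeholders rather than anything from the original setup:

```python
# Minimal sketch: mirror local media files into S3 as an offsite backup.
# Bucket name and media directory are hypothetical.
import os
import boto3

s3 = boto3.client("s3")
MEDIA_ROOT = "/var/media"    # hypothetical local media directory
BUCKET = "media-backup"      # hypothetical S3 bucket

for root, _dirs, files in os.walk(MEDIA_ROOT):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.relpath(local_path, MEDIA_ROOT)
        # upload_file performs a managed (multipart) transfer, so large
        # video files never have to fit in memory at once.
        s3.upload_file(local_path, BUCKET, key)
```

A scheduled job along those lines only covers the data; for the downtime requirement you would still want a second server or a CDN that can serve the backed-up files while the primary is being rebuilt.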
I currently have a single server (480 GB storage and 400 Mbps bandwidth) for a client's project, and we are quickly running out of storage space, as well as bandwidth for the HLS video streaming we do through an Nginx server once or twice a week.
We have considered upgrading to two 2 TB servers with 1 Gbps bandwidth (a bit of future-proofing), to be able to store all their data and to start compensating for the glitchy streams.
As I am not a systems admin, I don't know much about load balancing or the correct procedure for the database and storage. Do I clone the contents of one server to the other and split the traffic? Do I dedicate one to the database and the other to storage?
Any help on what services to use to split traffic, and any best practices, would be much appreciated.
Ideally, you would distribute your video streams from a CDN. That way, the only practical limitation on scaling would be cost. Clients would be able to stream directly from nodes near them without having to hit your origin servers directly or very often. The CDN would cache the HLS segments.
At a minimum, I'd definitely separate your application servers from your video serving. They have different types of load, so you would be wasting money by providing too much CPU to host videos, or too much bandwidth to host an API. Split them up and you can scale independently as required.
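As a rough sketch of that split (the hostnames, ports, and paths are placeholders, not details from the original setup), an Nginx layout like the following keeps API traffic and HLS segments on separate upstreams so each pool can be sized and scaled on its own:

```nginx
# Hypothetical sketch: one front-end vhost, two separate pools, so API and
# video traffic can be scaled (and the HLS segments cached) independently.
upstream api_servers   { server 10.0.0.10:8080; }
upstream video_servers { server 10.0.0.20:8080; }

proxy_cache_path /var/cache/nginx/hls keys_zone=hls:10m max_size=50g;

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;
    }

    location /hls/ {
        proxy_pass http://video_servers;
        proxy_cache hls;
        proxy_cache_valid 200 10s;   # segments and playlists are short-lived
    }
}
```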
I read through the Azure documentation and found that the message size limit for Storage Queues is 64 KB and for Service Bus is 256 KB. We are trying to develop an application that will read sensor data from some devices, call a REST service, and upload the data to the cloud. The data would be stored in queues and then dumped into a cloud database.
There is a chance that the sensor data collected is more than 256 KB. In such cases, what is the recommended approach? Do we need to split the data in the REST service and then put chunks of it in the queue, or is there another recommended pattern?
Any help is appreciated.
You have several conflicting technology statements. I will begin by clarifying a few.
Service Bus / IoT Hub are not POST calls. A POST call would go to a RESTful service, which is a separate thing. IoT Hub uses a low-latency message-passing system that is abstracted away from you. It is intended for high volumes of small packets, which fits most IoT scenarios.
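For illustration, here is a minimal Python sketch of sending one small telemetry packet through IoT Hub's device SDK rather than a hand-rolled POST; the azure-iot-device package, the environment variable name, and the payload fields are assumptions, not details from the question:

```python
# Minimal sketch: send one small device-to-cloud message via IoT Hub.
# Connection-string variable and payload fields are hypothetical.
import json
import os
from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string(
    os.environ["IOTHUB_DEVICE_CONNECTION_STRING"])

reading = {"deviceId": "sensor-42", "temperature": 21.7}  # hypothetical reading
msg = Message(json.dumps(reading))
msg.content_type = "application/json"

# The message travels over the hub's messaging transport (MQTT/AMQP),
# not a REST POST that you implement yourself.
client.send_message(msg)
client.disconnect()
```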
In the situation in which a message is larger than 256 KB (which is very interesting for an IoT scenario; I would be curious to see why those messages are so large), you should ideally upload to blob storage. You can still post packets:
If you have access to the blob storage APIs from your devices, you should go that route (sketched below).
If you do not have access to them, you should post the big packets to a REST endpoint and cross your fingers that they make it, or chop them up.
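A minimal Python sketch of that first route, assuming the azure-storage-blob package and a connection string in an environment variable; the container, blob, and file names are placeholders:

```python
# Minimal sketch: push an oversized sensor payload straight to blob storage
# instead of a queue. Names and connection-string variable are hypothetical.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client(container="sensor-dumps",
                               blob="device42/reading-001.json")

with open("reading-001.json", "rb") as payload:
    # upload_blob chunks the transfer internally, so the 256 KB message
    # limit never comes into play.
    blob.upload_blob(payload, overwrite=True)
```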
You can run analytics on blob storage after the fact. I would recommend using the wasb prefix (e.g. wasb://mycontainer@myaccount.blob.core.windows.net/sensor-data/), since those containers are Hadoop compatible and you can stand up analytics clusters on top of that storage.
You have no real need for a queue that I can immediately see.
You should take a look at the patterns leveraging:
Stream Analytics: https://azure.microsoft.com/en-us/services/stream-analytics/
Azure Data Factory: https://azure.microsoft.com/en-us/services/data-factory/
Your typical ingestion will be: get your data into super-cheap cloud storage as easily as possible, and then deal with the analytics later using clusters you can stand up and tear down on demand. That cheap storage is typically blob, and that analytics cluster is usually some form of Hadoop. Using Data Factory allows you to pipe your data around as you figure out what you are going to use specific parts of it for.
An example of using HBase for ingestion with cheap blob storage as the underlayment and Azure Machine Learning as part of the analytics solution: http://indiedevspot.com/2015/07/09/powering-azureml-with-hadoop-hbase/
I have a WCF REST service acting as an endpoint for Azure Storage. The service handles uploads and downloads of files that are usually 5-10 MB. When handling the stream (both for download and upload), the bytes sit in the Azure VM's RAM, right? Even though an upload is split into 4 MB blocks, those 4 MB are kept in RAM until the upload is complete. For a download, the bytes are kept until the download is complete. So, if I have 1000 users downloading a file at the same time, the Azure VM would need 4 GB of RAM just for the transfers.
Is there a way to optimize this? Correct me if I'm wrong in assuming that the data is kept in the VM's RAM until the operation is finished. Should I use Microsoft's Azure REST service instead? Where does that service keep the data until the transfer is finished?
I think you can avoid having all the data in memory at once by doing a couple of things:
Set up your WCF service to use a Streamed TransferMode.
Handle your streams appropriately in code, i.e. open the stream to blob storage and then copy from the WCF stream to the blob stream in chunks, or use Stream.CopyTo() (a rough sketch of the chunked copy follows below).
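The thread is about WCF/.NET, but the chunked-copy idea itself is language-neutral; here is a minimal Python sketch against azure-storage-blob, with the connection-string variable, container, and blob names as placeholders:

```python
# Minimal sketch: copy a file-like request stream into blob storage one block
# at a time, so only CHUNK_SIZE bytes are ever held in memory per transfer.
import os
import uuid
from azure.storage.blob import BlobServiceClient, BlobBlock

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB blocks

def stream_to_blob(incoming, container: str, name: str) -> None:
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])
    blob = service.get_blob_client(container=container, blob=name)

    block_ids = []
    while True:
        chunk = incoming.read(CHUNK_SIZE)
        if not chunk:
            break
        block_id = uuid.uuid4().hex
        blob.stage_block(block_id=block_id, data=chunk)  # upload one block
        block_ids.append(block_id)

    # Commit the uploaded blocks into the final blob.
    blob.commit_block_list([BlobBlock(block_id=b) for b in block_ids])
```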
I tried to implement this about 6 months ago, but I couldn't make WCF Streamed mode work properly in my situation. It seems to be finicky to get working, and may not be possible with HTTP endpoints. Then again, maybe my WCF-fu is just too weak. If you do get something working please post it back here, as I would very much like to see it.
Note that if you have multiple clients downloading the same data at once, you could keep a single copy in memory and stream it to each of them independently. But I don't believe that approach is compatible with the method I just described - you'd have to do one or the other.
You could use Windows Azure Blob Storage to handle all of the download/upload optimizations for you. It has a REST API, different "settings" for large/small files, shared access keys for keeping data private, http/https etc.
For downloading the files you can directly access the resource via a REST URL (pass in the optimization headers) or use the StorageClient library...and not process it via your own service.
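For the direct-download route, here is a minimal sketch using the current Python SDK rather than the StorageClient library mentioned above; the account, key, container, and blob names are placeholders. It produces a time-limited shared access URL that clients can hit directly, so the bytes never pass through your own service:

```python
# Minimal sketch: generate a read-only, time-limited SAS URL for one blob.
# Account name/key and blob names are hypothetical.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas = generate_blob_sas(
    account_name="mystorageaccount",
    container_name="files",
    blob_name="report.pdf",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
url = f"https://mystorageaccount.blob.core.windows.net/files/report.pdf?{sas}"
# Hand this URL to the client; the download goes straight to blob storage.
```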
Blob storage should be used for objects as large as you describe. Blob storage runs as a separate service from the Azure VM and is massively scalable. Here's an article that describes how blob storage is architected and how to implement it in your application. You don't mention the types of files you are trying to serve, but in some cases it might be useful to use a queue service, as described here, to cut down the data load.
S3 allows you to post directly from the browser to S3, bypassing your web server (http://doc.s3.amazonaws.com/proposals/post.html). How can I upload files to a database in a similar fashion? I don't want to first stage the file on the web server in a temporary file and then upload it from there to the database.
If I cannot avoid the web server, how do I use the web server only for streaming, without actually landing the file on the web server before loading it into the database?
Thanks.
A handful of DBMSes provide an HTTP connection design, but this is the exception, not the rule.
That said, you could make the HTTP server a thin layer over a more traditional database, but this is probably a bad idea: most databases assume that anything able to reach them has full privilege to execute queries, and an application (read: "web server") normally acts as a gatekeeper between the database and obnoxious or malicious clients.
Basically, you're going to do best with a database engine expressly designed to handle all of this at a fine-grained level. MongoDB (with GridFS) mostly addresses this exact use case. Otherwise, you'll just have to write an application that sits between HTTP and the raw database connection.
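A minimal Python sketch of that in-between application, assuming pymongo/GridFS and a file-like request stream from whatever web framework you use; the database name and chunk size are illustrative:

```python
# Minimal sketch: stream an incoming upload into MongoDB GridFS chunk by
# chunk, so the file never lands in a temporary file on the web server.
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["uploads"]   # hypothetical DB
fs = gridfs.GridFS(db)

def save_upload(stream, filename: str, chunk_size: int = 255 * 1024) -> None:
    grid_in = fs.new_file(filename=filename)  # GridFS default chunk is 255 KB
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            grid_in.write(chunk)
    finally:
        grid_in.close()  # finalizes the file document and its chunks
```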
I have an admin-type system for a website with multiple web servers, where users can configure pages and upload images to appear on those pages (somewhat similar to a CMS). Given an existing MongoDB instance with replica sets already set up, what is the preferred way to store these uploads so that failover exists, and why?
1. A CDN, such as Amazon S3 / CloudFront.
2. Store the images in MongoDB? I do this now and don't use GridFS because our images are all under 1 MB.
3. Use some type of NFS with some sort of failover setup. If #3, how do you configure that failover?
I use #2 just fine right now and have used #3 without the failover before. If I use MongoDB as the data store for my website and for serving images, could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
could these GET requests for the images ever impact the performance of getting non-image data out of the DB?
Well, more image requests = more HTTP connections to your web servers = more requests for images from MongoDB = more network traffic.
So, yes, getting more image data out of the DB could, in theory, impact getting non-image data out of it. All it takes is 1000 image requests per second at 1 MB per image (roughly 1 GB/s, or about 8 Gbps) and you'll start seeing a lot of traffic between your MongoDB servers and your web servers.
Note that this isn't a MongoDB limitation; it's a limitation of network throughput.
If you start getting lots of traffic, then the CDN is definitely recommended. If you already have an HTTP page that outputs the image, this should be pretty straightforward.
Why not a CDN in front of MongoDB?
Red Hat or CentOS clustering with a shared filesystem can provide a failover mechanism for NFS.