Azure message size limits and IoT - REST

I read through the Azure documentation and found that the message size limit for Storage Queues is 64 KB and for Service Bus is 256 KB. We are trying to develop an application that will read sensor data from some devices, call a REST service, and upload it to the cloud. This data will be stored in queues and then dumped into a cloud database.
There is a chance that the sensor data collected is more than 256 KB. In such cases, what is the recommended approach? Do we need to split the data in the REST service and then put chunks of data in the queue, or is there another recommended pattern?
Any help is appreciated.

You have several conflicting technology statements, so I will begin by clarifying a few.
Service Bus and IoT Hub are not POST calls. A POST call would go to a RESTful service, which exists separately. IoT Hub uses a low-latency message-passing system that is abstracted from you. These are intended for high volumes of small packets, which fits most IoT scenarios.
In the situation where a message is larger than 256 KB (which is very interesting for an IoT scenario; I would be interested to see why those messages are so large), you should ideally upload to blob storage. You can still post the packets that fit as regular messages.
If you have access to the blob storage APIs from your devices, you should go that route.
If you do not have access to them, you should post the big packets to a REST endpoint and either cross your fingers that they make it or chop them up.
You can run post-hoc analytics on blob storage. I would recommend using the wasb prefix, as those containers are Hadoop compliant and you can stand up analytics clusters on top of those storage mechanisms.
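If the devices (or a small gateway service in front of them) can reach blob storage directly, the upload itself is only a few lines. A minimal sketch with the Python azure-storage-blob SDK, where the connection string and container name are placeholders, not anything from the original answer:

```python
# Sketch: push an oversized sensor payload straight to Azure Blob Storage
# instead of squeezing it into a queue or Service Bus message.
# The connection string and container name below are placeholders.
import json
import uuid

from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<your-storage-connection-string>"
CONTAINER = "sensor-payloads"

def upload_large_reading(payload: dict) -> str:
    """Upload a large sensor payload as a blob and return its blob name."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container = service.get_container_client(CONTAINER)

    blob_name = f"{uuid.uuid4()}.json"
    # The SDK splits large uploads into blocks internally, so the payload
    # is not subject to the 64 KB / 256 KB message limits.
    container.upload_blob(name=blob_name, data=json.dumps(payload).encode("utf-8"))
    return blob_name
```

If you still want a queue in the pipeline, the queue message can then carry just the blob name as a pointer rather than the payload itself.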
You have no real need for a queue that I can immediately see.
You should take a look at the patterns leveraging:
Stream Analytics: https://azure.microsoft.com/en-us/services/stream-analytics/
Azure Data Factory: https://azure.microsoft.com/en-us/services/data-factory/
Your typical ingestion will be: get your data up into the cloud into super-cheap storage as easily as possible, and then deal with analytics later using clusters you can stand up and tear down on demand. That cheap storage is typically blob storage, and that analytics cluster is usually some form of Hadoop. Using Data Factory allows you to pipe your data around as you figure out what you are going to use specific components of it for.
An example where I used HBase for ingestion with cheap blob storage as the underlayment and Azure Machine Learning as part of the analytics solution: http://indiedevspot.com/2015/07/09/powering-azureml-with-hadoop-hbase/

Related

Sending data from database to IoT device at certain intervals

What is the best (and simplest) way to regularly send data from a database to an IoT device at a certain interval?
In this case I have the data in Google Cloud Datastore and want to send it to Particle Photons (possibly via Particle Cloud, but not necessarily). But I might also be using other IoT devices and/or other database alternatives, like Cloud Firestore, in the future, so it would be great if the solution is easily adaptable to those situations.
It seems like you want some kind of cron job that takes data from Datastore (or any database, for that matter) and sends it to your IoT device. Assuming your IoT device can be reached via a REST endpoint, you can use Cloud Scheduler (https://cloud.google.com/scheduler/) to run that logic on a schedule. The target of the Cloud Scheduler job can be an App Engine instance or a Cloud Function.
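For illustration, the Cloud Function that Cloud Scheduler triggers could look roughly like this in Python; the Datastore kind, device URL, and payload shape are assumptions, not something given in the question:

```python
# Sketch of an HTTP-triggered Cloud Function that Cloud Scheduler calls on a
# schedule: read entities from Datastore and POST them to the device.
# The kind "SensorConfig" and DEVICE_URL are made-up names for illustration.
import requests
from google.cloud import datastore

DEVICE_URL = "https://example.com/my-iot-device/update"  # assumed endpoint

def push_to_device(request):
    """HTTP entry point invoked by Cloud Scheduler."""
    client = datastore.Client()

    # Fetch whatever entities the device should receive.
    query = client.query(kind="SensorConfig")
    rows = [dict(entity) for entity in query.fetch(limit=50)]

    # Forward them to the device's REST endpoint.
    resp = requests.post(DEVICE_URL, json={"items": rows}, timeout=10)
    resp.raise_for_status()
    return f"sent {len(rows)} items", 200
```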

Cloud service recommendation for pdf storage

I'm creating a web app that will contain a lot of PDFs for educational notes. What I have figured out is that these PDFs should not be stored in my database (i.e. MongoDB), so can anyone recommend a cloud service, free or cheap, that can be used to store a large number of PDFs for my web application? I'm considering storage of up to 50-60 GB. Also, I'm using Node and Express for my application.
Use an Amazon Web Services S3 bucket on their free tier for up to 5 GB. Otherwise, try CloudFront:
https://aws.amazon.com/cloudfront/
"As part of the AWS Free Usage Tier, you can get started with Amazon CloudFront for free. Upon sign-up, new AWS customers receive 50 GB Data Transfer Out and 2,000,000 HTTP and HTTPS Requests each month for one year"

How to optimize large file transactions using a WCF Service as an endpoint for the Azure storage?

I have a WCF REST service as an endpoint for Azure Storage. The WCF REST service handles uploads and downloads of files that usually measure 5-10 MB. When handling the stream (both for download and upload), the bytes are in the Azure VM's RAM, right? Even if, for upload, the data is split into 4 MB blocks, those 4 MB are kept in RAM until the upload is complete. For download, the bytes are kept until the download is complete. So, if I have 1000 users downloading a file at the same time, that means the Azure VM would need 4 GB of RAM just for the transfers.
Is there a way to optimize this? Correct me if I'm wrong in assuming that the data is kept in the VM's RAM until the operation is finished. Should I use Microsoft's Azure REST service instead? Where does that service keep the data until the transfer is finished?
I think you can avoid having all the data in memory at once by doing a couple of things:
Set up your WCF service to use a Streamed TransferMode.
Handle your streams appropriately in code, i.e. open the stream to blob storage and then copy from the WCF stream to the blob stream in chunks, or use Stream.CopyTo(); see the sketch after this list.
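The answer is about WCF and C#, but the chunked-copy idea is language-neutral. Purely as an illustration of the pattern, here is a rough Python sketch against the azure-storage-blob SDK; the function names and parameters are assumptions:

```python
# Illustration of the chunked-copy idea from the list above (the answer's
# context is WCF/C#; this shows the same pattern with the Python
# azure-storage-blob SDK): pass streams through instead of buffering whole
# files in memory.
from azure.storage.blob import BlobClient

def relay_upload(incoming_stream, blob_url: str, sas_token: str) -> None:
    """Copy an incoming request stream to a blob without buffering it all."""
    blob = BlobClient.from_blob_url(blob_url, credential=sas_token)
    # upload_blob accepts a file-like object and reads it in blocks itself,
    # so only one block at a time needs to live in this process's memory.
    blob.upload_blob(incoming_stream, overwrite=True)

def relay_download(blob_url: str, sas_token: str, outgoing_stream) -> None:
    """Copy a blob to an outgoing response stream chunk by chunk."""
    blob = BlobClient.from_blob_url(blob_url, credential=sas_token)
    for chunk in blob.download_blob().chunks():  # yields the blob in pieces
        outgoing_stream.write(chunk)
```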
I tried to implement this about 6 months ago, but I couldn't make WCF Streamed mode work properly in my situation. It seems to be finicky to get working, and may not be possible with HTTP endpoints. Then again, maybe my WCF-fu is just too weak. If you do get something working please post it back here, as I would very much like to see it.
Note that if you have multiple clients downloading the same data at once, you could keep a single copy in memory and stream it to each of them independently. But I don't believe that approach is compatible with the method I just described - you'd have to do one or the other.
You could use Windows Azure Blob Storage to handle all of the download/upload optimizations for you. It has a REST API, different "settings" for large/small files, shared access keys for keeping data private, HTTP/HTTPS, etc.
For downloading the files, you can let clients access the resource directly via a REST URL (passing in the optimization headers) or use the StorageClient library...and not process it via your own service.
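Letting clients pull directly from blob storage usually means handing them a time-limited shared access URL rather than proxying the bytes yourself. The original answer refers to the old StorageClient library; as a sketch of the same idea with today's Python SDK (account, container, and blob names are placeholders):

```python
# Sketch: generate a time-limited read-only SAS URL so the client downloads
# directly from blob storage instead of streaming through your own service.
# Account, container, and blob names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

def make_download_url(account: str, account_key: str,
                      container: str, blob_name: str) -> str:
    token = generate_blob_sas(
        account_name=account,
        account_key=account_key,
        container_name=container,
        blob_name=blob_name,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),
    )
    return (f"https://{account}.blob.core.windows.net/"
            f"{container}/{blob_name}?{token}")
```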
Blob storage should be used for objects as large as you describe. Blob storage runs in a separate service from the Azure VM and is massively scalable. Here's an article that describes how blob storage is architected and how to implement it in your application. You don't mention the types of files you are trying to serve, but in some cases it might be useful to use a queue service as described here to cut down the data load.

Is a HTTP REST request the only way to access Azure Storage?

I've started reading about Azure Storage and it seems that the only way to access it is via an HTTP REST request.
I've seen that there are a few wrappers around these requests, for example, StorageClient (by Microsoft) and cloud storage api (http://cloudstorageapi.codeplex.com/), but they all still use REST in the background (to the best of my understanding).
It seems unreasonable to me that this is actually true. If I have a machine in Azure and I want to access data stored in Azure Storage, it would seem very inefficient to go through an HTTP request for every access.
Yes, all storage calls are normalized to the REST API. It's actually very efficient when you consider the problem. You are thinking of a machine in Azure and data in Azure as if they were stored on two servers sitting in a rack. Remember that in Azure, your data, your "servers", etc. may be stored in different racks, different zones, and even different datacenters. With the REST API, your apps don't have to care about any of this. They just get the data with the URL.
So while a tiny HTTP overhead may appear inefficient if these were two boxes next to each other, it's actually a very elegant solution when they are on different continents. Factor in concepts such as CDN, and it becomes an even better fit.
Layered onto this base concept are the Azure load balancer and other pieces of the internal infrastructure, which can further optimize every request because they are all the same (HTTP). I also wouldn't be surprised (not sure at all, I don't work for MSFT) if the LB was doing traffic-management optimizations when a request is made intra-datacenter.
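To make "they just get the data with the URL" concrete, a blob read is a single HTTP GET whether you issue it yourself or let a wrapper library do it. A small Python sketch, with the account, container, and SAS token as placeholders:

```python
# A blob read is one HTTP GET, whether issued directly or through an SDK
# wrapper. Account, container, blob, and SAS token below are placeholders.
import requests
from azure.storage.blob import BlobClient

URL = ("https://myaccount.blob.core.windows.net/"
       "mycontainer/readings.json?<sas-token>")

# Raw REST call: one GET against the blob's URL.
raw = requests.get(URL, timeout=10)
raw.raise_for_status()
data_via_rest = raw.content

# The SDK wrapper issues the same REST request under the hood.
data_via_sdk = BlobClient.from_blob_url(URL).download_blob().readall()
```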
Throughput on the storage subsystem in Windows Azure is pretty high. I'd be very surprised if the system cannot deliver to your needs.
There are also many design patterns for increasing the scalability of your app, like asynchronous processing, batching requests, delayed processing, etc.

Back up of Streaming server

I want to set up a new streaming server for my website, which generally holds video and audio files. But how do we maintain a backup of the streaming server if the storage size is increasing day by day?
Generally, a database server backup, for example with SQL Server, can be taken and restored very easily, as it does not occupy much space for a medium-range application.
On the other hand, how can we take a backup of a streaming server? If the server fails, there should be an alternative server or solution that decreases downtime.
How is the back-end architecture of YouTube built to handle this?
The backend architecture of YouTube probably uses Google's Bigtable, which stores objects redundantly over several different servers. If you are using a single-server solution, your only real options are backing up to an attached disk, backing up to another server, or using an offsite storage system like Amazon S3 (which you could then use with their CDN to do basic HTTP streaming of content in the case of a failure).
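As a concrete version of the offsite-storage option, an incremental copy of new media files to S3 could look roughly like this in Python with boto3; the bucket name and media directory are placeholders, and a real setup would also want lifecycle rules and integrity checks:

```python
# Sketch of the "offsite storage" option: copy any media files that are not
# yet in an S3 bucket. Bucket name and media directory are placeholders.
import os

import boto3

BUCKET = "my-streaming-backup"   # placeholder
MEDIA_DIR = "/var/media"         # placeholder

def backup_new_files() -> int:
    s3 = boto3.client("s3")

    # List what is already backed up so we only upload new files.
    existing = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            existing.add(obj["Key"])

    uploaded = 0
    for root, _dirs, files in os.walk(MEDIA_DIR):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, MEDIA_DIR)
            if key not in existing:
                s3.upload_file(path, BUCKET, key)
                uploaded += 1
    return uploaded
```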