I want to process big files with Azure Functions over HTTP(S), and I need something with resumable file upload like tus.io. Is it possible to implement an Azure Function with tus.io, for example by augmenting "HTTP & webhooks"?
Many thanks in advance, X.
You are very likely to hit the HTTP timeout limit of 230 seconds. Instead of building and maintaining your own upload service, you could use Azure Blob Storage directly.
By generating a SAS token, you can let clients upload directly to Blob Storage, which is already designed for massive scale and supports resuming uploads and downloads.
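For example, here is a minimal sketch of handing out a write-only SAS URL with the azure-storage-blob Python SDK (the account name, key, container and blob name below are placeholders, and your Function would typically just return this URL to the client):

from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholders: substitute your own account, key, container and blob name.
ACCOUNT_NAME = "mystorageaccount"
ACCOUNT_KEY = "<account-key>"
CONTAINER = "uploads"
BLOB_NAME = "bigfile.bin"

# Short-lived token that only allows creating/writing this one blob.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB_NAME,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(create=True, write=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The client uploads straight to this URL (block by block, so the upload
# can be resumed) without the request ever passing through your Function.
upload_url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB_NAME}?{sas_token}"
print(upload_url)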
In my scenario I need to consume an external REST API. One of the fields in the response is a URL to an image. What I'm trying to achieve is to grab the image behind that URL and store it in Blob Storage. This would be easy with a Function or a WebJob, but is there a way to do it with Data Factory on its own?
Based on my research, only the HTTP connector supports downloading files; it can be used as the source dataset in a copy activity.
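As a rough illustration, the copy step could be declared with the azure-mgmt-datafactory Python SDK along these lines (the dataset names are placeholders, the linked services and pipeline deployment are omitted, and constructor details may vary between SDK versions):

from azure.mgmt.datafactory.models import (
    BlobSink,
    CopyActivity,
    DatasetReference,
    HttpSource,
)

# Hypothetical datasets: an HTTP dataset pointing at the image URL and a
# blob dataset pointing at the target container/path.
copy_image = CopyActivity(
    name="CopyImageToBlob",
    inputs=[DatasetReference(reference_name="HttpImageDataset")],
    outputs=[DatasetReference(reference_name="BlobImageDataset")],
    source=HttpSource(),
    sink=BlobSink(),
)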
The sync tool takes an s3:// format address - should this work with DigitalOcean Spaces?
It seems the more appropriate way to achieve this sync task would be to use the Storage Transfer Service provided by Google Cloud Platform. Instead of configuring an S3-style URL for Spaces, you can create a job and specify that the transfer should be done from a URL list (via the Cloud Console or the Storage Transfer API).
If you have a lot of objects in your Spaces buckets, you could use the Spaces API to list a bucket's contents and, if you prefer, use the output of that call to create a transfer job with the Storage Transfer API (you can take a look at the transfer spec here).
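For example, a sketch of building such a URL list from a Space with boto3 (the Spaces API is S3-compatible; the endpoint, bucket name and credentials are placeholders, and the resulting TSV still has to be hosted somewhere the transfer service can read it; check the transfer spec for the optional size/MD5 columns):

import boto3

SPACE = "my-space"  # placeholder Space name
client = boto3.client(
    "s3",
    endpoint_url="https://nyc3.digitaloceanspaces.com",  # adjust to your region
    aws_access_key_id="<spaces-key>",
    aws_secret_access_key="<spaces-secret>",
)

# Storage Transfer Service URL lists are TSV files that start with this header.
lines = ["TsvHttpData-1.0"]
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SPACE):
    for obj in page.get("Contents", []):
        lines.append(f"https://{SPACE}.nyc3.digitaloceanspaces.com/{obj['Key']}")

with open("urllist.tsv", "w") as f:
    f.write("\n".join(lines) + "\n")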
I am trying to understand the general architecture and the components needed to link metadata with blob objects stored in the cloud, such as in Azure Blob Storage or AWS.
Consider an application which allows users to upload blob files to the cloud. With each file there would be a myriad of metadata describing the file, its cloud URL, and perhaps the emails of users the file is shared with.
In this case, the file gets saved to the cloud and the metadata goes into some type of database somewhere else. How would you go about doing this transactionally, so that it is guaranteed that both the file and the metadata were saved? If one of the two fails, the application would need to notify the user so that another attempt could be made.
There's no built-in mechanism to span transactions across two disparate systems, such as Neo4j/MongoDB and Azure/AWS blob storage as you mentioned. This is up to your app to manage, and how you go about that is really a matter of opinion/discussion.
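One common app-level pattern is to write the blob first, then the metadata, and compensate if the second step fails. A minimal sketch, assuming the azure-storage-blob SDK and a hypothetical save_metadata callable that writes the record to your database and raises on failure:

from azure.storage.blob import BlobClient

def upload_with_metadata(conn_str, container, name, data, metadata, save_metadata):
    # Step 1: store the file itself.
    blob = BlobClient.from_connection_string(conn_str, container_name=container, blob_name=name)
    blob.upload_blob(data, overwrite=True)
    try:
        # Step 2: store the metadata record (URL, shared-with emails, ...).
        save_metadata(name, blob.url, metadata)
    except Exception:
        # Compensate so no orphaned blob is left behind, then surface the
        # failure so the application can tell the user to retry.
        blob.delete_blob()
        raise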
I am trying to use a Google Cloud Storage bucket to serve static files from a web server on GCE. I see in the docs that I have to copy files manually, but I am searching for a way to dynamically copy files on demand, just like other CDN services do. Is that possible?
If you're asking whether Google Cloud Storage will automatically and transparently cache frequently-accessed content from your web server, then the answer is no, you will have to copy files to your bucket explicitly yourself.
However, if you're asking if it's possible to copy files dynamically (i.e., programmatically) to your GCS bucket, rather than manually (e.g., via gsutil or the web UI), then yes, that's possible.
I imagine you would use something like the following process:
# pseudocode, not actual code in any language
HandleRequest(request) {
  gcs_uri = computeGcsUrlForRequest(request)
  if exists(gcs_uri) {
    data = read(gcs_uri)
    return data to user
  } else {
    new_data = computeDynamicData(request)
    # important! serve data to user first, to ensure low latency
    return new_data to user
    storeToGcs(new_data)  # asynchronously, don't block the request
  }
}
If this matches what you're planning to do, then there are several ways to accomplish this, e.g.,
language-specific client libraries (recommended; see the sketch after this list)
JSON API
XML API
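For illustration, a minimal sketch of the read-through pattern above using the Python client library (the bucket name, Flask route and compute_dynamic_data are placeholders; it stores the object inline rather than asynchronously to keep the example short):

from flask import Flask, Response
from google.cloud import storage

app = Flask(__name__)
bucket = storage.Client().bucket("my-static-bucket")  # placeholder bucket name

def compute_dynamic_data(path):
    # Hypothetical stand-in for whatever your web server generates.
    return f"rendered content for {path}".encode()

@app.route("/static/<path:path>")
def handle_request(path):
    blob = bucket.blob(path)
    if blob.exists():
        # Already copied to GCS: serve it from there.
        return Response(blob.download_as_bytes(), mimetype="application/octet-stream")
    data = compute_dynamic_data(path)
    # In production you would hand this off to a task queue or background
    # thread so the response isn't blocked.
    blob.upload_from_string(data)
    return Response(data, mimetype="application/octet-stream")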
Note that to avoid filling up your Google Cloud Storage bucket indefinitely, you should configure a lifecycle management policy to automatically remove files after some time or set up some other process to regularly clean up your bucket.
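For example, a delete-after-30-days rule can be set with the same client library (the age threshold is arbitrary):

from google.cloud import storage

bucket = storage.Client().get_bucket("my-static-bucket")  # placeholder bucket name
bucket.add_lifecycle_delete_rule(age=30)  # delete objects older than 30 days
bucket.patch()                            # push the updated lifecycle config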
I am developing a REST application that can be used as a data upload service for large files. I create chunks of the file and upload each chunk. I would like to have multiple instances running this service (for load balancing), and I would like my REST service to be stateless (no information kept about each stored chunk). This will help me avoid server affinity. If I allowed server affinity, I could have a server for each upload request, and the chunks could be stored in a temporary file on disk and moved to some other place once the upload is complete.
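Concretely, the chunked upload I have in mind looks roughly like this (the endpoint, chunk size and query parameters are just placeholders):

import uuid
import requests

CHUNK_SIZE = 8 * 1024 * 1024               # 8 MiB per chunk (arbitrary)
UPLOAD_URL = "https://example.com/upload"  # hypothetical stateless endpoint

def upload_in_chunks(path):
    upload_id = str(uuid.uuid4())  # lets any instance identify the upload
    index = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            # Any instance behind the load balancer can accept this request,
            # because the upload id and chunk index travel with it.
            resp = requests.put(
                UPLOAD_URL,
                params={"upload_id": upload_id, "chunk": index},
                data=chunk,
            )
            resp.raise_for_status()
            index += 1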
Ideally I would use a central place for the data to be stored, but I would like to avoid this as it is a single point of failure (bad in a distributed system). So I was thinking about using a distributed file system such as HDFS, but appending to a file is not very efficient, so that is not an option.
Is it possible to use some kind of cache for storing the data? Since the files are quite big (2-3 GB), traditional cache solutions like Memcached cannot be used.
Is there any other option to solve this problem? Is there a particular direction I should be looking in?
Any help will be greatly appreciated.