For a single file, can gsutil leave it partially uploaded if an error happens? Or, if an error occurs in the middle of an upload, does the whole upload fail, so that no incomplete content ends up in Google Cloud Storage?
If an error occurs in the middle of a transfer, no incomplete file will be present in Google Cloud Storage.
gsutil uses the Resumable Upload protocol. Until a resumable upload is finalized, the contents of the object being written are not present in your bucket.
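For illustration, here is a minimal sketch of the same behavior using the google-cloud-storage Python client (an assumption; the question is about gsutil, but both use resumable uploads for large files). The bucket and file names are placeholders.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")  # placeholder bucket name
    blob = bucket.blob("backup.tar")

    # Setting a chunk size forces a resumable (chunked) upload; gsutil does the
    # same automatically for files above its resumable threshold.
    blob.chunk_size = 8 * 1024 * 1024  # 8 MiB (must be a multiple of 256 KiB)

    # If this call fails partway through, no partial object appears in the
    # bucket; re-running it simply starts (or resumes) the upload again.
    blob.upload_from_filename("backup.tar")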
How would you go about organizing a process of zipping objects that reside in object storage?
For context, our users sometimes request an extraction of their entire data from the app - think of Twitter's "Download your Twitter archive" feature.
Our users are able to upload files, so the extracted data must contain files stored in object storage (Google Cloud Storage). The requested data must be packed into a single .zip archive.
A naive approach would look like this:
download all files from object storage to disk,
zip all files into an archive,
put the .zip back into object storage,
send the user a link to download the .zip file.
However, there are multiple disadvantages here:
sometimes the files for even a single user add up to gigabytes,
if the process of zipping is interrupted, it has to start over.
What's a reasonable way to design a process of generating a .zip archive of user files that originally reside in object storage?
Unfortunately, your naive approach is the only way because Cloud Storage offers no compute abilities. Archiving files requires compute, memory, and temporary storage.
The key item is to choose a service, such as Compute Engine, that can meet your file processing requirements: multi-gig files, fast processing (compression), and high-speed networking.
Another issue will be the time that it takes to download, zip, and upload. That means using an asynchronous event-based design. Start file processing and notify the user (email, message, web inbox, etc) once the file processing is complete.
You could make the process synchronous and display a progress bar, but that will complicate the design.
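If you go with that design, a worker along these lines could implement the download-zip-upload step. This is a minimal sketch assuming the google-cloud-storage Python client; the bucket names, the user prefix, and how you notify the user are placeholders.

    import os
    import tempfile
    import zipfile

    from google.cloud import storage

    def build_archive(user_prefix, source_bucket, dest_bucket):
        """Download a user's objects, zip them, and upload the archive."""
        client = storage.Client()

        with tempfile.TemporaryDirectory() as workdir:
            zip_path = os.path.join(workdir, "archive.zip")
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
                for blob in client.list_blobs(source_bucket, prefix=user_prefix):
                    local_path = os.path.join(workdir, "current.part")
                    blob.download_to_filename(local_path)
                    archive.write(local_path, arcname=blob.name)
                    os.remove(local_path)  # keep local disk usage bounded

            # Put the finished archive back into object storage and return its
            # name so the caller can build a download link for the user.
            out_name = user_prefix + "archive.zip"
            client.bucket(dest_bucket).blob(out_name).upload_from_filename(zip_path)
            return out_name

The worker would run asynchronously (for example on a Compute Engine instance fed by a task queue), and once it returns the archive name you would generate a download link and notify the user.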
I am using the following workflow, which leaves several copies of the original assets and blobs that should be cleaned up. I want to make sure I keep only the assets necessary to play back the videos that have been encoded. I am also wondering if there is a more efficient way of creating encoded assets. It seems the only improvement that could be made is uploading the blob directly to a media services container instead of having to copy the blob.
I am using the following workflow:
From my website, a video file is uploaded to a non-media-services container
After the file is uploaded, a queue message is created for the blob
An Azure WebJob receives the queue message
The uploaded blob is copied to the media services container
Create a media services asset from the copied blob
Start a media encoder job on the new asset for H264 Adaptive Bitrate MP4 Set 720p
After the job is complete, delete the original blob, the first asset, and the queue message
As you already mentioned, one optimization step is to eliminate uploading the media file to storage not associated with Media Services. Also, since you are already using Azure queues, you can use them to be notified when the job is done. With the proposed changes, your workflow will be:
In the UI, call asset creation before the upload starts.
The user uploads directly to the storage account associated with the Media Services account; see https://stackoverflow.com/a/28951408/774068
Once the upload is finished, trigger creation of the media job, with Azure queues associated with it. See https://learn.microsoft.com/en-us/azure/media-services/media-services-dotnet-check-job-progress-with-queues
Listen for the Azure queue message about job completion and delete the source asset once the message is received. You can use Azure Functions for this: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage
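A rough sketch of that last step, assuming the azure-storage-queue and azure-storage-blob Python packages: poll the job-notification queue and delete the original uploaded blob when a completion message arrives (cleaning up the source asset through the Media Services API would follow the same pattern). The connection string, queue name, and message fields are placeholders, not the actual notification format.

    import json
    import os

    from azure.storage.blob import BlobServiceClient
    from azure.storage.queue import QueueClient

    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    queue = QueueClient.from_connection_string(conn_str, "media-job-notifications")
    blobs = BlobServiceClient.from_connection_string(conn_str)

    for msg in queue.receive_messages():
        body = json.loads(msg.content)
        # Assumed message shape: {"state": "Finished", "sourceBlob": "uploads/video.mp4"}
        if body.get("state") == "Finished":
            container, _, name = body["sourceBlob"].partition("/")
            blobs.get_blob_client(container=container, blob=name).delete_blob()
        queue.delete_message(msg)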
This is the first time I have used Google Cloud, so I might be asking this question in the wrong place.
An information provider uploads a new file to Google Cloud Storage every day.
The file contains the information of all my clients/departments.
I have to sort through the information and create a new file (or files) containing the relevant information for each department in my company, so that everyone gets only the information relevant to them (security).
I can't figure out what are the steps I need to follow, to complete the task.
Can you help me?
You want to have a process that starts automatically and subsequently generates a new file once you upload something to Google Cloud Storage.
The easiest way to handle this is using Object Change Notifications. You can set up Object Change Notifications per bucket, and this will send a POST request to a URL that you define.
You can then easily set up a server (or run it on app engine) that will execute an action based on the POST request that it receives.
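As a sketch of such a server, here is a minimal webhook receiver using Flask (an assumption; any HTTP framework works). Object Change Notifications deliver the changed object's metadata as JSON in the POST body, with the change type in the X-Goog-Resource-State header.

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/gcs-notifications", methods=["POST"])
    def handle_notification():
        state = request.headers.get("X-Goog-Resource-State")
        if state == "exists":  # a new or overwritten object
            resource = request.get_json(silent=True) or {}
            # Kick off the per-department processing for this object here.
            print("New object: gs://%s/%s" % (resource.get("bucket"), resource.get("name")))
        return "", 200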
There is an even simpler option (although still in alpha) named Cloud Functions. Cloud Functions is a serverless service that provides event-based microservices (e.g. 'do this' when a new file is uploaded to GCS). This means you only have to write the code that defines what needs to happen when a new file is uploaded, and Cloud Functions will take care of executing it when you upload a file to GCS. See this tutorial on using Cloud Functions with Google Cloud Storage.
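A hedged sketch of what such a function might look like as a background Cloud Function in Python, triggered by an object-finalize event. The per-department splitting logic and the "department-reports" destination bucket are placeholders for whatever your file format actually requires.

    from google.cloud import storage

    def split_by_department(event, context):
        """Background Cloud Function triggered by google.storage.object.finalize."""
        client = storage.Client()
        source = client.bucket(event["bucket"]).blob(event["name"])
        text = source.download_as_text()

        # Placeholder logic: assume one CSV-like line per record, department first.
        per_department = {}
        for line in text.splitlines():
            dept = line.split(",", 1)[0]
            per_department.setdefault(dept, []).append(line)

        dest = client.bucket("department-reports")  # placeholder bucket
        for dept, lines in per_department.items():
            dest.blob(dept + "/" + event["name"]).upload_from_string("\n".join(lines))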
I use gcloud node v0.24 for interacting with Google Cloud Storage. I've encountered an issue where listing immediately after an upload doesn't return all the files that were uploaded.
So the question is
does Bucket#getFiles always list files right after Bucket#upload?
or
is there a delay between upload's callback and when the file becomes available (e.g. can be listed or downloaded)?
Note: below answer is no longer up to date -- GCS object listing is strongly consistent.
Google Cloud Storage provides strong global consistency for all read-after-write, read-after-update, and read-after-delete operations, including both data and metadata. As soon as you get a success response from an upload message, you may immediately read the object.
However, object and bucket listing is only eventually consistent. Objects will show up in a list call after you upload them, but not necessarily immediately.
In other words, if you know the name of an object that you have just uploaded, you can immediately download it, but you cannot necessarily discover that object by listing the objects in a bucket immediately.
For more, see https://cloud.google.com/storage/docs/consistency.
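To make the distinction concrete, here is a small sketch in Python (an assumption; the question uses the Node client, but consistency is a property of the service, not the library). The bucket name is a placeholder.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")  # placeholder

    blob = bucket.blob("report.csv")
    blob.upload_from_string("id,value\n1,42\n")

    # Read-after-write is strongly consistent: this download succeeds immediately.
    print(blob.download_as_text())

    # Listing is also strongly consistent today, so the new object shows up here;
    # under the older eventual-consistency model it might not have appeared yet.
    print([b.name for b in client.list_blobs("my-bucket", prefix="report")])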
Is there a way to stream MP3s stored on Amazon S3 via a Flash widget embedded in a website, or some other method?
Yes, there is. Firstly, you need to create a bucket in your S3 account that is all lower case, globally unique, and DNS-compatible; for example, I created a bucket called ‘media.torusknot.com’.
Then, to make it all look nice, you need to create a DNS CNAME entry mapping a sub-domain of your site to that S3 bucket. That will allow you to access the files you upload to that S3 bucket via ‘http://media.torusknot.com/somefile.mp3’. You do need to set the ACLs on the files and the bucket to make sure public access is allowed.
Finally, if you want to stream video files via a Flash player from S3 to another domain, you also have to tell Flash that it’s ok for the content to be pulled in from a different domain. Create a file called ‘crossdomain.xml’ in the bucket, with these contents:
<cross-domain-policy>
  <allow-access-from domain="*"/>
  <site-control permitted-cross-domain-policies="all"/>
</cross-domain-policy>
That allows the media to be accessed from anywhere - you can be more specific if you want but this is the simplest approach.
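As a sketch, you could upload that policy file with a public-read ACL using boto3 (an assumption; the S3 console or any other client works just as well). The bucket name matches the example above.

    import boto3

    policy = (
        '<?xml version="1.0"?>\n'
        "<cross-domain-policy>\n"
        '  <allow-access-from domain="*"/>\n'
        '  <site-control permitted-cross-domain-policies="all"/>\n'
        "</cross-domain-policy>\n"
    )

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="media.torusknot.com",  # the bucket from the example above
        Key="crossdomain.xml",
        Body=policy.encode("utf-8"),
        ContentType="text/xml",
        ACL="public-read",
    )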
Related resources:
Using Amazon Web Services
Streaming Media From Amazon S3
To update the answer to this question, if you want to actually STREAM to clients, you can use Amazon CloudFront on top of your S3 bucket (as mentioned by Rudolf). Create a "streaming distribution" in CloudFront that points to your S3 bucket.
This will stream via RTMP (good for web and Android devices). You can use JW Player or a similar player to play the streamed files.
CloudFront streaming uses Adobe Flash Media Server 3.5.
There is also the ability to play secure content using signed urls.
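A hedged sketch of generating such a signed URL with botocore's CloudFrontSigner and the cryptography package (both assumptions); the key pair ID, private key file, distribution domain, and object name are placeholders.

    from datetime import datetime, timedelta

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def rsa_signer(message):
        # CloudFront signed URLs are signed with SHA-1/RSA using the private
        # key of a CloudFront key pair.
        with open("cloudfront-private-key.pem", "rb") as key_file:
            key = serialization.load_pem_private_key(key_file.read(), password=None)
        return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner("APKAEXAMPLEKEYID", rsa_signer)  # placeholder key pair ID
    url = signer.generate_presigned_url(
        "https://d111111abcdef8.cloudfront.net/song.mp3",  # placeholder URL
        date_less_than=datetime.utcnow() + timedelta(hours=1),
    )
    print(url)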