Drive REST API + Shared File + "FileSize"? - rest

We've noticed that when we GET a shared file using the REST API (whether we shared it or it was shared with us), the fileSize field is missing from the response entirely, which is a real problem for our use case.
We can certainly download the file via one of the various links in the response, and then detect the size, but that would require processing on our backend, and we'd much prefer to fail out (if need be) on our frontend before reaching that point.
Per the API documentation:
fileSize (long): The size of the file in bytes. This is only populated for files with content stored in Drive.
Are shared files not technically "stored" in Google Drive?
Thanks all!
-Cory

Related

How to zip objects in an object storage

How would you go about organizing a process for zipping objects that reside in object storage?
For context, our users sometimes request an extraction of their entire data from the app - think of "Downloading Twitter archive" feature of Twitter.
Our users are able to upload files, so the extracted data must contain files stored in object storage (Google Cloud Storage). The requested data must be packed into a single .zip archive.
A naive approach would look like this:
download all files from object storage to disk,
zip all files into an archive,
put the .zip back in object storage,
send the user a link to download the .zip file.
However, there are multiple disadvantages here:
sometimes the files for even a single user add up to gigabytes,
if the process of zipping is interrupted, it has to start over.
What's a reasonable way to design a process for generating a .zip archive of user files that originally reside in object storage?
Unfortunately, your naive approach is the only way because Cloud Storage offers no compute abilities. Archiving files requires compute, memory, and temporary storage.
The key item is to choose a service, such as Compute Engine, that can meet your file processing requirements: multi-gig files, fast processing (compression), and high-speed networking.
Another issue will be the time that it takes to download, zip, and upload. That means using an asynchronous event-based design. Start file processing and notify the user (email, message, web inbox, etc) once the file processing is complete.
You could make the process synchronous and display a progress bar, but that will complicate the design.
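As a rough illustration of the download-and-zip step, here is a minimal sketch that writes each object into the archive chunk by chunk, so a worker never needs to hold a whole file in memory. It still needs compute, as noted above. The `fake_object` generator is a hypothetical stand-in for a chunked download from Cloud Storage, and `zip_streams` is an illustrative name, not part of any Google library.

```python
import io
import zipfile
from typing import Iterable, Iterator, Tuple


def zip_streams(objects: Iterable[Tuple[str, Iterator[bytes]]], out) -> int:
    """Write each (name, chunk iterator) pair into a zip archive on `out`.

    Returns the number of entries written.
    """
    count = 0
    with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in objects:
            # Copy one member at a time, chunk by chunk, so only a single
            # chunk is held in memory. force_zip64 allows members over 4 GiB.
            with archive.open(name, mode="w", force_zip64=True) as member:
                for chunk in chunks:
                    member.write(chunk)
            count += 1
    return count


def fake_object(data: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Hypothetical stand-in for a chunked download from object storage."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


buffer = io.BytesIO()  # in a real job this would be a file or an upload stream
written = zip_streams(
    [("a.txt", fake_object(b"hello world")), ("b.txt", fake_object(b"second file"))],
    buffer,
)
```

In a real job the iterators would come from the storage client and `out` would be a temporary file or a resumable upload. Restarting after an interruption is not handled here.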

A method for linking a server side file to a Squarespace page?

I'm trying to build a website on Squarespace, in which the site links to a database file. It's stored in a standard file system on a server tower with cluster. No SQL architecture or anything that I explicitly know of. Unfortunately Google Drive isn't an option due to the size of the file ( > 200 GB). I'm rather lost due to the size constraint -- does anyone have an idea about how to do this? Can I set up some sort of server query using a link on the site? Can I upload the file from my computer and store it somewhere in the backend? Thanks.
"...the size of the file ( > 200 GB)..."
Unfortunately, Squarespace's own upload limitations are far below this for the two places where files like that can be stored: file storage (20MB) and the developer-mode '/assets' folder (10MB). The CSS-/Style-related storage only supports images (and likely has a limit of less than 20MB). Digital download products can be 300MB (still too small for your file) and likely can't be linked to and accessed as you'd need for your application.
"...Can I set up some sort of server query using a link on the site?..."
If you mean a query on some other service besides Squarespace which connects to the file hosted on your Squarespace site, the answer is no, simply because there's no way to upload the file to Squarespace due to its size. If, however, you mean a query from your Squarespace site to the file hosted elsewhere, then this must be done using JavaScript and done entirely client-side due to Squarespace's lack of support for server-side languages.
"...Can I upload the file from my computer and store it somewhere in the backend?..."
See the options mentioned above, though all have file size limits below that of your file.
If you are able to use the file on your site with client-side/front-end JavaScript only, then perhaps you could host the file on Amazon S3 or another such provider and access it that way.

REST - GET-Response with temporary uploaded File

I know that the title is not quite right, but I don't know how to name this problem...
Currently I'm trying to design my first REST API for a conversion service. The user has an input file, sends it to the server, and gets back the converted file.
The problem is that the converted file should be accessible with a simple GET /conversionservice/my/url. However, it is not possible to upload the input file within a GET request. A POST would be necessary (am I right?), but POST isn't cacheable.
Now my question is, what's the right way to design this? I know that it would be possible to upload the input file to the server first and then access it with my GET request, but those input files could be anything!
Thanks for your help :)
A POST request is indeed needed for a file upload. The fact that it is not cacheable should not bother the service, because how could any intermediaries (the browser, the server, a proxy etc.) know about the content of the file? If you need cacheability, you would have to implement it yourself, probably with a hash (md5, sha1 etc.) of the uploaded file. This would keep you from having to perform the actual conversion twice, but you would have to hash each uploaded file, which would slow you down on a "cache miss".
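A minimal sketch of that hash-keyed cache, under some assumptions: `ConversionCache` is a hypothetical name, results live in an in-memory dict rather than real storage, and SHA-256 is used in place of the md5/sha1 mentioned above. The `convert` callable stands in for the actual conversion.

```python
import hashlib
from typing import Callable, Dict


class ConversionCache:
    """Cache conversion results keyed by a hash of the uploaded bytes."""

    def __init__(self, convert: Callable[[bytes], bytes]) -> None:
        self._convert = convert
        self._results: Dict[str, bytes] = {}
        self.misses = 0  # number of times the real conversion had to run

    def get(self, uploaded: bytes) -> bytes:
        # Every upload is hashed, which is the extra cost on a cache miss.
        key = hashlib.sha256(uploaded).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._convert(uploaded)
        return self._results[key]


# Placeholder conversion (upper-casing) standing in for the real service.
cache = ConversionCache(convert=lambda data: data.upper())
first = cache.get(b"some input file")
second = cache.get(b"some input file")  # same bytes, served from the cache
```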
The only other way I could think of to solve the problem would be to require the user to pass in an accessible url to the file in the query string, then you could handle GET requests, but your users would have to make the file accessible over the internet. This would allow caching but limit the usability.
Perhaps a hybrid approach would be possible where you accepted a POST for a file upload and a GET for a url, this would increase the complexity of the service but maximize usability.
Also, you should look into what caches you are interested in leveraging as a lot of them have limits on the size of a cache entry meaning if the file is sufficiently large it would not cache anyway.
In the end, I would advise you to stick to the standards already established. Accept the POST request for the file upload and if you are interested in speeding up the user experience maybe make the upload persist, this would allow the user to upload a file once and download it in many different formats.
Your sequence of events can be as follows:
Upload your file/files using POST. For an immediate response, you can return the required information using your own headers. (It should return a document key to access the file for future use.)
Then you can use GET for further operations, passing the above-mentioned document key as a query string.
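The two steps above could be sketched like this, with the HTTP layer left out. `UploadStore`, `handle_post`, `handle_get`, and the toy `convert` function with its two formats are all hypothetical names chosen for illustration. A real service would persist uploads and expire them.

```python
import uuid
from typing import Dict, Optional


def convert(data: bytes, target_format: str) -> bytes:
    """Toy conversion standing in for the real service (assumed formats)."""
    if target_format == "upper":
        return data.upper()
    if target_format == "reversed":
        return data[::-1]
    raise ValueError(f"unsupported format: {target_format}")


class UploadStore:
    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def handle_post(self, body: bytes) -> str:
        # Store the upload and hand back a document key for later GETs.
        key = uuid.uuid4().hex
        self._files[key] = body
        return key

    def handle_get(self, key: str, target_format: str) -> Optional[bytes]:
        # The key arrives as a query parameter. Unknown keys yield None,
        # which a real handler would turn into a 404.
        original = self._files.get(key)
        if original is None:
            return None
        return convert(original, target_format)


store = UploadStore()
doc_key = store.handle_post(b"abc")
as_upper = store.handle_get(doc_key, "upper")
as_reversed = store.handle_get(doc_key, "reversed")
```

Because the GET depends only on the key and the target format, its response can be cached by ordinary HTTP caches.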

Delta for file upload

I would like to synchronize uploads from our own server to our clients' Dropboxes, to which we have full access. Syncing changes made on Dropbox is easy because I can use the delta call, but I need a more efficient way to identify and upload changes made locally to Dropbox.
The Sync API would be amazing for this, but I'm not trying to make a mobile app, so the languages the API supports are not easily accessible (AFAIK). Is there an equivalent to the Sync API for Python running on a Linux server?
Possible solution:
So far, I was thinking of using anydbm to store string-to-string dictionaries that hold folder names as the keys and the hashes returned by the server's metadata call as the values. Then I could query Dropbox and, every time I run into a folder, compare its hash with the one stored in the anydbm:
if there is a difference, compare the file dates/sizes in the folder and, if there are any subfolders, recurse into them,
if it is the same, skip the folder.
This should save a substantial amount of time compared to the current verification of each and every file, but if there are better solutions, please do let me know.
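A minimal sketch of that folder walk, with several assumptions: the remote metadata is represented as a nested dict with hypothetical `path`, `hash`, and `folders` keys rather than real Dropbox responses, and a plain dict stands in for the anydbm store (the module is named `dbm` in Python 3 and returns bytes values, which would need decoding). It also assumes, as the plan does, that an unchanged folder hash means nothing beneath that folder needs checking, which should be verified against the real metadata hash.

```python
from typing import List, MutableMapping


def changed_folders(folder: dict, stored: MutableMapping[str, str]) -> List[str]:
    """Return paths of folders whose hash differs from the stored one.

    Subtrees under a folder with an unchanged hash are skipped entirely.
    """
    path, current = folder["path"], folder["hash"]
    if stored.get(path) == current:
        return []  # same hash: skip this folder and everything below it
    changed = [path]  # the caller would compare file dates/sizes here
    for sub in folder.get("folders", []):
        changed.extend(changed_folders(sub, stored))
    stored[path] = current  # remember the new hash for the next run
    return changed


# Hypothetical metadata tree and previously stored hashes.
remote = {
    "path": "/",
    "hash": "r2",
    "folders": [
        {"path": "/docs", "hash": "d1", "folders": []},
        {"path": "/pics", "hash": "p2", "folders": []},
    ],
}
stored_hashes = {"/": "r1", "/docs": "d1", "/pics": "p1"}

first_run = changed_folders(remote, stored_hashes)
second_run = changed_folders(remote, stored_hashes)
```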

Best DB solution for storing large files

I must provide a solution where users can upload files that are stored together with some metadata, and the total volume may grow really big.
Access to these files must be controlled, so they want me to just store them in DB BLOBs, but I fear PostgreSQL won't handle that well over time.
My first idea was to use some NoSQL DB solution, but I couldn't find any that would replace a good RDBMS and elegantly store files alongside it. Then I thought of just saving these files on disk somewhere the web server won't serve them, naming each after its table ID, and loading them into RAM and sending them with the proper content type.
Could anyone suggest a better solution?
I had the requirement to store many images (with some meta data) and allow controlled access to them, here is what I did.
To the cloud™
I save the image files in Amazon S3. My local database holds the metadata with the S3 location of the file as one column. When an authenticated and authorized user needs to see the file they hit a URL in my system (where the authentication and authorization checks occur) which then generates a pre-signed, expiring URL for the image and sends a redirect back to the browser. The browser is then able to load the image for a given amount of time (as specified in the signature within the URL.)
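To illustrate the idea of a signed, expiring URL, here is a self-contained sketch using an HMAC over the path and expiry time. This is not Amazon's actual signing scheme (in practice the AWS SDK generates the pre-signed URL for you). The secret, the parameter names `expires` and `signature`, and both function names are assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key known only to the signing server


def _signature(path: str, expires_at: int) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int) -> str:
    """Return `path` with an expiry timestamp and signature appended."""
    query = urlencode({"expires": expires_at, "signature": _signature(path, expires_at)})
    return f"{path}?{query}"


def verify_url(url: str, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires_at)
    return hmac.compare_digest(signature, expected) and now < expires_at


# Issue a URL valid for 300 seconds from a fixed "current time" of 1000.
url = sign_url("/images/42.png", expires_at=1000 + 300)
valid_now = verify_url(url, now=1000)
valid_later = verify_url(url, now=2000)
valid_tampered = verify_url(url.replace("/images/42.png", "/images/43.png"), now=1000)
```

The application server performs the authentication and authorization checks, then redirects the browser to a URL like this. The storage side only has to check the signature and the expiry.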
With this solution I have user level access to the resources and I don't have to store them as BLOBs or anything like that which may grow unwieldy over time. I also don't use MY bandwidth to stream the files to the client and get cheap, redundant storage for them. Obviously the suitability of this solution will depend on the nature of the binary files you are looking to store and your level of trust in Amazon. The world doesn't end if there is a slip and someone sees an image from my system they shouldn't. YMMV.