I need to use Google Cloud Storage to store some files that may contain sensitive information. File names will be generated using a cryptographic function and will thus be unguessable. The files will be made public.
Is it safe to assume that the file list will not be available to the public? I.e. a file can only be accessed by someone who knows its name.
I have, of course, tried accessing the parent directory and the bucket, and I do get rejected with an unauthenticated error. I am wondering whether there is, or will ever be, any other way to list the files.
Yes, that is a valid approach to security through obscurity. As long as the ACL to list the objects in a bucket is locked down, your object names should be unguessable.
However, you might consider using Signed URLs instead. They can have an expiration time set, which provides extra security in case your URLs are leaked.
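As a rough illustration, here is a minimal sketch of generating such a signed URL with the google-cloud-storage Python client; the bucket and object names are placeholders, and the client must be running with credentials that can sign URLs.

```python
# Hedged sketch: generate a V4 signed URL for a single object.
# "my-bucket" and "unguessable-object-name" are illustrative placeholders.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("unguessable-object-name")

# Anyone holding this URL can GET the object for 15 minutes,
# even though the bucket itself stays private.
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),
    method="GET",
)
print(url)
```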
Yes, but keep in mind that the ability to list the objects in a bucket is available to anyone with read permission or better on the bucket itself. If your object names are secret, make sure to keep the bucket's read permissions locked down as much as possible.
jterrace's suggestion about preferring signed URLs is a good one. The major downside to obscure object names is that it's very difficult to remove access to a particular entity later without deleting the resource entirely.
I'm hoping someone can offer advice on bucket creation for an app where each user will have an album of photos. I was initially thinking of creating a single bucket and prefixing each filename with the user ID, since Google Cloud Storage doesn't have real subdirectories, like so: /bucket-name/user-id1/file.png
Alternatively, I was considering creating one bucket per user and naming it after the user ID, like so: /user-id1-which-is-also-bucket-name/file.png
I was wondering what I should consider in terms of cost and organization when setting up my Google Cloud Storage. Thank you!
There is no difference in terms of cost. In terms of organization, it's different:
For deletion, it's simpler to delete a whole bucket than a folder inside a single shared bucket.
For performance, sharding is better if you have separate buckets (there is less chance of creating a hotspot).
From a billing perspective, you can add labels to the buckets and get them in the billing export to BigQuery. That way you know the cost of each user's bucket and can rebill them if needed.
The biggest advantage of the one-bucket-per-user model is security. You can grant a user access at the bucket level (if users access the bucket directly rather than going through a backend service), without using the legacy (and nearly deprecated) object ACLs. In addition, if you do use ACLs, you can't set an ACL per folder; ACLs are per object. So every time you add an object to the single shared bucket, you have to set the ACL on it, which is harder to manage (see the sketch after this answer).
IMO, 1 bucket per user is the best model.
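Here is a minimal sketch of the bucket-level grant mentioned above, using the google-cloud-storage Python client; the bucket name and user email are assumptions for illustration.

```python
# Grant one user read access to "their" bucket via IAM (no object ACLs needed).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("user-id1-bucket")  # hypothetical per-user bucket

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",        # read-only on objects
    "members": {"user:user-id1@example.com"},    # placeholder identity
})
bucket.set_iam_policy(policy)
```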
I know that the title is not quite right, but I don't know what to call this problem...
Currently I'm trying to design my first REST API for a conversion service. The user submits an input file to the server and gets back the converted file.
The problem I have is that the converted file should be accessible with a simple GET /conversionservice/my/url. However, it is not possible to upload the input file within a GET request. A POST would be necessary (am I right?), but POST isn't cacheable.
Now my question is: what's the right way to design this? I know that I could upload the input file to the server beforehand and then access it with my GET request, but those input files could be anything!
Thanks for your help :)
A POST request is indeed needed for a file upload. The fact that it is not cacheable should not bother the service, because how could any intermediary (the browser, the server, a proxy, etc.) know about the content of the file? If you need cacheability, you would have to implement it yourself, probably with a hash (MD5, SHA-1, etc.) of the uploaded file. This would keep you from having to perform the actual conversion twice, but you would have to hash each file that was uploaded, which would slow you down on a cache miss.
The only other way I can think of to solve the problem would be to require the user to pass an accessible URL to the file in the query string; then you could handle GET requests, but your users would have to make the file accessible over the internet. This would allow caching but limit usability.
Perhaps a hybrid approach would be possible where you accept a POST for a file upload and a GET for a URL; this would increase the complexity of the service but maximize usability.
Also, you should look into which caches you are interested in leveraging, as a lot of them have limits on the size of a cache entry, meaning that a sufficiently large file would not be cached anyway.
In the end, I would advise you to stick to the standards already established. Accept the POST request for the file upload, and if you are interested in speeding up the user experience, maybe make the upload persist; this would allow the user to upload a file once and download it in many different formats.
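A rough sketch of the hash-based caching idea, using Flask; the endpoint paths, the in-memory cache, and the convert() helper are assumptions, not part of the question.

```python
# Hypothetical conversion service: POST uploads and caches by content hash,
# GET serves the converted result by document key.
import hashlib
from flask import Flask, request, send_file, abort

app = Flask(__name__)
CACHE = {}  # sha256 hex digest -> path of the converted file

def convert(data: bytes) -> str:
    """Placeholder for the real conversion; returns the path of the result."""
    raise NotImplementedError

@app.post("/conversionservice/convert")
def upload():
    data = request.files["file"].read()
    key = hashlib.sha256(data).hexdigest()   # cache key = content hash
    if key not in CACHE:                     # cache miss: convert only once
        CACHE[key] = convert(data)
    # Return the document key; the client fetches the result via GET.
    return {"key": key}, 201

@app.get("/conversionservice/result/<key>")
def download(key):
    path = CACHE.get(key)
    if path is None:
        abort(404)
    return send_file(path)
```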
Your sequence of events could be as follows:
Upload your file or files using POST. For an immediate response, you can return the required information using your own headers. (The upload should return a document key that is used to access the file later.)
Then you can use GET for further operations, passing the above-mentioned document key as a query string.
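For illustration, a minimal client-side sketch of that POST-then-GET flow; the URLs, filenames, and the "key" field are assumptions.

```python
# Upload a file, read the document key from the response, then fetch the result.
import requests

with open("input.docx", "rb") as f:
    resp = requests.post(
        "https://example.com/conversionservice/convert",
        files={"file": f},
    )
resp.raise_for_status()
key = resp.json()["key"]  # document key returned by the upload

result = requests.get(f"https://example.com/conversionservice/result/{key}")
result.raise_for_status()
with open("output.pdf", "wb") as out:
    out.write(result.content)
```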
I realized that after creating a bucket with the default settings, anyone who knows the bucket name is able to check whether a file exists.
Example:
Someone tries the url https://storage.googleapis.com/bucket_name/file_name
If the file doesn't exist the message shown is "The specified key does not exist"
If the file does exist the message is "Anonymous callers do not have storage.objects.get access to object bucket_name/file_name"
This makes it easy to discover filenames stored in a bucket, so the privacy of the bucket's contents is not complete.
I also use S3 storage, where the message is "Access denied" in both cases, so there is no way to know whether the file is really there.
Is there any way to disable this behavior?
Thanks
Sorry, but there is currently no way to get "access denied" in both cases.
Note that even if this did exist, it would not necessarily prevent a timing attack from determining whether the object existed or not. For that reason, it is recommended that you don't store sensitive data in object names, and that you obfuscate object names if determining their existence represents a risk to your business.
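If you do go the obfuscation route, here is a hedged sketch of one way to generate unguessable object names with Python's standard library; the extension and prefix handling are illustrative assumptions.

```python
# Generate a cryptographically random, URL-safe object name.
import secrets

def obfuscated_name(extension: str = ".bin") -> str:
    # 32 bytes of randomness -> roughly 43 URL-safe characters,
    # unguessable in practice.
    return secrets.token_urlsafe(32) + extension

print(obfuscated_name(".png"))
```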
Is there a way to configure a container so that for a certain user it allows creation of new objects, but denies deletion and modification of existing objects?
My case is that I provide a web service which receives and serves files using remote OpenStack Swift storage, and I want to ensure that, if the credentials at the web-service level are compromised, whoever gains access to them is not able to alter existing files.
To the best of my knowledge, it is not possible to deny a user the ability to delete or update existing objects in a container when the same credentials allow uploading objects to it.
But you can write a Java API and expose it to the user for uploading files, while internally you upload the file using the real set of credentials. Do not expose the functions that the user is not supposed to call (delete/update, etc.). You can keep all your credentials in the code (better if encrypted). This way you may achieve what you want, but it is a workaround.
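A rough sketch of that wrapper idea (in Python with python-swiftclient rather than Java): only an upload operation is exposed, and the Swift credentials stay server-side. The auth endpoint, account names, and container are assumptions.

```python
# Upload-only wrapper around Swift: no delete/update functions are exposed.
from swiftclient.client import Connection

SWIFT_CONN = Connection(
    authurl="https://swift.example.com/auth/v1.0",  # assumed auth endpoint
    user="service-account",                          # placeholder credentials,
    key="service-password",                          # better loaded from a secret store
)

def upload_only(container: str, object_name: str, data: bytes) -> None:
    """Expose uploads to callers, but deliberately nothing else."""
    SWIFT_CONN.put_object(container, object_name, contents=data)
```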
I must provide a solution where users can upload files, which must be stored together with some metadata, and this may grow really big.
Access to these files must be controlled, so they want me to just store them in DB BLOBs, but I fear PostgreSQL won't handle it well over time.
My first idea was to use some NoSQL DB solution, but I couldn't find any that would replace a good RDBMS and also elegantly store files. Then I thought of just saving the files on disk somewhere the web server won't serve them, naming them after their table ID, and simply loading them into RAM and returning them with the proper content type.
Could anyone suggest a better solution for this?
I had the requirement to store many images (with some meta data) and allow controlled access to them, here is what I did.
To the cloud™
I save the image files in Amazon S3. My local database holds the metadata, with the S3 location of the file as one column. When an authenticated and authorized user needs to see the file, they hit a URL in my system (where the authentication and authorization checks occur), which then generates a pre-signed, expiring URL for the image and sends a redirect back to the browser. The browser is then able to load the image for a given amount of time (as specified in the signature within the URL).
With this solution I have user-level access to the resources, and I don't have to store them as BLOBs or anything like that which might grow unwieldy over time. I also don't use MY bandwidth to stream the files to the client, and I get cheap, redundant storage for them. Obviously the suitability of this solution will depend on the nature of the binary files you are looking to store and your level of trust in Amazon. The world doesn't end if there is a slip and someone sees an image from my system they shouldn't. YMMV.
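For concreteness, a minimal sketch of that redirect flow with boto3 and Flask; the route, bucket name, key lookup, and auth check are placeholders, not the poster's actual code.

```python
# Authorized request comes in, we look up the S3 key, sign a short-lived URL,
# and redirect the browser to it.
import boto3
from flask import Flask, redirect

app = Flask(__name__)
s3 = boto3.client("s3")

def lookup_s3_key(image_id: str) -> str:
    """Placeholder: fetch the S3 key for this image from the local database."""
    raise NotImplementedError

@app.get("/images/<image_id>")
def serve_image(image_id):
    # ... authentication / authorization checks would go here ...
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-image-bucket", "Key": lookup_s3_key(image_id)},
        ExpiresIn=300,  # URL stays valid for 5 minutes
    )
    return redirect(url)
```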