how to plan google cloud storage bucket creation when working with users - google-cloud-storage

I'm hoping someone can offer advice on bucket creation for an app where each user will have an album of photos. I was initially thinking of creating a single bucket and prefixing each filename with the user id, since Google Cloud Storage doesn't have real subdirectories, like so: /bucket-name/user-id1/file.png
Alternatively, I was considering creating a bucket and naming it by user id like so: /user-id1-which-is-also-bucket-name/file.png
I was wondering what I should consider in terms of cost and organization when setting up my google cloud storage. Thank you!
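For reference, the single-bucket, prefixed-object-name approach the question describes looks roughly like this with the Python client library (the bucket name and user id are made-up placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-photo-app")  # hypothetical bucket name

# GCS has a flat namespace: "user-id1/file.png" is just an object name;
# the "folder" is only a naming convention.
blob = bucket.blob("user-id1/file.png")
blob.upload_from_filename("file.png")

# List one user's photos by prefix.
for photo in client.list_blobs("my-photo-app", prefix="user-id1/"):
    print(photo.name)
```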

There is no difference in terms of cost. In terms of organization, it's different:
For deletion, it's simpler to delete a whole bucket than a folder inside a single shared bucket.
For performance, sharding is better if you have separate buckets (you are less likely to create a hotspot).
From a billing perspective, you can add labels to the buckets and get them in the billing export to BigQuery. You can then know the cost of each user's bucket and, if you want, rebill it to them.
The biggest advantage of the one-bucket-per-user model is security. You can grant a user access on their bucket (if users access the bucket directly and don't go through a backend service) without using the legacy (and almost deprecated) object ACLs. In addition, with ACLs you can't set permissions per folder; ACLs are per object. So every time you add an object to the shared bucket, you have to set the ACL on it, which is much harder to manage. (A minimal sketch of the per-user setup follows this answer.)
IMO, one bucket per user is the best model.
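A minimal sketch of that per-user-bucket model with the Python client library; the project id, user id, email domain, and the choice of roles/storage.objectViewer are assumptions for illustration, not a prescription:

```python
from google.cloud import storage

client = storage.Client(project="my-project")  # hypothetical project id

user_id = "user-id1"  # in practice, make the bucket name globally unique (e.g. add an app prefix)

# One bucket per user; label it so the BigQuery billing export can be
# grouped (and potentially rebilled) per user.
bucket = client.bucket(user_id)
bucket.labels = {"owner": user_id}
new_bucket = client.create_bucket(bucket, location="EU")

# Grant the user read access with IAM on the bucket itself,
# instead of setting an ACL on every object.
policy = new_bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {f"user:{user_id}@example.com"},  # hypothetical account
})
new_bucket.set_iam_policy(policy)
```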

Related

How to apply upload limit for google storage bucket per day/month/etc

Is there a way to apply an upload limit to a Google Storage bucket per day/month/year?
Is there a way to apply a limit on the amount of network traffic?
Is there a way to apply a limit on Class A operations?
Is there a way to apply a limit on Class B operations?
I only found "Queries per 100 seconds per user" and "Queries per day" using the
https://cloud.google.com/docs/quota instructions, but those are JSON API quotas.
(I'm not even sure which API is used inside the StorageClient C# client class.)
To define quotas (and, by the way, SLOs) you need SLIs: service level indicators. That means having metrics on whatever you want to observe.
That's not the case here. Cloud Storage has no indicator for the volume of data per day. So you have no built-in indicator or metrics... and no quotas.
If you want this, you have to build something of your own: wrap all the Cloud Storage calls in a service that counts the volume of blobs per day, and then apply your own rules to that custom indicator (sketched below).
Of course, to prevent any bypass, you have to deny direct access to the buckets and only grant your "indicator service" access to them. The same goes for bucket creation, so that new buckets get registered in your service.
Not an easy task...
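A minimal sketch of such an "indicator service", assuming a Python backend that owns all uploads. The daily limit and the in-memory counter are illustrative only; a real service would persist the counter and enforce it atomically:

```python
import datetime
from collections import defaultdict

from google.cloud import storage

DAILY_UPLOAD_LIMIT_BYTES = 10 * 1024**3  # example: 10 GiB per bucket per day

client = storage.Client()
_uploaded_today = defaultdict(int)  # (bucket, date) -> bytes; use a database in practice


def upload(bucket_name: str, blob_name: str, data: bytes) -> None:
    """Wrap every upload so the volume per bucket per day is counted."""
    key = (bucket_name, datetime.date.today())
    if _uploaded_today[key] + len(data) > DAILY_UPLOAD_LIMIT_BYTES:
        raise RuntimeError(f"daily upload quota exceeded for {bucket_name}")
    client.bucket(bucket_name).blob(blob_name).upload_from_string(data)
    _uploaded_today[key] += len(data)
```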

I want to link multiple domains to one bucket with gcs

I want to link multiple domains to one bucket with GCS.
However, the official documentation says the bucket name must be the domain itself, so it seems you cannot associate multiple domains with a single bucket.
Does anyone know a way around this?
GCS does not support this directly. Instead, you'd likely need to use Google Cloud Load Balancing with your GCS bucket as a backing store. With it, you get a dedicated, static IP address that you can map several domains to; it also lets you serve static and dynamic content under the same domain, and swap out which bucket is served at a given path. The main downside is added complexity and cost.
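As a rough sketch of that setup, the bucket is wrapped in a "backend bucket" which the load balancer then serves; the names below are placeholders, and the URL map, target proxy, forwarding rule, and certificates still have to be created separately (e.g. in the console):

```python
from google.cloud import compute_v1

client = compute_v1.BackendBucketsClient()
backend_bucket = compute_v1.BackendBucket(
    name="my-site-backend",       # placeholder name
    bucket_name="my-gcs-bucket",  # the existing GCS bucket
    enable_cdn=True,              # optional: cache at Google's edge
)
# Returns a long-running operation; once done, the backend bucket can be
# used as the default service of a URL map behind the load balancer.
operation = client.insert(project="my-project", backend_bucket_resource=backend_bucket)
```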

How do I share objects on GCS without being billed for downloads?

When objects are uploaded into a GCS bucket and shared publicly, the owner of the bucket is responsible for all of the costs of users downloading these objects. How do I change that so that the downloaders are billed instead of me?
This feature is called "Requester Pays." Its documentation is here: https://cloud.google.com/storage/docs/requester-pays
The idea is that you mark a bucket as a "requester pays" bucket. Once you've done that, your project is only responsible for the cost of storing the objects in the bucket (and, if it's a nearline or coldline bucket, any early deletion fees). Anyone who wants to download an object from this bucket (or upload a new object, copy an object, etc.) must specify which of their projects GCS should bill for the operation.
This is a very useful configuration for situations where you want to make objects publicly available but don't want to be responsible for the cost of distributing them to many end users.
To enable Requester Pays on a bucket, open the Cloud Storage browser, find your bucket, click the "off" button in the "Requester Pays" column, and follow the prompts. You can also set this flag in other ways; see the docs: https://cloud.google.com/storage/docs/using-requester-pays#enable
Downloading objects from requester pays buckets requires a Google Cloud project with billing enabled. Once you have that, you can download the object from the cloud console or using gsutil:
$> gsutil -u [PROJECT_ID] cp gs://[BUCKET_NAME]/[OBJECT_NAME] [OBJECT_DESTINATION]
The trick to this command is the -u [PROJECT_ID] bit, which specifies which project should be billed for the download.
You can also download the object using our other APIs or with the cloud console. More in the docs: https://cloud.google.com/storage/docs/using-requester-pays#using
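In the Python client library, the same two steps look roughly like this (project ids, bucket, and object names are placeholders); the user_project parameter plays the role of gsutil's -u flag:

```python
from google.cloud import storage

# Owner side: turn on Requester Pays for the bucket.
owner_client = storage.Client(project="owner-project")
bucket = owner_client.get_bucket("example-bucket")
bucket.requester_pays = True
bucket.patch()

# Downloader side: say which project gets billed for the request.
requester_client = storage.Client(project="requester-project")
paid_bucket = requester_client.bucket("example-bucket", user_project="requester-project")
paid_bucket.blob("some-object.png").download_to_filename("some-object.png")
```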

Google storage public file security - access without a link?

I need to use Google Cloud Storage to store some files that can contain sensitive information. File names will be generated with a crypto function, and thus be unguessable. The files will be made public.
Is it safe to assume that the file list will not be available to the public, i.e. that a file can only be accessed by someone who knows its name?
I have of course tried accessing the parent directory and the bucket, and I do get rejected with an unauthenticated error. I am wondering if there is, or ever will be, any other way to list the files.
Yes, that is a valid approach to security through obscurity. As long as the ACL to list the objects in a bucket is locked down, your object names should be unguessable.
However, you might consider using Signed URLs instead. They can have an expiration time set so it provides extra security in case your URLs are leaked.
Yes, but keep in mind that the ability to list objects in a bucket is allowed for anyone with read permission or better on the bucket itself. If your object names are secret, make sure to keep the bucket's read permissions locked down as much as possible.
jterrace's suggestion about preferring signed URLs is a good one. The major downside to obscure object names is that it's very difficult to remove access to a particular entity later without deleting the resource entirely.
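For comparison, generating a time-limited signed URL with the Python client library looks like this; it requires credentials that can sign (e.g. a service account key), and the bucket and object names are placeholders:

```python
import datetime

from google.cloud import storage

client = storage.Client()
blob = client.bucket("example-bucket").blob("unguessable-name.png")

# Anyone with this URL can read the object, but only until it expires.
url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(minutes=15),
    method="GET",
)
print(url)
```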

Amazon S3 + CloudFront Queries

I am currently making a social-sharing-type app and I've run into a problem.
First off, S3 in my experience is slow, so I need to sync the data to multiple servers around the world to make it faster for users everywhere.
So my question is: do I need to create a bucket for each country? Amazon has a list of their server locations. So for each user, do I calculate the nearest server and then upload there? How?
Next question: in my app people can subscribe to others and check for their updates. Realistically, this wouldn't create any speed difference. If someone in Singapore uploads a piece of text and has a subscriber in the United States, it wouldn't be any quicker for that subscriber, because they have to download a piece of text stored all the way in Singapore.
All of this is making me confused! I personally find S3 very slow, which is why I am using CloudFront.
Any help? Am I misunderstanding the process? Thanks!
Buckets are not per country, they are per region (EU, US, Asia, etc.)
Secondly, you do not have to work out the closest URL to your S3 buckets; that's what CloudFront is for. You just get a single URL for each bucket, and CloudFront routes the user's request to the closest edge location.
PS: In addition, Amazon replicates data uploaded to your bucket across all edge locations transparently.
Amazon in no way "automatically" replicates your content out to the edge locations. Instead, your content is copied to a single edge location if (and only if) the content is not there (could be the first pull, could be that it has expired) when a user tries to access it from that edge. It is a pull mechanism, not a push. See the "Download Distributions for HTTP Delivery" section of http://aws.amazon.com/cloudfront/
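Putting the answer and the correction together: you keep a single bucket, upload to it, and always hand out the distribution's URL; the edge caches fill themselves on the first request. A rough sketch with boto3, with the bucket name and CloudFront domain as placeholders:

```python
import boto3

s3 = boto3.client("s3", region_name="ap-southeast-1")  # the bucket's single region
s3.upload_file("update.txt", "my-app-content", "user123/update.txt")

# Serve through CloudFront, not the S3 URL; the nearest edge location
# pulls the object from S3 on the first request and caches it.
CLOUDFRONT_DOMAIN = "d1234abcd.cloudfront.net"  # placeholder distribution domain
url = f"https://{CLOUDFRONT_DOMAIN}/user123/update.txt"
print(url)
```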