What mechanism is there to prevent GCP bucket name squatting? - google-cloud-storage

My company's name is mycomp.
GCP bucket names live in a global, public namespace and must be globally unique, so all my buckets are prefixed with mycomp.
So mycomp-production, mycomp-test, mycomp-stage, etc.
What is to prevent someone from grabbing mycomp-dev? Like cybersquatting on that bucket name. Something like that could potentially really screw up my organizational structure.
How can I stop this, or reserve a bucket prefix? Is this even possible? If I want to be an A-hole, what's to stop me from grabbing "Nordstrom" or "walmart" if I get there first?

GCS supports domain-named buckets. This would allow you to create buckets like production.mycomp.com and test.mycomp.com. Since you must verify ownership of the domain before you can create buckets with that suffix, other people can't create buckets with that naming scheme.
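For example, a minimal sketch with the Python client library; the bucket name is illustrative, and the domain must already be verified for the credentials in use:
from google.cloud import storage

# Sketch only: assumes ownership of mycomp.com has already been verified
# (e.g. via Search Console) for the account running this code.
client = storage.Client()
bucket = client.create_bucket("production.mycomp.com")
print("Created bucket:", bucket.name)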

Related

Is it possible to copyObject from one Cloud Object Storage instance to another? The buckets are in different regions

I would like to use the Node SDK to implement a backup and restore mechanism between 2 instances of Cloud Object Storage. I have added a service ID to the instances and added permissions for the service ID to access the buckets present in the instance I want to write to. The buckets will be in different regions. I have tried a variety of endpoints, both legacy and non-legacy, private and public, but I usually get Access Denied.
Is what I am trying to do possible with the SDK? If so, can someone point me in the right direction?
var config = {
"apiKeyId": "xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxx",
"endpoint": "s3.eu-gb.objectstorage.softlayer.net",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxx:xxxxxxxxxxx::",
"iam_apikey_name": "auto-generated-apikey-xxxxxxxxxxxxxxxxxxxxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/0xxxxxxxxxxxxxxxxxxxx::serviceid:ServiceIdxxxxxxxxxxxxxxxxxxxxxx",
"serviceInstanceId": "crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxx::",
"ibmAuthEndpoint": "iam.cloud.ibm.com/oidc/token"
}
This should work as long as the requesting user is properly granted access to read the source of the put-copy, and as long as you are not using Key Protect based keys.
So the breakdown here is a bit confusing due to some unintuitive terminology.
A service instance is a collection of buckets. The primary reason for having multiple instances of COS is to have more granularity in your billing, as you'll get a separate line item for each instance. The term is a bit misleading, however, because COS is a true multi-tenant system - you aren't actually provisioning an instance of COS, you're provisioning a sort of sub-account within the existing system.
A bucket is used to segment your data into different storage locations or storage classes. Other behavior, like CORS, archiving, or retention, acts on the bucket level as well. You don't want to segment something that you expect to scale (like customer data) across separate buckets, as there's a limit of ~1k buckets in an instance. IBM Cloud IAM treats buckets as 'resources', and they are subject to IAM policies.
Instead, data that doesn't need to be segregated by location or class, and that you expect to be subject to the same CORS, lifecycle, retention, or IAM policies, can be separated by prefix. This means a bunch of similar objects share a path: for example, foo/bar and foo/bas have the same prefix foo/. This helps with listing and organization but doesn't provide granular access control or any other sort of policy-esque functionality.
Now, to your question, the answer is both yes and no. If the buckets are in the same instance then no problem. Bucket names are unique, so as long as there isn't any secondary managed encryption (eg Key Protect) there's no problem copying across buckets, even if they span regions. Keep in mind, however, that large objects will take time to copy, and COS's strong consistency might lead to situations where the operation may not return a response until it's completed. Copying across instances is not currently supported.
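For what it's worth, here is a minimal sketch of a same-instance copy using the Python SDK (ibm_boto3); the Node SDK exposes an equivalent copyObject call. All credentials, endpoints, and bucket/key names below are placeholders:
import ibm_boto3
from ibm_botocore.client import Config

# Placeholders only: substitute your own API key, instance CRN, and endpoint.
cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="<API_KEY>",
    ibm_service_instance_id="<SERVICE_INSTANCE_CRN>",
    ibm_auth_endpoint="https://iam.cloud.ibm.com/identity/token",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.eu-gb.cloud-object-storage.appdomain.cloud",
)

# Server-side copy between two buckets in the same instance;
# the buckets may live in different regions.
cos.copy_object(
    Bucket="backup-bucket",
    Key="some/object.txt",
    CopySource={"Bucket": "source-bucket", "Key": "some/object.txt"},
)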

Different S3 behavior using different endpoints?

I'm currently writing code to use Amazon's S3 REST API and I notice different behavior where the only difference seems to be the Amazon endpoint URI that I use, e.g., https://s3.amazonaws.com vs. https://s3-us-west-2.amazonaws.com.
Examples of different behavior for the GET Bucket (List Objects) call:
Using one endpoint, it includes the "folder" in the results, e.g.:
/path/subfolder/
/path/subfolder/file1.txt
/path/subfolder/file2.txt
and, using the other endpoint, it does not include the "folder" in the results:
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Using one endpoint, it represents "folders" using a trailing / as shown above and, using the other endpoint, it uses a trailing _$folder$:
/path/subfolder_$folder$
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Why the differences? How can I make it return results in a consistent manner regardless of endpoint?
Note that I get these same odd results even if I use Amazon's own command-line AWS S3 client, so it's not my code.
And the contents of the buckets should be irrelevant anyway.
Your assertion notwithstanding, your issue is exactly about the content of the buckets, and not something S3 is doing -- the S3 API has no concept of folders. None. The S3 console can display folders, but this is for convenience -- the folders are not really there -- or if there are folder-like entities, they're irrelevant and not needed.
In Amazon S3, buckets and objects are the primary resources, where objects are stored in buckets. Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
So why are you seeing this?
Either you've been using EMR/Hadoop, or some other code written by someone who took a bad example and ran with it, or code that has been doing things differently than it should have been done for quite some time.
Amazon EMR is a web service that uses a managed Hadoop framework to process, distribute, and interact with data in AWS data stores, including Amazon S3. Because S3 uses a key-value pair storage system, the Hadoop file system implements directory support in S3 by creating empty files with the <directoryname>_$folder$ suffix.
https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/
This may have been something the S3 console did many years ago, and apparently (since you don't report seeing them in the console) it still supports displaying such objects as folders in the console... but the S3 console no longer creates them this way, if it ever did.
I've mirrored the bucket "folder" layout exactly
If you create a folder in the console, an empty object with the key "foldername/" is created. This in turn is used to display a folder that you can navigate into, and upload objects with keys beginning with that folder name as a prefix.
The Amazon S3 console treats all objects that have a forward slash "/" character as the last (trailing) character in the key name as a folder
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
If you just create objects using the API, then "my/object.txt" appears in the console as "object.txt" inside folder "my" even though there is no "my/" object created... so if the objects are created with the API, you'd see neither style of "folder" in the object listing.
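A quick boto3 sketch of the difference (the bucket name is a placeholder): creating an object with a slash in its key leaves no folder object behind, while the console's "create folder" is just a zero-byte object with a trailing slash.
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # placeholder

# No "my/" object is created here, only the object itself.
s3.put_object(Bucket=bucket, Key="my/object.txt", Body=b"hello")

# What the console calls a folder is just an empty object ending in "/".
s3.put_object(Bucket=bucket, Key="console-folder/", Body=b"")

for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"])  # console-folder/  and  my/object.txt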
That is probably a bug in the API endpoint which includes the "folder" - S3 internally doesn't actually have a folder structure, but instead is just a set of keys associated with files, where keys (for convenience) can contain slash-separated paths which then show up as "folders" in the web interface. There is the option in the API to specify a prefix, which I believe can be any part of the key up to and including part of the filename.
EMR's S3 client is not the Apache one, so I can't speak accurately about it.
In ASF Hadoop releases (and HDP, CDH):
The older s3n:// client uses _$folder$ as its folder marker.
The newer s3a:// client uses / as its folder marker, but will handle $folder$ if there. At least it used to; I can't see where in the code it does now.
The S3A clients strip out all folder markers when you list things; S3A uses them to simulate empty dirs and deletes all parent markers when you create child file/dir entries.
Whatever you have which processes the GET results should just ignore entries ending in "/" or _$folder$.
As to why they are different, the local EMRFS is a different codepath, using DynamoDB for implementing consistency. At a guess, it doesn't need to mock empty dirs, as the DDB tables will host all directory entries.
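If it helps, a small boto3 sketch of that filtering (bucket and prefix are placeholders):
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="example-bucket", Prefix="path/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Skip both styles of directory marker: trailing "/" and _$folder$.
        if key.endswith("/") or key.endswith("_$folder$"):
            continue
        print(key)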

How to set access permissions of google cloud storage bucket folder

How do I set access permissions for an entire folder in a storage bucket? Example: I have 2 folders (containing many subfolders/objects) in a single bucket (let's call them folder 'A' and 'B') and 4 members in the project team. All 4 members can have read/edit access for folder A, but only 2 of the members are allowed to have access to folder 'B'. Is there a simple way to set these permissions for each folder? There are hundreds/thousands of files within each folder and it would be very time consuming to set permissions for each individual file. Thanks for any help.
It's very poorly documented, but search for "folder" in the gsutil acl ch manpage:
Grant the user with the specified canonical ID READ access to all objects in example-bucket that begin with folder/:
gsutil acl ch -r \
-u 84fac329bceSAMPLE777d5d22b8SAMPLE785ac2SAMPLE2dfcf7c4adf34da46:R \
gs://example-bucket/folder/
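If you prefer the client library over gsutil, the same thing is an object-by-object grant, since there is no folder-level ACL. A rough sketch with the Python client, using an email identity instead of a canonical ID (all names are placeholders):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")  # placeholder

# Grant READ on every existing object under the prefix; objects uploaded
# later will still need their own grant (or a default object ACL).
for blob in client.list_blobs(bucket, prefix="folder/"):
    blob.acl.user("someone@example.com").grant_read()  # placeholder identity
    blob.acl.save()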
Leaving this here so someone else doesn't waste an afternoon beating their head against this wall. It turns out that 'list' permissions are handled at the bucket level in GCS, and you can't restrict them using a Condition based on object name prefix. If you do, you won't be able to access any resources in the bucket, so you have to set up the member with an unrestricted 'Storage Object Viewer' role and use Conditions with a specified object prefix on 'Storage Object Admin' or 'Storage Object Creator' to restrict (over)write access. Not ideal if you are trying to keep the contents of your bucket private.
https://cloud.google.com/storage/docs/access-control/iam
"Since the storage.objects.list permission is granted at the bucket level, you cannot use the resource.name condition attribute to restrict object listing access to a subset of objects in the bucket. Users without storage.objects.list permission at the bucket level can experience degraded functionality for the Console and gsutil."
It looks like this has become possible through IAM Conditions.
You need to set an IAM Condition like:
resource.name.startsWith('projects/_/buckets/[BUCKET_NAME]/objects/[OBJECT_PREFIX]')
This condition can't be used for the permission storage.objects.list, though. Add two roles to a group/user: the first grants list access to the whole bucket, and the second contains the condition above to allow read/write access to all objects in your "folder". This way the group/user can list all objects in the bucket, but can only read/download/write the allowed ones.
There are some limitations here, such as no longer being able to use the gsutil acl ch commands referenced in other answers.
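A rough sketch of that two-binding setup with the Python client; the bucket, group, prefix, and custom role ID below are placeholders:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")  # placeholder

# Conditional bindings require IAM policy version 3.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.version = 3

# 1) A role granting only storage.objects.list at the bucket level.
#    Shown here as a hypothetical custom role; roles/storage.objectViewer
#    also works but additionally allows reading every object.
policy.bindings.append({
    "role": "projects/my-project/roles/bucketLister",  # hypothetical custom role
    "members": {"group:team@example.com"},
})

# 2) Conditional admin role limited to objects under the "folder" prefix.
policy.bindings.append({
    "role": "roles/storage.objectAdmin",
    "members": {"group:team@example.com"},
    "condition": {
        "title": "folder-a-only",
        "expression": "resource.name.startsWith('projects/_/buckets/example-bucket/objects/folder-a/')",
    },
})

bucket.set_iam_policy(policy)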
You cannot do this in GCS. GCS provides permissions to buckets and permissions to objects. A "folder" is not a GCS concept and does not have any properties or permissions.
Make sure you have configured your bucket to use fine-grained permissions, then grant access recursively on the prefix:
gsutil -m acl ch -r -g All:R gs://test/public/another/*
If that doesn't work, add yourself as GCS admin with legacy reader/writer permissions (which should be irrelevant, but it worked for me).
I tried all the suggestions here, including providing access with CEL. Then I came across why no one is successful in resolving this issue: GCP does not treat folders as existing.
From https://cloud.google.com/storage/docs/folders:
Cloud Storage operates with a flat namespace, which means that folders don't actually exist within Cloud Storage. If you create an object named folder1/file.txt in the bucket your-bucket, the path to the object is your-bucket/folder1/file.txt, but there is no folder named folder1; instead, the string folder1 is part of the object's name.
It's just a visual representation that gives us a hierarchical feel for the bucket and the objects within it.
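A tiny sketch of what that means in practice with the Python client (the bucket name is a placeholder):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")  # placeholder

bucket.blob("folder1/file.txt").upload_from_string("hello")

print(bucket.blob("folder1/file.txt").exists())  # True: the object exists
print(bucket.blob("folder1").exists())           # False: "folder1" is only part of the name
print(bucket.blob("folder1/").exists())          # False, unless a marker object was created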

Google Cloud Storage: how many ACL entries can I create?

In the google document, it says:
"The maximum number of ACL entries you can create for a bucket or object is 100"
Does that mean I can create just 100 in total, regardless of objects or buckets? Or can I create 100 for each object and bucket?
Any help? Thanks.
All objects and all buckets have an ACL list. Any ACL list may have up to 100 entries, but no more. So a bucket can have 100 entries in its ACL, and an object in that bucket may also have 100 entries in that object's ACL.
Note: it is generally not recommended to place large numbers of ACL entries in an object or bucket's ACL list. Instead, consider one of these alternatives, which both have the advantage of not needing to modify the bucket or object when adding or removing users and groups:
Add the users or groups you need to your project's OWNER, EDITOR, and VIEWER roles, and use those project roles in your bucket and object ACLs.
Add the users or groups you need to a Google group and then add that Google group to your bucket and object ACLs (see the sketch below).
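A minimal sketch of the Google-group alternative with the Python client (bucket and group are placeholders); setting the default object ACL means newly created objects pick up the grant as well:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")  # placeholder

# One ACL entry for the whole group, on the bucket and on future objects.
bucket.acl.group("team@example.com").grant_read()
bucket.acl.save()
bucket.default_object_acl.group("team@example.com").grant_read()
bucket.default_object_acl.save()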

Google Storage API: list storage bucket with "/" in the name

I am trying to list all objects in a bucket (Google Storage) using the Google Storage API. The bucket is nested like a folder, such as "my-bucket/sub-folder". I got the following error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
If I use a bucket name without "/" it works fine. How can I list a bucket like a folder structure?
Google Cloud Storage buckets do not have slashes in their name. In the example above, the bucket is named "my-bucket" and the object is named something like "sub-folder/object.txt" or just "object.txt".
It's useful to remember that GCS does not have any real notion of folders. There are only buckets and objects in buckets. If you have a subdirectory named "dir" in a bucket named "mybucket", and that subdirectory has 5 objects in it, what you really have is 5 objects named "dir/obj1", "dir/obj2", etc., all still within the bucket "mybucket".
A number of tools (like gsutil and the GCS web-based storage browser) make it appear that there are folders, through use of markers and prefixes in the API -- even though as noted, there really are just objects that have slashes in the name.
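For example, with the Python client you list the bucket "my-bucket" and pass the pseudo-folder as a prefix (names taken from the question):
from google.cloud import storage

client = storage.Client()

# List objects under the "sub-folder/" prefix; the delimiter makes deeper
# pseudo-folders come back as prefixes instead of individual objects.
blobs = client.list_blobs("my-bucket", prefix="sub-folder/", delimiter="/")
for blob in blobs:
    print(blob.name)
print("sub-prefixes:", list(blobs.prefixes))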