IBM Cloud Object Storage: Get bucket size using CLI - ibm-cloud

I'm trying to find a way to automate the task of getting COS bucket sizes on IBM Cloud.
I have dozens of buckets across different accounts, but I still couldn't find a way to get this information using the IBM Cloud COS CLI; it only returns other information, like bucket names.

The COS S3 API does not return size information for buckets. Thus, the CLI, which is based on the API, won't return size information either.
But here's an indirect way to find the size of a bucket: add up the sizes of the individual objects in it:
ibmcloud cos objects --bucket <BUCKET_NAME> --output JSON | jq '[.Contents[].Size] | add'
The output is in bytes.
You may have to loop over the bucket names, perhaps in a shell script (see the sketch below). To list all the buckets in an account and resource group, run:
ibmcloud cos buckets --output JSON
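Putting the two together, a minimal sketch of such a loop, assuming jq is installed and the service CRN is already configured; the .Buckets[].Name and .Contents[].Size paths follow the S3-style JSON these commands emit:
#!/bin/bash
# Sketch only: print the total object size, in bytes, for every bucket in the
# configured account/resource group. .Contents[]? tolerates empty buckets.
for bucket in $(ibmcloud cos buckets --output JSON | jq -r '.Buckets[].Name'); do
  size=$(ibmcloud cos objects --bucket "$bucket" --output JSON | jq '[.Contents[]?.Size] | add // 0')
  echo "$bucket: $size bytes"
done
Keep in mind that the object listing is paginated, so for very large buckets you may need to page through the results before the sum is complete.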
Note: Before running the above commands, remember to add the COS service CRN to the CLI configuration with the command below:
ibmcloud cos config crn --crn <SERVICE_CRN>

The answer that loops through the individual objects is indeed the only (and likely best) way to use the IBM Cloud CLI to find that information, but there are a few other approaches worth mentioning for completeness.
If you need to do this elegantly on the command line, the MinIO Client provides a Linux-esque syntax (here cos is the alias you've configured for your COS endpoint):
mc du cos/$BUCKET
This returns the size of the bucket in MiB.
Additionally, the COS Resource Configuration API will directly return a bytes_used value, with no iterating over objects behind the scenes. While there's no official CLI implementation yet (although one is in the pipeline), it's relatively easy to use cURL or HTTPie to query the bucket:
curl "https://config.cloud-object-storage.cloud.ibm.com/v1/b/$BUCKET" \
-H 'Authorization: bearer $TOKEN'
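For example, a rough sketch that grabs the IAM token from your current ibmcloud session and extracts bytes_used with jq; the awk extraction depends on the CLI's plain-text output format, so treat it as illustrative:
# Sketch only: reuse the CLI session's IAM token and query the Resource Configuration API.
TOKEN=$(ibmcloud iam oauth-tokens | awk '/IAM token/ {print $NF}')
curl -s "https://config.cloud-object-storage.cloud.ibm.com/v1/b/$BUCKET" \
  -H "Authorization: Bearer $TOKEN" | jq '.bytes_used'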

Related

target multiple buckets with eventarc?

Currently we are trying to use Eventarc to send us all finalized files for our buckets.
This works great; however, it looks like Eventarc can only target a single bucket, so we would need to enable it for every bucket on its own. Is there a way to target multiple buckets?
Currently we use the following to create the Eventarc trigger:
gcloud eventarc triggers create storage-events \
  --location="$LOCATION" \
  --destination-gke-cluster="CLUSTER-NAME" \
  --destination-gke-location="$LOCATION" \
  --destination-gke-namespace="$NAMESPACE" \
  --destination-gke-service="$SERVICE" \
  --destination-gke-path="api/events/receive" \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=$BUCKET" \
  --service-account="$SERVICEACCOUNT-compute@developer.gserviceaccount.com"
The problem is that we generate a bucket per customer, so we would need to create a trigger for each bucket (which is a lot). Is there a simpler way?
You have several options.
If you want to use the native event google.cloud.storage.object.v1.finalized, you must select one and only one bucket. Therefore, you have to create one Eventarc trigger per bucket.
If you can use the audit-log event storage.objects.create instead, you have to activate audit logs, but you cannot filter on buckets: ALL the buckets are listened to. If that's not what you want, you can play with the Cloud Logging router to discard the logs you don't need (especially the audit logs for the buckets you don't care about).
A last solution, if you really want to use Eventarc (especially for the CloudEvents format of the messages), is to do the following (sketched below):
Create a Cloud Storage Pub/Sub notification for every bucket that you want to listen to, using the same Pub/Sub topic every time.
Create a custom Eventarc trigger on Pub/Sub and catch the messages published on that topic.
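A rough sketch of that last option, reusing the GKE destination flags from the question; the topic name gcs-events, the BUCKETS array, $PROJECT_ID and $PROJECT_NUMBER are placeholders:
# Sketch only: send OBJECT_FINALIZE notifications from every bucket to a single
# Pub/Sub topic, then catch them all with one custom Eventarc trigger.
gcloud pubsub topics create gcs-events

for b in "${BUCKETS[@]}"; do
  gsutil notification create -t gcs-events -f json -e OBJECT_FINALIZE "gs://$b"
done

gcloud eventarc triggers create storage-events-pubsub \
  --location="$LOCATION" \
  --destination-gke-cluster="$CLUSTER_NAME" \
  --destination-gke-location="$LOCATION" \
  --destination-gke-namespace="$NAMESPACE" \
  --destination-gke-service="$SERVICE" \
  --destination-gke-path="api/events/receive" \
  --event-filters="type=google.cloud.pubsub.topic.v1.messagePublished" \
  --transport-topic="projects/$PROJECT_ID/topics/gcs-events" \
  --service-account="$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
New buckets then only need the gsutil notification create line, not a new trigger.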

gcloud list instances in managed group sorted by creation time

I need to get the oldest instance from an instance group. I am using the following command:
gcloud compute instance-groups managed list-instances "instance-group-name" --region "us-central1" --format="value(NAME,ZONE,CREATION_TIMESTAMP)" --sort-by='~CREATION_TIMESTAMP'
But it seems --sort-by is not working, or I am using it a bit wrong.
Could you please suggest the right way?
It's probably creationTimestamp not CREATION_TIMESTAMP.
See: instances.list and the response body for the underlying field names.
It's slightly confusing, but gcloud requires you to use the field (property) names of the underlying request/response types, not the column names shown in the default output.
Another way to more readily determine this is to add --format=yaml or --format=json to gcloud compute instances list (or any gcloud command) to get an idea of what's being returned so that you can begin filtering and formatting it.
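For example, a quick sketch with gcloud compute instances list; the same --sort-by value applies wherever the returned items carry a creationTimestamp field:
# Sketch: newest instances first, using the API field name creationTimestamp
# rather than the CREATION_TIMESTAMP column header.
gcloud compute instances list \
  --format="table(name, zone.basename(), creationTimestamp)" \
  --sort-by="~creationTimestamp"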

Is there a way to figure out in which region a Google Cloud Storage bucket is hosted?

NCBI (the National Center for Biotech Info) generously provided their data for 3rd parties to consume. The data is located in cloud buckets such as gs://sra-pub-run-1/. I would like to read this data without incurring additional costs, which I believe can be achieved by reading it from the same region as where the bucket is hosted. Unfortunately, I can't figure out in which region the bucket is hosted (NCBI mentions in their docs that's in the US, but not where in the US). So my questions are:
Is there a way to figure out in which region a bucket that I don't own, like gs://sra-pub-run-1/ is hosted?
Is my understanding correct that reading the data from instances in the same region is free of charge? What if the GCS bucket is multi-region?
Doing a simple gsutil ls -b -L either provides no information (when listing a specific directory within sra-pub-run-1), or I get a permission-denied error if I try to list info on gs://sra-pub-run-1/ directly using:
gsutil -u metagraph ls -b gs://sra-pub-run-1/
You cannot specify a specific Compute Engine zone as a bucket location, but all Compute Engine VM instances in zones within a given region have similar performance when accessing buckets in that region.
Billing-wise, egressing data from Cloud Storage into a Compute Engine instance in the same location/region (for example, US-EAST1 to US-EAST1) is free, regardless of zone.
So, check the "Location constraint" of the GCS bucket (gsutil ls -Lb gs://bucketname), and if it says "US-EAST1" and your GCE instance is also in US-EAST1, downloading data from that GCS bucket will not incur an egress fee.
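If you can read the bucket's metadata, a small sketch of that comparison (the bucket name is a placeholder, and the metadata-server call only works from inside a GCE VM):
# Sketch: compare the bucket's location with the zone/region of the VM you're on.
gsutil ls -Lb gs://$BUCKET | grep -i "location constraint"
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/zone"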

With AWS Powershell commandlets, how do I specify a different endpoint for S3 buckets

I have AWS instances in several regions (us-east-1, us-west-2). I use CodeDeploy to take .zip files stored in S3 and deploy them to AutoScale groups. However, since the S3 bucket only exists in us-east-1, and I am attempting to deploy to us-west-2, specifying a region in my PowerShell commandlet (New-CDDeployment) doesn't work.
I need to specify a region (us-west-2), but pull the files from the S3 bucket in us-east-1 by using a custom endpoint (s3-us-east-1.amazonaws.com), but I cannot find any way of doing this within the PowerShell commandlet.
Use cross-region replication to replicate your bucket from us-east-1 into us-west-2, and reference the replica bucket in your PowerShell cmdlet, since it will then be in the same region.
Even if you didn't have this issue, this would be a good general practice so that you don't lose access to your code on S3 during us-east-1 S3 outages.
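As a sketch of that replication setup (shown with the AWS CLI for brevity; the corresponding PowerShell cmdlets are Write-S3BucketVersioning and Write-S3BucketReplication), with bucket names, account ID and role ARN as placeholders:
# Sketch only: version both buckets, then replicate new objects from the
# us-east-1 bucket into a us-west-2 replica that CodeDeploy can use locally.
aws s3api put-bucket-versioning --bucket my-deploy-bucket --region us-east-1 \
  --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket my-deploy-bucket-usw2 --region us-west-2 \
  --versioning-configuration Status=Enabled

cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Prefix": "",
      "Destination": { "Bucket": "arn:aws:s3:::my-deploy-bucket-usw2" }
    }
  ]
}
EOF

aws s3api put-bucket-replication --bucket my-deploy-bucket --region us-east-1 \
  --replication-configuration file://replication.json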

Setting the Durable Reduced Availability (DRA) attribute for a bucket using Storage Console

When manually creating a new cloud storage bucket using the web-based storage console (https://console.developers.google.com/), is there a way to specify the DRA attribute? From the documentation, it appears that the only way to create buckets with that attribute is to either use Curl, gsutil or some other script, but not the console.
There is currently no way to do this.
At present, the storage console provides only a subset of the Cloud Storage API, so you'll need to use one of the tools you mentioned to create a DRA bucket.
For completeness, it's pretty easy to do this using gsutil (documentation at https://developers.google.com/storage/docs/gsutil/commands/mb):
gsutil mb -c DRA gs://some-bucket
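And since the question also mentions curl, roughly the same thing against the JSON API (project ID, bucket name and $TOKEN are placeholders):
# Sketch: create a DRA bucket via the Cloud Storage JSON API.
curl -X POST "https://storage.googleapis.com/storage/v1/b?project=my-project-id" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "some-bucket", "storageClass": "DURABLE_REDUCED_AVAILABILITY"}'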