gsutil - what are the storage class options for cp? - google-cloud-storage

I'm using the gsutil CLI to copy files to Google Cloud Storage buckets, but I couldn't find the options for specifying a storage class. The documentation only says to use -s <class>. What are the valid values for <class>?

You can specify any of the following storage classes:
STANDARD
NEARLINE
COLDLINE
ARCHIVE
Additionally, you should only use cp to copy between the same location and storage class.

Related

move a bucket to other storage class

I want an easy way to move an entire bucket to Coldline. My problem is that when I run gsutil, it disconnects, and I am charged each time.
This is the command I'm trying to use:
gsutil rewrite -s coldline gs://bucket/**
You can use a lifecycle policy on the bucket to downgrade all objects in the bucket to coldline.
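A lifecycle rule of that kind can be generated with a short script. This is a minimal sketch, not a definitive configuration: the helper name build_lifecycle_config and the choice of source classes (STANDARD and NEARLINE) are assumptions for illustration. The output uses the SetStorageClass action with a matchesStorageClass condition, and you would save it to a file and apply it with gsutil lifecycle set.

```python
import json


def build_lifecycle_config(target_class, source_classes):
    """Build a lifecycle config that moves objects to target_class.

    Hypothetical helper: only objects currently stored in one of
    source_classes are matched by the rule.
    """
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {
                        "type": "SetStorageClass",
                        "storageClass": target_class,
                    },
                    "condition": {
                        "matchesStorageClass": list(source_classes),
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # Assumed source classes; adjust to whatever your objects use today.
    config = build_lifecycle_config("COLDLINE", ["STANDARD", "NEARLINE"])
    # Redirect this output to lifecycle.json, then run:
    #   gsutil lifecycle set lifecycle.json gs://your-bucket
    print(json.dumps(config, indent=2))
```

Because the lifecycle rule is applied by the service rather than by your machine, a dropped connection on your side should not interrupt it.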

Downloading public data directory from google cloud storage with command line utilities like wget

I would like to download publicly available data from google cloud storage. However, because I need to be in a Python3.x environment, it is not possible to use gsutil. I can download individual files with wget as
wget http://storage.googleapis.com/path-to-file/output_filename -O output_filename
However, commands like
wget -r --no-parent https://console.cloud.google.com/path_to_directory/output_directoryname -O output_directoryname
do not seem to work, as they just download an index file for the directory. My initial attempts with rsync and curl did not work either. Any idea how to download publicly available data on Google Cloud Storage as a directory?
The approach you mentioned above does not work because Google Cloud Storage doesn't have real "directories". As an example, "path/to/some/files/file.txt" is the entire name of that object. A similarly named object, "path/to/some/files/file2.txt", just happens to share the same naming prefix.
As for how you could fetch these files: The GCS APIs (both XML and JSON) allow you to do an object listing against the parent bucket, specifying a prefix; in this case, you'd want all objects starting with the prefix "path/to/some/files/". You could then make individual HTTP requests for each of the objects specified in the response body. That being said, you'd probably find this much easier to do via one of the GCS client libraries, such as the Python library.
Also, gsutil currently has a GitHub issue open to track adding support for Python 3.
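The listing-then-download approach described above can be sketched with only the Python 3 standard library. This is a rough sketch under stated assumptions, not a definitive implementation: it assumes the bucket allows anonymous listing and reads, it uses the JSON API endpoints for listing objects and fetching media, and the function names (list_url, media_url, download_prefix) are made up for this example. It does not guard against unusual object names such as ones containing "..".

```python
import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://storage.googleapis.com/storage/v1"


def list_url(bucket, prefix, page_token=None):
    """URL that lists objects in bucket whose names start with prefix."""
    params = {"prefix": prefix}
    if page_token:
        params["pageToken"] = page_token
    return f"{API_ROOT}/b/{bucket}/o?{urllib.parse.urlencode(params)}"


def media_url(bucket, object_name):
    """URL that returns the contents of a single object."""
    quoted = urllib.parse.quote(object_name, safe="")
    return f"{API_ROOT}/b/{bucket}/o/{quoted}?alt=media"


def download_prefix(bucket, prefix, dest_dir):
    """Download every object under prefix into dest_dir (public buckets only)."""
    page_token = None
    while True:
        with urllib.request.urlopen(list_url(bucket, prefix, page_token)) as resp:
            listing = json.load(resp)
        for item in listing.get("items", []):
            name = item["name"]
            if name.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            local_path = os.path.join(dest_dir, *name.split("/"))
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            urllib.request.urlretrieve(media_url(bucket, name), local_path)
        # Listings are paginated; keep going until there is no next page.
        page_token = listing.get("nextPageToken")
        if not page_token:
            break
```

A call such as download_prefix("some-public-bucket", "path/to/some/files/", "output_directoryname") would then recreate the shared prefix as local directories (the bucket name here is a placeholder).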

How to download multiple objects from IBM Cloud Object Storage?

I am trying to use IBM Cloud Object Storage to store images uploaded to my site by users. I have this functionality working just fine.
However, based on the documentation here (link) it appears as though only one object can be downloaded from a bucket at a time.
Is there any way a list of objects could all be downloaded from the bucket? Is there a different approach to requesting multiple objects from a COS bucket?
Via the REST API, no, you can only download a single object at a time. But most tools (like the AWS CLI, or Minio Client) allow downloading all objects that share a prefix (eg foo/bar and foo/bas). The IBM forks of the S3 libraries also are now integrated with Aspera, and can transfer large directories all at once. What are you trying to do?
According to S3 spec (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html), you can only download one object at a time.
There are various tools which may help to download multiple objects at a time from COS. I used AWS CLI tool to download and upload the objects from/to COS.
Install the AWS CLI and configure it with your access_key_id and secret_access_key.
Recursively copying S3 objects to a local directory: When passed with the parameter --recursive, the following cp command recursively copies all objects under a specified prefix and bucket to a specified directory.
C:\Users\Shashank>aws s3 cp s3://yourBucketName . --recursive
for example:
C:\Users\Shashank>aws --endpoint-url http://s3.us-east.cloud-object-storage.appdomain.cloud s3 cp s3://yourBucketName D:\s3\ --recursive
In my case, the endpoint is in the us-east region and I am copying objects into the D:\s3 directory.
Recursively copying local files to S3: When passed with the parameter --recursive, the following cp command recursively copies all files under a specified directory to a specified bucket.
C:\Users\Shashank>aws s3 cp myDir s3://yourBucketName/ --recursive
for example:
C:\Users\Shashank>aws --endpoint-url http://s3.us-east.cloud-object-storage.appdomain.cloud s3 cp D:\s3 s3://yourBucketName/ --recursive
Here I am copying objects from the D:\s3 directory to COS.
For more reference, you can see the link here.
I hope it works for you.

Mount Bucket on Google Storage

I want to mount a Google bucket to a local server. However, when I run the line, the directory I point it to is empty. Any ideas?
gcsfuse mssng_vcf_files ./mountbucket/
It reports:
File system has been successfully mounted.
but the directory mountbucket/ is empty.
By default, gcsfuse will not show a directory that is only implied by an object with a slash in its name. So if your bucket contains files/index.txt, it will not show up until you create an object named files/. I am assuming here that your bucket contains directories with files inside them; if that is the case, this may be your problem.
gcsfuse supports a flag called --implicit-dirs that changes the behaviour. When this flag is enabled, name lookup requests from the kernel use the GCS API's Objects.list operation to search for objects that would implicitly define the existence of a directory with the name in question. So, in the example above, there would appear to be a directory named "files".
There are some drawbacks which are defined here -
https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md#implicit-directories
So you have 2 options
Create the directories in your bucket which will make your files appear
Look at --implicit-dirs flag to get them to always appear.
Hope this helps.
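To see which directories in a bucket exist only implicitly, the idea can be illustrated in a few lines of Python. This is just an illustrative sketch of the naming logic, not how gcsfuse itself is implemented; the function name implicit_directories is made up for this example, and it assumes an explicit directory is an object whose name ends in a slash.

```python
def implicit_directories(object_names):
    """Return directory prefixes implied by object names but with no
    placeholder object of their own (a name ending in '/')."""
    explicit = {name for name in object_names if name.endswith("/")}
    implied = set()
    for name in object_names:
        # Every component before the last one is a parent "directory".
        parts = name.split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            implied.add("/".join(parts[:depth]) + "/")
    return implied - explicit


if __name__ == "__main__":
    names = ["files/index.txt", "logs/", "logs/2020/app.log"]
    for directory in sorted(implicit_directories(names)):
        print(directory)
```

Any prefix this returns is one that would stay hidden in a default mount until either a placeholder object is created for it or the mount uses --implicit-dirs.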

Change storage class of (existing) objects in Google Cloud Storage

I recently learnt of the new storage tiers and reduced prices announced on the Google Cloud Storage platform/service.
So I wanted to change the default storage class for one of my buckets from Durable Reduced Availability to Coldline, as that is what is appropriate for the files that I'm archiving in that bucket.
I got this note though:
Changing the default storage class only affects objects you add to this bucket going forward. It does not change the storage class of objects that are already in your bucket.
Any advice/tips on how I can change class of all existing objects in the bucket (using Google Cloud Console or gsutil)?
The easiest way to synchronously move the objects to a different storage class in the same bucket is to use rewrite. For example, to do this with gsutil, you can run:
gsutil -m rewrite -s coldline gs://your-bucket/**
Note: make sure gsutil is up to date (version 4.22 and above support the -s flag with rewrite).
Alternatively, you can use the new SetStorageClass action of the Lifecycle Management feature to asynchronously (usually takes about 1 day) modify storage classes of objects in place (e.g. by using a CreatedBefore condition set to some time after you change the bucket's default storage class).
To change the storage class from NEARLINE to COLDLINE, create a JSON file with the following content:
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "COLDLINE"
        },
        "condition": {
          "matchesStorageClass": [
            "NEARLINE"
          ]
        }
      }
    ]
  }
}
Name it lifecycle.json or something, then run this in your shell:
$ gsutil lifecycle set lifecycle.json gs://my-cool-bucket
The changes may take up to 24 hours to go through. As far as I know, this change will not cost anything extra.
I did this:
gsutil -m rewrite -r -s <storage-class> gs://my-bucket-name/
(-r for recursive, because I want all objects in my bucket to be affected).
You could now use "Data Transfer" to change a storage class by moving your bucket objects to a new bucket.
Access this from the left panel of Storage.
If you don't have access to gsutil, for example in a Cloud Functions environment, there is another option. Cloud Functions server instances don't have gsutil installed; it works on your local machine only because you installed and configured it there. For these cases I suggest looking at the update_storage_class() blob method in Python. You call this method on an individual blob (in other words, on a specific object inside your bucket). Here is an example:
from google.cloud import storage

bucket_name = "my-bucket-name"  # replace with your bucket
# Valid values include STANDARD, NEARLINE, COLDLINE and ARCHIVE
# (plus the legacy MULTI_REGIONAL and REGIONAL classes).
new_class = "COLDLINE"

storage_client = storage.Client()
for blob in storage_client.list_blobs(bucket_name):
    print(blob.name, blob.storage_class)
    # update_storage_class() is a method on the blob; it rewrites the object in place.
    blob.update_storage_class(new_class)
References:
Blobs / Objects documentation: https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.update_storage_class
Storage classes: https://cloud.google.com/storage/docs/storage-classes