Can we preserve the storage class when copying an object to a new bucket? - google-cloud-storage

We have two different buckets: short-term, which has lifecycle policies applied, and retain, where we put data that we intend to keep indefinitely. The way we get data into the retain bucket is usually by copying the original object from the short-term bucket using the JSON API.
The short-term bucket's lifecycle rules move data to Nearline after 30 days, to Coldline after 60 days, and delete it after 90 days. The storage class for our retain bucket is Standard. When we copy data from the short-term bucket to the retain bucket, we'd like to preserve the storage class of the object we're duplicating - is it possible for us to specify the storage class on the destination object using the JSON API?

If you want to preserve the storage class, it is recommended to perform a rewrite instead of a copy:
Use the copy method to copy between objects in the same location and storage class.
In the rewrite request you can set the storage class of the destination object. The copy method would only be the way to go if you had already separated your objects by storage class, but as I understand it, that is not your case.
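For example, with the Python client (whose Blob.rewrite uses the JSON API's rewrite operation), a rough sketch could look like this; the bucket and object names are placeholders:
from google.cloud import storage

client = storage.Client()
# Hypothetical bucket and object names.
source = client.bucket("short-term").get_blob("reports/2023-01.csv")
destination = client.bucket("retain").blob("reports/2023-01.csv")

# Setting storage_class on the destination before the rewrite preserves the
# source's class instead of falling back to the retain bucket's default (Standard).
destination.storage_class = source.storage_class

# rewrite() may need several calls for large objects; loop until the token is None.
token, rewritten, total = destination.rewrite(source)
while token is not None:
    token, rewritten, total = destination.rewrite(source, token=token)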

Related

How to store AWS S3 object data to a postgres DB

I'm working on a Golang application where users will be able to upload files: images & PDFs.
The files will be stored in an AWS S3 bucket, which I've implemented. However, I don't know how to go about retrieving identifiers for the stored items so I can save them in Postgres.
I was thinking of using an item.ID, but the AWS SDK for Go does not provide an object ID:
for _, item := range response.Contents {
    log.Printf("Name: %s\n", *item.Key)
    // log.Printf("ID: %s\n", *item.???) // s3.Object has no ID field; only Key, ETag, Size, etc.
}
What other options are available to retrieve stored object references from AWS S3?
A common approach is to trigger a Lambda function from an S3 bucket event. That way you can get more details about the object created in your bucket, and you can make the Lambda function persist the object metadata into Postgres.
Another option would be simply to append the object key you are using in your SDK to the bucket name you're targeting; the final result would be a full URI that points to the stored object. Something like this:
s3://{{BUCKET_NAME}}/{{OBJECT_KEY}}
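As a rough sketch of that second option (shown in Python with boto3 and psycopg2 for brevity, though the same idea applies to the Go SDK; the table, columns, bucket name, and connection string are assumptions):
import boto3
import psycopg2

s3 = boto3.client("s3")
conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string

bucket = "my-uploads"  # hypothetical bucket name
with conn, conn.cursor() as cur:
    # Pagination omitted for brevity; bucket + key is the stable reference to store.
    for item in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
        uri = f"s3://{bucket}/{item['Key']}"
        cur.execute(
            "INSERT INTO uploads (s3_uri, etag) VALUES (%s, %s)",
            (uri, item["ETag"]),
        )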

Google Cloud Storage Python API: blob rename, where is copy_to

I am trying to rename a blob (which can be quite large) after having uploaded it to a temporary location in the bucket.
Reading the documentation it says:
Warning: This method will first duplicate the data and then delete the old blob. This means that with very large objects renaming could be a very (temporarily) costly or a very slow operation. If you need more control over the copy and deletion, instead use google.cloud.storage.blob.Blob.copy_to and google.cloud.storage.blob.Blob.delete directly.
But I can find absolutely no reference to copy_to anywhere in the SDK (or elsewhere really).
Is there any way to rename a blob from A to B without the SDK copying the file? In my case it would overwrite B, but I can remove B first if that's easier.
The reason is checksum validation: I'll upload it under A first to make sure it was uploaded successfully (and doesn't trigger DataCorruption), and only then replace B (the live object).
GCS itself does not support renaming objects. Renaming with a copy+delete is done in the client as a helper, and there is no better way to rename an object at the moment.
Since you say your goal is checksum validation, there is a better solution: upload directly to your destination and use GCS's built-in checksum verification. How you do this depends on the API:
JSON objects.insert: Set the crc32c or md5Hash property on the object resource.
XML PUT object: Set x-goog-hash header.
Python SDK Blob.upload_from_* methods: Set checksum="crc32c" or checksum="md5" method parameter.
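For example, a minimal sketch with the Python client (bucket, object, and file names are placeholders):
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("live/object-b")

# The client computes a CRC32C checksum for the upload and GCS verifies it;
# on a mismatch the upload fails (large resumable uploads raise DataCorruption
# client-side), so corrupted data never replaces the live object.
blob.upload_from_filename("local-file.bin", checksum="crc32c")
This removes the need for the temporary object A and the rename step entirely.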

Setting "default" metadata for all+new objects in a GCS bucket?

I run a static website (blog) on Google Cloud Storage.
I need to set a default Cache-Control metadata header for all existing and future objects.
However, the instructions for editing object metadata show the gsutil setmeta -h "cache-control: ..." command, which doesn't seem to apply to "future" objects in the bucket, nor does it give me a way to set a bucket-wide policy that is inherited by existing/future objects (since the command is executed per object).
This is surprising to me because there are features like gsutil defacl, which let you set a policy for the bucket that is inherited by objects created in the future.
Q: Is there a metadata policy for the entire bucket that would apply to all existing and future objects?
There is no way to set default metadata on GCS objects. You have to set the metadata at write time, or you can update it later (e.g., using gsutil setmeta).
Extracted from this question
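For the set-it-at-write-time option, a quick sketch with the Python client (bucket, object, and header value are placeholders):
from google.cloud import storage

blob = storage.Client().bucket("my-blog-bucket").blob("index.html")
blob.cache_control = "public, max-age=86400"  # must be set before the upload
blob.upload_from_filename("index.html", content_type="text/html")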
According to the documentation, if an object does not have a Cache-Control entry, the default value when serving that object would be public,max-age=3600 if the object is publicly readable.
In the case that you still want to modify this metadata, you could do it using the JSON API inside a Cloud Function that is triggered every time a new object is created or an existing one is overwritten.
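A minimal sketch of such a function (1st-gen Python Cloud Function with a GCS "finalize" trigger; the function name and header value are assumptions):
from google.cloud import storage

client = storage.Client()

def set_default_cache_control(event, context):
    """Runs whenever an object is created or overwritten in the bucket."""
    blob = client.bucket(event["bucket"]).get_blob(event["name"])
    if blob is not None and blob.cache_control is None:
        blob.cache_control = "public, max-age=86400"
        blob.patch()  # PATCH only updates the changed metadata field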

Google cloud storage: change bucket region

I'm using google cloud storage.
I have created a bucket in US, and now I need to move it to EU, due to GDPR.
Is there any way to change the bucket location?
If not, in case I remove the bucket and create a new one instead, can I give it the same name (since bucket names are globally unique)?
Yes, as per Cloud Storage documentation, you can move or rename your bucket:
If there is no data in your old bucket, delete the bucket and create another bucket with a new name, in a new location, or in a new project.
If you have data in your old bucket, create a new bucket with the desired name, location, and/or project, copy the data from the old bucket to the new bucket, and delete the old bucket and its contents. (The link provided above describes this process.)
Keep in mind that if you would like your new bucket to have the same name as your old bucket, you must move your data twice: an intermediary bucket temporarily holds your data so that you can delete the original bucket and free up the bucket name for the final bucket.
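A rough sketch of the copy step with the Python client, assuming both buckets already exist (names are placeholders):
from google.cloud import storage

client = storage.Client()
source = client.bucket("my-bucket-us")
destination = client.bucket("my-bucket-eu-temp")

for blob in client.list_blobs(source):
    # Server-side copy, no local download; very large objects may instead need
    # a rewrite loop (Blob.rewrite) or gsutil -m cp for reliability.
    source.copy_blob(blob, destination)
After verifying the copy, delete the old bucket to free up its name, then repeat the same copy into the final bucket with the original name.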

Is it possible to copyObject from one Cloud Object Storage instance to another? The buckets are in different regions

I would like to use the Node SDK to implement a backup and restore mechanism between 2 instances of Cloud Object Storage. I have added a service ID to the instances and added permissions for the service ID to access the buckets present in the instance I want to write to. The buckets will be in different regions. I have tried a variety of endpoints, both legacy and non-legacy, private and public, to achieve this, but I usually get Access Denied.
Is what I am trying to do possible with the SDK? If so, can someone point me in the right direction?
var config = {
"apiKeyId": "xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxx",
"endpoint": "s3.eu-gb.objectstorage.softlayer.net",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxx:xxxxxxxxxxx::",
"iam_apikey_name": "auto-generated-apikey-xxxxxxxxxxxxxxxxxxxxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/0xxxxxxxxxxxxxxxxxxxx::serviceid:ServiceIdxxxxxxxxxxxxxxxxxxxxxx",
"serviceInstanceId": "crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxx::",
"ibmAuthEndpoint": "iam.cloud.ibm.com/oidc/token"
}
This should work as long as you are able to properly grant the requesting user access to read the source of the put-copy, and as long as you are not using Key Protect-based keys.
So the breakdown here is a bit confusing due to some unintuitive terminology.
A service instance is a collection of buckets. The primary reason for having multiple instances of COS is to have more granularity in your billing, as you'll get a separate line item for each instance. The term is a bit misleading, however, because COS is a true multi-tenant system - you aren't actually provisioning an instance of COS, you're provisioning a sort of sub-account within the existing system.
A bucket is used to segment your data into different storage locations or storage classes. Other behavior, like CORS, archiving, or retention, acts at the bucket level as well. You don't want to segment something that you expect to scale (like customer data) across separate buckets, as there's a limit of ~1k buckets in an instance. IBM Cloud IAM treats buckets as 'resources' that are subject to IAM policies.
Instead, data that doesn't need to be segregated by location or class, and that you expect to be subject to the same CORS, lifecycle, retention, or IAM policies can be separated by prefix. This means a bunch of similar objects share a path, like foo/bar and foo/bas have the same prefix foo/. This helps with listing and organization but doesn't provide granular access control or any other sort of policy-esque functionality.
Now, to your question, the answer is both yes and no. If the buckets are in the same instance, then there is no problem. Bucket names are unique, so as long as there isn't any secondary managed encryption (e.g. Key Protect), there's no problem copying across buckets, even if they span regions. Keep in mind, however, that large objects will take time to copy, and COS's strong consistency might lead to situations where the operation does not return a response until it has completed. Copying across instances is not currently supported.
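For a same-instance copy, a minimal sketch (shown with the Python SDK, ibm_boto3, for brevity; the Node SDK exposes an equivalent copyObject call, and all credentials, endpoints, bucket and key names below are placeholders):
import ibm_boto3
from ibm_botocore.client import Config

cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="<api-key>",
    ibm_service_instance_id="<instance-crn>",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.eu-gb.cloud-object-storage.appdomain.cloud",  # destination bucket's regional endpoint
)

# Server-side copy between two buckets in the same instance, even across regions.
cos.copy_object(
    Bucket="backup-bucket",
    CopySource={"Bucket": "source-bucket", "Key": "data/backup.json"},
    Key="data/backup.json",
)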