Allow gzip on existing files - google-cloud-storage

I have static assets stored in GCS and I'd like to serve them gzipped (but they were uploaded without compression). Is there any way to set files to be compressed without downloading and re-uploading them in gzipped format?
I tried setting the content-encoding header with gsutil (i.e., gsutil setmeta -h 'Content-Encoding:gzip' <some_object_uri>), but it just led to a "Service Unavailable" error on the file (which I assume is from the server attempting to ungzip the file and failing, or something like that).

There is no way to compress the objects without downloading them and re-uploading.
However, you can have gsutil do this for you, and if you run it from a Google Compute Engine (GCE) Virtual Machine (VM), you'll only be charged for operation counts, not for bandwidth.
Also, regarding setting the content-encoding header with setmeta, you're right in your interpretation of what happened. You set the metadata on the object to indicate that it contained gzip data, but the contents did not contain a valid gzip stream, so when you try to download it with Accept-Encoding: gzip, the GCS service tries to decompress the stream and fails.
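Note that the bogus header can also break the download in the next step, so you may want to clear it first; setmeta removes a header when you pass it with an empty value (the object path below is just a placeholder):
gsutil setmeta -h "Content-Encoding:" gs://bucket/path/to/object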
I'd suggest downloading the bucket to the local disk on a GCE VM:
gsutil cp -r gs://bucket /path/to/local/disk
Then re-upload with the -z option, which gzip-compresses files with the listed extensions and sets Content-Encoding: gzip on them:
gsutil cp -z js,css,html -r /path/to/local/disk gs://bucket
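Once that finishes, you can spot-check an object to confirm the compression metadata took effect (the object path here is just an example):
gsutil stat gs://bucket/js/app.js
The output should now include Content-Encoding: gzip.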

Related

trickle does not limit the bandwidth of gsutil

I have tried to copy a .mp4 file from my local directory to my Google Cloud bucket,
using:
gsutil cp my_file.mp4 gs://my_bucket
This part works as expected, but when I try to limit the bandwidth, using:
trickle -d 10 -u 10 gsutil cp my_file.mp4 gs://my_bucket
the upload happens at the same rate, not at 10 KB/s. I have read that trickle does not handle statically linked executables, which the .mp4 appears to be, since running ldd my_file.mp4 in the terminal returns "not a dynamic executable".
Has anyone experienced the same issue? If so, how was the problem handled, or am I approaching this the wrong way?
UPDATE 1:
Turns out it does not matter what file I use; gsutil still bypasses trickle somehow. I have tested trickle with other programs, and it performed as expected, with bandwidth control.
I have also tested gsutil mv and gsutil rsync, with the same results as with cp. I have also tested the bandwidth throttling on an arm64 system, with the same results.
You should limit the number of threads and processes, as described in the documentation; trickle can't throttle gsutil when it fans out across multiple processes.
trickle -d 10 -u 10 gsutil -o "GSUtil:parallel_process_count=1" \
-o "GSUtil:parallel_thread_count=1" cp my_filefile.mp4 gs://my_bucket

How do I set a default cache-control for new images uploaded to buckets on Google Storage

I know that you can run a command at upload to set the cache-control of the image being uploaded:
gsutil -h "Cache-Control:public,max-age=2628000" cp -a public-read \
-r html gs://bucket
But I'm using CarrierWave in Rails and don't think it's possible to set it up to run this command each time an image is uploaded.
I was looking around to see if you can change the default cache-control value but can't find any solutions. Currently I run gsutil -m setmeta -h "Cache-Control:public, max-age=2628000" gs://bucket/*.png every now and then to update new images, but this is a horrible solution.
Any ideas on how to set the default cache-control for files uploaded to a bucket?
There's no way to set a default Cache-Control header for newly uploaded files. It either needs to be set explicitly (by passing the header) at the time the object is written, or applied after the upload by updating the object's metadata with something like the gsutil setmeta command you noted.

How to save a file from an https link to Google Cloud Storage

I would like to save a large file (approximately 50 GB) directly to Google Cloud Storage. I tried gsutil cp https://archive.org/download/archiveteam-twitter-stream-2015-08/archiveteam-twitter-stream-2015-08.tar gs://my/folder, but that didn't work (InvalidUrlError: Unrecognized scheme "https").
Is there a way of doing that, without having to first download the file to my local storage?
Thanks!
You can use curl to fetch the URL and pipe the stream to gsutil cp, which uploads from stdin when the source argument is -. For example:
curl -L https://archive.org/download/archiveteam-twitter-stream-2015-08/archiveteam-twitter-stream-2015-08.tar | gsutil cp - gs://your/folder/archiveteam-twitter-stream-2015-08.tar
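A couple of caveats if you go the streaming route: streaming uploads aren't resumable, so a dropped connection on a ~50 GB transfer means starting over (running this from a GCE VM avoids your local bandwidth and keeps retries cheap), and gsutil may not infer a useful content type from a stream, so you can set one explicitly with the -h flag. A hedged variant (same URL and destination as above, Content-Type assumed):
curl -L https://archive.org/download/archiveteam-twitter-stream-2015-08/archiveteam-twitter-stream-2015-08.tar \
  | gsutil -h "Content-Type:application/x-tar" cp - gs://your/folder/archiveteam-twitter-stream-2015-08.tar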

gsutil copy to storage failing

I'm working on an instance in the us-central1-a zone and I can't copy a ~200 GB file.
I've tried:
gsutil -m cp -L my.log my.file gs://my-bucket/
gsutil -m cp -L my.second.log my.file gs://my-bucket2/
And after several "catch ups" I get the following error:
CommandException: Some temporary components were not uploaded successfully. Please retry this upload.
CommandException: X files/objects could not be transferred.
Any clues?
Thanks
This is a message you'll see if gsutil's parallel composite uploads feature fails to upload at least one of the pieces of the file.
A couple of questions...
Have you already tried performing this upload again, after you saw this message?
If this error persists, could you please provide the stack trace from gsutil -d cp...
If you're consistently seeing this error and need an immediate fix (in case this is a bug with parallel uploads), you can set parallel_composite_upload_threshold=0 in the GSUtil section of your boto config to disable parallel composite uploads.
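For reference, a minimal sketch of that change; the boto config is usually ~/.boto, though your install may use a different path:
[GSUtil]
parallel_composite_upload_threshold = 0
You can also pass it per-invocation with -o "GSUtil:parallel_composite_upload_threshold=0", as in the trickle example above.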
I had the same experience with gsutil. I fixed it by installing crcmod.
First, run the command you're having issues with using the debug flag, for example:
gsutil -d -m cp gs://<path_to_file_in_bucket> .
In the output I can see:
CommandException: Downloading this composite object requires integrity checking with CRC32c, but your crcmod installation isn't using the module's C extension, so the hash computation will likely throttle download performance. For help installing the extension, please see "gsutil help crcmod".
To download regardless of crcmod performance or to skip slow integrity checks, see the "check_hashes" option in your boto config file.
NOTE: It is strongly recommended that you not disable integrity checks. Doing so could allow data corruption to go undetected during uploading/downloading.
You can follow Google's instructions to install crcmod for your specific OS: https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod
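For example, on a Debian/Ubuntu machine the compiled crcmod typically boils down to something like the following (package names vary by OS and Python version; treat this as a sketch and defer to the linked page):
sudo apt-get install gcc python3-dev python3-setuptools
sudo pip3 install --no-cache-dir -U crcmod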
I got the same error message. I tried logging in to gcloud again with
gcloud auth login
and then I could run the command successfully.
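If you suspect a credentials problem like this, it can also help to check which account is active before re-authenticating:
gcloud auth list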

Compress images in google cloud storage

According to the PageSpeed recommendations, how can I compress the images I upload to my Google Cloud Storage account so that they are served quickly to my WordPress blog?
Thanks for your support.
Try the command below. You can select an image format (png, jpg, etc.) and also set a cache expiry of 7 days, which will save bandwidth and billing charges:
gsutil -h "Cache-Control:public, max-age=604800" cp -z png /images/* gs://bucketname/images
or
gsutil -h "Cache-Control:public, max-age=604800" cp -z png /images/logo.png gs://bucketname/images