Unzipping parts of a file with gsutil before downloading - google-cloud-storage

I am downloading a bunch of ZIP files from GCS with gsutil. Then I extract them to my local drive and only keep some of the files I need.
gsutil cp gs://uspto-pair/applications/*.zip .
unzip -jo \*.zip "*SRNT.pdf" -d ./SRNT_files
This is working fine but seems wasteful on bandwidth (I am throwing away most of the content).
Is there any way to unzip the file on GCS and then download only the parts I need?

No, Cloud Storage offers no server-side processing that could unpack an archive for you. If bandwidth is the problem, do the operation from a Compute Engine instance: the download from Cloud Storage to the instance will be very fast.
You could also use App Engine, but memory there is more limited and you do not have access to the filesystem (so everything you download must be kept in memory). That is only practical if your files are small (<100 MB).
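A minimal sketch of that workflow, assuming a Compute Engine VM in the same region as the bucket; gs://uspto-pair and the SRNT_files layout come from the question, while gs://my-results is a hypothetical bucket for the extracted output:

# on the VM: pull the archives over Google's internal network
gsutil -m cp "gs://uspto-pair/applications/*.zip" .
# keep only the PDFs you need, junking archive paths (-j) and overwriting (-o)
unzip -jo \*.zip "*SRNT.pdf" -d ./SRNT_files
# push the small extracted set back to a bucket
gsutil -m cp ./SRNT_files/* gs://my-results/SRNT_files/

From your own machine you then download only gs://my-results/SRNT_files/, which contains just the files you wanted.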

Related

Is it possible to run Perl in Google Drive

I have uploaded my entire perl directory to google drive, including perl.exe, /lib, perl scripts, and data files.
Is it possible to run perl.exe on Perl scripts using the data files, within Google Drive?
If so, where can I find out how to do it?
Google Drive is a file storage system.
It is not a server and cannot run applications.

Yocto build tips

When I build Yocto, some of the source files it downloads are very large and take a long time to fetch.
I tried placing them in sources/ manually, but that turned out to be pointless. My question: is there any way to bypass Yocto's checksum mechanism?
Thank you for any help!
The checksum in Yocto is used to make sure that downloaded files are not corrupted or tampered with.
Instead of trying to bypass the checksum mechanism, you can try using a download accelerator or mirror to download the files more quickly.
You can also use a cache server to store the downloaded files, so you don't have to download them again in the future.
If you need to re-download a file that was partially downloaded or corrupted, you can run bitbake with the -f flag (for example, bitbake -c fetch -f <recipe>) to force the fetch task to run again.
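A sketch of both suggestions; the cache path /srv/yocto-downloads, the mirror URL, and the recipe name linux-yocto are all placeholders. The first two settings go in conf/local.conf (older Yocto releases spell the override PREMIRRORS_prepend):

# conf/local.conf: keep all fetched sources in one shared cache so later builds reuse them
DL_DIR = "/srv/yocto-downloads"
# try a nearby mirror before the upstream URL
PREMIRRORS:prepend = "git://.*/.* http://mirror.example.com/sources/ \
https?://.*/.* http://mirror.example.com/sources/"

# force the fetch task to run again for a recipe whose download was corrupted
bitbake -c fetch -f linux-yocto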

Unzip a file from several large compressed folders from the command line

I have several large zipped folders on my cloud storage drive. I want to transfer one specific file from each of the zipped folders to my local hard drive (I can't copy all of them since I don't have enough space). Is there a way to do this from the command line (cmd or PowerShell)? I am using Windows 10 (Build 18362).
The file name is the same in each archive, so I was hoping I could write a loop to do this.
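One possible approach, assuming the archives are visible as local files (for example through a sync client) and each contains the wanted file at the same internal path; reports/summary.csv is a placeholder. Windows 10 builds since 17063 ship bsdtar as tar.exe, and bsdtar can read zip archives and extract a single member without unpacking the rest:

# extract one named member from every archive, keeping each copy in its own folder
for z in *.zip; do
  mkdir -p "extracted/${z%.zip}"
  tar -xf "$z" -C "extracted/${z%.zip}" "reports/summary.csv"
done

The tar -xf archive.zip member call itself works the same from cmd or PowerShell; only the loop syntax above is Unix-shell.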

How to download the latest files from S3 to a local folder using a PowerShell script, with a file corruption check

I need to download the latest files from an AWS S3 bucket using a PowerShell script, and I also need a way to detect file corruption during the download.
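A sketch of one way to do both with the AWS CLI; my-bucket and the paths are placeholders, and the variable syntax is Unix-shell (the aws commands themselves run unchanged from PowerShell):

# find the most recently modified key in the bucket
latest=$(aws s3api list-objects-v2 --bucket my-bucket \
  --query 'sort_by(Contents,&LastModified)[-1].Key' --output text)
aws s3 cp "s3://my-bucket/$latest" ./downloads/
# corruption check: for objects that were not multipart uploads, the ETag is the MD5
# of the content (multipart ETags contain a '-'), so compare it to a local hash
aws s3api head-object --bucket my-bucket --key "$latest" --query ETag --output text
md5sum "./downloads/$(basename "$latest")"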

Should I check in *.mo files?

Should I check in *.mo translation files into my version control system?
This is a general question. But in particular I'm working on Django projects with git repositories.
The general answer is:
if you do need those files to compile or to deploy (in short: to "work" with) your component (the set of files fetched from your VCS), then yes, they should be stored in it (here: in Git).
The same goes for other kinds of files (project files, for instance).
.mo files are a special case: they are produced by the django-admin.py compilemessages utility, which runs over all available .po files and creates .mo files, binary files optimized for use by gettext.
Meaning:
you should be able to rebuild them whenever you need them (guaranteeing, in effect, that they stay in sync with their .po counterparts)
Git does not handle binary storage well, and regenerating the files avoids storing a full new version for every change
So the specific answer is not so clear-cut:
if your .po files are stable and will not change often, you could definitely store the .mo files
either way, you should store a README explaining how to generate the .mo files from the .po files.
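As a sketch of what that README would contain, the regeneration step for a Django project is a single command (the msgfmt line shows the equivalent plain-gettext call for one catalog; the locale/ layout is Django's default):

# from the project root: rebuild every .mo file from the committed .po files
django-admin compilemessages
# equivalent, for a single catalog, with plain gettext tools
msgfmt -o locale/de/LC_MESSAGES/django.mo locale/de/LC_MESSAGES/django.po

If you choose not to commit them, a single .gitignore line (*.mo) keeps them out of the repository.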
The general answer is to not store generated content in version control.
You can include it in a tarball if it requires rare tools to generate, or even keep a separate repository or a disconnected branch containing only those generated files (like the 'html' and 'man' branches in the git.git repository).
For the question as asked, Jakub's answer is pretty neat.
But one could ask:
So where should I store such files? Should I generate them every time I deploy my code?
And for that... it depends. You could deploy them in a tarball (as Jakub suggested) or, even better, create a pip or system package (an RPM for Fedora, a DEB for Debian, etc.).
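A sketch of that deploy-time generation, assuming standard Python packaging metadata and that MANIFEST.in (or the package data configuration) includes the locale/ directory:

# regenerate all .mo files from their .po sources, then build the release artifact
django-admin compilemessages
python -m build --sdist

The resulting tarball ships the compiled catalogs, while the repository itself stays free of generated binaries.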