I'm slowly seeding a NAS backup to Google Cloud Storage with rclone. The standard process is to "rclone sync" a big directory, and then run "rclone check" before moving on to the next one.
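For context, each pass boils down to two commands per directory (a rough sketch; gcs: is a placeholder rclone remote name and the paths are examples, not my real ones):
# Upload one directory, then verify it against the bucket before moving on.
rclone sync /nas/Photography/2016 gcs:bucketname/Photography/2016
rclone check /nas/Photography/2016 gcs:bucketname/Photography/2016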
As of yesterday, rclone started finding missing files in the Google Storage bucket when running "rclone check", and the list of missing files would be different with each invocation.
I was sure it was something with rclone, but then I decided to run gsutil on a VM instance in the cloud, and "gsutil ls" was exhibiting the same problem!
Here's an example (the data in the bucket has not been modified in over 24 hours):
vitaly@data-exporter:~$ gsutil ls -lR gs://bucketname/Photography/2016 |wc
8817 31667 1034417
vitaly@data-exporter:~$ gsutil ls -lR gs://bucketname/Photography/2016 |wc
8810 31643 1033605
vitaly@data-exporter:~$ gsutil ls -lR gs://bucketname/Photography/2016 |wc
8813 31656 1033965
vitaly@data-exporter:~$ gsutil ls -lR gs://bucketname/Photography/2016 |wc
8818 31671 1034544
vitaly@data-exporter:~$ gsutil ls -lR gs://bucketname/Photography/2016 |wc
8816 31664 1034294
I am using the Nearline storage class. Regardless of the storage class, shouldn't listing the bucket contents produce the same number of files every time?
I would appreciate some ideas. This was working as expected last week.
P.S. I enabled versioning recently, but did not see these issues last week with versioning enabled.
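For reference, the versioning state and noncurrent object versions can be checked like this (bucketname as above; the -a flag includes archived generations alongside live objects):
gsutil versioning get gs://bucketname
gsutil ls -la gs://bucketname/Photography/2016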
Related
When I do gsutil ls -p myproject-id I get a list of buckets (in my case 2 buckets), which I expect to be the list of all my buckets in the project:
gs://bucket-one/
gs://bucket-two/
But if I do gsutil ls -p myproject-id gs://asixtythreecharacterlongnamebucket I actually get the contents of that long-named bucket:
gs://asixtythreecharacterlongnamebucket/somefolder/
So my question is: why doesn't the long-named bucket show up in the results when I list the project?
The only explanation that made sense to me was this: https://stackoverflow.com/a/34738829/3457432
But I'm not sure. Is this the reason, or could it be something else?
Are you sure that asixtythreecharacterlongnamebucket belongs to myproject-id? It really sounds like asixtythreecharacterlongnamebucket was created in a different project.
You can verify this by checking the bucket ACLs for asixtythreecharacterlongnamebucket and bucket-one and seeing if the project numbers in the listed entities match:
$ gsutil ls -Lb gs://asixtythreecharacterlongnamebucket | grep projectNumber
$ gsutil ls -Lb gs://bucket-one | grep projectNumber
Also note that the -p argument to ls has no effect in your second command when you're listing objects in some bucket. The -p argument only affects which project should be used when you're listing buckets in some project, as in your first command. Think of ls as listing the child resources belonging to some parent -- the parent of a bucket is a project, while the parent of an object is a bucket.
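If it helps, the project's own number can be looked up with the gcloud CLI for comparison (a sketch, assuming gcloud is installed and authenticated):
gcloud projects describe myproject-id --format="value(projectNumber)"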
You're not performing the same request!
gsutil ls -p myproject-id
Here you ask for all the bucket resources that belong to a project.
gsutil ls -p myproject-id gs://asixtythreecharacterlongnamebucket
Here you ask for all the objects that belong to the bucket asixtythreecharacterlongnamebucket, using myproject-id as the quota project.
In both cases, you need permissions to access the resources.
I want to find files whose size is less than 10k in Google Cloud Storage.
gsutil ls -l gs://my-bucket/
This gsutil command shows a list of files with their sizes.
Is there an option for filtering the list by file size?
I'll write a script for it if no built-in option is provided.
gsutil ls doesn't support filtering by size, but you could do it with a script like this (in gsutil ls -l output the first column is the object size in bytes and the last column is the object URL):
gsutil ls -l gs://your-bucket | awk '/gs:\/\// {if ($1 < 10000) {print $NF}}'
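Alternatively, gsutil du also prints one size-and-URL pair per object, so the same kind of filter works there (a sketch using the same 10k cutoff):
gsutil du gs://your-bucket | awk '$1 < 10000 {print $2}'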
I'm using gsutil and I need to copy a large number of files/subdirectories from a directory on a Windows server to a Google Cloud Storage bucket.
I have checked the documentation but somehow I can't seem to get the syntax right - I'm trying something along these lines:
c:\test>gsutil -m cp -r . gs://mytestbucket
But I keep getting the message:
CommandException: No URLs matched: .
What am I doing wrong here?
Regards
Morten Hjorth Nielsen
Try gsutil -m cp -r * gs://mytestbucket
Or gsutil -m cp -r *.* gs://mytestbucket
Or if your local directory is called test go one dir up and type: gsutil -m cp -r test gs://mytestbucket
Not sure which syntax you need on Windows, but probably the first.
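For example, if the files live in c:\test, the last option would look like this (a sketch; gsutil creates the objects under gs://mytestbucket/test/...):
c:\>gsutil -m cp -r test gs://mytestbucket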
Scenario: there are multiple folders and many files stored in a storage bucket that is accessible by project team members. Instead of downloading individual files one at a time (which is very slow and time consuming), is there a way to download entire folders? Or at least multiple files at once? Is this possible without having to use one of the command consoles? Some of the team members are not tech savvy and need access to these files to be as simple as possible. Thank you for any help!
I would suggest downloading the files with gsutil. However, if you have a large number of files to transfer, you might want to use the gsutil -m option to perform a parallel (multi-threaded/multi-processing) copy:
gsutil -m cp -R gs://your-bucket .
The time reduction for downloading the files can be quite significant. See this Cloud Storage documentation for complete information on the GCS cp command.
If you want to copy into a particular directory, note that the directory must exist first, as gsutil won't create it automatically. (e.g. mkdir my-bucket-local-copy && gsutil -m cp -r gs://your-bucket my-bucket-local-copy)
I recommend they use gsutil. GCS's API deals with only one object at a time, but its command-line utility, gsutil, is more than happy to download a bunch of objects in parallel. Downloading an entire GCS "folder" with gsutil is pretty simple:
$> gsutil cp -r gs://my-bucket/remoteDirectory localDirectory
To download files to a local machine you need to:
install gsutil on the local machine
run the Google Cloud SDK Shell
run a command like this (example for the Windows platform):
gsutil -m cp -r gs://source_folder_path "%userprofile%/Downloads"
gsutil rsync -d -r gs://bucketName .
works for me
When using gsutil -m rsync -p -d -r, the ownership of the downloaded files became root.
Any idea how to run gsutil rsync just like rsync -a?
thanks
Peter
gsutil rsync doesn't currently support preserving POSIX file attributes in the cloud.
It's not guaranteed that the uid/gid on the system that uploaded a file is even valid on the system that downloaded the file. So (at least for now), you'll need to manage your file permissions manually.
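For example, a workaround is to fix ownership yourself after each download (a sketch; youruser:yourgroup and /local/backup are placeholders):
gsutil -m rsync -d -r gs://bucketName /local/backup
chown -R youruser:yourgroup /local/backup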