Cannot restore files in GCS with versioning, when deleting folder - google-cloud-storage

For backups, I have enabled versioning in GCS.
Then I created a folder and put a file in it. After that, I deleted the folder.
Then I ran the gsutil ls -alr command, but I cannot find the file in the bucket.
I can find the folder, but I cannot restore the file that was in it.
When I delete a folder, why can't I restore a file in that folder, even though versioning is enabled in GCS?

Files in the Google Cloud Storage bucket that are archived (i.e. NOT live) at the time the folder is deleted remain in the archived list and can be retrieved.
For example, you can:
Create a folder in the bucket using the Google Cloud Console: gs://[BUCKET_NAME]/example
Put a file in the folder using the Google Cloud Console: gs://[BUCKET_NAME]/example/file_1.txt
Put another file in the folder using the Google Cloud Console: gs://[BUCKET_NAME]/example/file_2.txt
Using the Google Cloud Console, delete file_1.txt
Using the Google Cloud Console, delete the folder example
Run the command gsutil ls -alr gs://[BUCKET_NAME]/example
You will see a result as follows:
$ gsutil ls -alr gs://[BUCKET_NAME]/example
gs://[BUCKET_NAME]/example/:
11 2019-02-27T11:48:54Z gs://[BUCKET_NAME]/example/#1551268... metageneration=1
14 2019-02-27T11:49:49Z gs://[BUCKET_NAME]/example/file_1.txt#1551268189... metageneration=1
TOTAL: 2 objects, 25 bytes (25 B)
You will notice that only file_1.txt is available for retrieval, since it is the one that was archived and NOT LIVE when the folder was deleted.
Also, to list all the archived objects of the bucket you can run gsutil ls -alr gs://[BUCKET_NAME]/**.
So if your files were archived and deleted before the folder was deleted, you can list them using gsutil ls -alr gs://[BUCKET_NAME]/** and retrieve them with another command; for more info, see the Using Object Versioning > Copying archived object versions documentation.
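If you prefer to script this, here is a minimal sketch (my own illustration, not from the documentation) using the google-cloud-storage Python client: it lists every version of every object, archived ones included, and copies a chosen archived generation back as the live object. The bucket name, object name, and generation value are placeholders to replace with your own.

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("[BUCKET_NAME]")  # placeholder bucket name

# List every version of every object, archived ones included
# (roughly equivalent to `gsutil ls -alr gs://[BUCKET_NAME]/**`).
for blob in client.list_blobs(bucket, versions=True):
    state = "archived" if blob.time_deleted else "live"
    print(blob.name, blob.generation, state)

# Copy a specific archived generation back as the new live version.
source = bucket.blob("example/file_1.txt")  # placeholder object name
bucket.copy_blob(
    source, bucket, "example/file_1.txt",
    source_generation=1234567890,  # placeholder: the number after '#' in the ls output
)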

Related

How to download all bucket files (an issue with the gsutil -m flag)

I am trying to copy all files from a Cloud Storage bucket recursively, and as far as I have investigated, the problem is with the -m flag.
The command that I am running:
gsutil -m cp -r gs://{{ src_bucket }} {{ bucket_backup }}
I am getting something like this:
CommandException: 1 file/object could not be transferred.
where the number of files/objects differs every time.
After investigating, I tried reducing the number of threads/processes used with the -m option, but that has not helped, so I am looking for some advice. I have 170 MiB of data in the bucket, which is approximately 300k files. I need to download them as fast as possible.
UPD:
Logs with -L flag
[Errno 2] No such file or directory: '<path>/en_.gstmp' -> '<path>/en'
6 errors like that.
The root of the issue might be that a directory and a file with the same name exist in the GCS bucket. Try executing the command with the -L flag so that you get additional logs for the run and can find the file that is causing this error.
I would suggest deleting that file, making sure there is no directory of that name in the bucket, and then uploading the file to the bucket again.
Also check whether any directories were created with the JAR name; delete them and then proceed with copying the files.
And check whether the required file already exists at the destination; if so, delete it at the destination and run the copy again.
There are alternatives to copy, for example, it is possible to transfer files using rsync, as described here.
You can also check similar threads: thread1, thread2 & thread3
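If you suspect such a name collision but cannot spot it in the logs, a short sketch with the google-cloud-storage Python client (my own illustration, with a placeholder bucket name) can list object names that also exist as a 'folder' prefix in the bucket:

from google.cloud import storage

client = storage.Client()
bucket_name = "src-bucket"  # placeholder: your source bucket

names = {blob.name for blob in client.list_blobs(bucket_name)}

# A 'folder' in GCS is just a common prefix, so a collision means some
# object name X also appears as a prefix "X/" of other object names.
collisions = {n for n in names if any(other.startswith(n + "/") for other in names)}

for name in sorted(collisions):
    print("object and folder share the name:", name)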

gcloud alpha storage ls isn't finding uploaded file

I uploaded a folder structure with a single file inside to an existing gcloud storage bucket.
C:\Users\Administrator\Desktop>gcloud alpha storage cp -r testfolder gs://auction-engine-upload
Copying file://testfolder\testSubfolder\MAXPOWER.png to gs://auction-engine-upload/testfolder/testSubfolder/MAXPOWER.png
Completed files 1/1 | 10.0kiB/10.0kiB
Then I tried to verify the file was uploaded by using the ls command:
gcloud alpha storage ls gs://auction-engine-upload
This lists about 40 directories, none of which is the /testfolder directory, so I tried a few different ways to list only /testfolder:
gcloud alpha storage ls gs://auction-engine-upload/testfolder
gcloud alpha storage ls gs://auction-engine-upload/testfolder/
gcloud alpha storage ls gs://auction-engine-upload/testfolder/*
But I keep getting this error:
ERROR: (gcloud.alpha.storage.ls) One or more URLs matched no objects.
Am I screwing up syntax or is the file actually not uploaded?
I don't have access to change the permissions in the bucket, so I had to have the account owner create another bucket and give me permission to create the file there.
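If you just want to confirm whether the object actually landed in the bucket (and you have permission to list or get objects in it), here is a minimal sketch with the google-cloud-storage Python client, reusing the bucket and object names from the question:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("auction-engine-upload")

# Check the specific object.
blob = bucket.blob("testfolder/testSubfolder/MAXPOWER.png")
print("exists:", blob.exists())

# Or list everything under the prefix (similar to ls on the 'folder').
for b in client.list_blobs(bucket, prefix="testfolder/"):
    print(b.name)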

Nearline - Backup Solution - Versioning

I've set up some Nearline buckets and enabled versioning and object lifecycle management. The use case is to replace my current backup solution, Crashplan.
Using gsutil I can see the different versions of a file using a command like gsutil ls -al gs://backup/test.txt.
First, is there any way of finding files that don't have a live version (e.g. deleted) but still have a version attached?
Second, is there an easier way of managing versions? For instance, if I delete a file from my PC, it will no longer have a live version in my bucket but will still have the older versions associated with it. Say I didn't know the file name, would I just have to do a recursive ls on the entire bucket and sift through the output?
Would love a UI that supported versioning.
Thanks.
To check whether an object currently has no live version, use the x-goog-if-generation-match header with a value of 0, for example:
gsutil -h x-goog-if-generation-match:0 cp file.txt gs://bucket/file.txt
will fail (PreconditionException: 412 Precondition Failed) if the file has a live version, and will succeed if it has only archived versions.
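The same precondition can be expressed with the google-cloud-storage Python client via if_generation_match=0; a hedged sketch, reusing the bucket and file names from the question:

from google.cloud import storage
from google.api_core.exceptions import PreconditionFailed

client = storage.Client()
blob = client.bucket("backup").blob("test.txt")  # names from the question

try:
    # if_generation_match=0 succeeds only if no live version exists.
    blob.upload_from_filename("file.txt", if_generation_match=0)
    print("no live version existed; upload succeeded")
except PreconditionFailed:
    print("a live version already exists (412 Precondition Failed)")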
In order to automatically synchronize your local folder and a folder in the bucket (or the other way around), use gsutil rsync:
gsutil rsync -r -d ./test gs://bucket/test/
Notice the trailing / in gs://bucket/test/; without it you will receive
CommandException: arg (gs://graham-dest/test) does not name a directory, bucket, or bucket subdir.
-r synchronizes all the directories in ./test recursively to gs://bucket/test/
-d will delete all files from gs://bucket/test/ that are not found in ./test
Regarding a UI, there is already an existing feature request. I don't know anything about third-party applications, however.
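For the first question (finding objects that no longer have a live version), one option is a short sketch with the google-cloud-storage Python client, listing all versions and keeping the names for which every version carries a time_deleted timestamp. This is my own illustration with a placeholder bucket name, not part of the original answer:

from collections import defaultdict
from google.cloud import storage

client = storage.Client()
versions = defaultdict(list)

# versions=True also returns archived (noncurrent) versions.
for blob in client.list_blobs("backup", versions=True):  # placeholder bucket
    versions[blob.name].append(blob)

# An object has no live version if every one of its versions is archived.
for name, blobs in versions.items():
    if all(b.time_deleted is not None for b in blobs):
        print("no live version, archived versions remain:", name)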

gsutil rsync uploading then immediately deleting file, leaving source and target in different states

I have a script which is running gsutil rsync -r -d -c, and occasionally it will leave the source and target directories out of sync. The last file in the list (named version.json) is first uploaded, and then immediately deleted.
Has anybody encountered this bug?
Additional information:
versioning is turned off in the target bucket
This occurs when attempting to overwrite the entire contents of the target bucket, which are already present.

Google Cloud Storage upload files modified today

I am trying to figure out whether I can use gsutil's cp command on Windows to upload files to Google Cloud Storage. I have 6 folders on my local computer that get new pdf documents added to them daily. Each folder contains around 2,500 files. All files are currently on Google Storage in their respective folders. Right now I mainly upload all the new files using Google Cloud Storage Manager. Is there a way to create a batch file and schedule it to run automatically every night so that it grabs only the files that were scanned today and uploads them to Google Storage?
I tried this format:
python c:\gsutil\gsutil cp "E:\PIECE POs\64954.pdf" "gs://dompro/piece pos"
and it uploaded the file perfectly fine.
This command
python c:\gsutil\gsutil cp "E:\PIECE POs\*.pdf" "gs://dompro/piece pos"
will upload all of the files into a bucket. But how do I grab only the files that were changed or generated today? Is there a way to do that?
One solution would be to use the -n parameter on the gsutil cp command:
python c:\gsutil\gsutil cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
That will skip any objects that already exist on the server. You may also want to look at using gsutil's -m flag and see if that speeds the process up for you:
python c:\gsutil\gsutil -m cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
Since you have Python available, you could write a small Python script that finds the ctime (creation time) or mtime (modification time) of each file in a directory, checks whether that date is today, and uploads the file if so. You can see an example in this question, which could be adapted as follows:
import datetime
import os

local_path_to_storage_bucket = [
    ('<local-path-1>', 'gs://bucket1'),
    ('<local-path-2>', 'gs://bucket2'),
    # ... add more here as needed
]

today = datetime.date.today()

for local_path, storage_bucket in local_path_to_storage_bucket:
    for filename in os.listdir(local_path):
        # os.listdir returns bare names, so join them with the directory path.
        full_path = os.path.join(local_path, filename)
        ctime = datetime.date.fromtimestamp(os.path.getctime(full_path))
        mtime = datetime.date.fromtimestamp(os.path.getmtime(full_path))
        if today in (ctime, mtime):
            # Using the 'subprocess' library would be better, but this is
            # simpler to illustrate the example.
            os.system('gsutil cp "%s" "%s"' % (full_path, storage_bucket))
Alternatively, consider using the Google Cloud Storage Python API directly instead of shelling out to gsutil.
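A minimal sketch of that alternative, using the google-cloud-storage client library and the bucket/folder names from the question (treat the paths as placeholders for your setup):

import datetime
import os

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("dompro")   # bucket name from the question
local_dir = r"E:\PIECE POs"        # local folder from the question
prefix = "piece pos/"              # destination 'folder' in the bucket

today = datetime.date.today()

for filename in os.listdir(local_dir):
    full_path = os.path.join(local_dir, filename)
    mtime = datetime.date.fromtimestamp(os.path.getmtime(full_path))
    if mtime == today:
        blob = bucket.blob(prefix + filename)
        blob.upload_from_filename(full_path)
        print("uploaded", full_path)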