Export Firestore data by overwriting existing data with gcloud firestore

I am trying to overwrite existing export data in gcloud using:
gcloud firestore export gs://<PROJECT>/dir --collection-ids='tokens'
But I get this error:
(gcloud.firestore.export) INVALID_ARGUMENT: Path already exists: /fcm-test-firebase.appspot.com/dir/dir.overall_export_metadata
Is there any way to either delete the path or export with replace?

You can easily determine the list of available flags for any gcloud command.
Here are variants of the command and you can see that there's no overwrite option:
gcloud firestore export
gcloud alpha firestore export
gcloud beta firestore export
Because the export goes to a Google Cloud Storage (GCS) bucket, you can simply delete the path before attempting the export.
BE VERY CAREFUL with this command, as it recursively deletes objects:
gsutil rm -r gs://<PROJECT>/dir
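For example, a minimal delete-then-export sequence might look like this (the bucket path and collection ID are taken from the question; adjust them to your own project):
# remove the previous export (irreversible!)
gsutil rm -r gs://<PROJECT>/dir
# re-run the export into the now-free path
gcloud firestore export gs://<PROJECT>/dir --collection-ids='tokens'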
If you would like Google to consider adding an overwrite feature, consider filing a feature request on its public issue tracker.
I suspect that the command doesn't exist for various reasons:
GCS storage is cheap
Having many backup copies is far better than having none
It's easy to delete copies using gsutil

Related

Is there a way to tag or version Cloud Storage buckets?

I have a shell script which refreshes my emulator's data to the latest data from prod.
Part of the script removes the existing bucket and then re-exports it to avoid the "Path already exists" error.
I know that I can manually add versioned paths like /firestore_data/v1, but that would require me to find out what the last version is from the console and then update the shell script each time I need to refresh the emulator's data.
Ideally I would like to be able to run gsutil -m cp -r gs://my-app.appspot.com/firestore_data#latest
Is there any way to version storage buckets, or to leave tags that can be used when adding and copying down?
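For reference, a minimal sketch of the kind of refresh script described in the question might look like the following; the bucket name and local emulator directory are placeholders taken from or modeled on the question, not a confirmed setup:
# remove the previous export so the path is free again (irreversible!)
gsutil rm -r gs://my-app.appspot.com/firestore_data
# export the current prod data into the same path
gcloud firestore export gs://my-app.appspot.com/firestore_data
# copy the export down for the local emulator (hypothetical local directory)
gsutil -m cp -r gs://my-app.appspot.com/firestore_data ./local_emulator_data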

Exporting firebase data to json

I'm trying to export a particular Firestore collection using gcloud.
Right now, I did:
gcloud storage buckets create gs://my-bucket
gcloud firestore export gs://my-bucket --collection-ids=my-collection
gcloud storage cp -r gs://my-bucket/2022-09-22T09:20:16_11252 my-directory
which results in some content in my-directory/all_namespaces/.../output-1. The output-1 file definitely seems to contain some relevant data but it is not very readable. Hence some questions:
which file format is used?
can I export to JSON (or CSV or XML) directly?
can I convert the current file to JSON, CSV or XML?
And related
why is output-0 empty?
Firestore does not support exporting existing data to a readable file, but it does have a managed Exporting and importing data service that lets you dump your data into a GCS bucket. It produces the same format that Cloud Datastore uses, which means you can then import it into BigQuery. You can refer to this Stack Overflow post and this video.
Also, as mentioned above in the comment by Dazwilkin, the output of a managed export uses the LevelDB log format.
Additionally, you can have a look at link1 and link2.
Surprisingly, there don't seem to be a lot of LevelDB tools available, so exporting to LevelDB format is not convenient.
I managed to export to CSV by adding two extra steps: loading into and extracting from BigQuery. So I now do something like:
# create bucket
gcloud storage buckets create gs://my-bucket
# firestore export
gcloud firestore export gs://my-bucket/my-prefix --collection-ids=my-collection
# create dataset
bq mk --dataset my-dataset
# load bucket into BigQuery
bq load --replace --source_format=DATASTORE_BACKUP my-dataset.input \
gs://my-bucket/my-prefix/all_namespaces/.../....export_metadata
# export BigQuery as csv to bucket
bq extract --compression GZIP 'my-dataset.input' gs://my-bucket/results.csv
# download csv file
gcloud storage cp -r gs://my-bucket/results.csv <local-dir>
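If JSON rather than CSV is the goal, the same BigQuery detour should work by changing the extract format; this is a sketch using bq extract's --destination_format flag, with the same hypothetical table and bucket names as above:
# export the BigQuery table as newline-delimited JSON into the bucket
bq extract --destination_format NEWLINE_DELIMITED_JSON 'my-dataset.input' gs://my-bucket/results.json
# download the JSON file
gcloud storage cp gs://my-bucket/results.json <local-dir>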
Yes, you can export Firebase data as JSON. Follow this article for exporting the data from Firebase; I hope this helps: https://support.google.com/firebase/answer/6386780?hl=en#zippy=%2Cin-this-article

Reading bucket from another project in cloudshell

Because Firestore does not have a way to clone projects, I am attempting to achieve the equivalent by copying data from one project into a GCS bucket and reading it into another project.
Specifically, using cloudshell I populate the bucket with data exported from Firestore project A and am attempting to import it into Firestore project B. The bucket belongs to Firestore project A.
I am able to export the data from Firestore project A without any issue. When I attempt to import into Firestore project B with the cloudshell command
gcloud beta firestore import gs://bucketname
I get the error message
project-b@appspot.gserviceaccount.com does not have storage.buckets.get access to bucketname
I have searched high and low for a way to grant the storage.buckets.get permission to project B, but am not finding anything that works.
Can anyone point me to how this is done? I have been through the Google docs half a dozen times and am either not finding the right information or not understanding the information that I find.
Many thanks in advance.
To import from project A into project B, the service account in project B must have the right permissions on the Cloud Storage bucket in project A.
In your case, the service account is:
project-ID@appspot.gserviceaccount.com
To grant the right permissions you can use this command on the Cloud Shell of project B:
gsutil acl ch -u project-ID@appspot.gserviceaccount.com:OWNER gs://[BUCKET_NAME]
gsutil -m acl ch -r -u project-ID@appspot.gserviceaccount.com:OWNER gs://[BUCKET_NAME]
Then, you can import using the firestore import:
gcloud beta firestore import gs://[BUCKET_NAME]/[EXPORT_PREFIX]
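As an alternative to editing ACLs, granting access at the bucket level via IAM should also work. This is a sketch, not a confirmed recipe: the roles/storage.admin role is an assumption chosen because it includes storage.buckets.get; a narrower role may suffice.
# grant project B's default service account bucket-level access (role choice is an assumption)
gsutil iam ch serviceAccount:project-ID@appspot.gserviceaccount.com:roles/storage.admin gs://[BUCKET_NAME]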
I was not able to get the commands provided by "sotis" to work, however his answer certainly got me heading down the right path. The commands that eventually worked for me were:
gcloud config set project [SOURCE_PROJECT_ID]
gcloud beta firestore export gs://[BUCKET_NAME]
gcloud config set project [TARGET_PROJECT_ID]
gsutil acl ch -u [RIGHTS_RECIPIENT]:R gs://[BUCKET_NAME]
gcloud beta firestore import gs://[BUCKET_NAME]/[TIMESTAMPED_DIRECTORY]
Where:
* SOURCE_PROJECT_ID = the name of the project you are cloning
* TARGET_PROJECT_ID = the destination project for the cloning
* RIGHTS_RECIPIENT = the email address of the account to receive read rights
* BUCKET_NAME = the name of the bucket that stores the data.
Please note, you have to manually create this bucket before you export to it.
Also, make sure the bucket is in the same geographic region as the projects you are working with.
* TIMESTAMPED_DIRECTORY = the name of the data directory automatically created by the "export" command
I am sure that this is not the only way to solve the problem, however it worked for me and appears to be the "shortest path" solution I have seen.
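For convenience, those same steps can be wrapped in a small script; all the variable values below are placeholders you would fill in yourself:
# placeholder values, fill in your own
SOURCE_PROJECT_ID=my-source-project
TARGET_PROJECT_ID=my-target-project
RIGHTS_RECIPIENT=project-ID@appspot.gserviceaccount.com
BUCKET_NAME=my-transfer-bucket
TIMESTAMPED_DIRECTORY=2022-09-22T09:20:16_11252

gcloud config set project "$SOURCE_PROJECT_ID"
gcloud beta firestore export gs://"$BUCKET_NAME"
gcloud config set project "$TARGET_PROJECT_ID"
gsutil acl ch -u "$RIGHTS_RECIPIENT":R gs://"$BUCKET_NAME"
gcloud beta firestore import gs://"$BUCKET_NAME"/"$TIMESTAMPED_DIRECTORY"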

Google Cloud Firestore: How to copy Firestore collection to Cloud Storage

Is writing code the only option to copy Firestore collections to Cloud Storage, or is there some kind of magic feature I can use?
I know about the new feature announced in the Firestore talk at the Next conference for importing Firestore collections into BigQuery. Is there something similar for Cloud Storage?
https://cloud.google.com/firestore/docs/manage-data/export-import. I'm not sure whether this is a new feature, but I am going to try it out.
Yes, finally, Firebase enabled this feature.
Create Cloud Storage Bucket
install gcloud if it isn't already installed: in a terminal, run curl https://sdk.cloud.google.com | bash
after the prompt Modify profile to update your $PATH and enable bash completion? (Y/n), type y and press enter
next, run source .bash_profile
afterwards, run: gcloud beta firestore export gs://[BUCKET-NAME]
and in case you want to save the folder locally, simply run gsutil cp -r gs://[BUCKET-NAME] /path/to/folder
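Putting those steps together, a minimal end-to-end sketch might look like this; the bucket name and the --collection-ids value are placeholders, and --collection-ids can be dropped to export the whole database:
# create the bucket (skip if it already exists)
gsutil mb gs://[BUCKET-NAME]
# export the whole database, or only selected collections with --collection-ids
gcloud beta firestore export gs://[BUCKET-NAME] --collection-ids='my-collection'
# optionally copy the export down locally
gsutil cp -r gs://[BUCKET-NAME] /path/to/folder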

How to skip existing files in gsutil rsync

I want to copy files between a directory on my local computer disk and my Google Cloud Storage bucket with the below conditions:
1) Copy all new files and folders.
2) Skip all existing files and folders irrespective of whether they have been modified or not.
I have tried to implement this using the Google ACL policy, but it doesn't seem to be working.
I am using a Google Cloud Storage admin service account to copy my files to the bucket.
As @A.Queue commented, the solution to skip existing files would be the use of the gsutil cp command with the -n option. This option means no-clobber, so that all files and directories already present in the Cloud Storage bucket will not be overwritten, and only new files and directories will be added to the bucket.
If you run the following command:
gsutil cp -n -r . gs://[YOUR_BUCKET]
You will copy all files and directories (including the whole directory tree with all files and subdirectories underneath) that are not present in the Cloud Storage bucket, while all of those which are already present will be skipped.
You can find more information related to this command in this link.
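As a usage example (the local path here is hypothetical), the same no-clobber copy can be combined with the -m option for parallel uploads:
# copy only files not already in the bucket, uploading in parallel
gsutil -m cp -n -r /path/to/local/dir gs://[YOUR_BUCKET]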