Exporting Firebase data to JSON - google-cloud-firestore

I'm trying to export a particular Firestore collection using gcloud.
Right now, I did:
gcloud storage buckets create gs://my-bucket
gcloud firestore export gs://my-bucket --collection-ids=my-collection
gcloud storage cp -r gs://my-bucket/2022-09-22T09:20:16_11252 my-directory
which results in some content in my-directory/all_namespaces/.../output-1. The output-1 file definitely seems to contain some relevant data, but it is not very readable. Hence some questions:
Which file format is used?
Can I export to JSON (or CSV or XML) directly?
Can I convert the current file to JSON, CSV or XML?
And, related:
Why is output-0 empty?

Firestore does not support exporting existing data to a readable file, but it does have a managed export and import service that allows you to dump your data into a GCS bucket. It produces the same format that Cloud Datastore uses, which means you can then import it into BigQuery. You can refer to this Stack Overflow post and this video.
Also, as mentioned above in the comment by Dazwilkin, the output of a managed export uses the LevelDB log format.
Additionally, you can have a look at link1 and link2.

Surprisingly, there don't seem to be many LevelDB tools available, so exporting to the LevelDB format is not convenient.
I managed to export to CSV by adding two extra steps: loading into and extracting from BigQuery. So I now do something like:
# create bucket
gcloud storage buckets create gs://my-bucket
# firestore export
gcloud firestore export gs://my-bucket/my-prefix --collection-ids=my-collection
# create dataset
bq mk --dataset my-dataset
# load bucket into BigQuery
bq load --replace --source_format=DATASTORE_BACKUP my-dataset.input \
gs://my-bucket/my-prefix/all_namespaces/.../....export_metadata
# export BigQuery as csv to bucket
bq extract --compression GZIP 'my-dataset.input' gs://my-bucket/results.csv
# download csv file
gcloud storage cp -r gs://my-bucket/results.csv <local-dir>
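If you want JSON rather than CSV, bq extract can also emit newline-delimited JSON via --destination_format. A minimal sketch, assuming the same dataset and bucket names as above (results.json is just a placeholder name):
# export the BigQuery table as newline-delimited JSON to the bucket
bq extract --destination_format NEWLINE_DELIMITED_JSON 'my-dataset.input' gs://my-bucket/results.json
# download the JSON file
gcloud storage cp gs://my-bucket/results.json <local-dir>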

Yes, you can export Firebase data as JSON. Follow this article for exporting the data from Firebase; I hope this helps: https://support.google.com/firebase/answer/6386780?hl=en#zippy=%2Cin-this-article

Related

Best practice for importing bulk data to AWS RDS PostgreSQL database

I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use \copy command to append the data to the table
I would like to switch this to an automated approach (maybe using an AWS Lambda). What would be the best practices?
Approach 1:
Run a script (Ruby / JS) that parses all folders in the past period (e.g., a week) and, while parsing each file, connects to the RDS database and executes an INSERT command. I feel this would be a very slow process with constant writes to the database and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is - how do I then use this temporary file to do a bulk import?
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.
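For context, a minimal sketch of the manual flow described above, with hypothetical names (the bucket path, convert.rb, my_table, and the connection parameters are placeholders):
# download the JSON files from S3
aws s3 sync s3://my-bucket/exports/ ./exports/
# parse the JSON files into a CSV matching the target table (convert.rb stands in for the existing Ruby script)
ruby convert.rb ./exports/ > data.csv
# connect to RDS and append the CSV to the table with \copy
psql "host=my-rds-endpoint dbname=my_db user=my_user" -c "\copy my_table FROM 'data.csv' WITH (FORMAT csv)"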

Importing a CSV file from GCS to a Postgres Cloud SQL instance: invalid input syntax error

When importing a CSV file from Cloud Storage into Cloud SQL Postgres using Cloud Composer (Airflow), I would like to remove the header or skip rows automatically (in my DAG operator: CloudSQLImportInstanceOperator), but I keep getting an error. It seems CloudSQLImportInstanceOperator doesn't support skipping rows. How can I resolve this issue?
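Not part of the original question, but since CloudSQLImportInstanceOperator apparently has no skip-header option, one workaround sketch is to strip the header in GCS before triggering the import (bucket and object names are placeholders):
# stream the CSV, drop the first line, and write a header-less copy back to GCS
gsutil cat gs://my-bucket/data.csv | tail -n +2 | gsutil cp - gs://my-bucket/data_noheader.csv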

Export Firestore data by overwriting existing data - gcloud firestore

I am trying to overwrite existing export data in gcloud using:
gcloud firestore export gs://<PROJECT>/dir --collection-ids='tokens'
But I get this error:
(gcloud.firestore.export) INVALID_ARGUMENT: Path already exists: /fcm-test-firebase.appspot.com/dir/dir.overall_export_metadata
Is there any way to either delete the path or export with replace?
You can easily determine the list of available flags for any gcloud command.
Here are variants of the command and you can see that there's no overwrite option:
gcloud firestore export
gcloud alpha firestore export
gcloud beta firestore export
Because the export is to a Google Cloud Storage (GCS) bucket, you can simply delete the path before attempting the export.
BE VERY CAREFUL with this command as it recursively deletes objects
gsutil rm -r gs://<PROJECT>/dir
If you would like Google to consider adding an overwrite feature, consider filing a feature request on its public issue tracker.
I suspect that the command doesn't exist for various reasons:
GCS storage is cheap
Many backup copies are far better than no backup copies
It's easy to delete copies using gsutil
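Putting the two commands together, a minimal sketch of the delete-then-re-export sequence (same paths as above):
# recursively delete the previous export path (destructive!)
gsutil rm -r gs://<PROJECT>/dir
# run the export again into the now-free path
gcloud firestore export gs://<PROJECT>/dir --collection-ids='tokens'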

AWS migrate data from MongoDB to DynamoDB/S3/Redshift

The issue is that migrating data from MongoDB to DynamoDB/S3/Redshift is, as I understand it, currently not available to us via the AWS DMS service, as it does not support all data types. Or maybe I'm wrong.
The problem is that our Mongo objects contain non-scalar fields (arrays, maps).
So when I create a migration task via AWS DMS in table mode, it pulls the data badly. For some reason only selection works; transformation rules are ignored by DMS (I tried renaming and removing).
In document mode everything is OK, but how can I run the migration with some custom script for the transformation? Storing the data this way still needs transformation.
We need some modifications such as renaming and removing fields and flattening some fields (for example, we have a map object that should be flattened into several scalar fields).
The migration should go into one of these targets: S3, DynamoDB, Redshift.
I will be thankful for any help and suggestions.
Use the script below to take a backup of the MongoDB database:
mongodump -h localhost:27017 -d my_db_name -o $DEST
Use the command below to sync your backup to an S3 bucket:
aws s3 sync ~/db_backups s3://my-bucket-name
Once your data is in S3, you can load it into Redshift very easily using the COPY command.
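A minimal sketch of that last step, assuming the files in S3 are newline-delimited JSON (e.g. produced with mongoexport; a raw mongodump is BSON, which COPY cannot read) and that the endpoint, table, S3 path, and IAM role below are placeholders:
# run the Redshift COPY via psql
psql "host=my-cluster.redshift.amazonaws.com port=5439 dbname=my_db user=my_user" -c "
COPY my_table
FROM 's3://my-bucket-name/db_backups/my_collection.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS JSON 'auto';
"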

Mongoexport DocumentDB clusters to S3 Bucket

I have two MongoDB clusters and want to export data.
I am using an EC2 instance to log in to the DocumentDB cluster and use mongoexport to get all the documents in JSON format.
Problem:
The number of records is more than 2 billion, and mongoexport will create one single file with all the records.
Any suggestions on how to:
1. mongoexport all data to multiple files
2. write all the exported data directly to an S3 bucket instead of first writing to EC2 and then using aws s3 cp/sync to upload it to S3 (see the sketch below)
Looked at https://www.npmjs.com/package/mongo-to-s3 - too old to use
https://www.npmjs.com/package/mongo-dump-s3-2 - it takes a mongodump; I want the data in JSON format.
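Not an existing answer, but for point 2 one sketch that avoids staging on EC2 is to pipe mongoexport straight to S3 (the URI, collection, and object key are placeholders; DocumentDB may additionally require TLS options):
# stream JSON from the cluster straight to S3 without writing to local disk
mongoexport --uri "mongodb://user:pass@my-docdb-cluster:27017/my_db" --collection my_collection | gzip | aws s3 cp - s3://my-bucket/exports/my_collection.json.gz
For point 1, the same command could be run once per chunk with --skip/--limit or a --query range, though that can be slow on a collection this large.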