Import Firestore backup to BigQuery programmatically - google-cloud-firestore

I have a Firestore backup file in GCS named all_namespaces_kind_Rates.export_metadata. I have set up a cron job to update this file every 24 hours. What I need now is a way to programmatically send this export_metadata file to BigQuery. BigQuery can schedule data transfers from GCS, but only for files in CSV, JSON, Avro, Parquet, or ORC format. How can I transfer my Firestore backup files programmatically into BigQuery?

If your cron job can access the bq command-line tool, have you tried:
bq load --source_format=DATASTORE_BACKUP [DATASET].[TABLE] [PATH_TO_SOURCE]
See more about the command:
https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore#loading_cloud_firestore_export_service_data
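If the cron job runs somewhere the BigQuery Python client is available, the same load can be done without shelling out to bq. Here is a minimal sketch, assuming placeholder project, dataset, and bucket names (only the export file name comes from the question):

# Minimal sketch: load a Firestore/Datastore export into BigQuery from Python.
# Project, dataset, table, and bucket names below are placeholders -- adjust to yours.
from google.cloud import bigquery

client = bigquery.Client()

# Firestore exports are loaded with the same source format as Datastore backups.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # replace the table each run
)

uri = "gs://YOUR_BUCKET/all_namespaces_kind_Rates.export_metadata"
table_id = "your-project.your_dataset.Rates"

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")

Run from the same cron entry that refreshes the export, this replaces the table with the latest backup contents each day.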

Related

How to get Talend to wait for a file to land in S3

I have a file that lands in AWS S3 several times a day. I am using Talend as my ETL tool to populate a warehouse in Snowflake and need it to watch for the file to trigger my job. I've tried tWaitForFile but can't seem to get it to connect to S3. Has anyone done this before?
You can check the link below, which automates the pipeline by using S3 and Lambda to trigger the Talend job when files arrive.
Automate S3 File Push
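As a rough illustration of the S3 + Lambda idea (the trigger URL and payload here are made-up placeholders, and how you actually start the Talend job depends on whether you use TAC, Talend Cloud, or something else):

# Hypothetical Lambda handler wired to an S3 "ObjectCreated" event notification.
# It fires whenever the expected file lands and then calls out to start the Talend job.
import json
import urllib.request

# Placeholder: an HTTP endpoint that starts your Talend job (e.g. a TAC/TMC trigger URL).
TALEND_TRIGGER_URL = "https://example.com/start-talend-job"

def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New file detected: s3://{bucket}/{key}")

        # Pass the file location along so the job knows what to pick up.
        payload = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
        req = urllib.request.Request(
            TALEND_TRIGGER_URL,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(f"Trigger responded with status {resp.status}")

    return {"statusCode": 200}

The advantage over polling with tWaitForFile is that the job starts as soon as S3 reports the new object, with no waiting loop inside Talend.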

Is there a way to dump a TSV file from a Storage bucket to Cloud MySQL in GCP?

Is there a way to dump a TSV file from a Storage bucket to Cloud MySQL in GCP? I have a large TSV file with 4M rows.
I couldn't convert it into CSV.
As of today, Cloud SQL only supports CSV and SQL imports. Nonetheless, I suggest that you take a look at this solution. I used Python so the process can be automated in case you need to run it more than once. I tried to reproduce your issue and wrote a script that basically:
Downloads the TSV file from the specified Cloud Storage bucket.
Converts the TSV file to a CSV file.
Uploads the CSV file to the specified Cloud Storage bucket.
Imports the newly added CSV file into Cloud SQL.
You can find the code as well as the requirements for running this script here. Take into account that you will need to replace the values enclosed in square brackets, such as [BUCKET_NAME], before running it. Also keep in mind that the script does not delete the downloaded TSV file or the generated CSV file, so you will need to delete them manually or modify the code to delete them automatically.
Finally, if you would like to read more about the APIs used in the script, the documentation is attached here & here.
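The linked script itself isn't reproduced here, but a condensed sketch of the same four steps might look like the following; all bracketed names are placeholders, and the final import is shown via the gcloud CLI rather than the SQL Admin API:

# Condensed sketch of the steps described above; [BUCKET_NAME] etc. are placeholders.
import csv
import subprocess

from google.cloud import storage

BUCKET_NAME = "[BUCKET_NAME]"
TSV_BLOB = "rates.tsv"
CSV_BLOB = "rates.csv"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# 1. Download the TSV file from the bucket.
bucket.blob(TSV_BLOB).download_to_filename("/tmp/data.tsv")

# 2. Convert TSV -> CSV.
with open("/tmp/data.tsv", newline="") as tsv_in, open("/tmp/data.csv", "w", newline="") as csv_out:
    writer = csv.writer(csv_out)
    for row in csv.reader(tsv_in, delimiter="\t"):
        writer.writerow(row)

# 3. Upload the CSV back to the bucket.
bucket.blob(CSV_BLOB).upload_from_filename("/tmp/data.csv")

# 4. Import the CSV into Cloud SQL (here via the gcloud CLI; the SQL Admin API works too).
subprocess.run(
    [
        "gcloud", "sql", "import", "csv", "[INSTANCE_NAME]",
        f"gs://{BUCKET_NAME}/{CSV_BLOB}",
        "--database=[DATABASE]", "--table=[TABLE]", "--quiet",
    ],
    check=True,
)

Converting row by row with the csv module keeps memory usage flat even for the 4M-row file, since nothing is held in memory beyond one line at a time.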

Scheduled Load of Firestore Export to BigQuery

I am successfully exporting all needed collections from Firestore daily to a storage bucket using a scheduled Cloud Function. I can manually import the collection data into BigQuery using Create Table, choosing Google Cloud Storage as my data source and specifying the location and that the file format is Cloud Datastore Backup. I can't seem to figure out how to create a scheduled version of this job (I can rerun it manually from the job history). Any help on figuring out how to automate these "create table" jobs would be appreciated!
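One possible way to schedule it (not necessarily what the poster ended up doing) is to have Cloud Scheduler publish to a Pub/Sub topic that triggers a small Cloud Function running the same load job as the bq command above; a rough sketch with placeholder collection, bucket, and table names:

# Rough sketch of a Pub/Sub-triggered Cloud Function (invoked daily by Cloud Scheduler)
# that runs the same load job the "Create Table" UI performs. All names are placeholders.
from google.cloud import bigquery

def load_firestore_export(event, context):
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    uri = "gs://YOUR_EXPORT_BUCKET/all_namespaces_kind_YourCollection.export_metadata"
    client.load_table_from_uri(
        uri, "your-project.your_dataset.your_table", job_config=job_config
    ).result()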

Pentaho job that reads a file from S3 through VFS is not failing when the file is not available

I have a job in Pentaho which reads data from S3 through the virtual file system (VFS). The data is not extracted from the source on a regular schedule; it arrives on an ad hoc basis. Ideally I had to write a loop condition that goes from today's date back to the date that matches a file in S3.
When the file is not available in S3 I added a failure condition to handle it through the loop, but the job is not failing. Could anyone suggest a way to do this?
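As an alternative to fixing this inside Pentaho (a swapped-in approach, not a fix to the VFS step itself), the existence check could be done with a small boto3 script that walks back from today's date; the bucket and key pattern below are made up:

# Illustrative boto3 check (not Pentaho): walk back from today until a dated file exists in S3.
# Bucket name and key pattern are hypothetical placeholders.
from datetime import date, timedelta

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-source-bucket"

def find_latest_file(max_days_back=30):
    for days_back in range(max_days_back + 1):
        day = date.today() - timedelta(days=days_back)
        key = f"exports/data_{day:%Y%m%d}.csv"
        try:
            s3.head_object(Bucket=BUCKET, Key=key)  # raises ClientError if the object is missing
            return key
        except ClientError as err:
            if err.response["Error"]["Code"] != "404":
                raise  # a real error, not just "object not found"
    raise FileNotFoundError(f"No file found in the last {max_days_back} days")

The script exits non-zero when nothing is found, which a wrapping scheduler or the Pentaho job can treat as the failure condition.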

How to reduce the time for file copy to S3 using Talend

I have created a small job to copy a CSV file of 3 million records (350 MB) in zip format to Amazon S3 via Talend Data Integration, using the tS3Put component. The job took around 2 hours 20 minutes to complete, but when I copy the same file via the AWS CLI or Informatica it completes within an hour.
Does anyone have an idea how to reduce the copy time to S3 using the Talend Data Integration tool?
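The AWS CLI parallelizes large uploads with multipart transfers by default, which is usually why it finishes faster. Outside of Talend, the equivalent in Python with boto3 looks roughly like this (file, bucket, and key names are placeholders); within Talend it is worth checking whether your version's S3 components expose similar multipart or part-size settings:

# Illustrative boto3 upload using parallel multipart transfer (roughly what the
# AWS CLI does under the hood). File, bucket, and key names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MB
    multipart_chunksize=32 * 1024 * 1024,  # 32 MB parts
    max_concurrency=10,                    # upload parts in parallel
)

s3.upload_file(
    "rates.zip",              # local 350 MB zip
    "my-target-bucket",
    "incoming/rates.zip",
    Config=config,
)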