Create materialized views from Google Cloud PostgreSQL in BigQuery

We have a database in Google Cloud SQL that has partitioned tables created with the same naming format every month.
I would like to create and maintain materialized views over these partitioned tables: one covering the last few days (refreshed every 5-10 minutes) and others covering the last few weeks or months (refreshed hourly or daily).
Is BigQuery the right way to do this? I couldn't find a way to pull the data in some of the ingestion guides.
If I do it in the PostgreSQL instance itself, what is the best way to keep the views updated? (I have created materialized views in the past but never refreshed them.)
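On the PostgreSQL side, a materialized view is updated with REFRESH MATERIALIZED VIEW, optionally CONCURRENTLY, which keeps the view readable during the refresh but requires a unique index on the view. Below is a minimal sketch of a refresh job that could be driven from cron or Cloud Scheduler on the schedules you describe; the connection details and view names (mv_last_7_days, mv_last_90_days) are hypothetical placeholders.

```python
# refresh_views.py - minimal sketch of a scheduled materialized-view refresh.
# Assumes psycopg2 is installed; the view names and connection details below
# are hypothetical and should be replaced with your own.
import psycopg2

VIEWS = [
    "mv_last_7_days",    # refresh this one every 5-10 minutes
    "mv_last_90_days",   # refresh this one hourly or daily
]

conn = psycopg2.connect(host="10.0.0.5", dbname="mydb",
                        user="refresher", password="***")
conn.autocommit = True  # REFRESH ... CONCURRENTLY cannot run inside a transaction
cur = conn.cursor()

for view in VIEWS:
    # CONCURRENTLY keeps the view queryable during the refresh, but requires a
    # unique index on the materialized view; drop the keyword if you lack one.
    cur.execute(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {view};")

conn.close()
```

In practice you would split the fast-refresh and slow-refresh views into separate jobs so each can run on its own schedule.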

Related

Refresh materialized view on Postgresql 11 on RDS

We are currently on Postgres 11.13 on AWS RDS. I am trying to create a materialized view that takes about 6-7 minutes to run. What is the best way to keep this MV mostly up to date? I was thinking of pg_cron, but I believe that is only available on PG 12.5+. A trigger on the underlying tables could be an option, but these tables either do not get updated at all or receive a large number of inserts in a short period of time, and I don't want to trigger excessive refreshes.
Any suggestions for this scenario?
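One pattern that avoids both pg_cron and per-insert triggers is an external scheduler (cron, EventBridge, etc.) that refreshes only when the underlying data has actually changed since the last refresh. A minimal sketch, assuming the source table has a last_modified timestamp column and that a small bookkeeping table mv_refresh_log exists; all of these names are hypothetical.

```python
# debounced_refresh.py - sketch of a cron-driven, change-aware refresh.
# Assumes psycopg2; the table, column and view names (events, last_modified,
# mv_refresh_log, my_mat_view) are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(host="my-rds-host", dbname="mydb",
                        user="refresher", password="***")
conn.autocommit = True  # REFRESH ... CONCURRENTLY cannot run inside a transaction
cur = conn.cursor()

# When did the source data last change, and when did we last refresh?
cur.execute("SELECT max(last_modified) FROM events;")
(last_change,) = cur.fetchone()
cur.execute("SELECT max(refreshed_at) FROM mv_refresh_log;")
(last_refresh,) = cur.fetchone()

# Only pay the 6-7 minute refresh cost when something is actually new.
if last_change is not None and (last_refresh is None or last_change > last_refresh):
    cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY my_mat_view;")
    cur.execute("INSERT INTO mv_refresh_log (refreshed_at) VALUES (now());")

conn.close()
```

Running this every few minutes gives roughly the responsiveness of a trigger without refreshing on every burst of inserts.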

BigQuery table not being partitioned with the Firestore Export Collections to BigQuery extension

I am using the Export Collections to BigQuery extension with Firestore to export data to BigQuery. I can export the data, but the table is not getting partitioned, even though I enabled table partitioning when configuring the extension at creation time.
I had the same issue today when exporting data from Firestore to BigQuery and doing a historical load with the npx firebaseextensions/fs-bq-import-collection command.
These are my findings.
If you configure the extension to have partitioning based on the ingestion time, you have to wait for a first new record that will trigger that streaming extension and create a partitioned table.
If you just create an extension and immediately execute a historical load, the table created won't be partitioned. I tested this 3 times today.
A question for you -- did you run a historical load, or did you import only new records?
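To check which case you are in, you can inspect whether the table the extension created actually carries ingestion-time partitioning. A minimal sketch using the BigQuery Python client; the project, dataset and table names below are placeholders for whatever the extension created in your project.

```python
# check_partitioning.py - sketch: verify whether the export table is partitioned.
# Assumes google-cloud-bigquery is installed and you are authenticated; the
# project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.firestore_export.my_collection_raw_changelog")

if table.time_partitioning:
    print("Partitioned:", table.time_partitioning.type_,
          "on field:", table.time_partitioning.field or "_PARTITIONTIME")
else:
    print("Table is NOT partitioned - it was likely created by the historical "
          "import before any streamed record arrived.")
```

BigQuery cannot add partitioning to an existing table in place, so if the table was created unpartitioned you generally have to recreate it (or let the streaming extension create it first, as described above) and re-run the import.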

How to trigger a Google Cloud Composer DAG based on a latest record in the control table

I have a DAG that loads data into a number of raw tables. There is a control table that stores the list of tables and when each was last updated by the DAG; this is all managed by a different team. I am trying to create a DAG that runs a query on one of the raw tables and loads the result into a persistent table. I would like that DAG to run as soon as the control table shows a timestamp newer than the last one I have already processed.
I am new to Cloud Composer; can you please let me know how I can accomplish this?
Thanks
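One common way to do this in Cloud Composer/Airflow is to schedule the DAG frequently and gate the actual load behind a SQL sensor that only succeeds once the control table shows a newer timestamp. A rough sketch, assuming Airflow 2.x, an Airflow connection named control_db pointing at the database holding the control table, and placeholder table/column names (control_table, table_name, last_updated, raw_events).

```python
# control_table_gate.py - sketch of a Composer DAG gated on the control table.
# Assumes Airflow 2.x; the connection id and table/column names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.sql import SqlSensor


def load_persistent_table(**_):
    # Placeholder for the real load, e.g. a query that reads the raw table
    # and writes into the persistent table.
    print("Control table showed new data - running the load")


with DAG(
    dag_id="load_persistent_from_raw",
    start_date=datetime(2023, 1, 1),
    schedule_interval="*/15 * * * *",  # poll frequently; the sensor gates the work
    catchup=False,
) as dag:
    wait_for_new_data = SqlSensor(
        task_id="wait_for_new_data",
        conn_id="control_db",          # Airflow connection to the control DB
        sql="""
            SELECT 1
            FROM control_table
            WHERE table_name = 'raw_events'
              AND last_updated > '{{ data_interval_start }}'
        """,
        poke_interval=60,
        timeout=10 * 60,
        mode="reschedule",
    )

    load = PythonOperator(
        task_id="load_persistent_table",
        python_callable=load_persistent_table,
    )

    wait_for_new_data >> load
```

Instead of the data_interval_start macro you could also keep your own watermark (for example in an Airflow Variable) and compare against that, which matches the "latest timestamp I already processed" idea more literally.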

Daily data archival from Postgres to Hive/HDFS

I am working on an IoT data pipeline and I receive messages every second from multiple devices into a Postgres database. Postgres only keeps the last two days of data; after two days the data is flushed, so at any time it holds just the most recent two days. I now need to archive data daily from Postgres to HDFS. The parameters I have are:
deviceid, timestamp, year, month, day, temperature, humidity
I want to archive it daily into HDFS and query that data using Hive. For that I need to create an external partitioned table in Hive using deviceid, year and month as partitions. I have tried the following options, but they are not working:
I tried using Sqoop for the data copy, but it can't create dynamic folders based on deviceid, year and month, so the external Hive table can't pick up the partitions.
I used Sqoop import with the --hive-import attribute so that data can be copied directly into a Hive table, but in this case it overwrites the existing table, and I am also not sure whether this works for a partitioned table.
Please suggest some solutions for the archival.
Note: I am using azure services so option for Azure Data Factory is open.
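One alternative to Sqoop is a small daily export job that writes the previous day's rows as Parquet files laid out in the deviceid/year/month directory structure an external Hive table expects, and then registers the new partitions in Hive (for example with MSCK REPAIR TABLE). A rough sketch, assuming the Postgres table is called sensor_readings and that the output path is synced to the HDFS/ADLS location backing the Hive table; all names here are placeholders.

```python
# daily_archive.py - sketch: export yesterday's rows as partitioned Parquet.
# Assumes psycopg2, pandas and pyarrow; table, column and path names are placeholders.
from datetime import date, timedelta

import pandas as pd
import psycopg2
import pyarrow as pa
import pyarrow.parquet as pq

yesterday = date.today() - timedelta(days=1)

conn = psycopg2.connect(host="pg-host", dbname="iot", user="archiver", password="***")
df = pd.read_sql(
    """
    SELECT deviceid, timestamp, year, month, day, temperature, humidity
    FROM sensor_readings
    WHERE timestamp >= %(start)s AND timestamp < %(end)s
    """,
    conn,
    params={"start": yesterday, "end": yesterday + timedelta(days=1)},
)
conn.close()

# Write files into deviceid=.../year=.../month=... directories, matching the
# layout of an external Hive table PARTITIONED BY (deviceid, year, month).
pq.write_to_dataset(
    pa.Table.from_pandas(df),
    root_path="/mnt/archive/sensor_readings",  # copy or sync this to HDFS/ADLS
    partition_cols=["deviceid", "year", "month"],
)

# Afterwards run MSCK REPAIR TABLE (or ALTER TABLE ... ADD PARTITION) in Hive
# so the new directories become visible as partitions.
```

Since you are on Azure, the same logic can be wrapped in an Azure Data Factory pipeline writing to ADLS instead of a local mount.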

Using an existing Cloud SQL table loads slowly; a table created through AppMaker in Cloud SQL loads fast

I am using a Cloud SQL database with one table of ~700 records and ~100 fields, and it takes 8-10 seconds to load an individual record from the database into AppMaker. It takes this long even when I set the Query Page Size to "1".
To test whether this was an issue with my database, I created an External Model (i.e., a new table) in the database through AppMaker. The new table created through AppMaker loads in less than 2 seconds, which is a fairly typical load speed for AppMaker.
Why does my pre-existing table load slowly while the table created in the SQL database through AppMaker loads quickly?
The only idea that pops into my mind is that the databases are in different regions. App Maker's documentation recommends creating the database instance in the us-central region to achieve the best performance. Some time ago I tried to pair App Maker with a database in a European region and it was fairly slow.