The Flyway FAQ says I can't make structural changes to the DB outside of Flyway. Does that include creating new partitions for an existing table?
If so, is there any way to use Flyway to automatically create daily partitions as required? Bear in mind that the process will be running for more than one day, so it's not something that can just be triggered on start-up.
We're stuck with Postgres 9.6 at the moment, so the partitions have to be created manually.
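For reference, "manually" on 9.6 means inheritance-based child tables with CHECK constraints, so the daily job we need boils down to something like the sketch below (table and column names are just placeholders, and this would currently run outside Flyway, e.g. from cron):

-- Sketch only: create tomorrow's partition for a hypothetical "measurements" table
CREATE OR REPLACE FUNCTION create_daily_partition(day date)
RETURNS void AS $$
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS measurements_%s (
       CHECK (recorded_at >= %L AND recorded_at < %L)
     ) INHERITS (measurements)',
    to_char(day, 'YYYYMMDD'), day, day + 1);
END;
$$ LANGUAGE plpgsql;

-- run once a day, e.g. for tomorrow:
SELECT create_daily_partition(current_date + 1);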
I have set up a Fivetran connector to connect a PostgreSQL database on an EC2 server to Snowflake. The connection seems to work (no errors), but the data is not actually being updated.
Every day a script on the EC2 server pulls down the latest dump of our app's production database and restores it there, after which the Fivetran connector is expected to sync the database to Snowflake. However, nothing after the initial setup date has been synced to Snowflake. Could Fivetran be used in such a setup? If so, do you know what might be causing the sync to fail?
Could Fivetran be used in such a setup?
Yes, but it's not ideal.
If so, do you know what might be causing the sync to fail?
It's hard to answer this question without more context, but: Fivetran uses the database's change log to replicate it (the WAL in the case of PostgreSQL), so if you restore the DB every single day, Fivetran will lose track of the changes and will need to re-sync the whole database.
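As an illustration (assuming the connector reads from a logical replication slot, which is one way of consuming the WAL), the slot only remembers a WAL position, and a dump/restore does not carry slots or their positions over:

-- Illustration only: the WAL position a logical replication slot has reached.
-- pg_dump/pg_restore does not preserve replication slots, so after the nightly
-- restore there is no position to resume from and incremental sync breaks.
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;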
The point made by NickW is completely valid: why not replicate directly from the production DB? I assume the answer is that there is data you need to modify first. You can use column blocking and/or hashing to prevent sensitive data from being transferred, or to obfuscate it before it is loaded into Snowflake.
We are facing the well-known pg_dump efficiency problems in terms of speed. We currently have an Azure-hosted PostgreSQL database that holds the resources created/updated by SmileCDR. After three months it has grown considerably because of the stored FHIR objects. Now we want to stand up a brand-new environment, which means the persistent data in PostgreSQL has to be extracted and a new database has to be initialized with the old data set.
pg_dump takes a comparatively long time, almost a day. How can we speed up the backup-restore process?
What alternatives to pg_dump could we use to achieve this goal?
Important notes:
Flyway is used by SmileCDR for schema versioning in PostgreSQL.
Everything has to be copied from the old database to the new one.
PostgreSQL version is 11, with 2 vCores and 100 GB storage.
FHIR objects are kept in PostgreSQL.
Suggestions such as multiple jobs, disabling compression, and the directory format have already been tried, but they didn't make a significant difference.
Since you put yourself in the cage of database hosting, you have no alternative to pg_dump. They make it deliberately hard for you to get your data out.
The best you can do is a directory format dump with many processes; that will read data in parallel wherever possible:
pg_dump -F d -j 6 -f backupdir -h ... -p ... -U ... dbname
In this example, I specified 6 processes to run in parallel. This will speed up processing unless you have only one large table and all the others are quite small.
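On the restore side, a directory-format dump can likewise be restored with several parallel jobs, along the lines of (connection options elided as above):
pg_restore -j 6 -d dbname -h ... -p ... -U ... backupdir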
Alternatively, you could use smileutil with the synchronize-fhir-servers command, the bulk export API at the system level, or the subscription mechanism. Just a warning that these options may be too slow to migrate a 100 GB database.
As you mentioned, if you can replicate the Azure VM, that may be the fastest option.
I plan to run my Spark SQL jobs on AWS's EMR, and I plan to use AWS's Glue Metastore to persist tables' schema and file location metadata. The problem I'm facing is I'm not sure how to isolate our test vs prod environments. There are times when I might add a new column to a table, and I want to test that logic in the test environment before making the change to production. It seems that the Glue Metastore only supports one entry per database-table pair, which means that test and prod would point to the same Glue Metastore record, so whatever change I make to the test environment would also immediately impact prod. How have others tackled this issue?
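To make the problem concrete, here is a hypothetical Spark SQL example (database, table and column names invented): because Glue keeps a single definition per database-table pair, a change made from the test cluster shows up for the prod cluster straight away.

-- Hypothetical: both the test and prod EMR clusters resolve sales.orders
-- to the same Glue Data Catalog entry.
-- Run from the test cluster:
ALTER TABLE sales.orders ADD COLUMNS (discount_pct DOUBLE);
-- The prod cluster immediately sees the new column as well,
-- because there is only one shared definition of sales.orders:
DESCRIBE TABLE sales.orders;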
A couple of days ago we made a mistake. We have a Kubernetes cluster with a pipeline that times out after 25 minutes, meaning that if the deployment isn't done in 25 minutes, it fails. We deployed a Flyway migration that involves queries which run for more than an hour. Stupid, I know. We have now run the queries from the migration manually, and we want to manually mark the Flyway migration as done, otherwise redeployment won't work. Is there a way this could be done?
So we ended up manually inserting a migration row in the database. Flyway keeps a table called flyway_schema_history in your schema; if you manually insert a row there, it will skip the migration. The only tricky part is calculating the checksum. You can either run the migration locally, take the checksum from there and insert it into the live database, or re-calculate the checksum on your own.
You will find how they calculate the checksum in the AbstractLoadableResource class.
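As a rough sketch, the row looks something like this (columns follow the default flyway_schema_history layout; the version, description, script name and checksum are placeholders you must replace with your migration's real values, and installed_rank should be one higher than the current maximum):

INSERT INTO flyway_schema_history
  (installed_rank, version, description, type, script,
   checksum, installed_by, installed_on, execution_time, success)
VALUES
  (42, '5.1', 'add reporting indexes', 'SQL', 'V5_1__add_reporting_indexes.sql',
   123456789, 'manual', now(), 0, true);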
I am using Entity Framework (version 6.1.3) - Code First - for my application.
The application is hosted on the Azure platform, and uses Azure SQL Databases.
I have a database instance in two different regions, and I am using the Sync Preview to keep the data in sync.
Since the sync takes care of ensuring the data is kept synchronised, when I run a migration, I'd like the schema changes and seed to happen in only one database, and the schema changes only (with no seed) in the other.
Is this possible with the EF tooling, or do I need to move the seeding out to a manual script?
This is possible by spreading out your deployment.
If worker role 1 updates your database and runs the seed, then after the sync, when worker role 2 connects to your other database, it will see that the migration has already taken place.
One way to trigger this is to disable automatic migrations on all but one worker role. The problem is that you potentially have to deal with downtime/issues while part of your application landscape has been updated/migrated but your database is still syncing.
(Worker role can also be replaced by a WebJob, website, etc.)