How do I restore a database dump to a Citus cluster? - postgresql

While restoring a (pg_dump-produced) database dump, I get the following error:
Cannot execute COPY FROM on a distributed table on master node
How can I work around this?

COPY support was added in Citus 5.1, which was released in May 2016 and is available in the official PostgreSQL Linux package repositories (PGDG).
Are you trying to load data from pg_dump output? Creating distributed tables is slightly different from creating regular tables: it requires picking partition columns and a partitioning method, and the table must be distributed before data is loaded into it. Take a look at the Citus docs for more information on both.
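For illustration, here is a minimal sketch of that flow, assuming psycopg2 and a hypothetical github_events table distributed on repo_id; the connection string, table, column, and data file name are all placeholders, and it assumes the table's rows have been extracted from the dump into a plain data file:

```python
# Sketch: restore pg_dump data into Citus by distributing the table first,
# then loading rows with COPY. Table/column/file names are placeholders.
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=mydb user=postgres")
conn.autocommit = True
cur = conn.cursor()

# 1. Create the table schema (e.g. taken from `pg_dump --schema-only` output).
cur.execute("""
    CREATE TABLE IF NOT EXISTS github_events (
        event_id bigint,
        repo_id  bigint,
        payload  jsonb
    )
""")

# 2. Tell Citus how to shard the table *before* loading any data into it.
cur.execute("SELECT create_distributed_table('github_events', 'repo_id')")

# 3. Load the dumped rows through COPY (supported since Citus 5.1).
with open("github_events.copy") as f:
    cur.copy_expert("COPY github_events FROM STDIN", f)

cur.close()
conn.close()
```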

Related

Postgres pg_dumpall consistency across databases that are backed up

Is there a way to back up all of the databases on a hyperscaler-managed postgres server as of a certain point in time, in order to maintain data consistency between the databases, with either pg_dumpall, pg_dump, or something else?
Background:
With the use of micro-services, an application may have many databases associated with it on a single hyperscaler-managed postgres server. The hyperscalers do perform functional snapshot backups; however, when a hyperscaler-managed postgres server is accidentally deleted, the postgres backups are lost as well. These hyperscalers provide locks to prevent accidental deletion of a postgres server and say that their support teams can be contacted to restore a deleted server, yet we still had a postgres server get deleted. We were able to recover by contacting the hyperscaler's support team, but we would like a second way of backing up a hyperscaler-managed postgres server.
I realize that the micro-services should be able to auto-recover to a data-consistent point, but the reality is that many of the micro-services have not been designed or written to that requirement. I really do not want to get into micro-service design and want to keep this a DBA backup question.
pg_dumpall will not take a consistent backup across all databases. Each database backup will be consistent, but the snapshots for the backups of the different databases will be taken at different times.
If you need a consistent backup across several databases in a single cluster, use an online file system backup with pg_basebackup.
You can use pg_basebackup, which most PostgreSQL DBAs end up using regularly, either via scripts or manually. It creates a base backup of the whole cluster, which can help with recovery in multiple situations. Because it takes an online backup, it is very useful in production.
You can review the pg_basebackup documentation for more details.
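For example, a minimal sketch of driving pg_basebackup from a script; the hostname, role, and backup directory are placeholders, and it assumes a role with the REPLICATION privilege and a pg_hba.conf that allows replication connections:

```python
# Sketch: take an online, cluster-wide base backup with pg_basebackup.
# Hostname, role, and backup directory are placeholders.
import subprocess
from datetime import date

backup_dir = f"/backups/base_{date.today().isoformat()}"

subprocess.run(
    [
        "pg_basebackup",
        "-h", "db.example.com",   # server to back up
        "-U", "replicator",       # role with the REPLICATION privilege
        "-D", backup_dir,         # target directory for the backup
        "-F", "tar",              # tar output format
        "-z",                     # gzip-compress the tar files
        "-X", "stream",           # stream the WAL needed for a consistent restore
        "-P",                     # report progress
    ],
    check=True,
)
```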

Validating the postgres upgrade using logical replication

Currently I am trying to upgrade my postgres 9.1 to 10 using logical replication. As 9.1 does not support native logical replication, I tried Slony and made a replica successfully.
P.S. The replica above was created from a sample dump taken a year ago, which is only 800 MB.
Now I have a few questions.
How can I validate that the replica has all the data replicated successfully? A few people suggested putting the master into maintenance mode (some downtime) and doing a last-N-rows comparison between both databases on all the tables.
I tried with 800 MB. Will there be any problem when I try with 100+ GB?
Please share your personal experience here. I have been trying to document the things that could go wrong so I can anticipate the next course of action.
You may use the Data Validator that is shipped with the trial version of EDB Postgres Replication Server for validating the data between the old and new PostgreSQL databases.
You may read the details of Data Validator at Data Validator Document
To download the EDB Replication Server please follow this link: EDB Replication Server
Disclosure: I work for EnterpriseDB (EDB)
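If you would rather do a do-it-yourself check in addition to (or instead of) a dedicated tool, a common lightweight approach is to compare per-table row counts on both sides during the maintenance window. A minimal sketch, assuming psycopg2 and placeholder connection strings:

```python
# Sketch: compare per-table row counts between the old (master) and new
# (replica) databases. Connection strings and schema name are placeholders.
import psycopg2

OLD_DSN = "host=old-db dbname=app user=postgres"
NEW_DSN = "host=new-db dbname=app user=postgres"

def table_counts(dsn, schema="public"):
    """Return {table_name: row_count} for every table in the given schema."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT tablename FROM pg_tables WHERE schemaname = %s", (schema,)
        )
        tables = [row[0] for row in cur.fetchall()]
        counts = {}
        for t in tables:
            cur.execute(f'SELECT count(*) FROM {schema}."{t}"')
            counts[t] = cur.fetchone()[0]
        return counts

old_counts = table_counts(OLD_DSN)
new_counts = table_counts(NEW_DSN)

for table in sorted(set(old_counts) | set(new_counts)):
    if old_counts.get(table) != new_counts.get(table):
        print(f"MISMATCH {table}: old={old_counts.get(table)} new={new_counts.get(table)}")
```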

Gcloud SQL upgrade postgres 9.6 to 11

I'd like to be able to upgrade my existing cloudsql postgres 9.6 instance to 11 to use some new pg 11 features.
I've been trying to figure out a good migration plan, but it seems like the only option available is a SQL dump and restore. The database is 100 GB+, so this will take quite some time, and I'd like to avoid downtime as much as possible. Are there any other options available? I was considering enabling statement logging (log_statement=mod), creating a dump, importing it into a pg-11 instance, taking down the db, and then scraping the logs to replay the latest updates into the pg-11 instance by downloading the logs and writing a script to re-run the inserts. Seems doable, but it doesn't feel nice.
I am wondering if anyone has faced this before and has any other solutions?
Postgres 11 on Cloud SQL is still in Beta. It is not recommended to use a Beta product in a production environment.
However, should you choose to proceed, you must export the data by creating either a SQL dump or a .csv file (depending on your needs; see the best practices), create a Postgres 11 instance, and then import the data.
For the data that won't be in the dump, you can either:
a) Do what you have suggested: log the queries and then re-run the inserts.
b) Create a dump, import it into the new instance, make it live, and then take another dump of the old one; compare the two to remove duplicates and import the differences. This will be difficult if you have auto-incrementing primary keys.
c) Create the schema on the Postgres 11 instance and deploy it, then create the dump and import it at a later time. If your primary keys are auto-incrementing, alter the schema so they start at a value of your choice (sketched below).
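For option (c), a rough sketch of the schema-first flow using stock pg_dump/pg_restore and a sequence bump; the hostnames, database name, user, and the orders_id_seq sequence are placeholders, and Cloud SQL's own import/export tooling can of course be used instead:

```python
# Sketch: schema-first migration (option c). Hostnames, database name, user,
# and the orders_id_seq sequence are placeholders.
import subprocess
import psycopg2

OLD_HOST, NEW_HOST, DB, USER = "pg96.example.com", "pg11.example.com", "app", "postgres"

# 1. Copy the schema only, so the new instance can be deployed right away.
subprocess.run(["pg_dump", "-h", OLD_HOST, "-U", USER, "-d", DB,
                "--schema-only", "-f", "schema.sql"], check=True)
subprocess.run(["psql", "-h", NEW_HOST, "-U", USER, "-d", DB,
                "-f", "schema.sql"], check=True)

# 2. Bump auto-incrementing keys on the new instance past anything the old
#    instance will generate, so new rows never collide with the backfill.
with psycopg2.connect(host=NEW_HOST, dbname=DB, user=USER) as conn:
    with conn.cursor() as cur:
        cur.execute("ALTER SEQUENCE orders_id_seq RESTART WITH 1000000000")

# 3. Later, dump the data itself and load it into the new instance.
subprocess.run(["pg_dump", "-h", OLD_HOST, "-U", USER, "-d", DB,
                "--data-only", "-Fc", "-f", "data.dump"], check=True)
subprocess.run(["pg_restore", "-h", NEW_HOST, "-U", USER, "-d", DB,
                "--data-only", "data.dump"], check=True)
```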

Backing up redshift database

I want to back up the entire Redshift cluster so that I can use it in other databases like MySQL or Hadoop in the future.
I was looking it up, and creating a manual snapshot seems to be an option, but I guess that won't work across different database engines.
So what would be the detailed steps to back up the entire Redshift cluster?
Cluster backups can be done via the AWS console; however, these can only be restored to another Redshift cluster.
Because Redshift differs from Postgres in many ways, it will be impossible, or at least tricky, to use standard tools like pg_dump and pg_restore.
I think that your best option is to:
1. Extract the DDL from the Redshift tables that you wish to create elsewhere; most IDEs have a simple way to do this.
2. Modify the DDL to work with your target database (e.g. Postgres will be easy, MySQL harder).
3. Copy the contents of the Redshift database to S3, one table at a time, using the UNLOAD command (see the sketch below).
4. Import the data that you unloaded in step 3 into your target tables.
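As an illustration of step 3, a minimal sketch that issues UNLOAD for each table from a script; the cluster endpoint, credentials, S3 bucket, IAM role ARN, and table list are all placeholders:

```python
# Sketch: unload each Redshift table to S3 as delimited files (step 3 above).
# Cluster endpoint, credentials, S3 bucket, IAM role and tables are placeholders.
import psycopg2

conn = psycopg2.connect(
    "host=my-cluster.abc123.us-east-1.redshift.amazonaws.com "
    "port=5439 dbname=analytics user=admin password=secret"
)
conn.autocommit = True
cur = conn.cursor()

tables = ["events", "users", "orders"]   # the tables you want to export

for table in tables:
    cur.execute(f"""
        UNLOAD ('SELECT * FROM {table}')
        TO 's3://my-backup-bucket/redshift-export/{table}_'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
        DELIMITER '|'
        GZIP
        ALLOWOVERWRITE
    """)

cur.close()
conn.close()
```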

Backup specific tables in AWS RDS Postgres Instance

I have two databases on Amazon RDS, both Postgres: Database 1 and Database 2.
I need to restore an instance from a snapshot of Database 1 for my Staging environment. (Database 2 is my current Staging DB).
However, I want the data from a few of the tables in Database 2 to overwrite the tables in the newly restored snapshot. What is the best way to do this?
When restoring RDS from a Snapshot, a new database instance is created. If you only wish to copy a portion of the snapshot:
Restore the snapshot to a new (temporary) database
Connect to the new database and dump the desired tables using pg_dump
Connect to your staging server and restore the tables using pg_restore (most probably deleting any matching existing tables first)
Delete the temporary database
In its default plain-text format, pg_dump outputs SQL commands that recreate the tables and restore the data; to restore with pg_restore, use the custom format (-Fc) instead. Look at the contents of a plain-text dump to understand how the restore process actually works.
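A minimal sketch of steps 2 and 3 above, assuming pg_dump/pg_restore are on your PATH; the endpoints, database names, and table list are placeholders:

```python
# Sketch: copy a few tables from the temporary instance (restored from the
# snapshot) into the staging instance. Endpoints, database names, and the
# table list are placeholders.
import subprocess

TEMP_HOST = "temp-restore.abcdefgh.us-east-1.rds.amazonaws.com"
STAGING_HOST = "staging.abcdefgh.us-east-1.rds.amazonaws.com"
TABLES = ["public.accounts", "public.feature_flags"]

# Dump only the wanted tables from the temporary instance (custom format,
# so pg_restore can be used for the restore step).
dump_cmd = ["pg_dump", "-h", TEMP_HOST, "-U", "postgres", "-d", "appdb",
            "-Fc", "-f", "tables.dump"]
for t in TABLES:
    dump_cmd += ["-t", t]
subprocess.run(dump_cmd, check=True)

# Restore those tables into staging, dropping the existing copies first.
subprocess.run(["pg_restore", "-h", STAGING_HOST, "-U", "postgres",
                "-d", "stagingdb", "--clean", "--if-exists", "tables.dump"],
               check=True)
```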
I hope this still helps someone else.
My team and I faced a similar issue. We also had two Postgres databases, and we also just needed to back up some tables from db1 to db2.
What we did was use an AWS Lambda function written in Python that connected to both databases and checked whether db1.table1 has the same data as db2.table1; if not, the Lambda function writes the missing data from db1.table1 into db2.table1. We used Lambda because we wanted to automate the process, since the main db (let's say db1) is constantly being updated. In addition, it allowed us to back up only the desired tables (say, 3 out of 10) instead of the whole database.
Note: Maybe you want to do these writes using temporary tables to avoid issues with any constraints you have in your tables.
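A minimal sketch of that approach (not the poster's actual Lambda): it assumes psycopg2, that each table has a unique id column, and placeholder connection strings and table name, and it inserts rows present in db1 but missing from db2:

```python
# Sketch of the Lambda-style sync described above: insert rows that exist in
# db1.table1 but are missing from db2.table1. DSNs, the table name, and the
# key column are placeholders; adapt them to your schema.
import psycopg2
from psycopg2.extras import execute_values

DB1_DSN = "host=db1.example.com dbname=app user=sync password=secret"
DB2_DSN = "host=db2.example.com dbname=app user=sync password=secret"
TABLE, KEY = "table1", "id"

def handler(event=None, context=None):
    with psycopg2.connect(DB1_DSN) as src, psycopg2.connect(DB2_DSN) as dst:
        with src.cursor() as s, dst.cursor() as d:
            # Which keys does the backup copy (db2) already have?
            d.execute(f"SELECT {KEY} FROM {TABLE}")
            existing = {row[0] for row in d.fetchall()}

            # Pull all rows from the source (db1) and keep the missing ones.
            s.execute(f"SELECT * FROM {TABLE}")
            columns = [col.name for col in s.description]
            key_idx = columns.index(KEY)
            missing = [r for r in s.fetchall() if r[key_idx] not in existing]

            # Write the missing rows into db2; the connection commits on exit.
            if missing:
                execute_values(
                    d,
                    f"INSERT INTO {TABLE} ({', '.join(columns)}) VALUES %s",
                    missing,
                )
            return {"inserted": len(missing)}
```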