Moving records from one database instance to another - postgresql

I have a PostgreSQL database instance located in EU region. I plan on introducing another PostgreSQL database instance located in a new geographical region.
As part of this work, I need to migrate data for selected customers from the EU database instance to the database instance in the new geographic region, and I am seeking advice.
On the surface, this boils down to the following work:
given a specific accounts.id,
find and copy the record from accounts table from EU database instance to accounts table in another region's database instance,
identify and copy records across all tables that are related to the given account record, recursively (i.e. potentially also from tables related to those tables, and so on).
Effectively, having a specific DB record as starting point, I need to:
build a hierarchy, or rather a graph, of DB records across all available tables, all directly or indirectly related to the "starting point" record (all possible relations could perhaps be established from foreign key constraints; see the sketch after this list),
for each record found across all tables, generate a string containing an INSERT statement,
replay all INSERT statements, in a transaction, on another database instance.
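For illustration, the set of relations could be discovered from the system catalogs. A minimal sketch (the accounts table name is taken from above; everything else is generic):

    -- Find every table/column that references accounts via a foreign key;
    -- applying this query recursively to each referencing table yields the
    -- graph of related records.
    SELECT con.conrelid::regclass AS referencing_table,
           att.attname            AS referencing_column
    FROM pg_constraint con
    JOIN pg_attribute att
      ON att.attrelid = con.conrelid
     AND att.attnum = ANY (con.conkey)
    WHERE con.contype = 'f'
      AND con.confrelid = 'accounts'::regclass;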
It appears as if I might need to build a tool to do this kind of work. But before I do, I wonder:
is there a common approach for implementing this?
if not, what might be a good starting point for approaching this problem?

You do indeed need a whole process to do this. I think you should create a new schema to stage the selected data (functions could do the magic), and then replicate that data.
The replication tooling is not that hard to configure.
Here is the link:
https://www.postgresql.org/docs/current/runtime-config-replication.html

Related

How does pglogical-2 handle logical replication on same table while allowing it to be writeable on both databases?

There are certain tables I want to keep only in the internal database; the other tables I want to be replicated to the external database.
In reality there's only one set of values that SHOULD NOT be replicated across; the rest of the database can be replicated. Basically, the actual price columns in the prices table cannot be replicated across: they should stay within the internal database.
Because the vendors are external to the network, they have no access to the internal app.
My plan is to create a replicated version of the same app and allow vendors to submit quotations and pick items.
Let's say the replicated tables are at least quotations and quotation_line_items. These tables should be writeable (INSERTs, UPDATEs, and DELETEs) in both the external and the internal database, and the data should be replicated across in both directions.
The data in the other tables are going to be replicated in a single direction (from internal to external) except for the actual raw prices columns in the prices table.
The quotation_line_items table will have a price_id column. However, the raw price values in the prices table should not appear in the external database.
Ultimately, I want the data to be consistent for the replicated tables on both databases. I am okay with synchronous replication, so a bit of delay (say, a couple of seconds for the write operations) is fine.
I came across pglogical (https://github.com/2ndQuadrant/pglogical/tree/REL2_x_STABLE), which has the concept of a PUBLISHER and a SUBSCRIBER.
I cannot tell from the readme which database would act as publisher and which as subscriber, or how to configure it for my situation.
That won't work. With the setup you are dreaming of, you will necessarily end up with replication conflicts.
How do you plan to prevent data from being modified in a conflicting fashion in the two databases? If you say that won't happen, think again.
I believe that you would be much better off using a single database with two users: one that can access the “secret” table and one that cannot.
If you want to restrict access only to certain columns, use a view. Simple views are updateable in PostgreSQL.
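A minimal sketch of that idea, assuming a prices table whose raw_price column is the secret (the role and column names are hypothetical):

    -- Keep the base table private.
    REVOKE ALL ON prices FROM PUBLIC;

    -- Expose everything except the raw price. A simple single-table view
    -- like this is automatically updatable in PostgreSQL (9.3+).
    CREATE VIEW prices_visible AS
        SELECT id, description      -- raw_price deliberately omitted
        FROM prices;

    CREATE ROLE vendor LOGIN;
    GRANT SELECT ON prices_visible TO vendor;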
It is possible with BDR replication, which uses pglogical: at a basic level, conflicts are avoided by allocating ranges of key IDs to each node, so writes are possible in both locations without conflict. However, BDR is now a commercial, paid-for product.

Postgres inherit from schema

I am initiating a new project which will be available as a SaaS for multiple customers. So, I am thinking of creating a database and then creating an individual schema for every customer.
I have defined some rules and the first rule is all the customers must always have the same schema. No matter what. If one customer gets an update, all the other customers will get the update as well.
For this purpose, my question is: is it possible to inherit a schema from another schema in the same database? If not, do I have to manually create all the tables and indexes in the new schema and inherit them from the tables in the master schema?
I am using PostgreSQL 9.6, but I can upgrade if needed.
I am open to suggestions.
Thanks in advance.
There is no automated way to establish inheritance between all tables in two schemas; you'd have to do it one by one (a function can help; see the sketch below).
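A minimal sketch of such a function, assuming a master schema and a new tenant1 schema (names are hypothetical). Note that indexes are not inherited and would still have to be created per child table:

    CREATE SCHEMA IF NOT EXISTS tenant1;

    DO $$
    DECLARE
        t text;
    BEGIN
        FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'master'
        LOOP
            -- Create an empty child table inheriting all columns.
            EXECUTE format('CREATE TABLE tenant1.%I () INHERITS (master.%I)',
                           t, t);
        END LOOP;
    END
    $$;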
However, I invite you to stop and think about your data model for a bit. How many users do you expect? If there could be many, plan differently, because databases with thousands of schemas become unwieldy (e.g. catalog lookups will become slow).
You might be better off with one schema for all users. If you are concerned with separation of the data and security, row level security might be the solution for you.
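A minimal sketch of the row level security approach (PostgreSQL 9.5+); the invoices table and the app.customer_id setting are assumptions:

    -- (Add a DEFAULT first if the table already has rows.)
    ALTER TABLE invoices ADD COLUMN customer_id int NOT NULL;
    ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

    -- Each session identifies its customer first, e.g.
    --   SET app.customer_id = '42';
    CREATE POLICY customer_isolation ON invoices
        USING (customer_id = current_setting('app.customer_id')::int);

Keep in mind that table owners and superusers bypass RLS unless you also set FORCE ROW LEVEL SECURITY.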

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and executes SQL against each one, or go through them one by one in pgAdmin.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT, and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously this is considerably more work than it sounds, as your application will need to be changed).
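A minimal sketch of that per-table change; the orders table, its id column, and the constraint name are hypothetical:

    -- Tag every row with its tenant (existing rows get tenant 1 here).
    ALTER TABLE orders ADD COLUMN tenant_id int NOT NULL DEFAULT 1;

    -- Widen the primary key so ids from different tenants cannot collide.
    ALTER TABLE orders DROP CONSTRAINT orders_pkey;
    ALTER TABLE orders ADD PRIMARY KEY (tenant_id, id);

Rows from each former per-tenant database would then be copied in with their own tenant_id, e.g. via pg_dump/COPY.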
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, or have tenants share the same tables on the other, this is the most apparent dimension. If you provide your users raw SQL access, or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that make this less of a concern for most applications.
Extensibility: If your tenants share the same schema (approach #3) and have fields that vary between them, then you need to think about how to accommodate these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in each column based on the tenant_id. Alternatively, PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data (see the sketch after this list).
Scaling: Another criterion is how easily your database would scale out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
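To make the Extensibility point concrete, here is a minimal sketch of the JSONB option; the orders table and the field names are assumptions:

    ALTER TABLE orders ADD COLUMN custom_fields jsonb NOT NULL DEFAULT '{}';

    -- One tenant stores a PO number, another a priority flag:
    UPDATE orders SET custom_fields = '{"po_number": "PO-1234"}' WHERE id = 1;
    UPDATE orders SET custom_fields = '{"priority": "high"}'     WHERE id = 2;

    -- A GIN index supports fast containment queries on the JSONB data:
    CREATE INDEX ON orders USING gin (custom_fields);
    SELECT * FROM orders WHERE custom_fields @> '{"priority": "high"}';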
It's time for me to give back to the community :) After 4 years, our multi-tenant platform is in production, and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility: the individual database backups are not huge, so we can easily import them into our staging environment to investigate customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial; 2-5 days of investment can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. When a user logs in, we find the related data source with a lookup and connect the session to the right database. That's also when the Liquibase scripts get applied for updates.
That's all for now; I will come back with more later on.
Well, in our case there would certainly be problems with one database for all tenants:
The backup file would get huge and become almost impractical to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file, which is usually much smaller than it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, versus a much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. We once had a customer who didn't have archiving active, and their database size grew to over 3 GB. With offshore teams, slower internet, and storage/bandwidth prices, one can see how things could become complicated very quickly.
Hope this helps someone.
--Rex

libpq code to create, list and delete databases (C++/VC++, PostgreSQL)

I am new to the PostgreSQL database. What my Visual C++ application needs to do is create multiple tables and add/retrieve data from them.
Each session of my application should create a new and distinct database. I can use the current date and time for a unique database name.
There should also be an option to delete all the databases.
I have worked out how to connect to a database, create tables, and add data to tables. I am not sure how to make a new database for each run, or how to retrieve the number and names of the databases if the user wants to clear them all.
Please help.
See the libpq examples in the documentation. The example program shows you how to list databases, and in general how to execute commands against the database. The example code there is trivial to adapt to creating and dropping databases.
Creating a database is a simple CREATE DATABASE SQL statement, issued like any other libpq operation. You must connect to an existing maintenance database (usually template1) to issue the CREATE DATABASE, then disconnect and make a new connection to the database you just created.
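A minimal sketch of the SQL your application would send over libpq; the date-stamped database name is hypothetical:

    -- 1. While connected to template1:
    CREATE DATABASE session_20240101_120000;

    -- 2. To list databases (e.g. when the user wants to clear them all):
    SELECT datname FROM pg_database WHERE NOT datistemplate;

    -- 3. Back on template1 (you cannot drop the database you are
    --    currently connected to):
    DROP DATABASE session_20240101_120000;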
Rather than creating new databases, consider creating new schemas instead. It is much less hassle, since all you need to do is change the search_path or prefix your table references; you don't have to disconnect and reconnect to change schemas. See the documentation on schemas.
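The schema-based alternative needs only one connection; a minimal sketch with a hypothetical session name:

    CREATE SCHEMA session_20240101_120000;
    SET search_path TO session_20240101_120000, public;

    -- Unqualified table names now resolve to the new schema.
    CREATE TABLE results (id serial PRIMARY KEY, value text);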
I question the wisdom of your design, though. It is rarely a good idea for applications to be creating and dropping databases (or tables, except temporary tables) as a normal part of their operation. Maybe if you elaborated on why you want to do this, we can come up with solutions that may be easier and/or perform better than your current approach.

Database setup for multiyear event manager

I am currently in the process of setting up a database structure to manage events.
Events have properties which are stored in separate tables like 'location', 'timeslots', 'files' etc.
This in itself is not so difficult to set up. However, the tool needs to be able to host multiple events at the same time. So, for example, a user can manage the ABC event, which occurs simultaneously with the DEF event. Obviously the database needs to be able to differentiate between these events.
My first idea would be to add a table with unique identifiers describing the event (name:ABC) and then add a field to all my tables with this unique identifier.
This would, however, mean that the tool could become a bit slow, because it has to query tables containing data completely irrelevant to that particular event.
Are there any other solutions or should I just not worry about the bloat?
Answering a pretty old question, but it comes up 6th in a Google query for "postgre database events", so it could be helpful to others: no, don't worry about it. Just create indexes on the foreign key in the referencing tables to speed up the lookup.
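A minimal sketch, using the tables named in the question and assuming the event-identifier field is called event_id:

    CREATE INDEX ON location  (event_id);
    CREATE INDEX ON timeslots (event_id);
    CREATE INDEX ON files     (event_id);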