Joining Tables between Multiple Foreign Servers with Foreign Data Wrapper Causes Performance Issue - postgresql

One of my legacy PHP applications is using a PostgreSQL database with Foreign Data Wrapper. This database has a local table and two foreign servers set up (one pointing to database A, another pointing to database B).
The application uses an ORM to construct SQL queries. One of the complex queries joins 6 tables across the two foreign servers and the local table. The query just hangs forever, because those 6 tables hold millions of records each on average.
There are many more queries like this in the legacy app. I have set use_remote_estimate to 'true' on the foreign servers and increased fetch_size, but still see no drastic improvement.
I'm wondering if there is any configuration that can be done on the foreign servers to optimise query speed before I start rewriting the whole application to not use PHP and ORM.

Selectivity estimation problems in FDW are very common, and can lead to plans with atrocious performance. Since you are looking for a magic bullet, have you tried running ANALYZE on the foreign tables in the local server, so it can use local statistics to come up with plans? You might want to set up a clone to test this in. ANALYZE can also make things worse, and there is no easy way to undo it once done.
Another step might be setting cursor_tuple_fraction to 1 (or at least much higher than the default) on the servers on the foreign sides. This could help if the overall query plan is sound on the local side, but the execution on the foreign sides is bad.
Barring those, you need to look at EXPLAIN (VERBOSE) and EXPLAIN (ANALYZE) of an archetypical bad query to figure out what is going on.
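To make the steps above concrete, here is a minimal sketch that only builds the SQL text for them; it does not connect to anything. The table names, the role name and the sample query are made up for illustration, and applying cursor_tuple_fraction per role with ALTER ROLE is just one assumed way of setting it on the remote side (it could equally go in the remote server's configuration).

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def local_side_sql(foreign_tables, query):
    """Statements to run on the local server that owns the foreign tables.

    foreign_tables: iterable of (schema, table) pairs (hypothetical names).
    query: one archetypical slow query, as SQL text.
    """
    statements = [
        f"ANALYZE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in foreign_tables
    ]
    # EXPLAIN (VERBOSE) shows the remote SQL that is shipped to each server.
    statements.append(f"EXPLAIN (VERBOSE) {query}")
    # EXPLAIN (ANALYZE) really executes the query, so it can run as long as the query does.
    statements.append(f"EXPLAIN (ANALYZE) {query}")
    return statements


def remote_side_sql(role, fraction=1.0):
    """Statement to run on each remote server for the role the FDW logs in as (assumed)."""
    return f"ALTER ROLE {quote_ident(role)} SET cursor_tuple_fraction = {fraction};"


if __name__ == "__main__":
    sample_query = "SELECT count(*) FROM orders_a JOIN customers_b USING (customer_id);"
    for sql in local_side_sql([("public", "orders_a"), ("public", "customers_b")], sample_query):
        print(sql)
    print(remote_side_sql("fdw_user"))
```

Since ANALYZE on foreign tables cannot easily be undone, the generated statements are best tried on a clone first.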
"Before I start rewriting the whole application to not use PHP and ORM."
Why would that help? Do you already know how to rewrite the queries to make them faster, and you just can't get the ORM to cooperate?

Related

Postgres foreign data tables - keys and indexes - ANALYZE

We're planning to use PostgreSQL in our production system with foreign data wrapper (FDW).
Replication from the master server is not an option due to technical issues.
However, we have one big issue, which is performance.
We figured out that we need to run the ANALYZE command periodically for Postgres to collect statistics.
But how often is "periodically"? Is there any way to check whether our statistics are outdated or missing entirely?
What would a query on "pg_statistic" look like in this scenario?
And most importantly, is this way a good one for production; we plan to ANALYZE all important tables once a day, should that be enough?

Asp Net Boilerplate - Setup Schema-Per-Tenant Multitenancy (EntityFrameworkCore & PostgreSQL)

We are looking into using Asp Net Boilerplate. Looks very promising. We love the framework, but we would like to be able to use a per-schema Multitenancy configuration. Instead of sharing the data in the same db & tables, each tenant would "have" a schema, in which the whole database structure would be replicated.
One of our data tables will be quite big (sometimes +1 million entries / tenant), and we were advised that for performance reasons, it's better to keep the number of entries as low as possible. Also, this particular table will be queried & inserted a lot. It would be unrealistic that this table would hold data for 40+ tenants. For that reason, and others, we would prefer to have a distinct schema per tenant.
Our DB is a single PostgreSQL server (might scale up to more in the future). We use EntityFramework & Npgsql. We already noticed that it is possible to set up a different ConnectionString for specific tenants that would have bigger data requirements.
http://www.summa.com/blog/2013/09/17/approaches-to-multi-tenancy See separate schema per tenant
Any idea on how to achieve schema-per-tenant multitenancy? There are a lot of moving parts in this, and I'm not sure where to start.

(How) Is it possible to convert tables into foreign tables in Postgres?

We have a large table in our Postgres production database which we want to start "sharding" using foreign tables and inheritance.
The desired architecture will be to have 1 (empty) table that defines the schema and several foreign tables inheriting from the empty "parent" table. (possible with Postgres 9.5)
I found this well written article https://www.depesz.com/2015/04/02/waiting-for-9-5-allow-foreign-tables-to-participate-in-inheritance/ that explains everything on how to do it from scratch.
My question is how to reduce the needed migration of data to a minimum.
We have this 100+ GB table now, which should become our first "shard". In the future we will regularly add new "shards". At some point, the older shards will be moved to another tablespace (on cheaper hardware, since they become less important).
My question now:
Is there a way to "ALTER" an existing table to be a foreign table instead?
There is no way to use ALTER TABLE to do this.
You really have to basically do it manually. This is no different (really) than doing table partitioning. You create your partitions, you load the data. You direct reads and writes to the partitions.
Now in your case, in terms of doing sharding there are a number of tools I would look at to make this less painful. First, if you make sure your tables are split the way you like them first, you can use a logical replication solution like Bucardo to replicate the writes while you are moving everything over.
There are some other approaches (parallelized readers and writers) that may save you some time at the expense of db load, but those are niche tools.
There is no native solution for shard management in standard PostgreSQL (and I don't know enough about Postgres-XL in this regard to know how well it can manage changing shard criteria). However, pretty much anything is possible with a little work and knowledge.

Multiple database in EF6

We are involved in quite a new development in which we are remaking our current web shop platform.
In the current platform we do not use EF6 or any other ORM; we access the database through stored procedures. In the new build we do use EF6.
We have a question regarding the database design of the new platform. In the current platform we use several different databases, split by the kind of content they hold.
For example, we have dedicated databases for product catalogs and another dedicated database for handling orders.
Currently all data access is done through stored procedures, so we have no problem with the links between different databases.
The problem appeared when we started to use EF6. Each database is associated with its own context, and data in one context cannot be related to data in another unless we implement those relationships directly in the source code across several contexts. It looks like this means we will lose the power of EF6.
The questions we have are:
Is it a bad design maintaining different databases for the same application using EF6?
If this is a poor design and we choose a single database, will performance still be good with hundreds of tables (almost 1000) and several terabytes of information?
On the other hand, if we opt for the design with several databases (which would be much better in our case), what is the best way to handle them with EF6?
Thank you very much for your help!
First of all, EF is not written to be cross database. You can't write cross database (cross context) queries, lazy loading does not work, and so on.
This is a big limitation in your case.
EF can work with several schemas (I don't actually use that and I don't like it, but that is just my opinion).
You can use your stored procedures with EF, but as I understand it you are thinking of no longer using them.
In my experience, I have written several applications with more than one database, but the use of the different databases was very limited. In those cases I used cross database views (i.e. one database per company, plus some common tables, with views in the company databases that select data from the common tables). In your case, if the tables are sharded everywhere, I don't think this is a way you can choose.
So, in my opinion you could change the approach.
If you have backup problems, you could shard the huge tables (I think fact tables and tables with pictures) and create cross database views. BTW, cross database referential integrity is not supported in SQL Server, so you need to write triggers to check it.
If you need to split different application functions (i.e. WMS, CRM and so on) you can use namespaces without bothering about how tables are stored in the DB.

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and executes the SQL against them, or go through them one by one in pgAdmin.
Is there a smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
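As a rough sketch of "doing it programmatically": the snippet below builds one psql invocation per tenant database and reports which ones failed. The database names and patch.sql are placeholders, psql is assumed to be on the PATH with connection settings supplied by the environment, and the list of tenant databases is assumed to be known already (it could, for example, be read from pg_database).

```python
import subprocess


def build_patch_commands(databases, patch_file):
    """Return one psql argument list per database (hypothetical names)."""
    return [
        [
            "psql",
            "--dbname", db,
            "--set", "ON_ERROR_STOP=1",   # stop at the first failing statement
            "--single-transaction",        # apply the patch all-or-nothing per database
            "--file", patch_file,
        ]
        for db in databases
    ]


def apply_patch(databases, patch_file, run=subprocess.run):
    """Run the patch against every database and return the names that failed.

    `run` is injectable so the loop can be exercised without a real server.
    """
    failed = []
    for cmd in build_patch_commands(databases, patch_file):
        result = run(cmd)
        if result.returncode != 0:
            failed.append(cmd[2])  # cmd[2] is the database name
    return failed


if __name__ == "__main__":
    for cmd in build_patch_commands(["tenant_a", "tenant_b"], "patch.sql"):
        print(" ".join(cmd))
```

Running each patch in a single transaction means a failure leaves that tenant's database unchanged, so the returned list can simply be retried after the patch is fixed.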
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on the one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that make this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in each column based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criterion is how easily your database would scale out. If you create a database or schema per tenant (#1 or #2 above), your application can make use of existing Ruby Gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
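The generic-column idea mentioned under Extensibility can be illustrated with a small application-side sketch. Everything here is hypothetical: the column names c1..c3, the tenant ids and the field names are invented, and the JSON string only stands in for what would be stored in a JSONB column in the alternative design.

```python
import json

# Generic spare columns shared by every tenant (hypothetical names).
GENERIC_COLUMNS = ("c1", "c2", "c3")

# Per-tenant meaning of each generic column (hypothetical mapping).
TENANT_FIELD_MAP = {
    1: {"c1": "loyalty_tier", "c2": "referral_code"},
    2: {"c1": "vat_number"},
}


def decode_row(tenant_id, row):
    """Rename generic columns to the tenant's field names; drop unused ones."""
    mapping = TENANT_FIELD_MAP.get(tenant_id, {})
    decoded = {}
    for column, value in row.items():
        if column not in GENERIC_COLUMNS:
            decoded[column] = value            # shared column, passed through
        elif column in mapping:
            decoded[mapping[column]] = value   # tenant-specific meaning
    return decoded


def extras_as_json(tenant_id, row):
    """Tenant-specific fields serialised as JSON, as a JSONB column might hold them."""
    mapping = TENANT_FIELD_MAP.get(tenant_id, {})
    extras = {mapping[c]: row[c] for c in mapping if c in row}
    return json.dumps(extras, sort_keys=True)


if __name__ == "__main__":
    row = {"id": 7, "tenant_id": 1, "c1": "gold", "c2": "ABC", "c3": None}
    print(decode_row(1, row))
    print(extras_as_json(1, row))
```

With the generic columns the mapping has to live in the application; with a semi-structured column the field names travel with the data, which is the trade-off the paragraph above describes.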
It's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per each tenant. This has given us extreme flexibility as the size of the databases in the backups are not huge and hence we can easily import them into our staging environment for customers issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial. Investing 2-5 days can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. When a user logs in, we find the related datasource with a lookup and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
That is it for now; I will come back with more later on.
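The routing idea can be sketched in a few lines. This is a plain-Python analogy, not the Spring/Hibernate code the poster used: the class names, the tenant catalog and the migrate callback (standing in for a Liquibase run) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Connection target for one tenant (hypothetical fields)."""
    host: str
    database: str


class TenantRouter:
    """Looks up a tenant's data source at login and migrates it on first use."""

    def __init__(self, catalog, migrate):
        self._catalog = dict(catalog)   # tenant name -> DataSource
        self._migrate = migrate         # callback standing in for the schema update step
        self._migrated = set()

    def datasource_for(self, tenant):
        try:
            source = self._catalog[tenant]
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant!r}") from None
        if tenant not in self._migrated:
            self._migrate(source)       # apply pending schema changes once per tenant
            self._migrated.add(tenant)
        return source


if __name__ == "__main__":
    router = TenantRouter(
        {"acme": DataSource("db1.internal", "acme")},
        migrate=lambda src: print(f"migrating {src.database} on {src.host}"),
    )
    print(router.datasource_for("acme"))
```

Keeping the lookup and the migration step behind one call means the rest of the application never needs to know which database a session ended up on.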
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, we need to restore a customer's data in our dev environment. We just use that customer's backup file, which is usually much smaller than it would be if we used one database for all customers.
Again, Liquibase has been key in allowing to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, compared with much lighter databases running much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex