Postgres CLUSTER exceeds temp_file_limit

Recently we have been trying to migrate our database from SQL Server to PostgreSQL, but we didn't know that, by default, tables in Postgres are not clustered. Now that our data has grown so much, we want to CLUSTER our table like so:
CLUSTER table USING idx_table;
But it seems my data is too large (maybe), because the command produces:
SQL Error [53400]: ERROR: temporary file size exceeds temp_file_limit
(8663254kB)
Since this isn't caused by a query that I could tune to perform better, is there any solution for this?
If, for example, I need to increase my temp_file_limit, is it possible to increase it only temporarily, since I'm only running this CLUSTER once?
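To clarify what I have in mind, something like the following, run in the same session, is what I'm hoping is possible (the 20GB value is just an illustration, and I believe raising the limit needs superuser rights or the equivalent SET privilege):
SET temp_file_limit = '20GB';    -- example value only
CLUSTER table USING idx_table;   -- the one-off CLUSTER from above
RESET temp_file_limit;           -- go back to the configured default afterwards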

There are some important differences between SQL Server and PostgreSQL.
Sybase SQL Server was derived from INGRES at the beginning of the eighties, when INGRES made heavy use of the concept of CLUSTERED indexes, which means that the table itself is organized as an index. The SQL engine was designed specifically to optimize the use of CLUSTERED indexes, and that is still the way SQL Server works today...
By the time Postgres was designed, the use of CLUSTERED indexes had disappeared.
When Postgres switched to the SQL language and was then renamed PostgreSQL, nothing changed with regard to CLUSTERED indexes.
So CLUSTERed tables in PostgreSQL are rarely what makes execution plans optimal. You have to verify individually, for each table and for the queries involving it, whether there is a benefit or not...
Another thing is that CLUSTERing a table in PostgreSQL is not the equivalent of MS SQL Server's CLUSTERED indexes...
More information about this can be found in my paper:
PostgreSQL vs. SQL Server (MSSQL) – part 3 – Very Extremely Detailed Comparison
and especially in § "6 – The lack of Clustered Index (AKA IOT)".

Related

Have an ordinary table on a PostgreSQL TimescaleDB (timeseries) database

For a project I need two types of tables:
a hypertable (a special type of table provided by the TimescaleDB extension for PostgreSQL) for some time-series records
my ordinary tables, which are not time series
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it? Are all tables in a PostgreSQL TimescaleDB database hypertables (time series)? If not, is there some overhead if I store my ordinary tables in PostgreSQL TimescaleDB?
If I can, is there any benefit to storing my ordinary tables in a separate, ordinary PostgreSQL database instead?
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it?
Absolutely... TimescaleDB is delivered as an extension to PostgreSQL and one of the biggest benefits is that you can use regular PostgreSQL tables alongside the specialist time-series tables. That includes using regular tables in SQL queries with hypertables. Standard SQL works, plus there are some additional functions that Timescale created using PostgreSQL's extensibility features.
Are all tables in a PostgreSQL TimescaleDB database hypertables (time series)?
No, you have to explicitly create a table as a hypertable for it to implement TimescaleDB features. It would be worth checking out the how-to guides in the Timescale docs for full (and up to date) details.
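As a minimal sketch of that, with invented table and column names (the create_hypertable call is the explicit step that turns a regular table into a hypertable):
CREATE EXTENSION IF NOT EXISTS timescaledb;    -- TimescaleDB is just an extension in the database

-- ordinary PostgreSQL table, untouched by TimescaleDB
CREATE TABLE devices (
    device_id serial PRIMARY KEY,
    name      text NOT NULL
);

-- time-series table: created as a normal table, then explicitly promoted to a hypertable
CREATE TABLE readings (
    time      timestamptz NOT NULL,
    device_id integer REFERENCES devices (device_id),
    value     double precision
);
SELECT create_hypertable('readings', 'time');

-- regular tables and hypertables can be mixed freely in queries
SELECT d.name, avg(r.value)
FROM   readings r
JOIN   devices  d USING (device_id)
GROUP  BY d.name;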
If not, is there some overhead if I store my ordinary tables in PostgreSQL TimescaleDB?
I don't think there's a storage overhead. You might see some performance gains, e.g. for data ingest and query performance. This article may help clarify that: https://docs.timescale.com/timescaledb/latest/overview/how-does-it-compare/timescaledb-vs-postgres/
Overall you can think of TimescaleDB as providing additional functionality on top of 'vanilla' PostgreSQL, so unless there's a reason in your application design to keep non-time-series data in a separate database, you aren't obliged to do that.
One other point, shared by a very experienced member of our Slack community [thank you Chris]:
Whether to keep time-series data and "normal" (normalized) data in one database or in separate ones came down, for us, to the question "can we asynchronously replicate the time-series information?"
In our case we use two different pg systems, one replicating asynchronously (for TimescaleDB) and one with synchronous replication (for all other data).
Transparency: I work for Timescale

Joining Tables between Multiple Foreign Servers with Foreign Data Wrapper Causes Performance Issue

One of my legacy PHP applications is using a PostgreSQL database with Foreign Data Wrapper. This database has a local table and two foreign servers set up (one pointing to database A, another pointing to database B).
The application uses an ORM to construct SQL queries. One of the complex queries joins 6 tables across the two foreign servers as well as the local table, and it just hangs forever because those 6 tables have millions of records each on average.
There are many more queries like this in the legacy app. I have configured the foreign servers with use_remote_estimate 'true' and increased the fetch_size, but I still see no drastic improvement.
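For reference, the server-level changes I made look roughly like this (server names and the fetch_size value are placeholders, not my real settings):
ALTER SERVER server_a                          -- foreign server pointing to database A
    OPTIONS (ADD use_remote_estimate 'true',   -- use SET instead of ADD if the option already exists
             ADD fetch_size '10000');          -- example value only
ALTER SERVER server_b                          -- foreign server pointing to database B
    OPTIONS (ADD use_remote_estimate 'true',
             ADD fetch_size '10000');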
I'm wondering if there is some configuration that can be done on the foreign servers to optimise the query speed, before I start rewriting the whole application to not use PHP and an ORM.
Selectivity estimation problems with foreign data wrappers are very common and can lead to plans with atrocious performance. Since you are looking for a magic bullet: have you tried running ANALYZE on the foreign tables on the local server, so that it can use local statistics to come up with plans? You might want to set up a clone to test this on, because ANALYZE can also make things worse, and there is no easy way to undo it once done.
Another step might be setting cursor_tuple_fraction to 1 (or at least much higher than the default) on the servers on the foreign side. This could help if the overall query plan is sound on the local side, but the execution on the foreign side is bad.
Barring those, you need to look at the EXPLAIN (VERBOSE) and EXPLAIN (ANALYZE) output of an archetypical bad query to figure out what is going on.
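A rough sketch of those three steps, with made-up table and server names (and, as noted above, best tried on a clone first):
-- on the local server: build local statistics for the foreign tables
ANALYZE foreign_table_a;
ANALYZE foreign_table_b;

-- on each remote server: prefer plans that return the whole result set
ALTER SYSTEM SET cursor_tuple_fraction = 1.0;
SELECT pg_reload_conf();

-- back on the local server: inspect what is actually shipped to the remote side
EXPLAIN (VERBOSE) SELECT * FROM local_t l JOIN foreign_table_a a ON a.id = l.id;  -- shows the Remote SQL
EXPLAIN (ANALYZE) SELECT * FROM local_t l JOIN foreign_table_a a ON a.id = l.id;  -- shows where time is spent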
Before I start rewriting the whole application to not use PHP and an ORM.
Why would that help? Do you already know how to rewrite the queries to make them faster, and you just can't get the ORM to cooperate?

Creating an in-memory table in PostgreSQL?

My understanding of an in-memory table is a table that will be created in memory and would resort to disk as little as possible, if at all. I am assuming that I have enough RAM to fit the table there, or at least most of it. I do not want to use an explicit function to load tables (like pg_prewarm) in memory, I just want the table to be there by default as soon as I issue a CREATE TABLE or CREATE TABLE AS select statement, unless memory is full or unless I indicate otherwise. I do not particularly care about logging to disk.
7 years ago, a similar question was asked here: PostgreSQL equivalent of MySQL memory tables?. It received 2 answers, one of which arrived a bit late (4 years later).
One answer says to create a RAM disk and add a tablespace for it, or to use an UNLOGGED table, or to wait for global temporary tables. However, I do not have special hardware, I only have regular RAM, so I am not sure how to go about that; my reading of the suggestion is sketched below. I can use the UNLOGGED feature, but as I understand it there is still quite a bit of disk interaction involved (which is what I am trying to reduce), and I am not sure whether such tables are loaded into memory by default. Furthermore, I do not see how global temporary tables are related; my understanding is that they are just tables in spaces that can be shared.
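If I understand that first suggestion correctly, it boils down to something like the following (the mount point and table are made up), which is the kind of setup I was hoping to avoid:
-- the RAM disk here would be e.g. a tmpfs mount created at the OS level
CREATE TABLESPACE ram_ts LOCATION '/mnt/ramdisk';   -- hypothetical mount point
CREATE UNLOGGED TABLE scratch (
    id      bigint,
    payload text
) TABLESPACE ram_ts;                                 -- contents are gone if the mount disappears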
Another answer recommends an in-memory column-store engine, and then using a function to load everything into memory. The issue I have with this approach is that the engine being referred to looks old and unmaintained, and I cannot find any other. Also, I was hoping I wouldn't have to explicitly resort to a 'load into memory' function, but that everything would happen by default.
I was just wondering how to get in-memory tables now in Postgres 12, 7 years later.
Postgres does not have in-memory tables, and I am not aware of any serious work on this topic at the moment. If you need this capability, you can use one of the specialized in-memory databases like Redis, Memcached or MonetDB. There are FDW drivers for these databases, so you can create in-memory tables in a specialized database and work with those tables from Postgres via foreign tables.
MySQL in-memory tables were necessary when there was only the MyISAM engine, because that engine had very primitive IO capabilities and MySQL did not have its own buffers. Now MySQL has the InnoDB engine (with a modern form of joins like other databases), and a lot of the arguments for using MySQL in-memory tables are obsolete. In contrast to old MySQL, Postgres has its own buffers and does not bypass the file system cache, so all of your RAM is available for your data and you have to do nothing. Ten years ago we had to use the MySQL in-memory engine to get good enough performance, but after migrating to Postgres we have had better performance without in-memory tables.
If you have a lot of memory, Postgres can use it by default via the file system cache.
As this question is specific to Postgres: there are no in-memory tables, but there are materialized views, which can also be refreshed. See if that fits your requirements.
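A minimal sketch of that idea, with invented names:
-- precompute the result of a query and store it for fast repeated reads
CREATE MATERIALIZED VIEW recent_orders AS
    SELECT order_id, customer_id, total
    FROM   orders
    WHERE  created_at > now() - interval '7 days';

-- re-run the defining query whenever the contents should be brought up to date
REFRESH MATERIALIZED VIEW recent_orders;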

DBLINK vs Postgres_FDW, which one may provide better performance?

I have a use case to distribute data across many databases on many servers, all in postgres tables.
From any given server/db, I may need to query another server/db.
The queries are quite basic, standard selects with where clauses on standard fields.
I have currently implemented postgres_fdw (I'm using Postgres 9.5), but I think the queries are not using indexes on the remote db.
For this use case (a random node may query N other nodes), which is likely my best performance choice based on how each underlying engine actually executes?
The Postgres foreign data wrapper (postgres_fdw) is newer to PostgreSQL so it tends to be the recommended method. While the functionality in the dblink extension is similar to that in the foreign data wrapper, the Postgres foreign data wrapper is more SQL standard compliant and can provide improved performance over dblink connections.
Read this article for more detailed info: Cross Database querying
My solution was simple: I upgraded to Postgres 10, and it appears to push where clauses down to the remote server.
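One way to check whether the pushdown actually happens (the table name here is hypothetical) is to look at the Remote SQL line that EXPLAIN (VERBOSE) prints for a Foreign Scan:
EXPLAIN (VERBOSE)
SELECT * FROM remote_orders WHERE customer_id = 42;   -- remote_orders is a postgres_fdw foreign table
-- If the WHERE condition appears in the "Remote SQL:" line of the Foreign Scan node,
-- the filter is executed on the remote server, where its indexes can be used.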

Postgresql for OLAP

Does anyone have experience of using PostgreSQL for an OLAP setup, using cubes against the database, etc.? Having come across a number of idiosyncrasies when using MySQL for OLAP, are there reasons in favour of using PostgreSQL instead (assuming that I want to go the open-source route)?
There are a number of data warehousing software vendors that are based on PostgreSQL (and contribute OLAP-related changes back to core fairly regularly). Check out https://greenplum.org/. You'll find that PG works a lot better than MySQL for nearly any workload, OLAP especially. Greenplum and other similar solutions should work a bit better than plain PG, depending on your data sets and use cases.
PostgreSQL is much better suited for data warehousing than MySQL. We had initially thought of going with MySQL, but it performs poorly on aggregations once the data grows to a few million rows. PostgreSQL performs almost 20 times faster than MySQL for 20 million records in a single fact table on the same hardware setup. If for some reason you choose to go with MySQL, you should use the MyISAM storage engine for fact tables rather than InnoDB; you will see slightly better performance.