Temporary tables vs unlogged tables performance in PostgreSQL? - postgresql

I want to know about the performance of temporary tables vs. unlogged tables in PostgreSQL.
Which is faster for read and write operations?

Both are equally fast, since both bypass WAL.
The only difference is that temporary tables are cached in process-private memory, governed by the temp_buffers parameter, while unlogged tables are cached in shared_buffers. So the values of these two settings will affect performance.
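For context, here is a minimal sketch of the two kinds of table and the buffer settings behind them (table names and sizes are illustrative, not recommendations):

-- Temporary table: private to this session, cached in temp_buffers.
-- temp_buffers can only be changed before the session first touches a temporary table.
SET temp_buffers = '256MB';
CREATE TEMPORARY TABLE tmp_stats (id bigint, val numeric);

-- Unlogged table: shared across sessions, cached in shared_buffers
-- (shared_buffers is set in postgresql.conf and requires a server restart to change).
CREATE UNLOGGED TABLE fast_stats (id bigint, val numeric);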

Related

Limit size of temporary tables (PostgreSQL)

I'm managing a PostgreSQL database server for some users who need to create temporary tables. One user accidentally sent a query with ridiculously many outer joins, and that completely filled the disk up.
PostgreSQL has a temp_file_limit parameter but it seems to me that it is not relevant:
It should be noted that disk space used for explicit temporary tables, as opposed to temporary files used behind-the-scenes in query execution, does not count against this limit.
Is there a way then to put a limit on the size on disk of "explicit" temporary tables? Or limit the row count? What's the best approach to prevent this?
The only way to limit a table's size in PostgreSQL is to put it in a tablespace on a file system of an appropriate size.
Since temporary tables are created in the default tablespace of the database you are connected to (unless temp_tablespaces points elsewhere), you have to place your database in that size-restricted tablespace. To keep your regular tables from being limited in the same way, you'd have to explicitly create them in a different, less limited tablespace. Make sure that your user has no permissions on that less limited tablespace.
This is a rather unappealing solution, so maybe you should rethink your requirement. After all, the user could just as well fill up the disk by inserting the data into a permanent table.
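If you do go that route, a minimal sketch of the setup (tablespace names, paths, and the user and table names are all illustrative):

-- tablespace on a deliberately small file system; temporary tables will end up here
CREATE TABLESPACE limited_ts LOCATION '/mnt/small_fs/pgdata';

-- tablespace on a larger file system for regular tables
CREATE TABLESPACE roomy_ts LOCATION '/mnt/big_fs/pgdata';

-- run this while connected to a different database; it requires that no one else is connected to mydb
ALTER DATABASE mydb SET TABLESPACE limited_ts;

-- regular tables must then be placed explicitly in the roomy tablespace ...
CREATE TABLE important_data (id bigint) TABLESPACE roomy_ts;

-- ... and the user in question must not be allowed to do the same
REVOKE CREATE ON TABLESPACE roomy_ts FROM temp_heavy_user;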

Full Load in Redshift - DROP vs TRUNCATE

As part of the daily load in Redshift, I have a couple of tables that I drop and fully reload every day (the data is small, less than 1 million rows).
My question is which of the two strategies below is better in terms of CPU utilization and memory in Redshift:
1) Truncate the data
2) DROP and recreate the table
If I truncate the tables, should I run VACUUM on them every day? I have read that frequently dropping and recreating tables causes page fragmentation.
Also, I would like to enable compression on one of the tables. Is there any downside to recreating the DDL with column encodings every day?
Please advise! Thank you!
If you drop the tables, you will lose the permissions granted on them, and any views that reference them will become invalid.
Truncate is the better option: it does not require VACUUM or ANALYZE, and it is built for use cases like this.
For further info, see the Redshift TRUNCATE documentation.
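A sketch of the daily reload under that approach (the table name and load source are illustrative):

-- keeps the table definition, grants, and dependent views; just empties it
TRUNCATE my_daily_table;

-- then reload, for example with COPY from S3 (or INSERT ... SELECT from a staging table)
COPY my_daily_table FROM 's3://my-bucket/daily/' IAM_ROLE '<role-arn>' CSV;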

Need to drop 900+ postgres schemas but it wants me to vacuum first

I have 900+ postgres schemas (which collectively hold 40,000 tables) that I'd like to drop. However, it appears that it wants me to vacuum everything first, because I get this whenever I try to drop a schema.
ERROR: database is not accepting commands to avoid wraparound data loss in database
Is there a way to drop a large number of schemas without having to vacuum first?
Is there any problem with running the VACUUM command? It is like garbage collection for a database. I use PostgreSQL and run this command before doing any major work like a backup or creating an SQL script of the whole database.
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
You've got two choices. Do the vacuum, or drop the whole database. xid wrap-around must be avoided.
https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres
There is not much you can do except run VACUUM or drop the database.
In addition, if you don't do the VACUUM, the database will not accept commands for anything, not just for the schemas you want to drop.
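If you take the VACUUM route, the documented recovery path once the database has stopped accepting commands is to shut the server down and vacuum that database in single-user mode; a minimal sketch (the data directory path and database name are illustrative):

# stop PostgreSQL, then start a single-user backend for the affected database
postgres --single -D /path/to/data/directory mydb

backend> VACUUM;   -- a database-wide VACUUM advances the frozen transaction ID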

understanding unlogged tables, commits and checkpoints postgres

I have to upsert a large number of rows into multiple tables in Postgres 9.6 daily. Some of these tables receive about 6 million rows of 1 kB each; the volume is extremely large and this needs to be done rather quickly.
I have some transformation needs, so I cannot use COPY directly from the source; also, COPY doesn't update existing rows. I could use a foreign data wrapper after the transformation, but that would also require SQL. So I decided to write a Java program using a producer/consumer concurrency pattern, with producers reading from files and consumers writing to Postgres with multi-row upserts.
Since I can recreate the data from the ultimate source, I figure I can skip WAL and use unlogged tables: make them unlogged before the copy and logged again afterwards. I am on v9.6.
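The switch itself is just this (the table name is illustrative):

ALTER TABLE staging_orders SET UNLOGGED;
-- ... bulk upsert here ...
ALTER TABLE staging_orders SET LOGGED;  -- rewrites the table and, unless wal_level = minimal, writes it all to WAL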
I did all this and now want to run performance tests. Before I do, I want to understand what a commit and a checkpoint mean for an unlogged table. I suspect the data and index files (I have dropped the index) are written at checkpoint. Here are my questions:
1) What happens at commit, given that commit applies to WAL and there is no WAL data for unlogged tables? Does commit write my data and index files to disk, or is it essentially unnecessary and the data will only be written at checkpoint?
2) Ultimately, what should I be tuning for unlogged tables?
Thanks.

Very simple in memory storage

I need some very simple storage to keep data before it is added to PostgreSQL. Currently, I have a web application that collects data from clients. This data is not very important. The application collects about 50 kB of data (a simple string) from each client, roughly 2 GB of data per hour in total.
The data is not needed immediately, and it is no great loss if some of it is lost.
Is there an existing solution to store it in memory for a while (~1 hour) and then write it all to PostgreSQL? I don't need to query it in any way.
I could probably use Redis, but Redis seems too complex for this task.
I could write something myself, but it would have to handle many store requests (maybe about 100 per second), and an existing solution may be better.
Thanks,
Dmitry
If you do not plan to work with this data right away, why do you want to store it in memory at all? You can create an UNLOGGED table and store the data in that table.
Look at the documentation for details:
UNLOGGED
If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well; however, unlogged GiST indexes are currently not supported and cannot be created on an unlogged table.
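A minimal sketch of that approach (the table and column names are illustrative, and the permanent table events is assumed to exist):

CREATE UNLOGGED TABLE incoming_events (
    received_at timestamptz NOT NULL DEFAULT now(),
    payload     text        NOT NULL
);

-- once an hour, move the collected rows into the permanent table and empty the buffer
INSERT INTO events (received_at, payload)
    SELECT received_at, payload FROM incoming_events;
TRUNCATE incoming_events;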
Storing data in memory sounds like caching to me. So, if you are using Java, I would recommend Guava Cache to you!
It seems to fit all your requirements, e.g. setting an expiry delay, handling the data once it is evicted from the cache:
// requires com.google.common.cache.Cache, com.google.common.cache.CacheBuilder
// and java.util.concurrent.TimeUnit
private Cache<String, Object> yourCache = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.HOURS)      // keep entries for the retention window
        .removalListener(notification -> {
            // store the evicted object in PostgreSQL here
        })
        .build();
// note: Guava evicts lazily during cache activity; call yourCache.cleanUp() periodically
// if writes can pause, so expired entries actually reach the removal listener