Postgres recovery after destruction of temporary tablespace - postgresql

I'm trying to speed up PostgreSQL on EC2.
An EC2 node has two kinds of storage: slow, durable network-attached storage (EBS) and fast, volatile local storage (ephemeral storage).
To speed up the database, I'm considering pointing temp_tablespaces at a directory on ephemeral storage. However, ephemeral storage has no durability guarantees: in a system crash, everything on it is completely and permanently destroyed.
Does this run the risk of any data loss? In principle it seems to me it should not, since temp_tablespaces is only used for temporary objects. But I'm not intimately familiar with the Postgres data model. Are there dangers here that I'm missing?

That should be safe: if you crash before the operation that requires the temporary table is fully committed, you should recover to the point before the operation. I don't know whether PostgreSQL clears that area on a restart, though, so I would check that for yourself.
Now a proper geek would try to implement a file system over Amazon's memcache equivalent and use that...
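For reference, here is a minimal sketch of the setup being discussed, written as a small Python helper that only builds the SQL statements. The tablespace name ephemeral_tmp and the path /mnt/ephemeral/pg_temp are hypothetical, and the sketch assumes PostgreSQL 9.4 or later (for ALTER SYSTEM) and that the directory already exists, is empty, and is owned by the postgres OS user. It does not cover what the server does once the directory has vanished after a crash, which is worth testing separately.

```python
def temp_tablespace_sql(name: str, location: str) -> list[str]:
    """Return SQL statements that point temp_tablespaces at `location`.

    `name` is assumed to be a plain identifier chosen by the operator;
    it is not quoted or validated here.
    """
    # Quote the path as a SQL string literal by doubling single quotes.
    quoted_location = "'" + location.replace("'", "''") + "'"
    return [
        # CREATE TABLESPACE must be run outside a transaction block.
        f"CREATE TABLESPACE {name} LOCATION {quoted_location};",
        # Send temporary tables and sort/hash spill files to the new tablespace.
        f"ALTER SYSTEM SET temp_tablespaces = '{name}';",
        # Reload the configuration so the setting takes effect.
        "SELECT pg_reload_conf();",
    ]


if __name__ == "__main__":
    # Hypothetical name and path, for illustration only.
    for statement in temp_tablespace_sql("ephemeral_tmp", "/mnt/ephemeral/pg_temp"):
        print(statement)
```

The statements would be run as a superuser, for example through psql.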

Related

Postgresql Aurora DB freeable_memory

I have a question regarding the freeable memory for AWS Aurora Postgres.
We recently tried to create an index on one of our databases. The database died and failed over to the slave, which all worked fine. It looks like freeable memory dropped by the configured 500 MB of maintenance_work_mem, which took it down to around 800 MB. Right after that, the 32 GB instance died.
1) Is freeable memory the overall system memory, and could a low value here invoke the system OOM killer on the AWS Aurora instance? If so, should we plan more headroom for operational tasks and autovacuum jobs so that we don't hit this issue again?
2) As far as I understand, the actual work of the index creation should then have used the free local storage, so the size of the index shouldn't have mattered, right?
Thanks in advance,
Chris
Regarding 1)
Freeable Memory from (https://forums.aws.amazon.com/thread.jspa?threadID=209720)
The freeable memory includes the amount of physical memory left unused
by the system plus the total amount of buffer or page cache memory
that are free and available.
So it's freeable memory across the entire system. While MySQL is the
main consumer of memory on the host we do have internal processes in
addition to the OS that use up a small amount of additional memory.
If you see your freeable memory near 0 or also start seeing swap usage
then you may need to scale up to a larger instance class or adjust
MySQL memory settings. For example decreasing the
innodb_buffer_pool_size (by default set to 75% of physical memory) is
one way example of adjusting MySQL memory settings.
That also means that if memory gets low, it's likely to impact the process in some form. For example, in this thread (https://forums.aws.amazon.com/thread.jspa?messageID=881320&#881320) it was mentioned that low memory caused the MySQL server to restart.
Regarding 2)
This matches what the documentation describes (https://aws.amazon.com/premiumsupport/knowledge-center/postgresql-aurora-storage-issue/), so I guess it's right and the size shouldn't have mattered.
Storage used for temporary data and logs (local storage). All DB
temporary files (for example, logs and temporary tables) are stored in
the instance local storage. This includes sorting operations, hash
tables, and grouping operations that are required by queries.
Each Aurora instance contains a limited amount of local storage that
is determined by the instance class. Typically, the amount of local
storage is twice the amount of memory on the instance. If you perform
a sort or index creation operation that requires more memory than is
available on your instance, Aurora uses the local storage to fulfill
the operation.
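The headroom question in 1) can be made concrete with a small back-of-the-envelope check. This is only a sketch: the 1300 MB starting value is inferred from the question (a 500 MB drop that ended at around 800 MB), the 1024 MB safety floor is a hypothetical threshold, and it models a single maintenance operation rather than several concurrent ones.

```python
def freeable_after(freeable_mb: int, maintenance_work_mem_mb: int) -> int:
    """Estimate freeable memory once one maintenance operation has
    claimed its full maintenance_work_mem allowance."""
    return freeable_mb - maintenance_work_mem_mb


def has_headroom(freeable_mb: int, maintenance_work_mem_mb: int, min_free_mb: int) -> bool:
    """True if the estimated remaining freeable memory stays at or above
    the chosen safety floor."""
    return freeable_after(freeable_mb, maintenance_work_mem_mb) >= min_free_mb


if __name__ == "__main__":
    # Figures from the question: about 1300 MB freeable before the index build,
    # maintenance_work_mem = 500 MB. The 1024 MB floor is an assumption.
    print(freeable_after(1300, 500))      # 800
    print(has_headroom(1300, 500, 1024))  # False
```

A check like this could run before operational tasks such as index builds, with the current value taken from the FreeableMemory metric.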

Postgresql DB backup Ideal practices

• What are the best practices for taking a PostgreSQL logical backup using pg_dump?
• Is it a good idea to take the backup from a standby/slave node if replication lag is less than 200 ms?
• If we take the backup from a standby/slave node, is there any specific configuration we need to change?
• Which method is better when the DB is updated frequently: logical backup or physical backup? The backup is taken for disaster recovery, so which method gives the faster and better backup and restore?
Update:
Our current database size is 5 GB and replication runs in hot standby mode.
We run the backup script on the slave node, but it takes a remote backup from the master node every 30 minutes.
The reason I asked this question is that while the backup is running, some COPY statements take 6 minutes to complete. Even though this does not affect other transactions on the DB, are there any other issues when a statement takes that long?
I thought about what you wrote and here are some ideas for you:
If you need a backup that is really consistent to some point in time, you must use pg_basebackup or pg_barman (which uses pg_basebackup internally); the explanation is in link 1 below. The latest pg_basebackup (version 10) streams WAL, so the backup also includes all changes made while it was running. This kind of backup always covers the whole PG instance. On the other hand, it does not lock any table, and if you run it from a remote machine it causes only a small CPU load on the PG instance, and the disk I/O is not as heavy as some texts suggest. See link 4 for my experiences. Restoration is quite simple; see link 5.
If you use pg_dump, you must understand that you have no guarantee that your backup is really consistent to a chosen point in time; again, see link 1. It is possible to use a snapshot of the database (see links 2 and 3), but even then you cannot count on 100% consistency. We used pg_dump only on our analytical database, which loads new data only once per day (yesterday's partitions from the production database). You can speed it up with the parallel option (which works only with the directory backup format). The downside is a much higher load on the PG instance: higher CPU usage and much higher disk I/O. Running pg_dump remotely only saves the disk I/O for writing the backup files. In addition, pg_dump needs to place a read lock on tables, so it can collide either with new inserts or with replication (when taken on a replica). And once your database reaches hundreds of GB, even a parallel dump can take hours, at which point you would need to switch to pg_basebackup anyway.
pg_barman is a "comfortable version" of pg_basebackup, and it also lets you prevent data loss even when your PG instance crashes very badly. Setting it up requires more changes, but it is definitely worth it. You will have to set up WAL archiving (see link 6), and if your PG is <10 you will have to set "max_wal_senders" and "max_replication_slots" (which you need for replication anyway). Everything is in the pg_barman manual, although the descriptions are not exactly great, so making it work can take many hours. pg_barman streams and stores WAL records even between backups, so you can be sure that data loss after a very bad crash will be almost none. pg_barman handles both backup and restoration with its own commands.
Your database is only 5 GB, so any backup method will be quick. What you have to decide is whether you need point-in-time recovery and almost zero data loss, and therefore whether to invest the time in setting up pg_barman.
Links:
1. PostgreSQL, Backups and everything you need to know
2. Review for Paper: 14-Serializable Snapshot Isolation in PostgreSQL - about snapshots
3. Parallel dumping of databases - example how to use snapshot
4. pg_basebackup experiencies
5. pg_basebackup - restore tar backup
6. Archiving WAL logs using script
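As a rough illustration of the two approaches compared above, the sketch below builds the command lines for a parallel directory-format pg_dump and for a pg_basebackup that streams WAL. The host, user, database name, and output paths are hypothetical; the flags are standard pg_dump and pg_basebackup options, but check them against the version you actually run.

```python
def pg_dump_cmd(host: str, user: str, dbname: str, out_dir: str, jobs: int = 4) -> list[str]:
    """Logical backup: directory format (-Fd), which is the only format
    that supports a parallel dump (-j)."""
    return ["pg_dump", "-h", host, "-U", user,
            "-Fd", "-j", str(jobs), "-f", out_dir, dbname]


def pg_basebackup_cmd(host: str, user: str, out_dir: str) -> list[str]:
    """Physical backup of the whole instance, streaming WAL (-X stream)
    so that changes made during the backup are included."""
    return ["pg_basebackup", "-h", host, "-U", user,
            "-D", out_dir, "-X", "stream", "-P"]


if __name__ == "__main__":
    # Hypothetical connection details and paths, for illustration only.
    print(" ".join(pg_dump_cmd("db-master", "backup", "mydb", "/backups/mydb.dir")))
    print(" ".join(pg_basebackup_cmd("db-master", "backup", "/backups/base")))
```

Either list can be passed to subprocess.run(cmd, check=True) from the backup script; the user needs the appropriate privileges (replication privileges in the pg_basebackup case).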

Postgres load balance with limited hardware resources

I've got a task to do and some limited hardware resources, as always.
I need to set up a Postgres server with a single database containing a table of large objects (3 TB+) and a few small, heavily accessed tables (<10 GB).
I've got an old physical server with ~5 TB of hard disk space but limited CPU and RAM. I can also use a much faster (in CPU and RAM) virtual server, but it is limited in storage.
I won't have many DELETE statements, and most SELECT statements will hit recent data. There will be a single connection doing all the work, from a client on one host only.
I see a few scenarios:
Postgres on virtual machine with remote storage (single instance)
Postgres on old hardware with local storage (single instance)
Postgres on both, with some kind of replication (high speed virtual machine for new data, low speed for older data on the old hardware)
Any other ideas?
Is it even possible to replicate just the most recent part of the postgres database?
90% of SELECT queries will be to the most recent ~5-10 GB of data, but I need seamless access to the remaining ~2,990 GB.
What should I do? (except buying appropriate hardware;)
It doesn't really matter as long as you have enough RAM to buffer the 10GB of heavily accessed data.
You'll need some additional RAM to read large objects without pushing the 10GB out of the cache, but that shouldn't be a problem on today's machines.
If all your work is done on one connection, that sounds like there will be no high load on the database.
So I wouldn't really worry about scaling with requirements like that.
Your biggest worry should probably be how to back up 3 TB of data in a reasonable time.
Edit: If you have much less memory, you should take the machine with the faster storage.
In the end I tested several different scenarios and decided not to keep files/large objects in the database.
Postgres with its database location mounted over NFS (v4) had some lags: it was faster, but it choked for a few seconds periodically. I decided to store plain files over NFS instead, which is significantly slower but more stable.
I'm sure there was a way to tune it, but this solution is fine too.
Postgres is now used only as the file index and keeps its own files on the local hard disk.

MongoDB replication and EBS vs ephemeral

I've read all of the MongoDB related documentation talking about the recommended practices for deploying Mongo on AWS, but I don't understand the recommendation to install on EBS with RAID-10 (pdf) to avoid data loss.
This seems like admitting that replication doesn't work. Why shouldn't one run Mongo using ephemeral drives and a cluster of 5 servers doing replication?
Performance is much greater and latency is predictable on local disks.
If a server goes down, the EBS backed store would have to be resynced with the replica anyway. Sure you have the data, but it is already out of date.
Using EBS makes for a much more complicated setup. You need to use LVM or some other layer if you want to take snapshots, since EBS snapshots won't work across RAID. You need to monitor and manage your RAID array and rebuild in the case of failure or if one of the EBS volumes has performance issues.
What exactly does using EBS protect against if one has backups and a large replica set? It's almost admitting that replica sets won't protect you against data loss (ignoring for the moment the race condition when writes have been sent to secondaries and a failure on the master happens before acknowledgements have been sent).
Why shouldn't one run Mongo using ephemeral drives and a cluster of 5 servers doing replication?
AWS is not perfect. It can have a network failure that takes the entire set down, and with ephemeral storage you would lose all your data. Plus, block devices survive restarts of nodes.
That is a few things, I am sure there are more.
If a server goes down, the EBS backed store would have to be resynced with the replica anyway.
Only from the point it went down. If that is a considerable amount of time then yes, it might be easier to copy the directory from one replica to the other.
Using EBS makes for a much more complicated setup. You need to use LVM or some other layer if you want to take snapshots, since EBS snapshots won't work across RAID.
You don't really need RAID within AWS itself. They already RAID each of your block devices, and replica set members are as good as throwaway. You can get by with one block device per node.
What exactly does using EBS protect against if one has backups and a large replica set?
It safeguards your sanity. Restoring a backup of sizeable data across 10-odd members and resetting all the firewall rules, user permissions, OS settings and so on could be... well... nasty.
I mean imagine having to re-setup your OS every single time you restart it.
It's almost admitting that replica sets won't protect you against dataloss.
Hmm, you must have misread somewhere, because THAT is not what they guarantee. It is true that it is harder to lose data with replica sets (if they are set up right), but they are actually designed to give High Availability (HA).
Backups, journalling and other consistency methods are what is designed to prevent data loss.
So where do you see the recommendation to run RAID10 on EBS for mongodb? Their docs list it as an option but specifically recommend only EBS and Provisioned IOPS.
For almost all deployments EBS will be the better choice. For production systems we recommend using
EBS-optimized EC2 instances
Provisioned IOPS (PIOPS) EBS volumes
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
We run all of our mongodb instances at EC2 and all of them use EBS storage volumes with production instances using provisioned IO. Here's why:
Bringing back a failed member is faster. If an instance fails and needs to be stopped and restarted (not that frequent but it does happen) we can just detach the storage and re-attach it to another instance. Mongod comes up fine, recovers via the journal and then re-syncs with the primary for only the delta in data since the failure. This is a big deal when you have large data sets that may take many hours to restore from scratch. Storing the data on an ephemeral drive does not provide this capability.
Backups are easier (at least for replica sets under 1 TB). With a single EBS storage volume (up to 1 TB) we can take snapshots of a live secondary. As long as the journal is on the same storage volume the backup will be consistent. No need for a dedicated secondary for backups that has to be brought offline to backup.
No need for RAID except for multi-TB replica sets or for performance. EBS is already RAID behind the scenes for redundancy. We do use RAID when a replica set grows beyond 1 TB in storage, but that's it, and we have not yet hit a point where a high-IOPS EBS volume fails to provide sufficient performance.
Provisioned IOPS give decent control of performance vs. cost. Being able to select EBS storage rated up to 4000 IOPS has allowed us to scale up performance for production systems (at higher cost) while still gaining the benefits of EBS storage. We use regular EBS volumes at lower cost for test systems.
Copying production data off for use in a test environment is much easier for large data sets. Snapshot the volumes, create a new storage volume from the snapshot and you're up and running.
I certainly can imagine future deployments using ephemeral storage (particularly as SSD costs drop) for certain high performance situations but EBS has been fairly reliable and dependable for us. Of course your experience and needs can and will differ but for us following the recommendation from MongoDB has served us well. In fact it's been reliable enough that for some environments we've moved to 1 Primary, 1 Secondary and an Arbiter, which helps with cost savings. Probably would not have done that without the ease of recovery and overall reliability of using EBS volumes on the Primary and Secondary.

PostgreSQL In Memory Database

I want to run my PostgreSQL database server from memory. The reason is that on my new server, I have 24 GB of memory, and hardly any of it is used.
I know I can run this command to make a ramdisk:
mdmfs -s 1024m md2 /mnt
And I could theoretically have PostgreSQL store its data there. But the problem with this is that if the server crashes or reboots, the data will be gone.
Basically, I want the database to be loaded in memory at all times so that it does not have to go to the hard disk drive to read every record, since I have TONS of memory and since memory is faster than hard disk drives.
Is there a way to do this while also having PostgreSQL write to disk so I don't lose any data in case the server goes down? Or is there a way to cache all data in memory?
I'm now using streaming replication, which is async. This means my MASTER could run entirely in memory, with the separate SLAVE instance using traditional disk.
A machine restart would involve stopping the SLAVE, copying the PostgreSQL data back onto the ramdisk, and then restarting the MASTER followed by the SLAVE. This would be an interesting possibility that compares well with something like Redis, but with the advantages of redundancy, hot standby, backup, SQL, a rich toolset, etc.
Have you seen the Server Configuration chapter of the manual? Check it out, then google "postgresql memory tuning".
I have to believe that Postgres is written in such a way as to take full advantage of available RAM in the server. As you may have guessed by now, there's no reliable way to do this outside of Postgres.
Within Postgres, transactions assure that all operations are atomic, so if the power goes down while you are writing to a Postgres database, you will only lose that particular operation, and not the entire database.
The answer is caching. Look into adding memory to the server, then tuning PostgreSQL to maximize memory usage. Also, the file system cache will help with this, doing some of it automatically. You will be able to speed up performance, almost as if it were in memory except for the first hit, while not having to manage it yourself, and being able to have a database larger than the physical memory.
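As a starting point for the tuning mentioned above, the sketch below derives two postgresql.conf values from the amount of RAM. The 25% figure for shared_buffers is the starting value suggested in the PostgreSQL documentation for a dedicated server; the 75% figure for effective_cache_size is a common rule of thumb and should be treated as an assumption to validate against your own workload.

```python
def suggest_memory_settings(total_ram_gb: float) -> dict[str, str]:
    """Return rough starting values for two memory-related settings.

    Values are expressed in whole megabytes so they can be pasted into
    postgresql.conf as-is.
    """
    total_mb = total_ram_gb * 1024
    return {
        # Documented starting point: about 25% of system memory.
        "shared_buffers": f"{int(total_mb * 0.25)}MB",
        # Planner hint only; the 75% share is a rule-of-thumb assumption.
        "effective_cache_size": f"{int(total_mb * 0.75)}MB",
    }


if __name__ == "__main__":
    # The server in the question has 24 GB of memory.
    for name, value in suggest_memory_settings(24).items():
        print(f"{name} = {value}")
    # shared_buffers = 6144MB
    # effective_cache_size = 18432MB
```

effective_cache_size does not allocate anything; it only tells the planner how much data is likely to be cached by PostgreSQL and the file system cache together.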