Postgres load balancing with limited hardware resources

I've got a task to do and some limited hardware resources, as always.
I need to set up a Postgres server with a single database, containing a table of large objects (3 TB+) and a few small, heavily accessed tables (<10 GB).
I've got an old physical server with ~5 TB of hard disk space but limited CPU and RAM. I can also use a much faster (in CPU and RAM) virtual server, but one that is limited in storage.
There won't be many DELETE statements, and most SELECT statements will target recent data. There will be a single connection doing all the work, from a client on one host only.
I see a few scenarios:
Postgres on virtual machine with remote storage (single instance)
Postgres on old hardware with local storage (single instance)
Postgres on both, with some kind of replication (the high-speed virtual machine for new data, the slower old hardware for older data)
Any other ideas?
Is it even possible to replicate just the most recent part of the postgres database?
90% of SELECT queries will hit the most recent ~5-10 gigabytes of data, but I need seamless access to the remaining ~2.99 TB.
What should I do? (except buying appropriate hardware;)

It doesn't really matter as long as you have enough RAM to buffer the 10GB of heavily accessed data.
You'll need some additional RAM to read large objects without pushing the 10GB out of the cache, but that shouldn't be a problem on today's machines.
If all the work is done over one connection, it sounds like there won't be high load on the database.
So I wouldn't really worry about scaling with requirements like that.
Your biggest worry should probably be how to backup 3TB of data in a reasonable time.
Edit: If you have much less memory, you should take the machine with the faster storage.
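As for whether only the most recent part can be replicated: on PostgreSQL 10 or later, logical replication can publish just selected tables, so the small hot tables could be replicated to the fast virtual machine while the large-object table stays only on the old server (large objects are not carried over by logical replication). A hedged sketch, with illustrative table names and connection details:

-- On the server holding everything (publisher); table names are hypothetical.
CREATE PUBLICATION hot_tables FOR TABLE recent_items, access_stats;

-- On the fast virtual machine (subscriber); connection string is illustrative.
CREATE SUBSCRIPTION hot_tables_sub
  CONNECTION 'host=old-server dbname=mydb user=replicator password=secret'
  PUBLICATION hot_tables;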

In the end I tested several different scenarios and decided not to keep the files/large objects in the database.
Postgres with its database location mounted over NFS (v4) had some lag: it was faster overall, but it would choke for a few seconds periodically, so I decided to store plain files over NFS instead, which is significantly slower but more stable.
I'm sure there was a way to tune it, but this solution is fine too.
Postgres is used for the file index and keeps its own data on the local hard disk.
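For illustration only, a minimal sketch of the kind of file-index table this ends up being (column names are hypothetical); the files themselves live on the NFS share, and Postgres only stores their metadata:

-- Hypothetical file index; path points at the file on the NFS mount.
CREATE TABLE file_index (
    id         bigserial PRIMARY KEY,
    path       text NOT NULL,
    size_bytes bigint,
    created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX ON file_index (created_at);

-- Typical lookup: the most recently added files.
SELECT path FROM file_index ORDER BY created_at DESC LIMIT 100;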

Related

Postgres configuration for use in embedded?

I have the same 5000 key/value pairs being read/written continuously (every 150ms or so) on a Debian system equivalent to a Raspberry Pi 3.
I don't care about persisting this data, it's recreated whenever my application server is launched.
Initially I used SQLite for this with an in-memory table. Now, however, I want to access the data from multiple processes (using a tmpfs didn't work out great) and even from a remote client, add an HTTP API, and use LISTEN/NOTIFY for change notifications, so I'd like to switch to PG, which is better suited for all of that.
Given these circumstances:
small dataset that fits in RAM
no need for persistence
low power PC
running 24/7 forever
don't want to thrash the flash storage
...what would be a good approach to configuring PG?
I found this 10-year-old question, but the last update was 5 years ago and suggests a third-party extension, which I'm not too excited about.
You should create few indexes apart from the primary keys and keep the fillfactor of all your tables low, perhaps around 50. That should get you HOT updates, which will reduce the need for VACUUM and the amount of data written.
You may want to reduce shared_buffers to conserve memory, but keep it big enough to contain the database.
Set synchronous_commit to off to have less disk I/O. If you are ready to ditch the database after an unclean shutdown or system crash, you can set fsync = off, but then you have to remove the cluster after each crash. If you take it that far, you could reduce the write load further by using unlogged tables.
Set checkpoint_timeout high for fewer writes.
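A minimal sketch of how those suggestions could be applied, assuming PostgreSQL 9.4+ for ALTER SYSTEM; the table definition and values below are illustrative, not a recommendation:

-- Hypothetical key/value table: unlogged (no WAL) with a low fillfactor for HOT updates.
CREATE UNLOGGED TABLE kv (
    key   text PRIMARY KEY,
    value text
) WITH (fillfactor = 50);

-- Reduce write and flush activity (takes effect after reload/restart).
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET checkpoint_timeout = '1h';
ALTER SYSTEM SET shared_buffers = '128MB';   -- just big enough to hold the data set
-- ALTER SYSTEM SET fsync = off;             -- only if losing the cluster after a crash is acceptable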

PostgreSQL on a RAM disk

I'm trying to get the fastest possible queries from PostgreSQL. I'm going to test this out, but I want to know what kinds of issues I could run into.
Servers
1x PostgreSQL master, with all of its data on a 20 GB RAM disk (leaving ~12 GB of RAM for the OS and programs).
2x PostgreSQL replicas (hot standby), with all of their data on a RAID 10 of SSDs.
Config
Synchronous commit is disabled
wal_buffers is set to 16MB
wal_writer_delay is 400ms
checkpoint_segments is 64
shared_buffers is 3GB
Loss of data that has not yet been committed is acceptable in this setup, but once the data has been committed (after the ~400 ms) it needs to be able to survive any single machine in this setup failing.
If the master fails, that's okay; losing the last ~400 ms is fine. One of the other two nodes should then pick up where the master left off, though without the RAM disk.
We want to be able to query and insert data as fast as absolutely possible, and we have contingencies built into our application to handle the master failing.
What problems would this configuration cause, or what problems or difficulties might we face?
Any other information that might be needed I can provide.
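For reference, a hedged sketch of how the configuration listed above could be applied (parameter names as of the 9.4 era; checkpoint_segments was replaced by max_wal_size in 9.5, and shared_buffers only takes effect after a restart):

ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET wal_writer_delay = '400ms';
ALTER SYSTEM SET checkpoint_segments = 64;   -- pre-9.5 only; use max_wal_size on newer versions
ALTER SYSTEM SET shared_buffers = '3GB';
SELECT pg_reload_conf();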

Heroku PostgreSQL Crane DB vs Linode 1GB with PostgreSQL installed

I'm trying to decide between using the Heroku Crane PostgreSQL database ($50/month - https://postgres.heroku.com/pricing) or setting up a Linode 1GB Ram / 8 CPU / 48GB Storage / 2TB Transfer instance with PostgreSQL installed ($20/month - https://www.linode.com/).
I know that from a management perspective, using Heroku Crane PostgreSQL would be much easier, as everything is managed with security and backups taken care of.
What I was curious about is how the performance of the two databases would compare. The Linode 1 GB / 8 CPU instance would be used for my database only. Heroku Crane says it only gets 400 MB of RAM, and it isn't clear how many CPUs I get or whether it's a dedicated instance.
Does the Heroku DB manage the RAM/cache of the DB more efficiently? It's unclear to me whether the Linode PostgreSQL instance would automatically use the 1 GB of RAM available to it efficiently, or whether it would require custom setup on my part to ensure the DB is loaded into RAM.
If the Heroku DB is less performant for the money but is a better deal because security, backups and management are taken care of, that's probably acceptable; I just want to understand the tradeoffs.
Thanks for any info people can provide. I'm new to DB management, and have been using a Linode 1GB instance with PostgreSQL installed for development and testing, but now that I'm going to production, am questioning whether to move over to Heroku Crane. Also, not sure if this matters, but my server is hosted through Heroku web instances.
Lower Heroku plans are on a shared server partitioned into containers with LXC. The details are on Heroku's site. Your plan appears to be one of them.
This can actually be a win as discussed in this question if you happen to be on an instance where other users aren't putting much load on the server. It makes your performance less predictable though.
The only good way to characterize performance is to benchmark with a simulation of your production workload.
Whether the RAM actually matters or not depends on your data. If your frequently accessed data and indexes fit in RAM on one machine, but not on another, then the RAM difference will make a huge difference. If the data fits in RAM on both hosts there's little benefit to adding RAM. If the hot data won't fit in RAM on either machine then disk I/O performance becomes more important than RAM, mostly random read I/O and the fsync() flush rate.
So. Benchmark with a simulation of your workload with your expected data size and see.
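One way to gauge whether the hot data fits in RAM on a given box is to look at the buffer cache hit ratio after running a realistic workload; this query against the standard statistics views is a rough sketch, not a substitute for a proper benchmark:

-- Rough cache hit ratio for the current database (close to 1.0 means almost everything is served from shared_buffers).
SELECT blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();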
Heroku discusses cache in more detail here.
(I work for another company in the same kind of space as Heroku, per my profile, so I'm reluctant to express a strong opinion one way or the other).

MongoDB Sharding On One Machine

Does it make sense to implement MongoDB sharding with, say, 100 shards on one beefier machine just to achieve higher concurrent writes into the database, since I am told there is a global lock for each mongod process? Assuming that is possible, will that approach give me higher write concurrency?
Running multiple mongods on a machine is not a good idea. Every one of the mongod processes will try to use all the available memory, forcing the other mongods' memory-mapped pages out of memory. This will create an enormous amount of swapping in most cases.
The global database lock is generally not a problem as is demonstrated in: http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Only use one mongod per machine (but it's fine to add a mongos or config server as well), unless it's for some simple testing.
cheers,
Derick
I totally disagree. We run 8 shards per box in our setup. It consists of two head nodes, each with two other machines for replication; 6 boxes in total. These are beefy boxes with about 120 GB of RAM, 32 cores and 2 TB of storage each. By having 8 shards per box (we could go higher, by the way; it's set at 8 for historic reasons) we make sure we utilize the CPU efficiently. The RAM sorts itself out. You do have to watch the metrics and make sure you aren't paging too much, but with SSD drives (which we have), spilling onto the disks isn't too bad.
The only use case where I found running several mongods on the same server useful was to increase replication speed over a high-latency connection.
As highlighted by Derick, the write lock is not really your issue when running mongodb.
To answer your question: yes, you can demonstrate Mongo scaling with several instances per machine (4 instances per server seems to be enough), as long as your test does not involve too much data (otherwise page-outs will dramatically decrease your performance; I have already tested this).
However, instances will still compete for resources. All you will manage to do is to shift the database lock issue to a resource lock issue.
Yes, you can, and in fact that's what we do for a 50+ million-record, write-heavy database. Just make sure all the indexes per mongod fit into RAM and there's room for growth and maintenance.
However, there's a small trade-off: depending on what your target QPS is, this kind of sharding on a single machine requires machines with more horsepower, whereas sharding across multiple machines does not, and in most cases you can get away with commodity, cheaper hardware.
Whatever the case is, run a series of performance tests (against I/O, network, QPS, etc.), establish your baseline carefully, and consider SSD drives for storage; this may sound biased, but XFS on Linux is also something to consider.

PostgreSQL In Memory Database

I want to run my PostgreSQL database server from memory. The reason is that on my new server, I have 24 GB of memory, and hardly any of it is used.
I know I can run this command to make a ramdisk:
mdmfs -s 1024m md2 /mnt
And I could theoretically have PostgreSQL store its data there. But the problem with this is that if the server crashes or reboots, the data will be gone.
Basically, I want the database to be loaded in memory at all times so that it does not have to go to the hard disk drive to read every record, since I have TONS of memory and since memory is faster than hard disk drives.
Is there a way to do this while also having PostgreSQL write to disk so I don't lose any data in case the server goes down? Or is there a way to cache all data in memory?
I'm now using streaming replication, which is async. This means my MASTER could be running entirely in memory, with a separate SLAVE instance using traditional disk.
A machine restart would involve stopping the SLAVE, copying the PostgreSQL data back onto the RAM disk, and then restarting the MASTER followed by the SLAVE. This is an interesting possibility that compares well with something like Redis, but with the advantages of redundancy / hot standby / backups / SQL / a rich toolset, etc.
Have you seen the Server Configuration chapter of the manual? Check it out, then google "postgresql memory tuning".
I have to believe that Postgres is written in such a way as to take full advantage of available RAM in the server. As you may have guessed by now, there's no reliable way to do this outside of Postgres.
Within Postgres, transactions ensure that all operations are atomic, so if the power goes down while you are writing to a Postgres database, you will only lose that particular operation, not the entire database.
The answer is caching. Look into adding memory to the server, then tune PostgreSQL to maximize memory usage. The filesystem cache also helps with this, doing some of it automatically. You will be able to speed up performance almost as if the data were in memory (except for the first hit), without having to manage it yourself, and you can still have a database larger than physical memory.
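A hedged sketch of that approach: tune the memory-related settings (the values below are only placeholders for a 24 GB machine) and, on PostgreSQL 9.4+, optionally use the contrib pg_prewarm extension to pull a table into the cache ahead of the first hit:

-- Let PostgreSQL and the OS cache do the work; values are illustrative.
ALTER SYSTEM SET shared_buffers = '6GB';          -- takes effect after restart
ALTER SYSTEM SET effective_cache_size = '18GB';   -- tells the planner how much the OS is likely caching
ALTER SYSTEM SET work_mem = '64MB';

-- Optional: warm the cache explicitly so even the first read is served from memory.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_big_table');                -- 'my_big_table' is a hypothetical table name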