AWS RDS I/O usage very different from Postgres blocks read - postgresql

I created a test Postgres database in AWS RDS. Created a 100 million row, 2 column table. Ran select * on that table. Postgres reports "Buffers: shared hit=24722 read=521226" but AWS reports IOPS in the hundreds. Why this huge discrepancy? Broadly, I'm trying to figure out how to estimate the number of AWS I/O operations a query might cost.

PostgreSQL does not have insight into what the kernel/FS get up to. If PostgreSQL issues a system call to read the data, then it reports that buffer as "read". If it was actually served out of the kernel's filesystem cache, rather than truly from disk, PostgreSQL has no way of knowing that (although you can make some reasonable statistical guesses if track_io_timing is on), while AWS's IO monitoring tool would know.
If you set shared_buffers to a large fraction of memory, then there would be little room left for a filesystem cache, so most buffers reported as read should truly have been read from disk. This might not be a good way run the system, but it might provide some clarity to your EXPLAIN plans. I've also heard rumors that Amazon Aurora reimplemented the storage system so that it uses directIO, or something similar, and so doesn't use the filesystem cache at all.

Related

Cloud SQL disk size is much larger than actual database

Cloud SQL reports that I've used ~4TB of SSD storage, but my database is only ~225 GB. What explains this discrepancy? Is there something I can delete to free up space? If I moved it to a different instance, would the required storage go down?
There are a couple of options about why your Cloud SQL storage has increase:
-Did you enable Point-in-time recovery? PITR uses write-ahead logs and if you enabled this feature, that could be the reason why of your increases.
-Have you used temporary tables and you have not deleted them?
If none of the above applies to you, I highly recommend you to open a case with GCP support team so that they take a look at your Cloud SQL instance.
On the other hand, you should open a case to decrease the disk size to a smaller one so it won’t be necessary to create a new instance and copy all the data to that new instance in addition that shrinking the disk is done at Google's end making the effort from you the lowest possible.
A maintenance window can be scheduled where Google can proceed with this task and you may want to schedule a maintenance window to minimize the impact of the downtime. For this case it is necessary to know the new disk size and when you would like to perform this operation.
Finally, if you prefer to use the migration method, you should export the DB, then create the new instance, import the DB and synchronize the old one with the new one to have all the data in both instances to which can take several hours to complete those four steps.
You do not specify what kind of database. In my case, for a MySQL database, there were several hundred GB as binary logs (mysql flag).
You could check with:
SHOW BINARY LOGS;

Postgres why is swap-usage growing? How to reduce it? - AWS RDS

Having a postgres DB on AWS-RDS the Swap Usage in constantly rising.
Why is it rising? I tried rebooting but it does not sink. AWS writes that high swap usage is "indicative of performance issues"
I am writing data to this DB. CPU and Memory do look healthy:
To be precise i have a
db.t2.micro-Instance and at the moment ~30/100 GB Data in 5 Tables - General Purpose SSD. With the default postgresql.conf.
The swap-graph looks as follows:
Swap Usage warning:
Well It seems that your queries are using a memory volume over your available. So you should look at your queries execution plan and find out largest loads. That queries exceeds the memory available for postgresql. Usually over-much joining (i.e. bad database structure, which would be better denonarmalized if applicable), or lots of nested queries, or queries with IN clauses - those are typical suspects. I guess amazon delivered as much as possible for postgresql.conf and those default values are quite good for this tiny machine.
But once again unless your swap size is not exceeding your available memory and your are on a SSD - there would be not that much harm of it
check the
select * from pg_stat_activity;
and see if which process taking long and how many processes sleeping, try to change your RDS DBparameter according to your need.
Obviously you ran out of memory. db.t2.micro has only 1GB of RAM. You should look in htop output to see which processes takes most of memory and try to optimize memory usage. Also there is nice utility called pgtop (http://ptop.projects.pgfoundry.org/) which shows current queries, number of rows read, etc. You can use it to view your postgress state in realtime. By the way, if you cannot install pgtop you can get just the same information from posgres internal tools - check out documentation of postgres stats collector https://www.postgresql.org/docs/9.6/static/monitoring-stats.html
Actually it is difficult to say what the problem is exactly but db.t2.micro is a very limited instance. You should consider taking a biggier instance especially if you are using postgres in production.

Choosing size of Cloud SQL Instance

I have a MySQL database whose tables Weight in at 40GB, and when I run the command "top -c" it shows that MySQL is using 14GB of RAM.
In order to migrate this database to Cloud SQL, do I have to maintain the same amount of ram? Or does the way that Cloud SQL manage indexes and memory mean that I can choose a server is less RAM? Are there tools that will allow me to know if performance is suffering due to lack of RAM?
Thanks for your feedback
Regards,
Marcelo
In general, you would not have to maintain that same amount of RAM. MySQL is caching data in RAM for fast access, so less RAM = slower queries.
Before migrating to the cloud, I'd run some test using MySQL benchmarks, which should give you some empirical numbers to help make your decision as to the amount of RAM you need.

PostgreSQL Performance: keep seldomly used small database in memory while server busy with big database

I have a server with 64GB RAM and PostgreSQL 9.2. On it is one small database "A" with only 4GB which is only queried once an hour or so and one big database "B" with about 60GB which gets queried 40-50x per second!
As expected, Linux and PostgreSQL fill the RAM with the bigger database's data as it is more often accessed.
My problem now is that the queries to the small database "A" are critical and have to run in <500ms. The logfile shows a couple of queries per day that took >3s though. If I execute them by hand they, too, take only 10ms so my indexes are fine.
So I guess that those long runners happen when PostgreSQL has to load chunks of the small databases indexes from disk.
I already have some kind of "cache warmer" script that repeats "SELECT * FROM x ORDER BY y" queries to the small database every second but it wastes a lot of CPU power and only improves the situation a little bit.
Any more ideas how to tell PostgreSQL that I really want that small database "sticky" in memory?
PostgreSQL doesn't offer a way to pin tables in memory, though the community would certainly welcome people willing to work on well thought out, tested and benchmarked proposals for allowing this from people who're willing to back those proposals with real code.
The best option you have with PostgreSQL at this time is to run a separate PostgreSQL instance for the response-time-critical database. Give this DB a big enough shared_buffers that the whole DB will reside in shared_buffers.
Do NOT create a tablespace on a ramdisk or other non-durable storage and put the data that needs to be infrequently but rapidly accessed there. All tablespaces must be accessible or the whole system will stop; if you lose a tablespace, you effectively lose the whole database cluster.

Configuration Recommendations for a PostgreSQL Installation

I have a Windows Server 2003 machine which I will be using as a Postgres database server, the machine is a Dual Core 3.0Ghz Xeon with 4 GB ECC Memory and 4 x 120GB 10K RPM SAS Drives, all stripped.
I have read that the default Postgres install is configured to run nicely on a 486 with 32MB RAM, and I have read several web pages about configuration optimizations - but was hoping for something more concrete from my Stackoverflow peeps.
Generally, its only going to serve 1 database (potentially one or two more) but the catch is that the database has 1 table in particular which is massive (hundreds of millions of records with only a few coloumn). Presently, with the default configuration, it's not slow, but I think it could potentially be even faster.
Can people please give me some guidance and recomendations for configuration settings which you would use for a server such as this.
4*stripped drive was a bad idea — if any of this drives will fail then you'll loose all data, and even SAS drives sometimes fail — with 4 drivers it is 4 times more likely than with 1 drive; you should go for RAID 1+0.
Use the latest version of Postgres, 8.3.7 now; there are many performance improvements in every major version.
Set shared_buffers parameter in postgresql.conf to about 1/4 of your memory.
Set effective_cache_size to about 1/2 of your memory.
Set checkpoint_segments to about 32 (checkpoint every 512MB) and checkpoint_completion_target to about 0.8.
Set default_statistics_target to about 100.
Migrate to Enterprise Linux or FreeBSD: Postgres works much better on Unix type systems — Windows support is a recent addition, not very mature.
You can read more on this page: Tuning Your PostgreSQL Server — PostgreSQL Wiki
My experience suggests that (within limits) the hardware is typically the least important factor in database performance.
Assuming that you have enough memory to keep commonly used data in cache, then your CPU speed may vary 10-50% between a top-of the line machine and a common or garden box.
However, a missing index in an important search, or a poorly written recursive trigger could easily make a difference of 1,000% or 10,000% or more in your response times.
Without knowing exactly your table structure and row counts, I think anybody would suggest that your your hardware looks amply sufficient. It is only your database structure which will kill you. :)
UPDATE:
Without knowing the specific queries and your index details, there's not much more we can do. And in general, even knowing the queries, it's often very difficult to optimize without actually installing and running the queries with realistic data sets.
Given the cost of the server, and the cost of your time, I think you need to invest thirty bucks in a book. Then install your database with test data, run the queries, and see what runs well and what runs badly. Fix, rinse, and repeat.
Both of these books are specific to SQL Server and both have high ratings:
http://www.amazon.com/Inside-Microsoft%C2%AE-SQL-Server-2005/dp/0735621969/ref=sr_1_1
http://www.amazon.com/Server-Performance-Tuning-Distilled-Second/dp/B001GAQ53E/ref=sr_1_5