Is this disk read speed to be expected (Amazon EBS)? - postgresql

Our Amazon EBS-backed instance has slowed down considerably (maybe it was moved to a different physical host?).
I've checked the instance using top, and the CPU usage is very low while the process is active (around 1%). Using iotop I have monitored the disk read speed of PostgreSQL. When there is only one PostgreSQL process running, it reports a read speed of about 5 MB/s. Is this rather slow, or is it within the range of usual disk read speeds?
Thanks

5 MB/s is more or less typical for a single hard drive, for sequential access at least. If you have only one hard disk, then the low CPU usage makes sense, since a single disk is probably not enough to keep the CPU busy. If you are not getting any more speed than that, even with constant queries, then your hard disk is the bottleneck.
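If you want to cross-check what PostgreSQL itself sees, one rough sketch (big_table is just a placeholder for one of your own tables) is to time a sequential scan and compare the pages read with the elapsed time:

    -- Run against a table larger than RAM, or right after a restart so the cache is cold.
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM big_table;
    -- In the output, "Buffers: shared read=N" is the number of 8 kB pages read from
    -- outside shared_buffers; N * 8 kB divided by the execution time gives an
    -- effective read rate to compare with what iotop shows.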

Related

PostgreSQL benchmarking over a RAMdisk?

I have been considering the idea of moving to a RAM disk for a while. I know its risks, but I just wanted to do a little benchmark. I have two questions: (a) when reading the query plan, will it still differentiate between disk and buffer hits? If so, should I assume that both are equally expensive, or should I assume that there is a difference between them?
(b) A RAM disk is not persistent, but if I want to export some results to persistent storage, are there any precautions I would need to take? Is it the same as usual, e.g. the COPY command?
I do not recommend using a RAM disk for PostgreSQL storage. With careful tuning, you can get PostgreSQL to use no more disk I/O than is required to make your data persistent.
I recommend doing this (a configuration sketch follows the list):
Have more RAM in your machine than the size of the database.
Define shared_buffers big enough to contain the database (on Linux, define memory hugepages to contain them).
Increase checkpoint_timeout and max_wal_size to get fewer checkpoints.
Set synchronous_commit = off to keep PostgreSQL from syncing WAL to disk on every commit.
If you are happy to lose all your data in the case of a crash, define your tables UNLOGGED. The data will survive a normal shutdown.
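A minimal configuration sketch of the settings above, assuming the machine has more RAM than the database; all values are placeholders to size for your own system:

    -- Placeholder values; adjust to your own machine and database.
    ALTER SYSTEM SET shared_buffers = '16GB';      -- big enough to hold the database (needs a restart)
    ALTER SYSTEM SET huge_pages = 'try';           -- on Linux, after reserving hugepages in the OS
    ALTER SYSTEM SET checkpoint_timeout = '1h';    -- fewer checkpoints
    ALTER SYSTEM SET max_wal_size = '100GB';       -- fewer checkpoints
    ALTER SYSTEM SET synchronous_commit = off;     -- don't wait for WAL flush on every commit
    SELECT pg_reload_conf();                       -- shared_buffers and huge_pages only take effect after a restart

    -- Only if losing the table's contents after a crash is acceptable:
    CREATE UNLOGGED TABLE measurements (id bigint, payload text);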
Anyway, to answer your questions:
(a) You should set seq_page_cost and random_page_cost way lower to tell PostgreSQL how fast your storage is.
(b) You could run backups with either pg_dump or pg_basebackup; they don't care what kind of storage you have.
when reading the query plan, will it still differentiate between disk and buffers hits?
It never distinguished between them in the first place. It distinguishes between "hit" and "read", but "read" cannot tell you which pages truly came from disk and which came from the OS/filesystem cache.
PostgreSQL has no idea you are running on a RAM disk, so will continue to report those as it always has.
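For example (table name and numbers are made up), the buffer counters look the same regardless of what the data files are stored on:

    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM some_table WHERE id < 1000;
    -- Illustrative fragment of the output:
    --   Buffers: shared hit=120 read=340
    -- "hit"  = pages found in shared_buffers
    -- "read" = pages requested from the kernel; PostgreSQL cannot tell whether they
    --          came from the OS cache, a RAM disk, or a physical disk.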
If so, should I assume that both are equally expensive or should I assume that there is a difference between them?
This is a question that should be answered by your benchmarking. On some systems, data can be prefetched from main memory into the faster CPU caches, making sequential reads faster than random reads even in RAM. If you care, you will have to benchmark it on your own system.
Reading data from RAM into shared_buffers is still surprisingly expensive due to things like lock management. So as a rough starting point, maybe seq_page_cost=0.1 and random_page_cost=0.15.
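As a sketch, using just those rough starting values:

    -- Starting points for storage that behaves like RAM; benchmark and adjust.
    ALTER SYSTEM SET seq_page_cost = 0.1;
    ALTER SYSTEM SET random_page_cost = 0.15;
    SELECT pg_reload_conf();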
a RAM disk is not persistent, but if I want to export some results to persistent storage, are there some precautions I would need to take?
The risk would be that your system crashes before the export has finished. But what precaution can you take against that?
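Mechanically, the export itself works the same as on any other storage; a sketch with placeholder names, where the results only count as safe once the command has returned and the target filesystem is durable:

    -- 'results' and '/persistent/results.csv' are placeholders; server-side COPY
    -- writes as the postgres OS user, which needs write access to that path.
    COPY (SELECT * FROM results) TO '/persistent/results.csv' WITH (FORMAT csv, HEADER);

    -- Or, from psql, write to the client machine's filesystem instead:
    -- \copy (SELECT * FROM results) TO 'results.csv' WITH (FORMAT csv, HEADER)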

Postgresql Aurora DB freeable_memory

I have a question regarding the freeable memory metric for AWS Aurora PostgreSQL.
We recently wanted to create an index on one of our DBs; the DB died and failed over to the replica, which all worked fine. It looks like the freeable memory dropped by the configured 500 MB of maintenance_work_mem and thereby went down to around 800 MB, and right after that the 32 GB instance died.
1) I am wondering whether the freeable memory is the overall system memory, and whether low memory here could invoke the system OOM killer on the AWS Aurora instance? If so, we may want to plan in more headroom for operational tasks and for running autovacuum jobs so we don't run into this issue again.
2) The actual work of the index creation should then have used the free local storage as far as I understood, so the size of the index shouldn't have mattered, right?
Thanks in advance,
Chris
Regarding 1)
Freeable Memory (from https://forums.aws.amazon.com/thread.jspa?threadID=209720):
The freeable memory includes the amount of physical memory left unused
by the system plus the total amount of buffer or page cache memory
that are free and available.
So it's freeable memory across the entire system. While MySQL is the
main consumer of memory on the host we do have internal processes in
addition to the OS that use up a small amount of additional memory.
If you see your freeable memory near 0 or also start seeing swap usage
then you may need to scale up to a larger instance class or adjust
MySQL memory settings. For example, decreasing the
innodb_buffer_pool_size (by default set to 75% of physical memory) is
one way of adjusting MySQL memory settings.
That also means that if memory gets low, it's likely to impact the process in some form. In this thread (https://forums.aws.amazon.com/thread.jspa?messageID=881320&#881320), for example, it was mentioned that it caused the MySQL server to restart.
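One operational mitigation, as a sketch rather than Aurora-specific advice (table and index names are placeholders): lower maintenance_work_mem just for the session that builds the index, so the build spills to temporary files instead of eating into the instance's free memory:

    -- Lower the per-operation memory budget only for this session; the index build
    -- will spill to temporary files on local storage instead of holding 500 MB in RAM.
    SET maintenance_work_mem = '128MB';
    CREATE INDEX CONCURRENTLY idx_events_created_at ON events (created_at);
    RESET maintenance_work_mem;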
Regarding 2)
This is how it is described in the documentation (https://aws.amazon.com/premiumsupport/knowledge-center/postgresql-aurora-storage-issue/), so I guess that's right and the size shouldn't have mattered.
Storage used for temporary data and logs (local storage). All DB
temporary files (for example, logs and temporary tables) are stored in
the instance local storage. This includes sorting operations, hash
tables, and grouping operations that are required by queries.
Each Aurora instance contains a limited amount of local storage that
is determined by the instance class. Typically, the amount of local
storage is twice the amount of memory on the instance. If you perform
a sort or index creation operation that requires more memory than is
available on your instance, Aurora uses the local storage to fulfill
the operation.

Why is a program executed from memory and not from the hard disk?

While studying computer architecture and systems programming, a question came up.
A program is stored on an SSD or hard disk, but when it is executed it is loaded into memory (RAM). Why is the program not executed directly from the hard disk? Why does it need to be loaded into RAM?
Thanks
This is simply because your RAM is way faster than your hard disk.
When your computer executes a program, the CPU reads all the instructions from memory one after another and executes them. The CPU itself cannot store the whole program while executing it, so it has to be read from somewhere else. If the CPU had to read the instructions from a hard disk, it would be crazy slow.
Now that we have SSDs this is becoming somewhat less relevant, but in the old days the difference between RAM ("Random Access Memory") and an HDD ("Hard Disk Drive") was that RAM could access any memory address at any point in time, hence "random access". An HDD has to rotate the platter your data is stored on to read from a certain address, so accessing random addresses is very slow for it.
When the CPU executes a program it has to jump around all the time. It also has to store the program's memory somewhere and access it as quickly as possible whenever needed. An HDD is very bad at both of those things; RAM is very good at them.
So why did we use HDDs at all? Because RAM
is way too expensive
does not persist data when turned off
And what about SSDs? They are a lot better at random access than HDDs, but they're still considerably slower than RAM.
Also, you have to take swap files into account. The computer can use some of your HDD or SSD storage as system memory if it needs to. This can be very useful if the data that's using up your RAM does not get accessed by the CPU very often.

Performance benchmarks for attaching read-only disks to google compute engine

Has anyone benchmarked the performance of attaching a singular, read-only disk to multiple Google Compute Engine instances (i.e., the same disk in read-only mode)?
The Google documentation (https://cloud.google.com/compute/docs/disks/persistent-disks#use_multi_instances) indicates that it is OK to attach multiple instances to the same disk, and personal experience has shown it to work at a small scale (5 to 10 instances), but soon we will be running a job across 500+ machines (GCE instances). We would like to know how performance scales as the number of parallel attachments grows, and as the bandwidth of those attachments grows. We currently pull down large blocks of data (read-only) from Google Cloud Storage buckets, and are wondering about the merits of switching to a Standard Persistent Disk configuration. This involves terabytes of data, so we don't want to change course willy-nilly.
One important consideration: It is likely that code on each of the 500+ machines will try to access the same file (400MB) at the same time. How do buckets and attached drives compare in that case? Maybe the answer is obvious - and it would save having to set up a rigorous benchmarking system (across 500 machines) ourselves. Thanks.
Persistent disks on GCE should have consistent performance. Currently that is 12 MB/s and 30 IOPS per 100 GB of volume size for a standard persistent disk:
https://cloud.google.com/compute/docs/disks/persistent-disks#pdperformance
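As a rough worked example using those per-100 GB figures (assuming they still apply to your disk type, and ignoring any per-instance caps): throughput scales linearly with volume size, i.e. about 12 MB/s * (size / 100 GB), so a 1 TB standard persistent disk would give roughly 120 MB/s and 300 IOPS, shared across every instance that has it attached.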
Using it on multiple instances should not change the disk's overall performance. It will, however, make it easier to use up those limits, since no single instance needs to reach its maximum read speed on its own. However, accessing the same data many times at once might. I don't know how either persistent disks or GCS handle contention.
If it is only a 400 MB file that is in contention, it may make sense to just benchmark the fastest method to deliver it separately. One possible solution is to make duplicates of your critical file and have each node pick which copy to access at random. This should cause fewer nodes to contend for each file.
Duplicating the critical file means a bigger disk, and a bigger disk also means better I/O performance. If you already intended to increase your volume size for better performance, the copies are effectively free.

How to keep 32 bit mongodb memory usage down on changing dataset

I'm using MongoDB on a 32 bit production system, which sucks but it's out of my control right now. The challenge is to keep the memory usage under ~2.5GB since going over this will cause 32 bit systems to crash.
According to the MongoDB team, the best way to track the memory usage is to use your operating system's process tracking tools (i.e. ps or htop on Unix systems, Process Explorer on Windows) and watch the virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. receiving data at regular intervals from sensors, and every day a cron job wipes all data older than three days. Over time, the memory usage slowly increases. I took some notes over time using db.serverStatus(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has shrunk in the last month but the memory usage increased nonetheless.
Now, there is some scope for adjustment in how many days of data I store. Today I deleted basically half of the data and then restarted MongoDB, and yet the mem virtual / mem mapped figures and, most importantly, the memory usage according to ps have hardly changed! Why do these not shrink when I wipe data (and restart)? I read some other questions where people said that Mongo isn't really using all the memory that it might appear to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5 GB limit?
Unless there is a way to stem this gradual increase in memory usage, irrespective of dataset size, it seems to me that the 32-bit version of Mongo is unusable. Note: I don't mind losing a bit of performance if it solves the problem.
To answer the question of why the mapped and virtual memory usage does not decrease with the deletes: the mapped number is what you get when you mmap() the entire set of data files. This does not shrink when you delete records because, although the space is freed up inside the data files, the files themselves are not reduced in size; they are just emptier afterwards.
Virtual will include journal files, connections, and other non-data-related memory usage as well, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So, the 2 GB storage size limitation on 32-bit will actually apply to the data files whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and will require the database to be offline/unavailable while it runs. It will also need up to 2x the original size in free disk space, since it essentially rewrites the data files from scratch.
This limitation, and the problems it causes, is why the 32-bit version should not be run in production; it is just not suitable. I would recommend moving to a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the available memory it can in terms of resident memory (which is why you should always have swap configured, so that the OOM killer does not come into play).
Once that is used up, the OS does not stop allocating memory; it just pages out the oldest items to make room for the new data (LRU). In other words, the recycling of memory is done for you, and the resident memory level remains fairly constant.
Your options for stretching 32-bit are limited, but you can try some things. What you run out of is address space, and because each additional database file is larger than the last, you want to avoid crossing the boundary from "n" files to "n+1". It may be worth structuring your data into more, smaller databases so that you can fit the maximum amount of actual data into memory with as little "dead space" as possible.
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.
It is a well-known message that 32-bit should not be used in production.
Use a 64-bit system.
Period.