Choosing size of Cloud SQL Instance - google-cloud-sql

I have a MySQL database whose tables Weight in at 40GB, and when I run the command "top -c" it shows that MySQL is using 14GB of RAM.
In order to migrate this database to Cloud SQL, do I have to maintain the same amount of ram? Or does the way that Cloud SQL manage indexes and memory mean that I can choose a server is less RAM? Are there tools that will allow me to know if performance is suffering due to lack of RAM?
Thanks for your feedback
Regards,
Marcelo

In general, you would not have to maintain that same amount of RAM. MySQL is caching data in RAM for fast access, so less RAM = slower queries.
Before migrating to the cloud, I'd run some test using MySQL benchmarks, which should give you some empirical numbers to help make your decision as to the amount of RAM you need.

Related

AWS RDS I/O usage very different from Postgres blocks read

I created a test Postgres database in AWS RDS. Created a 100 million row, 2 column table. Ran select * on that table. Postgres reports "Buffers: shared hit=24722 read=521226" but AWS reports IOPS in the hundreds. Why this huge discrepancy? Broadly, I'm trying to figure out how to estimate the number of AWS I/O operations a query might cost.
PostgreSQL does not have insight into what the kernel/FS get up to. If PostgreSQL issues a system call to read the data, then it reports that buffer as "read". If it was actually served out of the kernel's filesystem cache, rather than truly from disk, PostgreSQL has no way of knowing that (although you can make some reasonable statistical guesses if track_io_timing is on), while AWS's IO monitoring tool would know.
If you set shared_buffers to a large fraction of memory, then there would be little room left for a filesystem cache, so most buffers reported as read should truly have been read from disk. This might not be a good way run the system, but it might provide some clarity to your EXPLAIN plans. I've also heard rumors that Amazon Aurora reimplemented the storage system so that it uses directIO, or something similar, and so doesn't use the filesystem cache at all.

Postgres why is swap-usage growing? How to reduce it? - AWS RDS

Having a postgres DB on AWS-RDS the Swap Usage in constantly rising.
Why is it rising? I tried rebooting but it does not sink. AWS writes that high swap usage is "indicative of performance issues"
I am writing data to this DB. CPU and Memory do look healthy:
To be precise i have a
db.t2.micro-Instance and at the moment ~30/100 GB Data in 5 Tables - General Purpose SSD. With the default postgresql.conf.
The swap-graph looks as follows:
Swap Usage warning:
Well It seems that your queries are using a memory volume over your available. So you should look at your queries execution plan and find out largest loads. That queries exceeds the memory available for postgresql. Usually over-much joining (i.e. bad database structure, which would be better denonarmalized if applicable), or lots of nested queries, or queries with IN clauses - those are typical suspects. I guess amazon delivered as much as possible for postgresql.conf and those default values are quite good for this tiny machine.
But once again unless your swap size is not exceeding your available memory and your are on a SSD - there would be not that much harm of it
check the
select * from pg_stat_activity;
and see if which process taking long and how many processes sleeping, try to change your RDS DBparameter according to your need.
Obviously you ran out of memory. db.t2.micro has only 1GB of RAM. You should look in htop output to see which processes takes most of memory and try to optimize memory usage. Also there is nice utility called pgtop (http://ptop.projects.pgfoundry.org/) which shows current queries, number of rows read, etc. You can use it to view your postgress state in realtime. By the way, if you cannot install pgtop you can get just the same information from posgres internal tools - check out documentation of postgres stats collector https://www.postgresql.org/docs/9.6/static/monitoring-stats.html
Actually it is difficult to say what the problem is exactly but db.t2.micro is a very limited instance. You should consider taking a biggier instance especially if you are using postgres in production.

design mongodb to load entire content in memory

I am involved in a project where they get enough RAM to store the entire database in memory. According to the manager, that is what 10Gen recommended. This is counter intuitive. Is that really the way you want to use Mongodb?
It is not counter intuitive... I find it quite intuitive, actually.
In How much faster is the memory usually than the disk? you can read:
(...) memory is only about 6 times faster when you're doing sequential
access (350 Mvalues/sec for memory compared with 58 Mvalues/sec for
disk); but it's about 100,000 times faster when you're doing random
access.
So if you can fit all your data in RAM, it is quite good because you are going to be really fast reading your data.
Regarding MongoDB, from the FAQ's:
It’s certainly possible to run MongoDB on a machine with a small
amount of free RAM.
MongoDB automatically uses all free memory on the machine as its
cache. System resource monitors show that MongoDB uses a lot of
memory, but its usage is dynamic. If another process suddenly needs
half the server’s RAM, MongoDB will yield cached memory to the other
process.
Technically, the operating system’s virtual memory subsystem manages
MongoDB’s memory. This means that MongoDB will use as much free memory
as it can, swapping to disk as needed. Deployments with enough memory
to fit the application’s working data set in RAM will achieve the best
performance.
The problem is that you usually have much more data than memory available. And then you have to go to disk, and disk I/O is slow. Regarding database performance, avoiding full scan queries is key (much more important when accessing to disk). Therefore, if your data set does not fit in memory, you should aim at having indexes for the vast majority of your access patterns and try to fit those indexes in memory:
If you have created indexes for your queries and your working data set
fits in RAM, MongoDB serves all queries from memory.
It all depends on the size of your database. I am guessing that you said your database was actually quite small, otherwise I cannot see how someone at 10gen gave such advice, I mean not even #Stennie gives such advice (he is 10gen by the way).
Even if your database is small I don't see how the manager recommended that. MongoDB does not do memory management of its own as such it does not "pin" data into pages like memcached does or other memory based databases do.
This means that the paging of mongods data can be quite unpredicatable, a.k.a you will spend more time trying to keep things in RAM than paging in data. This is why it is better to just make sure your working set fits and it can loaded with speed, such things are based upon your hardware and queries.
#Stennies comment pretty much sums up the stance you should be taking with MongoDB.

PostgreSQL Performance: keep seldomly used small database in memory while server busy with big database

I have a server with 64GB RAM and PostgreSQL 9.2. On it is one small database "A" with only 4GB which is only queried once an hour or so and one big database "B" with about 60GB which gets queried 40-50x per second!
As expected, Linux and PostgreSQL fill the RAM with the bigger database's data as it is more often accessed.
My problem now is that the queries to the small database "A" are critical and have to run in <500ms. The logfile shows a couple of queries per day that took >3s though. If I execute them by hand they, too, take only 10ms so my indexes are fine.
So I guess that those long runners happen when PostgreSQL has to load chunks of the small databases indexes from disk.
I already have some kind of "cache warmer" script that repeats "SELECT * FROM x ORDER BY y" queries to the small database every second but it wastes a lot of CPU power and only improves the situation a little bit.
Any more ideas how to tell PostgreSQL that I really want that small database "sticky" in memory?
PostgreSQL doesn't offer a way to pin tables in memory, though the community would certainly welcome people willing to work on well thought out, tested and benchmarked proposals for allowing this from people who're willing to back those proposals with real code.
The best option you have with PostgreSQL at this time is to run a separate PostgreSQL instance for the response-time-critical database. Give this DB a big enough shared_buffers that the whole DB will reside in shared_buffers.
Do NOT create a tablespace on a ramdisk or other non-durable storage and put the data that needs to be infrequently but rapidly accessed there. All tablespaces must be accessible or the whole system will stop; if you lose a tablespace, you effectively lose the whole database cluster.

Configuration Recommendations for a PostgreSQL Installation

I have a Windows Server 2003 machine which I will be using as a Postgres database server, the machine is a Dual Core 3.0Ghz Xeon with 4 GB ECC Memory and 4 x 120GB 10K RPM SAS Drives, all stripped.
I have read that the default Postgres install is configured to run nicely on a 486 with 32MB RAM, and I have read several web pages about configuration optimizations - but was hoping for something more concrete from my Stackoverflow peeps.
Generally, its only going to serve 1 database (potentially one or two more) but the catch is that the database has 1 table in particular which is massive (hundreds of millions of records with only a few coloumn). Presently, with the default configuration, it's not slow, but I think it could potentially be even faster.
Can people please give me some guidance and recomendations for configuration settings which you would use for a server such as this.
4*stripped drive was a bad idea — if any of this drives will fail then you'll loose all data, and even SAS drives sometimes fail — with 4 drivers it is 4 times more likely than with 1 drive; you should go for RAID 1+0.
Use the latest version of Postgres, 8.3.7 now; there are many performance improvements in every major version.
Set shared_buffers parameter in postgresql.conf to about 1/4 of your memory.
Set effective_cache_size to about 1/2 of your memory.
Set checkpoint_segments to about 32 (checkpoint every 512MB) and checkpoint_completion_target to about 0.8.
Set default_statistics_target to about 100.
Migrate to Enterprise Linux or FreeBSD: Postgres works much better on Unix type systems — Windows support is a recent addition, not very mature.
You can read more on this page: Tuning Your PostgreSQL Server — PostgreSQL Wiki
My experience suggests that (within limits) the hardware is typically the least important factor in database performance.
Assuming that you have enough memory to keep commonly used data in cache, then your CPU speed may vary 10-50% between a top-of the line machine and a common or garden box.
However, a missing index in an important search, or a poorly written recursive trigger could easily make a difference of 1,000% or 10,000% or more in your response times.
Without knowing exactly your table structure and row counts, I think anybody would suggest that your your hardware looks amply sufficient. It is only your database structure which will kill you. :)
UPDATE:
Without knowing the specific queries and your index details, there's not much more we can do. And in general, even knowing the queries, it's often very difficult to optimize without actually installing and running the queries with realistic data sets.
Given the cost of the server, and the cost of your time, I think you need to invest thirty bucks in a book. Then install your database with test data, run the queries, and see what runs well and what runs badly. Fix, rinse, and repeat.
Both of these books are specific to SQL Server and both have high ratings:
http://www.amazon.com/Inside-Microsoft%C2%AE-SQL-Server-2005/dp/0735621969/ref=sr_1_1
http://www.amazon.com/Server-Performance-Tuning-Distilled-Second/dp/B001GAQ53E/ref=sr_1_5