GCloud SQL - unexpected disk consumption - google-cloud-sql

I've got a MySQL 5.7 Cloud SQL instance in the europe-west1 region with 1 vCPU and 4 GB RAM, without redundancy enabled.
My DB was never above 30 GB, but its disk usage suddenly grew to about 11 TB.
I logged in to the instance and noticed that the DB size is only 15.8 GB.
Can somebody explain what happened and how I can reduce the space usage, which costs me $70/day?

If you don't see abnormal growth in the binary logs when running SHOW BINARY LOGS; check the size of the temporary tablespace by running the following command:
SELECT FILE_NAME, TABLESPACE_NAME, ENGINE, INITIAL_SIZE, TOTAL_EXTENTS*EXTENT_SIZE
AS TotalSizeBytes, DATA_FREE, MAXIMUM_SIZE FROM INFORMATION_SCHEMA.FILES
WHERE TABLESPACE_NAME = 'innodb_temporary'\G
You can also check the size of the general log, if you enabled it with log_output set to TABLE, by running:
SELECT ROUND(SUM(LENGTH(argument))/POW(1024,2),2) AS MB FROM mysql.general_log;
If you want to solve this issue quickly to avoid more charges, you can try restarting your instance (if temporary files are filling your disk, this will delete them). Then export your database to a new instance with a smaller disk and delete your old instance.
You can also contact Google Cloud support if you need further help with your Cloud SQL instance.
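To judge whether the binary logs explain the disk usage, it can help to total the File_size column that SHOW BINARY LOGS returns. Here is a minimal Python sketch of that arithmetic. The sample rows are made up for illustration, and the helper names are my own, not part of any MySQL client library.

```python
def total_binlog_bytes(rows):
    """Sum the File_size column of rows shaped like SHOW BINARY LOGS output.

    Each row is a sequence whose second element is the file size in bytes
    (Log_name, File_size, ...).
    """
    return sum(row[1] for row in rows)


def format_gib(n_bytes):
    """Render a byte count as GiB with two decimals."""
    return f"{n_bytes / 1024**3:.2f} GiB"


# Hypothetical sample rows; real values come from SHOW BINARY LOGS.
sample_rows = [
    ("mysql-bin.000001", 1073741824),
    ("mysql-bin.000002", 536870912),
]

print(format_gib(total_binlog_bytes(sample_rows)))
```

If the total is close to the disk usage reported in the console, the binary logs are the likely culprit. If it is far below, look at the temporary tablespace and the general log instead.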

Related

Cloud SQL - Growing each day, but not replicating

I've had a replica slave set up for about two weeks now. It has been failing replication due to configuration issues, but still growing in the size of the master each day (about 5gb a day).
Until today, binary logs were disabled. And if I go to Monitoring -> slave instance, under Backup Configuration, it says "false".
How do I determine why this is growing each day?
I noticed in Monitoring, in the InnoDB Pages Read/Write section, that there are upticks of writes each day but no reads. But what is it writing to? The DB hasn't changed, and there are no binary logs.
I noticed in the docs, it says "Point-in-time recovery is enabled by default when you create a new Cloud SQL instance."
But there has never been a "Backup" listed in the Operations list on the instance. And when I do gcloud sql instances describe my-instance, it's not listed under backUpConfiguration
The issue you are having is possibly caused by point-in-time recovery, which makes your storage usage grow constantly.
You can keep automated backups enabled while disabling point-in-time recovery. Once you disable it, the binary logs will be deleted and you will notice an immediate reduction in storage usage.
Here are the steps to disable Point-in-time recovery:
Select your instance
Select Backups
Under Settings, select Edit
Uncheck box for point-in-time recovery
For an explanation of point-in-time recovery, see the Google Cloud SQL documentation for Postgres and MySQL.
Point-in-time recovery requires archiving the WAL files on instances where it is enabled. This archiving is done automatically on the backend and consumes storage space (even if the instance is idle), so using this feature increases the storage usage of your DB instance.
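If you prefer the command line to the console steps above, the same change can be scripted. The Python sketch below only builds the gcloud command. It assumes the flags --no-enable-bin-log (MySQL) and --no-enable-point-in-time-recovery (PostgreSQL), which is how I understand the gcloud CLI; please check gcloud sql instances patch --help before running anything. The function name is my own.

```python
import shlex

# Flag names as I understand the gcloud CLI (an assumption, verify with
# `gcloud sql instances patch --help`).
_DISABLE_FLAGS = {
    "mysql": "--no-enable-bin-log",
    "postgres": "--no-enable-point-in-time-recovery",
}


def disable_pitr_command(instance, engine):
    """Return the argv list that would disable point-in-time recovery."""
    try:
        flag = _DISABLE_FLAGS[engine]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine!r}") from None
    return ["gcloud", "sql", "instances", "patch", instance, flag]


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(disable_pitr_command("my-instance", "mysql")))
```

The list can be passed to subprocess.run once you are happy with it. Keep in mind that patching an instance may restart it.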

RDS Serverless - Could not verify and start postgres

For the last few days, I've been having this weird issue with my Serverless Postgres RDS.
After deploying new code to the backend service, the RDS server becomes unavailable. The only logs I could find are these:
Freeable Memory (MB):
The only document I found is this one, which says AWS is working on fixing this issue.
Any help will be much appreciated.
As per the AWS Blog on RDS serverless best practices:
Aurora Serverless scales up when capacity constraints are seen in CPU or connections. However, finding a scaling point can take time (see the Scale-blocking operations section). If there is a sudden spike in requests, you can overwhelm the database. Aurora Serverless might not be able to find a scaling point and scale quickly enough due to a shortage of resources.
The error "Error restarting database: Unable to find shared memory value in the postgres.log file from pg_ctl getSharedMemory command" most likely relates to a memory allocation issue.
The best way to handle it would be to keep a higher minimum memory allocation as a buffer when you expect load on the server.

Google Cloud SQL - Postgresql storage keeps growing

I've recently started tinkering with Google Cloud SQL - PostgreSQL.
I have created an empty database, and over 4-5 days its storage usage has grown to over 20 GB. It just keeps going up, but there is no data in the database. It's not even being used.
Does anyone know what would be doing this and how to stop it?
Yes, this is most likely due to point-in-time recovery, which increases your storage usage every few minutes. You can keep automated backups enabled while disabling point-in-time recovery. Once you disable it, the binary logs will be deleted and you will notice an immediate reduction in storage usage. That said, according to the documentation: "The binary logs are automatically deleted with their associated automatic backup, which generally happens after about 7 days."
To disable point-in-time recovery:
Select your instance
Select Backups
Under Settings select Edit
Uncheck box for point-in-time recovery
Most likely you have turned on the automated backups setting. You can confirm this by clicking the Backups tab in your Cloud SQL instance. Be careful with disabling and deleting backups in case you start using your database later!

AWS RDS with Postgres : Is OOM killer configured

We are running load test against an application that hits a Postgres database.
During the test, we suddenly get an increase in error rate.
After analysing the platform and application behaviour, we notice that:
CPU of Postgres RDS is 100%
Freeable memory drops on this same server
And in the postgres logs, we see:
2018-08-21 08:19:48 UTC::#:[XXXXX]:LOG: server process (PID XXXX) was terminated by signal 9: Killed
After investigating and reading documentation, it appears one possibility is that the Linux OOM killer ran and killed the process.
But since we're on RDS, we cannot access the system logs (/var/log/messages) to confirm.
So can somebody:
confirm that oom killer really runs on AWS RDS for Postgres
give us a way to check this ?
give us a way to compute max memory used by Postgres based on number of connections ?
I didn't find the answer here:
http://postgresql.freeideas.cz/server-process-was-terminated-by-signal-9-killed/
https://www.postgresql.org/message-id/CAOR%3Dd%3D25iOzXpZFY%3DSjL%3DWD0noBL2Fio9LwpvO2%3DSTnjTW%3DMqQ%40mail.gmail.com
https://www.postgresql.org/message-id/04e301d1fee9%24537ab200%24fa701600%24%40JetBrains.com
AWS maintains a page with best practices for their RDS service: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html
In terms of memory allocation, this is their recommendation:
An Amazon RDS performance best practice is to allocate enough RAM so
that your working set resides almost completely in memory. To tell if
your working set is almost all in memory, check the ReadIOPS metric
(using Amazon CloudWatch) while the DB instance is under load. The
value of ReadIOPS should be small and stable. If scaling up the DB
instance class—to a class with more RAM—results in a dramatic drop in
ReadIOPS, your working set was not almost completely in memory.
Continue to scale up until ReadIOPS no longer drops dramatically after
a scaling operation, or ReadIOPS is reduced to a very small amount.
For information on monitoring a DB instance's metrics, see Viewing DB Instance Metrics.
Also, this is their recommendation for troubleshooting possible OS issues:
Amazon RDS provides metrics in real time for the operating system (OS)
that your DB instance runs on. You can view the metrics for your DB
instance using the console, or consume the Enhanced Monitoring JSON
output from Amazon CloudWatch Logs in a monitoring system of your
choice. For more information about Enhanced Monitoring, see Enhanced
Monitoring
There are a lot of good recommendations there, including query tuning.
Note that, as a last resort, you could switch to Aurora, which is compatible with PostgreSQL:
Aurora features a distributed, fault-tolerant, self-healing storage
system that auto-scales up to 64TB per database instance. Aurora
delivers high performance and availability with up to 15 low-latency
read replicas, point-in-time recovery, continuous backup to Amazon S3,
and replication across three Availability Zones.
EDIT: talking specifically about your issue w/ PostgreSQL, check this Stack Exchange thread -- they had a long connection with auto commit set to false.
We had a long connection with auto commit set to false:
connection.setAutoCommit(false)
During that time we were doing a lot
of small queries and a few queries with a cursor:
statement.setFetchSize(SOME_FETCH_SIZE)
In JDBC you create a connection object, and from that connection you
create statements. When you execute the statements you get a result
set.
Now, every one of these objects needs to be closed, but if you close
a statement, its result set is closed, and if you close the connection
all the statements are closed and their result sets.
We were used to short living queries with connections of their own so
we never closed statements assuming the connection will handle the
things once it is closed.
The problem was now with this long transaction (~24 hours) which never
closed the connection. The statements were never closed. Apparently,
the statement object holds resources both on the server that runs the
code and on the PostgreSQL database.
My best guess to what resources are left in the DB is the things
related to the cursor. The statements that used the cursor were never
closed, so the result set they returned never closed as well. This
meant the database didn't free the relevant cursor resources in the
DB, and since it was over a huge table it took a lot of RAM.
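To illustrate the point of that thread: on a long-lived connection, each statement or cursor should be closed as soon as you are done with it, rather than waiting for the connection to close. In JDBC that usually means try-with-resources. The sketch below shows the same discipline in Python's DB-API instead, with an in-memory sqlite3 database standing in for PostgreSQL. The table and helper names are made up for the example.

```python
import sqlite3
from contextlib import closing


def make_demo_connection():
    """Create an in-memory database with a few rows (stand-in for Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)]
    )
    return conn


def fetch_names(connection, batch_size=2):
    """Read rows in batches and close the cursor as soon as we are done."""
    names = []
    # closing() guarantees cur.close() runs even if an exception is raised,
    # so cursor resources are released while the connection stays open.
    with closing(connection.cursor()) as cur:
        cur.arraysize = batch_size  # rows returned per fetchmany() call
        cur.execute("SELECT name FROM items ORDER BY id")
        while True:
            batch = cur.fetchmany()
            if not batch:
                break
            names.extend(row[0] for row in batch)
    return names


with closing(make_demo_connection()) as conn:
    print(fetch_names(conn))
```

The connection can then live as long as it needs to, because nothing opened on it outlives the function that opened it.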
Hope it helps!
TLDR: If you need PostgreSQL on AWS and you need rock solid stability, run PostgreSQL on EC2 (for now) and do some kernel tuning for overcommitting
I'll try to be concise, but you're not the only one who has seen this and it is a known (internal to Amazon) issue with RDS and Aurora PostgreSQL.
OOM Killer on RDS/Aurora
The OOM killer does run on RDS and Aurora instances because they are backed by Linux VMs and the OOM killer is an integral part of the kernel.
Root Cause
The root cause is that the default Linux kernel configuration assumes that you have virtual memory (swap file or partition), but EC2 instances (and the VMs that back RDS and Aurora) do not have virtual memory by default. There is a single partition and no swap file is defined. When Linux thinks it has virtual memory, it uses a strategy called "overcommitting", which means that it allows processes to request and be granted a larger amount of memory than the amount of RAM the system actually has. Two tunable parameters govern this behavior:
vm.overcommit_memory - governs whether the kernel allows overcommitting (0 = heuristic overcommit, the default; 1 = always overcommit; 2 = never overcommit beyond the commit limit)
vm.overcommit_ratio - the percentage of RAM that counts toward the commit limit when vm.overcommit_memory = 2; the limit is swap plus that percentage of RAM. If you have 8 GB of RAM and 8 GB of swap, and your vm.overcommit_ratio = 75, the kernel will grant up to 14 GB of memory to processes (8 GB of swap + 75% of 8 GB of RAM).
We set up an EC2 instance (where we could tune these parameters) and the following settings completely stopped PostgreSQL backends from getting killed:
vm.overcommit_memory = 2
vm.overcommit_ratio = 75
vm.overcommit_memory = 2 tells Linux not to overcommit (work within the constraints of system memory), and vm.overcommit_ratio = 75 tells Linux not to grant requests for more than 75% of memory (only allow user processes to get up to 75% of memory).
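For anyone who wants to check the numbers, here is a small Python sketch of the commit limit arithmetic. It assumes the formula from the kernel's overcommit accounting as I understand it (swap plus overcommit_ratio percent of RAM, ignoring huge pages). The function name and the 16 GiB instance size are just examples.

```python
GIB = 1024**3


def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio):
    """Approximate CommitLimit when vm.overcommit_memory = 2.

    Assumed formula: swap + RAM * overcommit_ratio / 100 (huge pages ignored).
    """
    if overcommit_ratio < 0:
        raise ValueError("overcommit_ratio must be non-negative")
    return swap_bytes + ram_bytes * overcommit_ratio // 100


# Hypothetical EC2 host: 16 GiB of RAM, no swap, vm.overcommit_ratio = 75.
limit = commit_limit_bytes(16 * GIB, 0, 75)
print(f"{limit / GIB:.1f} GiB")
```

On a live host you can compare the result with the CommitLimit line in /proc/meminfo. With no swap, the limit is simply 75% of RAM, which leaves the remaining quarter for the kernel and page cache.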
We have an open case with AWS and they have committed to coming up with a long-term fix (using kernel tuning params or cgroups, etc) but we don't have an ETA yet. If you are having this problem, I encourage you to open a case with AWS and reference case #5881116231 so they are aware that you are impacted by this issue, too.
In short, if you need stability in the near term, use PostgreSQL on EC2. If you must use RDS or Aurora PostgreSQL, you will need to oversize your instance (at additional cost to you) and hope for the best as oversizing doesn't guarantee you won't still have the problem.

MongoDB - Quota exceeded error code: 12501

I'm using a MongoDB Sandbox database (version 3.2.12) deployed on mLab. I have a strange issue when adding/inserting records (from the shell and also from the application).
I have not set any quota limits to the database files.
Error Message:
"error message: quota exceeded error code: 12501"
But I have sufficient memory in the database, and I am able to add/insert records in collections other than the "xyz" collection.
However, I removed some records from the "xyz" collection to free up memory and tried again to insert records. E.g. after removing 2 records from the collection, only 2 records could be inserted.
Can I add more files to my quota? If yes, how?
Is there any way to debug this? Or any other solution without dropping the database?
Please try repairing your database.
Sometimes it’s necessary to compact your database in order to reclaim disk space (e.g., are you quickly approaching your storage limits?) and/or reduce fragmentation. When you compact your database, you are effectively reducing its file size.
Compacting Sandbox and single-node plan deployments
If you are on a Sandbox or single-node plan and would like to try to
reclaim disk space, you can use MongoDB’s repairDatabase command.
If your fileSize or “Size on Disk” is under 1.5 GB, you can run this
repair command directly through our UI by visiting the page for your
database, clicking on the “Tools” tab next, and then selecting
“repairDatabase” from the drop-down list. Otherwise, you can run the
db.repairDatabase() command after connecting to your database with the
mongo shell.
https://docs.mlab.com/ops/#compacting
The repairDatabase command is a blocking operation. Your database will be unavailable until the repair is complete.
It may take around 20-30 seconds.