Cloud SQL disk size is much larger than actual database - google-cloud-sql

Cloud SQL reports that I've used ~4TB of SSD storage, but my database is only ~225 GB. What explains this discrepancy? Is there something I can delete to free up space? If I moved it to a different instance, would the required storage go down?

There are a couple of possible reasons why your Cloud SQL storage has increased:
- Did you enable point-in-time recovery? PITR relies on write-ahead logs, and if the feature is enabled, those logs could explain the growth.
- Have you created temporary tables and not deleted them?
If none of the above applies to you, I highly recommend opening a case with the GCP support team so they can take a look at your Cloud SQL instance.
You should also open a case to have the disk decreased to a smaller size; that way you don't need to create a new instance and copy all the data over, and since the shrinking is done on Google's end, the effort required from you is minimal.
Google can carry out this task during a maintenance window, which you may want to schedule to minimize the impact of the downtime. For this, you need to decide on the new disk size and when you would like the operation performed.
Finally, if you prefer the migration method, you would export the database, create the new instance, import the database, and then synchronize the old instance with the new one so that both hold the same data; those four steps can take several hours to complete.

You don't say which database engine you're using. In my case, with a MySQL database, several hundred GB turned out to be binary logs (controlled by MySQL flags).
You could check with:
SHOW BINARY LOGS;
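If it's MySQL, a quick way to gauge the damage is to total up the File_size column from that statement. A minimal sketch, assuming the PyMySQL driver and placeholder connection details; on Cloud SQL you would then shorten binlog retention via database flags such as expire_logs_days (or binlog_expire_logs_seconds on MySQL 8.0), if available for your version, rather than purging files by hand:

# Sum up binary log sizes on a MySQL instance (host/user/password are placeholders).
import pymysql

conn = pymysql.connect(host="10.0.0.3", user="root", password="...")
with conn.cursor() as cur:
    cur.execute("SHOW BINARY LOGS")
    rows = cur.fetchall()
conn.close()

# Each row is (log_name, file_size_in_bytes[, encrypted]).
total = sum(row[1] for row in rows)
print(f"{len(rows)} binlog files, {total / 1024**3:.1f} GiB in total")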

Related

Google Cloud SQL - move instance from one project to another with zero downtime?

What is the easiest way to move a Google Cloud SQL instance (Postgres 9.6) from one Google project to another with minimum or zero downtime? The instance size is about 20 GB.
There is a service called "Migration job" which looks very relevant: https://cloud.google.com/database-migration/docs/postgres/create-migration-job . But I cannot tell whether it can be used to move an instance from one Google project to another.
Simply restoring from a backup is not really an option for me because I want the minimum possible downtime, so I'm looking for something like two running instances with data synced in real time.
PS. I also have a VM configured with pgbouncer.
Yes, Database Migration Service can be used to move a Cloud SQL instance from one GCP project to another. This is cheaper than the export/import approach described below, and although it requires more setup, it should also be faster. You create a connection profile for the existing Cloud SQL instance and a Cloud SQL target in the destination project, but once everything is set up, most of the migration is automatic. The procedure is well documented.
Developers sometimes want to migrate their (normal) relational database with "zero" downtime. While the downtime can be reduced, the migration cannot be done without any impact on applications, i.e. it is never truly zero. Replication introduces replication lag.
The instant the decision is made to “migrate” all applications from one replica to another, applications (and therefore developers) have to wait (that is, downtime) for at least as long as the “replication lag” before using the new database. In practice, the downtime is a few orders of magnitude higher (minutes to hours) because:
- Database queries can take multiple seconds to complete, and in-flight queries must be completed or aborted at the time of migration.
- The database has to be "warmed up" if it has substantial buffer memory, which is common in large databases.
- If database shards have duplicate tables, some writes may need to be paused while the shards are being migrated.
- Applications must be stopped at the source and restarted in GCP, and a connection to the GCP database instance must be established.
- Network routes to the applications must be rerouted. Depending on how DNS entries are set up, this can take some time.
All of these can be reduced with some planning and “cost” (some operations not permitted for some time before/after migration).
Decreasing the load on the source DB until the migration completes might help, and downtime might be less disruptive.
Other considerations:
- Increase the machine type to increase network throughput.
- Increase SSD size for higher IOPS/MBps.
The most intuitive way would be to export the data from the Cloud SQL instance to a GCS bucket and import it into a new instance in the new project. This implies some downtime, and you would have to manually create the instance in the target project with the same configuration as the original; it does require some manual steps, but it is a simple and verifiable way to copy the data to an instance in a different project.
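For the export step of that method, the Cloud SQL Admin API can write a SQL dump straight to a GCS bucket. A rough sketch, assuming the google-api-python-client library, the sqladmin v1beta4 API and Application Default Credentials; project, instance, bucket and database names are placeholders, and the import on the target project is the corresponding import call against the new instance:

# Trigger a SQL export from a Cloud SQL instance into a GCS bucket.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")
body = {
    "exportContext": {
        "fileType": "SQL",
        "uri": "gs://my-bucket/dump.sql.gz",
        "databases": ["mydb"],
    }
}
operation = (
    sqladmin.instances()
    .export(project="my-project", instance="my-instance", body=body)
    .execute()
)
# The call returns a long-running operation you can poll until it is DONE.
print(operation["name"], operation["status"])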

Google Cloud cloud storage operation costs

I am looking into using Google Cloud cloud storage buckets as a cheaper alternative to compute engine snapshots to store backups.
However, I am a bit confused about the costs per operation, specifically the insert operation. If I understand the documentation correctly, it doesn't seem to matter how large the file you insert is; it always counts as one operation.
So if I upload a single 20 TB file using one insert to a standard storage class bucket, wait 14 days, then retrieve it again, and all this within the same region, I practically only pay for storing it for 14 days?
Doesn't that mean that even the standard storage class bucket is a more cost effective option for storing backups compared to snapshots, as long as you can get your whole thing into a single file?
It's not entirely accurate, and it all depends on what the costs are for you.
First of all, the maximum size of an object in Cloud Storage is 5 TiB, so you can't store one 20 TB file; you'd need at least four objects, but in the end the principle is the same.
Persistent disk snapshots are a very powerful feature:
- A snapshot doesn't consume CPU, unlike your solution.
- A snapshot doesn't consume network bandwidth, unlike your solution.
- A snapshot can be taken anytime, on the fly.
- A snapshot can be restored onto the current VM, or you can create a new VM from it to investigate, for example.
- Snapshots are incremental, which saves money compared with full image snapshots.
- You don't need additional free space on your persistent disk (whereas your solution requires creating an archive before sending it to Cloud Storage).
In your scenario, snapshots look like the best solution in terms of time efficiency. Is Cloud Storage cheaper? Probably, since it is listed as the most affordable storage option, but in the end you will have to work out the cost-benefit yourself.
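For a rough feel of the numbers, the calculation is just size × price × time. A back-of-the-envelope sketch; the price per GB-month is a placeholder, so check the current pricing pages for your region (and for snapshot storage) before relying on it:

# Approximate cost of parking a ~20 TB backup in Standard storage for two weeks.
size_gb = 20 * 1024            # split into objects of at most 5 TiB each
price_per_gb_month = 0.020     # assumed USD per GB-month; verify for your region
days = 14

cost = size_gb * price_per_gb_month * (days / 30)
print(f"~${cost:,.0f} for {days} days of Standard storage")
# Operation charges for a handful of uploads are negligible next to this;
# egress or retrieval fees could apply with other regions or storage classes.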

AWS RDS I/O usage very different from Postgres blocks read

I created a test Postgres database in AWS RDS. Created a 100 million row, 2 column table. Ran select * on that table. Postgres reports "Buffers: shared hit=24722 read=521226" but AWS reports IOPS in the hundreds. Why this huge discrepancy? Broadly, I'm trying to figure out how to estimate the number of AWS I/O operations a query might cost.
PostgreSQL does not have insight into what the kernel/FS get up to. If PostgreSQL issues a system call to read the data, then it reports that buffer as "read". If it was actually served out of the kernel's filesystem cache, rather than truly from disk, PostgreSQL has no way of knowing that (although you can make some reasonable statistical guesses if track_io_timing is on), while AWS's IO monitoring tool would know.
If you set shared_buffers to a large fraction of memory, there would be little room left for a filesystem cache, so most buffers reported as read should truly have been read from disk. This might not be a good way to run the system, but it might provide some clarity in your EXPLAIN plans. I've also heard rumors that Amazon Aurora reimplemented the storage system so that it uses direct I/O, or something similar, and therefore doesn't use the filesystem cache at all.
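For a regular PostgreSQL instance, one way to see the filesystem cache at work is to turn on track_io_timing for the session and compare the buffer counts with the reported read time. A small sketch, assuming the psycopg2 driver, a placeholder table name, and a role permitted to change track_io_timing (on RDS you may need to enable it via the parameter group instead):

# Show buffer counts alongside the actual time spent in read() system calls.
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
with conn, conn.cursor() as cur:
    cur.execute("SET track_io_timing = on")
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table")
    for (line,) in cur.fetchall():
        print(line)
# "Buffers: shared hit=... read=N" counts the reads PostgreSQL asked the OS for;
# "I/O Timings: read=..." shows how long they took. Near-zero timings mean the
# kernel page cache, not the disk, satisfied most of those reads.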

dealing with full mongodb-atlas database

I have set up a free-tier MongoDB Atlas database and have a script that stores tweets in it. db.collection.stats() reports a storage size of 32768, which will fill up quite fast. Firstly, what happens when you exceed the limit? Are new entries rejected, or does something else happen? Secondly, is there a way to deal with this without upgrading? For example, is it possible to clear entries before exceeding capacity?
When you exceed the limit, the Atlas cluster node that has exceeded it will become unavailable. It is possible that the entire cluster will go down, in which case you will need to contact MongoDB support to bring it back up.
The best option is to upgrade to the next tier to get more storage capacity. If you don't want to do that, you can write a script that deletes old data from your cluster, and after deleting the data, run the compact command to reclaim the storage.
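A deletion script along those lines can be very small. A sketch assuming the PyMongo driver; the URI, database, collection and date field are placeholders, and it presumes each tweet document stores a BSON date such as created_at:

# Delete tweets older than 14 days to stay under the free-tier storage cap.
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
tweets = client["twitter"]["tweets"]

cutoff = datetime.now(timezone.utc) - timedelta(days=14)
result = tweets.delete_many({"created_at": {"$lt": cutoff}})
print(f"deleted {result.deleted_count} documents")

After a large delete, follow up with the compact command mentioned above to reclaim the space.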

Safe way to backup PostgreSQL when using Persistent Disk

I’m trying to set up daily backups (using Persistent Disk snapshots) for a PostgreSQL instance I’m running on Google Compute Engine and whose data directory lives on a Persistent Disk.
Now, according to the Persistent Disk Backups blog post, I should:
- stop my application (PostgreSQL)
- fsfreeze my file system to prevent further modifications and flush pending blocks to disk
- take a Persistent Disk snapshot
- unfreeze my filesystem
- start my application (PostgreSQL)
This obviously brings with it some downtime (each of the steps took from seconds to minutes in my tests) that I’d like to avoid or at least minimize.
The steps of the blog post are labeled as necessary to ensure the snapshot is consistent (I’m assuming on the filesystem level), but I’m not interested in a clean filesystem, I’m interested in being able to restore all the data that’s in my PostgreSQL instance from such a snapshot.
PostgreSQL uses fsync when committing, so all data which PostgreSQL acknowledges as committed has made its way to the disk already (fsync goes to the disk).
For the purpose of this discussion, I think it makes sense to compare a Persistent Disk snapshot without stopping PostgreSQL and without using fsfreeze with a filesystem on a disk that has just experienced an unexpected power outage.
After reading https://wiki.postgresql.org/wiki/Corruption and http://www.postgresql.org/docs/current/static/wal-reliability.html, my understanding is that all committed data should survive an unexpected power outage.
My questions are:
Is my comparison with an unexpected power outage accurate or am I missing anything?
Can I take snapshots without stopping PostgreSQL and without using fsfreeze or am I missing some side-effect?
If the answer to the above is that I shouldn’t just take a snapshot, would it be idiomatic to create another Persistent Disk, periodically use pg_dumpall(1) to dump the entire database and then snapshot that other Persistent Disk?
1) Yes, though it should be even safer to take a snapshot. The fsfreeze stuff is really to be 100% safe (anecdotally: I never use fsfreeze on my PDs and have not run into issues)
2) Yes, but there is no 100% guarantee that it will always work (paranoid solution: take a snapshot, spin up a temp VM with that snapshot, check the disk is ok, and delete the VM. This can be automated)
3) No, I would not recommend this over snapshots. It takes much longer, might degrade your DB performance, and what happens if something goes wrong in the middle of a dump? Also, PDs are very expensive for incremental backups. Snapshots are diffed, so you don't pay for the whole disk with every copy (just the first one), only for the changes.
Possible recommendation:
Do #3, but then create a snapshot of the new PD and then delete the PD.
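If you do snapshot the running instance without fsfreeze, one optional nicety is to force a checkpoint just before triggering the snapshot, so a restored copy has less WAL to replay on startup. A minimal sketch, assuming the psycopg2 driver and a role allowed to run CHECKPOINT; it is not required for correctness, since committed data is already fsynced:

# Optional: checkpoint right before the Persistent Disk snapshot is taken.
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True        # run the utility command outside a transaction
with conn.cursor() as cur:
    cur.execute("CHECKPOINT")
conn.close()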
https://cloud.google.com/compute/docs/disks/persistent-disks#creating_snapshots has recently been updated and now includes this new paragraph:
If you skip this step, only data which was successfully flushed to disk by the application will be included in the snapshot. The application experiences this scenario as if it was a sudden power outage.
So the answers to my original questions are:
Yes
Yes
N/A, since the answer to ② is Yes.