Reasons why Cloud SQL would crash approximately every 24 hours - google-cloud-sql

I'm consistently seeing less than 24 hours of uptime on my Cloud SQL instance, despite having it set to always on.
It looks like the instance is crashing, because I keep losing newly added users (the mysql users table is MyISAM, so it can't recover from a crash the way my InnoDB data tables can).
Is there a problem with Cloud SQL causing this or is it likely to be something with my configuration?

InnoDB is the Google recommended database engine for Cloud SQL.
If MyISAM is a requirement for you, I'd run MySQL on Google Compute Engine instead.
Check:
https://cloud.google.com/sql/faq#innodb
and
https://cloud.google.com/sql/docs/launch-checklist

Related

How to truncate the MySQL performance_schema in Google Cloud SQL without restarting DB instance?

According to the MySQL documentation we can truncate the performance_schema with the help of the following call:
CALL sys.ps_truncate_all_tables(FALSE);
Internally, this procedure executes TRUNCATE TABLE statements against the list of tables that match the masks '%summary%' and '%history%'.
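As a rough illustration of that mask-based selection, here is a minimal Python sketch. It only builds the statement strings and does not connect to any database; the helper name `truncate_statements` and the sample table list are assumptions made for this example, and the real procedure does this work inside MySQL.

```python
from fnmatch import fnmatchcase

# The two masks named above; SQL LIKE '%x%' corresponds to the glob '*x*'.
MASKS = ("*summary*", "*history*")


def truncate_statements(table_names, schema="performance_schema"):
    """Return one TRUNCATE TABLE statement per table matching either mask."""
    return [
        f"TRUNCATE TABLE `{schema}`.`{name}`"
        for name in table_names
        if any(fnmatchcase(name, mask) for mask in MASKS)
    ]


# Small sample of performance_schema table names (illustrative, not complete).
tables = [
    "events_statements_summary_by_digest",
    "events_waits_history",
    "threads",
]
statements = truncate_statements(tables)
```

Only the first two sample tables match a mask, so `threads` is left alone.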
The problem is that the root user cannot run TRUNCATE TABLE on the performance_schema database in Google Cloud SQL because of superuser restrictions.
mysql> truncate table performance_schema.events_statements_summary_by_digest ;
ERROR 1227 (42000): Access denied; you need (at least one of)
the SUPER privilege(s) for this operation
I didn't find any Cloud SQL Admin API or other method to do this.
Any advice on how to reset the MySQL performance_schema in Google Cloud SQL without restarting the DB instance?
UPD: I have found that this does not work for MySQL 5.7 but works well for MySQL 8.0 in Google Cloud SQL. So if you can migrate your instance to MySQL 8.0, that would be a workaround.
Currently there is no workaround other than restarting the DB instance. A feature request has already been raised for this; you can +1 it and CC yourself to show interest and to receive email updates.
If you want to use performance_schema for SQL query analysis, you can use Cloud SQL Query Insights as an alternative. Query Insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases. It supports intuitive monitoring and provides diagnostic information that helps you go beyond detection to identify the root cause of performance problems.
Alternatively, you can contact Google support; the product team may be able to offer a stored-procedure solution.

streaming PostgreSQL tables into Google BigQuery

I would like to automatically stream data from an external PostgreSQL database into a Google Cloud Platform BigQuery database in my GCP account. So far, I have seen that one can query external databases (MySQL or PostgreSQL) with the EXTERNAL_QUERY() function, e.g.:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
But for that to work, the database has to be in GCP Cloud SQL. I looked for options to stream from the external PostgreSQL into a Cloud SQL PostgreSQL database, but I could only find information about replicating it as a one-time copy, not streaming:
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external
The reason I want to stream into BigQuery is that I am using Google Data Studio to create reports from the external PostgreSQL, which works great, but GDS only accepts SQL query parameters if the data comes from a Google BigQuery database. E.g. if we have a table with 1M entries and we want the user to supply a Google Data Studio parameter, the query becomes:
SELECT * from table WHERE id=#parameter;
which means that the query will be faster, and won't hit the 100K records limit in Google Data Studio.
What's the best way of creating a connection between an external PostgreSQL (read-only access) and Google BigQuery so that when querying via BigQuery, one gets the same live results as querying the external PostgreSQL?
Perhaps you missed the options described in the Google Cloud user guide?
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external#setup-replication
Notice in this section, it says:
"When you set up your replication settings, you can also decide whether the Cloud SQL replica should stay in-sync with the source database server after the initial import is complete. A replica that should stay in-sync is online. A replica that is only updated once, is offline."
I suspect online mode is what you are looking for.
What you are looking for will require some architecture design based on your needs, and some coding. There isn't a feature to automatically sync your PostgreSQL database with BigQuery (apart from EXTERNAL_QUERY(), which has limitations: one connection per database, performance, total number of connections, etc.).
If you don't need the data in real time, you can use Airflow, for instance: have a DAG connect to all your DBs once per day (using KubernetesPodOperator, for instance), extract the previous day's data, and load it into BQ. This is a typical ETL process, though in this case more EL(T). You can run the process more often if you cannot wait a day for the previous day's data.
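The extract-and-load step of such a daily job might look roughly like the sketch below. It is not an Airflow DAG and it does not call BigQuery: `sqlite3` stands in for the external PostgreSQL, a plain list stands in for the BigQuery table, and the `events` table with its `created_at` column is a hypothetical schema chosen for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def extract_previous_day(conn, run_date):
    """Return rows whose created_at falls on the day before run_date."""
    start = run_date - timedelta(days=1)
    cur = conn.execute(
        "SELECT id, payload, created_at FROM events "
        "WHERE created_at >= ? AND created_at < ? ORDER BY id",
        (start.isoformat(), run_date.isoformat()),
    )
    return cur.fetchall()


def load(rows, sink):
    """Append rows to the sink; a real job would run a BigQuery load here."""
    sink.extend(rows)
    return len(rows)


# Stand-in source database with a hypothetical `events` table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER, payload TEXT, created_at TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "old", "2024-01-01T08:00:00"),
        (2, "yesterday", "2024-01-02T09:30:00"),
        (3, "today", "2024-01-03T00:15:00"),
    ],
)

warehouse = []  # stand-in for the BigQuery table
loaded = load(extract_previous_day(source, datetime(2024, 1, 3)), warehouse)
```

Using a half-open window (`>= start` and `< end`) means consecutive daily runs neither overlap nor skip rows that land exactly on midnight.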
On the other hand, if streaming is what you are looking for, then I can think of a Dataflow job. I guess you can connect using a JDBC connector.
In addition, depending on how you have your pipeline structure, it might be easier to implement (but harder to maintain) if at the same moment you write to your PostgreSQL DB, you also stream your data into BigQuery.
Not sure if you have tried this already, but instead of adding a parameter, if you add a dropdown filter based on a dimension, Data Studio will push that down to the underlying Postgres db in this form:
SELECT * from table WHERE id=$filter_value;
This should achieve the same results you want without going through BigQuery.

Sync Elasticsearch Postgresql on a Springboot application

I have Postgresql as my primary database and I would like to take advantage of the Elasticsearch as a search engine for my SpringBoot application.
Problem: The queries are quite complex and with millions of rows in each table, most of the search queries are timing out.
Partial solution: I used materialized views in PostgreSQL and have a job that refreshes them every X minutes. But on systems with huge amounts of data and with other database transactions (especially writes) in progress, the views take a long time to refresh (about 10 minutes for 5 views). I realized that the current views are at their capacity and I cannot add more.
That's when I started exploring other options just for search and landed on Elasticsearch, which works great with the amount of data I have. As a POC, I used the Logstash JDBC input plugin, but it doesn't support the DELETE operation (bummer).
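To illustrate why a polling import cannot see deletes, here is a minimal sketch. It assumes the importer tracks an `updated_at` column, which is one common way to set up such a poller and not necessarily the configuration used here; `sqlite3` stands in for PostgreSQL, a dict stands in for the Elasticsearch index, and the `items` table is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
db.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 11)])

index = {}     # stand-in for the search index, keyed by document id
last_seen = 0  # tracking value a polling importer would persist between runs


def poll():
    """Copy rows changed since the last poll into the index."""
    global last_seen
    rows = db.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        index[row_id] = name
        last_seen = updated_at
    return len(rows)


poll()  # first run indexes both rows
db.execute("DELETE FROM items WHERE id = 2")
changed = poll()  # the deleted row is simply absent from the result set

# Ids still in the index that no longer exist in the source table.
stale_ids = set(index) - {row[0] for row in db.execute("SELECT id FROM items")}
```

The second poll returns nothing because a deleted row has no `updated_at` to match, so the index keeps a stale document for id 2.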
From here the soft delete is the option which I cannot take because:
A) Almost all the tables in the PostgreSQL DB are updated every few minutes, and some of them have constraints on the "name" key, which a soft-deleted row would keep occupying until a clean-up job runs.
B) Many tables in my PostgreSQL DB are referenced with CASCADE DELETE, and it's not possible for me to update the schemas and JPA queries of 220 tables to check for the soft-delete boolean.
The question linked above also mentions PgSync, which periodically syncs PostgreSQL with Elasticsearch. However, I cannot use it either, since it is LGPL-licensed, which is forbidden in our organization.
I'm starting to wonder whether anyone else has run into this strange limitation between Elasticsearch and an RDBMS.
I'm open to other options rather than elasticsearch to solve my need. I just don't know what's the right stack to use. Any help here is much appreciated!

Migration from Google cloud sql to datastore

Is there an easy way to migrate a large (100G) Google cloud sql database to Google datastore?
The way that comes to mind is to write a Python App Engine script for each database and table that reads the rows and puts them into the datastore. That sounds tedious, but maybe it has to be done?
Side note: the reason I'm leaving Cloud SQL is that I have JSP pages with multiple queries on them, and they are incredibly slow even with a D32 SQL instance. I hope that putting the data in the datastore will be faster?
There seems to be a ton of questions about moving away from the datastore to cloud sql, but I couldn't find this one.
Thanks
Here are a few options:
Write an App Engine mapreduce [1] program that pulls data in appropriate chunks from Cloud SQL and writes it to Datastore.
Spin up a VM on Google Compute Engine and write a program that fetches the data from Cloud SQL and writes it to Datastore using the Datastore external API [2].
Use the Datastore restore [3]. I'm not familiar with the format, so I don't know how much work it is to produce something that the restore will accept.
[1] https://cloud.google.com/appengine/docs/python/dataprocessing/
[2] https://cloud.google.com/datastore/docs/apis/overview
[3] https://cloud.google.com/appengine/docs/adminconsole/datastoreadmin?csw=1#restoring_data
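The "pull in chunks and write" idea behind the first two options can be sketched as follows. This is an illustration under stated assumptions, not a working migration: `sqlite3` stands in for Cloud SQL, `put_batch` is a hypothetical callable where a real program would call the Datastore API, and the batch size of 500 reflects the per-commit entity limit as I understand it (check the current Datastore limits before relying on it).

```python
import sqlite3
from itertools import islice

BATCH_SIZE = 500  # assumed per-commit entity limit; verify against current docs


def batches(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def copy_table(conn, table, columns, put_batch):
    """Stream rows from `table` and hand them to put_batch in chunks."""
    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
    total = 0
    for chunk in batches(cursor, BATCH_SIZE):
        # Each row becomes a plain dict, standing in for a Datastore entity.
        put_batch([dict(zip(columns, row)) for row in chunk])
        total += len(chunk)
    return total


# Stand-in source with a hypothetical `users` table of 1200 rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1200)]
)

batch_sizes = []
copied = copy_table(
    source, "users", ["id", "name"], lambda ents: batch_sizes.append(len(ents))
)
```

Iterating the cursor lazily keeps memory use flat, which matters for a database of around 100G.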
I wrote a couple of scripts that do this, running on Compute Engine.
One uses the GCP Datastore API:
import googledatastore
Here is the code:
https://gist.github.com/nburn42/d8b488da1d2dc53df63f4c4a32b95def
The other uses the Dataflow API:
from apache_beam.io.gcp.datastore.v1.datastoreio import WriteToDatastore
Here is the code:
https://gist.github.com/nburn42/2c2a06e383aa6b04f84ed31548f1cb09
I get a quota-exceeded error after it hits 100,000 entities, though, and then I have to wait another day to do another set.
Hopefully these are useful to someone with a smaller database than me.
(The quota problem is described in: Move data from Google Cloud-SQL to Cloud Datastore)

Migrating Azure Table storage

I have a cloud native app based on azure. The app uses azure table storage.
Due to a fantastic opportunity, I have decided to also offer the app on-premises. So I have to replace the NoSQL data provider... my question is: which solution is most similar to Azure Table Storage? Mongo? Raven? You name it!
What I want is to migrate the code effortlessly, like migrating from SQL Azure to SQL Server 2012... no code change needed... but I know that there's no equivalent for Table Storage... so I want to find the one that will reduce my TTM as much as possible...
MongoDB and Table Storage are not exactly swappable replacements for each other. One is key/value, the other is document. I compared the two in this answer.
There's no getting around the fact that Table Storage is Storage-as-a-Service and you only pay for the quantity of data (plus a very small per-transaction cost), whereas to work with MongoDB, you'd either have to host it in your own VMs (which gives you plenty of storage room, but at the expense of VMs) or work with a hoster (such as MongoLab, which offers 500MB for free currently). Regardless, you'd have to make some code changes to work with MongoDB instead of Table Storage.
I'm not sure whether a locally installable key/value store equivalent to Table Storage exists. No matter what you pick, you'll need modifications to your Azure-side solution if you swap out Table Storage.
Is it possible, for your on-premises solution, to provide a MongoDB backend that stays relatively simple? That is: Stick with a single index to substitute for rowkey, and then store your table entities as documents (avoiding sub-documents)? This would keep your data layout very similar. At that point, you could use things like Aggregation Framework for a bit of data processing, and not damage the overall layout style/schema of your data.
MongoDB would give you a consistent storage framework that you could use in-cloud and on-premises, and has good support for Windows Azure.
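The flat layout suggested above could be sketched like this. It is only an illustration: the function name, the `|` separator, and the choice to fold PartitionKey and RowKey into `_id` are assumptions made for this example, relying on the fact that MongoDB indexes `_id` automatically. No MongoDB driver is used here.

```python
def entity_to_document(entity):
    """Convert a Table Storage style entity into a flat document."""
    # A single key field takes over the role of PartitionKey + RowKey.
    doc = {"_id": f"{entity['PartitionKey']}|{entity['RowKey']}"}
    for key, value in entity.items():
        # Reject nested values so the layout stays close to Table Storage.
        if isinstance(value, (dict, list)):
            raise ValueError(f"{key!r} is nested; keep documents flat")
        doc[key] = value
    return doc


# Hypothetical entity, shaped like a Table Storage row.
entity = {"PartitionKey": "customers", "RowKey": "42", "Name": "Ada", "Age": 36}
document = entity_to_document(entity)
```

Keeping the original PartitionKey and RowKey fields in the document makes it easier to move data back and forth between the cloud and on-premises stores.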