Amazon Redshift public database - amazon-redshift

We would like to test some code against the Redshift JDBC driver without the hassle of creating our own instance.
Is there a public instance of Redshift available for development testing?

There is no public instance available.
You can create a test cluster (single node) using the smallest possible node type (dw2.large, $0.25 per hour), test your code against the cluster, and terminate it once you are done.

With some limitations, Redshift is basically a Postgres database, so for basic development you can use a plain Postgres instance instead.
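For example, a small smoke test like the one below (table and column names are placeholders) runs unchanged against both a local Postgres and a Redshift cluster, as long as you stick to the shared subset of SQL and avoid Redshift-specific clauses such as DISTKEY/SORTKEY:
-- Placeholder table used only to exercise connection and basic statement handling.
CREATE TABLE smoke_test (
    id      INTEGER NOT NULL,
    name    VARCHAR(100),
    created TIMESTAMP
);
INSERT INTO smoke_test (id, name, created) VALUES (1, 'hello', '2020-01-01 00:00:00');
SELECT name, COUNT(*) FROM smoke_test GROUP BY name;
DROP TABLE smoke_test;
Keep in mind that Redshift does not support everything Postgres does (for example, it does not enforce primary/foreign key constraints), so a local Postgres is only a first-pass check before testing against a real cluster.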

Redshift does not have a public instance, but AWS does offer a two-month free trial: 750 hours of usage on the DC1.Large node type.
There is also a site where you can play around with different databases (though Redshift has not been added there yet):
http://sqlfiddle.com/

Related

How to truncate the MySQL performance_schema in Google Cloud SQL without restarting DB instance?

According to the MySQL documentation we can truncate the performance_schema with the help of the following call:
CALL sys.ps_truncate_all_tables(FALSE);
Internally, this procedure simply executes TRUNCATE TABLE statements against the list of tables obtained with the masks '%summary%' and '%history%'.
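For illustration, the set of tables it targets can be approximated with a query along these lines:
-- Approximate the list of performance_schema tables the procedure truncates.
SELECT CONCAT('TRUNCATE TABLE performance_schema.', table_name, ';') AS stmt
FROM information_schema.tables
WHERE table_schema = 'performance_schema'
  AND (table_name LIKE '%summary%' OR table_name LIKE '%history%');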
The problem is that root user isn't able to perform the TRUNCATE TABLE statement on the performance_schema database in Google Cloud SQL due to superuser restrictions.
mysql> truncate table performance_schema.events_statements_summary_by_digest ;
ERROR 1227 (42000): Access denied; you need (at least one of)
the SUPER privilege(s) for this operation
I didn't find any Cloud SQL Admin API or other method to do this.
Any advice on how to reset the MySQL performance_schema in Google Cloud SQL without restarting the DB instance?
Update: I have found that it does not work for MySQL 5.7 but works well for MySQL 8.0 in Google Cloud SQL. So if you can migrate your Cloud SQL instance to MySQL 8.0, that is a workaround.
Currently there are no workarounds other than restarting the DB instance. There is already a feature request raised for this; you can +1 it and CC yourself on the request to show interest in this being implemented and to receive an email in case there are any updates.
In case you want to use performance_schema for SQL query analysis, as an alternative you can use Cloud SQL Query Insights. Query Insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases. It supports intuitive monitoring and provides diagnostic information that helps you go beyond detection to identify the root cause of performance problems.
Or you can contact Google support; the product team may offer the stored-procedure solution.

Can I have multi schema DB on a single AWS Aurora PostgreSQL instance?

We're working on a multi-tenant SaaS solution that has 4 different subscription plans, so we want their data isolated in a single cluster DB (everybody sharing a single logical instance) but with a different schema per tenant (plan), which seems better for us from a cost point of view. How should we create it?
Any clue is welcome!
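A schema-per-tenant layout on a single (Aurora) PostgreSQL database is plain DDL; a minimal sketch, with placeholder plan names, could look like this:
-- One schema per plan/tenant, all in the same database on the same cluster.
CREATE SCHEMA plan_basic;
CREATE SCHEMA plan_premium;

-- The same tables are created in each tenant schema.
CREATE TABLE plan_basic.customers   (id BIGINT PRIMARY KEY, name TEXT);
CREATE TABLE plan_premium.customers (id BIGINT PRIMARY KEY, name TEXT);

-- The application selects the tenant per connection/session via search_path.
SET search_path TO plan_premium;
SELECT * FROM customers;
Pairing each schema with a dedicated database role (and granting USAGE only on that schema) is a common way to keep tenants from reading each other's data while still sharing one instance.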

Streaming PostgreSQL tables into Google BigQuery

I would like to automatically stream data from an external PostgreSQL database into a Google Cloud Platform BigQuery database in my GCP account. So far, I have seen that one can query external databases (MySQL or PostgreSQL) with the EXTERNAL_QUERY() function, e.g.:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
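For reference, a federated query of that kind looks roughly like this (the connection ID and the inner query are placeholders):
-- Run a query on a Cloud SQL database from BigQuery through a federated connection.
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',      -- connection resource (placeholder)
  'SELECT id, name, created_at FROM users;'    -- executed on the Cloud SQL side
);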
But for that to work, the database has to be in GCP Cloud SQL. I tried to see what options there are for streaming from the external PostgreSQL into a Cloud SQL PostgreSQL database, but I could only find information about replicating it as a one-time copy, not streaming:
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external
The reason I want this streaming into BigQuery is that I am using Google Data Studio to create reports from the external PostgreSQL, which works great, but GDS can only accept SQL query parameters if the data comes from a Google BigQuery database. E.g. if we have a table with 1M entries and we want a Google Data Studio parameter to be supplied by the user, this turns into:
SELECT * from table WHERE id=#parameter;
which means that the query will be faster, and won't hit the 100K records limit in Google Data Studio.
What's the best way of creating a connection between an external PostgreSQL (read-only access) and Google BigQuery so that when querying via BigQuery, one gets the same live results as querying the external PostgreSQL?
Perhaps you missed the options stated in the Google Cloud user guide?
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external#setup-replication
Notice in this section, it says:
"When you set up your replication settings, you can also decide whether the Cloud SQL replica should stay in-sync with the source database server after the initial import is complete. A replica that should stay in-sync is online. A replica that is only updated once, is offline."
I suspect online mode is what you are looking for.
What you are looking for will require some architecture design based on your needs, plus some coding. There isn't a feature to automatically sync your PostgreSQL database with BigQuery (apart from the EXTERNAL_QUERY() functionality, which has some limitations: one connection per database, performance, the total number of connections, etc.).
If you are not looking for the data in real time, you can, for instance, have an Airflow DAG connect to all your databases once per day (using the KubernetesPodOperator, for instance), extract the previous day's data, and load it into BigQuery. A typical ETL process, though in this case more EL(T). You can run this process more often if you cannot wait a day for the previous day's data.
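The SQL side of such a daily job can be as simple as an incremental extract on the source plus an idempotent load on the BigQuery side; a sketch with assumed table and column names:
-- On the source PostgreSQL: pull only the previous day's rows (run by the DAG task).
SELECT * FROM orders
WHERE updated_at >= current_date - 1 AND updated_at < current_date;

-- In BigQuery: merge the staged extract so that DAG reruns do not duplicate rows.
MERGE my_dataset.orders AS t
USING my_dataset.orders_staging AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, amount, updated_at) VALUES (s.id, s.amount, s.updated_at);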
On the other hand, if streaming is what you are looking for, then I can think of a Dataflow job. I guess you can connect using a JDBC connector.
In addition, depending on how your pipeline is structured, it might be easier to implement (but harder to maintain) to stream your data into BigQuery at the same moment you write to your PostgreSQL DB.
Not sure if you have tried this already, but instead of adding a parameter, if you add a dropdown filter based on a dimension, Data Studio will push that down to the underlying Postgres db in this form:
SELECT * from table WHERE id=$filter_value;
This should achieve the same results you want without going through BigQuery.

Creating a highly available and heavily used database

Currently, I have an application consisting of a backend, frontend, and database. The Postgres database has a table with around 60 million rows.
This table has a foreign key to another table: categories. So, if I want to count every row for a specific category (I know counting is one of the slowest operations in a DB), on my current setup this results in a 5-minute query. Currently, the DB, backend, and frontend are just running on a VM.
I've now containerized the backend and the frontend and I want to spin them up in Google Kubernetes Engine.
So my question: will the performance of my queries go up if I also run the DB in a container and let Kubernetes do some load-balancing work, or should I use Google's Cloud SQL? Does anyone have experience with this?
will the performance of my queries go up if you also use a container DB
Raw performance will only go up if the capacity of the nodes is larger than that of your current node. If you use the same machine as a Kubernetes node, it will not go up. You won't get benefits from containers in this case, other than that updating your DB software might be a bit easier if you run it in Kubernetes. There are many factors in play here, including what disk you use for storage (SSD, magnetic, clustered filesystem?).
If your goal is to maximize resource usage in your cluster by making use of spare capacity when, say, not many queries are being sent to your database, then Kubernetes/containers might be a good choice (but that's not what the original question is about).
should I use Google's Cloud SQL
The only reason I would use Cloud SQL is if you want to offload managing your SQL database. Other than that, you'll get performance numbers similar to running on the same-size instance on GCE.
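As an aside, a count filtered on a foreign-key column usually gains far more from an index on that column than from bigger hardware; a minimal sketch with placeholder table and column names:
-- Placeholder names: a ~60M-row table with a foreign key to categories.
CREATE INDEX IF NOT EXISTS items_category_id_idx ON items (category_id);

-- With the index, Postgres can answer this with an index(-only) scan
-- instead of scanning the whole table.
SELECT COUNT(*) FROM items WHERE category_id = 42;

-- EXPLAIN ANALYZE shows whether the index is actually being used.
EXPLAIN ANALYZE SELECT COUNT(*) FROM items WHERE category_id = 42;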

Reasons why Cloud SQL would crash approximately every 24 hours

I'm consistently seeing less than 24 hours of uptime on my Cloud SQL instance, despite having it set to always-on.
It looks like the instance is crashing, as I keep losing newly added users (and the mysql user table is MyISAM, so it can't recover from a crash the way my InnoDB data tables can).
Is there a problem with Cloud SQL causing this or is it likely to be something with my configuration?
InnoDB is the Google-recommended database engine for Cloud SQL.
If MyISAM is a requirement for you, I'd run MySQL on a Google Compute Engine instance instead.
Check:
https://cloud.google.com/sql/faq#innodb
and
https://cloud.google.com/sql/docs/launch-checklist
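If the MyISAM tables in question are your own (the mysql.* system tables are managed by MySQL itself), a quick way to find and convert them is sketched below; the schema name is a placeholder:
-- Find your own tables that still use MyISAM (replace 'mydb' with your schema).
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'mydb' AND engine = 'MyISAM';

-- Convert a table to InnoDB so it benefits from crash recovery.
ALTER TABLE mydb.some_table ENGINE = InnoDB;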