How to speed up insert in cloudsql - google-cloud-sql

I cannot change this global parameter in Google Cloud SQL:
set global innodb_flush_log_at_trx_commit = 0
How can I speed up inserts in Cloud SQL?
I have tried using bulk inserts and various other flags, but it does not work.

You cannot change every parameter you want in Cloud SQL since it is a Google-managed service. In any case, the innodb_flush_log_at_trx_commit parameter should generally be kept at the value 1, because it is what keeps InnoDB ACID compliant. If you modify this parameter you risk losing some data from your transactions.
Going back to your issue, here you can find a set of tips for improving performance of your Cloud SQL instance.
If you really want to have full control over your databases, you can always opt to run your database on a Compute Engine instance instead.
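That said, if the slowness comes from many small autocommitted INSERTs, batching them is usually the biggest win even with innodb_flush_log_at_trx_commit = 1, because the log is only flushed once per commit. A minimal sketch, assuming a hypothetical table t(id, val):

START TRANSACTION;
-- one multi-row INSERT instead of many single-row, autocommitted statements
INSERT INTO t (id, val) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');
-- ... more batched rows / statements ...
COMMIT;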

Related

Using variables for schema and table names in a Redshift query

I want to be able to use variables in Redshift that refer to my DB objects (like schema and table names). Something like:
SET my_schema="schema";
SET my_table="table";
SELECT * from #my_schema.#my_table;
But it looks like Redshift doesn't have such a feature. Is there any workaround possible to achieve this?
There are a few ways you can try to attack this. But first, trying to use a database engine for functions beyond querying the database is a waste of horsepower and the road to db lock-in. So I'm going to focus on ways to do this before the database.
The most complete way is to use a front-end system that clients connect to, which in turn connects to the db. The one I've used in the past is pgbouncer-rr, which pools connections to the db but also allows for modifications to the SQL before it is sent on. This will do what you want, but you will need a computer to perform this work.
If you use the Redshift Data API you could put a Lambda function in series which performs the SQL modifications you desire (but make sure you get your API permissions right). However, I expect it is unlikely that you are looking to move to an API access model.
Many benches support variable substitution, and simple replacements in the SQL can be done by the bench. However, this is very dependent on which bench you use and on having all users' benches configured correctly; see the sketch below for what that looks like.
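For example, psql does this kind of substitution client-side; the :name references below are expanded by psql before the text ever reaches Redshift (the schema and table names here are placeholders):

\set my_schema 'my_schema_name'
\set my_table  'my_table_name'
-- psql expands :my_schema and :my_table locally; Redshift only sees the final SQL
SELECT * FROM :my_schema.:my_table;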
Bottom line - if you want something to modify your SQL, do it before it goes to Redshift.

Is it safe to take an SQL export from a running production GCP SQL service?

We have one Google Cloud SQL instance with 1 vCPU for production. I want to grab a copy of the data by exporting to a bucket. Is this safe to do? As in might it block other operations on the instance?
I think it's important to take into consideration the RDBMS that you are using. It's mentioned here that PostgreSQL has issues when handling big blobs in an export, and at this other SO post the top-voted answer has hints for a smoother export, since an export can leave the DB unresponsive, which is a pretty well-known fact.
In the case of MySQL, the product documentation has some tips for this case in this article, where it states:
"If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup"
And you can achieve this by using mysqldump --lock-tables=false in your export command.

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to see the difference between RDS snapshots, but in the past we tested several solutions for a similar problem; maybe you can take some inspiration from them.
The obvious solution is of course an auditing system. This way you can see, in a relatively simple way, what was changed - down to column values, depending on the granularity of your auditing system. Of course there is an impact on your application due to the auditing triggers and the queries against the audit tables.
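For illustration, a minimal trigger-based audit sketch (my_table is a hypothetical table; row_to_json needs PostgreSQL 9.2+):

-- audit table: one row per change, old/new row captured as JSON
CREATE TABLE my_table_audit (
    changed_at timestamptz DEFAULT now(),
    operation  text,
    old_row    json,
    new_row    json
);

CREATE OR REPLACE FUNCTION my_table_audit_fn() RETURNS trigger AS $$
BEGIN
    INSERT INTO my_table_audit (operation, old_row, new_row)
    VALUES (TG_OP,
            CASE WHEN TG_OP IN ('UPDATE', 'DELETE') THEN row_to_json(OLD) END,
            CASE WHEN TG_OP IN ('INSERT', 'UPDATE') THEN row_to_json(NEW) END);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE my_table_audit_fn();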
Another possibility: for tables with primary keys, you can store the values of the primary key and of the 'xmin' and 'ctid' hidden system columns (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update and compare them with the values after the update. But this way you can only identify changed / inserted / deleted rows, not which columns changed.
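A rough sketch of that comparison, assuming a hypothetical table my_table with primary key id:

-- before the nightly update: snapshot the key plus the hidden system columns
CREATE TABLE my_table_snapshot AS
SELECT id, xmin::text AS xmin_before, ctid AS ctid_before
FROM my_table;

-- after the update: classify rows as inserted / deleted / updated
SELECT coalesce(t.id, s.id) AS id,
       CASE WHEN s.id IS NULL THEN 'inserted'
            WHEN t.id IS NULL THEN 'deleted'
            ELSE 'updated' END AS change
FROM my_table t
FULL OUTER JOIN my_table_snapshot s ON s.id = t.id
WHERE s.id IS NULL
   OR t.id IS NULL
   OR t.xmin::text <> s.xmin_before
   OR t.ctid <> s.ctid_before;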
You can set up a streaming replica with replication slots (and, to be on the safe side, also WAL archiving). Then stop replication on the replica before the updates and compare the data after the updates using dblink selects. But these queries can be very heavy.
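A sketch of such a dblink comparison, run on the primary after the update while the replica is still paused at the pre-update state (the connection string, table my_table, and key id are placeholders; requires the dblink extension):

SELECT coalesce(p.id, r.id) AS id
FROM my_table p
FULL OUTER JOIN dblink('host=replica dbname=mydb',
                       'SELECT id, md5(t::text) FROM my_table t')
     AS r(id bigint, row_hash text)
  ON p.id = r.id
WHERE r.id IS NULL                               -- inserted by the nightly load
   OR p.id IS NULL                               -- deleted by the nightly load
   OR md5(p::text) IS DISTINCT FROM r.row_hash;  -- updated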

How to clear monitoring statistics in IBM DB2 9.7

I am monitoring query information on my IBM DB2 9.7 such as how long some queries take to execute. But how do I reset this information and clear the monitors? Apparently they are reset when the whole DB instance is reset, but this forces all connections to close also on other databases in this instance (not good). Any ideas on how to reset the monitor statistics only on a particular DB? Thanks.
That is correct, these monitors cannot be reset in DB2 V9.7; however, you can simulate a reset by following the steps in this article: http://www.ibm.com/developerworks/data/library/techarticle/dm-1009db2monitoring1/
You create a set of objects that keep track of the values; when you want to "reset", you store the values at that point in time, and from then on you simply report the difference between the stored values and the most recent ones.
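A rough sketch of that approach, assuming DB2 9.7 FP1 or later where the MON_GET_PKG_CACHE_STMT table function is available (the baseline table name is arbitrary):

-- "reset": store the current counters as a baseline
CREATE TABLE stmt_baseline AS (
    SELECT executable_id, num_executions, stmt_exec_time
    FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS t
) WITH NO DATA;

INSERT INTO stmt_baseline
    SELECT executable_id, num_executions, stmt_exec_time
    FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS t;

-- later: activity since the "reset" = current values minus the stored baseline
SELECT SUBSTR(c.stmt_text, 1, 60) AS stmt,
       c.num_executions - COALESCE(b.num_executions, 0) AS execs_since_reset,
       c.stmt_exec_time - COALESCE(b.stmt_exec_time, 0) AS exec_time_since_reset
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS c
LEFT JOIN stmt_baseline b ON b.executable_id = c.executable_id;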

Upsert in Amazon RedShift without Function or Stored Procedures

As there is no support for user defined functions or stored procedures in Redshift, how can I achieve an UPSERT mechanism in Redshift, which uses ParAccel, a PostgreSQL 8.0.2 fork?
Currently, I'm trying to achieve an UPSERT mechanism using an IF...THEN...ELSE... statement
e.g:-
IF NOT EXISTS(SELECT...WHERE(SELECT..))
THEN INSERT INTO tblABC() SELECT... FROM tblXYZ
ELSE UPDATE tblABC SET.,.,.,. FROM tblXYZ WHERE...
which is giving me an error, as I'm writing this code standalone without including it in a function or SP.
So, is there any solution to achieve an UPSERT?
Thanks
You should probably read this article on upsert by depesz. You can't rely on SERIALIZABLE for this since, AFAIK, ParAccel doesn't offer full serializability support like Pg 9.1+. As outlined in that post, you can't really do what you want purely in the DB anyway.
The short version is that even on current PostgreSQL versions that support writable CTEs it's still hard. On an 8.0 based ParAccel, you're pretty much out of luck.
I'd do a staged merge. COPY the new data to a temporary table on the server, LOCK the destination table, then do an UPDATE ... FROM followed by an INSERT INTO ... SELECT. Doing the data uploads in big chunks and locking the table for the upserts is reasonably in keeping with how Redshift is used anyway.
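A sketch of that staged merge, assuming a hypothetical target table tbl_abc(id, val); the S3 path and IAM role in the COPY are placeholders:

BEGIN;

-- 1. stage the incoming batch (COPY options here are placeholders)
CREATE TEMP TABLE staging (LIKE tbl_abc);
COPY staging FROM 's3://my-bucket/batch/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy';

-- 2. serialize concurrent upserts against the target
LOCK tbl_abc;

-- 3. update the rows that already exist...
UPDATE tbl_abc
SET    val = s.val
FROM   staging s
WHERE  tbl_abc.id = s.id;

-- 4. ...then insert the rows that do not
INSERT INTO tbl_abc
SELECT s.*
FROM   staging s
LEFT JOIN tbl_abc t ON t.id = s.id
WHERE  t.id IS NULL;

COMMIT;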
Another approach is to externally co-ordinate the upserts via something local to your application cluster. Have all your tools communicate via an external tool where they take an "insert-intent lock" before doing an insert. You want a distributed locking tool appropriate to your system. If everything's running inside one application server, it might be as simple as a synchronized singleton object.