Is it safe to take an SQL export from a running production GCP SQL service? - google-cloud-sql

We have one Google Cloud SQL instance with 1 vCPU for production. I want to grab a copy of the data by exporting to a bucket. Is this safe to do? As in might it block other operations on the instance?

I think it's important to take the RDBMS you are using into consideration. It's mentioned here that PostgreSQL has issues when handling large blobs in an export, and this other SO post has a highly voted answer with hints for a smoother export, since a long export can leave the DB unresponsive, which is a well-known issue.
In the case of MySQL, the product documentation has some tips for this case in this article, which states:
"If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup"
You can control that behaviour by passing --lock-tables=false in your mysqldump export command.

Related

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex SQL scripts in a single transaction on a large PostgreSQL database, and I would like to verify everything that was changed during the transaction.
Verifying each single entry in each table "by hand" would take ages.
Dumping the database to plain SQL before and after the script and diffing the dumps isn't really an option, since each dump would be about 50 GB of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
What you are looking for is essentially change data capture for the database - a kind of version control for the data itself - and it is one of the most searched-for topics in this area.
As far as I know, there is sadly no built-in approach for this in PostgreSQL or MySQL, but you can get there by adding triggers for the operations you care about.
You can create an audit schema and tables that capture the rows that are inserted, updated, or deleted, for example as sketched below.
This way you can see exactly what a transaction changed. I know the setup is fully manual, but it is really effective.
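A minimal sketch of that approach, assuming PostgreSQL 9.5 or later (for to_jsonb); the audit schema, the audit.changes table, and the example orders table are hypothetical names:
CREATE SCHEMA IF NOT EXISTS audit;
CREATE TABLE audit.changes (
    id         bigserial PRIMARY KEY,
    table_name text NOT NULL,
    operation  text NOT NULL,                      -- INSERT / UPDATE / DELETE
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_row    jsonb,
    new_row    jsonb
);
CREATE OR REPLACE FUNCTION audit.record_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit.changes (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit.changes (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE  -- DELETE
        INSERT INTO audit.changes (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;
-- Attach the trigger to every table you want to watch, e.g. a hypothetical "orders" table:
CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE PROCEDURE audit.record_change();
Querying audit.changes for the time window of the transaction then shows exactly what was added, modified, or deleted.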
If you only need to analyze the script's behaviour sporadically, then the easiest approach would be to change the server configuration parameter log_min_duration_statement to 0 for the duration of the analysis and then set it back to whatever value it had before. All of the script's activity will then be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.
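For example, on a self-managed PostgreSQL 9.4+ instance the toggle can be done from a superuser session roughly like this (on managed services the same flag is usually changed through the provider's console or CLI instead):
ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement
SELECT pg_reload_conf();                          -- apply without a restart
-- ... run the script under test and inspect the server log ...
ALTER SYSTEM RESET log_min_duration_statement;    -- drop the override, falling back to postgresql.conf
SELECT pg_reload_conf();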

Postgresql tuning to be used as DatawareHouse

I am assembling a Business Intelligence solution using the Pentaho software as the BI engine. Within this solution, I had to set up the requirements for a PostgreSQL database server.
The current situation is very simple, since no ETL process is being carried out for data extraction yet, so the PostgreSQL configuration has not been changed much and is practically at its factory defaults.
I would like to know which Postgres configuration parameters have to be touched and modified to optimize it as a data warehouse. I have seen a lot of documentation, but it is not clear to me at all, since one document says certain values have to be modified, while another recommends completely different ones.
I would just like to know if there is clearer and more precise documentation on optimizing Postgres 9.6 for use as a Pentaho DW.
Thank you very much

Running Maintenance Plan Wizard

I've upgraded a server from SQL Server 2005 to SQL Server 2008, but the database runs more slowly when executing certain stored procedures, especially against records which contain more data than others.
It's been suggested that I run a basic reindex to see if this resolves.
Can someone take a look at the screenshot and advise whether this will remove any data from my database? If so, then it isn't the right thing to do.
Thanks James
P.S. I will attach a screenshot if I can, as I have not done that before on this forum.
Those actions won't remove any data from the database, but generally I wouldn't advise shrinking the database unless you really need the space, as that can cause more index fragmentation. The only ticked options there that have the ability to improve performance are the rebuild/reorganise indexes and the update statistics options.
Rather than maintenance plans, though, I would generally recommend using Ola Hallengren's DB maintenance scripts, as they offer more flexibility and are generally a lot better than these plans:
Ola Hallengren - SQL Server Maintenance Solution
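For reference, the manual T-SQL behind those two options looks roughly like this; dbo.MyTable and IX_MyTable_Col are placeholder names, and Ola Hallengren's IndexOptimize procedure essentially automates the reorganise-versus-rebuild choice per index based on fragmentation:
-- Reorganise a lightly fragmented index (online, low impact)
ALTER INDEX IX_MyTable_Col ON dbo.MyTable REORGANIZE;
-- Rebuild a heavily fragmented index (heavier operation, recreates the index)
ALTER INDEX IX_MyTable_Col ON dbo.MyTable REBUILD;
-- Refresh the optimizer statistics for the table
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;
None of these statements removes any rows from your tables.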

Data mining with postgres in production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime the application has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data, and this causes some problems:
sometimes these SQL queries are long running
queries are executed against the production DB (not cool)
it is not easy to deliver reports on a weekly or monthly basis
some of the data is stored in a way that is not suitable for such querying (the queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how the data is stored
create a dashboard to present the reports
Question: But is there a better way? Is there another DB that is a better fit for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process the data into a shape more usable for your reporting, as sketched below.
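A minimal sketch of that idea, assuming PostgreSQL 9.3+ and a hypothetical orders table feeding a monthly revenue report:
-- Pre-aggregate orders per month into a reporting-friendly structure.
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT date_trunc('month', created_at) AS month,
       count(*)    AS order_count,
       sum(amount) AS revenue
FROM orders
GROUP BY 1;
CREATE INDEX ON monthly_revenue (month);
-- Refresh on whatever schedule the reports need, e.g. nightly from cron:
REFRESH MATERIALIZED VIEW monthly_revenue;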
There are a thousand ways to approach this issue but I think that the path of least resistance for you would be postgres replication. Check out this Postgres replication tutorial for a quick, proof-of-concept. (There are many hits when you Google for postgres replication and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it creates a second database which effectively becomes your "read-only" DB, kept up to date by the replication process. You would keep the schema the same (with physical streaming replication the standby is an identical, read-only copy, so reporting-specific indexes would need logical replication or a separate ETL step), and your reports and dashboards would be customized to query it. Your main database remains the transactional database that serves the users, while the replicated database serves the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be turned around quickly.
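As a rough sketch of the primary-side pieces (the role and slot names are placeholders; the standby itself is prepared with pg_basebackup and a recovery configuration, which the linked tutorial walks through, and replication slots need PostgreSQL 9.4+):
-- Role the standby connects as, plus a slot so WAL is retained until the standby consumes it.
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';
SELECT pg_create_physical_replication_slot('reporting_standby');
-- Once the standby is attached, its progress shows up here:
SELECT client_addr, state, sync_state FROM pg_stat_replication;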
If you really want to try data mining with PostgreSQL, there are some tools that can be used.
The very simple way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions, written in Python and C, directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
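As a taste of the MADlib style (the table and column names here are made up, so check the MADlib documentation for the exact function signatures), training a linear regression model is just a SQL call:
-- Hypothetical input table houses(price, size, bedrooms); coefficients are written to houses_linregr.
SELECT madlib.linregr_train(
    'houses',                   -- source table
    'houses_linregr',           -- output table
    'price',                    -- dependent variable
    'ARRAY[1, size, bedrooms]'  -- independent variables
);
SELECT * FROM houses_linregr;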
For reporting, we use a non-transactional (read-only) database, and we don't care about normalization there. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake) and use an ETL tool to dump the data periodically (maybe weekly) into the read-only database, and start creating reports from that.
Reports are used for decision support, so they don't have to be realtime and usually don't have to be completely current. In other words, it is acceptable for a report to reflect the data as of last week or last month.
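A tiny sketch of what such a star schema might look like in the reporting database (all names are made up for illustration):
-- Dimension tables: denormalized descriptive attributes.
CREATE TABLE dim_date (
    date_key  integer PRIMARY KEY,   -- e.g. 20240131
    full_date date NOT NULL,
    month     integer NOT NULL,
    year      integer NOT NULL
);
CREATE TABLE dim_customer (
    customer_key serial PRIMARY KEY,
    customer_id  integer NOT NULL,   -- natural key from the production system
    name         text,
    country      text
);
-- Fact table: one row per measurable event, pointing at the dimensions.
CREATE TABLE fact_sales (
    date_key     integer REFERENCES dim_date,
    customer_key integer REFERENCES dim_customer,
    quantity     integer NOT NULL,
    amount       numeric(12,2) NOT NULL
);
The ETL job then periodically loads these tables from the production database, and the reports query only the fact and dimension tables.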

Upsert in Amazon RedShift without Function or Stored Procedures

As there is no support for user-defined functions or stored procedures in Redshift, how can I achieve an UPSERT mechanism in Redshift, which uses ParAccel, a PostgreSQL 8.0.2 fork?
Currently, I'm trying to achieve the UPSERT mechanism using an IF...THEN...ELSE... statement,
e.g.:
IF NOT EXISTS(SELECT...WHERE(SELECT..))
THEN INSERT INTO tblABC() SELECT... FROM tblXYZ
ELSE UPDATE tblABC SET.,.,.,. FROM tblXYZ WHERE...
which gives me an error, since I'm writing this code on its own without wrapping it in a function or stored procedure.
So, is there any solution to achieve an UPSERT?
Thanks
You should probably read this article on upsert by depesz. You can't rely on SERIALIZABLE for this since, AFAIK, ParAccel doesn't support full serializability the way Pg 9.1+ does. As outlined in that post, you can't really do what you want purely in the DB anyway.
The short version is that even on current PostgreSQL versions that support writable CTEs it's still hard. On an 8.0 based ParAccel, you're pretty much out of luck.
I'd do a staged merge: COPY the new data to a temporary table on the server, LOCK the destination table, then do an UPDATE ... FROM followed by an INSERT INTO ... SELECT, roughly as sketched below. Doing the data uploads in big chunks and locking the table for the upserts is reasonably in keeping with how Redshift is used anyway.
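A rough sketch of that staged merge (target_table, the staging table, and the id/value columns are placeholders, and the COPY credentials are elided):
-- 1. Load the incoming batch into a staging table shaped like the target.
CREATE TEMP TABLE staging (LIKE target_table);
COPY staging FROM 's3://my-bucket/batch.csv' CREDENTIALS '...' CSV;
-- 2. Serialize concurrent upserts against the target.
BEGIN;
LOCK target_table;
-- 3. Update rows that already exist...
UPDATE target_table
SET value = staging.value
FROM staging
WHERE target_table.id = staging.id;
-- 4. ...then insert the rows that don't exist yet.
INSERT INTO target_table
SELECT s.*
FROM staging s
LEFT JOIN target_table t ON t.id = s.id
WHERE t.id IS NULL;
COMMIT;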
Another approach is to externally co-ordinate the upserts via something local to your application cluster. Have all your tools communicate via an external tool where they take an "insert-intent lock" before doing an insert. You want a distributed locking tool appropriate to your system. If everything's running inside one application server, it might be as simple as a synchronized singleton object.