Recommended way of invoking pg_repack on Google Cloud - PostgreSQL

We have installed pg_repack on our postgresql database.
What is the best way to periodically invoke the pg_repack command using GCP infrastructure?
We tried running it using Cloud Run, but the 1 hour time limit often means that it times out before it can finish.
When it times out, we face the following error on subsequent runs:
WARNING: the table "public.<table name>" already has a trigger called "repack_trigger"
DETAIL: The trigger was probably installed during a previous attempt to run pg_repack on the table which was interrupted and for some reason failed to clean up the temporary objects. Please drop the trigger or drop and recreate the pg_repack extension altogether to remove all the temporary objects left over.
This forces us to manually recreate the extension.
What is the easiest way to schedule pg_repack without the fear of it timing out? Alternatively, is there a way to gracefully shut down pg_repack, so that we can retry without having to recreate the extension?

The pg_repack command can take a long time to complete, which is why it times out on Cloud Run.
The easiest approach is to run it from a Google Cloud Function and use Google Cloud Scheduler to trigger that function at the required interval (for example, once a day), so the repack runs periodically without the risk of timing out.
In the code below, the repack_table_felix function is used to repack the designated table. The force_inplace=true parameter forces the repack to happen in place, which may be faster but uses more disk space; change it to suit your use case.
The except block catches errors such as the "repack_trigger" warning, allowing the function to exit gracefully without leaving objects behind.
import psycopg2

def repack_database(request):
    # Connect to the Cloud SQL instance.
    conn = psycopg2.connect(host="<host name>", dbname="<database name>",
                            user="<username>", password="<password>")
    cur = conn.cursor()
    try:
        # Repack the designated table.
        cur.execute("SELECT pg_repack.repack_table_felix('<your table name>', force_inplace=true)")
        conn.commit()
    except Exception as e:
        # Roll back so nothing is left half-done, then report the failure.
        conn.rollback()
        print(e)
        return f"pg_repack failed: {e}"
    finally:
        cur.close()
        conn.close()
    return "pg_repack completed successfully"

Related

Cannot execute DROP EXTENSION in a read-only transaction (drop extension if exists google_insights)

I'm using Google Cloud SQL (Postgres) and created a read replica for my DB.
Now I see this error in the logs:
2021-01-16 12:02:46.393 UTC [93149]: [9-1] db=cloudsqladmin,user=cloudsqladmin ERROR: cannot execute DROP EXTENSION in a read-only transaction
2021-01-16 12:02:46.393 UTC [93149]: [10-1] db=cloudsqladmin,user=cloudsqladmin STATEMENT: drop extension if exists google_insights;
These errors repeat constantly - exactly 120 errors every single hour.
As I understand it, Google Cloud tries to drop some of its custom extensions for Postgres and can't do that because the replica is read-only.
Does anyone know why it happens and how to fix that?
The error message is caused by an issue with the Query Insights feature; to avoid it, simply do not enable Query Insights when creating the primary and the read replica.
I created the following issue on your behalf, which I recommend you star and follow for all the relevant updates from the Cloud SQL product team.

JDBC connection lost while UNLOADing from Redshift to S3. What should happen?

Redshift newbie here - greetings!
I am trying to unload data to S3 from Redshift, using a java program running locally which issues an UNLOAD statement over a JDBC connection. At some point the JDBC connection appears lost on my end (exception caught).
However, looking at the S3 location, it seems that the unload runs to completion. It is true however that I am unloading a rather small set of data.
So my question is, in principle, how is UNLOAD supposed to behave in case of a lost connection (say, a firewall kills it, or someone does a kill -9 on the process that executes the unload)? Will it run to completion? Will it stop as soon as it senses that the connection is lost? I have been unable to find the answer either by rtfm'ing or by googling...
Thank you!
The UNLOAD will run until it completes, is cancelled, or encounters an error. Loss of the issuing connection is not interpreted as a cancel.
The statement can be cancelled on a separate connection using CANCEL or PG_CANCEL_BACKEND.
http://docs.aws.amazon.com/redshift/latest/dg/r_CANCEL.html
http://docs.aws.amazon.com/redshift/latest/dg/PG_CANCEL_BACKEND.html
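For example, from a second connection you can look up the process id of the running UNLOAD and cancel it. A rough sketch (querying STV_RECENTS is just one way to find the pid; <pid> is a placeholder):
-- on a separate connection, find the pid of the running UNLOAD
SELECT pid, duration, TRIM(query) AS sql_text
FROM stv_recents
WHERE status = 'Running';
-- then cancel it
CANCEL <pid>;
-- or, equivalently
SELECT PG_CANCEL_BACKEND(<pid>);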

Postgres: "ERROR: cached plan must not change result type"

This exception is being thrown by the PostgreSQL 8.3.7 server to my application.
Does anyone know what this error means and what I can do about it?
ERROR: cached plan must not change result type
STATEMENT: select code,is_deprecated from country where code=$1
I figured out what was causing this error.
My application opened a database connection and prepared a SELECT statement for execution.
Meanwhile, another script was modifying the database table, changing the data type of one of the columns being returned in the above SELECT statement.
I resolved this by restarting the application after the database table was modified. This reset the database connection, allowing the prepared statement to execute without errors.
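For anyone who wants to see the failure mode directly, it can be reproduced roughly as described above. A sketch in psql (the table and column names follow the statement in the question, but the exact DDL is made up):
-- session 1: prepare a statement against the current table definition
CREATE TABLE country (code text, is_deprecated boolean);
PREPARE q (text) AS SELECT code, is_deprecated FROM country WHERE code = $1;
EXECUTE q('US');   -- works
-- session 2: change the type of a column that appears in the result
ALTER TABLE country ALTER COLUMN is_deprecated TYPE text USING is_deprecated::text;
-- session 1 again: the cached plan's result row type no longer matches
EXECUTE q('US');   -- ERROR: cached plan must not change result type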
I'm adding this answer for anyone landing here by googling ERROR: cached plan must not change result type when trying to solve the problem in the context of a Java / JDBC application.
I was able to reliably reproduce the error by running schema upgrades (i.e. DDL statements) while my back-end app that used the DB was running. If the app was querying a table that had been changed by the schema upgrade (i.e. the app ran queries before and after the upgrade on a changed table) - the postgres driver would return this error because apparently it does caching of some schema details.
You can avoid the problem by configuring your pgjdbc driver with autosave=conservative. With this option, the driver will be able to flush whatever details it is caching and you shouldn't have to bounce your server or flush your connection pool or whatever workaround you may have come up with.
Reproduced on Postgres 9.6 (AWS RDS) and my initial testing seems to indicate the problem is completely resolved with this option.
Documentation: https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
You can look at the pgjdbc Github issue 451 for more details and history of the issue.
JRuby ActiveRecords users see this: https://github.com/jruby/activerecord-jdbc-adapter/blob/master/lib/arjdbc/postgresql/connection_methods.rb#L60
Note on performance:
As per the reported performance issues in the above link - you should do some performance / load / soak testing of your application before switching this on blindly.
In performance testing of my own app running on an AWS RDS Postgres 10 instance, enabling the conservative setting did result in extra CPU usage on the database server. It wasn't much, though: the autosave functionality only showed up as a measurable amount of CPU after I had tuned every query my load test used and started pushing the load test hard.
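For reference, one way to turn it on (host and database are placeholders) is directly on the JDBC connection URL; it can equally be set via the driver Properties or your connection pool's configuration:
jdbc:postgresql://<host>:5432/<database>?autosave=conservative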
We were facing a similar issue. Our application works across multiple schemas, and whenever we made schema changes this error started occurring.
Setting the prepareThreshold=0 parameter in the JDBC connection settings disables the driver's use of server-side prepared statements, so there is no cached plan to go stale. This solved it for us.
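As with autosave above, this is an ordinary pgjdbc connection parameter, so it can be appended to the JDBC URL (placeholders again):
jdbc:postgresql://<host>:5432/<database>?prepareThreshold=0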
I got this error too; manually re-running the failing SELECT query fixed it for me.

Run SSIS Package from T-SQL

I noticed you can use the following stored procedures (in order) to schedule an SSIS package:
msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
msdb.dbo.sp_add_job ...
msdb.dbo.sp_add_jobstep ...
msdb.dbo.sp_update_job ...
msdb.dbo.sp_add_jobschedule ...
msdb.dbo.sp_add_jobserver ...
(You can see an example by right-clicking a scheduled job and selecting "Script Job as -> CREATE To".)
AND you can use sp_start_job to execute the job immediately, effectively running SSIS packages on demand.
Question: does anyone know of any msdb.dbo.[...] stored procedures that simply allow you to run SSIS packages on the fly without using xp_cmdshell directly, or some easier approach?
Well, you don't strictly need the sp_add_category, sp_update_job or sp_add_jobschedule calls. We do an on-demand package execution in our app using SQL Agent with the following call sequence:
- sp_add_job
- sp_add_jobstep
- sp_add_jobserver
- sp_start_job
Getting the job status is a little tricky if you can't access the msdb..sysjobXXX tables, but our jobs start & run just fine.
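A minimal T-SQL sketch of that sequence (the package path is hypothetical, and @delete_level = 3 makes the Agent job remove itself after it has run):
DECLARE @job_id UNIQUEIDENTIFIER;
EXEC msdb.dbo.sp_add_job @job_name = N'RunPackageOnDemand', @delete_level = 3, @job_id = @job_id OUTPUT;
EXEC msdb.dbo.sp_add_jobstep @job_id = @job_id, @step_name = N'Run package',
     @subsystem = N'SSIS', @command = N'/FILE "C:\Packages\MyPackage.dtsx"';
EXEC msdb.dbo.sp_add_jobserver @job_id = @job_id, @server_name = N'(local)';
EXEC msdb.dbo.sp_start_job @job_id = @job_id;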
EDIT: Other than xp_cmdshell, I'm not aware of another way to launch the SSIS handlers from within SQL Server. Anyone with permissions on the server can start the dtexec or dtutil executables; then you can use batch files, a job scheduler, etc.
Not really... you could try sp_OACreate but it's more complicated and may not do it.
Do you need to run them from SQL? They can also be run from the command line, a .NET app, etc.
In SQL Server 2012+ it is possible to use the following stored procedures (found in the SSISDB database, not the msdb database) to create SSIS executions, set their parameters, and start them:
[catalog].[create_execution]
[catalog].[set_execution_parameter_value]
[catalog].[start_execution]
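A rough sketch of how the three fit together (folder, project, and package names are placeholders; the SYNCHRONIZED parameter is optional and just makes the call wait for the package to finish):
DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name = N'<folder>', @project_name = N'<project>',
     @package_name = N'<package>.dtsx', @use32bitruntime = 0,
     @execution_id = @execution_id OUTPUT;
EXEC SSISDB.catalog.set_execution_parameter_value @execution_id,
     @object_type = 50, @parameter_name = N'SYNCHRONIZED', @parameter_value = 1;
EXEC SSISDB.catalog.start_execution @execution_id;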

DB2 Transaction log is full. How to flush / clear it?

I'm working on an experiment for a course I'm taking about tuning DB2. I'm using EC2 from Amazon (AWS) to conduct the experiment.
My problem, however, is that I have to test non-compression against row compression in DB2, and to do that I've created a shell script that runs those experiments. But when I reach the compression part I get the error "Transaction log is full", and no matter how low I set the inserts it keeps complaining about my transaction log.
I've scoured Google for a day now trying to find some way to flush / clear the log or just get rid of it; I don't need it. I've tried to increase the size but nothing has helped.
Please, I hope someone has an answer to this frustrating problem.
Thanks
- Mestika
There is no need to "clear the log" in DB2. When a transaction is rolled back, DB2 releases the log space used by the transaction.
If you've increased the log size and it has not helped, please post more information about what you're trying to do.
No need to restart. Just force off the applications using db2 force applications all.
Increase the active log file size (LOGFILSIZ), force the application connections off, and terminate the connections.
Then try to run the job again:
db2 force applications all
db2 update db cfg for sample using logfilsiz 5125
db2 force applications all
db2 terminate
db2 connect to sample
Run your job and monitor.
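If increasing LOGFILSIZ alone does not help, note that the total active log space is roughly LOGFILSIZ (in 4 KB pages) multiplied by (LOGPRIMARY + LOGSECOND), so raising the number of log files can also help; a sketch against the same sample database:
db2 update db cfg for sample using LOGPRIMARY 10
db2 update db cfg for sample using LOGSECOND 20
db2 force applications all
db2 terminate
db2 connect to sample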
Just restart the instance; it will release the pending logs and you should be fine.