Could I script my monthly postgres maintenance?

I have to perform a monthly maintenance to a postgres database.
I PuTTY into the system, navigate to the database and then run 3 commands on 40 different tables:
CLUSTER [table1] USING [primarykey];
ANALYZE [table1];
REINDEX TABLE [table1];
I have to wait for each command to finish executing before I can run the next one (i.e. CLUSTER -wait up to a few minutes-, ANALYZE -wait-, REINDEX -wait-).
It's very simple to do but it takes around 30-45 minutes of me just copying and pasting 120 lines, one line at a time... is there any way to automate this process?
I have zero experience with scripting and I know very little about postgreSQL.
My question is somewhat unique because I cannot install anything on the PostgreSQL server. I want to keep this script locally on my computer and then be able to run it when it's time for the maintenance.

Clustering automatically reindexes the table. There is no reason to reindex the table immediately after you cluster it.
Do you actually need to do this stuff? Do you have evidence that your tables are in need of clustering? Or are you just assuming they do because of something you read on the internet, referring to a decade-old version of PostgreSQL, written by someone who didn't know what they were talking about in the first place? It is possible you really would benefit from this. It is even more possible you wouldn't, and it is just a waste of time.
If you know nothing about scripting, then you need to learn something about scripting. You should probably tag your post as being about scripting, in whichever shell/language you would like to use.
At the core, all you have to do is write a series of commands to be executed from the command line, and shove them into a text file. The easiest way is probably to install psql on your local computer, if it is not already there.
psql -c 'cluster foobar' -h thehost.example.com
psql -c 'analyze foobar' -h thehost.example.com
You might need to do some configuration to make this connection work with whatever authentication method you have in place, but without knowing which authentication method that is I can't comment further.
If the cluster for some reason fails, there is little reason to proceed to try to analyze it. (But there is also little harm in doing so). If you want to fine tune this situation, there are a variety of ways to do it, depending on which shell you are writing your script for, and what you want it to do.
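Putting that together, here is a minimal sketch of such a script, assuming psql is installed locally and authentication is already handled (e.g. via a ~/.pgpass file); the host, database, and table names are placeholders to replace with your own:

#!/usr/bin/env bash
# Placeholder connection details and table list -- replace with your own.
HOST=thehost.example.com
DB=mydb
TABLES="table1 table2 table3"   # list all 40 tables here

for t in $TABLES; do
    # CLUSTER without USING reuses the index recorded by the previous CLUSTER;
    # add "USING indexname" the first time if a table has never been clustered.
    # psql waits for each statement to finish, so no manual waiting is needed,
    # and ANALYZE is skipped if CLUSTER fails.
    psql -h "$HOST" -d "$DB" -c "CLUSTER $t;" \
      && psql -h "$HOST" -d "$DB" -c "ANALYZE $t;" \
      || echo "maintenance failed for $t" >&2
done

Save it as something like maintenance.sh, make it executable (chmod +x maintenance.sh), and run it when the maintenance window comes around.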

Related

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex sql scripts in a single transaction on a large PostgreSQL database and I would like to verify everything that was changed during the transaction.
Verifying each single entry on each table "by hand" would take ages.
Dumping the database before and after the script to plain sql and using diff on the dumps isn't really an option since each dump would be about 50G of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
What you are looking for is one of the most commonly searched-for topics when it comes to capturing database changes. You could call it a kind of version control for data.
As far as I know, sadly there is no built-in approach available in PostgreSQL or MySQL. But you can work around it by adding triggers for the operations you care about.
You can create an audit schema and tables that record every row that is updated, inserted, or deleted.
That way you can achieve what you want. I know the process is entirely manual, but it is really effective; a sketch is shown below.
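For illustration, a minimal sketch of that trigger-based approach; the audit table, function, and your_table are hypothetical names to adapt to your schema:

-- Hypothetical audit table: one row per change.
CREATE TABLE audit_log (
    changed_at timestamptz NOT NULL DEFAULT now(),
    table_name text NOT NULL,
    operation  text NOT NULL,
    old_row    jsonb,
    new_row    jsonb
);

CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to each table you want to track
-- (use EXECUTE PROCEDURE instead of EXECUTE FUNCTION on PostgreSQL 10 and older).
CREATE TRIGGER trg_audit
AFTER INSERT OR UPDATE OR DELETE ON your_table
FOR EACH ROW EXECUTE FUNCTION audit_changes();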
If you need to analyze the script's behaviour only sporadically, then the easiest approach would be to change the server configuration parameter log_min_duration_statement to 0, and then back to whatever value it had, once the analysis is done. Then all of the script's activity will be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.
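If you have the privileges to change server settings, a minimal sketch of that toggle (assuming PostgreSQL 9.4 or later, where ALTER SYSTEM is available):

-- Log every statement; requires appropriate privileges, takes effect after a reload.
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();

-- ... run the script you want to analyze, then remove the override to fall back
-- to the previously configured value:
ALTER SYSTEM RESET log_min_duration_statement;
SELECT pg_reload_conf();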

How to DoS the Postgres Database

I am doing some small research on how to protect a database against DDoS attacks.
I am using a Postgres database for my testing.
I want to perform a DDoS of the database on my local machine, but I don't really know how to do it.
My plan is to create a script that runs a bunch of queries, but I want these queries to take as much time to complete as possible.
I saw this example: select tab1 from (select decode(encode(convert(compress(post) using latin1),concat(post,post,post,post)),sha1(concat(post,post,post,post))) as tab1 from table_1)a;
But I am failing to replicate it in postgres.
I need help translating this query to Postgres, or other examples of functions or queries that would take a lot of time to complete.
Edit:
Sleep functions might not work; they do not load the system enough.
In my understanding, a DDoS should be performed with functions that take a long time to run and use up a lot of the system's compute power.

Giving access to execute SQL queries on a static database

I am working on a project where I want to give people the ability to execute SQL queries against a PostgreSQL database. I then need to prevent people from hacking/attacking my database.
I thought that maybe a way to do that is by giving only view access on the database connection, and using EXPLAIN ANALYSE to calculate the cost of the SQL query.
Is EXPLAIN ANALYSE trustworthy enough to make sure there are no cheap ways to get the website down?
Do you have suggestions?
EXPLAIN ANALYSE will execute the query, including any side-effects it may have. PostgreSQL also allows running arbitrary Perl and Python code if configured to do so, so be careful. You're likely better off running PostgreSQL instances in per-request VMs or in similar highly isolated environments.
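A small demonstration of that caveat, using a hypothetical accounts table:

-- EXPLAIN ANALYSE really executes the statement, so this deletes rows:
EXPLAIN ANALYSE DELETE FROM accounts WHERE id = 42;

-- To inspect a plan without executing it, use plain EXPLAIN, or wrap the
-- ANALYSE run in a transaction that you roll back afterwards:
BEGIN;
EXPLAIN ANALYSE DELETE FROM accounts WHERE id = 42;
ROLLBACK;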

How to profile plpgsql procedures

I'm trying to improve the performance of a long-running plpgsql stored procedure, but I have no idea what, if any, profiling tools are available. Can anyone offer suggestions for how to go about profiling such a procedure?
Raise some notices from the procedure that include clock_timestamp(), to see where the database spends its time. And make the procedures as simple as possible.
Could you show us an example?
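For illustration, a minimal sketch of that approach (big_table and the individual steps are hypothetical); clock_timestamp() is used because, unlike now(), it keeps advancing while the function runs:

CREATE OR REPLACE FUNCTION slow_proc() RETURNS void AS $$
BEGIN
    RAISE NOTICE 'start: %', clock_timestamp();

    PERFORM count(*) FROM big_table;   -- step 1: some real work
    RAISE NOTICE 'after step 1: %', clock_timestamp();

    PERFORM pg_sleep(1);               -- step 2: stands in for more work
    RAISE NOTICE 'after step 2: %', clock_timestamp();
END;
$$ LANGUAGE plpgsql;

The notices appear in psql (or your client's message output) when you run SELECT slow_proc();, so you can see how long each step took.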
We are currently looking for a better answer to this question, and have stumbled across this tool:
http://www.openscg.com/2015/02/postgresql-plpgsql-profiler/
Hosted at:
https://bitbucket.org/openscg/plprofiler
It claims to give you what you are looking for, including the total time spent on each line of the function. We have not investigated it further yet, but based on the author's claims, we are optimistic.
To start with, you could turn on logging of all statements into the Postgres logfile. The log will contain the runtime for each statement. This way you can identify the slowest queries and try to optimize them.
But reading your comment to Frank's post I'd guess that the looping is your problem. Try to get rid of the looping and do everything in a single query. One statement that reads a lot of rows is usually more efficient than a lot of statements reading only a few rows.
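For illustration, with hypothetical orders and order_items tables, the difference between the two styles looks roughly like this:

-- Row-by-row loop (e.g. inside a plpgsql function or DO block): slow for many rows.
DO $$
DECLARE r record;
BEGIN
    FOR r IN SELECT order_id FROM orders LOOP
        UPDATE orders
           SET total = (SELECT sum(amount) FROM order_items i
                        WHERE i.order_id = r.order_id)
         WHERE order_id = r.order_id;
    END LOOP;
END $$;

-- Equivalent single set-based statement, usually far faster:
UPDATE orders o
   SET total = s.total
  FROM (SELECT order_id, sum(amount) AS total
          FROM order_items
         GROUP BY order_id) s
 WHERE o.order_id = s.order_id;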
Try the pg_stat_statements extension ( http://www.postgresql.org/docs/9.2/static/pgstatstatements.html ).
It can show the call count and total call time for all statements (including sub-statements within plpgsql procedures).
The tool to use is https://github.com/bigsql/plprofiler
If you installed PostgreSQL using the PGDG yum repository, then installing plprofiler is very straightforward: just run the command below. Keep in mind it is only available for PostgreSQL versions 11 and higher (replace XX with your version number):
yum install plprofiler_XX-server plprofiler_XX-client plprofiler_XX
Then add the profiler extension to your database:
CREATE EXTENSION plprofiler;
Then to generate a profile report on a plpgsql function, run a command like this:
plprofiler run -U your_username -d your_database --command "SELECT * FROM your_custom_plpgsql_function()" --output profile.html

Is there a PostgreSQL equivalent of SQL Server profiler?

I need to see the queries submitted to a PostgreSQL server. Normally I would use SQL Server Profiler to perform this action in SQL Server land, but I have yet to find out how to do this in PostgreSQL. There appear to be quite a few paid tools; I am hoping there is an open-source variant.
You can use the log_statement config setting to get a list of all the queries sent to the server:
https://www.postgresql.org/docs/current/static/runtime-config-logging.html#guc-log-statement
Just set that and the log file path, and you'll have the list. You can also configure it to log only long-running queries.
You can then take those queries and run EXPLAIN on them to find out what's going on with them.
https://www.postgresql.org/docs/9.2/static/using-explain.html
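For reference, a sketch of the relevant postgresql.conf settings; pick either log_statement = 'all' or a log_min_duration_statement threshold, depending on how much you want to capture:

# postgresql.conf
logging_collector = on               # write logs to files (needs a restart if it was off)
log_statement = 'all'                # log every statement ...
#log_min_duration_statement = 250   # ... or only statements slower than 250 ms (a reload is enough)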
Adding to Joshua's answer, to see which queries are currently running simply issue the following statement at any time (e.g. in PGAdminIII's query window):
SELECT datname,procpid,current_query FROM pg_stat_activity;
Sample output:
datname | procpid | current_query
---------------+---------+---------------
mydatabaseabc | 2587 | <IDLE>
anotherdb | 15726 | SELECT * FROM users WHERE id=123 ;
mydatabaseabc | 15851 | <IDLE>
(3 rows)
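Note that on PostgreSQL 9.2 and later the columns were renamed (procpid became pid, current_query became query), so the equivalent query there is:

SELECT datname, pid, state, query
FROM pg_stat_activity
WHERE state <> 'idle';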
I discovered pgBadger (https://pgbadger.darold.net/) and it is a fantastic tool that has saved my life many times.
If you open a generated report and go to the 'top' menu, you can see the slowest queries and the most time-consuming queries.
Then you can drill into the details and see nice graphs that show the queries by hour, and the detail view shows the SQL text in a pretty form. The tool is free and works very well.
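For reference, once the server is configured to log queries (e.g. via log_min_duration_statement), generating a report is a one-liner; the log path here is just an example, and depending on your log_line_prefix you may also need pgBadger's --prefix option:

pgbadger /var/log/postgresql/postgresql-*.log -o report.html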
I need to see the queries submitted to a PostgreSQL server
As an option, if you use pgAdmin (in my case pgAdmin 4 v2.1), you can observe queries via the 'Dashboard' tab.
Update, June 2022: answering the questions in the comments.
Question 1: My long SQL query gets truncated, is there any workaround?
Follow steps below:
Close pgAdmin
Find the postgresql.conf file. On my computer it is located at c:\Program Files\PostgreSQL\13\data\postgresql.conf. If you can't find it, read this answer for more details.
Open the postgresql.conf file and find the property called track_activity_query_size. By default the value is 1024, which means all queries longer than 1024 characters will be truncated. Uncomment this property and set a new value, for example:
track_activity_query_size = 32768
Restart the PostgreSQL service on your computer
P.S.: now everything is ready, but keep in mind that this change can slightly decrease performance. From a development/debugging standpoint you won't see any difference, but don't forget to revert this property in a 'production' environment. For more details, read this article.
Question 2: I ran my function/method that triggers an SQL query, but I still can't see it in pgAdmin, or sometimes I see it but it runs so quickly that I can't even expand the session on the 'Dashboard' tab?
Answer: Try running your application in 'debug' mode and set a breakpoint right before you close the connection to the database. At the same time (while you are debugging), click the 'refresh' button on the 'Dashboard' tab in pgAdmin.
You can use the pg_stat_statements extension.
If you run the database in Docker, just add this command to docker-compose.yml; otherwise, look at the installation instructions for your setup:
command: postgres -c shared_preload_libraries=pg_stat_statements -c pg_stat_statements.track=all -c max_connections=200
And then in the db run this query:
CREATE EXTENSION pg_stat_statements;
Now, to see the operations that took the most time per call, run:
SELECT * FROM pg_stat_statements ORDER BY total_exec_time/calls DESC LIMIT 10;
(On PostgreSQL 12 and older the column is named total_time rather than total_exec_time.)
Or play with other queries over that view to find what you are looking for.
All those tools like pgBadger or pg_stat_statements require access to the server, and/or altering the server settings or server log settings, which is not such a good idea, especially if it requires a server restart, and because logging slows everything down, including production use.
In addition, extensions such as pg_stat_statements don't really show the queries, let alone in chronological order, and pg_stat_activity doesn't show you anything that isn't running right now, while also including queries from users other than you.
Instead of running any such crap, you can add a TCP proxy between your application and the PostgreSQL server.
Your TCP proxy then reads all the SQL query statements from what goes over the wire from your application to the server, and outputs them to the console (or wherever). It also forwards everything to PostgreSQL and returns the answer(s) to your application.
This way, you don't need to stop/start/restart your db server, you don't need admin/root rights ON THE DB-SERVER to change the config file, and you don't need any access to the db server at all. All you need to do is change the db connection string in your application (e.g. in your dev environment) to point to the proxy server instead of the SQL server (the proxy server then needs to point to the SQL server). Then you can see, in chronological order, what your <insert_profanity_here> application does on the database, and other people's queries don't show up (which makes it even better than SQL Server Profiler). Of course, you can also see what other people do if you put the proxy on the db server on the old db port and assign the db a new port.
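For a quick illustration of the idea without writing any code, a generic TCP relay such as socat can already dump the wire traffic (assuming the client connects without SSL, e.g. sslmode=disable; host and ports are placeholders):

socat -v TCP-LISTEN:6433,fork,reuseaddr TCP:thedbserver.example.com:5432

Point the application's connection string at localhost:6433, and the SQL text of simple-query messages shows up in socat's output, interleaved with the protocol's framing bytes.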
I have implemented this with pg_proxy_net
(it runs on Windows, Linux and Mac, and doesn't require OS dependencies, as it is a self-contained .NET Core deployment).
That way, you get approximately "the same" as you get with SQL Server Profiler.
Actually, if you aren't disturbed by other people's queries, what you get with pg_proxy_net is even better than what you get with SQL Server Profiler.
Also, on github, I have a command-line MS-SQL-Server profiler that works on Linux/Mac.
And a GUI MS-SQL-Express-Profiler for Windows.
The funny thing is, once you have written one such tool, writing some more is just a piece of cake and done in under a day.
Also, if you want to get pg_stat_statements to work, you need to alter the config file (postgresql.conf), adding the preload library and tracking setting, restart the server, and then create the extension:
# D:\Programme\LessPortableApps\SQL_PostGreSQL\PostgreSQLPortable\Data\data\postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

CREATE EXTENSION pg_stat_statements;
You can find the documentation for the PostgreSQL protocol here:
https://www.postgresql.org/docs/current/protocol-overview.html
You can see how the data is written into the TCP-buffer by looking at the source code of a postgresql-client, e.g. FrontendMessages of Npgsql on github:
https://github.com/npgsql/npgsql/blob/main/src/Npgsql/Internal/NpgsqlConnector.FrontendMessages.cs
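For reference, the simple-query message that carries SQL text from client to server has this layout (see the 'Message Formats' page of the same documentation):

Query (F):
  Byte1('Q')   identifies the message as a simple query
  Int32        length of message contents in bytes, including self
  String       the query string itself (null-terminated)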
Also, just in case you have a .NET application (with source code) that uses Npgsql, you might want to have a look at Npgsql.OpenTelemetry.
PS:
To configure the logs, see ChartIO Tutorial and TablePlus.
Cheers !
Happy "profiling" !