How to run multiple transactions concurrently in PostgreSQL

I want to do some basic experiments on PostgreSQL, for example to generate deadlocks, to create non-repeatable reads, etc. But I could not figure out how to run multiple transactions at once to see such behavior.
Can anyone give me some idea?

Open more than one psql session, one terminal per session.
If you're on Windows you can do that by launching psql via the Start menu multiple times. On other platforms open a couple of new terminals or terminal tabs and start psql in each.
I routinely do this when I'm examining locking and concurrency issues, used in answers like:
https://stackoverflow.com/a/12456645/398670
https://stackoverflow.com/a/12831041/398670
... probably more. A useful trick when you want to set up a race condition is to open a third psql session and BEGIN; LOCK TABLE the_table_to_race_on;. Then run statements in your other sessions; they'll block on the lock. ROLLBACK the transaction holding the table lock and the other sessions will race. It's not perfect, since it doesn't simulate offset-start-time concurrency, but it's still very helpful.
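For example, a minimal deadlock reproduction across two of those psql sessions might look like this (the table and values are made up for illustration):
-- once, in either session:
CREATE TABLE deadlock_demo (id int PRIMARY KEY, val int);
INSERT INTO deadlock_demo VALUES (1, 0), (2, 0);
-- session 1:
BEGIN;
UPDATE deadlock_demo SET val = val + 1 WHERE id = 1;
-- session 2:
BEGIN;
UPDATE deadlock_demo SET val = val + 1 WHERE id = 2;
-- session 1: blocks, waiting for session 2's row lock
UPDATE deadlock_demo SET val = val + 1 WHERE id = 2;
-- session 2: PostgreSQL detects the deadlock and aborts one of the two transactions
UPDATE deadlock_demo SET val = val + 1 WHERE id = 1;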
Other alternatives are outlined in this later answer on a similar topic.

pgbench is probably the best solution in your case. It allows you to test complex database resource contention, deadlocks, and multi-client, multi-threaded access.
To get deadlocks you can simply write a script like this (bench_script.sql):
-- SHARE MODE conflicts with the ROW EXCLUSIVE lock the INSERT needs, so concurrent clients deadlock
BEGIN;
LOCK TABLE schm.tbl IN SHARE MODE;
SELECT count(*) FROM schm.tbl;
INSERT INTO schm.tbl VALUES (1 + 9999*random(), 'test descr');
END;
and pass it to pgbench with the -f parameter.
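For example, an invocation along these lines (the client count, thread count, duration and database name are arbitrary placeholders) runs the script from many concurrent sessions:
pgbench -n -f bench_script.sql -c 10 -j 2 -T 60 your_database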
For more detailed pgbench usage I would recommend reading the official manual for PostgreSQL pgBench
and getting acquainted with my pgbench question that was resolved recently here.

Craig Ringer provides a way to open multiple transactions manually; if you find that inconvenient, you can use pgbench to run multiple transactions at once.

Related

Execute Multiple SQL Insert Statements in parallel in Snowflake

I have a question about how it works when several SQL statements are executed in parallel in Snowflake.
For example, if I execute 10 insert statements on 10 different tables with the same base table - will the tables be loaded in parallel?
Since COPY and INSERT statements only write new partitions, they can run in parallel with other COPY or INSERT statements.
https://docs.snowflake.com/en/sql-reference/transactions.html#transaction-commands-and-functions states that "Most INSERT and COPY statements write only new partitions. Those statements often can run in parallel with other INSERT and COPY operations,..."
I assume that statements cannot run in parallel when they want to insert into the same micro partition. Is that correct or is there another explanation why locks on INSERTs can happen?
I execute 10 insert statements on 10 different tables
with the same base table - will the tables be loaded in parallel?
YES!
Look up multi-table insert in Snowflake: https://docs.snowflake.com/en/sql-reference/sql/insert-multi-table.html
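If you want a single statement instead of ten separate sessions, an unconditional multi-table insert looks roughly like this (table names are illustrative and the targets are assumed to have compatible columns):
INSERT ALL
  INTO table1
  INTO table2
  INTO table3
SELECT * FROM base_table;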
You can execute queries in parallel by just adding a ">" symbol between them.
For example:
The statement below will submit all of the listed queries to Snowflake in parallel. It will not exit, though, if an error is encountered in any of the queries.
snowsql -o log_level=DEBUG -o exit_on_error=true -q "select 1;>select * from SNOWSQLTABLE;>select 2;>select 3;>insert into SNOWSQLTABLE values (1);>select * from SNOWSQLTABLE;>select 5;"
The statement below will cause the queries to run one at a time and exit if any error is found.
snowsql -o log_level=DEBUG -o exit_on_error=true -q "select 1;select * from SNOWSQLTABLE;select 2;select 3;insert into SNOWSQLTABLE values (1);select * from SNOWSQLTABLE;select 5;"
Concurrency in Snowflake is managed with either multiple warehouses (compute resources) or enabling multi-clustering on a warehouse (one virtual warehouse with more than one cluster of servers).
https://docs.snowflake.com/en/user-guide/warehouses-multicluster.html
I'm working with a customer today that does millions of SQL commands a day, they have many different warehouses and most of these warehouses are set to multi-cluster "auto-scale" mode.
Specifically, for your question, it sounds like you have ten sessions connected, running inserts into ten tables via querying a single base table. I'd probably begin my testing of this with one virtual warehouse, configured with a minimum of one cluster and a maximum of three or four, and then run tests and review the results.
The size of the warehouse I would use would mostly be determined by how large the query is (the SELECT portion), you can start with something like a medium and review the performance and query plans of the inserts to see if that is the appropriate size.
When reviewing the plans, check for queuing time to see if perhaps three or four clusters isn't enough, it probably will be fine.
Your query history will also indicate which "cluster_number" your query ran on, within the virtual warehouse. This is one way to check to see how many clusters were running (the maximum cluster_number), another is to view the warehouses tab in the webUI or to execute the "show warehouses;" command.
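For example, a query against the QUERY_HISTORY table function along these lines (the ordering and limit are arbitrary) shows which cluster each recent query ran on:
SELECT query_id, warehouse_name, cluster_number, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC
LIMIT 50;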
Some additional links that might help you:
https://www.snowflake.com/blog/auto-scale-snowflake-major-leap-forward-massively-concurrent-enterprise-applications/
https://community.snowflake.com/s/article/Putting-Snowflake-s-Automatic-Concurrency-Scaling-to-the-Test
https://support.snowflake.net/s/question/0D50Z00009T2QTXSA3/what-is-the-difference-in-scale-out-vs-scale-up-why-is-scale-out-for-concurrency-and-scale-up-is-for-large-queries-

Schema pg_dump failed due to a Lock on a table

I'm running backup restore on a schema every day and get this every now and then:
pg_dump: Error message from server: ERROR: relation not found (OID 86157003)
DETAIL: This can be validly caused by a concurrent delete operation on this object.
pg_dump: The command was: LOCK TABLE myschema.products IN ACCESS SHARE MODE
How can this be avoided? It seems the table was being used at the time, or someone was running something against it. Can I just kill all connections to the DB before restoring, or is there another alternative?
As far as I understand, pg_dump can run even while users are doing something with the table, but that doesn't seem to be the case here.
It is somewhat buried but the answer lies here:
https://www.postgresql.org/docs/current/app-pgdump.html
"
-j njobs
...
To detect this conflict, the pg_dump worker process requests another shared lock using the NOWAIT option. If the worker process is not granted this shared lock, somebody else must have requested an exclusive lock in the meantime and there is no way to continue with the dump, so pg_dump has no choice but to abort the dump.
"
Which is borne out by this in the error message:
"LOCK TABLE myschema.products IN ACCESS SHARE MODE"
ACCESS SHARE will cooperate with all other lock modes except ACCESS EXCLUSIVE. ACCESS EXCLUSIVE is used by DROP TABLE, TRUNCATE, REINDEX, etc. See the Locks documentation for more information. So you need to do the dump at a time when the operations listed for ACCESS EXCLUSIVE are known not to happen, or by blocking/dropping connections.
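One way to check, just before starting the dump, whether anything currently holds an ACCESS EXCLUSIVE lock is a query along these lines (a rough sketch, not from the original answer; replace myschema with the schema you are dumping):
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE l.mode = 'AccessExclusiveLock'
  AND n.nspname = 'myschema';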
Somebody dropped a table between the time pg_dump took an inventory of the tables and the time it tries to dump the table.
This can happen if your application is in the habit of dropping tables all the time.
This is not an answer to your main question, but a caution regarding:
As far as I understand, pg_dump could run even if users are doing something with the table but it doesn't seem to be the case.
It assumes that the application performs every action in a single transaction. I have known of applications which accomplish some tasks using more than one.
I don't know exactly what the tasks were or if it was unavoidable that they use multiple transactions, but dumps could only be trusted when the application was idle or, better yet, when the service was stopped.
For the function that those applications performed, it wasn't a big deal to work around down times or stop services.
I don't know how you'd determine this behaviour without being told by the developers. Just something to consider.

Dropping index concurrently PostgreSQL 9.1

DROP INDEX CONCURRENTLY first appeared in PostgreSQL 9.2, but my server runs 9.1. Unfortunately that operation locks my app for an unpredictable amount of time, which is a very sad fact when doing it in production.
Is there a way to drop an index concurrently?
No, there's no simple workaround - otherwise it's rather less likely that DROP INDEX CONCURRENTLY would've been added in 9.2.
However, you can kill all sessions to force the drop to occur promptly.
What you want to avoid is the drop waiting on a partially acquired exclusive lock that prevents other transactions from proceeding, but doesn't let it proceed either, while it waits for other transactions to finish and release their share locks. The best way to ensure that happens is to kill all concurrent sessions.
So, in one session:
DROP INDEX my_index;
In another session, as a superuser, terminate all other sessions using the following untested query, which you'll need to adapt appropriately and test before use:
-- pg_stat_activity on 9.1 uses the procpid and current_query columns
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE procpid <> (
    SELECT procpid
    FROM pg_stat_activity
    WHERE current_query = 'DROP INDEX my_index;')
AND
procpid <> pg_backend_pid();
Your well-written and well-tested application will immediately reconnect and retry its queries without bothering the user or getting upset, because it knows that transient errors are something it has to cope with, so it runs all its database access in retry loops. If it isn't well written, you'll discover that with a flood of user-visible error messages. If it's really not well written, you'll have to restart it before it gets its head together, but it's rare to see apps that are quite that broken.
This is a heavy-handed approach. You can be rather softer about it by joining against pg_locks and only terminating sessions that actually hold a lock on the relation you're interested in or the index you wish to modify. You get to enjoy writing that query, because my interest in working around the limitations of older database versions is limited.
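Untested, but a starting point for that softer variant might look something like this (again using 9.1's procpid/current_query column names; my_table and my_index are placeholders, so adapt and test before use):
SELECT pg_terminate_backend(a.procpid)
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
WHERE l.relation IN ('my_table'::regclass, 'my_index'::regclass)
  AND a.procpid <> pg_backend_pid()
  AND a.current_query <> 'DROP INDEX my_index;';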

export/import all the information of a table

For a mandatory assignment in a DB2 class I'm asked to write a procedure to "export information about all xxx, delete all xxx and import the information again," where xxx is my table.
This procedure has to be as efficient as possible.
I'm quite stuck here; naively I see two options:
1) write a select * from xxx; drop ...; insert; using Python or something
2) use some export/import utility of DB2
But I could be totally wrong. Suggestions?
What I've noticed is that there are no integrity constraints.
You can do that via "export/load/set integrity". I think it is the best way if you execute it on the server.
If you use Python, you will have to use an ODBC driver or similar to get the data, process it, etc.
If you use Python just to execute the commands, that is fine; in the end it is just a call to the database.
If you execute the process on another machine, network use increases and performance is lower.
Using import is just like an "insert" per row in the file, which uses a lot of transaction log. Instead, the load command puts the data directly into the tablespace and then checks the referential integrity (a faster process).
Finally, if you want to extract the information very fast, you can buy the IBM InfoSphere® Optim™ High Performance Unload for DB2 for Linux, UNIX and Windows
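Put together, the export/load/set integrity route described above might look roughly like this from the DB2 command line (file and table names are placeholders):
-- dump the table contents to a delimited file
EXPORT TO mytable.del OF DEL SELECT * FROM myschema.mytable;
-- reload the file, replacing the existing contents
LOAD FROM mytable.del OF DEL REPLACE INTO myschema.mytable;
-- clear any set-integrity-pending state left by the load
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED;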
I have had a similar task before.
The solution is simple and sweet:
Do a simple export to CSV; once the data has been exported, the main thing is to empty the table with logging disabled and then load the data back into it.
EXPORT TO <FileName>.CSV OF DEL SELECT * FROM <TableName>;
ALTER TABLE <TableName> ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;
LOAD FROM "./<FileName>.CSV" OF DEL INSERT INTO <TableName>;

How to profile plpgsql procedures

I'm trying to improve the performance of a long-running plpgsql stored procedure, but I have no idea what, if any, profiling tools are available. Can anyone offer suggestions for how to go about profiling such a procedure?
Raise some notices from the procedure, including clock_timestamp(), to see where the database spends its time. And make the procedures as simple as possible.
Could you show us an example?
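A minimal sketch of that approach (the function, table, and step names are made up for illustration):
CREATE OR REPLACE FUNCTION my_slow_proc() RETURNS void AS $$
DECLARE
    t_start timestamptz;
BEGIN
    t_start := clock_timestamp();

    PERFORM count(*) FROM big_table;          -- step 1
    RAISE NOTICE 'step 1 done at %, elapsed %',
        clock_timestamp(), clock_timestamp() - t_start;

    PERFORM pg_sleep(1);                      -- step 2 (stand-in for more work)
    RAISE NOTICE 'step 2 done at %, elapsed %',
        clock_timestamp(), clock_timestamp() - t_start;
END;
$$ LANGUAGE plpgsql;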
We are currently looking for a better answer to this question, and have stumbled across this tool:
http://www.openscg.com/2015/02/postgresql-plpgsql-profiler/
Hosted at:
https://bitbucket.org/openscg/plprofiler
It claims to give you what you are looking for, including the total time spent on each line of the function. We have not investigated it further yet, but based on the author's claims, we are optimistic.
To start with, you could turn on logging of all statements into the Postgres logfile. The log will contain the runtime for each statement. This way you can identify the slowest queries and try to optimize them.
But reading your comment to Frank's post I'd guess that the looping is your problem. Try to get rid of the looping and do everything in a single query. One statement that reads a lot of rows is usually more efficient than a lot of statements reading only a few rows.
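As an illustration of that point (table and column names are invented for the example), a per-row loop like the commented-out one below can usually be collapsed into a single set-based statement:
-- instead of looping in plpgsql and updating one row per iteration ...
-- FOR r IN SELECT order_id FROM order_totals LOOP
--     UPDATE order_totals
--     SET total = (SELECT sum(amount) FROM order_lines WHERE order_id = r.order_id)
--     WHERE order_id = r.order_id;
-- END LOOP;
-- ... do the whole thing in one statement:
UPDATE order_totals t
SET    total = s.sum_amount
FROM  (SELECT order_id, sum(amount) AS sum_amount
       FROM   order_lines
       GROUP  BY order_id) s
WHERE  t.order_id = s.order_id;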
Try the pg_stat_statements extension (http://www.postgresql.org/docs/9.2/static/pgstatstatements.html).
It can show the number of calls and the total call time for all statements (including sub-statements within plpgsql procedures).
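A rough sketch of using it: this assumes pg_stat_statements is listed in shared_preload_libraries and that pg_stat_statements.track = 'all' so statements nested inside plpgsql functions are counted; column names such as total_time vary slightly between PostgreSQL versions.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- the ten most expensive statements by cumulative time
SELECT query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;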
The tool to use is https://github.com/bigsql/plprofiler
If you installed PostgreSQL using the PGDG yum repository, installing plprofiler is very straightforward: just run the commands below, but keep in mind it's only available for PostgreSQL versions 11 and higher (replace XX with your version number):
yum install plprofiler_XX-server plprofiler_XX-client plprofiler_XX
Then add the profiler extension to your database:
CREATE EXTENSION plprofiler;
Then to generate a profile report on a plpgsql function, run a command like this:
plprofiler run -U your_username -d your_database --command "SELECT * FROM your_custom_plpgsql_function()" --output profile.html