How to handle large result sets with psql? - postgresql

I have a query which gives about 14M rows (I was not aware of this). When I use psql to run the query, my Fedora machine froze. Also after the query was done, I could not use Fedora anymore and had to restart my machine. When I redirected standard output to a file, Fedora also froze.
So how should I handle large resultsets with psql?

psql accumulates complete results in client memory by default. This behavior is usual for all libpq based Postgres applications or drivers. The solutions are cursors - then you are fetching only N rows from server. Cursors can be used by psql too. You can change it by setting FETCH_COUNT variable, then it will use cursors with batch retrieval size FETCH_COUNT.
postgres=# \set FETCH_COUNT 1000
postgres=# select * from generate_series(1,100000); -- big query

Related

Monitor postgres 9 queries

There is a postgres 9 with database AAA that is used by my application. I use pgadmin 4 to manage it manually.
I would like to check what queries are executed by this application on database AAA in real time.
I did a research about monitoring options in pgadmin in vain.
Is is possible to do that by using just pgadmin4? Or is it necessary to use another tool (if yes - what is he name of this tool)?
When I point pgAdmin4 at a 9.6 server, I see a dashboard by default which shows every session (done by querying pg_stat_activity). You can then drill down in a session to see the query. If a query last for less time than the monitoring interval, then you might not see it if the sample is taken at the wrong time.
If that isn't acceptable, then you should probably use a logging solution (like log_statemnt='all') or maybe the pg_stat_statements extension, rather than sample-based monitoring. pg_stat_statements doesn't integrate with the dashboard in pgAdmin4, but you can select from the view from an SQL window just like you can run any other SQL. I don't believe pgAdmin4 offers a built-in way to monitor the database server's log files, the way pgAdmin3 did.

Get all database names through JDBC

Is there any way how to get all database names out of a postgres database using JDBC? I can get the current one, but thats not what I am looking for...
I have a jUnit rule, which creates database for each test and after the test it drops it, but in some special cases, when the JVM dies, the drop never happens. So I'd like to check in the rule also existing database and clean some, which are not used any more. What I am looking for is some \l metacommand (but I can't easily ssh to the machine from unit tests...)
What would be also a solution for me would be some database ttl, something like some amqp queues have, but I suppose thats not in postgres either...
Thanks
Just run:
select datname
from pg_database
through JDBC. It returns all databases on the server you are connected to.
If you know how to get the information you want through a psql meta command (e.g. \l) just run psql with the -E switch - all internal SQL queries for the meta commands are then printed to the console.
-l actually uses a query that is a bit more complicated, but to only the the names, the above is sufficient

PostgreSQL How to check if my function is still running

Just a quick question .
I have one heavy function in PostgreSQL 9.3 , how I can check if the function is still running after several hours and how to run a function in background in psql ( my connection is unstable from time to time)
Thanks
For long running functions, it can be useful to have them RAISE LOG or RAISE NOTICE from time to time, indicating progress. If they're looping over millions of records, you might emit a log message every few thousand records.
Some people also (ab)use a SEQUENCE, where they get the nextval of the sequence in their function, and then directly read the sequence value to check progress. This is crude but effective. I prefer logging whenever possible.
To deal with disconnects, run psql on the remote side over ssh rather than connecting to the server directly over the PostgreSQL protocol. As Christian suggests, use screen so the remote psql doesn't get killed when the ssh session dies.
Alternately, you can use the traditional unix command nohup, which is available everywhere:
nohup psql -f the_script.sql </dev/null &
which will run psql in the background, writing all output and errors to a file named nohup.out.
You may also find that if you enable TCP keepalives, you don't lose remote connections anyway.
pg_stat_activity is a good hint to check if your function is still running. Also use screen or tmux on the server to ensure that it will survive a reconnect.
1 - Login to the psql console.
$ psql -U user -d database
2- Issue a \x command to format the results.
3- SELECT * from pg_stat_activity;
4- Scroll until you see your function name on the list. It should have an active status.
5- Check if there are any blocks on the table that your function relies on using:
SELECT k_locks.pid AS pid_blocking, k_activity.usename AS user_blocking, k_activity.query AS query_blocking, locks.pid AS pid_blocked, activity.usename AS user_blocked, activity.query AS query_blocked, to_char(age(now(), activity.query_start), 'HH24h:MIm:SSs') AS blocking_age FROM pg_catalog.pg_locks locks JOIN pg_catalog.pg_stat_activity activity ON locks.pid = activity.pid JOIN pg_catalog.pg_locks k_locks ON locks.locktype = k_locks.locktype and locks.database is not distinct from k_locks.database and locks.relation is not distinct from k_locks.relation and locks.page is not distinct from k_locks.page and locks.tuple is not distinct from k_locks.tuple and locks.virtualxid is not distinct from k_locks.virtualxid and locks.transactionid is not distinct from k_locks.transactionid and locks.classid is not distinct from k_locks.classid and locks.objid is not distinct from k_locks.objid and locks.objsubid is not distinct from k_locks.objsubid and locks.pid <> k_locks.pid JOIN pg_catalog.pg_stat_activity k_activity ON k_locks.pid = k_activity.pid WHERE k_locks.granted and not locks.granted ORDER BY activity.query_start;

Postgres: how to start a procedure right after database start?

I have dozens of unlogged tables, and doc says that an unlogged table is automatically truncated after a crash or unclean shutdown.
Based on that, I need to check some tables after database starts to see if they are "empty" and do something about it.
So in short words, I need to execute a procedure, right after database is started.
How the best way to do it?
PS: I'm running Postgres 9.1 on Ubuntu 12.04 server.
There is no such feature available (at time of writing, latest version was PostgreSQL 9.2). Your only options are:
Start a script from the PostgreSQL init script that polls the database and when the DB is ready locks the tables and populates them;
Modify the startup script to use pg_ctl start -w and invoke your script as soon as pg_ctl returns; this has the same race condition but avoids the need to poll.
Teach your application to run a test whenever it opens a new pooled connection to detect this condition, lock the tables, and populate them; or
Don't use unlogged tables for this task if your application can't cope with them being empty when it opens a new connection
There's been discussion of connect-time hooks on pgsql-hackers but no viable implementation has been posted and merged.
It's possible you could do something like this with PostgreSQL bgworkers, but it'd be a LOT harder than simply polling the DB from a script.
Postgres now has pg_isready for determining if the database is ready.
https://www.postgresql.org/docs/11/app-pg-isready.html

App to monitor PostgreSQL queries in real time?

I'd like to monitor the queries getting sent to my database from an application. To that end, I've found pg_stat_activity, but more often then not, the rows which are returned read " in transaction". I'm either doing something wrong, am not fast enough to see the queries come through, am confused, or all of the above!
Can someone recommend the most idiot-proof way to monitor queries running against PostgreSQL? I'd prefer some sort of easy-to-use UI based solution (example: SQL Server's "Profiler"), but I'm not too choosy.
PgAdmin offers a pretty easy-to-use tool called server monitor
(Tools ->ServerStatus)
With PostgreSQL 8.4 or higher you can use the contrib module pg_stat_statements to gather query execution statistics of the database server.
Run the SQL script of this contrib module pg_stat_statements.sql (on ubuntu it can be found in /usr/share/postgresql/<version>/contrib) in your database and add this sample configuration to your postgresql.conf (requires re-start):
custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top # top,all,none
pg_stat_statements.save = off
To see what queries are executed in real time you might want to just configure the server log to show all queries or queries with a minimum execution time. To do so set the logging configuration parameters log_statement and log_min_duration_statement in your postgresql.conf accordingly.
pg_activity is what we use.
https://github.com/dalibo/pg_activity
It's a great tool with a top-like interface.
You can install and run it on Ubuntu 21.10 with:
sudo apt install pg-activity
pg_activity
If you are using Docker Compose, you can add this line to your docker-compose.yaml file:
command: ["postgres", "-c", "log_statement=all"]
now you can see postgres query logs in docker-compose logs with
docker-compose logs -f
or if you want to see only postgres logs
docker-compose logs -f [postgres-service-name]
https://stackoverflow.com/a/58806511/10053470
I haven't tried it myself unfortunately, but I think that pgFouine can show you some statistics.
Although, it seems it does not show you queries in real time, but rather generates a report of queries afterwards, perhaps it still satisfies your demand?
You can take a look at
http://pgfouine.projects.postgresql.org/