At my work, I needed to build a new join table in a postgresql database that involved doing a lot of computations on two existing tables. The process was supposed to take a long time so I set it up to run over the weekend before I left on Friday. Now, I want to check to see if the query finished or not.
How can I check if an INSERT command has finished yet while not being at the computer I ran it on? (No, I don't know how many rows it was suppose to add.)
Select * from pg_stat_activity where state not ilike 'idle%' and query ilike 'insert%'
This will return all non-idle sessions where the query begins with insert, if your query does not show in this list then it is no longer running.
pg_stat_activity doc
You can have a look at the table pg_stat_activity which contains all database connections including active query, owner etc.
At https://gist.github.com/rgreenjr/3637525 there is a copy-able example how such a query could look like.
Related
I would like to query data from a table, and if a row is locked, show it as a different color. Is this possible using postgresql's locking for update?
e.g.
select
*,
(select from pg_x -- link row somehow )
from table
thank you
There is no good way to do that. The row locks are stored in the row (system column xmax), but this attribute serves other purposes too, and the flags that determine if it is indeed a row lock or perhaps a rolled back update are not exposed via SQL.
There are only unpleasant alternatives:
Use the pageinspect contrib module to examine those flags. That would be a second scan of the table, and such a query doesn't respect MVCC visibility.
Run a second query:
SELECT * FROM atable
FOR UPDATE SKIP LOCKED;
That would lock all rows in the table and be very bad for concurrency.
Besides, that information would be pretty useless for the user. In a well-written application, row locks are only held for split seconds, so the information would be outdated by the time it reaches the user.
Is there a way to fetch the tables that are operated by a given query? For example, the below query operates on table 'abc':
select * from abc
After a query is executed successfully, can we fetch the tables that the query actually operated on in redshift?
Harsha - yes and in a number of ways. The most straight forward is to query the stl_scan system table which lists all table scans and the query number that generated the scan. The question for you is how do you want to identify the query you just ran? By text? By current session id? Stl_scan will have lots of data in it for a busy cluster so you want to find only those rows you care about. If current session you can use "where pid = (SELECT pg_backend_pid())" to get the query run by the current session but pid isn't in stl_scan so you will need to join with stl_query which has both pid and query number. You will also want to have a "where starttime > getdate() - interval '1 hour'" in your query so you aren't looking through all of history for information about a query you just ran.
I'm running queries on a Redshift cluster using DataGrip that take upwards of 10 hours to run and unfortunately these often fail. Alas, DataGrip doesn't maintain a connection to the database long enough for me to get to see the error message with which the queries fail.
Is there a way of retrieving these error messages later, e.g. using internal Redshift tables? Alternatively, is there are a way to make DataGrip maintain the connection for long enough?
Yes, you Can!
Query stl_connection_log table to find out pid by looking at the recordtime column when your connection was initiated and also dbname, username and duration column helps to narrow down.
select * from stl_connection_log order by recordtime desc limit 100
If you can find the pid, you can query stl_query table to find out if are looking at right query.
select * from stl_query where pid='XXXX' limit 100
Then, check the stl_error table for your pid. This will tell you the error you are looking for.
select * from stl_error where pid='XXXX' limit 100
If I’ve made a bad assumption please comment and I’ll refocus my answer.
I'm in the process of benchmarking some queries in redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query id, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summarization I need!
The gui dashboard shows the metrics I'm interested in, but the format is difficult to store for later analysis/comparison (in other words, I want to avoid taking screenshots). Is there a good way to rebuild that view with sql selects?
To add to Alex answer, I want to comment that stl_query table has the inconvenience that if the query was in a queue before the runtime then the queue time will be included in the run time and therefore the runtime won't be a very good indicator of performance for the query.
To understand the actual runtime of the query, check on stl_wlm_query for the total_exec_time.
select total_exec_time
from stl_wlm_query
where query='query_id'
There are some usefuls tools/scripts in https://github.com/awslabs/amazon-redshift-utils
Here is one of said scripts stripped out to give you query run times in milliseconds. Play with the filters, ordering etc to show the results you are looking for:
select userid, label, stl_query.query, trim(database) as database, trim(querytxt) as qrytext, starttime, endtime, datediff(milliseconds, starttime,endtime)::numeric(12,2) as run_milliseconds,
aborted, decode(alrt.event,'Very selective query filter','Filter','Scanned a large number of deleted rows','Deleted','Nested Loop Join in the query plan','Nested Loop','Distributed a large number of rows across the network','Distributed','Broadcasted a large number of rows across the network','Broadcast','Missing query planner statistics','Stats',alrt.event) as event
from stl_query
left outer join ( select query, trim(split_part(event,':',1)) as event from STL_ALERT_EVENT_LOG group by query, trim(split_part(event,':',1)) ) as alrt on alrt.query = stl_query.query
where userid <> 1
-- and (querytxt like 'SELECT%' or querytxt like 'select%' )
-- and database = ''
order by starttime desc
limit 100
Is there some means of querying the system tables to establish which tables are using what locking schemes? I took a look at the columns in sysobjects but nothing jumped out.
aargh, just being an idiot:
SELECT name, lockscheme(name)
FROM sysobjects
WHERE type="U"
ORDER BY name
take a look at the syslockinfo and syslocks system tables
you can also run the sp_lock proc