How to disable one query in Postgres for a specified (partitioned) table - postgresql

SELECT COUNT(*) as count FROM "public"."views";
The TablePlus client runs this query every time I open a partitioned table with many partitions, and it takes too long to finish.
How can I disable the execution of this query on this table?

Related

How to fetch tables that are operated on by a given query in Redshift

Is there a way to fetch the tables that are operated on by a given query? For example, the query below operates on table 'abc':
select * from abc
After a query is executed successfully, can we fetch the tables that the query actually operated on in Redshift?
Harsha - yes, and in a number of ways. The most straightforward is to query the stl_scan system table, which lists all table scans along with the query number that generated each scan. The question for you is how you want to identify the query you just ran: by text? By current session id?
stl_scan will hold lots of data on a busy cluster, so you want to find only those rows you care about. For the current session you can use "where pid = (SELECT pg_backend_pid())" to get the queries run by the current session, but pid isn't in stl_scan, so you will need to join with stl_query, which has both pid and query number. You will also want a "where starttime > getdate() - interval '1 hour'" in your query so you aren't looking through all of history for information about a query you just ran.
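A minimal sketch of that join, assuming you want the tables scanned by the current session in the last hour (the extra join to stv_tbl_perm for the table name is a convenience, not required):
select distinct q.query, s.tbl, trim(p.name) as table_name
from stl_scan s
join stl_query q on q.query = s.query
left join stv_tbl_perm p on p.id = s.tbl  -- resolve table id to a name
where q.pid = pg_backend_pid()            -- only queries from this session
  and q.starttime > getdate() - interval '1 hour';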

PostgreSQL select count(*) takes too long

I have a table in my PostgreSQL database. The table has about 9,100,000 rows. When I execute the query select count(*) from table, the execution time is about 1.5 minutes. Is this normal? And what can I do to decrease this time?
If an estimate of the row count is good enough, you can use a count estimate instead. It is much faster.
https://wiki.postgresql.org/wiki/Count_estimate
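A minimal sketch of the approach from that wiki page, which reads the planner's row estimate from pg_class instead of scanning the table (the table name is illustrative):
select reltuples::bigint as estimate  -- planner's estimate, refreshed by ANALYZE/autovacuum
from pg_class
where oid = 'public.mytable'::regclass;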
Another workaround is to maintain a counter in a separate statistics table, incrementing it every time a new row is added.
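A rough sketch of that counter, kept up to date by a trigger (all names here are illustrative; deletes are handled too so the count stays accurate):
create table row_counts (table_name text primary key, cnt bigint not null);

create or replace function count_rows() returns trigger as $$
begin
  if TG_OP = 'INSERT' then
    update row_counts set cnt = cnt + 1 where table_name = TG_TABLE_NAME;
  elsif TG_OP = 'DELETE' then
    update row_counts set cnt = cnt - 1 where table_name = TG_TABLE_NAME;
  end if;
  return null;  -- return value is ignored for AFTER triggers
end;
$$ language plpgsql;

create trigger track_row_count
after insert or delete on mytable
for each row execute procedure count_rows();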
Also please read https://www.citusdata.com/blog/2016/10/12/count-performance/

Postgres database takes time to get total record count

I am using a Postgres database and I have a daily_txns table which contains more than 20,000,000 records,
so I have created child table partitions inheriting from the master transaction table, such as daily_txns_child_2017_08_01 for daily transactions.
When I search on the indexed columns **mid** and **created_date**, it takes a long time to respond; even getting the total record count takes a long time.
How can I speed up the data fetch? The first query takes me more than 5 minutes, and the second more than 20 minutes.
My queries are below.
select * from daily_txns where created_date <='2017-08-01' and created_date>='2017-07-01' and mid='0446721M0008690' order by created_date desc limit 10;
select count(mid) from daily_txns where created_date <='2017-08-01' and created_date>='2017-07-01' and mid='0446721M0008690';
Or is the time taken normal?
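A hedged sketch of the usual first step for this access pattern: a composite index matching both the filter and the sort. With inheritance-based partitioning, the index must be created on each child table; the index name is illustrative:
create index daily_txns_child_2017_08_01_mid_created_idx
  on daily_txns_child_2017_08_01 (mid, created_date desc);  -- matches WHERE mid = ... ORDER BY created_date DESC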

How can I get the total run time of a query in Redshift, with a query?

I'm in the process of benchmarking some queries in redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query id, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summarization I need!
The GUI dashboard shows the metrics I'm interested in, but the format is difficult to store for later analysis/comparison (in other words, I want to avoid taking screenshots). Is there a good way to rebuild that view with SQL selects?
To add to Alex's answer: the stl_query table has the inconvenience that if the query sat in a queue before running, the queue time is included in the run time, so the run time won't be a very good indicator of performance for the query.
To find the actual runtime of the query, check stl_wlm_query for total_exec_time.
select total_exec_time  -- reported in microseconds
from stl_wlm_query
where query = <query_id>  -- query is a numeric id, not a string
There are some usefuls tools/scripts in https://github.com/awslabs/amazon-redshift-utils
Here is one of those scripts, stripped down to give you query run times in milliseconds. Play with the filters, ordering, etc. to show the results you are looking for:
select userid, label, stl_query.query,
       trim(database) as database,
       trim(querytxt) as qrytext,
       starttime, endtime,
       datediff(milliseconds, starttime, endtime)::numeric(12,2) as run_milliseconds,
       aborted,
       decode(alrt.event,
              'Very selective query filter', 'Filter',
              'Scanned a large number of deleted rows', 'Deleted',
              'Nested Loop Join in the query plan', 'Nested Loop',
              'Distributed a large number of rows across the network', 'Distributed',
              'Broadcasted a large number of rows across the network', 'Broadcast',
              'Missing query planner statistics', 'Stats',
              alrt.event) as event
from stl_query
left outer join (
    select query, trim(split_part(event, ':', 1)) as event
    from stl_alert_event_log
    group by query, trim(split_part(event, ':', 1))
) as alrt on alrt.query = stl_query.query
where userid <> 1
-- and (querytxt like 'SELECT%' or querytxt like 'select%')
-- and database = ''
order by starttime desc
limit 100

Postgres 9.4 detects Deadlock when read-modify-write on single table

We have an application with a simple table
given_entity {
    UUID id;
    TimeStamp due_time;
    TimeStamp process_time;
}
This is a Spring Boot (1.2.5.RELEASE) application that uses spring-data-jpa 1.2.5.RELEASE with hibernate-4.3.10.FINAL as the JPA provider.
We have 5 instances of this application, each of them running a scheduler every 2 seconds that queries the database for rows with a due_time within the last 2 minutes that have not yet been processed:
SELECT * FROM given_entity
WHERE process_time is null and due_time between now() and NOW() - INTERVAL '2 minutes'
FOR UPDATE
The requirement is that each row of the above table gets successfully processed by exactly one of the application instances.
An application instance then processes these rows and updates their process_time field in one transaction.
This may or may not take more than 2 seconds, which is the scheduler interval.
Also, we don't have any index other than the PK index on this table.
A second point worth noting is that these instances might also insert rows into this table, which happens when they are called separately by clients.
Problem: in the logs I see this message from PostgreSQL (rarely, but it happens):
ERROR: deadlock detected
Detail: Process 10625 waits for ShareLock on transaction 25382449; blocked by process 10012.
Process 10012 waits for ShareLock on transaction 25382448; blocked by process 12238.
Process 12238 waits for AccessExclusiveLock on tuple (1371,45) of relation 19118 of database 19113; blocked by process 10625.
Hint: See server log for query details.
Where: while locking tuple (1371,45) in relation "given_entity"
Question:
How does this happen?
I checked the PostgreSQL documentation on locks and searched the internet, and I didn't find anything saying that a deadlock is possible on just one simple table.
I also couldn't reproduce this error in a test.
Process A tries to lock row 1 followed by row 2. Meanwhile, process B tries to lock row 2 then row 1. That's all it takes to trigger a deadlock.
The problem is that the row locks are acquired in an indeterminate order, because the SELECT returns its rows in an indeterminate order. Avoiding this is just a matter of ensuring that all processes agree on an order when locking rows, i.e.:
SELECT * FROM given_entity
WHERE process_time is null and due_time between now() and NOW() - INTERVAL '2 minutes'
ORDER BY id
FOR UPDATE
In Postgres 9.5+, you can simply ignore any row which is locked by another process using FOR UPDATE SKIP LOCKED.
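A sketch of the same query using SKIP LOCKED, so each instance only grabs rows no other instance currently holds (note the BETWEEN bounds written low-to-high):
SELECT * FROM given_entity
WHERE process_time IS NULL
  AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
ORDER BY id
FOR UPDATE SKIP LOCKED;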
This can easily happen.
There are probably several rows that satisfy the condition
due_time BETWEEN now() AND now() - INTERVAL '2 minutes'
so it can easily happen that the SELECT ... FOR UPDATE finds and locks one row and then is blocked locking the next row. Remember – for a deadlock it is not necessary that more than one table is involved, it is enough that more than one lockable resource is involved. In your case, those are two different rows in the given_entity table.
It may even be that the deadlock happens between two of your SELECT ... FOR UPDATE statements.
Since you say that there is none but the primary key index on the table, the query has to perform a sequential scan. In PostgreSQL, there is no fixed order for rows returned from a sequential scan. Rather, if two sequential scans run concurrently, the second one will “piggy-back” on the first and will start scanning the table at the current location of the first sequential scan.
You can check if that is the case by setting the parameter synchronize_seqscans to off and see if the deadlocks vanish. Another option would be to take a SHARE ROW EXCLUSIVE lock on the table before you run the statement.
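A sketch of both options, shown at the session level (synchronize_seqscans can also be set in postgresql.conf):
-- option 1: disable synchronized sequential scans for this session
SET synchronize_seqscans = off;

-- option 2: serialize the competing SELECT ... FOR UPDATE statements
-- with a table-level lock taken in the same transaction
BEGIN;
LOCK TABLE given_entity IN SHARE ROW EXCLUSIVE MODE;
-- ... run the SELECT ... FOR UPDATE and the updates here ...
COMMIT;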
Switch on Hibernate batch updates in your application.properties (with Spring Boot, Hibernate properties like these typically go under the spring.jpa.properties. prefix):
hibernate.jdbc.batch_size=100
hibernate.order_updates=true
hibernate.order_inserts=true
hibernate.jdbc.fetch_size=400