Sometimes a PostgreSQL query takes too much time to load data - postgresql

I am using PostgreSQL 13.4 and have intermediate-level experience with PostgreSQL.
I have a table which stores readings from a device for each customer. For each customer we receive around ~3,000 readings per day, and we store them in our usage_data table as below.
We have the proper indexing in place.
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
customer_id | bigint | idx_customer_id | btree
date | timestamp | idx_date | btree
usage | bigint | idx_usage | btree
amount | bigint | idx_amount | btree
I also have a composite index on two columns, shown below.
CREATE INDEX idx_cust_date
ON public.usage_data USING btree
(customer_id ASC NULLS LAST, date ASC NULLS LAST)
;
Problem
I have observed a few weird incidents.
When I tried to get data for 06/02/2022 for a customer, it took almost 20 seconds. It's a simple query, as below:
SELECT * FROM usage_data WHERE customer_id =1 AND date = '2022-02-06'
The execution plan
When I execute the same kind of query for a 15-day range, I receive the result in 32 milliseconds.
SELECT * FROM usage_data WHERE customer_id =1 AND date > '2022-05-15' AND date <= '2022-05-30'
The execution plan
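(Side note: the plans referenced above can also be captured as text with EXPLAIN; a minimal sketch using the table and filters from the question, where ANALYZE actually executes the query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM usage_data WHERE customer_id = 1 AND date = '2022-02-06';

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM usage_data WHERE customer_id = 1 AND date > '2022-05-15' AND date <= '2022-05-30';
)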
Tried Solution
I thought it might be an indexing issue, since I am facing this problem only for this particular date.
Hence I dropped all the indexes from the table and recreated them.
But the problem wasn't resolved.
Solution That Worked (Not Recommended)
To solve this I tried another way: I created a new database and restored the old database into it.
I executed the same query in the new database, and this time it took 10 milliseconds.
The execution plan
I don't think this is a proper solution for a production server.
Any idea why we are facing this issue only for specific data?
Please let me know if any additional information is required.
Please guide. Thanks

Related

PostgreSQL 13 - Performance Improvement to delete large table data

I am using PostgreSQL 13 and have intermediate-level experience with PostgreSQL.
I have a table named tbl_employee. It stores employee details for a number of customers.
Below is my table structure, with the data type and index access method for each column:
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
name | character varying | |
customer_id | bigint | idx_customer_id | btree
is_active | boolean | idx_is_active | btree
is_delete | boolean | idx_is_delete | btree
I want to delete the employees of a specific customer by customer_id.
In total the table has 1,800,000+ records.
When I execute the query below for customer_id 1001, it returns a count of 85,000.
SELECT COUNT(*) FROM tbl_employee WHERE customer_id=1001;
When I perform the delete operation for this customer using the query below, it takes 2 hours 45 minutes to delete the records.
DELETE FROM tbl_employee WHERE customer_id=1001
Problem
My expectation is that this query should take less than 1 minute to delete the records. Is it normal for it to take this long, or is there a way we can optimise it and reduce the execution time?
Below is the EXPLAIN output of the delete query.
The values of seq_page_cost = 1 and random_page_cost = 4.
Below is the number of pages occupied by the table "tbl_employee", from pg_class.
Please guide. Thanks
During:
DELETE FROM tbl_employee WHERE customer_id=1001
Is there any other operation accessing this table? If only this SQL is accessing the table, I don't think it should take so much time.
In RDBMS systems each SQL statement is also a transaction, unless it's wrapped in BEGIN; and COMMIT; to make multi-statement transactions.
It's possible your multirow DELETE statement is generating a very large transaction that's forcing PostgreSQL to thrash -- to spill its transaction logs from RAM to disk.
You can try repeating this statement until you've deleted all the rows you need to delete:
DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
Doing it this way will keep your transactions smaller, and may avoid the thrashing.
The statement DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
will not work, though, because PostgreSQL's DELETE does not support LIMIT.
To make the batch delete smaller, you can try this:
DELETE FROM tbl_employee WHERE ctid IN (SELECT ctid FROM tbl_employee WHERE customer_id = 1001 LIMIT 1000);
Repeat it until there is nothing left to delete.
Here "ctid" is an internal system column of PostgreSQL tables; it identifies the physical location of each row.

Can Postgres table partitioning by hash boost query performance?

I have an OLTP table on a Postgres 14.2 database that looks something like this:
Column | Type | Nullable |
----------------+-----------------------------+-----------
id | character varying(32) | not null |
user_id | character varying(255) | not null |
slug | character varying(255) | not null |
created_at | timestamp without time zone | not null |
updated_at | timestamp without time zone | not null |
Indexes:
"playground_pkey" PRIMARY KEY, btree (id)
"playground_user_id_idx" btree (user_id)
The database host has 8GB of RAM and 2 CPUs.
I have roughly 500M records in the table which adds up to about 80GB in size.
The table gets about 10K INSERT/h, 30K SELECT/h, and 5K DELETE/h.
The main query run against the table is:
SELECT * FROM playground WHERE user_id = '12345678' and slug = 'some-slug' limit 1;
Users have anywhere between 1 record to a few hundred records.
Thanks to the index on the user_id I generally get decent performance (double-digit milliseconds), but 5%-10% of the queries will take a few hundred milliseconds to maybe a second or two at worst.
My question is this: would partitioning the table by hash(user_id) help me boost lookup performance by taking advantage of partition pruning?
No, that wouldn't improve the speed of the query at all, since there is an index on that attribute. If anything, the increased planning time will slow down the query.
If you want to speed up that query as much as possible, create an index that supports both conditions:
CREATE INDEX ON playground (user_id, slug);
If slug is large, it may be preferable to index a hash:
CREATE INDEX ON playground (user_id, hashtext(slug));
and query like this:
SELECT *
FROM playground
WHERE user_id = '12345678'
AND slug = 'some-slug'
AND hashtext(slug) = hashtext('some-slug')
LIMIT 1;
Of course, partitioning could be a good idea for other reasons, for example to speed up autovacuum or CREATE INDEX.
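A hedged way to confirm the composite index is actually picked up is to look at the plan for the main query (ANALYZE executes it):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM playground
WHERE user_id = '12345678' AND slug = 'some-slug'
LIMIT 1;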

Graph in Grafana using Postgres Datasource with BIGINT column as time

I'm trying to construct a very simple graph showing how many visits I've got in some period of time (for example, for each 5 minutes).
I have Grafana v5.4.0 paired with Postgres v9.6, which is full of data.
My table below:
CREATE TABLE visit (
id serial CONSTRAINT visit_primary_key PRIMARY KEY,
user_credit_id INTEGER NOT NULL REFERENCES user_credit(id),
visit_date bigint NOT NULL,
visit_path varchar(128),
method varchar(8) NOT NULL DEFAULT 'GET'
);
Here's some data in it:
id | user_credit_id | visit_date | visit_path | method
----+----------------+---------------+---------------------------------------------+--------
1 | 1 | 1550094818029 | / | GET
2 | 1 | 1550094949537 | /mortgage/restapi/credit/{userId}/decrement | POST
3 | 1 | 1550094968651 | /mortgage/restapi/credit/{userId}/decrement | POST
4 | 1 | 1550094988557 | /mortgage/restapi/credit/{userId}/decrement | POST
5 | 1 | 1550094990820 | /index/UGiBGp0V | GET
6 | 1 | 1550094990929 | / | GET
7 | 2 | 1550095986310 | / | GET
...
So I tried these 3 variants (actually, dozens of others too) with no success:
Solution A:
SELECT
visit_date as "time",
count(user_credit_id) AS "user_credit_id"
FROM visit
WHERE $__timeFilter(visit_date)
ORDER BY visit_date ASC
No data on graph. Error: pq: invalid input syntax for integer: "2019-02-14T13:16:50Z"
Solution B
SELECT
$__unixEpochFrom(visit_date),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY user_credit_id
Series A
SELECT
$__time(visit_date/1000,10m,previous),
count(user_credit_id) AS "user_credit_id A"
FROM
visit
WHERE
visit_date >= $__unixEpochFrom()::bigint*1000 and
visit_date <= $__unixEpochTo()::bigint*1000
GROUP BY 1
ORDER BY 1
No data on graph. No error.
Solution C:
SELECT
$__timeGroup(visit_date, '1h'),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY time
No data on graph. Error: pq: function pg_catalog.date_part(unknown, bigint) does not exist
Could someone please help me sort out this simple problem? I think the query should be compact, naive and simple, but the Grafana docs demoing its syntax and features confuse me slightly. Thanks in advance!
Use this query, which will work if visit_date is timestamptz:
SELECT
$__timeGroupAlias(visit_date,5m,0),
count(*) AS "count"
FROM visit
WHERE
$__timeFilter(visit_date)
GROUP BY 1
ORDER BY 1
But your visit_date is bigint, so you need to convert it to a timestamp (probably with TO_TIMESTAMP()), or you will need to find some other way to use it as a bigint. Use the query inspector for debugging and you will see the SQL generated by Grafana.
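As a rough sketch of that TO_TIMESTAMP() route (assuming visit_date stores milliseconds since the UNIX epoch, as in the sample data above):
SELECT
  $__timeGroupAlias(to_timestamp(visit_date / 1000), 5m),
  count(*) AS "count"
FROM visit
WHERE
  $__timeFilter(to_timestamp(visit_date / 1000))
GROUP BY 1
ORDER BY 1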
Jan Garaj, thanks a lot! I should admit that your snippet, and even more valuably your advice to switch to SQL debugging, dramatically helped me make my "breakthrough".
So, the resulting query which solved my problem is below:
SELECT
$__unixEpochGroup(visit_date/1000, '5m') AS "time",
count(user_credit_id) AS "Total Visits"
FROM visit
WHERE
'1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval BETWEEN
$__timeFrom()::timestamp
AND
$__timeTo()::timestamp
GROUP BY 1
ORDER BY 1
Several comments to decipher all this Grafana magic:
Grafana has a limited DSL for making configurable graphs; this set of functions is converted into meaningful SQL (this is where seeing the "compiled" SQL helped me a lot, many thanks again).
To make my BIGINT column suitable for the predefined Grafana functions, we simply need to convert it from milliseconds to seconds since the UNIX epoch, i.e. just divide by 1000.
The WHERE clause is not so simple and predictable; Grafana's DSL works differently there, and simple division did not do the trick. I solved it by using other Grafana functions to get the FROM and TO points in time (the period for which the graph should be rendered), but these functions generate the timestamp type while our column is BIGINT. So, thanks to Postgres, we have conversion tools to turn it into a timestamp: '1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval converts a BIGINT value into a Postgres TIMESTAMP, which Grafana handles just fine.
P.S. If you don't mind I've changed my question text to be more precise and detailed.

PostgreSQL+table partitioning: inefficient max() and min()

I have a huge partitioned table stored in a PostgreSQL database. Each child table has an index and a check constraint on its id, e.g. (irrelevant details removed for clarity):
Master table: points
Column | Type | Modifiers
---------------+-----------------------------+------------------------
id | bigint |
creation_time | timestamp without time zone |
the_geom | geometry |
Sub-table points_01
Column | Type | Modifiers
---------------+-----------------------------+-------------------------
id | bigint |
creation_time | timestamp without time zone |
the_geom | geometry |
Indexes:
"points_01_pkey" PRIMARY KEY, btree (id)
"points_01_creation_time_idx" btree (creation_time)
"points_01_the_geom_idx" gist (the_geom) CLUSTER
Check constraints:
"enforce_srid_the_geom" CHECK (srid(the_geom) = 4326)
"id_gps_points_2010_08_22__14_47_04_check"
CHECK (id >= 1000000::bigint AND id <= 2000000::bigint)
Now,
SELECT max(id) FROM points_01
is instant, but:
SELECT max(id) FROM points
which is a master table for points_01 .. points_60 and should take very little time using check constraints, takes more than an hour because the query planner does not utilize the check constraints.
According to the PostgreSQL wiki (last section of this page), this is a known issue that will be fixed in future versions.
Is there a good hack that will make the query planner utilize the check constraints and indices of sub-tables for max() and min() queries?
Thanks,
Adam
I don't know if it will work, but you could try this:
For that session, you could disable all access strategies but indexed ones:
db=> set enable_seqscan = off;
db=> set enable_tidscan = off;
db=> -- your query goes here
This way, only bitmapscan and indexscan would be enabled. PostgreSQL will have no choice but to use indexes to access data on the table.
After running your query, remember to reenable seqscan and tidscan by doing:
db=> set enable_seqscan = on;
db=> set enable_tidscan = on;
Otherwise, those access strategies will be disabled for the session from that point on.
Short answer: No. At this point in time, there's no way to make the Postgres planner understand that some aggregate functions can check the constraints on the child partitions first. It's fairly easy to prove for the specific case of min and max, but for aggregates in general, it's a tough case.
You can always write it as a UNION of several partitions when it just has to be done...
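For example, a hedged sketch of that UNION workaround for max(), assuming the child tables are points_01 through points_60 (each branch can use the child's index on id):
SELECT max(id) FROM (
    SELECT max(id) AS id FROM points_01
    UNION ALL
    SELECT max(id) FROM points_02
    -- ... and so on for the remaining child tables up to points_60
) AS per_child;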
I don't know much about Postgres, but you could try this query (my syntax may not be right due to lack of experience with Postgres queries):
SELECT id FROM points a WHERE id > ALL (SELECT id FROM points x WHERE x.id != a.id)
I'm curious if this works.

Oracle: TABLE ACCESS FULL with Primary key?

There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL Server 2008 shows a Clustered Index Scan in the same situation. What is the reason?
select * with no where clause -- means read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, read the file.
What happens when you do a full table scan? You go to the first rowid in the table, then read on through the table to the end.
Which one of these is faster given the table you have above? The full table scan. Why? Because it skips having to go to the index, retrieve values, and then go back over to where the table lives to fetch the rows.
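By contrast (a hypothetical sketch, not from the original post), a predicate on the full primary key lets the optimizer use the PKtemp index instead of a full scan:
EXPLAIN PLAN FOR
SELECT * FROM temp WHERE IDR = 1 AND IDS = 2 AND DT = DATE '2020-01-01';

SELECT plan_table_output FROM table(dbms_xplan.display('plan_table',null,'serial'));
-- typically an INDEX UNIQUE SCAN on PKTEMP plus TABLE ACCESS BY INDEX ROWID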
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this, the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan", or what Oracle calls "TABLE ACCESS FULL".