Is this a postgresql bug? Only one row can not query by equal but can query by like

Is this a postgresql bug? Only one row can not query by equal but can query by like - postgresql

i have a table,only one row in this table can not query by equal query,but can query by like (not incloud %).
postgresql server version:90513
# select id,external_id,username,external_id from users where username = 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
id | external_id | username | external_id
----+-------------+----------+-------------
(0 rows)
# select id,external_id,username,external_id from users where username like 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
id | external_id | username | external_id
--------------------------------------+------------------------------+------------------------------+------------------------------
61ebea19-74f5-4713-9a30-63eb5af8ac8f | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA
(1 row)
if i dump this table and restore it,it will be fixed. by why.
it is a postgresql bug? how can i workaround it. I've met twice.

Do you have an index on this table? If yes, this appears like corrupted index - PostgreSQL uses index in first case, and if the index is corrupt it might return no result.
This is usually bug, either software one or hardware (data loss on power loss, or any memory issues). Try dropping and recreating index, or rebuilding it with https://www.postgresql.org/docs/9.3/sql-reindex.html

Related

Sometime Postgresql Query takes to much time to load Data

I am using PostgreSQL 13.4 and has intermediate level experience with PostgreSQL.
I have a table which stores reading from a device for each customer. For each customer we receive around ~3,000 data in a day and we store them in to our usage_data table as below
We have the proper indexing in place.
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
customer_id | bigint | idx_customer_id | btree
date | timestamp | idx_date | btree
usage | bigint | idx_usage | btree
amount | bigint | idx_amount | btree
Also I have common index on 2 columns which is below.
CREATE INDEX idx_cust_date
ON public.usage_data USING btree
(customer_id ASC NULLS LAST, date ASC NULLS LAST)
;
Problem
Few weird incidents I have observed.
When I tried to get data for 06/02/2022 for a customer it took almost 20 seconds. It's simple query as below
SELECT * FROM usage_data WHERE customer_id =1 AND date = '2022-02-06'
The execution plan
When I execute same query for 15 days then I receive result in 32 milliseconds.
SELECT * FROM usage_data WHERE customer_id =1 AND date > '2022-05-15' AND date <= '2022-05-30'
The execution plan
Tried Solution
I thought it might be the issue of indexing as for this particular date I am facing this issue.
Hence I dropped all the indexing from the table and recreate it.
But the problem didn't resolve.
Solution Worked (Not Recommended)
To solve this I tried another way. Created a new database and restored the old database
I executed the same query in new database and this time it takes 10 milliseconds.
The execution plan
I don't think this is proper solution for production server
Any idea why for any specific data we are facing this issue?
Please let me know if any additional information is required.
Please guide. Thanks

PostgreSQL 13 - Performance Improvement to delete large table data

I am using PostgreSQL 13 and has intermediate level experience with PostgreSQL.
I have a table named tbl_employee. it stores employee details for number of customers.
Below is my table structure, followed by datatype and index access method
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
name | character varying | |
customer_id | bigint | idx_customer_id | btree
is_active | boolean | idx_is_active | btree
is_delete | boolean | idx_is_delete | btree
I want to delete employees for specific customer by customer_id.
In table I have total 18,00,000+ records.
When I execute below query for customer_id 1001 it returns 85,000.
SELECT COUNT(*) FROM tbl_employee WHERE customer_id=1001;
When I perform delete operation using below query for this customer then it takes 2 hours, 45 minutes to delete the records.
DELETE FROM tbl_employee WHERE customer_id=1001
Problem
My concern is that this query should take less than 1 min to delete the records. Is this normal to take such long time or is there any way we can optimise and reduce the execution time?
Below is Explain output of delete query
The values of seq_page_cost = 1 and random_page_cost = 4.
Below are no.of pages occupied by the table "tbl_employee" from pg_class.
Please guide. Thanks

During :
DELETE FROM tbl_employee WHERE customer_id=1001
Is there any other operation accessing this table? If only this SQL accessing this table, I don't think it will take so much time.

In RDBMS systems each SQL statement is also a transaction, unless it's wrapped in BEGIN; and COMMIT; to make multi-statement transactions.
It's possible your multirow DELETE statement is generating a very large transaction that's forcing PostgreSQL to thrash -- to spill its transaction logs from RAM to disk.
You can try repeating this statement until you've deleted all the rows you need to delete:
DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
Doing it this way will keep your transactions smaller, and may avoid the thrashing.

SQL: DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
will not work then.
To make the batch delete smaller, you can try this:
DELETE FROM tbl_employee WHERE ctid IN (SELECT ctid FROM tbl_employee where customer_id=1001 limit 1000)
Until there is nothing to delete.
Here the "ctid" is an internal column of Postgresql Tables. It can locate the rows.

optimize a postgres query that updates a big table [duplicate]

I have two huge tables:
Table "public.tx_input1_new" (100,000,000 rows)
Column | Type | Modifiers
----------------|-----------------------------|----------
blk_hash | character varying(500) |
blk_time | timestamp without time zone |
tx_hash | character varying(500) |
input_tx_hash | character varying(100) |
input_tx_index | smallint |
input_addr | character varying(500) |
input_val | numeric |
Indexes:
"tx_input1_new_h" btree (input_tx_hash, input_tx_index)
Table "public.tx_output1_new" (100,000,000 rows)
Column | Type | Modifiers
--------------+------------------------+-----------
tx_hash | character varying(100) |
output_addr | character varying(500) |
output_index | smallint |
input_val | numeric |
Indexes:
"tx_output1_new_h" btree (tx_hash, output_index)
I want to update table1 by the other table:
UPDATE tx_input1 as i
SET
input_addr = o.output_addr,
input_val = o.output_val
FROM tx_output1 as o
WHERE
i.input_tx_hash = o.tx_hash
AND i.input_tx_index = o.output_index;
Before I execute this SQL command, I already created the index for this two table:
CREATE INDEX tx_input1_new_h ON tx_input1_new (input_tx_hash, input_tx_index);
CREATE INDEX tx_output1_new_h ON tx_output1_new (tx_hash, output_index);
I use EXPLAIN command to see the query plan, but it didn't use the index I created.
It took about 14-15 hours to complete this UPDATE.
What is the problem within it?
How can I shorten the execution time, or tune my database/table?
Thank you.

Since you are joining two large tables and there are no conditions that could filter out rows, the only efficient join strategy will be a hash join, and no index can help with that.
First there will be a sequential scan of one of the tables, from which a hash structure is built, then there will be a sequential scan over the other table, and the hash will be probed for each row found. How could any index help with that?
You can expect such an operation to take a long time, but there are some ways in which you could speed up the operation:
Remove all indexes and constraints on tx_input1 before you begin. Your query is one of the examples where an index does not help at all, but actually hurts performance, because the indexes have to be updated along with the table. Recreate the indexes and constraints after you are done with the UPDATE. Depending on the number of indexes on the table, you can expect a decent to massive performance gain.
Increase the work_mem parameter for this one operation with the SET command as high as you can. The more memory the hash operation can use, the faster it will be. With a table that big you'll probably still end up having temporary files, but you can still expect a decent performance gain.
Increase checkpoint_segments (or max_wal_size from version 9.6 on) to a high value so that there are fewer checkpoints during the UPDATE operation.
Make sure that the table statistics on both tables are accurate, so that PostgreSQL can come up with a good estimate for the number of hash buckets to create.
After the UPDATE, if it affects a big number of rows, you might consider to run VACUUM (FULL) on tx_input1 to get rid of the resulting table bloat. This will lock the table for a longer time, so do it during a maintenance window. It will reduce the size of the table and as a consequence speed up sequential scans.

DB2 table partitioning and delete old records based on condition

I have a table with few million records.
___________________________________________________________
| col1 | col2 | col3 | some_indicator | last_updated_date |
-----------------------------------------------------------
| | | | yes | 2009-06-09.12.2345|
-----------------------------------------------------------
| | | | yes | 2009-07-09.11.6145|
-----------------------------------------------------------
| | | | no | 2009-06-09.12.2345|
-----------------------------------------------------------
I have to delete records which are older than month with some_indicator=no.
Again I have to delete records older than year with some_indicator=yes.This job will run everyday.
Can I use db2 partitioning feature for above requirement?.
How can I partition table using last_updated_date column and above two some_indicator values?
one partition should contain records falling under monthly delete criterion whereas other should contain yearly delete criterion records.
Are there any performance issues associated with table partitioning if this table is being frequently read,upserted?
Any other best practices for above requirement will surely help.

I haven't done much with partitioning (I've mostly worked with DB2 on the iSeries), but from what I understand, you don't generally want to be shuffling things between partitions (ie - making the partition '1 month ago'). I'm not even sure if it's even possible. If it was, you'd have to scan some (potentially large) portion of your table every day, just to move it (select, insert, delete, in a transaction).
Besides which, partitioning is a DB Admin problem, and it sounds like you just have a DB User problem - namely, deleting 'old' records. I'd just do this in a couple of statements:
DELETE FROM myTable
WHERE some_indicator = 'no'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
and
DELETE FROM myTable
WHERE some_indicator = 'yes'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 YEAR, TIME('00:00:00'))
.... and you can pretty much ignore using a transaction, as you want the rows gone.
(as a side note, using 'yes' and 'no' for indicators is terrible. If you're not on a version that has a logical (boolean) type, store character '0' (false) and '1' (true))

Oracle: TABLE ACCESS FULL with Primary key?

There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL server 2008 shows in the same situation Clustered index scan. What is the reason?

select * with no where clause -- means read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, read the file.
What happens when you do a full table scan? You go the th first rowid in the table, then read on through the table to the end.
Which one of these is faster given the table you have above? Full table scan. Why? because it skips having to to go the index, retreive values, then going back to the other to where the table lives and fetching.

To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this, the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan". Or what Oracle calls "TABLE ACCESS FULL"

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse