I found the inconsistency data on postgres - postgresql

I have a table of datas on postgres. This table, called it 'table1', have a unique constraint on field 'id'. this table also have 3 other fields, 'write_date', 'state', 'state_detail'.
all this time, i got no problem when accesing and joining these table with another table with field 'id' as the relational field. but, this time, i got a strange result when i querying this table1.
when i run this query (called it Query1):
SELECT id, write_date, state, state_detail
FROM table1
WHERE write_date = '2019-07-30 19:42:49.314' or write_date = '2019-07-30 14:29:06.945'
it gives me 2 rows of datas, with the same id, but different value for the other fields:
id || write_date || state || state_detail
168972 2019-07-30 14:29:06.945 1 80
168972 2019-07-30 19:42:49.314 2 120
BUT, when i run this query (called it Query2):
SELECT id, write_date, state, state_detail
FROM table1
WHERE id = 168972
it gives me just 1 row:
id || write_date || state || state_detail
168972 2019-07-30 19:42:49.314 2 120
How come it gives the different result. i mean, i checked 'table1', it has the unique constraint 'id' as primary key. But, how come this happened?
i have restart the postgres service, and i run those 2 queries again. And it still gives me the same result as above.

This looks like a case of index corruption, specifically on the unique index on the id column. Could you run the following query:
SELECT ctid, id, write_date, state, state_detail FROM table1
WHERE write_date = '2019-07-30 19:42:49.314' or write_date = '2019-07-30 14:29:06.945'
You will likely receive 2 rows back for the id, with two different ctids. The ctid represents the physical location on disk for each row. Presuming you get two rows back, you will need to pick a row and delete the other one in order to "fix" the data. In addition, you'll want to recreate the unique index on the table, in order to prevent this from happening again.
Oh, and don't be surprised if you find other rows in the table like this. Depending on the source of the corruption (bad disks, bad memory, recent crash, upgrade to glibc?), this could be the sign of future trouble to come. I'd probably double-check all my tables for any issues, recreate any unique indexes, and look at the OS level for any signs of disk corruption or I/O errors. And if you aren't on the latest version of Postgres, you should upgrade.

Related

The last updated data shows first in the postgres selet query?

I have simple query that takes some results from User model.
Query 1:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
LIMIT 20 OFFSET 0;
Result:
Then I have done some update on the customer 341683 and again I run the same query that time the result shows different, means the last updated shows first. So postgres is taking the last updated by default or any other things happens here?
Without an order by clause, the database is free to return rows in any order, and will usually just return them in whichever way is fastest. It stands to reason the row you recently updated will be in some cache, and thus returned first.
If you need to rely on the order of the returned rows, you need to explicitly state it, e.g.:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
ORDER BY id -- Here!
LIMIT 20 OFFSET 0

Prevent two threads from selecting same row ibm db2

I have a situation where I have multiple (potentially hundreds) threads repeating the same task (using a java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are kept track in a m:n join table called process_change_rel that keeps track of the process id, record id and status) processing them, then updating back the status.
My question is, how is the best way to prevent two threads from the same process from selecting the same row? Will the below solution (using for update to lock rows ) work? If not, please suggest a working solution
Create table change(
—id , autogenerated pk
—other fields
)
Create table change_process_rel(
—change id (pk of change table)
—process id (pk of process table)
—status)
Query I would use is listed below
Select * from
change c
where c.id not in(select changeid from change_process_rel with cs) for update
Please let me know if this would work
You have to "lock" a row which you are going to process somehow. Such a "locking" should be concurrent of course with minimum conflicts / errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values '1', '2', '3';
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Every such a statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing of the corresponding event in the same transaction.
If the transaction completes successfully, then the row inserted into the change_process_rel table is saved, so, the corresponding id from change may be considered as processed. If the transaction fails, the corresponding "lock" row from change_process_rel disappears, and this row may be processed later by this or another application.
The problem of this method is, that when both tables become large enough, such a sub-select may not work as quick as previously.
Another method is to use Evaluate uncommitted data through lock deferral.
It requires to place the status column into the change table.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which might help with such a sort of algorithms.
If, let's say, status=0 is "not processed", and status<>0 is some processing / processed status, then after setting these DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restart the instance, you may "catch" the next ID for processing with the following statement.
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance:
create index change1 on change(status);
and may be set this table as volatile or collect distribution statistics on this column in addition to regular statistics on table and its indexes periodically.
Note that such a registry variables setting has global effect, and you should keep it in mind...

Insert into subselect slow

I try to fill a table "SAMPLE" that requires ids from three other tables.
The table "SAMPLE" that needs to be filled look holds the following:
id (integer, not null, pk)
code (text, not null)
subsystem_id (integer, fk)
system_id (integer, not null, fk)
manufacturer_id (integer, fk)
The current query looks like this:
insert into SAMPLE(system_id, manufacturer_id, code, subsystem_id)
values ((select id from system where initial = 'P'), (select id from manufacturer where name = 'nameXY'), 'P0001', (select id from subsystem where code = 'NAME PATTERN'));
It is ridiculously slow, inserting 8k rows in around a minute.
I'm not sure if this is a really bad query problem or if my postgres configuration is heavily messed up.
For clarification, more table information:
subsystem:
This table holds fixed values (9) with a basic pattern I can access easily.
system
This table holds fixed values (4) that can be identified using the "initial" attribute
manufacturer
This table holds the name of a manufacturer.
The "SAMPLE" table will be the only connection between those tables so I'm not sure if I can use joins.
I'm pretty sure 8k values should be a gigantic joke to insert for a database so I'm really confused.
My specs:
Win 7 x86_64
8GB RAM
intel i5 3470S (QUAD) 2,9 GHZ
Postgres is v9.3
I didn't see any peak during my query so I suspect something is up with my configuration. If you need information about it, let me know.
Note: It is possible that I have codes or names that can not be found in the subsystem or manufacturer tables. Instead of adding nothing, I want to add a NULL value to the cell then.
8000 inserts/mn is roughly 133 per second or 0.133 ms per statement.
This is to be expected if the INSERTs happen in a loop each statement in its own transaction.
Each transaction commits to disk and waits for the disk to confirm that the data is written in durable storage. This is known to be slow.
Add a transaction around the loop with BEGIN and END and it will run at normal speed.
Ideally you wouldn't even have a loop but a more complex query that does a single INSERT to create all the rows from their sources, if possible.
I could not test it because I have no PostgreSql installed and no database with a similar structure, but may it would be faster to get the insert data from a single statement
INSERT INTO Sample (system_id, manufacturer_id, code, subsystem_id)
SELECT s.id AS system_id,
m.id AS manufacturer_id,
'P0001' AS code,
ss.id AS subsystem_id
FROM system s
JOIN manufacturer m
ON m.name = 'nameXY'
JOIN subsystem ss
ON ss.code = 'NAME PATTERN'
WHERE s.initial = 'P'
I hope this works.

Postgresql get some last rows from table

I have PostgreSQL table:
Username1 SomeBytes1
Username2 SomeBytes1
Username1 SomeBytes1
Username1 SomeBytes1
I need to get some rows from with name Username1 but from the end of the table. For example i need last to rows with Username1
select from my_table where user = Username1 LIMIT 2
Gives me first 2 rows, but i need last two.
How can i select it?
Thank you.
first and last in a table is very arbitrary. In order to have a good predictable result you should always have an order by clause. And if you have that, then getting the last two rows will become easy.
For instance, if you have a primary key or something like an ID (which is populated by a sequence), then you can do:
select * from my_table where user = 'Username1' order by ID desc limit 2.
desc tells the database to sort the rows in reverse order, which means that last will be first.
Does your table have a primary key ? / Can your table be sorted?
Because the notion of 'first' and 'last' implies some sorting of the tuples. If this is the case, you could sort the data the other way around, so that your 'last' entries are on top. Then you can access them with the statement you tried.
To view tail of a table you may use ctid. It is a temporary physical identifier of a record in PostgreSQL.
SELECT * from my_table
WHERE user = Username1
ORDER BY ctid DESC
LIMIT 2

SQLite - a smart way to remove and add new objects

I have a table in my database and I want for each row in my table to have an unique id and to have the rows named sequently.
For example: I have 10 rows, each has an id - starting from 0, ending at 9. When I remove a row from a table, lets say - row number 5, there occurs a "hole". And afterwards I add more data, but the "hole" is still there.
It is important for me to know exact number of rows and to have at every row data in order to access my table arbitrarily.
There is a way in sqlite to do it? Or do I have to manually manage removing and adding of data?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you re-think your design. In my opinion your asking yourself for troubles in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
Sqlite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
If you want to reclaim deleted row ids the VACUUM command or pragma may be what you seek,
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum