is primary key automatically indexed in postgresql? [closed]

I created a table named d with an ID column as the primary key, then inserted some records. After fetching all records, the output is still displayed in the same order in which the records were inserted, not sorted by ID. Why is the output not in sorted order?

PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness. Thus, it is not necessary to create an index explicitly for primary key columns. (See CREATE INDEX for more information.)
Source: PostgreSQL documentation
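You can see the automatically created index for yourself. A minimal sketch, reusing the table name d from the question (the catalog query is standard, the table definition is an assumption):
CREATE TABLE d (id integer PRIMARY KEY);
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'd';
-- typically returns something like:
-- d_pkey | CREATE UNIQUE INDEX d_pkey ON public.d USING btree (id)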

after fetching all records, the output is still displayed in the same order in which the records were inserted
There is NO default "sort" order - even if there is an index on that column (which indeed is the case in Postgres: the primary key is supported by a unique index in the background)
Rows in a relational table are not sorted.
The only (really: the only) way to get a specific order is to use an ORDER BY clause (see the sketch after the list below).
If you do not specify an ORDER BY, the database is free to return the rows in any order it wants - and that order can change at any time.
The order can change for various reasons:
other sessions are running the same statement
the table was updated
the execution plan changes
...
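A minimal sketch illustrating the point (the throwaway table is made up for the example):
CREATE TABLE t (id integer PRIMARY KEY);
INSERT INTO t VALUES (3), (1), (2);
SELECT id FROM t;              -- order unspecified: often insertion order, never guaranteed
SELECT id FROM t ORDER BY id;  -- always 1, 2, 3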

In addition to what the others have said, Postgres does not have a concept of a "clustered index" like Microsoft SQL Server and other databases have. You can cluster a table on an index, but it is a one-time operation (until you run it again) and Postgres will not maintain the clustering of rows upon edits, etc. See the docs.
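A sketch of that one-time operation (the table and index names are hypothetical):
CLUSTER products USING products_pkey;
-- physically rewrites the table in index order once; later INSERTs and UPDATEs will not keep it ordered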
I was running into the same thing you were, where I half expected the rows to be returned in order of the primary key (I didn't insert them out of order like you did, though). They did come back in order upon the initial insert, but editing a record in Postgres seems to move it to the end of the page, and the records quickly became out of order (I updated fields other than the primary key).

Related

Postgres - How to delete a row with its sequence? [duplicate]

This question already has an answer here:
Why in PostgreSQL when you delete a row in a table the id number of the future inserted row is not sequential?
I created a table called Product with a serial ID in Postgres:
CREATE TABLE PRODUCTS (
    PRODUCT_ID serial PRIMARY KEY,
    NAME VARCHAR(100) NOT NULL
);
If I delete a row with
DELETE FROM PRODUCTS WHERE PRODUCT_ID = 2;
the next product inserted will get the id 3, even though PRODUCT_ID 2 was already deleted. I'd like to delete the data together with its sequence value, so that the next insert gets the id 2.
It's generally a bad idea. IDs are usually not the place where you need to fill the gaps.
However, if you only want to do this manually, you can insert the new row specifying the ID value by hand; the sequence will then not be used.
You can reset the sequence with Postgres functions (e.g. setval), but you really shouldn't do it, except maybe in some database recovery cases.
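A sketch of both points, using the PRODUCTS table from the question (the values are made up):
-- insert a row with an explicit ID; the sequence is bypassed
INSERT INTO PRODUCTS (PRODUCT_ID, NAME) VALUES (2, 'replacement');
-- reset the sequence to the current maximum (use with care)
SELECT setval(pg_get_serial_sequence('products', 'product_id'),
              COALESCE(MAX(PRODUCT_ID), 1))
FROM PRODUCTS;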
Just get used to the fact that IDs will not be sequential; that's the norm. Any time spent chasing gaps in the sequence numbering would be better spent on something productive.
If you do want sequential IDs for some non-changing things like categories, just export and recreate the table after you've inserted all the data, omitting the ID column when importing (see the sketch below). But remember that IDs are usually referenced by other tables, so you might have to update your whole database every time you want to nicely renumber your IDs.
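A sketch of that export-and-recreate idea, assuming a hypothetical CATEGORIES table whose data no longer changes and is referenced nowhere else:
-- copy the data without the old IDs; the serial column renumbers from 1
CREATE TABLE CATEGORIES_NEW (ID serial PRIMARY KEY, NAME text);
INSERT INTO CATEGORIES_NEW (NAME) SELECT NAME FROM CATEGORIES ORDER BY ID;
DROP TABLE CATEGORIES;
ALTER TABLE CATEGORIES_NEW RENAME TO CATEGORIES;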

PostgreSQL 9.5 ON CONFLICT DO UPDATE command cannot affect row a second time

I have a table from which I want to UPSERT into another. When I try to launch the query, I get the "cannot affect row a second time" error. So I tried to see if I have some duplicates in my first table regarding the field with the UNIQUE constraint, and I have none. I must be missing something, but since I cannot figure out what (and my query is a bit complex because it includes some JOINs), here is the query; the field with the UNIQUE constraint is "identifiant_immeuble":
with upd(a, b, c, d, e, f, g, h, i, j, k) as (
    select id_parcelle, batimentimmeuble, etatimmeuble, nb_loc_hab_ma, nb_loc_hab_ap, nb_loc_pro,
           dossier.id_dossier, adresse.id_adresse, zapms.geom, 1, batimentimmeuble2
    from public.zapms
    left join geo_pays_gex.dossier on dossier.designation_siea = zapms.id_dossier
    left join geo_pays_gex.adresse on adresse.id_voie = (
            select id_voie from geo_pays_gex.voie
            where (voie.designation = zapms.nom_voie or voie.nom_quartier = zapms.nom_quartier)
              and voie.rivoli = lpad(zapms.rivoli, 4, '0'))
        and adresse.num_voie = zapms.num_voie
        and adresse.insee = zapms.insee_commune::integer
)
insert into geo_pays_gex.bal2 (identifiant_immeuble, batimentimmeuble, id_etat_addr, nb_loc_hab_ma,
    nb_loc_hab_ap, nb_loc_pro, id_dossier, id_adresse, geom, raccordement, batimentimmeuble2)
select a, b, c, d, e, f, g, h, i, j, k from upd
on conflict (identifiant_immeuble) do update
set batimentimmeuble = excluded.batimentimmeuble,
    id_etat_addr = excluded.id_etat_addr,
    nb_loc_hab_ma = excluded.nb_loc_hab_ma,
    nb_loc_hab_ap = excluded.nb_loc_hab_ap,
    nb_loc_pro = excluded.nb_loc_pro,
    id_dossier = excluded.id_dossier,
    id_adresse = excluded.id_adresse,
    geom = excluded.geom,
    raccordement = 1,
    batimentimmeuble2 = excluded.batimentimmeuble2;
As you can see, I use several intermediary tables in this query: one storing the streets' names (voie), one related to it storing the addresses (adresse, basically numbers related through a foreign key to the streets' names table), and another storing some other data related to the projects' names (dossier).
I don't know what other information I could give to help find an answer; I guess it is better I do not share the actual content of my tables, since it may touch on some privacy regulations.
Thanks for your attention.
EDIT: I found a workaround by deleting from the bal2 table the entries present in the zapms table, like so:
delete from geo_pays_gex.bal2 where bal2.identifiant_immeuble in (select id_parcelle from zapms);
It is not entirely satisfying though, since I would have preferred to keep track of the data creator and the date of creation, as well as the fact that the data has been modified (I have some fields to store this information), and here I simply erase all this history... And I have another table with the primary key of the bal2 table as a foreign key. I am still creating the DB, so I can afford to truncate this table, but in production it wouldn't be possible since I would lose some data.

Alter a SELECT Query to Limit

I'm working on a table with 1M+ rows. The software that inserts the data sometimes tries to select all rows, and if it does that, it crashes.
I'm not able to modify the software, so I'm trying to implement a fix on the PostgreSQL side.
I want PostgreSQL to limit the results of SELECT queries coming from a particular user to 1.
I tried to implement a RULE but haven't been able to do it with success. Any suggestions are welcome.
You could rename the table and create a view with the name of the table (selecting from the renamed table).
Then you can include a LIMIT clause in the view definition, for example:
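A sketch of that idea; the table name big_table and the cap of 10000 rows are assumptions:
ALTER TABLE big_table RENAME TO big_table_data;
CREATE VIEW big_table AS
    SELECT * FROM big_table_data
    LIMIT 10000;  -- every SELECT through the view is now capped
Note that a view containing a LIMIT is not automatically updatable, so the software's INSERTs would need to target the renamed table (or go through an INSTEAD OF trigger).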
There is a chance you need an index. Let me give you a few scenarios (a sketch follows the list):
There is a unique constraint on one of the fields but no corresponding index. That way, when you insert a record, PostgreSQL has to scan the table to see if there is an existing record with the same value in that field.
Your software mimics a unique field constraint: before inserting a new record, it scans the table for a record with the same value in one of the fields to check whether such a record already exists. An index on the right field would definitely help.
Your software wants to compute the next "id" value. In this case it runs SELECT MAX(id) to find the next available value, so "id" needs an index.
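A sketch for the last two scenarios (table and column names are made up):
-- scenario 2: speed up the software's duplicate check on some field
CREATE INDEX records_some_field_idx ON records (some_field);
-- scenario 3: with an index on id, SELECT MAX(id) becomes a cheap backward index scan
CREATE INDEX records_id_idx ON records (id);
SELECT MAX(id) FROM records;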
Try to find out if indexing one of the table fields helps. You can also trace and analyze the queries submitted to the server and see whether they would benefit from an index. You can enable query logging as described in How to log PostgreSQL queries?
Another guess is that your software buffers all records before processing them; reading 1M records into memory may crash it. Limiting the fetch size (e.g. if your software uses JDBC, you could add the defaultRowFetchSize parameter to the connection string) may help, though I realize you may not have the means to change how the existing software fetches data from the DB.

would postgres really update the page file when fields are all equal before and after an update?

I am working on a little website crawler program. I use PostgreSQL to store the data and use a statement like this to update it:
INSERT INTO topic (......) VALUES (......)
ON CONFLICT DO UPDATE /* update all fields here */
The question is: if all fields are really equal before and after the update, would PostgreSQL actually perform the update?
Postgres (like nearly all other DBMS) will not check whether the target values are different from the original ones. So the answer is: yes, it will update the row even if the values are identical.
However, you can easily prevent the "empty" update in this case by including a where clause:
INSERT INTO topic (......)
VALUES (......)
ON CONFLICT (...)
DO UPDATE
set ... -- update all columns
WHERE topic IS DISTINCT FROM excluded;
The where clause prevents updating a row that is identical to the one being inserted. To make that work correctly, your insert has to list all columns of the target table; otherwise the condition topic IS DISTINCT FROM excluded will always be true, because the excluded row has fewer columns than the topic row and is thus "distinct" from it.
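A minimal concrete sketch, assuming a hypothetical table topic(id primary key, title) where the insert lists every column:
INSERT INTO topic (id, title)
VALUES (1, 'hello')
ON CONFLICT (id) DO UPDATE
    SET title = excluded.title
WHERE topic IS DISTINCT FROM excluded;
-- running this twice updates nothing the second time; a changed title triggers the update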
Adding a check for modified values has been discussed multiple times on the mailing list and has always been rejected. The main reason is that it doesn't make sense to impose the overhead of checking for changes on every statement just to cope with a few badly written ones.

Postgresql check if exists vs unique constraint in table

I am inserting a huge number of records (a million) containing a phone number and a status into a PostgreSQL database from Hibernate. I read the records from a file, process each one, and insert them one at a time. But before the insert I need to check whether this combination of phone number and status already exists in the table.
It seems to me that the fastest way would be a query limited to 1, or an EXISTS query, but another suggestion I got from a colleague is to add a unique constraint on the phone number and status fields and, in case the unique key rule is violated, just catch the exception in Hibernate.
Any thoughts on what's the fastest and most reliable method?
It depends on whether there are only these columns or also some others, for example a date. If you don't care which record stays in the database (for example you need the latest combination of number, status and date), then create a unique constraint and recover from the exception thrown when inserting duplicates.
You may also insert everything, duplicates included, with a surrogate primary key (id), then delete all duplicates but the one you'd like to keep (GROUP BY...), and then create the unique constraint (see the sketch below).
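A sketch of the deduplicate-then-constrain option, assuming a hypothetical table records(id, phone_number, status):
-- keep the lowest id per (phone_number, status) combination
DELETE FROM records p
USING records q
WHERE p.phone_number = q.phone_number
  AND p.status = q.status
  AND p.id > q.id;
-- then enforce uniqueness going forward
ALTER TABLE records ADD CONSTRAINT records_phone_status_uniq UNIQUE (phone_number, status);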
The last option depends on the size of the records: if there are only 1M of them, you may filter them in the application layer and then save them.
All of these depend on how many duplicates there are. If there are just a few, use option one; if each record may appear 10 times, maybe the last option is best (depending on RAM, since you only hold the currently best record for each phone number and status).