Why does the id increase by two instead of one on INSERT - PostgreSQL

I have spent many hours trying to understand this and still cannot work out why it is happening.
I have created two tables and linked them with ALTER TABLE:
CREATE TABLE stores (
id SERIAL PRIMARY KEY,
store_name TEXT
-- add more fields if needed
);
CREATE TABLE products (
id SERIAL,
store_id INTEGER NOT NULL,
title TEXT,
image TEXT,
url TEXT UNIQUE,
added_date timestamp without time zone NOT NULL DEFAULT NOW(),
PRIMARY KEY(id, store_id)
);
ALTER TABLE products
ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
REFERENCES stores (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE RESTRICT;
and every time I insert a row into products with
INSERT INTO public.products (store_id, title, image, url)
VALUES ((SELECT id FROM stores WHERE store_name = 'footish'),
'Teva Flatform Universal Pride',
'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840',
'https://www.footish.se/sneakers/teva-flatform-universal-pride-t1116376');
I can see that the id column increases by two instead of one every time I insert, and I would like to know the reason behind that.
I have not been able to figure out why and it would be nice to know! :)

There could be 3 reasons:
You tried to insert a row but the insert failed. Even when an insert fails or the transaction is rolled back, the sequence has already advanced; a number that has been handed out is never given back.
You are using a shared (global) sequence and rows were inserted into another table in the meantime. A sequence shared by several tables advances whenever any of them receives a new row. (With SERIAL each table gets its own sequence, so this only applies if a default was changed to use a shared one.)
The sequence is configured with an increment of 2 (INCREMENT BY 2 in PostgreSQL; other tools call it step size or allocation size). It can be configured however you want.
Overall, the gaps are not important. What matters is that the value increases automatically and that, even after an error or a delete, an ID that was already handed out will never be reused.
If you want a concrete answer, you need to provide the definition of the sequence. You can check it with an SQL CLI such as psql or view it in DBeaver/....
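To check which of these reasons applies, a minimal sketch like the following could help. It assumes a fresh PostgreSQL database (version 10 or later, for the pg_sequences view) and trims products down to the columns it needs; it inspects the sequence behind products.id and shows that a failed insert (here a duplicate url) still consumes a value.

```sql
-- Trimmed copies of the question's tables (assumes a fresh database).
CREATE TABLE stores (
    id         SERIAL PRIMARY KEY,
    store_name TEXT
);
CREATE TABLE products (
    id       SERIAL,
    store_id INTEGER NOT NULL REFERENCES stores (id),
    url      TEXT UNIQUE,
    PRIMARY KEY (id, store_id)
);
INSERT INTO stores (store_name) VALUES ('footish');

-- Which sequence feeds products.id, and how far does it step?
SELECT pg_get_serial_sequence('products', 'id');  -- e.g. public.products_id_seq
SELECT increment_by, last_value
FROM pg_sequences
WHERE sequencename = 'products_id_seq';

-- First insert: takes id 1.
INSERT INTO products (store_id, url)
VALUES ((SELECT id FROM stores WHERE store_name = 'footish'), 'page-a');

-- Second insert fails on the UNIQUE url, but nextval() has already run.
DO $$
BEGIN
    INSERT INTO products (store_id, url)
    VALUES ((SELECT id FROM stores WHERE store_name = 'footish'), 'page-a');
EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'duplicate url rejected';
END
$$;

-- Third insert: takes id 3, leaving a gap at 2.
INSERT INTO products (store_id, url)
VALUES ((SELECT id FROM stores WHERE store_name = 'footish'), 'page-b');
```

If increment_by shows 2, the third reason applies (ALTER SEQUENCE products_id_seq INCREMENT BY 1 would change it back); if it shows 1, look for failed inserts or for something else drawing from the same sequence.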

Related

Would this PostgreSQL model work for long-term use and security?

I'm making a real-time chat app and was stuck figuring out what the DB model should look like. I've made this diagram, but would this work? My issue has more to do with foreign keys.
I know this is a very vague question, but I have been struggling with this model for a while now. This is the first database I'm setting up, so it probably has a load of errors.
Actually you are fairly close, but you over-complicated it a bit. At the conceptual/logical level you have just two entities, Users and Messages, with a many-to-many relationship. At the physical level the Channels table resolves the M:M into the two one-to-many relationships you have described. But viewing it this way reveals a couple of issues. The user attribute is not required in the Messages table, and if it were physically implemented it would require a validation, not easily done, that the user there exists in the Channels table. Furthermore, everything the Message:User relationship provides is already available via the Users:Channels:Messages relationship. A similar argument applies to the Channels column in Users: it is completely resolved by the resolution table. Suggestion: drop user from the Messages table and channels from Users.
Now let's look at the columns of Channels. It looks like you are using boilerplate for created_at and updated_at, but are they necessary? Well, at least for updated_at, no. What can be updated? If either the User or the Message is updated you have a brand-new entry. Yes, it may seem like the same physical row (actually it is not), but the meaning is completely different. What about last message? What is it trying to indicate that the max created_at value for the user does not give you?
I cannot see anything. I guess you could change created_at, but what is the point of tracking when that column was changed? Suggestion: drop last message sent and updated_at (unless required by institutional standards) from the Channels table.
That leaves the Users table itself. Besides Channels, mentioned above, there is the Contacts column. Physically, as an array, it violates 1NF and becomes difficult to manage (as well as making it hard to validate that a contact is in fact a user).
Logically it creates an M:M on User:User, so resolve it the same way as User:Messages: pull it out into another table, say User_Contacts, with two attributes that both reference the Users table. Suggestion: drop contacts from the Users table and create a resolution table.
Unfortunately, I do not have a good ERD diagrammer, so I just provide DDL.
create table users (
user_id integer generated always as identity primary key
, name text
, phone_number text
, last_login timestamptz
, created_at timestamptz
, updated_at timestamptz
) ;
create type message_type as enum ('short', 'long'); -- list all values
create table messages(
msg_id integer generated always as identity primary key
, msg_type message_type
, message text
, created_at timestamptz
, updated_at timestamptz
);
create table channels( -- resolves M:M Users:Messages
user_id integer
, msg_id integer
, created_at timestamptz
, constraint channels_pk
primary key (user_id, msg_id)
, constraint channels_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint channels_2_messages_fk
foreign key (msg_id)
references messages(msg_id )
);
create table user_contacts( -- resolves M:M Users:Users
user_id integer
, contact_id integer
, created_at timestamptz
, constraint user_contacts_pk
primary key (user_id, contact_id)
, constraint user_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint contact_2_user_fk
foreign key (contact_id)
references users(user_id)
, constraint contact_not_me_check check (user_id <> contact_id)
);
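Since last message sent was dropped from Channels, here is a sketch of how that value might be derived on demand instead, as the max created_at per user. It uses simplified copies of the tables above with made-up sample rows; the view name user_last_message is hypothetical.

```sql
-- Simplified copies of users, messages and channels.
CREATE TABLE users (
    user_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name    text
);
CREATE TABLE messages (
    msg_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    message text
);
CREATE TABLE channels (
    user_id    integer REFERENCES users (user_id),
    msg_id     integer REFERENCES messages (msg_id),
    created_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id, msg_id)
);

-- Sample data: ann sends messages 1 and 3, bob sends message 2.
INSERT INTO users (name) VALUES ('ann'), ('bob');
INSERT INTO messages (message) VALUES ('hi'), ('hello'), ('bye');
INSERT INTO channels (user_id, msg_id, created_at) VALUES
    (1, 1, '2021-06-01 10:00+00'),
    (1, 3, '2021-06-01 12:00+00'),
    (2, 2, '2021-06-01 11:00+00');

-- "Last message sent" computed on the fly; users with no messages get NULL.
CREATE VIEW user_last_message AS
SELECT u.user_id, u.name, max(c.created_at) AS last_message_at
FROM users u
LEFT JOIN channels c ON c.user_id = u.user_id
GROUP BY u.user_id, u.name;
```

The (user_id, msg_id) primary key index already has user_id as its leading column, which should help this aggregation.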
Notes:
Do not use text as a PK; use either integer (bigint) or UUID, and generate the values during insert.
A caution on ENUM: in Postgres you can add new values, but you cannot remove one. Depending on the number of values and how often they change, consider creating a lookup/reference table for them instead.
Do not use the data type TIME. It is really not that useful without the date. Simple example: I log in today at 15:00 and you log in tomorrow at 13:00. Now, from the database alone, which of us logged in first?
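If the enum values do change often, a lookup table along these lines is one possible replacement. This is a sketch only: message_types, type_id and type_name are hypothetical names, and it follows the note above by keying on an integer rather than the text value.

```sql
-- Hypothetical lookup table replacing the message_type enum.
CREATE TABLE message_types (
    type_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type_name text NOT NULL UNIQUE
);
INSERT INTO message_types (type_name) VALUES ('short'), ('long');

CREATE TABLE messages (
    msg_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type_id integer NOT NULL REFERENCES message_types (type_id),
    message text
);
INSERT INTO messages (type_id, message)
SELECT type_id, 'hi' FROM message_types WHERE type_name = 'short';

-- Adding a value is a plain INSERT (with the enum: ALTER TYPE message_type ADD VALUE 'system').
INSERT INTO message_types (type_name) VALUES ('system');

-- Removing an unused value is a plain DELETE; enums have no equivalent.
-- Deleting a value that is still referenced (e.g. 'short') is rejected by the FK.
DELETE FROM message_types WHERE type_name = 'system';
```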

Storing duplicate data as a column in Postgres?

In a database project, I have a users table with a computed value, avg_service_rating. There is another table called services with all the services associated with the user and the rating for each service. Is there a computationally light way to maintain avg_service_rating without updating it every time an INSERT is done on the services table? Perhaps something like a generated column, but with a function call instead? Any direct advice or links to resources would be greatly appreciated as well!
CREATE TABLE users (
username VARCHAR PRIMARY KEY,
avg_service_ratings NUMERIC -- is it possible to store some function call for this column?
-- ...
);
CREATE TABLE service (
username VARCHAR NOT NULL REFERENCES users (username),
service_date DATE NOT NULL,
rating INTEGER,
PRIMARY KEY (username, service_date)
);
If the values should be consistent, a generated column won't fit the bill, since it is only recomputed if the row itself is modified.
I see two solutions:
have a trigger on the services table that updates the users table whenever a rating is added or modified. That slows down data modifications, but not your queries.
Turn users into a view. The original users table would be renamed, and it loses the avg_service_rating column, which is computed on the fly by the view.
To make the illusion perfect, create an INSTEAD OF INSERT OR UPDATE OR DELETE trigger on the view that modifies the underlying table. Then your application does not need to be changed.
With this solution you pay a certain price both on SELECT and on data modifications, but the latter price will be lower, since you don't have to modify two tables (and users might receive fewer modifications than services). An added advantage is that you avoid data duplication.
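A minimal sketch of the view approach, assuming PostgreSQL 11 or later (for EXECUTE FUNCTION). The renamed base table users_base and the function name are hypothetical, and only the INSTEAD OF INSERT case is shown; UPDATE and DELETE would follow the same pattern.

```sql
-- users_base is a hypothetical name for the renamed original table.
CREATE TABLE users_base (
    username VARCHAR PRIMARY KEY
);
CREATE TABLE service (
    username     VARCHAR NOT NULL REFERENCES users_base (username),
    service_date DATE NOT NULL,
    rating       INTEGER,
    PRIMARY KEY (username, service_date)
);

-- The view keeps the old name, so queries against users do not change.
CREATE VIEW users AS
SELECT u.username,
       (SELECT avg(s.rating) FROM service s WHERE s.username = u.username)
           AS avg_service_ratings
FROM users_base u;

-- Route inserts on the view to the base table.
CREATE FUNCTION users_view_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO users_base (username) VALUES (NEW.username);
    RETURN NEW;
END
$$;

CREATE TRIGGER users_view_insert
INSTEAD OF INSERT ON users
FOR EACH ROW EXECUTE FUNCTION users_view_insert();

-- Usage: the application still inserts into and reads from users.
INSERT INTO users (username) VALUES ('alice');
INSERT INTO service VALUES ('alice', '2021-01-01', 4), ('alice', '2021-02-01', 5);
SELECT username, avg_service_ratings FROM users;
```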
A generated column would only be useful if the source data is in the same table row.
Otherwise your options are a view (where you could call a function or calculate the value via a subquery), or an AFTER UPDATE OR INSERT trigger on the service table, which updates users.avg_service_ratings. With a trigger, if you get a lot of updates on the service table you'd need to consider possible concurrency issues, but it would mean the figure doesn't need to be calculated every time a row in the users table is accessed.
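A sketch of the trigger variant, assuming PostgreSQL 11 or later; the function and trigger names are hypothetical. It recomputes the average for the affected user on every change rather than maintaining it incrementally, and it also handles DELETE, which the answers above do not mention. The concurrency caveat above still applies.

```sql
CREATE TABLE users (
    username            VARCHAR PRIMARY KEY,
    avg_service_ratings NUMERIC
);
CREATE TABLE service (
    username     VARCHAR NOT NULL REFERENCES users (username),
    service_date DATE NOT NULL,
    rating       INTEGER,
    PRIMARY KEY (username, service_date)
);

-- Recompute the stored average for whichever user(s) a change touches.
CREATE FUNCTION refresh_avg_rating() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE users
        SET avg_service_ratings =
            (SELECT avg(rating) FROM service WHERE service.username = OLD.username)
        WHERE username = OLD.username;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE users
        SET avg_service_ratings =
            (SELECT avg(rating) FROM service WHERE service.username = NEW.username)
        WHERE username = NEW.username;
    END IF;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END
$$;

CREATE TRIGGER service_refresh_avg_rating
AFTER INSERT OR UPDATE OR DELETE ON service
FOR EACH ROW EXECUTE FUNCTION refresh_avg_rating();

-- Usage
INSERT INTO users (username) VALUES ('alice');
INSERT INTO service VALUES ('alice', '2021-01-01', 4), ('alice', '2021-02-01', 5);
DELETE FROM service WHERE username = 'alice' AND service_date = '2021-01-01';
SELECT username, avg_service_ratings FROM users;
```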

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then, when all the needed data is available, reading it into the database that has FK constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE "order" (
id INTEGER NOT NULL,
order_number INTEGER,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order"(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 has been inserted. This will fail the FK constraint, of course. But this might be the order in which I get the data, since it's read from a stream that collects this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints: a mechanism that postpones the FK check until the end of the transaction. I don't think that is possible in my case, however, since these insert statements will be run at random times whenever the data is received.
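For reference, this is roughly what the deferred-constraint mechanism looks like. A minimal sketch; the identity columns are added for convenience and are an assumption, not the question's schema. The FK is checked at COMMIT, so it only helps when the child and the parent arrive inside the same transaction.

```sql
CREATE TABLE "order" (
    id           INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_number INTEGER NOT NULL UNIQUE
);
CREATE TABLE line_item (
    id           INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_number INTEGER REFERENCES "order" (order_number)
                 DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
INSERT INTO line_item (order_number) VALUES (123);  -- no order 123 yet: allowed for now
INSERT INTO "order" (order_number) VALUES (123);    -- parent arrives later in the same transaction
COMMIT;                                             -- the FK is checked here and passes
```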
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
-- Create the parent order first if it does not exist yet
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
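Putting the second answer's pieces together, here is a sketch under a few assumptions: PostgreSQL 11 or later, real order ids are positive, and placeholder_order_id_seq and ensure_order_exists are hypothetical names. It swaps in INSERT ... ON CONFLICT DO NOTHING inside the trigger instead of a separate existence check, which also avoids a race between two line items for the same new order.

```sql
-- Descending sequence: yields -1, -2, ... for placeholder orders.
CREATE SEQUENCE placeholder_order_id_seq INCREMENT BY -1;

CREATE TABLE "order" (
    id           INTEGER NOT NULL PRIMARY KEY,
    order_number INTEGER NOT NULL UNIQUE
);
CREATE TABLE line_item (
    id           INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_number INTEGER NOT NULL REFERENCES "order" (order_number)
);

CREATE FUNCTION ensure_order_exists() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Insert a dummy order with a negative id unless the order is already there.
    INSERT INTO "order" (id, order_number)
    VALUES (nextval('placeholder_order_id_seq'), NEW.order_number)
    ON CONFLICT (order_number) DO NOTHING;
    RETURN NEW;
END
$$;

CREATE TRIGGER line_item_ensure_order
BEFORE INSERT ON line_item
FOR EACH ROW EXECUTE FUNCTION ensure_order_exists();

-- A line item arrives first: a placeholder order with id -1 is created.
INSERT INTO line_item (order_number) VALUES (123);

-- The real order arrives later and takes over the placeholder row.
INSERT INTO "order" (id, order_number) VALUES (1001, 123)
ON CONFLICT (order_number) DO UPDATE SET id = EXCLUDED.id;
```

Because line_item references order_number rather than id, replacing the placeholder id does not touch the line items here; ON UPDATE CASCADE only matters for tables that reference "order".id.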

I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has

I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies and then someone inserts the documents associated with them. The name of an enterprise can change. When it comes to retrieving the documents, I need the name of the enterprise to be the same as it was when the document was registered, not the value it currently has. The solution I thought of was to register the company again on each change, with the same code and the updated name; this way I would get the expected result, but I am not sure it is the best solution. Can someone make a suggestion?
There are several possible solutions and it is hard to determine which one will exactly be the easiest.
Side comment: your question is limited to managing names efficiently, but I would like to comment on the fact that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep its records up to date if anything happens at the OS level. You should consider doing something about that too.
Among the few solutions I considered, the best normalized one is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
Using flags and/or timestamps, or moving the enterprise name to the document table, are appealing solutions, but only at first glance.
In particular, ensuring that a company always has one, and only one, "active" name is no easy thing to do.
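A sketch of how the schema above might be used, with the DDL repeated (minus some ON UPDATE/ON DELETE clauses) so the block runs on its own. The code, names and file path are made-up sample data. Registration needs an explicit transaction because Enterprise.IdName starts at -1, and only the deferred FK lets that stand until COMMIT.

```sql
CREATE TABLE Enterprise (
    IdEnterprise SERIAL PRIMARY KEY,
    Code         VARCHAR(4) UNIQUE,
    IdName       INTEGER DEFAULT -1  -- points at the single active name
);
CREATE TABLE EnterpriseName (
    IdName       SERIAL PRIMARY KEY,
    IdEnterprise INTEGER NOT NULL REFERENCES Enterprise (IdEnterprise) ON DELETE CASCADE,
    Name         TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName (IdName)
    DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document (
    IdDocument SERIAL PRIMARY KEY,
    IdName     INTEGER NOT NULL REFERENCES EnterpriseName (IdName),
    FilePath   TEXT NOT NULL
);

-- Register: IdName is -1 until the name row exists; the deferred FK is checked at COMMIT.
BEGIN;
INSERT INTO Enterprise (Code) VALUES ('ACME');
WITH new_name AS (
    INSERT INTO EnterpriseName (IdEnterprise, Name)
    SELECT IdEnterprise, 'Acme Ltd' FROM Enterprise WHERE Code = 'ACME'
    RETURNING IdEnterprise, IdName
)
UPDATE Enterprise e SET IdName = n.IdName
FROM new_name n WHERE e.IdEnterprise = n.IdEnterprise;
COMMIT;

-- A document records the name row that is active right now.
INSERT INTO Document (IdName, FilePath)
SELECT IdName, '/docs/a.pdf' FROM Enterprise WHERE Code = 'ACME';

-- Rename: add a new name row and make it the active one; old documents are untouched.
WITH new_name AS (
    INSERT INTO EnterpriseName (IdEnterprise, Name)
    SELECT IdEnterprise, 'Acme Group' FROM Enterprise WHERE Code = 'ACME'
    RETURNING IdEnterprise, IdName
)
UPDATE Enterprise e SET IdName = n.IdName
FROM new_name n WHERE e.IdEnterprise = n.IdEnterprise;

-- Each document shows the name it was registered under.
SELECT d.FilePath, n.Name
FROM Document d
JOIN EnterpriseName n ON n.IdName = d.IdName;
```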
Add a date range to your enterprise: valid_from, valid_to. Initialise them to -infinity and +infinity. When you change the name of an enterprise, instead of overwriting it: update the existing row where valid_to = +infinity so that valid_to = now(), and insert the new name with valid_from = now(), valid_to = +infinity.
Add a date field to the document, something like create_date. Then when joining to enterprise you join on ID and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that, you could record the name in a separate table with id, from, to and name, leaving your original table with just the id and code for uniqueness.
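A sketch of the separate-table variant. The table and column names are illustrative, create_date is a timestamptz here rather than a date so it compares cleanly with now()-based bounds, and fixed instants stand in for now() to keep the example deterministic. It compares with >= and < rather than BETWEEN: BETWEEN is inclusive at both ends, so a document stamped exactly at the moment of a rename would match both the old and the new name.

```sql
CREATE TABLE enterprise (
    id   SERIAL PRIMARY KEY,
    code VARCHAR(4) NOT NULL UNIQUE
);
CREATE TABLE enterprise_name (
    enterprise_id INTEGER NOT NULL REFERENCES enterprise (id),
    valid_from    TIMESTAMPTZ NOT NULL DEFAULT '-infinity',
    valid_to      TIMESTAMPTZ NOT NULL DEFAULT 'infinity',
    name          TEXT NOT NULL,
    PRIMARY KEY (enterprise_id, valid_from)
);
CREATE TABLE document (
    id            SERIAL PRIMARY KEY,
    enterprise_id INTEGER NOT NULL REFERENCES enterprise (id),
    create_date   TIMESTAMPTZ NOT NULL DEFAULT now(),
    file_path     TEXT NOT NULL
);

-- Enterprise 1 (first SERIAL value) with its initial, open-ended name.
INSERT INTO enterprise (code) VALUES ('ACME');
INSERT INTO enterprise_name (enterprise_id, name) VALUES (1, 'Acme Ltd');
INSERT INTO document (enterprise_id, create_date, file_path)
VALUES (1, '2021-03-01', '/docs/a.pdf');

-- Rename as of a fixed instant (now() in practice): close the open row, insert the new one.
UPDATE enterprise_name SET valid_to = '2021-06-01'
WHERE enterprise_id = 1 AND valid_to = 'infinity';
INSERT INTO enterprise_name (enterprise_id, valid_from, name)
VALUES (1, '2021-06-01', 'Acme Group');

-- Half-open comparison so each document matches exactly one name row.
SELECT d.file_path, n.name
FROM document d
JOIN enterprise_name n
  ON n.enterprise_id = d.enterprise_id
 AND d.create_date >= n.valid_from
 AND d.create_date <  n.valid_to;
```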

PostgreSQL ON INSERT CASCADE

I've got two tables - one is Product and one is ProductSearchResult.
Whenever someone tries to insert a SearchResult with a product that is not listed in the Product table, the foreign key constraint is violated, hence I get an error.
I would like to know how I could get my database to automatically create that missing Product in the Product table (just the ProductID; all other attributes can be left blank).
Is there such a thing as CASCADE ON INSERT? If there is, I was not able to get it working.
Rules are executed after the INSERT, so because the error occurs beforehand they are useless if you use "DO ALSO". If you use "DO INSTEAD" and add the INSERT command at the end, you end up with endless recursion.
I reckon a Trigger is the way to go - but all my attempts to write one failed.
Any recommendations?
The Table Structure:
CREATE TABLE Product (
ID char(10) PRIMARY KEY,
Title varchar(150),
Manufacturer varchar(80),
Category smallint,
FOREIGN KEY(Category) REFERENCES Category(ID) ON DELETE CASCADE);
CREATE TABLE ProductSearchResult (
SearchTermID smallint NOT NULL,
ProductID char(10) NOT NULL,
DateFirstListed date NOT NULL DEFAULT current_date,
DateLastListed date NOT NULL DEFAULT current_date,
PRIMARY KEY (SearchTermID,ProductID),
FOREIGN KEY (SearchTermID) REFERENCES SearchTerm(ID) ON DELETE CASCADE,
FOREIGN KEY (ProductID) REFERENCES Product ON DELETE CASCADE);
Yes, triggers are the way to go. On PostgreSQL versions before 9.0 you first had to enable the language: as user postgres, run the command createlang with the proper parameters. Since 9.0, PL/pgSQL is installed by default (and createlang was removed in PostgreSQL 10), so that step is no longer needed.
Then you have to
write a function in plpgsql
create a trigger to invoke that function
See example 39-3 for a basic example.
Note that a function body in Postgres is a string with a special quoting mechanism: two dollar signs with an optional word between them serve as the quotes. (The word lets you nest other dollar-quoted strings inside.)
Also note that you can reuse a trigger procedure for multiple tables, as long as they have the columns your procedure uses.
So the function has to
check whether the value of NEW.ProductID exists in the Product table, with a select statement (you ought to be able to use SELECT count(*) ... INTO someint, or SELECT EXISTS(...) INTO somebool)
if not, insert a new row into that table
If you still get stuck, come back here.
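The steps above might look like this. A minimal sketch assuming PostgreSQL 11 or later (EXECUTE FUNCTION); the Category and SearchTerm references are dropped so it runs on its own, and the function and trigger names are hypothetical. The check-then-insert can still race when two sessions insert the same new product at once; INSERT ... ON CONFLICT (ID) DO NOTHING would be the usual way around that.

```sql
-- Simplified tables (Category and SearchTerm references left out).
CREATE TABLE Product (
    ID           char(10) PRIMARY KEY,
    Title        varchar(150),
    Manufacturer varchar(80)
);
CREATE TABLE ProductSearchResult (
    SearchTermID    smallint NOT NULL,
    ProductID       char(10) NOT NULL REFERENCES Product ON DELETE CASCADE,
    DateFirstListed date NOT NULL DEFAULT current_date,
    DateLastListed  date NOT NULL DEFAULT current_date,
    PRIMARY KEY (SearchTermID, ProductID)
);

-- Create a bare Product row when a search result refers to an unknown product.
CREATE FUNCTION create_missing_product() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM Product WHERE ID = NEW.ProductID) THEN
        INSERT INTO Product (ID) VALUES (NEW.ProductID);
    END IF;
    RETURN NEW;  -- let the original row through
END
$$;

CREATE TRIGGER productsearchresult_create_product
BEFORE INSERT ON ProductSearchResult
FOR EACH ROW EXECUTE FUNCTION create_missing_product();

-- Usage: the first insert creates the product, the second reuses it.
INSERT INTO ProductSearchResult (SearchTermID, ProductID) VALUES (1, 'B000000001');
INSERT INTO ProductSearchResult (SearchTermID, ProductID) VALUES (2, 'B000000001');
```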
In any case (rules OR triggers) the insert needs to create a new key (and new values for the attributes) in the products table. In most cases, this implies that a (serial,sequence) surrogate primary key should be used in the products table, and that the "real world" product_id ("product number") should default to NULL, and be degraded to a candidate key.
BTW: a rule can be used; rules are just tricky to implement correctly for N:1 relations (they need the same kind of EXISTS logic as in Bart's answer above).
Maybe cascading on INSERT is not such a good idea after all. What do you want to happen if someone inserts a ProductSearchResult record for a non-existent product? [IMO an FK is always a domain; you cannot extend a domain just by referring to a non-existent value in it; that would make the FK constraint meaningless]