What is a suitable locking technique to prevent insertion of data? [PostgreSQL]

I want to ensure that each worker has exactly one manager:
CREATE TABLE IF NOT EXISTS user_relationships (
object_id SERIAL PRIMARY KEY NOT NULL UNIQUE,
manager_id INT REFERENCES users (object_id) NOT NULL,
worker_id INT REFERENCES users (object_id) NOT NULL,
CHECK (manager_id != worker_id),
UNIQUE (manager_id, worker_id)
);
I have a series of SQL statements running at the Read Committed transaction isolation level:
BEGIN;
SELECT * FROM users WHERE object_id = :manager_id AND acc_type = 'manager' FOR UPDATE;
SELECT * FROM users WHERE object_id = :worker_id AND acc_type = 'worker' FOR UPDATE;
SELECT * FROM user_relationships WHERE worker_id = :worker_id;
INSERT INTO user_relationships (manager_id, worker_id) VALUES (:manager_id, :worker_id);
COMMIT;
I use FOR UPDATE in the first two queries to prevent other concurrent transactions from changing the users' account types mid-transaction.
I could not figure out what kind of "trick" to use for the third query. It should return an empty result to confirm that the worker does not yet belong to any manager.
Adding FOR UPDATE to the third query does not work, because I expect it to return no rows, so there is nothing to lock.
Because of this, I run the risk of concurrent transactions assigning the same worker to different managers.
What can I do to enforce one manager per worker?

You probably need to add a UNIQUE constraint on worker_id:
CREATE TABLE IF NOT EXISTS user_relationships (
object_id SERIAL PRIMARY KEY NOT NULL UNIQUE,
manager_id INT REFERENCES users (object_id) NOT NULL,
worker_id INT REFERENCES users (object_id) UNIQUE NOT NULL,
CHECK (manager_id != worker_id)
);
However, it would be better to add a column manager_id INT REFERENCES users (object_id) NOT NULL to the users table and not use user_relationships at all.
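A minimal sketch of how the UNIQUE constraint on worker_id closes the race, assuming the revised user_relationships table above, the acc_type column from the question, and hypothetical users with object_id 1 (a manager) and 2 (a worker). If two transactions insert the same worker_id concurrently, the second one waits for the first to finish; once the first commits, the second gets a unique violation or, with ON CONFLICT DO NOTHING as below, simply inserts nothing:

```sql
BEGIN;

-- Lock both user rows so their acc_type cannot change mid-transaction,
-- as in the question. The ids 1 and 2 are made-up example values.
SELECT * FROM users WHERE object_id = 1 AND acc_type = 'manager' FOR UPDATE;
SELECT * FROM users WHERE object_id = 2 AND acc_type = 'worker' FOR UPDATE;

-- No separate "does the worker already have a manager?" SELECT is needed:
-- the UNIQUE (worker_id) constraint performs that check atomically.
INSERT INTO user_relationships (manager_id, worker_id)
VALUES (1, 2)
ON CONFLICT (worker_id) DO NOTHING
RETURNING object_id;

COMMIT;
```

An empty result from the INSERT means the worker already has a manager.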

Related

Composite FK referencing atomic PK + non unique attribute

I am trying to create the following tables in Postgres 13.3:
CREATE TABLE IF NOT EXISTS accounts (
account_id Integer PRIMARY KEY NOT NULL
);
CREATE TABLE IF NOT EXISTS users (
user_id Integer PRIMARY KEY NOT NULL,
account_id Integer NOT NULL REFERENCES accounts(account_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS calendars (
calendar_id Integer PRIMARY KEY NOT NULL,
user_id Integer NOT NULL,
account_id Integer NOT NULL,
FOREIGN KEY (user_id, account_id) REFERENCES users(user_id, account_id) ON DELETE CASCADE
);
But I get the following error when creating the calendars table:
ERROR: there is no unique constraint matching given keys for referenced table "users"
Which does not make much sense to me since the foreign key contains the user_id which is the PK of the users table and therefore also has a uniqueness constraint. If I add an explicit uniqueness constraint on the combined user_id and account_id like so:
ALTER TABLE users ADD UNIQUE (user_id, account_id);
Then I am able to create the calendars table. This unique constraint seems unnecessary to me as user_id is already unique. Can someone please explain to me what I am missing here?
Postgres is so smart/dumb that it doesn't expect the designer to do stupid things.
The Postgres designers could have taken different strategies:
Detect the transitivity, and make the FK depend not only on users.user_id, but also on users.account_id -> accounts.account_id. This is doable but costly. It also involves adding multiple dependency records to the catalogs for a single FK constraint. Enforcing the constraint (on UPDATE or DELETE in either of the two referenced tables) could get very complex.
Detect the transitivity, and silently ignore the redundant column reference. This implies lying to the programmer. It would also need to be represented in the catalogs.
Cascading DDL operations would get more complex, too. (Remember: DDL is already very hard w.r.t. concurrency/versioning.)
From the execution/performance point of view: enforcing the constraints currently involves "pseudo triggers" on the referenced table's indexes (except for DEFERRED constraints, which have to be handled specially).
So, IMHO the Postgres developers made the sane choice of refusing to do stupid, complex things.
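For reference, a sketch of the schema from the question in a form that Postgres accepts, with the composite UNIQUE constraint declared inline instead of via ALTER TABLE. The referenced columns of a foreign key must match the column list of a primary key or unique constraint on the referenced table, which is why the primary key on user_id alone is not enough:

```sql
CREATE TABLE IF NOT EXISTS accounts (
    account_id integer PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS users (
    user_id    integer PRIMARY KEY,
    account_id integer NOT NULL REFERENCES accounts (account_id) ON DELETE CASCADE,
    -- Logically redundant, but gives the composite FK below a matching key.
    UNIQUE (user_id, account_id)
);

CREATE TABLE IF NOT EXISTS calendars (
    calendar_id integer PRIMARY KEY,
    user_id     integer NOT NULL,
    account_id  integer NOT NULL,
    FOREIGN KEY (user_id, account_id)
        REFERENCES users (user_id, account_id) ON DELETE CASCADE
);
```

The composite FK then also guarantees that a calendar's account_id is the same as its user's account_id.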

Is there any way in sql to resolve this scenario?

I have a table in database like this
CREATE TABLE assignments
(
id uuid,
owner_id uuid NOT NULL
);
Now I want to check the records against the IDs I receive in a request. If an ID already exists in the table, I will update its owner_id; if a record exists in the table but its ID is not in the request, I have to delete that record.
(In short, it is an update mechanism: I receive multiple IDs to update. A record that is both in the database and in the request gets updated, and a record that is in the table but not in the request gets deleted.)
The update part can be done with a single INSERT statement with the ON CONFLICT clause. Your first task will be creating a PK (or UNIQUE) constraint on the table. Presumably that would be on id.
alter table assignments
add primary key (id);
Then insert your data; the 'on conflict' clause will update the owner_id for any existing id.
insert into assignments(id, owner_id)
values ( <id>,<owner>)
on conflict (id)
do update
set owner_id = excluded.owner_id;
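The statement above covers the update half only. A minimal sketch of the whole synchronization, assuming the primary key on id exists and using two made-up UUID pairs as the request payload: upsert the requested rows, then delete every row whose id was not in the request, all in one transaction. Note that the upsert also inserts ids that are not yet in the table; if that is not wanted, a plain UPDATE would be the alternative.

```sql
BEGIN;

-- Upsert every (id, owner_id) pair from the request (example values).
INSERT INTO assignments (id, owner_id)
VALUES
    ('11111111-1111-1111-1111-111111111111', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'),
    ('22222222-2222-2222-2222-222222222222', 'bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb')
ON CONFLICT (id) DO UPDATE
SET owner_id = EXCLUDED.owner_id;

-- Delete rows whose id was not part of the request.
DELETE FROM assignments
WHERE id NOT IN (
    '11111111-1111-1111-1111-111111111111',
    '22222222-2222-2222-2222-222222222222'
);

COMMIT;
```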

PostgreSQL audit table design with Multiple "User types"

I'm trying to implement an Audit table design in PostgreSQL, where I have different types of user id's that can be audited.
Let's say I have a table named admins (which belong to an organization), and table superadmins (which don't).
CREATE TABLE example.organizations (
id SERIAL UNIQUE,
company_name varchar(50) NOT NULL UNIQUE,
phone varchar(20) NOT NULL check (phone ~ '^[0-9]+$')
);
and an example of a potential admin design
CREATE TABLE example.admins (
id serial primary key,
admin_type varchar not null,
-- ... shared data
-- needed so the composite foreign keys below have a matching key
unique (id, admin_type),
check (admin_type in ('super_admins', 'regular_admins'))
);
CREATE TABLE example.regular_admins (
id integer primary key,
admin_type varchar not null default 'regular_admins',
organization_id integer references example.organizations(id),
-- ... other regular admin fields
foreign key (id, admin_type) references example.admins (id, admin_type),
check (admin_type = 'regular_admins')
);
CREATE TABLE example.super_admins (
id integer primary key,
admin_type varchar not null default 'super_admins',
-- ... other super admin fields
foreign key (id, admin_type) references example.admins (id, admin_type),
check (admin_type = 'super_admins')
);
Now an audit table
CREATE TABLE audit.organizations (
audit_timestamp timestamp not null default now(),
operation text,
admin_id integer primary key,
before jsonb,
after jsonb
);
This calls for inheritance or polymorphism at some level, but I'm curious about how to design it. I've heard that using PostgreSQL's inheritance functionality is not always a great way to go, although I'm finding it to fit this use case.
I'll need to be able to reference a single admin id in the trigger that updates the audit table, and it would be nice to be able to get the admin information when selecting from the audit table without using multiple queries.
Would it be better to use PostgreSQL inheritance or are there other ideas I haven't considered?
I wouldn't say that it calls for inheritance or polymorphism. Admins and superadmins are both types of user, whose only difference is that the former belong to an organization. You can represent this with a single table and a nullable foreign key. No need to overcomplicate matters. Especially if you're using a serial as your primary key type: bad things happen if you confuse admin #2 for superadmin #2.
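A minimal sketch of that single-table design, assuming the schemas example and audit and the example.organizations table from the question already exist. Making admin_id a plain foreign key in the audit table (rather than its primary key) is an assumption of this sketch, so that one admin can have many audit rows:

```sql
CREATE TABLE example.admins (
    id serial PRIMARY KEY,
    -- ... shared data
    -- NULL: superadmin; non-NULL: regular admin of that organization
    organization_id integer REFERENCES example.organizations (id)
);

CREATE TABLE audit.organizations (
    audit_timestamp timestamp NOT NULL DEFAULT now(),
    operation text,
    admin_id integer NOT NULL REFERENCES example.admins (id),
    before jsonb,
    after jsonb
);

-- Audit rows together with the admin's details in a single query.
SELECT a.audit_timestamp,
       a.operation,
       ad.id AS admin_id,
       ad.organization_id,
       (ad.organization_id IS NULL) AS is_superadmin
FROM audit.organizations AS a
JOIN example.admins AS ad ON ad.id = a.admin_id;
```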

PostgreSQL self referential table - how to store parent ID in script?

I've the following table:
DROP SEQUENCE IF EXISTS CATEGORY_SEQ CASCADE;
CREATE SEQUENCE CATEGORY_SEQ START 1;
DROP TABLE IF EXISTS CATEGORY CASCADE;
CREATE TABLE CATEGORY (
ID BIGINT NOT NULL DEFAULT nextval('CATEGORY_SEQ'),
NAME CHARACTER VARYING(255) NOT NULL,
PARENT_ID BIGINT
);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_PK PRIMARY KEY (ID);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_SELF_FK FOREIGN KEY (PARENT_ID) REFERENCES CATEGORY (ID);
Now I need to insert the data. So I start with parent:
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1');
And now I need the ID of the just inserted parent to add children to it:
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_1', <what_goes_here>);
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_2', <what_goes_here>);
How can I get and store the ID of the parent to later use it in the subsequent inserts?
You can use a data modifying CTE with the returning clause:
with parent_cat (parent_id) as (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
returning id
)
INSERT INTO CATEGORY (NAME, PARENT_ID)
VALUES
('CHILDREN_1_1', (select parent_id from parent_cat) ),
('CHILDREN_1_2', (select parent_id from parent_cat) );
The answer is to use RETURNING along with WITH:
WITH inserted AS (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
RETURNING id
) INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES
('CHILD_1_1', (SELECT inserted.id FROM inserted)),
('CHILD_2_1', (SELECT inserted.id FROM inserted));
(tl;dr: go to option 3: INSERT with RETURNING)
Recall that in PostgreSQL there is no "id" concept for tables, just sequences (which are typically but not necessarily used as default values for surrogate primary keys, with the SERIAL pseudo-type).
If you are interested in getting the id of a newly inserted row, there are several ways:
Option 1: CURRVAL(<sequence name>)
For example:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval('persons_id_seq');
The name of the sequence must be known, and it is really arbitrary; in this example we assume that the table persons has an id column created with the SERIAL pseudo-type. To avoid relying on this naming, and for cleaner code, you can use pg_get_serial_sequence instead:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval(pg_get_serial_sequence('persons','id'));
Caveat: currval() only works after an INSERT (which has executed nextval() ), in the same session.
Option 2: LASTVAL();
This is similar to the previous option, except that you don't need to specify the sequence name: it looks at the most recently modified sequence (always inside your session, same caveat as above).
Both CURRVAL and LASTVAL are completely concurrency-safe. Sequences in PG are designed so that different sessions do not interfere with each other, so there is no risk of race conditions (if another session inserts another row between my INSERT and my SELECT, I still get my correct value).
However, they do have a subtle potential problem. If the database has some TRIGGER (or RULE) that, on insertion into the persons table, makes some extra insertions into other tables, then LASTVAL will probably give us the wrong value. The problem can even happen with CURRVAL, if the extra insertions are done into the same persons table (this is much less usual, but the risk still exists).
Option 3: INSERT with RETURNING
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John') RETURNING id;
This is the cleanest, most efficient, and safest way to get the id. It doesn't have any of the risks of the previous options.
Drawbacks? Almost none: you might need to modify the way you call your INSERT statement (in the worst case, perhaps your API or DB layer does not expect an INSERT to return a value); it's not standard SQL (who cares); it's available since PostgreSQL 8.2 (Dec 2006...)
Conclusion: If you can, go for option 3. Otherwise, prefer option 1.
Note: all these methods are useless if you intend to get the last globally inserted id (not necessarily in your session). For this, you must resort to select max(id) from table (of course, this will not read uncommitted inserts from other transactions).
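For the original question (storing the parent id for later statements in a script), one more possibility is a PL/pgSQL DO block with a local variable. A minimal sketch, assuming the CATEGORY table from the question; v_parent_id is a made-up variable name:

```sql
DO $$
DECLARE
    v_parent_id bigint;  -- holds the generated id of the parent row
BEGIN
    -- Insert the parent and capture its id in the variable.
    INSERT INTO category (name) VALUES ('PARENT_1')
    RETURNING id INTO v_parent_id;

    -- Reuse the stored id for as many children as needed.
    INSERT INTO category (name, parent_id) VALUES
        ('CHILDREN_1_1', v_parent_id),
        ('CHILDREN_1_2', v_parent_id);
END
$$;
```

Unlike the single-statement CTE, the variable stays available for any number of later statements inside the block.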

PostgreSQL - Select from one table based on another

I have a table of tweets (#OneToMany) and another table of analyzedtweets (#ManyToOne) with 'n' number of analyzedtweets (one per analyst) for each entry in the tweet table. Essentially, I can have any number of analysts (represented in a table), each one can analyze a tweet just once. To make it a bit more complex, the entries in the tweet table are grouped by process which is represented by yet another table.
My question is, how would I query the analyzedtweet table for the tweet_id in the last entry given a specific process_id and analyst_id and then use that to find the next tweet in the tweet table also given the same process_id and analyst_id? Basically, I want to give the analyst the next tweet that he/she has not yet analyzed within that specific process (run).
Here are my tables:
CREATE TABLE tweet (
id SERIAL PRIMARY KEY,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE analyzedtweet (
id SERIAL PRIMARY KEY,
tweet_id INTEGER NOT NULL REFERENCES tweet(id) ON DELETE CASCADE ON UPDATE CASCADE,
analyst_id INTEGER NOT NULL REFERENCES analyst(id) ON DELETE CASCADE ON UPDATE CASCADE,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE process (
id SERIAL PRIMARY KEY,
...
);
CREATE TABLE analyst (
id SERIAL PRIMARY KEY,
...
);
The only way I know how to do this is in 2 steps:
Given a specific process_id (processId) and analyst_id (analystId) run the following query to give me the last tweet_id analyzed by that analyst in that process.
SELECT tweet_id from analyzedtweet WHERE analyzedtweet.analyst_id = analystId AND analyzedtweet.process_id = processId ORDER BY analyzedtweet.tweet_id DESC LIMIT 1
Take the result of the above query (referred to as latestTweetId) and run the following query:
SELECT * from tweet WHERE tweet.id > latestTweetId AND tweet.process_id = processId ORDER BY tweet.id DESC LIMIT 1
I'm sure there is a much better way to do this with JOIN, I just can't figure out how.
Finally, I am using Hibernate and would like to get the POJO back.
If you are fetching the latest tweet for a given process_id and analyzedtweet_id, use this query:
List<Tweet> t = session.createQuery("select t from tweet t "
+ "join t.process p join t.analyzedtweet a "
+ "where p.id=?1 and a.id=?2 order by t.id desc")
.setParameter(1, process_id)
.setParameter(2, analyzedtweet_id)
.setMaxResults(1).getResultList();
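In plain SQL, the stated goal (the next tweet in a process that a given analyst has not analyzed yet) can be sketched as a single query with NOT EXISTS instead of two round trips. This assumes the tables from the question; the literal ids 1 and 2 are placeholders for processId and analystId. The ordering is ascending so that the lowest unanalyzed id comes first:

```sql
SELECT t.*
FROM tweet AS t
WHERE t.process_id = 1            -- placeholder for processId
  AND NOT EXISTS (
        -- Skip tweets this analyst has already analyzed.
        SELECT 1
        FROM analyzedtweet AS a
        WHERE a.tweet_id = t.id
          AND a.analyst_id = 2    -- placeholder for analystId
      )
ORDER BY t.id ASC
LIMIT 1;
```

Unlike the two-step approach, this also returns tweets with a lower id than the last analyzed one if the analyst skipped them.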