Is there any way in SQL to resolve this scenario? - PostgreSQL

I have a table in my database like this:
CREATE TABLE assignments
(
id uuid,
owner_id uuid NOT NULL
);
Now I want to check whether the IDs I receive in a request already exist in the table. If an ID exists, I will update that record's owner_id; if a record exists in the table but its ID is not in the request, I have to delete that record.
(In short, it's an update mechanism: I receive multiple IDs to update in the table. If a record exists both in the database and in the request, I update it in the database; if a record exists in the table but not in the request, I delete it from the database.)

This can be done with a single INSERT statement using the ON CONFLICT clause. Your first task will be creating a PRIMARY KEY (or UNIQUE) constraint on the table. Presumably that would be id.
alter table assignments
add primary key (id);
Then insert your data; the ON CONFLICT clause will update the owner_id for any existing id.
insert into assignments (id, owner_id)
values (<id>, <owner>)
on conflict (id)
do update
set owner_id = excluded.owner_id;
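The question also asks for deleting rows whose id did not arrive in the request. As a sketch of one way to cover both cases in a single statement, the request pairs can be expanded into a VALUES list inside a data-modifying CTE (the UUIDs below are placeholders; note the insert and the delete touch disjoint sets of ids, which PostgreSQL requires when one statement modifies the same table twice):
with incoming (id, owner_id) as (
values
-- one row per (id, owner_id) pair received in the request:
('11111111-1111-1111-1111-111111111111'::uuid,
 '22222222-2222-2222-2222-222222222222'::uuid)
),
upserted as (
insert into assignments (id, owner_id)
select id, owner_id from incoming
on conflict (id) do update
set owner_id = excluded.owner_id
)
delete from assignments
where id not in (select id from incoming);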

Related

Avoid scan on attach partition with check constraint

I am recreating an existing table as a partitioned table in PostgreSQL 11.
After some research, I am approaching it using the following procedure so this can be done online while writes are still happening on the table:
add a check constraint on the existing table, first as not valid and then validating
drop the existing primary key
rename the existing table
create the partitioned table under the prior table name
attach the existing table as a partition to the new partitioned table
My expectation was that the last step would be relatively fast, but I don't really have a number for this. In my testing, it's taking about 30s. I wonder if my expectations are incorrect or if I'm doing something wrong with the constraint or anything else.
Here's a simplified version of the DDL.
First, the inserted_at column is declared like this:
inserted_at timestamp without time zone not null
I want to keep an index on the ID for existing queries and writes even after I drop the PK, so I create one:
create unique index concurrently my_events_temp_id_index on my_events (id);
The check constraint is created in one transaction:
alter table my_events add constraint my_events_2022_07_events_check
check (inserted_at >= '2018-01-01' and inserted_at < '2022-08-01')
not valid;
In the next transaction, it's validated (and the validation is successful):
alter table my_events validate constraint my_events_2022_07_events_check;
Then before creating the partitioned table, I drop the primary key of the existing table:
alter table my_events drop constraint my_events_pkey cascade;
Finally, in its own transaction, the partitioned table is created:
alter table my_events rename to my_events_2022_07;
create table my_events (
id uuid not null,
... other columns,
inserted_at timestamp without time zone not null,
primary key (id, inserted_at)
) partition by range (inserted_at);
alter table my_events attach partition my_events_2022_07
for values from ('2018-01-01') to ('2022-08-01');
That last transaction blocks inserts and takes about 30s for the 12M rows in my test database.
Edit
I wanted to add that in response to the attach I see this:
INFO: partition constraint for table "my_events_2022_07" is implied by existing constraints
That makes me think I'm doing this right.
The problem is not the check constraint, it is the primary key.
If you make the original unique index include both columns:
create unique index concurrently my_events_temp_id_index on my_events (id, inserted_at);
And if you make the new table have a unique index rather than a primary key on those two columns, then the attach is nearly instantaneous.
These seem to me like unneeded restrictions in PostgreSQL: both that the unique index on one column can't be used to imply uniqueness on both columns, and that the unique index on both columns cannot be used to imply the primary key (nor even a unique constraint, but only a unique index).
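For illustration, the unique-index variant of the last transaction might look like this. This is a sketch using the same names as the question; note that in PostgreSQL 11 an index on a partitioned table cannot be created CONCURRENTLY, and the matching unique index already created on the partition is attached rather than rebuilt:
create table my_events (
id uuid not null,
... other columns,
inserted_at timestamp without time zone not null
) partition by range (inserted_at);
create unique index my_events_id_inserted_at_idx
on my_events (id, inserted_at);
alter table my_events attach partition my_events_2022_07
for values from ('2018-01-01') to ('2022-08-01');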

PostgreSQL self referential table - how to store parent ID in script?

I have the following table:
DROP SEQUENCE IF EXISTS CATEGORY_SEQ CASCADE;
CREATE SEQUENCE CATEGORY_SEQ START 1;
DROP TABLE IF EXISTS CATEGORY CASCADE;
CREATE TABLE CATEGORY (
ID BIGINT NOT NULL DEFAULT nextval('CATEGORY_SEQ'),
NAME CHARACTER VARYING(255) NOT NULL,
PARENT_ID BIGINT
);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_PK PRIMARY KEY (ID);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_SELF_FK FOREIGN KEY (PARENT_ID) REFERENCES CATEGORY (ID);
Now I need to insert the data. So I start with parent:
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1');
And now I need the ID of the just inserted parent to add children to it:
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_1', <what_goes_here>);
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_2', <what_goes_here>);
How can I get and store the ID of the parent to later use it in the subsequent inserts?
You can use a data modifying CTE with the returning clause:
with parent_cat (parent_id) as (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
returning id
)
INSERT INTO CATEGORY (NAME, PARENT_ID)
VALUES
('CHILDREN_1_1', (select parent_id from parent_cat) ),
('CHILDREN_1_2', (select parent_id from parent_cat) );
The answer is to use RETURNING along with WITH
WITH inserted AS (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
RETURNING id
) INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES
('CHILD_1_1', (SELECT inserted.id FROM inserted)),
('CHILD_2_1', (SELECT inserted.id FROM inserted));
(tl;dr: go to option 3: INSERT with RETURNING)
Recall that in PostgreSQL there is no "id" concept for tables, just sequences (which are typically, but not necessarily, used as default values for surrogate primary keys, with the SERIAL pseudo-type).
If you are interested in getting the id of a newly inserted row, there are several ways:
Option 1: CURRVAL(<sequence name>);.
For example:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval('persons_id_seq');
The name of the sequence must be known; it's really arbitrary. In this example we assume that the table persons has an id column created with the SERIAL pseudo-type. To avoid relying on this, and to feel cleaner, you can instead use pg_get_serial_sequence:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval(pg_get_serial_sequence('persons','id'));
Caveat: currval() only works after an INSERT (which has executed nextval()), in the same session.
Option 2: LASTVAL();
This is similar to the previous, only that you don't need to specify the sequence number: it looks for the most recent modified sequence (always inside your session, same caveat as above).
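For example, mirroring the snippet above:
INSERT INTO persons (lastname, firstname) VALUES ('Smith', 'John');
SELECT lastval();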
Both CURRVAL and LASTVAL are completely concurrency-safe. Sequences in PG are designed so that different sessions do not interfere, so there is no risk of race conditions (if another session inserts another row between my INSERT and my SELECT, I still get my correct value).
However, they do have a subtle potential problem. If the database has some TRIGGER (or RULE) that, on insertion into the persons table, makes some extra insertions in other tables, then LASTVAL will probably give us the wrong value. The problem can even happen with CURRVAL, if the extra insertions are done into the same persons table (this is much less usual, but the risk still exists).
Option 3: INSERT with RETURNING
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John') RETURNING id;
This is the most clean, efficient and safe way to get the id. It doesn't have any of the risks of the previous.
Drawbacks? Almost none: you might need to modify the way you call your INSERT statement (in the worst case, perhaps your API or DB layer does not expect an INSERT to return a value); it's not standard SQL (who cares); it's available since PostgreSQL 8.2 (Dec 2006...).
Conclusion: if you can, go for option 3. Otherwise, prefer option 1.
Note: all these methods are useless if you intend to get the last globally inserted id (not necessarily in your session). For this, you must resort to select max(id) from table (of course, this will not read uncommitted inserts from other transactions).

PostgreSQL - Select from one table based on another

I have a table of tweets (@OneToMany) and another table of analyzedtweets (@ManyToOne), with 'n' analyzedtweets entries (one per analyst) for each entry in the tweet table. Essentially, I can have any number of analysts (represented in a table), and each one can analyze a tweet just once. To make it a bit more complex, the entries in the tweet table are grouped by process, which is represented by yet another table.
My question is: how would I query the analyzedtweet table for the tweet_id of the last entry for a given process_id and analyst_id, and then use that to find the next tweet in the tweet table for the same process_id? Basically, I want to give the analyst the next tweet that he/she has not yet analyzed within that specific process (run).
Here are my tables:
CREATE TABLE tweet (
id SERIAL PRIMARY KEY,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE analyzedtweet (
id SERIAL PRIMARY KEY,
tweet_id INTEGER NOT NULL REFERENCES tweet(id) ON DELETE CASCADE ON UPDATE CASCADE,
analyst_id INTEGER NOT NULL REFERENCES analyst(id) ON DELETE CASCADE ON UPDATE CASCADE,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE process (
id SERIAL PRIMARY KEY,
...
);
CREATE TABLE analyst (
id SERIAL PRIMARY KEY,
...
);
The only way I know how to do this is in 2 steps:
Given a specific process_id (processId) and analyst_id (analystId), run the following query to get the last tweet_id analyzed by that analyst in that process:
SELECT tweet_id from analyzedtweet WHERE analyzedtweet.analyst_id = analystId AND analyzedtweet.process_id = processId ORDER BY analyzedtweet.tweet_id DESC LIMIT 1
Take the result of the above query (referred to as latestTweetId) and run the following query:
SELECT * from tweet WHERE tweet.id > latestTweetId AND tweet.process_id = processId ORDER BY tweet.id ASC LIMIT 1
I'm sure there is a much better way to do this with JOIN, I just can't figure out how.
Finally, I am using Hibernate and would like to get the POJO back.
If you are fetching the latest tweet for a given process_id and analyzedtweet_id, use this query:
List<Tweet> t = session.createQuery(
        "select t from Tweet t "
        + "join t.process p "
        + "join t.analyzedtweet a "
        + "where p.id = ?1 and a.id = ?2 "
        + "order by t.id desc", Tweet.class)
    .setParameter(1, process_id)
    .setParameter(2, analyzedtweet_id)
    .setMaxResults(1)
    .getResultList();
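In plain SQL, the asker's two steps can also be collapsed into a single query. Here is a sketch against the tables above (the :processId and :analystId parameters are placeholders, and COALESCE handles the case where the analyst has not analyzed anything in that process yet):
SELECT t.*
FROM tweet t
WHERE t.process_id = :processId
  AND t.id > COALESCE(
      (SELECT max(a.tweet_id)
       FROM analyzedtweet a
       WHERE a.process_id = :processId
         AND a.analyst_id = :analystId), 0)
ORDER BY t.id
LIMIT 1;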

Before update trigger with referential integrity in Oracle 11g

I want to understand what BEFORE UPDATE means in a trigger.
I have a table called DEPT_MST where DEPT_ID is the primary key. It has 2 rows, with DEPT_ID 1 and 2.
Another table EMP has the column EMP_ID as its primary key and EMP_DEPT_ID, which is a foreign key referencing DEPT_ID of the DEPT table.
Now I add a BEFORE UPDATE trigger on the EMP table's EMP_DEPT_ID column, which checks whether the new value of EMP_DEPT_ID is present in the master table DEPT; if not, it inserts a new row with the new DEPT_ID into the DEPT table.
If I then update EMP_DEPT_ID to 3 where EMP_DEPT_ID is 2 in the EMP table, I get an integrity constraint violation error: parent key not found.
So:
Does this mean that Oracle checks for integrity constraints first and then calls the BEFORE UPDATE trigger?
Then how can we bypass this check and still call the BEFORE UPDATE trigger?
What exactly does "before update" mean here?
How can I achieve the above result by using triggers and not by using an explicit PL/SQL block?
Thank you
Non-deferred foreign key constraints are evaluated before triggers are called, yes.
If you can, declare the foreign key constraint to be deferrable (which requires dropping and re-creating it if the existing constraint is not deferrable):
ALTER TABLE emp
ADD CONSTRAINT fk_emp_dept FOREIGN KEY (emp_dept_id)
REFERENCES dept (dept_id)
INITIALLY DEFERRED DEFERRABLE;
In your application, the constraint check is then deferred: run your UPDATE statement, causing the trigger to fire and insert the parent row. The foreign key constraint will be validated when the transaction commits.
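With that in place, a session might look like this (constraint and column names as above; since the constraint is INITIALLY DEFERRED, no explicit SET CONSTRAINT is needed):
UPDATE emp SET emp_dept_id = 3 WHERE emp_dept_id = 2;
-- the trigger has inserted DEPT_ID 3 into the parent table by now
COMMIT; -- fk_emp_dept is checked here, at commit time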
An alternative to defining the constraint to be deferrable would be to rename the emp table to, say, emp_base, create a view named emp and then create an instead of insert trigger on emp that implements the logic of first inserting into dept and then inserting into emp_base.

Why does this foreign key using inheritance not work? [duplicate]

create table abstract_addresses (
address_id int primary key
);
create table phone_numbers (
phone_number text not null unique
) inherits (abstract_addresses) ;
create table contacts (
name text primary key,
address_id int not null references abstract_addresses(address_id)
);
insert into phone_numbers values (1, '18005551212'); --works
select * from abstract_addresses;
address_id
----------
1
select * from phone_numbers;
address_id | phone_number
-----------+--------------
1          | 18005551212
insert into contacts values ('Neil', 1); --error
I get this error message:
ERROR: insert or update on table "contacts" violates foreign key constraint "contacts_address_id_fkey"
SQL state: 23503
Detail: Key (address_id)=(1) is not present in table "abstract_addresses".
Is this just a bad use-case for PostgreSQL table inheritance?
Per the caveats in the docs:
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint.
http://www.postgresql.org/docs/current/static/ddl-inherit.html
To do what you want:
Create a table with only an id — like you did.
Don't use inheritance. Really don't. It's useful to partition log tables; not for what you're doing.
Make phone number ids default to nextval('abstract_addresses_address_id_seq'), or whatever the sequence name is.
Add a foreign key in phone_numbers referencing abstract_addresses (address_id). Make it deferrable, initially deferred.
Add an after insert trigger on phone_numbers that inserts a new row in abstract_addresses when needed.
If appropriate, add an after delete trigger on phone_numbers that cascade deletes abstract_addresses — make sure it occurs after the delete, else affected rows will report incorrect values when you delete from phone_numbers.
That way, you'll have an abstract_address for use in occasional tables that need such a thing, while still being able to have a hard reference to phone_numbers where the latter is what you actually want.
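A minimal sketch of steps 3 to 5 in PostgreSQL, assuming phone_numbers is re-created as a plain (non-inheriting) table with its own address_id int not null column, and that the sequence still has to be created (ON CONFLICT needs PostgreSQL 9.5+; all object names are illustrative):
create sequence abstract_addresses_address_id_seq
owned by abstract_addresses.address_id;
alter table phone_numbers
alter column address_id set default nextval('abstract_addresses_address_id_seq');
alter table phone_numbers
add constraint phone_numbers_address_id_fkey
foreign key (address_id) references abstract_addresses (address_id)
deferrable initially deferred;
create function phone_numbers_sync_address() returns trigger as $$
begin
-- create the parent row if it does not exist yet
insert into abstract_addresses (address_id)
values (new.address_id)
on conflict (address_id) do nothing;
return null; -- return value is ignored for AFTER triggers
end;
$$ language plpgsql;
create trigger phone_numbers_after_insert
after insert on phone_numbers
for each row execute procedure phone_numbers_sync_address();
Because the foreign key check is deferred to commit, the AFTER INSERT trigger has already created the abstract_addresses row by the time the constraint is evaluated.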
One caveat to be aware of: it doesn't play well with ORMs.