Updating duplicates from one-to-many relationships (PostgreSQL)
This isn't your standard "how do I find duplicates" question; I know how to find duplicates, see below. This question is: how do I update said records when they also have child items with matching records?
Alright, I'm going to give you the whole scenario so that you can work with this problem.
Duplicate records could be inserted as a result of critical system failure.
Finding the later duplicates and marking the parent commission_import_commission_junction "is_processed = TRUE" solves this problem.
The complication is that the commission_import_commission_junction and its children commission_import_commission_junction_line_items must both be identical on the compared columns.
The tables are:
commission_import_commission_junction
- id
- created_date
- some columns that are checked for duplication
- some columns that are not checked for duplication
commission_import_commission_junction_line_items
- id
- some columns that are checked for duplication
- some columns that are not checked for duplication
(For the full table spec, check out the CREATE TABLE statements in the bottom-most block of code.)
The query to mark duplicates on just the parent table commission_import_commission_junction:
UPDATE commission_import_commission_junction cicj
SET is_processed = TRUE
FROM (
SELECT MIN(created_date) AS first_date, member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id
FROM commission_import_commission_junction inner_imports
-- (no join to the line items here: joining them would make posting_date and sale_amount ambiguous and inflate COUNT(*) for any parent with more than one line item)
GROUP BY member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id
HAVING (COUNT(*) > 1)
) AS dups
WHERE
-- MAIN TABLE COLUMN LIST
(cicj.member_id, cicj.site_id, cicj.action_status, cicj.action_type, cicj.ad_id, cicj.commission_id, cicj.country, cicj.event_date, cicj.locking_date, cicj.order_id, cicj.original, cicj.original_action_id, cicj.posting_date, cicj.website_id, cicj.advertiser_name, cicj.commission_amount, cicj.sale_amount, cicj.aggregator_affiliate_id)
IS NOT DISTINCT FROM
-- OTHER TABLE COLUMN LIST
(dups.member_id, dups.site_id, dups.action_status, dups.action_type, dups.ad_id, dups.commission_id, dups.country, dups.event_date, dups.locking_date, dups.order_id, dups.original, dups.original_action_id, dups.posting_date, dups.website_id, dups.advertiser_name, dups.commission_amount, dups.sale_amount, dups.aggregator_affiliate_id)
AND cicj.created_date <> dups.first_date
AND cicj.is_processed = FALSE;
Somewhere, somehow, I need to check that the line_items are also duplicates.
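One possible way to fold the line-item check into the set-based UPDATE (a sketch, untested against real data): fingerprint each parent's children by aggregating the compared child columns into a sorted array, then include that fingerprint in both the GROUP BY and the final comparison. The `(row)::text` casts are just a cheap way to build a comparable, sortable fingerprint, and `ORDER BY` inside an aggregate requires PostgreSQL 9.0 or later.

```sql
UPDATE commission_import_commission_junction cicj
SET is_processed = TRUE
FROM (
    SELECT MIN(created_date) AS first_date,
           member_id, site_id, action_status, action_type, ad_id, commission_id,
           country, event_date, locking_date, order_id, original, original_action_id,
           posting_date, website_id, advertiser_name, commission_amount, sale_amount,
           aggregator_affiliate_id, child_rows
    FROM (
        SELECT ci.*,
               -- compared child columns, cast to text and sorted so that two
               -- identical sets of line items produce the same array
               (SELECT array_agg((li.sku, li.quantity, li.posting_date,
                                  li.sale_amount, li.discount)::text
                                 ORDER BY (li.sku, li.quantity, li.posting_date,
                                           li.sale_amount, li.discount)::text)
                FROM commission_import_commission_junction_line_items li
                WHERE li.commission_import_commission_junction_id
                    = ci.commission_import_commission_junction_id) AS child_rows
        FROM commission_import_commission_junction ci
    ) with_children
    GROUP BY member_id, site_id, action_status, action_type, ad_id, commission_id,
             country, event_date, locking_date, order_id, original, original_action_id,
             posting_date, website_id, advertiser_name, commission_amount, sale_amount,
             aggregator_affiliate_id, child_rows
    HAVING COUNT(*) > 1
) AS dups
WHERE (cicj.member_id, cicj.site_id, cicj.action_status, cicj.action_type, cicj.ad_id,
       cicj.commission_id, cicj.country, cicj.event_date, cicj.locking_date,
       cicj.order_id, cicj.original, cicj.original_action_id, cicj.posting_date,
       cicj.website_id, cicj.advertiser_name, cicj.commission_amount,
       cicj.sale_amount, cicj.aggregator_affiliate_id)
      IS NOT DISTINCT FROM
      (dups.member_id, dups.site_id, dups.action_status, dups.action_type, dups.ad_id,
       dups.commission_id, dups.country, dups.event_date, dups.locking_date,
       dups.order_id, dups.original, dups.original_action_id, dups.posting_date,
       dups.website_id, dups.advertiser_name, dups.commission_amount,
       dups.sale_amount, dups.aggregator_affiliate_id)
  AND cicj.created_date <> dups.first_date
  AND cicj.is_processed = FALSE
  -- the duplicate's own children must produce the same fingerprint
  AND (SELECT array_agg((li.sku, li.quantity, li.posting_date,
                         li.sale_amount, li.discount)::text
                        ORDER BY (li.sku, li.quantity, li.posting_date,
                                  li.sale_amount, li.discount)::text)
       FROM commission_import_commission_junction_line_items li
       WHERE li.commission_import_commission_junction_id
           = cicj.commission_import_commission_junction_id)
      IS NOT DISTINCT FROM dups.child_rows;
```

`IS NOT DISTINCT FROM` is used for the fingerprint comparison so that parents with no line items at all (a NULL array on both sides) still compare as equal.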
THE CODE BELOW SETS UP THE DATABASE; remember, this is Postgres-specific.
-- "commission_import_builds" holds records that keep information about the process of collecting the commission information. Duplicate commission_import_commission_junction records will not exist with the same commission_import_build_id.
-- "commission_import_commission_junction" is a record describing commission information from a customer's purchase.
-- "commission_import_commission_junction_line_items" are records describing the items in that purchase.
DROP TABLE IF EXISTS commission_import_commission_junction_line_items;
DROP TABLE IF EXISTS commission_import_commission_junction;
DROP TABLE IF EXISTS commission_import_builds;
CREATE TABLE commission_import_builds
(
commission_import_build_id serial NOT NULL,
build_date timestamp with time zone NOT NULL,
CONSTRAINT pkey_commission_import_build_id PRIMARY KEY (commission_import_build_id),
CONSTRAINT commission_import_builds_build_date_key UNIQUE (build_date)
);
INSERT INTO commission_import_builds (commission_import_build_id, build_date) VALUES (1, '2011-01-01');
INSERT INTO commission_import_builds (commission_import_build_id, build_date) VALUES (2, '2011-01-02');
INSERT INTO commission_import_builds (commission_import_build_id, build_date) VALUES (3, '2011-01-03');
CREATE TABLE commission_import_commission_junction
(
commission_import_commission_junction_id serial NOT NULL,
member_id integer,
site_id integer,
action_status character varying NOT NULL,
action_type character varying NOT NULL,
ad_id bigint,
commission_id bigint NOT NULL,
country character varying,
event_date timestamp with time zone NOT NULL,
locking_date timestamp with time zone,
order_id character varying NOT NULL,
original boolean,
original_action_id bigint NOT NULL,
posting_date timestamp with time zone NOT NULL,
website_id bigint NOT NULL,
advertiser_name character varying,
commission_amount numeric(19,2) NOT NULL,
sale_amount numeric(19,2) NOT NULL,
aggregator_affiliate_id integer NOT NULL,
is_processed boolean NOT NULL DEFAULT false,
created_date timestamp with time zone NOT NULL DEFAULT now(),
member_transaction_id integer,
commission_import_build_id integer NOT NULL,
CONSTRAINT pkey_commission_import_commission_junction_commission_import_co PRIMARY KEY (commission_import_commission_junction_id),
CONSTRAINT fk_commission_import_commission_junction_commission_import_buil FOREIGN KEY (commission_import_build_id)
REFERENCES commission_import_builds (commission_import_build_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE INDEX idx_commission_import_commission_junction_is_processed
ON commission_import_commission_junction
USING btree
(is_processed);
INSERT INTO commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) VALUES
(1, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 1, '2011-02-05');
INSERT INTO commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) VALUES
(2, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 2, '2011-02-06');
INSERT INTO commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) VALUES
(3, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 3, '2011-02-07');
SELECT * FROM commission_import_commission_junction;
CREATE TABLE commission_import_commission_junction_line_items
(
commission_import_commission_junction_line_item_id serial NOT NULL,
commission_import_commission_junction_id integer NOT NULL,
sku character varying,
quantity integer,
posting_date timestamp with time zone,
sale_amount numeric(19,2),
discount numeric(19,2),
CONSTRAINT pkey_commission_import_commission_junction_link_items_commissio PRIMARY KEY (commission_import_commission_junction_line_item_id),
CONSTRAINT fkey_commission_import_commission_junction_line_items_commissio FOREIGN KEY (commission_import_commission_junction_id)
REFERENCES commission_import_commission_junction (commission_import_commission_junction_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (1, 'test1', 3, 23.45);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (1, 'test2', 3, 67.50);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (1, 'test3', 3, 32.50);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (2, 'test1', 3, 23.45);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (2, 'test2', 3, 67.50);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (2, 'test3', 3, 32.50);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (3, 'test1', 3, 23.45);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (3, 'test2', 3, 67.50);
INSERT INTO commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) VALUES (3, 'test3', 3, 32.50);
This reminds me of duplicate elimination in direct-marketing mailing lists.
Regardless of the details of your tables, a parent-child dupe elimination algorithm follows these steps:
1) Get duplicates into a list that matches old key to new key (temp table)
2) Update the foreign key in the child table
3) Delete the dupes from the parent
I admire the detail in your post, but I'm going to keep it simple and easier to read with some example table/column names:
-- step 1, get the list
-- Warning: t-sql syntax, adjust for Postgres
-- if it doesn't like placement of "into..." clause
select keep.primaryKey as keepKey
, dupe.primaryKey as dupeKey
into #DupeList
from (
select min(primaryKey) as primaryKey
, dupeCriteria1
, dupeCriteria2
FROM theTable
group by dupeCriteria1, dupeCriteria2
having count(*) > 1
) keep
JOIN theTable dupe
ON keep.dupeCriteria1 = dupe.dupeCriteria1
AND keep.dupeCriteria2 = dupe.dupeCriteria2
AND keep.primaryKey <> dupe.primaryKey
Once you have that, update the foreign key in the child table:
update childTable
set foreignKey = #DupeList.keepKey
from #DupeList
where foreignKey = #DupeList.dupeKey
Then just delete everything out of the parent table:
delete from parentTable
where primaryKey in (select dupeKey from #DupeList)
CREATE FUNCTION removeCommissionImportCommissionJunctionDuplicates() RETURNS INT AS $BODY$
DECLARE
    duplicate RECORD;
    parent RECORD;
    children commission_import_commission_junction_line_items[];
    duplicate_children commission_import_commission_junction_line_items[];
    duplicate_child_count INT;
    child commission_import_commission_junction_line_items;
    duplicate_child commission_import_commission_junction_line_items;
    num_updates INT;
BEGIN
num_updates := 0;
FOR duplicate IN
SELECT cicj.*, dups.first_date
FROM commission_import_commission_junction cicj
JOIN (SELECT MIN(created_date) AS first_date, member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id
FROM commission_import_commission_junction inner_imports
GROUP BY member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id
HAVING (COUNT(*) > 1)) AS dups
ON (cicj.member_id, cicj.site_id, cicj.action_status, cicj.action_type, cicj.ad_id, cicj.commission_id, cicj.country, cicj.event_date, cicj.locking_date, cicj.order_id, cicj.original, cicj.original_action_id, cicj.posting_date, cicj.website_id, cicj.advertiser_name, cicj.commission_amount, cicj.sale_amount, cicj.aggregator_affiliate_id)
IS NOT DISTINCT FROM
(dups.member_id, dups.site_id, dups.action_status, dups.action_type, dups.ad_id, dups.commission_id, dups.country, dups.event_date, dups.locking_date, dups.order_id, dups.original, dups.original_action_id, dups.posting_date, dups.website_id, dups.advertiser_name, dups.commission_amount, dups.sale_amount, dups.aggregator_affiliate_id)
WHERE cicj.created_date != dups.first_date
AND cicj.is_processed = FALSE
LOOP
--RAISE NOTICE 'Looping';
-- We need to collect the parent and children of the original record.
-- Get the parent of the original
SELECT *
FROM commission_import_commission_junction cicj
WHERE (cicj.member_id, cicj.site_id, cicj.action_status, cicj.action_type, cicj.ad_id, cicj.commission_id, cicj.country, cicj.event_date, cicj.locking_date, cicj.order_id, cicj.original, cicj.original_action_id, cicj.posting_date, cicj.website_id, cicj.advertiser_name, cicj.commission_amount, cicj.sale_amount, cicj.aggregator_affiliate_id)
IS NOT DISTINCT FROM
(duplicate.member_id, duplicate.site_id, duplicate.action_status, duplicate.action_type, duplicate.ad_id, duplicate.commission_id, duplicate.country, duplicate.event_date, duplicate.locking_date, duplicate.order_id, duplicate.original, duplicate.original_action_id, duplicate.posting_date, duplicate.website_id, duplicate.advertiser_name, duplicate.commission_amount, duplicate.sale_amount, duplicate.aggregator_affiliate_id)
AND cicj.created_date = duplicate.first_date
INTO parent;
-- Get the children of the original
children := ARRAY(
SELECT cicjli
FROM commission_import_commission_junction_line_items cicjli
WHERE cicjli.commission_import_commission_junction_id
= parent.commission_import_commission_junction_id);
--RAISE NOTICE 'parent: %', parent;
--RAISE NOTICE 'children: %', children;
-- Now get the duplicates children
duplicate_children := ARRAY(
SELECT cicjli
FROM commission_import_commission_junction_line_items cicjli
WHERE cicjli.commission_import_commission_junction_id
= duplicate.commission_import_commission_junction_id);
--RAISE NOTICE 'duplicate_children: %', duplicate_children;
-- Next, compare the children of the duplicate to the children of the original parent.
-- First compare size
IF array_upper(children, 1) = array_upper(duplicate_children, 1) THEN
--RAISE NOTICE 'Same number of children in duplicate as in parent';
-- Now compare each set
duplicate_child_count := 0;
FOR child_index IN array_lower(children, 1) .. array_upper(children, 1) LOOP
child := children[child_index];
FOR duplicate_child_index IN array_lower(duplicate_children, 1) .. array_upper(duplicate_children, 1) LOOP
duplicate_child := duplicate_children[duplicate_child_index];
IF (child.sku, child.quantity, child.posting_date, child.sale_amount, child.discount) IS NOT DISTINCT FROM (duplicate_child.sku, duplicate_child.quantity, duplicate_child.posting_date, duplicate_child.sale_amount, duplicate_child.discount) THEN
duplicate_child_count := duplicate_child_count + 1;
EXIT;
END IF;
END LOOP;
END LOOP;
--RAISE NOTICE 'Duplicate Child Count: %', duplicate_child_count;
-- If we have the same number of duplicates as there are records
IF duplicate_child_count = array_upper(duplicate_children, 1) THEN
-- Update the duplicate record as processed.
--RAISE NOTICE 'Marking duplicate % as is_processed', duplicate;
UPDATE commission_import_commission_junction cicj SET is_processed = TRUE WHERE cicj.commission_import_commission_junction_id
= duplicate.commission_import_commission_junction_id;
num_updates := num_updates + 1;
END IF;
END IF;
END LOOP;
--RAISE NOTICE 'Updates: %', num_updates;
    RETURN num_updates;
END;
$BODY$ LANGUAGE plpgsql;
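For completeness, the function above can be exercised against the sample data from the setup script. With the three identical parents inserted (first created on 2011-02-05), the two later copies should be marked, so the call should return 2:

```sql
SELECT removeCommissionImportCommissionJunctionDuplicates();

-- ids 2 and 3 should now have is_processed = TRUE
SELECT commission_import_commission_junction_id, is_processed
FROM commission_import_commission_junction
ORDER BY commission_import_commission_junction_id;
```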