INSERT ON DUPLICATE KEY: Add values together - tsql

I need to change the date column of an SQL table that contains dates and amounts, e.g. from 20170101 to 20170102. However, the new date 20170102 may already exist. This results in a duplicate key error, because the date column is part of a unique index.
My first thought was to use INSERT ON DUPLICATE KEY:
INSERT INTO table (Date, Amount)
SELECT '20170102', Amount
FROM table
WHERE Date = '20170101'
ON DUPLICATE KEY UPDATE Amount = OldAmount + NewAmount
The part Amount = OldAmount + NewAmount obviously cannot work as written. But how can I solve my issue?

Here is one approach. Assume the old table is TABLE1 (DATE1 DATE PRIMARY KEY, AMOUNT INT).
Add a new column DATE2 of type DATE:
ALTER TABLE TABLE1 ADD COLUMN DATE2 DATE;
Initialize DATE2 from the old date column:
UPDATE TABLE1 SET DATE2 = DATE1; -- old column DATE1
Change DATE2 for the records whose date you want to move:
UPDATE TABLE1 SET DATE2 = '20170102' WHERE DATE1 = '20170101'; -- assume 20170102 already exists
Get the records with the amounts added together:
SELECT DATE2, SUM(AMOUNT) FROM TABLE1 GROUP BY 1; -- shows all records with the summed amounts
You can then create a new table TABLE2 and insert those records:
CREATE TABLE TABLE2 (DATE1 DATE PRIMARY KEY, AMOUNT INT);
INSERT INTO TABLE2 SELECT DATE2, SUM(AMOUNT) FROM TABLE1 GROUP BY 1;

If you're using SQL Server, try a MERGE statement.
Just to clarify - Do you need to change the dates of records in the table or just add new records?
CREATE TABLE #DataTable
(
SomeDate DATE,
Amount INT
)
INSERT #DataTable
VALUES
('20170101', 1),
('20170206', 2),
('20170309', 3),
('20170422', 4),
('20170518', 5)
DECLARE #NewValues TABLE
(
SomeDate DATE,
Amount INT
)
INSERT #NewValues
VALUES
('20170101', 10), --Update
('20170309', 15), --Update
('20170612', 6), --Insert
('20170725', 7) --Insert
MERGE INTO #DataTable AS tgt
USING #NewValues AS nv
ON nv.SomeDate = tgt.SomeDate
WHEN NOT MATCHED THEN INSERT
VALUES
(nv.SomeDate, nv.Amount)
WHEN MATCHED THEN UPDATE
SET tgt.Amount = tgt.Amount + nv.Amount
OUTPUT $action AS MergeAction,
Inserted.SomeDate,
Deleted.Amount AS OldValue,
Inserted.Amount AS NewValue;
DROP TABLE #DataTable;

Related

How can I update TABLE1 rows when I change some TABLE2 rows in POSTGRESQL?

I am building a soccer management tool where the league's admin can update the score of every match in the MATCHES TABLE. At the same time I want to update the TEAMS TABLE columns.
For instance, if the match is DALLAS vs PHOENIX and the score was DALLAS 2 - PHOENIX 3, I want to update that match in the MATCHES TABLE (I know how to do this), but at the same time I want to update the points of those two teams based on the result we just updated.
Is there a way to do that in POSTGRESQL?
Thanks for your help.
You can do this with triggers. What is a database trigger? A database trigger is a special stored procedure that runs when specific actions occur within a database. Most triggers are defined to run when changes are made to a table's data. Triggers can be defined to run after (or before) INSERT, UPDATE, and DELETE operations on a table. Triggers use two special database objects, INSERTED and DELETED, to access the rows affected by the action.
When a record is inserted – use the INSERTED table to determine which rows were added to the table.
When a record is deleted – use the DELETED table to see which rows were removed from the table.
When a record is updated – use the INSERTED table to inspect the new values and the DELETED table to see the values prior to the update.
In PostgreSQL the INSERTED trigger object is called NEW and the DELETED object is called OLD.
For example:
We have two tables, user_group and user_detail. I would like to insert 12 records into user_detail whenever a row is inserted into user_group:
CREATE TABLE examples.user_group (
id serial4 NOT NULL,
group_name varchar(200) NOT NULL,
user_id int4 NOT NULL
);
CREATE TABLE examples.user_detail (
id serial4 NOT NULL,
user_id int4 NOT NULL,
"month" int2 NOT NULL
);
-- create trigger function for inserting 12 records into user_detail table
CREATE OR REPLACE FUNCTION examples.f_user_group_after_insert()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
p_user_id integer;
begin
p_user_id := new.user_id; -- NEW is the trigger record holding the newly inserted user_group row
-- insert one user_detail row per month, 1 through 12
for i in 1..12 loop
insert into examples.user_detail (user_id, month) values (p_user_id, i);
end loop;
return new;
end;
$function$
;
-- attach the trigger function to the user_group table, to run after each insert
create trigger user_group_after_insert
after insert
on
examples.user_group for each row execute function examples.f_user_group_after_insert();
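The same after-insert fan-out can be sketched portably. A minimal sketch using an SQLite trigger (SQLite trigger bodies cannot loop, so a pre-populated helper table `months` stands in for the 1..12 range; table and column names follow the answer but the data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user_group (id INTEGER PRIMARY KEY, group_name TEXT, user_id INT);
    CREATE TABLE user_detail (id INTEGER PRIMARY KEY, user_id INT, month INT);
    CREATE TABLE months (m INT);  -- helper table holding 1..12
""")
con.executemany("INSERT INTO months VALUES (?)", [(i,) for i in range(1, 13)])

# AFTER INSERT trigger: NEW refers to the row just inserted into user_group.
con.execute("""
    CREATE TRIGGER user_group_after_insert AFTER INSERT ON user_group
    BEGIN
        INSERT INTO user_detail (user_id, month)
        SELECT NEW.user_id, m FROM months;
    END
""")

con.execute("INSERT INTO user_group (group_name, user_id) VALUES ('admins', 42)")
rows = con.execute("SELECT user_id, month FROM user_detail ORDER BY month").fetchall()
print(len(rows), rows[0], rows[-1])
# → 12 (42, 1) (42, 12)
```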

Find the mismatch columns from two tables with same structure

I have two tables with the same structure. I have to find the mismatched columns in both tables based on the id and year combination. Below is the table structure:
id and year form the primary key in both tables.
============================================================
Create table and insert script for table1:
create table table1 (id int, year int, name varchar(50), stat varchar(50), PRIMARY KEY (id,year));
insert into table1 values (1,2021,'Aman','L');
insert into table1 values (2,2021,'Ankit','H');
insert into table1 values (3,2021,'Rahul','G');
insert into table1 values (4,2021,'Gagan','L');
============================================================
Create table and insert script for table2:
create table table2 (id int, year int, name varchar(50), stat varchar(50), PRIMARY KEY (id,year));
insert into table2 values (1,2020,'Aman','H');
insert into table2 values (2,2020,'Anuj','M');
insert into table2 values (3,2020,'Rahul','G');
insert into table2 values (4,2020,'Abhi','L');
============================================================
Expected Output:
For example, id = 1 and year = 2021 from table1, when compared with id = 1 and year = 2020 (table1's year - 1) from table2, should report that stat is different.
id = 2 and year = 2021 from table1, when compared with id = 2 and year = 2020 from table2, should report that name and stat are different.
I need to compare table1's year column against table2's year + 1.
Can anyone help me with an SQL or DB2 query or procedure to do this?
Sounds like you need a simple join:
select *
from table1 t1
join table2 t2
on t1.id = t2.id
and t1.year = t2.year + 1
where t1.stat <> t2.stat
or t1.name <> t2.name --if you want this?
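To report which columns differ rather than just the mismatched rows, CASE expressions can be added to the join. A minimal sketch in SQLite with the question's data (the flag-column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INT, year INT, name TEXT, stat TEXT, PRIMARY KEY (id, year));
    CREATE TABLE table2 (id INT, year INT, name TEXT, stat TEXT, PRIMARY KEY (id, year));
    INSERT INTO table1 VALUES (1,2021,'Aman','L'),(2,2021,'Ankit','H'),
                              (3,2021,'Rahul','G'),(4,2021,'Gagan','L');
    INSERT INTO table2 VALUES (1,2020,'Aman','H'),(2,2020,'Anuj','M'),
                              (3,2020,'Rahul','G'),(4,2020,'Abhi','L');
""")

# One flag column per compared attribute; fully matching rows are filtered out.
rows = con.execute("""
    SELECT t1.id,
           CASE WHEN t1.name <> t2.name THEN 'name' END AS name_diff,
           CASE WHEN t1.stat <> t2.stat THEN 'stat' END AS stat_diff
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id AND t1.year = t2.year + 1
    WHERE t1.name <> t2.name OR t1.stat <> t2.stat
    ORDER BY t1.id
""").fetchall()
print(rows)
# → [(1, None, 'stat'), (2, 'name', 'stat'), (4, 'name', None)]
```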

Sum and average total columns in PostgreSQL

I'm using this query to find duplicate dates, but I'm not sure how to sum the values for each duplicate date, average them, and remove the duplicate dates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
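The running-sum-plus-counter idea above can be sketched in SQLite, whose ON CONFLICT clause works the same way (table and column names follow the answer; the second insert simulates a duplicate date arriving later):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE mytable (
        date_time   TEXT UNIQUE,
        datapoint_1 REAL,
        datapoint_2 REAL,
        numval      INTEGER DEFAULT 1  -- keeps track of counts
    )
""")

upsert = """
    INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
    VALUES (?, ?, ?)
    ON CONFLICT (date_time)
    DO UPDATE SET numval = numval + 1,
                  datapoint_1 = datapoint_1 + excluded.datapoint_1,
                  datapoint_2 = datapoint_2 + excluded.datapoint_2
"""
con.execute(upsert, ("2021-06-30", 42.0, -34.9))
con.execute(upsert, ("2021-06-30", 44.0, -35.1))  # duplicate date: accumulate

# Select the averages: accumulated sum divided by the row count.
row = con.execute("""
    SELECT date_time,
           datapoint_1 / numval AS datapoint_1,
           datapoint_2 / numval AS datapoint_2
    FROM mytable
""").fetchone()
print(row)
# → ('2021-06-30', 43.0, -35.0)
```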
When you use GROUP BY you can also use aggregate functions to reduce multiple rows to a single one (COUNT, which you used, is one such function). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.

Using 'on conflict' with a unique constraint on a table partitioned by date

Given the following table:
CREATE TABLE event_partitioned (
customer_id varchar(50) NOT NULL,
user_id varchar(50) NOT NULL,
event_id varchar(50) NOT NULL,
comment varchar(50) NOT NULL,
event_timestamp timestamp with time zone DEFAULT NOW()
)
PARTITION BY RANGE (event_timestamp);
And partitioning by calendar week [one example]:
CREATE TABLE event_partitioned_2020_51 PARTITION OF event_partitioned
FOR VALUES FROM ('2020-12-14') TO ('2020-12-20');
And the unique constraint [event_timestamp necessary since the partition key]:
ALTER TABLE event_partitioned
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
I would like to update if customer_id, user_id, event_id exist, otherwise insert:
INSERT INTO event_partitioned (customer_id, user_id, event_id)
VALUES ('9', '99', '999')
ON CONFLICT (customer_id, user_id, event_id, event_timestamp) DO UPDATE
SET comment = 'I got updated';
But I cannot add a unique constraint only for customer_id, user_id, event_id, hence event_timestamp as well.
So this will insert duplicates of customer_id, user_id, event_id. The same happens when adding now() as a fourth value, unless now() precisely matches what's already in event_timestamp.
Is there a way that ON CONFLICT could be less 'granular' here and update if now() falls in the week of the partition, rather than precisely on '2020-12-14 09:13:04.543256' for example?
Basically I am trying to avoid duplication of customer_id, user_id, event_id, at least within a week, but still benefit from partitioning by week (so that data retrieval can be narrowed to a date range and not scan the entire partitioned table).
I don't think you can do this with on conflict in a partitioned table. You can, however, express the logic with CTEs:
with
data as ( -- data
select '9' as customer_id, '99' as user_id, '999' as event_id
),
ins as ( -- insert if not exists
insert into event_partitioned (customer_id, user_id, event_id)
select * from data d
where not exists (
select 1
from event_partitioned ep
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
)
returning *
)
update event_partitioned ep -- update if insert did not happen
set comment = 'I got updated'
from data d
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
and not exists (select 1 from ins)
#GMB's answer is great and works well. Since enforcing a unique constraint on a partitioned (parent) table partitioned by time range is usually not that useful, why not just place a unique constraint/index on the partition itself?
In your case, event_partitioned_2020_51 can have a unique constraint:
ALTER TABLE event_partitioned_2020_51
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
And subsequent query can just use
INSERT ... INTO event_partitioned_2020_51 ON CONFLICT (customer_id, user_id, event_id, event_timestamp)
as long as this is the partition intended, which is usually the case.

TSQL: Remove duplicates based on max(date)

I am searching for a query that selects the maximum date (a datetime column) per id and keeps that row's id and row_id. The goal is to DELETE the other rows from the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique to number the rows and identify the top row for each unique id.
So your proposed technique is to use a datetime column to get the top 1 row and remove duplicates. That might work, but there is a possibility that you might still get duplicates with the same datetime value. That's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a row with the value 2, indicating that you will still end up with duplicates even after using the date column to remove them. If the query returns no rows, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT thisid, COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the ids that have more than one row. Capture the rows from this query into a temporary table and then run a query using SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
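The row count to be deleted can also be computed directly as total rows minus distinct keys. A sketch in SQLite with the question's data (assuming id alone is the de-duplication key, as above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (thisid INT, thisdate TEXT)")
con.executemany(
    "INSERT INTO t (thisid, thisdate) VALUES (?, ?)",
    [(1, "2009-11-11"), (1, "2009-11-12"), (1, "2009-11-13"), (2, "2009-11-01")],
)

# Rows to delete = all rows minus one survivor per unique id.
total, distinct = con.execute(
    "SELECT COUNT(*), COUNT(DISTINCT thisid) FROM t"
).fetchone()
print(total - distinct)
# → 2
```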
Try this:
declare #t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from #t where rowid not in(
select t.rowid from #t t
inner join(
select id, MAX(dt) as maxdate
from #t
group by id) X
on t.id = X.id and t.dt = X.maxdate ) -- join on id as well, so a date that happens to match another id's maximum is not kept
select * from #t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer:
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
select * from #t
;WITH T AS(
select dense_rank() over(partition by id order by dt desc)NO,DT,ID,rowid from #t )
DELETE T WHERE NO>1
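The same CTE idea works on any engine with window functions (SQLite 3.25+). A sketch using row_number(), which unlike dense_rank() also breaks ties deterministically when two rows share an id's maximum date:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INT, dt TEXT, row_id INTEGER PRIMARY KEY)")
con.executemany(
    "INSERT INTO t (id, dt) VALUES (?, ?)",
    [(1, "2009-11-11"), (1, "2009-11-12"), (1, "2009-11-13"), (2, "2009-11-01")],
)

# Keep the newest row per id; delete everything ranked below it.
con.execute("""
    DELETE FROM t WHERE row_id IN (
        SELECT row_id FROM (
            SELECT row_id,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS no
            FROM t
        ) WHERE no > 1
    )
""")
survivors = con.execute("SELECT id, dt, row_id FROM t ORDER BY id").fetchall()
print(survivors)
# → [(1, '2009-11-13', 3), (2, '2009-11-01', 4)]
```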