TSQL: Bulk insert data while returning created IDs to the original table

I have two tables. One called #tempImportedData, another called #tempEngine.
I have data in #tempImportedData that I would like to put into #tempEngine; once a row is inserted into #tempEngine, an id gets created. I would like that id to be placed back into #tempImportedData in the corresponding row. I believe this is the purpose of the OUTPUT clause. I almost have a working copy; please see below.
CREATE TABLE #tempEngine (
    id int identity(4,1) not null
    ,c1 int
    ,c2 int
);
CREATE TABLE #tempImportedData (
    c1 int
    ,c2 int
    ,engine_id int
);
insert into #tempImportedData (c1, c2)
select 1,1
union all select 1,2
union all select 1,3
union all select 1,4
union all select 2,1
union all select 2,2
union all select 2,3
union all select 2,4
;
INSERT INTO #tempEngine ( c1, c2 )
--OUTPUT INSERTED.c1, INSERTED.c2, INSERTED.id INTO #tempImportedData (c1, c2, engine_id) --dups with full data
--OUTPUT INSERTED.id INTO #tempImportedData (engine_id) -- new rows with wanted data, but nulls for rest
SELECT
c1
,c2
FROM
#tempImportedData
;
select * from #tempEngine ;
select * from #tempImportedData ;
I've commented out two lines starting with OUTPUT.
The problem with the first is that it inserts a full copy of the data into #tempImportedData, so the end result is 16 rows: the first 8 are the original rows with NULL for engine_id, and the remaining 8 have all three columns populated. The end result should have 8 rows, not 16.
The second OUTPUT clause has the same problem as the first - 16 rows instead of 8 - except the new 8 rows contain null, null, engine_id.
So how can I alter this TSQL to get #tempImportedData.engine_id updated without inserting new rows?

You need another table (#temp) to capture the OUTPUT from the INSERT, and then run an UPDATE statement using #temp against #tempImportedData, joining on c1 and c2. This requires that the combination of c1 and c2 is unique in #tempImportedData.
CREATE TABLE #temp (
    id int
    ,c1 int
    ,c2 int
);
INSERT INTO #tempEngine ( c1, c2 )
OUTPUT INSERTED.id, INSERTED.c1, INSERTED.c2 INTO #temp
SELECT
c1
,c2
FROM
#tempImportedData
;
UPDATE T1
SET engine_id = T2.id
FROM #tempImportedData as T1
INNER JOIN #temp as T2
on T1.c1 = T2.c1 and
T1.c2 = T2.c2
;

#tempImportedData still has the old data in it. The first OUTPUT clause seems to insert the right data in the new rows, but the old rows are still there. If you run a DELETE on #tempImportedData at the end of your script, taking away all rows where engine_id is null, you should be left with the correct eight rows.
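A minimal sketch of that cleanup, assuming the first commented-out OUTPUT clause from the question is the one in use (so the rows written by OUTPUT carry engine_id):
DELETE FROM #tempImportedData
WHERE engine_id IS NULL; -- removes the 8 original rows, leaving the 8 rows written by OUTPUT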


How to add items to a local function array in PostgreSQL?

I need to do multiple inserts into a table, and the number of inserts depends on how many records a SELECT returns. I need to iterate over the records from the SELECT and insert them into another table. I want to collect all the new ids from the insert into an array, to use them later in the following inserts. How can I do this?
I can't collect them with a SELECT after the INSERT, because there can be old records.
for record in (select test, test1, test2
               from public.a
               join public.b on a.reg_id = b.id
               where a.id = arg_id) loop
    INSERT into public.c (a, b, c)
    select test, test1, test2
    from record; --need to get ids from this
end loop;
---
some block where I have old_ids
---
--to insert them there
insert into public.d (d, e, f, g) values (.., .., old_id, (id from previous insert))
Update:
I tried to do it like this:
with a2 as (
    INSERT INTO public.reg (name_, code, state)
    (select a.secondname, a.code, b.state_name --multiple rows from select
     from public.client a
     left join public.states b on a.state_id = b.id
     where a.id = id_p)
    RETURNING id
)
INSERT INTO public.request (phone, address, qty, prod_id, reg_id)
(select phone, address, qty, prod_id, (select id from a2) --maybe something wrong there, but the error happened before
 from public.shp a
 where a.id = id_p);
but I'm getting an error: more than one row returned by a subquery used as an expression
Demonstration of using the result of a query:
\i tmp.sql
CREATE TABLE aa(aa integer not null primary key);
CREATE TABLE bb(bb integer not null primary key);
CREATE TABLE cc(cc integer not null primary key);

WITH x0 AS (
    INSERT INTO aa(aa) values (1),(2),(3)
    returning aa
)
, x1 AS (
    INSERT INTO bb(bb)
    SELECT aa*aa from x0
    returning bb
)
, x2 AS (
    INSERT INTO cc(cc)
    SELECT bb*bb from x1
    returning cc
)
-- main query
SELECT *
FROM x2
;
-- Check cc
SELECT *
FROM cc
;
Output (the first three lines come from schema-reset commands in tmp.sql that are not shown above):
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
cc
----
1
16
81
(3 rows)
cc
----
1
16
81
(3 rows)
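As for the asker's "more than one row returned by a subquery" error: the usual fix with this pattern is to have the first INSERT return a join key alongside the generated id, so the second INSERT can join on it instead of using a scalar subquery. A self-contained sketch of the idea; the table and column names here are illustrative, not taken from the question:
CREATE TABLE src(k integer not null primary key, payload text);
CREATE TABLE dst1(id serial primary key, k integer, payload text);
CREATE TABLE dst2(dst1_id integer, k integer);

INSERT INTO src VALUES (1, 'a'), (2, 'b');

WITH ins AS (
    INSERT INTO dst1(k, payload)
    SELECT k, payload FROM src
    RETURNING id, k          -- return the natural key together with the new id
)
INSERT INTO dst2(dst1_id, k)
SELECT ins.id, src.k
FROM src
JOIN ins ON ins.k = src.k;   -- one row per source row, no scalar subquery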

How to ignore a record in a query to avoid a conversion error in a JOIN?

I have a table T1 with alphanumeric codes (a varchar column) where the first three characters are always numeric, like this:
001ABCD
100EFGH
541XYZZ
OTHER
NOTE: I have ONE exception record which is all alpha (OTHER).
I also have a table T2 with 3-digit numbers (an int column), like this:
001
200
300
So when I run the following query:
SELECT * from T1
LEFT JOIN T2
ON SUBSTRING(T1.code1,1,3) = T2.code2
WHERE T1.code1 <> 'OTHER'
it causes this error:
Conversion failed when converting the varchar value 'OTH' to data type int.
I know the issue but not how to fix it (it's trying to compare 'OTH' with the int column T2.code2).
I tried filtering with WHERE, but it didn't work at all.
I cannot get rid of the 'OTHER' record, and converting the T2.code2 column from int to varchar is not an option. Any ideas?
Here are 3 different ways you can solve this. I would recommend the persisted computed column, since it only has to be calculated on insert and update, not every time you run the read query. (The WHERE filter didn't help because SQL Server is free to evaluate the join's conversion before applying the filter, so 'OTH' can still reach the int comparison.)
DROP TABLE IF EXISTS #T2;
DROP TABLE IF EXISTS #T1;
CREATE TABLE #T1
(
    Code1 VARCHAR(10)
    ,Code2Computed AS TRY_CONVERT(INT, SUBSTRING(Code1,1,3)) PERSISTED
);
CREATE TABLE #T2
(
    Code2 INT
);
INSERT INTO #T1
(Code1)
VALUES
('001ABCD')
,('100EFGH')
,('541XYZZ')
,('OTHER')
;
INSERT INTO #T2
(Code2)
VALUES
(001)
,(100)
,(200)
,(300)
,(541)
;
--Convert INT to 3 digit code
SELECT *
FROM #T1
LEFT JOIN #T2
ON SUBSTRING(#T1.Code1,1,3) = RIGHT(CONCAT('000',#T2.Code2),3)
;
--Convert 3 digit code to INT
SELECT *
FROM #T1
LEFT JOIN #T2
ON TRY_CONVERT(INT,SUBSTRING(#T1.Code1,1,3)) = #T2.Code2
;
--Use computed column
SELECT *
FROM #T1
LEFT JOIN #T2
ON #T1.Code2Computed = #T2.Code2
;
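A note on the design choice: because the computed column is PERSISTED and TRY_CONVERT to INT is deterministic, it can also be indexed, so the join can seek instead of scan. A one-line sketch, assuming the #T1 definition above (the index name is made up):
CREATE INDEX IX_T1_Code2Computed ON #T1 (Code2Computed);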

Remove Duplicates

I have a table like below:
SuppID  AreaID  SuppNo  SupName  SupPrice
------------------------------------------
1       3       526     ANC      100
1       3       985     JTT      200
3       4       100     HIK      300
In the above table, for the same SuppID (1) and the same AreaID (3), there are different SuppNo values (526 and 985) in two different rows.
In this scenario, I'd like to collapse those two rows into a single row with the SuppNo field blank.
Also, my output result should display rows with all the columns.
Any help?
This should get you started:
DECLARE @TABLE TABLE (SuppID INT, AreaID INT, SuppNo VARCHAR(5), SupName VARCHAR(5), SupPrice INT)

INSERT INTO @TABLE
SELECT 1,3,'526','ANC',100 UNION
SELECT 1,3,'985','JTT',200 UNION
SELECT 3,4,'100','HIK',300

-- select data before updates
SELECT * FROM @TABLE

-- add a row count by AreaID/SuppID
;WITH T1 AS
(
    SELECT *
        ,COUNT(*) OVER(PARTITION BY AreaID, SuppID) AS ROWCNT
    FROM @TABLE
)
-- set the SuppNo blank on rows that have more than 1 match
UPDATE T1 SET SuppNo = '' WHERE ROWCNT > 1

-- add a row # by AreaID/SuppID
;WITH T2 AS
(
    SELECT *
        ,ROW_NUMBER() OVER(PARTITION BY AreaID, SuppID ORDER BY AreaID, SuppID) AS ROWID
    FROM @TABLE
)
-- delete duplicate rows
DELETE FROM T2 WHERE ROWID > 1

-- select data after updates
SELECT * FROM @TABLE
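One caveat worth noting: within each partition every row ties on AreaID/SuppID, so which duplicate survives the DELETE above is arbitrary. If you need a deterministic survivor, give ROW_NUMBER a real tie-breaker. A sketch, assuming (my assumption, not the asker's requirement) that the lowest-priced row should win:
;WITH T2 AS
(
    SELECT *
        ,ROW_NUMBER() OVER(PARTITION BY AreaID, SuppID ORDER BY SupPrice) AS ROWID
    FROM @TABLE
)
DELETE FROM T2 WHERE ROWID > 1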

PostgreSQL - How to get distinct on two columns separately?

I have a table like this:
Source table "tab"
column1  column2
x        1
x        2
y        1
y        2
y        3
z        3
How can I build a query that returns unique values in each of the two columns separately? For example, I'd like to get a result like one of these sets:
column1  column2
x        1
y        2
z        3

or

column1  column2
x        2
y        1
z        3

or ...
Thanks.
What you're asking for is difficult because it's unusual: SQL treats a row as a set of related fields, but you're asking to build two separate lists (the distinct values from column1 and the distinct values from column2) and then display them in one output table without caring how the rows match up.
You can do this by writing the SQL along exactly those lines: write a separate SELECT DISTINCT for each column, then put the two together. I'd put them together by giving each row in each result set a row number, then joining the two result sets on that number.
It's not clear what you want NULL to mean in the output. Does it mean there's a NULL in one of the columns, or that there isn't the same number of distinct values in each column? This is one problem that comes from asking for things that don't match up with typical relational logic.
Here's an example. It removes the NULL value from the original data since that confuses the issue, uses different data values so row numbers aren't confused with data, and has 3 distinct values in one column and 4 in the other. This works for SQL Server; a PostgreSQL variation of the same idea is sketched after the code.
if object_id('mytable') is not null drop table mytable;
create table mytable ( col1 nvarchar(10) null, col2 nvarchar(10) null)
insert into mytable
select 'x', 'a'
union all select 'x', 'b'
union all select 'y', 'c'
union all select 'y', 'b'
union all select 'y', 'd'
union all select 'z', 'a'
select c1.col1, c2.col2
from
-- derived table giving distinct values of col1 and a rownumber column
( select col1
, row_number() over (order by col1) as rowNumber
from ( select distinct col1 from mytable ) x ) as c1
full outer join
-- derived table giving distinct values of col2 and a rownumber column
( select col2
, row_number() over (order by col2) as rowNumber
from ( select distinct col2 from mytable ) x ) as c2
on c1.rowNumber = c2.rowNumber
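And the PostgreSQL variation mentioned above, written against the question's table tab with its columns column1 and column2 (a sketch of the same row-number idea, not a tested answer):
SELECT c1.column1, c2.column2
FROM
    -- distinct values of column1, numbered
    (SELECT column1, row_number() OVER (ORDER BY column1) AS rn
     FROM (SELECT DISTINCT column1 FROM tab) x1) AS c1
FULL OUTER JOIN
    -- distinct values of column2, numbered
    (SELECT column2, row_number() OVER (ORDER BY column2) AS rn
     FROM (SELECT DISTINCT column2 FROM tab) x2) AS c2
    ON c1.rn = c2.rn;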

TSQL: Remove duplicates based on max(date)

I am searching for a query that selects the maximum date (a datetime column) for each id and keeps that row's id and row_id. I then want to DELETE all the other rows from the source table.
Source Data
id  date        row_id (unique)
1   11/11/2009  1
1   12/11/2009  2
1   13/11/2009  3
2   1/11/2009   4
Expected Survivors
id  date        row_id
1   13/11/2009  3
2   1/11/2009   4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL (note that this multi-column NOT IN form is PostgreSQL-specific; SQL Server does not support row-value comparisons in IN):
delete from yourTable where (id, date) not in (select id, max(date) from yourTable group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date) -- the row's date is not the max for its id, so delete it
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique to set up a sequence and identify the top row for each unique id.
So your proposed technique is to use a datetime column to get the TOP 1 row and remove duplicates. That might work, but there is a possibility that you might still get duplicates that have the same datetime value. That's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a row with a count of 2, indicating that you will still end up with duplicates even after using the date column to remove duplicates. If it returns no rows, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
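A sketch of that safety copy, reusing the #TestTable example above (the #RemovedRows name is made up here). It captures the rows that are not their id's maximum date; exact ties at the maximum, which the check above warns about, would need handling via rowid:
SELECT t.*
INTO #RemovedRows    -- recovery table; restore from here if the delete goes wrong
FROM #TestTable t
WHERE t.thisdate <> (SELECT MAX(t2.thisdate) FROM #TestTable t2 WHERE t2.thisid = t.thisid);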
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT thisid, COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the ids that appear in more than one row. Capture the rows from this query into a temporary table, then run a query using SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicated based on your unique key and the number of distinct rows based on your unique key; you subtract the distinct count from the count of occurrences. All of that is pretty straightforward, so I'll leave you to it; a sketch follows.
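One way to compute that planned delete count against the #TestTable example: total rows in duplicated groups minus one survivor per group.
SELECT SUM(thiscount) - COUNT(*) AS rows_to_delete
FROM (
    SELECT COUNT(*) AS thiscount
    FROM #TestTable
    GROUP BY thisid
    HAVING COUNT(*) > 1
) AS dupes;    -- for the sample data: (3 + 2) - 2 = 3 rows to delete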
Try this:
declare @t table (id int, dt DATETIME, rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from @t where rowid not in (
    select t.rowid from @t t
    inner join (
        select id, MAX(dt) as maxdate
        from @t
        group by id) X
    on t.id = X.id and t.dt = X.maxdate )
select * from @t
Output:
id  dt                       rowid
1   2009-11-13 00:00:00.000  3
2   2009-11-01 00:00:00.000  4
delete from temp where row_id not in (
    select t.row_id from temp t
    right join
        (select id, MAX(dt) as dt from temp group by id) d
    on t.dt = d.dt and t.id = d.id)
I have tested this answer:
declare @t table (id int, dt DATETIME, rowid INT IDENTITY(1,1))    -- same setup as the earlier answer
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
select * from @t

;WITH T AS (
    select dense_rank() over(partition by id order by dt desc) AS NO, dt, id, rowid
    from @t
)
DELETE FROM T WHERE NO > 1