Update a column with row_number with duplicate records without PK - postgresql

I have a table with duplicate records and no primary key. The data looks like this:
I want to fill one empty column with the value of another column concatenated with a row number. After the update, I want to achieve this:
Since the table does not have a unique column, I would have to join back to a CTE or subquery. I know that in SQL Server it can be done like this:
UPDATE X
SET X.NEW_KEY = X.PERSONNUMBER + '-' + X.NEW_CODE_DEST
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY PERSONNUMBER) AS NEW_CODE_DEST,PERSONNUMBER,NEW_KEY
FROM EMPLOYEE
) as X;
I tried the same logic in PostgreSQL, but it didn't work. It threw an error:
SQL Error [42P01]: ERROR: relation "x" does not exist
I also tried this in PostgreSQL:
UPDATE EMPLOYEE
SET NEW_KEY = X.PERSONNUMBER || '-' || X.NEW_CODE_DEST
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY PERSONNUMBER) AS NEW_CODE_DEST,PERSONNUMBER
FROM EMPLOYEE
) as X;
NEW_KEY is updated with a duplicate value, which is no surprise.
So is there an equivalent method in PostgreSQL to achieve the same update result? Or is my query wrong?
Really appreciate the help!
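The first attempt fails because PostgreSQL expects a table name, not a subquery alias, as the UPDATE target. The second attempt has no join condition between EMPLOYEE and X, so every row is paired with an arbitrary row of X. A minimal sketch of one possible fix, assuming the system column ctid (the physical row location) is acceptable as a stand-in for the missing key within a single statement; the alias row_ctid and the explicit text casts are additions for illustration:

```sql
-- Sketch only: ctid identifies each physical row, so it can tie the
-- numbered subquery back to the table being updated.
UPDATE employee AS e
SET new_key = x.personnumber::text || '-' || x.new_code_dest::text
FROM (
    SELECT ctid AS row_ctid,   -- hypothetical alias for the row locator
           personnumber,
           ROW_NUMBER() OVER (PARTITION BY personnumber ORDER BY ctid) AS new_code_dest
    FROM employee
) AS x
WHERE e.ctid = x.row_ctid;     -- the join condition the second attempt lacked
```

A row's ctid can change when the row is updated or the table is rewritten, so it is only suitable as a join key inside one statement like this, not as a long-term identifier.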

Related

postgres case statement with subquery

I have a subquery like this
with subquery as (select host from table_A where << some condition >>)
and in my main query, I am querying data from another table called table_B, and one of the columns is called destination_host. Now I need to check if the destination_host is in the list returned from my subquery, then I want to output TypeA in my select statement or else TypeB. My select statement looks something like
select name, place, destination_host
from table_B
where <<some condition>>
I want to output a fourth column that is based on a condition check, let's say we call this host_category and if the destination_host value exists in the subquery then I want to add value typeA or else typeB. Please can you help me understand how to write this. I understand that it is hard to provide guidance if you don't have actual data to work with.
I tried using case statements such as this one:
case
when (destination_host in (select host from subquery)) THEN 'typeA'
when (destination_host not in (select host from subquery)) THEN 'typeB'
end as host_category
but I don't think this is the way to solve this problem.
I would use EXISTS:
WITH subquery AS (...)
SELECT CASE WHEN EXISTS (SELECT 1 FROM subquery
WHERE subquery.host = table_b.destination_host)
THEN 'typeA'
ELSE 'typeB'
END
FROM table_b;
With queries like that, you have to take care of NULL values. If table_b.destination_host is NULL, the row will always show up as typeB, because NULL = NULL is not TRUE in SQL.
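A small self-contained sketch of the NULL behaviour described above, using inline VALUES lists in place of the real tables. The host names are made up for illustration:

```sql
WITH subquery(host) AS (
    VALUES ('a.example'), (NULL)
),
table_b(destination_host) AS (
    VALUES ('a.example'), ('b.example'), (NULL)
)
SELECT destination_host,
       CASE WHEN EXISTS (SELECT 1 FROM subquery
                         WHERE subquery.host = table_b.destination_host)
            THEN 'typeA'
            ELSE 'typeB'
       END AS host_category
FROM table_b;
-- 'a.example' -> typeA  (a matching host exists)
-- 'b.example' -> typeB  (no match)
-- NULL        -> typeB  (NULL = NULL is not TRUE, so EXISTS finds no row)
```

If NULL hosts should count as matching each other, the comparison could be written with IS NOT DISTINCT FROM instead of =.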

EXISTS in filter returning too many values

I need to write a query that uses EXISTS, rather than IN, so that it will run fast. The filter is being fed so many parameter values that EXISTS seems like the only option. The difference is between a 20+ minute query and a 5 second query.
This is the query I have:
SELECT DISTINCT d.GROUP_NAME
FROM [EMPLOYEE] e JOIN [DATA_FACT] d ON (e.KEY = d.KEY)
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select '1234567' -- #ID
)
AND e.Location IN (#Location)
ORDER BY d.GROUP_NAME ASC
The problem is that it is returning too many records. Based on the values I'm passing to filter on, I should get 1 row back but instead I am getting 28.
If I remove the EXISTS and add the following then I get the 1 record I need:
AND e.ID IN ('1234567')
Is there a way to fix the query to work with EXISTS so that I get the correct results?
This is essentially what you want if you are going to try to use exists to filter your data_fact table by parameters in your employee table. Not sure how much it's going to improve your performance though when you throw a massive number of employee IDs at it.
SELECT
d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select 1
from EMPLOYEE AS e
WHERE d.[KEY] = e.[KEY]
AND e.[Location] IN (#Location)
AND e.ID IN ('1234567')
)
ORDER BY d.GROUP_NAME ASC

Specific order to tables in postgres

I just created a temporary table as:
create temporary table userAndProductSales as
select p.p_name, u.u_name, u.s_price, u.quantity
from product p
join userAndStates u
on p.s_id = u.s_id
Now I want to select some columns with a particular order. For example, I want the select to give me an output of:
u_name1 p_name1
u_name1 p_name2
u_name1 p_name3
u_name1 p_name4
...
u_name2 p_name1
u_name2 p_name2
u_name2 p_name3
....
and so on and so forth. How do I get this output? I've tried something along the lines of:
select (select u_name from userandproductsales order by u_name), p_name from userandproductsales
but I'm getting an error
UPDATE: Figured out that the table I'm joining isn't giving me the correct data I want. Thanks for the help though.
Here is how to use ORDER BY:
SELECT * from userandproductsales
order by u_name, p_name
Unless there is a reason for creating a temporary table (like needing to access it later in the same session), you should avoid the expense and simply add an ORDER BY to your select. For example:
select p.p_name, u.u_name, u.s_price, u.quantity
from product p
join userAndStates u
on p.s_id = u.s_id
order by u.u_name, p.p_name;

In DB2, perform an update based on insert for large number of rows

In DB2, I need to do an insert, then, using results/data from that insert, update a related table. I need to do it on a million plus records and would prefer not to lock the entire database. So, 1) how do I 'couple' the insert and update statements? 2) how can I ensure the integrity of the transaction (without locking the whole she-bang)?
some pseudo-code should help clarify
STEP 1
insert into table1 (neededId, id) select DYNAMICVALUE, id from tableX where needed value is null
STEP 2
update table2 set neededId = (GET THE DYNAMIC VALUE JUST INSERTED) where id = (THE ID JUST INSERTED)
note: in table1, the ID col is not unique, so i can't just filter on that to find the new DYNAMICVALUE
This should be clearer. (FTR, this works, but I don't like it, because I'd have to lock the tables to maintain integrity. It would be great if I could run these statements together and allow the update to refer to the newAddressNumber value.)
/****RUNNING TOP INSERT FIRST****/
--insert a new address for each order that does not have a address id
insert into addresses
(customerId, addressNumber, address)
select
cust.Id,
--get next available addressNumber
ifNull((select max(addy2.addressNumber) from addresses addy2 where addy2.customerId = cust.id),0) + 1 as newAddressNumber,
cust.address
from customers cust
where exists (
--find all customers with at least 1 order where addressNumber is null
select 1 from orders ord
where 1=1
and ord.customerId = cust.id
and ord.addressNumber is null
)
/*****RUNNING THIS UPDATE SECOND*****/
update orders ord1
set addressNumber = (
select max(addressNumber) from addresses addy3
where addy3.customerId = ord1.customerId
)
where 1=1
and ord1.addressNumber is null
The IDENTITY_VAL_LOCAL function is a non-deterministic function that returns the most recently assigned value for an identity column, where the assignment occurred as a result of a single INSERT statement using a VALUES clause
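As a rough illustration of the function described above, here is a sketch on a hypothetical table. The table demo_addresses and its identity column addressId are assumptions, since the question's tables are not shown with an identity column:

```sql
-- Hypothetical demo table; the identity column is an assumption for illustration.
CREATE TABLE demo_addresses (
    addressId  INTEGER GENERATED ALWAYS AS IDENTITY,
    customerId INTEGER NOT NULL,
    address    VARCHAR(200)
);

-- Single-row INSERT with a VALUES clause: the case the function covers.
INSERT INTO demo_addresses (customerId, address)
VALUES (42, '1 Example Street');

-- Returns the identity value assigned by the INSERT above.
SELECT IDENTITY_VAL_LOCAL() FROM SYSIBM.SYSDUMMY1;
```

Going by the description quoted above, the function only reflects a single-row INSERT with a VALUES clause, so it would not report the values generated by a multi-row INSERT ... SELECT like the one in the question.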

t sql select into existing table new column

Hi I have a temp table (#temptable1) and I want to add a column from another temp table (#temptable2) into that, my query is as follows:
select
Customer
,CustName
,KeyAccountGroups
,sum(Weeksales) as Weeksales
into #temptable1
group by Customer
,CustName
,KeyAccountGroups
select
SUM(QtyInvoiced) as MonthTot
,Customer
into #temptable2
from SalesSum
where InvoiceDate between #dtMonthStart and #dtMonthEnd
group by Customer
INSERT INTO #temptable1
SELECT MonthTot FROM #temptable2
where #temptable1.Customer = #temptable2.Customer
I get the following: Column name or number of supplied values does not match table definition.
In an INSERT statement you cannot reference the table you are inserting into. An insert works under the assumption that a new row is to be created. That means there is no existing row that could be referenced.
The functionality you are looking for is provided by the UPDATE statement:
UPDATE t1
SET MonthTot = t2.MonthTot
FROM #temptable1 t1
JOIN #temptable2 t2
ON t1.Customer = t2.Customer;
Be aware however, that this logic requires the Customer column in t2 to be unique. If you have duplicate values in that table the query will seem to run fine, however you will end up with randomly changing results.
For more details on how to combine two tables in an UPDATE or DELETE check out my A Join A Day - UPDATE & DELETE post.
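The uniqueness requirement mentioned above can be checked before running the UPDATE. A minimal sketch, assuming #temptable2 has already been populated as in the question; the alias row_count is just for illustration:

```sql
-- Lists every Customer that appears more than once in #temptable2.
-- An empty result means the joined UPDATE has exactly one source row per customer.
SELECT Customer, COUNT(*) AS row_count
FROM #temptable2
GROUP BY Customer
HAVING COUNT(*) > 1;
```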
If I understand it correctly you want to do two things.
1: Alter table #temptable1 and add a new column.
2: Fill that column with the values of #temptable2
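-- The column type is an assumption; match it to the type of SUM(QtyInvoiced).
ALTER TABLE #temptable1 ADD MonthTot DECIMAL(18, 2);
-- If the UPDATE reports an invalid column name, run the ALTER in a separate batch first.
UPDATE #temptable1 SET MonthTot = (
SELECT MonthTot
FROM #temptable2
WHERE #temptable2.Customer = #temptable1.Customer);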