How do I select the min opendate from a list of duplicates? - tsql

I have 3 columns. SSN|AccountNumber|OpenDate
One SSN may have multiple AccountNumbers.
Each AccountNumber has a corresponding OpenDate.
In my list I have many SSNs, each with several account numbers that may have been opened on different days.
I want the results of my query to be SSN|earliest OpenDate|AccountNumber, where AccountNumber is the account that corresponds with the earliest OpenDate.
I'm dealing with about 200,000 records.
EDIT: First I tried
select SSN, min(OpenDate), AcctNumber from Table Group By SSN, AcctNumber
but that didn't quite give me the correct data.
The raw data gives me something like this:
SSN | AcctNumber | OpenDate
---------------------------
10 | 101 | Jan
10 | 102 | Feb
10 | 103 | Mar
What I got was 10, Jan, and AcctNumber 102, which is not the account number associated with the Jan OpenDate. After looking at other questions, I found that the account number I got was just one of the account numbers associated with that SSN, rather than the one that corresponds with min(OpenDate).

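Number each SSN's accounts by ascending OpenDate and keep the first one:
WITH CTE AS (
SELECT SSN, AcctNumber, OpenDate,
ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY OpenDate ASC) AS RN
FROM YourTable -- placeholder for your table name
)
SELECT SSN, AcctNumber, OpenDate FROM CTE WHERE RN = 1;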
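To see a ROW_NUMBER query with ascending OpenDate (so RN = 1 is the earliest account) pick one row per SSN, here is a minimal sketch that runs it against an in-memory SQLite database from Python, assuming SQLite 3.25 or later for window functions. The table name accounts and the ISO dates standing in for Jan/Feb/Mar are hypothetical sample data, not the asker's real table.

```python
import sqlite3


def earliest_accounts(rows):
    """Return (SSN, AcctNumber, OpenDate) for the earliest-opened account of each SSN."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (SSN INTEGER, AcctNumber INTEGER, OpenDate TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    query = """
        WITH CTE AS (
            SELECT SSN, AcctNumber, OpenDate,
                   -- number each SSN's accounts from the earliest OpenDate upward
                   ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY OpenDate ASC) AS RN
            FROM accounts
        )
        SELECT SSN, AcctNumber, OpenDate
        FROM CTE
        WHERE RN = 1
        ORDER BY SSN
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: ISO dates sort correctly as text, month names would not.
sample = [
    (10, 101, "2015-01-15"),
    (10, 102, "2015-02-15"),
    (10, 103, "2015-03-15"),
    (20, 201, "2016-07-01"),
    (20, 202, "2016-06-01"),
]
print(earliest_accounts(sample))
```

If two accounts for one SSN share the earliest OpenDate, ROW_NUMBER picks one of them arbitrarily; adding AcctNumber as a second ORDER BY key would make the choice deterministic.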

If your table is like this:
SSN | AcctNumber | OpenDate
---------------------------
10 | 101 | April
10 | 101 | May
10 | 102 | April
20 | 201 | June
20 | 201 | July
Do you want your query to return this?
SSN | AcctNumber | OpenDate
---------------------------
10 | 101 | April
10 | 102 | April
20 | 201 | June
Then you would use this query:
select ssn, min(OpenDate), acctNumber from tbl group by ssn, acctNumber
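As a quick check of what grouping by both columns returns, this sketch loads the sample rows above into SQLite from Python (with hypothetical ISO dates in place of the month names) and runs the same GROUP BY query. SSN 10 comes back twice, once per account: that matches the expected output shown above, but it is not the one-row-per-SSN result the question asks for.

```python
import sqlite3

# Hypothetical ISO dates standing in for April/May/June/July.
rows = [
    (10, 101, "2015-04-01"),
    (10, 101, "2015-05-01"),
    (10, 102, "2015-04-01"),
    (20, 201, "2015-06-01"),
    (20, 201, "2015-07-01"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ssn INTEGER, acctNumber INTEGER, OpenDate TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
# One output row per (ssn, acctNumber) pair, carrying that pair's earliest date.
per_account = conn.execute(
    "SELECT ssn, MIN(OpenDate), acctNumber FROM tbl "
    "GROUP BY ssn, acctNumber ORDER BY ssn, acctNumber"
).fetchall()
conn.close()
print(per_account)
```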

You can try this:
select SSN, AcctNumber, OpenDate
from (SELECT SSN, AcctNumber, OpenDate
, ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY OpenDate ASC) AS RN
FROM YourTable) AS temp
WHERE temp.RN = 1

Related

Window Function For Consecutive Dates

I want to know how many users were active for 3 consecutive days on any given day.
For example, on 2022-11-03, one user (user_id = 111) was active three days in a row. Could someone please advise what kind of window function would be needed?
This is my dataset:
user_id | active_date
--------+------------
111 | 2022-11-01
111 | 2022-11-02
111 | 2022-11-03
222 | 2022-11-01
333 | 2022-11-01
333 | 2022-11-09
333 | 2022-11-10
333 | 2022-11-11
If you are confident there are no duplicate user_id + active_date rows in the source data, then you can use two LAG functions like this:
SELECT user_id,
active_date,
CASE WHEN DATEADD(day, -1, active_date) = LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
AND DATEADD(day, -2, active_date) = LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
THEN 'Yes'
ELSE 'No'
END AS rowof3
FROM your_table
ORDER BY user_id, active_date;
If there might be duplication, use this FROM clause instead:
FROM (SELECT DISTINCT user_id, active_date::DATE AS active_date FROM your_table) AS t
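The same LAG check can be summed per day to answer the original question (how many users completed a 3-day streak on each day). This is a sketch using SQLite from Python, so date(active_date, '-1 days') stands in for DATEADD; the table name your_table comes from the answer, while the per-day SUM and the function name users_with_3_day_streak are additions for illustration, not part of the original answer.

```python
import sqlite3

activity = [
    (111, "2022-11-01"), (111, "2022-11-02"), (111, "2022-11-03"),
    (222, "2022-11-01"),
    (333, "2022-11-01"), (333, "2022-11-09"), (333, "2022-11-10"), (333, "2022-11-11"),
]


def users_with_3_day_streak(rows):
    """Return (active_date, number of users active that day and the two days before)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (user_id INTEGER, active_date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    query = """
        WITH flagged AS (
            SELECT user_id, active_date,
                   -- 1 when the two previous rows are exactly 1 and 2 days earlier
                   CASE WHEN date(active_date, '-1 days') =
                             LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
                         AND date(active_date, '-2 days') =
                             LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
                        THEN 1 ELSE 0 END AS rowof3
            -- DISTINCT guards against duplicate user_id + active_date rows
            FROM (SELECT DISTINCT user_id, active_date FROM your_table)
        )
        SELECT active_date, SUM(rowof3) AS users
        FROM flagged
        GROUP BY active_date
        ORDER BY active_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(users_with_3_day_streak(activity))
```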

How can I find duplicate values in the result of a join operation?

I have two tables:
MappingTable > Id, ItemId, Quantity
ItemTable > ItemId, Name, DateOfPurchase
I want to find the duplicate rows that have the same Quantity and the same DateOfPurchase date (ignoring the time of day).
For example, I have:
Id | ItemId | Quantity
----------------------
1 | 01 | 4
2 | 03 | 5
3 | 05 | 4
ItemId | Name | DateOfPurchase
------------------------------
01 | AB | 2019-10-30 18:30:00
05 | XY | 2019-10-30 18:17:00
Result:
Quantity | DateOfPurchase | Name
--------------------------------
4 | 2019-10-30 | AB
4 | 2019-10-30 | XY
I think I need to join these tables and then find the duplicates.
How can I do that?
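One option is to use window functions, if your database supports them. Casting DateOfPurchase to date makes purchases on the same day count as duplicates regardless of the time, as in the expected result:
select *
from (
select
m.*,
i.name,
i.dateOfPurchase,
count(*) over(partition by m.quantity, cast(i.dateOfPurchase as date)) cnt
from MappingTable m
inner join ItemTable i on i.itemId = m.itemId
) t
where cnt > 1
order by quantity, dateOfPurchase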
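Here is a minimal sketch of the window-function approach against the sample data from the question, using SQLite from Python as a stand-in database. SQLite's date() function plays the role of cast(... as date) and strips the time of day, so AB and XY land in the same (Quantity, day) partition; item 03 has no ItemTable row, so the inner join drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MappingTable (Id INTEGER, ItemId TEXT, Quantity INTEGER);
    CREATE TABLE ItemTable (ItemId TEXT, Name TEXT, DateOfPurchase TEXT);
    INSERT INTO MappingTable VALUES (1, '01', 4), (2, '03', 5), (3, '05', 4);
    INSERT INTO ItemTable VALUES ('01', 'AB', '2019-10-30 18:30:00'),
                                 ('05', 'XY', '2019-10-30 18:17:00');
""")
rows = conn.execute("""
    SELECT Quantity, purchase_day, Name
    FROM (
        SELECT m.Quantity, i.Name,
               date(i.DateOfPurchase) AS purchase_day,
               -- how many joined rows share this quantity and calendar day
               COUNT(*) OVER (PARTITION BY m.Quantity, date(i.DateOfPurchase)) AS cnt
        FROM MappingTable m
        INNER JOIN ItemTable i ON i.ItemId = m.ItemId
    ) t
    WHERE cnt > 1
    ORDER BY Quantity, purchase_day, Name
""").fetchall()
conn.close()
print(rows)
```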

Difference between the max date and the penultimate max for specific employee - postgresql

I'm a bit stuck on a problem: I'm trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously you can use max(date) and group by emp_id, but how do you retrieve the penultimate date? I have tried a few approaches like:
order by date desc limit 1 offset 1
I have also tried putting these in subqueries, but that hasn't worked because there are many employee numbers and I need one row for each employee.
Can anyone help?
Thanks,
pp84
As kindly pointed out by @Haleemur Ali, order by date desc limit 1 offset 1 would not work with several emp_ids. Window functions do, as long as nth_value sees each employee's dates newest first over the whole partition:
with d(emp_id, date) as (values (1, '2017-10-31'::date), (1, '2017-08-08'), (1, '2017-06-02'), (2, '2016-01-01'), (2, '2016-02-02'), (2, '2016-03-03'))
select distinct emp_id
, max(date) over w max_date
, nth_value(date, 2) over w penultimate_date
, max(date) over w - nth_value(date, 2) over w diff
from d
window w as (partition by emp_id order by date desc rows between unbounded preceding and unbounded following)
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
2 | 2016-03-03 | 2016-02-02 | 30
1 | 2017-10-31 | 2017-08-08 | 84
(2 rows)
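The key detail in the nth_value query is an ordered window (newest date first) whose frame covers the whole partition; without an ORDER BY, "the second value" is not well defined. This is a minimal sketch of the same idea in SQLite from Python, assuming SQLite 3.25+; the column is renamed to the hypothetical event_date, and a julianday() difference replaces PostgreSQL date subtraction.

```python
import sqlite3

rows = [
    (1, "2017-10-31"), (1, "2017-08-08"), (1, "2017-06-02"),
    (2, "2016-01-01"), (2, "2016-02-02"), (2, "2016-03-03"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (emp_id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO d VALUES (?, ?)", rows)
query = """
    SELECT DISTINCT emp_id, max_date, penultimate_date,
           -- julianday() turns ISO dates into day numbers, so the difference is in days
           CAST(julianday(max_date) - julianday(penultimate_date) AS INTEGER) AS diff
    FROM (
        SELECT emp_id,
               max(event_date) OVER w AS max_date,
               nth_value(event_date, 2) OVER w AS penultimate_date
        FROM d
        -- newest first, and a frame spanning the whole partition
        WINDOW w AS (PARTITION BY emp_id ORDER BY event_date DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    )
    ORDER BY emp_id
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)
```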
WITH emps (emp_id, date) AS (
VALUES (1, '2017-10-31'::DATE)
, (1, '2017-08-08'::DATE)
, (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
emp_id
, "date" max_date
, LEAD("date") OVER w penultimate_date
, "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC)
ORDER BY emp_id, date DESC
Because the window orders each employee's rows by date in descending order, LEAD("date") OVER w returns the date from the next row, i.e. the next-earlier date.
The DISTINCT ON limits the result set to one row (the first row encountered) per emp_id.
With the final ORDER BY emp_id, date DESC, this first row contains the greatest date, so LEAD(...) OVER w returns the penultimate date. This gives us the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)
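SQLite has no DISTINCT ON, so this sketch (SQLite from Python, 3.25+ assumed) emulates it with ROW_NUMBER() over the same descending window and keeps rn = 1, while LEAD over that window still supplies the penultimate date. The event_date column name and the extra employee 3 with a single date are hypothetical; employee 3 shows that LEAD returns NULL when there is no second date.

```python
import sqlite3

emps = [
    (1, "2017-10-31"), (1, "2017-08-08"), (1, "2017-06-02"),
    (3, "2018-01-01"),  # hypothetical employee with a single date
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emps (emp_id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO emps VALUES (?, ?)", emps)
query = """
    SELECT emp_id, max_date, penultimate_date,
           CAST(julianday(max_date) - julianday(penultimate_date) AS INTEGER) AS difference
    FROM (
        SELECT emp_id,
               event_date AS max_date,
               LEAD(event_date) OVER w AS penultimate_date,  -- next (older) row
               ROW_NUMBER() OVER w AS rn                     -- 1 = newest row
        FROM emps
        WINDOW w AS (PARTITION BY emp_id ORDER BY event_date DESC)
    )
    WHERE rn = 1  -- plays the role of DISTINCT ON (emp_id)
    ORDER BY emp_id
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)
```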

PGSQL duplicate record in same column

I have a table and I want to know where duplicate records are present for the same columns. These are my columns; I want to get the records where group_id or week differ for the same code, fweek and newcode.
Id | newcode | fweek | code | group_id | week
---------------------------------------------
1 | 343001 | 2016-01 | 343 | 100 | 8
2 | 343002 | 2016-01 | 343 | 100 | 8
3 | 343001 | 2016-01 | 343 | 101 | 08
The required record is:
Id | newcode | fweek | code | group_id | week
---------------------------------------------
3 | 343001 | 2016-01 | 343 | 101 | 08
To find the duplicate values, I joined the table with itself.
The results are then grouped by code, fweek and newcode so that groups with more than one duplicate row are returned if they exist. I used max() to get the last inserted row.
You don't strictly need IS DISTINCT FROM: it is the same as inequality, except that NULLs are compared too. If you don't want to compare NULL values, use the <> operator instead.
You can find more information about it here.
select r.*
from your_table r
where r.id in (select max(r.id)
from your_table r
join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
where
r2.group_id is distinct from r.group_id or
r2.week is distinct from r.week
group by r.code,
r.fweek,
r.newcode
having count(*) > 1)
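A runnable sketch of the self-join against the sample rows from the question, using SQLite from Python as a stand-in. SQLite's null-safe inequality is written IS NOT rather than IS DISTINCT FROM, and the inner aliases are renamed r1/r2 to avoid reusing r; otherwise it follows the query above. Storing week as text is an assumption made so that '8' and '08' count as different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, newcode TEXT, fweek TEXT, "
    "code TEXT, group_id INTEGER, week TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "343001", "2016-01", "343", 100, "8"),
    (2, "343002", "2016-01", "343", 100, "8"),
    (3, "343001", "2016-01", "343", 101, "08"),
])
query = """
    SELECT r.*
    FROM your_table r
    WHERE r.id IN (
        SELECT max(r1.id)  -- last inserted row of each conflicting group
        FROM your_table r1
        JOIN your_table r2
          ON r2.code = r1.code AND r2.fweek = r1.fweek AND r2.newcode = r1.newcode
        -- IS NOT is SQLite's NULL-safe inequality
        WHERE r2.group_id IS NOT r1.group_id
           OR r2.week IS NOT r1.week
        GROUP BY r1.code, r1.fweek, r1.newcode
        HAVING count(*) > 1
    )
"""
found = conn.execute(query).fetchall()
conn.close()
print(found)
```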

Which is more efficient, select array_agg over partition, or select array (subquery)?

I have data like:
group_id | day | amount
----------+-------------+-------
1 | 15 Nov 2015 | 5.0
1 | 15 Nov 2015 | 6.0
1 | 14 Nov 2015 | 3.0
2 | 17 Nov 2015 | 5.0
2 | 15 Nov 2015 | 5.0
and I want to select the top ten amounts for each (group_id, day). I tried writing things like:
Postgres 9.4
select max(x.group_id), max(x.day), max(x.amounts)
from (select group_id, day, array_agg(amount) over w as amounts,
row_number() over w as r
from my_table window w as (partition by group_id, day
order by amount desc)) as x
where x.r<=10 group by x.group_id,x.day
It also occurred to me that I could write a much more straightforward query:
select a.day, a.group_id, array(select amount
from my_table
where day=a.day and group_id=a.group_id
order by amount desc limit 10)
from my_table as a group by a.day, a.group_id
This does exactly what I want. It led me to the question: assuming I can tweak the first example to get what I want, which query would be faster? Is the subquery slower than the partitions?
You probably should use an analytic function.
I don't know why you also have MAX outside the subquery. Your queries don't seem to be equivalent.
A top-10-per-group query should look like this:
WITH ranked as (
SELECT group_id,
day,
amount,
row_number() OVER
(partition by group_id, day ORDER BY amount DESC) rn
FROM my_table
)
SELECT group_id,
day,
array_agg(amount ORDER BY rn) AS top_amounts
FROM ranked
WHERE rn <= 10
GROUP BY group_id, day
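A minimal sketch of the ranked-CTE approach using SQLite from Python. SQLite has no array_agg, so the sketch returns the top rows in rank order and collects them into Python lists per (group_id, day); the function name top_n_per_group and the ISO dates in the sample data are hypothetical.

```python
import sqlite3
from collections import defaultdict


def top_n_per_group(rows, n):
    """Return {(group_id, day): [amounts, largest first]} keeping at most n per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (group_id INTEGER, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT group_id, day, amount,
                   row_number() OVER (PARTITION BY group_id, day ORDER BY amount DESC) AS rn
            FROM my_table
        )
        SELECT group_id, day, amount
        FROM ranked
        WHERE rn <= ?
        ORDER BY group_id, day, rn
    """
    result = defaultdict(list)
    for group_id, day, amount in conn.execute(query, (n,)):
        result[(group_id, day)].append(amount)  # rows arrive in rank order
    conn.close()
    return dict(result)


data = [
    (1, "2015-11-15", 5.0), (1, "2015-11-15", 6.0), (1, "2015-11-14", 3.0),
    (2, "2015-11-17", 5.0), (2, "2015-11-15", 5.0),
]
print(top_n_per_group(data, 10))
```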