How to find duplicate values after distinct count - postgresql

One piece of code:
select count(distinct for_id),task_text ,status from core.vw_task_new
where code = 'willingness_assessment' and status in ('Completed','Ready') and task_text is not null and dt > '2022-07-18'
group by task_text ,status
gives 647 as the total count.
However, the code below:
select count(distinct for_id) from core.vw_task_new
where code = 'willingness_assessment' and status in ('Completed','Ready') and task_text is not null and dt > '2022-07-18'
gives 630 as the count.
My question is, how do I get the duplicates that are causing this discrepancy?

The query below might help:
select task_text ,status,count(1) from core.vw_task_new
where code = 'willingness_assessment' and status in
('Completed','Ready') and task_text is not null and dt > '2022-07-18'
group by task_text ,status having count(1) > 1
HAVING count(1) > 1 returns every task_text and status combination (the columns named in the SELECT and GROUP BY clauses) that occurs more than once.
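If the goal is to see which for_id values make the per-group counts add up to more than the overall distinct count, one option is to list every for_id that falls into more than one (task_text, status) group. The sketch below is only an illustration: it runs the SQL through Python's sqlite3 module as a stand-in for PostgreSQL, uses a made-up table with the same column names, and leaves out the code, dt and NULL filters from the question.

```python
import sqlite3

# Hypothetical sample data; the real view and its filters are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vw_task_new (for_id INTEGER, task_text TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO vw_task_new VALUES (?, ?, ?)",
    [
        (1, "a", "Completed"),
        (1, "a", "Ready"),      # for_id 1 appears in a second group
        (2, "a", "Completed"),
        (3, "b", "Ready"),
        (3, "b", "Ready"),      # repeat inside the same group, not a discrepancy
    ],
)

# Sum of the per-group distinct counts (the first query in the question).
per_group_total = conn.execute("""
    SELECT SUM(n) FROM (
        SELECT COUNT(DISTINCT for_id) AS n
        FROM vw_task_new
        GROUP BY task_text, status
    ) AS g
""").fetchone()[0]

# Overall distinct count (the second query in the question).
overall = conn.execute("SELECT COUNT(DISTINCT for_id) FROM vw_task_new").fetchone()[0]

# for_id values that fall into more than one (task_text, status) group.
repeated = conn.execute("""
    SELECT for_id, COUNT(*) AS group_count
    FROM (SELECT DISTINCT for_id, task_text, status FROM vw_task_new) AS d
    GROUP BY for_id
    HAVING COUNT(*) > 1
    ORDER BY for_id
""").fetchall()

print(per_group_total, overall, repeated)
```

Each listed for_id contributes group_count - 1 to the gap between the two totals, so on the real data those surpluses should add up to 647 - 630 = 17.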

Related

How to get unique date from multiple dates based on condition in PostgreSQL?

How do I get unique dates from the table "timelog" for the dates where isParent is not 1.
In the table, 2022-01-10 should not come as the result as this date has isParent as 1.
So far, I have written query like this -
SELECT DISTINCT session::date
FROM timelog
WHERE id IN (SELECT id FROM timelog WHERE isParent = 0)
Obviously, this is not working as intended. What changes do I need to make in this query to make it work?
You can use a conditional aggregate in HAVING: put the isParent = 1 condition in a CASE WHEN and keep only the dates for which it never matches.
SELECT session::date
FROM timelog
GROUP BY session::date
HAVING COUNT(CASE WHEN isParent = 1 THEN 1 END) = 0
SELECT session::date
FROM timelog
GROUP BY session::date
HAVING COUNT(*) FILTER (WHERE isParent = 1) = 0
SELECT session::date
FROM timelog
GROUP BY session::date
HAVING NOT ARRAY_AGG(isParent) && ARRAY[1]; -- NOT in array containing 1
Something along the lines of
SELECT DISTINCT session::date FROM timelog WHERE isParent = 0
should work. All you are looking for is unique dates where isParent is 0, correct? No aggregation needs to be applied?
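The plain WHERE isParent = 0 filter and the grouped HAVING variants only agree when no date mixes isParent = 1 and isParent = 0 rows. A minimal sketch of the difference follows, run through Python's sqlite3 module as a stand-in for PostgreSQL (date(session) replaces session::date). The rows are invented for illustration; the only detail taken from the question is that 2022-01-10 has a row with isParent = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timelog (id INTEGER PRIMARY KEY, session TEXT, isParent INTEGER)")
conn.executemany(
    "INSERT INTO timelog (session, isParent) VALUES (?, ?)",
    [
        ("2022-01-10 09:00:00", 1),  # parent row: this date must be excluded
        ("2022-01-10 11:00:00", 0),  # same date, non-parent row
        ("2022-01-11 09:00:00", 0),
        ("2022-01-11 10:00:00", 0),
        ("2022-01-12 08:00:00", 0),
    ],
)

# Row-level filter: keeps any date that has at least one isParent = 0 row.
row_filter = [r[0] for r in conn.execute("""
    SELECT DISTINCT date(session) AS session_day
    FROM timelog
    WHERE isParent = 0
    ORDER BY session_day
""")]

# Group-level filter: keeps only dates that have no isParent = 1 row at all.
group_filter = [r[0] for r in conn.execute("""
    SELECT date(session) AS session_day
    FROM timelog
    GROUP BY date(session)
    HAVING COUNT(CASE WHEN isParent = 1 THEN 1 END) = 0
    ORDER BY session_day
""")]

print(row_filter)
print(group_filter)
```

With this sample, the row-level filter still returns 2022-01-10, while the grouped query drops it.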

PostgreSQL, first date when cumulative sum reaches mark

I have the following sample table
And the output should be the first date (for each id) when cum_rev reaches the 100 mark.
I tried the following, because I thought that with the GROUP BY trick and the WHERE condition I would only get the first occurrence of a value higher than 100.
SELECT id
,pd
,cum_rev
FROM (
SELECT id
,pd
,rev
,SUM(rev) OVER (
PARTITION BY id
ORDER BY pd
) AS cum_rev
FROM tab1
)
WHERE cum_rev >= 100
GROUP BY id
But it is not working, and I get the following error. Adding an alias does not help either.
ERROR: subquery in FROM must have an alias
LINE 4: FROM (
        ^
HINT: For example, FROM (SELECT ...) [AS] foo.
So the desired output is:
2 2015-04-02 135.70
3 2015-07-03 102.36
Do I need another approach? Can anyone help?
Thanks
SELECT
id, pd, total
FROM (
SELECT
*,
SUM(rev) OVER (PARTITION BY id ORDER BY pd) - rev as prev_total,
SUM(rev) OVER (PARTITION BY id ORDER BY pd) as total
FROM tab1
) s
WHERE total >= 100 AND prev_total < 100
You can use the cumulative SUM() window function for each id group (partition). To find the first row that reaches the threshold, check that the previous running total is still below the threshold while the current one meets it.
PS: You got the error because your subquery is missing an alias. In my example it's just s. Even with an alias, your query would still fail, because pd and cum_rev are selected but neither appear in GROUP BY id nor are aggregated.
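As a rough check of the "previous total below, current total at or above the threshold" idea, here is a small sketch run through Python's sqlite3 module (window functions need SQLite 3.25 or newer) as a stand-in for PostgreSQL. The rows are hypothetical, since the question's sample table is not shown, and integer revenues are used to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER, pd TEXT, rev INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", 40), (1, "2015-02-01", 30),   # never reaches 100
        (2, "2015-01-01", 60), (2, "2015-02-01", 50),   # reaches 110 on the second row
        (2, "2015-03-01", 20),
        (3, "2015-01-01", 100), (3, "2015-02-01", 5),   # reaches 100 on the first row
    ],
)

first_hits = conn.execute("""
    SELECT id, pd, total
    FROM (
        SELECT
            id,
            pd,
            -- running total up to and including the current row
            SUM(rev) OVER (PARTITION BY id ORDER BY pd) AS total,
            -- running total before the current row
            SUM(rev) OVER (PARTITION BY id ORDER BY pd) - rev AS prev_total
        FROM tab1
    ) AS s
    WHERE total >= 100 AND prev_total < 100
    ORDER BY id
""").fetchall()

print(first_hits)
```

This assumes rev is never negative. If the running total could dip back under 100 and cross it again, the filter would return one row per crossing rather than only the first.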

Identifying rows with multiple IDs linked to a unique value

Using MS SQL Server 2008 R2; I am sure this is very straightforward. I am trying to identify where a unique value {ISIN} has been linked to more than one identifier. An example output would be:
isin entity_id
XS0276697439 000BYT-E
XS0276697439 000BYV-E
This is actually an error and I want to look for other instances where there may be more than one entity_id linked to a unique ISIN.
This is my current working but it's obviously not correct:
select isin, entity_id from edm_security_entity_map
where isin is not null
--and isin = ('XS0276697439')
group by isin, entity_id
having COUNT(entity_id) > 1
order by isin asc
Thanks for your help.
Elliot,
I don't have a copy of SQL in front of me right now, so apologies if my syntax isn't spot on.
I'd start by finding the duplicates:
select
x.isin
,count(*)
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
Then join that back to the full table to find where those duplicates come from:
;with DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map
inner join DuplicateList as dup
on dup.isin = map.isin;
HTH,
Michael
So you're saying that if isin-1 has a row for both entity-1 and entity-2 that's an error, but isin-3, say, linked to entity-3 in two separate rows is OK? The ugly-but-readable solution to that is to prepend another CTE to the previous solution:
;with UniqueValues as
(select distinct
y.isin
,y.entity_id
from edm_security_entity_map as y
)
,DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from UniqueValues as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map -- or from UniqueValues, depending on your objective.
inner join DuplicateList as dup
on dup.isin = map.isin;
There are better solutions with additional GROUP BY clauses in the final query. If this is going into production I'd be recommending that. Or if your table has a bajillion rows. If you just need to do some analysis the above should suffice, I hope.
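One more compact variant, offered only as a sketch and not as the answerer's method, is to count distinct entity_id values per isin directly. It is run here through Python's sqlite3 module rather than SQL Server. The first two rows come from the question's example; the other rows, including the ISIN US0000000001, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edm_security_entity_map (isin TEXT, entity_id TEXT)")
conn.executemany(
    "INSERT INTO edm_security_entity_map VALUES (?, ?)",
    [
        ("XS0276697439", "000BYT-E"),
        ("XS0276697439", "000BYV-E"),   # same ISIN, different entity: the error case
        ("US0000000001", "000AAA-E"),   # hypothetical ISIN
        ("US0000000001", "000AAA-E"),   # same pair twice: not an error
        (None, "000ZZZ-E"),             # NULL isin is ignored
    ],
)

# ISINs linked to more than one distinct entity_id.
conflicts = conn.execute("""
    SELECT isin, COUNT(DISTINCT entity_id) AS entity_count
    FROM edm_security_entity_map
    WHERE isin IS NOT NULL
    GROUP BY isin
    HAVING COUNT(DISTINCT entity_id) > 1
    ORDER BY isin
""").fetchall()

print(conflicts)
```

Joining this result back to the map table on isin would list the individual entity_id values, as in the CTE versions above.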

Summing Multiple Records by maxdate

I have a table with the following data
Bldg Suit SQFT Date
1 1 1,000 9/24/2012
1 1 1,500 12/31/2011
1 2 800 8/31/2012
1 2 500 10/1/2005
I want to write a query that will sum the SQFT of the max-date record for each suit, so the desired result would be 1,800, and it must be in one cell/row. This will ultimately be part of a subquery; I am just not getting what I expect with the queries I have written so far.
Thanks in advance.
You can use the following:
select sum(t1.sqft) Total
from yourtable t1
inner join
(
select max(dt) mxdt, suit, bldg
from yourtable
group by suit, bldg
) t2
on t1.dt = t2.mxdt
and t1.bldg = t2.bldg
and t1.suit = t2.suit
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Bldg, Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1
Group By Bldg
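Both approaches can be checked against the four rows from the question. The sketch below runs them through Python's sqlite3 module (window functions need SQLite 3.25 or newer) instead of the original database. It assumes the date column is named dt and stored as ISO text so that MAX and ORDER BY sort chronologically, and it sums across all buildings to get a single cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (bldg INTEGER, suit INTEGER, sqft INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1000, "2012-09-24"),
        (1, 1, 1500, "2011-12-31"),
        (1, 2, 800, "2012-08-31"),
        (1, 2, 500, "2005-10-01"),
    ],
)

# Variant 1: join each row to the latest date of its (bldg, suit) pair.
join_total = conn.execute("""
    SELECT SUM(t1.sqft)
    FROM yourtable AS t1
    INNER JOIN (
        SELECT bldg, suit, MAX(dt) AS mxdt
        FROM yourtable
        GROUP BY bldg, suit
    ) AS t2
      ON t1.bldg = t2.bldg AND t1.suit = t2.suit AND t1.dt = t2.mxdt
""").fetchone()[0]

# Variant 2: number rows newest-first per (bldg, suit) and keep the first one.
window_total = conn.execute("""
    SELECT SUM(sqft)
    FROM (
        SELECT sqft,
               ROW_NUMBER() OVER (PARTITION BY bldg, suit ORDER BY dt DESC) AS rn
        FROM yourtable
    ) AS ranked
    WHERE rn = 1
""").fetchone()[0]

print(join_total, window_total)
```

The two variants can differ when a suit has two rows with the same latest date: the join counts both rows, while ROW_NUMBER keeps only one of them.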

Postgresql Update Based on count, min and group by

Thank you for taking the time to look at my question.
I've seen similar questions, but not the same depth. Please help!
I would like to update a column on the rows of a table (which holds user_id and date_created) that have the lowest date_created for their user_id.
The following select gives me all the rows I would like to update:
select user_id, min(date_created) from mytable s1 where
(select count(1) from mytable s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id order by user_id;
I would have expected this update to work:
update mytable set join_status = 1 where date_created =
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
But it gave the following error:
ERROR: more than one row returned by a subquery used as an expression
I've tried a few different solutions, but nothing seems to help.
Does anyone have any ideas for me?
Thanks again.
Change your SQL to:
update mytable set join_status = 1 where date_created IN
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
Read more on row comparison in the docs.
EDIT:
In the subquery you're performing GROUP BY user_id. This means that you will receive many rows, based on the number of unique user_id values in your simplepay_payment table.
To make your query work as expected, you should join using 2 columns: user_id and date_created. As you've mentioned, you already have the query that gives you the correct results, so you can use it like this:
WITH desired AS (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id)
UPDATE mytable m SET join_status = 1 FROM desired d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;
I've wrapped your query in a Common Table Expression and used it in the UPDATE statement. You can add RETURNING m.* at the end of the query to see the rows that were updated and their new values.
EDIT2:
Common Table Expressions (WITH-queries) are not available before version 9.1 for UPDATE statements. You can simply move the CTE subquery into the update, like this:
UPDATE mytable m SET join_status = 1 FROM (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id) d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;
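To see why matching on date_created alone is not enough, here is a small sketch with invented rows, run through Python's sqlite3 module as a stand-in for PostgreSQL. Older SQLite versions lack UPDATE ... FROM, so the update is written with correlated subqueries instead. That is a swapped-in technique which should flag the same rows, not the statement from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (user_id INTEGER, date_created TEXT, join_status INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO mytable (user_id, date_created) VALUES (?, ?)",
    [
        (1, "2012-01-01"), (1, "2012-02-01"),
        (2, "2012-01-01"),                      # single row, shares a date with user 1
        (3, "2012-03-01"), (3, "2012-04-01"),
    ],
)

# Date-only match (the IN variant): also catches user 2's row.
date_only_matches = conn.execute("""
    SELECT COUNT(*) FROM mytable
    WHERE date_created IN (
        SELECT MIN(date_created) FROM mytable
        GROUP BY user_id HAVING COUNT(*) > 1
    )
""").fetchone()[0]

# Match on user_id and date_created: the earliest row of each user
# that has more than one row.
conn.execute("""
    UPDATE mytable
    SET join_status = 1
    WHERE date_created = (SELECT MIN(s.date_created) FROM mytable AS s
                          WHERE s.user_id = mytable.user_id)
      AND (SELECT COUNT(*) FROM mytable AS s
           WHERE s.user_id = mytable.user_id) > 1
""")

flagged = conn.execute(
    "SELECT user_id, date_created FROM mytable WHERE join_status = 1 ORDER BY user_id"
).fetchall()

print(date_only_matches, flagged)
```

In this sample the date-only filter matches three rows, including user 2's only row, while the two-column match flags just the earliest row of users 1 and 3.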