PostgreSQL : comparing two sets of results does not work - postgresql

I have a table that contains 3 columns of ids, clothes, shoes, customers and relates them.
I have a query that works fine :
select clothes, shoes from table where customers = 101 (all clothes and shoes of customer 101). This returns
clothes - shoes (SET A)
1 6
1 2
33 12
24 null
Another query that works fine :
select clothes ,shoes from table
where customers in
(select customers from table where clothes = 1 and customers <> 101 ) (all clothes and shoes of any other customer than 101, with specified clothes). This returns
shoes - clothes(SET B)
6 null
null 24
1 1
2 1
12 null
null 26
14 null
Now I want to get all clothes and shoes from SET A that are not in SET B.
So (example) select from SET A where NOT IN SET B. This should return just clothes 33, right?
I try to convert this to a working query :
select clothes, shoes from table where customers = 101
and
(clothes,shoes) not in
(
select clothes,shoes from
table where customers in
(select customers from table where clothes = 1 and customers <> 101 )
) ;
I tried different syntaxes, but the above looks more logic.
Problem is I never get clothes 33, just an empty set.
How do I fix this? What goes wrong?
Thanks
Edit , here is the contents of the table
id shoes customers clothes
1 1 1 1
2 1 4 1
3 1 5 1
4 2 2 2
5 2 3 1
6 1 3 1
44 2 101 1
46 6 101 1
49 12 101 33
51 13 102
52 101 24
59 107 51
60 107 24
62 23 108 51
63 23 108 2
93 124 25
95 6 125
98 127 25
100 3 128
103 24 131
104 25 132
105 102 28
106 10 102
107 23 133
108 4 26
109 6 4
110 4 24
111 12 4
112 14 4
116 102 48
117 102 24
118 102 25
119 102 26
120 102 29
122 134 31

The except clause in PostgreSQL works the way the minus operator does in Oracle. I think that will give you what you want.
I think notionally your query looks right, but I suspect those pesky nulls are impacting your results. Just like a null is not-NOT equal to 5 (it's nothing, therefore it's neither equal to nor not equal to anything), a null is also not-NOT "in" anything...
select clothes, shoes
from table1
where customers = 101
except
select clothes, shoes
from table1
where customers in (
select customers
from table1
where clothes = 1 and customers != 101
)

For PostgreSQL null is undefined value, so You must get rid of potential nulls in your result:
select id,clothes,shoes from t1 where customers = 101 -- or select id...
and (
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
OR
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
)
if You wanted unique pairs you would use:
select clothes, shoes from t1 where customers = 101
and
(clothes,shoes) not in
(
select coalesce(clothes,-1),coalesce(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
) ;
You can't get "clothes 33" if You are selecting both clothes and shoes columns...
Also if u need to know exactly which column, clothes or shoes was unique to this customer, You might use this little "hack":
select id,clothes,-1 AS shoes from t1 where customers = 101
and
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
UNION
select id,-1,shoes from t1 where customers = 101
and
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
And Your result would be:
id=49, clothes=33, shoes=-1
(I assume that there aren't any clothes or shoes with id -1, You may put any exotic value here)
Cheers

Related

PostgreSQL Lag Function between of Two Tables

I am new to Postgresql/Python, so please bear with me!
Assuming we have two tables:
item table having a itemid, price, time.
user table having colums userid, itemid, timecreated, quantity, firstprice, lastprice, difference.
Table examples like :
item table:
itemid price time
RBK 92 1546408800
LBV 51 1546408800
ZBT 49 1546408800
GLS 22 1546408800
DBC 17 1546408800
RBK 91 1546495200
LBV 55 1546495200
ZBT 51 1546495200
GLS 24 1546495200
DBC 28 1546581600
RBK 108 1546581600
LBV 46 1546581600
ZBT 49 1546581600
GLS 21 1546581600
DBC 107 1546581600
In item table all those values comes up with api.
and user table:
userid itemid timecreated quantitty firstprice currentprice difference
1 RBK 1546408800 20
2 RBK 1546408800 15
3 RBK 1546408800 35
3 GLS 1546408800 101
3 DBC 1546495200 140
1 RBV 1546495200 141
2 RBK 1546495200 25
2 RBV 1546581600 31
User table is djangobased table which is user can register\add new items to follow prices.
My struggle access the item table to fetch first price which is having a same timestamp. In that example userid 1 RBK First price (1546408800) must be filling with 92
I did some trick with postgresql with (lag) But this does not seems to be working:
update user
set firstprice = tt.prev_price
from (select item.*,
lag(price) over (partition by itemid order by time) as prev_price
from item
) tt
where tt.id = item.id and
tt.prev_close is distinct from item.price;
I can call current price from the api but didnt find out the way to filling firstprice from the item table. I will be making for a trigger for this query. I searched on google and on stackoverflow but I couldn't find anything that could help me. Thanks in advance.
I can advice next approach (may be not fastest):
update "user" set firstprice = (
select price from "item" i
where i.itemid = "user".itemid and i.time >= "user".timecreated order by i.time limit 1
);
It calculate firstprice using sub-query. Test this SQL here

Problem counting items in an individual row without duplication

I'm trying to write a query that will include a count for the primary and secondary activity only when Group ID = 260 and Item id in(1302,1303,1305,1306) for each individual RecordID. So far I have been able to single out the rows with those conditions, but I only want to count the primary and secondary activities once(because the Primary and Secondary activities are the same for their corresponding RecordID regardless of how many rows there are), if they aren't null, regardless of how many RecordID's match those conditions.
RecordID: GroupID: ItemID: PrimActivity: SecActivity:
320 260 1302 36 0
320 260 6456 36 0
320 312 1303 36 0
560 400 1302 46 48
560 312 1305 46 48
460 260 1305 45 56
460 260 1302 45 56
Result I'm getting:
RecordID: Count:
320 2
460 4
Expected result:
RecordID: Count:
320 1
460 2
SELECT dfr.RecordID,
COUNT(CASE WHEN dfr.PrimActivity <> 0 and a.GroupID =260 then 1
ELSE NULL END) +
COUNT(CASE WHEN dfr.SecActivity <> 0 and a.GroupID =260 then 1 ELSE
NULL END) AS Count
From ActivityItem ai
Join DailyRecord dfr on ai.PrimActivity = dfr.PrimActivity
Join AreaInfo af on af.AreaInfoID = dfr.AreaInfoID
Join Information a on dfr.RecordID = a.RecordID
Join Lookup lp on lp.ItemID = a.ItemID
WHERE a.GroupID like '260' and EXISTS(
SELECT b.RecordID, b.GroupID, b.ItemID
FROM Areainfo b
where a.RecordID=b.RecordID and b.ItemID IN(1302,1303,1305,1306)
GROUP BY dfr.RecordID
You should be more clear when you explain the structure of tables you are using. However, I reach the expected result starting from your sample table doing this:
SELECT RecordID,COUNT(*) as Count
FROM (SELECT DISTINCT RecordID,ItemID,PrimActivity,SecActivity
FROM [TABLE YOU POSTED]
WHERE GroupID = 260 and ItemID in (1302,1303,1305,1306) ) A
GROUP BY RecordID

Ordering row having same values in two columns on top

I have data similar to one given below:
ID UserID PlayerID Name
1 56 21 A
2 57 34 B
3 77 77 C
4 65 23 D
5 77 77 E
I want the rows with same value in UserID and PlayerID column to be at the top.
I have currently done this:
select * from tblTest
order by abs(UserID - PlayerID ) asc
Any better way to achieve this result?
Try this
SELECT * From tblTest
Order By Case When UserID = PlayerID Then 0 Else 1 End

Subselect and Max

Alright, I've been trying to conceptualize this for a better part of the afternoon and still cannot figure out how to structure this subselect.
The data that I need to report are ages for a given student major grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need to have my query grouped on the fiscalyear and agerange fields and then count the distinct student id's.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
So this is all fine and dandy except it doesn't match my headcount of students for the fiscalyear. The reason being, that people may cross over in the age ranges during the fiscal year and is adding them to my count twice.
How can I use a subselect to resolve this duplicate entry? The field I have been trying to get working is my semester field and using a max to find the max semester during a fiscalyear for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution: (Just got this working and produced my headcounts that match what they are suppose to be)
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange
Replace each student's age range with its maximum (or minimum, if you like) age range that fiscal year, then count them:
;
WITH sourceData AS (
SELECT
StudID,
MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER
(PARTITION BY StudID, FiscalYear),
FiscalYear
FROM tblStatic
WHERE Campus LIKE 'World%'
AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
AND StuMaj = 'LAWSC'
AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
FiscalYear,
AgeRange = MaxAgeRangeThisFiscalYear,
Count = COUNT(DISTINCT StudID)
FROM sourceData
GROUP BY
FiscalYear,
MaxAgeRangeThisFiscalYear
ORDER BY
FiscalYear,
MaxAgeRangeThisFiscalYear

Extract Unique Time Slices in Oracle

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.
Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.
In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)
I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)