Extract Unique Time Slices in Oracle - oracle10g

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.

Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.

In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)

I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)

Related

PostgreSQL : comparing two sets of results does not work

I have a table that contains 3 columns of ids, clothes, shoes, customers and relates them.
I have a query that works fine :
select clothes, shoes from table where customers = 101 (all clothes and shoes of customer 101). This returns
clothes - shoes (SET A)
1 6
1 2
33 12
24 null
Another query that works fine :
select clothes ,shoes from table
where customers in
(select customers from table where clothes = 1 and customers <> 101 ) (all clothes and shoes of any other customer than 101, with specified clothes). This returns
shoes - clothes(SET B)
6 null
null 24
1 1
2 1
12 null
null 26
14 null
Now I want to get all clothes and shoes from SET A that are not in SET B.
So (example) select from SET A where NOT IN SET B. This should return just clothes 33, right?
I try to convert this to a working query :
select clothes, shoes from table where customers = 101
and
(clothes,shoes) not in
(
select clothes,shoes from
table where customers in
(select customers from table where clothes = 1 and customers <> 101 )
) ;
I tried different syntaxes, but the above looks more logic.
Problem is I never get clothes 33, just an empty set.
How do I fix this? What goes wrong?
Thanks
Edit , here is the contents of the table
id shoes customers clothes
1 1 1 1
2 1 4 1
3 1 5 1
4 2 2 2
5 2 3 1
6 1 3 1
44 2 101 1
46 6 101 1
49 12 101 33
51 13 102
52 101 24
59 107 51
60 107 24
62 23 108 51
63 23 108 2
93 124 25
95 6 125
98 127 25
100 3 128
103 24 131
104 25 132
105 102 28
106 10 102
107 23 133
108 4 26
109 6 4
110 4 24
111 12 4
112 14 4
116 102 48
117 102 24
118 102 25
119 102 26
120 102 29
122 134 31
The except clause in PostgreSQL works the way the minus operator does in Oracle. I think that will give you what you want.
I think notionally your query looks right, but I suspect those pesky nulls are impacting your results. Just like a null is not-NOT equal to 5 (it's nothing, therefore it's neither equal to nor not equal to anything), a null is also not-NOT "in" anything...
select clothes, shoes
from table1
where customers = 101
except
select clothes, shoes
from table1
where customers in (
select customers
from table1
where clothes = 1 and customers != 101
)
For PostgreSQL null is undefined value, so You must get rid of potential nulls in your result:
select id,clothes,shoes from t1 where customers = 101 -- or select id...
and (
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
OR
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
)
if You wanted unique pairs you would use:
select clothes, shoes from t1 where customers = 101
and
(clothes,shoes) not in
(
select coalesce(clothes,-1),coalesce(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
) ;
You can't get "clothes 33" if You are selecting both clothes and shoes columns...
Also if u need to know exactly which column, clothes or shoes was unique to this customer, You might use this little "hack":
select id,clothes,-1 AS shoes from t1 where customers = 101
and
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
UNION
select id,-1,shoes from t1 where customers = 101
and
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
And Your result would be:
id=49, clothes=33, shoes=-1
(I assume that there aren't any clothes or shoes with id -1, You may put any exotic value here)
Cheers

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the Islands and Gaps problem, specifically it's an Island problem with a date range. You should,
Fix this question up.
Flag it to be sent to dba.stackexchange.com
To solve this,
Create a pseudo column with a window that has 1 if the row preceding it does not correspond to the preceding mont
Create groups out of that with COUNT()
Check to make sure the count(*) for the group is greater than or equal to three.
Query,
SELECT l.id, creationdaterange, count(*)
FROM (
SELECT t.id,
t.creationdate,
count(range_reset) OVER (PARTITION BY t.id ORDER BY creationdate) AS creationdaterange
FROM (
SELECT id,
creationdate,
CASE
WHEN date_trunc('month',creationdate::date)::date - interval '1 month' = date_trunc('month',lag(creationdate))::date OVER (PARTITION BY id ORDER BY creationdate)
THEN 1
END AS range_reset
FROM post
ORDER BY id, creationdate
) AS t;
) AS l
GROUP BY t.id, creationdaterange
HAVING count(*) >= 3;

Group and Stuff multiple rows based on Count condition

I have a script that runs every 10 minutes and returns table with events from past 24 hours (marked by the script run time)
ID Name TimeOfEvent EventCategory TeamColor
1 Verlene Bucy 2015-01-30 09:10:00.000 1 Blue
2 Geneva Rendon 2015-01-30 09:20:00.000 2 Blue
3 Juliane Hartwig 2015-01-30 09:25:00.000 3 Blue
4 Vina Dutton 2015-01-30 12:55:00.000 2 Red
5 Cristin Lewis 2015-01-30 15:50:00.000 2 Red
6 Reiko Cushman 2015-01-30 17:10:00.000 1 Red
7 Mallie Temme 2015-01-30 18:35:00.000 3 Blue
8 Keshia Seip 2015-01-30 19:55:00.000 2 Blue
9 Rosalia Maher 2015-01-30 20:35:00.000 3 Red
10 Keven Gabel 2015-01-30 21:25:00.000 3 Red
Now I'd like to select two groups of Names based on these conditions:
1) Select Names from same EventCategory having 4 or more records in past 24 hours.
2) Select Names from same EventCategory and same TeamColor having 2 or more records in past 1 hour.
So my result would be:
4+per24h: Geneva Rendon, Vina Dutton, Cristin Lewis, Keshia Seip EventCategory = 2
4+per24h: Juliane Hartwig, Mallie Temme, Rosalia Maher, Keven Gabel EventCategory = 3
2+per1h: Rosalia Maher, Keven Gabel EventCategory = 3, TeamColor = Red
For the first one, I have written this:
SELECT mt.EventCategory, MAX(mt.[name]), MAX(mt.TimeOfEvent), MAX(mt.TeamColor)
FROM #mytable mt
GROUP BY mt.EventCategory
HAVING COUNT(mt.EventCategory) >= 4
because I don't care for the actual time as long as it's in the past 24 hours (and it always is), but I have trouble stuffing the names in one row.
The second part, I have no idea how to do. Because the results need to have both same EventCategory and TeamColor and also be limited by the one hour bracket.
this is possible, but you mix two separate issues. Here you find them combined with UNION:
Just paste this into an empty query window and execute. Adapt to your needs:
DECLARE #tbl TABLE(ID INT,Name VARCHAR(100),TimeOfEvent DATETIME,EventCategory INT,TeamColor VARCHAR(10));
INSERT INTO #tbl VALUES
(1,'Verlene Bucy','2015-01-30T09:10:00.000',1,'Blue')
,(2,'Geneva Rendon','2015-01-30T09:20:00.000',2,'Blue')
,(3,'Juliane Hartwig','2015-01-30T09:25:00.000',3,'Blue')
,(4,'Vina Dutton','2015-01-30T12:55:00.000',2,'Red')
,(5,'Cristin Lewis','2015-01-30T15:50:00.000',2,'Red')
,(6,'Reiko Cushman','2015-01-30T17:10:00.000',1,'Red')
,(7,'Mallie Temme','2015-01-30T18:35:00.000',3,'Blue')
,(8,'Keshia Seip','2015-01-30T19:55:00.000',2,'Blue')
,(9,'Rosalia Maher','2015-01-30T20:35:00.000',3,'Red')
,(10,'Keven Gabel','2015-01-30T21:25:00.000',3,'Red');
WITH Extended AS
(
SELECT *
,DATEDIFF(MINUTE,'2015-01-30T21:26:00.000',TimeOfEvent) AS MinuteDiff --use GETDATE() here...
,COUNT(*) OVER(PARTITION BY EventCategory) AS CountCategory
FROM #tbl AS tbl
)
,Filtered24Hours AS
(
SELECT *
FROM Extended
WHERE CountCategory >=4
)
,Filtered60Mins AS
(
SELECT *
FROM Extended
WHERE MinuteDiff >=-60
AND CountCategory >=2
)
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered24Hours AS x WHERE x.EventCategory=outerSource.EventCategory) AS CountNames
,'per24h' AS TimeIntervall
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered24Hours AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'') AS Names
,EventCategory
,NULL
FROM Filtered24Hours AS outerSource
UNION
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered60Mins AS x WHERE x.EventCategory=outerSource.EventCategory)
,'per1h'
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered60Mins AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'')
,EventCategory
,TeamColor
FROM Filtered60Mins AS outerSource
The result
Count Interv Names Category Team
4 per24h Geneva Rendon ,Vina Dutton ,Cristin Lewis ,Keshia Seip 2 NULL
4 per24h Juliane Hartwig ,Mallie Temme ,Rosalia Maher ,Keven Gabel 3 NULL
2 per1h Rosalia Maher ,Keven Gabel 3 Red

While loop to add data for pivot

Currently i have a requirement which needs a table to look like this:
Instrument Long Short 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 ....
Fixed 41 41 35 35 35 35 35 35 35 53 25 25
Index 16 16 22 22 22 32 12 12 12 12 12 12
Credits 29 29 41 16 16 16 16 16 16 16 16 16
Short term 12 12 5 5 5 5 5 5 5 5 5 17
My worktable looks like the following:
Instrument Long Short Annual Coupon Maturity Date Instrument ID
Fixed 10 10 10 01/01/2025 1
Index 5 5 10 10/05/2016 2
Credits 15 15 16 25/06/2020 3
Short term 12 12 5 31/10/2022 4
Fixed 13 13 15 31/03/2030 5
Fixed 18 18 10 31/01/2019 6
Credits 14 14 11 31/12/2013 7
Index 11 11 12 31/10/2040 8
..... etc
So basically the long and the short in the pivot should be the sum of each distinct instrument ID. And then for each year i need to take the sum of each Annual Coupon until the maturity date year where the long and the coupon rate are added together.
My thinking was that i had to create a while loop which would populate a table with a record for each year for each instrument until the maturity date, so that i could then pivot using an sql pivot some how. Does this seem feasible? Any other ideas on the best way of doing this, particularly i might need help on the while loop?
The following solution uses a numbers table to unfold ranges in your table, performs some special processing on some of the data columns in the unfolded set, and finally pivots the results:
WITH unfolded AS (
SELECT
t.Instrument,
Long = SUM(z.Long ) OVER (PARTITION BY Instrument),
Short = SUM(z.Short) OVER (PARTITION BY Instrument),
Year = y.Number,
YearValue = t.AnnualCoupon + z.Long + z.Short
FROM YourTable t
CROSS APPLY (SELECT YEAR(t.MaturityDate)) x (Year)
INNER JOIN numbers y ON y.Number BETWEEN YEAR(GETDATE()) AND x.Year
CROSS APPLY (
SELECT
Long = CASE y.Number WHEN x.Year THEN t.Long ELSE 0 END,
Short = CASE y.Number WHEN x.Year THEN t.Short ELSE 0 END
) z (Long, Short)
),
pivoted AS (
SELECT *
FROM unfolded
PIVOT (
SUM(YearValue) FOR Year IN ([2013], [2014], [2015], [2016], [2017], [2018], [2019], [2020],
[2021], [2022], [2023], [2024], [2025], [2026], [2027], [2028], [2029], [2030],
[2031], [2032], [2033], [2034], [2035], [2036], [2037], [2038], [2039], [2040])
) p
)
SELECT *
FROM pivoted
;
It returns results for a static range years. To use it for a dynamically calculated year range, you'll first need to prepare the list of years as a CSV string, something like this:
SET #columnlist = STUFF(
(
SELECT ', [' + CAST(Number) + ']'
FROM numbers
WHERE Number BETWEEN YEAR(GETDATE())
AND (SELECT YEAR(MAX(MaturityDate)) FROM YourTable)
ORDER BY Number
FOR XML PATH ('')
),
1, 2, ''
);
then put it into the dynamic SQL version of the query:
SET #sql = N'
WITH unfolded AS (
...
PIVOT (
SUM(YearValue) FOR Year IN (' + #columnlist + ')
) p
)
SELECT *
FROM pivoted;
';
and execute the result:
EXECUTE(#sql);
You can try this solution at SQL Fiddle.