Pivoting while grouping in postgres - postgresql

I've been using crosstab in Postgres to pivot a table, but now I need to add a grouping and I'm not sure if that's possible.
I'm starting with results like this:
Date      Account#  Type    Count
---------------------------------
2020/1/1  100       Red     5
2020/1/1  100       Blue    3
2020/1/1  100       Yellow  7
2020/1/2  100       Red     2
2020/1/2  100       Yellow  9
2020/1/1  101       Red     4
2020/1/1  101       Blue    7
2020/1/1  101       Yellow  3
2020/1/2  101       Red     8
2020/1/2  101       Blue    6
2020/1/2  101       Yellow  4
And I'd like to pivot it like this, where there's a row for each combination of date and account #:
Date      Account#  Red  Blue  Yellow
-------------------------------------
2020/1/1  100       5    3     7
2020/1/2  100       2    0     9
2020/1/1  101       4    7     3
2020/1/2  101       8    6     4
The code I've written returns the error "The provided SQL must return 3 columns: rowid, category, and values", which makes sense given my understanding of crosstab.
SELECT *
FROM crosstab(
'SELECT date, account_number, type, count
FROM table
ORDER BY 2,1,3'
) AS ct (date timestamp, account_number varchar, Red bigint, Blue bigint, Yellow bigint);
(I wrote the dates in a simplified format in the example tables but they are timestamps)
Is there a different way I can manipulate the first table to look like the second? Thank you!

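You can do conditional aggregation instead of crosstab. Wrapping each sum in coalesce() turns a missing type into 0 instead of NULL, as in your expected output:
select
date,
account_number,
coalesce(sum(count) filter (where type = 'Red'), 0) as red,
coalesce(sum(count) filter (where type = 'Blue'), 0) as blue,
coalesce(sum(count) filter (where type = 'Yellow'), 0) as yellow
from mytable
group by date, account_number
order by account_number, date;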
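A quick way to sanity-check the conditional-aggregation approach is the minimal sketch below, which runs the same pivot through Python's built-in sqlite3 module instead of Postgres. It is an illustration under assumptions, not the answer's exact query: the table is called `mytable`, the count column is named `cnt`, dates are stored as ISO text, and `SUM(CASE WHEN ... THEN cnt ELSE 0 END)` is used as the portable equivalent of `sum(...) filter (where ...)`, which also yields 0 rather than NULL when a type is missing.

```python
import sqlite3

# Sample rows from the question: (date, account_number, type, cnt)
ROWS = [
    ("2020-01-01", 100, "Red", 5),
    ("2020-01-01", 100, "Blue", 3),
    ("2020-01-01", 100, "Yellow", 7),
    ("2020-01-02", 100, "Red", 2),
    ("2020-01-02", 100, "Yellow", 9),
    ("2020-01-01", 101, "Red", 4),
    ("2020-01-01", 101, "Blue", 7),
    ("2020-01-01", 101, "Yellow", 3),
    ("2020-01-02", 101, "Red", 8),
    ("2020-01-02", 101, "Blue", 6),
    ("2020-01-02", 101, "Yellow", 4),
]


def pivot(rows):
    """Return one row per (date, account_number) with a column per type."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (date TEXT, account_number INTEGER, type TEXT, cnt INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT date, account_number,
               -- each CASE keeps only one type's counts; ELSE 0 fills gaps
               SUM(CASE WHEN type = 'Red'    THEN cnt ELSE 0 END) AS red,
               SUM(CASE WHEN type = 'Blue'   THEN cnt ELSE 0 END) AS blue,
               SUM(CASE WHEN type = 'Yellow' THEN cnt ELSE 0 END) AS yellow
        FROM mytable
        GROUP BY date, account_number
        ORDER BY account_number, date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in pivot(ROWS):
    print(row)
```

Adding another colour means adding one more `SUM(CASE ...)` line; unlike crosstab, there is no fixed three-column input format to satisfy, so any number of grouping columns can go in the GROUP BY.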

Related

Calculate percentage difference between two rows

I have this query that produced the table below.
select season,
guildname,
count(guildname) as mp_count,
(count(guildname)/600::float)*100 as grank
from mp_rankings
group by season, guildname
order by grank desc
season  guildname   mp_count  grank
10      LEGENDS     56        9.33333333333333
9       LEGENDS     54        9
10      EVERGLADE   50        8.33333333333333
9       Mystic      46        7.66666666666667
10      Mystic      42        7
9       EVERGLADE   39        6.5
10      100         36        6
9       PARABELLUM  33        5.5
10      PARABELLUM  29        4.83333333333333
9       100         29        4.83333333333333
I want to create a new column that calculates the percentage difference between the two seasons for the same guildname. For example:
season  guildname  mp_count  grank             prev_season_percent_diff
10      LEGENDS    56        9.33333333333333  0.33%
10      EVERGLADE  50        8.33333333333333  1.83%
The resulting table will only show the current season (which is the highest season value, 10 in this case) and adds a new column prev_season_percent_diff, which is the current season's grank minus the previous season's grank.
How can I achieve this?
Use a Common Table Expression ("CTE") for the grouped result and join it to itself to calculate the difference to the previous season:
with summary as (
select
season,
guildname,
count(*) as mp_count, -- simplified equivalent expression
count(*) / 6.0 as grank -- equivalent to count/600*100; 6.0 avoids integer division
from mp_rankings
group by season, guildname
)
select
a.season,
a.guildname,
a.mp_count,
a.grank,
a.grank - b.grank as prev_season_percent_diff
from summary a
left join summary b on b.guildname = a.guildname
and b.season = a.season - 1
where a.season = (select max(season) from summary)
order by a.grank desc
If you actually want a % sign in the result, round the difference and concatenate it, e.g. round((a.grank - b.grank)::numeric, 2) || '%'.
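The self-join can be tried out with the sketch below, again using Python's sqlite3 as a stand-in for Postgres. Assumptions: the raw `mp_rankings` rows are rebuilt from the grouped counts shown in the question (one row per counted entry), and both grank and the difference are rounded to two decimals for readability. Note the `6.0`: dividing `COUNT(*)` by an integer would truncate in both SQLite and Postgres.

```python
import sqlite3

# (season, guildname, mp_count) as listed in the question's grouped output
COUNTS = [
    (10, "LEGENDS", 56), (9, "LEGENDS", 54),
    (10, "EVERGLADE", 50), (9, "Mystic", 46),
    (10, "Mystic", 42), (9, "EVERGLADE", 39),
    (10, "100", 36), (9, "PARABELLUM", 33),
    (10, "PARABELLUM", 29), (9, "100", 29),
]


def season_diffs(counts):
    """Current-season rows with grank minus the previous season's grank."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mp_rankings (season INTEGER, guildname TEXT)")
    # Rebuild one raw row per counted entry so GROUP BY / COUNT(*) sees real data.
    for season, guild, n in counts:
        con.executemany("INSERT INTO mp_rankings VALUES (?, ?)", [(season, guild)] * n)
    query = """
        WITH summary AS (
            SELECT season, guildname,
                   COUNT(*) AS mp_count,
                   COUNT(*) / 6.0 AS grank      -- same as count/600*100
            FROM mp_rankings
            GROUP BY season, guildname
        )
        SELECT a.season, a.guildname, a.mp_count,
               ROUND(a.grank, 2) AS grank,
               ROUND(a.grank - b.grank, 2) AS prev_season_percent_diff
        FROM summary a
        LEFT JOIN summary b              -- same guild, previous season
               ON b.guildname = a.guildname AND b.season = a.season - 1
        WHERE a.season = (SELECT MAX(season) FROM summary)
        ORDER BY a.grank DESC
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in season_diffs(COUNTS):
    print(row)
```

Because of the LEFT JOIN, a guild that did not appear in the previous season still shows up, with NULL (None in Python) as its difference.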

Group and Stuff multiple rows based on Count condition

I have a script that runs every 10 minutes and returns a table of events from the past 24 hours (relative to the script's run time):
ID  Name             TimeOfEvent              EventCategory  TeamColor
1   Verlene Bucy     2015-01-30 09:10:00.000  1              Blue
2   Geneva Rendon    2015-01-30 09:20:00.000  2              Blue
3   Juliane Hartwig  2015-01-30 09:25:00.000  3              Blue
4   Vina Dutton      2015-01-30 12:55:00.000  2              Red
5   Cristin Lewis    2015-01-30 15:50:00.000  2              Red
6   Reiko Cushman    2015-01-30 17:10:00.000  1              Red
7   Mallie Temme     2015-01-30 18:35:00.000  3              Blue
8   Keshia Seip      2015-01-30 19:55:00.000  2              Blue
9   Rosalia Maher    2015-01-30 20:35:00.000  3              Red
10  Keven Gabel      2015-01-30 21:25:00.000  3              Red
Now I'd like to select two groups of Names based on these conditions:
1) Select Names from the same EventCategory having 4 or more records in the past 24 hours.
2) Select Names from the same EventCategory and the same TeamColor having 2 or more records in the past hour.
So my result would be:
4+per24h: Geneva Rendon, Vina Dutton, Cristin Lewis, Keshia Seip EventCategory = 2
4+per24h: Juliane Hartwig, Mallie Temme, Rosalia Maher, Keven Gabel EventCategory = 3
2+per1h: Rosalia Maher, Keven Gabel EventCategory = 3, TeamColor = Red
For the first one, I have written this:
SELECT mt.EventCategory, MAX(mt.[name]), MAX(mt.TimeOfEvent), MAX(mt.TeamColor)
FROM #mytable mt
GROUP BY mt.EventCategory
HAVING COUNT(mt.EventCategory) >= 4
I don't care about the actual time as long as it's within the past 24 hours (and it always is), but I'm having trouble stuffing the names into one row.
The second part I have no idea how to do, because the results need to have both the same EventCategory and the same TeamColor, and also be limited to the one-hour window.
This is possible, but you are mixing two separate problems. Here they are combined with UNION:
Just paste this into an empty query window and execute. Adapt to your needs:
DECLARE @tbl TABLE(ID INT,Name VARCHAR(100),TimeOfEvent DATETIME,EventCategory INT,TeamColor VARCHAR(10));
INSERT INTO @tbl VALUES
(1,'Verlene Bucy','2015-01-30T09:10:00.000',1,'Blue')
,(2,'Geneva Rendon','2015-01-30T09:20:00.000',2,'Blue')
,(3,'Juliane Hartwig','2015-01-30T09:25:00.000',3,'Blue')
,(4,'Vina Dutton','2015-01-30T12:55:00.000',2,'Red')
,(5,'Cristin Lewis','2015-01-30T15:50:00.000',2,'Red')
,(6,'Reiko Cushman','2015-01-30T17:10:00.000',1,'Red')
,(7,'Mallie Temme','2015-01-30T18:35:00.000',3,'Blue')
,(8,'Keshia Seip','2015-01-30T19:55:00.000',2,'Blue')
,(9,'Rosalia Maher','2015-01-30T20:35:00.000',3,'Red')
,(10,'Keven Gabel','2015-01-30T21:25:00.000',3,'Red');
WITH Extended AS
(
SELECT *
,DATEDIFF(MINUTE,'2015-01-30T21:26:00.000',TimeOfEvent) AS MinuteDiff --use GETDATE() here...
,COUNT(*) OVER(PARTITION BY EventCategory) AS CountCategory
FROM @tbl AS tbl
)
,Filtered24Hours AS
(
SELECT *
FROM Extended
WHERE CountCategory >=4
)
,Last60Mins AS
(
--count per EventCategory AND TeamColor, but only among the events of the last hour
SELECT *
,COUNT(*) OVER(PARTITION BY EventCategory,TeamColor) AS CountCategoryTeam
FROM Extended
WHERE MinuteDiff >=-60
)
,Filtered60Mins AS
(
SELECT *
FROM Last60Mins
WHERE CountCategoryTeam >=2
)
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered24Hours AS x WHERE x.EventCategory=outerSource.EventCategory) AS CountNames
,'per24h' AS TimeIntervall
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered24Hours AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'') AS Names
,EventCategory
,NULL AS TeamColor
FROM Filtered24Hours AS outerSource
UNION
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered60Mins AS x WHERE x.EventCategory=outerSource.EventCategory AND x.TeamColor=outerSource.TeamColor)
,'per1h'
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered60Mins AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
AND innerSource.TeamColor=outerSource.TeamColor
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'')
,EventCategory
,TeamColor
FROM Filtered60Mins AS outerSource;
The result:
Count  Interv  Names                                                      Category  Team
4      per24h  Geneva Rendon ,Vina Dutton ,Cristin Lewis ,Keshia Seip     2         NULL
4      per24h  Juliane Hartwig ,Mallie Temme ,Rosalia Maher ,Keven Gabel  3         NULL
2      per1h   Rosalia Maher ,Keven Gabel                                 3         Red
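For comparison, both conditions can also be expressed without window functions, as plain `GROUP BY ... HAVING COUNT(*) >= n` queries combined with UNION ALL. The sketch below does this in SQLite through Python; `group_concat` stands in for the `STUFF(... FOR XML PATH(''))` trick (SQL Server 2017 and later also provide `STRING_AGG`). The fixed `NOW` value is a hypothetical replacement for the script's run time, and since SQLite does not guarantee the concatenation order here, the names should be treated as an unordered list.

```python
import sqlite3

# Sample data from the question; times as ISO text so string comparison works.
EVENTS = [
    (1, "Verlene Bucy", "2015-01-30 09:10:00", 1, "Blue"),
    (2, "Geneva Rendon", "2015-01-30 09:20:00", 2, "Blue"),
    (3, "Juliane Hartwig", "2015-01-30 09:25:00", 3, "Blue"),
    (4, "Vina Dutton", "2015-01-30 12:55:00", 2, "Red"),
    (5, "Cristin Lewis", "2015-01-30 15:50:00", 2, "Red"),
    (6, "Reiko Cushman", "2015-01-30 17:10:00", 1, "Red"),
    (7, "Mallie Temme", "2015-01-30 18:35:00", 3, "Blue"),
    (8, "Keshia Seip", "2015-01-30 19:55:00", 2, "Blue"),
    (9, "Rosalia Maher", "2015-01-30 20:35:00", 3, "Red"),
    (10, "Keven Gabel", "2015-01-30 21:25:00", 3, "Red"),
]
NOW = "2015-01-30 21:26:00"  # hypothetical run time; use the real clock in practice


def name_groups(events, now):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (id INTEGER, name TEXT, time_of_event TEXT, "
        "event_category INTEGER, team_color TEXT)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", events)
    query = """
        -- 1) same EventCategory, 4+ events in the past 24 hours
        SELECT '4+per24h', event_category, NULL, group_concat(name, ', ')
        FROM events
        WHERE time_of_event >= datetime(:now, '-24 hours')
        GROUP BY event_category
        HAVING COUNT(*) >= 4
        UNION ALL
        -- 2) same EventCategory and TeamColor, 2+ events in the past hour
        SELECT '2+per1h', event_category, team_color, group_concat(name, ', ')
        FROM events
        WHERE time_of_event >= datetime(:now, '-60 minutes')
        GROUP BY event_category, team_color
        HAVING COUNT(*) >= 2
        ORDER BY 1 DESC, 2
    """
    rows = con.execute(query, {"now": now}).fetchall()
    con.close()
    return rows


for row in name_groups(EVENTS, NOW):
    print(row)
```

Filtering by time first and then grouping on both EventCategory and TeamColor is what enforces "same category and same colour within the hour" for the second condition.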

Postgresql: Grouping with limit on group size using window functions

Is there a way in PostgreSQL to write a query that groups rows by a column, with a limit on the group size, without discarding the additional rows?
Say I've got a table with three columns, id, color, and score, with the following rows:
1 red 10.0
2 red 7.0
3 red 3.0
4 blue 5.0
5 green 4.0
6 blue 2.0
7 blue 1.0
I can get a grouping by color with window functions using the following query:
SELECT * FROM (
SELECT id, color, score, rank()
OVER (PARTITION BY color ORDER BY score DESC)
FROM grouping_test
) AS foo WHERE rank <= 2;
with the result
id | color | score | rank
----+-------+-------+------
4 | blue | 5.0 | 1
6 | blue | 2.0 | 2
5 | green | 4.0 | 1
1 | red | 10.0 | 1
2 | red | 7.0 | 2
which discards items with rank > 2. However, what I need is a result like this:
1 red 10.0
2 red 7.0
4 blue 5.0
6 blue 2.0
5 green 4.0
3 red 3.0
7 blue 1.0
With no discarded rows.
Edit:
To be more precise about the logic I need:
1. Get me the row with the highest score.
2. The next row with the same color and the highest possible score.
3. The row with the highest score among the remaining items.
4. Same as 2., but for the row from 3.
...
Continue as long as pairs with the same color can be found, then order what's left by descending score.
The import statements for a test table can be found here.
Thanks for your help.
It can be done using two nested levels of window functions:
SELECT
id,
color,
score
FROM (
SELECT
id,
color,
score,
-- row_number() rather than rank(), so ties cannot make a group larger than 2
((row_number() OVER color_window) - 1) / 2 AS rank_window_id
FROM grouping_test
WINDOW color_window AS (PARTITION BY color ORDER BY score DESC)
) AS foo
WINDOW rank_window AS (PARTITION BY color, rank_window_id)
ORDER BY
(max(score) OVER rank_window) DESC,
color,
score DESC;
Here, 2 is the group-size parameter.
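Here is a minimal, runnable sketch of the nested-window idea in SQLite (3.25 or later, for window functions) via Python. It is an adaptation rather than the answer verbatim: the outer window is computed in a subquery instead of a named WINDOW clause, `row_number()` replaces `rank()` so that ties cannot push a group past its size, and `group_size` is a parameter. It reproduces the expected order for the sample data; for other data it may not match every reading of the greedy rules in the question's edit.

```python
import sqlite3

ITEMS = [(1, "red", 10.0), (2, "red", 7.0), (3, "red", 3.0), (4, "blue", 5.0),
         (5, "green", 4.0), (6, "blue", 2.0), (7, "blue", 1.0)]


def grouped_order(items, group_size=2):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE grouping_test (id INTEGER, color TEXT, score REAL)")
    con.executemany("INSERT INTO grouping_test VALUES (?, ?, ?)", items)
    query = """
        SELECT id, color, score
        FROM (
            -- best score of each chunk, used to order the chunks
            SELECT id, color, score,
                   MAX(score) OVER (PARTITION BY color, grp) AS grp_max
            FROM (
                -- split each colour into chunks of :size rows (integer division)
                SELECT id, color, score,
                       (ROW_NUMBER() OVER (PARTITION BY color ORDER BY score DESC) - 1)
                           / :size AS grp
                FROM grouping_test
            )
        )
        ORDER BY grp_max DESC, color, score DESC
    """
    rows = con.execute(query, {"size": group_size}).fetchall()
    con.close()
    return rows


for row in grouped_order(ITEMS):
    print(row)
```

Nothing is filtered out: rows beyond the first chunk of a colour simply form later chunks, which sort by their own best score.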
You can do ORDER BY (rank <= 2) DESC to get the rows with rank<=2 above all else:
SELECT id,color,score FROM (
SELECT id, color, score, rank()
OVER (PARTITION BY color ORDER BY score DESC),
max(score) OVER (PARTITION BY color) mx
FROM grouping_test
) AS foo
ORDER BY
(rank <= 2) DESC,
CASE WHEN rank<=2 THEN mx ELSE NULL END DESC,
id;
http://sqlfiddle.com/#!12/bbcfc/109

How to hide duplicate values in a column

I have 2 tables in SQL, table_a and table_b, in a one-to-many relationship. This is the joined output:
table_a           table_b
id_no (pk)  name  id_no (fk)  id_tabl (pk)  order_code  order_item
1           a     1           1             11          aple
1           a     1           2             12          orange
1           a     1           3             13          ice
2           b     2           4             12          orange
2           b     2           5             13          ice
3           c     3           6             13          ice
3           c     3           7             12          orange
3           c     3           8             11          aple
I want to display each name only once, followed by all of its order_items.
How can I do this in iReport (in the report XML)?
The output sample:
id_no  name  order_item
1      a     aple
             orange
             ice
2      b     orange
             ice
3      c     ice
             orange
             aple
Only 2 order_item rows should be shown on each page of my invoice; the remaining ones should be displayed on page 2 of the invoice.
You should use Data Grouping.
You can read this article about data grouping.
You can use a query like this:
SELECT table_a.id_no, table_a.name, table_b.order_item
FROM table_a
JOIN table_b ON table_a.id_no = table_b.id_no
ORDER BY table_a.name
Note: you may also need to sort by the table_a.id_no column.
Create an iReport group on the name field.
Note: you may need to create two groups, one for id_no and one for name.
You can use the Group and Detail bands to draw the data row, or you can put all the textField elements in the Detail band. In that case, set the isPrintRepeatedValues property to false for the id_no and name text fields.
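To make the effect of `isPrintRepeatedValues = false` concrete, the sketch below applies the same idea outside iReport: it runs the join in SQLite via Python, sorted so each person's rows are adjacent, and blanks `id_no` and `name` whenever they repeat the previous row. The `suppress_repeated` helper is hypothetical and only mimics what the report does visually; it is not part of the JasperReports API. Sorting by `id_tabl` within each person is an assumption made to keep the items in insertion order.

```python
import sqlite3


def report_rows():
    """Joined rows, sorted so that each person's items are adjacent."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_a (id_no INTEGER PRIMARY KEY, name TEXT)")
    con.execute(
        "CREATE TABLE table_b (id_tabl INTEGER PRIMARY KEY, id_no INTEGER, "
        "order_code INTEGER, order_item TEXT)"
    )
    con.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    con.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
        (1, 1, 11, "aple"), (2, 1, 12, "orange"), (3, 1, 13, "ice"),
        (4, 2, 12, "orange"), (5, 2, 13, "ice"),
        (6, 3, 13, "ice"), (7, 3, 12, "orange"), (8, 3, 11, "aple"),
    ])
    rows = con.execute("""
        SELECT a.id_no, a.name, b.order_item
        FROM table_a a JOIN table_b b ON a.id_no = b.id_no
        ORDER BY a.id_no, b.id_tabl
    """).fetchall()
    con.close()
    return rows


def suppress_repeated(rows):
    """Hypothetical helper: blank id_no/name when they equal the previous row's."""
    out, prev = [], None
    for id_no, name, item in rows:
        key = (id_no, name)
        out.append(("", "", item) if key == prev else (id_no, name, item))
        prev = key
    return out


for row in suppress_repeated(report_rows()):
    print(row)
```

This only works because the query is sorted; without the ORDER BY, repeated values would not be adjacent and nothing would be suppressed.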

Extract Unique Time Slices in Oracle

I use Oracle 10g and have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person who has had changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed; the implication is that many columns have duplicate values from slice to slice, since not every detail changed in each snapshot.
Below is a data sample:
SliceID  PersonID  StartDt   Detail1  Detail2    Detail3  Detail4  ...
1        101       08/20/09  Red      Vanilla    N        23
2        101       08/31/09  Orange   Chocolate  N        23
3        101       09/15/09  Yellow   Chocolate  Y        24
4        101       09/16/09  Green    Chocolate  N        24
5        102       01/10/09  Blue     Lemon      N        36
6        102       01/11/09  Indigo   Lemon      N        36
7        102       02/02/09  Violet   Lemon      Y        36
8        103       07/07/09  Red      Orange     N        12
9        104       01/31/09  Orange   Orange     N        12
10       104       10/20/09  Yellow   Orange     N        13
I need to write a query that pulls out time-slice records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, I would expect to get only the rows with SliceID 1, 3, and 4 for PersonID 101; SliceID 5 and 7 for PersonID 102; SliceID 8 for PersonID 103; and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle hierarchical query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps you can help.
Thank you for your time and consideration.
Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
select * from
(
select sliceid
, personid
, startdt
, detail3 as new_detail3
, lag(detail3) over (partition by personid
order by startdt) prev_detail3
from some_table
)
where prev_detail3 is null
or ( prev_detail3 != new_detail3 )
/
   SLICEID   PERSONID STARTDT   N P
---------- ---------- --------- - -
         1        101 20-AUG-09 N
         3        101 15-SEP-09 Y N
         4        101 16-SEP-09 N Y
         5        102 10-JAN-09 N
         7        102 02-FEB-09 Y N
         8        103 07-JUL-09 N
         9        104 31-JAN-09 N
7 rows selected.
The catch with this solution is that it also returns rows for 103 and 104, who have no slices in which detail3 changed. If that is a problem, we can apply an additional filter so that only persons with changes are returned:
with subq as (
select t.*
, row_number () over (partition by personid
order by sliceid ) rn
from
(
select sliceid
, personid
, startdt
, detail3 as new_detail3
, lag(detail3) over (partition by personid
order by startdt) prev_detail3
from some_table
) t
where t.prev_detail3 is null
or ( t.prev_detail3 != t.new_detail3 )
)
select sliceid
, personid
, startdt
, new_detail3
, prev_detail3
from subq sq
where exists ( select null from subq x
where x.personid = sq.personid
and x.rn > 1 )
order by sliceid
/
   SLICEID   PERSONID STARTDT   N P
---------- ---------- --------- - -
         1        101 20-AUG-09 N
         3        101 15-SEP-09 Y N
         4        101 16-SEP-09 N Y
         5        102 10-JAN-09 N
         7        102 02-FEB-09 Y N
Edit:
As egorius points out in the comments, the OP does want hits for all persons, even those whose detail3 never changed, so the first version of the query is the correct solution.
In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)
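The LAG approach can be checked with the sketch below, which runs it in SQLite (3.25 or later) through Python instead of Oracle. The table name `some_table` and the ISO date strings are assumptions, and only detail3 is tracked. One caveat that applies to the Oracle queries too: if detail3 itself can be NULL, the `IS NULL` test cannot tell a first slice from a NULL previous value, and `<>`/`!=` never matches a NULL, so such data would need extra handling.

```python
import sqlite3

# (sliceid, personid, startdt, detail3) from the question's sample
SLICES = [
    (1, 101, "2009-08-20", "N"),
    (2, 101, "2009-08-31", "N"),
    (3, 101, "2009-09-15", "Y"),
    (4, 101, "2009-09-16", "N"),
    (5, 102, "2009-01-10", "N"),
    (6, 102, "2009-01-11", "N"),
    (7, 102, "2009-02-02", "Y"),
    (8, 103, "2009-07-07", "N"),
    (9, 104, "2009-01-31", "N"),
    (10, 104, "2009-10-20", "N"),
]


def changed_slices(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE some_table (sliceid INTEGER, personid INTEGER, startdt TEXT, detail3 TEXT)"
    )
    con.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT sliceid
        FROM (
            SELECT sliceid, detail3,
                   LAG(detail3) OVER (PARTITION BY personid ORDER BY startdt) AS prev_detail3
            FROM some_table
        )
        WHERE prev_detail3 IS NULL       -- first slice of each person
           OR prev_detail3 <> detail3    -- detail3 differs from the previous slice
        ORDER BY sliceid
    """
    result = [r[0] for r in con.execute(query)]
    con.close()
    return result


print(changed_slices(SLICES))
```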
I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) prev_val
FROM some_table t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) prev_val
FROM some_table t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)