Should I use GROUPING SETS, CUBE, or ROLLUP in Postgres

We upgraded to Postgres 10 last month, so I'm new to a few of its features.
This query needs to display the days on which each student is taken care of, along with a total of how many students are taken care of on each weekday:
select distinct s.studentnr,(CASE When lower(cd.weekday) like lower('MONDAY')
then 1 else 0 end) as MONDAY,
(CASE When lower(cd.weekday) like lower('TUESDAY')
then 1 else 0 end) as TUESDAY,
(CASE When lower(cd.weekday) like lower('WEDNESDAY')
then 1 else 0 end) as WEDNESDAY,
(CASE When lower(cd.weekday) like lower('THURSDAY')
then 1 else 0 end) as THURSDAY,
(CASE When lower(cd.weekday) like lower('FRIDAY')
then 1 else 0 end) as FRIDAY,
scp.durationid
from student s
full join studentcarepreference scp on s.id = scp.studentid
full join careday cd on cd.studentcarepreferenceid = scp.id
join pupil per on per.id = s.personid
join studentschool ss ON ss.studentid = s.id
join duration d on d.id = scp.durationid
AND d.id BETWEEN ss.validfrom AND ss.validuntil
where scp.durationid = 1507
and cd.weekday is not null
order by s.studentnr
Here s.studentnr and cd.weekday are both of type varchar. The query results in:
However, I need the data in the following form:
Required result:
Which approach (GROUPING SETS, CUBE, or ROLLUP) is best for this kind of query?
Update: new results after changing the code to:
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(case lower(weekday) when 'monday' then 1 end) monday
, count(case lower(weekday) when 'tuesday' then 1 end) tuesday
, count(case lower(weekday) when 'wednesday' then 1 end) wednesday
, count(case lower(weekday) when 'thursday' then 1 end) thursday
, count(case lower(weekday) when 'friday' then 1 end) friday
from mydata
group by rollup ((studentnr))
order by studentnr
I think I'm nearly there, but the counts are wrong. What should I look into to correct the results?

It looks like you want to ROLLUP your data, with studentnr and durationid grouped together as a single set:
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(distinct case careday when 'monday' then studentnr end) monday
, count(distinct case careday when 'tuesday' then studentnr end) tuesday
, count(distinct case careday when 'wednesday' then studentnr end) wednesday
, count(distinct case careday when 'thursday' then studentnr end) thursday
, count(distinct case careday when 'friday' then studentnr end) friday
, durationid
from yourdata
group by rollup ((studentnr, durationid))
Which yields the desired results:
| studentnr | monday | tuesday | wednesday | thursday | friday | durationid |
|------------|--------|---------|-----------|----------|--------|------------|
| 10177 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 717208 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 722301 | 1 | 1 | 1 | 1 | 0 | 1507 |
| 3 students | 3 | 3 | 3 | 3 | 2 | (null) |
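The inner set of parentheses in the ROLLUP makes (studentnr, durationid) a single unit, so both columns are summarized at the same level during the rollup. Without them, ROLLUP (studentnr, durationid) would also produce a subtotal row for each studentnr. Counting distinct studentnr values per weekday, rather than counting rows, also keeps duplicate rows from inflating the totals.
With only one level of summarization there is no difference between ROLLUP and CUBE here. GROUPING SETS just requires listing both levels explicitly: the empty set () for the grand total and (studentnr, durationid) for the detail rows. All three of the following GROUP BY clauses produce equivalent results: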
group by rollup ((studentnr, durationid))
group by cube ((studentnr, durationid))
group by grouping sets ((),(studentnr, durationid))
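To see that ROLLUP ((studentnr, durationid)) is just the two grouping sets (studentnr, durationid) and (), here is a minimal sketch that writes them out as two SELECTs joined with UNION ALL. It runs the answer's count(distinct case ...) columns through Python's built-in sqlite3 module, because SQLite, unlike PostgreSQL, has no ROLLUP or GROUPING SETS. The yourdata rows are hypothetical, chosen to reproduce the result table above. The careday and durationid column names come from the answer.

```python
import sqlite3

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]

# Hypothetical sample rows for yourdata: one row per (student, care day).
# Student 722301 has no Friday, matching the result table in the answer.
CARE_DAYS = {
    "10177": DAYS,
    "717208": DAYS,
    "722301": DAYS[:4],
}
ROWS = [(sn, day, 1507) for sn, days in CARE_DAYS.items() for day in days]

# One count(distinct case ...) column per weekday, as in the answer.
DAY_COLS = ", ".join(
    f"count(DISTINCT CASE careday WHEN '{d}' THEN studentnr END) AS {d}" for d in DAYS
)

# SQLite has no ROLLUP, so the grouping sets ((studentnr, durationid), ())
# are written out as a detail SELECT and a grand-total SELECT.
QUERY = f"""
SELECT studentnr, {DAY_COLS}, durationid, 0 AS is_total
FROM yourdata
GROUP BY studentnr, durationid
UNION ALL
SELECT count(DISTINCT studentnr) || ' students', {DAY_COLS}, NULL, 1
FROM yourdata
ORDER BY is_total, studentnr
"""


def rollup_report():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourdata (studentnr TEXT, careday TEXT, durationid INTEGER)")
    con.executemany("INSERT INTO yourdata VALUES (?, ?, ?)", ROWS)
    # Drop the helper is_total column, which only sorts the total row last.
    return [row[:-1] for row in con.execute(QUERY)]


for row in rollup_report():
    print(row)
```

In PostgreSQL, replacing the UNION ALL with a single GROUP BY ROLLUP ((studentnr, durationid)) should give the same four rows. There, grouping(studentnr) takes over the job of the helper is_total flag.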

Related

Select dates missing data in a range

I have a PostgreSQL table test_table that looks like this:
date | test_hour
------------+-----------
2000-01-01 | 1
2000-01-01 | 2
2000-01-01 | 3
2000-01-02 | 1
2000-01-02 | 2
2000-01-02 | 3
2000-01-02 | 4
2000-01-03 | 1
2000-01-03 | 2
I need to select all the dates that don't have every one of test_hour = 1, 2 and 3, so it should return:
date
------------
2000-01-03
Here is what I have tried:
SELECT date FROM test_table WHERE test_hour NOT IN (SELECT generate_series(1,3));
But that only returns dates that have hours other than 1, 2 and 3.

You can use aggregation and conditional HAVING clauses, like so:
SELECT mydate
FROM mytable
GROUP BY mydate
HAVING
MAX(CASE WHEN test_hour = 1 THEN 1 ELSE 0 END) = 0
OR MAX(CASE WHEN test_hour = 2 THEN 1 ELSE 0 END) = 0
OR MAX(CASE WHEN test_hour = 3 THEN 1 ELSE 0 END) = 0
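A minimal sketch of this HAVING approach, using the question's test_table rows. It runs through Python's built-in sqlite3 module rather than PostgreSQL. It also shows why each CASE needs an ELSE 0. Without it, MAX(...) is NULL for a date that lacks an hour, and NULL != 1 is never true, so no date is returned.

```python
import sqlite3

# Rows from the question's test_table.
ROWS = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]


def dates_missing_hours(having):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_table (date TEXT, test_hour INTEGER)")
    con.executemany("INSERT INTO test_table VALUES (?, ?)", ROWS)
    sql = f"SELECT date FROM test_table GROUP BY date HAVING {having} ORDER BY date"
    return [r[0] for r in con.execute(sql)]


# Without ELSE, MAX(...) is NULL for a missing hour and NULL != 1 is never true.
WITHOUT_ELSE = " OR ".join(
    f"MAX(CASE WHEN test_hour = {h} THEN 1 END) != 1" for h in (1, 2, 3)
)
# With ELSE 0, a missing hour makes MAX(...) = 0, which the test catches.
WITH_ELSE = " OR ".join(
    f"MAX(CASE WHEN test_hour = {h} THEN 1 ELSE 0 END) = 0" for h in (1, 2, 3)
)

print(dates_missing_hours(WITHOUT_ELSE))  # → []
print(dates_missing_hours(WITH_ELSE))     # → ['2000-01-03']
```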

Another possibility is to join the table against the series (or another subquery containing the hours) and do a [distinct] count of the hours aggregated per date:
select date from tst
inner join (select generate_series(1,3) "hour") hours on hours.hour = tst.hour
group by tst.date
having count(distinct tst.hour) < 3;
or
select date from tst
where hour in (select generate_series(1,3))
group by date
having count(distinct tst.hour) < 3;
(You don't need the DISTINCT if date/hour combinations in your table are unique.)
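The join-and-count variant can be sketched the same way with Python's sqlite3 module. SQLite has no generate_series, so a small UNION ALL subquery stands in for the hours 1 to 3. The column names follow the question's test_table rather than tst/hour.

```python
import sqlite3

# Rows from the question's test_table.
ROWS = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]

# Join against the wanted hours, then keep dates with fewer than 3 of them.
QUERY = """
SELECT t.date
FROM test_table t
JOIN (SELECT 1 AS hour UNION ALL SELECT 2 UNION ALL SELECT 3) hours
  ON hours.hour = t.test_hour
GROUP BY t.date
HAVING count(DISTINCT t.test_hour) < 3
ORDER BY t.date
"""


def incomplete_dates():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_table (date TEXT, test_hour INTEGER)")
    con.executemany("INSERT INTO test_table VALUES (?, ?)", ROWS)
    return [r[0] for r in con.execute(QUERY)]


print(incomplete_dates())  # → ['2000-01-03']
```

One caveat applies to both count variants. A date with none of the hours 1 to 3 (for example, only hour 4) is removed before grouping, so it is not reported at all.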

A solution using set difference, giving you exactly the rows that are missing:
(SELECT DISTINCT
date, all_hour
FROM test_table
CROSS JOIN generate_series(1,3) all_hour)
EXCEPT
(TABLE test_table)
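The set-difference idea can be checked with Python's sqlite3 module on the question's rows. SQLite does not support the TABLE test_table shorthand, so the second half is written as a plain SELECT. A UNION ALL subquery again replaces generate_series.

```python
import sqlite3

# Rows from the question's test_table.
ROWS = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]

# Every (date, hour) pair that should exist, minus the pairs that do exist.
QUERY = """
SELECT DISTINCT t.date, hours.hour
FROM test_table t
CROSS JOIN (SELECT 1 AS hour UNION ALL SELECT 2 UNION ALL SELECT 3) hours
EXCEPT
SELECT date, test_hour FROM test_table
ORDER BY 1, 2
"""


def missing_pairs():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_table (date TEXT, test_hour INTEGER)")
    con.executemany("INSERT INTO test_table VALUES (?, ?)", ROWS)
    return con.execute(QUERY).fetchall()


print(missing_pairs())  # → [('2000-01-03', 3)]
```

Unlike the count-based queries, this version reports exactly which hour is missing for each date.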
And a solution using an array aggregate and the array contains operator:
SELECT date
FROM test_table
GROUP BY date
HAVING NOT array_agg(test_hour) @> ARRAY(SELECT generate_series(1,3))

Monthly Counting PostgreSQL giving months

I have a table for customers like this:
cust_id | date_signed_up | location_id
-----------------------------------------
1 | 2019/01/01 | 1
2 | 2019/03/05 | 1
3 | 2019/06/17 | 1
What I need is a monthly count that includes every month, even when its count is 0. For example:
monthly_count | count
-------------------------
Jan | 1
Feb | 0
Mar | 1
Apr | 0
(The months can be numbers.)
Right now I have this query:
SELECT date_trunc('MONTH', (date_signed_up::date)) AS monthly, count(cust_id) AS count FROM customer
WHERE location_id = 1
GROUP BY monthly
ORDER BY monthly asc
but it only returns the months that have data, skipping the ones where the count is zero. How can I get every month, whether or not it has data?

You need a list of months.
How to generate Month list in PostgreSQL?
SELECT a.month , count( y.cust_id )
FROM allMonths a
LEFT JOIN yourTable y
ON a.month = date_trunc('MONTH', (date_signed_up::date))
GROUP BY a.month
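Here is a sketch of the month list plus LEFT JOIN, run through Python's sqlite3 module. SQLite has neither generate_series nor date_trunc, so a recursive CTE builds the months and strftime('%Y-%m-01', ...) truncates each sign-up date. The sample customers come from the question, with dates rewritten in ISO format. The location filter is an assumption based on the question's WHERE clause, and it sits in the ON clause on purpose. In a WHERE clause it would discard the NULL rows produced for empty months, and those months would disappear again.

```python
import sqlite3

# Customers from the question, with dates converted to ISO format for SQLite.
CUSTOMERS = [(1, "2019-01-01", 1), (2, "2019-03-05", 1), (3, "2019-06-17", 1)]

QUERY = """
WITH RECURSIVE all_months(month) AS (
    SELECT '2019-01-01'
    UNION ALL
    SELECT date(month, '+1 month') FROM all_months WHERE month < '2019-06-01'
)
SELECT m.month, count(c.cust_id) AS signups
FROM all_months m
LEFT JOIN customer c
  ON strftime('%Y-%m-01', c.date_signed_up) = m.month
 AND c.location_id = 1  -- filter in ON, not WHERE, so empty months survive
GROUP BY m.month
ORDER BY m.month
"""


def monthly_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (cust_id INTEGER, date_signed_up TEXT, location_id INTEGER)")
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", CUSTOMERS)
    return con.execute(QUERY).fetchall()


for month, signups in monthly_counts():
    print(month, signups)
```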

Count rows within a group, but also from global result set: performance issue

I have a table with log records. Each log record is represented by a status (open or closed) and a date:
CREATE TABLE logs (
id BIGSERIAL PRIMARY KEY,
status VARCHAR NOT NULL,
inserted_at DATE NOT NULL
);
I need to get a daily report with the following information:
how many log records with status = open were created,
how many log records with status = closed were created,
how many log records with status = open exist to this day, including this day.
Here's a sample report output:
day | created | closed | total_open
------------+---------+--------+------------
2017-01-01 | 2 | 0 | 2
2017-01-02 | 2 | 1 | 3
2017-01-03 | 1 | 1 | 3
2017-01-04 | 1 | 0 | 4
2017-01-05 | 1 | 0 | 5
2017-01-06 | 1 | 0 | 6
2017-01-07 | 1 | 0 | 7
2017-01-08 | 0 | 1 | 6
2017-01-09 | 0 | 0 | 6
2017-01-10 | 0 | 0 | 6
(10 rows)
I solved this in a very "dirty" way:
INSERT INTO logs (status, inserted_at) VALUES
('created', '2017-01-01'),
('created', '2017-01-01'),
('closed', '2017-01-02'),
('created', '2017-01-02'),
('created', '2017-01-02'),
('created', '2017-01-03'),
('closed', '2017-01-03'),
('created', '2017-01-04'),
('created', '2017-01-05'),
('created', '2017-01-06'),
('created', '2017-01-07'),
('closed', '2017-01-08');
SELECT days.day,
count(case when logs.inserted_at = days.day AND logs.status = 'created' then 1 end) as created,
count(case when logs.inserted_at = days.day AND logs.status = 'closed' then 1 end) as closed,
count(case when logs.inserted_at <= days.day AND logs.status = 'created' then 1 end) -
count(case when logs.inserted_at <= days.day AND logs.status = 'closed' then 1 end) as total
FROM (SELECT day::date FROM generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) day) days,
logs
GROUP BY days.day
ORDER BY days.day;
I also posted it as a gist for brevity, and I would like to improve the solution.
Right now, EXPLAIN on my query reports some ridiculously high cost numbers that I would like to reduce (I don't have any indexes yet).
What would an efficient query that produces the report above look like?

A possible solution is to use window functions:
select s.*, sum(created - closed) over (order by inserted_at) as total_open
from (select inserted_at,
count(status) filter (where status = 'created') created,
count(status) filter (where status = 'closed') closed
from (select d::date inserted_at
from generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) d) d
left join logs using (inserted_at)
group by inserted_at) s
order by inserted_at
http://rextester.com/GFRRP71067
Also, an index on (inserted_at, status) could help you a lot with this query.
Note: count(...) filter (where ...) is really just a fancy way to write count(case when ... then ... [else null] end).
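The same pattern, daily counts followed by a running sum, can be sketched with Python's sqlite3 module. This assumes SQLite 3.25 or newer, which added window functions. A recursive CTE replaces generate_series, and COUNT(CASE ...) replaces the FILTER clause, which is the equivalence the note above describes. The log rows are the ones from the question's INSERT.

```python
import sqlite3

# Log rows from the question's INSERT statement: (status, inserted_at).
LOGS = [
    ("created", "2017-01-01"), ("created", "2017-01-01"),
    ("closed", "2017-01-02"), ("created", "2017-01-02"), ("created", "2017-01-02"),
    ("created", "2017-01-03"), ("closed", "2017-01-03"),
    ("created", "2017-01-04"), ("created", "2017-01-05"),
    ("created", "2017-01-06"), ("created", "2017-01-07"),
    ("closed", "2017-01-08"),
]

QUERY = """
WITH RECURSIVE days(day) AS (
    SELECT '2017-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2017-01-10'
),
daily AS (
    -- One row per day with that day's created/closed counts (0 if none).
    SELECT d.day,
           count(CASE WHEN l.status = 'created' THEN 1 END) AS created,
           count(CASE WHEN l.status = 'closed' THEN 1 END) AS closed
    FROM days d
    LEFT JOIN logs l ON l.inserted_at = d.day
    GROUP BY d.day
)
-- Running total of (created - closed) over the days, in date order.
SELECT day, created, closed,
       sum(created - closed) OVER (ORDER BY day) AS total_open
FROM daily
ORDER BY day
"""


def daily_report():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, inserted_at TEXT NOT NULL)"
    )
    con.executemany("INSERT INTO logs (status, inserted_at) VALUES (?, ?)", LOGS)
    return con.execute(QUERY).fetchall()


for row in daily_report():
    print(row)
```

The running total is computed once over ten aggregated rows. The original query compares every log row with every day, so the work here should grow roughly with the number of log rows plus the number of days.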

Sum(Case when) resulting in multiple rows of the selection

I have a huge table of customer orders, and I want to run one query that lists orders by month for the past 13 months, per user_id. What I have now (below) works, but instead of listing one row per user_id it lists one row for each order the user has. For example, one user has 42 orders over his lifetime with us, so his user_id appears in 42 rows and each row holds only one payment. Normally I would just put this in a pivot table in Excel, but I'm over its million-row limit, so I need the query to be right, and so far I have had no success. I would like the output to look like this:
user_id | jul_12 | aug_12 |
123456 | 150.00 | 150.00 |
Not this:
user_id | jul_12 | aug_12 |
123456 | 0.00 | 150.00 |
123456 | 150.00 | 0.00 |
etc. 40 more rows
SELECT ui.user_id,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 07 THEN o.amount ELSE 0 END) jul_12,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 08 THEN o.amount ELSE 0 END) aug_12
FROM orders o JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id, o.time_stamp;

Try something like:
SELECT ui.user_id,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 07 THEN o.amount ELSE 0 END) jul_12,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 08 THEN o.amount ELSE 0 END) aug_12
FROM orders o JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id;
You were getting one row per order because you were grouping by o.time_stamp and timestamps are different for each order.
A shorter version of the query:
SELECT ui.user_id,
SUM(CASE WHEN date_trunc('month', o.time_stamp) = to_date('2012 07','YYYY MM') THEN o.amount END) jul_12,
SUM(CASE WHEN date_trunc('month', o.time_stamp) = to_date('2012 08','YYYY MM') THEN o.amount END) aug_12
FROM orders o
JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id;
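Here is a small sketch of why the extra GROUP BY column matters. It uses Python's sqlite3 module with hypothetical orders for one user. The users_info join is left out, and strftime stands in for date_part/date_trunc. Grouping by user_id and time_stamp yields one row per order, while grouping by user_id alone collapses them into one row per user.

```python
import sqlite3

# Hypothetical orders: two in July 2012 and one in August 2012 for one user.
ORDERS = [
    ("123456", "2012-07-03 10:00:00", 100.0),
    ("123456", "2012-07-20 09:30:00", 50.0),
    ("123456", "2012-08-11 14:15:00", 150.0),
]


def monthly_totals(group_by):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (user_id TEXT, time_stamp TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    # One SUM(CASE ...) column per month; only the GROUP BY list varies.
    sql = f"""
        SELECT user_id,
               SUM(CASE WHEN strftime('%Y-%m', time_stamp) = '2012-07' THEN amount ELSE 0 END) AS jul_12,
               SUM(CASE WHEN strftime('%Y-%m', time_stamp) = '2012-08' THEN amount ELSE 0 END) AS aug_12
        FROM orders
        WHERE user_id = '123456'
        GROUP BY {group_by}
    """
    return con.execute(sql).fetchall()


print(len(monthly_totals("user_id, time_stamp")))  # → 3 (one row per order)
print(monthly_totals("user_id"))  # → [('123456', 150.0, 150.0)]
```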

T-SQL Count rows with specific values (Multiple in one query)

I need some help with a T-SQL query. I want to count fields that have a particular value (e.g. > 1).
Assume I have a table like:
IGrp | Item | Value1 | Value2
#############################
A | I11 | 0.52 | 1.18
A | I12 | 1.30 | 0.54
A | I21 | 0.49 | 2.37
B | I22 | 2.16 | 1.12
B | I31 | 1.50 | 0.28
I want a result like:
IGrp | V1High | V2High
######################
A | 1 | 2
B | 2 | 1
I would expect something like this to work:
SELECT IGrp, COUNT(Value1>1) AS V1High, COUNT(Value2>1) AS V2High
FROM Tbl GROUP BY IGrp
But that isn't possible in T-SQL, since COUNT() does not accept boolean expressions.
So is the only way really to run several queries with WHERE Value > 1 and COUNT(*) and join them afterwards? Or is there a trick to get the desired result in one query?
Thanks in advance.

SELECT IGrp,
COUNT(CASE WHEN Value1 > 1 THEN 1 ELSE NULL END) AS V1High,
COUNT(CASE WHEN Value2 > 1 THEN 1 ELSE NULL END) AS V2High
FROM Tbl
GROUP BY IGrp

You can use a CASE expression:
SELECT IGrp,
SUM(CASE WHEN Value1>1 THEN 1 ELSE 0 END) AS V1High,
SUM(CASE WHEN Value2>1 THEN 1 ELSE 0 END) AS V2High
FROM Tbl GROUP BY IGrp

Using CASE WHEN will do the job for you:
SELECT IGrp,
sum(case when isnull(Value1,0)>1 then 1 else 0 end) AS V1High,
sum(case when isnull(Value2,0)>1 then 1 else 0 end) AS V2High
FROM Tbl GROUP BY IGrp

SELECT IGrp,
COUNT(CASE WHEN Value1 = 'Foo' THEN 1 ELSE NULL END) AS Tot_Foo,
COUNT(CASE WHEN Value1 = 'Blah' THEN 1 ELSE NULL END) AS Tot_Blah
FROM Tbl
GROUP BY IGrp
With minor changes, as shown above, this can also count two different values of the same field. That is very helpful for verifying values that are supposed to occur in a 1:1 ratio.

You can put a CASE ... WHEN ... expression inside each COUNT() that returns 1 when the condition holds and NULL otherwise. Since COUNT() ignores NULLs, it counts only the matching rows.
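Here is a quick check of the COUNT(CASE ...) and SUM(CASE ...) answers against the question's sample table. It uses Python's sqlite3 module as a stand-in for SQL Server. CASE inside an aggregate behaves the same way in both, but this sketch has not been run on T-SQL itself.

```python
import sqlite3

# The question's Tbl rows: (IGrp, Item, Value1, Value2).
ROWS = [
    ("A", "I11", 0.52, 1.18),
    ("A", "I12", 1.30, 0.54),
    ("A", "I21", 0.49, 2.37),
    ("B", "I22", 2.16, 1.12),
    ("B", "I31", 1.50, 0.28),
]

# COUNT skips NULLs, so a CASE without ELSE counts only the matching rows.
COUNT_QUERY = """
SELECT IGrp,
       COUNT(CASE WHEN Value1 > 1 THEN 1 END) AS V1High,
       COUNT(CASE WHEN Value2 > 1 THEN 1 END) AS V2High
FROM Tbl GROUP BY IGrp ORDER BY IGrp
"""
# SUM adds 1 for matching rows and 0 otherwise, giving the same numbers.
SUM_QUERY = """
SELECT IGrp,
       SUM(CASE WHEN Value1 > 1 THEN 1 ELSE 0 END) AS V1High,
       SUM(CASE WHEN Value2 > 1 THEN 1 ELSE 0 END) AS V2High
FROM Tbl GROUP BY IGrp ORDER BY IGrp
"""


def high_counts(query):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Tbl (IGrp TEXT, Item TEXT, Value1 REAL, Value2 REAL)")
    con.executemany("INSERT INTO Tbl VALUES (?, ?, ?, ?)", ROWS)
    return con.execute(query).fetchall()


print(high_counts(COUNT_QUERY))  # → [('A', 1, 2), ('B', 2, 1)]
print(high_counts(SUM_QUERY))    # → [('A', 1, 2), ('B', 2, 1)]
```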

In PostgreSQL, which (unlike T-SQL) has a boolean type, you can also use:
select
count(nullif(field > minvalue,false))
