Sum(Case when) resulting in multiple rows of the selection - postgresql

I have a huge table of customer orders and I want to run one query that lists order totals by month for the past 13 months, per 'user_id'. What I have now (below) works, but instead of listing one row per user_id it lists one row for each order the user_id has. For example, one user has 42 total orders over his life with us, so his user_id appears in 42 rows and each row holds only one payment. Typically I would just throw this into a pivot table in Excel, but I'm over the million-row limit, so I need the query itself to be right and have had zero success. I would like the output to look like this:
user_id | jul_12 | aug_12 |
123456 | 150.00 | 150.00 |
Not this:
user_id | jul_12 | aug_12 |
123456 | 0.00 | 150.00 |
123456 | 150.00 | 0.00 |
etc. 40 more rows
SELECT ui.user_id,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 07 THEN o.amount ELSE 0 END) jul_12,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 08 THEN o.amount ELSE 0 END) aug_12
FROM orders o JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id, o.time_stamp;

Try something like:
SELECT ui.user_id,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 07 THEN o.amount ELSE 0 END) jul_12,
SUM(CASE WHEN date_part('year', o.time_stamp) = 2012 AND date_part('month', o.time_stamp) = 08 THEN o.amount ELSE 0 END) aug_12
FROM orders o JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id;
You were getting one row per order because you were grouping by o.time_stamp and timestamps are different for each order.
A shorter version of the query:
SELECT ui.user_id,
SUM(CASE WHEN date_trunc('month', o.time_stamp) = to_date('2012 07','YYYY MM') THEN o.amount END) jul_12,
SUM(CASE WHEN date_trunc('month', o.time_stamp) = to_date('2012 08','YYYY MM') THEN o.amount END) aug_12
FROM orders o
JOIN users_info ui ON ui.user_id = o.user_id
WHERE ui.user_id = '123456'
GROUP BY ui.user_id;
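To see the effect of the GROUP BY change in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The simplified orders table and its three rows are made up for illustration, and strftime stands in for date_part because SQLite has no date_part function.

```python
import sqlite3

# In-memory database with a hypothetical, simplified orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, time_stamp TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("123456", "2012-07-03 10:15:00", 150.0),
        ("123456", "2012-08-05 09:30:00", 150.0),
        ("123456", "2012-08-20 18:00:00", 25.0),
    ],
)

# Same conditional sums; only the GROUP BY column list varies.
PIVOT = """
SELECT user_id,
       SUM(CASE WHEN strftime('%Y-%m', time_stamp) = '2012-07' THEN amount ELSE 0 END) AS jul_12,
       SUM(CASE WHEN strftime('%Y-%m', time_stamp) = '2012-08' THEN amount ELSE 0 END) AS aug_12
FROM orders
GROUP BY {group_cols}
"""


def pivot(group_cols):
    """Run the pivot query grouped by the given column list."""
    return conn.execute(PIVOT.format(group_cols=group_cols)).fetchall()


per_order = pivot("user_id, time_stamp")  # one row per distinct timestamp
per_user = pivot("user_id")               # one row per user

print(len(per_order))  # 3 rows, one for each order
print(per_user)        # a single row with both monthly totals
```

Grouping by the timestamp as well makes every order its own group, so each SUM only ever sees one amount. Dropping it lets the two August orders collapse into one total.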

Related

Select dates missing data in a range

I have a postgres table test_table that looks like this:
date | test_hour
------------+-----------
2000-01-01 | 1
2000-01-01 | 2
2000-01-01 | 3
2000-01-02 | 1
2000-01-02 | 2
2000-01-02 | 3
2000-01-02 | 4
2000-01-03 | 1
2000-01-03 | 2
I need to select all the dates which don't have test_hour = 1, 2, and 3, so it should return
date
------------
2000-01-03
Here is what I have tried:
SELECT date FROM test_table WHERE test_hour NOT IN (SELECT generate_series(1,3));
But that only returns dates that have extra hours beyond 1, 2, 3
You can use aggregation and conditional HAVING clauses, like so:
SELECT mydate
FROM mytable
GROUP BY mydate
HAVING
MAX(CASE WHEN test_hour = 1 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 2 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 3 THEN 1 ELSE 0 END) != 1
Another possibility would be to join it against the series (or another subquery containing the hours) and do a [distinct] count on the hours aggregated per date:
select date from tst
inner join (select generate_series(1,3) "hour") hours on hours.hour = tst.hour
group by tst.date
having count(distinct tst.hour) < 3;
or
select date from tst
where hour in (select generate_series(1,3))
group by date
having count(distinct tst.hour) < 3;
[You don't need the distinct if date/hour combinations in your table are unique]
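As a quick check of the count-per-date idea against the question's sample rows, here is a small sketch with Python's built-in sqlite3 module. The column name test_date is an assumption chosen for this example, and a literal IN list replaces generate_series, which SQLite does not provide by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (test_date TEXT, test_hour INTEGER)")
rows = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]
conn.executemany("INSERT INTO test_table VALUES (?, ?)", rows)

# Keep only the required hours, then flag dates that have fewer than 3 of them.
missing = conn.execute(
    """
    SELECT test_date
    FROM test_table
    WHERE test_hour IN (1, 2, 3)
    GROUP BY test_date
    HAVING COUNT(DISTINCT test_hour) < 3
    """
).fetchall()

print(missing)
```

One caveat with this form: a date that has none of the required hours is filtered out by the WHERE clause before grouping, so it never appears in the result. Moving the condition into the aggregate, as in COUNT(DISTINCT CASE WHEN test_hour IN (1, 2, 3) THEN test_hour END), would keep such dates.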
A solution using set difference, giving you exactly the rows that are missing:
(SELECT DISTINCT
date, all_hour
FROM test_table
CROSS JOIN generate_series(1,3) all_hour)
EXCEPT
(TABLE test_table)
And a solution using an array aggregate and the array contains operator:
SELECT date
FROM test_table
GROUP BY date
HAVING NOT array_agg(test_hour) @> ARRAY(SELECT generate_series(1,3))

Should I use GROUPING SETS, CUBE, or ROLLUP in Postgres

We upgraded to Postgres 10 last month, so I'm new to a few of its features.
The query below needs to display the days on which each student is taken care of, plus a total of how many students are taken care of on each weekday.
select distinct s.studentnr,(CASE When lower(cd.weekday) like lower('MONDAY')
then 1 else 0 end) as MONDAY,
(CASE When lower(cd.weekday) like lower('TUESDAY')
then 1 else 0 end) as TUESDAY,
(CASE When lower(cd.weekday) like lower('WEDNESDAY')
then 1 else 0 end) as WEDNESDAY,
(CASE When lower(cd.weekday) like lower('THURSDAY')
then 1 else 0 end) as THURSDAY,
(CASE When lower(cd.weekday) like lower('FRIDAY')
then 1 else 0 end) as FRIDAY,
scp.durationid
from student s
full join studentcarepreference scp on s.id = scp.studentid
full join careday cd on cd.studentcarepreferenceid = scp.id
join pupil per on per.id = s.personid
join studentschool ss ON ss.studentid = s.id
join duration d on d.id = sdc.durationid
AND d.id BETWEEN ss.validfrom AND ss.validuntil
where sdc.durationid = 1507
and cd.weekday is not null
order by s.studentnr
where s.studentnr and cd.weekday are both varchar type
resulting in
However I need the following data as follows.
Required result
Which approach is best to use in this kind of query?
new results after change to code
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(case lower(cd.weekday) when 'monday' then 1 end) monday
, count(case lower(cd.weekday) when 'tuesday' then 1 end) teusday
, count(case lower(cd.weekday) when 'wednesday' then 1 end) wednesday
, count(case lower(cd.weekday) when 'thursday' then 1 end) thursday
, count(case lower(cd.weekday) when 'friday' then 1 end) friday
from mydata
group by rollup ((studentnr))
order by studentnr
Nearly there I guess, but the resulting values are wrong. What would you suggest I look into to correct the results?
It looks like you want to ROLLUP yourdata using a GROUPING SET:
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(distinct case careday when 'monday' then studentnr end) monday
, count(distinct case careday when 'tuesday' then studentnr end) teusday
, count(distinct case careday when 'wednesday' then studentnr end) wednesday
, count(distinct case careday when 'thursday' then studentnr end) thursday
, count(distinct case careday when 'friday' then studentnr end) friday
, durationid
from yourdata
group by rollup ((studentnr, durationid))
Which yields the desired results:
| studentnr | monday | teusday | wednesday | thursday | friday | durationid |
|------------|--------|---------|-----------|----------|--------|------------|
| 10177 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 717208 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 722301 | 1 | 1 | 1 | 1 | 0 | 1507 |
| 3 students | 3 | 3 | 3 | 3 | 2 | (null) |
The second set of parentheses in the ROLLUP indicates that studentnr and durationid should be summarized at the same level when doing the roll up.
With just one level of summarization, there's not much difference between ROLLUP and CUBE. GROUPING SETS, however, requires a slight change to the GROUP BY clause to get the lowest desired level of detail. All three of the following GROUP BY clauses produce equivalent results:
group by rollup ((studentnr, durationid))
group by cube ((studentnr, durationid))
group by grouping sets ((),(studentnr, durationid))
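One way to read grouping sets ((),(studentnr, durationid)) is as the detail-level GROUP BY plus one grand-total row over the whole table. The sketch below spells that out with UNION ALL in Python's built-in sqlite3 module, because SQLite has no ROLLUP, CUBE, or GROUPING SETS. The yourdata table contents are an assumption, reconstructed so that the counts match the result table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourdata (studentnr TEXT, careday TEXT, durationid INTEGER)")

days = ["monday", "tuesday", "wednesday", "thursday", "friday"]
# Hypothetical rows: two students with care on all five days, one without friday.
sample = [(s, d, 1507) for s in ("10177", "717208") for d in days]
sample += [("722301", d, 1507) for d in days[:4]]
conn.executemany("INSERT INTO yourdata VALUES (?, ?, ?)", sample)

# One COUNT(DISTINCT CASE ...) column per weekday, shared by both halves.
day_counts = ", ".join(
    f"COUNT(DISTINCT CASE careday WHEN '{d}' THEN studentnr END) AS {d}" for d in days
)

query = f"""
SELECT studentnr, {day_counts}, durationid
FROM yourdata
GROUP BY studentnr, durationid          -- the (studentnr, durationid) grouping set
UNION ALL
SELECT COUNT(DISTINCT studentnr) || ' students', {day_counts}, NULL
FROM yourdata                           -- the empty grouping set ()
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

On Postgres the single GROUP BY with ROLLUP is shorter and scans the table once. The UNION ALL form is only meant to show what the extra summary row contains.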

Get Data Week Wise in SQL Server

I have a table with columns ProductId, DateofPurchase, Quantity.
I want a report that shows which week of the month each quantity belongs to.
If I pass March as the month, I can already get the total quantity for March.
But if I pass a date as the parameter, I want the output below.
Here the quantity for March, purchased on 23/03/2018, is 100:
Material Code | Week1 | Week2 | Week3 | Week4
12475 | - | - | - | 100
The logic is 1-7 first week, 8-15 second week, 16-23 third week, 24-30 fourth week
@Sasi, this can get you started. You will need to use a CTE to build a template table of the weeks in the year. Then you can inner join your table to it and pivot to group the weeks.
Let me know if you need any tweaking.
DECLARE @StartDate DATE='20180101'
DECLARE @EndDate DATE='20180901'
DECLARE @Dates TABLE(
Workdate DATE Primary Key
)
DECLARE @tbl TABLE(ProductId INT, DateofPurchase DATE, Quantity INT);
INSERT INTO @tbl
SELECT 12475, '20180623', 100
;WITH Dates AS(
SELECT Workdate=@StartDate,WorkMonth=DATENAME(MONTH,@StartDate),WorkYear=YEAR(@StartDate), WorkWeek=datename(wk, @StartDate )
UNION ALL
SELECT CurrDate=DateAdd(WEEK,1,Workdate),WorkMonth=DATENAME(MONTH,DateAdd(WEEK,1,Workdate)),YEAR(DateAdd(WEEK,1,Workdate)),datename(wk, DateAdd(WEEK,1,Workdate)) FROM Dates D WHERE Workdate<@EndDate ---AND (DATENAME(MONTH,D.Workdate))=(DATENAME(MONTH,D.Workdate))
)
SELECT *
FROM
(
SELECT
sal.ProductId,
GroupWeek='Week'+
CASE
WHEN WorkWeek BETWEEN 1 AND 7 THEN '1'
WHEN WorkWeek BETWEEN 8 AND 15 THEN '2'
WHEN WorkWeek BETWEEN 16 AND 23 THEN '3'
WHEN WorkWeek BETWEEN 24 AND 30 THEN '4'
WHEN WorkWeek BETWEEN 31 AND 37 THEN '5'
WHEN WorkWeek BETWEEN 38 AND 42 THEN '6'
END,
Quantity
FROM
Dates D
JOIN @tbl sal on
sal.DateofPurchase between D.Workdate and DateAdd(DAY,6,Workdate)
)T
PIVOT
(
SUM(Quantity) FOR GroupWeek IN (Week1, Week2, Week3, Week4, Week5, Week6, Week7, Week8, Week9, Week10, Week11, Week12, Week13, Week14, Week15, Week16, Week17, Week18, Week19, Week20, Week21, Week22, Week23, Week24, Week25, Week26, Week27, Week28, Week29, Week30, Week31, Week32, Week33, Week34, Week35, Week36, Week37, Week38, Week39, Week40, Week41, Week42, Week43, Week44, Week45, Week46, Week47, Week48, Week49, Week50, Week51, Week52
/*add as many as you need*/)
)p
--ORDER BY
--1
option (maxrecursion 0)
Sample Data :
DECLARE @Products TABLE(Id INT PRIMARY KEY,
ProductName NVARCHAR(50))
DECLARE @Orders TABLE(ProductId INT,
DateofPurchase DATETIME,
Quantity BIGINT)
INSERT INTO @Products(Id,ProductName)
VALUES(1,N'Product1'),
(2,N'Product2')
INSERT INTO @Orders( ProductId ,DateofPurchase ,Quantity)
VALUES (1,'2018-01-01',130),
(1,'2018-01-09',140),
(1,'2018-01-16',150),
(1,'2018-01-24',160),
(2,'2018-01-01',30),
(2,'2018-01-09',40),
(2,'2018-01-16',50),
(2,'2018-01-24',60)
Query :
SELECT P.Id,
P.ProductName,
Orders.MonthName,
Orders.Week1,
Orders.Week2,
Orders.Week3,
Orders.Week4
FROM @Products AS P
INNER JOIN (SELECT O.ProductId,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 1 AND 7 THEN O.Quantity ELSE 0 END)) AS Week1,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 8 AND 15 THEN O.Quantity ELSE 0 END)) AS Week2,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 16 AND 23 THEN O.Quantity ELSE 0 END)) AS Week3,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) >= 24 THEN O.Quantity ELSE 0 END)) AS Week4,
DATENAME(MONTH,O.DateofPurchase) AS MonthName
FROM @Orders AS O
GROUP BY O.ProductId,DATENAME(MONTH,O.DateofPurchase)) AS Orders ON P.Id = Orders.ProductId
Result :
-----------------------------------------------------------------------
| Id | ProductName | MonthName | Week1 | Week2 | Week3 | Week4 |
-----------------------------------------------------------------------
| 1 | Product1 | January | 130 | 140 | 150 | 160 |
| 2 | Product2 | January | 30 | 40 | 50 | 60 |
-----------------------------------------------------------------------
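The day-of-month bucketing from the question (1-7, 8-15, 16-23, and 24 onward) can also be written as a plain function, which makes the boundaries easy to test. This is only a sketch in Python. The names week_of_month and weekly_totals are made up for the example, and the order rows are Product1's rows from the sample data above.

```python
from collections import defaultdict
from datetime import date


def week_of_month(d: date) -> int:
    """Map a date to the question's buckets: 1-7, 8-15, 16-23, 24 and later."""
    if d.day <= 7:
        return 1
    if d.day <= 15:
        return 2
    if d.day <= 23:
        return 3
    return 4


def weekly_totals(orders):
    """Sum quantities per (product_id, year, month) into [Week1, Week2, Week3, Week4]."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for product_id, purchased_on, quantity in orders:
        key = (product_id, purchased_on.year, purchased_on.month)
        totals[key][week_of_month(purchased_on) - 1] += quantity
    return dict(totals)


orders = [
    (1, date(2018, 1, 1), 130),
    (1, date(2018, 1, 9), 140),
    (1, date(2018, 1, 16), 150),
    (1, date(2018, 1, 24), 160),
]
totals = weekly_totals(orders)
print(totals)
```

The sketch keys on year and month together. The SQL above groups by month name only, so the same month from different years would be added into one row unless the year is also included in the GROUP BY.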

Count the number of consecutive entries fulfilling a condition within a GROUP BY

I've got a list of users who are behind on their bills, and I want to generate an entry for each of them that says how many consecutive bills they've been behind on. So here's the table:
user | bill_date | outstanding_balance
---------------------------------------
a | 2017-03-01 | 90
a | 2016-12-01 | 60
a | 2016-09-01 | 30
b | 2017-03-01 | 50
b | 2016-12-01 | 0
b | 2016-09-01 | 40
c | 2017-03-01 | 0
c | 2016-12-01 | 0
c | 2016-09-01 | 1
And I want a query that would generate the following table:
user | consecutive_billing_periods_behind
-----------------------------------------
a | 3
b | 1
c | 0
In other words, if you've paid up at any point, I want to ignore all of the earlier entries, and only count how many billing periods you've been behind since you've been last paid up. How do I do this most simply?
If I understood the question correctly, you first need to find the last date on which each customer had paid their bill, that is, the last date their outstanding balance was 0. You can do this with this subquery:
(SELECT
user1,
bill_date AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0)
Then you need to get the last bill date and add a field to each row that flags whether the bill is outstanding. Then filter to the rows between the last clear day and the last bill date of each customer with this WHERE clause:
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Then if you put the pieces together you can have the results by this query:
SELECT
DISTINCT
user1,
sum(is_outstanding_bill)
OVER (
PARTITION BY user1 ) AS consecutive_billing_periods_behind
FROM (
SELECT
user1,
last_value(bill_date)
OVER (
PARTITION BY user1
ORDER BY bill_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_bill_date,
CASE WHEN outstanding_balance > 0
THEN 1
ELSE 0 END AS is_outstanding_bill,
bill_date,
outstanding_balance,
nvl(max(t2.no_outstanding_bill_date)
OVER (
PARTITION BY user1 ), min(bill_date)
OVER (
PARTITION BY user1 )) AS last_clear_day
FROM table1 t1
LEFT JOIN (SELECT
user1,
bill_date AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0) t2 USING (user1)
) table2
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Since we are using distinct you will not need the group by clause.
select
user,
count(case when min_balance > 0 then 1 end)
as consecutive_billing_periods_behind
from
(
select
user,
min(outstanding_balance)
over (partition by user order by bill_date desc) as min_balance
from tbl
) t
group by user
Or:
select
user,
count(*)
as consecutive_billing_periods_behind
from
(
select
user,
bill_date,
max(case when outstanding_balance = 0 then bill_date end) over
(partition by user)
as max_bill_date_with_zero_balance
from tbl
) t
where
-- If user has no outstanding_balance = 0, then
max_bill_date_with_zero_balance is null
-- Count all rows in this case.
-- Otherwise
or
-- count rows with
bill_date > max_bill_date_with_zero_balance
group by user
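The running-minimum idea can be checked against the question's table with a short sketch in Python's built-in sqlite3 module. It assumes a bundled SQLite of version 3.25 or later, which is when window functions were added. The column is named user_name here, an assumption made to sidestep the reserved word user in Postgres. The window is ordered from the most recent bill backwards, so the minimum drops to 0 at the latest paid-up bill and stays there for all earlier ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (user_name TEXT, bill_date TEXT, outstanding_balance INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("a", "2017-03-01", 90), ("a", "2016-12-01", 60), ("a", "2016-09-01", 30),
        ("b", "2017-03-01", 50), ("b", "2016-12-01", 0), ("b", "2016-09-01", 40),
        ("c", "2017-03-01", 0), ("c", "2016-12-01", 0), ("c", "2016-09-01", 1),
    ],
)

# Running minimum from the newest bill backwards: it stays positive only
# until the first zero balance is reached.
streaks = conn.execute(
    """
    SELECT user_name,
           COUNT(CASE WHEN min_balance > 0 THEN 1 END) AS consecutive_billing_periods_behind
    FROM (
        SELECT user_name,
               MIN(outstanding_balance)
                   OVER (PARTITION BY user_name ORDER BY bill_date DESC) AS min_balance
        FROM tbl
    ) AS t
    GROUP BY user_name
    ORDER BY user_name
    """
).fetchall()

print(streaks)
```

Because the count is a conditional aggregate over all rows rather than a WHERE filter, a user whose latest bill is paid still appears in the output with a count of 0.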

One table with two different tasks

I got a test from my lecturer: I have to produce one table with 3 columns: prodName, Qty, and totSalesToDate. Column Qty shows how many products were sold on the input date, and totSalesToDate shows how many were sold from the beginning of the month up to the input date. Here is the example result table:
prodName | Qty | totSalesToDate
Car | 2 | 10
Bicycle | 8 | 22
Truck | 1 | 7
Motor-cycle | 3 | 12
I have to produce this table using a stored procedure (T-SQL) with no subqueries. So far, the query I wrote is:
create procedure SalesReport @date varchar(10)
as
select p.prodName, sum(s.Qty) as Qty
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
where @date = pt.Date
group by p.prodName
union
select p.prodName, sum(s.Qty) as totSalesToDate
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
where pt.Date between '2010060' and @date and p.prodName is not null
group by p.prodName
go
But the result I get is like this:
prodName | Qty
Car | 2
Car | 10
Bicycle | 8
Bicycle | 22
Truck | 1
Truck | 7
Motor-cycle | 3
Motor-cycle | 12
Can anybody help? I've been googling around but still cannot find the answer. Thanks.
How about
create procedure SalesReport @date varchar(10)
as
select p.prodName,
SUM(CASE WHEN @date = pt.Date THEN s.Qty ELSE 0 END) as Qty,
SUM(CASE WHEN pt.Date between '2010060' and @date THEN s.Qty ELSE 0.0 END) AS totSalesToDate
from PeriodTime pt full join Sales s on pt.Time = s.Time full join Product p on s.prodID = p.prodID
group by p.prodName
go
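The two conditional sums can be tried out with a small sketch in Python's built-in sqlite3 module. The three tables are collapsed into one hypothetical sales table, and the rows are invented so that the totals for Car and Truck match the example result in the question. The month start is passed in as a parameter rather than hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (prod_name TEXT, sale_date TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Car", "2010-06-02", 5), ("Car", "2010-06-10", 3), ("Car", "2010-06-15", 2),
        ("Truck", "2010-06-01", 6), ("Truck", "2010-06-15", 1),
        ("Truck", "2010-06-20", 4),  # after the report date, so it must not count
    ],
)


def sales_report(report_date, month_start):
    """Return (prod_name, qty on report_date, total from month_start to report_date)."""
    return conn.execute(
        """
        SELECT prod_name,
               SUM(CASE WHEN sale_date = :d THEN qty ELSE 0 END) AS qty,
               SUM(CASE WHEN sale_date BETWEEN :start AND :d THEN qty ELSE 0 END)
                   AS tot_sales_to_date
        FROM sales
        GROUP BY prod_name
        ORDER BY prod_name
        """,
        {"d": report_date, "start": month_start},
    ).fetchall()


report = sales_report("2010-06-15", "2010-06-01")
print(report)
```

Each CASE decides per row whether that row contributes to a given column, so a single pass with one GROUP BY produces both figures side by side. The UNION in the question stacked them as separate rows instead.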