Postgres Count Function for monthly transactions, counting distinct only - postgresql

I am counting the size of my teams within a department. Every employee ID begins with "E" followed by a digit (e.g. "0", "1", etc.) that denotes the employee's team.
I have the following query in Postgres to count the size of the teams. The problem is that it returns many empty rows because some months are duplicated. For example, "May/2016" may appear in 3 rows, with only 1 of them containing the actual team counts.
select to_char("Date", 'Mon/YYYY') as "Date",
sum(case when l_part LIKE 'E0%%' then count end) as "ACCOUNTING",
sum(case when l_part LIKE 'E1%%' then count end) as "SW",
sum(case when l_part LIKE 'E2%%' then count end) as "SUPPORT",
sum(case when l_part LIKE 'E3%%' then count end) as "CALLCENTER",
sum(case when l_part LIKE 'E4%%' then count end) as "ADMIN",
sum(case when l_part LIKE 'E5%%' then count end) as "MARKETING",
sum(case when l_part LIKE 'E9%%' then count end) as "MANAGEMENT"
from (
select left("Type",4)as l_part, count(*),"Date" from
"Transactions" group by "Date",l_part
) p group by "Date"
order by min("Date");
I'd like to get exactly one row per Mon/YYYY, ordered by date; that would be more helpful and less confusing. Any tweaks to my attempt are appreciated.
Here is what I get, using September 2015 as an example:
DATE | ACCOUNTING | SW | SUPPORT | CALLCENTER | ADMIN | MARKETING |
Sep/15| | | | | | |
Sep/15| | | | | | |
Sep/15| 1 | 2 | 1 | 5 | 5 | 3 |

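I suspect your issue is the GROUP BY clause. Your query groups by the raw "Date", so every distinct date becomes its own row, and to_char() then gives all of a month's rows the same Mon/YYYY label. Grouping by DATE_TRUNC('month', "Date") collapses each month into a single row, and the subquery is no longer needed. I'm not sure whether you need the WHERE clause.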
SELECT
to_char(DATE_TRUNC('month',"Date"), 'Mon/YYYY') as "Date"
, SUM(CASE WHEN left("Type",4) LIKE 'E0%%' THEN 1 END) AS "ACCOUNTING"
, SUM(CASE WHEN left("Type",4) LIKE 'E1%%' THEN 1 END) AS "SW"
, SUM(CASE WHEN left("Type",4) LIKE 'E2%%' THEN 1 END) AS "SUPPORT"
, SUM(CASE WHEN left("Type",4) LIKE 'E3%%' THEN 1 END) AS "CALLCENTER"
, SUM(CASE WHEN left("Type",4) LIKE 'E4%%' THEN 1 END) AS "ADMIN"
, SUM(CASE WHEN left("Type",4) LIKE 'E5%%' THEN 1 END) AS "MARKETING"
, SUM(CASE WHEN left("Type",4) LIKE 'E9%%' THEN 1 END) AS "MANAGEMENT"
FROM "Transactions"
WHERE "Date" IS NOT NULL
GROUP BY
DATE_TRUNC('month',"Date")
ORDER BY
DATE_TRUNC('month',"Date")
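Here is a minimal sketch of why grouping by the truncated month removes the duplicates. It uses SQLite (through Python's built-in sqlite3 module) as a stand-in for Postgres so it runs without a server. The rows are hypothetical, and because SQLite has no DATE_TRUNC(), strftime('%Y-%m', ...) plays that role here.

```python
import sqlite3

# SQLite stand-in for Postgres; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Transactions" ("Type" TEXT, "Date" TEXT)')
conn.executemany(
    'INSERT INTO "Transactions" VALUES (?, ?)',
    [
        ("E0-1", "2015-09-01"),
        ("E1-7", "2015-09-01"),
        ("E1-8", "2015-09-14"),
        ("E3-2", "2015-09-30"),
        ("E0-4", "2015-10-02"),
    ],
)

# Grouping by the raw date: one row per distinct day, so September shows up 3 times.
by_day = conn.execute(
    'SELECT "Date", COUNT(*) FROM "Transactions" GROUP BY "Date" ORDER BY "Date"'
).fetchall()

# Grouping by the month (strftime stands in for DATE_TRUNC('month', ...)).
by_month = conn.execute(
    """
    SELECT strftime('%Y-%m', "Date") AS month,
           COUNT(CASE WHEN "Type" LIKE 'E0%' THEN 1 END) AS accounting,
           COUNT(CASE WHEN "Type" LIKE 'E1%' THEN 1 END) AS sw,
           COUNT(CASE WHEN "Type" LIKE 'E3%' THEN 1 END) AS callcenter
    FROM "Transactions"
    GROUP BY strftime('%Y-%m', "Date")
    ORDER BY month
    """
).fetchall()

print(by_day)    # [('2015-09-01', 2), ('2015-09-14', 1), ('2015-09-30', 1), ('2015-10-02', 1)]
print(by_month)  # [('2015-09', 1, 2, 1), ('2015-10', 1, 0, 0)]
```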
By the way, instead of SUM(), an alternative using COUNT() would be:
SELECT
to_char(DATE_TRUNC('month',"Date"), 'Mon/YYYY') as "Date"
, COUNT(CASE WHEN left("Type",4) LIKE 'E0%%' THEN 1 END) AS "ACCOUNTING"
, COUNT(CASE WHEN left("Type",4) LIKE 'E1%%' THEN 1 END) AS "SW"
, COUNT(CASE WHEN left("Type",4) LIKE 'E2%%' THEN 1 END) AS "SUPPORT"
, COUNT(CASE WHEN left("Type",4) LIKE 'E3%%' THEN 1 END) AS "CALLCENTER"
, COUNT(CASE WHEN left("Type",4) LIKE 'E4%%' THEN 1 END) AS "ADMIN"
, COUNT(CASE WHEN left("Type",4) LIKE 'E5%%' THEN 1 END) AS "MARKETING"
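, COUNT(CASE WHEN left("Type",4) LIKE 'E9%%' THEN 1 END) AS "MANAGEMENT"
FROM "Transactions"
WHERE "Date" IS NOT NULL
GROUP BY
DATE_TRUNC('month',"Date")
ORDER BY
DATE_TRUNC('month',"Date")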
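COUNT(expression) counts only non-NULL values. A CASE without an ELSE yields NULL when its condition is false, so only the matching rows are counted. Unlike SUM(), COUNT() returns 0 rather than NULL for a month with no matches.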
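Here is a small check of that SUM/COUNT difference on hypothetical data. It again uses SQLite through Python's sqlite3 as a stand-in; Postgres treats these two aggregates the same way when a group has no matching rows.

```python
import sqlite3

# SQLite stand-in; group 'b' has no E9 rows at all (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, type TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "E9x"), ("a", "E0x"), ("b", "E0x")])

rows = conn.execute(
    """
    SELECT grp,
           SUM(CASE WHEN type LIKE 'E9%' THEN 1 END)   AS sum_no_else,
           COUNT(CASE WHEN type LIKE 'E9%' THEN 1 END) AS count_no_else
    FROM t
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

# SUM over only NULLs is NULL (None); COUNT over only NULLs is 0.
print(rows)  # [('a', 1, 1), ('b', None, 0)]
```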


postgres remove blank values from grouping

I have a ROLLUP SQL statement that I'm trying to convert from Oracle to PostgreSQL. Overall the results look correct, except that I'm getting a blank value in the grouping column and I'm not sure how to get rid of it.
Right now I have:
SELECT
COALESCE(CASE WHEN GROUPING(COUNTY) = 1 THEN 'TOTAL' else county::text END) as COUNTY
,COUNT(CASE WHEN STV_TO_GAS THEN 1 END) as STOVE_TO_GAS_SUM
,COUNT(CASE WHEN FIRE_TO_GAS THEN 1 END) as FIRE_TO_GAS_SUM
,COUNT(CASE WHEN PELLET_TO_GAS THEN 1 END) as PELLET_TO_GAS_SUM
,COUNT(CASE WHEN STV_TO_ELECTRIC THEN 1 END) as STOVE_TO_ELECTRIC_SUM
,COUNT(CASE WHEN FIRE_TO_ELECTRIC THEN 1 END) as FIRE_TO_ELECTRIC_SUM
,COUNT(CASE WHEN PELLET_TO_ELECTRIC THEN 1 END) as PELLET_TO_ELECTRIC_SUM
,COUNT(CASE WHEN MERIDIAN THEN 1 END) as WITHIN_MERIDIAN_SUM
,count(CASE WHEN hb357.stv_to_gas then 1 END) +
count(CASE WHEN hb357.fire_to_gas then 1 END) +
count(CASE WHEN hb357.pellet_to_gas then 1 END) +
count(CASE WHEN hb357.stv_to_electric then 1 END) +
count(CASE WHEN hb357.fire_to_electric then 1 END) +
count(CASE WHEN hb357.pellet_to_electric then 1 END) +
count(CASE WHEN hb357.meridian then 1 END ) AS county_totals
FROM woodburn.HB357
WHERE app_status IN ('pending','approved')
AND (COUNTY IS NOT NULL OR trim(COUNTY) <> '')
GROUP BY rollup (county)
but it's still returning a blank value in the county column.
I've also tried changing the first CASE expression to
COALESCE(CASE
WHEN GROUPING(COUNTY) = 1 THEN 'TOTAL'
WHEN TRIM(COUNTY) != '' AND COUNTY IS NOT NULL then county
END) as COUNTY
but it's returning a NULL row for county.
FINAL SOLUTION
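After trying everything suggested, I essentially implemented what Jorge Campos recommended. The key change is in the WHERE clause: the original condition COUNTY IS NOT NULL OR trim(COUNTY) <> '' is true for any non-NULL county, including blank strings, so blanks slipped through. Requiring both conditions with AND filters them out: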
SELECT
COALESCE(CASE
WHEN GROUPING(COUNTY) = 1 THEN 'TOTAL'
WHEN TRIM(COUNTY) != '' then county
END) as COUNTY
,COUNT(CASE WHEN STV_TO_GAS THEN 1 END) as STOVE_TO_GAS_SUM
,COUNT(CASE WHEN FIRE_TO_GAS THEN 1 END) as FIRE_TO_GAS_SUM
,COUNT(CASE WHEN PELLET_TO_GAS THEN 1 END) as PELLET_TO_GAS_SUM
,COUNT(CASE WHEN STV_TO_ELECTRIC THEN 1 END) as STOVE_TO_ELECTRIC_SUM
,COUNT(CASE WHEN FIRE_TO_ELECTRIC THEN 1 END) as FIRE_TO_ELECTRIC_SUM
,COUNT(CASE WHEN PELLET_TO_ELECTRIC THEN 1 END) as PELLET_TO_ELECTRIC_SUM
,COUNT(CASE WHEN MERIDIAN THEN 1 END) as WITHIN_MERIDIAN_SUM
,count(CASE WHEN hb357.stv_to_gas then 1 END) +
count(CASE WHEN hb357.fire_to_gas then 1 END) +
count(CASE WHEN hb357.pellet_to_gas then 1 END) +
count(CASE WHEN hb357.stv_to_electric then 1 END) +
count(CASE WHEN hb357.fire_to_electric then 1 END) +
count(CASE WHEN hb357.pellet_to_electric then 1 END) +
count(CASE WHEN hb357.meridian then 1 END ) AS county_totals
FROM woodburn.HB357
WHERE app_status IN ('pending','approved')
AND COUNTY IS NOT NULL
AND TRIM(COUNTY) != ''
GROUP BY rollup (county)
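SQLite has no ROLLUP, so this sketch checks only the WHERE filter that made the difference. It uses SQLite through Python's sqlite3 as a stand-in for Postgres, and the rows (including a blank, a whitespace-only, and a NULL county) are hypothetical.

```python
import sqlite3

# SQLite stand-in for Postgres; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hb357 (county TEXT, app_status TEXT)")
conn.executemany(
    "INSERT INTO hb357 VALUES (?, ?)",
    [("Ada", "approved"), ("Ada", "pending"), ("Canyon", "approved"),
     ("", "approved"), ("   ", "pending"), (None, "approved")],
)

def counties(where_clause):
    """Return the distinct counties that survive the given (constant) filter."""
    sql = f"SELECT DISTINCT county FROM hb357 WHERE {where_clause} ORDER BY county"
    return [row[0] for row in conn.execute(sql)]

# Original filter: any non-NULL county passes, so blank strings slip through.
with_or = counties("county IS NOT NULL OR trim(county) <> ''")
# Final filter: both conditions must hold, so blank counties are excluded.
with_and = counties("county IS NOT NULL AND trim(county) <> ''")

print(with_or)   # ['', '   ', 'Ada', 'Canyon']
print(with_and)  # ['Ada', 'Canyon']
```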

Should I use GROUPING SETS, CUBE, or ROLLUP in Postgres

We upgraded to Postgres 10 last month, so I'm new to a few of its features.
This query needs to display the days on which each student is taken care of, plus a total of how many students are taken care of on each weekday:
select distinct s.studentnr,(CASE When lower(cd.weekday) like lower('MONDAY')
then 1 else 0 end) as MONDAY,
(CASE When lower(cd.weekday) like lower('TUESDAY')
then 1 else 0 end) as TUESDAY,
(CASE When lower(cd.weekday) like lower('WEDNESDAY')
then 1 else 0 end) as WEDNESDAY,
(CASE When lower(cd.weekday) like lower('THURSDAY')
then 1 else 0 end) as THURSDAY,
(CASE When lower(cd.weekday) like lower('FRIDAY')
then 1 else 0 end) as FRIDAY,
scp.durationid
from student s
full join studentcarepreference scp on s.id = scp.studentid
full join careday cd on cd.studentcarepreferenceid = scp.id
join pupil per on per.id = s.personid
join studentschool ss ON ss.studentid = s.id
join duration d on d.id = scp.durationid
AND d.id BETWEEN ss.validfrom AND ss.validuntil
where scp.durationid = 1507
and cd.weekday is not null
order by s.studentnr
Here, s.studentnr and cd.weekday are both of type varchar.
The query's output was posted as a screenshot, which is not included here.
However, I need the data in the following form:
Required result (screenshot not included)
Which approach is best for this kind of query?
After changing the code to the following, I get new results (screenshot not included):
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(case lower(cd.weekday) when 'monday' then 1 end) monday
, count(case lower(cd.weekday) when 'tuesday' then 1 end) tuesday
, count(case lower(cd.weekday) when 'wednesday' then 1 end) wednesday
, count(case lower(cd.weekday) when 'thursday' then 1 end) thursday
, count(case lower(cd.weekday) when 'friday' then 1 end) friday
from mydata
group by rollup ((studentnr))
order by studentnr
Nearly there, I guess; only the result values are wrong. What would you suggest I look into to correct them?
It looks like you want to ROLLUP your data with studentnr and durationid as a single grouping set, counting distinct students per weekday rather than counting rows:
select case grouping(studentnr)
when 0 then studentnr
else count(distinct studentnr)|| ' students'
end studentnr
, count(distinct case careday when 'monday' then studentnr end) monday
, count(distinct case careday when 'tuesday' then studentnr end) tuesday
, count(distinct case careday when 'wednesday' then studentnr end) wednesday
, count(distinct case careday when 'thursday' then studentnr end) thursday
, count(distinct case careday when 'friday' then studentnr end) friday
, durationid
from yourdata
group by rollup ((studentnr, durationid))
This yields the desired results:
| studentnr | monday | tuesday | wednesday | thursday | friday | durationid |
|------------|--------|---------|-----------|----------|--------|------------|
| 10177 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 717208 | 1 | 1 | 1 | 1 | 1 | 1507 |
| 722301 | 1 | 1 | 1 | 1 | 0 | 1507 |
| 3 students | 3 | 3 | 3 | 3 | 2 | (null) |
The inner set of parentheses in the ROLLUP makes (studentnr, durationid) a single composite element, so the two columns are summarized at the same level when rolling up.
With just one level of summarization there is no difference between ROLLUP and CUBE: both produce only the grouping sets (studentnr, durationid) and (), the grand total. GROUPING SETS requires listing those two sets explicitly, including the empty set (). All three of the following GROUP BY clauses produce equivalent results:
group by rollup ((studentnr, durationid))
group by cube ((studentnr, durationid))
group by grouping sets ((),(studentnr, durationid))
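SQLite, used here through Python's sqlite3 as a stand-in, supports none of ROLLUP, CUBE, or GROUPING SETS. This sketch therefore writes the two grouping sets out as a UNION ALL to show what they compute. The table yourdata and its rows are hypothetical and modeled on the result table above. One duplicate row shows that COUNT(DISTINCT ...) counts a student only once.

```python
import sqlite3

# SQLite stand-in for Postgres; yourdata and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourdata (studentnr TEXT, careday TEXT, durationid INTEGER)")
days = ["monday", "tuesday", "wednesday", "thursday", "friday"]
rows = [("10177", d, 1507) for d in days] + [("717208", d, 1507) for d in days]
rows += [("722301", d, 1507) for d in days[:4]]  # no Friday care
rows.append(("10177", "monday", 1507))            # duplicate row, counted once
conn.executemany("INSERT INTO yourdata VALUES (?, ?, ?)", rows)

# One COUNT(DISTINCT CASE ...) column per weekday.
day_cols = ", ".join(
    f"COUNT(DISTINCT CASE careday WHEN '{d}' THEN studentnr END) AS {d}" for d in days
)

# Grouping set (studentnr, durationid) UNION ALL grouping set () = the grand total.
sql = f"""
SELECT studentnr AS studentnr, {day_cols}, durationid, 0 AS is_total
FROM yourdata GROUP BY studentnr, durationid
UNION ALL
SELECT COUNT(DISTINCT studentnr) || ' students', {day_cols}, NULL, 1
FROM yourdata
ORDER BY is_total, studentnr
"""
result = [r[:-1] for r in conn.execute(sql)]  # drop the is_total helper column
for r in result:
    print(r)
```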

Postgres: Extracting the IDs and names of people who are cheating the system

I have a table A with the following transaction data:
ID Name Type
1 Albert Rewards
2 Albert Visit
3 Ruddy Rewards
4 Ruddy Visit
5 Ruddy Purchase
6 Mario Rewards
7 Mario Visit
...
I want to select only the rows for people who used both the "Rewards" and "Visit" types but didn't make a purchase, something like this:
ID Name Type
1 Albert Rewards
2 Albert Visit
6 Mario Rewards
7 Mario Visit
...
Any ideas?
The query below counts, for each name, how often Rewards, Visit, and Purchase occur. If the respective counts are 1/1/0, all records with that name are returned.
If fine-tuning is required (for example, allowing any of those counts to be greater than 1), adjust the numbers in the HAVING clause. Additional categories can be checked the same way.
select *
from mytable a
where exists (select b.name,
sum(case when b.type='Rewards' then 1 else 0 end),
sum(case when b.type='Visit' then 1 else 0 end),
sum(case when b.type='Purchase' then 1 else 0 end)
from mytable b
where b.name=a.name
group by b.name
having sum(case when b.type='Rewards' then 1 else 0 end) = 1
and
sum(case when b.type='Visit' then 1 else 0 end) = 1
and
sum(case when b.type='Purchase' then 1 else 0 end) = 0);
For completeness' sake: SQLFiddle with 2 queries. The first query also works, but a little differently.
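Here is a runnable sketch of the EXISTS/HAVING query against the sample rows from the question. It uses SQLite through Python's sqlite3 as a stand-in for Postgres. Because EXISTS ignores the subquery's select list, the sketch uses SELECT 1 there instead of the three sums.

```python
import sqlite3

# SQLite stand-in for Postgres, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "Albert", "Rewards"), (2, "Albert", "Visit"),
     (3, "Ruddy", "Rewards"), (4, "Ruddy", "Visit"), (5, "Ruddy", "Purchase"),
     (6, "Mario", "Rewards"), (7, "Mario", "Visit")],
)

rows = conn.execute(
    """
    SELECT a.*
    FROM mytable a
    WHERE EXISTS (
        SELECT 1                      -- EXISTS only checks whether a row comes back
        FROM mytable b
        WHERE b.name = a.name
        GROUP BY b.name
        HAVING SUM(CASE WHEN b.type = 'Rewards'  THEN 1 ELSE 0 END) = 1
           AND SUM(CASE WHEN b.type = 'Visit'    THEN 1 ELSE 0 END) = 1
           AND SUM(CASE WHEN b.type = 'Purchase' THEN 1 ELSE 0 END) = 0
    )
    ORDER BY a.id
    """
).fetchall()
print(rows)  # [(1, 'Albert', 'Rewards'), (2, 'Albert', 'Visit'), (6, 'Mario', 'Rewards'), (7, 'Mario', 'Visit')]
```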

How to optimize for both scenarios when a join is faster than dense_rank only when the result set is small?

When unfiltered, dense_rank on FeedDeliveryNutrients.NutrientID over 150,000 rows is 3.5x faster than joining to Nutrients with row_number on Nutrients.ID and using the joined row number. When filtered to a specific flock, joining with row_number is 9x faster.
Is there any optimization technique that could get the best of both worlds in a single query?
Fastest when unfiltered (150,000 rows returned):
select
FeedDeliveries.FlockID,
FeedDeliveryID,
DeliveryLb,
Bin,
DeliveryDate,
FormulaID,
FeedEnergy,
Nutrient1, Nutrient2, Nutrient3, Nutrient4, Nutrient5, Nutrient6, Nutrient7, Nutrient8, Nutrient9, Nutrient10, Nutrient11, Nutrient12, Nutrient13, Nutrient14, Nutrient15
from (
select
FeedDeliveryID,
sum(case when dense_rank = 1 then Amount end) as Nutrient1,
sum(case when dense_rank = 2 then Amount end) as Nutrient2,
sum(case when dense_rank = 3 then Amount end) as Nutrient3,
sum(case when dense_rank = 4 then Amount end) as Nutrient4,
sum(case when dense_rank = 5 then Amount end) as Nutrient5,
sum(case when dense_rank = 6 then Amount end) as Nutrient6,
sum(case when dense_rank = 7 then Amount end) as Nutrient7,
sum(case when dense_rank = 8 then Amount end) as Nutrient8,
sum(case when dense_rank = 9 then Amount end) as Nutrient9,
sum(case when dense_rank = 10 then Amount end) as Nutrient10,
sum(case when dense_rank = 11 then Amount end) as Nutrient11,
sum(case when dense_rank = 12 then Amount end) as Nutrient12,
sum(case when dense_rank = 13 then Amount end) as Nutrient13,
sum(case when dense_rank = 14 then Amount end) as Nutrient14,
sum(case when dense_rank = 15 then Amount end) as Nutrient15
from (select *, dense_rank() over (partition by FeedDeliveryID order by NutrientID) as dense_rank from dbo.FeedDeliveryNutrients) n
group by FeedDeliveryID
) pvt
join dbo.FeedDeliveries on FeedDeliveries.ID = FeedDeliveryID
Fastest when filtered by dbo.FeedDeliveries.FlockID (~100 rows returned):
select
FeedDeliveries.FlockID,
FeedDeliveryID,
DeliveryLb,
Bin,
DeliveryDate,
FormulaID,
FeedEnergy,
Nutrient1, Nutrient2, Nutrient3, Nutrient4, Nutrient5, Nutrient6, Nutrient7, Nutrient8, Nutrient9, Nutrient10, Nutrient11, Nutrient12, Nutrient13, Nutrient14, Nutrient15
from (
select
FeedDeliveryID,
sum(case when n.row_number = 1 then Amount end) as Nutrient1,
sum(case when n.row_number = 2 then Amount end) as Nutrient2,
sum(case when n.row_number = 3 then Amount end) as Nutrient3,
sum(case when n.row_number = 4 then Amount end) as Nutrient4,
sum(case when n.row_number = 5 then Amount end) as Nutrient5,
sum(case when n.row_number = 6 then Amount end) as Nutrient6,
sum(case when n.row_number = 7 then Amount end) as Nutrient7,
sum(case when n.row_number = 8 then Amount end) as Nutrient8,
sum(case when n.row_number = 9 then Amount end) as Nutrient9,
sum(case when n.row_number = 10 then Amount end) as Nutrient10,
sum(case when n.row_number = 11 then Amount end) as Nutrient11,
sum(case when n.row_number = 12 then Amount end) as Nutrient12,
sum(case when n.row_number = 13 then Amount end) as Nutrient13,
sum(case when n.row_number = 14 then Amount end) as Nutrient14,
sum(case when n.row_number = 15 then Amount end) as Nutrient15
from dbo.FeedDeliveryNutrients
join (select *, row_number() over (order by ID) as row_number from dbo.Nutrients) n on n.ID = NutrientID
group by FeedDeliveryID
) pvt
join dbo.FeedDeliveries on FeedDeliveries.ID = FeedDeliveryID
You already got the answer. Optimize for the most critical scenario.
At first glance I would optimize for the 150,000-row scenario, but you should investigate a bit on your actual production server. If the 150,000-row query runs only a few times a week while the 100-row query hits your server many times a minute, the latter is the more critical one for you.
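Before tuning either query for speed, it may be worth confirming that both return the same columns for your data. The dense_rank pivot numbers only the nutrients present in each delivery, while the row_number join numbers all nutrients globally. As a result, the two can diverge when a delivery is missing some nutrients. This small sketch illustrates that. It assumes SQLite via Python's sqlite3 (version 3.25 or later, for window functions) as a stand-in for SQL Server, with three hypothetical nutrients.

```python
import sqlite3

# SQLite stand-in for SQL Server; tables trimmed to the relevant columns, data hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Nutrients (ID INTEGER PRIMARY KEY);
    CREATE TABLE FeedDeliveryNutrients (FeedDeliveryID INTEGER, NutrientID INTEGER, Amount REAL);
    INSERT INTO Nutrients VALUES (10), (20), (30);
    -- Delivery 1 has every nutrient; delivery 2 is missing nutrient 10.
    INSERT INTO FeedDeliveryNutrients VALUES
        (1, 10, 1.0), (1, 20, 2.0), (1, 30, 3.0),
        (2, 20, 5.0), (2, 30, 6.0);
    """
)

# Pivot 1: dense_rank of NutrientID within each delivery.
dense = conn.execute(
    """
    SELECT FeedDeliveryID,
           SUM(CASE WHEN rnk = 1 THEN Amount END) AS Nutrient1,
           SUM(CASE WHEN rnk = 2 THEN Amount END) AS Nutrient2,
           SUM(CASE WHEN rnk = 3 THEN Amount END) AS Nutrient3
    FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY FeedDeliveryID ORDER BY NutrientID) AS rnk
          FROM FeedDeliveryNutrients) AS ranked
    GROUP BY FeedDeliveryID ORDER BY FeedDeliveryID
    """
).fetchall()

# Pivot 2: global row_number over Nutrients, joined back on ID.
joined = conn.execute(
    """
    SELECT FeedDeliveryID,
           SUM(CASE WHEN n.rn = 1 THEN Amount END) AS Nutrient1,
           SUM(CASE WHEN n.rn = 2 THEN Amount END) AS Nutrient2,
           SUM(CASE WHEN n.rn = 3 THEN Amount END) AS Nutrient3
    FROM FeedDeliveryNutrients
    JOIN (SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn FROM Nutrients) AS n
      ON n.ID = NutrientID
    GROUP BY FeedDeliveryID ORDER BY FeedDeliveryID
    """
).fetchall()

print(dense)   # delivery 2's amounts shift left into Nutrient1/Nutrient2
print(joined)  # delivery 2's amounts stay in Nutrient2/Nutrient3
```

Whether this difference matters depends on whether every real delivery carries the full nutrient list.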

T-SQL Count rows with specific values (Multiple in one query)

I need some help with a T-SQL query. I want to count fields that meet a specific condition (e.g. > 1).
Assume I have a table like this:
IGrp | Item | Value1 | Value2
#############################
A | I11 | 0.52 | 1.18
A | I12 | 1.30 | 0.54
A | I21 | 0.49 | 2.37
B | I22 | 2.16 | 1.12
B | I31 | 1.50 | 0.28
I want a result like:
IGrp | V1High | V2High
######################
A | 1 | 2
B | 2 | 1
In my mind, this should work with an expression like:
SELECT IGrp, COUNT(Value1>1) AS V1High, COUNT(Value2>1) AS V2High
FROM Tbl GROUP BY IGrp
But that's not possible in T-SQL, since COUNT() does not accept Boolean expressions.
Is the only way really to run multiple queries with WHERE Value > 1 and COUNT(*) and join them afterwards, or is there a trick to get the desired result?
Thanks in advance.
SELECT IGrp,
COUNT(CASE WHEN Value1 > 1 THEN 1 ELSE NULL END) AS V1High,
COUNT(CASE WHEN Value2 > 1 THEN 1 ELSE NULL END) AS V2High
FROM Tbl
GROUP BY IGrp
You can use a CASE expression inside SUM():
SELECT IGrp,
SUM(CASE WHEN Value1>1 THEN 1 ELSE 0 END) AS V1High,
SUM(CASE WHEN Value2>1 THEN 1 ELSE 0 END) AS V2High
FROM Tbl GROUP BY IGrp
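Both the COUNT(CASE ...) and SUM(CASE ...) answers can be checked against the sample table from the question. This sketch assumes SQLite via Python's sqlite3 as a stand-in for SQL Server; the CASE expressions are the same in both dialects.

```python
import sqlite3

# SQLite stand-in for SQL Server, loaded with the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (IGrp TEXT, Item TEXT, Value1 REAL, Value2 REAL)")
conn.executemany(
    "INSERT INTO Tbl VALUES (?, ?, ?, ?)",
    [("A", "I11", 0.52, 1.18), ("A", "I12", 1.30, 0.54), ("A", "I21", 0.49, 2.37),
     ("B", "I22", 2.16, 1.12), ("B", "I31", 1.50, 0.28)],
)

# COUNT skips the NULLs produced when the condition is false.
count_version = conn.execute(
    """SELECT IGrp,
              COUNT(CASE WHEN Value1 > 1 THEN 1 END) AS V1High,
              COUNT(CASE WHEN Value2 > 1 THEN 1 END) AS V2High
       FROM Tbl GROUP BY IGrp ORDER BY IGrp"""
).fetchall()

# SUM adds 1 for a match and 0 otherwise.
sum_version = conn.execute(
    """SELECT IGrp,
              SUM(CASE WHEN Value1 > 1 THEN 1 ELSE 0 END) AS V1High,
              SUM(CASE WHEN Value2 > 1 THEN 1 ELSE 0 END) AS V2High
       FROM Tbl GROUP BY IGrp ORDER BY IGrp"""
).fetchall()

print(count_version)  # [('A', 1, 2), ('B', 2, 1)]
print(sum_version)    # [('A', 1, 2), ('B', 2, 1)]
```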
Using CASE WHEN inside SUM() will do the job for you:
SELECT IGrp,
sum(case when isnull(Value1,0)>1 then 1 else 0 end) AS V1High,
sum(case when isnull(Value2,0)>1 then 1 else 0 end) AS V2High
FROM Tbl GROUP BY IGrp
SELECT IGrp,
COUNT(CASE WHEN Value1 = 'Foo' THEN 1 ELSE NULL END) AS Tot_Foo,
COUNT(CASE WHEN Value1 = 'Blah' THEN 1 ELSE NULL END) AS Tot_Blah
FROM Tbl
GROUP BY IGrp
The same pattern can also count two different values of the same field, with minor changes as shown above.
This is very helpful for verifying values that are supposed to exist in a 1:1 ratio.
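One way to apply this to a 1:1 check is to repeat the two counts in a HAVING clause and keep only the groups where they differ. This sketch assumes SQLite via Python's sqlite3 as a stand-in, with hypothetical 'Foo'/'Blah' rows. The expressions are repeated instead of referencing the aliases because T-SQL does not allow SELECT aliases in HAVING.

```python
import sqlite3

# SQLite stand-in; group A is balanced, group B has an extra 'Foo' (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (IGrp TEXT, Value1 TEXT)")
conn.executemany(
    "INSERT INTO Tbl VALUES (?, ?)",
    [("A", "Foo"), ("A", "Blah"), ("B", "Foo"), ("B", "Foo"), ("B", "Blah")],
)

# Keep only the groups whose Foo and Blah counts are not 1:1.
unbalanced = conn.execute(
    """SELECT IGrp,
              COUNT(CASE WHEN Value1 = 'Foo'  THEN 1 END) AS Tot_Foo,
              COUNT(CASE WHEN Value1 = 'Blah' THEN 1 END) AS Tot_Blah
       FROM Tbl
       GROUP BY IGrp
       HAVING COUNT(CASE WHEN Value1 = 'Foo'  THEN 1 END)
           <> COUNT(CASE WHEN Value1 = 'Blah' THEN 1 END)
       ORDER BY IGrp"""
).fetchall()
print(unbalanced)  # [('B', 2, 1)]
```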
You can put a CASE ... WHEN ... expression inside COUNT() that returns 1 when the condition holds and NULL otherwise; COUNT() skips the NULLs.
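In PostgreSQL, which has a real boolean type (unlike T-SQL), you can also use NULLIF to turn false into NULL so that COUNT() skips it: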
select
count(nullif(field > minvalue,false))
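SQLite, used here through Python's sqlite3 as a stand-in, evaluates comparisons to 1 or 0 rather than true/false. For that reason, NULLIF(..., 0) plays the role of NULLIF(..., false) in this sketch. It also shows why the NULLIF is needed: COUNT() of the bare comparison counts false rows too. In PostgreSQL 9.4 and later, COUNT(*) FILTER (WHERE Value1 > 1) is another option.

```python
import sqlite3

# SQLite stand-in: comparisons yield 1/0 here, where Postgres yields true/false.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (IGrp TEXT, Value1 REAL)")
conn.executemany("INSERT INTO Tbl VALUES (?, ?)",
                 [("A", 0.52), ("A", 1.30), ("A", 0.49), ("B", 2.16), ("B", 1.50)])

rows = conn.execute(
    """SELECT IGrp,
              COUNT(Value1 > 1)            AS counts_false_too,  -- 0 is not NULL
              COUNT(NULLIF(Value1 > 1, 0)) AS V1High             -- false -> NULL
       FROM Tbl GROUP BY IGrp ORDER BY IGrp"""
).fetchall()
print(rows)  # [('A', 3, 1), ('B', 2, 2)]
```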