PostgreSQL - how to sum only positive cumulative sums in total

I am trying to write a SELECT query against my PostgreSQL table containing the history of customer settlements. The query result should show the sum of amounts only for customers who are debtors (the sum of all invoice amounts for that customer is greater than zero). In the attached example (picture below), when we take the detailed settlement history from 10.06.2021, you can see that the invoice total of customer A is plus (+) 190000, so this customer's sum should be included in the total. On the other hand, the invoice total of customer B is minus (-) 266000, so this customer is not a debtor and should be skipped. I am trying to build a sum containing only the positive per-customer sums, split by customer status, as shown in the screenshot below (Expected result).
I tried query like this:
select s.*, s.active+s.inactive total from
(select to_char(date_trunc('month', debt_date),'YYYY-MM'),
greatest(sum(case when t.status = 'Active' then t.amount::numeric else 0 end),0) active,
greatest(sum(case when t.status = 'Inactive' then t.amount::numeric else 0 end),0) inactive
from customers_settlement t
group by 1) s order by 1;
but it didn't work: a manual calculation in Excel gave different results than the query. I guess something is missing, like:
over (partition by customer)
I believe professionals like you will be able to help me quickly. Thank you in advance!

I am not totally certain if that is what you want, but you could first group by month and customer, then eliminate negative results, then sum again:
SELECT m,
sum(active) AS "sum(Active)",
sum(inactive) AS "sum(Inactive)",
sum(active) + sum(inactive) AS "sum(Total)"
FROM (SELECT to_char(date_trunc('month', debt_date),'YYYY-MM') AS m,
greatest(sum(t.amount) FILTER (WHERE t.status = 'Active'), 0) AS active,
greatest(sum(t.amount) FILTER (WHERE t.status = 'Inactive'), 0) AS inactive
FROM customers_settlement AS t
GROUP BY m, customer) AS subq
GROUP BY m;
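To see why grouping by customer before clipping matters, here is a minimal pure-Python model of the answer's two-level aggregation (a sketch of the logic, not PostgreSQL itself). The rows are hypothetical; only B's -266000 balance echoes the question. It also models the question's original query, which clips only the month total with greatest(..., 0).

```python
from collections import defaultdict

# Hypothetical rows: (debt_date, customer, status, amount); illustrative values only.
rows = [
    ("2021-06-01", "A", "Active", 100000),
    ("2021-06-15", "A", "Active", 90000),
    ("2021-06-10", "B", "Active", -300000),
    ("2021-06-20", "B", "Active", 34000),
    ("2021-06-05", "C", "Inactive", 5000),
]


def debtor_totals(rows):
    """Answer's approach: sum per (month, customer), clip negatives to 0, then sum per month."""
    per_customer = defaultdict(int)
    for debt_date, customer, status, amount in rows:
        month = debt_date[:7]  # plays the role of to_char(date_trunc('month', debt_date), 'YYYY-MM')
        per_customer[(month, customer, status)] += amount
    # Assumes status is always 'Active' or 'Inactive'.
    totals = defaultdict(lambda: {"Active": 0, "Inactive": 0})
    for (month, _customer, status), balance in per_customer.items():
        totals[month][status] += max(balance, 0)  # greatest(..., 0) per customer
    return {m: {**t, "Total": t["Active"] + t["Inactive"]} for m, t in totals.items()}


def month_level_totals(rows):
    """Question's approach: sum everything per month first, then clip with greatest(..., 0)."""
    sums = defaultdict(lambda: {"Active": 0, "Inactive": 0})
    for debt_date, _customer, status, amount in rows:
        sums[debt_date[:7]][status] += amount
    return {m: {k: max(v, 0) for k, v in s.items()} for m, s in sums.items()}


print(debtor_totals(rows))
print(month_level_totals(rows))
```

With these rows the per-customer version keeps A's 190000 and drops B, while clipping the month total gives 0 for Active, because B's -266000 cancels A's 190000 before greatest() is applied.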

Related

How do I create if-then logic in a self-join condition?

I have a data set with Student names, Skills, and their scores in these skills by year.
I want a query that finds which student has had the highest growth in any skill. The period for growth can be 1-3 years (some years have missing values).
So, if there are records for 2000, 2001 and 2002 for a student and a skill, then to calculate growth for 2002 we need to compare with 2001.
If there are only records for 2000 and 2002 for a student and a skill, then to calculate growth we compare with 2000 (only because 2001 is not present).
I thought of doing a self-join to create a basis for comparing scores. I tried to put the growth-period logic into this join condition but got stuck.
SELECT q1.STUDENT, q1.SKILL, q1.YEAR, q2.YEAR, q1.SCORE, q2.SCORE
FROM Table q1
INNER JOIN Table q2
ON q1.STUDENT = q2.STUDENT AND q1.SKILL = q2.SKILL AND ...
-- This is where I get stuck
(q1.YEAR = q2.YEAR - 1) -- Case 1
(q1.YEAR <> q2.YEAR - 1) AND (q1.YEAR = q2.YEAR - 2) -- Case 2
(q1.YEAR <> q2.YEAR - 1) AND (q1.YEAR <> q2.YEAR - 2) AND (q1.YEAR = q2.YEAR - 3) -- Case 3
I understand that these cases are effectively being unioned together right now? How do I make them behave like IF/ELSE logic?
Sample Data:
Should they be three different queries unioned together instead?
You can use the lag window function to calculate the growth instead of a self-join.
with t as
(
select
student, skill,
case when year-lag(year) over w <= 3 then score-lag(score, 1) over w end as growth
from _table
window w as (partition by student, skill order by year)
)
select distinct on (skill) student, skill, growth
from t
order by skill, growth desc nulls last;
In the t CTE, growth will be NULL for the first year of every (student, skill) group of records, and also whenever the gap to the previous year is more than 3 years.
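A minimal pure-Python model of what the lag() query computes, assuming rows are (student, skill, year, score) tuples; the sample data and the function names are hypothetical. It sorts and groups rows the way window w partitions and orders them, then mimics DISTINCT ON (skill) with growth DESC NULLS LAST, which still returns one row per skill even when every growth in that skill is NULL.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (student, skill, year, score).
rows = [
    ("Ann", "math", 2000, 50), ("Ann", "math", 2001, 55), ("Ann", "math", 2002, 70),
    ("Bob", "math", 2000, 40), ("Bob", "math", 2003, 75),
    ("Bob", "art", 1995, 10), ("Bob", "art", 2000, 90),
]


def growths(rows, max_gap=3):
    """Mimic lag() over (partition by student, skill order by year)."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    for (student, skill), group in groupby(ordered, key=itemgetter(0, 1)):
        prev = None  # lag() is NULL on the first row of each partition
        for _student, _skill, year, score in group:
            growth = None
            if prev is not None and year - prev[0] <= max_gap:
                growth = score - prev[1]
            out.append((student, skill, year, growth))
            prev = (year, score)
    return out


def best_per_skill(rows):
    """Mimic DISTINCT ON (skill) ... ORDER BY skill, growth DESC NULLS LAST."""
    best = {}
    for student, skill, _year, growth in growths(rows):
        current = best.get(skill)
        if current is None or (growth is not None and (current[1] is None or growth > current[1])):
            best[skill] = (student, growth)
    return best


print(best_per_skill(rows))
```

Bob's 5-year gap in art yields NULL growth, just like the CASE in the t CTE, so art only reports NULL; in math, Bob's 35-point jump over a 3-year gap beats Ann's 15.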

PostgreSQL - Unexpected division by zero using SUM

This query (minimal reproducible example):
WITH t as (
SELECT 3 id, 2 price, 0 amount
)
SELECT
CASE WHEN amount > 0 THEN
SUM(price / amount)
ELSE
price
END u_price
FROM t
GROUP BY id, price, amount
on PostgreSQL 9.4 throws
division by zero
Without the SUM it works.
How is this possible?
I liked this question, so I turned to the PostgreSQL documentation for help:
The planner is guilty:
A CASE cannot prevent evaluation of an aggregate expression contained
within it, because aggregate expressions are computed before other
expressions in a SELECT list or HAVING clause are considered
More details at https://www.postgresql.org/docs/10/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
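A hypothetical pure-Python model of the evaluation order the quoted documentation describes: the aggregate is computed for the whole group before the enclosing CASE is considered, so only a guard placed inside the aggregate can prevent the error. Python's // stands in for integer division; it matches PostgreSQL only for the non-negative values used here.

```python
# Hypothetical model: (id, price, amount) rows of one GROUP BY group, same values as the CTE.
rows = [(3, 2, 0)]


def case_outside_aggregate(group):
    # SUM(price / amount) is evaluated for the group first ...
    total = sum(price // amount for _id, price, amount in group)
    # ... and only then does the CASE look at amount, too late to help.
    _id, price, amount = group[0]
    return total if amount > 0 else price


def case_inside_aggregate(group):
    # The guard runs per input row, before any division happens.
    return sum(price // (1 if amount == 0 else amount) for _id, price, amount in group)


def run(query, group):
    try:
        return query(group)
    except ZeroDivisionError:
        return "division by zero"


print(run(case_outside_aggregate, rows))
print(run(case_inside_aggregate, rows))
```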
I cannot figure the "why" part out, but here is a workaround...
WITH t as (
SELECT 3 id, 2 price, 0 amount
)
SELECT SUM(price / case when amount = 0 then 1 else amount end) u_cena
FROM t
GROUP BY id, price, amount
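Or you can avoid the CASE entirely (using the same t CTE as above). Use abs(sign(amount)) as the exponent: it is 0 only when amount = 0, giving power(0, 0) = 1, and 1 otherwise, giving the amount itself; a plain sign(amount) would turn a negative amount into 1/amount. Note that power() returns double precision, so this is no longer integer division:
SELECT SUM(price / power(amount, abs(sign(amount)))) u_cena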
FROM t
GROUP BY id, price, amount
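A quick pure-Python check of the power() trick, with ** and / standing in for power() and the floating-point division it leads to (Python's 0 ** 0 is 1, which is the same property the trick relies on in PostgreSQL). The helper names are hypothetical. It shows that the plain sign() exponent inverts negative amounts, while abs(sign()) only differs from a normal division when amount is 0.

```python
def sign(x):
    """Same as SQL sign() for integers: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def via_sign(price, amount):
    # price / power(amount, sign(amount)): for a negative amount the exponent is -1
    return price / amount ** sign(amount)


def via_abs_sign(price, amount):
    # price / power(amount, abs(sign(amount))): the exponent is 0 only when amount = 0
    return price / amount ** abs(sign(amount))


for amount in (0, 4, -2):
    print(amount, via_sign(2, amount), via_abs_sign(2, amount))
```

For amount = -2 the sign() version yields -4.0 (that is, price * amount), while the abs(sign()) version yields the expected -1.0.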

SQL to select users into groups based on group percentage

To keep this simple, let's say I have a table with 100 records that include:
userId
pointsEarned
I would like to group these 100 records (or whatever the total is based on other criteria) into several groups as follows:
Group 1, 15% of total records
Group 2, 25% of total records
Group 3, 10% of total records
Group 4, 10% of total records
Group 5, 40% (the remainder of the total records; the exact percentage doesn't really matter)
In addition to the above, there will be a minimum of 3 groups and a maximum of 5 groups, with varying percentages that always total 100%. If it makes it easier, the last group will always be the remainder not picked by the other groups.
I'd like the results to be as follows:
groupNbr
userId
pointsEarned
To do this sort of breakup, you need a way to rank the records so that you can decide which group each one belongs to. If you do not want to randomise the group allocation, and userId is a contiguous number, then using userId would be sufficient. However, you probably can't guarantee that, so you need to create some sort of ranking and then use it to split your data into groups. Here is a simple example.
Declare @Total int
Set @Total = (Select COUNT(*) from dataTable)
-- cumulative cut-offs: 15%, 15+25 = 40%, 40+10 = 50%, 50+10 = 60%, then the remainder
Select case
when ranking <= 0.15 * @Total then 1
when ranking <= 0.4 * @Total then 2
when ranking <= 0.5 * @Total then 3
when ranking <= 0.6 * @Total then 4
else 5 end as groupNbr,
userId,
pointsEarned
-- ROW_NUMBER() gives every row a unique rank from 1 to @Total
FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
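A pure-Python sketch of the same cumulative-threshold idea, assuming the percentages arrive as a hypothetical list such as [15, 25, 10, 10] and the group after the last listed one takes the remainder. Sorting by userId stands in for ROW_NUMBER() OVER (ORDER BY userId), and the comparison is kept in integers (ranking * 100 <= cumulative_pct * total) to avoid floating-point surprises at the boundaries.

```python
from collections import Counter


def assign_groups(user_ids, percentages):
    """Return (groupNbr, userId) pairs; the group after the listed ones is the remainder."""
    total = len(user_ids)
    cumulative, running = [], 0
    for pct in percentages:  # e.g. [15, 25, 10, 10] -> cut-offs at 15, 40, 50, 60 percent
        running += pct
        cumulative.append(running)
    result = []
    for ranking, user_id in enumerate(sorted(user_ids), start=1):  # like ROW_NUMBER()
        group_nbr = len(cumulative) + 1  # default: the remainder group
        for nbr, cut in enumerate(cumulative, start=1):
            if ranking * 100 <= cut * total:  # integer form of ranking <= cut% * total
                group_nbr = nbr
                break
        result.append((group_nbr, user_id))
    return result


groups = assign_groups(list(range(1, 101)), [15, 25, 10, 10])
print(Counter(nbr for nbr, _ in groups))
```

With 100 users the groups come out as 15, 25, 10, 10 and 40 rows; with 20 users the same percentages give 3, 5, 2, 2 and 8.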
If you need to randomise which group data end up in, then you need to allocate a random number to each row first, and then rank them by that random number and then split as above.
If you need to make the splits more flexible, you could design a split table that has columns like minPercentage, maxPercentage, groupNbr, fill it with the splits and do something like this
Declare @Total int
Set @Total = (Select COUNT(*) from dataTable)
Select S.groupNbr,
B.userId,
B.pointsEarned
-- multiply by 100.0 first: ranking / @Total alone is integer division and truncates to 0
FROM (Select ranking * 100.0 / @Total as rankPercent, userId, pointsEarned
FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
) B
-- (min, max] ranges, so a row exactly on a boundary matches only one split
inner join splitTable S on B.rankPercent > S.minPercentage and B.rankPercent <= S.maxPercentage
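A pure-Python sketch of the split-table lookup, using a hypothetical split table of (groupNbr, minPercentage, maxPercentage) rows with half-open ranges (min, max], so each boundary value lands in exactly one group. Fraction keeps rankPercent exact, the Python counterpart of avoiding integer division in the SQL.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical split table: (groupNbr, minPercentage, maxPercentage); ranges are (min, max].
SPLIT_TABLE = [(1, 0, 15), (2, 15, 40), (3, 40, 50), (4, 50, 60), (5, 60, 100)]


def assign_with_split_table(user_ids, split_table=SPLIT_TABLE):
    total = len(user_ids)
    result = []
    for ranking, user_id in enumerate(sorted(user_ids), start=1):  # like ROW_NUMBER()
        rank_percent = Fraction(ranking * 100, total)  # exact percentage of this rank
        for group_nbr, min_pct, max_pct in split_table:  # the inner join condition
            if min_pct < rank_percent <= max_pct:
                result.append((group_nbr, user_id))
    return result


groups = assign_with_split_table(list(range(1, 21)))
print(Counter(nbr for nbr, _ in groups))
```

Changing the percentages then only means updating the rows of the split table, not the query.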

Can this group + sum query be written better so it can perform faster?

This is the data:
me friend game status count
1 2 gem done 10
2 1 gem done 5
1 3 gem done 4
3 1 gem done 6
This is my query:
WITH
-- outgoing for all
outgoing_for_all AS
(SELECT
me,
sum(count) AS sum
FROM game_totals
WHERE status IN ('pending', 'done')
GROUP BY me),
-- incoming for all
incoming_for_all AS
(SELECT
friend,
sum(count) AS sum
FROM game_totals
WHERE status IN ('pending', 'done')
GROUP BY friend)
SELECT
me,
outgoing_for_all.sum AS outgoing,
incoming_for_all.sum AS incoming,
outgoing_for_all.sum - incoming_for_all.sum AS score
FROM outgoing_for_all
FULL OUTER JOIN incoming_for_all ON outgoing_for_all.me = incoming_for_all.friend
This is the result:
me outgoing incoming score
1 14 11 3
2 5 10 -5
3 6 4 2
Can the query above be written so it will perform faster?
I think there might be possibility to do the summing with just one SELECT. The problem is, I don't know how to GROUP BY properly so I can sum count from two rows into one.
Thank you.
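Yes, you can get both sums in one query using window functions. Keep in mind that window functions return one row per input row rather than one row per player, and incoming is partitioned by friend, so on each row it is the total for that row's friend:
SELECT
me,
friend,
sum(count) over(partition by me) AS outgoing,
sum(count) over(partition by friend) AS incoming
FROM game_totals
WHERE status IN ('pending', 'done')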
The problem is, I don't know how to GROUP BY properly so I can sum count from two rows into one.
You guessed right: because you want a single row (a count value) to be counted twice (once for me's outgoing and once for friend's incoming), you'll need to double all of your rows. These doubled rows also need to be grouped by a different column. The traditional approach is usually something with UNION:
SELECT me,
SUM(count) FILTER (WHERE mul = 1) outgoing,
SUM(count) FILTER (WHERE mul = -1) incoming,
SUM(mul * count) score
FROM (
SELECT me, 1 mul, count
FROM game_totals
WHERE status IN ('pending', 'done')
UNION ALL
SELECT friend, -1, count
FROM game_totals
WHERE status IN ('pending', 'done')
) t
GROUP BY me;
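To confirm that the UNION ALL version reproduces the question's result table, here is a sketch that runs it through Python's built-in sqlite3 module on an in-memory table holding the question's four sample rows (SQLite as a stand-in, not PostgreSQL). FILTER (WHERE ...) is rewritten as SUM(CASE ... END), which is equivalent because SUM ignores the NULLs produced for non-matching rows and does not depend on the SQLite version supporting FILTER.

```python
import sqlite3

# In-memory SQLite stand-in for game_totals, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_totals (me INTEGER, friend INTEGER, game TEXT, status TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO game_totals VALUES (?, ?, ?, ?, ?)",
    [(1, 2, "gem", "done", 10), (2, 1, "gem", "done", 5),
     (1, 3, "gem", "done", 4), (3, 1, "gem", "done", 6)],
)

# UNION ALL doubles every row: once keyed by me (mul = 1), once keyed by friend (mul = -1).
query = """
SELECT me,
       SUM(CASE WHEN mul = 1 THEN count END)  AS outgoing,
       SUM(CASE WHEN mul = -1 THEN count END) AS incoming,
       SUM(mul * count)                       AS score
FROM (
    SELECT me, 1 AS mul, count FROM game_totals WHERE status IN ('pending', 'done')
    UNION ALL
    SELECT friend, -1, count FROM game_totals WHERE status IN ('pending', 'done')
) AS t
GROUP BY me
ORDER BY me
"""
rows = conn.execute(query).fetchall()
print(rows)
```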
Or, because we know exactly that each row must be counted twice, you can use a CROSS JOIN too:
SELECT CASE mul WHEN 1 THEN me ELSE friend END me,
SUM(count) FILTER (WHERE mul = 1) outgoing,
SUM(count) FILTER (WHERE mul = -1) incoming,
SUM(mul * count) score
FROM game_totals, (VALUES (1), (-1)) m(mul)
WHERE status IN ('pending', 'done')
GROUP BY CASE mul WHEN 1 THEN me ELSE friend END
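The CROSS JOIN variant can be checked the same way with Python's sqlite3 module as a stand-in. SQLite does not accept the VALUES (1), (-1) m(mul) column-alias form, so this sketch builds the two multipliers with a small UNION ALL subquery, and the output column is renamed to player to avoid any clash with the me column.

```python
import sqlite3

# In-memory SQLite stand-in for game_totals with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_totals (me INTEGER, friend INTEGER, game TEXT, status TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO game_totals VALUES (?, ?, ?, ?, ?)",
    [(1, 2, "gem", "done", 10), (2, 1, "gem", "done", 5),
     (1, 3, "gem", "done", 4), (3, 1, "gem", "done", 6)],
)

# Every row meets both multipliers: mul = 1 keys it by me, mul = -1 keys it by friend.
query = """
SELECT CASE mul WHEN 1 THEN me ELSE friend END AS player,
       SUM(CASE WHEN mul = 1 THEN count END)  AS outgoing,
       SUM(CASE WHEN mul = -1 THEN count END) AS incoming,
       SUM(mul * count)                       AS score
FROM game_totals
CROSS JOIN (SELECT 1 AS mul UNION ALL SELECT -1) AS m
WHERE status IN ('pending', 'done')
GROUP BY CASE mul WHEN 1 THEN me ELSE friend END
ORDER BY player
"""
rows = conn.execute(query).fetchall()
print(rows)
```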
BUT: these are only more readable; they are actually slower than your variant. (I'm afraid simple window functions won't help you here either.) I think you've already found the fastest solution. However, you should consider using indexes (more of them, or different ones, if you are already using some). For example, this index could help you a lot:
CREATE INDEX idx_game_totals_me_friend_count
ON game_totals(me, friend, count)
WHERE status IN ('pending', 'done');
http://rextester.com/NGAHW3672

SSRS 2005 column chart: show series label missing when data count is zero

I have a pretty simple chart with a likely common issue. I've searched the web for several hours but have had only limited success finding a similar situation.
The data I'm pulling contains created_by, person_id and a risk score.
The risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a count even when it is 0 for that risk score, but SSRS 2005 suppresses zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
For example, I'm missing values for "1 LOW" because the creator has not assigned any "1 LOW" risk scores.
Here's a screenshot of what I get, but I'd like to have a column even for a score that doesn't exist in the returned results.
@Nathan
Example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have =Count(Fields!person_id.Value) for my data points, and the series is grouped on =Fields!score.Value.
If there were a bunch of A, B and D grades but no C or F grades, how would I show labels for these potentially non-existent counts?
In your example, the problem is that no results are returned for grades that are not linked to any students. Ideally, there would be a table in your source system listing all the possible values of "score" (e.g. A - F), and you would join it into your query so that at least one row is returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
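To check that the LEFT JOINs really produce zero counts for missing grades, here is a sketch that runs the answer's query in Python's built-in sqlite3 module (a stand-in, not SQL Server) against hypothetical sample rows. COUNT(student.person_id) in SQL stands in for the chart's =Count(Fields!person_id.Value); counting a column skips NULLs, which is why the unmatched LEFT JOIN rows come out as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE professor (professor_id INTEGER, name TEXT);
CREATE TABLE student (person_id INTEGER, professor_id INTEGER);
CREATE TABLE grades (person_id INTEGER, score TEXT);
INSERT INTO professor VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO student VALUES (10, 1), (11, 1), (12, 2);
-- Hypothetical grades: professor 1's students only have an A and a B.
INSERT INTO grades VALUES (10, 'A'), (11, 'B'), (12, 'C');
""")

# The answer's query, with the chart's Count(person_id) done in SQL for the check.
query = """
SELECT professorgrades.score, COUNT(student.person_id) AS headcount
FROM (
    SELECT professor_id, score
    FROM professor
    CROSS JOIN (
        SELECT 'A' AS score UNION SELECT 'B' UNION SELECT 'C'
        UNION SELECT 'D' UNION SELECT 'E' UNION SELECT 'F'
    ) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id
    AND professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
GROUP BY professorgrades.score
ORDER BY professorgrades.score
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Every grade from A to F shows up for professor 1, with 0 for the grades none of that professor's students received.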
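Alternatively, if you have a RiskScores table listing every possible risk score, union one zero-count row per risk score with one row per person, then sum:
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount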
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
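A runnable check of the UNION ALL trick, using Python's sqlite3 module as a stand-in for SQL Server. The RiskScores and People table names come from the answer; their sample rows and the PersonId column are hypothetical. Each risk score contributes a row worth 0, so every score appears once even when nobody has it, while the rows from People supply the real counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RiskScores (RiskScoreId INTEGER, Description TEXT);
CREATE TABLE People (PersonId INTEGER, RiskScoreId INTEGER);
INSERT INTO RiskScores VALUES
  (1, 'VERY LOW'), (2, 'LOW'), (3, 'MODERATE STABLE'),
  (4, 'MODERATE AT RISK'), (5, 'HIGH'), (6, 'VERY HIGH');
-- Hypothetical people: two LOW and one HIGH, nobody in the other scores.
INSERT INTO People VALUES (100, 2), (101, 2), (102, 5);
""")

query = """
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
    SELECT RiskScoreId, 1 AS RiskCount FROM People      -- one row per person
    UNION ALL
    SELECT RiskScoreId, 0 AS RiskCount FROM RiskScores  -- one zero row per score
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```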