How does one avoid Join placing a constraint on aggregate function? - postgresql

[using postgres]
So the background to my scenario is:
I'm trying to get the average expenses by age where the user is active
cost = 40
Only 2 of the 33 year olds have purchased something
I have 34 active members who are 33 years old (whether or not they made a payment is irrelevant to this count)
With this in mind, money spent per age = 40 / 34 = 1.18
What I am getting right now is 40 / 2 = 20
I understand that the result is constrained by the two users who made a purchase
So where did I get all of this?
select date_part('year', age(birthday)) as age,
avg(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33 and user_state = 'active'
group by age
Unfortunately, when using an aggregate function (in this example avg()), it seems avg() is constrained to the result of the inner join. I tried a left join to keep access to all users, but it didn't seem to work: I still got the undesired result of 20. Is there a way to avoid this? In other words, can I make the avg() call specific to my person table rather than to the result of the join?
If it matters, this is how I am retrieving total sum.
select sum(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33
and
user_state = 'active'
= 40
The obvious approach is to count the people I want and then compute the sum separately, but I'm trying to avoid going from one query to another.

avg skips nulls, so use a left join and coalesce the resulting null costs to zero:
select
date_part('year', age(birthday)) as age,
avg(coalesce(cost,0))
from
person
left join
payment on ...
left join
product on ...
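To see the difference in numbers, here is a minimal sketch using Python's built-in sqlite3 module rather than Postgres. The schema is a simplified assumption (age is stored as a plain column and cost sits directly on payment, with no product table); the data mirrors the question: 34 active 33-year-olds, 2 of whom spent 20 each.

```python
import sqlite3

# Hypothetical, simplified schema: age is a plain column and cost lives on payment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, age INTEGER, user_state TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, person_id INTEGER, cost REAL);
""")
# 34 active 33-year-olds; only persons 1 and 2 bought something (20 each, 40 in total).
conn.executemany("INSERT INTO person VALUES (?, 33, 'active')",
                 [(i,) for i in range(1, 35)])
conn.executemany("INSERT INTO payment (person_id, cost) VALUES (?, ?)",
                 [(1, 20.0), (2, 20.0)])

# Inner join: only the two purchasers survive, so avg() divides by 2.
inner_avg = conn.execute("""
    SELECT avg(cost)
    FROM person
    INNER JOIN payment ON person.person_id = payment.person_id
    WHERE age = 33 AND user_state = 'active'
""").fetchone()[0]

# Left join + coalesce: non-purchasers contribute a 0, so avg() divides by 34.
left_avg = conn.execute("""
    SELECT avg(coalesce(cost, 0))
    FROM person
    LEFT JOIN payment ON person.person_id = payment.person_id
    WHERE age = 33 AND user_state = 'active'
""").fetchone()[0]

print(inner_avg)           # average over the 2 purchasers only
print(round(left_avg, 2))  # average over all 34 members
```

One caveat: a person with several payments produces several joined rows, so avg() then divides by rows rather than by people. In that case sum(cost) / count(distinct person.person_id) is probably closer to what is wanted.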

Related

Should I be using distinct / distinct on to remove duplicates caused by a 1:many join?

Revision to my previous question as I continue to try different solutions. select distinct on is getting me very close to my intended output but I'm not quite able to make it fully work without adding to the GROUP BY statement. I'm now wondering if I should be focusing on making select distinct on work rather than trying to improve the original table join. Other questions that I have read state that this is more of a 'band-aid' approach. Is there a best practice that I should be following? Original question below:
I am having trouble finding the best way to join a 1:many table without increasing my output with duplicates. I have tried using select distinct on v.visit_id, which gets me close, but only if I change the GROUP BY statement, which will mess up my desired output. My end goal is to calculate both how long the patient/visit was in the OR and how long the surgeon was scheduled for their block from the tables below:
Table 1 (visit)
visit_id
123
321
Table 2 (pat_phy_relation_table)
patphys_pat_num  patphys_rel_type  patphys_phy_num
123              ATTENDING         1306
321              ATTENDING         1306
Table 3 (physician_table1)
phys1_num  phys1_name
1306       Dr X
Table 4 (multi_app_documentation) (OR times)
nsma1_patnum  nsma1_code  nsma1_ans
123           ORINTIME    1037
123           OROUT       1352
321           ORINTIME    0723
321           OROUT       0952
Table 5 (ews_location_table2) (block times)
esla1_loca  esla1_date  esla1_bt_beg           esla1_bt_end           esla1_bt_surg
OR3         2021-09-02  {'07:00:00',,,,,,,,,}  {'17:00:00',,,,,,,,,}  {001306,,,,,,,,,}
OR3         2021-09-16  {'07:00:00',,,,,,,,,}  {'17:00:00',,,,,,,,,}  {001306,,,,,,,,,}
OR3         2021-09-30  {'07:00:00',,,,,,,,,}  {'17:00:00',,,,,,,,,}  {001306,,,,,,,,,}
Expected Results
total_visits  or_hours_utilized  total_block_hours  surgeon
2             9:31:00            30:00:00           Dr X
Actual Results
total_visits  or_hours_utilized  total_block_hours  surgeon
6             28:33:00           60:00:00           Dr X
My assumption is that since I am using an inner join for table 5, my results are being duplicated by the # of returned rows. However, I'm not aware of another way to join this table as all of my other joins are 1:1. This is the only 1:many relationship. I just can't seem to think of a solution as table 5 has no related columns to the visit table.
I'm currently looking into subqueries, but I'm not familiar enough with them to know if I can handle the table 5 calculations in one and just pass back the results to the main query.
I've tried to strip out information that is irrelevant to the question, but let me know if I can slim down anything else. Query below:
select
count(v.visit_id) as total_visits,
sum(mad2.nsma1_ans::time - mad.nsma1_ans::time) as or_hours_utilized,
sum(esla1_bt_end[1] - esla1_bt_beg[1]) as total_block_hours,
pt1.phys1_name as surgeon
from visit as v
inner join pat_phy_relation_table as pprt
on pprt.patphys_pat_num = v.visit_id
inner join physician_table1 as pt1
on pt1.phys1_num = pprt.patphys_phy_num
inner join ews_location_table2 elt2
on lpad(pt1.phys1_num::varchar, 6, '0') = any (elt2.esla1_bt_surg)
and esla1_loca in ('OR1','OR2','OR3','OR4')
and esla1_date between '2021-09-01' and '2021-09-30'
inner join multi_app_documentation mad2
on mad2.nsma1_patnum = v.visit_id
and mad2.nsma1_code = 'OROUT' --only pulling visits/physicians with an OROUT
inner join multi_app_documentation mad
on mad.nsma1_patnum = v.visit_id
and mad.nsma1_code = 'ORINTIME' --only pulling visits/physicians with an ORINTIME
where v.visit_admit_date = '2021-09-01'
group by pt1.phys1_name
"I'm not aware of another way to join this table as all of my other joins are 1:1. This is the only 1:many relationship"
Not quite: both your pat_phy_relation_table and your ews_location_table2 have 1:many relations to the surgeons (physician_table1). So really it's a many:many relationship between patients and OR blocks. That's not what you want, so you can't use a JOIN there. Instead, do two independent LATERAL subqueries for each surgeon:
select
a.total_visits,
a.or_hours_utilized,
pt1.phys1_name as surgeon,
b.total_block_hours
from
physician_table1 as pt1,
lateral (
select
count(v.visit_id) as total_visits,
sum(mad2.nsma1_ans::time - mad.nsma1_ans::time) as or_hours_utilized
from visit as v
inner join pat_phy_relation_table as pprt
on pprt.patphys_pat_num = v.visit_id
inner join multi_app_documentation mad2
on mad2.nsma1_patnum = v.visit_id
and mad2.nsma1_code = 'OROUT' -- only pulling visits/physicians with an OROUT
inner join multi_app_documentation mad
on mad.nsma1_patnum = v.visit_id
and mad.nsma1_code = 'ORINTIME' -- only pulling visits/physicians with an ORINTIME
where
pt1.phys1_num = pprt.patphys_phy_num -- joining against the particular physician
and v.visit_admit_date = '2021-09-01'
) as a,
lateral (
select
sum(esla1_bt_end[1] - esla1_bt_beg[1]) as total_block_hours
from
ews_location_table2 elt2
where
lpad(pt1.phys1_num::varchar, 6, '0') = any (elt2.esla1_bt_surg) -- joining against the particular physician
and esla1_loca in ('OR1','OR2','OR3','OR4')
and esla1_date between '2021-09-01' and '2021-09-30'
) as b;
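The fan-out and its fix can be reproduced on a toy dataset. This is only a sketch under simplifying assumptions: it runs on SQLite through Python's sqlite3 module, and since SQLite has no LATERAL it swaps in correlated scalar subqueries, which play the same role here (each aggregate is computed per surgeon, independently of the other table). The tables surgeons, visits and blocks and the precomputed or_minutes and block_hours columns are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified tables; durations are precomputed for brevity.
    CREATE TABLE surgeons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits   (visit_id INTEGER PRIMARY KEY, surgeon_id INTEGER, or_minutes INTEGER);
    CREATE TABLE blocks   (block_date TEXT, surgeon_id INTEGER, block_hours INTEGER);
    INSERT INTO surgeons VALUES (1306, 'Dr X');
    INSERT INTO visits VALUES (123, 1306, 195), (321, 1306, 149);
    INSERT INTO blocks VALUES ('2021-09-02', 1306, 10),
                              ('2021-09-16', 1306, 10),
                              ('2021-09-30', 1306, 10);
""")

# Joining both 1:many tables to the surgeon pairs every visit with every block
# (2 visits x 3 blocks = 6 rows), so every aggregate is inflated.
fanned_out = conn.execute("""
    SELECT count(v.visit_id), sum(v.or_minutes), sum(b.block_hours), s.name
    FROM surgeons s
    JOIN visits v ON v.surgeon_id = s.id
    JOIN blocks b ON b.surgeon_id = s.id
    GROUP BY s.name
""").fetchone()

# Aggregating each table separately per surgeon avoids the cross product.
per_surgeon = conn.execute("""
    SELECT (SELECT count(*)         FROM visits v WHERE v.surgeon_id = s.id),
           (SELECT sum(or_minutes)  FROM visits v WHERE v.surgeon_id = s.id),
           (SELECT sum(block_hours) FROM blocks b WHERE b.surgeon_id = s.id),
           s.name
    FROM surgeons s
""").fetchone()

print(fanned_out)   # inflated counts and sums
print(per_surgeon)  # one visit count, one OR total, one block total per surgeon
```

In Postgres the LATERAL form is usually preferable once a subquery has to return more than one column, as in the answer's first subquery.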

How can I make the denominator a constant for each of the numbers in the same row in SQL?

I am trying to create a table with the average amount of sales divided by a cohort of users that signed up for an account in a certain month. However, I can only figure out how to divide by the number of people that made a purchase in that specific month, which is lower than the total size of the cohort. How do I change the query below so that each Avg_Successful_Transacted amount is divided by the cohort 0 user count for each month?
thank you.
select sum (t.amount_in_dollars)/ count (distinct u.id) as Avg_Successful_Transacted, (datediff(month,[u.created:month],[t.createdon:month])) as Cohort, [u.created:month] as Months,
count (distinct u.id) as Users
from [transaction_cache as t]
left join [user_cache as u] on t.owner = u.id
where t.type = 'savings' and t.status = 'successful' and [u.created:year] > ['2017-01-01':date:year]
group by cohort, months
order by Cohort, Months
You will need to break out the cohort sizing into its own subquery or CTE in order to calculate the total number of distinct users who were created during the month which matches the cohort's basis month.
I approached this by bucketing users by the month they were created using the date_trunc('Month', <date>) function, but you may wish to approach it differently based on the specific business logic that generates your cohorts.
I don't work with Periscope, so the example query below is structured for pure Redshift, but hopefully it is easy to translate the syntax into Periscope's expected format:
WITH cohort_sizes AS (
SELECT date_trunc('Month', created)::DATE AS cohort_month
, COUNT(DISTINCT(id)) AS cohort_size
FROM user_cache u
GROUP BY 1
),
cohort_transactions AS (
SELECT date_trunc('Month', u.created)::DATE AS cohort_month
, t.createdon
, t.owner
, t.type
, t.status
, t.amount_in_dollars
, u.id
, u.created
FROM transaction_cache t
LEFT JOIN user_cache u ON t.owner = u.id
WHERE t.type = 'savings'
AND t.status = 'successful'
AND u.created > '2017-01-01'
)
SELECT SUM(t.amount_in_dollars) / s.cohort_size AS Avg_Successful_Transacted
, (datediff(MONTH, u.created, t.createdon)) AS Cohort
, t.cohort_month AS Months
, count(DISTINCT u.id) AS Users
FROM cohort_transactions t
JOIN cohort_sizes s ON t.cohort_month = s.cohort_month
LEFT JOIN user_cache AS u ON t.owner = u.id
GROUP BY s.cohort_size, Cohort, Months
ORDER BY Cohort, Months
;
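As a rough illustration of why the separate cohort-size CTE matters, here is a small sketch that runs on SQLite via Python's sqlite3 module instead of Redshift. The table and column names (users, transactions, amount) and the sample rows are made up for the example, and strftime replaces date_trunc and datediff, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for user_cache and transaction_cache.
    CREATE TABLE users        (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE transactions (owner INTEGER, createdon TEXT, amount REAL);
    -- A cohort of 4 users created in January 2018.
    INSERT INTO users VALUES (1, '2018-01-05'), (2, '2018-01-10'),
                             (3, '2018-01-20'), (4, '2018-01-25');
    -- Only users 1 and 2 ever transact.
    INSERT INTO transactions VALUES (1, '2018-01-15', 100.0),
                                    (1, '2018-02-03', 60.0),
                                    (2, '2018-02-10', 20.0);
""")

rows = conn.execute("""
    WITH cohort_sizes AS (
        -- Size of each signup-month cohort, counted from users alone.
        SELECT strftime('%Y-%m', created) AS cohort_month,
               count(DISTINCT id)         AS cohort_size
        FROM users
        GROUP BY strftime('%Y-%m', created)
    )
    SELECT s.cohort_month,
           (CAST(strftime('%Y', t.createdon) AS INTEGER)
              - CAST(strftime('%Y', u.created) AS INTEGER)) * 12
           + CAST(strftime('%m', t.createdon) AS INTEGER)
           - CAST(strftime('%m', u.created) AS INTEGER) AS month_offset,
           sum(t.amount) * 1.0 / s.cohort_size          AS avg_per_cohort_member,
           count(DISTINCT u.id)                         AS purchasers
    FROM transactions t
    JOIN users u        ON u.id = t.owner
    JOIN cohort_sizes s ON s.cohort_month = strftime('%Y-%m', u.created)
    GROUP BY s.cohort_month, s.cohort_size, month_offset
    ORDER BY s.cohort_month, month_offset
""").fetchall()

for row in rows:
    print(row)  # (cohort month, months since signup, amount / cohort size, purchasers)
```

The denominator stays at 4 in both rows even though only 1 and then 2 users transacted, which is the constant-denominator behaviour the question asks for.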

SSRS 2005 column chart: show series label missing when data count is zero

I have a pretty simple chart with a likely common issue. I've searched for several hours on the interweb but only get so far in finding a similar situation.
The basics of what I'm pulling are a created_by, a person_id and a risk score.
The risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a risk count even if there is a count of 0 for that risk score but SSRS 2005 likes to suppress zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
Ex: I'm missing values for "1 LOW" as the creator does not have any "1 LOW" they've assigned risk scores for.
Here's a screenshot of what I get, but I'd like to have a column for a count even when it doesn't exist in the returned results.
@Nathan
Example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have a =Count(Fields!person_id.Value) for my data points & series is grouped on =Fields!score.Value
If there were a bunch of A, B and D grades but no C's or F's, how would I show labels for the potentially non-existent counts?
In your example, the problem is that no results are returned for grades that are not linked to any students. To solve this ideally there would be a table in your source system which listed all the possible values of "score" (e.g. A - F) and you would join this into your query such that at least one row was returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
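Both answers come down to making sure the dataset contains a row for every category before the chart sees it. A minimal sketch of the lookup-table variant, run on SQLite through Python's sqlite3 module: the RiskScores and People tables follow the naming of the query above, but their exact columns (PersonId in particular) and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed lookup table listing every possible risk score.
    CREATE TABLE RiskScores (RiskScoreId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE People     (PersonId INTEGER PRIMARY KEY, RiskScoreId INTEGER);
    INSERT INTO RiskScores VALUES
        (1, '1 VERY LOW'), (2, '2 LOW'), (3, '3 MODERATE STABLE'),
        (4, '4 MODERATE AT RISK'), (5, '5 HIGH'), (6, '6 VERY HIGH');
    -- Sample data: nobody has scores 1, 3, 4 or 6.
    INSERT INTO People VALUES (1, 2), (2, 2), (3, 5);
""")

# Start from the lookup table and LEFT JOIN the people, so that every score
# yields a row; count(P.PersonId) ignores the NULLs and returns 0 for empty scores.
risk_counts = conn.execute("""
    SELECT RS.Description, count(P.PersonId) AS RiskCount
    FROM RiskScores RS
    LEFT JOIN People P ON P.RiskScoreId = RS.RiskScoreId
    GROUP BY RS.RiskScoreId, RS.Description
    ORDER BY RS.RiskScoreId
""").fetchall()

for description, risk_count in risk_counts:
    print(description, risk_count)
```

With a dataset shaped like this, the chart can plot RiskCount directly instead of counting person_id values, so the zero rows are present in the data rather than missing from it.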

Using fields from select query in where clause in subqueries

I have a list of people and there are 4 types that can occur as well as 5 resolutions for each type. I'm trying to write a single query so that I can pull each type/resolution combination for each person but am running into problems. This is what I have so far:
SELECT person,
TypeRes1 = (SELECT COUNT(*) FROM table1 where table1.status = 45)
JOIN personTbl ON personTbl.personid = table1.personid
WHERE person LIKE 'A0%'
GROUP BY person
I have adjusted column names to make it more...generic, but basically the person table has several hundred people in it and I just want A01 through A09, so the like statement is the easiest way to do this. The problem is that my results end up being something like this:
Person TypeRes1
A06 48
A04 48
A07 48
A08 48
A05 48
Which is incorrect. I can't figure out how to get the column count correct for each person. I tried doing something like:
SELECT person as p,
TypeRes1= (SELECT COUNT(*) FROM table1
JOIN personTbl ON personTbl.personid = table1.personid
WHERE table1.status = 45 AND personTbl.person = p)
FROM table1
JOIN personTbl ON personTbl.personid = table1.personid
WHERE personTbl.person LIKE 'A0%'
GROUP BY personTbl.person
But that gives me the error: Invalid Column name 'p'. Is it possible to pass p into the subquery or is there another way to do it?
EDIT: There are 19 different statuses as well, so there will be 19 different TypeRes, for brevity I just put the one as if I can find the one, I think I can do the rest on my own.
Maybe something like this:
SELECT
person,
(
SELECT
COUNT(*)
FROM
table1
WHERE
table1.status = 45
AND personTbl.personid = table1.personid
) AS TypeRes1
FROM
personTbl
WHERE person LIKE 'A0%'
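A quick way to check the correlated subquery is to run it against a few rows. The sketch below uses SQLite via Python's sqlite3 module rather than SQL Server, with made-up sample data; status 12 is a hypothetical second status. It also shows conditional aggregation as an alternative, which may scale more comfortably to 19 status columns than 19 subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personTbl (personid INTEGER PRIMARY KEY, person TEXT);
    CREATE TABLE table1    (personid INTEGER, status INTEGER);
    INSERT INTO personTbl VALUES (1, 'A01'), (2, 'A02'), (3, 'B01');
    -- Made-up rows; status 12 is a hypothetical second status.
    INSERT INTO table1 VALUES (1, 45), (1, 45), (1, 12), (2, 12), (3, 45);
""")

# Correlated subquery: the inner query refers to the outer personTbl row,
# so the count is computed per person instead of once for the whole table.
correlated = conn.execute("""
    SELECT person,
           (SELECT count(*)
            FROM table1
            WHERE table1.status = 45
              AND table1.personid = personTbl.personid) AS TypeRes1
    FROM personTbl
    WHERE person LIKE 'A0%'
    ORDER BY person
""").fetchall()

# Conditional aggregation: one pass over the join, one CASE per status column.
pivoted = conn.execute("""
    SELECT p.person,
           sum(CASE WHEN t.status = 45 THEN 1 ELSE 0 END) AS TypeRes1,
           sum(CASE WHEN t.status = 12 THEN 1 ELSE 0 END) AS TypeRes2
    FROM personTbl p
    LEFT JOIN table1 t ON t.personid = p.personid
    WHERE p.person LIKE 'A0%'
    GROUP BY p.person
    ORDER BY p.person
""").fetchall()

print(correlated)
print(pivoted)
```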

Joining Against Derived Table

I'm not sure of the terminology here, so let me give an example. I have this query:
SELECT * FROM Events
--------------------
Id Name StartPeriodId EndPeriodId
1 MyEvent 34 32
In here, the PeriodIds specify how long the event lasts for, think of it as weeks of the year specified in another table if that helps. Notice that the EndPeriodId is not necessarily sequentially after the StartPeriodId. So I could do this:
SELECT * FROM Periods WHERE Id = 34
-----------------------------------
Id StartDate EndDate
34 2009-06-01 2009-08-01
Please do not dwell on this structure, as it's only an example and not how it actually works. What I need to do is come up with this result set:
Id Name PeriodId
1 MyEvent 34
1 MyEvent 33
1 MyEvent 32
In other words, I need to select an event row for each period in which the event exists. I can calculate the Period information (32, 33, 34) easily, but my problem lies in pulling it out in a single query.
This is in SQL Server 2008.
I may be mistaken, and I can't test it right now because there's no SQL Server available right now, but wouldn't that simply be:
SELECT Events.Id, Events.Name, Periods.Id AS PeriodId
FROM Periods
INNER JOIN Events
ON Periods.ID BETWEEN Events.StartPeriodId AND Events.EndPeriodId
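One thing to watch with the BETWEEN join: for the sample event, StartPeriodId (34) is greater than EndPeriodId (32), and BETWEEN 34 AND 32 matches nothing. A possible workaround, sketched here on SQLite through Python's sqlite3 module, is to order the two bounds with CASE expressions, which should also be valid on SQL Server 2008. It assumes, as the question suggests, that the periods of an event are exactly the Ids lying between the two bounds; the Periods table is reduced to its Id column for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Events  (Id INTEGER PRIMARY KEY, Name TEXT,
                          StartPeriodId INTEGER, EndPeriodId INTEGER);
    CREATE TABLE Periods (Id INTEGER PRIMARY KEY);  -- date columns omitted
    INSERT INTO Events  VALUES (1, 'MyEvent', 34, 32);
    INSERT INTO Periods VALUES (31), (32), (33), (34), (35);
""")

# Plain BETWEEN with the bounds as stored: 34..32 is an empty range.
naive = conn.execute("""
    SELECT e.Id, e.Name, p.Id AS PeriodId
    FROM Periods p
    INNER JOIN Events e ON p.Id BETWEEN e.StartPeriodId AND e.EndPeriodId
""").fetchall()

# Put the smaller Id first and the larger one second, whichever column holds it.
ordered = conn.execute("""
    SELECT e.Id, e.Name, p.Id AS PeriodId
    FROM Periods p
    INNER JOIN Events e
        ON p.Id BETWEEN
               CASE WHEN e.StartPeriodId <= e.EndPeriodId
                    THEN e.StartPeriodId ELSE e.EndPeriodId END
           AND CASE WHEN e.StartPeriodId <= e.EndPeriodId
                    THEN e.EndPeriodId ELSE e.StartPeriodId END
    ORDER BY p.Id DESC
""").fetchall()

print(naive)    # no rows for the sample event
print(ordered)  # one row per period from 34 down to 32
```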
I'm assuming that you want a listing of all periods that fall between the dates for the periods specified by start/end period id's.
With CTE_PeriodDate (ID, MaxDate, MinDate)
as (
Select Id, Max(Dates) MaxDate, MinDate=Min(Dates) from (
Select e.ID, StartDate as Dates from Events e
Inner join Periods P on P.ID=StartPeriodID
Union All
Select e.ID, EndDate from Events e
Inner join Periods P on P.ID=StartPeriodID
Union All
Select e.ID, StartDate from Events e
Inner join Periods P on P.ID=EndPeriodID
Union All
Select e.ID, EndDate from Events e
Inner join Periods P on P.ID=EndPeriodID ) as A
group by ID)
Select E.Name, P.ID from CTE_PeriodDate CTE
Inner Join Periods p on
(P.StartDate>=MinDate and P.StartDate<=MaxDate)
and (p.EndDate<=MaxDate and P.EndDate>=MinDate)
Inner Join Events E on E.ID=CTE.ID
It's not the best way to do this, but it does work.
It gets the min and max dates across the periods specified on the event.
Using these two dates, it joins with the periods table on the periods that fall inside the range between the two.
Kris