How to count people in complex age/gender/etc groups? - tsql

I have the following Patients table:
HospitalId INT,
GenderId BIT,
Age TINYINT,
DiseaseId SMALLINT
GenderId = 0 is Male
GenderId = 1 is Female
HospitalA has the HospitalId 0
HospitalB has the HospitalId 1
Here's the output I want to produce:
DiseaseId | HospitalA_Male_18-30 | HospitalA_Male_31-40 |
---------------------------------------------------------
0 | (count here) | (count here) |
1 | (count here) | (count here) |
2 | (count here) | (count here) |
3 | (count here) | (count here) |
(columns continued)
HospitalA_Female_18-30 | HospitalA_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continued)
HospitalB_Male_18-30 | HospitalB_Male_31-40 |
---------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continued)
HospitalB_Female_18-30 | HospitalB_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(9 columns in the result set)
So, for each disease, I need to count how many patients have the disease in each specific group (by hospital, by gender and by age category).
How can such grouping be done (most efficiently) in T-SQL?

You can do it with a PIVOT query:
select * from
(
select diseaseid,
'Hospital'
+ case hospitalid when 0 then 'A' when 1 then 'B' end
+ '_'
+ case genderid when 1 then 'Female' else 'Male' end
+ '_'
+ case when age between 18 and 30
then '18-30'
else (case when age between 31 and 40 then '31-40' end)
end Title,
1 Cnt
from Patients
where age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
UPDATE
As a development of the above solution, you could also move the name parts from the CASE expressions to their own virtual tables and join the Patients table to them:
;with
hospital (hospitalid, hospitalname) as (
select 0, 'HospitalA' union all
select 1, 'HospitalB'
),
gender (genderid, gendername) as (
select 0, 'Male' union all
select 1, 'Female'
),
agerange (agefrom, ageto) as (
select 18, 30 union all
select 31, 40
)
select * from
(
select p.diseaseid,
h.hospitalname + '_' + g.gendername + '_'
+ rtrim(a.agefrom) + '-' + rtrim(a.ageto) as Title,
1 Cnt
from Patients p
inner join hospital h on p.hospitalid = h.hospitalid
inner join gender g on p.genderid = g.genderid
inner join agerange a on p.age between a.agefrom and a.ageto
where p.age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
The overhead of adding the subselects and joins is made up for by greater ease of maintenance:
the (meta)data part is separated from the logic part;
the name part lists are more convenient to expand as necessary;
the concatenation expression is easier to modify in case you need to change the format of the target column names.

Please try this
SELECT
DiseaseId,
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Male_18-30],
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 31 AND 40) THEN 1 ELSE 0 END) AS [HospitalA_Male_31-40],
SUM(CASE WHEN HospitalId = 0 AND GenderId=1 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Female_18-30],
......
FROM Patients
GROUP BY DiseaseId
ORDER BY DiseaseId
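Writing out all eight SUM(CASE ...) columns by hand gets tedious, so here is a rough sketch of generating them from lists of name parts. It is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server (SQLite happens to accept the same [bracketed] column aliases), and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample rows: (HospitalId, GenderId, Age, DiseaseId).
rows = [
    (0, 0, 25, 0),
    (0, 0, 35, 0),
    (0, 1, 19, 0),
    (1, 1, 40, 1),
    (1, 0, 30, 1),
    (1, 0, 52, 1),  # outside both age ranges, so it is counted nowhere
]

hospitals = [(0, "HospitalA"), (1, "HospitalB")]
genders = [(0, "Male"), (1, "Female")]
age_ranges = [(18, 30), (31, 40)]

# Build one SUM(CASE ...) column per hospital/gender/age-range combination.
columns = []
for hid, hname in hospitals:
    for gid, gname in genders:
        for lo, hi in age_ranges:
            columns.append(
                f"SUM(CASE WHEN HospitalId = {hid} AND GenderId = {gid} "
                f"AND Age BETWEEN {lo} AND {hi} THEN 1 ELSE 0 END) "
                f"AS [{hname}_{gname}_{lo}-{hi}]"
            )

query = (
    "SELECT DiseaseId,\n  " + ",\n  ".join(columns)
    + "\nFROM Patients\nGROUP BY DiseaseId\nORDER BY DiseaseId"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Patients (HospitalId INT, GenderId INT, Age INT, DiseaseId INT)"
)
conn.executemany("INSERT INTO Patients VALUES (?, ?, ?, ?)", rows)

# Each result row is (DiseaseId, then the eight counts in generation order).
result = conn.execute(query).fetchall()
```

The generated text in query can be printed and pasted into a T-SQL script; the table is scanned once regardless of how many columns are generated.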

Related

Database-only running total with conditions

There are a ton of questions about calculating running totals with Postgres but I am struggling to do something slightly different.
I have a table that looks like this:
txn_id | amount
--------+---------
String | Integer
Amounts can be either positive or negative.
I am trying to return a table that looks like this:
txn_id | amount | running_total | overage_total
--------+---------+---------------+---------------
String | Integer | Integer | Integer
Here running_total is the running sum of the amount column as long as that sum stays above zero, and overage_total is the running sum of the amounts by which it would have dropped below zero.
An example input would be:
txn_id | amount
--------+--------
a | 1
b | 2
c | -4
d | 2
e | -1
I have been using a window function for the running sum but it's not quite what we need.
The correct table would return
txn_id | amount | running_total | overage_total
--------+--------+---------------+---------------
a | 1 | 1 | 0
b | 2 | 3 | 0
c | -4 | 0 | 1
d | 2 | 2 | 1
e | -1 | 1 | 1
Currently I am doing this in code, but it would be really incredible to do it in the database if that's possible.
The pattern here is a running total with a cap. It can be achieved with a recursive CTE:
WITH RECURSIVE cte_r AS (
SELECT t.*, ROW_NUMBER() OVER(ORDER BY t.txn_id) AS rn FROM tab t
), cte AS (
SELECT rn,
txn_id,
amount,
CASE WHEN amount <= 0 THEN 0 ELSE amount END AS total,
CASE WHEN amount <= 0 THEN 1 ELSE 0 END AS overage_total
FROM cte_r
WHERE rn = 1
UNION ALL
SELECT cte_r.rn,
cte_r.txn_id,
cte_r.amount,
CASE WHEN cte.total + cte_r.amount <= 0 THEN 0
ELSE cte.total + cte_r.amount
END AS total,
cte.overage_total + CASE WHEN cte.total + cte_r.amount <= 0
THEN 1 ELSE 0 END AS overage_total
FROM cte
JOIN cte_r
ON cte.rn = cte_r.rn-1
)
SELECT txn_id, amount, total,overage_total
FROM cte
ORDER BY rn;
Output:
+---------+---------+--------+---------------+
| txn_id | amount | total | overage_total |
+---------+---------+--------+---------------+
| a | 1 | 1 | 0 |
| b | 2 | 3 | 0 |
| c | -4 | 0 | 1 |
| d | 2 | 2 | 1 |
| e | -1 | 1 | 1 |
| f | 2 | 3 | 1 |
| h | -4 | 0 | 2 |
+---------+---------+--------+---------------+
Related: Conditional SUM on Oracle and 7. Capping a running total
An option is to use a function to step through the rows and do calculations:
CREATE FUNCTION runningTotalWithCondition() RETURNS TABLE(txn_id char(1), amount int, running_total integer, overage_total integer) AS
$$
DECLARE
running_total integer := 0;
overage_total integer := 0;
-- Columns are qualified with the table name so they cannot be confused with the function's output columns.
c CURSOR FOR SELECT t.* FROM t ORDER BY t.txn_id ASC;
BEGIN
FOR recordvar IN c LOOP
IF (running_total + recordvar.amount) > 0 THEN
running_total := running_total + recordvar.amount;
ELSE
-- The total would drop to zero or below: record the shortfall and reset the total.
overage_total := overage_total + abs(running_total + recordvar.amount);
running_total := 0;
END IF;
RETURN QUERY SELECT recordvar.txn_id, recordvar.amount, running_total, overage_total;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Calling the function:
SELECT * FROM runningTotalWithCondition();
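For checking either database solution against the expected table, a small reference implementation outside the database can help. The sketch below is my own illustration in Python, not part of either answer; the name capped_running_totals is made up. It follows the function's interpretation, where overage_total accumulates the size of the shortfall. Note that the recursive CTE above instead adds 1 for each capped row; the two agree on the sample data from the question but would differ for larger shortfalls.

```python
def capped_running_totals(rows):
    """Yield (txn_id, amount, running_total, overage_total) for each row.

    running_total never drops below zero; whatever would have pushed it
    below zero is accumulated in overage_total instead.
    """
    running_total = 0
    overage_total = 0
    for txn_id, amount in rows:
        candidate = running_total + amount
        if candidate > 0:
            running_total = candidate
        else:
            # Record the shortfall (zero when the sum lands exactly on 0).
            overage_total += -candidate
            running_total = 0
        yield txn_id, amount, running_total, overage_total


# Sample data from the question, already ordered by txn_id.
sample = [("a", 1), ("b", 2), ("c", -4), ("d", 2), ("e", -1)]
result = list(capped_running_totals(sample))
```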

Get different LIMIT on each group on postgresql rank

To get 2 rows from each group I can use ROW_NUMBER() with the condition <= 2 at the end. But what if I want a different limit for each group, e.g. 3 rows for section_id 1, 1 row for section_id 2 and 1 row for section_id 3?
Given the following table:
db=# SELECT * FROM xxx;
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | D
5 | 2 | E
6 | 2 | F
7 | 3 | G
8 | 2 | H
(8 rows)
I get the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
5 | 2 | E
6 | 2 | F
7 | 3 | G
(5 rows)
Current Query:
SELECT
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
t.*
FROM
xxx t) x
WHERE
x.r <= 2;
Create a table to hold the per-section limits, then join to it. The big advantage is that when new sections are added or limits change, maintenance is reduced to a single table update, and it comes at very little cost. For example:
select s.section_id, s.name
from (select section_id, name
, row_number() over (partition by section_id order by name) rn
from sections
) s
left join section_limits sl on (sl.section_id = s.section_id)
where
s.rn <= coalesce(sl.limit_to,2);
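The query above assumes a section_limits table that is not shown. Here is a minimal sketch of what the whole setup might look like, run against an in-memory SQLite database from Python purely for illustration (window functions need SQLite 3.25 or later). The table xxx is the one from the question; the section_limits layout and its limit values are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xxx (id INTEGER PRIMARY KEY, section_id INTEGER, name TEXT);
INSERT INTO xxx VALUES
  (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 1, 'D'),
  (5, 2, 'E'), (6, 2, 'F'), (7, 3, 'G'), (8, 2, 'H');

-- Hypothetical limits table; section 3 has no row and falls back to 2.
CREATE TABLE section_limits (section_id INTEGER PRIMARY KEY, limit_to INTEGER);
INSERT INTO section_limits VALUES (1, 3), (2, 1);
""")

query = """
SELECT s.id, s.section_id, s.name
FROM (
    SELECT id, section_id, name,
           ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS rn
    FROM xxx
) s
LEFT JOIN section_limits sl ON sl.section_id = s.section_id
WHERE s.rn <= COALESCE(sl.limit_to, 2)
ORDER BY s.section_id, s.name
"""
result = conn.execute(query).fetchall()
```

The LEFT JOIN plus COALESCE is what gives sections without an entry a default limit; with an INNER JOIN such sections would disappear from the result.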
Just fix up your where clause:
with numbered as (
select row_number() over (partition by section_id
order by name) as r,
t.*
from xxx t
)
select *
from numbered
where (section_id = 1 and r <= 3)
or (section_id = 2 and r <= 1)
or (section_id = 3 and r <= 1);

Need to have subquery within subquery

I have a stock table which holds for example
Partnumber | Depot | flag_redundant
------------+-------+----------------
1 | 1 | 5
1 | 2 | 0
1 | 3 | 0
1 | 4 | 5
2 | 1 | 0
2 | 2 | 0
2 | 3 | 0
2 | 4 | 0
I need to see the depots in which a part has not been flagged as redundant, but only for parts where flag_redundant has been set at least once; parts that have never been flagged should be ignored.
Any help appreciated!
I'm thinking of something along the lines of ....
SELECT stock.part, stock.depot,
OrderCount = (SELECT CASE WHEN Stock.flag_redundant = 5 THEN 1 end as Countcolumn FROM stock C)
FROM stock
Desired output:
Partnumber | MissingDepots
------------+---------------
1 | Yes
You can group by partnumber and set the conditions in the HAVING clause:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having
sum(flag_redundant) > 0 and
sum(case when flag_redundant = 0 then 1 end) > 0
Or:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having sum(case when flag_redundant = 0 then 1 end) between 1 and count(*) - 1
See the demo.
Results:
partnumber | missingdepots
------------+---------------
1 | Yes
Assuming you want the part numbers that have rows with both flag_redundant = 5 and flag_redundant = 0:
SELECT
partnumber,
'Yes' AS missing
FROM (
SELECT
partnumber,
COUNT(flag_redundant) FILTER (WHERE flag_redundant = 5) AS cnt_redundant, -- 2
COUNT(*) AS cnt -- 3
FROM
stock
GROUP BY partnumber -- 1
) s
WHERE cnt_redundant > 0 -- 4
AND cnt_redundant < cnt -- 5
1. Group by partnumber
2. Count all records with flag_redundant = 5
3. Count all records
4. Find all partnumbers that contain any element with 5 ...
5. ... and which have more records than 5-element records
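The answers above return one row per part. Since the question also asks to see the depots themselves, here is a rough sketch of both variants on the sample data, run against in-memory SQLite from Python for illustration only. The first query is the same conditional aggregation in HAVING; the second is a different technique (an EXISTS subquery) that I am adding as one possible way to list the unflagged depots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (partnumber INTEGER, depot INTEGER, flag_redundant INTEGER);
INSERT INTO stock VALUES
  (1, 1, 5), (1, 2, 0), (1, 3, 0), (1, 4, 5),
  (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0);
""")

# Parts that have at least one flagged row and at least one unflagged row.
parts = conn.execute("""
SELECT partnumber
FROM stock
GROUP BY partnumber
HAVING SUM(CASE WHEN flag_redundant = 5 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN flag_redundant = 0 THEN 1 ELSE 0 END) > 0
""").fetchall()

# Unflagged depots of parts that are flagged in some other depot.
depots = conn.execute("""
SELECT s.partnumber, s.depot
FROM stock s
WHERE s.flag_redundant = 0
  AND EXISTS (SELECT 1 FROM stock f
              WHERE f.partnumber = s.partnumber
                AND f.flag_redundant = 5)
ORDER BY s.partnumber, s.depot
""").fetchall()
```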

Group rows into two types depending on a value in column

I have a table:
------------------------------------------
Uid | mount | category
-----------------------------------------
1 | 10 | a
1 | 3 | b
3 | 7 | a
4 | 1 | b
4 | 12 | a
4 | 5 | b
1 | 2 | c
2 | 5 | d
I want to have one result like this:
------------------------------------------
Uid | suma | sumnota
-----------------------------------------
1 | 10 | 5
2 | 0 | 5
3 | 7 | 0
4 | 12 | 6
Group by uid;
Suma is sum(mount) where category = 'a';
Sumnota is sum(mount) where category <> 'a';
Any ideas how to do it?
Use conditional aggregation with CASE expressions inside the SUM() function:
SELECT
uid
, SUM(CASE WHEN category = 'a' THEN mount ELSE 0 END) AS suma
, SUM(CASE WHEN category IS DISTINCT FROM 'a' THEN mount ELSE 0 END) AS sumnota
FROM
yourtable
GROUP BY uid
ORDER BY uid
I'm using the IS DISTINCT FROM predicate to handle NULL values in the category column properly. If NULLs cannot occur there, you could simply use the <> operator.
From the documentation:
Ordinary comparison operators yield null (signifying "unknown"), not true or false, when either input is null.
For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true.
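To make the NULL behaviour concrete, here is a small sketch run against in-memory SQLite from Python, for illustration only. Older SQLite versions do not support IS DISTINCT FROM, so the null-safe column is spelled as category <> 'a' OR category IS NULL, which is equivalent here. The data is the table from the question plus one extra row with a NULL category that I added as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (uid INTEGER, mount INTEGER, category TEXT);
INSERT INTO yourtable VALUES
  (1, 10, 'a'), (1, 3, 'b'), (3, 7, 'a'), (4, 1, 'b'),
  (4, 12, 'a'), (4, 5, 'b'), (1, 2, 'c'), (2, 5, 'd');
-- Hypothetical extra row, not in the question, to show the NULL case.
INSERT INTO yourtable VALUES (3, 4, NULL);
""")

query = """
SELECT uid,
       SUM(CASE WHEN category = 'a' THEN mount ELSE 0 END) AS suma,
       -- Plain <> yields NULL for a NULL category, so the row falls to ELSE 0.
       SUM(CASE WHEN category <> 'a' THEN mount ELSE 0 END) AS sumnota_plain,
       -- Null-safe variant: NULL categories count as "not a".
       SUM(CASE WHEN category <> 'a' OR category IS NULL
                THEN mount ELSE 0 END) AS sumnota_null_safe
FROM yourtable
GROUP BY uid
ORDER BY uid
"""
result = conn.execute(query).fetchall()
```

Only uid 3 differs between the two sumnota columns, because it is the only uid with a NULL category.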
Here's a more verbose solution than the accepted answer.
WITH
t_suma AS ( SELECT uid, SUM(mount) AS suma
FROM your_table
WHERE category = 'a'
GROUP BY uid ),
t_sumnota AS ( SELECT uid, SUM(mount) AS sumnota
FROM your_table
WHERE category <> 'a' or category is NULL
GROUP BY uid )
SELECT distinct y.uid, COALESCE( suma, 0) AS suma, COALESCE( sumnota, 0 ) AS sumnota
FROM your_table y LEFT OUTER JOIN t_suma ON ( y.uid = t_suma.uid )
LEFT OUTER JOIN t_sumnota ON ( y.uid = t_sumnota.uid )
ORDER BY uid;

pl sql query recursive looping

I have only one table, "tbl_test", which has the fields shown below:
trx_id | proj_num | parent_num|
1 | 14 | 0 |
2 | 14 | 1 |
3 | 14 | 2 |
4 | 14 | 0 |
5 | 14 | 3 |
6 | 15 | 0 |
The result I want: when trx_id 5 is fetched, follow the parent-child relationship:
trx_id -> parent_num
5 -> 3
3 -> 2
2 -> 1
So the output values should be:
3
2
1
That is, I want to get the whole parent chain.
The query I used:
SELECT * FROM (
WITH RECURSIVE tree_data(project_num, task_num, parent_task_num) AS(
SELECT project_num, task_num, parent_task_num
FROM tb_task
WHERE project_num = 14 and task_num = 5
UNION ALL
SELECT child.project_num, child.task_num, child.parent_task_num
FROM tree_data parent Join tb_task child
ON parent.task_num = child.task_num AND parent.task_num = child.parent_task_num
)
SELECT project_num, task_num, parent_task_num
FROM tree_data
) AS tree_list ;
Can anybody help me?
There's no need to do this with pl/pgsql. You can do it straight in SQL. Consider:
WITH RECURSIVE my_tree AS (
-- anchor member: the row to start from
SELECT trx_id as id, parent_num as parent, trx_id::text as path, 1 as level
FROM tbl_test
WHERE trx_id = 5 -- start value
UNION ALL
-- recursive member: step from each row to its parent
SELECT t.trx_id, t.parent_num, p.path || ',' || t.trx_id::text, p.level + 1
FROM my_tree p
JOIN tbl_test t ON t.trx_id = p.parent
)
select * from my_tree;
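As a quick check of the idea on the sample data, here is a sketch of a trimmed-down version of the recursive query, run against in-memory SQLite from Python for illustration only. It drops the path column (SQLite has no ::text cast) and returns just the parent values. The depth column name and the filter on parent_num <> 0 are my assumptions, based on 0 meaning "no parent" in the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_test (trx_id INTEGER, proj_num INTEGER, parent_num INTEGER);
INSERT INTO tbl_test VALUES
  (1, 14, 0), (2, 14, 1), (3, 14, 2), (4, 14, 0), (5, 14, 3), (6, 15, 0);
""")

query = """
WITH RECURSIVE chain(trx_id, parent_num, depth) AS (
    -- anchor member: the row to start from
    SELECT trx_id, parent_num, 1
    FROM tbl_test
    WHERE trx_id = ?
    UNION ALL
    -- recursive member: step from each row to its parent
    SELECT t.trx_id, t.parent_num, c.depth + 1
    FROM chain c
    JOIN tbl_test t ON t.trx_id = c.parent_num
)
SELECT parent_num
FROM chain
WHERE parent_num <> 0   -- assumption: 0 marks a root row
ORDER BY depth
"""
parents = [row[0] for row in conn.execute(query, (5,))]
```

The recursion stops on its own once a row's parent_num matches no trx_id, which is the case for 0 in this table.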
If you are using PostgreSQL, try using a WITH clause:
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;