Count rows from a related table


I have the following query to gather data for a report:
SELECT COUNT(*) as inspected,
count(*) filter(where status='fail') as failed,
count(*) filter(where status='deficient') as impaired,
count(*) filter(where status='pass') as passed,
device_types.name
FROM inspection_data
INNER JOIN devices ON devices.id=inspection_data.device_id
INNER JOIN device_types ON devices.device_type_id=device_types.id
WHERE inspection_id = 3
GROUP BY device_types.id
ORDER BY device_types.name
This query is working as intended (though I'm sure it could be optimized somewhat; SQL isn't my strong suit). The problem is that I now want to gather one more summary datum: the number of devices of each device_type_id in the devices table for this location_id.
I'll try to map out the database tables:
| devices | device_types | inspection_data |
|:--------------:|:------------:|:---------------:|
| id | id | id |
| device_type_id | name | inspection_id |
| location_id | | device_id |
| | | status |
So when I run the query, I'm receiving results similar to this:
| inspected | failed | impaired | passed | name |
|:---------:|:------:|:--------:|--------|----------------------------|
| 6 | 0 | 2 | 4 | Air Sampling Type Detector |
| 9 | 1 | 1 | 7 | Alarm Bell |
And this is great. My hangup is that not all devices for a location have to be inspected during an inspection. So for example, let's say there are actually 15 "Alarm Bell" devices for this location, but only 9 were inspected as part of this inspection, as per the table above. How do I go about including another column in this output, named "total" with a value of 15 for the Alarm Bell device type, and so on for each of the device types in the report?
I hope I've adequately described what I'm trying to do. I am utterly stumped on how to go about this without running a second query, and I really don't want to do that unless absolutely necessary because it just clutters the code up even more.

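I think you want a LEFT JOIN. However, I'm not sure which table goes first. My best guess is:
SELECT COUNT(*) AS total,
COUNT(idata.device_id) AS inspected,
COUNT(idata.device_id) FILTER (WHERE idata.status = 'fail') AS failed,
COUNT(idata.device_id) FILTER (WHERE idata.status = 'deficient') AS impaired,
COUNT(idata.device_id) FILTER (WHERE idata.status = 'pass') AS passed,
dt.name
FROM devices d INNER JOIN
device_types dt
ON d.device_type_id = dt.id LEFT JOIN
inspection_data idata
ON d.id = idata.device_id AND
idata.inspection_id = 3
WHERE d.location_id = $1 -- the location being reported on (value not given in the question)
GROUP BY dt.id
ORDER BY dt.name
Keeping the inspection_id condition in the ON clause (rather than in WHERE) is what lets devices that were not inspected still count toward total.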
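A quick way to sanity-check the LEFT JOIN idea is to run it against an in-memory SQLite database from Python. This is a sketch under assumptions: the sample rows and location id 10 are hypothetical, and because older SQLite builds lack the FILTER clause, the conditional counts are written as COUNT(CASE WHEN ... THEN 1 END), a form PostgreSQL accepts as well.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE devices (id INTEGER PRIMARY KEY, device_type_id INTEGER, location_id INTEGER);
CREATE TABLE inspection_data (id INTEGER PRIMARY KEY, inspection_id INTEGER,
                              device_id INTEGER, status TEXT);
INSERT INTO device_types VALUES (1, 'Alarm Bell'), (2, 'Air Sampling Type Detector');
-- Three bells and one detector at location 10, one bell at another location.
INSERT INTO devices VALUES (1, 1, 10), (2, 1, 10), (3, 1, 10), (4, 1, 20), (5, 2, 10);
-- Inspection 3 covered only two bells; device 3 was checked in inspection 2.
INSERT INTO inspection_data VALUES (1, 3, 1, 'pass'), (2, 3, 2, 'fail'), (3, 2, 3, 'pass');
""")

REPORT_SQL = """
SELECT dt.name,
       COUNT(*)                                               AS total,
       COUNT(idata.device_id)                                 AS inspected,
       COUNT(CASE WHEN idata.status = 'fail' THEN 1 END)      AS failed,
       COUNT(CASE WHEN idata.status = 'deficient' THEN 1 END) AS impaired,
       COUNT(CASE WHEN idata.status = 'pass' THEN 1 END)      AS passed
FROM devices d
JOIN device_types dt ON dt.id = d.device_type_id
-- The inspection filter lives in ON, so uninspected devices keep a NULL-padded row.
LEFT JOIN inspection_data idata
       ON idata.device_id = d.id AND idata.inspection_id = ?
WHERE d.location_id = ?
GROUP BY dt.id, dt.name
ORDER BY dt.name
"""

rows = conn.execute(REPORT_SQL, (3, 10)).fetchall()
for row in rows:
    print(row)
```

Here the bell at location 20 is excluded from total, and device 3 counts toward total but not inspected, because its only inspection row belongs to a different inspection.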

Related

Need something akin to DISTINCT ON (a) AND DISTINCT ON (b) where normal DISTINCT clause isn't working

Just as a preface, I've tried using SELECT DISTINCT ON(a, b) and SELECT DISTINCT ON (a) ... UNION SELECT DISTINCT ON (b), but neither worked, so I'm reaching out for a possible solution.
I have two tables, player and card (a card represents something like a hitman's contract, with a reference to the 'killer' and the 'victim' which both reference the player table).
When a killer successfully kills a victim, the victim's state is set to 'dead' and their card is freed (by setting the card's killer_id to NULL), entering a kind of "free card pool". The killer is then set to 'idle' (as in waiting for a new card to be assigned to them), and their now completed card is set to completed.
What I'm trying to do is assign idle players cards from the freed card pool, with the additional caveat that a player can never receive themselves as a target, i.e. a card whose victim_id equals a player's id must never be assigned to that player. So performing:
SELECT c.id AS card_id, p.id AS player_id
FROM card c
FULL OUTER JOIN player p ON true
WHERE p.state = 'idle'
AND c.killer_id IS NULL
AND c.victim_id != p.id;
returns all possible combinations of cards and players. What I need though is for every card and every player to be combined uniquely, which is to say that every card is assigned to a player and every player is assigned a card, where no card_id or player_id has appeared in a previous row. I've created a DB fiddle here to illustrate my point.
So given the above query, say I get the following dataset:
| card_id | player_id |
|-------------|---------------|
| 2 | 1 |
| 4 | 1 |
| 2 | 3 |
| 4 | 3 |
I want to pare it down into one of the following:
| card_id | player_id | | card_id | player_id |
|-------------|---------------| OR |-------------|---------------|
| 2 | 1 | | 2 | 3 |
| 4 | 3 | | 4 | 1 |
where there are no duplicate values for either column, but the actual combo of card_id and player_id itself doesn't matter.
As stated before, I tried using both SELECT DISTINCT ON(card_id, player_id) and SELECT DISTINCT ON (card_id) ... UNION SELECT DISTINCT ON (player_id), but neither worked.
Any help would be much appreciated!
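DISTINCT ON keeps one row per key, so it cannot enforce uniqueness on two columns at once; what is described here (each card used once, each player used once, no self-targets) is a bipartite matching problem. As a hedged sketch outside SQL, the candidate pairs can be fed to a small augmenting-path matcher (Kuhn's algorithm) in Python. SQLite stands in for PostgreSQL, the query is written as a CROSS JOIN (which returns the same rows as FULL OUTER JOIN ... ON true once the WHERE filters apply), and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: players 1 and 3 are idle, cards 2 and 4 are in the free pool.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE card (id INTEGER PRIMARY KEY, killer_id INTEGER, victim_id INTEGER);
INSERT INTO player VALUES (1, 'idle'), (3, 'idle'), (5, 'dead');
-- Card 4 targets player 3, so it can only go to player 1.
INSERT INTO card VALUES (2, NULL, 7), (4, NULL, 3), (6, 1, 5);
""")


def candidate_pairs(conn):
    """The question's filter as a CROSS JOIN: allowed (card_id, player_id) pairs."""
    return conn.execute("""
        SELECT c.id, p.id
        FROM card c CROSS JOIN player p
        WHERE p.state = 'idle'
          AND c.killer_id IS NULL
          AND c.victim_id <> p.id
        ORDER BY c.id, p.id
    """).fetchall()


def assign_cards(pairs):
    """Kuhn's augmenting-path matching: each card and each player used at most once."""
    options = {}
    for card, player in pairs:
        options.setdefault(card, []).append(player)
    owner = {}  # player_id -> card_id currently assigned to that player

    def try_assign(card, seen):
        for player in options[card]:
            if player in seen:
                continue
            seen.add(player)
            # Take a free player, or move that player's current card elsewhere.
            if player not in owner or try_assign(owner[player], seen):
                owner[player] = card
                return True
        return False

    for card in options:
        try_assign(card, set())
    return sorted((card, player) for player, card in owner.items())


print(assign_cards(candidate_pairs(conn)))  # [(2, 3), (4, 1)]
```

A naive "first free player" loop would give card 2 to player 1 and strand card 4; the augmenting path moves card 2 to player 3 instead. If no complete assignment exists, the leftovers simply stay unassigned, and for very large pools the recursion depth may need attention.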

Sum with different condition for every line

In my PostgreSQL 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.
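The rule described above (sum every change since the article's latest INVENTORY, including the inventory value itself) can be pinned down as a small Python replay before writing any SQL. This is only a reference for cross-checking a set-based query, not a suggested replacement; it assumes timestamps are ISO-formatted strings so that string order equals time order.

```python
from collections import defaultdict

# Rows copied from the stock_rotation table above:
# (article_id, date, stock_rotation_type, quantity_change)
ROWS = [
    (1, "2010-01-01 15:35:01", "PURCHASE", 10),
    (1, "2010-05-06 08:46:02", "SALE", -4),
    (1, "2010-12-20 08:20:35", "INVENTORY", 5),
    (1, "2011-02-05 16:45:50", "PURCHASE", 2),
    (1, "2011-03-01 16:42:53", "SALE", -1),
]


def current_stock(rows):
    """Replay stock movements per article in date order."""
    stock = defaultdict(int)
    for article_id, date, kind, change in sorted(rows, key=lambda r: r[1]):
        if kind == "INVENTORY":
            stock[article_id] = change   # a stock count resets the running total
        else:
            stock[article_id] += change  # PURCHASE adds, SALE subtracts
    return dict(stock)


print(current_stock(ROWS))  # {1: 6}
```

For article 1 only the rows from 2010-12-20 onward matter: 5 + 2 - 1 = 6.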

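In this case I would use a separate inner query that finds the latest INVENTORY date per article and join it back to stock_rotation. You are effectively reading stock_rotation twice, but it should work (if the table is very large, you may want to try something else). Articles with no INVENTORY row at all fall back to summing every change: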
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
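A runnable check of this query, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The first five rows come from the question; article 2 is hypothetical and has no INVENTORY row, so it exercises the COALESCE fallback. Dates are stored as ISO-formatted text, which SQLite compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_rotation (
    id INTEGER PRIMARY KEY,
    quantity_change INTEGER,
    stock_rotation_type TEXT,
    article_id INTEGER,
    date TEXT
);
INSERT INTO stock_rotation VALUES
    (1, 10, 'PURCHASE',  1, '2010-01-01 15:35:01'),
    (2, -4, 'SALE',      1, '2010-05-06 08:46:02'),
    (3,  5, 'INVENTORY', 1, '2010-12-20 08:20:35'),
    (4,  2, 'PURCHASE',  1, '2011-02-05 16:45:50'),
    (5, -1, 'SALE',      1, '2011-03-01 16:42:53'),
    -- Hypothetical article 2 has never been counted.
    (6,  3, 'PURCHASE',  2, '2011-01-01 09:00:00'),
    (7, -1, 'SALE',      2, '2011-02-01 09:00:00');
""")

STOCK_SQL = """
SELECT sr.article_id, SUM(sr.quantity_change)
FROM stock_rotation sr
LEFT JOIN (
    -- Latest INVENTORY date per article.
    SELECT article_id, MAX(date) AS date
    FROM stock_rotation
    WHERE stock_rotation_type = 'INVENTORY'
    GROUP BY article_id
) AS latest_inventory
  ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id
ORDER BY sr.article_id
"""

rows = conn.execute(STOCK_SQL).fetchall()
print(rows)  # [(1, 6), (2, 2)]
```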

You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;

Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change,
rank() over (partition by article_id order by date desc) as rnk
from stock_rotation
where stock_rotation_type = 'INVENTORY'
) a
where rnk = 1
)
select ii.article_id, ii.quantity_change + coalesce(sum(sr.quantity_change), 0) as current_stock
from initial_inventory ii
left join stock_rotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change

Unpack expression results from case statement

Four categories in category table.
id | name
--------------
1 | 'wine'
2 | 'chocolate'
3 | 'autos'
4 | 'real estate'
Two of the many (thousands of) forecasters in forecaster table.
id | name
--------------
1 | 'sothebys'
2 | 'cramer'
Relevant forecasts by the forecasters for the categories in the forecast table.
| id | forecaster_id | category_id | forecast |
|----+---------------+-------------+--------------------------------------------------------------|
| 1 | 1 | 1 | 'bad weather, prices rise short-term' |
| 2 | 1 | 2 | 'cocoa bean surplus, prices drop' |
| 3 | 1 | 3 | 'we dont deal with autos - no idea' |
| 4 | 2 | 2 | 'sell, sell, sell' |
| 5 | 2 | 3 | 'demand for cocoa will skyrocket - prices up - buy, buy buy' |
I want a prioritized mapping of (forecaster, category, forecast): if a forecast exists for the primary forecaster (e.g. 'cramer'), use it, because I trust him more; otherwise, if a forecast exists for the secondary forecaster (e.g. 'sothebys'), use that; if no forecast exists for a category, return a row with that category and null for the forecast.
I have something that almost works, and once I get the logic down I hope to turn it into a parameterized query.
select
case when F1.category is not null
then (F1.forecaster, F1.category, F1.forecast)
when F2.category is not null
then (F2.forecaster, F2.category, F2.forecast)
else (null, C.category, null)
end
from
(
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'cramer'
) F1
right join (
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'sothebys'
) F2 on (F1.cid = F2.cid)
full outer join category C on (C.id = F2.cid);
This gives:
'(sothebys,wine,"bad weather, prices rise short-term")'
'(cramer,chocolate,"sell, sell, sell")'
'(cramer,autos,"demand for cocoa will skyrocket - prices up - buy, buy buy")'
'(,"real estate",)'
While that is the desired data, each result is a single record column instead of three columns. The CASE was the only way I could find to put Cramer first and Sotheby's second, and there is a lot of duplication. Is there a better way, and how can the tuple-like results be pulled back apart into columns?
Any suggestions, especially related to removal of duplication or general simplification appreciated.

This sounds like a case for DISTINCT ON (untested):
SELECT DISTINCT ON (c.id)
fr.name AS forecaster,
c.name AS category,
f.forecast
FROM forecast f
JOIN forecaster fr ON f.forecaster_id = fr.id AND fr.name IN ('cramer', 'sothebys') -- ignore all other forecasters
RIGHT JOIN category c ON f.category_id = c.id
ORDER BY
c.id,
CASE WHEN fr.name = 'cramer' THEN 0
WHEN fr.name = 'sothebys' THEN 1
ELSE 2
END;
For each category, the first row in the ordering will be picked. Since the CASE expression sorts 'cramer' (0) ahead of 'sothebys' (1), Cramer's forecast is preferred whenever one exists.
Adapt the ORDER BY clause if you need a more complicated ranking.
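SQLite has no DISTINCT ON, so this runnable sketch swaps in a different standard-SQL technique: one LEFT JOIN per forecaster in priority order, plus COALESCE. It also shows how to get three ordinary columns instead of a single record value. It assumes at most one forecast per (forecaster, category); the parameter names :preferred and :fallback are illustrative, and the sample rows are copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE forecaster (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE forecast (id INTEGER PRIMARY KEY, forecaster_id INTEGER,
                       category_id INTEGER, forecast TEXT);
INSERT INTO category VALUES (1, 'wine'), (2, 'chocolate'), (3, 'autos'), (4, 'real estate');
INSERT INTO forecaster VALUES (1, 'sothebys'), (2, 'cramer');
INSERT INTO forecast VALUES
    (1, 1, 1, 'bad weather, prices rise short-term'),
    (2, 1, 2, 'cocoa bean surplus, prices drop'),
    (3, 1, 3, 'we dont deal with autos - no idea'),
    (4, 2, 2, 'sell, sell, sell'),
    (5, 2, 3, 'demand for cocoa will skyrocket - prices up - buy, buy buy');
""")

PRIORITY_SQL = """
SELECT CASE WHEN fp.id IS NOT NULL THEN :preferred
            WHEN fs.id IS NOT NULL THEN :fallback
       END AS forecaster,
       c.name AS category,
       COALESCE(fp.forecast, fs.forecast) AS forecast
FROM category c
-- One LEFT JOIN per forecaster; the first non-NULL match wins.
LEFT JOIN forecast fp
       ON fp.category_id = c.id
      AND fp.forecaster_id = (SELECT id FROM forecaster WHERE name = :preferred)
LEFT JOIN forecast fs
       ON fs.category_id = c.id
      AND fs.forecaster_id = (SELECT id FROM forecaster WHERE name = :fallback)
ORDER BY c.id
"""

rows = conn.execute(PRIORITY_SQL, {"preferred": "cramer", "fallback": "sothebys"}).fetchall()
for row in rows:
    print(row)
```

The four rows match the question's output, but as separate columns; 'real estate' comes back as (None, 'real estate', None).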

Select rows of multiple tables via joins

I have some tables which are related to each other.
A short demonstration:
Sites:
id | clip_id | article_id | unit_id
--------------+------------+--------
1 | 123 | 12 | 7
Clips:
id | title | desc |
------------+--------
1 | foo2 | abc1
Articles:
id | title | desc | slug
------------+---------------------
1 | foo2 | abc1 | article.html
Units:
id | vertical_id | title |
------------------+-------+
1 | 123 | abc |
Verticals:
id | name |
-----------+
1 | vfoo |
Now I want to do something like below:
SELECT ALL VERTICAL, UNIT, SITE, CLIP, ARTICLE attributes
from VERTICAL, UNIT, SITE, CLIP, ARTICLE TABLES
WHERE vertical_id = 2
Can someone help me with how to use joins for this?

Here is a running example of possibly what you want: http://sqlfiddle.com/#!15/af63b/2
select * from
sites
inner join units on sites.unit_id=units.id
inner join clips on clips.id=sites.clip_id
inner join articles on articles.id=sites.article_id
inner join verticals on verticals.id=units.vertical_id
where units.vertical_id=123
The problem is that the description you gave us did not clearly specify which columns to join:
(answered) Why does units have a link to site via site_id and sites a link back to units via unit_id?
(answered) Why does units have a link to verticals via vertical_id and verticals a link back to units via unit_id?
I am guessing that your data does not give a consistent example to get rows using the join. For vertical_id=123 there is no corresponding entry in verticals.
Edit:
I corrected the SQL due to corrections within the question. With this the two questions are answered.

select s.id, s.clip_id, s.article_id, u.title, u.vertical_id, c.title, c."desc", a.slug, v.name
from sites s
join units u on s.unit_id = u.id
join clips c on s.clip_id = c.id
join articles a on s.article_id = a.id
join verticals v on u.vertical_id = v.id
where u.vertical_id = 2
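A runnable sketch of the five-table join, using Python's sqlite3 module as a stand-in for PostgreSQL. Because the sample ids in the question do not line up, the rows below are hypothetical but internally consistent. Note that desc is a reserved word in both SQLite and PostgreSQL, so the column has to be double-quoted.

```python
import sqlite3

# Hypothetical, consistent rows: site 1 belongs to unit 7 in vertical 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE verticals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE units (id INTEGER PRIMARY KEY, vertical_id INTEGER, title TEXT);
CREATE TABLE clips (id INTEGER PRIMARY KEY, title TEXT, "desc" TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, "desc" TEXT, slug TEXT);
CREATE TABLE sites (id INTEGER PRIMARY KEY, clip_id INTEGER, article_id INTEGER, unit_id INTEGER);
INSERT INTO verticals VALUES (2, 'vfoo'), (3, 'vbar');
INSERT INTO units VALUES (7, 2, 'abc'), (8, 3, 'xyz');
INSERT INTO clips VALUES (123, 'foo2', 'abc1');
INSERT INTO articles VALUES (12, 'foo2', 'abc1', 'article.html');
INSERT INTO sites VALUES (1, 123, 12, 7), (2, 123, 12, 8);
""")

JOIN_SQL = """
SELECT v.name AS vertical, u.title AS unit, s.id AS site_id,
       c.title AS clip_title, c."desc" AS clip_desc, a.slug AS article_slug
FROM sites s
JOIN units u     ON u.id = s.unit_id      -- each join follows one foreign key
JOIN verticals v ON v.id = u.vertical_id
JOIN clips c     ON c.id = s.clip_id
JOIN articles a  ON a.id = s.article_id
WHERE u.vertical_id = ?
"""

rows = conn.execute(JOIN_SQL, (2,)).fetchall()
print(rows)  # [('vfoo', 'abc', 1, 'foo2', 'abc1', 'article.html')]
```

Site 2 is dropped because its unit belongs to vertical 3.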

Why do I have to provide an items.id column to the group by clause?

I want to return unique items based on condition, sorted by price asc. My query fails because Postgres wants items.id to be present in the group by clause. If it's included the query returns everything matching the where clause, which is not what I want. Why do I need to include the column?
select items.*
from items
where product_id = 1 and items.status = 'in_stock'
group by condition /* , items.id returns everything */
order by items.price asc
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
| 3 | good | 3 |
I only want items with ids 1 and 3.
Update: Here's a fiddle using the answer below, which still produces the error:
http://sqlfiddle.com/#!1/33786/2

The problem is that PostgreSQL has no way of knowing which items records you want to take values from; that is, it can't tell that you want this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 3 |
and not this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
To fix this, you need to use some sort of aggregation function, such as MAX:
SELECT MAX(id) AS id,
condition,
MAX(price) AS price
FROM items
WHERE product_id = 1
AND status = 'in_stock'
GROUP BY condition
ORDER BY price ASC
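which gives the following (note that MAX(id) and MAX(price) are computed independently, so the 'good' row pairs id 3 with the price 5 that belongs to id 2):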
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 5 |
(This restriction is part of the SQL standard, and most DBMSes enforce it. One exception is MySQL, which allows your query, but with the caveat that "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate".)

select *
from (
select distinct on (cond)
id, cond, price
from items
where product_id = 1 and items.status = 'in_stock'
order by cond, price
) s
order by price

The SQL standard requires this behaviour, though some databases like MySQL ignore it and instead return unpredictable results.
If there's more than one row for "cond = good" and you ask for the "id" of the row where "cond = good", which row should the database give you? The row with id = 3, or id = 2? How should it know which to pick? MySQL picks an arbitrary row if there are multiple candidates, but this isn't allowed by the standard.
In your case you seem to want to pick the lowest-price row for each condition.
PostgreSQL provides an extension, DISTINCT ON ..., to help with this. Clodaldo has demonstrated this in his answer, so I won't repeat that here. Using DISTINCT ON will be much more efficient than the example below.
The SQL-standard way would be to use a window to rank the results, then filter on the ranked data. Unfortunately this is pretty inefficient as it requires all rows that match the inner where clause to be collected and sorted.
SELECT *
FROM (
SELECT *, dense_rank() OVER w AS itemrank
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
WINDOW w AS (PARTITION BY cond ORDER BY price ASC)
) ranked_items
WHERE itemrank = 1;
(http://sqlfiddle.com/#!1/33786/19)
Another SQL-standard way is to use an aggregation subquery to find the min prices for each category then display all rows with the min price:
SELECT *
FROM items INNER JOIN (
SELECT cond, min(price) AS minprice
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
GROUP BY cond
) minprices(cond, price)
ON (items.price = minprices.price AND items.cond = minprices.cond)
WHERE items.product_id = 1 AND items.status = 'in_stock'
ORDER BY items.price;
Unlike the DISTINCT ON version, though, this will display multiple entries if the lowest priced item has more than one entry with the same cond and price.
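A runnable sketch of the aggregation-subquery approach, using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite has no DISTINCT ON, and it does not accept a column-alias list on a subquery, so the minimum is aliased inside). Items 1 to 3 mirror the question; item 4 is a hypothetical row for another product with the same cond and price, which shows why the outer query needs the same product/status filter as the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,
                    status TEXT, cond TEXT, price INTEGER);
INSERT INTO items VALUES
    (1, 1, 'in_stock', 'new',  9),
    (2, 1, 'in_stock', 'good', 5),
    (3, 1, 'in_stock', 'good', 3),
    (4, 2, 'in_stock', 'good', 3);  -- hypothetical row for another product
""")

CHEAPEST_SQL = """
SELECT items.id, items.cond, items.price
FROM items
JOIN (
    SELECT cond, MIN(price) AS minprice
    FROM items
    WHERE product_id = :product AND status = 'in_stock'
    GROUP BY cond
) AS m ON items.cond = m.cond AND items.price = m.minprice
-- Repeat the filter here, otherwise item 4 (same cond and price) would match too.
WHERE items.product_id = :product AND items.status = 'in_stock'
ORDER BY items.price
"""

rows = conn.execute(CHEAPEST_SQL, {"product": 1}).fetchall()
print(rows)  # [(3, 'good', 3), (1, 'new', 9)]
```

As noted above, a tie on the lowest price within one cond would still return more than one row.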
So you should really use the DISTINCT ON approach, but you need to understand it. Start with the PostgreSQL documentation on SELECT ... DISTINCT ON.
On a side note, newer PostgreSQL versions (9.1 and later) allow you to refer to any column of a table whose primary key you've listed in GROUP BY; they recognize the functional dependency of the other columns on the primary key. So in newer versions you don't have to aggregate the other columns if you've listed the PK. That's what the standard specifies, but older versions weren't smart enough to figure it out and required all columns to be listed explicitly.
That's what people who ask this question usually want to know, but it doesn't strictly apply to your question, since it turns out you're trying to use GROUP BY to filter rows.
