How do I write a conditional SELECT query in PostgreSQL?

I have a table that has 4 columns.
id | name | score | approve
--------------------
1 | foo | 90 | f
2 | foo | 80 | t
I want to
SELECT id WHERE name='foo'
with these conditions:
if approve is True, then return that one (only one will be true for the same name)
otherwise select the one that has highest score
I was looking into IF...ELSE but cannot even come up with a query that executes, let alone one that works.
How do I write a query for this kind of condition?

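In SQL, you can often express this kind of logic with the right ordering and a limit. In PostgreSQL, false sorts before true, so approve desc puts the approved row (if there is one) first, and score desc orders the remaining rows by highest score: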
select id
from my_table
where name = 'foo'
order by approve desc, score desc
limit 1
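To see the ordering trick at work, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database. SQLite is only a stand-in so the example is self-contained; it stores booleans as 0/1, which sort the same way as PostgreSQL's false/true. The helper name pick_id and the "bar" rows are made up for illustration.

```python
import sqlite3

def pick_id(conn, name):
    """Return the approved row's id for `name`, else the highest-scoring row's id."""
    row = conn.execute(
        "SELECT id FROM my_table WHERE name = ? "
        "ORDER BY approve DESC, score DESC LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, name TEXT, score INTEGER, approve INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "foo", 90, 0),  # from the question: not approved
        (2, "foo", 80, 1),  # from the question: approved
        (3, "bar", 70, 0),  # hypothetical rows: no approved row for 'bar'
        (4, "bar", 95, 0),
    ],
)

foo_id = pick_id(conn, "foo")      # approved row wins despite the lower score
bar_id = pick_id(conn, "bar")      # no approved row, so the highest score wins
missing_id = pick_id(conn, "baz")  # no rows at all
print(foo_id, bar_id, missing_id)
```

One caveat if approve can be NULL: PostgreSQL sorts NULLs first in descending order, so the ordering would need to be approve desc nulls last for the approved row to stay on top.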

Related

Counting rows with different conditions in one query

Let's say I have this table:
id | status | type
----+--------+----------
1 | new | car
2 | new | boat
3 | used | car
4 | new | car
and I want to count all the new vehicles and the number of cars in one go. How do I do that?
I tried this:
SELECT COUNT(status='new'), COUNT(type='car') FROM table;
but it always counts to 4 (the total amount of rows). The only thing I can think of is using a CASE inside the COUNT, but is there a cleaner way?
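COUNT(expression) counts every row where the expression is not NULL, and a comparison that evaluates to false is still not NULL, which is why you always get 4. You could use FILTER to perform conditional aggregation: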
SELECT COUNT(*) FILTER(WHERE status='new'), COUNT(*) FILTER(WHERE type='car')
FROM tab;
Alternatively SUM:
SELECT SUM((status='new')::int), SUM((type='car')::int) FROM tab;
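The difference between the naive COUNT and a conditional one can be checked with a small sketch. It uses Python with an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made for portability), and the CASE form mentioned in the question rather than FILTER or the ::int cast, since CASE runs on both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, status TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, "new", "car"), (2, "new", "boat"), (3, "used", "car"), (4, "new", "car")],
)

# COUNT(expr) counts rows where expr is not NULL; a false comparison is not NULL.
naive = conn.execute(
    "SELECT COUNT(status = 'new'), COUNT(type = 'car') FROM tab"
).fetchone()

# A CASE without ELSE yields NULL for non-matching rows, so COUNT skips them.
conditional = conn.execute(
    "SELECT COUNT(CASE WHEN status = 'new' THEN 1 END), "
    "       COUNT(CASE WHEN type = 'car' THEN 1 END) "
    "FROM tab"
).fetchone()

print(naive, conditional)
```

With the four rows from the question, the naive version reports the total row count for both columns, while the CASE version reports three new vehicles and three cars.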

How to find duplicate rows in JPA

Is there a way to find duplicate entries in a data set using JPA?
| id | text |
-------------
| 1 | foo |
| 2 | bar |
| 3 | foo |
I want to have only entries 1 & 3 in my set.
I can't make it unique on this field.
DISTINCT would give me rows 1 & 2.
If it’s a query, a join with the same table? I’m not sure how that would work. I couldn’t get group by to function.
I believe you can find the duplicated values without an inner query (id cannot be selected here, because it is neither grouped nor aggregated):
SELECT e.text, COUNT(e) FROM Entity e GROUP BY e.text HAVING COUNT(e) > 1
You can apply common practice from SQL to JPQL with the following query:
SELECT e FROM Entity e WHERE e.text IN (SELECT d.text FROM Entity d GROUP BY d.text HAVING COUNT(d) > 1)
This requires a sub-query, so you'd need an index on the text column for it to be efficient.
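As a quick check of the sub-query approach, here is a sketch of the equivalent plain SQL, run from Python against an in-memory SQLite database. The entity table is a hypothetical mapping of the Entity class, and SQLite stands in for whatever database sits behind JPA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE entity (id INTEGER PRIMARY KEY, "text" TEXT)')
conn.executemany(
    "INSERT INTO entity VALUES (?, ?)", [(1, "foo"), (2, "bar"), (3, "foo")]
)

# The inner query finds text values that occur more than once;
# the outer query returns every row carrying one of those values.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        'SELECT e.id FROM entity e '
        'WHERE e."text" IN ('
        '    SELECT d."text" FROM entity d GROUP BY d."text" HAVING COUNT(*) > 1'
        ') ORDER BY e.id'
    )
]
print(duplicate_ids)  # [1, 3]
```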

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
But when I do that, it destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Table_Name"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY clause collapses all rows that share the same values in the grouped columns into a single row. Because that row is not any one of the original rows, you can't select a column that is neither in the GROUP BY nor inside an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result back to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
SELECT "Table_Name"."ID", aggregate."Account_ID", aggregate.MIteration
FROM
(SELECT "Account_ID", MAX("Iteration") AS MIteration
FROM "Table_Name"
GROUP BY "Account_ID") aggregate
LEFT JOIN "Table_Name" ON aggregate."Account_ID" = "Table_Name"."Account_ID" AND
aggregate.MIteration = "Table_Name"."Iteration"
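The aggregate-then-join-back approach can be tried out with a small sketch. It runs from Python against an in-memory SQLite database (a stand-in for PostgreSQL), and uses lowercase, unquoted names (table_name, max_iteration) that are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER, account_id INTEGER, iteration INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 101, 1), (3, 100, 2)],
)

latest = conn.execute(
    """
    SELECT t.id, agg.account_id, agg.max_iteration
    FROM (
        -- one row per account with its highest iteration
        SELECT account_id, MAX(iteration) AS max_iteration
        FROM table_name
        GROUP BY account_id
    ) AS agg
    JOIN table_name AS t
      ON t.account_id = agg.account_id
     AND t.iteration = agg.max_iteration
    ORDER BY t.id
    """
).fetchall()
print(latest)  # [(2, 101, 1), (3, 100, 2)]
```

If two rows of the same account share the maximum iteration, this join returns both of them, whereas the DISTINCT ON query from the question keeps exactly one row per account.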

Count rows from a related table

I have the following query to gather data for a report:
SELECT COUNT(*) as inspected,
count(*) filter(where status='fail') as failed,
count(*) filter(where status='deficient') as impaired,
count(*) filter(where status='pass') as passed,
device_types.name
FROM inspection_data
INNER JOIN devices ON devices.id=inspection_data.device_id
INNER JOIN device_types ON devices.device_type_id=device_types.id
WHERE inspection_id = 3
GROUP BY device_types.id
ORDER BY device_types.name
This query is working as intended (though I'm sure it could be optimized somewhat; SQL isn't my strong suit). The problem is that I now want to gather one more summary datum. I want to count the number of each device_type_id in the devices table for this location_id.
I'll try to map out the database tables:
| devices | device_types | inspection_data |
|:--------------:|:------------:|:---------------:|
| id | id | id |
| device_type_id | name | inspection_id |
| location_id | | device_id |
| | | status |
So when I run the query, I'm receiving results similar to this:
| inspected | failed | impaired | passed | name |
|:---------:|:------:|:--------:|--------|----------------------------|
| 6 | 0 | 2 | 4 | Air Sampling Type Detector |
| 9 | 1 | 1 | 7 | Alarm Bell |
And this is great. My hangup is that not all devices for a location have to be inspected during an inspection. So for example, let's say there are actually 15 "Alarm Bell" devices for this location, but only 9 were inspected as part of this inspection, as per the table above. How do I go about including another column in this output, named "total" with a value of 15 for the Alarm Bell device type, and so on for each of the device types in the report?
I hope I've adequately described what I'm trying to do. I am utterly stumped on how to go about this without running a second query, and I really don't want to do that unless absolutely necessary because it just clutters the code up even more.
I think you want a left join. However, I'm not sure what table goes first. My best guess is:
SELECT COUNT(*) as total,
COUNT(id.device_id) as inspected,
COUNT(id.device_id) filter (where status='fail') as failed,
COUNT(id.device_id) filter (where status='deficient') as impaired,
COUNT(id.device_id) filter (where status='pass') as passed,
dt.name
FROM devices d INNER JOIN
device_types dt
ON d.device_type_id = dt.id LEFT JOIN
inspection_data id
ON d.id = id.device_id AND
id.inspection_id = 3
GROUP BY dt.id
ORDER BY dt.name
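To check how the LEFT JOIN produces both a total and an inspected count, here is a sketch with a few made-up rows, run from Python against an in-memory SQLite database as a stand-in for PostgreSQL. The sample data and the alias i are assumptions, and COUNT(CASE ...) is swapped in for FILTER only so the query also runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE device_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE devices (id INTEGER PRIMARY KEY, device_type_id INTEGER, location_id INTEGER);
    CREATE TABLE inspection_data (id INTEGER PRIMARY KEY, inspection_id INTEGER,
                                  device_id INTEGER, status TEXT);
    INSERT INTO device_types VALUES (1, 'Alarm Bell'), (2, 'Smoke Detector');
    INSERT INTO devices VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1);
    -- device 3 was only inspected in a different inspection (id 4)
    INSERT INTO inspection_data VALUES
        (1, 3, 1, 'pass'), (2, 3, 2, 'fail'), (3, 4, 3, 'pass'), (4, 3, 4, 'deficient');
    """
)

report = conn.execute(
    """
    SELECT dt.name,
           COUNT(*) AS total,                 -- every device, inspected or not
           COUNT(i.device_id) AS inspected,   -- NULL for uninspected devices
           COUNT(CASE WHEN i.status = 'fail' THEN 1 END) AS failed,
           COUNT(CASE WHEN i.status = 'deficient' THEN 1 END) AS impaired,
           COUNT(CASE WHEN i.status = 'pass' THEN 1 END) AS passed
    FROM devices d
    JOIN device_types dt ON dt.id = d.device_type_id
    LEFT JOIN inspection_data i
           ON i.device_id = d.id AND i.inspection_id = 3
    GROUP BY dt.id, dt.name
    ORDER BY dt.name
    """
).fetchall()
print(report)
```

The inspection filter has to stay in the ON clause: moved to WHERE, it would discard the uninspected devices and total would equal inspected again. The total is only correct if each device has at most one inspection_data row per inspection, and restricting the report to a single location would presumably need an extra condition on d.location_id, which the answer above does not include.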

Why do I have to provide an items.id column to the group by clause?

I want to return unique items based on condition, sorted by price asc. My query fails because Postgres wants items.id to be present in the group by clause. If it's included the query returns everything matching the where clause, which is not what I want. Why do I need to include the column?
select items.*
from items
where product_id = 1 and items.status = 'in_stock'
group by condition /* , items.id returns everything */
order by items.price asc
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
| 3 | good | 3 |
I only want items with ids 1 and 3.
Update: Here's a fiddle using the answer below, which still produces the error:
http://sqlfiddle.com/#!1/33786/2
The problem is that PostgreSQL has no way of knowing which items records you want to take values from; that is, it can't tell that you want this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 3 |
and not this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
To fix this, you need to use some sort of aggregation function, such as MAX:
SELECT MAX(id) AS id,
condition,
MAX(price) AS price
FROM items
WHERE product_id = 1
AND status = 'in_stock'
GROUP BY condition
ORDER BY price ASC
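which gives the following (MAX(id) and MAX(price) are computed independently, so the "good" row pairs id 3 with price 5, a combination that does not exist in the table):
| id | condition | price |
--------------------------
| 3 | good | 5 |
| 1 | new | 9 |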
(This restriction is part of the SQL standard, and most DBMSes enforce it. One exception is MySQL, which allows your query, but with the caveat that "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate".)
SQL Fiddle
select *
from (
select distinct on (cond)
id, cond, price
from items
where product_id = 1 and items.status = 'in_stock'
order by cond, price
) s
order by price
The SQL standard requires this behaviour, though some databases like MySQL ignore it and instead return unpredictable results.
If there's more than one row for "cond = good" and you ask for the "id" of the row where "cond = good", which row should the database give you? The row with id = 3, or id = 2? How should it know which to pick? MySQL picks an arbitrary row if there are multiple candidates, but this isn't allowed by the standard.
In your case you seem to want to pick the lowest-price row for each condition.
PostgreSQL provides an extension, DISTINCT ON ..., to help with this. Clodaldo has demonstrated this in his answer, so I won't repeat that here. Using DISTINCT ON will be much more efficient than the example below.
The SQL-standard way would be to use a window to rank the results, then filter on the ranked data. Unfortunately this is pretty inefficient as it requires all rows that match the inner where clause to be collected and sorted.
SELECT *
FROM (
SELECT *, dense_rank() OVER w AS itemrank
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
WINDOW w AS (PARTITION BY cond ORDER BY price ASC)
) ranked_items
WHERE itemrank = 1;
(http://sqlfiddle.com/#!1/33786/19)
Another SQL-standard way is to use an aggregation subquery to find the min prices for each category then display all rows with the min price:
SELECT *
FROM items INNER JOIN (
SELECT cond, min(price) AS minprice
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
GROUP BY cond
) minprices(cond, price)
ON (items.price = minprices.price AND items.cond = minprices.cond)
ORDER BY items.price;
Unlike the DISTINCT ON version, though, this will display multiple entries if the lowest priced item has more than one entry with the same cond and price.
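The tie behaviour of the min-price join is easy to reproduce. This sketch runs from Python against an in-memory SQLite database as a stand-in for PostgreSQL; row 4 is a hypothetical second "good" item at the lowest price, and the product_id and status filters are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, cond TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "new", 9), (2, "good", 5), (3, "good", 3), (4, "good", 3)],
)

cheapest = conn.execute(
    """
    SELECT items.id, items.cond, items.price
    FROM items
    JOIN (
        -- lowest price per condition
        SELECT cond, MIN(price) AS minprice
        FROM items
        GROUP BY cond
    ) AS minprices
      ON items.cond = minprices.cond
     AND items.price = minprices.minprice
    ORDER BY items.price, items.id
    """
).fetchall()
print(cheapest)  # [(3, 'good', 3), (4, 'good', 3), (1, 'new', 9)]
```

Both tied "good" rows come back, which is the behaviour described above; DISTINCT ON would keep only one of them.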
So you should really use the DISTINCT ON approach, but you need to understand it. Start with the PostgreSQL documentation for the DISTINCT clause of SELECT.
On a side note, PostgreSQL 9.1 and later allow you to refer to any column of a table whose primary key you've listed in GROUP BY; they identify the functional dependency of the other columns on the primary key. So you don't have to aggregate the other columns if you've mentioned the primary key. That's what the standard requires, but older versions weren't smart enough to figure it out and required all columns to be listed explicitly.
That's what people who ask this question usually want to know, but it doesn't strictly apply to your question, since it turns out you're trying to use GROUP BY to filter rows.