Why do I have to provide an items.id column to the group by clause? - postgresql

I want to return unique items based on condition, sorted by price asc. My query fails because Postgres wants items.id to be present in the group by clause. If it's included the query returns everything matching the where clause, which is not what I want. Why do I need to include the column?
select items.*
from items
where product_id = 1 and items.status = 'in_stock'
group by condition /* , items.id returns everything */
order by items.price asc
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
| 3 | good | 3 |
I only want items with ids 1 and 3.
Update: Here's a fiddle using the answer below, which still produces the error:
http://sqlfiddle.com/#!1/33786/2

The problem is that PostgreSQL has no way of knowing which items records you want to take values from; that is, it can't tell that you want this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 3 |
and not this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
To fix this, you need to use some sort of aggregation function, such as MAX:
SELECT MAX(id) AS id,
condition,
MAX(price) AS price
FROM items
WHERE product_id = 1
AND status = 'in_stock'
GROUP BY condition
ORDER BY price ASC
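which gives the following. Note that MAX(id) and MAX(price) are evaluated independently, so id 3 is paired with price 5 even though item 3 costs 3, and ORDER BY price sorts on the aggregated price:
| id | condition | price |
--------------------------
| 3 | good | 5 |
| 1 | new | 9 |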
(This restriction is part of the SQL standard, and most DBMSes enforce it. One exception is MySQL, which allows your query, but with the caveat that "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate" (from the MySQL manual).)
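To check the id/price pairing concretely, here is a small sketch that runs the MAX query against an in-memory SQLite copy of the sample rows. SQLite, the column name cond, and the filled-in product_id/status values are assumptions for illustration only. SQLite also accepts bare non-grouped columns (much like MySQL), so it cannot reproduce the PostgreSQL error itself; it only shows what the aggregates return.

```python
import sqlite3

# In-memory copy of the question's three sample rows; product_id and status
# are filled in so that the WHERE clause matches every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,"
    " status TEXT, cond TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, 1, 'in_stock', ?, ?)",
    [(1, "new", 9), (2, "good", 5), (3, "good", 3)],
)

# Each aggregate is computed on its own over the whole group.
max_rows = conn.execute(
    """
    SELECT MAX(id) AS id, cond, MAX(price) AS price
    FROM items
    WHERE product_id = 1 AND status = 'in_stock'
    GROUP BY cond
    ORDER BY price ASC
    """
).fetchall()
print(max_rows)  # [(3, 'good', 5), (1, 'new', 9)]

# Which returned (id, price) pairs do not correspond to any real row?
real_rows = set(conn.execute("SELECT id, price FROM items").fetchall())
mismatched = [r for r in max_rows if (r[0], r[2]) not in real_rows]
print(mismatched)  # [(3, 'good', 5)]
```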

select *
from (
select distinct on (cond)
id, cond, price
from items
where product_id = 1 and items.status = 'in_stock'
order by cond, price
) s
order by price
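DISTINCT ON (cond) keeps the first row of each run of equal cond values in the inner ORDER BY, which is why the inner query sorts by cond, price and the outer query re-sorts by price. Since DISTINCT ON is PostgreSQL-only, the sketch below imitates its semantics in plain Python rather than executing it; the rows and the helper name distinct_on are hypothetical stand-ins.

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for the rows that pass WHERE product_id = 1 AND status = 'in_stock'.
rows = [
    {"id": 1, "cond": "new", "price": 9},
    {"id": 2, "cond": "good", "price": 5},
    {"id": 3, "cond": "good", "price": 3},
]


def distinct_on(rows, key, order):
    """Imitate SELECT DISTINCT ON (key) ... ORDER BY key, order:
    sort, then keep only the first row of each group of equal keys."""
    ordered = sorted(rows, key=lambda r: (r[key], r[order]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Inner query: cheapest row per cond. Outer query: re-sort by price.
result = sorted(distinct_on(rows, "cond", "price"), key=itemgetter("price"))
print([r["id"] for r in result])  # [3, 1]
```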

The SQL standard requires this behaviour, though some databases like MySQL ignore it and instead return unpredictable results.
If there's more than one row for "cond = good" and you ask for the "id" of the row where "cond = good", which row should the database give you? The row with id = 3, or id = 2? How should it know which to pick? MySQL picks an arbitrary row if there are multiple candidates, but this isn't allowed by the standard.
In your case you seem to want to pick the lowest-price row for each condition.
PostgreSQL provides an extension, DISTINCT ON ..., to help with this. Clodaldo has demonstrated this in his answer, so I won't repeat that here. Using DISTINCT ON will be much more efficient than the example below.
The SQL-standard way would be to use a window to rank the results, then filter on the ranked data. Unfortunately this is pretty inefficient as it requires all rows that match the inner where clause to be collected and sorted.
SELECT *
FROM (
SELECT *, dense_rank() OVER w AS itemrank
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
WINDOW w AS (PARTITION BY cond ORDER BY price ASC)
) ranked_items
WHERE itemrank = 1;
(http://sqlfiddle.com/#!1/33786/19)
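One detail worth seeing in action: dense_rank gives every row tied on the lowest price rank 1, so ties all come back, while row_number with a tie-breaker keeps exactly one row per cond. The sketch below runs both variants in SQLite (assumed 3.25 or newer for window functions) with an inline OVER clause instead of the named WINDOW; row 4 is a hypothetical extra row that ties with row 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,"
    " status TEXT, cond TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, 1, 'in_stock', ?, ?)",
    [(1, "new", 9), (2, "good", 5), (3, "good", 3), (4, "good", 3)],
)


def keep_rank_one(rank_expr):
    """Rank rows within each cond using rank_expr, then keep rank 1."""
    return conn.execute(f"""
        SELECT id, cond, price
        FROM (
            SELECT *, {rank_expr} AS itemrank
            FROM items
            WHERE product_id = 1 AND status = 'in_stock'
        ) AS ranked_items
        WHERE itemrank = 1
        ORDER BY price, id
    """).fetchall()


# Tied rows 3 and 4 both get dense_rank 1.
dense = keep_rank_one("dense_rank() OVER (PARTITION BY cond ORDER BY price)")
# row_number with id as a tie-breaker keeps a single row per cond.
first_only = keep_rank_one(
    "row_number() OVER (PARTITION BY cond ORDER BY price, id)"
)
print(dense)       # [(3, 'good', 3), (4, 'good', 3), (1, 'new', 9)]
print(first_only)  # [(3, 'good', 3), (1, 'new', 9)]
```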
Another SQL-standard way is to use an aggregation subquery to find the min prices for each category then display all rows with the min price:
SELECT *
FROM items INNER JOIN (
SELECT cond, min(price) AS minprice
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
GROUP BY cond
) minprices(cond, price)
ON (items.price = minprices.price AND items.cond = minprices.cond)
WHERE items.product_id = 1 AND items.status = 'in_stock'
ORDER BY items.price;
Unlike the DISTINCT ON version, though, this will display multiple entries if the lowest priced item has more than one entry with the same cond and price.
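The duplicate-on-tie behaviour is easy to reproduce. This sketch runs the join-to-minimum approach in SQLite, which does not accept the minprices(cond, price) column-list alias, so the subquery names its column minprice instead. Rows 4 and 5 are hypothetical: row 4 ties with row 3, and row 5 belongs to another product, which is why the product/status filter is repeated on the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,"
    " status TEXT, cond TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "in_stock", "new", 9),
        (2, 1, "in_stock", "good", 5),
        (3, 1, "in_stock", "good", 3),
        (4, 1, "in_stock", "good", 3),  # hypothetical tie with row 3
        (5, 2, "in_stock", "good", 3),  # hypothetical row for another product
    ],
)

rows = conn.execute("""
    SELECT items.id, items.cond, items.price
    FROM items
    JOIN (
        SELECT cond, MIN(price) AS minprice
        FROM items
        WHERE product_id = 1 AND status = 'in_stock'
        GROUP BY cond
    ) AS minprices
      ON items.cond = minprices.cond AND items.price = minprices.minprice
    -- Without this filter, row 5 (product 2) would match the join too.
    WHERE items.product_id = 1 AND items.status = 'in_stock'
    ORDER BY items.price, items.id
""").fetchall()
print(rows)  # [(3, 'good', 3), (4, 'good', 3), (1, 'new', 9)]
```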
So you should really use the DISTINCT ON approach, but you need to understand how it works. Start with the PostgreSQL documentation for SELECT DISTINCT ON.
On a side note, PostgreSQL 9.1 and later let you refer to any column of a table whose primary key you've listed in GROUP BY, because they recognize that the other columns are functionally dependent on the primary key. So in those versions you don't have to aggregate the other columns if you've grouped by the PK. The SQL standard allows this, but older versions weren't smart enough to figure it out and required every column to be listed explicitly.
That's what people who ask this question usually want to know, but it doesn't strictly apply to your question, since it turns out you're trying to use GROUP BY to filter rows.

Related

Counting rows with different conditions in one query

Let's say I have this table:
id | status | type
----+--------+----------
1 | new | car
2 | new | boat
3 | used | car
4 | new | car
and I want to count all the new vehicles and the number of cars in one go. How can I do that?
I tried this:
SELECT COUNT(status='new'), COUNT(type='car') FROM table;
but it always counts to 4 (the total amount of rows). The only thing I can think of is using a CASE inside the COUNT, but is there a cleaner way?
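Your attempt returns 4 for both because COUNT(expr) counts every row where expr is not NULL, and false is not NULL. You could use FILTER (PostgreSQL 9.4 and later) to perform conditional aggregation: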
SELECT COUNT(*) FILTER(WHERE status='new'), COUNT(*) FILTER(WHERE type='car')
FROM tab;
Alternatively, SUM the comparisons cast to integers (true becomes 1, false becomes 0):
SELECT SUM((status='new')::int), SUM((type='car')::int) FROM tab;
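The NULL-counting rule is the whole trick, and a CASE expression without ELSE shows it directly: non-matching rows become NULL, so COUNT skips them. That form is a portable spelling of COUNT(*) FILTER (WHERE ...). This sketch uses SQLite purely as a stand-in database with the question's four rows and the table name tab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, status TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, "new", "car"), (2, "new", "boat"), (3, "used", "car"), (4, "new", "car")],
)

# The question's attempt: every comparison result is non-NULL, so all rows count.
naive = conn.execute(
    "SELECT COUNT(status = 'new'), COUNT(type = 'car') FROM tab"
).fetchone()

# CASE without ELSE yields NULL for non-matching rows, which COUNT ignores.
counts = conn.execute("""
    SELECT COUNT(CASE WHEN status = 'new' THEN 1 END),
           COUNT(CASE WHEN type = 'car' THEN 1 END)
    FROM tab
""").fetchone()
print(naive, counts)  # (4, 4) (3, 3)
```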

How do I write postgres conditional SELECT query?

I have a table that has 4 columns.
id | name | score | approve
--------------------
1 | foo | 90 | f
2 | foo | 80 | t
I want to SELECT id WHERE name='foo' with these conditions:
if approve is true, return that row (only one row will be approved for the same name)
otherwise, select the row with the highest score
I was looking into IF...ELSE but can't even come up with a query that executes, let alone a working one.
How to set up the query command for this type of queries?
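In SQL, you can often express this kind of logic by choosing the right ORDER BY and LIMIT. Here approve desc puts the approved row first (false sorts before true), and score desc then puts the highest score first among rows with the same approve value: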
select id
from my_table
where name = 'foo'
order by approve desc, score desc
limit 1
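A quick way to confirm both branches of the requirement is to run the query on two small data sets: one with an approved row and one without. This sketch uses SQLite as a stand-in, which stores booleans as 0/1; pick_id is a hypothetical helper. One PostgreSQL caveat: if approve can be NULL, DESC sorts NULLs first there, so you would want order by approve desc nulls last.

```python
import sqlite3


def pick_id(rows):
    """Return the id chosen by the ORDER BY ... LIMIT 1 query for name 'foo'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, name TEXT, score INTEGER, approve BOOLEAN)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    row = conn.execute("""
        SELECT id FROM my_table
        WHERE name = 'foo'
        ORDER BY approve DESC, score DESC
        LIMIT 1
    """).fetchone()
    conn.close()
    return row[0] if row else None


# Approved row wins even though its score is lower.
approved = pick_id([(1, "foo", 90, False), (2, "foo", 80, True)])
# With nothing approved, the highest score wins.
unapproved = pick_id([(1, "foo", 90, False), (2, "foo", 80, False)])
print(approved, unapproved)  # 2 1
```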

How to find duplicate rows in JPA

Is there a way to find duplicate entries in a data set using JPA?
| id | text |
-------------
| 1 | foo |
| 2 | bar |
| 3 | foo |
I want to have only entries 1 & 3 in my set.
I can't make it unique on this field.
DISTINCT would give me rows 1 & 2.
If it has to be a query, would a join with the same table work? I’m not sure how that would look, and I couldn’t get GROUP BY to work.
Without a subquery you can't select id next to the grouped text, because id is neither grouped nor aggregated. Grouping alone only gives you the duplicated values:
SELECT text, COUNT(*) FROM entity GROUP BY text HAVING COUNT(*) > 1
To get the rows themselves, you can apply the common SQL practice to JPQL with the following query:
SELECT e FROM Entity e WHERE e.text IN (SELECT d.text FROM Entity d GROUP BY d.text HAVING COUNT(d) > 1)
Because this needs a subquery, you'd want an index on the text column for it to be efficient.
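The plain-SQL shape behind that JPQL query can be checked against the question's three rows. This sketch assumes a table named entity with columns id and text and uses SQLite as a stand-in; the index mirrors the efficiency advice above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE INDEX entity_text_idx ON entity (text)")
conn.executemany(
    "INSERT INTO entity VALUES (?, ?)", [(1, "foo"), (2, "bar"), (3, "foo")]
)

# GROUP BY alone: one row per duplicated value, no ids.
dup_values = conn.execute(
    "SELECT text, COUNT(*) FROM entity GROUP BY text HAVING COUNT(*) > 1"
).fetchall()

# IN-subquery: every row whose text occurs more than once.
dup_ids = [
    r[0]
    for r in conn.execute("""
        SELECT id FROM entity
        WHERE text IN (SELECT text FROM entity GROUP BY text HAVING COUNT(*) > 1)
        ORDER BY id
    """)
]
print(dup_values, dup_ids)  # [('foo', 2)] [1, 3]
```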

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
And when I do that, it destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Marketing_Sparks"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
-- identifiers are double-quoted because the question's table uses mixed-case names
SELECT t."ID", agg."Account_ID", agg.max_iteration
FROM (SELECT "Account_ID", MAX("Iteration") AS max_iteration
FROM "Table_Name"
GROUP BY "Account_ID") AS agg
LEFT JOIN "Table_Name" AS t ON agg."Account_ID" = t."Account_ID" AND
agg.max_iteration = t."Iteration";
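As a sanity check, here is the join-back-to-the-maximum pattern run against the question's three rows, using SQLite as a stand-in database with the same quoted identifiers. It uses an inner join, which returns the same rows here because each aggregated (Account_ID, max_iteration) pair comes from at least one real row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Table_Name" ("ID" INTEGER PRIMARY KEY,'
    ' "Account_ID" INTEGER, "Iteration" INTEGER)'
)
conn.executemany(
    'INSERT INTO "Table_Name" VALUES (?, ?, ?)',
    [(1, 100, 1), (2, 101, 1), (3, 100, 2)],
)

# Aggregate without ID, then join back to recover the matching row's ID.
latest = conn.execute('''
    SELECT t."ID", agg."Account_ID", agg.max_iteration
    FROM (SELECT "Account_ID", MAX("Iteration") AS max_iteration
          FROM "Table_Name"
          GROUP BY "Account_ID") AS agg
    JOIN "Table_Name" AS t
      ON t."Account_ID" = agg."Account_ID"
     AND t."Iteration" = agg.max_iteration
    ORDER BY t."ID"
''').fetchall()
print(latest)  # [(2, 101, 1), (3, 100, 2)]
```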

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables, keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above.
Thanks!
You can use row_number() for the join, but there's no guarantee that the rows will come back in the same order as they were stored in the tables, so it's better to put an ORDER BY inside the OVER() clause.
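with cte1 as (
select
tableid1, tableid2, tableid3,
-- assumes tableid1 defines the intended row order
row_number() over (order by tableid1) as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4,
-- assumes table2id1 defines the intended row order
row_number() over (order by table2id1) as rn
from table2
)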
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
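Here is the row_number() pairing run against the sample data given further down, with an explicit ORDER BY in each OVER clause. The choice of tableid1 and table2id1 as the ordering columns is an assumption made for illustration; the pairing is only as meaningful as whatever column you order by. SQLite (3.25 or newer, for window functions) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (tableid1 INTEGER, tableid2 INTEGER, tableid3 INTEGER)")
conn.execute(
    "CREATE TABLE table2 (table2id1 INTEGER, table2id2 INTEGER,"
    " table2id3 INTEGER, table2id4 INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)", [(7, 8, 9, 15), (10, 11, 12, 19)]
)

# Number each side by an explicit order, then join on the row number.
rows = conn.execute("""
    WITH cte1 AS (
        SELECT tableid1, tableid2, tableid3,
               row_number() OVER (ORDER BY tableid1) AS rn
        FROM table1
    ), cte2 AS (
        SELECT table2id1, table2id2, table2id3, table2id4,
               row_number() OVER (ORDER BY table2id1) AS rn
        FROM table2
    )
    SELECT c1.tableid1, c1.tableid2, c1.tableid3,
           c2.table2id1, c2.table2id2, c2.table2id3, c2.table2id4
    FROM cte1 AS c1
    JOIN cte2 AS c2 ON c2.rn = c1.rn
    ORDER BY c1.rn
""").fetchall()
print(rows)  # [(1, 2, 3, 7, 8, 9, 15), (4, 5, 6, 10, 11, 12, 19)]
```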
You can't have what you want, as the question is written. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If they currently match up, they do so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order, so you can't rely on fetching rows in any particular order unless you use ORDER BY.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens, you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code, kittens will cry. If the tables have different numbers of rows it produces bizarre results (before PostgreSQL 10 you get the least common multiple of the two row counts; from 10 on, the shorter set is padded with NULLs), and even when the counts match, the rows may pair up correctly at first but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.