Limit query by count distinct column values - postgresql

I have a table with people, something like this:
ID PersonId SomeAttribute
1 1 yellow
2 1 red
3 2 yellow
4 3 green
5 3 black
6 3 purple
7 4 white
Previously I was returning all Person rows to the API as separate objects. So if the user set the limit to 3, I just set the query's maxResults in Hibernate to 3 and returned:
{"PersonID": 1, "attr":"yellow"}
{"PersonID": 1, "attr":"red"}
{"PersonID": 2, "attr":"yellow"}
and if someone specified a limit of 3 and page 2 (setMaxResults(3), setFirstResult(3)), it would be:
{"PersonID": 3, "attr":"green"}
{"PersonID": 3, "attr":"black"}
{"PersonID": 3, "attr":"purple"}
But now I want to select people and combine them into one JSON object per person, like this:
{
"PersonID":3,
"attrs": [
{"attr":"green"},
{"attr":"black"},
{"attr":"purple"}
]
}
And here is the problem. Is there any way in PostgreSQL or Hibernate to set the limit not by the number of rows but by the number of distinct person ids? If the user specifies a limit of 4, I should return persons 1, 2, 3 and 4, but my current limiting mechanism returns person 1 with 2 attributes, person 2, and person 3 with only one attribute. Pagination has the same problem: I can now return half of person 3's attrs array on one page and the other half on the next page.

You can use row_number to simulate LIMIT:
-- Test data
CREATE TABLE person AS
WITH tmp ("ID", "PersonId", "SomeAttribute") AS (
VALUES
(1, 1, 'yellow'::TEXT),
(2, 1, 'red'),
(3, 2, 'yellow'),
(4, 3, 'green'),
(5, 3, 'black'),
(6, 3, 'purple'),
(7, 4, 'white')
)
SELECT * FROM tmp;
-- Returning as normal columns (at most 3 attributes per person)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "ID") AS rownum
from
person) as tmp
WHERE rownum <= 3;
-- Returning as normal columns (at most 4 rows overall)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(ORDER BY "PersonId", "ID") AS rownum
from
person) as tmp
WHERE rownum <= 4;
-- Returning as a JSON column (at most 3 attributes per person)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "ID") AS rownum
from
person) as tmp
WHERE rownum <= 3 GROUP BY "PersonId";
-- Returning as a JSON column (at most 4 people)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute"
from
person) as tmp
GROUP BY "PersonId"
ORDER BY "PersonId"
LIMIT 4;
In this case, of course, you must use a native query, but this is a small trade-off IMHO.
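Because the last query groups before it limits, LIMIT counts people rather than rows, and adding OFFSET pages by person as well. A minimal sketch of that idea, using Python's built-in sqlite3 module with SQLite as a stand-in for PostgreSQL (so group_concat replaces json_object_agg; the snake_case column names and the page_of_people helper are hypothetical):

```python
import sqlite3

# Hypothetical in-memory copy of the question's test data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, person_id INTEGER, some_attribute TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 1, "yellow"), (2, 1, "red"), (3, 2, "yellow"), (4, 3, "green"),
     (5, 3, "black"), (6, 3, "purple"), (7, 4, "white")],
)

def page_of_people(page, per_page):
    """Return one page of people; LIMIT/OFFSET apply to groups, not raw rows."""
    rows = conn.execute(
        """
        SELECT person_id, group_concat(some_attribute, ',') AS attrs
        FROM person
        GROUP BY person_id
        ORDER BY person_id          -- a deterministic order is required for paging
        LIMIT ? OFFSET ?
        """,
        (per_page, (page - 1) * per_page),
    ).fetchall()
    # group_concat does not guarantee element order, so sort for stable output.
    return [(pid, sorted(attrs.split(","))) for pid, attrs in rows]

print(page_of_people(1, 4))
print(page_of_people(2, 2))
```

Person 3's three attributes always land on the same page. In PostgreSQL, json_agg or array_agg would take the place of group_concat.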

I'm assuming you have a separate Person table. With JPA, you should run the query on the Person table (the one side), not on PersonColor (the many side). Then the limit will be applied to the number of Person rows.
If you don't have a Person table and can't modify the DB, you can use SQL to GROUP BY PersonId and aggregate the colors:
SELECT PersonId, array_agg(Color) FROM my_table GROUP BY PersonId ORDER BY PersonId LIMIT 2

Thank you, everyone. After realizing that it couldn't be done with a single query, I did something like this:
temp_query = SELECT DISTINCT x.person_id FROM (my_original_query) x ORDER BY x.person_id
with LIMIT/OFFSET taken from the user's page/per_page,
and then:
my_original_query += " AND person_id IN (temp_query_results)"
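The two-step approach above can be sketched as follows, again with SQLite standing in for PostgreSQL. ORIGINAL_QUERY and fetch_page are hypothetical names; the ids from step 1 are passed to step 2 as bound parameters rather than concatenated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, person_id INTEGER, some_attribute TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 1, "yellow"), (2, 1, "red"), (3, 2, "yellow"), (4, 3, "green"),
     (5, 3, "black"), (6, 3, "purple"), (7, 4, "white")],
)

# Hypothetical stand-in for "my_original_query"; it ends in a WHERE clause
# so that " AND ..." can be appended to it.
ORIGINAL_QUERY = "SELECT id, person_id, some_attribute FROM person WHERE 1 = 1"

def fetch_page(page, per_page):
    # Step 1: page over the distinct person ids produced by the original query.
    ids = [row[0] for row in conn.execute(
        f"SELECT DISTINCT x.person_id FROM ({ORIGINAL_QUERY}) x "
        "ORDER BY x.person_id LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    )]
    if not ids:
        return []
    # Step 2: re-run the original query restricted to exactly those people.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"{ORIGINAL_QUERY} AND person_id IN ({placeholders}) ORDER BY person_id, id",
        ids,
    ).fetchall()

print(fetch_page(2, 2))
```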

Related

PostgreSQL sum some values together and don't for other

SELECT
t.id,
sum(o.amount),
t.parent_id
FROM tab t
LEFT JOIN "order" o ON o.deal = t.id
GROUP BY t.id
Current output:
 id | sum | parent_id
----+-----+-----------
  1 |  10 |
  2 |  10 |
  3 |  15 |         5
  4 |  30 |         5
  5 |   0 |
  6 |   0 |         8
  7 |   0 |         8
  8 |  20 |
Desired logic: if a row has a parent_id, don't show it on its own; instead, add its sum to its parent's row. So for ids 3, 4 and 5 the total would be 45, and only id 5 would be shown. The sums can be on the "sub tabs" or on the "main tab", but everything should be summed together.
Desired output:
 id | sum | parent_id
----+-----+-----------
  1 |  10 |
  2 |  10 |
  5 |  45 |
  8 |  20 |
So far I have tried sub-selects and played around with GROUP BY. Can someone point me in the right direction?
Use coalesce().
with the_data(id, sum, parent_id) as (
values
(1, 10, null),
(2, 10, null),
(3, 15, 5),
(4, 30, 5),
(5, 0, null),
(6, 0, 8),
(7, 0, 8),
(8, 20, null)
)
select coalesce(parent_id, id) as id, sum(sum)
from the_data
group by 1
order by 1
Read about COALESCE in the PostgreSQL documentation.
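The coalesce(parent_id, id) step maps every child row onto its parent's id and every top-level row onto its own id, so grouping by that value merges each family into one row. The same rule as a plain-Python sketch (the rollup name is hypothetical; a dict stands in for GROUP BY and None for NULL):

```python
from collections import defaultdict

# The (id, sum, parent_id) rows from the_data above; None plays the role of NULL.
the_data = [(1, 10, None), (2, 10, None), (3, 15, 5), (4, 30, 5),
            (5, 0, None), (6, 0, 8), (7, 0, 8), (8, 20, None)]

def rollup(rows):
    totals = defaultdict(int)
    for row_id, amount, parent_id in rows:
        # Same rule as coalesce(parent_id, id): the parent if there is one, else the row itself.
        key = parent_id if parent_id is not None else row_id
        totals[key] += amount
    return sorted(totals.items())  # mirrors "order by 1"

print(rollup(the_data))
```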
Your query may not be valid in PostgreSQL:
SELECT
t.id,
sum(o.amount),
t.parent_id
FROM tab t
LEFT JOIN "order" o ON o.deal = t.id
GROUP BY t.id
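Unlike MySQL, PostgreSQL doesn't accept non-aggregated SELECT columns that are missing from GROUP BY, unless they are functionally dependent on the grouped columns (for example, when t.id is the primary key of tab).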
Anyway, if you're using t.id in your GROUP BY clause, then each t.id will produce one row, so you'll always have 3 and 4 separated, for example.
It looks like you're trying to use the parent_id as the main criterion to group by, falling back on the id when the parent_id is NULL.
You could use COALESCE(t.parent_id, t.id) to get this value for each row, and then group using it.
For example:
SELECT
COALESCE(t.parent_id, t.id),
SUM(o.amount)
FROM tab t
LEFT JOIN "order" o ON o.deal = t.id
GROUP BY COALESCE(t.parent_id, t.id)
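Applied to the original join, the same grouping can be checked with a small SQLite sketch. The tab and "order" rows below are hypothetical test data chosen to reproduce the desired output; tab 9 has no orders at all, which is why SUM is wrapped in COALESCE (SUM over only NULLs returns NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id INTEGER PRIMARY KEY, parent_id INTEGER);
    -- "order" is a reserved word, so the table name has to be quoted.
    CREATE TABLE "order" (deal INTEGER, amount INTEGER);
""")
conn.executemany("INSERT INTO tab VALUES (?, ?)",
                 [(1, None), (2, None), (3, 5), (4, 5), (5, None),
                  (6, 8), (7, 8), (8, None), (9, None)])
conn.executemany('INSERT INTO "order" VALUES (?, ?)',
                 [(1, 10), (2, 4), (2, 6), (3, 15), (4, 30), (8, 20)])

rows = conn.execute("""
    SELECT COALESCE(t.parent_id, t.id) AS id,
           COALESCE(SUM(o.amount), 0) AS total  -- a group with no orders would otherwise be NULL
    FROM tab t
    LEFT JOIN "order" o ON o.deal = t.id
    GROUP BY COALESCE(t.parent_id, t.id)
    ORDER BY 1
""").fetchall()
print(rows)
```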

Convert jsonb in PostgreSQL to rows without cycle

I have a JSON array stored in my Postgres database. The first table "Orders" looks like this:
order_id, basket_items_id
1, {1,2}
2, {3}
3, {1,2,3,1}
Second table "Items" looks like this:
item_id, price
1,5
2,3
3,20
I already tried loading the data with multiple SQL queries and selecting the different jsonb records, but that is not a silver bullet.
SELECT
sum(price)
FROM orders
INNER JOIN items on
orders.basket_items_id = items.item_id
WHERE order_id = 3;
I want to get this as output:
order_id, basket_items_id, price
1, 1, 5
1, 2, 3
2, 3, 20
3, 1, 5
3, 2, 3
3, 3, 20
3, 1, 5
or this:
order_id, sum(price)
1, 8
2, 20
3, 33
SELECT
o.order_id,
elems.value::int as basket_items_id,
i.price
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
ORDER BY 1,2,3
jsonb_array_elements_text expands the jsonb array into one row per element. This lets you join against your second table directly.
Since the expanded array gives you text elements, you have to cast them to integers using ::int.
Of course you can GROUP and SUM aggregate this as well:
SELECT
o.order_id,
SUM(i.price)
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
GROUP BY o.order_id
ORDER BY 1
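What jsonb_array_elements_text plus the join does can be mirrored in plain Python. A sketch assuming basket_items_id holds valid JSON arrays such as [1, 2] (the {1,2} shown in the question is array-literal syntax, not JSON); the dict names are hypothetical:

```python
import json

# Hypothetical copies of the two tables; basket_items_id is stored as JSON text.
orders = {1: "[1, 2]", 2: "[3]", 3: "[1, 2, 3, 1]"}
prices = {1: 5, 2: 3, 3: 20}

# One output row per array element, duplicates kept; prices.get returns None
# (like the NULL from LEFT JOIN) when an item id has no match.
# json.loads already yields Python ints, so no ::int-style cast is needed here.
rows = [(order_id, item, prices.get(item))
        for order_id, basket in sorted(orders.items())
        for item in json.loads(basket)]

# GROUP BY order_id with SUM(price); SUM() ignores NULLs, so None is skipped.
totals = {}
for order_id, _, price in rows:
    if price is not None:
        totals[order_id] = totals.get(order_id, 0) + price

print(rows)
print(totals)
```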
Is your orders.basket_items_id column of type jsonb or int[]?
If the type is jsonb, you can use jsonb_array_elements_text to expand the column:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
jsonb_array_elements_text(basket_items_id)::int basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
If the type is int[] (array of integers), you can run a similar query with the unnest function:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
unnest(basket_items_id) basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
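Note that the first answer uses LEFT JOIN while these two queries use an inner JOIN. The difference only shows when a basket references an item that is missing from items: LEFT JOIN keeps that entry with a NULL price, while an inner JOIN drops it. A plain-Python sketch with a hypothetical order 4 that references a nonexistent item 99:

```python
# Hypothetical data: order 4 references item 99, which is not in the items table.
baskets = {1: [1, 2], 2: [3], 3: [1, 2, 3, 1], 4: [99]}
prices = {1: 5, 2: 3, 3: 20}

# Like unnest: one (order_id, item_id) pair per array element.
expanded = [(o, i) for o, items in sorted(baskets.items()) for i in items]

# Inner JOIN: keep only the pairs that have a matching item.
inner = [(o, i, prices[i]) for o, i in expanded if i in prices]
# LEFT JOIN: keep every pair; the price is None (NULL) when nothing matches.
left = [(o, i, prices.get(i)) for o, i in expanded]

print(len(inner), len(left))
```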

Select values that exist in an array but do not exist in the database?

I have a DB with IDs 1, 2, 3, 4, 5. I need to return the elements that exist in my array (a simple list of values, like the one usually specified in IN ( ... )) but DO NOT exist in the DB.
For example checking values: 1, 2, 3, 4, 5, 6, 7.
So the query should return 6, 7. How can I do this with PostgreSQL?
This can be solved using EXCEPT:
select *
from unnest(array[1,2,3,4,5,6]) as t(id)
except
select id
from the_table
With some test data:
select *
from unnest(array[1,2,3,4,5,6]) as t(id)
except
select id
from (values (1), (2), (3), (4) ) as the_table(id)
returns
id
--
5
6
If you want a query that excludes all elements in a list, you can use the NOT IN operator.
SELECT * FROM someTable WHERE id NOT IN (1, 2, 3, 4, 5);
In your case you can create the query from your array.
with t (id) as (values (1),(2),(3),(4),(5))
select u.id
from
t
right join
unnest(array[1,2,3,4,5,6,7]) u (id) on t.id = u.id
where t.id is null
;
id
----
6
7
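Both the EXCEPT answer and the outer-join answer can be tried with a small SQLite sketch. SQLite has no unnest(array[...]) (and supports RIGHT JOIN only since 3.39), so the list becomes a VALUES CTE and the right join is written as the equivalent LEFT JOIN with the sides swapped; the_table and its contents are test data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO the_table VALUES (?)", [(i,) for i in range(1, 6)])

candidates = [1, 2, 3, 4, 5, 6, 7]
# One "(?)" row per candidate value, bound as parameters.
values_sql = ", ".join("(?)" for _ in candidates)

# Set difference: candidate ids minus ids present in the table.
except_ids = [r[0] for r in conn.execute(
    f"WITH wanted(id) AS (VALUES {values_sql}) "
    "SELECT id FROM wanted EXCEPT SELECT id FROM the_table ORDER BY id",
    candidates)]

# Anti-join: keep candidates for which the outer join found no table row.
anti_join_ids = [r[0] for r in conn.execute(
    f"WITH wanted(id) AS (VALUES {values_sql}) "
    "SELECT w.id FROM wanted w LEFT JOIN the_table t ON t.id = w.id "
    "WHERE t.id IS NULL ORDER BY w.id",
    candidates)]

print(except_ids, anti_join_ids)
```

One design difference: EXCEPT also removes duplicates from the list, while the anti-join would return a missing value once per occurrence.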

Summarizing Only Rows with given criteria

Hi all!
Given the following table structure
CREATE TABLE #TempTable
(
idProduct INT,
Layers INT,
LayersOnPallet INT,
id INT IDENTITY(1, 1) NOT NULL,
Summarized BIT NOT NULL DEFAULT(0)
)
and the following insert statement which generates test data
INSERT INTO #TempTable(idProduct, Layers, LayersOnPallet)
SELECT 1, 2, 4
UNION ALL
SELECT 1, 2, 4
UNION ALL
SELECT 1, 1, 4
UNION ALL
SELECT 2, 2, 4
I would like to summarize (by Layers only) just those rows that have the same idProduct and whose sum of Layers equals LayersOnPallet.
A picture is worth a thousand words:
From the picture above, you can see that only the first two rows were summarized, because both have the same idProduct and their sum(Layers) equals LayersOnPallet.
How can I achieve this? Is there any way to do this only with SELECTs (without a WHILE loop)?
Thank you!
Perhaps this will do the trick. Note my comments:
-- your sample data
CREATE TABLE #TempTable
(
idProduct INT,
Layers INT,
LayersOnPallet INT,
id INT IDENTITY(1, 1) NOT NULL,
Summarized BIT NOT NULL DEFAULT(0)
)
INSERT INTO #TempTable(idProduct, Layers, LayersOnPallet)
SELECT 1, 2, 4 UNION ALL
SELECT 1, 2, 4 UNION ALL
SELECT 1, 1, 4 UNION ALL
SELECT 2, 2, 4;
-- an intermediate temp table used for processing
IF OBJECT_ID('tempdb..#processing') IS NOT NULL DROP TABLE #processing;
-- let's populate the #processing table with duplicates
SELECT
idProduct,
Layers,
LayersOnPallet,
rCount = COUNT(*)
INTO #processing
FROM #TempTable
GROUP BY
idProduct,
Layers,
LayersOnPallet
HAVING COUNT(*) > 1;
-- Remove the duplicates
DELETE t
FROM #TempTable t
JOIN #processing p
ON p.idProduct = t.idProduct
AND p.Layers = t.Layers
AND p.LayersOnPallet = t.LayersOnPallet;
-- Add the new, merged record (Summarized = 1)
INSERT INTO #TempTable (idProduct, Layers, LayersOnPallet, Summarized)
SELECT
idProduct,
Layers * rCount,
LayersOnPallet, 1
FROM #processing;
DROP TABLE #processing; -- cleanup
-- Final output
SELECT idProduct, Layers, LayersOnPallet, Summarized
FROM #TempTable;
Results:
idProduct Layers LayersOnPallet Summarized
----------- ----------- -------------- ----------
1 4 4 1
1 1 4 0
2 2 4 0
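The core of the answer is: group identical (idProduct, Layers, LayersOnPallet) rows, and replace every group that has more than one row by a single row whose Layers is multiplied by the count and whose Summarized flag is 1. A plain-Python sketch of that logic (merge_duplicates is a hypothetical name). Note that, as written, the answer merges exact duplicates; it does not itself compare the merged Layers with LayersOnPallet.

```python
from collections import Counter

# Sample rows from the question: (idProduct, Layers, LayersOnPallet), all with Summarized = 0.
rows = [(1, 2, 4), (1, 2, 4), (1, 1, 4), (2, 2, 4)]

def merge_duplicates(rows):
    # Like GROUP BY idProduct, Layers, LayersOnPallet with COUNT(*).
    counts = Counter(rows)
    result = []
    for (product, layers, on_pallet), r_count in counts.items():
        if r_count > 1:
            # HAVING COUNT(*) > 1: one merged row with Layers * rCount and Summarized = 1.
            result.append((product, layers * r_count, on_pallet, 1))
        else:
            result.append((product, layers, on_pallet, 0))
    return sorted(result)

print(merge_duplicates(rows))
```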

Select all but sort by count in postgresql

I have a table myTable with a lot of columns (keep in mind that the table is very big), and one of those columns is a geometry point; let's call it mySortColumn. I need to sort my SELECT by how many rows share the same mySortColumn value.
One example could be this
myTable
id, mySortColumn
----------------
1, ASD12321F
2, ASD12321G
3, ASD12321F
4, ASD12321G
5, ASD12321H
6, ASD12321F
I have a query that does what I want; the problem is the time. It currently takes about 30 seconds, and it looks like this:
SELECT
myTable.id,
myTable.mySortColumn
FROM
myTable
JOIN (
SELECT
mySortColumn,
ST_Y(mySortColumn) AS lat,
ST_X(mySortColumn) AS lng,
COUNT(*)
FROM myTable
GROUP BY mySortColumn
HAVING COUNT(*) > 1
) AS myPosition ON (
ST_X(myTable.mySortColumn) = myPosition.lng
AND ST_Y(myTable.mySortColumn) = myPosition.lat
)
WHERE
<some filters>
ORDER BY COUNT DESC
The result must be this:
id, mySortColumn
----------------
1, ASD12321F
3, ASD12321F
6, ASD12321F
2, ASD12321G
4, ASD12321G
5, ASD12321H
I hope you can help me.
Here you are:
select * from myTable order by count(1) over (partition by mySortColumn) desc, mySortColumn, id;
For more information about the aggregate OVER () construction (window functions), have a look at:
http://www.postgresql.org/docs/9.4/static/tutorial-window.html
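The window-function answer replaces the self-join with a single pass: COUNT(*) OVER (PARTITION BY ...) attaches each group's size to every row. A sketch in SQLite (3.25+ is needed for window functions), assuming plain text in place of the geometry column; ties are broken by mySortColumn and id so rows with the same value stay together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain text stands in for the geometry column here (an assumption; no PostGIS involved).
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, mySortColumn TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    (1, "ASD12321F"), (2, "ASD12321G"), (3, "ASD12321F"),
    (4, "ASD12321G"), (5, "ASD12321H"), (6, "ASD12321F"),
])

# The window COUNT gives every row the size of its group, with no self-join.
rows = conn.execute("""
    SELECT id, mySortColumn
    FROM myTable
    ORDER BY COUNT(*) OVER (PARTITION BY mySortColumn) DESC, mySortColumn, id
""").fetchall()
print(rows)
```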