I have got a DB with IDs 1, 2, 3, 4, 5. I need to return the elements that exist in my array (a simple list of values, the kind you would usually put in an IN ( ... ) clause) but do NOT exist in the DB.
For example, checking the values 1, 2, 3, 4, 5, 6, 7, the query should return 6 and 7. How can I do this with PostgreSQL?
This can be solved using EXCEPT:
select *
from unnest(array[1,2,3,4,5,6]) as t(id)
except
select id
from the_table
With some test data:
select *
from unnest(array[1,2,3,4,5,6]) as t(id)
except
select id
from (values (1), (2), (3), (4) ) as the_table(id)
returns
id
--
5
6
If you want a query that excludes all elements in a list, you can use NOT IN:
SELECT * FROM someTable WHERE id NOT IN (1, 2, 3, 4, 5);
In your case you can create the query from your array.
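For the original direction (array values missing from the table), the same idea works the other way around: unnest the array and apply NOT IN against the table. A minimal sketch, assuming the table is named the_table as in the first answer:
select u.id
from unnest(array[1,2,3,4,5,6,7]) as u(id)
where u.id not in (select id from the_table);
Note that NOT IN misbehaves if the subquery can return NULLs; NOT EXISTS or the EXCEPT approach above sidesteps that.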
Another option is an anti-join: expand the array with unnest and keep only the values that have no match in the table:
with t (id) as (values (1),(2),(3),(4),(5))
select u.id
from t
right join unnest(array[1,2,3,4,5,6,7]) u (id) on t.id = u.id
where t.id is null;
id
----
6
7
I have a json array stored in my postgres database. The first table "Orders" looks like this:
order_id, basket_items_id
1, {1,2}
2, {3}
3, {1,2,3,1}
Second table "Items" looks like this:
item_id, price
1,5
2,3
3,20
I have already tried loading the data with multiple SQL statements and selecting the individual jsonb records, but that is not a silver bullet:
SELECT
sum(price)
FROM orders
INNER JOIN items on
orders.basket_items_id = items.item_id
WHERE order_id = 3;
I want to get this as output:
order_id, basket_items_id, price
1, 1, 5
1, 2, 3
2, 3, 20
3, 1, 5
3, 2, 3
3, 3, 20
3, 1, 5
or this:
order_id, sum(price)
1, 8
2, 20
3, 33
SELECT
o.order_id,
elems.value::int as basket_items_id,
i.price
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
ORDER BY 1,2,3
jsonb_array_elements_text expands the jsonb array into one row per element. With this you can join against your second table directly.
Since the expanded array gives you text elements, you have to cast them to integers using ::int.
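To see what the expansion produces, you can run the function on a literal array (a standalone example, not tied to the tables above):
select * from jsonb_array_elements_text('[1, 2, 3, 1]'::jsonb);
This returns the four elements as text, one per row, including the duplicate.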
Of course you can GROUP and SUM aggregate this as well:
SELECT
o.order_id,
SUM(i.price)
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
GROUP BY o.order_id
ORDER BY 1
Is your orders.basket_items_id column of type jsonb or int[]?
If the type is jsonb, you can use jsonb_array_elements_text to expand the column:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
jsonb_array_elements_text(basket_items_id)::int basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
If the type is int[] (array of integers), you can run a similar query with the unnest function:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
unnest(basket_items_id) basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
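If you are not sure which of the two types the column actually has, you can check the catalog (a standalone sketch, assuming the table and column names from the question; array types show up as 'ARRAY'):
SELECT data_type
FROM information_schema.columns
WHERE table_name = 'orders'
  AND column_name = 'basket_items_id';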
I have a table with linestrings that I want to divide into chunks, each holding a list of at most a given number of ids, keeping only lines that are within a certain distance of each other.
For example, I have a table with 14 rows:
create table lines ( id integer primary key, geom geometry(linestring) );
insert into lines (id, geom) values ( 1, 'LINESTRING(0 0, 0 1)');
insert into lines (id, geom) values ( 2, 'LINESTRING(0 1, 1 1)');
insert into lines (id, geom) values ( 3, 'LINESTRING(1 1, 1 2)');
insert into lines (id, geom) values ( 4, 'LINESTRING(1 2, 2 2)');
insert into lines (id, geom) values ( 11, 'LINESTRING(2 2, 2 3)');
insert into lines (id, geom) values ( 12, 'LINESTRING(2 3, 3 3)');
insert into lines (id, geom) values ( 13, 'LINESTRING(3 3, 3 4)');
insert into lines (id, geom) values ( 14, 'LINESTRING(3 4, 4 4)');
create index lines_gix on lines using gist(geom);
I want to split it into chunks of at most 3 ids each, where the lines are within 2 meters of each other or of the first one.
The result I am trying to get from this example is:
| Chunk No.| Id chunk list |
|----------|----------------|
| 1 | 1, 2, 3 |
| 2 | 4, 5, 6 |
| 3 | 7, 8, 9 |
| 4 | 10, 11, 12 |
| 5 | 13, 14 |
I tried to use ST_ClusterWithin, but when lines are close to each other it returns all of them in a single cluster instead of splitting them into chunks.
I also tried some WITH RECURSIVE magic like the one from the answer provided by Paul Ramsey here. But I don't know how to modify the query so it returns a limited, grouped id list.
I am not sure this is the best possible answer, so if anyone has a better method or knows how to improve it, feel free to update it. With a little modification of Paul's answer, I managed to create the following queries that do what I asked for.
-- Create function for easier interaction
CREATE OR REPLACE FUNCTION find_connected(integer, double precision, integer, integer[])
returns integer[] AS
$$
WITH RECURSIVE lines_r AS -- the recursive CTE repeatedly appends to its own output and queries it again
(SELECT ARRAY[id] AS idlist,
geom, id
FROM lines
WHERE id = $1
UNION ALL
SELECT array_append(lines_r.idlist, lines.id) AS idlist, -- append id list to array
lines.geom AS geom, -- keep geometry
lines.id AS id -- keep source table id
FROM (SELECT * FROM lines WHERE NOT $4 #> array[id]) lines, lines_r -- from the source table (minus already-used ids) and the recursive table
WHERE ST_DWITHIN(lines.geom, lines_r.geom, $2) -- where lines are within the given distance
AND NOT lines_r.idlist #> ARRAY[lines.id] -- and the id is not already in the recursive id list
AND array_length(idlist, 1) <= $3 -- and the chunk is not yet full
)
SELECT idlist
FROM lines_r WHERE array_length(idlist, 1) <= $3 ORDER BY array_length(idlist, 1) DESC LIMIT 1;
$$
LANGUAGE sql;
-- Create id chunks
WITH RECURSIVE groups_r AS (
(SELECT find_connected(id, 2, 3, ARRAY[id]) AS idlist, find_connected(id, 2, 3, ARRAY[id]) AS grouplist, id
FROM lines WHERE id = 1)
UNION ALL
(SELECT array_cat(groups_r.idlist, find_connected(lines.id, 2, 3, groups_r.idlist)) AS idlist,
find_connected(lines.id, 2, 3, groups_r.idlist) AS grouplist,
lines.id
FROM lines,
groups_r
WHERE NOT groups_r.idlist #> ARRAY[lines.id]
LIMIT 1))
SELECT
-- (SELECT array_agg(DISTINCT x) FROM unnest(idlist) t (x)) idlist, -- left for better understanding what is happening
row_number() OVER () chunk_id,
(SELECT array_agg(DISTINCT x) FROM unnest(grouplist) t (x)) grouplist,
id input_line_id
FROM groups_r;
The only problem is that performance gets quite poor as the number of ids per chunk increases. For a table with 300 rows and 20 ids per chunk, execution time is around 15 minutes, even with indexes on the geometry and id columns.
I have a table with people, something like this:
ID PersonId SomeAttribute
1 1 yellow
2 1 red
3 2 yellow
4 3 green
5 3 black
6 3 purple
7 4 white
Previously I was returning all Persons to the API as separate objects. So if the user set the limit to 3, I was just setting the query's maxResults in Hibernate to 3 and returning:
{"PersonID": 1, "attr":"yellow"}
{"PersonID": 1, "attr":"red"}
{"PersonID": 2, "attr":"yellow"}
and if someone specified limit 3 and page 2 (setMaxResults(3), setFirstResult(6)) it would be:
{"PersonID": 3, "attr":"green"}
{"PersonID": 3, "attr":"black"}
{"PersonID": 3, "attr":"purple"}
But now I want to select people and combine them into one json object that looks like this:
{
"PersonID":3,
"attrs": [
{"attr":"green"},
{"attr":"black"},
{"attr":"purple"}
]
}
And here is the problem. Is there any way in PostgreSQL or Hibernate to set the limit not by the number of rows but by the number of distinct person ids? If the user specifies a limit of 4, I should return persons 1, 2, 3 and 4, but with my current limiting mechanism I return person 1 with 2 attributes, person 2, and person 3 with only one attribute. The same problem applies to pagination: right now I can return half of person 3's attrs array on one page and the other half on the next.
You can use row_number to simulate LIMIT:
-- Test data
CREATE TABLE person AS
WITH tmp ("ID", "PersonId", "SomeAttribute") AS (
VALUES
(1, 1, 'yellow'::TEXT),
(2, 1, 'red'),
(3, 2, 'yellow'),
(4, 3, 'green'),
(5, 3, 'black'),
(6, 3, 'purple'),
(7, 4, 'white')
)
SELECT * FROM tmp;
-- Returning as a normal column (limit by someAttribute size)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 3;
-- Returning as a normal column (overall limit)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 4;
-- Returning as a JSON column (limit by someAttribute size)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 3 GROUP BY "PersonId";
-- Returning as a JSON column (limit by person)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute"
from
person) as tmp
GROUP BY "PersonId"
LIMIT 4;
In this case, of course, you must use a native query, but this is a small trade-off IMHO.
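If you want the attrs array shape from the question rather than a repeated 'color' key, json_agg with json_build_object gets closer; a sketch against the same test table (limiting by person, as above):
SELECT
    "PersonId",
    json_agg(json_build_object('attr', "SomeAttribute")) AS attrs
FROM person
GROUP BY "PersonId"
ORDER BY "PersonId"
LIMIT 4;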
I'm assuming you have another Person table. With JPA, you should do the query on the Person table (the one side), not on the PersonColor (the many side). The limit will then be applied to the number of rows of Person.
If you don't have the Person table and can't modify the DB, what you can do is use SQL to group by PersonId and aggregate the colors:
select PersonId, array_agg(Color) FROM my_table group by PersonId limit 2
Thank you guys. After I realized that it could not be done with one query, I just do something like
temp_query = select distinct x.person_id from (my_original_query) x
with the user-specified page/per_page,
and then:
my_original_query += " AND person_id in (temp_query_results)"
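Spelled out against the person test table from the earlier answer, that two-step approach looks roughly like this (a sketch; LIMIT 3 OFFSET 0 stands in for the user's per_page and page values):
SELECT *
FROM person
WHERE "PersonId" IN (
    SELECT DISTINCT "PersonId"
    FROM person
    ORDER BY "PersonId"
    LIMIT 3 OFFSET 0
)
ORDER BY "PersonId";
This returns persons 1, 2 and 3 with all of their attributes, so no person is ever split across pages.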
I have a table myTable with a lot of columns (keep in mind this table is quite big), and one of those columns is a geometry point; we'll call it mySortColumn. I need to sort my select by how many rows share the same mySortColumn value.
One example could be this
myTable
id, mySortColumn
----------------
1, ASD12321F
2, ASD12321G
3, ASD12321F
4, ASD12321G
5, ASD12321H
6, ASD12321F
I have a query which does what I want; the problem is the time. It currently takes about 30 seconds, and it looks like this:
SELECT
id,
mySortColumn
FROM
myTable
JOIN (
SELECT
mySortColumn,
ST_Y(mySortColumn) AS lat,
ST_X(mySortColumn) AS lng,
COUNT(*)
FROM myTable
GROUP BY mySortColumn
HAVING COUNT(*) > 1
) AS myPosition ON (
ST_X(myTable.mySortColumn) = myPosition.lng
AND ST_Y(myTable.mySortColumn) = myPosition.lat
)
WHERE
<some filters>
ORDER BY COUNT DESC
The result must be this:
id, mySortColumn
----------------
1, ASD12321F
3, ASD12321F
6, ASD12321F
2, ASD12321G
4, ASD12321G
5, ASD12321H
I hope you can help me.
Here you are:
select * from myTable order by count(1) over (partition by mySortColumn) desc;
For more info about the aggregate OVER () construct, have a look at:
http://www.postgresql.org/docs/9.4/static/tutorial-window.html
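If you want to see the counts that drive the ordering, the same window expression can be exposed as a column; a sketch against the sample table above (the extra tie-breakers are just to make the output deterministic):
select id,
       mySortColumn,
       count(*) over (partition by mySortColumn) as cnt
from myTable
order by cnt desc, mySortColumn, id;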
Every once in a while I have a scenario like this, and I can never come up with the most efficient query to pull in the information:
Let's say we have a table with three columns (A int, B int, C int). My query needs to answer a question like this: "Tell me what the value of column C is for the largest value of column B where A = 5." A real world scenario for something like this would be 'A' is your users, 'B' is the date something happened, and 'C' is the value, where you want the most recent entry for a specific user.
I always end up with a query like this:
SELECT
C
FROM
MyTable
WHERE
A = 5
AND B = (SELECT MAX(B) FROM MyTable WHERE A = 5)
What am I missing to do this in a single query (as opposed to nesting them)? Some sort of HAVING clause?
BoSchatzberg's answer works when you only care about the 1 result where A=5. But I suspect this question is the result of a more general case. What if you want to list the top record for each distinct value of A?
SELECT t1.*
FROM MyTable t1
INNER JOIN
(
SELECT A, MAX(B)
FROM MyTable
GROUP BY A
) t2 ON t1.A = t2.A AND t1.B = t2.B
--
SELECT C
FROM MyTable
INNER JOIN (SELECT A, MAX(B) AS MAX_B FROM MyTable GROUP BY A) AS X
ON MyTable.A = X.A
AND MyTable.B = MAX_B
--
WHERE MyTable.A = 5
In this case the first section (between the comments) can also easily be moved into a view for modularity or reuse.
You can do this:
SELECT TOP 1 C
FROM MyTable
WHERE A = 5
ORDER BY b DESC
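TOP is SQL Server syntax; in PostgreSQL the same idea is spelled with LIMIT (a sketch):
SELECT C
FROM MyTable
WHERE A = 5
ORDER BY B DESC
LIMIT 1;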
I think you are close (and what you have would work). You could use something like the following:
select C
, max(B)
from MyTable
where A = 5
group by C
After a little bit of testing, I don't think that this can be done without doing it the way you're already doing it (i.e. a subquery). Since you need the max of B and you can't get the value of C without also including that in a GROUP BY or HAVING clause, a subquery seems to be the best way.
create table #tempints (
a int,
b int,
c int
)
insert into #tempints values (1, 8, 10)
insert into #tempints values (1, 8, 10)
insert into #tempints values (2, 4, 10)
insert into #tempints values (5, 8, 10)
insert into #tempints values (5, 3, 10)
insert into #tempints values (5, 7, 10)
insert into #tempints values (5, 8, 15)
/* this errors out with "Column '#tempints.c' is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause." */
select t1.c, max(t1.b)
from #tempints t1
where t1.a=5
/* this errors with "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING
clause or a select list, and the column being aggregated is an outer reference." */
select t1.c, max(t1.b)
from #tempints t1, #tempints t2
where t1.a=5 and t2.b=max(t1.b)
/* errors with "Column '#tempints.a' is invalid in the HAVING clause because it is not contained in either an aggregate
function or the GROUP BY clause." */
select c
from #tempints
group by b, c
having a=5 and b=max(b)
drop table #tempints
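For reference, ROW_NUMBER (already used elsewhere on this page) also solves this in one statement, though it still nests a derived table; a sketch against the same #tempints layout, run before the table is dropped:
select t.c
from (
    select c, row_number() over (order by b desc) as rn
    from #tempints
    where a = 5
) t
where rn = 1;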