Postgres: single record with one-to-many join tables

I have two tables, a and b. Table 'a' is the main table I want to select from, and table 'b' is used only for filtering.
Below are sample tables with some data.
create table a (
id varchar primary key,
name varchar,
p varchar[]
);
insert into a (id, name, p) values
('1', 'v1', array['p1']),
('2', 'v2', array['p1','p2']),
('3', 'v3', array['p2','p3']);
create table b (
p varchar,
x varchar
);
insert into b(p, x) values
('p1', 'x1'),
('p2', 'x2'),
('p3', 'x1'),
('p3', 'x3'),
('p1', 'x2');
I want only one row per record in table a, based on a join on column p and a filter on x. I tried a few options; they work when a and b match one to one, but when the match is one to many I get multiple records.
select a.* from a,b where b.p=any(a.p) and b.x='x2';
Output I get is:
id name p
-----------------
1 v1 p1
2 v2 p1,p2
2 v2 p1,p2
3 v3 p2,p3
What I want is
id name p
-----------------
1 v1 p1
2 v2 p1,p2
3 v3 p2,p3
Also, I expect table 'a' to have millions of rows and 'b' only a few, so the query has to perform efficiently.

As you only want columns from table a, use an exists condition rather than a join:
select a.*
from a
where exists (select *
from b
where b.p = any(a.p)
and b.x='x2');
That will however be quite hard to optimize.
Another option - if not too many rows match the join criteria - is to apply a distinct on the result of the join.
In that case a GIN index on the array column can be used:
create index on a using gin (p);
select distinct a.*
from a
join b on a.p @> array[b.p]
where b.x= 'x2';
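The difference between the two approaches can be modelled outside the database. The following is only a minimal Python sketch of the semantics, not PostgreSQL itself: the lists are hand-copied from the sample data in the question, and the names table_a, table_b, joined_ids and exists_ids are made up for illustration.

```python
# Hypothetical in-memory copies of the sample rows from the question.
table_a = [
    ("1", "v1", ["p1"]),
    ("2", "v2", ["p1", "p2"]),
    ("3", "v3", ["p2", "p3"]),
]
table_b = [("p1", "x1"), ("p2", "x2"), ("p3", "x1"), ("p3", "x3"), ("p1", "x2")]

# Join semantics: one output row per matching (a, b) pair.
joined_ids = [a_id for a_id, _name, a_p in table_a
              for b_p, b_x in table_b
              if b_p in a_p and b_x == "x2"]

# EXISTS semantics: one output row per row of a that has at least one match.
exists_ids = [a_id for a_id, _name, a_p in table_a
              if any(b_p in a_p and b_x == "x2" for b_p, b_x in table_b)]

print(joined_ids)  # ['1', '2', '2', '3']
print(exists_ids)  # ['1', '2', '3']
```

Row 2 shows up twice under join semantics because both 'p1' and 'p2' are paired with 'x2' in b; the exists form only asks whether any such pair exists.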

Related

Express Nearest Neighbor Join in Postgresql?

I have two tables Q and T, each containing a column of float numbers.
For each number in Q, I want to find the number in T that has the smallest distance to it.
For example, for T={1,7,9} and Q={2,6,10}, I want to return Q,T pairs as {(2,1),(6,7),(10,9)}.
How should I express this query with SQL?
In addition, is it possible to accelerate this join with an index, e.g. by adding an operator class that binds "FOR ORDER BY <->" to an fabs calculation?
create table t (val_t integer);
create table q (val_q integer);
insert into t values (1),(7),(9);
insert into q values (2),(6),(10);
Start with a query that cross joins the two tables and adds a rank based on the difference:
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true ;
Use this query in a cte or subquery and filter by rank:
WITH src AS(
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true )
SELECT val_q, val_t FROM src
WHERE rank = 1;
val_q | val_t
-------+-------
2 | 1
6 | 7
10 | 9
See https://www.postgresql.org/docs/12/tutorial-window.html
Given this schema:
create table t (tn float);
insert into t values (1), (7), (9);
create table q (qn float);
insert into q values (2), (6), (10);
DISTINCT ON is the most straightforward way:
select distinct on (qn) qn, tn
from q
cross join t
order by qn, abs(qn - tn);
Exploiting a numeric range may perform better depending on your data sizes. If performance is an issue, then you can create an actual temp table for the range_tn CTE and put a gist index on it:
with all_tn as (
select tn
from t
union select null
), range_tn as (
select numrange(tn::numeric, (lead(tn) over w)::numeric, '[]') as tr
from all_tn
window w as (order by tn nulls first)
)
select qn,
case
when lower_inf(tr) then upper(tr)
when upper_inf(tr) then lower(tr)
when 2 * qn - lower(tr) - upper(tr) > 0 then upper(tr)
else lower(tr)
end as tn
from q
join range_tn
on qn::numeric <@ tr;
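The idea behind the range query (each q value falls between two adjacent t values, and the closer of the two wins) can be sketched in Python with a binary search. This is an illustration of the logic only, not a translation of the SQL; the helper name nearest is hypothetical, and it assumes the list of t values is sorted and non-empty.

```python
from bisect import bisect_left

def nearest(sorted_t, q):
    """Return the element of sorted_t closest to q (ties go to the lower value)."""
    i = bisect_left(sorted_t, q)
    if i == 0:                 # q is below every t value
        return sorted_t[0]
    if i == len(sorted_t):     # q is above every t value
        return sorted_t[-1]
    lo, hi = sorted_t[i - 1], sorted_t[i]
    # Same midpoint test as the CASE expression: past the midpoint, pick the upper bound.
    return hi if 2 * q - lo - hi > 0 else lo

t = sorted([1.0, 7.0, 9.0])
q = [2.0, 6.0, 10.0]
pairs = [(x, nearest(t, x)) for x in q]
print(pairs)  # [(2.0, 1.0), (6.0, 7.0), (10.0, 9.0)]
```

Each lookup costs one binary search instead of a scan over all of T, which is roughly the saving an index on the ranges is meant to provide.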

Dynamically choose fields from a different table based on existence

I have two tables, A and B.
Both tables have the same number of columns.
Table A always contains all ids of Table B.
I need to fetch a row from Table B first; if it does not exist there, I have to fetch it from Table A.
I was trying to do this dynamically:
select
CASE
WHEN b.id is null THEN
a.*
ELSE
b.*
END
from A a
left join B b on b.id = a.id
I think this syntax is not correct.
Can someone suggest how to proceed?
It looks like you want to select all columns from table A except when a matching ID exists in table B. In that case you want to select all columns from table B.
That can be done with this query as long as the number and types of columns in both tables are compatible:
select * from a where not exists (select 1 from b where b.id = a.id)
union all
select * from b
If the number, types, or order of columns differs you will need to explicitly specify the columns to return in each sub query.
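As a quick check of that query, here is a minimal sketch that runs it against SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and the column val and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (2, 'b2');  -- only id 2 has an override in b
""")

# Rows of a without a counterpart in b, plus every row of b.
rows = conn.execute("""
    SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    UNION ALL
    SELECT * FROM b
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 'a1'), (2, 'b2'), (3, 'a3')]
```

Id 2 comes from b while ids 1 and 3 fall back to a, and no id appears twice because the two branches are disjoint.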

Postgresql: insert the same data a few times

I have a table a; after running a SQL statement, the same records appear in it several times.
Here is my code:
for server_id in (select bs.id from status.servers bs
join settings.config blc on bs.id = blc.server_id
where blc.lane_number = (dataitem->>'No')::SMALLINT AND blc.min_length <= (dataitem->>'len')::real
)
LOOP
insert into a(measurement_id, server_id, status)
VALUES (
measurement_id,server_id,false
);
END LOOP;
As a result, table a contains records like:
id meas_id serv_id status
1 12 1 f
2 12 1 f
3 12 1 f
I've changed the code a little; the working code has no syntax mistakes.
Answering "why do I have the same records with different ids?":
Table a probably has a default value for column id, so its values are taken from a sequence; most likely you created it with the serial data type. Those results are then expected. If you want to supply your own value, you must not omit the column from the column list, so
insert into a(measurement_id, server_id, status)
must become
insert into a(id, measurement_id, server_id, status)
and the value passed accordingly...
If you expected a single row (judging by the identical server_id values), you need to add distinct to the loop query:
for server_id in (select distinct bs.id from status.servers bs
because your select currently returns three rows with the same bs.id, the result of a join with three matching rows on the join key.
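The effect of the join on the loop source can be reproduced with a small sketch. It uses SQLite via Python's sqlite3 module as a stand-in for PostgreSQL, with simplified table names (servers, config) and invented rows; the real schema and data are not shown in the question, so treat all of it as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE servers (id INTEGER PRIMARY KEY);
    CREATE TABLE config (server_id INTEGER, lane_number INTEGER, min_length REAL);
    INSERT INTO servers VALUES (1);
    -- Three config rows for the same server all satisfy the filter.
    INSERT INTO config VALUES (1, 1, 1.0), (1, 1, 2.0), (1, 1, 3.0);
""")

base = """
    SELECT {distinct} bs.id
    FROM servers bs
    JOIN config blc ON bs.id = blc.server_id
    WHERE blc.lane_number = 1 AND blc.min_length <= 5.0
"""
plain = conn.execute(base.format(distinct="")).fetchall()
deduped = conn.execute(base.format(distinct="DISTINCT")).fetchall()

print(plain)    # [(1,), (1,), (1,)]  -> the loop body would run three times
print(deduped)  # [(1,)]              -> the loop body would run once
```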

Map column value to table name and join

I have a composite type that looks like
CREATE TYPE member AS (
id BIGINT,
type CHAR(1)
);
I have a table that relies on this member type with an array.
CREATE TABLE relation (
id BIGINT PRIMARY KEY,
members member[]
);
I have three other tables each with a different schema (but having common id field)
CREATE TABLE table_x (
id BIGINT PRIMARY KEY,
some_text TEXT
);
CREATE TABLE table_y (
id BIGINT PRIMARY KEY,
some_int INT
);
CREATE TABLE table_z (
id BIGINT PRIMARY KEY,
some_date TIMESTAMP
);
The type field in the member type is a single character that identifies the table a specific member belongs to. A row in the relation table can have a mix of different types.
I have a scenario which requires returning relation ids with at least one member fulfilling a certain condition based on its type (let's say for x => some_text is not empty, or y => some_int is greater than 10, or z => some_date is a week from now).
I can implement this scenario on the application side by making multiple requests to the database:
unnest relation table
collect member data per relation
make new requests to find out relations
I am wondering if there is a way to map column values to table names and join them.
Assumption
I'm assuming that the relation.members array does not have more than one member element of the same type. Correct?
Query to try
with unnested_members as (
-- Unnest members array
select id, unnest(members) members
from relation
)
, members_joined as (
-- left join on a per type basis with table_x, table_y and table_z.
select r.id, (r.members).id idext, (r.members).type,
x.some_text, y.some_int, z.some_date -- more types, more columns here
from unnested_members r
left join table_x x on (x.id = (r.members).id and (r.members).type = 'x')
left join table_y y on (y.id = (r.members).id and (r.members).type = 'y')
left join table_z z on (z.id = (r.members).id and (r.members).type = 'z')
-- More types, more tables to left join
)
select id,
max(some_text) some_text, -- use max() to get not null value for this id
max(some_int) some_int, -- use max() to get not null value for this id
max(some_date) some_date -- use max() to get not null value for this id
-- more types, more max() columns here
from members_joined
group by id -- get one row per relation.id with data from joined table_* columns
If you need to include more tables then you have to include these tables in the left join part, include the column in the select list and in the max() section as well.
@JNevill had a good point about this database design. Although this approach may not seem optimal, it keeps the table definitions clearly separate without any relations in between them. Also the size of relation table is fairly small compared to other three tables.
I solved the problem by simply fetching rows per type and merging them:
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_x ON member.id = table_x.id WHERE member.type = 'x' AND table_x.some_text = 'some text value'
UNION
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_y ON member.id = table_y.id WHERE member.type = 'y' AND table_y.some_int = 123
UNION
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_z ON member.id = table_z.id WHERE member.type = 'z' AND table_z.some_date > '2017-01-11 00:00:00';

Full outer join on multiple tables in PostgreSQL

In PostgreSQL, I have N tables, each consisting of two columns: id and value. Within each table, id is a unique identifier and value is numeric.
I would like to join all the tables using id and, for each id, create a sum of values of all the tables where the id is present (meaning the id may be present only in subset of tables).
I was trying the following query:
SELECT COALESCE(a.id, b.id, c.id) AS id,
COALESCE(a.value,0) + COALESCE(b.value,0) + COALESCE(c.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
FULL OUTER JOIN
c
ON (b.id=c.id)
But it doesn't work for cases when the id is present in a and c, but not in b.
I suppose I would have to do some bracketing like:
SELECT COALESCE(x.id, c.id) AS id, x.value+c.value AS value
FROM
(SELECT COALESCE(a.id, b.id) AS id, a.value+b.value AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
) AS x
FULL OUTER JOIN
c
ON (x.id = c.id)
That is only 3 tables and the code is ugly enough already, imho. Is there an elegant, systematic way to do the join for N tables without getting lost in my code?
I would also like to point out that I did some simplifications in my example. Tables a, b, c, ..., are actually results of quite complex queries over several materialized views. But the syntactical problem remains the same.
As I understand it, you need to sum the values from N tables, grouped by id, correct?
For that I would do this:
Select x.id, sum (x.value) from (
Select * from a
Union all
Select * from b
Union all........
) as x group by x.id;
Since the N tables all have the same columns, you can union them into one big set of all the (id, value) tuples from every table. Use union all, because plain union removes duplicate rows!
Then just sum all the values grouped by id.
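To see why union all matters here, the following minimal sketch runs the query against SQLite through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL, and the three tables and their rows are invented; note that id 1 is present in a and c but not in b, the case that broke the chained full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (2, 5);
    INSERT INTO c VALUES (1, 10), (3, 7);  -- (1, 10) also appears in a
""")

query = """
    SELECT x.id, SUM(x.value) FROM (
        SELECT * FROM a
        {op} SELECT * FROM b
        {op} SELECT * FROM c
    ) AS x GROUP BY x.id ORDER BY x.id
"""
with_union_all = conn.execute(query.format(op="UNION ALL")).fetchall()
with_union = conn.execute(query.format(op="UNION")).fetchall()

print(with_union_all)  # [(1, 20), (2, 25), (3, 7)]
print(with_union)      # [(1, 10), (2, 25), (3, 7)]  <- identical (1, 10) rows collapsed
```

With plain union the identical (1, 10) rows from a and c are merged before summing, so the total for id 1 comes out too low.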