groupingBy a join in Ktorm

I want to join an N x M relation, returning every row of N together with a count of its matching M rows:
SELECT N.*, COUNT(M.id) FROM N LEFT JOIN M ON N.id = M.n_id GROUP BY N.id
but I don't see how. Joining and aggregation seem to exclude each other, or are not implemented, or not documented (or I didn't find it).

Postgres - Insert nearest neighbour distance into another table

So I have three tables (A, B, C). In tables A and B I have points, and I want to insert into C each row from A, and some columns from the closest point from B to each point in A, as well as the distance between them. I know that the query to get the nearest neighbour is this:
SELECT DISTINCT ON (A.id5) A.state, B.way, ST_Distance(A.geom, B.geom) INTO C
FROM A, B
WHERE ST_DWithin(A.geom, B.geom, 150)
ORDER BY A.id5, ST_Distance(A.geom, B.geom)
But I need to get that into a bigger INSERT query, and I tried to do it this way:
INSERT INTO complete(id_door, distance, id_way,Y, X, geom, check)
(SELECT A.state, (select distinct on (A.id5) ST_DISTANCE(A.geom,B.geom) from A order by A.id5, st_distance(A.geom,B.geom)), b.way, ST_Y(B.geom), ST_X(B.geom) ,B.geom, V.check
FROM A, B, C, V
WHERE
ST_INTERSECTS(A.geom, V.geom)
AND ST_DWithin(A.geom, B.geom,150))
But this is not the right way, because I get the error:
psycopg2.ProgrammingError: more than one row returned by a subquery used as an expression
I cannot copy all the distances from A and B to C and then delete all but the closest because it is a huge table and I would run out of memory, so I need a way to only insert the rows with the info from the closest point from B to A.
What am I doing wrong here? Thank you in advance
UPDATE:
After some help, I have learned that I should use a Lateral in the Select query, but I'm not sure how to use it.
I need the SELECT to take each row in table A, find its nearest neighbour in table B (which I guess is done with the query stated above), and insert into table C some columns from A, some columns from that nearest neighbour, and some columns from table V, which is selected by an intersection condition. The main problem is how to organize all that in the SELECT so I don't get an error.
This is where I am at this point:
INSERT INTO C (id_door, distance, id_way,Y, X, geom, check)
(SELECT A.state, l.*, V.check
FROM A, B, C, V
lateral (select st_distance(a.geom,b.geom), b.way, ST_Y(B.geom), ST_X(B.geom) ,B.geom
From B
Where ST_DWithin(a.geom, b.geom,150))
Order by a.geom<->b.geom limit 1) l
WHERE
ST_INTERSECTS(A.geom, V.geom)
You can use a lateral join: a subquery in the FROM clause that can reference columns of the tables listed before it. The PostgreSQL documentation on LATERAL subqueries has more details.
-- Edited according to new information in answer --
INSERT INTO C (id_door, distance, id_way, Y, X, geom, check)
SELECT l.*
FROM a,
LATERAL (SELECT a.state, ST_Distance(a.geom, b.geom),
                b.way, ST_Y(b.geom), ST_X(b.geom), b.geom,
                v.check
         FROM b, v
         WHERE ST_DWithin(a.geom, b.geom, 150)
           AND ST_DWithin(a.geom, v.geom, 0)
           AND ST_Intersects(a.geom, v.geom)
         ORDER BY a.geom <-> b.geom, v.geom LIMIT 1) l
If you want more than one neighbour for each point in A, increase the limit from 1 to your desired value.

postgresql inner join duplicating some records

I have a large query built from CTEs. In certain parts I have to total secondary tables, using inner joins to minimize the number of records processed. Two of the subqueries are almost identical, yet one works while the other counts some of the totalled records 8 times over.
I need to use inner joins, or the response time shoots up by 15x or more.
with
p0 as (select distinct on (pventa) pventa, p.tipo tpva from lecturas l
left join puntoventa p on l.pventa=p.numero where dia between '2017-10-01' and '2017-10-31' and p.tipo in ('A','E')),
r1 as (select p.tpva, l.pventa, dia, turno from lecturas l
inner join p0 p on p.pventa=l.pventa
where dia between '2017-10-01' and '2017-10-31'),
p1 as (select pva, remision, sum(abono), count(abono) from pagosremisiones p
inner join movsgas m on p.pva=m.pventa and p.remision=m.folio
inner join r1 r on r.pventa=m.pventa and r.dia=m.dia and r.turno=m.turno group by 1,2 order by 1,2 ),
f1 as (select c.serie, c.factura, sum(abono), count(abono) from chequefactura c
inner join movsgas m on c.serie=m.serie and c.factura=m.factura
inner join r1 r on r.pventa=m.pventa and r.dia=m.dia and r.turno=m.turno group by 1,2 order by 1,2 )
select * from p1
Nprem and ncheck are for debugging
p1 and f1 both depend on r1. p1 works (as far as I've tested) without duplicate records (nprem matches the existing records); however, ncheck grows on some records to as much as 8 times its actual value.
I'm not sure whether p1's correct results are purely coincidental, and I don't know how to eliminate the duplicates in f1.
I do have the alternative of using direct subqueries, but I have a didactic interest in using joins.
By the way, so far the direct subqueries are much more efficient than the joins, possibly because the joins are poorly structured.
What am I doing wrong?
What would you do to optimize the code?
Thanks in advance
Jose
The trick was a new subquery, r2, that uses distinct on (serie, factura); if I omit it, the error persists. The duplicates in r2 do not correspond to the number of duplicates in f1, so I had no idea where so many came from. Thank you all, and again my apologies for the terrible description of my problem.
with
p0 as (select distinct on (pventa) pventa, p.tipo tpva from lecturas l
left join puntoventa p on l.pventa=p.numero where dia between '2017-10-01' and '2017-10-31' and p.tipo in ('A','E')),
r1 as (select p.tpva, l.pventa, dia, turno from lecturas l
inner join p0 p on p.pventa=l.pventa
where dia between '2017-10-01' and '2017-10-31'),
r2 as (select distinct on (serie, factura) m.serie,m.factura from movsgas m
inner join chequefactura c on c.serie=m.serie and c.factura=m.factura
inner join r1 r on r.pventa=m.pventa and r.dia=m.dia and r.turno=m.turno),
p1 as (select pva, remision, sum(abono) payp from pagosremisiones p
inner join movsgas m on p.pva=m.pventa and p.remision=m.folio
inner join r1 r on r.pventa=m.pventa and r.dia=m.dia and r.turno=m.turno group by 1,2 order by 1,2 ),
f1 as (select c.serie, c.factura, sum(abono) payfr2, count(*) from chequefactura c
inner join r2 r on r.serie=c.serie and r.factura=c.factura group by 1,2 order by 1,2 )
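The inflation described above is the usual join fan-out: if the joined table has several rows per (serie, factura), each payment is repeated once per match before the sum and count run. Here is a minimal sketch of that effect, using Python's built-in sqlite3 module and tiny made-up versions of chequefactura and movsgas (the data and the folio column are assumptions for illustration, not the real schema). SQLite has no DISTINCT ON, so plain SELECT DISTINCT stands in for the r2 step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature tables, only for illustration.
    CREATE TABLE chequefactura (serie TEXT, factura INTEGER, abono INTEGER);
    CREATE TABLE movsgas (serie TEXT, factura INTEGER, folio INTEGER);
    -- One payment on invoice ('A', 1) ...
    INSERT INTO chequefactura VALUES ('A', 1, 100);
    -- ... but three movements that reference the same invoice.
    INSERT INTO movsgas VALUES ('A', 1, 10), ('A', 1, 11), ('A', 1, 12);
""")

# Joining straight to movsgas repeats the payment once per matching movement.
inflated = conn.execute("""
    SELECT c.serie, c.factura, SUM(c.abono), COUNT(*)
    FROM chequefactura c
    JOIN movsgas m ON m.serie = c.serie AND m.factura = c.factura
    GROUP BY c.serie, c.factura
""").fetchall()

# Reducing movsgas to one row per (serie, factura) first keeps the totals intact.
correct = conn.execute("""
    SELECT c.serie, c.factura, SUM(c.abono), COUNT(*)
    FROM chequefactura c
    JOIN (SELECT DISTINCT serie, factura FROM movsgas) r
      ON r.serie = c.serie AND r.factura = c.factura
    GROUP BY c.serie, c.factura
""").fetchall()

print(inflated)  # the single payment is summed three times
print(correct)   # the single payment is summed once
```

If p1 never shows the problem, that may simply be because (pventa, folio) happens to be unique in movsgas, which would be worth checking.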

Full outer join on multiple tables in PostgreSQL

In PostgreSQL, I have N tables, each consisting of two columns: id and value. Within each table, id is a unique identifier and value is numeric.
I would like to join all the tables on id and, for each id, compute the sum of the values from all the tables where the id is present (meaning the id may be present in only a subset of the tables).
I was trying the following query:
SELECT COALESCE(a.id, b.id, c.id) AS id,
COALESCE(a.value,0) + COALESCE(b.value,0) + COALESCE(c.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
FULL OUTER JOIN
c
ON (b.id=c.id)
But it doesn't work for cases when the id is present in a and c, but not in b.
I suppose I would have to do some bracketing like:
SELECT COALESCE(x.id, c.id) AS id, COALESCE(x.value,0) + COALESCE(c.value,0) AS value
FROM
(SELECT COALESCE(a.id, b.id) AS id, COALESCE(a.value,0) + COALESCE(b.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
) AS x
FULL OUTER JOIN
c
ON (x.id = c.id)
That was only 3 tables, and the code is ugly enough already, imho. Is there an elegant, systematic way to do the join for N tables, so that I don't get lost in my code?
I would also like to point out that I did some simplifications in my example. Tables a, b, c, ..., are actually results of quite complex queries over several materialized views. But the syntactical problem remains the same.
As I understand it, you need to sum the values from the N tables, grouped by id, correct?
For that I would do this:
SELECT x.id, SUM(x.value) FROM (
  SELECT * FROM a
  UNION ALL
  SELECT * FROM b
  UNION ALL ........
) AS x GROUP BY x.id;
Since the N tables all have the same columns, you can UNION ALL them into one big derived table holding every (id, value) pair from all tables. Use UNION ALL rather than UNION, because UNION removes duplicate rows!
Then just sum all the values grouped by id.
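To try the idea out, here is a small sketch using Python's built-in sqlite3 module instead of PostgreSQL; the SQL it generates is the same UNION ALL plus GROUP BY pattern. The helper name sum_by_id and the sample rows are made up for illustration.

```python
import sqlite3

def sum_by_id(conn, tables):
    """Sum `value` per `id` across all the given tables."""
    # One SELECT per table, glued together with UNION ALL.
    # Table names are interpolated into the SQL, so they must come
    # from trusted code, never from user input.
    union = " UNION ALL ".join(f"SELECT id, value FROM {t}" for t in tables)
    sql = f"SELECT id, SUM(value) FROM ({union}) AS x GROUP BY id ORDER BY id"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, value NUMERIC);
    CREATE TABLE b (id INTEGER PRIMARY KEY, value NUMERIC);
    CREATE TABLE c (id INTEGER PRIMARY KEY, value NUMERIC);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (2, 5);
    INSERT INTO c VALUES (1, 1), (3, 7);
""")

totals = sum_by_id(conn, ["a", "b", "c"])
print(totals)  # [(1, 11), (2, 25), (3, 7)]
```

Note that id 1 lives in a and c but not in b, which is exactly the case the chained FULL OUTER JOIN got wrong, and adding a table only means adding one more name to the list.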

JPQL, OR only returning result of one condition?

I'm trying to write a JPQL query that should return a list of articles matching at least one of two conditions. When I run the two queries separately they work as expected, but combining them with an 'OR' returns a list that matches only one of the conditions. I don't understand why.
This is the full query:
SELECT a FROM Article a WHERE ((a.ag.proteinPID.uniprot.AC LIKE :genProt)
OR (a.aid IN(SELECT a2.aid FROM Protein p JOIN p.articleList a2 WHERE p.uniprot.AC LIKE :genProt)))
And the separate ones:
1)
SELECT a FROM Article a WHERE a.aid IN(SELECT a2.aid FROM Protein p JOIN p.articleList a2 WHERE p.uniprot.AC LIKE :genProt)
2)
SELECT a FROM Article a WHERE a.ag.proteinPID.uniprot.AC LIKE :genProt
The full expression returns the same result as expression 2).
Try left joining the entities within the full query for the first condition:
SELECT a FROM Article a LEFT JOIN a.ag g LEFT JOIN g.proteinPID p LEFT JOIN p.uniprot u WHERE ((u.AC LIKE :genProt)
OR (a.aid IN(SELECT a2.aid FROM Protein p JOIN p.articleList a2 WHERE p.uniprot.AC LIKE :genProt)))
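Why this works: if you do not left join explicitly, I suppose the path expression a.ag.proteinPID.uniprot.AC is turned into INNER JOINs, which drop every Article lacking that chain of associations before the OR is even evaluated.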
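The same effect can be reproduced in plain SQL. Below is a rough sketch with Python's built-in sqlite3 module and a deliberately simplified, made-up schema (article, ag and protein_article are stand-ins, not the real entity mapping). Article 2 has no ag but is linked through a protein, so only the LEFT JOIN version lets the second branch of the OR find it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened stand-ins for the entities in the question.
    CREATE TABLE ag (id INTEGER PRIMARY KEY, ac TEXT);
    CREATE TABLE article (aid INTEGER PRIMARY KEY, ag_id INTEGER REFERENCES ag(id));
    CREATE TABLE protein_article (ac TEXT, aid INTEGER);
    INSERT INTO ag VALUES (1, 'P12345');
    -- Article 1 has an ag; article 2 has none.
    INSERT INTO article VALUES (1, 1), (2, NULL);
    -- Article 2 is reachable only through the protein link.
    INSERT INTO protein_article VALUES ('P12345', 2);
""")

# The same OR condition is appended to both queries.
condition = """
    WHERE g.ac LIKE :pat
       OR a.aid IN (SELECT aid FROM protein_article WHERE ac LIKE :pat)
    ORDER BY a.aid
"""
params = {"pat": "P123%"}

# INNER JOIN: article 2 is removed by the join before the OR is evaluated.
inner = [row[0] for row in conn.execute(
    "SELECT a.aid FROM article a JOIN ag g ON g.id = a.ag_id" + condition, params)]

# LEFT JOIN: article 2 survives with g.ac = NULL and matches the IN branch.
left = [row[0] for row in conn.execute(
    "SELECT a.aid FROM article a LEFT JOIN ag g ON g.id = a.ag_id" + condition, params)]

print(inner)  # [1]
print(left)   # [1, 2]
```

That matches the symptom in the question: with inner joins, the combined query returns the same rows as the first condition alone.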

Join postgres table on two columns?

I can't find a straightforward answer. My query is spitting out the wrong result, and I think it's because it's not seeing the "AND" as an actual join.
Can you do something like this and if not, what is the correct approach:
SELECT * from X
LEFT JOIN Y
ON
y.date = x.date AND y.code = x.code
?
This is possible:
The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true for them.
http://www.postgresql.org/docs/9.1/static/queries-table-expressions.html#QUERIES-FROM
Your SQL looks OK.
It's fine. In fact, you can put any condition in the ON clause, even one not related to the key columns, or even to the tables at all, e.g.:
SELECT * from X
LEFT JOIN Y
ON y.date = x.date
AND y.code = x.code
AND EXTRACT (dow from current_date) = 1
Another, arguably more readable way of writing the join is to use tuples of columns:
SELECT * from X
LEFT JOIN Y
ON (y.date, y.code) = (x.date, x.code);
This clearly indicates that the join is based on equality across several columns.
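To check that both spellings behave the same, here is a quick sketch using Python's built-in sqlite3 module rather than PostgreSQL. The tables and rows are made up, and the row-value comparison assumes SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, for illustration only.
    CREATE TABLE x (date TEXT, code TEXT, qty INTEGER);
    CREATE TABLE y (date TEXT, code TEXT, price INTEGER);
    INSERT INTO x VALUES ('2020-01-01', 'A', 1), ('2020-01-01', 'B', 2), ('2020-01-02', 'A', 3);
    INSERT INTO y VALUES ('2020-01-01', 'A', 10), ('2020-01-02', 'B', 20);
""")

# Join on both columns with AND: a y row matches only if date and code both agree.
with_and = conn.execute("""
    SELECT x.date, x.code, x.qty, y.price
    FROM x LEFT JOIN y ON y.date = x.date AND y.code = x.code
    ORDER BY x.date, x.code
""").fetchall()

# The same join written as a comparison of column tuples (row values).
with_tuple = conn.execute("""
    SELECT x.date, x.code, x.qty, y.price
    FROM x LEFT JOIN y ON (y.date, y.code) = (x.date, x.code)
    ORDER BY x.date, x.code
""").fetchall()

for row in with_and:
    print(row)
```

Only ('2020-01-01', 'A') gets a price; the other two x rows come back with None, because a match on just one of the two columns is not enough.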
This solution has good performance:
select * from(
select md5(concat(date, code)) md5_x from x ) as x1
left join (select md5(concat(date, code)) md5_y from y) as y1
on x1.md5_x = y1.md5_y