Postgres group by update - slow query - postgresql

I am using postgres 9.X.
I have two tables
Table A
(
id integer
);
Table B
(
id integer,
Value integer
);
Both tables are indexed on id.
Table A can contain duplicate IDs.
Example:
Table A
ID
1
1
1
2
1
I want to store the number of occurrences of each ID in Table B. (Table B already has a row for every ID that appears in Table A, with value initially 0.)
Table B
ID Value
1 4
2 1
3 0
4 0
Here is my SQL statement
update tableB set value = value + sq.total
from
( select id, count(*) as total from TableA group by id ) as sq
where sq.id = tableB.id;
With 3-10 million rows in TableA, this takes a very long time. Is there a way I can optimize this query?

Do you need tableB to be initially populated? An INSERT...SELECT from tableA into an empty tableB (with no indexes on tableB) should be faster:
insert into tableb (id, value)
select id, count(*)
from tablea
group by id
and then add your indexes to tableB once the data is there.
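A minimal sketch of that load order, using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example runs anywhere; the helper name count_ids and the index name tableb_id_idx are hypothetical). It fills an empty, unindexed tableb from a grouped select and only then creates the index:

```python
import sqlite3

def count_ids(ids):
    """Fill tableb with one row per distinct id and its occurrence count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tablea (id integer)")
    conn.executemany("insert into tablea (id) values (?)", [(i,) for i in ids])
    # tableb starts empty and unindexed, so the insert does no index maintenance
    conn.execute("create table tableb (id integer, value integer)")
    conn.execute(
        "insert into tableb (id, value) "
        "select id, count(*) from tablea group by id"
    )
    # build the index only after the data is loaded
    conn.execute("create index tableb_id_idx on tableb (id)")
    rows = conn.execute("select id, value from tableb order by id").fetchall()
    conn.close()
    return rows

result = count_ids([1, 1, 1, 2, 1])  # the sample IDs from the question
print(result)  # [(1, 4), (2, 1)]
```

Note that this approach only creates rows for IDs that actually occur in tablea, so IDs with a count of 0 (3 and 4 in the question's example) would not appear in tableb.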

Related

Improving performance of a GROUP BY ... HAVING COUNT(...) > 1 in PostgreSQL

I'm trying to select the orders that are part of a trip with multiple orders.
I tried many approaches but can't find how to have a performant query.
To reproduce the problem, here is the setup (it uses 100 000 rows here, but it really takes about 1 000 000 rows to see the timeout on db-fiddle).
Schema (PostgreSQL v14)
create table trips (id bigint primary key);
create table orders (id bigint primary key, trip_id bigint);
create index trips_idx on trips (id);
create index orders_idx on orders (id);
create index orders_trip_idx on orders (trip_id);
insert into trips (id) select seq from generate_series(1,100000) seq;
insert into orders (id, trip_id) select seq, floor(random() * 100000 + 1) from generate_series(1,100000) seq;
Query #1
explain analyze select orders.id
from orders
inner join trips on trips.id = orders.trip_id
inner join orders trips_orders on trips_orders.trip_id = trips.id
group by orders.id, trips.id
having count(trips_orders) > 1
limit 50
;
Here is what pgmustard gives me on the real query:
Do you actually need the join on trips? You could try
SELECT shared.id
FROM orders shared
WHERE EXISTS (SELECT * FROM orders other
WHERE other.trip_id = shared.trip_id
AND other.id != shared.id
)
;
to replace the group by with a hash join, or
SELECT unnest(array_agg(orders.id))
FROM orders
GROUP BY trip_id
HAVING count(*) > 1
;
to hopefully get Postgres to just use the trip_id index.
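To check on a small data set that dropping the join on trips does not change the result, here is a rough sketch using Python's sqlite3 module (a stand-in for Postgres, chosen only so it is runnable). SQLite has no unnest/array_agg, so the second query swaps in an IN subquery over the same GROUP BY ... HAVING count(*) > 1; the function name and sample rows are made up for illustration:

```python
import sqlite3

def shared_trip_orders(orders):
    """Return ids of orders whose trip has more than one order, computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (id integer primary key, trip_id integer)")
    conn.execute("create index orders_trip_idx on orders (trip_id)")
    conn.executemany("insert into orders (id, trip_id) values (?, ?)", orders)
    # variant 1: the EXISTS form from the answer
    via_exists = conn.execute(
        "select shared.id from orders shared "
        "where exists (select 1 from orders other "
        "              where other.trip_id = shared.trip_id "
        "                and other.id != shared.id) "
        "order by shared.id"
    ).fetchall()
    # variant 2: group by trip_id, keep trips with more than one order
    via_group = conn.execute(
        "select id from orders where trip_id in ("
        "  select trip_id from orders group by trip_id having count(*) > 1) "
        "order by id"
    ).fetchall()
    conn.close()
    return [r[0] for r in via_exists], [r[0] for r in via_group]

# hypothetical sample: trips 10 and 30 have several orders, trip 20 has one
sample = [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)]
by_exists, by_group = shared_trip_orders(sample)
print(by_exists, by_group)  # [1, 2, 4, 5, 6] [1, 2, 4, 5, 6]
```

Neither variant touches the trips table. Which one is faster on the real data is something only EXPLAIN ANALYZE on Postgres can tell.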

In Sql Server 2008, can I INSERT multiple rows with some fixed column values and some from a SELECT statement that uses one of the fixed values?

I’m building an insert statement dynamically to add multiple rows of data to a table in a batch, as I believe it is more efficient than inserting one at a time. However, I need the last couple of columns in each inserted row to be set with the results of querying another table using a value from the new row. This is my imaginary pseudocode version:
INSERT INTO TableA (column1, column2, column3, column4, column5)
VALUES (SELECT {value1a}, {value1b}, {value1c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value1c}),
(SELECT {value2a}, {value2b}, {value2c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value2c}),
…
Now here is another wrinkle: I have a unique index on TableA with an ignore clause, and there is a lot of redundant data to process, so only about 15% of the rows in any given batch insert will actually be added to the database. Does this mean it would be more efficient to insert the rows with values for columns 1 – 3, then query for the rows that were inserted, and update column 4 and 5? If so, would the following be the most efficient way to do that for all the inserted rows?
UPDATE a SET a.column4 = b.column1, a.column5 = b.column2
FROM TableA a INNER JOIN TableB b ON b.column3 = a.column3
WHERE a.CreatedAt >= {BatchInsertTime}
(assuming no other processes are adding rows to the table)
For better efficiency and a simpler way to join TableB, send all the TableA rows in a single JSON document (OPENJSON requires SQL Server 2016 or later), e.g.
insert into TableA (column1, column2, column3, column4, column5) …
select d.*, b.column1 column4, b.column2 column5
from openjson(@json)
with
(
column1 varchar(20),
column2 int,
column3 varchar(20)
) as d
left join TableB b
on b.column3 = d.column3
where @json is an NVARCHAR(MAX) parameter that looks like
[
{"column1":"foo", "column2":3,"column3":"bar" },
{"column1":"foo2", "column2":4,"column3":"bar2" },
. . .
]
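On the client side, the JSON document can be built from the batch rows before it is passed as the single parameter. A small sketch in Python, assuming the rows are available as (column1, column2, column3) tuples; the helper name build_batch_json is hypothetical, and how the string is bound to the parameter depends on whichever database driver is in use:

```python
import json

def build_batch_json(rows):
    """Serialize (column1, column2, column3) tuples into a JSON array of objects."""
    return json.dumps([
        {"column1": c1, "column2": c2, "column3": c3}
        for c1, c2, c3 in rows
    ])

payload = build_batch_json([("foo", 3, "bar"), ("foo2", 4, "bar2")])
print(payload)
```

Sending one string parameter keeps the statement text constant regardless of batch size, instead of generating a new INSERT for every batch.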

PostgreSQL - Append a table to another and add a field without listing all fields

I have two tables:
table_a with fields item_id,rank, and 50 other fields.
table_b with fields item_id, and the same 50 fields as table_a
I need to write a SELECT query that adds the rows of table_b to table_a but with rank set to a specific value, let's say 4.
Currently I have:
SELECT * FROM table_a
UNION
SELECT item_id, 4 rank, field_1, field_2, ...
How can I join the two tables together without writing out all of the fields and without using an INSERT query?
EDIT:
My idea is to join table_b to table_a somehow with the rank field remaining empty, then simply replace the null rank fields. The rank field is never null, but item_id can be duplicated and table_a may have item_id values that are not in table_b, and vice-versa.
I am not sure I understand why you need this, but you can use jsonb functions:
select (jsonb_populate_record(null::table_a, row)).*
from (
select to_jsonb(a) as row
from table_a a
union
select to_jsonb(b) || '{"rank": 4}'
from table_b b
) s
order by item_id;
Working example in rextester.
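I'm pretty sure I've got it. The fixed rank column can be slotted into table_b's output by joining table_b to a subset of itself that holds only the columns to the left of the insertion point plus the new rank column. The subset has to be the left side of the join so that rank comes out directly after item_id, and DISTINCT keeps duplicated item_id values from multiplying rows in the join.
WITH
_leftcols AS ( SELECT DISTINCT item_id, 4 rank FROM table_b ),
_combined AS ( SELECT * FROM _leftcols JOIN table_b USING (item_id) )
SELECT * FROM _combined
UNION
SELECT * FROM table_a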

A usually simple SQL join

I need advice on doing a JOIN with PostgreSQL. I want to take the number of times a single id (for example id 1) appears in table a and place it into a new column in table b.
Table a
id username comment
1 Bob Hi
2 Sally Hello
1 Bob Bye
Table b
id something total_comments
1 null 2
Create a trigger for insert, update, and delete on Table a that recomputes the count and updates it in Table b.
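A rough sketch of the trigger idea, run through Python's sqlite3 module so it is self-contained. This is SQLite trigger syntax, not PostgreSQL (where the body would live in a trigger function), and the trigger names table_a_ai and table_a_ad are made up. Only insert and delete are covered here; an update that changes id would need a third trigger:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table_a (id integer, username text, comment text);
create table table_b (id integer primary key, something text,
                      total_comments integer not null default 0);

-- after each insert: make sure table_b has a row, then bump its counter
create trigger table_a_ai after insert on table_a
begin
    insert into table_b (id, something, total_comments)
    select new.id, null, 0
    where not exists (select 1 from table_b where id = new.id);
    update table_b set total_comments = total_comments + 1 where id = new.id;
end;

-- after each delete: lower the counter again
create trigger table_a_ad after delete on table_a
begin
    update table_b set total_comments = total_comments - 1 where id = old.id;
end;
""")

# the sample rows from the question
conn.executemany(
    "insert into table_a (id, username, comment) values (?, ?, ?)",
    [(1, "Bob", "Hi"), (2, "Sally", "Hello"), (1, "Bob", "Bye")],
)
totals = conn.execute(
    "select id, total_comments from table_b order by id"
).fetchall()
print(totals)  # [(1, 2), (2, 1)]
```

Incrementing and decrementing avoids rescanning table_a on every change, at the cost of trusting that the counter never drifts.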
You could use SELECT INTO if table_b doesn't already exist.
SELECT
id
, NULL AS something
, COUNT(comment) AS total_comments
INTO table_B
FROM table_a
GROUP BY id
Or INSERT INTO if table_b does exist.
INSERT INTO table_b (id, something, total_comments)
SELECT
id
, NULL AS something
, COUNT(comment) AS total_comments
FROM table_a
GROUP BY id

Postgresql how to select values in the column from one table that are only available in another table?

I am using Postgresql and need to query two tables like this:
Table1
ID Bill
A 1
B 2
B 3
C 4
Table2
ID
A
B
I want a table with all the columns in Table1 but keeping only the records with IDs that are available in Table2 (A and B in this case). Also, Table2's ID is unique.
ID Bill
A 1
B 2
B 3
Which join should I use, or can I do this with a WHERE clause?
Thanks!
SELECT Table1.*
FROM Table1
INNER JOIN Table2 USING (ID);
or
SELECT *
FROM Table1
WHERE ID IN (SELECT ID FROM Table2);
but the first one is better for performance reasons.
SELECT *
FROM Table1
WHERE EXISTS (
SELECT 1 FROM Table2 WHERE Table2.ID = Table1.ID LIMIT 1
)
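To confirm that the three forms above return the same rows for the question's sample data, here is a quick check using Python's sqlite3 module (a stand-in for PostgreSQL, so it says nothing about relative performance there). The ORDER BY clauses are added only to make the comparison deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Table1 (ID text, Bill integer);
create table Table2 (ID text primary key);
insert into Table1 values ('A', 1), ('B', 2), ('B', 3), ('C', 4);
insert into Table2 values ('A'), ('B');
""")

queries = {
    "join": "select Table1.* from Table1 inner join Table2 using (ID) "
            "order by Table1.ID, Table1.Bill",
    "in": "select * from Table1 where ID in (select ID from Table2) "
          "order by ID, Bill",
    "exists": "select * from Table1 where exists "
              "(select 1 from Table2 where Table2.ID = Table1.ID) "
              "order by ID, Bill",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each prints [('A', 1), ('B', 2), ('B', 3)]
```

The join matches the other two only because Table2's ID is unique. If Table2 could contain the same ID twice, the join would return the matching Table1 rows twice, while IN and EXISTS would still return each row once.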