Multiple UPDATE ... FROM same row is not working - postgresql

I'm trying to apply several updates to the same row in one statement, but only the first one takes effect.
I have a table "users" with 2 records:
create table users
(
uid serial not null
constraint users_pkey
primary key,
balance numeric default 0 not null
);
INSERT INTO public.users (uid, balance) VALUES (2, 100);
INSERT INTO public.users (uid, balance) VALUES (1, 100);
I try to update user "1" twice in a single query, but the update is applied only once:
the balance for user "1" becomes "105", not "115".
update users as u
set balance = balance + c.bal
from (values (1, 5),
(1, 10)
) as c(uid, bal)
where c.uid = u.uid;
Why isn't the row updated once for every row of the subquery?

The PostgreSQL documentation gives no reason for this behaviour, but it does specify it.
Relevant quote:
When a FROM clause is present, what essentially happens is that the
target table is joined to the tables mentioned in the from_list, and
each output row of the join represents an update operation for the
target table. When using FROM you should ensure that the join produces
at most one output row for each row to be modified. In other words, a
target row shouldn't join to more than one row from the other
table(s). If it does, then only one of the join rows will be used to
update the target row, but which one will be used is not readily
predictable.
Use a SELECT with a GROUP BY to combine the rows before performing the update.
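Before running such an update, it may help to check whether any target row would join to more than one source row. The following is a minimal sketch that reuses the VALUES list from the question; the column alias matches is introduced here only for illustration.

```sql
-- List every uid that appears more than once in the source rows.
-- Any uid returned here breaks the "at most one join row" advice quoted above.
select c.uid, count(*) as matches
from (values (1, 5),
             (1, 10)
     ) as c(uid, bal)
group by c.uid
having count(*) > 1;
```

For the data in the question this should report uid 1 with two matching rows, which is exactly the case the documentation warns about.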

You need to aggregate in the inner query before joining:
update users as u
set balance = balance + d.bal
from (
select uid, sum(bal) bal
from ( values (1, 5), (1, 10) ) as c(uid, bal)
group by uid
) d
where d.uid = u.uid;
Demo on DB Fiddle:
| uid | balance |
| --- | ------- |
| 2 | 100 |
| 1 | 115 |

Related

Postgresql group into predefined groups where group names come from a database table

I have a database table with data similar to this.
create table DataTable (
name text,
value numeric
);
insert into DataTable values
('A', 1),('A', 2),('B', 3),('Other', 5),('C', 1);
And I have another table:
create table "group" (
name text,
"default" boolean
);
insert into "group" values
('A', false),('B', false),('Other', true);
I want to group the data in the first table based on the defined groups in the second table.
Expected output
Name | sum
A | 3
B | 3
Other | 6
Right now I'm using this query:
select coalesce(g.name, (select name from "group" where "default" = true)) as name,
sum(dt.value)
from DataTable dt
left join "group" g on dt.name = g.name
group by 1
This works, but it can cause performance problems in some situations. Is there a better way to do this?
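One alternative formulation, offered only as a sketch, looks up the default group's name once in a derived table and joins it to every row instead of using a scalar subquery inside coalesce. It assumes the table and column are really named "group" and "default" (both reserved words, hence the quoting) and that exactly one group is marked as default. Whether it is faster than the original depends on the planner, so comparing both with EXPLAIN ANALYZE is advisable.

```sql
-- Fetch the default group's name once, then reuse it for unmatched rows.
select coalesce(g.name, d.name) as name,
       sum(dt.value) as sum
from DataTable dt
cross join (select name from "group" where "default" limit 1) as d
left join "group" g on g.name = dt.name
group by 1
order by 1;
```

Note that if no group is marked as default, the cross join produces no rows at all, whereas the original query would return a NULL group name for unmatched rows.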

delete duplicates in a table and update references

I have a table with an id column. We recently added a new field, unique_id, computed from an external source, which made us realize that we actually have duplicates in the database:
Main Table
id | unique_id | ...
---|-----------|----
4  | A         |
5  | A         |
6  | B         |
We can see: 5 is actually a duplicate of 4, as they both have the same unique_id.
Now this needs to be cleaned up.
I sadly can not simply delete those duplicates (5), as other tables depend on it:
Other Table (OtherTable.main_id REFERENCES MainTable.id)
id | main_id | ...
---|---------|-----
1  | 4       | Blah
2  | 5       |
3  | 6       |
Now I have to clean up the duplicates by pointing every reference at the first row of its kind, here:
UPDATE OtherTable SET main_id = 4 WHERE main_id = 5
How can I do that in an efficient update?
I tried to simply update every reference to the first one with that same unique_id, however that didn't complete in a day.
UPDATE "OtherTable" SET "main_id" = (SELECT "id" FROM "MainTable" WHERE "unique_id" = (SELECT "unique_id" FROM "MainTable" WHERE "id" == "OtherTable"."main_id") LIMIT 1)
If it helps, the MainTable contains about 750,000 entries, the OtherTable contains 12,000,000 rows.
That's probably because the triple nested select is quite inefficient.
For the simpler part, deleting the duplicates (once I have changed the references to the first one of its kind), I found this query to work swiftly enough:
DELETE FROM MainTable
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY unique_id
ORDER BY id ) AS row_num
FROM MainTable ) t
WHERE t.row_num > 1 );
However I need a way to update the references to the non-deleted ones of the duplicates.
Instead of UPDATE with a nested query, I'd suggest using UPDATE FROM for a join, and the same window function as in your DELETE statement:
UPDATE "OtherTable" AS other
SET main_id = main.min_id
FROM (SELECT
id,
first_value(id) OVER (PARTITION BY unique_id ORDER BY id) AS min_id
FROM "MainTable"
) AS main
WHERE main.id = other.main_id
AND main.id <> main.min_id
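Before running the DELETE from the question, it may be worth confirming that no references to duplicate rows remain. The check below is a sketch that reuses the same derived table as the UPDATE; the alias remaining_references is made up for this example, and the quoted table names are assumed to match the actual schema.

```sql
-- Count rows in "OtherTable" that still reference a duplicate,
-- i.e. a "MainTable" row that is not the lowest id for its unique_id.
SELECT count(*) AS remaining_references
FROM "OtherTable" AS other
JOIN (SELECT
          id,
          first_value(id) OVER (PARTITION BY unique_id ORDER BY id) AS min_id
      FROM "MainTable"
     ) AS main
  ON main.id = other.main_id
WHERE main.id <> main.min_id;
```

If the UPDATE has been applied completely, this should return 0, and the duplicates can then be deleted without violating the foreign key.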

Does the returning clause always execute first?

I have a many-to-many relation representing containers holding items.
I have a primary key row_id in the table.
I insert four rows: (container_id, item_id) values (1778712425160346751, 4). These rows will be identical except the aforementioned unique row_id.
I subsequently execute the following query:
delete from contains
where item_id = 4 and
container_id = '1778712425160346751' and
row_id =
(
select max(row_id) from contains
where container_id = '1778712425160346751' and
item_id = 4
)
returning
(
select count(*) from contains
where container_id = '1778712425160346751' and
item_id = 4
);
Now I expected to get 3 returned from this query, but I got a 4. Getting a 4 is the desired behavior, but it is not what was expected.
My question is: can I always expect that the returning clause executes before the delete, or is this an idiosyncrasy of certain versions or specific software?
The use of a query in the returning section is allowed but not documented. From the documentation:
output_expression
An expression to be computed and returned by the DELETE command after each row is deleted. The expression can use any column names of the table named by table_name or table(s) listed in USING. Write * to return all columns.
It seems logical that the query sees the table in a state before deleting, as the statement is not completed yet.
create temp table test as
select id from generate_series(1, 4) id;
delete from test
returning id, (select count(*) from test);
id | count
----+-------
1 | 4
2 | 4
3 | 4
4 | 4
(4 rows)
The same applies to update:
create temp table test as
select id from generate_series(1, 4) id;
update test
set id = id+ 1
returning id, (select sum(id) from test);
id | sum
----+-----
2 | 10
3 | 10
4 | 10
5 | 10
(4 rows)
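If the count after the deletion is what is actually wanted, one option is to subtract the number of deleted rows from the count the statement sees. The sketch below assumes PostgreSQL's data-modifying CTEs, where the outer query runs against the same snapshot as the DELETE; test_remaining is a hypothetical table used only for this example.

```sql
-- Hypothetical demo table, same shape as the examples above.
create temp table test_remaining as
select id from generate_series(1, 4) id;

-- The DELETE runs in a CTE; the outer SELECT still sees the table as it was
-- when the statement started, so the deleted rows are subtracted explicitly.
with deleted as (
    delete from test_remaining
    where id = 4
    returning id
)
select (select count(*) from test_remaining)
     - (select count(*) from deleted) as remaining;
```

With four rows and one deletion, this should return 3 rather than 4.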

how to create a table which inserts two tables having same primary key without duplicates and i need all the data

Postgres:
create table stock(item_id int primary key, balance float);
insert into stock values(10,2200);
insert into stock values(20,1900);
select * from stock;
create table buy(item_id int primary key, volume float);
insert into buy values(10,1000);
insert into buy values(30,300);
select * from buy;
results:
item_id | balance
---------+---------
10 | 2200
20 | 1900
(2 rows)
item_id | volume
---------+--------
10 | 1000
30 | 300
(2 rows)
Now I want another table which includes the data of these two tables.
The new table should have 3 rows of data, with item_id (10, 20, 30) and no duplication.
I need a query for this, either by merge or join.
I'm guessing:
that you really want a view rather than a table
that the values in the 'buy' table are supposed to be deducted from the 'stock'
so here's what I think you are after:
create view v_current_stock as
select item_id, sum(balance) as balance
from ( select item_id, balance from stock
union all
select item_id, -volume from buy ) as t
group by item_id;
EDIT: seems like my guesswork was a bit off (see comments). Perhaps you are looking for a full join:
create view v as
select * from stock full join buy using (item_id);
select * from v;
item_id | balance | volume
---------+---------+--------
10 | 2200 | 1000
20 | 1900 |
30 | | 300
You can use the insert into ... select syntax:
create table mytable(item_id int primary key, balance float, volume float);
insert into mytable
select distinct stock.item_id, balance, volume
from stock
inner join buy on buy.item_id = stock.item_id;
You can use a different type of join if needed (left join or full join). In your case, I think you need a full join, but since I'm not sure I'll stick with the inner join in the example.
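For comparison, here is a sketch of the full join variant mentioned above. The table name stock_and_buy is an assumption made for this example; the stock and buy tables are the ones from the question.

```sql
-- Hypothetical target table; the name stock_and_buy is an assumption.
create table stock_and_buy(item_id int primary key, balance float, volume float);

-- "using (item_id)" merges the key column, so items present in only one
-- of the two tables still get a non-null item_id.
insert into stock_and_buy
select item_id, balance, volume
from stock
full join buy using (item_id);
```

With the sample data this should insert three rows (item_id 10, 20 and 30), leaving volume null for item 20 and balance null for item 30.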

PostgreSQL: Can't use DISTINCT for some data types

I have a table called _sample_table_delme_data_files which contains some duplicates. I want to copy its records, without duplicates, into data_files:
INSERT INTO data_files (SELECT distinct * FROM _sample_table_delme_data_files);
ERROR: could not identify an ordering operator for type box3d
HINT: Use an explicit ordering operator or modify the query.
The problem is that PostgreSQL cannot compare (or order) box3d types. How do I supply such an ordering operator so that I get only the distinct rows into my destination table?
Thanks in advance,
Adam
If you don't add the operator, you could try translating the box3d data to text using its output function, something like:
INSERT INTO data_files (SELECT distinct othercols,box3dout(box3dcol) FROM _sample_table_delme_data_files);
Edit: the next step is to cast it back to box3d:
INSERT INTO data_files SELECT othercols, box3din(b) FROM (SELECT distinct othercols,box3dout(box3dcol) AS b FROM _sample_table_delme_data_files) AS s;
(I don't have box3d on my system so it's untested.)
The datatype box3d doesn't have an operator for the DISTINCT operation. You have to create the operator or ask the PostGIS project; maybe somebody has already fixed this problem.
Finally, this was solved by a colleague.
Let's see how many rows there are, duplicates included:
SELECT COUNT(*) FROM _sample_table_delme_data_files ;
count
-------
12728
(1 row)
Now, we shall add another column to the source table to help us differentiate similar rows:
ALTER TABLE _sample_table_delme_data_files ADD COLUMN id2 serial;
We can now see the dups:
SELECT id, id2 FROM _sample_table_delme_data_files ORDER BY id LIMIT 10;
id | id2
--------+------
198748 | 6449
198748 | 85
198801 | 166
198801 | 6530
198829 | 87
198829 | 6451
198926 | 88
198926 | 6452
199062 | 6532
199062 | 168
(10 rows)
And remove them:
DELETE FROM _sample_table_delme_data_files
WHERE id2 IN (SELECT max(id2) FROM _sample_table_delme_data_files
GROUP BY id
HAVING COUNT(*)>1);
Let's check that it worked:
SELECT id FROM _sample_table_delme_data_files GROUP BY id HAVING COUNT(*)>1;
id
----
(0 rows)
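The DELETE above removes one row per duplicated id, which is enough when each id occurs at most twice, as appears to be the case here. If an id could occur three or more times, a variant that keeps only the lowest id2 per id might look like the sketch below. It relies on the auxiliary id2 column, so it would have to run before that column is dropped.

```sql
-- Keep the row with the smallest id2 for each id and delete all others.
-- id2 is a serial column and therefore never null, so NOT IN is safe here.
DELETE FROM _sample_table_delme_data_files
WHERE id2 NOT IN (SELECT min(id2)
                  FROM _sample_table_delme_data_files
                  GROUP BY id);
```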
Remove the auxiliary column:
ALTER TABLE _sample_table_delme_data_files DROP COLUMN id2;
ALTER TABLE
Insert the remaining rows into the destination table:
INSERT INTO data_files (SELECT * FROM _sample_table_delme_data_files);
INSERT 0 6364