PostgreSQL Update inside For Loop

I'm fairly new to PostgreSQL, and I'm having issues updating a column of NULL values in a table using a FOR loop. The table I'm working on is huge, so for brevity I'll give a smaller example that should get the point across. Take the following table:
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | NULL |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | NULL |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | NULL |
+----+---+---+------+
I want to write a for loop which iterates through all of the rows and does some operation
on the values in columns a and b and then inserts a new value in c.
For example, where id = a , update table set C = A*B, or where id = d set C = A + B etc. This would then give me a table like
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | 0    |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | 6    |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | 6    |
+----+---+---+------+
So ultimately I'd like to loop through all the rows of the table and update column C according to the value in the "id" column. The function I've written (which isn't giving any errors but also isn't updating anything either) looks like this...
-- DROP FUNCTION some_function();
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
--r integer; not too sure if this needs to be declared or not
result int;
BEGIN
FOR r IN select * from 'table_name'
LOOP
select(
case
when id = 'a' THEN B*C
when id = 'd' THEN B+C
end)
into result;
update table set C = result
WHERE id = '';
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql
I'm sure there's something silly I'm missing, probably around what I'm returning (void in this case). But as I only want to update existing rows, should I need to return anything? There are probably easier ways of doing this than using a loop, but I'd like to get it working using this method.
If anyone could point me in the right direction or point out anything blatantly obvious that I'm doing wrong I'd much appreciate it.
Thanks in advance.

No need for a loop or a function, this can be done with a single update statement:
update table_name
set c = case
when id = 'a' then a*b
when id = 'd' then a+b
else c -- don't change anything
end;
SQLFiddle: http://sqlfiddle.com/#!15/b65cb/2
The reason your function isn't doing anything is this:
update table set C = result
WHERE id = '';
You don't have a row with an empty string in the column id. Your function also seems to use the wrong formula: when id = 'a' THEN B*C I guess that should be: then a*b. As C is NULL initially, b*c will also yield null. So even if your update in the loop would find a row, it would update it to NULL.
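Both effects can be reproduced in a few lines. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, so that it runs without a server; NULL arithmetic and the empty-string comparison behave the same way for this case), and the table name t is hypothetical.

```python
import sqlite3

# In-memory stand-in database; table name "t" is made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, NULL)",
    [("a", 1, 0), ("b", 1, 1), ("d", 4, 2)],
)

# Arithmetic with NULL yields NULL, so b * c is NULL while c is still NULL.
null_products = [row[0] for row in conn.execute("SELECT b * c FROM t")]

# No row has an empty string as id, so this UPDATE matches nothing.
cur = conn.execute("UPDATE t SET c = 0 WHERE id = ''")
rows_touched = cur.rowcount

print(null_products, rows_touched)
```

So the original function would either touch no rows at all, or (with a matching WHERE) write NULL back into C.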
You are also retrieving the values incorrectly from the cursor.
If you really, really want to do it inefficiently in a loop, then your function should look something like this (not tested!):
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
r record; -- the loop variable of a query FOR loop has to be declared
result int;
BEGIN
-- r is a structure that contains an element for each column in the select list
FOR r IN select * from table_name
LOOP
result := NULL; -- reset, so a value from a previous row is not reused
if r.id = 'a' then
result := r.a * r.b;
elsif r.id = 'd' then
result := r.a + r.b;
end if;
if result is not null then
update table_name
set c = result
WHERE id = r.id -- note the where condition that uses the value from the record variable
and a = r.a and b = r.b; -- id is not unique in the example, so match the other columns too
end if;
END LOOP;
END
$BODY$
LANGUAGE plpgsql;
But again: if your table is "huge" as you say, the loop is an extremely bad solution. Relational databases are made to deal with "sets" of data. Row-by-row processing is an anti-pattern that will almost always have bad performance.
Or to put it the other way round: doing set-based operations (like my single update example) is always the better choice.
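To see the set-based statement work on the sample data from the question, here is a minimal sketch. It again assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (the CASE-based UPDATE is written the same way in both); only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id TEXT, a INTEGER, b INTEGER, c INTEGER)")
rows = [("a", 1, 0), ("b", 1, 1), ("c", 2, 4), ("a", 3, 2), ("c", 2, 3), ("d", 4, 2)]
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, NULL)", rows)

# One set-based statement updates every row; no loop is needed.
conn.execute(
    """
    UPDATE table_name
    SET c = CASE
              WHEN id = 'a' THEN a * b
              WHEN id = 'd' THEN a + b
              ELSE c  -- leave other rows unchanged
            END
    """
)

result = conn.execute("SELECT id, a, b, c FROM table_name ORDER BY rowid").fetchall()
for row in result:
    print(row)
```

The rows with id a and d get their computed value, and all other rows keep NULL in c, which matches the table the question asks for.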

Related

Aggregate parents recursively in PostgreSQL

In a child-parent table, I need to aggregate all parents for each child. I can readily get the children per parent with a CTE query, but can't figure out how to reverse it (sqlfiddle here). Given this:
CREATE TABLE rel(
child integer,
parent integer
);
INSERT INTO rel(child, parent)
VALUES
(1,NULL),
(2,1),
(3,1),
(4,3),
(5,2),
(6,4),
(7,2),
(8,7),
(9,8);
I need a query that will return an array of parents for each child (order is not important):
1, {NULL}
2, {1}
3, {1}
4, {3,1}
5, {2,1}
6, {4,3,1}
7, {2,1}
8, {7,2,1}
9, {8,7,2,1}
Even if there is an accepted answer, I would like to show how the problem can be solved in pure SQL in a much simpler way, with a recursive CTE:
WITH RECURSIVE t(child, parentlist) AS (
SELECT child , ARRAY[]::INTEGER[] FROM rel WHERE parent IS NULL
UNION
SELECT rel.child, rel.parent || t.parentlist
FROM rel
JOIN t ON rel.parent = t.child
) SELECT * FROM t;
child | parentlist
-------+------------
1 | {}
2 | {1}
3 | {1}
4 | {3,1}
5 | {2,1}
7 | {2,1}
6 | {4,3,1}
8 | {7,2,1}
9 | {8,7,2,1}
(9 rows)
If you insist on having a singleton {NULL} for children with an empty list of parents, just say
SELECT child,
CASE WHEN CARDINALITY(parentlist) = 0
THEN ARRAY[NULL]::INTEGER[]
ELSE parentlist
END
FROM t;
instead of SELECT * FROM t, but frankly, I don’t see why you should.
A final remark: I am not aware of any efficient way to do this with relational databases, either in pure SQL or with procedural languages. The point is that JOINs are inherently expensive, and if you have really large tables, your queries will take a lot of time. You can mitigate the problem with indexes, but the best way to tackle this kind of problem is to use graph-oriented software rather than an RDBMS.
For this you can create a PL/pgSQL function. I did something similar; here is my function, which handles any father-son structure. It originally returned a table, but I changed it a little for your case:
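For readers who find the recursive CTE hard to follow, its evaluation can be sketched outside the database. The Python below is only an illustration of how the query expands level by level, not what PostgreSQL runs internally. The function name parent_lists is made up, and it assumes the data forms a tree without cycles.

```python
# (child, parent) pairs from the question; None stands in for SQL NULL.
rel = [(1, None), (2, 1), (3, 1), (4, 3), (5, 2), (6, 4), (7, 2), (8, 7), (9, 8)]


def parent_lists(rel):
    # Non-recursive term: roots start with an empty parent list.
    result = {child: [] for child, parent in rel if parent is None}
    frontier = dict(result)
    while frontier:
        # Recursive term: join rel to the rows found in the previous step.
        next_frontier = {
            child: [parent] + frontier[parent]
            for child, parent in rel
            if parent in frontier
        }
        result.update(next_frontier)
        frontier = next_frontier
    return result


lists = parent_lists(rel)
for child in sorted(lists):
    print(child, lists[child])
```

Each pass of the loop corresponds to one iteration of the recursive term, which stops once a pass produces no new rows.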
DROP FUNCTION IF EXISTS ancestors(text,integer,integer);
CREATE OR REPLACE FUNCTION ancestors(
table_name text,
son_id integer,-- the id of the son you want its ancestors
ancestors integer)-- how many ancestors you want. 0 for every ancestor.
RETURNS integer[]
AS $$
DECLARE
ancestors_list integer[];
father_id integer:=0;
query text;
row integer:=0;
BEGIN
LOOP
query:='SELECT child, parent FROM '||quote_ident(table_name) || ' WHERE child='||son_id;
EXECUTE query INTO son_id,father_id;
RAISE NOTICE 'son:% | father: %',son_id,father_id;
IF son_id IS NOT NULL
THEN
ancestors_list:=array_append(ancestors_list,father_id);
son_id:=father_id;
ELSE
ancestors:=0;
father_id:=0;
END IF;
IF ancestors=0
THEN
EXIT WHEN father_id IS NULL;
ELSE
row:=row+1;
EXIT WHEN ancestors<=row;
END IF;
END LOOP;
RETURN ancestors_list;
END;
$$ LANGUAGE plpgsql;
Once the function is created, to get what you want just query:
SELECT *,ancestors('rel',child,0) from rel
This returns:
child | parent | ancestors
------+--------+-----------------
1 | NULL | {NULL}
2 | 1 | {1,NULL}
3 | 1 | {1,NULL}
4 | 3 | {3,1,NULL}
5 | 2 | {2,1,NULL}
6 | 4 | {4,3,1,NULL}
7 | 2 | {2,1,NULL}
8 | 7 | {7,2,1,NULL}
9 | 8 | {8,7,2,1,NULL}
If you don't want the NULL to appear, just update the PL ;)
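The loop in the function is easier to read once the dynamic SQL is stripped away. As a rough sketch (plain Python, with a dict standing in for the rel table and a hypothetical limit parameter playing the role of the ancestors argument), this variant walks upward and leaves out the trailing NULL:

```python
# child -> parent mapping built from the rel table; None stands in for NULL.
rel = {1: None, 2: 1, 3: 1, 4: 3, 5: 2, 6: 4, 7: 2, 8: 7, 9: 8}


def ancestors(parent_of, son_id, limit=0):
    """Walk upward from son_id; limit=0 means collect every ancestor."""
    found = []
    current = parent_of.get(son_id)
    while current is not None:
        found.append(current)
        if limit and len(found) >= limit:
            break
        current = parent_of.get(current)
    return found


print(ancestors(rel, 9))     # every ancestor of 9
print(ancestors(rel, 9, 2))  # only the two nearest ancestors
```

Stopping before a NULL parent is appended is the change needed to drop the NULL from the result.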

Redshift. Convert comma delimited values into rows with all combinations

I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell
I would like to see:
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | start,stop
1 | Shone | start,cancell
1 | Shone | start,stop,cancell
1 | Shone | stop
1 | Shone | stop,cancell
1 | Shone | cancell
....
You can create the following Python UDF:
create or replace function get_unique_combinations(list varchar(max))
returns varchar(max)
stable as $$
    from itertools import combinations
    arr = list.split(',')
    response = []
    for L in range(1, len(arr)+1):
        for subset in combinations(arr, L):
            response.append(','.join(subset))
    return ';'.join(response)
$$ language plpythonu;
that will take your list of actions and return unique combinations separated by semicolon (elements in combinations themselves will be separated by commas). Then you use a UNION hack to split values into separate rows like this:
WITH unique_combinations as (
SELECT
user_id
,user_name
,get_unique_combinations(user_action) as action_combinations
FROM your_table
)
,unwrap_lists as (
SELECT
user_id
,user_name
,split_part(action_combinations,';',1) as parsed_action
FROM unique_combinations
UNION ALL
SELECT
user_id
,user_name
,split_part(action_combinations,';',2) as parsed_action
FROM unique_combinations
-- add as many UNION ALL branches as the largest number of combinations a single row can produce, increasing the 3rd parameter (1-based index) by 1 each time
)
SELECT *
FROM unwrap_lists
WHERE parsed_action is not null
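To check what the UDF produces before deploying it, the same body can be run as an ordinary Python function. This is a sketch only: the parameter is renamed to actions to avoid shadowing the built-in list, and the split on ';' stands in for the split_part / UNION ALL step.

```python
from itertools import combinations


def get_unique_combinations(actions):
    # Same logic as the UDF body, written as an ordinary Python function.
    arr = actions.split(",")
    response = []
    for size in range(1, len(arr) + 1):
        for subset in combinations(arr, size):
            response.append(",".join(subset))
    return ";".join(response)


packed = get_unique_combinations("start,stop,cancell")
# Splitting on ';' plays the role of the split_part / UNION ALL step.
parsed_actions = packed.split(";")
for action in parsed_actions:
    print(action)
```

A list of n actions yields 2^n - 1 combinations (7 for the three actions here), so the number of UNION ALL branches needed grows quickly with the length of the list.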

Postgres - updates with join gives wrong results

I'm having a hard time understanding what I'm doing wrong.
This query writes the same value into every row instead of updating each row with its own result.
My DATA
I'm trying to update a table of stats over a set of businesses
business_stats ( id SERIAL,
pk integer not null,
b_total integer,
PRIMARY KEY(pk)
);
the details of each business are stored here
business_details (id SERIAL,
category CHARACTER VARYING,
feature_a CHARACTER VARYING,
feature_b CHARACTER VARYING,
feature_c CHARACTER VARYING
);
and here is a table that associates the pk with the category
datasets (id SERIAL,
pk integer not null,
category CHARACTER VARYING,
PRIMARY KEY(pk)
);
WHAT I DID (wrong)
UPDATE business_stats
SET b_total = agg.total
FROM business_stats b,
( SELECT d.pk, count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;
The result of this query is
| id | pk | b_total |
+----+----+-----------+
| 1 | 14 | 273611 |
| 2 | 15 | 273611 |
| 3 | 16 | 273611 |
| 4 | 17 | 273611 |
but if I run just the SELECT the results of each pk are completely different
| pk | agg.total |
+----+-------------+
| 14 | 273611 |
| 15 | 407802 |
| 16 | 179996 |
| 17 | 815580 |
THE QUESTION
why is this happening?
why is the WHERE clause not working?
Before writing this question I've used as reference these posts: a, b, c
Do the following (I always recommend against joins in Updates)
UPDATE business_stats bs
SET b_total =
( SELECT count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk
)
/*optional*/
where exists (SELECT *
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk)
The issue is your FROM clause. The repeated reference to business_stats means you aren't restricting the join like you expect to. You're joining agg against the second unrelated mention of business_stats rather than the row you want to update.
Something like this is what you are after (warning not tested):
UPDATE business_stats AS b
SET b_total = agg.total
FROM
(...) agg
WHERE b.pk = agg.pk;
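The correlated-subquery form from the first answer can be tried on a toy data set. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, with invented categories and row counts; it references the target table by name inside the subquery instead of through an alias, to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE business_stats (id INTEGER, pk INTEGER PRIMARY KEY, b_total INTEGER);
    CREATE TABLE datasets (id INTEGER, pk INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE business_details (id INTEGER PRIMARY KEY, category TEXT);
    INSERT INTO business_stats VALUES (1, 14, NULL), (2, 15, NULL);
    INSERT INTO datasets VALUES (1, 14, 'food'), (2, 15, 'retail');
    INSERT INTO business_details (category) VALUES ('food'), ('food'), ('retail');
    """
)

# The subquery is correlated with the row being updated through business_stats.pk,
# so each row gets the count for its own pk.
conn.execute(
    """
    UPDATE business_stats
    SET b_total = (
        SELECT count(bd.id)
        FROM business_details AS bd
        JOIN datasets AS d ON bd.category = d.category
        WHERE d.pk = business_stats.pk
    )
    """
)

totals = conn.execute("SELECT pk, b_total FROM business_stats ORDER BY pk").fetchall()
print(totals)
```

Each pk ends up with a different total, because the row being updated is tied to the aggregate directly rather than through a second, unrelated reference to business_stats.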

Postgresql replace string in table from another table

I have a problem with two tables that are used for accounting.
The first table, tabela1, holds account strings that contain symbols. The second table holds, for each record ID of the first table, the symbols and the text each symbol has to be replaced with.
Tabela1 is:
ID |KNT_S_WN | KNT_S_MA |
1 |3021-_R | 3021-_K-_W|
2 |_W-_R | _Z |
Tabelas is:
ID | SYMBOL |REP |
1 | _R |7Z45 |
1 | _K |321-05 |
1 | _W |490 |
2 | _W |C1 |
2 | _R |C17 |
2 | _Z |320 |
I need this output:
ID |KNT_S_WN | KNT_S_MA |
1 |3021-7Z45 | 3021-321-05-490|
2 |C1-C17 | 320 |
I try this:
update tabela set
knt_s_wn=replace(knt_s_wn,
(select symbol from tabelas where tabela.id=tabelas.id and position(tabelas.symbol in knt_s_wn)>0),
(select a from tabelas where tabela.id=tabelas.id and position(tabelas.symbol in knt_s_wn)>0))
When I use this expression, knt_s_wn ends up blank whenever no matching symbol is found in it.
Please help me!
One of the simplest solutions is to replace the strings in a loop inside a plpgsql function:
create or replace function multireplace(aid int, str text)
returns text language plpgsql as $$
declare
rec record;
begin
for rec in
select *
from tabelas
where id = aid
loop
str:= replace(str, rec.symbol, rec.rep);
end loop;
return str;
end $$;
Test it here.
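The effect of the loop on the sample data can be checked with a plain-Python equivalent. This is a sketch of the same sequential-replace idea, not the database function itself; the lists below simply copy the rows shown in the question.

```python
# Rows of tabelas from the question: (id, symbol, rep).
tabelas = [
    (1, "_R", "7Z45"), (1, "_K", "321-05"), (1, "_W", "490"),
    (2, "_W", "C1"), (2, "_R", "C17"), (2, "_Z", "320"),
]


def multireplace(aid, text):
    # Apply each replacement for this id in turn, as the plpgsql loop does.
    for row_id, symbol, rep in tabelas:
        if row_id == aid:
            text = text.replace(symbol, rep)
    return text


# Rows of the first table: (id, knt_s_wn, knt_s_ma).
tabela = [(1, "3021-_R", "3021-_K-_W"), (2, "_W-_R", "_Z")]
updated = [(i, multireplace(i, wn), multireplace(i, ma)) for i, wn, ma in tabela]
for row in updated:
    print(row)
```

Because the replacements are applied one after another, a replacement value that itself contains one of the symbols would be replaced again by a later step; the sample data does not trigger this.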
A pure SQL solution (i.e. without procedural SQL) that produces the requested output:
ID |KNT_S_WN | KNT_S_MA |
1 |3021-7Z45 | 3021-321-05-490|
2 |C1-C17 | 320 |
is below:
with recursive t(id, knt_s_wn, knt_s_ma, symbols, reps) as (
select
tabela.id,
knt_s_wn,
knt_s_ma,
array_agg(symbol),
array_agg(rep)
from tabela
join tabelas on tabelas.id = tabela.id
group by 1, 2, 3
union all
select
id,
replace(knt_s_wn, symbols[1], reps[1]),
replace(knt_s_ma, symbols[1], reps[1]),
array_remove(symbols, symbols[1]),
array_remove(reps, reps[1])
from t
where array_length(symbols, 1) > 0
)
select id, knt_s_wn, knt_s_ma
from t
where symbols = array[]::text[];

Need cleaner update method in PostgreSQL 9.1.3

Using PostgreSQL 9.1.3 I have a points table like so (What's the right way to show tables here??)
| Column | Type | Table Modifiers | Storage |
|--------|-------------------|-----------------------------------------------------|----------|
| id | integer | not null default nextval('points_id_seq'::regclass) | plain |
| name | character varying | not null | extended |
| abbrev | character varying | not null | extended |
| amount | real | not null | plain |
In another table, orders I have a bunch of columns where the name of the column exists in the points table via the abbrev column, as well as a total_points column
| Column | Type | Table Modifiers |
|--------------|--------|--------------------|
| ud | real | not null default 0 |
| sw | real | not null default 0 |
| prrv | real | not null default 0 |
| total_points | real | default 0 |
So in orders I have the sw column, and in points I'll have an amount that relates to that column in the row where abbrev = sw.
I have about 15 columns like that, and now I want to set up a trigger so that when I create/update an entry, I calculate a total score. Basically, with just those three shown, I could do it long-hand like this:
UPDATE orders
SET total_points =
ud * (SELECT amount FROM points WHERE abbrev = 'ud') +
sw * (SELECT amount FROM points WHERE abbrev = 'sw') +
prrv * (SELECT amount FROM points WHERE abbrev = 'prrv')
WHERE ....
But that's just plain ugly and repetitive, and like I said there are really 15 of them (right now...). I'm hoping there's a more sophisticated way to handle this.
In general, each of those silly names on the orders table represents a type of work associated with the order, and each of those types has a 'cost' to it, which is stored in the points table. I'm not married to this structure if there's a cleaner setup.
"Serialize" the costs for orders:
CREATE TABLE order_cost (
order_cost_id serial PRIMARY KEY
, order_id int NOT NULL REFERENCES orders
, cost_type_id int NOT NULL REFERENCES points
, cost int NOT NULL DEFAULT 0 -- in cents
);
For a single row:
UPDATE orders o
SET total_points = COALESCE((
SELECT sum(oc.cost * p.amount) AS order_cost
FROM order_cost oc
JOIN points p ON oc.cost_type_id = p.id
WHERE oc.order_id = o.order_id
), 0)
WHERE o.order_id = $<order_id> -- your order_id here ...
Never use the lossy type real for currency data. Use exact types like money, numeric or just integer, where integer is supposed to store the amount in cents.
More advice in this closely related example:
How to implement a many-to-many relationship in PostgreSQL?
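To get a feel for the normalized layout, here is a small sketch of it in action. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, uses integer amounts, and the sample amounts and costs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE points (id INTEGER PRIMARY KEY, abbrev TEXT NOT NULL, amount INTEGER NOT NULL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_points INTEGER DEFAULT 0);
    CREATE TABLE order_cost (
        order_cost_id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders,
        cost_type_id INTEGER NOT NULL REFERENCES points,
        cost INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO points VALUES (1, 'ud', 5), (2, 'sw', 3), (3, 'prrv', 10);
    INSERT INTO orders (order_id) VALUES (1), (2);
    INSERT INTO order_cost (order_id, cost_type_id, cost)
    VALUES (1, 1, 2), (1, 2, 4), (1, 3, 1);
    """
)

# One statement covers every cost type; a new type is a new row, not a new column.
conn.execute(
    """
    UPDATE orders
    SET total_points = COALESCE((
        SELECT sum(oc.cost * p.amount)
        FROM order_cost oc
        JOIN points p ON oc.cost_type_id = p.id
        WHERE oc.order_id = orders.order_id
    ), 0)
    """
)

totals = conn.execute(
    "SELECT order_id, total_points FROM orders ORDER BY order_id"
).fetchall()
print(totals)
```

Order 2 has no cost rows, so the sum is NULL and COALESCE turns it into 0, which is the case the COALESCE in the answer is there for.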