I'm comparing PostgreSQL vs. SQLServer for migrating purposes. Now I'm evaluating T-SQL vs. PL/pgSQL, the thing is that in T-SQL you can use loops or declare variables, for example:
declare #counter int
set #counter = 0
while #counter < 10
begin
set #counter = #counter + 1
print 'The counter is ' + cast(#counter as char)
end
There is no need to put it inside a function or procedure. Can I do that in PostgreSQL?
Searching on the web I found a negative answer doing it in MySQL but I didn't find such answer for Postgres.
You cannot DECLARE (global) variables (there are workarounds) nor loop with plain SQL - with the exception of recursive CTEs as provided by #bma (which is actually iterating over rows, not looping, strictly speaking).
However, there is the DO statement for such ad-hoc procedural code. Introduced with Postgres 9.0. It works like a one-time function, but does not take any parameters and does not return anything. You can RAISE notices et al, so your example would just work fine:
DO
$do$
DECLARE
_counter int := 0;
BEGIN
WHILE _counter < 10
LOOP
_counter := _counter + 1;
RAISE NOTICE 'The counter is %', _counter; -- coerced to text automatically
END LOOP;
END
$do$
If not specified otherwise, the language in the body defaults to plpgsql. You can use any registered procedural language though, if you declare it (like: LANGUAGE plpython).
Postgres also offers generate_series() to generate sets ad-hoc, which may obviate the need for looping in many cases. Try a search here on SO for examples.
Also, you can use the WHERE clause in a data-modifying CTE in plain SQL to fork cases and emulate IF .. THEN .. ELSE .. END ...
You can recursively query result sets using WITH RECURSIVE, assuming you are on Postgresql 8.4+. Docs: http://www.postgresql.org/docs/current/static/queries-with.html
This would allow you to loop your set and process the data in various ways.
Related
I have Postgres function that needs to iterate on an ARRAY of tables_name and should save the value that will be returned from the query each time to array.
maybe this is not correct way so if there is better ways to do it I'll be glad to know :)
I've try with format function to generate different queries each time.
CREATE OR REPLACE FUNCTION array_iter(tables_name text[],idd integer)
RETURNS void
LANGUAGE 'plpgsql'
AS $BODY$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
FOREACH current_table IN ARRAY $1
LOOP
quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table);
current_height:=EXECUTE quer;
SELECT array_append(heights, current_height);
END LOOP;
RAISE NOTICE '%', heights;
end;
$BODY$;
First off you desperately need to update your Postgres as version 9.1 is no longer supported and has not been for 5 years (Oct 2016). I would suggest going to v13 as it is the latest, but an absolute minimum to 10.12. That will still has slightly over a year (Nov 2022) before it looses support. So with that in mind.
The statement quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table); is invalid, it contains 2 format specifiers but only 1 argument. You could use the single argument by including the argument number on each specifier. So quer:=format('SELECT height FROM %1s WHERE %1s.idd=$2', current_table);. But that is not necessary as the 2nd is a table alias which need not be the table name and since you only have 1 table is not needed at all. I would however move the parameter ($2) out of the select and use a format specifiers/argument for it.
The statement current_height:=EXECUTE quer; is likewise invalid, you cannot make the Right Val of assignment a select. For this you use the INTO option which follows the statement. execute query into ....
While SELECT array_append(heights, current_height); is a valid statement a simple assignment heights = heights || current_height; seems easier (at least imho).
Finally there a a couple omissions. Prior to running a dynamic SQL statement it is good practice to 'print' or log the statement before executing. What happens when the statement has an error. And why build a function to do all this work just to throw away the results, so instead of void return integer array (integer[]).
So we arrive at:
create or replace function array_iter(tables_name text[],idd integer)
returns integer[]
language plpgsql
as $$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
foreach current_table in array tables_name
loop
quer:=format('select height from %I where id=%s', current_table,idd);
raise notice 'Running Query:: %',quer;
execute quer into current_height;
heights = heights || current_height;
end loop;
raise notice '%', heights;
return heights;
exception
when others then
raise notice 'Query failed:: SqlState:%, ErrorMessage:%', sqlstate,sqlerrm;
raise;
end;
$$;
This does run on version as old as 9.5 (see fiddle) although I cannot say that it runs on the even older 9.1.
I have written an AGGREGATE function that approximates a SELECT COUNT(DISTINCT ...) over a UUID column, a kind of poor man's HyperLogLog (and having different perf characteristics).
However, it is very slow because I am using set_bit on a BIT and that has copy-on-write semantics.
So my question is:
is there a way to inplace / mutably update a BIT or bytea?
failing that, are there any binary data structures that allow mutable/in-place set_bit edits?
A constraint is that I can't push C code or extensions to implement this. But I can use extensions that are available in AWS RDS postgres. If it's not faster than HLL then I'll just be using HLL. Note that HLL is optimised for pre-aggregated counts, it isn't terribly fast at doing adhoc count estimates over millions of rows (although still faster than a raw COUNT DISTINCT).
Below is the code for context, probably buggy too:
CREATE OR REPLACE FUNCTION uuid_approx_count_distinct_sfunc (BIT(83886080), UUID)
RETURNS BIT(83886080) AS $$
DECLARE
s BIT(83886080) := $1;
BEGIN
IF s IS NULL THEN s := '0' :: BIT(83886080); END IF;
RETURN set_bit(s, abs(mod(uuid_hash($2), 83886080)), 1);
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION uuid_approx_count_distinct_ffunc (BIT(83886080))
RETURNS INTEGER AS $$
DECLARE
i INTEGER := 83886079;
s INTEGER := 0;
BEGIN
LOOP
EXIT WHEN i < 0;
IF get_bit($1, i) = 1 THEN s := s + 1; END IF;
i := i - 1;
END LOOP;
RETURN s;
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE approx_count_distinct (UUID) (
SFUNC = uuid_approx_count_distinct_sfunc,
FINALFUNC = uuid_approx_count_distinct_ffunc,
STYPE = BIT(83886080),
COMBINEFUNC = bitor,
PARALLEL = SAFE
);
Yeah, SQL isn't actually that fast for raw computation. I might try a UDF, perhaps pljava or plv8 (JavaScript) which compile just-in-time to native and available on most major hosting providers. Of course for performance, use C (perhaps via LLVM) for maximum performance at maximum pain. Plv8 should take minutes to prototype, just pass an array constructed from array_agg(). Obviously keep the array size to millions of items, or find a way to roll-up your sketches ( bitwuse-AND ?)
https://plv8.github.io/#function-calls
https://www.postgresqltutorial.com/postgresql-aggregate-functions/postgresql-array_agg-function/
FYI HyperLogLog is available as an open source extension for PostgreSQL from Citus/Microsoft and of course available on Azure. https://www.google.com/search?q=hyperloglog+postgres
(You could crib from their coffee and just change the core algorithm, then test side by side). Citus is pretty easy to install, so this isn't a bad option.
I have to count the number of bits in postgresql on very large integer columns for which i wrote a postgresql funtion to count the number of bits in an integer.
CREATE OR REPLACE FUNCTION bitcount(i integer) RETURNS integer AS $$
DECLARE n integer;
DECLARE bitCount integer;
BEGIN
bitCount := 0;
LOOP
IF i = 0 THEN
EXIT;
END IF;
i := i & (i-1);
bitCount:= bitCount+1;
END LOOP;
RETURN bitCount;
END
$$ LANGUAGE plpgsql;
but i found one more way to do this using pg's inbuilt functions as well
like
SELECT char_length( replace(100::bit(31)::TEXT, '0', ''));
so i decided to compare performance of both of the ways
so i used below queries to compare
First
SELECT a.n, bitcount(a.n)
from generate_series(1, 100000) as a(n);
Second
SELECT a.n, char_length( replace(a.n::bit(31)::TEXT, '0', ''))
FROM generate_series(1, 100000) as a(n);
I was expecting that First method will outperform the second one
but to my surprise both of them performed almost same. In fact on my machine second one always completed a few seconds faster even with large number of integers.
can anyone explain me why second is almost as fast as first despite of using cast and string operation ?
I'd say it is because PL/pgSQL is known to be slow.
Write the function in PL/Perl or PL/Python for better performance.
I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.
The plan is to call the function like this:
-- hypothetical "settings" table, with a primary key of (user_id, setting):
SELECT merge(
$$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
$$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);
Here's the full code of the merge() function:
CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
max_iterations INTEGER := 10;
i INTEGER := 0;
num_updated INTEGER;
BEGIN
-- usually returns before re-entering the loop
LOOP
-- first try the update
EXECUTE update_sql;
GET DIAGNOSTICS num_updated = ROW_COUNT;
IF num_updated > 0 THEN
RETURN 'UPDATE';
END IF;
-- nothing was updated: try the insert, watching out for concurrent inserts
BEGIN
EXECUTE insert_sql;
RETURN 'INSERT';
EXCEPTION WHEN unique_violation THEN
-- nop; just loop and try again from the top
END;
-- emergency brake
i := i + 1;
IF i >= max_iterations THEN
RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
EXIT;
END IF;
END LOOP;
END;
$func$
LANGUAGE plpgsql;
It appears to work well enough in my tests, but I'm not certain if I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?
Among the resources I consulted for this function are:
UPDATE/INSERT example 40.2 in the PostgreSQL manual
Why is UPSERT so complicated?
SO: Insert, on duplicate update (postgresql)
(Edit: one of the goals was to avoid locking the target table.)
The answer to your question depends your the context of how your application(s) will access the database. There are many ways to solve this as nicely discussed in depesz's post you cited by yourself. In addition you might want to also consider using writeable CTEs see here. Also the [question]Insert, on duplicate update in PostgreSQL? has some interesting discussions for your decision making process.
First of all, yes I've read documentation for DO statement :)
http://www.postgresql.org/docs/9.1/static/sql-do.html
So my question:
I need to execute some dynamic block of code that contains UPDATE statements and calculate the number of all affected rows. I'm using Ado.Net provider.
In Oracle the solution would have 4 steps:
add InputOutput parameter "N" to command
add BEGIN ... END; to command
add :N := :N + sql%rowcount after each statement.
It's done! We can read N parameter from command, after execute it.
How can I do it with PostgreSQL? I'm using npgsql provider but can migrate to devard if it helps.
DO statement blocks are good to execute dynamic SQL. They are no good to return values. Use a plpgsql function for that.
The key statement you need is:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Example code:
CREATE OR REPLACE FUNCTION f_upd_some()
RETURNS integer AS
$func$
DECLARE
ct int;
i int;
BEGIN
EXECUTE 'UPDATE tbl1 ...'; -- something dynamic here
GET DIAGNOSTICS ct = ROW_COUNT; -- initialize with 1st count
UPDATE tbl2 ...; -- nothing dynamic here
GET DIAGNOSTICS i = ROW_COUNT;
ct := ct + i; -- add up
RETURN ct;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_upd_some();
My solution is quite simple. In Oracle I need to use variables to calculate the sum of updated rows because command.ExecuteNonQuery() returns only the count of rows affected by the last UPDATE in the batch.
However, npgsql returns the sum of all rows updated by all UPDATE queries. So I only need to call command.ExecuteNonQuery() and get the result without any variables. Much easier than with Oracle.