PostgreSQL window function over blocks of consecutive IDs - postgresql

I have a table with partially consecutive integer ids, i.e. there are blocks such as 1,2,3, 6,7,8, 10, 23,24,25,26.
The gap size is dynamic.
The length of the blocks is dynamic.
I am racking my brain over a simple solution that selects from the table and includes a column whose value corresponds to the first id of the respective block, i.e. something like this:
select id, first(id) over <what goes here?> first from table;
The result should look like the following:
| id | first |
|----|-------|
| 1  | 1     |
| 2  | 1     |
| 3  | 1     |
| 6  | 6     |
| 7  | 6     |
| 8  | 6     |
| 10 | 10    |
| 23 | 23    |
| 24 | 23    |
| 25 | 23    |
| 26 | 23    |
Afterwards I could use this column nicely with the PARTITION BY clause of a window function.
What I came up with so far always looked similar to this and didn't succeed:
WITH foo AS (
  SELECT LAG(id) OVER (ORDER BY id) AS previous_id,
         id,
         id - LAG(id, 1, id) OVER (ORDER BY id) AS first_in_sequence
  FROM table)
SELECT *,
       FIRST_VALUE(id) OVER (ORDER BY id) AS first
FROM foo
ORDER BY id;
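For what it's worth, the standard "gaps and islands" trick also solves this without a custom aggregate: id minus row_number() is constant within each consecutive run, so it can serve as a partition key. Below is a minimal sketch of that idea using Python's bundled sqlite3 (assuming an SQLite build with window-function support, 3.25+); the table name t is made up for the demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")
con.executemany("INSERT INTO t VALUES (?)",
                [(i,) for i in (1, 2, 3, 6, 7, 8, 10, 23, 24, 25, 26)])

# id - row_number() is constant within each block of consecutive ids,
# so MIN(id) over that derived key yields the first id of each block.
rows = con.execute("""
    SELECT id, MIN(id) OVER (PARTITION BY grp) AS first
    FROM (SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
          FROM t) AS s
    ORDER BY id
""").fetchall()
print(rows)
```

The same derived-column query translates to Postgres almost verbatim.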
Defining a custom postgres function would also be an acceptable solution.
Thanks for any advice,
Marti

In Postgres you can create a custom aggregate. Example:
create or replace function first_in_series_func(int[], int)
returns int[] language sql immutable
as $$
    select case
        when $1[2] is distinct from $2 - 1 then array[$2, $2]
        else array[$1[1], $2]
    end;
$$;

create or replace function first_in_series_final(int[])
returns int language sql immutable
as $$
    select $1[1];
$$;

create aggregate first_in_series(int) (
    sfunc = first_in_series_func,
    finalfunc = first_in_series_final,
    stype = int[]
);
Db<>fiddle.
Read in the docs: User-Defined Aggregates
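To see what the state array is doing, here is a rough Python rendition of the same fold (my own sketch, not part of the answer): the state is [first_of_block, previous_id], the transition function starts a new block whenever the current value is not previous + 1, and the final function returns the first element.

```python
def first_in_series(ids):
    """Mimic the aggregate run as a window: state is [first_of_block, previous_id]."""
    state = [None, None]
    out = []
    for i in ids:                      # sfunc applied row by row, in id order
        if state[1] is None or state[1] != i - 1:
            state = [i, i]             # gap detected: start a new block
        else:
            state = [state[0], i]      # still inside the same block
        out.append(state[0])           # finalfunc: return state[0]
    return out

print(first_in_series([1, 2, 3, 6, 7, 8, 10, 23, 24, 25, 26]))
```

In SQL this corresponds to calling first_in_series(id) over (order by id).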

Here is an idea of how this could be done. An implicit cursor is not terribly efficient, though.
create or replace function ff()
returns table (r_id integer, r_first integer)
language plpgsql as
$$
declare
    running_previous integer;
    running_id integer;
    running_first integer := null;
begin
    for running_id in select id from _table order by id loop
        if running_previous is distinct from running_id - 1 then
            running_first := running_id;
        end if;
        r_id := running_id;
        r_first := running_first;
        running_previous := running_id;
        return next;
    end loop;
end
$$;
-- test
select * from ff() as t(id, first);

Related

Postgres 12 - CREATE AGGREGATE looks right, but results never return

I've been wanting a reason to try out CREATE AGGREGATE, and now have one: Root Mean Square/Quadratic Mean. I posted some broken code that I've since corrected, based on helpful suggestions from jjanes. Here's the working setup, with my custom tools and types schemas; you could use your own.
Now that it's working, I'm finding that the custom aggregate is dramatically slower than the raw SQL. The grouping field is indexed; the aggregated field is not. Is this speed difference to be expected, and can it be overcome in SQL or PL/pgSQL?
First, here's the working code:
------------------------------------------------------
-- Create compound type to pass down processing chain
------------------------------------------------------
DROP TYPE types.rms_state CASCADE;
CREATE TYPE types.rms_state AS (
    running_count int4,
    running_sum_squares int4
);
------------------------------------------------------
-- Create the per-row function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_row_function(types.rms_state, int4);
CREATE FUNCTION tools.rms_row_function (
    rms_data_in types.rms_state,
    value_from_row int4
)
RETURNS types.rms_state
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
    rms_data_out types.rms_state;
BEGIN
    -- RAISE NOTICE 'rms_row_function: rms_data_in: %', rms_data_in::text;
    rms_data_out.running_count := rms_data_in.running_count + 1;
    rms_data_out.running_sum_squares := rms_data_in.running_sum_squares + (value_from_row ^ 2);
    RETURN rms_data_out;
END;
$BODY$;
------------------------------------------------------
-- Create the final results function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_result_function(types.rms_state);
CREATE FUNCTION tools.rms_result_function (
    rms_data_in types.rms_state
)
RETURNS real
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
    rms_out real;
BEGIN
    -- RAISE NOTICE 'rms_result_function: rms_data_in: %', rms_data_in::text;
    IF (rms_data_in.running_count = 0) THEN
        rms_out := 0;
    ELSE
        rms_out := (rms_data_in.running_sum_squares / rms_data_in.running_count)::real;
        rms_out := rms_out ^ 0.5; -- Take the square root and return it
    END IF;
    RETURN rms_out;
END;
$BODY$;
------------------------------------------------------
-- Create the aggregate bindings/declaration
------------------------------------------------------
CREATE AGGREGATE tools.rms (int4)
(
    sfunc = tools.rms_row_function,
    finalfunc = tools.rms_result_function,
    stype = types.rms_state,
    finalfunc_modify = READ_WRITE,
    initcond = '(0,0)' -- Reset for each group; must be a textual version of the state data.
);
I'm using a field named analytic_productivity.num_inst in my example, but it could be any int4 field. Here's a stripped-down table declaration:
CREATE TABLE IF NOT EXISTS data.analytic_productivity (
    id uuid NOT NULL,
    facility_id uuid NOT NULL,
    num_inst integer NOT NULL DEFAULT 0
);
The facility table is included in the query for a name lookup:
select facility.name_ as facility_name,
       sqrt(avg(power(num_inst, 2))) as inst_rms, -- root mean square / quadratic mean
       rms(num_inst) as inst_rms_check
from analytic_productivity
left join facility on facility.id = analytic_productivity.facility_id
group by 1
order by 1;
Below are some sample results.
+---------------+--------------------+----------------+
| facility_name | inst_rms           | inst_rms_check |
+---------------+--------------------+----------------+
| Anderson      | 5.191804567965901  | 5.0990195      |
| Baldwin North | 42.24082451064157  | 42.237423      |
| Curvey        | 41.75334367003306  | 41.749252      |
| Daodge Creeek | 28.75910443926612  | 28.757608      |
| Edgards       | 42.430040392954375 | 42.426407      |
+---------------+--------------------+----------------+
I'm not alarmed by the slight difference in scores, as I'm using a real, which only carries about six significant digits.
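As a cross-check, RMS is easy to state in plain Python. One hedged observation: the aggregate's state fields are int4, and in Postgres int4 / int4 truncates, so the running_sum_squares / running_count step may drop the fractional part of the mean before the square root is taken; the flag below imitates that. This is my own sketch, not code from the post.

```python
import math

def rms(values, integer_divide=False):
    """Root mean square; integer_divide imitates int4/int4 truncation."""
    count = len(values)
    if count == 0:
        return 0.0
    sum_squares = sum(v * v for v in values)
    # Integer division truncates the mean, as division of int4 state fields would.
    mean = sum_squares // count if integer_divide else sum_squares / count
    return math.sqrt(mean)

print(rms([3, 4, 5]))                       # exact mean
print(rms([3, 4, 5], integer_divide=True))  # truncated mean
```

Comparing the two outputs shows the kind of small drift seen between inst_rms and inst_rms_check above.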

postgres, bulk update using data from another table

I have one target table (already populated with data) and another one (a source table) from which I need to retrieve data into the first one.
target_table
postgres=# select id,id_user from ttasks;
id | id_user
----+---------
1 |
2 |
3 |
4 |
5 |
(5 rows)
source_table
postgres=# select id from tusers where active;
id
------
1011
1012
1013
1014
(4 rows)
I need to update id_user column of ttasks table using id's from tusers table, so final result on ttasks should be:
# expected result after update [select id, id_user from ttasks;]
id | id_user
----+---------
1 | 1011
2 | 1012
3 | 1013
4 | 1014
5 | 1011
(5 rows)
What I have tried (similar to an INSERT ... FROM ... statement):
postgres=# update ttasks t1 set id_user = q1.id from (select id from tusers where active) q1 returning t1.id,t1.id_user;
id | id_user
----+---------
1 | 1011
2 | 1011
3 | 1011
4 | 1011
5 | 1011
(5 rows)
but this query always uses the first id from my q1 subquery.
Any idea, help, or even a solution on how I can accomplish this task?
Thank You very much!
p.s. This is my first post on this community so please be gentle with me if something in my question is not conforming with your rules.
Finally, after one of my friends told me that not everything can be coded in a "keep it simple, stupid" manner, I wrote a PL/pgSQL function that does the job for me and, beyond that, allows the use of some advanced filters.
CREATE OR REPLACE FUNCTION assign_workers_to_tasks(
    i_workers_table regclass,
    i_workers_table_tc text,
    i_tasks_table regclass,
    i_tasks_table_tc text,
    i_workers_filter text DEFAULT ''::text,
    i_tasks_filter text DEFAULT ''::text)
RETURNS void AS
$BODY$
DECLARE
    workers int[];
    i integer;
    total_workers integer;
    r record;
    get_tasks text;
begin
    i_workers_filter := 'where ' || nullif(i_workers_filter, '');
    i_tasks_filter := 'where ' || nullif(i_tasks_filter, '');
    EXECUTE format('select array_agg(%s) from (select %s from %s %s order by %s) q',
                   i_workers_table_tc, i_workers_table_tc, i_workers_table,
                   i_workers_filter, i_workers_table_tc)
        INTO workers; -- available [filtered] workers
    total_workers := coalesce(array_length(workers, 1), 0); -- count of available [filtered] workers
    IF total_workers = 0 THEN
        EXECUTE format('update %s set %s = null %s', i_tasks_table, i_tasks_table_tc, i_tasks_filter);
        RETURN;
    END IF;
    i := 1;
    get_tasks := format('select * from %s %s', i_tasks_table, i_tasks_filter); -- [filtered] tasks
    FOR r IN EXECUTE get_tasks LOOP
        EXECUTE format('update %s set %s = %s where id = %s',
                       i_tasks_table, i_tasks_table_tc, workers[i], r.id);
        i := i + 1;
        IF i > total_workers THEN i := 1; END IF;
    END LOOP;
    RETURN;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION assign_workers_to_tasks(regclass, text, regclass, text, text, text)
OWNER TO postgres;
and to answer my own question:
select assign_workers_to_tasks('tusers','id','ttasks','id_user','active');
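The pairing the function produces is a plain round-robin over the workers. A compact Python sketch of the same idea (names are mine, not from the post):

```python
from itertools import cycle

def assign_workers_to_tasks(task_ids, worker_ids):
    """Pair each task with a worker, wrapping around when workers run out."""
    workers = cycle(worker_ids)          # endless repetition of the worker list
    return {task: next(workers) for task in task_ids}

print(assign_workers_to_tasks([1, 2, 3, 4, 5], [1011, 1012, 1013, 1014]))
```

With five tasks and four workers, the fifth task wraps back to the first worker, matching the expected result in the question.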

Improving PL/pgSQL function

I just finished writing my first PL/pgSQL function. Here is what it does.
The function attempts to reset duplicate timestamps to NULL:
1. From the table call_records, find all timestamps that are duplicated (using GROUP BY).
2. Loop through each timestamp and find all records with the same timestamp (times - 1 of them, so that exactly one record keeps the given timestamp).
3. For all the records found in step 2, update the timestamp to NULL.
Here is how the function looks:
CREATE OR REPLACE FUNCTION nullify() RETURNS INTEGER AS $$
DECLARE
    T call_records.timestamp%TYPE;
    -- Not sure why row_type does not work
    -- R call_records%ROWTYPE;
    S integer;
    CRNS bigint[];
    TMPS bigint[];
    sql_stmt varchar = '';
BEGIN
    FOR T, S IN (select timestamp, count(timestamp) as times
                 from call_records
                 where timestamp IS NOT NULL
                 group by timestamp
                 having count(timestamp) > 1)
    LOOP
        sql_stmt := format('SELECT ARRAY(select plain_crn from call_records where timestamp=%s limit %s)', T, S - 1);
        EXECUTE sql_stmt INTO TMPS;
        CRNS := array_cat(CRNS, TMPS);
    END LOOP;
    sql_stmt := format('update call_records set timestamp=null where plain_crn in (%s)', array_to_string(CRNS, ','));
    RAISE NOTICE '%', sql_stmt;
    EXECUTE sql_stmt;
    RETURN 1;
END
$$ LANGUAGE plpgsql;
Help me understand the PL/pgSQL language better by suggesting how this can be done better.
@a_horse_with_no_name: Here is how the DB structure looks:
\d+ call_records;
id integer primary key
plain_crn bigint
timestamp bigint
efd integer default 0
 id |    efd     | plain_crn  |   timestamp
----+------------+------------+----------------
  1 | 2016062936 | 8777444059 | 14688250050095
  2 | 2016062940 | 8777444080 | 14688250050095
  3 | 2016063012 | 8880000000 | 14688250050020
  4 | 2016043011 | 8000000000 | 14688240012012
  5 | 2016013011 | 8000000001 | 14688250050020
  6 | 2016022011 | 8440000001 |
Now,
select timestamp, count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp having count(timestamp) > 1;

   timestamp    | times
----------------+-------
 14688250050095 |     2
 14688250050020 |     2
All that I want is to update the duplicate timestamp to null so that only one of them record has the given timestamp.
In short, after the update, the query below should return a result like this:
select timestamp, count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp;

   timestamp    | times
----------------+-------
 14688250050095 |     1
 14688250050020 |     1
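The intended end state can be described procedurally: scan the records in some order, keep the first record seen for each timestamp, and null out later duplicates. A small Python sketch of that rule (my illustration, not the function under review):

```python
def nullify_duplicates(records):
    """Keep the first record for each timestamp; set later duplicates to None."""
    seen = set()
    out = []
    for rec_id, ts in records:
        if ts is not None and ts in seen:
            ts = None          # duplicate timestamp: nullify it
        elif ts is not None:
            seen.add(ts)       # first occurrence: remember it
        out.append((rec_id, ts))
    return out
```

Running this over the sample rows above leaves exactly one record per timestamp, which is the invariant the SQL function is after.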
You can use array variables directly (filter with the predicate = ANY()); using dynamic SQL is wrong for this purpose:
postgres=# DO $$
DECLARE
    x int[] = '{1,2,3}';
    result int[];
BEGIN
    SELECT array_agg(v)
    FROM generate_series(1,10) g(v)
    WHERE v = ANY(x)
    INTO result;
    RAISE NOTICE 'result is: %', result;
END;
$$;
NOTICE: result is: {1,2,3}
DO
Next, this is a typical void function; it doesn't return anything interesting. Usually such functions return nothing when all is OK, or raise an exception. Returning 1 with RETURN 1 is useless.
CREATE OR REPLACE FUNCTION foo(par int)
RETURNS void AS $$
BEGIN
    IF EXISTS(SELECT * FROM footab WHERE id = par)
    THEN
        ...
    ELSE
        RAISE EXCEPTION 'Missing data for parameter: %', par;
    END IF;
END;
$$ LANGUAGE plpgsql;
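The same check-then-raise shape translates directly to other languages. A minimal Python analogue (the ensure_exists name and dict-backed store are made up for this sketch):

```python
def ensure_exists(store, key):
    """Raise if the key is absent, mirroring the RAISE EXCEPTION branch."""
    if key not in store:
        raise KeyError(f"Missing data for parameter: {key}")
    # all is OK: do the real work here, return nothing
```

The caller either proceeds silently or gets an exception with the offending parameter, which is the pattern the answer recommends over returning status codes.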

Create dynamic tables based on a loop in PostgreSQL 9.2

I have a function in which I want to create a table for every year, based on the year extracted from the bill date, which I will loop over.
CREATE OR REPLACE FUNCTION ccdb.ccdb_archival()
RETURNS void AS
$BODY$
DECLARE
    dpsql text;
    i smallint;
BEGIN
    FOR i IN SELECT DISTINCT EXTRACT(year FROM bill_date) FROM ccdb.bills ORDER BY 1 LOOP
        DO $$
        BEGIN
            CREATE TABLE IF NOT EXISTS ccdb_archival.bills||i (LIKE ccdb.bills INCLUDING ALL);
            BEGIN
                ALTER TABLE ccdb_archival.bills ADD COLUMN archival_date timestamp;
            EXCEPTION
                WHEN duplicate_column THEN RAISE NOTICE 'column archival_date already exists in <table_name>.';
            END;
        END;
        $$;
        INSERT INTO ccdb_archival.bills
        SELECT *, now() AS archival_date
        FROM ccdb.bills
        WHERE bill_date::date >= current_date - interval '3 years' AND bill_date::date < current_date - interval '8 years';
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
I want to concatenate the year with the actual table name for each year.
I am unable to do the same with the above code. I get an error:
ERROR: syntax error at or near "||"
LINE 3: CREATE TABLE IF NOT EXISTS ccdb_archival.bills||i (LI...
Please suggest how do I achieve my requirement.
You cannot compose identifiers from strings like that; you need to build the statement as text and run it with EXECUTE: http://www.postgresql.org/docs/9.1/static/ecpg-sql-execute-immediate.html
To create N tables with a common prefix, use this script.
The code uses a FOR loop and a variable to create 10 tables with the prefix 'sbtest', namely sbtest1, sbtest2, ... sbtest10.
create_table.sql
do $$
DECLARE
    myvar integer;
begin
    for myvar in 1..10 loop
        EXECUTE format('CREATE TABLE sbtest%s (
            id SERIAL NOT NULL,
            k INTEGER NOT NULL,
            c CHAR(120) NOT NULL,
            pad CHAR(60) NOT NULL,
            PRIMARY KEY (id))', myvar);
    end loop;
end; $$
Run it using psql -U user_name -d database_name -f create_table.sql
Example: table sbtest1 looks like this:

id | k | c | pad
----+---+---+-----
(0 rows)

Table "public.sbtest1"
 Column |      Type      | Nullable |               Default               | Storage
--------+----------------+----------+-------------------------------------+----------
 id     | integer        | not null | nextval('sbtest1_id_seq'::regclass) | plain
 k      | integer        | not null |                                     | plain
 c      | character(120) | not null |                                     | extended
 pad    | character(60)  | not null |                                     | extended
Indexes:
    "sbtest1_pkey" PRIMARY KEY, btree (id)
Access method: heap
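The loop boils down to string-formatting one DDL statement per iteration and executing it. A Python sketch that just generates the ten statements (the helper name is mine; actually executing them would still require a Postgres driver such as psycopg2):

```python
def create_table_statements(prefix, n):
    """Generate CREATE TABLE statements for prefix1 .. prefixN."""
    template = ("CREATE TABLE {name} (\n"
                "  id SERIAL NOT NULL,\n"
                "  k INTEGER NOT NULL,\n"
                "  c CHAR(120) NOT NULL,\n"
                "  pad CHAR(60) NOT NULL,\n"
                "  PRIMARY KEY (id))")
    # One statement per table, with the loop counter appended to the prefix.
    return [template.format(name=f"{prefix}{i}") for i in range(1, n + 1)]

stmts = create_table_statements("sbtest", 10)
print(stmts[0].splitlines()[0])
```

Note that this simple interpolation is only safe because the names come from a trusted counter; for user-supplied identifiers, proper quoting (format() with %I in Postgres, or a driver's identifier helper) is needed.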

In a PostgreSQL WHERE clause, how do I find all rows whose ID's are NOT in an array?

Okay, I've got a stored procedure... how do I find all rows whose IDs are not in an array? (Keep in mind that I'm using a PostgreSQL array that is created dynamically when the stored procedure is created.)
Example:
| people |
-------------
| id | Name |
-------------
| 1 | Bob |
| 2 | Te |
| 3 | Jack |
| 4 | John |
The array has somePGArray := [1,3], so a pseudo-query would look like this:
SELECT * FROM people WHERE id NOT IN (somePGArray)
Resulting query:
| people |
-------------
| id | Name |
-------------
| 2 | Te |
| 4 | John |
As a bonus, I also have no idea how to create an array and append IDs to it, so if you have a quick hint on how to do that, it'd be tremendously helpful. :-)
create table foo1 (id integer, name text);
insert into foo1 values (1,'Bob'),(2,'Te'),(3,'Jack'),(4,'John');

select * from foo1 where id not in (1,2);
select * from foo1 where not (id = ANY(ARRAY[1,2]));

create or replace function so_example(int)
returns SETOF foo1 as $$
declare
    id alias for $1;
    idlist int[] := '{1}';
    q text;
    rec record;
begin
    idlist := idlist || ARRAY[id];
    q := 'select * from foo1 where not (id = ANY(' || quote_literal(idlist) || '))';
    raise notice 'foo % %', idlist, q;
    for rec in execute q loop
        return next rec;
    end loop;
end; $$
language plpgsql;
select * from so_example(3);
If you're talking about an actual PostgreSQL array, use
SELECT * FROM people WHERE NOT id = ANY ('{1,3}')
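The semantics of WHERE NOT id = ANY(array) amount to set exclusion; a small Python sketch of the same filter (names are mine):

```python
def exclude_ids(rows, excluded):
    """Return rows whose first element (the id) is not in the excluded collection."""
    excluded = set(excluded)    # set membership mirrors NOT ... = ANY(array)
    return [row for row in rows if row[0] not in excluded]

people = [(1, "Bob"), (2, "Te"), (3, "Jack"), (4, "John")]
print(exclude_ids(people, [1, 3]))
```

With the people table from the question and the array [1,3], only Te and John survive, matching the expected result.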
Just remove the square brackets:
WHERE id NOT IN (1,2)
As a bonus, your NOT IN clause could be populated with a subquery, which would look like:
Where id not in (select id from sometable where some_field = somevalue)
You could also do dynamic string concatenation to generate a comma-separated set and inject it into an ad hoc query.