I have a function in which I want to create a table for every year, based on the year extracted from the bill date, which I loop over.
CREATE OR REPLACE FUNCTION ccdb.ccdb_archival()
RETURNS void AS
$BODY$
DECLARE dpsql text;
DECLARE i smallint;
BEGIN
FOR i IN SELECT DISTINCT EXTRACT(year FROM bill_date) FROM ccdb.bills ORDER BY 1 LOOP
DO $$
BEGIN
CREATE TABLE IF NOT EXISTS ccdb_archival.bills||i (LIKE ccdb.bills INCLUDING ALL);
BEGIN
ALTER TABLE ccdb_archival.bills ADD COLUMN archival_date timestamp;
EXCEPTION
WHEN duplicate_column THEN RAISE NOTICE 'column archival_date already exists in <table_name>.';
END;
END;
$$;
INSERT INTO ccdb_archival.bills
SELECT *, now() AS archival_date
FROM ccdb.bills
WHERE bill_date::date >= current_date - interval '3 years' AND bill_date::date < current_date - interval '8 years';
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
I want to concatenate the year with the actual table name for each year.
I am unable to do the same with the above code. I get an error:
ERROR: syntax error at or near "||"
LINE 3: CREATE TABLE IF NOT EXISTS ccdb_archival.bills||i (LI...
Please suggest how I can achieve this.
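You cannot build an identifier such as a table name by concatenation inside a static SQL statement. Build the whole statement as a string and run it with EXECUTE instead: http://www.postgresql.org/docs/9.1/static/ecpg-sql-execute-immediate.html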
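Applied to the function in the question, that could look roughly like the sketch below. It is only an illustration under a few assumptions: PostgreSQL 9.6 or later (for ADD COLUMN IF NOT EXISTS), the variable names yr and tbl_name are made up here, and the per-year filter in the INSERT is my guess at the intent, since the WHERE clause in the question can never be true as written.

```sql
DO $$
DECLARE
    yr       integer;  -- hypothetical loop variable
    tbl_name text;     -- hypothetical: holds the per-year table name
BEGIN
    FOR yr IN
        SELECT DISTINCT EXTRACT(year FROM bill_date)::int
        FROM   ccdb.bills
        ORDER  BY 1
    LOOP
        tbl_name := 'bills' || yr;  -- e.g. bills2015

        -- %I quotes the value as an identifier when necessary
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS ccdb_archival.%I (LIKE ccdb.bills INCLUDING ALL)',
            tbl_name);

        -- Assumes 9.6+; on older versions keep the duplicate_column handler
        EXECUTE format(
            'ALTER TABLE ccdb_archival.%I ADD COLUMN IF NOT EXISTS archival_date timestamp',
            tbl_name);

        -- Assumption: each year's rows go into that year's table.
        -- The year is passed as a value with USING, not pasted into the string.
        EXECUTE format(
            'INSERT INTO ccdb_archival.%I
             SELECT *, now() FROM ccdb.bills
             WHERE  EXTRACT(year FROM bill_date) = $1',
            tbl_name)
        USING yr;
    END LOOP;
END
$$;
```

Note that running this more than once would insert the same rows again, so a real archival job would also need to delete or mark the archived rows.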
To create N tables with a common prefix, use this script.
This code uses a FOR loop and a variable to create 10 tables with the prefix 'sbtest', namely sbtest1, sbtest2 ... sbtest10.
create_table.sql
do $$
DECLARE myvar integer;
begin
    for myvar in 1..10 loop
        EXECUTE format('CREATE TABLE sbtest%s (
            id SERIAL NOT NULL,
            k INTEGER NOT NULL,
            c CHAR(120) NOT NULL,
            pad CHAR(60) NOT NULL,
            PRIMARY KEY (id))', myvar);
    end loop;
end; $$
Run it using psql -U user_name -d database_name -f create_table.sql
For example, table sbtest1 looks like this:
 id | k | c | pad
----+---+---+-----
(0 rows)

                                         Table "public.sbtest1"
 Column |      Type      | Collation | Nullable |               Default               | Storage  | Stats target | Description
--------+----------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id     | integer        |           | not null | nextval('sbtest1_id_seq'::regclass) | plain    |              |
 k      | integer        |           | not null |                                     | plain    |              |
 c      | character(120) |           | not null |                                     | extended |              |
 pad    | character(60)  |           | not null |                                     | extended |              |
Indexes:
    "sbtest1_pkey" PRIMARY KEY, btree (id)
Access method: heap
I've been wanting a reason to try out CREATE AGGREGATE, and now have one: Root Mean Square/Quadratic Mean. I posted some broken code, that I've corrected, based on helpful suggestions from jjanes. Here's the working setup, with my custom tools and types schemas...you could use your own.
Now that it's working, I'm finding that the custom aggregate is dramatically slower than raw SQL. The grouping field is indexed, the aggregated field is not. Is this speed difference to be expected, and can it be overcome in SQL or PL/PgSQL?
First, here's the working code:
------------------------------------------------------
-- Create compound type to pass down processing chain
------------------------------------------------------
DROP TYPE types.rms_state CASCADE;
CREATE TYPE types.rms_state AS (
running_count int4,
running_sum_squares int4
);
------------------------------------------------------
-- Create the per-row function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_row_function(types.rms_state, int4);
CREATE FUNCTION tools.rms_row_function (
rms_data_in types.rms_state,
value_from_row int4
)
RETURNS types.rms_state
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
rms_data_out types.rms_state;
BEGIN
-- RAISE NOTICE 'rms_row_function: rms_data_in: %', rms_data_in::text;
rms_data_out.running_count := rms_data_in.running_count + 1;
rms_data_out.running_sum_squares := rms_data_in.running_sum_squares + (value_from_row ^ 2);
RETURN rms_data_out;
END;
$BODY$;
------------------------------------------------------
-- Create the final results function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_result_function(types.rms_state);
CREATE FUNCTION tools.rms_result_function (
rms_data_in types.rms_state
)
RETURNS real
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
rms_out real;
BEGIN
-- RAISE NOTICE 'rms_result_function: rms_data_in: %', rms_data_in::text;
IF (rms_data_in.running_count = 0) THEN
rms_out := 0;
ELSE
rms_out := (rms_data_in.running_sum_squares / rms_data_in.running_count)::real;
rms_out := rms_out ^ 0.5; -- Get the square root and return it
END IF;
RETURN rms_out;
END;
$BODY$;
------------------------------------------------------
-- Create the aggregate bindings/declaration
------------------------------------------------------
CREATE AGGREGATE tools.rms (int4)
(
sfunc = tools.rms_row_function,
finalfunc = tools.rms_result_function,
stype = types.rms_state,
FINALFUNC_MODIFY = READ_WRITE,
initcond = '(0,0)' -- Reset on each group, must be a textual version of state data.
);
I'm using a field named analytic_productivity.num_inst in my example, but it could be any int4 field. Here's a stripped-down table declaration:
CREATE TABLE IF NOT EXISTS data.analytic_productivity (
    id uuid NOT NULL DEFAULT NULL,
    facility_id uuid NOT NULL DEFAULT NULL,
    num_inst integer NOT NULL DEFAULT 0
);
The facility table is included in the query for a name lookup:
select facility.name_ as facility_name,
sqrt(avg(power(num_inst, 2))) as inst_rms, -- root mean square/quadratic mean,
rms(num_inst) as inst_rms_check
from analytic_productivity
left join facility on facility.id = analytic_productivity.facility_id
group by 1
order by 1
Below are some sample results.
+---------------+--------------------+----------------+
| facility_name | inst_rms           | inst_rms_check |
+---------------+--------------------+----------------+
| Anderson      | 5.191804567965901  | 5.0990195      |
| Baldwin North | 42.24082451064157  | 42.237423      |
| Curvey        | 41.75334367003306  | 41.749252      |
| Daodge Creeek | 28.75910443926612  | 28.757608      |
| Edgards       | 42.430040392954375 | 42.426407      |
+---------------+--------------------+----------------+
I'm not alarmed about the slight difference in scores, as I'm using a real, which only supports six decimals.
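One thing worth checking about that difference: the inst_rms_check values square to almost exactly whole numbers (5.0990195 squared is about 26, 42.426407 squared is about 1800). That pattern is what I would expect if the mean were truncated before the square root, which can happen because running_sum_squares and running_count are both int4 and the / in rms_result_function is then integer division. A small sketch of the effect, with made-up numbers:

```sql
-- Integer division drops the fractional part before sqrt() ever sees it.
SELECT 27 / 2               AS int_division,    -- 13
       27::float8 / 2       AS float_division,  -- 13.5
       sqrt(27 / 2)         AS truncated_root,
       sqrt(27::float8 / 2) AS exact_root;

-- A possible change inside rms_result_function (untested against the real data):
--   rms_out := sqrt(rms_data_in.running_sum_squares::float8
--                   / rms_data_in.running_count);
```

If that is the cause, the precision of real is not what explains the gap, and casting one operand to float8 before dividing should bring the two columns much closer together.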
I have a table with partially consecutive integer ids, i.e. there are blocks such as 1,2,3, 6,7,8, 10, 23,24,25,26.
the gap size is dynamic
the length of the blocks is dynamic
I am breaking my head about a simple solution that selects from the table
and includes a column where the value corresponds to the first id of the respective block.
I.e. something like this
select id, first(id) over <what goes here?> first from table;
The result should look as following
| id | first |
|----|-------|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 6 | 6 |
| 7 | 6 |
| 8 | 6 |
| 10 | 10 |
| 23 | 23 |
| 24 | 23 |
| 25 | 23 |
| 26 | 23 |
Afterwards I could use this column nicely in the PARTITION BY clause of a window function.
What I came up with so far always looked similar to this and didn't succeed:
WITH foo AS (
SELECT LAG(id) OVER (ORDER BY id) AS previous_id,
id AS id,
id - LAG(id, 1, id) OVER (ORDER BY id) AS first_in_sequence
FROM table)
SELECT *,
FIRST_VALUE(id) OVER (ORDER BY id) AS first
FROM foo
ORDER BY id;
Defining a custom postgres function would also be an acceptable solution.
Thanks for any advice,
Marti
In Postgres you can create a custom aggregate. Example:
create or replace function first_in_series_func(int[], int)
returns int[] language sql immutable
as $$
    select case
        when $1[2] is distinct from $2 - 1 then array[$2, $2]
        else array[$1[1], $2] end;
$$;

create or replace function first_in_series_final(int[])
returns int language sql immutable
as $$
    select $1[1]
$$;

create aggregate first_in_series(int) (
    sfunc = first_in_series_func,
    finalfunc = first_in_series_final,
    stype = int[]
);
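The aggregate can then be used as a window function. A minimal usage sketch, assuming the two functions and the aggregate above have been created; the inline VALUES list stands in for the real table and uses the ids from the question:

```sql
-- The state array holds {first id of the current block, last id seen}.
-- ORDER BY id makes the window frame grow one row at a time.
SELECT id,
       first_in_series(id) OVER (ORDER BY id) AS first
FROM  (VALUES (1),(2),(3),(6),(7),(8),(10),(23),(24),(25),(26)) AS t(id);
```

The state is rebuilt as {id, id} whenever the previous id is not exactly one less than the current one, which is what starts a new block.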
Read in the docs: User-Defined Aggregates
Here is an idea how this could be done. An implicit cursor is not horribly efficient though.
create or replace function ff()
returns table (r_id integer, r_first integer)
language plpgsql as
$$
declare
    running_previous integer;
    running_id integer;
    running_first integer := null;
begin
    for running_id in select id from _table order by id loop
        if running_previous is distinct from running_id - 1 then
            running_first := running_id;
        end if;
        r_id := running_id;
        r_first := running_first;
        running_previous := running_id;
        return next;
    end loop;
end
$$;
-- test
select * from ff() as t(id, first);
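For comparison with the loop, here is a set-based sketch that uses only built-in window functions. It relies on the common trick that id minus row_number() is constant within a block of consecutive ids, and it assumes the ids are unique. The CTE names t and grouped are made up, and the VALUES list stands in for the real table.

```sql
WITH t(id) AS (
    VALUES (1),(2),(3),(6),(7),(8),(10),(23),(24),(25),(26)
),
grouped AS (
    -- Consecutive ids share the same difference to their row number
    SELECT id,
           id - row_number() OVER (ORDER BY id) AS grp
    FROM   t
)
SELECT id,
       min(id) OVER (PARTITION BY grp) AS first
FROM   grouped
ORDER  BY id;
```

With the sample ids this gives 1 for ids 1 to 3, 6 for ids 6 to 8, 10 for id 10 and 23 for ids 23 to 26.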
I created a function that doesn't work well. It doesn't enter the loop and I don't understand why.
My func:
CREATE OR REPLACE FUNCTION Control_Reports_Pg.control_reports_fn (P_Report_Type smallint, P_Log_File_Name text,C_Path text) RETURNS bigint AS $body$
DECLARE
V_Return smallint;
V_Function_Return smallint:=1;
C_Daily_Reports varchar(400);
C_Function_Name varchar(200) := 'Control_Reports_Fn';
Rec_Daily_Reports CONTROL_REPORTS%ROWTYPE;
BEGIN
C_Daily_Reports := 'SELECT REPORT_ORDER,PROCEDURE_NAME,DIRECTORY_NAME,FILE_NAME,TITLE FROM CONTROL_REPORTS WHERE RUN_FLAG=1::bigint AND REPORT_TYPE=' || P_Report_Type || '::smallint ORDER BY REPORT_ORDER ';
RAISE NOTICE 'sql to run over in loop : %',C_Daily_Reports;
FOR Rec_Daily_Reports IN EXECUTE C_Daily_Reports
LOOP
RAISE NOTICE 'INSIDE LOOP OF CONTROL_REPORTS, Procedure_name : %, File_name : %, Directory_name : %',Rec_Daily_Reports.Directory_Name,Rec_Daily_Reports.File_Name, Rec_Daily_Reports.Title;
END LOOP;
........
mydb=> \d control_reports;
Table "mysc.control_reports"
Column | Type | Modifiers
----------------+------------------------+-----------
report_order | bigint |
report_type | smallint |
procedure_name | character varying(100) |
directory_name | character varying(100) |
file_name | character varying(100) |
title | character varying(500) |
run_flag | bigint |
The errors I get when I run the func from psql:
mysc=> select control_reports_pg.control_reports_fn(1::smallint ,
'daily_log_control_file.txt'::text,'/PostgreSQL/comb_logs'::text);
NOTICE: sql to run over in loop : SELECT
REPORT_ORDER,PROCEDURE_NAME,DIRECTORY_NAME,FILE_NAME,TITLE FROM CONTROL_REPORTS
WHERE RUN_FLAG=1::bigint AND REPORT_TYPE=1::smallint ORDER BY REPORT_ORDER
NOTICE: FUNC : Control_Reports_Fn, SQLERRM: invalid input syntax for
integer: "Chrg_in_b"
When I run the select in psql I don't get any errors and I get a result. Chrg_in_b is the value of the column Procedure_name of the first row that I get back from the select query. (If I drop the ORDER BY I just get a different procedure_name but same error).
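A likely explanation, judging from the \d output: Rec_Daily_Reports is declared as CONTROL_REPORTS%ROWTYPE, so its fields follow the table's column order (report_order, report_type, procedure_name, ...), while the query returns report_order, procedure_name, directory_name, ... The values are assigned by position, so procedure_name ('Chrg_in_b') lands in the smallint field report_type and the cast fails. One way around it, sketched here as a standalone DO block with a made-up variable name rec, is to loop into a plain record, which takes the shape of the query result:

```sql
DO $$
DECLARE
    rec record;  -- hypothetical name; adopts the columns the query returns
BEGIN
    FOR rec IN EXECUTE
        'SELECT report_order, procedure_name, directory_name, file_name, title
         FROM   control_reports
         WHERE  run_flag = 1
         AND    report_type = $1
         ORDER  BY report_order'
        USING 1::smallint  -- stands in for P_Report_Type
    LOOP
        RAISE NOTICE 'procedure: %, directory: %, file: %',
            rec.procedure_name, rec.directory_name, rec.file_name;
    END LOOP;
END
$$;
```

Alternatively, keeping the %ROWTYPE variable and selecting * (or all columns in table order) should also avoid the mismatch. Passing the report type with USING also removes the need to concatenate it into the query string.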
I just finished writing my first PL/pgSQL function. Here is what it does.
The function attempts to reset duplicate timestamps to NULL.
1. From table call_records, find all timestamps that are duplicated (using GROUP BY).
2. Loop through each timestamp and find the records with that timestamp, limited to times-1 rows, so that exactly one record keeps the given timestamp.
3. For all the records found in step 2, update the timestamp to NULL.
Here is how the function looks.
CREATE OR REPLACE FUNCTION nullify() RETURNS INTEGER AS $$
DECLARE
T call_records.timestamp%TYPE;
-- Not sure why row_type does not work
-- R call_records%ROWTYPE;
S integer;
CRNS bigint[];
TMPS bigint[];
sql_stmt varchar = '';
BEGIN
FOR T,S IN (select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp having count(timestamp) > 1)
LOOP
sql_stmt := format('SELECT ARRAY(select plain_crn from call_records where timestamp=%s limit %s)',T,S-1);
EXECUTE sql_stmt INTO TMPS;
CRNS := array_cat(CRNS,TMPS);
END LOOP;
sql_stmt = format('update call_records set timestamp=null where plain_crn in (%s)',array_to_string(CRNS,','));
RAISE NOTICE '%',sql_stmt;
EXECUTE sql_stmt ;
RETURN 1;
END
$$ LANGUAGE plpgsql;
Help me understand PL/pgSQL better by suggesting how this could be improved.
@a_horse_with_no_name: Here is how the DB structure looks
\d+ call_records;
id integer primary key
plain_crn bigint
timestamp bigint
efd integer default 0
id | efd | plain_crn | timestamp
----------+------------+------------+-----------
1 | 2016062936 | 8777444059 | 14688250050095
2 | 2016062940 | 8777444080 | 14688250050095
3 | 2016063012 | 8880000000 | 14688250050020
4 | 2016043011 | 8000000000 | 14688240012012
5 | 2016013011 | 8000000001 | 14688250050020
6 | 2016022011 | 8440000001 |
Now,
select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp having count(timestamp) > 1
timestamp | count
-----------------+-----------
14688250050095 | 2
14688250050020 | 2
All I want is to set the duplicate timestamps to NULL so that only one record keeps each given timestamp.
In short, after the update the same query without the HAVING clause should return a result like this
select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp;
timestamp | count
-----------------+-----------
14688250050095 | 1
14688250050020 | 1
You can use array variables directly (filter with the predicate = ANY()). Using dynamic SQL is wrong for this purpose:
postgres=# DO $$
DECLARE x int[] = '{1,2,3}';
  result int[];
BEGIN
  SELECT array_agg(v)
    FROM generate_series(1,10) g(v)
   WHERE v = ANY(x)
    INTO result;
  RAISE NOTICE 'result is: %', result;
END;
$$;
NOTICE: result is: {1,2,3}
DO
Next: this is a typical void function, since it doesn't return anything interesting. Such functions usually return nothing when all is OK, or raise an exception. The RETURN 1 is useless.
CREATE OR REPLACE FUNCTION foo(par int)
RETURNS void AS $$
BEGIN
  IF EXISTS(SELECT * FROM footab WHERE id = par)
  THEN
    ...
  ELSE
    RAISE EXCEPTION 'Missing data for parameter: %', par;
  END IF;
END;
$$ LANGUAGE plpgsql;
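Putting both points together for nullify() from the question, a sketch without dynamic SQL and without a loop might look like this. It assumes the call_records table shown in the question; the function name nullify_duplicates and the choice to keep the row with the lowest id per timestamp are my own assumptions, since the question does not say which row should keep its value.

```sql
CREATE OR REPLACE FUNCTION nullify_duplicates()  -- hypothetical name
RETURNS void
LANGUAGE plpgsql AS
$$
BEGIN
    UPDATE call_records AS c
    SET    "timestamp" = NULL
    FROM  (
        -- Number the rows within each timestamp; rn = 1 is the row that is kept
        SELECT id,
               row_number() OVER (PARTITION BY "timestamp" ORDER BY id) AS rn
        FROM   call_records
        WHERE  "timestamp" IS NOT NULL
    ) AS d
    WHERE  c.id = d.id
    AND    d.rn > 1;
END
$$;

-- Usage
SELECT nullify_duplicates();
```

With the sample rows from the question, this would clear the timestamp on ids 2 and 5 and leave ids 1 and 3 as the single holders of their timestamps.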
CREATE OR REPLACE FUNCTION getParentLtree(parent_id bigint, tbl_name varchar)
RETURNS ltree AS
$BODY$
DECLARE
parent_ltree ltree;
BEGIN
-- This works fine:
-- select into parent_ltree l_tree from tbl1 where id = parent_id;
EXECUTE format('select into parent_ltree l_tree from %I
where id = %I', tbl_name,parent_id);
RETURN parent_ltree;
END;
$BODY$ LANGUAGE plpgsql;
There are 2 issues in the above function:
parent_id is integer but it is replaced with quotes? What is the correct format specifier for int variables?
select into does not work with EXECUTE? How can I make above commented query to use table name passed?
This would be shorter, faster and safer:
CREATE OR REPLACE FUNCTION get_parent_ltree(parent_id bigint, tbl_name regclass
                                          , OUT parent_ltree ltree)
  LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE format('SELECT l_tree FROM %s WHERE id = $1', tbl_name)
   INTO  parent_ltree
   USING parent_id;
END
$func$;
Why?
Most importantly, use the USING clause of EXECUTE for parameter values. Don't convert them to text, concatenate and interpret them back. That would be slower and error-prone.
Normally you would use the %I specifier with format() for identifiers like the table name. For existing tables, a regclass object-identifier type may be even better. See:
Table name as a PostgreSQL function parameter
The OUT parameter makes it simpler. Performance is the same.
Don't use unquoted CaMeL case identifiers like getParentLtree in Postgres. Details in the manual.
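A call could then look like the following. This is only a usage sketch: it assumes a table tbl1 with columns id and l_tree exists, as in the commented-out query in the question, and myschema is a made-up schema name.

```sql
-- The string literal is cast to regclass, so a misspelled or missing
-- table raises an error before any dynamic SQL is built.
SELECT get_parent_ltree(1, 'tbl1');

-- A schema-qualified name works as well (myschema is hypothetical).
SELECT get_parent_ltree(1, 'myschema.tbl1');
```

Because tbl_name is a regclass, format() with %s prints it already quoted and schema-qualified where needed, which is why %I is not used in the function body.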
Use %s for strings. %I is for identifiers:
select format('select into parent_ltree l_tree from %I where id = %s', 'tbl1', 1);
format
---------------------------------------------------------
select into parent_ltree l_tree from tbl1 where id = 1
http://www.postgresql.org/docs/current/static/functions-string.html#FUNCTIONS-STRING-FORMAT
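To see why the integer ended up in quotes, it may help to compare the three format() specifiers side by side. The literal values below are made up for illustration:

```sql
SELECT format('%I', 'tbl1')      AS plain_identifier,   -- tbl1
       format('%I', 'My Table')  AS quoted_identifier,  -- "My Table"
       format('%I', 42)          AS number_as_ident,    -- "42"
       format('%L', 'O''Reilly') AS quoted_literal,     -- 'O''Reilly'
       format('%s', 42)          AS plain_string;       -- 42
```

%I treats its argument as an identifier and double-quotes anything that is not a simple lower-case name, which includes a bare number. %L produces a safely quoted literal, and %s inserts the value as-is.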
PL/pgSQL's SELECT INTO is not the same as PostgreSQL's SQL-level SELECT INTO. To create a table from a query result, use CREATE TABLE AS instead:
create table parent_ltree as
select l_tree
from tbl1
where id = 1
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
Tip: Note that this interpretation of SELECT with INTO is quite different from PostgreSQL's regular SELECT INTO command, wherein the INTO target is a newly created table. If you want to create a table from a SELECT result inside a PL/pgSQL function, use the syntax CREATE TABLE ... AS SELECT.
To select into a variable from an execute statement:
EXECUTE format('select l_tree from %I where id = %s', tbl_name,parent_id)
into parent_ltree;
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN