Postgresql dynamic dataset aggregation - postgresql

I'm trying to aggregate multiple datasets where I have to be able to create dynamic rules on how to aggregate the data on an id basis. Below is a quick approach I wrote up that seems to do what I intended. Is there a more performant (and preferably safe) way of doing this. tbl1...n will be larger in size than agg_rule. I'm running postgres 13.0, if there are new features coming in 14 that'll help, that could also be of interest. I am only interesting in coalescing and sum:ing like below if that simplifies the problem.
CREATE TABLE tbl1 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl2 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl3 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE agg_rule AS
SELECT 1 id
, 'coalesce(tbl1.val, tbl2.val + tbl3.val)' expr;
CREATE OR REPLACE FUNCTION eval(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE _res INTEGER;
BEGIN
EXECUTE 'SELECT ' || (
SELECT expr
FROM agg_rule
WHERE id = _id
) || '
FROM tbl1
FULL JOIN tbl2 USING (id, seq)
FULL JOIN tbl3 USING (id, seq)
WHERE id = ' || _id || '
AND seq = ' || _seq || ';' INTO _res;
RETURN _res;
END
$$;

Related

PgSQL function returning table and extra data computed in process

In PgSQL I make huge select, and then I want count it's size and apply some extra filters.
execute it twice sound dumm,
so I wrapped it in function
and then "cache" it and return union of filtered table and extra row at the end where in "id" column store size
with q as (select * from myFunc())
select * from q
where q.distance < 400
union all
select count(*) as id, null,null,null
from q
but it also doesn't look like proper solution...
and so the question: is in pg something like "generator function" or any other stuff that can properly solve this ?
postgreSQL 13
myFunc aka "selectItemsByRootTag"
CREATE OR REPLACE FUNCTION selectItemsByRootTag(
in tag_name VARCHAR(50)
)
RETURNS table(
id BIGINT,
name VARCHAR(50),
description TEXT,
/*info JSON,*/
distance INTEGER
)
AS $$
BEGIN
RETURN QUERY(
WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name = (tags_name)
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
/*points.info,*/
points.distance
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
);
END;
$$ LANGUAGE plpgsql;
as a result i want see maybe set of 2 table or generator function that yield some intermediate result not shure how exacltly it should look
demo
CREATE OR REPLACE FUNCTION pg_temp.selectitemsbyroottag(tag_name text, _distance numeric)
RETURNS TABLE(id bigint, name text, description text, distance numeric, count bigint)
LANGUAGE plpgsql
AS $function$
DECLARE _sql text;
BEGIN
_sql := $p1$WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name ilike '%$p1$ || tag_name || $p2$%'
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
points.distance,
count(*) over ()
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
and points.distance > $p2$ || _distance
;
raise notice '_sql: %', _sql;
return query execute _sql;
END;
$function$
You can call it throug following way
select * from pg_temp.selectItemsByRootTag('test',20);
select * from pg_temp.selectItemsByRootTag('test_8',20) with ORDINALITY;
The 1 way to call the function, will have a row of total count total number of rows. Second way call have number of rows plus a serial incremental number.
I also make where q.distance < 400 into function input argument.
selectItemsByRootTag('test',20); means that q.distance > 20 and tags.name ilike '%test%'.

How to use a recursive query inside of a function?

I have a working recursive function down below that returns the hierarchy of a .Net Element, in this example 'ImageBrush'
WITH RECURSIVE r AS (
SELECT id, name, dependent_id, 1 AS level
FROM dotNetHierarchy
WHERE name = 'ImageBrush'
UNION ALL
SELECT g.id, lpad(' ', r.level) || g.name, g.dependent_id, r.level + 1
FROM dotNetHierarchy g JOIN r ON g.id = r.dependent_id
)
SELECT name FROM r;
Now I want to use it inside a function, where the input is a text like 'ImageBrush' which gets inserted in the WHERE statement of the recursive query.
The table dotNetHierarchy has 3 columns: id, name, dependent_id
The following didnĀ“t work with PgAdmin just querying forever without an error:
CREATE OR REPLACE FUNCTION get_hierarchy(inp text) RETURNS
TABLE (id int, name text, id int, lev int) AS
$BODY$
DECLARE
BEGIN
WITH RECURSIVE r AS (
SELECT id, name, dependent_id, 1 AS level
FROM dotNetHierarchy
WHERE name = inp
UNION ALL
SELECT g.id, lpad(' ', r.level) || g.name, g.dependent_id, r.level + 1
FROM dotNetHierarchy g JOIN r ON g.id = r.dependent_id
)
SELECT name FROM r;
return;
END
$BODY$
LANGUAGE plpgsql;
I tried searching google for using recursive querys in functions, but the few results proved not to be working.
I would appreciate any help, thanks in advance.
Your function doesn't return anything. You would at least need a return query, but it's more efficient to use a language sql function to encapsulate a query:
CREATE OR REPLACE FUNCTION get_hierarchy(inp text)
RETURNS TABLE (id int, name text, id int, lev int) AS
$BODY$
WITH RECURSIVE r AS (
SELECT id, name, dependent_id, 1 AS level
FROM dotNetHierarchy
WHERE name = inp
UNION ALL
SELECT g.id, lpad(' ', r.level) || g.name, g.dependent_id, r.level + 1
FROM dotNetHierarchy g JOIN r ON g.id = r.dependent_id
)
SELECT name
FROM r;
$BODY$
LANGUAGE sql;

I am getting Dollar sign unterminated

I want to create a function like below which inserts data as per the input given. But I keep on getting an error about undetermined dollar sign.
CREATE OR REPLACE FUNCTION test_generate
(
ref REFCURSOR,
_id INTEGER
)
RETURNS refcursor AS $$
DECLARE
BEGIN
DROP TABLE IF EXISTS test_1;
CREATE TEMP TABLE test_1
(
id int,
request_id int,
code text
);
IF _id IS NULL THEN
INSERT INTO test_1
SELECT
rd.id,
r.id,
rd.code
FROM
test_2 r
INNER JOIN
raw_table rd
ON
rd.test_2_id = r.id
LEFT JOIN
observe_test o
ON
o.raw_table_id = rd.id
WHERE o.id IS NULL
AND COALESCE(rd.processed, 0) = 0;
ELSE
INSERT INTO test_1
SELECT
rd.id,
r.id,
rd.code
FROM
test_2 r
INNER JOIN
raw_table rd
ON rd.test_2_id = r.id
WHERE r.id = _id;
END IF;
DROP TABLE IF EXISTS tmp_test_2_error;
CREATE TEMP TABLE tmp_test_2_error
(
raw_table_id int,
test_2_id int,
error text,
record_num int
);
INSERT INTO tmp_test_2_error
(
raw_table_id,
test_2_id,
error,
record_num
)
SELECT DISTINCT
test_1.id,
test_1.test_2_id,
'Error found ' || test_1.code,
0
FROM
test_1
WHERE 1 = 1
AND data_origin.id IS NULL;
INSERT INTO tmp_test_2_error
SELECT DISTINCT
test_1.id,
test_1.test_2_id,
'Error found ' || test_1.code,
0
FROM
test_1
INNER JOIN
data_origin
ON
data_origin.code = test_1.code
WHERE dop.id IS NULL;
DROP table IF EXISTS test_latest;
CREATE TEMP TABLE test_latest AS SELECT * FROM observe_test WHERE 1 = 2;
INSERT INTO test_latest
(
raw_table_id,
series_id,
timestamp
)
SELECT
test_1.id,
ds.id AS series_id,
now()
FROM
test_1
INNER JOIN data_origin ON data_origin.code = test_1.code
LEFT JOIN
observe_test o ON o.raw_table_id = test_1.id
WHERE o.id IS NULL;
CREATE TABLE latest_observe_test as Select * from test_latest where 1=0;
INSERT INTO latest_observe_test
(
raw_table_id,
series_id,
timestamp,
time
)
SELECT
t.id,
ds.id AS series_id,
now(),
t.time
FROM
test_latest t
WHERE t.series_id IS DISTINCT FROM observe_test.series_id;
DELETE FROM test_2_error re
USING t
WHERE t.test_2_id = re.test_2_id;
INSERT INTO test_2_error (test_2_id, error, record_num)
SELECT DISTINCT test_2_id, error, record_num FROM tmp_test_2_error ORDER BY error;
UPDATE raw_table AS rd1
SET processed = case WHEN tre.raw_table_id IS null THEN 2 ELSE 1 END
FROM test_1 tr
LEFT JOIN
tmp_test_2_error tre ON tre.raw_table_id = tr.id
WHERE rd1.id = tr.id;
OPEN ref FOR
SELECT 1;
RETURN ref;
OPEN ref for
SELECT o.* from observe_test o
;
RETURN ref;
OPEN ref FOR
SELECT
rd.id,
ds.id AS series_id,
now() AS timestamp,
rd.time
FROM test_2 r
INNER JOIN raw_table rd ON rd.test_2_id = r.id
INNER JOIN data_origin ON data_origin.code = rd.code
WHERE o.id IS NULL AND r.id = _id;
RETURN ref;
END;
$$ LANGUAGE plpgsql VOLATILE COST 100;
I am not able to run this procedure.
Can you please help me where I have done wrong?
I am using squirrel and face the same question as you.
until I found that:
-- Note that if you want to create the function under Squirrel SQL,
-- you must go to Sessions->Session Properties
-- then SQL tab and change the Statement Separator from ';' to something else
-- (for intance //). Otherwise Squirrel SQL sends one piece to the server
-- that stops at the first encountered ';', and the server cannot make
-- sense of it. With the separator changed as suggested, you type everything
-- as above and end with
-- ...
-- end;
-- $$ language plpgsql
-- //
--
-- You can then restore the default separator, or use the new one for
-- all queries ...
--

Postgres find all rows in database tables matching criteria on a given column

I am trying to write sub-queries so that I search all tables for a column named id and since there are multiple tables with id column, I want to add the condition, so that id = 3119093.
My attempt was:
Select *
from information_schema.tables
where id = '3119093' and id IN (
Select table_name
from information_schema.columns
where column_name = 'id' );
This didn't work so I tried:
Select *
from information_schema.tables
where table_name IN (
Select table_name
from information_schema.columns
where column_name = 'id' and 'id' IN (
Select * from table_name where 'id' = 3119093));
This isn't the right way either. Any help would be appreciated. Thanks!
A harder attempt is:
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE'
--AND c.column_name = "id"
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text) like %L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
END IF;
END LOOP;
END;
$$ language plpgsql;
select * from search_columns('%3119093%'::varchar,'{}'::name[]) ;
The only problem is this code displays the table name and column name. I have to then manually enter
Select * from table_name where id = 3119093
where I got the table name from the code above.
I want to automatically implement returning rows from a table but I don't know how to get the table name automatically.
I took the time to make it work for you.
For starters, some information on what is going on inside the code.
Explanation
function takes two input arguments: column name and column value
it requires a created type that it will be returning a set of
first loop identifies tables that have a column name specified as the input argument
then it forms a query which aggregates all rows that match the input condition inside every table taken from step 3 with comparison based on ILIKE - as per your example
function goes into the second loop only if there is at least one row in currently visited table that matches specified condition (then the array is not null)
second loop unnests the array of rows that match the condition and for every element it puts it in the function output with RETURN NEXT rec clause
Notes
Searching with LIKE is inefficient - I suggest adding another input argument "column type" and restrict it in the lookup by adding a join to pg_catalog.pg_type table.
The second loop is there so that if more than 1 row is found for a particular table, then every row gets returned.
If you are looking for something else, like you need key-value pairs, not just the values, then you need to extend the function. You could for example build json format from rows.
Now, to the code.
Test case
CREATE TABLE tbl1 (col1 int, id int); -- does contain values
CREATE TABLE tbl2 (col1 int, col2 int); -- doesn't contain column "id"
CREATE TABLE tbl3 (id int, col5 int); -- doesn't contain values
INSERT INTO tbl1 (col1, id)
VALUES (1, 5), (1, 33), (1, 25);
Table stores data:
postgres=# select * From tbl1;
col1 | id
------+----
1 | 5
1 | 33
1 | 25
(3 rows)
Creating type
CREATE TYPE sometype AS ( schemaname text, tablename text, colname text, entirerow text );
Function code
CREATE OR REPLACE FUNCTION search_tables_for_column (
v_column_name text
, v_column_value text
)
RETURNS SETOF sometype
LANGUAGE plpgsql
STABLE
AS
$$
DECLARE
rec sometype%rowtype;
v_row_array text[];
rec2 record;
arr_el text;
BEGIN
FOR rec IN
SELECT
nam.nspname AS schemaname
, cls.relname AS tablename
, att.attname AS colname
, null::text AS entirerow
FROM
pg_attribute att
JOIN pg_class cls ON att.attrelid = cls.oid
JOIN pg_namespace nam ON cls.relnamespace = nam.oid
WHERE
cls.relkind = 'r'
AND att.attname = v_column_name
LOOP
EXECUTE format('SELECT ARRAY_AGG(row(tablename.*)::text) FROM %I.%I AS tablename WHERE %I::text ILIKE %s',
rec.schemaname, rec.tablename, rec.colname, quote_literal(concat('%',v_column_value,'%'))) INTO v_row_array;
IF v_row_array is not null THEN
FOR rec2 IN
SELECT unnest(v_row_array) AS one_row
LOOP
rec.entirerow := rec2.one_row;
RETURN NEXT rec;
END LOOP;
END IF;
END LOOP;
END
$$;
Exemplary call & output
postgres=# select * from search_tables_for_column('id','5');
schemaname | tablename | colname | entirerow
------------+-----------+---------+-----------
public | tbl1 | id | (1,5)
public | tbl1 | id | (1,25)
(2 rows)

Postgresql, select a "fake" row

In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row. Eg, as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random(),
)
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query with a single row where parent_id is 1 and random_id is a random number (or other function return value) but I don't want this record to persist in the table or impact on the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to the temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
And use this temp table to generate dummy rows. Such table will be dropped at the end of the session (or transaction) and will only be visible to current session.
You can query the catalog and build a dynamic query
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this sting as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
then execute this query and you will get the "phantom row".
We can build a function that build and excecutes the query and return our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their type, but this is reasonable. If you change the table schema (except default value), you would need to tweak the query.
See the above as a SQLFiddle.