In PostgreSQL, how do I execute a recursive query with a "dynamic" clause?
I'm using a recursive query because my data is hierarchical: an object can have children, which can have children of their own, and so on.
My goal is to find the last children of a searched object. My recursive query uses a specific value in the WHERE clause (A in this example, the searched object). These values are stored in another table (called my_table here). A second table, called my_table_filiation here, stores every relation between objects (A to B, A to C, A to D, D to E, D to F...). I need to repeat this recursive query for each distinct value of my_table.cadastral_reference (A, B, C, D, E, F).
In other words, how can I dynamically change a clause in a recursive query and run it for every distinct value in a table?
The tables look like this:
CREATE TABLE IF NOT EXISTS my_table
(
id integer NOT NULL DEFAULT nextval('my_table_id_seq'::regclass),
cadastral_reference character varying(14) COLLATE pg_catalog."default",
filiation character varying COLLATE pg_catalog."default",
CONSTRAINT my_table_pkey PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS my_table_filiation
(
id integer NOT NULL DEFAULT nextval('my_table_filiation_id_seq'::regclass),
mother character varying(14) COLLATE pg_catalog."default",
daughter character varying(14) COLLATE pg_catalog."default",
CONSTRAINT my_table_filiation_pkey PRIMARY KEY (id)
);
The recursive query looks like this:
WITH RECURSIVE q_filiation (mother, daughter) AS (
SELECT mother, daughter
FROM my_table_filiation
WHERE mother = 'A'
UNION ALL
SELECT p.mother, p.daughter
FROM q_filiation f, my_table_filiation p
WHERE p.mother = f.daughter
)
SELECT
array_agg(DISTINCT mother) AS mother,
array_agg(daughter) AS last_daughter -- no trailing comma before FROM
FROM q_filiation;
Actual result:
mother | last_daughter
--------------------
{A} | {E,F}
Desired result (the WHERE clause applied to every value of my_table.cadastral_reference):
mother | last_daughter
--------------------
{A} | {E,F}
{B} | {}
{C} | {}
{D} | {E,F}
In the end, I created a function from my previous query:
CREATE OR REPLACE FUNCTION my_function_filiation(
research_filiation character varying,
OUT mother character varying,
OUT daughter character varying)
RETURNS SETOF record
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
ROWS 1000
AS $BODY$
DECLARE
rec record;
sql text;
BEGIN
sql := '
WITH RECURSIVE q_filiation (mother, daughter) AS (
SELECT mother, daughter
FROM my_table_filiation
WHERE mother = ' || quote_literal(research_filiation) || '
UNION ALL
SELECT p.mother, p.daughter
FROM q_filiation f, my_table_filiation p
WHERE p.mother = f.daughter
)
SELECT
array_agg(DISTINCT mother) AS mother,
array_agg(daughter) AS last_daughter
FROM q_filiation
';
FOR rec IN EXECUTE sql
LOOP
mother := rec.mother;
daughter := rec.last_daughter; -- the query's column is named last_daughter
RETURN NEXT;
END LOOP;
END;
$BODY$;
Now I can use it in queries:
SELECT
my_function_filiation(p.cadastral_reference)
FROM my_table p;
If anyone knows a better solution, in method or syntax, feel free to contribute.
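One alternative that avoids both dynamic SQL and the per-row function call is to seed the recursion with every reference at once and carry the starting reference along. This is only a sketch: the column name root is made up, and it assumes "last children" means descendants that never appear as a mother in my_table_filiation, which may differ from the desired output shown above.

```sql
WITH RECURSIVE q_filiation (root, daughter) AS (
    -- Seed: the direct daughters of every searched reference.
    SELECT t.cadastral_reference, f.daughter
    FROM   (SELECT DISTINCT cadastral_reference FROM my_table) t
    JOIN   my_table_filiation f ON f.mother = t.cadastral_reference
    UNION  -- UNION (not UNION ALL) drops repeated pairs, so cycles terminate
    SELECT q.root, f.daughter
    FROM   q_filiation q
    JOIN   my_table_filiation f ON f.mother = q.daughter
), leaves AS (
    -- Keep only descendants that have no daughters of their own.
    SELECT q.root, q.daughter
    FROM   q_filiation q
    WHERE  NOT EXISTS (SELECT 1
                       FROM   my_table_filiation c
                       WHERE  c.mother = q.daughter)
)
SELECT t.cadastral_reference AS mother,
       COALESCE(array_agg(DISTINCT l.daughter)
                    FILTER (WHERE l.daughter IS NOT NULL), '{}') AS last_daughter
FROM   (SELECT DISTINCT cadastral_reference FROM my_table) t
LEFT   JOIN leaves l ON l.root = t.cadastral_reference
GROUP  BY t.cadastral_reference
ORDER  BY t.cadastral_reference;
```

The LEFT JOIN keeps references without any descendants in the result, with an empty array instead of no row at all.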
Steps for Execution:
Table Creation
CREATE TABLE xyz.table_a(
id bigint NOT NULL,
scores jsonb,
CONSTRAINT table_a_pkey PRIMARY KEY (id)
);
Add some dummy data :
INSERT INTO xyz.table_a(
id, scores)
VALUES (1, '{"a":20,"b":20}');
Function Creation
CREATE OR REPLACE FUNCTION xyz.example(
table_name text,
regular_columns text,
json_column text,
view_name text
) RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
cols TEXT;
cols_sum TEXT;
BEGIN
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key),
', '
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1
) s;$ex$,
table_name, json_column
)
INTO cols;
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key
),
'+'
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1) s;$ex$,
table_name, json_column
)
INTO cols_sum;
EXECUTE
format(
$ex$DROP VIEW IF EXISTS %2$s;
CREATE VIEW %2$s AS
SELECT %3$s, %4$s, SUM(%5$s) AS total
FROM %1$s
GROUP BY %3$s$ex$,
table_name, view_name, regular_columns, cols, cols_sum
);
RETURN cols;
END
$BODY$;
Call Function
SELECT xyz.example(
'xyz.table_a',
' id',
'scores',
'xyz.view_table_a'
);
When I run these steps, I get this error:
ERROR: column "int4" specified more than once
CONTEXT: SQL statement "
DROP VIEW IF EXISTS xyz.view_table_a;
CREATE VIEW xyz.view_table_a AS
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER), SUM(CAST(scores->>'a' AS INTEGER)+CAST(scores->>'b' AS INTEGER)) AS total FROM xyz.table_a GROUP BY id
Look at the error message closely:
...
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER),
...
There are multiple expressions without column alias. A named column like "id" defaults to the given name. But other expressions default to the internal type name, which is "int4" for integer. One might assume that the JSON key name is used, but that's not so. CAST(scores->>'a' AS INTEGER) is just another expression returning an unnamed integer value.
This still works for a plain SELECT. Postgres tolerates duplicate column names in the (outer) SELECT list. But a VIEW cannot be created that way, because it would result in ambiguities.
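A minimal way to reproduce this naming behaviour, independent of the tables above (the view name v_demo is made up for illustration):

```sql
-- Works: a plain SELECT tolerates two output columns that are both named "int4".
SELECT CAST('1' AS integer), CAST('2' AS integer);

-- Fails with: column "int4" specified more than once
CREATE TEMP VIEW v_demo AS
SELECT CAST('1' AS integer), CAST('2' AS integer);
```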
Either add column aliases to expressions in the SELECT list:
SELECT id, CAST(scores->>'a' AS INTEGER) AS a, CAST(scores->>'b' AS INTEGER) AS b, ...
Or add a list of column names to CREATE VIEW:
CREATE VIEW xyz.view_table_a(id, a, b, ...) AS ...
Something like this should fix your function (preserving the literal spelling of the JSON key names):
...
format(
'CAST(%2$s->>%%1$L AS INTEGER) AS %%1$I',
key),
...
Aside, your nested format() calls make the code pretty hard to read and maintain.
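One way to flatten the nesting, as a rough sketch: fetch the distinct keys into an array with a single dynamic statement, then build both column lists with ordinary format() calls in static SQL. The function name xyz.example_flat and its OUT parameters are made up for illustration, and passing the table as regclass is an assumption rather than part of the original function.

```sql
CREATE OR REPLACE FUNCTION xyz.example_flat(   -- hypothetical name
    table_name   regclass,                     -- assumption: regclass instead of text
    json_column  text,
    OUT cols     text,
    OUT cols_sum text)
LANGUAGE plpgsql AS
$func$
DECLARE
    keys text[];
BEGIN
    -- The only dynamic part: which table and jsonb column to read the keys from.
    EXECUTE format(
        'SELECT array_agg(DISTINCT key ORDER BY key) FROM %s, jsonb_each(%I)',
        table_name, json_column)
    INTO keys;

    -- Plain SQL from here on: one format() call per generated expression.
    SELECT string_agg(format('CAST(%I->>%L AS integer) AS %I', json_column, k, k),
                      ', ' ORDER BY k),
           string_agg(format('CAST(%I->>%L AS integer)', json_column, k),
                      ' + ' ORDER BY k)
    INTO   cols, cols_sum
    FROM   unnest(keys) AS k;
END
$func$;

-- Example call against the table from the question:
SELECT * FROM xyz.example_flat('xyz.table_a', 'scores');
```

The two resulting strings can then be passed to a single format() call that assembles the CREATE VIEW statement, as in the original function.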
Detailed question: I have a function that takes its input as JSON, and I would like to either insert or update that input in an existing table. How do I handle the case where I get multiple inputs for the same row?
Table Structure
create table sample ( colA Integer, colB character varying, colC character varying, colD character varying);
create type tt_sample AS
(colA Integer, colB character varying, colC character varying, colD character varying);
Function
CREATE OR REPLACE FUNCTION ins_upd_sample(
tt_sample text)
RETURNS timestamp without time zone
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
BEGIN
with cte as(
INSERT INTO sample(
colA, colB, colC, colD
)
SELECT
tmd.colA, tmd.colB, tmd.colC, tmd.colD
FROM json_populate_recordset(null::tt_sample ,tt_sample::json) tmd
LEFT JOIN sample md ON
md.colA = tmd.colA
AND md.colB = tmd.colB
WHERE md.colB IS NULL
RETURNING * /*Some Usage*/
)
/*some usage*/
with cte2 as(
UPDATE sample md SET
colC = tmd.colC, colD = tmd.colD
FROM json_populate_recordset(null::tt_sample ,tt_sample::json) tmd
where md.colA = tmd.colA AND md.colB = tmd.colB
AND md.colB IS NOT NULL
RETURNING */*some usage */
)
/*some usage*/
return( SELECT
/*timestamp */);
END;
$BODY$;
Input:
select ins_upd_sample ('[{"colA":21, "colB":"abc", "colC":null, "colD":null},
{"colA":21, "colB":"abc", "colC":"xyz", "colD":"xyz"}]');
Desired result:
Only one record should be in the table: the first record should be inserted and the next one should update it. Instead I get two inserted records, i.e. a duplicate (the update is obviously meant for the second one).
Is it possible to commit the transaction in between?
Here is a possible workaround. Add a column that allows only one value: give it a default, a check constraint that the value equals the default, and a unique constraint. Then do not reference this column in your insert, so that it simply takes the default. Something like:
create table <your_table>
( ...
, singleton varchar(1) default 'A'
, constraint singleton_bk unique (singleton)
, constraint singleton check (singleton = 'A')
) ;
Then revise insert to:
insert into <your_table>( ... ) -- omit column singleton
values (...)
on conflict (singleton)
do update
set <column> = excluded.<column>;
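A different approach from the singleton column above, closer to the original matching on (colA, colB), is sketched below. It assumes a unique constraint on (colA, colB) can be added (the constraint name is made up), and that when the same key appears several times in one JSON array the last element should win. The de-duplication matters because a single INSERT ... ON CONFLICT DO UPDATE cannot update the same row twice.

```sql
-- Assumption: (colA, colB) identifies a row, so it can carry a unique constraint.
ALTER TABLE sample
    ADD CONSTRAINT sample_cola_colb_key UNIQUE (colA, colB);

INSERT INTO sample (colA, colB, colC, colD)
SELECT DISTINCT ON (tmd.colA, tmd.colB)
       tmd.colA, tmd.colB, tmd.colC, tmd.colD
FROM   json_populate_recordset(
           null::tt_sample,
           '[{"colA":21, "colB":"abc", "colC":null,  "colD":null},
             {"colA":21, "colB":"abc", "colC":"xyz", "colD":"xyz"}]'::json
       ) WITH ORDINALITY AS tmd
-- Per key, keep the element that comes last in the JSON array.
ORDER  BY tmd.colA, tmd.colB, tmd.ordinality DESC
ON CONFLICT (colA, colB) DO UPDATE
SET    colC = EXCLUDED.colC,
       colD = EXCLUDED.colD;
```

Note that a unique constraint treats NULLs as distinct, so rows with a NULL colA or colB would still be inserted rather than updated.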
I have the following three tables :
create table drugs(
id integer,
name varchar(20),
primary key(id)
);
create table prescription(
id integer,
drug_id integer,
primary key(id),
foreign key(drug_id) references drugs(id)
);
create table visits(
patient_id varchar(10),
prescription_id integer,
primary key( patient_id , prescription_id),
foreign key(prescription_id) references prescription(id)
);
I wrote the following function on these tables to show a patient's list of drugs (the patient id is the parameter):
CREATE OR REPLACE FUNCTION public.patients_drugs(
patientid character varying)
RETURNS TABLE(drug_id integer, drug_name character varying)
LANGUAGE 'plpgsql'
COST 100
STABLE STRICT
ROWS 1000
AS $BODY$
begin
create temporary table result_table(
drug_id integer,
drug_name varchar(20)
);
return query select distinct drug.id , drug.name
from visits join prescription
on visits.patient_id = patientID;
end;
$BODY$;
However, it gives me this error:
CREATE TABLE is not allowed in a non-volatile function
You don't need to create a table in order to be able to "return a table". Just get rid of the CREATE TABLE statement.
But your query isn't correct either: you are selecting columns from the drugs table, but you never include it in the FROM clause. You can also get rid of the DISTINCT if you use an EXISTS condition instead of a join:
CREATE OR REPLACE FUNCTION public.patients_drugs(p_patientid character varying)
RETURNS TABLE(drug_id integer, drug_name character varying)
LANGUAGE plpgsql
AS $BODY$
begin
return query
select d.*
from drugs d
where exists (select *
from prescription p
join visits v on v.prescription_id = p.id
where d.id = p.drug_id
and v.patient_id = p_patientid);
end;
$BODY$;
Or better, use a simple SQL function:
CREATE OR REPLACE FUNCTION public.patients_drugs(p_patientid character varying)
RETURNS TABLE(drug_id integer, drug_name character varying)
LANGUAGE sql
AS
$BODY$
select d.*
from drugs d
where exists (select *
from prescription p
join visits v on v.prescription_id = p.id
where d.id = p.drug_id
and v.patient_id = p_patientid);
$BODY$;
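To try either version, something like the following should work. The sample rows and the patient id 'p1' are made up for illustration, and the call assumes the function was created with the visits.patient_id column name from the table definition.

```sql
INSERT INTO drugs (id, name)
VALUES (1, 'aspirin'), (2, 'ibuprofen');

INSERT INTO prescription (id, drug_id)
VALUES (10, 1), (11, 2), (12, 1);

INSERT INTO visits (patient_id, prescription_id)
VALUES ('p1', 10), ('p1', 12), ('p2', 11);

-- 'p1' was prescribed drug 1 twice; EXISTS still lists it once, without DISTINCT.
SELECT * FROM patients_drugs('p1');
```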
I am trying to write subqueries that search all tables for a column named id. Since multiple tables have an id column, I also want to add the condition id = 3119093.
My attempt was:
Select *
from information_schema.tables
where id = '3119093' and id IN (
Select table_name
from information_schema.columns
where column_name = 'id' );
This didn't work so I tried:
Select *
from information_schema.tables
where table_name IN (
Select table_name
from information_schema.columns
where column_name = 'id' and 'id' IN (
Select * from table_name where 'id' = 3119093));
This isn't the right way either. Any help would be appreciated. Thanks!
A harder attempt is:
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE'
--AND c.column_name = "id"
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text) like %L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
END IF;
END LOOP;
END;
$$ language plpgsql;
select * from search_columns('%3119093%'::varchar,'{}'::name[]) ;
The only problem is that this code returns just the table name and column name. I then have to manually run
Select * from table_name where id = 3119093
using the table name I got from the code above.
I want the matching rows to be returned automatically, but I don't know how to pass the table name along automatically.
I took the time to make it work for you.
For starters, some information on what is going on inside the code.
Explanation
1. The function takes two input arguments: column name and column value.
2. It requires a previously created type, of which it returns a set.
3. The first loop identifies the tables that have a column with the name given as the input argument.
4. It then builds a query that aggregates all rows matching the input condition in every table found in step 3, with the comparison based on ILIKE, as per your example.
5. The function enters the second loop only if at least one row in the currently visited table matches the condition (that is, the array is not null).
6. The second loop unnests the array of matching rows and adds every element to the function output with the RETURN NEXT rec clause.
Notes
Searching with LIKE is inefficient. I suggest adding another input argument, "column type", and restricting the lookup by it through a join to the pg_catalog.pg_type table.
The second loop is there so that if more than 1 row is found for a particular table, then every row gets returned.
If you are looking for something else, like you need key-value pairs, not just the values, then you need to extend the function. You could for example build json format from rows.
Now, to the code.
Test case
CREATE TABLE tbl1 (col1 int, id int); -- does contain values
CREATE TABLE tbl2 (col1 int, col2 int); -- doesn't contain column "id"
CREATE TABLE tbl3 (id int, col5 int); -- doesn't contain values
INSERT INTO tbl1 (col1, id)
VALUES (1, 5), (1, 33), (1, 25);
Table stores data:
postgres=# select * From tbl1;
col1 | id
------+----
1 | 5
1 | 33
1 | 25
(3 rows)
Creating type
CREATE TYPE sometype AS ( schemaname text, tablename text, colname text, entirerow text );
Function code
CREATE OR REPLACE FUNCTION search_tables_for_column (
v_column_name text
, v_column_value text
)
RETURNS SETOF sometype
LANGUAGE plpgsql
STABLE
AS
$$
DECLARE
rec sometype%rowtype;
v_row_array text[];
rec2 record;
arr_el text;
BEGIN
FOR rec IN
SELECT
nam.nspname AS schemaname
, cls.relname AS tablename
, att.attname AS colname
, null::text AS entirerow
FROM
pg_attribute att
JOIN pg_class cls ON att.attrelid = cls.oid
JOIN pg_namespace nam ON cls.relnamespace = nam.oid
WHERE
cls.relkind = 'r'
AND att.attname = v_column_name
LOOP
EXECUTE format('SELECT ARRAY_AGG(row(tablename.*)::text) FROM %I.%I AS tablename WHERE %I::text ILIKE %s',
rec.schemaname, rec.tablename, rec.colname, quote_literal(concat('%',v_column_value,'%'))) INTO v_row_array;
IF v_row_array is not null THEN
FOR rec2 IN
SELECT unnest(v_row_array) AS one_row
LOOP
rec.entirerow := rec2.one_row;
RETURN NEXT rec;
END LOOP;
END IF;
END LOOP;
END
$$;
Example call & output
postgres=# select * from search_tables_for_column('id','5');
schemaname | tablename | colname | entirerow
------------+-----------+---------+-----------
public | tbl1 | id | (1,5)
public | tbl1 | id | (1,25)
(2 rows)
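The note about returning key-value pairs rather than bare values could be sketched roughly as follows. This variant is not part of the answer's function: the name search_tables_for_column_json is made up, it compares with exact equality instead of ILIKE, it skips the system schemas, and it needs to_jsonb (PostgreSQL 9.5 or later).

```sql
CREATE OR REPLACE FUNCTION search_tables_for_column_json(   -- hypothetical name
    v_column_name  text,
    v_column_value text
)
RETURNS TABLE (schemaname text, tablename text, row_data jsonb)
LANGUAGE plpgsql
STABLE
AS
$$
DECLARE
    tbl record;
BEGIN
    -- Find every ordinary table that has a column with the given name.
    FOR tbl IN
        SELECT nam.nspname, cls.relname
        FROM   pg_attribute att
        JOIN   pg_class     cls ON att.attrelid = cls.oid
        JOIN   pg_namespace nam ON cls.relnamespace = nam.oid
        WHERE  cls.relkind = 'r'
        AND    att.attname = v_column_name
        AND    NOT att.attisdropped
        AND    nam.nspname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        -- Return each matching row as one jsonb object (column name -> value).
        RETURN QUERY EXECUTE format(
            'SELECT %L::text, %L::text, to_jsonb(t) FROM %I.%I AS t WHERE t.%I::text = $1',
            tbl.nspname, tbl.relname, tbl.nspname, tbl.relname, v_column_name)
        USING v_column_value;
    END LOOP;
END
$$;

-- Example call against the test tables above:
SELECT * FROM search_tables_for_column_json('id', '5');
```

RETURN QUERY EXECUTE appends all matching rows of a table in one step, so the array aggregation and the second loop are not needed in this form.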
The following is my function get_reportees4, which operates on the self-referencing table emp_tabref1:
CREATE OR REPLACE FUNCTION get_reportees4(IN id integer)
RETURNS TABLE(e_id integer, e_name character varying, e_manager integer, e_man_name character varying) AS
$$
BEGIN
RETURN QUERY
WITH RECURSIVE manger_hierarchy(e_id, e_name, m_id, m_name) AS
(
SELECT e.emp_id, e.emp_name, e.mgr_id, e.emp_name AS man_name
FROM emp_tabref1 e WHERE e.emp_id = id
UNION
SELECT rp.emp_id, rp.emp_name, rp.mgr_id, rp.emp_name AS man_name
FROM manger_hierarchy mh INNER JOIN emp_tabref1 rp ON mh.e_id = rp.mgr_id
)
SELECT * from manger_hierarchy;
END;
$$ LANGUAGE plpgsql VOLATILE
Table structure of emp_tabref1:
CREATE TABLE emp_tabref1
(
emp_id integer NOT NULL,
emp_name character varying(50) NOT NULL,
mgr_id integer,
CONSTRAINT emp_tabref_pkey PRIMARY KEY (emp_id),
CONSTRAINT emp_tabref_mgr_id_fkey FOREIGN KEY (mgr_id)
REFERENCES emp_tabref1 (emp_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
What I want returned is the hierarchy (both above and below) of the id that is passed in, with the emp_name, emp_id, mgr_id and mgr_name.
But my function is returning like this:
select * from get_reportees4(9)
    e_id  e_name  e_manager  e_man_name
1   9     "Emp9"  10         "Emp9"
2   5     "Emp5"  9          "Emp5"
3   6     "Emp6"  9          "Emp6"
where my expected output is
    e_id  e_name  e_manager  e_man_name
1   9     "Emp9"  10         "Emp10"
2   5     "Emp5"  9          "Emp9"
3   6     "Emp6"  9          "Emp9"
The function should return the manager name and not the employee name. Please help!
Found a solution! The final SELECT now joins the temporary manger_hierarchy table back to emp_tabref1 on mgr_id = emp_id to look up the manager's name:
CREATE OR REPLACE FUNCTION get_reportees4(IN id integer)
RETURNS TABLE(e_id integer, e_name character varying, e_manager integer, e_man_name character varying) AS
$$
BEGIN
RETURN QUERY
WITH RECURSIVE manger_hierarchy(e_id, e_name, m_id, m_name) AS
(
SELECT e.emp_id, e.emp_name, e.mgr_id, e.emp_name AS man_name
FROM emp_tabref1 e WHERE e.emp_id = id
UNION
SELECT rp.emp_id, rp.emp_name, rp.mgr_id, rp.emp_name AS man_name
FROM manger_hierarchy mh INNER JOIN emp_tabref1 rp ON mh.e_id = rp.mgr_id
)
SELECT manger_hierarchy.e_id, manger_hierarchy.e_name, manger_hierarchy.m_id, emp_tabref1.emp_name
FROM manger_hierarchy LEFT JOIN emp_tabref1 ON manger_hierarchy.m_id = emp_tabref1.emp_id;
END;
$$ LANGUAGE plpgsql VOLATILE
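The function above only walks downwards from the given id. If the hierarchy above it is wanted as well, as the question asks, one possible sketch uses two recursive CTEs, one per direction, and then looks up the manager name with the same LEFT JOIN. The CTE names down and up and the literal id 9 are placeholders for illustration.

```sql
WITH RECURSIVE down AS (
    -- Start at the employee and follow mgr_id links downwards (reportees).
    SELECT e.emp_id, e.emp_name, e.mgr_id
    FROM   emp_tabref1 e
    WHERE  e.emp_id = 9
    UNION
    SELECT c.emp_id, c.emp_name, c.mgr_id
    FROM   down d
    JOIN   emp_tabref1 c ON c.mgr_id = d.emp_id
), up AS (
    -- Start at the same employee and follow mgr_id links upwards (managers).
    SELECT e.emp_id, e.emp_name, e.mgr_id
    FROM   emp_tabref1 e
    WHERE  e.emp_id = 9
    UNION
    SELECT m.emp_id, m.emp_name, m.mgr_id
    FROM   up u
    JOIN   emp_tabref1 m ON m.emp_id = u.mgr_id
)
SELECT h.emp_id, h.emp_name, h.mgr_id, m.emp_name AS mgr_name
FROM  (SELECT * FROM down
       UNION              -- the starting employee is in both sets; keep it once
       SELECT * FROM up) h
LEFT   JOIN emp_tabref1 m ON m.emp_id = h.mgr_id
ORDER  BY h.emp_id;
```

Inside the function, the literal 9 would be replaced by the id parameter.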