How to construct a dynamic chain of batch number - postgresql

I am trying to convert a table like the one below, dynamically, to build a traceability chain of batches through my process (a succession of tanks).
After some tests in Excel, I moved to SQL because I am working on a database and thought a loop would be the way to go.
I am asking for help because my code is getting too complex for me (this is my first SQL project), and I may be missing a simpler solution, with or without PostgreSQL.
Right now I am stuck on the dynamic function that should create a table with as many columns as there are unique tanks and populate them from joins across multiple tables.
Any help would be appreciated. Thanks!
Database table:
US tank | DS tank | US batch n° | DS batch n°
citerne | B430    | 122         | 55
B430    | K4      | 55          | 603
US means UpStream and DS DownStream
Result table expected:
citerne | B430 | K4
122     | 55   | 603
There is a finite number of tanks but an unbounded number of batch numbers.
Here is a schema of my method : https://i.stack.imgur.com/bl7hx.png
Here is a fiddle of my data and expected result : https://dbfiddle.uk/wG9ghV_P
Here is the result expected from this data (also in the fiddle): https://i.stack.imgur.com/2OQhS.png

The solution proposed here only requires a dedicated composite type, which can be created dynamically; the rest relies on the jsonb type and standard functions.
Creating the composite type dynamically:
CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
DECLARE
  column_list text ;
BEGIN
  -- One text column per distinct tank name, upstream or downstream
  SELECT string_agg(DISTINCT quote_ident(t.tank) || ' text', ',')
    INTO column_list
    FROM ( SELECT "US tank" FROM my_table
           UNION ALL
           SELECT "DS tank" FROM my_table
         ) AS t(tank) ;
  -- Recreate the type so that it always matches the current set of tanks
  EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
  EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
END ;
$$ ;
Calling the procedure and executing the right query to get your expected result :
CALL create_composite_type() ;
SELECT (jsonb_populate_record( null :: composite_type
, jsonb_object_agg(t.tank, t.batch)
)
).*
FROM ( SELECT "US tank" AS tank, "US batch n°" AS batch FROM my_table
UNION
SELECT "DS tank", "DS batch n°" FROM my_table
) AS t ;
The result is:
B430 | citerne | K4
55   | 122     | 603
see the test in dbfiddle
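To see what the UNION plus jsonb_object_agg step computes, here is a minimal Python sketch of the same pivot, run on the two sample rows from the question. The function name tank_batches and the in-memory rows are illustrative assumptions, not part of the answer; like the SQL, it assumes each tank carries a single batch number.

```python
# Sample rows from the question: (US tank, DS tank, US batch, DS batch)
rows = [
    ("citerne", "B430", "122", "55"),
    ("B430", "K4", "55", "603"),
]

def tank_batches(rows):
    """Collect distinct (tank, batch) pairs and fold them into one mapping."""
    pairs = set()  # a set removes duplicates, like UNION in the SQL query
    for us_tank, ds_tank, us_batch, ds_batch in rows:
        pairs.add((us_tank, us_batch))
        pairs.add((ds_tank, ds_batch))
    # One key per tank, comparable to jsonb_object_agg(tank, batch)
    return dict(sorted(pairs))

chain = tank_batches(rows)
print(chain)
```

If a tank could hold several batches, a later pair would overwrite an earlier one in this mapping, so a multi-row result would likely need an extra grouping key such as a chain identifier.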

Related

Deleting old rows in a schema using stored procedure

Is there a way to select all tables in a schema and delete the rows that do not fit a condition (older than some date) in one procedure? I can do the same thing with two separate queries.
It would look something like: SELECT table_name FROM information_schema.tables WHERE table_schema = 'schemaName' and then DELETE FROM table_name WHERE time < now()-'12 months'::interval; but I cannot get my head around how to do the same in a single stored procedure. I assume I should use a FOR loop over some kind of SELECT query, but since I have never really worked with loops in Postgres, I always get some kind of exception when I try.
Any help is much appreciated.
You can typically combine a SELECT and a DELETE statement using a CTE:
WITH list AS
( SELECT ...
FROM ...
WHERE ...
)
DELETE FROM ...
USING list
WHERE ...
In your case, however, the value returned by the SELECT is the name of the table whose rows must be deleted. The table name is therefore not known before run time, which means you need a dynamic SQL command inside a plpgsql FUNCTION:
CREATE OR REPLACE FUNCTION delete_rows_from_table(table_name text)
RETURNS void LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE E'
DELETE FROM ' || quote_ident(table_name) || E'
WHERE time < now()- \'12 months\':: interval' ;
END ;
$$ ;
Then you can use the FUNCTION delete_rows_from_table in your SELECT statement :
SELECT delete_rows_from_table(table_name)
FROM information_schema.tables
WHERE table_schema = 'schemaName'
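The function above passes only the table name to quote_ident, so it relies on the search_path to find the table. As a hedged illustration of how the generated statements could be schema-qualified, here is a small Python sketch that builds the same DELETE text. The helper names quote_ident and build_delete and the table names are assumptions for illustration; this quote_ident always adds double quotes, which is stricter than the PostgreSQL function but still yields valid identifiers.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def build_delete(schema: str, table: str, months: int = 12) -> str:
    """Build one schema-qualified DELETE for rows older than `months` months."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    return f"DELETE FROM {target} WHERE time < now() - interval '{int(months)} months'"

# Hypothetical table names, as they might come back from information_schema.tables
statements = [build_delete("schemaName", t) for t in ["events", "audit log"]]
for stmt in statements:
    print(stmt)
```

Since information_schema.tables also lists views, filtering on table_type = 'BASE TABLE' before generating the statements may be worth considering.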

How to set a composite type column using dynamic sql in trigger procedure

I have a trigger function that is called by several tables when COLUMN A is updated, so that COLUMN B can be updated based on value from a different function. (More complicated to explain than it really is). The trigger function takes in col_a and col_b since they are different for the different tables.
IF needs_updated THEN
sql = format('($1).%2$s = dbo.foo(($1).%1$s); ', col_a, col_b);
EXECUTE sql USING NEW;
END IF;
When I try to run the above, the format produces this sql:
($1).NameText = dbo.foo(($1).Name);
When I execute the SQL with the USING I am expecting something like this to happen (which works when executed straight up without dynamic sql):
NEW.NameText = dbo.foo(NEW.Name);
Instead I get:
[42601] ERROR: syntax error at or near "$1"
How can I dynamically update the column on the record/composite type NEW?
This isn't going to work because NEW.NameText = dbo.foo(NEW.Name); is not a valid SQL query: it is a PL/pgSQL assignment, and EXECUTE only runs SQL statements. I cannot think of a way to dynamically update a variable attribute of NEW. My suggestion is to define the behaviour explicitly for each of your tables:
IF TG_TABLE_SCHEMA = 'my_schema' THEN
IF TG_TABLE_NAME = 'my_table_1' THEN
NEW.a1 = foo(NEW.b1);
ELSIF TG_TABLE_NAME = 'my_table_2' THEN
NEW.a2 = foo(NEW.b2);
... etc ...
END IF;
END IF;
First: This is a giant pain in plpgsql. So my best recommendation is to do this in some other PL, such as plpythonu or plperl. Doing this in either of those would be trivial. Even if you don't want to do the whole trigger in another PL, you could still do something like:
v_new RECORD;
BEGIN
v_new := plperl_function(NEW, column_a...)
The key to doing this in plpgsql is creating a CTE that has what you need in it:
c_new_old CONSTANT text := format(
'WITH
NEW AS (SELECT (r).* FROM (SELECT ($1)::%1$s r) s)
, OLD AS (SELECT (r).* FROM (SELECT ($2)::%1$s r) s)
'
, TG_RELID::regclass
);
You will also need to define a v_new that is a plain record. You could then do something like:
-- Replace 2nd field in NEW with a new value
sql := c_new_old || $$SELECT row(NEW.a, $3, NEW.c) FROM NEW$$;
EXECUTE sql INTO v_new USING NEW, OLD, new_value;
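For comparison, the "other PL" route mentioned above might look roughly like this in PL/Python. In a real trigger the server supplies the TD dictionary (TD["new"] holds the row, TD["args"] the arguments given in CREATE TRIGGER), and returning "MODIFY" from a BEFORE row trigger tells Postgres to use the changed row. Here TD is faked with a plain dict so the sketch runs as ordinary Python; the function name set_derived_column, the foo stand-in and the column names are assumptions.

```python
def set_derived_column(TD, foo):
    """Body of a hypothetical BEFORE UPDATE row trigger.

    TD["args"] carries the two column names passed in CREATE TRIGGER,
    so the same function can serve several tables.
    """
    col_a, col_b = TD["args"]
    new_row = TD["new"]
    # Dictionary access makes the dynamic column name a non-issue
    new_row[col_b] = foo(new_row[col_a])
    return "MODIFY"

# Simulated trigger data; unquoted column names are lower-cased by Postgres
fake_td = {
    "args": ["name", "nametext"],
    "new": {"name": "abc", "nametext": None},
}
result = set_derived_column(fake_td, str.upper)  # str.upper stands in for dbo.foo
print(result, fake_td["new"])
```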

Postgres FOR LOOP

I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
FOR i IN 1..25 LOOP
INSERT INTO playtime.meta_random_sample
(col_i, col_id) -- declare target columns!
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000;
END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000
) t;
About generate_series():
What is the expected behaviour for multiple set-returning functions in SELECT clause?
About optimizing performance of random selections:
Best way to select random rows PostgreSQL
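As a sanity check on the shape of the result that both versions produce, here is a small client-side Python sketch: 25 independent samples, each drawn without replacement and tagged with its sample number. The name draw_samples, its parameters and the ids are made up for illustration; it is not meant as a replacement for the set-based query.

```python
import random

def draw_samples(ids, n_samples, sample_size, seed=None):
    """Return (sample_number, id) rows, one independent sample per number."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_samples + 1):
        # random.sample draws without replacement, like ORDER BY random() LIMIT n
        for picked in rng.sample(ids, sample_size):
            rows.append((i, picked))
    return rows

ids = list(range(1, 101))  # stand-in for the id column of tbl
rows = draw_samples(ids, n_samples=25, sample_size=15, seed=0)
print(len(rows), rows[:3])
```

The same id can appear in several samples but never twice within one sample, which matches what the LATERAL subquery returns for each value of i.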
Below is an example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about FOR loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause, which is written after a table name in FROM. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
SELECT * FROM tbl TABLESAMPLE system_rows(15000);
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2

# Connection parameters are placeholders
connection_psql = psycopg2.connect(user="admin_user",
                                   password="***",
                                   port="5432",
                                   database="myDB",
                                   host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()
myList = [...]
for item in myList:
    cursor_psql.execute('''
        -- The query goes here
    ''')
# Commit once, after all statements have run
connection_psql.commit()
cursor_psql.close()
Here is a more complex Postgres function involving a UUID array, a FOREACH loop, a CASE expression and an enum update. It walks through each order, evaluates the conditions and updates that row.
CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
DECLARE
    uid_list uuid[];
    uid uuid;
BEGIN
    -- "order" is a reserved word, so the table name must be double-quoted.
    -- COALESCE avoids a NULL array (empty table), which FOREACH rejects.
    SELECT COALESCE(array_agg(order_id), '{}') INTO uid_list FROM "order";
    FOREACH uid IN ARRAY uid_list
    LOOP
        WITH status_cmp AS (SELECT COUNT(sku)=0 AS empty,
                                   COUNT(sku)<COUNT(sku_order_id) AS partial,
                                   COUNT(sku)=COUNT(sku_order_id) AS full
                            FROM fulfillment
                            WHERE order_id=uid)
        UPDATE "order"
        SET status=CASE WHEN status_cmp.empty THEN 'EMPTY'::orderstatus
                        WHEN status_cmp.full THEN 'FULL'::orderstatus
                        WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
                        ELSE null
                   END
        FROM status_cmp
        WHERE order_id=uid;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
To run the above function
SELECT order_status_update();
Using a procedure (the BEGIN ATOMIC body syntax requires Postgres 14 or later):
CREATE or replace PROCEDURE pg_temp_3.insert_data()
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO meta_random_sample(col_serial, parent_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, parent_id
FROM parent_tree order by random() limit 2
) t;
END;
Call the procedure.
call pg_temp_3.insert_data();
PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html

EF can't infer return schema from Stored Procedure selecting from a #temp table

Suppose the following:
CREATE PROCEDURE [MySPROC]
AS
BEGIN
CREATE TABLE #tempSubset(
[MyPrimaryKey] [bigint] NOT NULL,
[OtherColumn] [int] NOT NULL)
INSERT INTO #tempSubset (MyPrimaryKey, OtherColumn)
SELECT SomePrimaryKey, SomeColumn
FROM SomeHugeTable
WHERE LimitingCondition = true
SELECT MyPrimaryKey, OtherColumn
FROM #tempSubset
WHERE SomeExpensiveCondition = true
END
When I generate a function import or map a return type, EF doesn't generate a complex type or tells me:
The selected stored procedure or function returns no columns
How to overcome this?
Other answers suggest using table variables (which I am not going to do, for performance reasons), or faking the return schema and commenting out the real stored procedure; others suggest doing something similar with views. But there must be a way to do this without adding unnecessary overhead or having to break a stored procedure just to update the model?
CREATE PROCEDURE [MySPROC]
AS
BEGIN
--supplying a data contract
IF 1 = 2 BEGIN
SELECT
cast(null as bigint) as MyPrimaryKey,
cast(null as int) as OtherColumn
WHERE
1 = 2
END
CREATE TABLE #tempSubset(
[MyPrimaryKey] [bigint] NOT NULL,
[OtherColumn] [int] NOT NULL)
INSERT INTO #tempSubset (MyPrimaryKey, OtherColumn)
SELECT SomePrimaryKey, SomeColumn
FROM SomeHugeTable
WHERE LimitingCondition = true
SELECT MyPrimaryKey, OtherColumn
FROM #tempSubset
WHERE SomeExpensiveCondition = true
END
Supplying a faux data contract for the result set is the easiest, cleanest and fastest way to take care of the issue. This same problem exists in data source controls in SSIS too. .NET will read the result set from the unreachable "contract" section of the query and supply the metadata for the complex type. No performance impact and no need to comment out the SQL that does the actual work.
Adding this to the top of the stored procedure definition: SET FMTONLY OFF allowed the model to infer the schema from the temporary table without issue. As a bonus, it doesn't require additional maintenance for a contract.
Example:
SET FMTONLY OFF
CREATE TABLE #tempTable (
...
)
...
SELECT * FROM #tempTable
Solution 1
Use a table variable instead of a temporary table.
Solution 2
Use the Set FMTONLY off; SQL command in the procedure and you will get the column information to create a new complex type.
Solution 3
This is not a good way, but it's a very easy way. Just add a select statement with dummy data and it will not execute because 1=0.
you can check details on this link
This is incomplete but when set fmtonly off does not work, you can generate the data contract using the following:
SELECT *
FROM tempdb.sys.columns
WHERE [object_id] = OBJECT_ID(N'tempdb..#u');
select case system_type_id
when 62 then 'cast(null as float) as '
when 175 then 'cast(null as char(' + cast(max_length as varchar(50)) + ')) as '
when 167 then 'cast(null as varchar(' + cast(max_length as varchar(50)) + ')) as '
when 56 then 'cast(null as int) as '
when 104 then 'cast(null as bit) as '
when 106 then 'cast(null as decimal(' + cast(precision as varchar(50)) + ',' + cast(scale as varchar(50)) + ')) as '
when 40 then 'cast(null as date) as '
end
+ name + ','
from tempdb.sys.columns
WHERE [object_id] = OBJECT_ID(N'tempdb..#u');
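The same mapping can be scripted outside the database. Below is a rough Python sketch that turns column metadata, shaped like rows from tempdb.sys.columns, into the unreachable data-contract block. The names cast_expr and data_contract are hypothetical; type id 127 (bigint) and the varchar(max) handling are additions that the query above does not cover, and any unmapped type raises an error rather than being guessed.

```python
# system_type_id -> T-SQL type name; 127 (bigint) is an addition to the query above
SIMPLE_TYPES = {62: "float", 56: "int", 104: "bit", 40: "date", 127: "bigint"}
SIZED_TYPES = {175: "char", 167: "varchar"}

def cast_expr(name, type_id, max_length=None, precision=None, scale=None):
    """Return 'cast(null as <type>) as <name>' for one column."""
    if type_id in SIMPLE_TYPES:
        sql_type = SIMPLE_TYPES[type_id]
    elif type_id in SIZED_TYPES:
        # sys.columns reports max_length = -1 for the (max) variants
        size = "max" if max_length == -1 else max_length
        sql_type = f"{SIZED_TYPES[type_id]}({size})"
    elif type_id == 106:
        sql_type = f"decimal({precision},{scale})"
    else:
        raise ValueError(f"unmapped system_type_id {type_id} for column {name}")
    return f"cast(null as {sql_type}) as {name}"

def data_contract(columns):
    """Wrap the cast list in an IF 1 = 2 block that never executes."""
    select_list = ",\n    ".join(cast_expr(*col) for col in columns)
    return f"IF 1 = 2 BEGIN\n  SELECT\n    {select_list}\n  WHERE 1 = 2\nEND"

contract = data_contract([("MyPrimaryKey", 127), ("OtherColumn", 56)])
print(contract)
```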

Loop through the list of tables and check for a value in a field (DB2)

In DB2, I can get a list of tables with the following sql statement:
select tabname from syscat.tables where tabschema = 'DBO'
Assuming that each table has a field named a1, how can I loop through the tables and check for a value in that field in every table?
There are two general ways. One would be to write a program that processes each file to check that column. The program could use embedded SQL to retrieve the count of the chosen value from each table. Or you could create a stored proc that accepts a table and schema name as inputs and sets an output value as essentially a boolean indicator of whether or not that table had the chosen value.
Potentially, you could perhaps create an outer proc to loop through the list of tables. And for each table it would call the inner proc that tests presence of the value.
This is a test proc that I used to verify the basic principle. It checks a column for APFILE='ACCPTH'. It returns either (1) or (0) depending on whether any row has that value or not.
-- Generate SQL
-- Version: V6R1M0 080215
-- Generated on: 03/22/14 02:59:07
-- Relational Database: TISI
-- Standards Option: DB2 for i
DROP SPECIFIC PROCEDURE SQLEXAMPLE.CHKFLDVAL ;
SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","mylib" ;
CREATE PROCEDURE SQLEXAMPLE.CHKFLDVAL (
IN TABLENAME VARCHAR(128) ,
IN SCHEMANAME VARCHAR(128) ,
OUT VALFOUND SMALLINT )
LANGUAGE SQL
SPECIFIC SQLEXAMPLE.CHKFLDVAL
NOT DETERMINISTIC
READS SQL DATA
CALLED ON NULL INPUT
SET OPTION ALWBLK = *ALLREAD ,
ALWCPYDTA = *OPTIMIZE ,
COMMIT = *NONE ,
CLOSQLCSR = *ENDMOD ,
DECRESULT = (31, 31, 00) ,
DFTRDBCOL = *NONE ,
DLYPRP = *NO ,
DYNDFTCOL = *NO ,
DYNUSRPRF = *USER ,
RDBCNNMTH = *RUW ,
SRTSEQ = *HEX
P1 : BEGIN
DECLARE STMTSQL VARCHAR ( 256 ) ;
DECLARE RTNRESULT SMALLINT ;
SET STMTSQL = 'VALUES (select CASE WHEN count(*) = 0 THEN 0 ELSE 1 END as chkVal from ' CONCAT SCHEMANAME CONCAT '.' CONCAT TABLENAME CONCAT ' where APFILE=''ACCPTH'' group by APFILE) INTO ?' ;
PREPARE STMT_NAME FROM STMTSQL ;
EXECUTE STMT_NAME USING RTNRESULT ;
SET VALFOUND = RTNRESULT ;
END P1 ;
COMMENT ON SPECIFIC PROCEDURE SQLEXAMPLE.CHKFLDVAL
IS 'Check field value in some table' ;
If I call it with a different TableName or SchemaName parameter value, I can get different values returned in rtnResult.
SQL is all that is actually needed, although this is not a particularly good job for SQL.
You cannot do this using just SQL statements. You will have to do a bit of scripting or programming of some sort to create new queries based on the table names you find and run them.
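The scripting approach could be sketched as follows in Python. To keep the example runnable without a DB2 server, SQLite stands in for the database: the catalog lookup uses sqlite_master where DB2 would use the syscat.tables query from the question. The function name tables_containing and the sample tables t1 to t3 are assumptions for illustration.

```python
import sqlite3

def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def tables_containing(conn, column, value):
    """Return the names of tables where `column` equals `value` in some row."""
    cur = conn.cursor()
    # Catalog lookup (SQLite stand-in for: select tabname from syscat.tables ...)
    names = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    hits = []
    for name in names:
        # Identifiers cannot be bound as parameters, so they are quoted and
        # spliced into the text; the searched value is bound with "?"
        query = (f"SELECT EXISTS (SELECT 1 FROM {quote_ident(name)} "
                 f"WHERE {quote_ident(column)} = ?)")
        if cur.execute(query, (value,)).fetchone()[0]:
            hits.append(name)
    return hits

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a1 TEXT); INSERT INTO t1 VALUES ('x');
    CREATE TABLE t2 (a1 TEXT); INSERT INTO t2 VALUES ('y');
    CREATE TABLE t3 (a1 TEXT); INSERT INTO t3 VALUES ('x'), ('z');
""")
matches = tables_containing(conn, "a1", "x")
print(matches)
```

One query per table is the same pattern as the outer and inner procedures described earlier, only driven from a client script.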