Accessing columns inside a record using table alias in Postgresql - postgresql

I'm trying to iterate over the result of a query using a record data type. Nevertheless, if I try to access one column using the table alias defined in the query, I get the following error:
ERRO: schema "inv_row" does not exist
CONTEXT: SQL command "SELECT inv_row.s.processor <> inv_row.d.processor"
PL/pgSQL function "teste" line 7 at IF
Here is the code that throws this error:
CREATE OR REPLACE FUNCTION teste() returns void as $$
DECLARE
inv_row record;
BEGIN
FOR inv_row in SELECT * FROM sa_inventory s LEFT JOIN dim_inventory d ON s.macaddr = d.macaddr LOOP
IF inv_row.s.processor <> inv_row.d.processor THEN
<do something>;
END IF;
END LOOP;
END;
$$ language plpgsql;
Is there another way to access a column of a particular table inside a record data type?

Fortunately the answer here is relatively simple. You have to use parentheses to indicate tuples:
IF (inv_row.s).processor <> (inv_row.d).processor THEN
This is because SQL specifies meaning to the depth of the namespaces and therefore without this PostgreSQL cannot safely determine what this means. So inv_row.s.processor means the processor column of the s table in the inv_row schema. However (inv_row.s).processor means take the s column of the inf_row table, treat it as a tuple, and take the processor column of that.

Related

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite some databases and different schemas and tables I wanted to start using variables to reduce chance of errors and the amount of work it will take me.
This is the best example query I could think off, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the times my variables are perceived as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters :
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <# %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use a the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing. Using RAISE NOTICE '%', _sql; for that purpose.
I suggest to comment the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
select quote_ident(_source_table) ||'_'|| array_to_string(_unique_key,'_') into _target_table from quote_ident(_source_table);
raise notice 'the deleted table records will store in %.%',_target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows influenced, %',_row_count;
end
$$;
--
if your _unique_key is not that much, this solution also create a table for you. Obviously you need to create the target schema yourself.
If your unique_key is too much, you can customize to properly rename the dumped table.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is source table, {1,2} is the unique key you want to extract to the new table. s2 is the target schema

How to migrate oracle's pipelined function into PostgreSQL

As I am newbie to plpgSQL,
I stuck while migrating a Oracle query into PostgreSQL.
Oracle query:
create or replace FUNCTION employee_all_case(
p_ugr_id IN integer,
p_case_type_id IN integer
)
RETURN number_tab_t PIPELINED
-- LANGUAGE 'plpgsql'
-- COST 100
-- VOLATILE
-- AS $$
-- DECLARE
is
l_user_id NUMBER;
l_account_id NUMBER;
BEGIN
l_user_id := p_ugr_id;
l_account_id := p_case_type_id;
FOR cases IN
(SELECT ccase.case_id, ccase.employee_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype
ON (ccase.case_type_id=ctype.case_type_id)
WHERE ccase.employee_id = l_user_id)
LOOP
IF cases.employee_id IS NOT NULL THEN
PIPE ROW (cases.case_id);
END IF;
END LOOP;
RETURN;
END;
--$$
When I execute this function then I get the following result
select * from table(select employee_all_case(14533,1190) from dual)
My question here is: I really do not understand the pipelined function and how can I obtain the same result in PostgreSQL as Oracle query ?
Please help.
Thank you guys, your solution was very helpful.
I found the desire result:
-- select * from employee_all_case(14533,1190);
-- drop function employee_all_case
create or replace FUNCTION employee_all_case(p_ugr_id IN integer ,p_case_type_id IN integer)
returns table (case_id double precision)
-- PIPELINED
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $$
DECLARE
-- is
l_user_id integer;
l_account_id integer;
BEGIN
l_user_id := cp_lookup$get_user_id_from_ugr_id(p_ugr_id);
l_account_id := cp_lookup$acctid_from_ugr(p_ugr_id);
RETURN QUERY SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and ccase.employee_id IS NOT NULL;
--return NEXT;
END;
$$
You would rewrite that to a set returning function:
Change the return type to
RETURNS SETOF integer
and do away with the PIPELINED.
Change the PIPE ROW statement to
RETURN NEXT cases.case_id;
Of course, you will have to do the obvious syntactic changes, like using integer instead of NUMBER and putting the IN before the parameter name.
But actually, it is quite unnecessary to write a function for that. Doing it in a single SELECT statement would be both simpler and faster.
Pipelined functions are best translated to a simple SQL function returning a table.
Something like this:
create or replace function employee_all_case(p_ugr_id integer, p_case_type_IN integer)
returns table (case_id integer)
as
$$
SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and cases.employee_id IS NOT NULL;
$$
language sql;
Note that your sample code did not use the second parameter p_case_type_id.
Usage is also more straightforward:
select *
from employee_all_case(14533,1190);
Before diving into the solution, I will provide some information which will help you to understand better.
So basically PIPELINED came into picture for improving memory allocation at run time.
As you all know collections will occupy space when ever they got created. So the more you use, the more memory will get allocated.
Pipelining negates the need to build huge collections by piping rows out of the function.
saving memory and allowing subsequent processing to start before all the rows are generated.
Pipelined table functions include the PIPELINED clause and use the PIPE ROW call to push rows out of the function as soon as they are created, rather than building up a table collection.
By using Pipelined how memory usage will be optimized?
Well, it's very simple. instead of storing data into an array, just process the data by using pipe row(desired type). This actually returns the row and process the next row.
coming to solution in plpgsql
simple but not preferred while storing large data.
Remove PIPELINED from return declaration and return an array of desired type. something like RETURNS typrec2[].
Where ever you are using pipe row(), add that entry to array and finally return that array.
create a temp table like
CREATE TEMPORARY TABLE temp_table (required fields) ON COMMIT DROP;
and insert data into it. Replace pipe row with insert statement and finally return statement like
return query select * from temp_table
**The best link for understanding PIPELINED in oracle [https://oracle-base.com/articles/misc/pipelined-table-functions]
pretty ordinary for postgres reference [http://manojadinesh.blogspot.com/2011/11/pipelined-in-oracle-as-well-in.html]
Hope this helps some one conceptually.

plpgsql : how to reference variable in sql statement

I am rather new to PL/pgSQL and don't know how to reference a variable in a SELECT statement.
In this following function the SELECT INTO always returns NULL:
$body$
DECLARE
r RECORD;
c CURSOR FOR select name from t_people;
nb_bills integer;
BEGIN
OPEN c;
LOOP
FETCH c INTO r;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'name found: %', r.name;
SELECT count(bill_id) INTO nb_bills from t_bills where name = r.name;
END LOOP;
END;
$body$
RAISE NOTICE allows me to verify that my CURSOR is working well: names are properly retrieved, but for some reason still unknown to me, not properly passed to the SELECT INTO statement.
For debugging purpose, I tried to replace the variable in SELECT INTO with a constant value and it worked:
SELECT count( bill_id) INTO nb_bills from t_bills where name = 'joe';
I don't know how to reference r.name in the SELECT INTO statement.
I tried r.name, I tried to create another variable containing quotes, it is always returning NULL.
I am stuck. If anyone knows ...
the SELECT INTO always returns NULL
not properly passed to the SELECT INTO statement.
it is always returning NULL.
None of this makes sense.
SELECT statements do not return anything by itself in PL/pgSQL. You have to either assign results to variables or explicitly return results with one of the available RETURN variants.
SELECT INTO is only used for variable assignment and does not return anything, either. Not to be confused with SELECT INTO in plain SQL - which is generally discouraged:
Combine two tables into a new one so that select rows from the other one are ignored
It's not clear what's supposed to be returned in your example. You did not disclose the return type of the function and you did not actually return anything.
Start by reading the chapter Returning From a Function in the manual.
Here are some related answers with examples:
Can I make a plpgsql function return an integer without using a variable?
Return a query from a function?
How to return multiple rows from PL/pgSQL function?
Return setof record (virtual table) from function
And there may be naming conflicts with parameter names. We can't tell unless you post the complete function definition.
Better approach
That said, explicit cursors are only actually needed on rare occasions. Typically, the implicit cursor of a FOR loop is simpler to handle and cheaper:
Truncating all tables in a Postgres database
And most of the time you don't even need any cursors or loops at all. Set-based solutions are typically simpler and faster.

Dynamic table name for INSERT INTO query

I am trying to figure out how to write an INSERT INTO query with table name and column name of the source as parameter.
For starters I was just trying to parametrize the source table name. I have written the following query. For now I am declaring and assigning the value of the variable tablename directly, but in actual example it would come from some other source/list. The target table has only one column.
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
tablename text;
BEGIN
tablename := 'Table_1';
EXECUTE 'INSERT INTO "Schemaname"."targettable"
SELECT "Col_A"
FROM "schemaname".'
||quote_ident(tablename);
END
$$ LANGUAGE PLPGSQL;
Although the query runs without any error no changes are reflected at the target table. On running the query I get the following output.
Query OK, 0 rows affected (execution time: 296 ms; total time: 296 ms)
I want the changes to be reflected at the target table. I don't know how to resolve the problem.
Audited code
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$func$
DECLARE
_tbl text := 'Table_1'; -- or 'table_1'?
BEGIN
EXECUTE 'INSERT INTO schemaname.targettable(column_name)
SELECT "Col_A"
FROM schemaname.' || quote_ident(_tbl); -- or "Schemaname"?
END
$func$ LANGUAGE plpgsql;
Always use an explicit target list for persisted INSERT statements.
You can assign variables at declare time.
It's a wide-spread folly to use double-quoted identifiers to preserve otherwise illegal spelling. You have to keep double-quoting the name for the rest of its existence. One or more of those errors seem to have crept into your code: "Schemaname" or "schemaname"? Table_1 or "Table_1"?
Are PostgreSQL column names case-sensitive?
When you provide an identifier like a table name as text parameter and escape it with quote_ident(), it is case sensitive!
Identifiers in SQL code are cast to lower case unless double-quoted. But quote-ident() (which you must use to defend against SQL injection) preserves the spelling you provide with double-quotes where necessary.
Function with parameter
CREATE OR REPLACE FUNCTION foo(_tbl text)
RETURNS void AS
$func$
BEGIN
EXECUTE 'INSERT INTO schemaname.targettable(column_name)
SELECT "Col_A"
FROM schemaname.' || quote_ident(_tbl);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT foo('tablename'); -- tablename is case sensitive
There are other ways:
Table name as a PostgreSQL function parameter

Trying to convert simple execution stored procedure to postgres function - can't get CURRVAL('table_seq') to function

The MySQL Stored Procedure was:
BEGIN
set #sql=_sql;
PREPARE stmt FROM #sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
set _ires=LAST_INSERT_ID();
END$$
I tried to convert it to:
BEGIN
EXECUTE _sql;
SELECT INTO _ires CURRVAL('table_seq');
RETURN;
END;
I get the error:
SQL error:
ERROR: relation "table_seq" does not exist
LINE 1: SELECT CURRVAL('table_seq')
^
QUERY: SELECT CURRVAL('table_seq')
CONTEXT: PL/pgSQL function "myexecins" line 4 at SQL statement
In statement:
SELECT myexecins('SELECT * FROM tblbilldate WHERE billid = 2')
The query used is for testing purposes only. I believe this function is used to get the row id of the inserted or created row from the query. Any Suggestions?
When you create tables with serial columns, sequences are, by default, named as tablename_columnname_seq, but it seems that you're trying to access tablename_seq. So with a table called foobar and primary key column foobar_id it ends up being foobar_foobar_id_seq.
By the way, a cleaner way to get the primary key after an insert is using the RETURNING clause in INSERT. E.g.:
_sql = 'INSERT INTO sometable (foobar) VALUES (123) RETURNING sometable_id';
EXECUTE _sql INTO _ires;
PostgreSQL is saying that there is no sequence called "table_seq". Are you sure that that is the right name? The name you would use would depend on what is in _sql as each SERIAL or BIGSERIAL gets its own sequence, you can also define sequences and wire them up by hand.
In any case, lastval() is a closer match to MySQL's LAST_INSERT_ID(), lastval() returns the most recently returned value from any sequence in the current session:
lastval
Return the value most recently returned by nextval in the current session. This
function is identical to currval, except that instead of taking the sequence name
as an argument it fetches the value of the last sequence used by nextval in the
current session. It is an error to call lastval if nextval has not yet been called
in the current session.
Using lastval() also means that you don't have to worry about what's in _sql, unless of course it doesn't use a sequence at all.