I've found a Stack Overflow answer on how to create a function in MSSQL to remove numeric characters from a varchar column, and I need to translate it into Postgres if possible. It really needs to be a function, as I have to use it multiple times across a few different databases.
I've tried my best to convert the MSSQL version, but it's still not working. Could anyone help fill in the gaps? I'm not a database expert at all, thanks!
CREATE Function public.RemoveNumericCharacters(Temp VarChar(1000))
Returns VarChar(1000)
AS $$
Declare NumRange varchar(50) = '%[0-9]%';
While PatIndex(#NumRange, Temp) > 0
Set Temp = Stuff(Temp, PatIndex(#NumRange, Temp), 1, '')
Return Temp
End
$$
LANGUAGE SQL;
For a bit more context: the column I need this for is an email column, so it will need to convert, for example, testuser345#hotmail.com to testuser#hotmail.com.
No need for your own function; this can be achieved using regexp_replace():
select regexp_replace('testuser345#hotmail.com', '[0-9]+', '', 'g')
Obviously you can put this into a function:
CREATE Function public.removenumericcharacters(p_input text)
returns text
AS $$
select regexp_replace(p_input, '[0-9]+', '', 'g');
$$
LANGUAGE SQL
immutable;
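To apply it to your email column, the call would look something like this (the table name users is just a placeholder for your actual table):
select removenumericcharacters(email) as email_clean
from users; -- 'testuser345#hotmail.com' becomes 'testuser#hotmail.com'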
I am trying to remove duplicated data from some of our databases based upon unique ids. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite a few databases with different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are interpreted as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys, and then deletes those rows from my_source_schema.my_source_table.
See the test result in dbfiddle.
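To give an idea of the call (the names below are just the placeholders from the example, and the table my_target_schema.my_source_table_duplicates is assumed to exist already):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', array['1', '2', '3']);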
As has been commented, you need some kind of dynamic SQL, in a FUNCTION, PROCEDURE or a DO statement, to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be inspected before actually executing it; RAISE NOTICE '%', _sql; serves that purpose.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program like psql and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
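A rough sketch of that client-side route, reusing the schema, table and ids from the example above (purely illustrative; s2.t1_duplicates is assumed to exist with the same structure as s1.t1):
SELECT format(
         'WITH del AS (DELETE FROM s1.%I WHERE id = ANY(''{1,3}''::int[]) RETURNING *)
          INSERT INTO s2.%I_duplicates TABLE del'
       , tablename, tablename)
FROM   pg_tables
WHERE  schemaname = 's1'
AND    tablename  = 't1' \gexec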
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
select quote_ident(_source_table) ||'_'|| array_to_string(_unique_key,'_') into _target_table;
raise notice 'the deleted table records will be stored in %.%',_target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows affected: %',_row_count;
end
$$;
If your _unique_key array is not too long, this solution also creates the target table for you (its name is built from the source table name and the keys). Obviously you need to create the target schema yourself.
If _unique_key holds too many values, you can customize how the dumped table is named.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} are the unique keys you want to extract to the new table, and s2 is the target schema.
I am trying to implement a function that returns a table with the same structure as the input table given as a parameter, using PL/pgSQL (PostgreSQL 9.3). Basically, I want to update a table, and return a copy of the updated table with plpgsql. I searched around SO and found several related questions (e.g. Return dynamic table with unknown columns from PL/pgSQL function and Table name as a PostgreSQL function parameter), which led to the following minimal test example:
CREATE OR REPLACE FUNCTION change_val(_lookup_tbl regclass)
RETURNS _lookup_tbl%rowtype AS --problem line
$func$
BEGIN
RETURN QUERY EXECUTE format('UPDATE %s SET val = 2 RETURNING * ; ', _lookup_tbl);
END
$func$ LANGUAGE plpgsql;
But I can't get past giving the correct return type for TABLE or SETOF RECORD in the problem line. According to this answer:
SQL demands to know the return type at call time
But I think the return type (which I intend to borrow from the input table type) is known. Can someone help explain whether it is possible to fix the signature of the above PL/pgSQL function?
Note, I need to parametrize the input table and return the update of that table. Alternatives are welcome.
What you have so far looks good. The missing ingredient: polymorphic types.
CREATE OR REPLACE FUNCTION change_val(_tbl_type anyelement)
RETURNS SETOF anyelement -- problem solved
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
'UPDATE %s SET val = 2 RETURNING *'
, pg_typeof(_tbl_type));
END
$func$;
Call (important):
SELECT * FROM change_val(NULL::some_tbl);
db<>fiddle here
Old sqlfiddle
See (last paragraph):
Refactor a PL/pgSQL function to return the output of various SELECT queries
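For a quick sanity check, a setup along these lines exercises it (the table and column names are only illustrative, but note that the query hard-codes a column named val):
CREATE TABLE some_tbl (id int, val int);
INSERT INTO some_tbl VALUES (1, 1), (2, 1);
SELECT * FROM change_val(NULL::some_tbl); -- every row comes back with val = 2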
In a plpgsql function, I have a variable of type record:
my_rec RECORD;
This record contains a row from an arbitrary table, so I do not know the columns before it is executed.
However, I do have the name of at least one of the columns available as a varchar.
The question is: How do I retrieve the value for a given column from my_rec?
Use hstore to work with records with dynamic columns in PL/PgSQL functions:
CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION test_fn(col_name text) RETURNS text AS $$
DECLARE
input_row record;
col_value text;
BEGIN
SELECT INTO input_row
*
FROM ( VALUES ('a','b','c',1) ) AS dummyrow(col1,col2,col3,intcol);
SELECT INTO col_value
hstore(input_row) -> col_name;
RETURN col_value;
END;
$$ LANGUAGE 'plpgsql';
hstore is an extension, but it's an extension that's been bundled with PostgreSQL since 8.3 and has been installable using CREATE EXTENSION since 9.1. The record-to-hstore conversion has been supported since something like 8.4 or 9.0.
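Calling it with one of the dummy column names shows the lookup in action:
SELECT test_fn('col2');   -- returns 'b'
SELECT test_fn('intcol'); -- returns '1' (all values come back as text)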
I don't know of a way to do this in plpgsql. I did a bit of testing for you and tried to make an "EXECUTE SELECT" solution work, such as:
EXECUTE 'select $1.' || quote_ident(the_param) USING my_rec INTO my_var;
This does not work for me and I get:
could not identify column "{{param_value here}}" in record data type
Here is a very similar question from a few years ago saying that it is not possible with plpgsql. Per its suggestion, it appears that it should be possible with some other languages. Quoting Tom Lane's answer:
There is no way to do that in plpgsql. You could do it in the other PLs
(eg plperl, pltcl) since they are not as strongly typed as plpgsql.
I want a plpgsql function that returns the content of any table, given its name. The function below, although not working for many reasons, will give you the general idea. Safety and coding practice aside, what's the easiest way to accomplish this?
In the end I want to get these results through a Java CallableStatement.
CREATE OR REPLACE FUNCTION get_table(tablename VARCHAR)
RETURNS SETOF RECORD AS $PROC$
BEGIN
RETURN QUERY SELECT * FROM tablename;
END;
$PROC$ LANGUAGE 'plpgsql';
You can get your function working like this:
CREATE OR REPLACE FUNCTION get_table(tablename VARCHAR)
RETURNS SETOF RECORD AS $PROC$
BEGIN
RETURN QUERY EXECUTE 'SELECT * FROM ' || quote_ident(tablename);
END;
$PROC$ LANGUAGE 'plpgsql';
In order to call it, you must specify names and data types for all returned columns. If you want to list table "t" which has two columns, you could use your function like this:
SELECT * FROM get_table('t') x(id int, val text);
Which, of course, is longer and a lot more trouble than either:
SELECT * FROM t;
or the equivalent:
TABLE t;
I really can't imagine a use-case where such a function makes anything better.
If I create a table mytable with a column data of type varchar(2) and then insert something like '123' into the column, Postgres will give me a "value too long for type" error.
How can I have Postgres ignore this and truncate the value if necessary?
Also, I do not (when creating the query) know the actual size of the data column in mytable so I can't just cast it.
According to the postgres documentation, you have to explicitly cast it to achieve this behavior, as returning an error is a requirement of the SQL standard. Is there no way to inspect the table's schema before creating the query to know what to cast it to?
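For reference, this is the behavior the documentation describes; an explicit cast to the length-limited type truncates silently, while a plain insert into the varchar(2) column raises the error:
SELECT '123'::varchar(2);                  -- yields '12': the cast truncates silently
INSERT INTO mytable (data) VALUES ('123'); -- ERROR: value too long for type character varying(2)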
Use the text type with a trigger instead:
create table mytable (
data text
);
create or replace function mytable_data_trunc_trigger()
returns trigger language plpgsql volatile as $$
begin
NEW.data = substring(NEW.data for 2);
return NEW;
end;
$$;
create trigger mytable_data_truncate_trigger
before insert or update on mytable for each row
execute procedure mytable_data_trunc_trigger();
insert into mytable values (NULL),('1'),('12'),('123');
select * from mytable;
 data
------

 1
 12
 12
(4 rows)
Easiest is just substring
INSERT INTO mytable (data) VALUES (substring('123' from 1 for 2));
You could change the data type to plain varchar, and then use a trigger to enforce the two-character constraint.