Variable column name on function - postgresql

I'm new to pgsql but have I have 8 years of experience with MSSQL, what i'm trying achieve is: create a function to apply this remove invalid data from names, it will remove all special characters, numbers and accents, keeping only spaces and a-Z characters, I want to use it on columns of different tables, but I cant really find what I'm doing wrong.
Here is my code:
CREATE OR REPLACE FUNCTION f_validaNome (VARCHAR(255))
RETURNS VARCHAR(255) AS
SELECT regexp_replace(unaccent($1), '[^[:alpha:]\s]', '', 'g')
COMMIT
If I run
SELECT regexp_replace(unaccent(column_name), '[^[:alpha:]\s]', '', 'g')
from TableA
my code runs fine. I don't know exactly what is wrong with the function code.

That's not how functions are written in Postgres.
As documented in the manual the function's body must be passed as a string and you need to specify which language the function is written in. Functions can be written in SQL, PL/pgSQL, PL/python, PL/perl and many others. There is also no need to reference parameters by position. Passing a dollar quoted string makes writing the function body easier.
For what you are doing, a simple SQL function is enough. It's also unnecessary to use an arbitrary character limit like 255 (which does have any performance or storage advantages over any other defined max length). So just use text.
CREATE OR REPLACE FUNCTION f_validanome (p_input text)
RETURNS text
AS
$body$ --<< string starts here.
SELECT regexp_replace(unaccent(p_input), '[^[:alpha:]\s]', '', 'g'); --<< required ; at the end
$body$ --<< string ends here
language sql
immutable; --<< required ; at the end

Related

How to pass a Tables Column into a plpgsql Functiion while performing a SELECT... statement

I googled but everyone was asking for how to pass tables or how to use the return result into a Function; I want to do neither. I simply want to take the value of a Column (lets assume col2 below is of the text datatype) of a table, and pass that data into a Function, so I can manipulate the data, but in the SELECT... statement itself, i.e.
SELECT t.col1, "myCustomFunction"(t.col2)
FROM tbl t
WHERE t.col1 = 'someCondition';
CREATE OR REPLACE FUNCTION myCustomFunction(myArg text)
RETURNS text AS $$
DECLARE
BEGIN
RETURN UPPER(myArg);
END
$$ LANGUAGE plpgsql;
... So if myCustomerFunction()'s job was capitalize letters (its not, just an example), the output would be the table with col2 data all capitalized.
Is this possible? I supposed it would be no different than embedding a CASE expression there, which I know works, and a Function returns a result, so I assumed it would be the same, but I am getting SQL Error
You cannot to pass named column to some function and you cannot to return this named column like table with this column. The table is set of rows, and almost all processing in Postgres is based on rows processing. Usually you need to hold only data of one row in memory, so you can process much bigger dataset than is your memory.
Inside PL/pgSQL function you have not informations about outer. You can get just data of scalar types, arrays of scalars, or composite or arrays of composites (or ranges and multiranges - this special kind of composite and array of composite). Nothing else.
Theoretically you can aggregate data in one column to array, and later you can expand this array to table. But these operations are memory expensive and can be slow. You need it only in few cases (like computing of median function), but it is slow, and there is risk of out of memory exception.
When object names are not doubled quoted Postgres processes then internally as lower case. When doubled quoted the are processed exactly as quoted. The thing is these may not be the same. You defined the function as FUNCTION myCustomFunction(myArg text) Not doubled quoted, but attempt to call it via "myCustomFunction"(t.col2). Unfortunately myCustomFunction processed as mycustomfunction but "myCustomFunction" is processed exactly as it appears. Those are NOT the same. Either change your select to:
SELECT t.col1,myCustomFunction(t.col2)
FROM tbl t
WHERE t.col1 = 'someCondition';
or change the function definition to:
CREATE OR REPLACE FUNCTION "myCustomFunction"(myArg text)
RETURNS text AS $$
DECLARE
BEGIN
RETURN UPPER(myArg);
END
$$ LANGUAGE plpgsql;

PostgreSQL how to identify if values need to be quoted

I am trying to write a function that will dynamically create the sql statement, but I am facing problems with typecasts so, how can I identify using the type of field if it needs to be quoted
-- using this I can recover the types of each field
-- but I do not have a simple way to express that for a specific type it need
-- to quote
create table test (
id serial not null,
employee_name text,
salary decimal(12,2),
hire_date date,
active boolean
);
select column_name,data_type, null as need_to_be_quoted
from information_schema.columns
where table_name = 'table_name';
column type need to be quoted (this is a missing information)
-------------------------------------
id integer false
employee_name text true
salary decimal false
hire_date date true
active boolean false
quote_ident docs says:
Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary
But it is not what I was expecting:
insert into test (employee_name, salary, hire_date, active)
values (quote_identy('John Doe'), quote_identy(100000), quote_identy(current_date), quote_identy(true));
This is kind of necessary because I am trying to generate the statement string dinamically.
I have values to be inserted in some table, I can discover the type of each value, but to generated the insert string statement, I should know if a specific value type should be quoted or not for example
text: type should be quoted in the string statement
boolean: should not be quoted
numeric: should not be quoted
date: should be quoted
Don't quote. Quoting adds complexity and if you don't get it just right it has syntax and security issues.
Instead use bind parameters. The details depend on what database library you're working with, but the basic idea is always the same. First prepare the statement with placeholders for your values. Then execute it passing in the values. The database takes care of putting the values into the statement. You can execute the same prepared statement multiple times optimizing your coding and database calls.
Here's how you'd do it in PL/pgSQL with EXECUTE. The linked documentation has lots of information about safely executing dynamic queries.
do $$
begin
execute '
insert into test (employee_name, salary, hire_date, active)
values ($1, $2, $3, $4)
' using 'John Doe', 100000, current_date, true;
end;
$$;
Furthermore, while writing a SQL builder is a good exercise, it's also very complicated and very easy to get subtly wrong. There are plenty of libraries which will build SQL for you.
Here's your statement using PHP's Laravel.
DB::table('test')->insert([
'employee_name' => 'John Doe',
'salary' => 100000,
'hire_date' => current_date,
'active' => true
);
If you're curious about how SQL builders work, you can dig inside Laravel for ideas. Writing one in PL/pgSQL is ambitious, but I don't know of one that exists. Good luck!

How to do variable substitution in plpgsql?

I've got a bit of complex sql code that I'm converting from MSSql to Postgres (using Entity Framework Core 2.1), to deal with potential race conditions when inserting to a table with a unique index. Here's the dumbed-down version:
const string QUERY = #"
DO
$$
BEGIN
insert into Foo (Field1,Field2,Field3)
values (#value1,#value2,#value3);
EXCEPTION WHEN others THEN
-- do nothing; it's a race condition
END;
$$ LANGUAGE plpgsql;
select *
from Foo
where Field1 = #value1
and Field2 = #value2;
";
return DbContext.Foos
.FromSql(QUERY,
new NpgsqlParameter("value1", value1),
new NpgsqlParameter("value2", value2),
new NpgsqlParameter("value3", value3))
.First();
In other words, try to insert the record, but don't throw an exception if the attempt to insert it results in a unique index violation (index is on Field1+Field2), and return the record, whether it was created by me or another thread.
This concept worked fine in MSSql, using a TRY..CATCH block. As far as I can tell, the way to handle Postgres exceptions is as I've done, in a plpgsql block.
BUT...
It appears that variable substitution in plpgsql blocks doesn't work. The code above fails on the .First() (no elements in sequence), and when I comment out the EXCEPTION line, I see the real problem, which is:
Npgsql.PostgresException : 42703: column "value1" does not exist
When I test using regular Sql, i.e. doing the insert without using a plpgsql block, this works fine.
So, what is the correct way to do variable substitution in a plpgsql block?
The reason this doesn't work is that the body of the DO statement is actually a string, a text. See reference
$$ is just another way to delimit text in postgresql. It can be just as well be replaced with ' or $somestuff$.
As it is a string, Npgsql and Postgresql have no reason to mess with #value1 in it.
Solutions? Only a very ugly one, so not using this construction, as you're not able to pass it any values. And messing with string concatenation is no different than doing concatenation in C# in the first place.
Alternatives? Yes!
You don't need to handle exceptions in plpgsql blocks. Simply insert, use the ON CONFLICT DO NOTHING, and be on your way.
INSERT INTO Foo (Field1,Field2,Field3)
VALUES (#value1,#value2,#value3)
ON CONFLICT DO NOTHING;
select *
from Foo
where Field1 = #value1
and Field2 = #value2;
Or if you really want to keep using plpgsql, you can simply create a temporary table, using the ON COMMIT DROP option, fill it up with these parameters as one row, then use it in the DO statement. For that to work all your code must execute as part of one transaction. You can use one explicitly just in case.
The only ways to pass parameters to plpgsql code is via these 2 methods:
Declaring a function, then calling it with arguments
When already inside a plpgsql block you can call:
EXECUTE $$ INSERT ... VALUES ($1, $2, $3); $$ USING 3, 'text value', 5.234;
End notes:
As a fellow T-SQL developer who loved its freedom, but transitioned to Postgresql, I have to say that the BIG difference is that on one side there's T-SQL which gives the power, and on the other side it's a very powerful Postgresql-flavored SQL. plpgsql is very rarely warranted. In fact, in a code base of megabytes of complex SQL stuff, I can rewrite pretty much every plpgsql code in SQL. That's how powerful it really is compared to MSSQL-flavored SQL. It just takes some getting used to, and befriending the very ample documentation. Good luck!

Postgres Full Text Search customize tsvector conversion

I want to build a tsvector based on the id (1-111.1x) and the name fields of my row, all from a trigger and with tsvector_update_trigger(tsv, 'pg_catalog.german', id, name).
But my id ends up cut off like '-111.1' instead of '1-111.1x'.
Is there a way to customize the conversion, so that the id field is being retained (or to apply certain operations like lower()), all while the name field is properly converted?
Something like this (which doesn't work, as setweight expects a tsvector)?
CREATE FUNCTION tsv_trigger() RETURNS trigger AS $$
begin
new.tsv :=
setweight(new.id, 'A') ||
setweight(to_tsvector('pg_catalog.german', coalesce(new.name,'')), 'D');
return new;
end
$$ LANGUAGE plpgsql;
CREATE TRIGGER TS_tsv
BEFORE INSERT OR UPDATE ON "model"
FOR EACH ROW EXECUTE PROCEDURE
tsv_trigger();
Thanks!
I ended up creating another field '_id', that is a normalized represantation of 'id' (so instead of '1-111' -> '1111'). Search input must then be stripped of '-', '.', replacing these with empty strings.
Of course we need to be careful, as sometimes stripping certain characters might not be suitable; I've created a regex pattern that only strips them within matching ids, but not within text.
In my case, this seems like a viable solution, though its merely a workaround. I'd be happy to flag a real solution.

Print ASCII-art formatted SETOF records from inside a PL/pgSQL function

I would love to exploit the SQL output formatting of PostgreSQL inside my PL/pgSQL functions, but I'm starting to feel I have to give up the idea.
I have my PL/pgSQL function query_result:
CREATE OR REPLACE FUNCTION query_result(
this_query text
) RETURNS SETOF record AS
$$
BEGIN
RETURN QUERY EXECUTE this_query;
END;
$$ LANGUAGE plpgsql;
..merrily returning a SETOF records from an input text query, and which I can use for my SQL scripting with dynamic queries:
mydb=# SELECT * FROM query_result('SELECT ' || :MYVAR || ' FROM Alice') AS t (id int);
id
----
1
2
3
So my hope was to find a way to deliver this same nicely formatted output from inside a PL/pgSQL function instead, but RAISE does not support SETOF types, and there's no magic predefined cast from SETOF records to text (I know I could create my own CAST..)
If I create a dummy print_result function:
CREATE OR REPLACE FUNCTION print_result(
this_query text
) RETURNS void AS
$$
BEGIN
SELECT query_result(this_query);
END;
$$ LANGUAGE plpgsql;
..I cannot print the formatted output:
mydb=# SELECT print_result('SELECT ' || :MYVAR || ' FROM Alice');
ERROR: set-valued function called in context that cannot accept a set
...
Thanks for any suggestion (which preferably works with PostgreSQL 8.4).
Ok, to do anything with your result set in print_result you'll have to loop over it. That'll look something like this -
Here result_record is defined as a record variable. For the sake of explanation, we'll also assume that you have a formatted_results variable that is defined as text and defaulted to a blank string to hold the formatted results.
FOR result_record IN SELECT * FROM query_result(this_query) AS t (id int) LOOP
-- With all this, you can do something like this
formatted_results := formatted_results ||','|| result_record.id;
END LOOP;
RETURN formatted_results;
So, if you change print_results to return text, declare the variables as I've described and add this in, your function will return a comma-separated list of all your results (with an extra comma at the end, I'm sure you can make use of PostgreSQL's string functions to trim that). I'm not sure this is exactly what you want, but this should give you a good idea about how to manipulate your result set. You can get more information here about control structures, which should let you do pretty much whatever you want.
EDIT TO ANSWER THE REAL QUESTION:
The ability to format data tuples as readable text is a feature of the psql client, not the PostgreSQL server. To make this feature available in the server would require extracting relevant code or modules from the psql utility and recompiling them as a database function. This seems possible (and it is also possible that someone has already done this), but I am not familiar enough with the process to provide a good description of how to do that. Most likely, the best solution for formatting query results as text will be to make use of PostgreSQL's string formatting functions to implement the features you need for your application.