PostgreSQL: how to identify if values need to be quoted

I am trying to write a function that dynamically creates an SQL statement, but I am running into problems with typecasts. Given the type of a field, how can I tell whether its value needs to be quoted?
-- With this I can recover the type of each field,
-- but I have no simple way to tell whether a value of a given type
-- needs to be quoted.
create table test (
id serial not null,
employee_name text,
salary decimal(12,2),
hire_date date,
active boolean
);
select column_name, data_type, null as need_to_be_quoted
from information_schema.columns
where table_name = 'test';
column          type      need to be quoted (this is the missing information)
-----------------------------------------------------------------------------
id              integer   false
employee_name   text      true
salary          decimal   false
hire_date       date      true
active          boolean   false
The quote_ident docs say:
Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary
But it is not what I was expecting:
insert into test (employee_name, salary, hire_date, active)
values (quote_ident('John Doe'), quote_ident(100000), quote_ident(current_date), quote_ident(true));
This is necessary because I am trying to generate the statement string dynamically.
I have values to insert into some table, and I can discover the type of each value, but to generate the insert statement string I need to know whether a value of a given type should be quoted. For example:
text: type should be quoted in the string statement
boolean: should not be quoted
numeric: should not be quoted
date: should be quoted

Don't quote. Quoting adds complexity and if you don't get it just right it has syntax and security issues.
Instead use bind parameters. The details depend on what database library you're working with, but the basic idea is always the same. First prepare the statement with placeholders for your values. Then execute it passing in the values. The database takes care of putting the values into the statement. You can execute the same prepared statement multiple times optimizing your coding and database calls.
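At the SQL level, the prepare-then-execute idea can be sketched with PREPARE and EXECUTE. This is only an illustration against the test table from the question; the statement name insert_employee and the second row's values are made up for the example.

```sql
-- assumes the "test" table from the question already exists
-- prepare once, with typed placeholders instead of quoted values
PREPARE insert_employee (text, numeric, date, boolean) AS
    INSERT INTO test (employee_name, salary, hire_date, active)
    VALUES ($1, $2, $3, $4);

-- execute as often as needed; no manual quoting of the values
EXECUTE insert_employee('John Doe', 100000, current_date, true);
EXECUTE insert_employee('Jane Roe', 85000, DATE '2020-01-15', false);

-- prepared statements live until the session ends or they are deallocated
DEALLOCATE insert_employee;
```

Client libraries usually expose the same mechanism through their own placeholder syntax, so the PREPARE is rarely written by hand.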
Here's how you'd do it in PL/pgSQL with EXECUTE. The linked documentation has lots of information about safely executing dynamic queries.
do $$
begin
execute '
insert into test (employee_name, salary, hire_date, active)
values ($1, $2, $3, $4)
' using 'John Doe', 100000, current_date, true;
end;
$$;
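If a statement string really has to be built, knowing each type is not required. format() with %L quotes every value as a literal, and PostgreSQL coerces a quoted literal to the column's type, so a quoted number is accepted for a numeric column and a quoted 't' for a boolean one. A sketch under that assumption, again using the question's test table (the variable name stmt is made up):

```sql
do $$
declare
    stmt text;
begin
    -- %L renders each argument as a quoted literal (or NULL), whatever its type
    stmt := format(
        'insert into test (employee_name, salary, hire_date, active) values (%L, %L, %L, %L)',
        'John Doe', 100000, current_date, true);
    raise notice '%', stmt;   -- inspect the generated statement
    execute stmt;
end;
$$;
```

Passing the values with USING, as above, remains the simpler option whenever the values do not have to appear in the statement text.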
Furthermore, while writing a SQL builder is a good exercise, it's also very complicated and very easy to get subtly wrong. There are plenty of libraries which will build SQL for you.
Here's your statement using PHP's Laravel.
DB::table('test')->insert([
'employee_name' => 'John Doe',
'salary' => 100000,
'hire_date' => DB::raw('current_date'),
'active' => true
]);
If you're curious about how SQL builders work, you can dig inside Laravel for ideas. Writing one in PL/pgSQL is ambitious, but I don't know of one that exists. Good luck!

Related

Variable column name on function

I'm new to PostgreSQL but have 8 years of experience with MSSQL. What I'm trying to achieve is a function that removes invalid data from names: it should remove all special characters, numbers and accents, keeping only spaces and a-Z characters. I want to use it on columns of different tables, but I can't find what I'm doing wrong.
Here is my code:
CREATE OR REPLACE FUNCTION f_validaNome (VARCHAR(255))
RETURNS VARCHAR(255) AS
SELECT regexp_replace(unaccent($1), '[^[:alpha:]\s]', '', 'g')
COMMIT
If I run
SELECT regexp_replace(unaccent(column_name), '[^[:alpha:]\s]', '', 'g')
from TableA
my code runs fine. I don't know exactly what is wrong with the function code.
That's not how functions are written in Postgres.
As documented in the manual the function's body must be passed as a string and you need to specify which language the function is written in. Functions can be written in SQL, PL/pgSQL, PL/python, PL/perl and many others. There is also no need to reference parameters by position. Passing a dollar quoted string makes writing the function body easier.
For what you are doing, a simple SQL function is enough. It's also unnecessary to use an arbitrary character limit like 255 (which does not have any performance or storage advantages over any other defined max length). So just use text.
CREATE OR REPLACE FUNCTION f_validanome (p_input text)
RETURNS text
AS
$body$ --<< string starts here.
SELECT regexp_replace(unaccent(p_input), '[^[:alpha:]\s]', '', 'g'); --<< required ; at the end
$body$ --<< string ends here
language sql
immutable; --<< required ; at the end
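A short usage sketch, assuming the unaccent extension can be installed in the database and reusing the TableA / column_name placeholders from the question:

```sql
-- unaccent() ships as an extension and must be installed once per database
CREATE EXTENSION IF NOT EXISTS unaccent;

-- call the function on a literal ...
SELECT f_validanome('José 123 da Silva!');

-- ... or on any text column of any table (placeholder names from the question)
SELECT f_validanome(column_name) FROM TableA;
```

Because the column is passed as an ordinary argument, the same function works for every table without any dynamic SQL.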

How to update column based on column name in postgres?

I've narrowed it down to two possibilities - DynamicSQL and using a case statement.
However, I've failed with both of these.
I simply don't understand dynamicSQL, and how I would use it in my case.
This is my attempt using case statements; one of many failed variations.
SELECT column_name,
CASE WHEN column_name = 'address' THEN (**update statement gives syntax error within here**)
END
FROM information_schema.columns
WHERE table_name = 'employees';
As an overview, I'm using Axios to talk to my Node server, which is making calls to my Heroku database using Massivejs.
Maybe this isn't the way to go - so here's my main problem:
I've run into trouble because the values I'm planning on using as column names are sent to my server as strings. The exact call that I've been trying to use is
update employees
set $1 = $2
where employee_id = $3;
Once again, I'm passing into those using massive.
I get the error back { error: syntax error at or near "'address'"} because my incoming values are strings. My thought process was that the above statement would allow me to use variables because 'address' is encapsulated by quotes.
But alas, my thought process has failed me.
This seems to be close to answering my question, but I can't seem to figure out what to do in my case if using dynamic SQL.
How to use dynamic column names in an UPDATE or SELECT statement in a function?
Thanks in advance.
I will show you a way to do this by using a function.
First we create the employees table :
CREATE TABLE employees(
id BIGSERIAL PRIMARY KEY,
column1 TEXT,
column2 TEXT
);
Next, we create a function that requires three parameters:
columnName - the name of the column that needs to be updated
columnValue - the new value to which the column needs to be updated
employeeId - the id of the employee that will be updated
By using the format function we generate the update query as a string and use the EXECUTE command to execute the query.
Here is the code of the function.
CREATE OR REPLACE FUNCTION update_columns_on_employee(columnName TEXT, columnValue TEXT, employeeId BIGINT)
RETURNS VOID AS
$$
DECLARE update_statement TEXT := format('UPDATE employees SET %I = %L WHERE id = %L', columnName, columnValue, employeeId); -- %I quotes the identifier, %L quotes the values
BEGIN
EXECUTE update_statement;
end;
$$ LANGUAGE plpgsql;
Now, let's insert some data into the employees table:
INSERT INTO employees(column1, column2) VALUES ('column1_start_value','column2_start_value');
So now we currently have an employee with an id value of 1 who has 'column1_start_value' value for the column1, and 'column2_start_value' value for column2.
If we want to update the value of column2 from 'column2_start_value' to 'column2_new_value' all we have to do is execute the following call
SELECT * FROM update_columns_on_employee('column2','column2_new_value',1);
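A variant sketch of the same idea that keeps the values out of the SQL text entirely: only the column name goes through format() (with %I), while the new value and the id are bound with USING. The function name update_employee_column and its parameter names are made up for this example.

```sql
-- assumes the employees table created above
CREATE OR REPLACE FUNCTION update_employee_column(p_column text, p_value text, p_id bigint)
RETURNS void AS
$$
BEGIN
    -- %I quotes the column name as an identifier;
    -- $1 and $2 are bound by USING and never become part of the SQL text
    EXECUTE format('UPDATE employees SET %I = $1 WHERE id = $2', p_column)
    USING p_value, p_id;
END;
$$ LANGUAGE plpgsql;

SELECT update_employee_column('column2', 'column2_new_value', 1);
```

This also addresses the original error: a column name cannot be a bind parameter, so it has to be interpolated as a quoted identifier, while the values can stay parameters.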

Access and return result from INSERT INTO in PL/pgSQL function

I am currently learning a lot of PostgreSQL, especially PLPGSQL and am struggling in handling query results in functions.
I want to create a wrapper around a user table and use the result later on and then return it.
In my case the user and account are two different tables and I want to create it in one go.
My first and naïve approach was to build the following:
CREATE OR REPLACE FUNCTION schema.create_user_with_login (IN email varchar, IN password varchar, IN firstname varchar DEFAULT NULL, IN surname varchar DEFAULT NULL)
RETURNS schema.user
LANGUAGE plpgsql
VOLATILE
RETURNS NULL ON NULL INPUT
AS
$$
declare
created_user schema."user";
begin
INSERT INTO schema."user" ("firstname", "surname", "email")
VALUES (firstname, surname, email)
RETURNING * INTO created_user;
-- [...] create accounts and other data using e.g. created_user.id
-- the query should return the initially created user
RETURN created_user;
end;
$$;
This approach does not work, as schema.user has NOT NULL fields (a domain type with that constraint) and will throw an exception for the declared statement:
domain schema."USER_ID" does not allow null values
So maybe it could work, but not within that constrained environment.
I also tried to use RETURNS SETOF schema.user and directly RETURN QUERY INSERT ...., but this does not return all columns, but instead one column with all the data.
How can I achieve the effect of returning the initial user object as a proper user row while having the data available inside the function?
I am using Postgres 9.6. My version output:
PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
Issue 1
I also tried to use RETURNS SETOF schema.user and directly RETURN QUERY INSERT ...., but this does not return all columns, but instead
one column with all the data.
Sure it returns all columns. You have to call set-returning functions like this:
SELECT * FROM schema.create_user_with_login(...);
You have to declare it as RETURNS SETOF foo.users to cooperate with RETURN QUERY.
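The difference between the two ways of calling a set-returning function can be seen with a small stand-alone sketch. The table demo_users and the function add_demo_user are hypothetical and exist only for this illustration.

```sql
CREATE TABLE demo_users (id serial PRIMARY KEY, email text);

CREATE FUNCTION add_demo_user(_email text)
RETURNS SETOF demo_users
LANGUAGE sql AS
$$
    INSERT INTO demo_users (email) VALUES (_email) RETURNING *;
$$;

-- in the SELECT list: one column holding the whole row as a composite value
SELECT add_demo_user('a@example.com');

-- in the FROM clause: the row is expanded into its separate columns
SELECT * FROM add_demo_user('b@example.com');
```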
Issue 2
It's nonsense to declare your function as STRICT (synonym for RETURNS NULL ON NULL INPUT) and then declare NULL parameter default values:
... firstname varchar DEFAULT NULL, IN surname varchar DEFAULT NULL)
You cannot usefully pass NULL values to a function defined STRICT: it just returns NULL and does nothing. Since firstname and surname are meant to be optional, do not define the function STRICT (or pass empty strings instead, or something similar).
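A minimal sketch of that interaction, using a hypothetical function strict_demo that is not part of the question:

```sql
CREATE OR REPLACE FUNCTION strict_demo(_a text, _b text DEFAULT NULL)
RETURNS text
LANGUAGE sql STRICT AS
$$ SELECT 'called with ' || _a $$;

SELECT strict_demo('x');        -- NULL: _b defaults to NULL, so the body never runs
SELECT strict_demo('x', 'y');   -- 'called with x'
```

With a NULL default on a STRICT function, simply omitting the optional argument is enough to skip the function body.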
More suggestions
Don't call your schema "schema".
Don't use the reserved word user as identifier at all.
Use legal, lower-case, unquoted identifiers everywhere if possible.
Function
All things considered, your function might look like this:
CREATE OR REPLACE FUNCTION foo.create_user_with_login (_email text
, _password text
, _firstname text = NULL
, _surname text = NULL)
RETURNS SETOF foo.users
LANGUAGE plpgsql AS -- do *not* define it STRICT
$func$
BEGIN
RETURN QUERY
WITH u AS (
INSERT INTO foo.users (firstname, surname, email)
VALUES (_firstname, _surname, _email)
RETURNING *
)
, a AS ( -- create account using created_user.id
INSERT INTO accounts (user_id)
SELECT u.user_id FROM u
)
-- more chained CTEs with DML statements?
TABLE u; -- return the initially created user
END
$func$;
Yes, that's a single SQL statement with several data-modifying CTEs to do it all. Fastest and cleanest. The function wrapper is optional, for convenience. Might as well be LANGUAGE sql. Related:
Insert data in 3 tables at a time using Postgres
I prepended function parameter names with an underscore (_email) to rule out naming conflicts. This is totally optional, but if you don't, you have to carefully keep track of the scope of conflicting parameters, variables, and column names.
TABLE u is short for SELECT * FROM u.
Is there a shortcut for SELECT * FROM?
Store results of query in a plpgsql variable?
Three distinct cases:
Single value:
Store query result in a variable using in PL/pgSQL
Single row
Declare row type variable in PL/pgSQL
Set of rows (= table)
There are no "table variables", but several other options:
PostgreSQL table variable
How to use a record type variable in plpgsql?
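The three cases can be sketched in one anonymous block. It reads from the system catalog pg_class only so that it runs without any user tables; the variable names are made up.

```sql
DO $$
DECLARE
    _cnt bigint;   -- single value
    _rel record;   -- single row
    _r   record;   -- loop variable for a set of rows
BEGIN
    -- single value
    SELECT count(*) INTO _cnt FROM pg_class;

    -- single row
    SELECT relname, relkind INTO _rel FROM pg_class ORDER BY oid LIMIT 1;
    RAISE NOTICE 'relations: %, first: %', _cnt, _rel.relname;

    -- set of rows: no table variable, loop over the query result instead
    FOR _r IN SELECT relname FROM pg_class ORDER BY oid LIMIT 3 LOOP
        RAISE NOTICE 'relation: %', _r.relname;
    END LOOP;
END
$$;
```

For larger intermediate results, a temporary table or a CTE inside a single statement are the usual alternatives to looping.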

Is it possible to avoid explicit casts for composite types in plpgsql functions?

I am developing a framework that dynamically creates tables for content storage on PostgreSQL 9.1. One of the API functions allows the caller to save a new content entry by specifying all fields within a given object (say, a web form). To receive a set of fields, the framework creates a composite type.
Consider the following code:
CREATE SEQUENCE seq_contents MINVALUE 10000;
CREATE TABLE contents (
content_id int8 not null,
is_edited boolean not null default false,
is_published boolean not null default false,
"Input1" varchar(60),
"CheckBox1" int2,
"TheBox" varchar(60),
"Slider1" varchar(60)
);
CREATE TYPE "contentsType" AS (
"Input1" varchar(60),
"CheckBox1" int2,
"TheBox" varchar(60),
"Slider1" varchar(60)
);
CREATE OR REPLACE FUNCTION push(in_all anyelement) RETURNS int8 AS $push$
DECLARE
_c_id int8;
BEGIN
SELECT nextval('seq_contents') INTO _c_id;
EXECUTE $$INSERT INTO contents
SELECT a.*, b.*
FROM (SELECT $1, true, false) AS a,
(SELECT $2.*) AS b$$ USING _c_id, in_all;
RETURN _c_id;
END;
$push$ LANGUAGE plpgsql;
Now, in order to call this function I have to add explicit cast, like this:
SELECT push(('input1',1,'thebox','slider1')::"contentsType");
Is there a way to avoid the explicit cast? I would like external callers not to deal with casts, i.e. to hide the logic behind the PostgreSQL functions. Currently I get this error:
SELECT push(('input1',1,'thebox','slider1'));
ERROR: PL/pgSQL functions cannot accept type record
CONTEXT: compilation of PL/pgSQL function "push" near line 1
Have you considered passing the record variable as its text representation?
In theory, every record variable can be cast to and from text with the normal CAST operator.
Here is the function modified so that in_all has type text and is cast to "contentsType" in the USING clause:
CREATE OR REPLACE FUNCTION push(in_all text) RETURNS int8 AS $push$
DECLARE
_c_id int8;
BEGIN
SELECT nextval('seq_contents') INTO _c_id;
EXECUTE $$INSERT INTO contents
SELECT a.*, b.*
FROM (SELECT $1, true, false) AS a,
(SELECT $2.*) AS b$$ USING _c_id, in_all::"contentsType";
RETURN _c_id;
END;
$push$ LANGUAGE plpgsql;
Then it can be called like this (no explicit reference to the type)
select push( '(input1,1,thebox,slider1)' );
or like this (explicit record cast to text)
SELECT push(('input1',1,'thebox','slider1')::"contentsType"::text);
That would work not just with "contentsType", but any other record type, assuming the function is able to convert it back to that type.
Also in plpgsql, I assume this should work as well:
ret := push(r::text);
when r is a record variable.
Since you're hard-coding the table name into which you want to insert, and you have a fixed number and type of parameters it needs, I'm not clear on why you need the "contentsType" type at all. Why not eliminate the extra level of parentheses from the function calling, and just pass the four parameters directly? That keeps everything simpler.
CREATE OR REPLACE FUNCTION push(
"Input1" varchar(60),
"CheckBox1" int2,
"TheBox" varchar(60),
"Slider1" varchar(60)
) RETURNS int8 AS $push$
DECLARE
_c_id int8;
BEGIN
SELECT nextval('seq_contents') INTO _c_id;
EXECUTE $$INSERT INTO contents
VALUES ($1, true, false, $2, $3, $4, $5)
$$ USING _c_id, "Input1", "CheckBox1", "TheBox", "Slider1";
RETURN _c_id;
END;
$push$ LANGUAGE plpgsql;
That makes calling the function look like this:
SELECT push('input1',1,'thebox','slider1');
If you're looking to generalize the push() function so that it works for all tables, you'll hit other problems if you get past this one. You won't be able to get past the fact that the function will need to know the table name during execution. If you want to overload the function so that you can have a separate push() for each record type, you need to provide information on the record type somehow. So, if you're looking to do something like this, the short answer to your question is "No."
On the other hand, you may be making this a little harder than it needs to be. I hope you are aware that there is automatically a type created for every table, by the same name as the table. You could probably leverage that to both avoid declaring the type explicitly and to pass a record with the same name as your table -- with dummy entries for the values that the function will fill. I think you could make one totally generic push function, although it might be hard to get past the strong typing issues in plpgsql; writing the function in C might be easier if you're familiar with it.
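The automatically created row type can be seen with the contents table from the question: a row value can be cast to contents without any CREATE TYPE. This sketch only shows the cast; the values 0, false, false are dummy entries standing in for the fields the function would fill.

```sql
-- assumes the "contents" table from the question exists
-- every table has a composite type of the same name
SELECT (ROW(0, false, false, 'input1', 1, 'thebox', 'slider1')::contents).*;

-- the type is listed in the catalog alongside the table
SELECT typname, typtype FROM pg_type WHERE typname = 'contents';
```

The caller still has to name the type in the cast, so this removes the extra type declaration but not the cast itself.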

PostgreSQL Dollar-Quoted Strings Constants to Prevent SQL Injection

Can I safely prevent SQL Injection using PostgreSQL's Dollar-Quoted String Constants?
I know the best way to handle dynamic queries is to have them generated in an application layer with a parametrized query, but that's not what this question is about. All of the business logic is in stored procedures.
I have a stored procedure that takes parameters and generates a query, runs it, formats the results and returns them as a chunk of text. This function is passed a table name, column names and WHERE parameters. The WHERE parameters passed to the function come from user-entered data in the database. I would like to make sure that the strings are sanitized so that the query that is built is safe.
Using PostgreSQL's Dollar-Quoted String Constants, I should be able to safely sanitize all string input other than ' $$ '. However, if I do a string replace on "$" to escape it, I should be able to do a string comparison that is safe.
Stored Procedure:
function_name(tablename text, colnames text[], whereparam text)
--Build dynamic query...
Function Call:
SELECT
function_name('tablename', ARRAY['col1', 'col2', 'col3'], 'AND replace(col1, ''$'', ''/$'') = $$' || replace(alt_string_col, '$', '/$') || '$$ ')
FROM alttable
WHERE alt_id = 123;
Query Generated:
SELECT col1, col2, col3 FROM tablename WHERE 1=1 AND replace(col1, '$', '/$') = $$un/safe'user /$/$ data;$$
Since I'm escaping the col1 field before I compare it to escaped user data, even if the user enters, "un/safe'user $$ data;" in the field, alt_string_col, the double dollar sign does not break the query and the comparison passes.
Is this a safe way to escape strings in PostgreSQL stored procedure?
Edit1
Thanks to Erwin Brandstetter. Using the USING clause for EXECUTE I was able to create a function that can be called like this:
SELECT function_name(
'tablename',
ARRAY['col1', 'col2', 'col3'],
ARRAY[' AND col1 = $1 ', ' OR col2 = $5 '],
quote_literal(alt_string_col)::text, --Text 1-4
NULL::text,
NULL::text,
NULL::text,
alt_active_col::boolean, --Bool 1-4
NULL::boolean,
NULL::boolean,
NULL::boolean,
NULL::integer, --Int 1-4
NULL::integer,
NULL::integer,
NULL::integer
)
FROM alttable
WHERE alt_id = 123;
It gives some flexibility to the WHERE clauses that can be passed in.
Inside the stored procedure I have something like this for the EXECUTE statement.
FOR results IN EXECUTE(builtquery) USING
textParm1,
textParm2,
textParm3,
textParm4,
boolParm1,
boolParm2,
boolParm3,
boolParm4,
intParm1,
intParm2,
intParm3,
intParm4
LOOP
-- Do some stuff
END LOOP;
Use quote_ident() to safeguard against SQL injection while concatenating identifiers. Or format() in Postgres 9.1 or later.
Use the USING clause for EXECUTE in PL/pgSQL code to pass values. Or at least quote_literal().
To make sure a table name exists (and is quoted and schema-qualified automatically where necessary when concatenated), use the special data type regclass.
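The three recommendations can be combined in one small sketch. The function count_matching and its parameter names are hypothetical; the call at the end queries the system catalog pg_class only so that the example runs without user tables.

```sql
CREATE OR REPLACE FUNCTION count_matching(_tbl regclass, _col text, _val text)
RETURNS bigint
LANGUAGE plpgsql AS
$func$
DECLARE
    _n bigint;
BEGIN
    -- _tbl: a regclass value is rendered quoted and schema-qualified where needed, so %s
    -- _col: %I quotes the column name as an identifier
    -- _val: passed as a value through USING, never concatenated into the SQL text
    EXECUTE format('SELECT count(*) FROM %s WHERE %I = $1', _tbl, _col)
    INTO _n
    USING _val;
    RETURN _n;
END
$func$;

SELECT count_matching('pg_class', 'relname', 'pg_class');
```

Passing a table name that does not exist fails at the cast to regclass, before any dynamic SQL is built.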
More about executing dynamic SQL with PL/pgSQL:
PostgreSQL parameterized Order By / Limit in table function
Table name as a PostgreSQL function parameter
Since PostgreSQL 9.0 you can also use anonymous code blocks with the DO statement to execute dynamic SQL.