PostgreSQL - finding text values that are not valid JSON - postgresql

Casting a text value that doesn't contain valid JSON to JSON just bombs out the statement. I have no idea which row or value. How do I get this?
Fingers crossed that it doesn't involve a UDF... I really don't want to inject these into our stack. (We use Ruby Rails ActiveRecord for database migrations, and I'm almost certain UDFs will not be recorded into schema.rb.)

you an check values with exception to find rows, eg:
t=# do
$$
declare _r record;
begin
for _r in (with j as (select * from (values ('3'),('{"valid":true}'),('notValid') ) as son) select column1 v from j) loop
begin
perform _r.v::json;
exception when others then raise info '%',concat(_r.v,' ',SQLSTATE);
end;
end loop;
end;
$$
;
INFO: notValid 22P02
DO
t=# \e
INFO: test3 22P02
INFO: notValid 22P02
DO

Related

Call postgresql record's field by name

I have a function that uses RECORD to temporarily store the data. I can use it - it's fine. My problem is that I can't hardcode columns I need to get from the RECORD. I must do it dynamically. Something line:
DECLARE
r1 RECORD;
r2 RECORD;
BEGIN
for r1 in Select column_name
from columns_to_process
where process_now = True
loop
for r2 in Select *
from my_data_table
where whatever
loop
-----------------------------
here I must call column by its name that is unknown at design time
-----------------------------
... do something with
r2.(r1.column_name)
end loop;
end loop;
END;
Does anyone know how to do it?
best regards
M
There is no need to select the all the qualifying rows and compute the total in a loop. Actually when working with SQL try to drop the word loop for your vocabulary; instead just use sum(column_name) in the select. The issue here is that you do not know what column to sum when the query is written, and all structural components(table names, columns names, operators, etc) must be known before submitting. You cannot use a variable for a structural component - in this case a column name. To do that you must use dynamic sql - i.e. SQL statement built by the process. The following accomplishes that: See example here.
create or replace function sum_something(
the_something text -- column name
, for_id my_table.id%type -- my_table.id
)
returns numeric
language plpgsql
as $$
declare
k_query_base constant text :=
$STMT$ Select sum(%I) from my_table where id = %s; $STMT$;
l_query text;
l_sum numeric;
begin
l_query = format(k_query_base, the_something, for_id);
raise notice E'Rumming Statememt:\n %',l_query; -- for prod raise Log
execute l_query into l_sum;
return l_sum;
end;
$$;
Well, after some time I figured out that I could use temporary table instead of RECORD. Doing so gives me all advantages of using dynamic queries so I can call any column by its name.
DECLARE
_my_var bigint;
BEGIN
create temporary table _my_temp_table as
Select _any, _column, _you, _need
from _my_table
where whatever = something;
execute 'Select ' || _any || ' from _my_temp_table' into _my_var;
... do whatever
END;
However I still believe that there should be a way to call records field by it's name.

Dynamic columns in SQL statement

I am trying to have a dynamic variable that I can specify different column's with (depending on some if statements). Explained in code, I am trying to replace this:
IF (TG_TABLE_NAME='this') THEN INSERT INTO table1 (name_id) VALUES id.NEW END IF;
IF (TG_TABLE_NAME='that') THEN INSERT INTO table1 (lastname_id) VALUES id.NEW END IF;
IF (TG_TABLE_NAME='another') THEN INSERT INTO table1 (age_id) VALUES id.NEW END IF;
With this:
DECLARE
varName COLUMN;
BEGIN
IF (TG_TABLE_NAME='this') THEN varName = 'name_id';
ELSE IF (TG_TABLE_NAME='that') THEN varName = 'lastname_id';
ELSE (TG_TABLE_NAME='another') THEN varName = 'age_id';
END IF;
INSERT INTO table1 (varName) VALUES id.NEW;
END;
The INSERT string is just an example, it's actually something longer. I am a beginner at pgSQL. I've seen some examples but I'm only getting more confused. If you can provide an answer that is also more safe from SQL injection that would be awesome.
One way to do what you're looking for is to compose your INSERT statement dynamically based on the named table. The following function approximates the logic you laid out in the question:
CREATE OR REPLACE FUNCTION smart_insert(table_name TEXT) RETURNS VOID AS $$
DECLARE
target TEXT;
statement TEXT;
BEGIN
CASE table_name
WHEN 'this' THEN target := 'name_id';
WHEN 'that' THEN target := 'lastname_id';
WHEN 'another' THEN target := 'age_id';
END CASE;
statement :=
'INSERT INTO '||table_name||'('||target||') VALUES (nextval(''id''));';
EXECUTE statement;
END;
$$ LANGUAGE plpgsql;
Note that I'm using a sequence to populate these tables (the call to nextval). I'm not sure if that is your use case, but hopefully this example is extensible enough for you to modify it to fit your scenario. A contrived demo:
postgres=# SELECT smart_insert('this');
smart_insert
--------------
(1 row)
postgres=# SELECT smart_insert('that');
smart_insert
--------------
(1 row)
postgres=# SELECT name_id FROM this;
name_id
---------
101
(1 row)
postgres=# SELECT lastname_id FROM that;
lastname_id
-------------
102
(1 row)
Your example doesn't make a lot of sense. Probably over-simplified. Anyway, here is a trigger function for the requested functionality that inserts the new id in a selected column of a target table, depending on the triggering table:
CREATE OR REPLACE FUNCTION smart_insert(table_name TEXT)
RETURNS trigger AS
$func$
BEGIN
EXECUTE
'INSERT INTO table1 ('
|| CASE TG_TABLE_NAME
WHEN 'this' THEN 'name_id'
WHEN 'that' THEN 'lastname_id'
WHEN 'another' THEN 'age_id'
END CASE
||') VALUES ($1)'
USING NEW.id;
END
$func$ LANGUAGE plpgsql;
To refer to the id column of the new row, use NEW.id not id.NEW.
To pass a value to dynamic code, use the USING clause of EXECUTE. This is faster and more elegant, avoids casting to text and back and also makes SQL injection impossible.
Don't use many variables and assignments in plpgsql, where this is comparatively expensive.
If the listed columns of the target table don't have non-default column defaults, you don't even need dynamic SQL:
CREATE OR REPLACE FUNCTION smart_insert(table_name TEXT)
RETURNS trigger AS
$func$
BEGIN
INSERT INTO table1 (name_id, lastname_id, age_id)
SELECT CASE WHEN TG_TABLE_NAME = 'this' THEN NEW.id END
, CASE WHEN TG_TABLE_NAME = 'that' THEN NEW.id END
, CASE WHEN TG_TABLE_NAME = 'another' THEN NEW.id END;
END
$func$ LANGUAGE plpgsql;
A CASE expression without ELSE clause defaults to NULL, which is the default column default.
Both variants are safe against SQL injection.

Passing table names in an array

I need to do the same deletion or purge operation (based on several conditions) on a set of tables. For that I am trying to pass the table names in an array to a function. I am not sure if I am doing it right. Or is there a better way?
I am pasting just a sample example this is not the real function I have written but the basic is same as below:
CREATE OR REPLACE FUNCTION test (tablename text[]) RETURNS int AS
$func$
BEGIN
execute 'delete * from '||tablename;
RETURN 1;
END
$func$ LANGUAGE plpgsql;
But when I call the function I get an error:
select test( {'rajeev1'} );
ERROR: syntax error at or near "{"
LINE 10: select test( {'rajeev1'} );
^
********** Error **********
ERROR: syntax error at or near "{"
SQL state: 42601
Character: 179
Array syntax
'{rajeev1, rajeev2}' or ARRAY['rajeev1', 'rajeev2']. Read the manual.
TRUNCATE
Since you are deleting all rows from the tables, consider TRUNCATE instead. Per documentation:
Tip: TRUNCATE is a PostgreSQL extension that provides a faster
mechanism to remove all rows from a table.
Be sure to study the details. If TRUNCATE works for you, the whole operation becomes very simple, since the command accepts multiple tables:
TRUNCATE rajeev1, rajeev2, rajeev3, ..
Dynamic DELETE
Else you need dynamic SQL like you already tried. The scary missing detail: you are completely open to SQL injection and catastrophic syntax errors. Use format() with %I (not %s to sanitize identifiers like table names. Or, better yet in this particular case, use an array of regclass as parameter instead:
CREATE OR REPLACE FUNCTION f_del_all(_tbls regclass)
RETURNS void AS
$func$
DECLARE
_tbl regclass;
BEGIN
FOREACH _tbl IN ARRAY _tbls LOOP
EXECUTE format('DELETE * FROM %s', _tbl);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT f_del_all('{rajeev1,rajeev2,rajeev3}');
Explanation here:
Table name as a PostgreSQL function parameter
You used wrong syntax for text array constant in the function call. But even if it was right, your function is not correct.
If your function has text array as argument you should loop over the array to execute query for each element.
CREATE OR REPLACE FUNCTION test (tablenames text[]) RETURNS int AS
$func$
DECLARE
tablename text;
BEGIN
FOREACH tablename IN ARRAY tablenames LOOP
EXECUTE FORMAT('delete * from %s', tablename);
END LOOP;
RETURN 1;
END
$func$ LANGUAGE plpgsql;
You can then call the function for several tables at once, not only for one.
SELECT test( '{rajeev1, rajeev2}' );
If you do not need this feature, simply change the argument type to text.
CREATE OR REPLACE FUNCTION test (tablename text) RETURNS int AS
$func$
BEGIN
EXECUTE format('delete * from %s', tablename);
RETURN 1;
END
$func$ LANGUAGE plpgsql;
SELECT test('rajeev1');
I recommend using the format function.
If you want to execute a function (say purge_this_one_table(tablename)) on a group of tables identified by similar names you can use this construction:
create or replace function purge_all_these_tables(mask text)
returns void language plpgsql
as $$
declare
tabname text;
begin
for tabname in
select relname
from pg_class
where relkind = 'r' and relname like mask
loop
execute format(
'purge_this_one_table(%s)',
tabname);
end loop;
end $$;
select purge_all_these_tables('agg_weekly_%');
It should be:
select test('{rajeev1}');

Variables for identifiers inside IF EXISTS in a plpgsql function

CREATE OR REPLACE FUNCTION drop_now()
RETURNS void AS
$BODY$
DECLARE
row record;
BEGIN
RAISE INFO 'in';
FOR row IN
select relname from pg_stat_user_tables
WHERE schemaname='public' AND relname LIKE '%test%'
LOOP
IF EXISTS(SELECT row.relname.tm FROM row.relname
WHERE row.relname.tm < current_timestamp - INTERVAL '90 minutes'
LIMIT 1)
THEN
-- EXECUTE 'DROP TABLE ' || quote_ident(row.relname);
RAISE INFO 'Dropped table: %', quote_ident(row.relname);
END IF;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Could you tell me how to use variables in SELECT which is inside IF EXISTS? At the present moment, row.relname.tm and row.relname are treated literally which is not I want.
CREATE OR REPLACE FUNCTION drop_now()
RETURNS void AS
$func$
DECLARE
_tbl regclass;
_found int;
BEGIN
FOR _tbl IN
SELECT relid
FROM pg_stat_user_tables
WHERE schemaname = 'public'
AND relname LIKE '%test%'
LOOP
EXECUTE format($f$SELECT 1 FROM %s
WHERE tm < now() - interval '90 min'$f$, _tbl);
GET DIAGNOSTICS _found = ROW_COUNT;
IF _found > 0 THEN
-- EXECUTE 'DROP TABLE ' || _tbl;
RAISE NOTICE 'Dropped table: %', _tbl;
END IF;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Major points
row is a reserved word in the SQL standard. It's use is allowed in Postgres, but it's still unwise. I make it a habbit to prepend psql variable with an underscore _ to avoid any naming conflicts.
You don't don't select the whole row anyway, just the table name in this example. Best use a variable of type regclass, thereby avoiding SQL injection by way of illegal table names automatically. Details in this related answer:
Table name as a PostgreSQL function parameter
You don't need LIMIT in an EXISTS expression, which only checks for the existence of any rows. And you don't need meaningful target columns for the same reason. Just write SELECT 1 or SELECT * or something.
You need dynamic SQL for queries with variable identifiers. Plain SQL does not allow for that. I.e.: build a query string and EXECUTE it. Details in this closely related answer:
Dynamic SQL (EXECUTE) as condition for IF statement
The same is true for a DROP statement, should you want to run it. I added a comment.
You'll need to build your query as a string then execute that - see the section on executing dynamic commands in the plpgsql section of the manual.

I want to have my pl/pgsql script output to the screen

I have the following script that I want output to the screen from.
CREATE OR REPLACE FUNCTION randomnametest() RETURNS integer AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT * FROM my_table LOOP
SELECT levenshtein('mystring',lower('rec.Name')) ORDER BY levenshtein;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
I want to get the output of the levenshein() function in a table along with the rec.Name. How would I do that? Also, it is giving me an error about the line where I call levenshtein(), saying that I should use perform instead.
Assuming that you want to insert the function's return value and the rec.name into a different table. Here is what you can do (create the table new_tab first)-
SELECT levenshtein('mystring',lower(rec.Name)) AS L_val;
INSERT INTO new_tab (L_val, rec.name);
The usage above is demonstrated below.
I guess, you can use RAISE INFO 'This is %', rec.name; to view the values.
CREATE OR REPLACE FUNCTION randomnametest() RETURNS integer AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT * FROM my_table LOOP
SELECT levenshtein('mystring',lower(rec.Name))
AS L_val;
RAISE INFO '% - %', L_val, rec.name;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
Note- the FROM clause is optional in case you select from a function in a select like netxval(sequence_name) and don't have any actual table to select from i.e. like SELECT nextval(sequence_name) AS next_value;, in Oracle terms it would be SELECT sequence_name.nextval FROM dual; or SELECT function() FROM dual;. There is no dual in postgreSQL.
I also think that the ORDER BY is not necessary since my assumption would be that your function levenshtein() will most likely return only one value at any point of time, and hence wouldn't have enough data to ORDER.
If you want the output from a plpgsql function like the title says:
CREATE OR REPLACE FUNCTION randomnametest(_mystring text)
RETURNS TABLE (l_dist int, name text) AS
$BODY$
BEGIN
RETURN QUERY
SELECT levenshtein(_mystring, lower(t.name)), t.name
FROM my_table t
ORDER BY 1;
END;
$$ LANGUAGE plpgsql;
Declare the table with RETURNS TABLE.
Use RETURN QUERY to return records from the function.
Avoid naming conflicts between column names and OUT parameters (from the RETURNS TABLE clause) by table-qualifying column names in queries. OUT parameters are visible everywhere in the function body.
I made the string to compare to a parameter to the function to make this more useful.
There are other ways, but this is the most effective for the task. You need PostgreSQL 8.4 or later.
For a one-time use I would consider to just use a plain query (= function body without the RETURN QUERY above).