PostgreSQL 9.4 - select dynamic subset of JSONB fields

Suppose I have a table with a JSONB column. The JSON is rather large, and in most cases I need to retrieve only about 5% of its content, picked by field names (to improve performance). The set of needed fields varies from case to case, but it always stays around 5% of the whole JSON document.
I know how to do this for a hardcoded set of fields. The question is: is it possible for an externally provided set of fields? It needs to be implemented as a stored function.
Here is a sample with a hardcoded set of fields:
CREATE TABLE test_data (
  json_data jsonb NOT NULL
);

INSERT INTO test_data (json_data)
VALUES ('{"row_id": 1, "f1": 1, "f2": 2, "f3": 3, "f4": 4}'),
       ('{"row_id": 2, "f1": 1, "f2": 2, "f3": 3, "f4": 4}');

CREATE OR REPLACE FUNCTION get_important_data(IN data_set varchar = 'row_id,f1,f2')
RETURNS TABLE (data json)
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY
  SELECT (SELECT row_to_json(data_row) FROM (
    -- this needs to be dynamic
    SELECT json_data->>'row_id' AS "row_id",
           json_data->>'f1'     AS "f1",
           json_data->>'f2'     AS "f2"
  ) data_row) AS data
  FROM test_data;
END $$;
SELECT get_important_data(/* use default data set*/);
SELECT get_important_data('row_id,f2,f4');

While I think (though I'm not sure) you could get away without any dynamic SQL, using some cumbersome joins and pivot-like queries, the most straightforward and probably best-performing approach here is a little dynamic query in your function.
This would be my quick take on your problem. Note the explicit conversion to text, to recreate the effect of your ->> operator:
CREATE OR REPLACE FUNCTION get_important_data(data_set text DEFAULT 'row_id,f1,f2')
RETURNS TABLE (data json) AS $$
BEGIN
  RETURN QUERY EXECUTE format(
    'SELECT row_to_json(j.*) FROM test_data, LATERAL jsonb_to_record(json_data) AS j(%s);',
    (SELECT array_to_string(array_agg(trim(field)), ' text, ') || ' text'
     FROM unnest(string_to_array(data_set, ',')) AS field)
  );
END;
$$ LANGUAGE plpgsql;
But in my opinion, since you are ultimately returning a subset of a JSON field still in JSON format, you'd better leave it untouched. I like the following version more, and it should run faster too:
CREATE OR REPLACE FUNCTION get_important_data(data_set text DEFAULT 'row_id,f1,f2')
RETURNS TABLE (data json) AS $$
DECLARE
  rec_fields record;
  arr_fields text[];
BEGIN
  FOR rec_fields IN
    SELECT trim(unnest(string_to_array(data_set, ','))) AS field
  LOOP
    arr_fields := arr_fields || format('json_data->%1$L AS %1$I', rec_fields.field);
  END LOOP;
  RETURN QUERY EXECUTE format('SELECT row_to_json(r) FROM (SELECT %s FROM test_data) AS r',
                              array_to_string(arr_fields, ','));
END;
$$ LANGUAGE plpgsql;
The final touch would be dropping that text input parameter in favour of a json one ('["row_id", "f1", "f2"]'), but I'll leave that for you.
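For reference, a quick sketch of that variant (untested; named get_important_data_json here, a hypothetical name, to avoid overload ambiguity with the text version, since both would have defaults):
CREATE OR REPLACE FUNCTION get_important_data_json(data_set jsonb DEFAULT '["row_id", "f1", "f2"]')
RETURNS TABLE (data json) AS $$
DECLARE
  arr_fields text[];
  fld text;
BEGIN
  -- jsonb_array_elements_text() unpacks the array, so no string splitting or trim() is needed
  FOR fld IN SELECT jsonb_array_elements_text(data_set)
  LOOP
    arr_fields := arr_fields || format('json_data->%1$L AS %1$I', fld);
  END LOOP;
  RETURN QUERY EXECUTE format('SELECT row_to_json(r) FROM (SELECT %s FROM test_data) AS r',
                              array_to_string(arr_fields, ','));
END;
$$ LANGUAGE plpgsql;

SELECT get_important_data_json('["row_id", "f2", "f4"]');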

Related

Postgres - function to get the values from one query, then insert them as a dynamic SQL string

I am building a function in PostgreSQL: basically, it receives an id from one table, rebuilds the INSERT statement for that row, and stores the statement as a string column in another table.
I have this table; in the insert_query column I want to store the INSERT statement for one row, with its actual values:
create table get_insert (tabname varchar(30), insert_query varchar(5000));
I want to store something like this in the insert_query column:
Insert into baseball_table (code, name) select '01','Robet';
Currently my function stores just this, which doesn't work, since I need to store the real values:
INSERT INTO baseball_table(code,name) SELECT code,name FROM baseball_table WHERE id=1;
This is my function:
CREATE OR REPLACE FUNCTION get_values(
    _id character varying
)
RETURNS boolean
LANGUAGE plpgsql
VOLATILE PARALLEL UNSAFE
AS $function$
DECLARE
    v_id integer := _id::integer;
    sql_query varchar;
BEGIN
    sql_query := 'INSERT INTO baseball_table(code,name) SELECT code,name FROM core.brand WHERE id=' || v_id || ';';
    INSERT INTO core.clone_brand (tabname, insert_query) VALUES ('brand', sql_query);
    RETURN true;
END;
$function$;
What is the best way to get the real values without creating a variable for each column?
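One possible approach (a sketch, assuming core.brand has the code and name columns used above): let format() with %L embed the actual row values as quoted literals, so no per-column variables are needed:
CREATE OR REPLACE FUNCTION get_values(_id integer)  -- _id as integer for simplicity; the original used varchar
RETURNS boolean
LANGUAGE plpgsql AS $function$
DECLARE
    sql_query varchar;
BEGIN
    -- %L quotes each value as a SQL literal (NULL-safe), building a self-contained INSERT
    SELECT format('INSERT INTO baseball_table (code, name) SELECT %L, %L;', code, name)
    INTO sql_query
    FROM core.brand
    WHERE id = _id;
    INSERT INTO core.clone_brand (tabname, insert_query) VALUES ('brand', sql_query);
    RETURN true;
END;
$function$;
This still names each column once in the format() call; a fully column-agnostic version would have to build the statement from the catalog or from to_jsonb(brand.*).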

How to call an array in stored procedure?

How can I pass an array to a stored procedure? I tried enclosing the column names that need to be inserted into the new table in brackets:
CREATE OR REPLACE PROCEDURE data_versioning_nonull(new_table_name VARCHAR(100),column_name VARCHAR(100)[], current_table_name VARCHAR(100))
language plpgsql
as $$
BEGIN
EXECUTE ('CREATE TABLE ' || quote_ident(new_table_name) || ' AS SELECT ' || quote_ident(column_name) || ' FROM ' || quote_ident(current_table_name));
END $$;
CALL data_versioning_nonull('sales_2019_sample', ['orderid', 'product', 'address'], 'sales_2019');
Using EXECUTE format() lets you replace all the quote_ident() calls with %I placeholders in a single text template instead of a series of concatenated snippets. %1$I lets you re-use the first argument.
It's best if you use ARRAY['a','b','c']::VARCHAR(100)[] to explicitly make it an array of your desired type. '{"a","b","c"}'::VARCHAR(100)[] works too.
You'll need to convert the array into a list of columns some other way, because when cast to text, it'll get curly braces which are not allowed in the column list syntax. Demo
It's not good practice to introduce random limitations: PostgreSQL doesn't limit identifier lengths to 100 characters, so you don't have to either. The default limit is 63 bytes, and it can be raised by recompiling with a larger NAMEDATALEN, so identifiers can in principle go way longer than 100 characters (demo). You can switch that data type to plain text. Interestingly, the length modifier on a parameter type is not enforced anyway: a varchar(100) parameter behaves like unlimited varchar, making the (100) just syntax noise.
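A quick illustration of that last point (hypothetical function name; the declared length is simply not enforced on parameters):
CREATE FUNCTION typmod_demo(v varchar(5)) RETURNS int
LANGUAGE sql AS $$ SELECT length(v) $$;

SELECT typmod_demo('longer than five characters');  -- returns 27, no error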
DBFiddle online demo
CREATE TABLE sales_2019(orderid INT,product INT,address INT);
CREATE OR REPLACE PROCEDURE data_versioning_nonull(
new_table_name TEXT,
column_names TEXT[],
current_table_name TEXT)
LANGUAGE plpgsql AS $$
DECLARE
list_of_columns_as_quoted_identifiers TEXT;
BEGIN
SELECT string_agg(quote_ident(name),',')
INTO list_of_columns_as_quoted_identifiers
FROM unnest(column_names) name;
EXECUTE format('CREATE TABLE %1$I.%2$I AS SELECT %3$s FROM %1$I.%4$I',
current_schema(),
new_table_name,
list_of_columns_as_quoted_identifiers,
current_table_name);
END $$;
CALL data_versioning_nonull(
'sales_2019_sample',
ARRAY['orderid', 'product', 'address']::text[],
'sales_2019');
Schema awareness: currently the procedure creates the new table in the default schema, based on a table in that same default schema. Above I made that explicit, but it's what the procedure would do without the current_schema() calls anyway. You could add new_table_schema and current_table_schema parameters, and if you expect them to be rarely used, you can hide them behind procedure overloads for convenience, using current_schema() to keep the implicit behaviour, as sketched below. Demo
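A sketch of what that overload pair could look like (hypothetical; the five-argument version carries the implementation, the three-argument one just defaults both schemas):
-- full version: explicit schemas
CREATE OR REPLACE PROCEDURE data_versioning_nonull(
    new_table_schema TEXT, new_table_name TEXT,
    column_names TEXT[],
    current_table_schema TEXT, current_table_name TEXT)
LANGUAGE plpgsql AS $$
DECLARE
    cols TEXT;
BEGIN
    SELECT string_agg(quote_ident(name), ',') INTO cols
    FROM unnest(column_names) name;
    EXECUTE format('CREATE TABLE %1$I.%2$I AS SELECT %3$s FROM %4$I.%5$I',
        new_table_schema, new_table_name, cols,
        current_table_schema, current_table_name);
END $$;

-- convenience overload: keeps the implicit current_schema() behaviour
CREATE OR REPLACE PROCEDURE data_versioning_nonull(
    new_table_name TEXT, column_names TEXT[], current_table_name TEXT)
LANGUAGE plpgsql AS $$
BEGIN
    CALL data_versioning_nonull(current_schema(), new_table_name,
                                column_names, current_schema(), current_table_name);
END $$;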
First, change your stored procedure to convert the selected columns from an array to a CSV list, like this:
CREATE OR REPLACE PROCEDURE data_versioning_nonull(new_table_name VARCHAR(100),column_name VARCHAR(100)[], current_table_name VARCHAR(100))
language plpgsql
as $$
BEGIN
EXECUTE ('CREATE TABLE ' || quote_ident(new_table_name) || ' AS SELECT ' || array_to_string(column_name, ',') || ' FROM ' || quote_ident(current_table_name));
END $$;
Then call it as:
CALL data_versioning_nonull('sales_2019_sample', '{"orderid", "product", "address"}', 'sales_2019');

How to create a helper function for queries in postgresql?

I have a query similar to this:
SELECT COALESCE(
    (SELECT jsonb_agg(s) FROM (SELECT "column" FROM my_table) AS s),
    '[]'::jsonb
);
which returns all rows in json format:
[
  {
    "column": 1
  },
  {
    "column": 2
  }
]
And if there are no rows, it returns an empty array instead of NULL.
I want to re-use this, but it would quickly become a mess to type the whole thing out everywhere. That's why I'm trying to create this as_json function:
-- DOESNT WORK
CREATE FUNCTION as_json(query ???)
RETURNS jsonb
LANGUAGE sql
AS $$
SELECT COALESCE((SELECT jsonb_agg(s) FROM query AS s), '[]'::jsonb);
$$;
If I could make it work, using it would be as simple as
as_json(SELECT column FROM my_table)
Can I achieve something like that in PostgreSQL?
It's actually quite easy to write something similar to query_to_xml():
create or replace function query_to_jsonb(p_query text, p_include_nulls boolean default false)
  returns jsonb
as
$$
declare
  l_result jsonb;
  l_sql text;
begin
  if p_include_nulls then
    l_sql := 'select jsonb_agg(to_jsonb(dt)) from ('||p_query||') as dt';
  else
    l_sql := 'select jsonb_agg(jsonb_strip_nulls(to_jsonb(dt))) from ('||p_query||') as dt';
  end if;
  execute l_sql into l_result;
  return coalesce(l_result, '[]'); -- empty array (not an empty object) when the query returns no rows
end;
$$
language plpgsql;
Then you can return the result of a query (passed as a string) as a JSON value, e.g.
select query_to_jsonb('SELECT "column" FROM my_table');
NULL values are stripped by default. To keep them in the result, pass true for the second parameter:
select query_to_jsonb('select col1, col2 from some_table', true);
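For example, assuming the empty-array fallback above:
CREATE TABLE my_table (col1 int, col2 text);

SELECT query_to_jsonb('select col1, col2 from my_table');
-- returns [] while the table is empty

INSERT INTO my_table VALUES (1, 'a');
SELECT query_to_jsonb('select col1, col2 from my_table');
-- returns [{"col1": 1, "col2": "a"}]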

Function to return dynamic set of columns for given table

I have a fields table to store column information for other tables:
CREATE TABLE public.fields (
schema_name varchar(100),
table_name varchar(100),
column_text varchar(100),
column_name varchar(100),
column_type varchar(100) default 'varchar(100)',
column_visible boolean
);
And I'd like to create a function to fetch data for a specific table.
I tried something like this:
create or replace function public.get_table(schema_name text,
table_name text,
active boolean default true)
returns setof record as $$
declare
entity_name text default schema_name || '.' || table_name;
r record;
begin
for r in EXECUTE 'select * from ' || entity_name loop
return next r;
end loop;
return;
end
$$
language plpgsql;
With this function I have to specify columns when I call it!
select * from public.get_table('public', 'users') as dept(id int, uname text);
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in public.fields table.
Solution for the simple case
As explained in the referenced answers below, you can use registered (row) types, and thus implicitly declare the return type of a polymorphic function:
CREATE OR REPLACE FUNCTION public.get_table(_tbl_type anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE format('TABLE %s', pg_typeof(_tbl_type));
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.get_table(NULL::public.users); -- note the syntax!
Returns the complete table (with all user columns).
Wait! How?
Detailed explanation in this related answer, chapter
"Various complete table types":
Refactor a PL/pgSQL function to return the output of various SELECT queries
TABLE foo is just short for SELECT * FROM foo:
Is there a shortcut for SELECT * FROM?
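For instance, using the test table from the first question:
TABLE test_data;
-- is equivalent to
SELECT * FROM test_data;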
2 steps for completely dynamic return type
But what you are trying to do is strictly impossible in a single SQL command.
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in
public.fields table.
There is no direct way to return an arbitrary selection of columns (return type not known at call time) from a function - or any SQL command. SQL demands to know number, names and types of resulting columns at call time. More in the 2nd chapter of this related answer:
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
There are various workarounds. You could wrap the result in one of the standard document types (json, jsonb, hstore, xml).
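For example, wrapping each row in jsonb keeps the declared return type fixed no matter which columns the underlying table has (a minimal sketch with a hypothetical function name):
CREATE OR REPLACE FUNCTION public.get_table_jsonb(_schema_name text, _table_name text)
  RETURNS SETOF jsonb AS
$func$
BEGIN
   -- to_jsonb(t) folds all columns of each row into a single jsonb value
   RETURN QUERY EXECUTE format('SELECT to_jsonb(t) FROM %I.%I t', _schema_name, _table_name);
END
$func$ LANGUAGE plpgsql;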
Or you generate the query with one function call and execute the result with the next:
CREATE OR REPLACE FUNCTION public.generate_get_table(_schema_name text, _table_name text)
RETURNS text AS
$func$
SELECT format('SELECT %s FROM %I.%I'
, string_agg(quote_ident(column_name), ', ')
, schema_name
, table_name)
FROM fields
WHERE column_visible
AND schema_name = _schema_name
AND table_name = _table_name
GROUP BY schema_name, table_name
ORDER BY schema_name, table_name;
$func$ LANGUAGE sql;
Call:
SELECT public.generate_get_table('public', 'users');
This creates a query of the form:
SELECT usr_id, usr FROM public.users;
Execute it in the 2nd step. (You might want to add column numbers and order columns.)
Or append \gexec in psql to execute the return value immediately. See:
How to force evaluation of subquery before joining / pushing down to foreign server
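In psql, that two-step execution collapses into one line (terminate the query with \gexec instead of a semicolon, and psql executes the returned query text immediately):
SELECT public.generate_get_table('public', 'users') \gexec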
Be sure to defend against SQL injection:
INSERT with dynamic table name in trigger function
Define table and column names as arguments in a plpgsql function?
Asides
varchar(100) does not make much sense for identifiers, which are limited to 63 characters in standard Postgres:
Maximum characters in labels (table names, columns etc)
If you understand how the object identifier type regclass works, you might replace schema and table name with a single regclass column.
I think you just need another query to get the list of columns you want.
Maybe something like (this is untested):
create or replace function public.get_table(_schema_name text, _table_name text, active boolean default true)
returns setof record as $$
declare
    entity_name text default _schema_name || '.' || _table_name;
    columns varchar;
begin
    -- Get the list of columns
    SELECT string_agg(column_name, ', ')
      INTO columns
      FROM public.fields
     WHERE fields.schema_name = _schema_name
       AND fields.table_name = _table_name
       AND fields.column_visible = TRUE;
    -- Return rows from the specified table
    RETURN QUERY EXECUTE 'select ' || columns || ' from ' || entity_name;
    RETURN;
end
$$
language plpgsql;
Keep in mind that column/table references may need to be surrounded by double quotes if they contain certain characters; a hardened sketch follows.
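A hardened variant of the dynamic part (a sketch; quote_ident() and format() take care of the quoting):
-- inside the function body, replacing the two dynamic steps above
SELECT string_agg(quote_ident(column_name), ', ')
  INTO columns
  FROM public.fields
 WHERE fields.schema_name = _schema_name
   AND fields.table_name = _table_name
   AND fields.column_visible;

RETURN QUERY EXECUTE format('SELECT %s FROM %I.%I', columns, _schema_name, _table_name);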

Execute a dynamic crosstab query

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
Here's the function:
create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar)
returns varchar language plpgsql as $$
declare
  dynsql1 varchar;
  dynsql2 varchar;
  columnlist varchar;
begin
  -- 1. retrieve list of column names.
  dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
  execute dynsql1 into columnlist;
  -- 2. set up the crosstab query
  dynsql2 = 'select * from crosstab (
   ''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
   ''select distinct '||colc||' from '||tablename||' order by 1''
  )
  as ct (
   '||rowc||' varchar,'||columnlist||'
  );';
  return dynsql2;
end
$$;
So now I can call the function:
select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');
Which returns (because the return type of the function is varchar):
select * from crosstab (
'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
from globalpayments
group by 1,2
order by 1,2'
, 'select distinct currency
from globalpayments
order by 1'
) as ct ( month varchar,CAD text,EUR text,GBP text,USD text );
How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? That is, the query above is what I get when I manually copy, paste, and execute the return value; I want the function to assemble the dynamic query and execute it without that extra step.
Edit 1
This function comes close, but I need it to return more than just the first column of the first record.
Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?
create or replace function eval(sql text) returns text as $$
declare
  as_txt text;
begin
  if sql is null then return null; end if;
  execute sql into as_txt;
  return as_txt;
end;
$$ language plpgsql;
usage: select * from eval($$select * from analytics limit 1$$)
However, it just returns the first column of the first record:
 eval
------
 2015
when the actual result looks like this:
 Year | Month | Date       | TPV_USD
------+-------+------------+---------
 2016 |     3 | 2016-03-31 |  100000
What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare their return type (RETURNS ...) at creation time.
A limited way around this is with polymorphic functions, if you can provide the return type at the time of the function call. But that's not evident from your question.
Refactor a PL/pgSQL function to return the output of various SELECT queries
You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.
Selecting multiple max() values using a single SQL statement
I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
A shorter and cleaner implementation could look like this:
CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
                              , _expr text  -- still vulnerable to SQL injection!
                              , _type regtype)
  RETURNS text
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cat_list text;
   _col_list text;
BEGIN
   -- generate categories for xtab param and col definition list
   EXECUTE format(
    $$SELECT string_agg(quote_literal(x.cat), '), (')
           , string_agg(quote_ident (x.cat), %L)
      FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
    , ' ' || _type || ', ', _cat, _tbl)
   INTO _cat_list, _col_list;

   -- generate query string
   RETURN format(
   'SELECT * FROM crosstab(
      $q$SELECT %I, %I, %s
         FROM   %s
         GROUP  BY 1, 2  -- only works if the 3rd column is an aggregate expression
         ORDER  BY 1, 2$q$
    , $c$VALUES (%5$s)$c$
      ) ct(%1$I text, %6$s %7$s)'
   , _row, _cat, _expr  -- expr must be an aggregate expression!
   , _tbl, _cat_list, _col_list, _type);
END
$func$;
Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:
PostgreSQL Crosstab Query
This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.
Table name as a PostgreSQL function parameter
However, it is not completely safe as long as you pass a string to be executed as an expression (_expr, the cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.
SQL injection in Postgres functions vs prepared queries
It scans the table only once for both the list of categories and the column definition list, so it should be a bit faster.
It still can't return completely dynamic row types, since that's strictly not possible.
Not quite impossible: you can still execute it. From the function, EXECUTE the generated query string and return the resulting rows as SETOF record.
You then have to specify the record format at call time. The reason is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).
For example, we could do something like this with a wrapper function but the same logic could be folded into your function:
CREATE OR REPLACE FUNCTION crosstab_wrapper
    (tablename varchar, rowc varchar, colc varchar,
     cellc varchar, celldatatype varchar)
RETURNS SETOF record LANGUAGE plpgsql AS $$
DECLARE outrow record;
BEGIN
    FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
    LOOP
        RETURN NEXT outrow;
    END LOOP;
END;
$$;
Then you supply the record structure when calling the function, just as you do with crosstab() itself: attach a column definition list of the form (col1 type, col2 type, ...), like you do with connectby().
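For example, with the globalpayments query from the question, the call would attach the same column definition list the generated crosstab uses (column names assumed from the example output above):
SELECT *
FROM crosstab_wrapper('globalpayments', 'month', 'currency',
                      '(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)', 'text')
AS ct (month varchar, cad text, eur text, gbp text, usd text);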