Virtualize columns in SQL function - postgresql

I'm trying to figure out if it's possible to create an SQL function that treats an argument row as if it were "duck-typed". That is, I would like to be able to pass rows from different tables or views that have certain common column names and operate on those columns within the function.
Here's a very trivial example to try to describe the issue:
=> CREATE TABLE tab1 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x1 TEXT,
description TEXT
);
CREATE TABLE
=> CREATE FUNCTION get_desc(tab tab1) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END;
$$ LANGUAGE SQL;
=> INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'Foo', 'FooDesc');
INSERT 0 1
=> INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'Bar', 'BarDesc');
INSERT 0 1
=> SELECT get_desc(tab1) FROM tab1;
get_desc
----------
BarDesc
FooDesc
(2 rows)
This is of course very artificial. In reality, my table has many more fields, and the function is way more complicated than that.
Now I want to add other tables/views and pass them to the same function. The new tables/views have columns that differ, but the columns the function will care about are common to all of them. To add to the trivial example, I add these two tables:
CREATE TABLE tab2 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x2 TEXT,
description TEXT
);
CREATE TABLE tab3 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x3 TEXT,
description TEXT
);
Note that all three have the has_desc and description fields, which are the only ones actually used in get_desc. But of course if I try to use the existing function with tab2, I get:
=> select get_desc(tab2) FROM tab2;
ERROR: function get_desc(tab2) does not exist
LINE 1: select get_desc(tab2) FROM tab2;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I would like to be able to define a common function that does the same thing as get_desc but takes as argument a row from any of the three tables. Is there any way to do that?
Or alternatively is there some way to cast entire rows to a common row type that includes only a defined set of fields?
(I realize I could change the function arguments to just take XX.has_desc and XX.description but I'm trying to isolate which fields are used inside the function without needing to expand those in every place the function is called.)

You can create a cast:
CREATE CAST (tab2 AS tab1) WITH INOUT;
INSERT INTO tab2 (has_desc, x2, description) VALUES (True, 'Bar', 'From Tab2');
SELECT get_desc(tab2::tab1) FROM tab2;
get_desc
-----------
From Tab2
(1 row)
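The same approach covers tab3, since its columns happen to line up positionally with tab1's:
CREATE CAST (tab3 AS tab1) WITH INOUT;
SELECT get_desc(tab3::tab1) FROM tab3;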

I'm adding an answer to show the complete way I solved this for posterity. But thanks to @klin for getting me pointed in the right direction. (One problem with @klin's bare CAST is that it doesn't produce the right row type when the two tables' common columns don't appear in the same relative position within their respective column lists.)
My solution adds a new composite TYPE (gdtab) containing the common fields, then a function for each source table that converts its row type to gdtab, then a CAST to make each conversion implicit.
-- Common type for get_desc function
CREATE TYPE gdtab AS (
id INTEGER,
has_desc BOOLEAN,
description TEXT
);
CREATE FUNCTION get_desc(tab gdtab) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END;
$$ LANGUAGE SQL;
CREATE TABLE tab1 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x1 TEXT,
description TEXT
);
-- Convert tab1 rowtype to gdtab type
CREATE FUNCTION tab1_as_gdtab(t tab1) RETURNS gdtab AS $$
SELECT CAST(ROW(t.id, t.has_desc, t.description) AS gdtab);
$$ LANGUAGE SQL;
-- Implicitly cast from tab1 to gdtab as needed for get_desc
CREATE CAST (tab1 AS gdtab) WITH FUNCTION tab1_as_gdtab(tab1) AS IMPLICIT;
CREATE TABLE tab2 (
id SERIAL PRIMARY KEY,
x2 TEXT,
x2x TEXT,
has_desc BOOLEAN,
description TEXT
);
CREATE FUNCTION tab2_as_gdtab(t tab2) RETURNS gdtab AS $$
SELECT CAST(ROW(t.id, t.has_desc, t.description) AS gdtab);
$$ LANGUAGE SQL;
CREATE CAST (tab2 AS gdtab) WITH FUNCTION tab2_as_gdtab(tab2) AS IMPLICIT;
Test usage:
INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'FooBlah', 'FooDesc'),
(False, 'BazBlah', 'BazDesc'),
(True, 'BarBlah', 'BarDesc');
INSERT INTO tab2 (has_desc, x2, x2x, description) VALUES (True, 'FooBlah', 'x2x', 'FooDesc'),
(False, 'BazBlah', 'x2x', 'BazDesc'),
(True, 'BarBlah', 'x2x', 'BarDesc');
SELECT get_desc(tab1) FROM tab1;
SELECT get_desc(tab2) FROM tab2;
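For reference, both queries should return the same result (row order may vary without an ORDER BY):
      get_desc
---------------------
 FooDesc
 Default Description
 BarDesc
(3 rows)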

PostgreSQL functions depend on the declared row type (schema) of their arguments. If the different tables have different schemas, you can always project them onto the common sub-schema that get_desc needs. This can be done in a quick and temporary fashion with a WITH clause before the get_desc call.
If this answer is too thin on details, just add a comment and I'll flesh out an example.
More details:
CREATE TABLE subschema_table ( has_desc boolean, description text ) ;
CREATE FUNCTION get_desc1(tab subschema_table) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END; $$ LANGUAGE SQL;
Now, the following will work (with other tables also):
WITH subschema AS (SELECT has_desc, description FROM tab1)
SELECT get_desc1(subschema) FROM subschema;
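The same pattern works unchanged for any other table or view that exposes those two columns, e.g.:
WITH subschema AS (SELECT has_desc, description FROM tab2)
SELECT get_desc1(subschema) FROM subschema;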
The VIEW method didn't work in my test (VIEWs don't seem to have the appropriate schema).
Maybe the other answer gives a better way.

Related

Filter a row by column types in a function

Right now I have a generic notification function that is triggered after create on a couple of tables in my database (there's a node process on the other end listening for notifications). Here's what my creation trigger function looks like:
CREATE OR REPLACE FUNCTION notify_create() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
PERFORM pg_notify('update_watchers',
json_build_object(
'eventType', 'new',
'type', TG_TABLE_NAME,
'payload', row_to_json(NEW)
)::text
);
RETURN NEW;
END;
$$;
The problem is, if NEW is too big, this will overflow the 8000-byte limit on notification payloads in a few corner cases (I rarely have a new item in the table that is that big). In the notify_update function, I just report which columns have changed by listing the column names. That would work here too, but what I would rather do is have row_to_json pull out only the entries from NEW that are of type integer.
That is because sometimes what I'm notifying is "hey there's a new entry in an entity table". The new entry could be from a couple of different tables (documents, profiles, etc). In that case, I really only need the id, since anyone who is interested in the new value ends up fetching it later anyway.
Sometimes I'm notifying "hey, there's a new entry in a join table", in which case I don't have an id field but instead have something like documents_id and profiles_id.
I could just write a bunch of different notify_create functions, for each scenario. I'd prefer to have one that did something like
row_to_json(NEW.filter(t => typeof t === 'number'))
to mix plpgsql and JavaScript notation, but I'm sure you get the point: only include the fields of NEW that are number-typed.
Is this possible, or should I just write a bunch of different notifiers?
You can easily eliminate JSON values of types other than number, e.g.:
with my_table(int1, text1, int2, date1, float1) as (
values
(1, 'text1', 100, '2017-01-01'::date, 123.54)
)
select jsonb_object_agg(key, value) filter (where jsonb_typeof(value) = 'number')
from my_table,
jsonb_each(to_jsonb(my_table));
jsonb_object_agg
--------------------------------------------
{"int1": 1, "int2": 100, "float1": 123.54}
(1 row)
The function below leaves only integers:
create or replace function leave_integers(jdata jsonb)
returns jsonb language sql as $$
select jsonb_object_agg(key, value)
filter (
where jsonb_typeof(value) = 'number'
and value::text not like '%.%')
from jsonb_each(jdata)
$$;
with my_table(int1, text1, int2, date1, float1) as (
values
(1, 'text1', 100, '2017-01-01'::date, 123.54)
)
select leave_integers(to_jsonb(my_table))
from my_table;
leave_integers
--------------------------
{"int1": 1, "int2": 100}
(1 row)
Alternative (better) solution
This function checks Postgres types directly and returns values strictly from integer columns.
create or replace function integer_columns_to_jsonb(anyelement)
returns jsonb language sql as $$
select jsonb_object_agg(key, value)
from jsonb_each(to_jsonb($1))
where key in (
select attname
from pg_type t
join pg_attribute on typrelid = attrelid
where t.oid = pg_typeof($1)
and atttypid = 'int'::regtype)
$$;
The example shows that the function eliminates some corner cases handled incorrectly by leave_integers():
create table my_table (int1 int, int2 int, float1 float, text1 text);
insert into my_table values (1, 2, 3, '4');
select integer_columns_to_jsonb(t), leave_integers(to_jsonb(t))
from my_table t;
integer_columns_to_jsonb | leave_integers
--------------------------+-------------------------------------
{"int1": 1, "int2": 2} | {"int1": 1, "int2": 2, "float1": 3}
(1 row)
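To tie this back to the trigger in the question, integer_columns_to_jsonb can replace row_to_json(NEW) directly; a sketch, untested against your schema:
CREATE OR REPLACE FUNCTION notify_create() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM pg_notify('update_watchers',
        json_build_object(
            'eventType', 'new',
            'type', TG_TABLE_NAME,
            -- only integer columns (the ids) end up in the payload, keeping it small
            'payload', integer_columns_to_jsonb(NEW)
        )::text
    );
    RETURN NEW;
END;
$$;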

Postgresql - retrieving referenced fields in a query

I have a table created like
CREATE TABLE data
(value1 smallint references labels,
value2 smallint references labels,
value3 smallint references labels,
otherdata varchar(32)
);
and a second 'label holding' table created like
CREATE TABLE labels (id serial primary key, name varchar(32));
The rationale behind it is that value1-3 are a very limited set of strings (6 options) and it seems inefficient to enter them directly in the data table as varchar types. On the other hand these do occasionally change, which makes enum types unsuitable.
My question is, how can I execute a single query such that instead of the label IDs I get the relevant labels?
I looked at creating a function for it and stumbled at the point where I needed to pass the label-holding table name to the function (there are several such label-holding tables across the schema). Do I need to create a function per label table to avoid that?
create or replace function translate
(ref_id smallint,reference_table regclass) returns varchar(128) as
$$
begin
select name from reference_table where id = ref_id;
return name;
end;
$$
language plpgsql;
And then do
select
translate(value1, labels) as foo,
translate(value2, labels) as bar
from data;
This however errors out with
ERROR: relation "reference_table" does not exist
All suggestions welcome - at this point I can still alter just about anything...
CREATE TABLE labels
( id smallserial primary key
, name varchar(32) UNIQUE -- <<-- might want this, too
);
CREATE TABLE data
( value1 smallint NOT NULL REFERENCES labels(id) -- <<-- here
, value2 smallint NOT NULL REFERENCES labels(id)
, value3 smallint NOT NULL REFERENCES labels(id)
, otherdata varchar(32)
, PRIMARY KEY (value1,value2,value3) -- <<-- added primary key here
);
-- No need for a function here.
-- For small sizes of the `labels` table, the query below will typically
-- be executed with hash joins to perform the lookups.
SELECT l1.name AS name1, l2.name AS name2, l3.name AS name3
, d.otherdata AS the_data
FROM data d
JOIN labels l1 ON l1.id = d.value1
JOIN labels l2 ON l2.id = d.value2
JOIN labels l3 ON l3.id = d.value3
;
Note: labels.id -> labels.name is a functional dependency (id is the primary key), but that doesn't mean that you need a function. The query just acts like a function.
You can pass the label table name as a regclass (as in your function signature), build the query as a string, and EXECUTE it. A plain query inside PL/pgSQL cannot use a variable as a table name, which is why your version fails; dynamic SQL with format() fixes that:
EXECUTE format('SELECT name FROM %s WHERE id = $1', reference_table)
INTO name
USING ref_id;
RETURN name;
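Putting that together, a complete corrected version might look like this (renamed here to translate_label to avoid clashing with the built-in translate() function):
CREATE OR REPLACE FUNCTION translate_label(ref_id smallint, reference_table regclass)
RETURNS varchar(32) AS $$
DECLARE
    result varchar(32);
BEGIN
    -- regclass renders schema-qualified and quoted as needed, so %s is safe here
    EXECUTE format('SELECT name FROM %s WHERE id = $1', reference_table)
    INTO result
    USING ref_id;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
Usage (note the table name is passed as a literal, not a bare identifier):
select
translate_label(value1, 'labels') as foo,
translate_label(value2, 'labels') as bar
from data;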

access postgres field given field name as text string

I have a table in postgres:
create table fubar (
name1 text,
name2 text, ...,
key integer);
I want to write a function which returns field values from fubar given the column names:
function getFubarValues(col_name text, key integer) returns text ...
where getFubarValues returns the value of the specified column in the row identified by key. Seems like this should be easy.
I'm at a loss. Can someone help? Thanks.
Klin's answer is a good (i.e. safe) approach to the question as posed, but it can be simplified:
PostgreSQL's -> operator accepts an arbitrary expression as its right-hand operand (the key). For example:
CREATE TABLE test (
id SERIAL,
js JSON NOT NULL,
k TEXT NOT NULL
);
INSERT INTO test (js,k) VALUES ('{"abc":"def","ghi":"jkl"}','abc');
SELECT js->k AS value FROM test;
Produces
value
-------
"def"
So we can combine that with row_to_json:
CREATE TABLE test (
id SERIAL,
a TEXT,
b TEXT,
k TEXT NOT NULL
);
INSERT INTO test (a,b,k) VALUES
('foo','bar','a'),
('zip','zag','b');
SELECT row_to_json(test)->k AS value FROM test;
Produces:
value
-------
"foo"
"zag"
Here I'm getting the key from the table itself, but of course you could get it from any source or expression; it's just a value. Also note that the result is a JSON value (its type doesn't tell you whether it holds text, a number, or a boolean). If you want it as text, just cast it: (row_to_json(test)->k)::TEXT
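Note that the cast keeps the JSON quoting, so you get "foo" rather than foo. If you want the bare unquoted text, the ->> operator returns the field as text directly:
SELECT row_to_json(test)->>k AS value FROM test;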
Now that the question itself is answered, here's why you shouldn't do this, and what you should do instead!
Never trust any data. Even if it already lives inside your database, you shouldn't trust it. The method I've posted here is safe against SQL injection attacks, but an attacker could still set k to 'id' and see a column which was not intended to be visible to them.
A much better approach is to structure your data with this type of query in mind. Postgres has some excellent datatypes for this: HSTORE and JSON/JSONB. Merge your dynamic columns into a single column of one of those types (I'd suggest HSTORE for its simplicity and generally more complete operator support).
This has several advantages: your schema is well defined and does not need to change if you add more dynamic columns; you do not need to perform expensive re-casting (e.g. row_to_json); and you can take advantage of indexes on your columns (thanks to PostgreSQL's functional indexes).
The equivalent to the code I wrote above would be:
CREATE EXTENSION HSTORE; -- necessary if you're not already using HSTORE
CREATE TABLE test (
id SERIAL,
cols HSTORE NOT NULL,
k TEXT NOT NULL
);
INSERT INTO test (cols,k) VALUES
('a=>"foo",b=>"bar"','a'),
('a=>"zip",b=>"zag"','b');
SELECT cols->k AS value FROM test;
Or, for automatic escaping of your values when inserting, you can use one of:
INSERT INTO test (cols,k) VALUES
(hstore( 'a', 'foo' ) || hstore( 'b', 'bar' ), 'a'),
(hstore( ARRAY['a','b'], ARRAY['zip','zag'] ), 'b');
See http://www.postgresql.org/docs/9.1/static/hstore.html for more details.
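On the indexing point mentioned above, two illustrative options (the index names here are arbitrary examples):
-- GIN index supporting hstore containment queries, e.g. WHERE cols @> 'a=>foo'
CREATE INDEX test_cols_gin ON test USING gin (cols);
-- or an expression index for lookups on one specific key
CREATE INDEX test_cols_a ON test ((cols -> 'a'));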
You can use dynamic SQL to select a column by name:
create or replace function get_fubar_values (col_name text, row_key integer)
returns setof text language plpgsql as $$begin
return query execute 'select ' || quote_ident(col_name) ||
' from fubar where key = $1' using row_key;
end$$;
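Usage, for example to fetch name1 from the row with key = 1 (assuming such a row exists):
SELECT * FROM get_fubar_values('name1', 1);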

Avoid putting PostgreSQL function result into one field

What I'm after is a query that calls a function and gets the function's result set back as separate fields. I can do this, but the results of the function all end up in one field,
i.e. http://i.stack.imgur.com/ETLCL.png, whereas the results I am after are: http://i.stack.imgur.com/wqRQ9.png
Here's the code to create the table
CREATE TABLE tbl_1_hm
(
tbl_1_hm_id bigserial NOT NULL,
tbl_1_hm_f1 VARCHAR (250),
tbl_1_hm_f2 INTEGER,
CONSTRAINT tbl_1_hm PRIMARY KEY (tbl_1_hm_id)
);
-- run this a few times to get some data
INSERT INTO tbl_1_hm (tbl_1_hm_f1, tbl_1_hm_f2)
VALUES ('hello', 1);
CREATE OR REPLACE FUNCTION proc_1_hm(id BIGINT)
RETURNS TABLE(tbl_1_hm_f1 VARCHAR(250), tbl_1_hm_f2 INTEGER) AS $$
SELECT tbl_1_hm_f1, tbl_1_hm_f2
FROM tbl_1_hm
WHERE tbl_1_hm_id = id
$$ LANGUAGE SQL;
--And here is the current query I am running for my results:
SELECT t1.tbl_1_hm_id, proc_1_hm(t1.tbl_1_hm_id) AS t3
FROM tbl_1_hm AS t1
Thanks for having a read. And please, if you want to haggle about the semantics of hitting the same table twice or my naming convention: this is a simplified test.
When a function returns a set of records, you should treat it as a table source:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1, proc_1_hm(t1.tbl_1_hm_id) AS t3;
Note that set-returning functions in the FROM list are implicitly LATERAL, so they can use fields from tables listed earlier in the FROM clause without having to specify an explicit JOIN condition.
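If you prefer to spell that out, the equivalent explicit form is (use LEFT JOIN LATERAL ... ON true instead if you want to keep rows for which the function returns nothing):
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
CROSS JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3;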

Is there a more succinct way to cast query results to a type?

I'm often casting query results to user defined types. Consider this simplistic example:
test=# create type test_type as (a int, b int);
CREATE TYPE
test=# create table test_table (a int, b int);
CREATE TABLE
test=# insert into test_table values (1,2);
INSERT 0 1
test=# select r::test_type from (select * from test_table t) as r;
r
-------
(1,2)
(1 row)
For a lot of my queries having a subquery is necessary and that works great. However, sometimes it's a simple 1 to 1 mapping from table to type, like in the example above.
Is there an easier way to express this?
When I try what seems obvious to me I get errors:
test=# select t::test_type from test_table t;
ERROR: cannot cast type test_table to test_type
LINE 1: select t::test_type from test_table t;
^
You can use the ROW construct with wildcard column expansion for this.
regress=> select ROW(t.*)::test_type from test_table t;
row
-------
(1,2)
(1 row)
A bit easier (TABLE test_table is just shorthand for SELECT * FROM test_table, so this is the subquery trick from the question in a more compact form):
select t::test_type
from (table test_table) t;
or create your own cast
create function test_table_2_test_type (test_table_value test_table)
returns test_type as $$
select test_table_value.a, test_table_value.b;
$$ language sql;
create cast (test_table as test_type)
with function test_table_2_test_type (test_table);
select t::test_type
from test_table t;
t
-------
(1,2)
http://www.postgresql.org/docs/current/static/sql-createcast.html
Casting techniques
Or you can cast to text as an intermediary type, since everything can be cast to and from text:
SELECT t::text::test_type FROM test_table t;
This is the "catch-all" ad-hoc solution for all kinds of similar problems, where you know types to be compatible, but a direct cast is missing.
The ROW constructor as demonstrated by @Craig is more elegant for this case.
Creating a new cast as demonstrated by @Clodoaldo is smarter for cases you are going to use regularly.
However, the superior solution would be to remove the problem.
Simple solution: use only the table type
For the simple case demonstrated, just don't create an additional type at all. Use the type test_table that is created automatically. Per the documentation:
CREATE TABLE also automatically creates a data type that represents
the composite type corresponding to one row of the table.
So, just:
CREATE TABLE test_table (a int, b int);
INSERT INTO test_table VALUES (1,2);
SELECT r FROM test_table r;
r
-------
(1,2)
But the demo in your question is probably just a simplification. For a general solution:
Multiple tables sharing the same type
Postgres is called an ORDBMS (object-relational database management system) by some, for a reason. Use typed tables.
Create the common type explicitly or reuse the implicit type of the "master" table.
This is ..
.. not inheritance.
.. different from CREATE TABLE t (LIKE master)
.. different from CREATE TABLE t AS SELECT * FROM master LIMIT 0
Per documentation:
OF type_name
Creates a typed table, which takes its structure from the specified composite type (name optionally schema-qualified). A typed
table is tied to its type; for example the table will be dropped if
the type is dropped (with DROP TYPE ... CASCADE).
When a typed table is created, then the data types of the columns are determined by the underlying composite type and are not specified
by the CREATE TABLE command. But the CREATE TABLE command can add
defaults and constraints to the table and can specify storage parameters.
Recipe
CREATE TYPE master AS (a int, b int);
Or
CREATE TABLE master (a int, b int);
Then use that type to create more tables of the same type:
CREATE TABLE table1 OF master;
CREATE TABLE table2 OF master;
INSERT INTO table1 VALUES (1,2);
INSERT INTO table2 VALUES (1,2);
SELECT r FROM table1 r; -- returns composite type "table1"
SELECT r::master FROM table1 r; -- returns composite type "master"
The composite types master, table1, table2 are 100% identical and can be cast into each other automatically.
If I'd gone to the trouble of creating a new type, I think I'd use it in the create table statement.
create table test_table (t test_type);
insert into test_table values ((2, 3));
-- No cast needed here.
select * from test_table;
t
test_type
--
(2, 3)
-- No cast needed here.
select t from (select * from test_table) x;
t
test_type
--
(2, 3)
If you have to manufacture a value of type "test_type" on the fly, use a row constructor and a type cast. I think the "::" syntax is more concise than CAST(), but both work.
select (2, 3)::test_type;
row
test_type
--
(2, 3)