Filter a row by column types in a function - postgresql

Right now I have a generic notification function that is triggered after INSERT on a couple of tables in my database (there's a node process on the other end listening for notifications). Here's what my create trigger function looks like:
CREATE OR REPLACE FUNCTION notify_create() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
PERFORM pg_notify('update_watchers',
json_build_object(
'eventType', 'new',
'type', TG_TABLE_NAME,
'payload', row_to_json(NEW)
)::text
);
RETURN NEW;
END;
$$;
The problem is that if NEW is too big, this will overflow pg_notify's 8000-byte payload limit in a couple of limited corner cases (I rarely have a new item in the table that is that big). In my notify_update function, I just report which columns have changed by listing the column names. That would work here too, but what I would rather do is have row_to_json pull out only the entries from NEW that are of type integer.
That is because sometimes what I'm notifying is "hey there's a new entry in an entity table". The new entry could be from a couple of different tables (documents, profiles, etc). In that case, I really only need the id, since anyone who is interested in the new value ends up fetching it later anyway.
Sometimes I'm notifying "hey, there's a new entry in a join table", in which case I don't have an id field but instead have something like documents_id and profiles_id.
I could just write a bunch of different notify_create functions, for each scenario. I'd prefer to have one that did something like
row_to_json(NEW.filter(t => typeof t === 'number'))
to mix plpgsql and JavaScript notation, but I'm sure you get the point: only include the fields of NEW that are number-typed.
Is this possible, or should I just write a bunch of different notifiers?

You can easily eliminate JSON values of types other than number, e.g.:
with my_table(int1, text1, int2, date1, float1) as (
values
(1, 'text1', 100, '2017-01-01'::date, 123.54)
)
select jsonb_object_agg(key, value) filter (where jsonb_typeof(value) = 'number')
from my_table,
jsonb_each(to_jsonb(my_table))
jsonb_object_agg
--------------------------------------------
{"int1": 1, "int2": 100, "float1": 123.54}
(1 row)
The function below leaves only integers:
create or replace function leave_integers(jdata jsonb)
returns jsonb language sql as $$
select jsonb_object_agg(key, value)
filter (
where jsonb_typeof(value) = 'number'
and value::text not like '%.%')
from jsonb_each(jdata)
$$;
with my_table(int1, text1, int2, date1, float1) as (
values
(1, 'text1', 100, '2017-01-01'::date, 123.54)
)
select leave_integers(to_jsonb(my_table))
from my_table;
leave_integers
--------------------------
{"int1": 1, "int2": 100}
(1 row)
Alternative (better) solution
This function checks Postgres types directly and returns values strictly from integer columns.
create or replace function integer_columns_to_jsonb(anyelement)
returns jsonb language sql as $$
select jsonb_object_agg(key, value)
from jsonb_each(to_jsonb($1))
where key in (
select attname
from pg_type t
join pg_attribute on typrelid = attrelid
where t.oid = pg_typeof($1)
and atttypid = 'int'::regtype)
$$;
The example shows that the function eliminates some corner cases handled incorrectly by leave_integers():
create table my_table (int1 int, int2 int, float1 float, text1 text);
insert into my_table values (1, 2, 3, '4');
select integer_columns_to_jsonb(t), leave_integers(to_jsonb(t))
from my_table t;
integer_columns_to_jsonb | leave_integers
--------------------------+-------------------------------------
{"int1": 1, "int2": 2} | {"int1": 1, "int2": 2, "float1": 3}
(1 row)
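Tying this back to the original trigger, here is a minimal sketch of how notify_create() might call the helper (untested; it assumes NEW resolves to the table's row type when passed to the polymorphic function):
CREATE OR REPLACE FUNCTION notify_create() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- send only the integer columns of NEW to keep the payload small
    PERFORM pg_notify('update_watchers',
        json_build_object(
            'eventType', 'new',
            'type', TG_TABLE_NAME,
            'payload', integer_columns_to_jsonb(NEW)
        )::text
    );
    RETURN NEW;
END;
$$;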

ERROR: column "int4" specified more than once

Steps for Execution:
Table Creation
CREATE TABLE xyz.table_a(
id bigint NOT NULL,
scores jsonb,
CONSTRAINT table_a_pkey PRIMARY KEY (id)
);
Add some dummy data:
INSERT INTO xyz.table_a(
id, scores)
VALUES (1, '{"a":20,"b":20}');
Function Creation
CREATE OR REPLACE FUNCTION xyz.example(
table_name text,
regular_columns text,
json_column text,
view_name text
) RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
cols TEXT;
cols_sum TEXT;
BEGIN
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key),
', '
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1
) s;$ex$,
table_name, json_column
)
INTO cols;
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key
),
'+'
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1) s;$ex$,
table_name, json_column
)
INTO cols_sum;
EXECUTE
format(
$ex$DROP VIEW IF EXISTS %2$s;
CREATE VIEW %2$s AS
SELECT %3$s, %4$s, SUM(%5$s) AS total
FROM %1$s
GROUP BY %3$s$ex$,
table_name, view_name, regular_columns, cols, cols_sum
);
RETURN cols;
END
$BODY$;
Call Function
SELECT xyz.example(
'xyz.table_a',
' id',
'scores',
'xyz.view_table_a'
);
Once I run these steps, I get this error:
ERROR: column "int4" specified more than once
CONTEXT: SQL statement "
DROP VIEW IF EXISTS xyz.view_table_a;
CREATE VIEW xyz.view_table_a AS
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER), SUM(CAST(scores->>'a' AS INTEGER)+CAST(scores->>'b' AS INTEGER)) AS total FROM xyz.table_a GROUP BY id
Look at the error message closely:
...
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER),
...
There are multiple expressions without column alias. A named column like "id" defaults to the given name. But other expressions default to the internal type name, which is "int4" for integer. One might assume that the JSON key name is used, but that's not so. CAST(scores->>'a' AS INTEGER) is just another expression returning an unnamed integer value.
This still works for a plain SELECT. Postgres tolerates duplicate column names in the (outer) SELECT list. But a VIEW cannot be created that way; it would result in ambiguities.
Either add column aliases to expressions in the SELECT list:
SELECT id, CAST(scores->>'a' AS INTEGER) AS a, CAST(scores->>'b' AS INTEGER) AS b, ...
Or add a list of column names to CREATE VIEW:
CREATE VIEW xyz.view_table_a(id, a, b, ...) AS ...
Something like this should fix your function (preserving the literal spelling of JSON key names):
...
format(
'CAST(%2$s->>%%1$L AS INTEGER) AS %%1$I',
key),
...
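With that change, the generated statement from the error context above should come out with column aliases, roughly like this (a sketch; exact formatting will differ):
DROP VIEW IF EXISTS xyz.view_table_a;
CREATE VIEW xyz.view_table_a AS
SELECT id,
       CAST(scores->>'a' AS INTEGER) AS a,
       CAST(scores->>'b' AS INTEGER) AS b,
       SUM(CAST(scores->>'a' AS INTEGER) + CAST(scores->>'b' AS INTEGER)) AS total
FROM xyz.table_a
GROUP BY id;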
See the working demo here:
db<>fiddle here
Aside, your nested format() calls make the code pretty hard to read and maintain.

Virtualize columns in SQL function

I'm trying to figure out if it's possible to create an SQL function that treats an argument row as if it were "duck-typed". That is, I would like to be able to pass rows from different tables or views that have certain common column names and operate on those columns within the function.
Here's a very trivial example to try to describe the issue:
=> CREATE TABLE tab1 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x1 TEXT,
description TEXT
);
CREATE TABLE
=> CREATE FUNCTION get_desc(tab tab1) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END;
$$ LANGUAGE SQL;
=> INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'Foo', 'FooDesc');
INSERT 0 1
=> INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'Bar', 'BarDesc');
INSERT 0 1
=> SELECT get_desc(tab1) FROM tab1;
get_desc
----------
BarDesc
FooDesc
(2 rows)
This is of course very artificial. In reality, my table has many more fields, and the function is way more complicated than that.
Now I want to add other tables/views and pass them to the same function. The new tables/views have columns that differ, but the columns the function will care about are common to all of them. To add to the trivial example, I add these two tables:
CREATE TABLE tab2 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x2 TEXT,
description TEXT
);
CREATE TABLE tab3 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x3 TEXT,
description TEXT
);
Note all three have the has_desc and description fields that are the only ones actually used in get_desc. But of course if I try to use the existing function with tab2, I get:
=> select get_desc(tab2) FROM tab2;
ERROR: function get_desc(tab2) does not exist
LINE 1: select get_desc(tab2) FROM tab2;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I would like to be able to define a common function that does the same thing as get_desc but takes as argument a row from any of the three tables. Is there any way to do that?
Or alternatively is there some way to cast entire rows to a common row type that includes only a defined set of fields?
(I realize I could change the function arguments to just take XX.has_desc and XX.description but I'm trying to isolate which fields are used inside the function without needing to expand those in every place the function is called.)
You can create a cast:
CREATE CAST (tab2 AS tab1) WITH INOUT;
INSERT INTO tab2 (has_desc, x2, description) VALUES (True, 'Bar', 'From Tab2');
SELECT get_desc(tab2::tab1) FROM tab2;
get_desc
-----------
From Tab2
(1 row)
I'm adding an answer to show the complete way I solved this for posterity. But thanks to @klin for getting me pointed in the right direction. (One problem with @klin's bare CAST is that it doesn't produce the right row type when the two tables' common columns don't appear in the same relative position within their respective column lists.)
My solution adds a new custom TYPE (gdtab) containing the common fields, then a function that can convert from each source table's row type to the gdtab type, then adding a CAST to make each conversion implicit.
-- Common type for get_desc function
CREATE TYPE gdtab AS (
id INTEGER,
has_desc BOOLEAN,
description TEXT
);
CREATE FUNCTION get_desc(tab gdtab) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END;
$$ LANGUAGE SQL;
CREATE TABLE tab1 (
id SERIAL PRIMARY KEY,
has_desc BOOLEAN,
x1 TEXT,
description TEXT
);
-- Convert tab1 rowtype to gdtab type
CREATE FUNCTION tab1_as_gdtab(t tab1) RETURNS gdtab AS $$
SELECT CAST(ROW(t.id, t.has_desc, t.description) AS gdtab);
$$ LANGUAGE SQL;
-- Implicitly cast from tab1 to gdtab as needed for get_desc
CREATE CAST (tab1 AS gdtab) WITH FUNCTION tab1_as_gdtab(tab1) AS IMPLICIT;
CREATE TABLE tab2 (
id SERIAL PRIMARY KEY,
x2 TEXT,
x2x TEXT,
has_desc BOOLEAN,
description TEXT
);
CREATE FUNCTION tab2_as_gdtab(t tab2) RETURNS gdtab AS $$
SELECT CAST(ROW(t.id, t.has_desc, t.description) AS gdtab);
$$ LANGUAGE SQL;
CREATE CAST (tab2 AS gdtab) WITH FUNCTION tab2_as_gdtab(tab2) AS IMPLICIT;
Test usage:
INSERT INTO tab1 (has_desc, x1, description) VALUES (True, 'FooBlah', 'FooDesc'),
(False, 'BazBlah', 'BazDesc'),
(True, 'BarBlah', 'BarDesc');
INSERT INTO tab2 (has_desc, x2, x2x, description) VALUES (True, 'FooBlah', 'x2x', 'FooDesc'),
(False, 'BazBlah', 'x2x', 'BazDesc'),
(True, 'BarBlah', 'x2x', 'BarDesc');
SELECT get_desc(tab1) FROM tab1;
SELECT get_desc(tab2) FROM tab2;
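With the sample rows above, both calls should return the same three values (shown here in insertion order; a sketch of the expected output):
get_desc
---------------------
FooDesc
Default Description
BarDesc
(3 rows)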
PostgreSQL functions depend on the schema (the row structure) of the argument. If the different tables have different schemas, then you can always project them into the common sub-schema needed by get_desc. This can be done in a quick and temporary fashion with a WITH clause before the get_desc use.
If this answer is too thin on details, just add a comment and I'll flesh out some example.
More details:
CREATE TABLE subschema_table ( has_desc boolean, description text ) ;
CREATE FUNCTION get_desc1(tab subschema_table) RETURNS TEXT AS $$
SELECT CASE tab.has_desc
WHEN True THEN
tab.description
ELSE
'Default Description'
END; $$ LANGUAGE SQL;
Now, the following will work (with other tables also):
WITH subschema AS (SELECT has_desc, description FROM tab1)
SELECT get_desc1(subschema) FROM subschema;
The VIEW method didn't work in my test (VIEWs don't seem to have the appropriate schema).
Maybe the other answer gives a better way.

Avoid putting PostgreSQL function result into one field

The end result of what I am after is a query that calls a function and that function returns a set of records that are in their own separate fields. I can do this but the results of the function are all in one field.
ie: http://i.stack.imgur.com/ETLCL.png and the results I am after are: http://i.stack.imgur.com/wqRQ9.png
Here's the code to create the table
CREATE TABLE tbl_1_hm
(
tbl_1_hm_id bigserial NOT NULL,
tbl_1_hm_f1 VARCHAR (250),
tbl_1_hm_f2 INTEGER,
CONSTRAINT tbl_1_hm PRIMARY KEY (tbl_1_hm_id)
);
-- do that for a few times to get some data
INSERT INTO tbl_1_hm (tbl_1_hm_f1, tbl_1_hm_f2)
VALUES ('hello', 1);
CREATE OR REPLACE FUNCTION proc_1_hm(id BIGINT)
RETURNS TABLE(tbl_1_hm_f1 VARCHAR (250), tbl_1_hm_f2 int) AS $$
SELECT tbl_1_hm_f1, tbl_1_hm_f2
FROM tbl_1_hm
WHERE tbl_1_hm_id = id
$$ LANGUAGE SQL;
--And here is the current query I am running for my results:
SELECT t1.tbl_1_hm_id, proc_1_hm(t1.tbl_1_hm_id) AS t3
FROM tbl_1_hm AS t1
Thanks for having a read. And please, if you want to haggle about the semantics of hitting the same table twice or my naming convention: this is a simplified test.
When a function returns a set of records, you should treat it as a table source:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1, proc_1_hm(t1.tbl_1_hm_id) AS t3;
Note that set-returning functions in the FROM clause implicitly use a LATERAL join (scroll down to sub-sections 4 and 5), so you can use fields from tables listed previously without having to specify an explicit JOIN condition.
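The implicit form above is equivalent to spelling out the lateral join explicitly:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
CROSS JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3;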

Merging Concatenating JSON(B) columns in query

Using Postgres 9.4, I am looking for a way to merge two (or more) json or jsonb columns in a query. Consider the following table as an example:
id | json1 | json2
----------------------------------------
1 | {'a':'b'} | {'c':'d'}
2 | {'a1':'b2'} | {'f':{'g' : 'h'}}
Is it possible to have the query return the following:
id | json
----------------------------------------
1 | {'a':'b', 'c':'d'}
2 | {'a1':'b2', 'f':{'g' : 'h'}}
Unfortunately, I can't define a function as described here. Is this possible with a "traditional" query?
In Postgres 9.5+ you can merge JSONB like this:
select json1 || json2;
Or, if it's JSON, coerce to JSONB if necessary:
select json1::jsonb || json2::jsonb;
Or:
select COALESCE(json1::jsonb||json2::jsonb, json1::jsonb, json2::jsonb);
(Otherwise, any NULL in json1 or json2 makes the concatenation return NULL for that row, which the COALESCE handles.)
For example:
select data || '{"foo":"bar"}'::jsonb from photos limit 1;
?column?
----------------------------------------------------------------------
{"foo": "bar", "preview_url": "https://unsplash.it/500/720/123"}
Kudos to @MattZukowski for pointing this out in a comment.
Here is the complete list of built-in functions that can be used to create JSON objects in PostgreSQL: http://www.postgresql.org/docs/9.4/static/functions-json.html
row_to_json and json_object do not allow you to define your own keys, so they can't be used here.
json_build_object expects you to know in advance how many keys and values your object will have; that's the case in your example, but it shouldn't be the case in the real world.
json_object looks like a good tool to tackle this problem, but it forces us to cast our values to text, so we can't use it either.
Well... ok, so we can't use any of the classic functions.
Let's take a look at some aggregate functions and hope for the best... http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
json_object_agg is the only aggregate function that builds objects, so it's our only chance to tackle this problem. The trick here is to find the correct way to feed the json_object_agg function.
Here is my test table and data
CREATE TABLE test (
id SERIAL PRIMARY KEY,
json1 JSONB,
json2 JSONB
);
INSERT INTO test (json1, json2) VALUES
('{"a":"b", "c":"d"}', '{"e":"f"}'),
('{"a1":"b2"}', '{"f":{"g" : "h"}}');
And after some trial and error with json_object_agg, here is a query you can use to merge json1 and json2 in PostgreSQL 9.4:
WITH all_json_key_value AS (
SELECT id, t1.key, t1.value FROM test, jsonb_each(json1) as t1
UNION
SELECT id, t1.key, t1.value FROM test, jsonb_each(json2) as t1
)
SELECT id, json_object_agg(key, value)
FROM all_json_key_value
GROUP BY id
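With the test data above, this should return something like the following (row and key order are not guaranteed, and the exact whitespace of json_object_agg output may differ):
 id |          json_object_agg
----+-------------------------------------
  1 | { "a" : "b", "c" : "d", "e" : "f" }
  2 | { "a1" : "b2", "f" : {"g": "h"} }
(2 rows)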
For PostgreSQL 9.5+, look at Zubin's answer.
This function would merge nested json objects
create or replace function jsonb_merge(CurrentData jsonb,newData jsonb)
returns jsonb
language sql
immutable
as $jsonb_merge_func$
select case jsonb_typeof(CurrentData)
when 'object' then case jsonb_typeof(newData)
when 'object' then (
select jsonb_object_agg(k, case
when e2.v is null then e1.v
when e1.v is null then e2.v
when e1.v = e2.v then e1.v
else jsonb_merge(e1.v, e2.v)
end)
from jsonb_each(CurrentData) e1(k, v)
full join jsonb_each(newData) e2(k, v) using (k)
)
else newData
end
when 'array' then CurrentData || newData
else newData
end
$jsonb_merge_func$;
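For example, merging two objects that share a nested key should behave roughly like this (a sketch of the expected output):
select jsonb_merge('{"a": {"x": 1}}'::jsonb, '{"a": {"y": 2}}'::jsonb);
       jsonb_merge
-------------------------
 {"a": {"x": 1, "y": 2}}
(1 row)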
Looks like nobody proposed this kind of solution yet, so here's my take, using custom aggregate functions:
create or replace function jsonb_concat(a jsonb, b jsonb) returns jsonb
as 'select $1 || $2'
language sql
immutable
parallel safe
;
create or replace aggregate jsonb_merge_agg(jsonb)
(
sfunc = jsonb_concat,
stype = jsonb,
initcond = '{}'
);
Note: this is using || which replaces existing values at same path instead of deeply merging them.
Now jsonb_merge_agg is accessible like so:
select jsonb_merge_agg(some_col) from some_table group by something;
You can also transform json into text, concatenate, replace and convert back to json. Using the same data from Clément you can do:
SELECT replace(
(json1::text || json2::text),
'}{',
', ')::json
FROM test
You could also concatenate all json1 values into a single json with:
SELECT regexp_replace(
array_agg((json1))::text,
'}"(,)"{|\\| |^{"|"}$',
'\1',
'g'
)::json
FROM test
This is a very old solution; since 9.4 you should use json_object_agg and the simple || concatenation operator. Keeping it here just for reference.
This question was already answered some time ago, but the fact that, when json1 and json2 contain the same key, the key appears twice in the document does not seem to be best practice.
Therefore you can use this jsonb_merge function with PostgreSQL 9.5:
CREATE OR REPLACE FUNCTION jsonb_merge(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB AS $$
DECLARE
result JSONB;
v RECORD;
BEGIN
result = (
SELECT json_object_agg(KEY,value)
FROM
(SELECT jsonb_object_keys(jsonb1) AS KEY,
1::int AS jsb,
jsonb1 -> jsonb_object_keys(jsonb1) AS value
UNION SELECT jsonb_object_keys(jsonb2) AS KEY,
2::int AS jsb,
jsonb2 -> jsonb_object_keys(jsonb2) AS value ) AS t1
);
RETURN result;
END;
$$ LANGUAGE plpgsql;
The following query returns the concatenated jsonb columns, where the keys in json2 are dominant over the keys in json1:
select id, jsonb_merge(json1, json2) from test
FYI, if someone's using jsonb in >= 9.5 and they only care about top-level elements being merged without duplicate keys, then it's as easy as using the || operator:
select '{"a1": "b2"}'::jsonb || '{"f":{"g" : "h"}}'::jsonb;
?column?
-----------------------------
{"a1": "b2", "f": {"g": "h"}}
(1 row)
Try this if you're having an issue merging two JSON objects:
select table.attributes::jsonb || json_build_object('foo',1,'bar',2)::jsonb FROM table where table.x='y';
CREATE OR REPLACE FUNCTION jsonb_merge(pCurrentData jsonb, pMergeData jsonb, pExcludeKeys text[])
RETURNS jsonb IMMUTABLE LANGUAGE sql
AS $$
SELECT json_object_agg(key,value)::jsonb
FROM (
WITH to_merge AS (
SELECT * FROM jsonb_each(pMergeData)
)
SELECT *
FROM jsonb_each(pCurrentData)
WHERE key NOT IN (SELECT key FROM to_merge)
AND ( pExcludeKeys ISNULL OR key <> ALL(pExcludeKeys))
UNION ALL
SELECT * FROM to_merge
) t;
$$;
SELECT jsonb_merge('{"a": 1, "b": 9, "c": 3, "e":5}'::jsonb, '{"b": 2, "d": 4}'::jsonb, '{"c","e"}'::text[]) as jsonb
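That call keeps "a" from the current data, takes "b" and "d" from the merge data, and drops the excluded keys "c" and "e", so it should return something like:
          jsonb
--------------------------
 {"a": 1, "b": 2, "d": 4}
(1 row)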
This works well as an alternative to || when a recursive deep merge is required (found here):
create or replace function jsonb_merge_recurse(orig jsonb, delta jsonb)
returns jsonb language sql as $$
select
jsonb_object_agg(
coalesce(keyOrig, keyDelta),
case
when valOrig isnull then valDelta
when valDelta isnull then valOrig
when (jsonb_typeof(valOrig) <> 'object' or jsonb_typeof(valDelta) <> 'object') then valDelta
else jsonb_merge_recurse(valOrig, valDelta)
end
)
from jsonb_each(orig) e1(keyOrig, valOrig)
full join jsonb_each(delta) e2(keyDelta, valDelta) on keyOrig = keyDelta
$$;
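A quick sketch of how it behaves with nested objects (expected output, untested):
select jsonb_merge_recurse('{"a": {"b": 1}}'::jsonb, '{"a": {"c": 2}, "d": 3}'::jsonb);
       jsonb_merge_recurse
---------------------------------
 {"a": {"b": 1, "c": 2}, "d": 3}
(1 row)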

Returning a nested composite type from a PL/pgSQL function

I'm trying to return nested data of this format from PostgreSQL into PHP associative arrays.
[
'person_id': 1,
'name': 'My Name',
'roles': [
[ 'role_id': 1, 'role_name': 'Name' ],
[ 'role_id': 2, 'role_name': 'Another role name' ]
]
]
It seems like it could be possible using composite types. This answer describes how to return a composite type from a function, but it doesn't deal with an array of composite types. I'm having some trouble with arrays.
Here are my tables and types:
CREATE TEMP TABLE people (person_id integer, name text);
INSERT INTO "people" ("person_id", "name") VALUES
(1, 'name!');
CREATE TEMP TABLE roles (role_id integer, person_id integer, role_name text);
INSERT INTO "roles" ("role_id", "person_id", "role_name") VALUES
(1, 1, 'role name!'),
(2, 1, 'another role');
CREATE TYPE role AS (
"role_name" text
);
CREATE TYPE person AS (
"person_id" int,
"name" text,
"roles" role[]
);
My get_people() function parses fine, but there are runtime errors. Right now I'm getting the error: array value must start with "{" or dimension information
CREATE OR REPLACE FUNCTION get_people()
RETURNS person[] AS $$
DECLARE myroles role[];
DECLARE myperson people%ROWTYPE;
DECLARE result person[];
BEGIN
FOR myperson IN
SELECT *
FROM "people"
LOOP
SELECT "role_name" INTO myroles
FROM "roles"
WHERE "person_id" = myperson.person_id;
result := array_append(
result,
(myperson.person_id, myperson.name, myroles::role[])::person
);
END LOOP;
RETURN result;
END; $$ LANGUAGE plpgsql;
UPDATE in reply to Erwin Brandstetter's question at the end of his answer:
Yeah, I could return a SETOF a composite type. I've found SETs are easier to deal with than arrays, because SELECT queries return SETs. The reason I'd rather return a nested array is because I think representing nested data as a set of rows is a little awkward. Here's an example:
person_id | person_name | role_name | role_id
-----------+-------------+-----------+-----------
1 | Dilby | Some role | 1978
1 | Dilby | Role 2 | 2
2 | Dobie | NULL | NULL
In this example, person 1 has 2 roles, and person 2 has none. I'm using a structure like this for another one of my PL/pgSQL functions. I wrote a brittle PHP function that converts record sets like this into nested arrays.
This representation works fine, but I'm worried about adding more nested fields to this structure. What if each person also has a group of jobs? Statuses? etc. My conversion function will have to become more complicated. The representation of the data will be complicated as well. If a person has n roles, m jobs, and o statuses, that person fills max(n, m, o) rows, with person_id, person_name, and whatever other data they have uselessly duplicated in the extra rows. I'm not at all worried about performance, but I want to do this the simplest way possible. Of course.. maybe this is the simplest way!
I hope this helps to illustrate why I'd rather deal directly with nested arrays in PostgreSQL. And of course I'd love to hear any suggestions you have.
And for anyone dealing with PostgreSQL composite types with PHP, I've found this library to be really useful for parsing PostgreSQL's array_agg() output in PHP: https://github.com/nehxby/db_type. Also, this project looks interesting: https://github.com/chanmix51/Pomm
Consider this (improved and fixed) test case, tested with PostgreSQL 9.1.4:
CREATE SCHEMA x;
SET search_path = x, pg_temp;
CREATE TABLE people (person_id integer primary key, name text);
INSERT INTO people (person_id, name) VALUES
(1, 'name1')
,(2, 'name2');
CREATE TABLE roles (role_id integer, person_id integer, role_name text);
INSERT INTO roles (role_id, person_id, role_name) VALUES
(1, 1, 'role name!')
,(2, 1, 'another role')
,(3, 2, 'role name2!')
,(4, 2, 'another role2');
CREATE TYPE role AS (
role_id int
,role_name text
);
CREATE TYPE person AS (
person_id int
,name text
,roles role[]
);
Function:
CREATE OR REPLACE FUNCTION get_people()
RETURNS person[] LANGUAGE sql AS
$func$
SELECT ARRAY (
SELECT (p.person_id, p.name
,array_agg((r.role_id, r.role_name)::role))::person
FROM people p
JOIN roles r USING (person_id)
GROUP BY p.person_id
ORDER BY p.person_id
)
$func$;
Call:
SELECT get_people();
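If you would rather see the result as rows and columns instead of a single array value, here is a sketch using unnest() (not part of the original answer):
SELECT p.person_id, p.name, p.roles
FROM unnest(get_people()) AS p;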
Clean up:
DROP SCHEMA x CASCADE;
Core features are:
A much simplified function that only wraps a plain SQL query.
Your key mistake was that you took role_name text from table roles and treated it as type role, which it isn't.
I'll let the code speak for itself. There is just too much to explain and I don't have any more time now.
This is very advanced stuff and I am not sure you really need to return this nested type. Maybe there is a simpler way, like a SET of a non-nested complex type?