How do I define a record type for JSON in postgres - postgresql

Looking at postgres documentation for JSON functions (https://www.postgresql.org/docs/9.6/static/functions-json.html), there is a section I don't understand about expanding a JSON object into a set of rows.
The docs give a sample use of this function: json_populate_recordset(base anyelement, from_json json) as select * from json_populate_recordset(null::myrowtype, '[{"a":1,"b":2},{"a":3,"b":4}]')
But I'm not sure what that first argument (null::myrowtype) is -- a table definition?
The description of this function is: Expands the outermost array of objects in from_json to a set of rows whose columns match the record type defined by base (see note below).
None of the notes at the bottom seemed relevant. I'm hoping for a better explanation with sample code to understand it all.

The 2nd notice is the one of interest in the doc as it explains how missing fields/values are handled
Note: In json_populate_record, json_populate_recordset, json_to_record
and json_to_recordset, type coercion from the JSON is "best effort"
and may not result in desired values for some types. JSON keys are
matched to identical column names in the target row type. JSON fields
that do not appear in the target row type will be omitted from the
output, and target columns that do not match any JSON field will
simply be NULL.
json_populate_recordset maps the name of the json object to the column name in the table given as first argument.
create table public.test (a int, b text);
select * from json_populate_recordset(null::public.test, '[{"a":1,"b":"b2"},{"a":3,"b":"b4"}]');
a | b
---+----
1 | b2
3 | b4
(2 rows)
--Wrong column name:
select * from json_populate_recordset(null::public.test, '[{"a":1,"c":"c2"},{"a":3,"c":"c4"}]');
a | b
---+---
1 |
3 |
(2 rows)
--Wrong datatype:
select * from json_populate_recordset(null::public.test, '[{"a":1.1,"b":22},{"a":3.1,"b":44}]');
ERROR: invalid input syntax for integer: "1.1"
Alternatively, instead of using the column name/type from an existing table, you can define the columns on the fly
select * from json_to_recordset('[{"a":1,"b":"foo"},{"a":"2","c":"bar"}]') as x(a int, b text);
a | b
---+-----
1 | foo
2 |
(2 rows)
--> note that default type cast occurs ("2" is mapped to 2), missing fields are ignored (b, in second record) as well as fields not defined (c)

Related

Convert jsonb column to a user-defined type

I'm trying to convert each row in a jsonb column to a type that I've defined, and I can't quite seem to get there.
I have an app that scrapes articles from The Guardian Open Platform and dumps the responses (as jsonb) in an ingestion table, into a column called 'body'. Other columns are a sequential ID, and a timestamp extracted from the response payload that helps my app only scrape new data.
I'd like to move the response dump data into a properly-defined table, and as I know the schema of the response, I've defined a type (my_type).
I've been referring to the 9.16. JSON Functions and Operators in the Postgres docs. I can get a single record as my type:
select * from jsonb_populate_record(null::my_type, (select body from data_ingestion limit 1));
produces
id
type
sectionId
...
example_id
example_type
example_section_id
...
(abbreviated for concision)
If I remove the limit, I get an error, which makes sense: the subquery would be providing multiple rows to jsonb_populate_record which only expects one.
I can get it to do multiple rows, but the result isn't broken into columns:
select jsonb_populate_record(null::my_type, body) from reviews_ingestion limit 3;
produces:
jsonb_populate_record
(example_id_1,example_type_1,example_section_id_1,...)
(example_id_2,example_type_2,example_section_id_2,...)
(example_id_3,example_type_3,example_section_id_3,...)
This is a bit odd, I would have expected to see column names; this after all is the point of providing the type.
I'm aware I can do this by using Postgres JSON querying functionality, e.g.
select
body -> 'id' as id,
body -> 'type' as type,
body -> 'sectionId' as section_id,
...
from reviews_ingestion;
This works but it seems quite inelegant. Plus I lose datatypes.
I've also considered aggregating all rows in the body column into a JSON array, so as to be able to supply this to jsonb_populate_recordset but this seems a bit of a silly approach, and unlikely to be performant.
Is there a way to achieve what I want, using Postgres functions?
Maybe you need this - to break my_type record into columns:
select (jsonb_populate_record(null::my_type, body)).*
from reviews_ingestion
limit 3;
-- or whatever other query clauses here
i.e. select all from these my_type records. All column names and types are in place.
Here is an illustration. My custom type is delmet and CTO t remotely mimics data_ingestion.
create type delmet as (x integer, y text, z boolean);
with t(i, j, k) as
(
values
(1, '{"x":10, "y":"Nope", "z":true}'::jsonb, 'cats'),
(2, '{"x":11, "y":"Yep", "z":false}', 'dogs'),
(3, '{"x":12, "y":null, "z":true}', 'parrots')
)
select i, (jsonb_populate_record(null::delmet, j)).*, k
from t;
Result:
i
x
y
z
k
1
10
Nope
true
cats
2
11
Yep
false
dogs
3
12
true
parrots

KDB: How to assign string datatype to all columns

When I created the table Tab, I specified the columns as string,
Tab: ([Key1:string()] Col1:string();Col2:string();Col3:string())
But the column datatype (t) is empty. I suppose specifying the column as string has no effect.
meta Tab
c t f a
--------------------
Key1
Col1
Col2
Col3
After I do a bulk upsert in Java...
c.Dict dict = new c.Dict((Object[]) columns.toArray(new String[columns.size()]), data);
c.Flip flip = new c.Flip(dict);
conn.c.ks("upsert", table, flip);
The datatypes are all symbols:
meta Tab
c t f a
--------------------
Key1 s
Col1 s
Col2 s
Col3 s
How can I specify the datatype of the columns as string and have it remain as string?
You cant define a column of the empty table with as strings as they are merely lists of lists of characters
You can just set them as empty lists which is what your code is doing.
But the column will then take on the type of whatever data is inserted into it.
Real question is what is your java process sending symbols when it should be sending strings. You need to make the change there before publishing to KDB
Note if you define as chars you still wont be able to upsert strings
q)Tab: ([Key1:`char$()] Col1:`char$();Col2:`char$();Col3:`char$())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
'rank
[0] Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
^
q)Tab: ([Key1:()] Col1:();Col2:();Col3:())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
Key1 | Col1 Col2 Col3
------| --------------------
"test"| "test" "test" "test"
KDB does not allow to define column types as list during creation of table. So that means you can not define your column type as String because that is also a list.
To do that only way is to define column as empty list like:
q) t:([]id:`int$();val:())
Then when you insert data to this table the column will automatically take type of that data.
q)`t insert (4;"row1")
q) meta t
c | t f a
---| -----
id | i
val| C
In your case, one option is to send string data from your Java process as mentioned by user 'emc211' or other option is to convert your data to string in KDB process before insertion.

What does `select` the same table as `from` mean?

Given a table
mytestdb=# select * from foo;
id | name
----+------
4 | Tim
(1 row)
what does select foo from foo output, i.e. what does select the same table as from mean? Thanks.
mytestdb=# select foo from foo;
foo
---------
(4,Tim)
(1 row)
Thanks.
My question comes from understanding what is the input to json_agg() in
mytestdb=# select json_agg(foo) from foo;
json_agg
-------------------------
[{"id":4,"name":"Tim"}]
(1 row)
See http://johnatten.com/2015/04/22/use-postgres-json-type-and-aggregate-functions-to-map-relational-data-to-json/
Using a table name or alias in a select list generates a composite value of the table's current row. Passing the table name or alias to a function that can accept a composite value will invoke the function for each row.
The construct
SELECT foo FROM foo;
is called a whole-row reference in PostgreSQL, since it retrieves the whole row as one.
Things become clearer if you know that with every CREATE TABLE goes an implicit CREATE TYPE that defines a composite type of the same name as the table.
You can easily see that by querying the catalogs:
SELECT typname, typtype, typinput, typoutput
FROM pg_type
WHERE typname = 'text';
So the result of the query above is a single item of type foo. Since it is a composite type, it is represented in row notation: surrounded by parentheses and the attribute values separated by commas.

When aliasing a function, I do not see composite types

I create table gs:
foo=>create table gs as select generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2);
I alias the table and the column:
foo=> select s.a from gs s(a);
a
---
1
2
3
(3 rows)
If I only alias the table, I see composite types:
foo=> select s from gs s;
s
-----
(1)
(2)
(3)
(3 rows)
But when I only alias a function as if it was a table, I do not see composite types, but it is as if I had aliased a table and column:
foo=> select s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
s
---
1
2
3
(3 rows)
I do not understand why I do not see composite types instead.
Set returning functions (or table functions, SRF) are treated differently than actual relations (tables, views, etc.). Especially, when they return only one column (a base type):
If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function's name is used.
If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.
What is not covered though, is the case, when you supply a table alias, but not a column alias (for a single-columned SRF). In that case, the column alias will be the same as the table alias, so you can't access the function's row-type (composite type) explicitly (its reference is hidden by the column alias).
select s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
-- "s" is both a column alias and a table alias here, so:
select s.s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
-- is also valid
More intriguing however, is that when you use an explicit table alias and an explicit column alias for a single-columned SRF, the table alias also becomes a column alias (its type will be the base type, not a row-type -- composite type).
select s, a from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s(a);
+---+---+
| s | a |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+
I'm not sure though if it's a bug, or just an undocumented "feature".
http://rextester.com/VJOBI47962

Postgres query error

I have a query in postgres
insert into c_d (select * from cd where ak = '22019763');
And I get the following error
ERROR: column "region" is of type integer but expression is of type character varying
HINT: You will need to rewrite or cast the expression.
An INSERT INTO table1 SELECT * FROM table2 depends entirely on order of the columns, which is part of the table definition. It will line each column of table1 up with the column of table2 with the same order value, regardless of names.
The problem you have here is whatever column from cd with the same order value as c_d of the table "region" has an incompatible type, and an implicit typecast is not available to clear the confusion.
INSERT INTO SELECT * statements are stylistically bad form unless the two tables are defined, and will forever be defined, exactly the same way. All it takes is for a single extra column to get added to cd, and you'll start getting errors about extraneous extra columns.
If it is at all possible, what I would suggest is explicitly calling out the columns within the SELECT statement. You can call a function to change type within each of the column references (or you could define a new type cast to do this implicitly -- see CREATE CAST), and you can use AS to set the column label to match that of your target column.
If you can't do this for some reason, indicate that in your question.
Check out the PostgreSQL insert documentation. The syntax is:
INSERT INTO table [ ( column [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }
which here would look something like:
INSERT INTO c_d (column1, column2...) select * from cd where ak = '22019763'
This is the syntax you want to use when inserting values from one table to another where the column types and order are not exactly the same.