KDB: How to assign string datatype to all columns

When I created the table Tab, I specified the columns as string,
Tab: ([Key1:string()] Col1:string();Col2:string();Col3:string())
But the column datatype (t) is empty. I suppose specifying the column as string has no effect.
meta Tab
c   | t f a
----| -----
Key1|
Col1|
Col2|
Col3|
After I do a bulk upsert in Java...
c.Dict dict = new c.Dict((Object[]) columns.toArray(new String[columns.size()]), data);
c.Flip flip = new c.Flip(dict);
conn.c.ks("upsert", table, flip);
The datatypes are all symbols:
meta Tab
c   | t f a
----| -----
Key1| s
Col1| s
Col2| s
Col3| s
How can I specify the datatype of the columns as string and have it remain as string?

You can't define a column of an empty table as string, because a string in q is merely a list of characters (so a string column is a list of lists).
You can only set such columns as untyped empty lists, which is effectively what your code is doing.
But the column will then take on the type of whatever data is first inserted into it.
The real question is why your Java process is sending symbols when it should be sending strings. You need to make the change there before publishing to kdb+: in the Java API (c.java), a Java String is serialised as a q symbol, while a char[] is serialised as a q string (a sketch of that change follows the q examples below).
Note that if you define the columns as char you still won't be able to upsert strings:
q)Tab: ([Key1:`char$()] Col1:`char$();Col2:`char$();Col3:`char$())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
'rank
[0] Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
^
q)Tab: ([Key1:()] Col1:();Col2:();Col3:())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
Key1  | Col1   Col2   Col3
------| --------------------
"test"| "test" "test" "test"

kdb+ does not allow you to declare a column's type as a list when creating a table. That means you cannot declare a column type as string, because a string is itself a list (of characters).
The only way is to define the column as an untyped empty list, like:
q)t:([]id:`int$();val:())
Then when you insert data into this table, the column automatically takes the type of that data.
q)`t insert (4;"row1")
q)meta t
c | t f a
---| -----
id | i
val| C
In your case, one option is to send string data from your Java process, as mentioned by user 'emc211'; the other option is to convert your data to strings in the kdb+ process before insertion.
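If you take the q route, a minimal sketch (assuming the incoming data arrives as a table x with symbol columns named as in the question; toStrings is a hypothetical helper):
q)toStrings:{update string Key1, string Col1, string Col2, string Col3 from x}
q)Tab upsert toStrings data   / data: the table received from Java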


Convert jsonb column to a user-defined type

I'm trying to convert each row in a jsonb column to a type that I've defined, and I can't quite seem to get there.
I have an app that scrapes articles from The Guardian Open Platform and dumps the responses (as jsonb) in an ingestion table, into a column called 'body'. Other columns are a sequential ID, and a timestamp extracted from the response payload that helps my app only scrape new data.
I'd like to move the response dump data into a properly-defined table, and as I know the schema of the response, I've defined a type (my_type).
I've been referring to section 9.16, JSON Functions and Operators, in the Postgres docs. I can get a single record as my type:
select * from jsonb_populate_record(null::my_type, (select body from data_ingestion limit 1));
produces
id         | type         | sectionId          | ...
-----------+--------------+--------------------+-----
example_id | example_type | example_section_id | ...
(abbreviated for concision)
If I remove the limit, I get an error, which makes sense: the subquery would be providing multiple rows to jsonb_populate_record which only expects one.
I can get it to do multiple rows, but the result isn't broken into columns:
select jsonb_populate_record(null::my_type, body) from reviews_ingestion limit 3;
produces:
jsonb_populate_record
(example_id_1,example_type_1,example_section_id_1,...)
(example_id_2,example_type_2,example_section_id_2,...)
(example_id_3,example_type_3,example_section_id_3,...)
This is a bit odd; I would have expected to see column names, which after all is the point of providing the type.
I'm aware I can do this by using Postgres JSON querying functionality, e.g.
select
body -> 'id' as id,
body -> 'type' as type,
body -> 'sectionId' as section_id,
...
from reviews_ingestion;
This works but it seems quite inelegant. Plus I lose datatypes.
I've also considered aggregating all rows in the body column into a JSON array, so as to be able to supply this to jsonb_populate_recordset but this seems a bit of a silly approach, and unlikely to be performant.
Is there a way to achieve what I want, using Postgres functions?
Maybe you need this, to break the my_type record into columns:
select (jsonb_populate_record(null::my_type, body)).*
from reviews_ingestion
limit 3;
-- or whatever other query clauses here
i.e. select all from these my_type records. All column names and types are in place.
Here is an illustration. My custom type is delmet and the CTE t loosely mimics data_ingestion.
create type delmet as (x integer, y text, z boolean);
with t(i, j, k) as
(
values
(1, '{"x":10, "y":"Nope", "z":true}'::jsonb, 'cats'),
(2, '{"x":11, "y":"Yep", "z":false}', 'dogs'),
(3, '{"x":12, "y":null, "z":true}', 'parrots')
)
select i, (jsonb_populate_record(null::delmet, j)).*, k
from t;
Result:
 i | x  | y    | z     | k
---+----+------+-------+---------
 1 | 10 | Nope | true  | cats
 2 | 11 | Yep  | false | dogs
 3 | 12 |      | true  | parrots
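A side note beyond the original answer: before PostgreSQL 10, the (f(x)).* syntax was expanded to one call of the function per output column. If that matters, a LATERAL join (using the same CTE t as above) yields the same columns while calling the function once per row:
select i, r.*, k
from t
cross join lateral jsonb_populate_record(null::delmet, j) as r;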

When aliasing a function, I do not see composite types

I create table gs:
foo=>create table gs as select generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2);
I alias the table and the column:
foo=> select s.a from gs s(a);
a
---
1
2
3
(3 rows)
If I only alias the table, I see composite types:
foo=> select s from gs s;
s
-----
(1)
(2)
(3)
(3 rows)
But when I only alias a function as if it was a table, I do not see composite types, but it is as if I had aliased a table and column:
foo=> select s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
s
---
1
2
3
(3 rows)
I do not understand why I do not see composite types instead.
Set-returning functions (also called table functions, or SRFs) are treated differently from actual relations (tables, views, etc.), especially when they return only one column (a base type). Quoting the docs:
If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function's name is used.
If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.
What is not covered, though, is the case where you supply a table alias but not a column alias (for a single-column SRF). In that case the column alias will be the same as the table alias, so you can't access the function's row type (composite type) explicitly; its reference is hidden by the column alias.
select s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
-- "s" is both a column alias and a table alias here, so:
select s.s from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s;
-- is also valid
More intriguing, however, is that when you use both an explicit table alias and an explicit column alias for a single-column SRF, the table alias also becomes a column alias (its type will be the base type, not a row type -- composite type).
select s, a from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s(a);
+---+---+
| s | a |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+
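To verify this from the session itself (my addition, not part of the original answer), pg_typeof shows what each alias resolves to; per the behaviour described above, both should report integer rather than a composite type:
select pg_typeof(s), pg_typeof(a)
from generate_subscripts('{{1,2,3},{4,5,6}}'::integer[],2) s(a)
limit 1;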
I'm not sure though if it's a bug, or just an undocumented "feature".
http://rextester.com/VJOBI47962

How do I define a record type for JSON in postgres

Looking at postgres documentation for JSON functions (https://www.postgresql.org/docs/9.6/static/functions-json.html), there is a section I don't understand about expanding a JSON object into a set of rows.
The docs give the signature json_populate_recordset(base anyelement, from_json json) and this sample use:
select * from json_populate_recordset(null::myrowtype, '[{"a":1,"b":2},{"a":3,"b":4}]')
But I'm not sure what that first argument (null::myrowtype) is -- a table definition?
The description of this function is: Expands the outermost array of objects in from_json to a set of rows whose columns match the record type defined by base (see note below).
None of the notes at the bottom seemed relevant. I'm hoping for a better explanation with sample code to understand it all.
The second note in the docs is the one of interest, as it explains how missing fields/values are handled:
Note: In json_populate_record, json_populate_recordset, json_to_record
and json_to_recordset, type coercion from the JSON is "best effort"
and may not result in desired values for some types. JSON keys are
matched to identical column names in the target row type. JSON fields
that do not appear in the target row type will be omitted from the
output, and target columns that do not match any JSON field will
simply be NULL.
json_populate_recordset matches each key of the JSON objects to the column names of the row type given as the first argument (here, the row type of a table).
create table public.test (a int, b text);
select * from json_populate_recordset(null::public.test, '[{"a":1,"b":"b2"},{"a":3,"b":"b4"}]');
a | b
---+----
1 | b2
3 | b4
(2 rows)
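Worth adding (not from the original answer): the first argument does not have to be the row type of a table; any composite type works, for example the myrowtype from the docs can be created directly:
create type myrowtype as (a int, b text);
select * from json_populate_recordset(null::myrowtype, '[{"a":1,"b":"b2"},{"a":3,"b":"b4"}]');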
--Wrong column name:
select * from json_populate_recordset(null::public.test, '[{"a":1,"c":"c2"},{"a":3,"c":"c4"}]');
a | b
---+---
1 |
3 |
(2 rows)
--Wrong datatype:
select * from json_populate_recordset(null::public.test, '[{"a":1.1,"b":22},{"a":3.1,"b":44}]');
ERROR: invalid input syntax for integer: "1.1"
Alternatively, instead of using the column names/types from an existing table, you can define the columns on the fly:
select * from json_to_recordset('[{"a":1,"b":"foo"},{"a":"2","c":"bar"}]') as x(a int, b text);
a | b
---+-----
1 | foo
2 |
(2 rows)
--> note that default type casting occurs ("2" is mapped to 2), missing fields become NULL (b, in the second record), and fields not defined in the target (c) are ignored

PostgreSQL: Update column with same value

Following is a sample table and data:
create table chk_vals (vals text);
insert into chk_vals values ('1|2|4|3|9|8|34|35|38|1|37|1508|1534');
So, how do I update the column vals by appending the integer at the 4th position of the existing value (| is the separator, so here that's 3) to the end of the value, along with a | symbol?
As you can see, the existing value is 1|2|4|3|9|8|34|35|38|1|37|1508|1534 and the output should be 1|2|4|3|9|8|34|35|38|1|37|1508|1534|3
Use PostgreSQL's split_part() to split the field and pick out the value at position 4:
select split_part(vals,'|',4) val from chk_vals
This will return the value 3.
Then append it in an update; no subquery is needed, since split_part can be applied to each row's own vals directly:
update chk_vals
set vals = vals || format('|%s', split_part(vals,'|',4));
See format() in the docs.
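A quick check against the sample row (expected output taken from the question):
select vals from chk_vals;
-- 1|2|4|3|9|8|34|35|38|1|37|1508|1534|3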

Postgresql: Select all columns and column names with a specific value for a row

I have a table with many (1000+) columns and many rows (~1M). The columns either have the value 1 or are NULL.
I want to be able to select, for a specific row (user), the names of the columns that have a value of 1.
Since there are many columns in the table, listing them explicitly would yield an extremely long query.
You're doing something SQL is quite bad at - dynamic access to columns, or treating a row as a set. It'd be nice if this were easier, but it doesn't work well with SQL's typed nature and the concept of a relation. Working with your data set in its current form is going to be frustrating; consider storing an array, json, or hstore of values instead.
Actually, for this particular data model, you could probably use a bitfield. See bit(n) and bit varying(n).
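A rough sketch of that bitfield idea (my addition; the table and values are hypothetical, with one bit per former column, so bit 3 stands for column c):
create table blah_bits (id serial primary key, flags bit varying(1000));
insert into blah_bits(flags) values (B'001'), (B'101'), (B'000'), (B'111');
-- rows whose former column c (bit 3) is set:
select id from blah_bits where substring(flags from 3 for 1) = B'1';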
It's still possible to make a working query with your current model using PostgreSQL extensions, though.
Given this sample:
CREATE TABLE blah (id serial primary key, a integer, b integer, c integer);
INSERT INTO blah(a,b,c) VALUES (NULL, NULL, 1), (1, NULL, 1), (NULL, NULL, NULL), (1, 1, 1);
I would unpivot each row into a key/value set using hstore (or, in newer PostgreSQL versions, the json functions). SQL itself provides no way to dynamically access columns, so we have to use an extension. So:
SELECT id, hs FROM blah, LATERAL hstore(blah) hs;
then extract the hstores to sets:
SELECT id, k, v FROM blah, LATERAL each(hstore(blah)) kv(k,v);
... at which point you can filter for values matching the criteria. Note that all columns have been converted to text, so you may want to cast back:
SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1;
You also need to exclude id from matching, so:
regress=> SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1 AND k <> 'id';
id | k
----+---
1 | c
2 | a
2 | c
4 | a
4 | b
4 | c
(6 rows)
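For the "newer PostgreSQL versions, the json functions" route mentioned above, a sketch that needs no extension (assumes PostgreSQL 9.5+ for to_jsonb):
SELECT id, kv.key AS k
FROM blah, LATERAL jsonb_each_text(to_jsonb(blah)) kv
WHERE kv.value = '1' AND kv.key <> 'id';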