upsert into unkeyed kdb table

I am trying to modify the entry in the factor column that corresponds to the provided date.
I cannot find any good documentation for kdb's upsert function and I have no idea what I am doing wrong here.
query: {[table;dates;factors] table upsert (date:dates factor:factors);}
table: `test
dates: (2016.01.04T01:30:00.000; 2016.01.04T01:31:00.000)
factors: (0.9340078471263533; 0.9340078471263533)
query[table; dates; factors]
date                    price original factor askVol       bidVol
-----------------------------------------------------------------------
....
2017.04.19T07:28:00.000 6.105 6.105    1      2.176407e+07 1.907746e+07
2017.04.19T07:29:00.000 6.105 6.105    1      2.274138e+07 1.893807e+07
2017.04.19T07:30:00.000 6.105 6.105    1      2.629207e+07 2.030017e+07
....
An error occurred during execution of the query.
The server sent the response:
type
Studio Hint: Possibly this error refers to wrong type, e.g `a+1

You have a small syntax error in the function query, where you define the table from the input arguments:
query: {[table;dates;factors] table upsert (date:dates factor:factors);}
Should be:
query:{[table;dates;factors] table upsert ([] date:dates; factor:factors);}
Note the additional [] after the opening ( for a table definition; also, the column definitions need to be separated with ;

q)show table:([] dates:.z.D+til 3;factors:3?.1; something:3?`2)
dates      factors    something
-------------------------------
2017.04.20 0.09441671 hj
2017.04.21 0.07833686 lh
2017.04.22 0.04099561 mg
q)show factormap:(.z.D,.z.D+2)!10000.1 20000.2
2017.04.20| 10000.1
2017.04.22| 20000.2
q)update factors:factors^factormap[dates] from table
dates      factors    something
-------------------------------
2017.04.20 10000.1    hj
2017.04.21 0.07833686 lh
2017.04.22 20000.2    mg
q)
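One caveat worth adding here (not part of the original answer): upsert only overwrites existing rows when the target table is keyed on date; against an unkeyed table it simply appends new rows. A minimal sketch with a simplified, hypothetical two-column table keyed on date:
q)test:([date:2016.01.04T01:30:00.000 2016.01.04T01:31:00.000] factor:1.0 1.0)
q)query:{[table;dates;factors] table upsert ([date:dates] factor:factors)}
q)query[`test; (2016.01.04T01:30:00.000; 2016.01.04T01:31:00.000); (0.9340078471263533; 0.9340078471263533)]
q)test / both factor values are overwritten in place rather than appended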

Related

PSQL crosstabs count(*) returning NULL

I have two issues:
1. I am trying to create a pivot table using the crosstab tablefunc, but all of the queries I've tried are returning NULL for all values.
2. I have two grouping variables (airport code and date) that are in one row of data and need to be separate columns in the pivot table, but I can only seem to get one of them to work.
I have gotten the pivot table to partially work by ignoring the date value for the moment. When I leave 'yyyymm' out of my query, the setup of my output table is okay, but the values don't calculate properly.
The data: I have rows with various airport codes, aircraft user and engine codes, flight identifiers, and year/month values. Each row counts for one flight. A simplified example looks like this:
ident             primary_fa  user_engn  yyyymm
20191122-AFR-23   MKE         O_O        201911
20191210-ASH-61   N90         T_R        201912
20200120-EDV-2    MKE         C_J        202001
20200811-FLC-148  A90         O_O        202008
I need my output table to count the number of arrivals for each user/engine combination, grouped by airport code and yyyymm. So the rows would be each airport code (primary_fa) and yyyymm, and the columns would be the user_engn codes (O_O, T_R, C_J, etc.), with counts of the number of flights per user_engn.
My goal output would look something like this:
primary_fa  yyyymm  C_J  T_R  O_O
MKE         201911  1    0    1
N90         201912  0    1    0
A90         202008  0    0    1
But I am getting this (because I have to ignore the date portion to even get this far):
primary_fa  C_J   T_R   O_O
MKE         NULL  NULL  NULL
N90         NULL  NULL  NULL
A90         NULL  NULL  NULL
I've tried a lot of different versions of the crosstabs query and the closest I have gotten to correct is this:
SELECT *
FROM crosstab(
  'SELECT primary_fa as locid,
          yyyymm,
          count(*)
   FROM fy20_keeps_emdf
   GROUP BY primary_fa, yyyymm
   ORDER BY 1,2',
  'VALUES (''C_J''),(''O_O''),(''T_R'')')
AS (primary_fa varchar,
    C_J bigint,
    O_O bigint,
    T_R bigint);
Am I missing something obvious or do I need to do more data manipulation to get this to work?
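Two things stand out from the attempt itself: crosstab reads its category values from the second column of the source query, which here is yyyymm rather than user_engn, so none of the 'C_J'/'O_O'/'T_R' values ever match (hence the NULLs), and crosstab only groups output rows on its first column, so keeping both primary_fa and yyyymm per output row needs a composite row name. A hedged sketch along those lines, reusing the question's table and column names ('|' is just an arbitrary separator):
SELECT split_part(grp, '|', 1) AS primary_fa,
       split_part(grp, '|', 2) AS yyyymm,
       coalesce(c_j, 0) AS c_j,
       coalesce(t_r, 0) AS t_r,
       coalesce(o_o, 0) AS o_o
FROM crosstab(
  $$SELECT primary_fa || '|' || yyyymm AS grp,  -- composite row name: both grouping columns
           user_engn,                           -- category column must hold the values listed below
           count(*)
    FROM fy20_keeps_emdf
    GROUP BY 1, 2
    ORDER BY 1$$,
  $$VALUES ('C_J'), ('T_R'), ('O_O')$$
) AS ct(grp text, c_j bigint, t_r bigint, o_o bigint);
split_part pulls the two grouping columns back out of the composite key, and coalesce turns missing combinations into 0, as in the goal output.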

Convert jsonb column to a user-defined type

I'm trying to convert each row in a jsonb column to a type that I've defined, and I can't quite seem to get there.
I have an app that scrapes articles from The Guardian Open Platform and dumps the responses (as jsonb) in an ingestion table, into a column called 'body'. Other columns are a sequential ID, and a timestamp extracted from the response payload that helps my app only scrape new data.
I'd like to move the response dump data into a properly-defined table, and as I know the schema of the response, I've defined a type (my_type).
I've been referring to section 9.16, JSON Functions and Operators, in the Postgres docs. I can get a single record as my type:
select * from jsonb_populate_record(null::my_type, (select body from data_ingestion limit 1));
produces
id          type          sectionId           ...
example_id  example_type  example_section_id  ...
(abbreviated for concision)
If I remove the limit, I get an error, which makes sense: the subquery would be providing multiple rows to jsonb_populate_record which only expects one.
I can get it to do multiple rows, but the result isn't broken into columns:
select jsonb_populate_record(null::my_type, body) from reviews_ingestion limit 3;
produces:
jsonb_populate_record
(example_id_1,example_type_1,example_section_id_1,...)
(example_id_2,example_type_2,example_section_id_2,...)
(example_id_3,example_type_3,example_section_id_3,...)
This is a bit odd; I would have expected to see column names, which after all is the point of providing the type.
I'm aware I can do this by using Postgres JSON querying functionality, e.g.
select
body -> 'id' as id,
body -> 'type' as type,
body -> 'sectionId' as section_id,
...
from reviews_ingestion;
This works but it seems quite inelegant. Plus I lose datatypes.
I've also considered aggregating all rows in the body column into a JSON array, so as to be able to supply this to jsonb_populate_recordset but this seems a bit of a silly approach, and unlikely to be performant.
Is there a way to achieve what I want, using Postgres functions?
Maybe you need this - to break my_type record into columns:
select (jsonb_populate_record(null::my_type, body)).*
from reviews_ingestion
limit 3;
-- or whatever other query clauses here
i.e. select all from these my_type records. All column names and types are in place.
Here is an illustration. My custom type is delmet and the CTE t loosely mimics data_ingestion.
create type delmet as (x integer, y text, z boolean);
with t(i, j, k) as
(
values
(1, '{"x":10, "y":"Nope", "z":true}'::jsonb, 'cats'),
(2, '{"x":11, "y":"Yep", "z":false}', 'dogs'),
(3, '{"x":12, "y":null, "z":true}', 'parrots')
)
select i, (jsonb_populate_record(null::delmet, j)).*, k
from t;
Result:
i  x   y     z      k
1  10  Nope  true   cats
2  11  Yep   false  dogs
3  12        true   parrots
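If the (...).* expansion feels clunky, another option (not from the answer above, just a common alternative) is to call jsonb_populate_record in the FROM clause, where it is treated as an implicit LATERAL call and yields properly named, typed columns:
select r.*
from reviews_ingestion
cross join lateral jsonb_populate_record(null::my_type, body) as r
limit 3;
The column names and types again come from my_type, exactly as in the (...).* version.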

Select from a table with Limit expression works, without - fails

For a table t with a custom column c whose values are dictionaries, a select with a limit expression works, but a simple select fails:
q)r1: `n`m`k!111b;
q)r2: `n`m`k!000b;
q)t: ([]a:1 2; b:10 20; c:(r1; r2));
q)t
a b c
----------------
1 10 `n`m`k!111b
2 20 `n`m`k!000b
q)select[2] c[`n] from t
x
-
1
0
q)select c[`n] from t
'type
[0] select c[`n] from t
^
Is it a bug, or am I missing something?
Update: why does select[2] c[`n] from t work here?
Since c is a list, it does not support key indexing, which is why it returned a 'type error.
You need to index into each element instead of trying to index the column.
q)select c[;`n] from t
x
-
1
0
A list of conforming dictionaries outside of this context is equivalent to a table, so you can index it the way you were trying to:
q)c:(r1;r2)
q)type c
98h
q)c[`n]
10b
I would say that the way complex columns are represented in memory makes this not possible. I suspect that any modification that creates a copy of a subset of the elements will allow column indexing as the copy will be formatted as a table.
One example is serialising and deserialising the column (not something you would actually want to do). In the case of select[2] it is selecting a subset of 2 elements, which is why the limited query worked:
q)type exec c from t
0h
q)type exec -9!-8!c from t
98h
q)exec (-9!-8!c)[`n] from t
10b
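For completeness, the same per-element indexing extends to pulling several keys out at once with explicit column names; a small sketch building on the approach above:
q)select n:c[;`n], m:c[;`m], k:c[;`k] from t
n m k
-----
1 1 1
0 0 0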

Construct q command to get metadata for all tables

I'd like to construct a query to retrieve table metadata for each table.
I can get metadata for a single table with the meta function. I can chain that with tables `., which returns all of the tables in the . namespace, to construct (meta') tables `.
This is almost what I want, as it returns a list of metadata tables. The problem is that I don't know which metadata table belongs to which kdb table.
Ideally, I could construct a query which returns a table where each row is tablename + results of meta tablename. Any advice for constructing such a query?
q)trade:([] sym: 10?`4; time:10?.z.t; prx:10?100f; sz:10?10000);
q)quote:([] sym: 10?`4; time:10?.z.t; bPrx:10?100f; aPrx:10?100f; bSz:10?10000; aSz:10?10000);
q)testTable:update `s#a from ([] a:til 10; b: 10?`3; c:10?.z.p);
q)raze {update table:x from 0!meta x}'[tables[]]
c    t f a table
--------------------
sym  s     quote
time t     quote
bPrx f     quote
aPrx f     quote
bSz  j     quote
aSz  j     quote
a    j   s testTable
b    s     testTable
c    p     testTable
sym  s     trade
time t     trade
prx  f     trade
sz   j     trade
"I could construct a query which returns a table where each row is tablename + results of 'meta tablename'. Any advice for constructing such a query?"
If you did want to do it in this manner, there are many ways. One example:
q)update tableMeta:meta'[table] from ([] table:tables[])
table tableMeta
--------------------------------------------------------------------------------
quote (+(,`c)!,`sym`time`bPrx`aPrx`bSz`aSz)!+`t`f`a!("stffjj";``````;``````)
testTable (+(,`c)!,`a`b`c)!+`t`f`a!("jsp";```;`s``)
trade (+(,`c)!,`sym`time`prx`sz)!+`t`f`a!("stfj";````;````)
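Another convenient shape, sketched here rather than taken from the answers above, is a dictionary keyed by table name, so each table's meta can be looked up directly:
q)metaByTable:tables[]!meta each tables[]
q)metaByTable `trade
c   | t f a
----| -----
sym | s
time| t
prx | f
sz  | j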

How to apply max function for each row in KDB?

I want to ensure all values in column x are no smaller than 0.5, so I do:
update x:max (x 0.5) from myTable
But this gives an error (in Studio For KDB+):
An error occurred during execution of the query.
The server sent the response:
type
Studio Hint: Possibly this error refers to wrong type, e.g `a+1
What's wrong?
You can try using |:
q)update x|0.5 from myTable
It should work. The original query also works once a semicolon is added inside max, so that max receives a two-item list instead of indexing the column x at 0.5 (which is what caused the 'type error):
q)update x:max(x;0.5) from myTable
Try the kdb vector conditional; it's similar to CASE WHEN in SQL:
q)t:([] a:6?.9)
q)t
a
---------
0.4237094
0.5712045
0.8705158
0.2075746
0.8549775
0.3951729
q)update ?[a<0.5;0.5;a] from t
a
---------
0.5
0.5712045
0.8705158
0.5
0.8549775
0.5
q)
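The | (maximum) approach from the first answer gives the same result on this table; shown here just for comparison:
q)update a:a|0.5 from t
a
---------
0.5
0.5712045
0.8705158
0.5
0.8549775
0.5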