KDB: How to update column values

I have a table with a column of symbol type, as below.
Name   Value
First  TP_RTD_FRV
Second RF_QWE_FRV
Third  KF_FRV_POL
I need to update it as below: wherever a value contains FRV, I need to replace it with AB_FRV. How can I achieve this?
Name   Value
First  TP_RTD_AB_FRV
Second RF_QWE_AB_FRV
Third  KF_AB_FRV_POL

q)t
name v
---------------
0 TP_RTD_FRV
1 RF_QWE_FRV
2 KF_FRV_POL
3 THIS
4 THAT
q)update `$ssr[;"FRV";"AB_FRV"]each string v from t
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
or without using qSQL
q)@[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
Depending on the uniqueness of the data, you might benefit from .Q.fu
q)t:1000000#t
q)\t @[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
2343
q)\t @[t;`v;].Q.fu {`$ssr[;"FRV";"AB_FRV"]each string x}
10
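.Q.fu applies the supplied function to the distinct values only and then maps the results back to the original positions, which is why it pays off when the column contains relatively few unique values. A minimal sketch of the idea, reusing the same lambda on a small repetitive symbol list:
q).Q.fu[{`$ssr[;"FRV";"AB_FRV"]each string x};`TP_RTD_FRV`TP_RTD_FRV`TP_RTD_FRV`THIS]
`TP_RTD_AB_FRV`TP_RTD_AB_FRV`TP_RTD_AB_FRV`THIS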

Related

Type error when getting average by id in KDB

I am trying to make a function for the aggregate consumption by mid in a kdb+ table (aggregate value by mid). The table is being imported from a csv file like this:
table: ("JJP";enlist",")0:`:data.csv
The metadata for the table columns is:
mid is type long (j), value is type long (j) and ts is type timestamp (p).
Here is my function:
agg: {select avg value by mid from table}
but I get the following error:
'type
[0] get select avg value by mid from table
But the type of value is long (j), so I am not sure why I can't get the avg. I also tried this with type int.
value can't be used as a column name because it is a keyword in kdb+. Renaming the column should correct the issue.
value is a keyword and should not be used as a column name.
https://code.kx.com/q/ref/value/
You can sanitize the column name using .Q.id:
https://code.kx.com/q/ref/dotq/#qid-sanitize
q)t:flip`value`price!(1 2;1 2)
q)t
value price
-----------
1 1
2 2
q)t:.Q.id t
q)t
value1 price
------------
1 1
2 2
Or xcol
https://code.kx.com/q/ref/cols/#xcol
q)(enlist[`value]!enlist[`val]) xcol t
val price
---------
1 1
2 2
You can rename the value column as you read it:
flip`mid`val`ts!("JJP";",")0:`:data.csv
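Once the column is renamed the aggregation runs without the 'type error. A minimal sketch, using a small in-memory table in place of the csv:
q)table:([] mid:1 1 2; val:10 20 30; ts:3#.z.p)
q)select avg val by mid from table
mid| val
---| ---
1  | 15
2  | 30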

Select only those columns from a table that have no null values in q/kdb+

I have a table:
q)t:([] a:1 2 3; b:```; c:`a`b`c)
a b c
-----
1   a
2   b
3   c
From this table I want to select only the columns that have no null values; in this case column b should be omitted from the output (something similar to the dropna method in pandas).
expected output
a c
---
1 a
2 b
3 c
I tried many things like
select from t where not null cols
but to no avail.
Here is a simple solution that does just what you want:
q)where[all null t]_t
a c
---
1 a
2 b
3 c
all null t gives a dictionary indicating whether each column's values are all null:
q)all null t
a| 0
b| 1
c| 0
where returns the keys of the dictionary for which the value is true:
q)where[all null t]
,`b
Finally, use _ to drop those columns from table t.
Hopefully this helps.
A modification of Sander's solution which handles string columns (or any nested columns):
q)t:([] a:1 2 3; b:```; c:`a`b`c;d:" ";e:("";"";"");f:(();();());g:(1 1;2 2;3 3))
q)t
a b c d e  f g
----------------
1   a   ""   1 1
2   b   ""   2 2
3   c   ""   3 3
q)where[{$[type x;all null x;all 0=count each x]}each flip t]_t
a c g
-------
1 a 1 1
2 b 2 2
3 c 3 3
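The conditional in the lambda handles simple (typed) columns and nested (type 0) columns separately: for a simple column it checks whether every value is null, for a nested column it checks whether every item is empty. A quick sketch of that helper on its own, assuming columns shaped like the ones above:
q)f:{$[type x;all null x;all 0=count each x]}
q)f ```             / simple symbol column, all nulls
1b
q)f ("";"";"")      / nested column of empty strings
1b
q)f (1 1;2 2;3 3)   / nested, non-empty
0b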
The nature of kdb is column based, meaning that where clauses operate on the rows of a given column.
To make a qSQL query produce your desired behaviour, you would need to first examine all your columns, determine which consist only of nulls, and then feed that into a functional statement, which would be horribly inefficient.
Given that you need to fully examine all the columns' data regardless (to check whether all the values are null), the following will achieve that:
q)@[flip;;enlist] k!d k:key[d] where not all each null each value d:flip t
a c
---
1 a
2 b
3 c
Here I'm transforming the table into a dictionary, and extracting its values to determine if any columns consist only of nulls (all each null each). I'm then applying that boolean list to the keys of the dictionary (i.e., the column names) through a where statement. We can then reindex into the original dictionary with those keys and create a subset dictionary of non-null columns and convert that back into a table.
I've generalized the final transformation back into a table by habit with an error catch to ensure that the dictionary will be converted into a table even if only a single row is valid (preventing a 'rank error).
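Breaking that one-liner into steps on the original table t gives a clearer picture (a sketch):
q)t:([] a:1 2 3; b:```; c:`a`b`c)
q)d:flip t
q)all each null each value d
010b
q)k:key[d] where not all each null each value d
q)k
`a`c
q)@[flip;;enlist] k!d k
a c
---
1 a
2 b
3 c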

Scenario-based questions in DataStage

I have two scenario based questions here.
Question 1
Input Dataset
Col1
A
A
B
C
C
B
D
A
C
Output Dataset
Col1 Col2
A 1
A 2
A 3
B 1
B 2
C 1
C 2
C 3
D 1
Question2
Input data string
AA-BB-CC-DD-EE-FF (the delimiter can be anything and the string can be of any length)
Output data string
string 1 -> AA
string 2 -> BB
string 3 -> CC
string 4 -> DD
Thanks & Regards,
Subhasree
Question 1: This can be solved with a transformer. Sort the data and use the LastRowInGroup functionality.
For Col2, create a counter as a stage variable and add 1 for each row; reset it with a second stage variable when LastRowInGroup is reached.
Alternatively you could use a row-number column in SQL.
Question 2: You have not provided enough information. Is string1 a column or a row? If you do not know anything upfront about the structure (any delimiter), this will get hard...

kdb q - apply each-left for each atom in list and reduce

I would like to apply each-left between a column of a table and each atom in a list. I cannot use each-both because the table column and the list are not of the same length.
I have seen this done in one line somewhere already but I can't find it anymore.
Example:
t:([] name:("jim";"john";"john";"julia");c1: til 4);
searchNames:("jim";"john");
f:{[name;nameCol] nameCol like\:name}; / each-left between name (e.g. "jim") and column
g:f[;t[`name]];
r:g each searchNames; / result: (1000b;0110b)
filter:|/[r]; / result: 1110b
select from t where filter
How can I do that more q-like?
If you wish to use like, you can use each-right (/:):
q)select from t where any name like/:searchNames
name c1
---------
"jim" 0
"john" 1
"john" 2
In this case you can simply use in as you are not using any wildcards:
q)select from t where name in searchNames
name c1
---------
"jim" 0
"john" 1
"john" 2
Below is a generic function you could use, given two lists of different sizes.
q)f:{(|) over x like/:y}
q)
q)select from t where f[name;searchNames]
name c1
---------
"jim" 0
"john" 1
"john" 2
Or, wrapping it up in a single function (assuming always searching a table column):
q)f2:{x where (|) over (0!x)[y] like/:z}
q)
q)f2[t;`name;searchNames]
name c1
---------
"jim" 0
"john" 1
"john" 2
But in the scenario you describe, Thomas' solution seems the most natural.

TorQ: .loader.loadallfiles and referential integrity leads to `cast error

I have a table volatilitysurface and a detail table volatilitysurface_smile. As part of the detail table I define a foreign key to the master table, i.e.
volatilitysurface::([date:`datetime$(); ccypair:`symbol$()] atm_convention:`symbol$(); ...);
volatilitysurface_smile::([...] volatilitysurface:`volatilitysurface$(); ...);
When I use AquaQ's TorQ .loader.loadallfiles to load the detail table volatilitysurface_smile, I need to dynamically build the foreign key field as part of the "dataprocessfunc" function, i.e.
rawdatadir:hsym `$("" sv (getenv[`KDBRAWDATA]; "volatilitysurface_smile"));
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`partitiontype`dataprocessfunc!(`x`ccypair...;"ZS...";enlist ",";`volatilitysurface_smile;target;`date;`month;{[p;t] select date,ccypair,volatilitysurface,... from update date:x,volatilitysurface:`volatilitysurface$(x,'ccypair) from t}); rawdatadir];
Note the part:
update date:x,volatilitysurface:`volatilitysurface$(x,'ccypair) from t
The cast error points to the construction of the volatilitysurface key. However, this works outside .loader.loadallfiles, and the tables are defined globally (with ::) before the .loader.loadallfiles function is called.
Any ideas how to deal with this use-case? If the detail table foreign key is not initialized then the insertion will fail.
The error may be due to the scoping in the update. As you are running the cast/update within the .loader namespace, the table name would need to be fully scoped (`..volatilitysurface).
e.g. update date:x,volatilitysurface:`..volatilitysurface$(x,'ccypair) from t
Regards,
Scott
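To illustrate the scoping point: once inside a namespace, unqualified names resolve within that namespace, so tables living in the root namespace have to be referenced with the `.. prefix. A minimal sketch, unrelated to the loader itself:
q)t:([k:`a`b] v:1 2)
q)\d .loader
q.loader)value `..t
k| v
-| -
a| 1
b| 2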
Are you sure that all possible x & ccypair combinations are in the volatilitysurface table? The 'cast error would seem to suggest this is not the case e.g.
q)t:([a:1 2 3;b:`a`b`c] c:"ghi")
q)update t:`t$(a,'b) from ([] a:2 3 1;b:`b`c`a)
a b t
-----
2 b 1
3 c 2
1 a 0
q)update t:`t$(a,'b) from ([] a:2 3 1 5;b:`b`c`a`d)
'cast
[0] update t:`t$(a,'b) from ([] a:2 3 1 5;b:`b`c`a`d)
^
Note in the second case I have the a-b pair of (5;`d), which isn't present in the table t, and so I get the 'cast error
You can determine if there are missing keys, and which they are, like so:
q)all (exec (a,'b) from ([] a:2 3 1;b:`b`c`a)) in key t //check for presence, all present
1b
q)all (exec (a,'b) from ([] a:2 3 1 5;b:`b`c`a`d)) in key t //check for presence, not all present
0b
q)k where not (k:exec (a,'b) from ([] a:2 3 1 5;b:`b`c`a`d)) in key t //check which keys AREN'T present
5 `d
If this is the case, I guess you kind of have two options:
Make sure the volatilitysurface table is loaded correctly - assuming you have full data coverage in your files, presumably every possible key should be present in this table
If there is the possibility of keys not being present in the volatilitysurface table, you could perhaps add dummy records to it before making the foreign key (these could be replaced if an actual record comes in later).
The second option could perhaps work something like this:
q.test){if[count k:k where not (k:exec (a,'b) from x) in key `..t;@[`..t;;:;value[`..t](0N;`)]'[k]];update t:`t$(a,'b) from x}([] a:2 3 1;b:`b`c`a)
a b t
-----
2 b 1
3 c 2
1 a 0
q.test){if[count k:k where not (k:exec (a,'b) from x) in key `..t;@[`..t;;:;value[`..t](0N;`)]'[k]];update t:`t$(a,'b) from x}([] a:2 3 1 5 6;b:`b`c`a`d`e)
a b t
-----
2 b 1
3 c 2
1 a 0
5 d 3
6 e 4
q.test)value `..t //check table t, new dummy records added by previous call
a b| c
---| -
1 a| g
2 b| h
3 c| i
5 d|
6 e|
I've done these tests inside a namespace as this is how the dataprocess function will run in TorQ (i.e. at certain places you need to use `..t to access t in the root namespace). The analogous version of this function for your setup (with some nicer formatting than the one-liners above) would be something like:
{
if[count k:k where not (k:exec (x,'ccypair) from volatilitysurface_smile) in key `..volatilitysurface; //check for missing keys
@[`..volatilitysurface;;:;value[`..volatilitysurface](0Nz;`)]'[k]]; //index into null key of table to get dummy record and upsert to global volatilitysurface table
update volatilitysurface:`volatilitysurface$(x,'ccypair) from x //create foreign key
}