KDB/Q: How to write a loop and append output table? - kdb

Disclaimer: I am very new to the Q language, so please excuse my silly question.
I have a function that currently takes 2 parameters (date;sym). It runs fine for 1 sym and 1 day. However, I need to run it on multiple syms and dates, which will take forever.
How do I create a loop that runs the function on every sym and on every date?
In Python, it is as straightforward as:
for date in datelist:
    for sym in symlist:
        func(date, sym)
How can I do something similar in Q? And how can I dynamically change the output table names and append them to 1 single table?
Currently, I am using the following:
output: raze .[function] peach paralist
where paralist is a list of parameter pairs: ((2020.06.01;`ABC);(2020.06.01;`XYZ)), but imho this is nowhere near efficient.
What would be the best way to achieve this in Q?

I'll generalize everything. Suppose you have a function foo which operates on an atom dt and a vector s:
q)foo:{[dt;s] dt +\: s}
q)dt:10?10
q)s:100?10
q)dt
8 1 9 5 4 6 6 1 8 5
q)s
4 9 2 7 0 1 9 2 1 8 8 1 7 2 4 5 4 2 7 8 5 6 4 1 3 3 7 8 2 1 4 2 8 0 5 8 5 2 8..
q)foo[;s] each dt
12 17 10 15 8 9 17 10 9 16 16 9 15 10 12 13 12 10 15 16 13 14 12 9 11 11 ..
5 10 3 8 1 2 10 3 2 9 9 2 8 3 5 6 5 3 8 9 6 7 5 2 4 4 ..
13 18 11 16 9 10 18 11 10 17 17 10 16 11 13 14 13 11 16 17 14 15 13 10 12 12 ..
9 14 7 12 5 6 14 7 6 13 13 6 12 7 9 10 9 7 12 13 10 11 9 6 8 8 ..
The solution is to project the function onto the sym list (that is, fix its second argument), then use each (or peach) over the dates.
If your function requires an atomic date and an atomic sym, you can wrap it in a new function that iterates over the syms:
q)bar:{[x;y] foo[x;] each y};

datelist:`date$10?10
symlist:10?`IBM`MSFT`GOOG
function:{0N!(x;y)}
{.[function;x]} peach datelist cross symlist
cross will return all combinations of sym and date
Is this what you need?
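Since the question starts from a Python loop, here is a rough Python analogue of the cross approach, as a sketch only. It assumes `func` returns a small table as a list of row dicts; `func`, its `value` column, and the sample dates and syms are hypothetical stand-ins, not the asker's real function or data.

```python
from itertools import product


def func(date, sym):
    # Hypothetical stand-in for the real per-date, per-sym function.
    # It returns a one-row "table" as a list of row dicts.
    return [{"date": date, "sym": sym, "value": len(sym)}]


datelist = ["2020.06.01", "2020.06.02"]
symlist = ["ABC", "XYZ"]

# Every (date, sym) combination, with date varying slowest,
# in the same order that `datelist cross symlist` produces in q.
pairs = list(product(datelist, symlist))

# Call func once per pair and flatten the per-call tables into one,
# which is the role raze plays in the q version.
output = [row for date, sym in pairs for row in func(date, sym)]

for row in output:
    print(row)
```

Keeping date and sym as columns of one combined table, rather than encoding them in table names, is what makes the single appended result easy to query afterwards.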

Try applying the each iterator (') twice:
raze function'[datelist]'[symlist]
peach or each won't work here. They are not operators, but anonymous functions with two parameters: each is k){x'y}. That is why the statement function each list1 each list2 is invalid, but function'[list1]'[list2] works.

From reading your response to another answer, you are looking to save the results under unique names, yes? Take a look at this solution, which uses set to save and get to retrieve.
q)t:flip enlist each `colA`colB!(100;`name)
q)t
colA colB
---------
100 name
q)f:{[date;sym]tblName:`$string[date],string sym;tblName set update date:date,sym:sym from t}
q)newTbls:f'[.z.d+til 3;`AAA`BBB`CCC]
q)newTbls
`2020.09.02AAA`2020.09.03BBB`2020.09.04CCC
q)get each newTbls
+`colA`colB`date`sym!(,100;,`name;,2020.09.02;,`AAA)
+`colA`colB`date`sym!(,100;,`name;,2020.09.03;,`BBB)
+`colA`colB`date`sym!(,100;,`name;,2020.09.04;,`CCC)
q)get first newTbls
colA colB date sym
------------------------
100 name 2020.09.02 AAA
Does this meet your needs?

This could be a stab in the dark, but given that you want to append everything to 1 table, why not create an hdb rather than all these variables (output20191005ABC, output20191006ABC, etc.)?
Below I have outlined how to create a date-partitioned hdb called outputHDB which has one table, outputTbl. I created the hdb by running a function by date and sym and then upserting those rows to disk.
C:\Users\Matthew Moore>mkdir outputHDB
C:\Users\Matthew Moore>cd outputHDB
// can change the outputHDB as desired
// start q
h:hopen `::6789; // as a demo I connected to another hdb process and extracted some data per sym / date over IPC
hdbLoc:hsym `$"C:/Users/Matthew Moore/outputHDB";
{[d;sl]
  {[d;s]
    // output:yourFunc[date;sym];
    // my func as a demo, I'm grabbing rows where price = max price by date and by sym from another hdb using the handle h
    output:{[d;s]
      h({[d;s] select from trades where date = d, sym = s, price = max price};d;s)
      }[d;s];
    // HDB Part
    path:` sv (hdbLoc;`$string d;`outputTbl;`);
    // change `outputTbl to desired table name
    // dynamically creates the save location and then upserts one sym's data directly to disk
    // e.g. `:C:/Users/Matthew Moore/outputHDB/2014.04.21/outputTbl/
    // extra / at the end saves the table as splayed i.e. each column is its own file within the outputTbl directory
    path upsert .Q.en[`:.;output];
    // .Q.en enumerates syms in a table which is required when saving a table splayed
    }[d;] each sl;
  // applies the parted attribute to the sym column on disk, this speeds up querying for on disk data
  @[` sv (hdbLoc;`$string d;`outputTbl;`);`sym;`p#];
  }[;`AAPL`CSCO`DELL`GOOG`IBM`MSFT`NOK`ORCL`YHOO] each dateList:2014.04.21 2014.04.22 2014.04.23 2014.04.24 2014.04.25;
Now that the hdb has been created, you can load it from disk and query with qSQL
q)\l .
q)select from outputTbl where date = 2014.04.24, sym = `GOOG
date sym time src price size
------------------------------------------------------------
2014.04.24 GOOG 2014.04.24D13:53:59.182000000 O 46.43 2453

Related

How to apply a function to each column in kdb?

Let's say I have the following,
q)add2:{[x]:x+2};
q)Bidcols:`Bid1px`Bid2px`Bid3px;
q)table:([]time:9 11;Bid1px:4 5;Bid2px:7 3;Bid3px:6 8);
time Bid1px Bid2px Bid3px
-------------------------
9 4 7 6
11 5 3 8
and I want to apply this add2 function to each column of the table, as below
q)table:update Bid1px:add2'[Bid1px],Bid2px:add2'[Bid2px],Bid3px:add2'[Bid3px] from table;
time Bid1px Bid2px Bid3px
-------------------------
9 6 9 8
11 7 5 10
My questions are:
Is there a way to do this using Bidcols?
What are the other efficient ways to achieve this?
Thanks in advance.
You can do this using a functional select:
q)?[table; (); 0b; `time`Bid1px ! (`time; (each; add2; `Bid1px))]
time Bid1px
-----------
9 6
11 7
For (1), if you want to do it using Bidcols:
q)?[table; (); 0b; ] (cols table) ! {$[x in Bidcols; (each; add2; x); x]} each cols table
time Bid1px Bid2px Bid3px
-------------------------
9 6 9 8
11 7 5 10
I'm not sure what you mean for (2)? Are you asking for the most efficient way to do this?
Functional select and update are the most general and flexible approaches. However, in this particular case reassignment works well:
table[Bidcols]: add2 table[Bidcols];
because add2 function already supports vectors.
If the function didn't support vectors directly, e.g.
add: {[x]: $[x>10;x+2;x+3]}
then the following reassignment would work:
table[Bidcols]: (add'') table[Bidcols];
Another solution:
q)@[table;Bidcols;add2]
time Bid1px Bid2px Bid3px
-------------------------
9 6 9 8
11 7 5 10
or
q)@[`table;Bidcols;add2]
for an in-place update.

KDB/Q: multiple PEACH?

I have a function that takes 2 parameters: date and sym. I would like to do this for multiple dates and multiple sym. I have a list for each parameter. I can currently loop through 1 list using
raze function[2020.07.07;] peach symlist
How can I do something similar but looping through the list of dates too?
You may try the following:
Create a list of pairs of input parameters.
Write an anonymous function which calls your function, and use peach on the list of paired parameters.
For example:
symlist: `A`B`C; // symlist defined for testing
function: {(x;y)}; // function defined for testing
raze {function . x} peach (2020.07.07 2020.07.08 2020.07.09) cross symlist
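For comparison with the Python loop from the original question, the same "build the pairs, then map in parallel" pattern can be sketched in Python as below. This is an analogue under assumptions, not a description of how peach is implemented: `function` is the same toy stand-in as above, the date strings are placeholders, and the worker count of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def function(date, sym):
    # Toy stand-in, mirroring function:{(x;y)} in the q example.
    return (date, sym)


dates = ["2020.07.07", "2020.07.08", "2020.07.09"]
symlist = ["A", "B", "C"]

# All (date, sym) pairs, like `dates cross symlist`.
pairs = list(product(dates, symlist))


def call_with_pair(pair):
    # Unpack one pair into the two arguments, like {function . x}.
    return function(*pair)


# Map over the pairs with a pool of workers; map() returns results
# in the same order as the input pairs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(call_with_pair, pairs))

print(results)
```

For CPU-bound work in Python a process pool would usually be the better fit; a thread pool is used here only to keep the sketch short.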
I think this could work:
raze function'[2020.07.07 2020.07.08 2020.07.09;] peach symlist
If not, here are some more things to consider. Could you change your function to accept a sym list instead of individual syms, by including an each/peach inside it? Then you could each over the dates.
Alternatively, you could pair each date with the full symlist and write a new function which takes such a pair, separates its elements, and then does whatever the initial function did.
q)dates
2020.08.31 2020.09.01 2020.09.02
q)sym
`llme`obpb`dhca`mhod`mgpg`jokg`kgnd`nhke`oofi`fnca`jffe`hjca`mdmc
q)func
{[date;syms]string[date],/:string peach syms}
q)func2
{[list]func[list 0;list 1]}
q)\t res1:func[;sym]each dates
220
q)\t res2:func[;sym]peach dates
102
q)
q)func2
{[list]func[list 0;list 1]}
q)dateSymList:dates,\:enlist sym
q)\t res3:func2 peach dateSymList
80
q)res3~res2
1b
q)res3~res1
1b
Let us know if any of those solutions work, thanks.
Some possible ways to do this
Can project dyadic f as monadic & parallelise over list of argument pairs
q)a:"ABC";b:til 3;f:{(x;y)}
q)\s 4
q)(f .)peach l:raze a,\:/:b
"A" 0
"B" 0
"C" 0
"A" 1
"B" 1
"C" 1
"A" 2
"B" 2
"C" 2
Or could define function to take a dictionary argument & parallelise over a table
q)f:{x`c1`c2}
q)f peach flip`c1`c2!flip l
"A" 0
"B" 0
"C" 0
"A" 1
"B" 1
"C" 1
"A" 2
"B" 2
"C" 2
Jason

Why does readmatrix in Matlab skip the first n lines?

In my simulation I am writing data to file using writematrix, then later reading it back using readmatrix. I am appending to a single file at each time step, each line is the same length or longer than the previous line.
For some reason when using readmatrix on the output file, the first n lines are skipped entirely, as in not read at all. For example, my file looks like this:
...
11.8,1,2,3,4,5,6,7,8,9,10,2
11.9,1,2,3,4,5,6,7,8,9,10,2
...
12.3,1,2,3,4,5,6,7,8,9,10,2
12.4,7,8,9,10,7,8,9,10,1,2,1,1,2,3,4,5,6,3,4,5,6,1
12.5,7,8,9,10,7,8,9,10,1,2,1,1,2,3,4,5,6,3,4,5,6,1
...
30.5,7,8,9,10,7,8,9,10,1,2,2,1,2,3,4,5,6,3,4,5,6,2
30.6,7,8,9,10,7,8,9,10,1,2,2,1,2,3,4,5,6,3,4,5,6,2
30.7,17,18,19,20,1,2,7,8,9,10,1,1,2,3,4,5,6,3,4,5,6,2,11,12,13,14,15,16,7,8,9,10,1
30.8,17,18,19,20,1,2,7,8,9,10,1,1,2,3,4,5,6,3,4,5,6,2,11,12,13,14,15,16,7,8,9,10,1
...
(the first column is a time stamp, so the first ellipsis represents t=0 to t=11.7. At t=30.7 there is another step jump in the number of entries), and when I read using the command
data = readmatrix('/path/to/file/data.csv');
the matrix data looks like
12.4 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
12.5 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
12.6 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
...
30.5 7 8 9 10 7 8 9 10 1 2 2 1 2 3 4 5 6 3 4 5 6 2
30.6 7 8 9 10 7 8 9 10 1 2 2 1 2 3 4 5 6 3 4 5 6 2
30.7 17 18 19 20 1 2 7 8 9 10 1 1 2 3 4 5 6 3 4 5 6 2 11 12 13 14 15 16 7 8 9 10 1
30.8 17 18 19 20 1 2 7 8 9 10 1 1 2 3 4 5 6 3 4 5 6 2 11 12 13 14 15 16 7 8 9 10 1
...
That is to say, all the entries before t=12.4 (i.e. the first step jump in line length) are skipped.
In the file, if I delete everything before the first step jump (i.e everything before t=12.4), then I get the same matrix data, so we can conclude the subsequent step jumps cause no issue. If I delete everything from the second step jump (i.e. everything after t=30.6) then it still skips all the entries before t=12.4. If I have no step jumps (i.e. only t=0 to t=12.3) then it happily reads in the first lines.
I've tried reading the same file using csvread and it returns all of the data from the beginning of the file (albeit padded with zeros instead of nans), so I'm confident the issue isn't with the file.
Why is this happening?
A minimum working example is the first code block without the ellipses.
For reference, the first lines have 12 comma-separated values, and each step jump increases that by 11.
Edit:
Output from detectImportOptions
ans =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'UTF-8'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 20 more}
VariableTypes: {'double', 'double', 'double' ... and 20 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 20 more}
VariableOptions: Show all 23 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
PreserveVariableNames: false
Location Properties:
DataLines: [4 Inf]
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
Matlab's readmatrix is trying to be smart and locate a 2-D matrix within the data model of the CSV file you're passing it. It looks like it's passing over the first few lines which don't have explicit trailing empty "cells".
You can control this by setting the import options. Run opts = detectImportOptions(...); on your file and have a look at the DataLines property. If it doesn't start at 1, set it to [1 Inf] to force readmatrix to read in all the lines. And then call readmatrix, explicitly passing in that options structure.
To do this compactly (and probably more efficiently), call readmatrix with an explicit option right off the bat like this:
readmatrix(path2mat,delimitedTextImportOptions('DataLines',[1,Inf]))
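To make the intended result concrete, here is a small Python sketch of "read every line from the first one and pad short rows with NaN", which is what reading from line 1 with a fill rule amounts to. It is an illustration only and says nothing about how readmatrix detects the data region internally; the inline `text` sample is a shortened, made-up version of the file in the question.

```python
import csv
import io
import math

# Shortened stand-in for the ragged file: later rows are longer.
text = "11.8,1,2\n11.9,1,2\n12.4,7,8,9,10\n"

# Parse every non-empty line, starting from the very first one.
rows = [[float(v) for v in rec] for rec in csv.reader(io.StringIO(text)) if rec]

# Pad each row with NaN up to the width of the longest row,
# so that the result is a rectangular matrix.
width = max(len(r) for r in rows)
data = [r + [math.nan] * (width - len(r)) for r in rows]

for r in data:
    print(r)
```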

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row,
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a new column 'Total' that sums across each row?
why not just:
update Total: TypeA+TypeB+TypeC+TypeD from rCom
?
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
sum also supports map-reduce, which helps on a huge table.
One quick point regarding summing across rows: be careful about a null in one column producing a null result for the sum. Borrowing @WooiKent Lee's example,
we put a null into the first position of the a column. Notice how the sum now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
Or alternatively, arrange it so that you are actually summing across simple lists rather than general lists.
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on null treatment please see the reference website
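The same pitfall, and the zero-fill remedy, can be sketched in Python for readers more familiar with it. This is an analogue only: Python's `math.nan` is a floating-point missing value rather than q's typed null, and the `columns` dict below is a hypothetical stand-in for the table wn.

```python
import math

# Column-oriented stand-in for wn: a missing value in the first row of a.
columns = {"a": [math.nan, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]}

# Turn the columns into rows, like flip (a;b;c).
rows = list(zip(*columns.values()))

# Naive row sums: a missing value makes the whole row total missing.
naive = [sum(r) for r in rows]

# Fill missing values with zero before summing, like 0^(a;b;c).
filled = [sum(0 if math.isnan(v) else v for v in r) for r in rows]

print(naive)
print(filled)
```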
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)

Selecting specific rows of a matrix in Matlab [duplicate]

I have a 6639x5 matrix in Matlab and I would like to select certain specific rows in a particular order (say the 1st, 11th, 21st, 31st rows, and so on in steps of 10 until the end) to form a new matrix. Any ideas?
Thank you,
Oti.
subset = a(1:10:end, :);
Selects every 10th row until the end, and all columns.
Example:
>> a = magic(5)
a =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
>> a(1:2:end, :)
ans =
17 24 1 8 15
4 6 13 20 22
11 18 25 2 9