Map values based on condition in kdb

I have a set of trade data like the following:
date       sym asset points tenor
---------------------------------
2020.11.10 ABC FX    0.2    "AN"
2020.11.15 ABB DX    0.1    "AN"
2020.11.17 AAA FX    0.1    "ON"
2020.11.1  ABB FX    0.3    "AN"
2020.11.1  AAA FX    0.1    "SW"
For asset=FX, the tenor has to be mapped/updated as follows:
"AN"->"AN/3S"
"ON"->"0N"
"SW"->"1W"
The result should look like:
date       sym asset point tenor
----------------------------------
2020.11.10 ABC FX    0.2   "AN/3s"
2020.11.15 ABB DX    0.1   "AN"
2020.11.17 AAA FX    0.1   "0N"
2020.11.1  ABB FX    0.3   "AN/3s"
2020.11.1  AAA FX    0.1   "1W"
I have already tried the naive approach. Is there a way to do it programmatically?
For reference, the table name is data and it is partitioned on date.

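Try the query below, where t is the table:
update tenor: (`AN`ON`SW!`$("AN/3S";"0N";"1W"))@/:tenor from t
It looks up the dictionary `AN`ON`SW!`$("AN/3S";"0N";"1W") for each tenor on the right. Note that this assumes tenor is a symbol column, and as written it does not restrict the update to asset=`FX.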

We can create a set of indices for the tenors we want to update using the find (?) operator.
We then use those indices to index into our corresponding map.
Finally we fill forward with tenor in case there were any tenors not covered in our mapping (you could replace this with a suitable fill of your choice).
The where clause ensures we only perform this operation on FX assets. As it seems you are working with a date-partitioned table, I have also added a date clause to further constrain the query.
q)update ("AN/3S";"0N";"1W") ("AN";"ON";"SW")?tenor from select from data where date within 2020.11.01 2020.11.20,asset = `FX
date sym asset point tenor
----------------------------------
2020.11.10 ABC FX 0.2 "AN/3S"
2020.11.17 AAA FX 0.1 "0N"
2020.11.01 ABB FX 0.3 "AN/3S"
2020.11.01 AAA FX 0.1 "1W"
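For readers who want the lookup-with-fallback logic spelled out, here is a minimal sketch in Python rather than q. The names TENOR_MAP and remap_tenors are hypothetical, and the rows simply mirror the sample data in the question. Unlike the query above, it keeps the non-FX rows in the result and leaves any tenor that is missing from the mapping unchanged.

```python
# Hypothetical mapping taken from the question; it applies to FX rows only.
TENOR_MAP = {"AN": "AN/3S", "ON": "0N", "SW": "1W"}


def remap_tenors(rows, asset="FX", mapping=TENOR_MAP):
    """Return copies of rows with tenor remapped where row["asset"] == asset."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if row["asset"] == asset:
            # Fall back to the original tenor when it is not in the mapping.
            new_row["tenor"] = mapping.get(row["tenor"], row["tenor"])
        out.append(new_row)
    return out


# Sample rows mirroring the table in the question.
trades = [
    {"date": "2020.11.10", "sym": "ABC", "asset": "FX", "points": 0.2, "tenor": "AN"},
    {"date": "2020.11.15", "sym": "ABB", "asset": "DX", "points": 0.1, "tenor": "AN"},
    {"date": "2020.11.17", "sym": "AAA", "asset": "FX", "points": 0.1, "tenor": "ON"},
    {"date": "2020.11.01", "sym": "ABB", "asset": "FX", "points": 0.3, "tenor": "AN"},
    {"date": "2020.11.01", "sym": "AAA", "asset": "FX", "points": 0.1, "tenor": "SW"},
]

remapped = remap_tenors(trades)
for row in remapped:
    print(row["date"], row["sym"], row["asset"], row["points"], row["tenor"])
```

The DX row keeps its "AN" tenor because the condition on asset is checked before the lookup.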

Related

What's the easiest way of making a histogram in KDB?

If I've got a list of values x, what's the easiest way to make a histogram with bin-size b?
It seems like I could hack a complicated solution, but I feel like there must be a built-in function for this somewhere that I don't know about.
I haven't heard about built-in histogram so far. But I would approach this task like below.
For fixed bucket size:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: 0.1;
{count each group x xbar y}[b;a]
// returns 0.3 0.5 0.4 0.1 0.7!2 3 2 1 2j
For "floating" buckets:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: -1 0.5 0.7 1;
{count each group x x bin y}[b;a]
// returns -1 0.5 0.7!5 3 2j
Above functions return dictionary with bucket starts as keys and number of bucket occurrences as values.
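The same two bucketing ideas can be sketched outside q. Below is a rough Python illustration, not a translation of the q internals: fixed_width_counts and edge_counts are hypothetical helper names, and the sample values deliberately avoid exact multiples of the bucket width, because floating-point division can place such values in the neighbouring bucket.

```python
import math
from bisect import bisect_right
from collections import Counter


def fixed_width_counts(values, width):
    """Count values per bucket index k, where bucket k covers [k*width, (k+1)*width)."""
    return Counter(math.floor(v / width) for v in values)


def edge_counts(values, edges):
    """Count values per bucket start; edges must be sorted in ascending order.

    Each value is assigned to the last edge that is <= the value.
    """
    counts = Counter()
    for v in values:
        i = bisect_right(edges, v) - 1
        if i < 0:
            raise ValueError(f"{v} is below the first edge {edges[0]}")
        counts[edges[i]] += 1
    return counts


values = [0.39, 0.51, 0.51, 0.17, 0.78, 0.53, 0.71, 0.41]

fixed = fixed_width_counts(values, 0.1)          # keys are bucket indices
by_edge = edge_counts(values, [-1, 0.5, 0.7, 1])  # keys are bucket starts

print(dict(fixed))
print(dict(by_edge))
```

Keying the fixed-width result by integer bucket index (bucket start is index times width) avoids using computed floats as dictionary keys.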
Assuming you have a list v of values (say 1000 of them):
v:1000?1.0;
You can achieve what you need as follows:
b:0.1;
hist:(count') group xbar[b;v];
There are two points to consider:
1) The keys in hist are not sorted.
2) For each bucket, do you prefer to output the left or the right delimiter?
To solve for 1), you simply do:
hist:(asc key hist)#hist;
To solve for 2), that is, if you want the right delimiter:
hist:(+[b;key hist])!value hist;

How to select whole numbers in a row and its adjacent numbers in Matlab

Good day!
I would like to select the rows of my data whose first entry is a whole number, together with the adjacent number in the same row.
For example, I have this raw data
A = [0.1 0.2
0.2 0.1
1 0.3
0.3 0.2
0.4 0.4
2 0.5]
So I would like to select (1, 0.3) and (2, 0.5). Then my final output will be:
B = [1 0.3
2 0.5]
Thanks in advance!
You can use modulo:
B=A(sum(mod(A,1),2)==0,:)
========== EDIT ====================
Editing w.r.t. comments, if you are only checking for integers in the first column then you do not need to sum results:
B=A(mod(A(:,1),1)==0,:)
Alternative ways would use logicals instead of numericals:
B=A(all(A==round(A),2),:)
or if only the 1st column is checked:
B=A(A(:,1)==round(A(:,1)),:)
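As a cross-check of the first-column test, here is a minimal sketch of the same row selection in Python instead of MATLAB. The helper name rows_with_integer_first_column is made up for illustration, and the data is the matrix A from the question.

```python
def rows_with_integer_first_column(rows):
    """Keep the rows whose first entry is a whole number (e.g. 1 or 2.0)."""
    return [row for row in rows if float(row[0]).is_integer()]


A = [
    [0.1, 0.2],
    [0.2, 0.1],
    [1, 0.3],
    [0.3, 0.2],
    [0.4, 0.4],
    [2, 0.5],
]

B = rows_with_integer_first_column(A)
print(B)
```

Only the first column is tested, so the second entry of each kept row is carried along whatever its value is.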

How can I delete table row values in pairs? For example, if either column is less than 0.01, how do I delete the row?

I have two sets of data from different instruments that have common X-variables (XThompsons) but various Y-variables (YCounts) due to various experimental conditions. The data resemble the example below:
[Table1]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
[Table2]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
When I have two sets of data like this, I use a script to take a single Y-column from Table1 and do some math against all Y-columns in Table2. However, when comparing two table columns, if either column has a value at or below a specific threshold (0.10), I want to delete that row. In the example below I want to delete row 4 and row 6 because in each of them one of the columns has a value of 0.10 or less.
XThompsons | Table1.YCounts(1) | Table2.YCounts(2)
--------------------------------------------------
1 1.00 0.50
2 0.22 0.12
3 0.29 0.14
4 0.29 0.09 (delete row)
5 0.11 0.49
6 0.02 0.83 (delete row)
How can I carry this out in Matlab? My current code is below; I convert each table row to an array first. How can I make it so that if Y < 0.10 delete the row?
datax = readtable('table1.xls'); % Instrument 1
datay = readtable('table2.xls'); % Instrument 2
SIDATA = [];
for idx=2:width(datay);
    % Read the indexed column of datax (instrument 1) then normalize to 1
    x = table2array(datax(:,idx));
    x = x ./ max(x);
    % Read indexed column of datay (instrument 2) and carry out loop
    for idy=2:width(datay);
        % Normalize y data to 1
        y = table2array(datay(:,idy));
        y = y ./ max(y);
        % Calculate similarity index (SI) using the datax index for all collision energies for datay
        xynum = sum(sqrt(x) .* sqrt(y));
        xyden = sqrt(sum(x) .* sum(y));
        SIDATA(idy,idx) = (xynum/xyden);
    end
end
Help would be appreciated.
Thanks!
Generally, when looping through and pruning values, you want to iterate from the end of the matrix back to the first row; this way, if you delete any rows, you don't skip any. (If you delete row 2 and then advance to row 3, you skip the data that was formerly in row 3.)
To me, the easiest way to do this, if all your data is in one matrix A with columns Y1 and Y2, is:
APruned = A((A(:,1) > 0.1) & (A(:,2) > 0.1),:)
This takes the A matrix, finds the rows where Y1 > 0.1, finds the rows where Y2 > 0.1, finds the overlap, and then outputs only the rows in A where both of these are true.
You should read about logical indices for more on this topic.
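Both ideas above, the mask-style filter and the back-to-front deletion, can be sketched in a few lines. This is an illustrative Python version rather than MATLAB; prune_rows and prune_in_place are hypothetical names, and the rows are the example values from the question with the X value in the first position.

```python
def prune_rows(rows, threshold=0.1):
    """Return only the rows where every Y value (all but the first entry) exceeds threshold."""
    return [row for row in rows if all(v > threshold for v in row[1:])]


def prune_in_place(rows, threshold=0.1):
    """Delete failing rows from the list itself, walking from the last row to the first.

    Going backwards means a deletion never shifts a row that is still to be checked.
    """
    for i in range(len(rows) - 1, -1, -1):
        if any(v <= threshold for v in rows[i][1:]):
            del rows[i]


rows = [
    (1, 1.00, 0.50),
    (2, 0.22, 0.12),
    (3, 0.29, 0.14),
    (4, 0.29, 0.09),  # second Y value is at or below the threshold
    (5, 0.11, 0.49),
    (6, 0.02, 0.83),  # first Y value is at or below the threshold
]

kept = prune_rows(rows)

working_copy = list(rows)
prune_in_place(working_copy)

print([row[0] for row in kept])
```

The filter version is usually the simpler choice; the in-place loop is shown only to make the reason for iterating backwards concrete.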
EDIT: It looks like you could also clean up your earlier code using element-wise operations;
A = [datax./max(datax) datay./max(datay)];

How to rearrange data from a vector (list) into a matrix (table) in MATLAB

Given three K by 1 arrays date, ticker, and volume containing information about financial transactions, I would like to compute a T by N array containing the same information in a more accessible format. My vectors look like this:
ticker date volume
______ ____ ______
'ABCD' 735602 123456
'ABCD' 735603 789101
'FGHI' 735602 112131
'NOPQ' 735602 415161
'NOPQ' 735603 718192
'NOPQ' 735605 021232
... ... ...
The matrix I want to obtain would look like this (shown as a table for illustration):
'ABCD' 'FGHI' 'JKLM' 'NOPQ' ...
______ ______ ______ ______
735602 | 123456 112131 000000 415161
735603 | 789101 000000 000000 718192
735604 | 000000 000000 000000 000000
735605 | 000000 000000 000000 021232
...
735963
Note that the dimensions of my matrix are pre-specified and they do not depend on the size of any of my input vectors because not all tickers are contained in my input vectors; similarly, not all dates are represented in my input vectors. All coefficients of the matrix whose volume value is not contained in the input vectors should be set to 0.
I have been experimenting with loops and conditions to the point where I have gotten quite confused. I am sure for someone with a more advanced knowledge this is a rather basic task. Any suggestions on how to approach this are greatly appreciated!
This question is related to this previous one. Thanks to the solution found for this question, each entry in volume can be identified unambiguously by the corresponding ticker and date so this is no longer an issue.
First, you need to convert the 'ticker' string to a numeric index into the table.
You can do that using unique:
[~,~,tIdx] = unique( ticker );
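Now you can use accumarray to accumulate the volume information into a 2D table (this assumes date has already been shifted to row indices 1..T, for example date - 735601 for the range shown above):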
myTable = accumarray( {date, tIdx}, volume, [T N] );
Alternatively, if the entries to myTable are unique and do not require accumulation, you can use sub2ind:
myTable = zeros( T, N );
myTable( sub2ind([T,N], date, tIdx) ) = volume;
Another option (if the range of date is too large) is to save myTable as sparse matrix
myTable = sparse( date, tIdx, volume );
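To make the index bookkeeping explicit, here is a small sketch of the same rearrangement in Python rather than MATLAB. The function build_volume_table and its parameters are hypothetical. It assumes the full ticker list and the first and last dates are known up front, as the question states, and it sums volumes if a (date, ticker) pair occurs more than once.

```python
def build_volume_table(tickers, dates, volumes, all_tickers, first_date, last_date):
    """Return a T x N list of lists: one row per date, one column per ticker, zeros elsewhere."""
    column_of = {ticker: j for j, ticker in enumerate(all_tickers)}
    n_rows = last_date - first_date + 1
    table = [[0] * len(all_tickers) for _ in range(n_rows)]
    for ticker, date, volume in zip(tickers, dates, volumes):
        # Shift the date so that first_date maps to row 0.
        table[date - first_date][column_of[ticker]] += volume
    return table


# Sample vectors mirroring the question (021232 written as the integer 21232).
tickers = ["ABCD", "ABCD", "FGHI", "NOPQ", "NOPQ", "NOPQ"]
dates = [735602, 735603, 735602, 735602, 735603, 735605]
volumes = [123456, 789101, 112131, 415161, 718192, 21232]

table = build_volume_table(
    tickers, dates, volumes,
    all_tickers=["ABCD", "FGHI", "JKLM", "NOPQ"],
    first_date=735602,
    last_date=735605,
)
for row in table:
    print(row)
```

Because the column map is built from the full ticker list, a ticker with no trades (JKLM here) still gets its own all-zero column.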

how to pick a percentage of data matlab

Hi, I have a dataset, let's call it dataset A, which consists of 500x10 samples. I have another dataset B which holds the class label for each of those rows.
Dataset A | Dataset B
1 0.2 | 0.3 = Green
2 0.1 | 0.1 = Red
3 0.2 | 0.4 = Blue
and so on...
I want to choose a percentage of the blue and red rows from dataset A; let's call the result percentOfA. I also want a second dataset, taken from dataset B, that holds the matching colours for the rows chosen from A; let's call it resultOfA.
So the new dataset percentOfA would look like:
1 0.2 | 0.4
2 0.2 | 0.4
3 0.2 | 0.4
4 0.1 | 0.1
75% blue and 25% red, then the new resultOfA would look like this:
1 Blue.
2 Blue.
3 Blue.
4 Red.
How is this achieved in matlab?
Sorry, I would show code, but I can't find anything for this in the documentation.
NEW EDIT:
So I am a tad lost on how to explain this better. Dataset B contains 500x1 of colours, Blue, Green, Red etc
This dataset B matches dataset A but dataset A contains numerical values of what constitutes those colours.
All I want to do is use dataset B to pick 75% of the blue colour and the ones it picked it keeps track of the row number and then uses those row numbers to take the data out of dataset A and put it into a new dataset.
So that way my "newdataset" will just be 75% of the blue colour and or also 25% red from dataset A (the numerical values).
Look into using the crossvalind function in the Bioinformatics toolbox, specifically with the Group parameter. It will make your life a lot easier by not needing to code this functionality on your own.
In your case, you would simply do something like:
percentage = 75;
[train, test] = crossvalind('HoldOut', B, 1 - percentage/100);
percentOfA = A(train, :);
resultOfA = B(train, :);
First use find to determine the blue rows: idx=find(B==0.4);
Then randomly choose 75% of those blue rows:
n = numel(idx);
n75 = round(n*0.75);
r = randperm(n);
idx_blue = idx(r(1:n75));
The corresponding blue rows of A are: A_blue = A(idx_blue,:);
Repeat the steps to choose red rows.
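The find / randperm steps can also be written as one small helper. The following is a sketch in Python rather than MATLAB; sample_rows_of_class is a hypothetical name, the labels and data are made-up stand-ins for datasets B and A, and the random generator is seeded only so that the run is repeatable.

```python
import random


def sample_rows_of_class(labels, target, fraction, rng):
    """Return sorted row indices for a random `fraction` of the rows labelled `target`."""
    idx = [i for i, label in enumerate(labels) if label == target]
    # Note: Python's round() sends exact halves to the nearest even integer.
    k = round(len(idx) * fraction)
    return sorted(rng.sample(idx, k))


# Hypothetical stand-ins: labels plays the role of dataset B, data of dataset A.
labels = ["Blue"] * 8 + ["Red"] * 4 + ["Green"] * 4
data = [[i, i / 10] for i in range(len(labels))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
blue_idx = sample_rows_of_class(labels, "Blue", 0.75, rng)
red_idx = sample_rows_of_class(labels, "Red", 0.25, rng)

chosen = blue_idx + red_idx
percent_of_a = [data[i] for i in chosen]     # numeric rows taken from A
result_of_a = [labels[i] for i in chosen]    # matching colours taken from B

print(result_of_a)
```

Keeping the chosen row indices in one list is what lets the same selection be applied to both datasets.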
HTH.