kdb - get column values n days ago - kdb

If I have a table of prices
t:([]date:2018.01.01+til 30;px:100+sums 30?(-1;1))
date px
2018.01.01 101
2018.01.02 102
2018.01.03 103
2018.01.04 102
2018.01.05 103
2018.01.06 102
2018.01.07 103
...
how do I compute the returns over n days? I am interested in both computing
(px[i] - px[i-n])/px[i-n] and (px[date] - px[date-n])/px[date-n], i.e. one where the column px is shifted n slots by index and one where the previous price is the price at date-n
Thanks for the help

Well you've pretty much got it right with the first one. To get the returns you can use this lambda:
{update return1:(px-px[i-x])%px[i-x] from t}[5]
For the date shift you can use an aj like this:
select date,return2:(px-pr)%pr from aj[`date;t;select date,pr:px from update date:date+5 from t]
The idea is to shift each date forward by the number of days you want, then use an aj to look up, for every row of t, the most recent shifted row at or before that date. The joined table will look something like this:
q)aj[`date;t;select date,pr:px from update date:date+5 from t]
date px pr
----------------
2018.01.01 99 98
2018.01.02 98 97
2018.01.03 97 98
Where px is your price now and pr is your price from 5 days earlier.
Then the return is calculated in the usual way.
Hope this helps!
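For readers who want to check the two definitions outside q, here is a minimal Python sketch of the same logic. The function names and the sample data are hypothetical, and the date-based version assumes the dates are sorted ascending, mirroring the as-of lookup that aj performs.

```python
from bisect import bisect_right
from datetime import date, timedelta


def returns_by_index(px, n):
    """(px[i] - px[i-n]) / px[i-n]; None where no row exists n slots back."""
    return [
        None if i < n else (px[i] - px[i - n]) / px[i - n]
        for i in range(len(px))
    ]


def returns_by_date(dates, px, n):
    """Return against the latest price dated at or before (date - n days).

    Assumes `dates` is sorted ascending, as an as-of join requires.
    """
    out = []
    for d, p in zip(dates, px):
        # Index of the last row whose date is <= d - n days, or -1 if none.
        k = bisect_right(dates, d - timedelta(days=n)) - 1
        out.append(None if k < 0 else (p - px[k]) / px[k])
    return out


# Hypothetical sample with a gap (no row for 2018.01.03).
dates = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 4)]
px = [100.0, 110.0, 121.0]
by_index = returns_by_index(px, 1)
by_date = returns_by_date(dates, px, 2)
```

The two results differ as soon as the dates have gaps: the index version always looks n rows back, while the date version looks n calendar days back and falls back to the latest earlier row.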

SAS Placeholder value

I am looking to have a flexible importing structure into my SAS code. The import table from excel looks like this:
data have;
input Fixed_or_Floating $ asset_or_liability $ Base_rate_new;
datalines;
FIX A 10
FIX L Average Maturity
FLT A 20
FLT L Average Maturity
;
run;
The original dataset I'm working with looks like this:
data have2;
input ID Fixed_or_Floating $ asset_or_liability $ Base_rate;
datalines;
1 FIX A 10
2 FIX L 20
3 FIX A 30
4 FLT A 40
5 FLT L 30
6 FLT A 20
7 FIX L 10
;
run;
The placeholder "Average Maturity" appears in the Excel file only when the new interest rate is determined by the average maturity of the bond. I have a separate function for this, which searches for the closest interest rate and then left joins the new base rate. For example, if the bond matures in 10 years, I'll use a 10-year interest rate.
So my question is, how can I perform a simple merge, using similar code to this:
proc sort data = have;
by fixed_or_floating asset_or_liability;
run;
proc sort data = have2;
by fixed_or_floating asset_or_liability;
run;
data have3 (drop = base_rate);
merge have2 (in = a)
have (in = b);
by fixed_or_floating asset_or_liability;
run;
The problem at the moment is that my placeholder value isn't read in. I need it to be a word, because that is how the Excel lookup table works. I then use an if statement such as
if base_rate_new = "Average Maturity" then do;
(Insert existing Function Here)
end;
So I only need help with importing the Excel file with the placeholder value, please and thank you.
TIA.
I'm not 100% sure if this behaviour corresponds with how your data appears once you import it from excel but if I run your code to create have I get:
NOTE: Invalid data for Base_rate_new in line 145 7-13.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
145 FIX L Average Maturity
Fixed_or_Floating=FIX asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=2
NOTE: Invalid data for Base_rate_new in line 147 7-13.
147 FLT L Average Maturity
Fixed_or_Floating=FLT asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=4
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.HAVE has 4 observations and 3 variables.
Basically it's saying that when you tried to import the character strings as numeric it couldn't do it so it left them as null values. If we print the table we can see the null values:
proc print data=have;
run;
Result:
Fixed_or_ asset_or_ Base_
Floating liability rate_new
FIX A 10
FIX L .
FLT A 20
FLT L .
Assuming this truly is what your data looks like then we can use the coalesce function to achieve your goal.
data have3 (drop = base_rate);
merge have2 (in = a)
have (in = b);
by fixed_or_floating asset_or_liability;
base_rate_new = coalesce(base_rate_new,base_rate);
run;
The result of doing this gives us this table:
Fixed_or_ asset_or_ Base_
ID Floating liability rate_new
1 FIX A 10
3 FIX A 10
2 FIX L 20
7 FIX L 20
4 FLT A 20
6 FLT A 20
5 FLT L 30
The coalesce function basically returns the first non-null value it can find in the parameters you pass to it. So when base_rate_new already has a value it uses that, and if it doesn't it uses the base_rate field instead.

In Tableau, I want to hide alternate rows

In Tableau, I want to hide the values in alternate rows.
I have two columns, ID and Name:
id name
101 x
102 y
103 z
104 a
Now I want:
id name
101 x
102
103 z
104
I don't believe there is a simple way to do this; Tableau has no built-in option for "alternate rows".
Given the sample data provided though, you could create a calculated field to determine the sequence of the data you are displaying. For example, the table functions RANK_UNIQUE or INDEX could sequentially assign an integer to your data within the partition. With that number in a field called [Sequence_Number], you can then use
[Even_or_odd]:
IF [Sequence_Number] % 2 = 0
THEN "Even"
ELSE "Odd"
END
You could then use this calculated field to drive a second one along the following lines, which blanks the even rows (102 and 104 in your example):
[Display_Value]:
IF [Even_or_odd] = "Even"
THEN NULL
ELSE [Value]
END

sum on date of the month

I have daily data for a year:
25-Apr-17 45
26-Apr-17 50
27-Apr-17 53
28-Apr-17 47
29-Apr-17 34
30-Apr-17 66
01-May-17 10
02-May-17 42
03-May-17 22
04-May-17 65
05-May-17 76
06-May-17 35
I would like to sum the values within the month of a given date, up to and including that date, i.e.:
month sum as of date 03-May-17;
I would need to get 10+42+22 = 74 only.
Values from 04-May-17 onwards should not be included in the sum.
I have tried SUMPRODUCT and SUMIF, but neither seems to match this requirement.
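Assuming you want this in Excel or Google Sheets:
Put your dates in column A, your values in column B and your criteria date in E1.
Put this formula in another cell (not in column A or B). ARRAYFORMULA is Google Sheets syntax; in Excel, drop that wrapper and enter the rest as an array formula.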
=arrayformula(SUM(IF(DAY(A:A)<=DAY($E$1),IF(MONTH(A:A)=MONTH($E$1),B:B,0),0)))
Oddly enough, I couldn't use =IF(AND([range of values and compare],[range of values and compare])).
That would be cleaner, but AND collapses its array arguments into a single TRUE/FALSE instead of evaluating row by row, so it evaluated as false.
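To make the filter explicit, here is a minimal Python sketch of the same month-to-date sum. The function name is hypothetical and the rows are a subset of the question's data. Unlike the spreadsheet formula, this version also compares the year, which is an assumption added here in case the data ever spans more than twelve months.

```python
from datetime import date


def month_to_date_sum(rows, as_of):
    """Sum values dated in as_of's month and year, up to and including as_of."""
    return sum(
        value
        for day, value in rows
        if (day.year, day.month) == (as_of.year, as_of.month) and day <= as_of
    )


# Subset of the question's data around the month boundary.
rows = [
    (date(2017, 4, 29), 34),
    (date(2017, 4, 30), 66),
    (date(2017, 5, 1), 10),
    (date(2017, 5, 2), 42),
    (date(2017, 5, 3), 22),
    (date(2017, 5, 4), 65),
]
total = month_to_date_sum(rows, date(2017, 5, 3))
```

Only the three May rows up to 03-May-17 contribute to `total`; the April rows and 04-May-17 are filtered out.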

how to handle type F in a table in Q/KDB

I started learning q/kdb+ a while ago, so forgive me in advance for a trivial question, but I am facing the following problem that I don't know how to solve.
I have a table named "res" showing the side, sum of orders and average price for some symbols
sym side | sum_order avg_price
----------| -------------------
ALPHA B | 95109 9849.73
ALPHA S | 91662 9849.964
BETA B | 47 9851.638
BETA S | 60 9853.383
with these types
c | t f a
---------| -----
sym | s p
side | s
sum_order| f
avg_price| f
I would like to calculate the closed and open positions, the average points made by the closed position, and the average price of the open position.
I have used this query, which I believe is pretty bizarre (I am sure there is a more professional way to do it), but it works as expected
position_summary:select
close_position:?[prev[sum_order]>sum_order;sum_order;prev[sum_order]],
average_price:avg_price-prev[avg_price],
open_pos:prev[sum_order]-sum_order,
open_wavgprice:?[sum_order>next[sum_order];avg_price;next[avg_price]][0]
by sym from res
giving me the following table
sym | close_position average_price open_pos open_wavgprice
----------| ----------------------------------------------------
ALPHA | 91662 0.2342456 3447 9849.73
BETA | 47 1.745035 -13 9853.38
and types are
c | t f a
--------------| -----
sym | s s
close_position| F
average_price | F
open_pos | F
open_wavgprice| f
Now my problem starts here. Imagine I join the position_summary table with another table, appending another column "current_price" of type f.
What I want to do is determine the points of the open positions.
I have tried it this way:
select
?[open_pos>0;open_price-open_wavgprice;open_wavgprice-open]
from position_summary
but I got a 'type error,
surely because sum_order is type F and open_wavgprice and current_price are f. I have searched the internet but did not find much about type F.
First: how can I handle this? I have tried "cast" and "raze" but with no effect, and moreover I am not sure they are right for this particular case.
Second: is there a better way to use "if-then" when querying tables (for example, in plain English: if this row of this column, then take the previous/next of another column, or the second or third previous/next of a column)?
Thank you for your help
Let me rephrase your question using a slightly simpler table:
q)show res:([sym:`A`A`B`B;side:`B`S`B`S]size:95 91 47 60;price:49.7 49.9 51.6 53.3)
sym side| size price
--------| ----------
A B | 95 49.7
A S | 91 49.9
B B | 47 51.6
B S | 60 53.3
You are trying to find the closing position for each symbol using a query like this:
q)show summary:select close:?[prev[size]>size;size;prev[size]] by sym from res
sym| close
---| -----
A | 91
B | 47
The result seems to have one number in each row of the "close" column, but in fact it has two. You may notice an extra space before each number in the display above or you can display the first row
q)first 0!summary
sym | `A
close| 0N 91
and see that the first row in the "close" column is 0N 91. Since the missing values such as 0N are displayed as a space, it was hard to see them in the earlier display.
It is not hard to understand how you've got these two values. Since you select by sym, each column gets grouped by symbol and for the symbol A, you have
q)show size:95 91
95 91
and
q)prev size
0N 95
that leads to
q)?[prev[size]>size;size;prev[size]]
0N 91
(Recall that 0N is smaller than any other integer.)
As a side note, ?[a>b;b;a] is element-wise minimum and can be written as a & b in q, so your conditional expression could be written as
q)size & prev size
0N 91
Now we can see why ? gave you the type error
q)close:exec close from summary
q)close
91
47
While the display is deceiving, "close" above is a list of two vectors:
q)first close
0N 91
and
q)last close
0N 47
The vector conditional does not support that:
q)?[close>0;10;20]
'type
[0] ?[close>0;10;20]
^
One can probably cure that by using each:
q)?[;10;20]each close>0
20 10
20 10
But I don't think this is what you want. Your problem started when you computed the summary table. I would expect the closing position to be the sum of "B" orders minus the sum of "S" orders that can be computed as
q)select close:sum ?[side=`B;size;neg size] by sym from res
sym| close
---| -----
A | 4
B | -13
Now you should be able to fix the rest of the columns in your summary query. Just make sure that you use an aggregation function such as sum in the expression for every column.
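As a cross-check of the arithmetic, the same aggregation can be sketched in Python. The function name is hypothetical and the rows are the simplified res table from above: buys add to the position, sells subtract, and each symbol ends up with a single number rather than a list.

```python
from collections import defaultdict


def net_position(orders):
    """orders: iterable of (sym, side, size); side is 'B' (buy) or 'S' (sell)."""
    net = defaultdict(int)
    for sym, side, size in orders:
        # Buys increase the position, sells decrease it.
        net[sym] += size if side == "B" else -size
    return dict(net)


# Rows of the simplified res table.
orders = [("A", "B", 95), ("A", "S", 91), ("B", "B", 47), ("B", "S", 60)]
closing = net_position(orders)
```

Because every symbol is reduced to one value, the result corresponds to a flat column, which is the property the aggregated q query relies on.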
Type F means each "cell" in the column contains a vector of floats rather than an atom. So your column is actually a vector of vectors rather than a flat vector.
In your case each cell holds one value per side, and prev leaves a null in the first slot, so the number you want is the last element. You could just do:
select last each close_position, last each average_price.....
which will give you a type f.
I'm not 100% on what you were trying to do in the first query, and I don't have a q terminal to hand to check but you could put this into your query:
select close_position:?[prev[sum_order]>sum_order;last sum_order; last prev[sum_order].....
i.e. get the last sum_order in the list.

two different size of dataset filtering via considering timestamp using matlab?

I have two very large datasets in MATLAB, each with different parameters. The only common parameter is the timestamp: every parameter is measured at 10-minute intervals. For example:
In dataset 1, I have a timestamp (YYYY-MM-DD, HH:MM:SS format) and power.
In dataset 2, I again have a timestamp (same format) and speed.
I want a new dataset that has power and speed synchronized by timestamp. For example:
TimeStamp              P     S
2014-01-01, 00:10    100     5
            00:20            7
            00:30    150    10
            00:40    200
            00:50    145    12
            01:00     50     7
            01:10            6
etc.
So in short the output of the final dataset must be like:
TimeStamp    P     S
00:10      100     5
00:30      150    10
00:50      145    12
So basically, if I have both power and speed for the same time, that row should be kept; otherwise it should be filtered out.
Will this work if the two datasets have a different number of observations? Even if they do, I want the final dataset to contain only the rows where P and S match on timestamp, and to exclude the rows where they do not.
Can anyone help me with this in MATLAB? Thanks in advance.
You could try something like this:
% Type "help ismember" in the command window to see what the function does.
% Assumes column 1 of each dataset holds the timestamp as a number (e.g. a datenum),
% with no duplicate timestamps and both datasets sorted by time.
% Logical index of timestamps in dataset1 that also exist in dataset2
indexPinS = ismember(dataset1(:,1),dataset2(:,1));
% Logical index of timestamps in dataset2 that also exist in dataset1
indexSinP = ismember(dataset2(:,1),dataset1(:,1));
% Combine the matching rows: timestamp, power, speed
finalDatabase = [dataset1(indexPinS,1), dataset1(indexPinS,2), dataset2(indexSinP,2)];
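The same idea, keeping only timestamps present in both datasets, can be sketched in Python as an inner join. The function name and the sample values are hypothetical (loosely based on the question's example), and timestamps are assumed to be unique within each dataset. The datasets do not need to be the same size.

```python
def join_on_timestamp(power, speed):
    """Keep only timestamps present in both inputs (an inner join).

    power, speed: lists of (timestamp, value) pairs; timestamps are assumed
    unique within each list. Output follows the order of `power`.
    """
    speed_by_ts = dict(speed)
    return [(ts, p, speed_by_ts[ts]) for ts, p in power if ts in speed_by_ts]


# Hypothetical samples of different sizes.
power = [("00:10", 100), ("00:30", 150), ("00:40", 200), ("00:50", 145)]
speed = [("00:10", 5), ("00:20", 7), ("00:30", 10), ("00:50", 12), ("01:10", 6)]
joined = join_on_timestamp(power, speed)
```

Looking up speed by timestamp, rather than relying on row position, avoids misaligned rows when the two inputs are ordered differently.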