kdb - combine two timeseries into one

I have data in a table with the following schema: date, time, sym, book, pnl.
This is a timeseries; the sym and book columns together define each series.
I have a special use case where I need to produce another timeseries that combines two books.
If this weren't a timeseries, this would be fairly easy: sum by book/sym, filter on the books I want to combine, and sum again with the new book name (a constant).
But I'm not sure how to create a timeseries with a single book value that is the combination of the two at any given time, i.e. at the distinct times across both books.
It's important to note that the timeseries isn't uniform: the times are "random" for each book/sym combination.
t: ([] date: 4#.z.D; time: (07:00; 07:00; 07:01; 07:02); sym: `x`x`x`x; book: `book1`book2`book2`book1; v: (100; 0; 200; 200))
c: ([] date: 3#.z.D; time: (07:00; 07:01; 07:02); sym: `x`x`x; book: `newbook; v: (100; 300; 400))

Assuming from your expected output that you want to know the total holdings across multiple books at any given time, I think this should fit your purpose.
q)select date,time,sym,book:`newbook,v:sum each vb from update vb:@[;;:;]\[()!();book;v]from t
date time sym book v
--------------------------------
2020.12.22 07:00 x newbook 100
2020.12.22 07:00 x newbook 100
2020.12.22 07:01 x newbook 300
2020.12.22 07:02 x newbook 400
This solution uses a scan (\) to build, row by row, a dictionary of the most recent value for each book, and then sums the values in each dictionary. A distinct may need to be added in case there are any rows where nothing has changed.
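To make the running-dictionary idea explicit, here is a minimal sketch in Python rather than q. It mirrors the example table t from the question. Keying the dictionary by sym as well as book, and keeping only the last total per time, are assumptions that go beyond the one-liner above.

```python
# Rows mirror the question's table t: (time, sym, book, v).
rows = [
    ("07:00", "x", "book1", 100),
    ("07:00", "x", "book2", 0),
    ("07:01", "x", "book2", 200),
    ("07:02", "x", "book1", 200),
]

def combine_books(rows, new_book="newbook"):
    """Carry forward the latest v per (sym, book); emit the per-sym total after each row."""
    latest = {}  # (sym, book) -> most recent v seen so far
    out = []
    for time, sym, book, v in rows:
        latest[(sym, book)] = v
        total = sum(val for (s, _), val in latest.items() if s == sym)
        out.append((time, sym, new_book, total))
    return out

combined = combine_books(rows)

# Assumption: when several rows share a time, only the last total is wanted.
dedup = {}
for row in combined:
    dedup[(row[0], row[1])] = row
last_per_time = list(dedup.values())

print([r[3] for r in combined])       # [100, 100, 300, 400]
print([r[3] for r in last_per_time])  # [100, 300, 400]
```

The second list matches the v column of the expected table c in the question.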

Related

How to get Number of days between 2 dates when specific condition is met?

I want to get the number of days between two dates based on a specific condition; here is the image illustration of what I am talking about:
I need to devise a formula that calculates, from the dates in column C, the number of days it takes ID = 1 to get from L1 to L2, so ideally the output for ID = 1 should be:
L1 : 0
L2 : 2022-07-14 - 2022-07-06 = 8
Same for other ids (2,3). I am just a beginner trying to learn, so I apologize for my ordinary question. Thank you

DAYS gives you the day count between two dates. Try:
=DAYS(SINGLE(FILTER(C:C; B:B="L2"; A:A=1));
SINGLE(FILTER(C:C; B:B="L1"; A:A=1)))
=DAYS(SINGLE(FILTER(C:C; B:B="L2"; A:A=2));
SINGLE(FILTER(C:C; B:B="L1"; A:A=2)))
=DAYS(SINGLE(FILTER(C:C; B:B="L2"; A:A=3));
SINGLE(FILTER(C:C; B:B="L1"; A:A=3)))
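For readers who want to check the arithmetic outside a spreadsheet, the same lookup can be sketched in Python. The row layout (id, level, date) is an assumption standing in for columns A, B and C, and only the two dates quoted in the question for ID 1 are used.

```python
from datetime import date

# (id, level, date) rows; only ID 1's dates are taken from the question.
rows = [
    (1, "L1", date(2022, 7, 6)),
    (1, "L2", date(2022, 7, 14)),
]

def days_between_levels(rows, target_id, start="L1", end="L2"):
    """Day count from the first `start` row to the first `end` row for one ID."""
    first = {}
    for id_, level, d in rows:
        if id_ == target_id:
            first.setdefault(level, d)  # keep the first match, like SINGLE(FILTER(...))
    return (first[end] - first[start]).days

print(days_between_levels(rows, 1))  # prints 8
```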

Make a list with the quarter and year based on a date range of quarters KDB+/Q

I have a list of date ranges for the past 8 quarters given by the below function
q) findLastYQuarters:{reverse("d"$(-3*til y)+m),'-1+"d"$(-3*-1+til y)+m:3 bar"m"$x}[currentDate;8]
q) findLastYQuarters
2020.01.01 2020.03.31
2020.04.01 2020.06.30
2020.07.01 2020.09.30
2020.10.01 2020.12.31
2021.01.01 2021.03.31
2021.04.01 2021.06.30
2021.07.01 2021.09.30
2021.10.01 2021.12.31
I need to produce a separate list that labels each item in this list by a specific format; the second list would need to be
1Q20,2Q20,3Q20,4Q20,1Q21,2Q21,3Q21,4Q21
This code needs to be able to run on its own, so how can I take the first list as an input and produce the second list? I thought about casting the latter date in the range as a month and dividing it by 3 to get the quarter and extracting the year, but I couldn't figure out how to actually implement that. Any advice would be much appreciated!

I'm sure there are many ways to solve this; a function like f defined below would do the trick:
q)f:{`$string[1+mod[`month$d;12]%3],'"Q",/:string[`year$d:x[;0]][;2 3]}
q)lyq
2020.01.01 2020.03.31
2020.04.01 2020.06.30
2020.07.01 2020.09.30
2020.10.01 2020.12.31
2021.01.01 2021.03.31
2021.04.01 2021.06.30
2021.07.01 2021.09.30
2021.10.01 2021.12.31
q)f lyq
`1Q20`2Q20`3Q20`4Q20`1Q21`2Q21`3Q21`4Q21
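The arithmetic behind the label (month to quarter, plus the last two digits of the year) may be easier to follow in a small Python sketch. The function name quarter_label is made up for illustration; the date ranges are the ones listed in the question.

```python
from datetime import date

def quarter_label(d):
    """Format a date as '<quarter>Q<two-digit year>', e.g. 2020-03-31 -> '1Q20'."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return f"{quarter}Q{d.year % 100:02d}"

ranges = [
    (date(2020, 1, 1), date(2020, 3, 31)),
    (date(2020, 4, 1), date(2020, 6, 30)),
    (date(2020, 7, 1), date(2020, 9, 30)),
    (date(2020, 10, 1), date(2020, 12, 31)),
    (date(2021, 1, 1), date(2021, 3, 31)),
    (date(2021, 4, 1), date(2021, 6, 30)),
    (date(2021, 7, 1), date(2021, 9, 30)),
    (date(2021, 10, 1), date(2021, 12, 31)),
]

# Either end of a range gives the same label, since both fall in the same quarter.
labels = [quarter_label(start) for start, _ in ranges]
print(",".join(labels))  # 1Q20,2Q20,3Q20,4Q20,1Q21,2Q21,3Q21,4Q21
```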

Figured it out.
/ keep the end date of each quarter range
crop:findLastYQuarters[;1];
/ split the month (e.g. "2020.03") into year and month, then build a label such as "1Q20"
labelingFunc:{[r] temp:"." vs string "m"$r; (string ("J"$temp 1) div 3),"Q",temp[0] 2 3};
labels:labelingFunc each crop;
labels

MATLAB drop observations from a timetable not contained in another timetable

I have two timetables, each with 4 columns, of which the first 2 are of particular interest. The first column is a date and the second is an hour.
How can I find out which observations (by date and hour) are in timetable 1 but not in timetable 2 and, therefore, drop those observations from timetable 1?
For example, just by looking I realized that timetable1 includes the day 25/05/2015 with hours 1 and 2, but timetable 2 does not, so I would like to drop those observations from timetable 1.
I tried the command groups_timetable1 = findgroups(timetable1.Date,timetable1.Hour); but unfortunately it does not tell you much about how to distinguish between observations.
Thank you!

Call ismember to find one set of data in another.
To match whole records (rows) against another set of composite records, call ismember(..., 'rows').
For example:
baseline=[
100, 2.1
200, 7.5
120, 11.0
];
isin=ismember(baseline,[200, 7.5],'rows');
pos=find(isin)
If you have date/time strings or datetime objects, convert them to numerical values first, for example by calling datenum or posixtime.

You can use the timetable method innerjoin to do this. Like so:
% Fabricate some data
dates1 = datetime(2015, 5, ones(10,1));
hours1 = (1:10)';
timetable1 = timetable(dates1(:), hours1, rand(10,1), rand(10,1), ...
'VariableNames', {'Hour', 'Price', 'Volume'});
% Subselect a few rows for timetable2
timetable2 = timetable1([1:3, 6:10],:);
% Use innerjoin to pick rows where Time & Hour intersect:
innerjoin(timetable1, timetable2, 'Keys', {'Time', 'Hour'})
By default, the result of innerjoin contains the table variables from both input tables - that may or may not be what you want.
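If only the rows of timetable 1 are wanted, with none of timetable 2's variables attached, the operation is a filter on the (date, hour) key. The following Python sketch shows that logic under stated assumptions: the tuple layout (date, hour, price) and the price values are hypothetical, and only the 25/05/2015 hours 1 and 2 case comes from the question.

```python
from datetime import date

# Hypothetical rows: (date, hour, price). Hours 1 and 2 are absent from table2.
table1 = [
    (date(2015, 5, 25), 1, 10.0),
    (date(2015, 5, 25), 2, 11.0),
    (date(2015, 5, 25), 3, 12.0),
]
table2 = [
    (date(2015, 5, 25), 3, 99.0),
]

def keep_matching(left, right):
    """Keep the rows of `left` whose (date, hour) key also appears in `right`."""
    keys = {(row[0], row[1]) for row in right}
    return [row for row in left if (row[0], row[1]) in keys]

filtered = keep_matching(table1, table2)
print([row[1] for row in filtered])  # [3]
```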

Shift time series to start from zero H:M:S:MS (possibly in Matlab)

I have some ECG data for a number of subjects. For each subject, I can export an Excel file with the RR interval, heart rate and other measures. The problem is that the timestamp starts at the time of recording (in this case 11:22:03:000).
I need to compare the data with other subjects and I want to automate the procedure in Matlab.
I need to flexibly compare, for instance, the first 3 minutes of subjects in condition 1 with those of subjects in condition 2, or minutes 4 to 8 of conditions 1 and 2, and so forth. To do this, I think the best way is to shift the time vector for each subject so that it starts from 0.
One constraint to note: I CANNOT create just one time vector for all subjects. This would be inaccurate because the heart measures vary for each individual.
So, IN SHORT, I need to shift the time vector for each participant so that it starts at 0 and increases exactly like the original one. So, in this example:
H:M:S:MS RR HR
11:22:03:000 0.809 74.1
11:22:03:092 0.803 74.7
11:22:03:895 0.768 78.1
11:22:04:663 0.732 81.9
11:22:05:395 0.715 83.9
11:22:06:110 0.693 86.5
11:22:06:803 0.705 85.1
11:22:07:508 0.706 84.9
11:22:08:214 0.749 80.1
11:22:08:963 0.762 78.7
11:22:09:725 0.766 78.3
would become:
00:00:00:000
00:00:00:092
00:00:00:895
00:00:01:663
and so forth...
I would like to do it in Matlab...
P.S.
I was working around the idea of extracting the info in 4 different variables.
Then, I could subtract the values for each cell from the first cell.
For instance:
11-11 = 0; 22-22=0; 03-03=0; ms: keep the same value
Maybe this could kind of work, except that it wouldn't if I have a subject that started, say, at 11:55:05:00
Thank you all for any help.
Gluce

Basic timestamp normalization just subtracts the minimum (or first, assuming they're properly ordered) time from the rest.
With MATLAB's datetime object, this is just subtraction, which yields a duration object:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
% Convert to datetime & normalize
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
nt = t - t(1);
% Reformat & display
nt.Format = 'hh:mm:ss.SSS';
Which returns:
>> nt
nt =
1×4 duration array
00:00:00.000 00:00:00.092 00:00:00.895 00:00:01.663
Alternatively, you can normalize the datetime array itself:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
[h, m, s] = hms(t);
[t.Hour, t.Minute, t.Second] = deal(h - h(1), m - m(1), s - s(1));
Which returns the same:
>> t
t =
1×4 datetime array
00:00:00:000 00:00:00:092 00:00:00:895 00:00:01:663
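The subtraction approach can also be sketched in Python, which shows why it stays correct when the recording starts at an arbitrary time such as 11:55:05:000: the difference is taken on the whole timestamp, not field by field. The helper names shift_to_zero and format_elapsed are made up for this illustration.

```python
from datetime import datetime, timedelta

stamps = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"]

def shift_to_zero(stamps, fmt="%H:%M:%S:%f"):
    """Parse H:M:S:ms strings and return each one's offset from the first."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [t - times[0] for t in times]

def format_elapsed(delta):
    """Render a timedelta as HH:MM:SS:mmm."""
    total_ms = delta // timedelta(milliseconds=1)  # exact integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{ms:03d}"

elapsed = [format_elapsed(d) for d in shift_to_zero(stamps)]
print(elapsed)
# ['00:00:00:000', '00:00:00:092', '00:00:00:895', '00:00:01:663']
```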

for loop with structure+matlab

I have a structure named sacfile which holds data for various stations (sta1-sta6). The sacfile is further broken up into day increments (sacfile.day, per station), and each day is broken into hourly increments (sacfile.day.hour). I would like to loop through each day and then through each hour, comparing stations (i.e., for day 032, compare sta1 hr 1 with sta2 hr 1, sta3 hr 1, sta4 hr 1, sta5 hr 1, sta6 hr 1, and so on through all the hours of that day, then move on to the next day, etc.). The stations are defined in sacfile.sta. Does anyone have a suggestion for how I can do this simply?
*I only want to compare the same day and hour across stations, then move on to the subsequent day and hour. I don't want to cross-compare different days and hours. This is important for the loop.
I tried the following:
for i = 1:length(sacfile)
for j = 1:length(sacfile(i,1).day)
for h = 1:length(sacfile(i,1).day.hour)
but that seems to loop through every hour point. Will this work? How can I be sure it's looping through the correct days, i.e., that day 1 for sta1 is the same day as day 1 for sta2?
Here's an example of one of the structures:
name: '2013.032.00.00.00.0000.TA.POKR..BHE.sac'
date: '31-Mar-2014 12:25:33'
bytes: 11949036
isdir: 0
datenum: 7.3569e+05
net: 'TA'
sta: 'POKR'
loc: ''
comp: 'BHE'
day: [1x1 struct]
data: [2987101x1 double]
time: [1x2987101 double]
header: [1x1 struct]
The only relevant ones are net, sta, loc, comp, day and data. The net, sta, loc and comp fields are the key identifiers for the file. The name is the name of the file. Day has the data broken up into hours within it. Make sense?

If I understood your problem well, the functions extractfield() and fieldnames() should help.
fields = fieldnames(sacfile);
for i = 1:numel(fields)
b = extractfield(sacfile.(fields{i}).day, 'day3');
c(i) = extractfield(b{1}.hour, 'hour_x');
end
The function extractfield() returns a 1x1 cell containing a structure instead of the structure itself. That is why I use b{1}.
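The looping order asked for in the question (day first, then hour, then station pairs) can be sketched in Python. The nested layout data[station][day][hour] and the sample values are hypothetical and stand in for the sacfile structure. Keying by the day label instead of by position is one way to be sure that the same day is compared across stations.

```python
from itertools import combinations

# Hypothetical layout: data[station][day][hour] -> list of samples.
data = {
    "sta1": {"032": {0: [1.0], 1: [2.0]}},
    "sta2": {"032": {0: [3.0], 1: [4.0]}},
    "sta3": {"032": {0: [5.0]}},
}

def same_slot_pairs(data):
    """Yield (day, hour, station_a, station_b) for stations sharing that day and hour."""
    days = sorted({day for sta in data.values() for day in sta})
    for day in days:
        hours = sorted({h for sta in data.values() for h in sta.get(day, {})})
        for hour in hours:
            # Only stations that actually have this day and hour take part.
            present = sorted(s for s in data if hour in data[s].get(day, {}))
            for sta_a, sta_b in combinations(present, 2):
                yield day, hour, sta_a, sta_b

pairs = list(same_slot_pairs(data))
for p in pairs:
    print(p)
```

With the sample data, hour 0 yields three station pairs and hour 1 yields one, because sta3 has no data for hour 1.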
