Filtering two datasets of different sizes by matching timestamps in MATLAB? - matlab

I have two very large datasets in MATLAB, each with different parameters. The only common parameter is the timestamp: every parameter is measured at 10-minute intervals. For example:
Dataset 1 has a timestamp (YYYY-MM-DD, HH:MM:SS format) and power.
Dataset 2 has a timestamp (same format) and speed.
I want a new dataset that contains power and speed synchronized by timestamp. For example:
TimeStamp            P    S
2014-01-01, 00:10    100  5
            00:20         7
            00:30    150  10
            00:40    200
            00:50    145  12
            01:00    50   7
            01:10         6
etc.
In short, the final dataset should look like this:
TimeStamp  P    S
00:10      100  5
00:30      150  10
00:50      145  12
So if both power and speed exist for the same timestamp, the row should be kept; otherwise it should be filtered out.
Will this work if the two datasets have a different number of observations? Even then, I only want rows whose P and S match on timestamp in the final dataset; rows without a match should be excluded.
Can anyone help me do this in MATLAB? Thanks in advance.

You could try something like this:
% Type "help ismember" in the Command Window to see what the function does.
% Logical mask of the rows in dataset1 whose timestamp also exists in dataset2
indexPinS = ismember(dataset1(:,1), dataset2(:,1));
% Logical mask of the rows in dataset2 whose timestamp also exists in dataset1
indexSinP = ismember(dataset2(:,1), dataset1(:,1));
% Combine the matching rows into the final database.
% This assumes timestamps are unique and in the same order in both datasets.
finalDatabase = [dataset1(indexPinS,1), dataset1(indexPinS,2), dataset2(indexSinP,2)];
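For comparison, the same idea (an inner join on the timestamp) can be sketched in Python. This is only an illustration under assumptions, not part of the MATLAB answer: the names `power`, `speed` and `join_on_timestamp` are hypothetical, and the sample values are taken from the question, assuming the lone values at 00:20 and 01:10 are speeds and the one at 00:40 is a power. Because it looks rows up by key, it does not depend on row order or on both datasets having the same length.

```python
def join_on_timestamp(power, speed):
    """Keep only timestamps present in both mappings (an inner join)."""
    common = sorted(power.keys() & speed.keys())
    return [(ts, power[ts], speed[ts]) for ts in common]

# Sample values from the question; a missing reading simply has no entry.
power = {"00:10": 100, "00:30": 150, "00:40": 200, "00:50": 145, "01:00": 50}
speed = {"00:10": 5, "00:20": 7, "00:30": 10, "00:50": 12, "01:00": 7, "01:10": 6}

final = join_on_timestamp(power, speed)
for row in final:
    print(*row)
```

With this sample, 01:00 is also kept, since it has both a power and a speed value.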

Related

kdb - get column values n days ago

If I have a table of prices
t:([]date:2018.01.01+til 30;px:100+sums 30?(-1;1))
date px
2018.01.01 101
2018.01.02 102
2018.01.03 103
2018.01.04 102
2018.01.05 103
2018.01.06 102
2018.01.07 103
...
how do I compute the returns over n days? I am interested in both computing
(px[i] - px[i-n])/px[i-n] and (px[date] - px[date-n])/px[date-n], i.e. one where the column px is shifted n slots by index and one where the previous price is the price at date-n
Thanks for the help
Well you've pretty much got it right with the first one. To get the returns you can use this lambda:
{update return1:(px-px[i-x])%px[i-x] from t}[5]
For the date shift you can use an aj like this:
select date,return2:(px-pr)%pr from aj[`date;t;select date,pr:px from update date:date+5 from t]
Basically, the idea is to shift the dates by the number of days you want and then look up the price at the shifted date. The aj creates a table that looks something like this:
q)aj[`date;t;select date,pr:px from update date:date+5 from t]
date px pr
----------------
2018.01.01 99 98
2018.01.02 98 97
2018.01.03 97 98
Where px is the current price and pr is the price from 5 days earlier (the dates in the joined table were shifted forward by 5 days).
The return is then calculated in the usual way.
Hope this helps!
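The difference between the two kinds of return can be sketched in Python. This is a minimal sketch under assumptions: the function names and the sample prices are hypothetical, and the date-based version uses an exact lookup at date - n, whereas aj falls back to the last price at or before that date.

```python
from datetime import date, timedelta

def returns_by_index(prices, n):
    """(px[i] - px[i-n]) / px[i-n]; None where there is no row n slots back."""
    return [None if i < n else (prices[i] - prices[i - n]) / prices[i - n]
            for i in range(len(prices))]

def returns_by_date(dates, prices, n):
    """Same ratio, but the earlier price is looked up at exactly date - n days."""
    px_at = dict(zip(dates, prices))
    out = []
    for d, px in zip(dates, prices):
        prev = px_at.get(d - timedelta(days=n))
        out.append(None if prev is None else (px - prev) / prev)
    return out

# Hypothetical prices with a gap: 2018.01.04 is missing.
dates = [date(2018, 1, d) for d in (1, 2, 3, 5, 6)]
prices = [100.0, 102.0, 101.0, 104.0, 103.0]

by_index = returns_by_index(prices, 2)
by_date = returns_by_date(dates, prices, 2)
```

The two results only differ when dates are missing: for 2018.01.05 the index version compares against 2018.01.02 (two rows back), while the date version compares against 2018.01.03.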

SAS Placeholder value

I am looking to have a flexible importing structure into my SAS code. The import table from excel looks like this:
data have;
input Fixed_or_Floating $ asset_or_liability $ Base_rate_new;
datalines;
FIX A 10
FIX L Average Maturity
FLT A 20
FLT L Average Maturity
;
run;
The original dataset I'm working with looks like this:
data have2;
input ID Fixed_or_Floating $ asset_or_liability $ Base_rate;
datalines;
1 FIX A 10
2 FIX L 20
3 FIX A 30
4 FLT A 40
5 FLT L 30
6 FLT A 20
7 FIX L 10
;
run;
The placeholder "Average Maturity" exists in the Excel file only when the new interest rate is determined by the average maturity of the bond. I have a separate function for this which searches for and then left joins the new base rate depending on the closest interest rate. For example, if the bond matures in 10 years, I'll use a 10-year interest rate.
So my question is, how can I perform a simple merge, using similar code to this:
proc sort data = have;
by fixed_or_floating asset_or_liability;
run;
proc sort data = have2;
by fixed_or_floating asset_or_liability;
run;
data have3 (drop = base_rate);
merge have2 (in = a)
have (in = b);
by fixed_or_floating asset_or_liability;
run;
The problem at the moment is that my placeholder value isn't read in. I need it to be a word, because that is how the Excel lookup table works. I then use an if statement such as
if base_rate_new = "Average Maturity" then do;
(Insert existing Function Here)
end;
So I just need help importing the Excel data with the placeholder value, please and thank you.
TIA.
I'm not 100% sure whether this matches how your data looks once you import it from Excel, but if I run your code to create have, I get:
NOTE: Invalid data for Base_rate_new in line 145 7-13.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
145 FIX L Average Maturity
Fixed_or_Floating=FIX asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=2
NOTE: Invalid data for Base_rate_new in line 147 7-13.
147 FLT L Average Maturity
Fixed_or_Floating=FLT asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=4
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.HAVE has 4 observations and 3 variables.
Basically, the log says that SAS could not read the character strings as numeric values, so it left them as missing values. If we print the table we can see the missing values:
proc print data=have;
run;
Result:
Fixed_or_ asset_or_ Base_
Floating liability rate_new
FIX A 10
FIX L .
FLT A 20
FLT L .
Assuming this truly is what your data looks like, we can use the coalesce function to achieve your goal.
data have3 (drop = base_rate);
merge have2 (in = a)
have (in = b);
by fixed_or_floating asset_or_liability;
base_rate_new = coalesce(base_rate_new,base_rate);
run;
The result of doing this gives us this table:
Fixed_or_ asset_or_ Base_
ID Floating liability rate_new
1 FIX A 10
3 FIX A 10
2 FIX L 20
7 FIX L 20
4 FLT A 20
6 FLT A 20
5 FLT L 30
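The coalesce function returns the first non-missing value among the arguments you pass to it. So when base_rate_new already has a value it uses that, and if it doesn't it uses the base_rate field instead. Note that in a one-to-many merge the assigned value carries over to the remaining rows of the same BY group, which is why ID 7 shows 20 rather than its own base_rate of 10.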
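The coalesce behaviour itself can be illustrated with a short Python sketch. This is an assumption-laden analogy, not SAS: `coalesce` here is a hypothetical helper, `None` stands in for a SAS missing value, and the pairs are the (base_rate_new, base_rate) values for IDs 1, 2, 4 and 5 from the tables above.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are missing."""
    return next((v for v in values if v is not None), None)

# (base_rate_new from the lookup table, base_rate from the original data)
rows = [(10, 10), (None, 20), (20, 40), (None, 30)]
merged = [coalesce(new, old) for new, old in rows]
print(merged)
```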

Shift time series to start from zero H:M:S:MS (possibly in Matlab)

I have ECG data for a number of subjects. For each subject, I can export an Excel file with the RR interval, heart rate and other measures. The problem is that the timestamp starts at the time of recording (in this case 11:22:03:000).
I need to compare the data with other subjects and I want to automate the procedure in MATLAB.
I need to flexibly compare, for instance, the first 3 minutes of subjects in condition 1 with those of subjects in condition 2, or minutes 4 to 8 of conditions 1 and 2, and so forth. To do this, I think the best way is to shift the time vector for each subject so that it starts from 0.
One thing to note: I CANNOT create just one time vector for all subjects. That would be inaccurate because the heart measures vary for each individual.
So, IN SHORT, I need to shift the time vector for each participant so that it starts at 0 and increases exactly like the original one. So, in this example:
H: M: S: MS RR HR
11:22:03:000 0.809 74.1
11:22:03:092 0.803 74.7
11:22:03:895 0.768 78.1
11:22:04:663 0.732 81.9
11:22:05:395 0.715 83.9
11:22:06:110 0.693 86.5
11:22:06:803 0.705 85.1
11:22:07:508 0.706 84.9
11:22:08:214 0.749 80.1
11:22:08:963 0.762 78.7
11:22:09:725 0.766 78.3
would become:
00:00:00:000
00:00:00:092
00:00:00:895
00:00:01:663
and so forth...
I would like to do it in Matlab...
P.S.
I was working around the idea of extracting the info in 4 different variables.
Then, I could subtract the values for each cell from the first cell.
For instance:
11-11 = 0; 22-22=0; 03-03=0; ms: keep the same value
Maybe this could kind of work, except that it wouldn't if I have a subject that started, say, at 11:55:05:00
Thank you all for any help.
Gluce
Basic timestamp normalization just subtracts the minimum (or first, assuming they're properly ordered) time from the rest.
With MATLAB's datetime object, this is just subtraction, which yields a duration object:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
% Convert to datetime & normalize
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
nt = t - t(1);
% Reformat & display
nt.Format = 'hh:mm:ss.SSS';
Which returns:
>> nt
nt =
1×4 duration array
00:00:00.000 00:00:00.092 00:00:00.895 00:00:01.663
Alternatively, you can normalize the datetime array itself:
ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"];
t = datetime(ts, 'InputFormat', 'HH:mm:ss:SSS');
t.Format = 'HH:mm:ss:SSS';
[h, m, s] = hms(t);
[t.Hour, t.Minute, t.Second] = deal(h - h(1), m - m(1), s - s(1));
Which returns the same:
>> t
t =
1×4 datetime array
00:00:00:000 00:00:00:092 00:00:00:895 00:00:01:663
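The subtract-the-first-timestamp approach also handles the 11:55:05 case raised in the question, because the subtraction is done on whole timestamps rather than field by field. A minimal Python sketch of the same idea, where the `normalize` helper is hypothetical and the recording is assumed not to cross midnight (the stamps carry no date):

```python
from datetime import datetime, timedelta

def normalize(stamps, fmt="%H:%M:%S:%f"):
    """Return each timestamp's offset from the first one as a timedelta."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [t - times[0] for t in times]

ts = ["11:22:03:000", "11:22:03:092", "11:22:03:895", "11:22:04:663"]
offsets = normalize(ts)

# Express the offsets in whole milliseconds for easy comparison across subjects.
offsets_ms = [d // timedelta(milliseconds=1) for d in offsets]
print(offsets_ms)  # [0, 92, 895, 1663]
```

Offsets in this form can be filtered directly, for example keeping the samples whose offset is below `timedelta(minutes=3)` to get the first 3 minutes of a subject.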

sum on date of the month

I have daily data for a year:
25-Apr-17 45
26-Apr-17 50
27-Apr-17 53
28-Apr-17 47
29-Apr-17 34
30-Apr-17 66
01-May-17 10
02-May-17 42
03-May-17 22
04-May-17 65
05-May-17 76
06-May-17 35
I would like to sum the values within the month of a given date, up to and including that date. For example:
month sum as of date 03-May-17:
I would need to get 10+42+22 = 74 only.
Values from 04-May-17 onwards should not be included in the sum.
I have tried SUMPRODUCT and SUMIF, but neither seems to match this requirement.
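Assuming you want this in Excel or Google Sheets:
Put your dates in column A, your values in column B and your criteria date in E1.
Then put this formula in another cell (not in column A or B). ARRAYFORMULA is a Google Sheets function; in Excel, drop the wrapper and enter the formula as an array formula with Ctrl+Shift+Enter.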
=arrayformula(SUM(IF(DAY(A:A)<=DAY($E$1),IF(MONTH(A:A)=MONTH($E$1),B:B,0),0)))
Oddly enough, I couldn't use =IF(AND([range comparison],[range comparison])), which would be cleaner, but it seemed to evaluate as false. That is because AND collapses its range arguments into a single TRUE/FALSE value instead of comparing row by row.
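The logic of the formula can be checked with a small Python sketch. It is an illustration under assumptions: `month_to_date_sum` is a hypothetical helper, and unlike the formula it also compares the year, which matters once the data spans the same month in two different years. For the sample data, 10 + 42 + 22 comes to 74.

```python
from datetime import date

def month_to_date_sum(rows, as_of):
    """Sum values dated in the same month and year as as_of, up to and including it."""
    return sum(value for day, value in rows
               if (day.year, day.month) == (as_of.year, as_of.month) and day <= as_of)

# Daily data from the question.
rows = [
    (date(2017, 4, 25), 45), (date(2017, 4, 26), 50), (date(2017, 4, 27), 53),
    (date(2017, 4, 28), 47), (date(2017, 4, 29), 34), (date(2017, 4, 30), 66),
    (date(2017, 5, 1), 10), (date(2017, 5, 2), 42), (date(2017, 5, 3), 22),
    (date(2017, 5, 4), 65), (date(2017, 5, 5), 76), (date(2017, 5, 6), 35),
]

total = month_to_date_sum(rows, date(2017, 5, 3))
print(total)  # 74
```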

Error handling table generated by os.date() in Lua

table.concat(os.date("*t"), ":",4,6)
any idea why ^this^ or ˇthisˇ
test = os.date("*t")
table.concat(test, ":" , 4 , 6 )
does not work?
input:3: invalid value (nil) at index 4 in table for 'concat'
table.concat only works on the array part of a table (numeric indices), whereas os.date("*t") returns a table with named fields, such as:
hour 18
min 20
wday 1
day 2
month 3
year 2014
sec 49
yday 61
isdst false
Although this is not the answer to your direct question, I suspect what you are trying to do is retrieve the time separated by colons.
The best way to do that is os.date"%H:%M:%S"
The formatting options are very flexible and use the C strftime format.
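Since the format codes come from C's strftime, the same string works in other languages that wrap it. A short Python sketch for comparison, using a fixed, hypothetical moment that matches the fields listed above (2014-03-02 18:20:49), and also showing the manual alternative of joining named fields:

```python
from datetime import datetime

# Fixed moment chosen to match the example table (an assumption for illustration).
moment = datetime(2014, 3, 2, 18, 20, 49)

# Same strftime-style format string as in the Lua call.
formatted = moment.strftime("%H:%M:%S")

# Manual alternative: pick the named fields explicitly and join them.
fields = {"hour": moment.hour, "min": moment.minute, "sec": moment.second}
joined = ":".join(f"{fields[key]:02d}" for key in ("hour", "min", "sec"))

print(formatted, joined)
```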