Run time series analysis for multiple series in MATLAB

I wish to apply the same data analysis to multiple time series. However, the number of series is variable, so instead of hard-coding each series' analysis, I would like to specify the number and names of the funds and then apply the same data manipulation to all of them before they are combined into a single portfolio.
Specifically, I have an Excel file where each worksheet is a time series: the first column contains dates and the second column contains prices. The dates for the funds may not correspond, so the individual worksheets must be filtered for dates that occur in all the funds before they are combined into one data set, with one column of dates and one further column per fund.
This combined data set is then analysed for means and variances etc.
I have worked out how to carry out the merging and the analysis (below), but I would like to know how I can simply add or remove funds (i.e. by adding or removing worksheets with individual fund data in the Excel file) without having to rewrite, add or remove MATLAB code.
% LOAD DATA
XL='XLData.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
% SPECIFY WORKSHEETS
Fund1Prices=3;
Fund2Prices=4;
% RETRIEVE VALUES
[Fund1values, ~, Fund1sheet] = xlsread(XL,Fund1Prices);
[Fund2values, ~, Fund2sheet] = xlsread(XL,Fund2Prices);
% EXTRACT DATES AND DATA AND COMBINE FOR FUND 1
% (TO REMOVE UNNECESSARY TEXT IN ROWS 1 TO 4)
Fund1_dates_data=Fund1sheet(4:end,1:2);
Fund1Dates= cellstr(datestr(datevec(Fund1_dates_data(:,1),formatIn),formatOut));
Fund1Data= cell2mat(Fund1_dates_data(:,2));
% EXTRACT DATES AND DATA AND COMBINE FOR FUND 2
% (TO REMOVE UNNECESSARY TEXT IN ROWS 1 TO 4)
Fund2_dates_data=Fund2sheet(4:end,1:2);
Fund2Dates= cellstr(datestr(datevec(Fund2_dates_data(:,1),formatIn),formatOut));
Fund2Data= cell2mat(Fund2_dates_data(:,2));
% CREATE TIME SERIES FOR EACH FUND
Fund1ts=fints(Fund1Dates,Fund1Data,'Fund1');
Fund2ts=fints(Fund2Dates,Fund2Data,'Fund2');
% CREATE PORTFOLIO
Port=merge(Fund1ts,Fund2ts,'DateSetMethod','Intersection');
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
% estimateAssetMoments treats its input as returns by default, so pass Returns
q = q.estimateAssetMoments(Returns)
[qassetmean, qassetcovar] = q.getAssetMoments

Based on an edit to the question, this answer was rewritten.
You can put your code into a function. This function can be saved as an .m-file and called from Matlab.
However, you want to replace the calls to specific worksheets (Fund1Prices=3) with an automated way of figuring out how many worksheets there are. Here's one way to do that in a function:
function [Returns,q,qassetmean,qassetcovar] = my_data_series_analysis(XL)
% All input this function requires is a variable
% containing the name of the xls-file you want to process
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
% Determine the number of worksheets in the xls-file:
[~,my_sheets] = xlsfinfo(XL);
Fundts = cell(1, numel(my_sheets));
% Counts the funds processed so far; this is needed to merge your
% portfolio correctly in case you do not start the for-loop at I=1
merge_count = 1;
% Loop through the sheets
% (change the start value if the first sheets do not contain data):
for I=1:numel(my_sheets)
    % RETRIEVE VALUES
    % note that Fund1Prices has been replaced with the loop variable, I
    [~, ~, FundSheet] = xlsread(XL,I);
    % EXTRACT DATES AND DATA AND COMBINE
    % (TO REMOVE UNNECESSARY TEXT IN ROWS 1 TO 4)
    Fund_dates_data = FundSheet(4:end,1:2);
    FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
        formatIn),formatOut));
    FundData = cell2mat(Fund_dates_data(:,2));
    % CREATE TIME SERIES FOR EACH FUND
    Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
    if merge_count == 1
        % first fund: the portfolio starts as this single series
        Port = Fundts{I};
    else
        % keep only the dates shared by all funds merged so far
        Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
    end
    merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
% estimateAssetMoments expects returns by default, so pass Returns
q = q.estimateAssetMoments(Returns);
[qassetmean, qassetcovar] = q.getAssetMoments;
This function will return the Returns, q, qassetmean and qassetcovar variables for all the worksheets in the xls-file you want to process. The variable XL should be specified like this:
XL = 'my_file.xls';
You can also loop over more than one xls-file, like this:
% use a cell so that the file names can be of different length:
XL = {'my_file.xls'; 'my_file2.xls'};
for F=1:size(XL,1)
    [Returns{F},q{F},qassetmean{F},qassetcovar{F}] = my_data_series_analysis(XL{F,1});
end
Make sure to store the values which are returned from the function in cells (as shown) or structs (not shown) to account for the fact that there may be a different number of sheets per file.
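If fints is not available in your release, or you prefer base MATLAB, the intersect-then-combine step can also be written without it. The following is a minimal sketch, not a drop-in replacement for the function above: it assumes each sheet's dates have already been converted to datenum values and collected, together with the prices, in cell arrays (the values below are made up). It intersects the date vectors one fund at a time, so the number of funds can vary, and then computes simple returns with their plain sample mean and covariance (Portfolio.estimateAssetMoments may differ in details such as missing-data handling).

```matlab
% Made-up stand-in data: one cell per fund, as if read from each sheet
d0 = datenum(2015,1,1);
fundDates  = {d0 + (0:9)', d0 + 2 + (0:9)', d0 + 1 + (0:9)'};
fundPrices = {100 + (0:9)', 50 + 2*(0:9)', 20 + 0.5*(0:9)'};

% Intersect the date vectors of all funds, one fund at a time
commonDates = fundDates{1};
for k = 2:numel(fundDates)
    commonDates = intersect(commonDates, fundDates{k});
end

% One row per common date, one column per fund
prices = zeros(numel(commonDates), numel(fundDates));
for k = 1:numel(fundDates)
    [~, loc] = ismember(commonDates, fundDates{k});
    prices(:, k) = fundPrices{k}(loc);
end

% Simple returns, then per-fund mean (column vector) and covariance matrix
returns    = diff(prices) ./ prices(1:end-1, :);
assetMean  = mean(returns)';
assetCovar = cov(returns);
```

Because the intersection runs in a loop, adding a fund only adds one element to each cell array; nothing else in the code has to change.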

Related

MATLAB Table Datetime loop indexing

I am trying to import my Excel spreadsheet into MATLAB (already accomplished) and then use for-loop indexing to create arrays containing the data for a given day.
So ideally I would like to know how I could iterate through a year's worth of data and create variables with the date corresponding to the table elements on that day. As I said, I have multiple years' worth of data, which is why I'd like a solution that would "automate" my process.
Welcome to Stack Overflow! Note that it is easier to help if you provide some code, at least to create some sample data.
Anyway, you can loop over dates easily and there shouldn't be a problem with efficiency if you scale it to many entries:
Tm = [ datetime('now') + duration(1,0,0)   % add 1 hour
       datetime('now')
       datetime('yesterday')
       datetime('tomorrow')];
% convert to date (yyyymmdd returns a number such as 20160607)
Dt = yyyymmdd(Tm);
% you may want to sort it
% [val,idx] = sort(Dt);
% get unique dates
Dt_uq = unique(Dt);
% create a cell of storage
DataAtDate = cell(length(Dt_uq),1);
% loop over unique dates
for i = 1:length(Dt_uq)
    Dt1 = Dt_uq(i);
    % logical index of all entries that fall on this date
    lg = Dt == Dt1;
    % index matrix/cell to do something
    disp(Dt(lg,:))
    % do something with the data or matrix or table... e.g. store it in a cell
    DataAtDate{i} = Dt(lg,:);
end
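If the spreadsheet has been imported as a table with a datetime column, the same idea works on the table rows directly. A minimal sketch, assuming a table T with hypothetical variables Time and Value (made-up readings): dateshift truncates each timestamp to the start of its day, unique numbers the days, and the loop collects the matching rows into one sub-table per day.

```matlab
% Hypothetical stand-in for the imported spreadsheet: 12 readings, 6 h apart
Time  = datetime(2016,1,1) + hours(0:6:66)';
Value = (1:12)';
T = table(Time, Value);

% Truncate each timestamp to midnight so readings on the same day share a key
dayKey = dateshift(T.Time, 'start', 'day');

% uDays: sorted unique days; g: index of the day each row belongs to
[uDays, ~, g] = unique(dayKey);

% One sub-table per day
DataAtDay = cell(numel(uDays), 1);
for k = 1:numel(uDays)
    DataAtDay{k} = T(g == k, :);
end
```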

Using matlab to arrange and sort data

My data is an Excel file with two columns in this format:
Date Type
3/12/06 A
3/12/06 B
3/12/06 B
3/12/06 C
6/01/07 A
6/01/07 A
8/01/07 B
...
Column A contains dates, which can repeat, while column B contains the type of observation made on each date.
In MATLAB I want to plot each type as a function of time, but first I need to arrange my data. There are often multiple identical rows, corresponding to multiple observations of the same type on the same date. So I think I first need to count how many times each type occurred on each day?
Any help would be great! I'm still at the stage of trying to read the dates in the correct format...
Here is a solution: I replace each type and each date with an index and then use accumarray to create a 2D pivot table of counts. You can also use a PivotTable directly in Excel.
% We load the xls file.
[~,txt] = xlsread('test.xls');
% We delete the header:
txt(1,:) = [];
% Value and index for the date:
[val_d,~,ind_d] = unique(txt(:,1));
% Value and index for the type:
[val_c,~,ind_c] = unique(txt(:,2));
% We use accumarray to create a pivot table that counts each occurrence
% (rows = dates, columns = types).
acc = accumarray([ind_d,ind_c],1)
% Then we simply plot the result:
dateFormat = 'dd/mm/yy';
for i = 1:length(val_c)
    subplot(1,length(val_c),i)
    bar(datenum(val_d,dateFormat),acc(:,i),1) % easier to deal with datenum
    datetick('x',dateFormat)
    xlabel('Date')
    ylabel([val_c{i},' count'])
    ylim([0,max(acc(:))])   % same y-range in every subplot
end
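To check the pivot logic without an Excel file, here is a self-contained sketch using the sample rows from the question typed in as cell arrays. One addition is my own: because unique sorts the date text alphabetically rather than chronologically, the dates are parsed with datetime (format 'd/MM/yy', assuming day-first dates as in the plotting code) and the count rows are reordered by actual date.

```matlab
% Sample rows from the question (date text, observation type)
dates = {'3/12/06'; '3/12/06'; '3/12/06'; '3/12/06'; '6/01/07'; '6/01/07'; '8/01/07'};
types = {'A'; 'B'; 'B'; 'C'; 'A'; 'A'; 'B'};

% Replace each distinct date and type with an integer index
[uDates, ~, iDate] = unique(dates);
[uTypes, ~, iType] = unique(types);

% counts(d, t) = number of rows with date d and type t (explicit size for clarity)
counts = accumarray([iDate, iType], 1, [numel(uDates), numel(uTypes)]);

% Parse the day-first date text and sort the rows chronologically
dayValues = datetime(uDates, 'InputFormat', 'd/MM/yy');
[dayValues, order] = sort(dayValues);
counts = counts(order, :);
```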

Only Import File when it contains certain numbers from a Table

I have a few hundred sensor measurement files, each containing the date and time of measurement. All the files have names that include the date and time, for example:
07-06-2016_17-58-32.wf
07-06-2016_18-02-32.wf
...
...
08-06-2016_17-48-26.wf
I have a function (importfile) and a loop that imports my data. The loop looks like this:
Files = dir('C:\Osci\User\*.waveform');
numFiles = length(Files);
Data = cell(1, numFiles);
for fileNum = 1:numFiles
    Data{fileNum} = importfile(Files(fileNum).name);
end
Not all of these waveform files are useful: a measurement file is only useful if it was generated in a certain time period. I have a table that shows my allowed time periods:
07-Jun-2016 18:00:01
07-Jun-2016 18:01:31
07-Jun-2016 18:02:01
...
I want to modify my loop so that the files (.waveform files) are only imported if the day (first number), hour (4th number) and minute (5th number) in the file name match the numbers in the table of allowed time periods.
EDIT: Rather than a scalar hour, minute, and second, there is a vector of each. In my case, MyDay, MyHour and MyMinute are 1100x1 matrices while fileTimes only consists of 361 rows.
So, using the provided example the loop should only import file
07-06-2016_18-02-32.wf
since it is the only one where the numbers match (in this case 7, 18, 02).
EDIT2: Using @erfan's answer (and changing some directories and variable names), I have the following working code:
fmtstr = 'O:\\Basic_Research_All\\Lange\\Skripe ISAT\\Rohdaten\\*_%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
% number of rows, i.e. allowed day/hour/minute triples
n = size(MyDayMyHourMyMinute, 1);
for N = 1:n
    Files = [Files; dir(sprintf(fmtstr, MyDayMyHourMyMinute(N,:)))];
end
numFiles = length(Files);
WaveformData = cell(1, numFiles);
for fileNum = 1:numFiles
    WaveformData{fileNum} = importfile(Files(fileNum).name);
end
Since your filenames are pretty well defined as dates and times, you can prefilter your list by turning them into actual dates and times:
% Get the file list
Files = dir('C:\Osci\User\*.waveform');
% You only need the names
Files = {Files.name};
% Get just the filename w/o the extension
[~, baseFileNames] = cellfun(@(x) fileparts(x), Files, 'UniformOutput', false);
% Your filename is just a date (day first), so parse it as such
fileTimes = datevec(baseFileNames, 'dd-mm-yyyy_HH-MM-SS');
% Now pick out the files you want
% goodFiles = fileTimes(:, 4) == myHour & fileTimes(:, 5) == myMinute & fileTimes(:, 6) == mySecond;
goodFiles = ismember(fileTimes(:, 4:6), [myHour(:), myMinute(:), mySecond(:)], 'rows');
% Pare down your list of filenames
Files = Files(goodFiles);
% Preallocate your data cell
Data = cell(1, numel(Files));
% Now do your loop
for idx = 1:numel(Data)
    Data{idx} = importfile(Files{idx});
end
You will, of course, need to define myHour, myMinute and mySecond. Using the logical index in goodFiles, you could impose any other time criterion, such as a time or date range. If your filenames turn out to be less well defined, you could parse out the parts you want with textscan or strfind. The important thing is that cell arrays can be indexed in much the same way as numeric or string arrays, and it's often better to vectorize your filter criteria and only loop over the parts you have to.
The OP indicated in a comment below that, rather than a scalar hour, minute, and second, there is a vector of each. In that case, use ismember with the 'rows' option to match the two sets of times and return a logical index vector. In R2015a, MathWorks introduced the function ismembertol, which checks membership within a given tolerance.
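The question actually asks to match day, hour and minute rather than hour, minute and second. A sketch of that variant, under a few assumptions: the three sample file names from the question stand in for {Files.name}, a hypothetical matrix allowed holds one [day hour minute] row per permitted slot (only the triple from the question's example is used), and datetime parsing replaces datevec, which is my own swap.

```matlab
% Three sample names from the question (stand-in for {Files.name})
names = {'07-06-2016_17-58-32.wf'; '07-06-2016_18-02-32.wf'; '08-06-2016_17-48-26.wf'};

% Hypothetical allowed list: one [day hour minute] row per permitted slot
allowed = [7 18 2];

% Strip the extension and parse the day-first timestamp
[~, baseNames] = cellfun(@fileparts, names, 'UniformOutput', false);
t = datetime(baseNames, 'InputFormat', 'dd-MM-yyyy_HH-mm-ss');

% Keep a file if its [day hour minute] matches any row of allowed
keep = ismember([day(t), hour(t), minute(t)], allowed, 'rows');
selected = names(keep);
```

With the vectors from the question's edit, allowed would be [MyDay, MyHour, MyMinute].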
You can apply your selection from the beginning. Imagine the acceptable values for day, hour and minute are saved in acc as an n-by-3 matrix. If you replace the first line of your code with:
% sprintf needs \\ to produce a literal backslash
fmtstr = 'C:\\Osci\\User\\%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
n = size(acc, 1);
for ii = 1:n
    Files = [Files; dir(sprintf(fmtstr, acc(ii,:)))];
end
Then you have already applied your criteria to Files. The rest is the same.
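To see what this produces, here is a small sketch that only builds the dir patterns, so it runs without the files; the rows of acc are hypothetical. Since the question's edit mentions 1100 allowed rows but only 361 files, acc probably contains repeated triples. Calling unique(acc, 'rows') first is my own addition: it avoids calling dir repeatedly for the same pattern and listing the same file more than once.

```matlab
% Hypothetical allowed [day hour minute] rows, including a duplicate
acc = [7 18 2; 7 18 0; 7 18 2];

% Drop repeated rows so each pattern is searched only once
acc = unique(acc, 'rows');

% One wildcard pattern per row (folder omitted for the example)
fmtstr = '%02i-*-*_%02i-%02i-*.wf';
patterns = cell(size(acc, 1), 1);
for ii = 1:size(acc, 1)
    patterns{ii} = sprintf(fmtstr, acc(ii, :));
end
```

In the real loop each pattern would be combined with the folder, for example via fullfile('C:\Osci\User', patterns{ii}), and passed to dir.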

Matlab - Access index of max value in for loop and use it to remove values from array

I would like to repeatedly find the maximum value in a series of matrices (column 8, to be specific), then use the index of that maximum to set all values with a row index up to the max index to NaN (for columns 14:16). It is straightforward to find the max value and index, but I am stumped on how to use a for loop to do it for multiple arrays.
Here is how I can do it without a for loop:
[C,Max] = max(wy2000(:,8));
wy2000(1:Max,14:16) = NaN;
[C,Max] = max(wy2001(:,8));
wy2001(1:Max,14:16) = NaN;
[C,Max] = max(wy2002(:,8));
wy2002(1:Max,14:16) = NaN;
and so on and so forth...
Here are two ways I have tried using a for loop:
startyear = 2000;
endyear = 2009;
for n=startyear:endyear
currentYear = sprintf('wy%d',n);
[C,Max] = max(currentYear(:,8));
currentYear(1:Max,14:16) = NaN;
end
Here is another way I tried, using the eval function
for n=2000:2009;
currentYear = ['wy' int2str(n)];
var2 = ['maxswe' int2str(n)];
eval([var2 ' = max(currentYear(:,8))']);
end
In both cases, the problem seems to be that MATLAB doesn't recognize the 'currentYear' variable to be the array that corresponds to the wyXXXX that I already have created in my workspace.
Based on Peter's answer, here is some more info about my data. I am starting with a matrix of data called all_data, which holds 16 columns of data spanning the time period 1982 - 2012. I am only interested in the period 2000 - 2009, and I am also interested in analyzing each year individually (2000, 2001,...,2009).
To get the data into individual years, I use the following code:
for n=2000:2009;
s = datenum(n-1,10,1);
e = datenum(n,9,30);
startcell = find(TIME(:,7)==s);
endcell = find(TIME(:,7)==e);
var1 = ['wy' int2str(n)];
eval([var1 '= all_data3(startcell:endcell,:)']);
eval(['save ', var1]);
end
For clarification, it is the period 10/1/YEAR1 to 9/30/YEAR2 that I am interested in, and TIME is a matrix holding the dates and times of my data.
So at the end of the above for-loop, I have a new matrix for each water year (wy). I then want to find the date of maximum snow accumulation (column 8) and exclude all data prior to that date from my analysis. This is where the original question comes from.
Peter's solution works, but I was hoping to find a more simple solution to find the max date and set the values prior to that date to NaN, without having to declare a bunch of variables (or entries in a cell array).
If I could write a loop that would create the cell array that Peter suggested based on a start and end year, that would make the code transferable to other datasets. But when I try to do this, I run into the issue that the index for the cell array is 1:length(years), while the wy arrays are named according to the actual year, so there is an inconsistency when using the eval function.
Matt
You've discovered the problem with eval and dynamically named variables. They're messy. I'd recommend recoding this as a cell array, with the cell array index being the index for the year:
years = 2000:2009;
wy{1} = wy2000;
wy{2} = wy2001;
% etc...
% Then,
for n=1:length(years)
    % the second output of max is the row index of the maximum
    [C, maxIdx] = max(wy{n}(:,8));
    wy{n}(1:maxIdx,14:16) = NaN;
end
You really only need the actual year when you input the data and when you display it. Now, if you're starting from a huge pile of arrays already named this way, that's the time to use eval: to convert them into this form that's easier to use. Just form the eval strings so they read, for example, 'wy{1} = wy2000;'
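The index-versus-year mismatch from the question's edit goes away if the loop runs over the position k and looks up the year as waterYears(k). A minimal sketch with made-up daily data, assuming one row per day and the date stored as a datenum vector t (standing in for TIME(:,7)): it builds the cell array directly in the splitting loop, so no eval or wy2000-style variables are needed, and applies the max/NaN step in the same pass. Selecting rows with a date range instead of find(TIME(:,7)==s) is my own change; it also works when no row is stamped exactly at midnight on 1 October.

```matlab
% Made-up daily data standing in for TIME(:,7) and all_data3
t = (datenum(1999,10,1):datenum(2002,9,30))';
all_data = rand(numel(t), 16);

waterYears = 2000:2002;
wy = cell(numel(waterYears), 1);
for k = 1:numel(waterYears)
    n = waterYears(k);                  % the actual year, only used for the dates
    % water year n: from 1 Oct of year n-1 up to (not including) 1 Oct of year n
    inYear = t >= datenum(n-1,10,1) & t < datenum(n,10,1);
    wy{k} = all_data(inYear, :);
    % row of maximum snow accumulation (column 8)
    [~, maxIdx] = max(wy{k}(:, 8));
    % set columns 14:16 to NaN up to and including that row
    wy{k}(1:maxIdx, 14:16) = NaN;
end
```

The year appears only where the dates are computed, which matches the advice above to keep the actual year for input and display.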

explanation for matlab code

I am new to MATLAB.
Can someone explain the following code to me? It is used to train a neural network.
N = xlsread('data.xls','Sheet1');
N = N(1:150,:);
UN = xlsread('data.xls','Sheet2');
UN = UN(1:150,:);
traindata = [N ; UN];
save('traindata.mat','traindata');
label = [];
for i = 1 : size(N,1)*2
if( i <= size(N,1))
% label = [label ;sum(traindata(i,:))/size(traindata(i,:),2)];
label = [label ;sum(traindata(i,:))/10];
else
% label = [label ;sum(traindata(i,:))/size(traindata(i,:),2)];
label = [label ;sum(traindata(i,:))/10];
end
end
weightMat = BpTrainingProcess(4,0.0001,0.1,0.9,15,[size(traindata,1) 1],traindata,label);
I cannot find a Neural Network toolbox built-in that corresponds to BpTrainingProcess(), so this must be a file you have access to locally (or you need to obtain from the person who gave you this code). It likely strings together several function calls to Neural Network toolbox functions, or perhaps is an original implementation of a back-propagation training method.
Otherwise, the code has some drawbacks. For one, it doesn't appear that the interior if-else statement actually does anything. Even the lines that are commented out would leave a totally useless if-else setup. It looks like the if-else is intended to let you do different label normalization for the data loaded from Sheet1 of the Excel file vs. data loaded from Sheet2. Maybe that is important for you, but it's currently not happening in the program.
Lastly, the code starts with an empty array for label and then appends rows to it. This is unnecessary because you already know how many rows there will be (size(N,1)*2 = 150*2 = 300). You could just as easily set label=zeros(300,1) and then use ordinary indexing at each iteration of the for-loop: label(i) = .... This saves time, because the array is not reallocated on every iteration, but arguably won't matter much for a 300-row data set (assuming that the length of each row is not too large).
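Since both branches compute the same thing, the whole loop can also collapse to a single vectorized line. A sketch with random stand-in data for the two sheets (the 10 columns are an assumption; the real width comes from data.xls) that checks the growing loop, the preallocated loop and the vectorized version against each other:

```matlab
% Random stand-ins for the two sheets (150 rows each, 10 columns assumed)
N  = rand(150, 10);
UN = rand(150, 10);
traindata = [N ; UN];

% Original approach: grow label one row at a time
labelLoop = [];
for i = 1:size(traindata, 1)
    labelLoop = [labelLoop; sum(traindata(i,:))/10]; %#ok<AGROW>
end

% Preallocated loop
labelPre = zeros(size(traindata, 1), 1);
for i = 1:size(traindata, 1)
    labelPre(i) = sum(traindata(i,:))/10;
end

% Vectorized: sum along dimension 2 gives one row sum per sample
label = sum(traindata, 2) / 10;
```

If the intent was possibly the row mean, as the commented-out lines suggest, mean(traindata, 2) gives that directly; dividing by 10 only equals the mean when there are exactly 10 columns.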
I put documentation next to the code below.
% The function 'xlsread()' reads data from an Excel file.
% Here it is storing the values from Sheet 1 of the file 'data.xls'
% into the variable N, and then using the syntax N = N(1:150,:) to
% change N from being all of the data into being only the first
% 150 rows of the data
N = xlsread('data.xls','Sheet1');
N = N(1:150,:);
% Now do the same thing for Sheet 2 from the Excel file.
UN = xlsread('data.xls','Sheet2');
UN = UN(1:150,:);
% This concatenates the two different data arrays together, making
% one large array where N is the top half and UN is the bottom half.
% This is basically just stacking N on top of UN into one array.
traindata = [N ; UN];
% This saves a copy of the newly stacked array into the Matlab data file
% 'traindata.mat'. From now on, you should be able to load the data from
% this file, without needing to read it from the Excel sheet above.
save('traindata.mat','traindata');
% This makes an empty array which will have new things appended to it below.
label = [];
% Because UN and N have the same number of rows, then the training data
% has twice as many rows. So this sets up a for loop that will traverse
% all of these rows of the training data. The 'size()' function can be
% used to get the different dimensions of an array.
for i = 1 : size(N,1)*2
% Here, an if statement is used to check if the current row number, i,
% is less than or equal to the number of rows in N. This implies
% that this part of the if-statement is only for handling the top half
% of 'traindata', that is, the stuff coming from the variable N.
if( i <= size(N,1))
% The line below was already commented out. Maybe it had an old use
% but is no longer needed?
% label = [label ;sum(traindata(i,:))/size(traindata(i,:),2)];
% This syntax will append new rows to the variable 'label', which
% started out as an empty array. This is usually bad practice, memory-wise
% and also for readability.
% Here, the sum of row i of the training data is computed, divided by 10
% in every case, and then appended as a new row in 'label'. Hopefully,
% if you are familiar with the data, you will know why the data in 'N'
% always needs to be divided by 10.
label = [label ;sum(traindata(i,:))/10];
% Otherwise, if i > # of rows then handle the data differently.
% Really this means the code below treats only data from the variable UN.
else
% The line below was already commented out. Maybe it had an old use
% but is no longer needed?
% label = [label ;sum(traindata(i,:))/size(traindata(i,:),2)];
% Just like above, the data is being divided by 10. Given that there
% is nothing different about the code here, and how it modifies 'label'
% there is no need for the if-else statements, and they only waste time.
label = [label ;sum(traindata(i,:))/10];
% This is needed to show the end of the if-else block.
end
% This is needed to show the end of the for-loop.
end
% This appears to be a Back-Propagation Neural Network training function.
% This doesn't match any built-in Matlab function I can find, but you might
% check in the Neural Network toolbox to see if the local function
% BpTrainingProcess is a wrapper for a collection of built-in training functions.
weightMat = BpTrainingProcess(4, 0.0001, 0.1, 0.9, 15, ...
    [size(traindata,1) 1], traindata,label);
Here is a link to an example Matlab Neural Network toolbox function for back-propagation training. You might want to look around the documentation there to see if any of it resembles the interior of BpTrainingProcess().