MATLAB Table Datetime loop indexing - matlab

I am trying to take my excel spreadsheet and import it into MATLAB (already accomplished that), and then using for-loop indexing to create arrays of the data for a give day containing.
So ideally I would like to know how I could iterate through a years worth of data, and create variables with the date corresponding to the table elements on that day. As I said, I have multiple years worth of data, which is why I'd like a solution which would "automate" my process.

Welcome to stackoverflow! Note that it is easier if you provide some code -- at least to create some sample data.
Anyway, you can loop over dates easily and there shouldn't be a problem with efficiency if you scale it to many entries:
Tm = [ datetime('now') + duration(1,0,0)%add 1 hour
datetime('now')
datetime('yesterday')
datetime('tomorrow')];
% convert to date
Dt = yyyymmdd(Tm);
% you may want to sort it
% [val,idx] = sort(Dt);
% get unique dates
Dt_uq = unique(Dt);
% create a cell of storage
DataAtDate = cell(length(Dt_uq),1);
% loop over unique dates
for i = 1:length(Dt_uq)
Dt1 = Dt_uq(i);
% get all of the same type
lg = Dt == Dt1;
% index matrix/cell to do something
disp(Dt(lg,:))
% do something with the data or matrix or table... e.g. store it in a cell
DataAtDate{i} = Dt(lg,:);
end

Related

Using matlab to arrange and sort data

My data is an excel file with two columns in this format:
Date Type
3/12/06 A
3/12/06 B
3/12/06 B
3/12/06 C
6/01/07 A
6/01/07 A
8/01/07 B
...
Column A are dates and can be repeated while column B are types of observations on these dates.
In MATLAB I want to plot each type as a function of time, however first I need to arrange my data. There are often multiple identical rows that correspond to multiple observations of the same type on the same date. So I think first I need to count how many times a certain type occurred on the same day?
Any help would be great! I'm still at the stage of trying to read the dates in the correct format...
Here is a solution: I replace each type and each date with a specific index and then I use accumarray in order to create a 2D pivot table. You can also directly use the function pivot table from excel.
% We load the xls file.
[~,txt] = xlsread('test.xls');
% We delete the header:
txt(1,:) = [];
% Value and index for the date:
[val_d,~,ind_d] = unique(txt(:,1));
% Value and index for the type:
[val_c,~,ind_c] = unique(txt(:,2));
% We use accumarray to create a pivot table that count each occurence.
acc = accumarray([ind_d,ind_c],1)
% Then we simply plot the result:
dateFormat = 'dd/mm/yy';
for i = 1:length(val_c);
subplot(1,length(val_c),i)
bar(datenum(val_d,dateFormat),acc(:,i),1) % easier to deal with datenum
datetick('x',dateFormat)
xlabel('Date')
ylabel([val_c{i},' count'])
ylim([0,3])
end
RESULT:

Only Import File when it contains certain numbers from a Table

I got a couple 100 sensor measurement files all containing the date and time of measurement. All the files have names that include date and time. Example:
07-06-2016_17-58-32.wf
07-06-2016_18-02-32.wf
...
...
08-06-2016_17:48-26.wf
I have a function (importfile) and a loop that imports my data. The loop looks like this:
Files = dir('C:\Osci\User\*.waveform');
numFiles = length(Files);
Data = cell(1, numFiles);
for fileNum = 1:numFiles
Data{fileNum} = importfile(Files(fileNum).name);
end
Not all of these waveform files are useful. The measurement files are only useful if they were generated in a certain time period. I got a table that shows my allowed time periods:
07-Jun-2016 18:00:01
07-Jun-2016 18:01:31
07-Jun-2016 18:02:01
...
I want to modify my loop, so that the files (.waveform files) are only imported if the numbers for day (first number), hour (4th number) and minute (5th number) from the files match the numbers of the table containing the allowed time periods.
EDIT: Rather than a scalar hour, minute, and second, there is a vector of each. In my case, MyDay, MyHour and MyMinute are 1100x1 matrices while fileTimes only consists of 361 rows.
So, using the provided example the loop should only import file
07-06-2016_18-02-32.wf
since it is the only one where the numbers match (in this case 7, 18, 02).
EDIT2: Using #erfan's answer (and changing some directories and variable names) I have the following working code:
fmtstr = 'O:\\Basic_Research_All\\Lange\\Skripe ISAT\\Rohdaten\\*_%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
n = size(MyDayMyHourMyMinute);
for N = 1:n;
Files = [Files; dir(sprintf(fmtstr, MyDayMyHourMyMinute(N,:)))];
end
numFiles = length(Files);
WaveformData = cell(1, numFiles);
for fileNum = 1:numFiles
WaveformData{fileNum} = importfile(Files(fileNum).name);
end
Since your filenames are pretty well defined as dates and times, you can prefilter your list by turning them into actual dates and times:
% Get the file list
Files = dir('C:\Osci\User\*.waveform');
% You only need the names
Files = {Files.name};
% Get just the filename w/o the extension
[~, baseFileNames] = cellfun(#(x) fileparts(x), Files, 'UniformOutput', false);
% Your filename is just a date, so parse it as such
fileTimes = datevec(baseFileNames, 'mm-dd-yyyy_HH-MM-SS');
% Now pick out the files you want
% goodFiles = fileTimes(:, 4) == myHour & fileTimes(:, 5) == myMinute & fileTimes(:, 6) == mySecond;
goodFiles = ismember(fileTimes(:, 4:6), [myHour(:), myMinute(:), mySecond(:)], 'rows');
% Pare down your list of filenames
Files = Files(goodFiles);
% Preallocate your data cell
Data = cell(1, numel(Files));
% Now do your loop
for idx = 1:numel(Data)
Data{idx} = importfile(Files{idx});
end
You will, of course, need to define myHour, myMinute and mySecond. Of course, using the logical indexing in goodFiles, you could impose any sort of time criteria, like time or date range. If you find that your filenames aren't so well defined, you could parse out the filename using textscan or strfind to get the bits you want. The important thing is that cell arrays can be indexed into in much the same way as numerical or string arrays and it's often better to vectorize your filter criteria and then only do the loop on the parts you have to.
The OP indicated in a comment below that rather than a scalar hour, minute, and second, there is a vector of each. In that case, use ismember to match the two time vectors and return a logical index vector. With 2015a, MathWorks introduced the function ismembertol, which allows one to check membership within a certain tolerance.
You can apply your selection from the beginning. Imagine the acceptable values for day, hour and minute are saved in acc as an n*3 matrix. If you replace the first line of your code with:
fmtstr = 'C:\Osci\User\%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
for ii = 1:n
Files = [Files; dir(sprintf(fmtstr, acc(ii,:)))];
end
Then you have already applied your criteria to Files. The rest is the same.

Matlab - Access index of max value in for loop and use it to remove values from array

I would like to recursively find the maximum value in a series of matrices (column 8, to be specific), then use the index of that maximum value to set all values in the array with index up to the max index to NaN (for columns 14:16). It is straight forward to find the max value and index, but using a for loop to do it for multiple arrays I am stumped.
Here is how I can do it without a for loop:
[C,Max] = max(wy2000(:,8));
wy2000(1:Max,14:16) = NaN;
[C,Max] = max(wy2001(:,8));
wy2001(1:Max,14:16) = NaN;
[C,Max] = max(wy2002(:,8));
wy2002(1:Max,14:16) = NaN;
and so on and so forth...
Here are two ways I have tried using a for loop:
startyear = 2000;
endyear = 2009;
for n=startyear:endyear
currentYear = sprintf('wy%d',n);
[C,Max] = max(currentYear(:,8));
currentYear(1:Max,14:16) = NaN;
end
Here is another way I tried, using the eval function
for n=2000:2009;
currentYear = ['wy' int2str(n)];
var2 = ['maxswe' int2str(n)];
eval([var2 ' = max(currentYear(:,8))']);
end
In both cases, the problem seems to be that MATLAB doesn't recognize the 'currentYear' variable to be the array that corresponds to the wyXXXX that I already have created in my workspace.
Based on Peters answer, here is some more info about my data. I am starting with a matrix of data called all_data which holds 16 columns of data, spanning the time period 1982 - 2012. I am only interested in the period 2000 - 2009, and I am also interested in analyzing each year individually (2000, 2001,...,2009).
To get the data into individual years, I use the following code:
for n=2000:2009;
s = datenum(n-1,10,1);
e = datenum(n,9,30);
startcell = find(TIME(:,7)==s);
endcell = find(TIME(:,7)==e);
var1 = ['wy' int2str(n)];
eval([var1 '= all_data3(startcell:endcell,:)']);
eval(['save ', var1]);
end
For clarification, it is the period 10/1/YEAR1 to 9/30/YEAR2 that I am interested in, and TIME is a matrix holding the dates and times of my data.
So at the end of the above for-loop, I have a new matrix for each water-year (wy). I then want to find the date of maximum snow-accumulation (column 8) and exclude all data prior to that date from my analysis. this is where the original question comes from.
Peter's solution works, but I was hoping to find a more simple solution to find the max date and set the values prior to that date to NaN, without having to declare a bunch of variables (or entries in a cell array).
If I could write a loop that would create the cell array that Peter suggested based on a start and end year, that would make the code transferable to other datasets, but when i try to do this I run into the issue that the index for the cell-array is 1:length(years), but the wy arrays are named according to the actual year, so there is an inconsistency when using the eval function.
Matt
You've discovered the problem with eval and dynamically named variables. They're messy. I'd recommend recoding this as a cell array, with the cell array index being the index for the year:
years = 2000:2009;
wy{1} = wy2000;
wy{2} = wy2001;
% etc...
% Then,
for n=1:length(years)
[C, maxval] = max(wy{n}(:,8));
% etc.
end
You really only need the actual year when you input the data and when you display it. Now, if you're starting from a huge pile of arrays already named this way, that's the time to use eval: to convert them into this form that's easier to use. Just form the eval strings so they read, for example, 'wy{1} = wy2000;'

Run time series analysis for multiple series Matlab

I wish to apply the same data analysis to multiple data time series. However the number of data series is variable. So instead of hard-coding each series' analysis I would like to be able to specify the number and name of the funds and then have the same data-manipulation done to all before they are combined into a single portfolio.
Specifically I have an exel file where each worksheet is a time series where the first column is dates and the second column is prices. The dates for all funds may not correspond so the individual worksheets must be sifted for dates that occur in all the funds before combining into one data set where there is one column of dates and all other columns correspond to the data of each of the present funds.
This combined data set is then analysed for means and variances etc.
I currently have worked out how to carry out the merging and the analysis (below) but I would like to know how I can simply add or remove funds (i.e. by including new worksheets containing individual funds data in the excel file) without having to re-write and add/ remove extra/excess matlab code.
*% LOAD DATA*
XL='XLData.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
*%SPECIFY WORKSHEETS*
Fund1Prices=3;
Fund2Prices=4;
*%RETRIEVE VALUES*
[Fund1values, ~, Fund1sheet] = xlsread(XL,Fund1Prices);
[Fund2values, ~, Fund2sheet] = xlsread(XL,Fund2Prices);
*%EXTRACT DATES AND DATA AND COMBINE (TO REMOVE UNNECCESSARY TEXT IN ROWS 1
%TO 4) FOR FUND 1.*
Fund1_dates_data=Fund1sheet(4:end,1:2);
Fund1Dates= cellstr(datestr(datevec(Fund1_dates_data(:,1),formatIn),formatOut));
Fund1Data= cell2mat(Fund1_dates_data(:,2));
*%EXTRACT DATES AND DATA AND COMBINE (TO REMOVE UNNECCESSARY TEXT IN ROWS 1
%TO 4) FOR FUND 2.*
Fund2_dates_data=Fund2sheet(4:end,1:2);
Fund2Dates= cellstr(datestr(datevec(Fund2_dates_data(:,1),formatIn),formatOut));
Fund2Data= cell2mat(Fund2_dates_data(:,2));
*%CREATE TIME SERIES FOR EACH FUND*
Fund1ts=fints(Fund1Dates,Fund1Data,'Fund1');
Fund2ts=fints(Fund2Dates,Fund2Data,'Fund2');
*%CREATE PORTFOLIO*
Port=merge(Fund1ts,Fund2ts,'DateSetMethod','Intersection');
*%ANALYSE PORTFOLIO*
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Port)
[qassetmean, qassetcovar] = q.getAssetMoments
Based on edit to the question, the answer was rewritten
You can put your code into a function. This function can be saved as an .m-file and called from Matlab.
However, you want to replace the calls to specific worksheets (Fund1Prices=3) with an automated way of figuring out how many worksheets there are. Here's one way of how to do that in a function:
function [Returns,q,qassetmean,qassetcovar] = my_data_series_analysis(XL)
% All input this function requires is a variable
% containing the name of the xls-file you want to process
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
% Determine the number of worksheets in the xls-file:
[~,my_sheets] = xlsfinfo(XL);
% Loop through the number of sheets
% (change the start value if the first sheets do not contain data):
% this is needed to merge your portfolio
% in case you do not start the for-loop at I=1
merge_count = 1;
for I=1:size(my_sheets,2)
% RETRIEVE VALUES
% note that Fund1Prices has been replaced with the loop-iterable, I
[FundValues, ~, FundSheet] = xlsread(XL,I);
% EXTRACT DATES AND DATA AND COMBINE
% (TO REMOVE UNNECCESSARY TEXT IN ROWS 1 TO 4)
Fund_dates_data = FundSheet(4:end,1:2);
FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
formatIn),formatOut));
FundData = cell2mat(Fund_dates_data(:,2));
% CREATE TIME SERIES FOR EACH FUND
Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
if merge_count == 2
Port = merge(Fundts{I-1},Fundts{I},'DateSetMethod','Intersection');
end
if merge_count > 2
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
end
merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Port)
[qassetmean, qassetcovar] = q.getAssetMoments
This function will return the Returns, q, qassetmean and qassetcovar variables for all the worksheets in the xls-file you want to process. The variable XL should be specified like this:
XL = 'my_file.xls';
You can also loop over more than one xls-file. Like this:
% use a cell so that the file names can be of different length:
XL = {'my_file.xls'; 'my_file2.xls'}
for F=1:size(XL,1)
[Returns{F},q{F},qassetmean{F},qassetcovar{F}] = my_data_series_analysis(XL{F,1});
end
Make sure to store the values which are returned from the function in cells (as shown) or structs (not shown) to account for the fact that there may be a different number of sheets per file.

Bucketing Algorithm

I've got some code that works, but is a bit of a bottleneck, and I'm stuck trying to figure out how to speed it up. It's in a loop, and I can't figure how to vectorize it.
I've got a 2D array, vals, that represents timeseries data. Rows are dates, columns are different series. I'm trying to bucket the data by months to perform various operations on it (sum, mean, etc). Here is my current code:
allDts; %Dates/times for vals. Size is [size(vals, 1), 1]
vals;
[Y M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates
[Y M] = datevec(fomDates);
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);
newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed
for k = 1:length(fomDates);
This next line is the bottleneck because I call it so many times.(looping)
idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
bucketed = vals(idx, :);
newVals(k, :) = nansum(bucketed);
end %for
Any Ideas? Thanks in advance.
That's a difficult problem to vectorize. I can suggest a way to do it using CELLFUN, but I can't guarantee that it will be faster for your problem (you would have to time it yourself on the specific data sets you are using). As discussed in this other SO question, vectorizing doesn't always work faster than for loops. It can be very problem-specific which is the best option. With that disclaimer, I'll suggest two solutions for you to try: a CELLFUN version and a modification of your for-loop version that may run faster.
CELLFUN SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart); % Get unique start dates
valCell = mat2cell(vals(sortIndex,:),diff([0 uniqueIndex]));
newVals = cellfun(#nansum,valCell,'UniformOutput',false);
The call to MAT2CELL groups the rows of vals that have the same start date together into cells of a cell array valCell. The variable newVals will be a cell array of length numel(uniqueStarts), where each cell will contain the result of performing nansum on the corresponding cell of valCell.
FOR-LOOP SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart); % Get unique start dates
vals = vals(sortIndex,:); % Sort the values according to start date
nMonths = numel(uniqueStarts);
uniqueIndex = [0 uniqueIndex];
newVals = nan(nMonths,size(vals,2)); % Preallocate
for iMonth = 1:nMonths,
index = (uniqueIndex(iMonth)+1):uniqueIndex(iMonth+1);
newVals(iMonth,:) = nansum(vals(index,:));
end
If all you need to do is form the sum or mean on rows of a matrix, where the rows are summed depending upon another variable (date) then use my consolidator function. It is designed to do exactly this operation, reducing data based on the values of an indicator series. (Actually, consolidator can also work on n-d data, and with a tolerance, but all you need to do is pass it the month and year information.)
Find consolidator on the file exchange on Matlab Central