Matlab mystery: Comparison is not defined between double and datetime arrays - matlab

I'm new to Matlab, troubleshooting a comparison error by chopping it down as simple as can be. This test matrix of two rows works fine, then I copy/paste to add a third record and I get a conversion error (for the same data)!
Background: my data is for patient exams: patient, arrival time, exam start time, and I'm writing a program to determine how full the waiting room gets. If the patient isn't seen instantaneously I add them to a patient_queue (just saving their eventual start time as a placeholder). The data is in order of arrival time, so at each new row I check the queue to see if anyone's exam started in the meantime, remove them, and go from there.
Here's an example with only two patient rows (this works):
data_matrix = [1,735724.291666667,735724.322916667,735724.343750000,5;
2,735724.331250000,735724.340277778,735724.371527778,18];
patient_queue = [];
highest_wait_num = 0;
[rows, columns] = size(data_matrix);
for i = 1:rows
this_row_arrival = datetime (data_matrix(i, 2), 'ConvertFrom', 'datenum');
this_row_exam_start = datetime (data_matrix(i, 3), 'ConvertFrom', 'datenum');
now = this_row_arrival; %making a copy called 'now' for readability below
% check patient queue: if anyone left in the meantime, remove them
for j = 1:length(patient_queue)
if patient_queue{j} < now
patient_queue(j) = [];
end
end
% if new patient isn't seen immediately (tb > ta), add tb to the queue
if this_row_exam_start > this_row_arrival
patient_queue{end+1} = this_row_exam_start;
end
% get the current queue size
patient_queue_non_zero = (~cellfun('isempty',patient_queue));
indices = find(patient_queue_non_zero);
current_queue_count = length(indices);
% if the current queue size beats the highest we've seen, update it
if current_queue_count > highest_wait_num
highest_wait_num = current_queue_count;
end
end
patient_queue{j}
highest_wait_num
But when I use the full data set I get an error at the line:
if patient_queue{j} < now
Comparison is not defined between double and datetime arrays.
So I'm narrowing the problem, and I can even reproduce the error by taking my simple matrix of 2 records that worked, copy the second one to make a matrix of 3, like so- just swapping that in the code above makes the error(!!):
data_matrix = [1,735724.291666667,735724.322916667,735724.343750000,5;
2,735724.331250000,735724.340277778,735724.371527778,18;
2,735724.331250000,735724.340277778,735724.371527778,18]
What am I missing?

Here is a much shorter way to do this, without converting numbers to dates:
patient_queue = [];
highest_wait_num = 0;
rows = size(data_matrix,1);
for k = 1:rows
this_row_arrival = data_matrix(k, 2);
this_row_exam_start = data_matrix(k, 3);
now = data_matrix(k, 2); %making a copy called 'now' for readability below
% check patient queue: if anyone left in the meantime, remove them
patient_queue(patient_queue < now) = [];
% if new patient isn't seen immediately (tb > ta), add tb to the queue
if this_row_exam_start > this_row_arrival
patient_queue(end+1) = this_row_exam_start;
end
% get the current queue size
current_queue_count = numel(nonzeros(patient_queue));
% if the current queue size beats the highest we've seen, update it
highest_wait_num = max(current_queue_count,highest_wait_num);
end

What difference the right braces make :/
I had declared this as a vector instead of a cell array:
patient_queue = [];
As a result, I couldn't get to the value inside.
The answer was to:
- declare it this way instead: patient_queue = {};
- add one more (missing) datetime conversion for patient_queue{j}
This is a specific pair of mistakes I had, so I'll delete the post since it's unlikely to be useful for others
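For anyone who wants to keep real datetime values in the queue, here is a minimal sketch that stores them in a datetime array instead of a cell array. It assumes a MATLAB release that has datetime (R2014b or later); the variable names arrivals and examStarts are made up for this illustration, and the data is the three-row matrix from the question, trimmed to the columns that are used.

```matlab
% Patient number, arrival time, exam start time (datenums from the question)
data_matrix = [1,735724.291666667,735724.322916667;
               2,735724.331250000,735724.340277778;
               3,735724.331250000,735724.340277778];
arrivals   = datetime(data_matrix(:,2), 'ConvertFrom', 'datenum');
examStarts = datetime(data_matrix(:,3), 'ConvertFrom', 'datenum');

patient_queue = datetime.empty(0,1);   % empty datetime column, not [] or {}
highest_wait_num = 0;
for k = 1:size(data_matrix,1)
    % remove everyone whose exam started before this arrival
    patient_queue(patient_queue < arrivals(k)) = [];
    % queue the new patient if they are not seen immediately
    if examStarts(k) > arrivals(k)
        patient_queue(end+1,1) = examStarts(k);
    end
    % track the largest queue seen so far
    highest_wait_num = max(highest_wait_num, numel(patient_queue));
end
```

Because every element of the array is a datetime, the comparison never sees a double, and the logical-index deletion avoids removing elements while looping over them.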

Related

Filter data with standard deviation in loop

I have acceleration (10240x31) data that I want to filter by replacing every data point that exceeds a threshold of 4 times the standard deviation of its column with the mean value of the two adjacent data points.
First, I wanted to replace every data point with a zero, if it exceeds the maximum value. This is my loop:
for w = 1:31
Sigma(w) = std(zacceleration(:,w));
zacceleration(zacceleration<(-4*Sigma(w))) = 0;
zacceleration(zacceleration>(4*Sigma(w))) = 0;
end
That code works if w is just one number, for example:
w = 1;
But when w changes every iteration, the filtered data only contains the values that don't exceed the threshold value of the last dataset, Sigma(31).
So, I guess that I overwrite my data or something like that, but I can't seem to find a solution.
Can anybody please give me a hint?
Thank you in advance and best regards.
I think I got it now.
Sigma = std(zacceleration);
for a = 1:10240;
for b = 1:31;
if zacceleration(a,b)<(-4*Sigma(b))
zacceleration(a,b) = 0;
end
if zacceleration(a,b)>(4*Sigma(b))
zacceleration(a,b) = 0;
end
end
end
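The original goal was to replace each outlier with the mean of its two neighbours rather than with zero. A possible vectorized sketch of that step follows; it is not from the thread. The data is random with one planted spike, the names limit, mask, neighbourMean and filtered are made up for the illustration, and leaving the first and last rows untouched is an assumption, since they have only one neighbour.

```matlab
rng(0);                                   % reproducible made-up data
zacceleration = randn(200, 3);
zacceleration(50, 2) = 40;                % planted spike

Sigma = std(zacceleration);               % one standard deviation per column
limit = repmat(4*Sigma, size(zacceleration,1), 1);
mask  = abs(zacceleration) > limit;       % true where a point exceeds 4*Sigma of its column
mask([1 end], :) = false;                 % assumption: edge rows are left as they are

% mean of the row above and the row below, for every interior row
neighbourMean = zacceleration;
neighbourMean(2:end-1,:) = (zacceleration(1:end-2,:) + zacceleration(3:end,:)) / 2;

filtered = zacceleration;
filtered(mask) = neighbourMean(mask);     % replace only the flagged points
```

Note that if two outliers sit next to each other, the neighbour mean is itself contaminated, so this sketch only handles isolated spikes.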

Group variables based on lengths of specific arrays

I have a long list of variables in a dataset which contains multiple time channels with different sampling rates, such as time_1, time_2, TIME, Time, etc. There are also multiple other variables that are dependent on either of these times.
I'd like to list all possible channels that contain 'time' (case-insensitive partial string search within Workspace) and search & match which variable belongs to each item of this time list, based on the size of the variables and then group them in a structure with the values of the variables for later analysis.
For example:
Name Size Bytes Class
ENGSPD_1 181289x1 1450312 double
Eng_Spd 12500x1 100000 double
Speed 41273x1 330184 double
TIME 41273x1 330184 double
Time 12500x1 100000 double
engine_speed_2 1406x1 11248 double
time_1 181289x1 1450312 double
time_2 1406x1 11248 double
In this case, I have 4 time channels with different names & sizes and 4 speed channels which belong to each of these time channels.
The whos function is case-sensitive, and it only returns the names of the variables rather than their values.
As a preamble I'm going to echo my comment from above and earlier comments from folks here and on your other similar questions:
Please stop trying to manipulate your data this way.
It may have made sense at the beginning but, given the questions you've asked on SO to date, this isn't the first time you've encountered issues trying to pull everything together and if you continue this way it's not going to be the last. This approach is highly error prone, unreliable, and unpredictable. Every step of the process requires you to make assumptions about your data that cannot be guaranteed (size of data matching, variables being present and named predictably, etc.). Rather than trying to come up with creative ways to hack together the data, start over and output your data predictably from the beginning. It may take some time but I guarantee it's going to save time in the future and it will make sense to whoever looks at this in 6 months trying to figure out what is going on.
For example, there is absolutely no significant effort needed to output your variables as:
outputstructure.EngineID.time = sometimeseries;
outputstructure.EngineID.speed = somedata;
Where EngineID can be any valid variable name. This is simple and it links your data together permanently and robustly.
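A small sketch of what that layout looks like in practice, using dynamic field names. The ID 'Engine1' and the data are made up for the illustration.

```matlab
engineID = 'Engine1';                            % hypothetical ID, any valid variable name works
outputstructure.(engineID).time  = (0:0.1:1)';   % made-up time base
outputstructure.(engineID).speed = rand(11, 1);  % made-up speed samples

% Every channel can later be reached without eval or name matching
ids = fieldnames(outputstructure);
for k = 1:numel(ids)
    s = outputstructure.(ids{k});
    % time and speed travel together, so their lengths can be checked directly
    assert(numel(s.time) == numel(s.speed));
end
```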
That being said, the following will bring a marginal amount of sanity to your data set:
% Build up a totally amorphous data set
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
% Identify time and speed variable using regular expressions
% Assumes time variables contain 'time' (case insensitive)
% Assumes speed variables contain 'spd', 'sped', or 'speed' (case insensitive)
timevars = whos('-regexp', '[Tt][Ii][Mm][Ee]'); % no | inside brackets: it would match a literal '|'
speedvars = whos('-regexp', '[Ss][Pp][Ee]{0,2}[Dd]');
% Pair timeseries and data arrays together. Data is only coupled if
% the number of rows in the timeseries is exactly the same as the
% number of rows in the data array.
timesizes = vertcat(timevars(:).size); % Concatenate timeseries sizes
speedsizes = vertcat(speedvars(:).size); % Concatenate speed array sizes
% Find intersection and their locations in the structures returned by whos
% By using intersect we only get the data that is matched
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
% Preallocate structure
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
% Unavoidable (without saving/loading data) eval loop :|
for ii = 1:ndata
groupeddata(ii).time = eval(timevars(timeidx(ii)).name); % evaluate the variable name to get its value
groupeddata(ii).speed = eval(speedvars(speedidx(ii)).name);
end
A non-eval method, by request:
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
save('tmp.mat')
oldworkspace = load('tmp.mat');
varnames = fieldnames(oldworkspace);
timevars = regexpi(varnames, '.*time.*', 'match', 'once');
timevars(cellfun('isempty', timevars)) = [];
speedvars = regexpi(varnames, '.*spe{0,2}d.*', 'match', 'once');
speedvars(cellfun('isempty', speedvars)) = [];
timesizes = zeros(length(timevars), 2);
for ii = 1:length(timevars)
timesizes(ii, :) = size(oldworkspace.(timevars{ii}));
end
speedsizes = zeros(length(speedvars), 2);
for ii = 1:length(speedvars)
speedsizes(ii, :) = size(oldworkspace.(speedvars{ii}));
end
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
for ii = 1:ndata
groupeddata(ii).time = oldworkspace.(timevars{timeidx(ii)});
groupeddata(ii).speed = oldworkspace.(speedvars{speedidx(ii)});
end
See this gist for timing.

MATLAB: Creating a matrix from for loop values?

I have the following code:
for i = 1450:9740:89910
n = i+495;
range = ['B',num2str(i),':','H',num2str(n)];
iter = xlsread('BrokenDisplacements.xlsx' , range);
displ = iter;
displ = [displ; iter];
end
This takes values from an Excel file over a number of ranges I want and outputs them as matrices. However, the code only keeps the final value of displ and builds the total matrix from that. I would like to collect these outputs (displ) into one large matrix, saving the values along the way. How would I go about doing this?
Since you know the size of the block of data you are reading, you can make your code much more efficient as follows:
firstVals = 1450:9740:89910;
displ = zeros(firstVals(end) - firstVals(1) + 496, 7); % rows firstVals(1) .. firstVals(end)+495
for ii = firstVals
n = ii + 495;
range = sprintf('B%d:H%d', ii, n);
displ((ii:n)-firstVals(1)+1,:) = xlsread('BrokenDisplacements.xlsx', range);
end
A couple of points:
- I prefer not to use i as a variable name, since it is built in as sqrt(-1); if you later run code that assumes that, you're in trouble.
- I am not assuming that the last value of ii is 89910; by first assigning the values to a vector and then taking the last element of that vector, I sidestep that question.
- I allocate all the space for displ at once; otherwise, as it grows, Matlab keeps having to move the array around, which can slow things down a lot.
- I used sprintf to generate the string representing the range; I think it's more readable, but it's a question of style.
- I assign the return value of xlsread directly to a block in displ that is the right size.
I hope this helps.
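If the blocks should be stacked directly on top of each other, without rows of zeros for the spreadsheet rows that are skipped, the preallocation can be sized by the number of blocks instead. This is only a sketch: readBlock is a made-up stand-in so the snippet runs without the spreadsheet, and it assumes every range comes back as a full 496-by-7 numeric block.

```matlab
firstVals = 1450:9740:89910;   % first spreadsheet row of each block
blockRows = 496;               % rows ii .. ii+495
nCols     = 7;                 % columns B..H

% Stand-in reader (assumption) so the sketch runs without the file.
% With the real data this would be:
%   readBlock = @(range) xlsread('BrokenDisplacements.xlsx', range);
readBlock = @(range) ones(blockRows, nCols);

displ = zeros(numel(firstVals)*blockRows, nCols);   % preallocate once
for k = 1:numel(firstVals)
    range   = sprintf('B%d:H%d', firstVals(k), firstVals(k) + blockRows - 1);
    rowsOut = (k-1)*blockRows + (1:blockRows);      % where block k lands in displ
    displ(rowsOut, :) = readBlock(range);
end
```

The result has the same contents as growing displ with [displ; iter], but the array is allocated only once.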
How about this:
displ=[];
for i = 1450:9740:89910
n = i+495;
range = ['B',num2str(i),':','H',num2str(n)];
iter = xlsread('BrokenDisplacements.xlsx' , range);
displ = [displ; iter];
end

How to apply an equation to multiple columns separately in a matrix?

I have 4 data sets of different lengths (in rows), and they all have a differing amount of columns. I need to apply an equation to each of these columns and then extract the max value from each of them.
The equation I am trying to use is:
averg = mean([interpolate(1:end-2),interpolate(3:end)],2); % this is just getting your average value.
real_num = interpolate(2:end-1);
streaking1 = (abs(real_num-averg)./averg)*100;
An example of one of my data sets is 5448 rows by 13 columns
EDIT
This is the current adaptation of Ben A.'s solution and it is working.
A = interpolate;
averg = (A(1:end-2,:) + A(3:end,:))/2;
center_A = A(2:end-1,:);
streaking = [];
for idx = 1:size(A,2)
streaking(:,idx) = (abs(center_A(:,idx)-averg(:,idx))./averg(:,idx))*100; % index the column, not the row
end
I'm not entirely sure that I fully follow what you're doing in each step, but here is a stab at it:
A = interpolate;
averg = (A(1:end-2,:) + A(3:end,:))/2;
center_A = A(2:end-1,:);
streaking = [];
for idx = 1:size(A,2)
streaking(:,idx) = (abs(center_A(:,idx)-averg(:,idx))./averg(:,idx))*100; % index the column, not the row
end
averg holds, for each column, the mean of the two neighbouring rows (the one above and the one below) of every interior point, and center_A plays the role of the real_num variable that you had before. I'm not clear why you would need to index that the way you are, as nothing is at risk of breaking index rules.
If this helps, great! If not let me know and I'll see if I can revise somewhat.
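Since the subtraction and division are element-wise, the same calculation can also be written without a loop, and the per-column maximum the question asks for is then one call to max. A sketch with a small made-up matrix standing in for interpolate:

```matlab
interpolate = [10 20; 12 22; 11 30; 13 24; 12 26];   % made-up 5-by-2 data set

A = interpolate;
averg     = (A(1:end-2,:) + A(3:end,:)) / 2;   % mean of the rows above and below
center_A  = A(2:end-1,:);                      % the interior points themselves
streaking = abs(center_A - averg) ./ averg * 100;   % percent deviation, all columns at once

maxStreaking = max(streaking, [], 1);          % one maximum per column
```

The same lines work unchanged for each of the four data sets, whatever their number of rows and columns.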

Bucketing Algorithm

I've got some code that works, but is a bit of a bottleneck, and I'm stuck trying to figure out how to speed it up. It's in a loop, and I can't figure how to vectorize it.
I've got a 2D array, vals, that represents timeseries data. Rows are dates, columns are different series. I'm trying to bucket the data by months to perform various operations on it (sum, mean, etc). Here is my current code:
allDts; %Dates/times for vals. Size is [size(vals, 1), 1]
vals;
[Y M] = datevec(allDts);
fomDates = unique(datenum(Y, M, 1)); %first of the month dates
[Y M] = datevec(fomDates);
nextFomDates = datenum(Y, M, DateUtil.monthLength(Y, M)+1);
newVals = nan(length(fomDates), size(vals, 2)); %preallocate for speed
for k = 1:length(fomDates);
% This next line is the bottleneck because I call it so many times (looping).
idx = (allDts >= fomDates(k)) & (allDts < nextFomDates(k));
bucketed = vals(idx, :);
newVals(k, :) = nansum(bucketed);
end %for
Any Ideas? Thanks in advance.
That's a difficult problem to vectorize. I can suggest a way to do it using CELLFUN, but I can't guarantee that it will be faster for your problem (you would have to time it yourself on the specific data sets you are using). As discussed in this other SO question, vectorizing doesn't always work faster than for loops. It can be very problem-specific which is the best option. With that disclaimer, I'll suggest two solutions for you to try: a CELLFUN version and a modification of your for-loop version that may run faster.
CELLFUN SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
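[uniqueStarts,uniqueIndex] = unique(monthStart,'last'); % Unique start dates; index of the last row of each month
valCell = mat2cell(vals(sortIndex,:),diff([0; uniqueIndex(:)])); % Row counts per month
newVals = cellfun(@(x) nansum(x,1),valCell,'UniformOutput',false); % Sum down the rows, even for one-row months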
The call to MAT2CELL groups the rows of vals that have the same start date together into cells of a cell array valCell. The variable newVals will be a cell array of length numel(uniqueStarts), where each cell will contain the result of performing nansum on the corresponding cell of valCell.
FOR-LOOP SOLUTION:
[Y,M] = datevec(allDts);
monthStart = datenum(Y,M,1); % Start date of each month
[monthStart,sortIndex] = sort(monthStart); % Sort the start dates
[uniqueStarts,uniqueIndex] = unique(monthStart,'last'); % Unique start dates; index of the last row of each month
vals = vals(sortIndex,:); % Sort the values according to start date
nMonths = numel(uniqueStarts);
uniqueIndex = [0; uniqueIndex(:)]; % Prepend 0 so month k spans uniqueIndex(k)+1 : uniqueIndex(k+1)
newVals = nan(nMonths,size(vals,2)); % Preallocate
for iMonth = 1:nMonths,
index = (uniqueIndex(iMonth)+1):uniqueIndex(iMonth+1);
newVals(iMonth,:) = nansum(vals(index,:),1); % Sum down the rows, even for one-row months
end
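A third option that is not from the answers above is accumarray, which sums values that share a subscript. The sketch below uses made-up dates and values; NaNs are zeroed first so the result matches what nansum would give, and the names bucket, clean, rowSub and colSub are invented for the illustration. As with the other versions, it would need timing on the real data.

```matlab
% Made-up data: six dates spread over three months, two series
allDts = datenum(2020, [1 1 2 2 2 3]', [5 20 1 15 28 9]');
vals   = [1 2; 3 NaN; 5 6; NaN 8; 9 10; 11 12];

[Y, M] = datevec(allDts);
[uniqueStarts, ~, bucket] = unique(datenum(Y, M, 1));   % bucket(i) = month number of row i

[nRows, nCols] = size(vals);
clean = vals;
clean(isnan(clean)) = 0;                       % treat NaN as 0, like nansum

rowSub = repmat(bucket, nCols, 1);             % output row for every element of vals(:)
colSub = kron((1:nCols)', ones(nRows, 1));     % output column for every element of vals(:)
newVals = accumarray([rowSub colSub], clean(:), [numel(uniqueStarts) nCols]);
```

Other reductions can be plugged in through the fourth argument of accumarray, for example @mean, although NaN handling then has to be done differently.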
If all you need to do is form the sum or mean on rows of a matrix, where the rows are summed depending upon another variable (date) then use my consolidator function. It is designed to do exactly this operation, reducing data based on the values of an indicator series. (Actually, consolidator can also work on n-d data, and with a tolerance, but all you need to do is pass it the month and year information.)
Find consolidator on the file exchange on Matlab Central