How do I calculate conditional probabilities from data - MATLAB
I'm implementing a naive Bayes classifier in MATLAB, and it was all going well until I realized I need the conditional probabilities. I know the formula for a conditional probability, p(A|B) = p(A and B)/p(B), but when I have to estimate it from data I'm lost. The data is:
1,0,3,0,?,0,2,2,2,1,1,1,1,3,2,2,1,2,2,0,2,2,2,2,1,2,2,2,3,2,1,1,1,3,3,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,1,1,1,2,2
1,0,3,3,1,0,3,1,3,1,1,1,1,1,3,3,1,2,2,0,0,2,2,2,1,2,1,3,2,3,1,1,1,3,3,2,2,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,2,2,1,2,2
1,0,3,3,2,0,3,3,3,1,1,1,0,3,3,3,1,2,1,0,0,2,2,2,1,2,2,3,2,3,1,3,3,3,1,2,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,2,2,2,2,1,2
1,0,2,3,2,1,3,3,3,1,2,1,0,3,3,1,1,2,2,0,0,2,2,2,2,1,3,2,3,3,1,3,3,3,1,1,1,1,2,2,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2
1,0,3,2,1,1,3,3,3,2,2,2,1,1,2,2,2,2,2,0,0,2,2,2,1,1,2,3,2,2,1,1,1,3,2,1,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,1,2,2
1,0,3,3,2,0,3,3,3,1,2,2,0,3,3,3,2,2,1,0,0,1,2,2,2,1,3,3,1,2,2,3,3,3,2,1,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2,1,2
1,0,3,2,1,0,3,3,3,1,2,1,2,3,3,3,3,2,2,0,0,2,2,2,2,1,3,2,2,2,2,3,3,3,2,1,1,2,2,1,2,1,2,2,2,2,1,2,2,2,2,1,2,2,2,1,2
1,0,2,2,1,0,3,1,3,3,3,3,2,1,3,3,1,2,2,0,0,1,1,2,1,2,1,3,2,1,1,3,3,3,2,2,1,2,1,2,2,1,2,2,2,1,2,2,2,1,2,2,2,2,1,2,2
1,0,3,1,1,0,3,1,3,1,1,1,3,2,3,3,1,2,2,0,0,2,2,2,1,2,1,2,1,1,1,3,3,3,3,2,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,2,2,2,1,2,2
2,0,2,3,2,0,2,2,2,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1,3,2,3,3,3,3,3,3,3,3,2,1,2,1,2,2,2,2,2,2,2,2,2,2,2,2,1,3,2,1,1,2,2
2,0,2,2,0,0,3,2,3,1,1,3,1,3,1,1,2,2,2,0,2,1,1,2,1,1,2,2,2,2,1,3,3,3,1,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
2,0,2,3,2,0,1,2,1,1,2,1,0,1,2,2,1,2,1,0,2,2,2,2,1,2,1,2,2,3,1,3,3,3,1,2,2,1,2,2,2,2,1,2,2,2,2,2,2,2,2,2,1,1,2,2,1
2,0,2,1,1,0,1,2,2,1,2,1,1,2,2,2,1,2,2,0,2,2,2,2,1,2,1,3,2,2,1,1,1,1,1,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2,1,2,2
2,0,2,2,1,1,2,3,3,1,1,1,1,2,2,2,1,2,2,0,1,2,2,2,1,2,1,2,2,2,1,1,1,3,2,1,1,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,1,1,1,2,2
2,1,3,0,?,1,1,2,2,1,1,1,1,2,1,1,1,2,2,0,2,2,2,2,1,2,2,2,2,2,3,3,3,3,1,1,2,1,2,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,1,2,1
2,0,3,2,2,1,2,2,2,1,1,2,1,2,3,3,2,2,2,0,1,2,2,2,1,2,3,2,2,1,2,2,2,3,1,3,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2
2,0,3,2,2,0,1,1,3,1,1,1,0,1,3,3,1,2,2,0,2,2,2,2,1,1,2,2,2,2,1,3,3,3,3,3,1,2,2,1,2,1,2,2,2,2,2,2,2,2,2,2,2,2,1,2,2
2,0,2,1,1,0,2,1,3,1,1,1,0,3,1,3,1,2,2,0,0,1,2,2,3,3,3,2,2,2,1,3,3,3,1,1,1,2,1,2,2,2,1,2,1,2,2,2,2,2,2,2,1,1,1,2,2
2,0,2,0,?,0,2,3,3,3,2,1,0,2,2,1,1,1,2,0,0,2,1,2,1,2,3,2,2,3,1,3,3,3,2,1,1,2,1,2,2,2,3,2,2,2,2,2,2,2,2,2,2,2,2,1,2
2,0,1,2,1,0,3,3,3,1,2,2,1,1,3,3,1,2,2,0,0,2,2,2,1,2,1,3,2,3,1,1,1,3,1,1,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,1,1,2,2,1
2,0,2,0,?,1,3,3,3,1,2,1,1,3,3,3,1,2,2,0,0,2,2,2,2,1,1,2,3,2,1,1,1,3,1,3,1,1,2,2,2,1,2,2,1,2,2,2,2,2,2,1,2,2,1,2,2
2,0,3,3,2,0,2,1,3,1,1,3,3,3,3,3,1,2,2,0,0,2,2,1,1,2,2,3,3,3,3,3,3,3,2,2,2,1,2,1,2,1,2,2,2,2,2,2,2,1,2,2,2,2,2,1,2
3,0,2,3,1,1,2,2,1,1,1,1,1,1,2,2,1,2,2,2,2,1,2,1,1,1,1,2,2,3,1,3,3,3,1,1,1,3,1,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,2,2,1
3,0,2,3,1,1,1,2,1,1,1,2,1,1,1,2,2,1,1,1,2,1,2,1,1,2,2,2,2,2,1,3,3,3,2,2,2,3,3,1,1,2,2,3,2,2,2,2,2,2,2,2,2,2,2,2,1
3,0,3,3,1,0,3,3,1,1,1,2,1,1,2,2,2,2,2,2,2,1,1,1,1,1,2,2,2,2,3,3,3,3,2,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,2,2,2,2,1
3,0,2,3,2,0,1,2,2,1,2,1,2,1,1,1,2,1,2,2,1,2,1,2,2,1,3,2,1,1,2,2,2,2,1,1,2,2,?,2,1,1,1,2,2,2,1,2,2,2,1,3,1,2,2,1,2
3,0,2,2,2,0,2,1,2,1,1,1,0,2,2,3,1,2,2,2,2,2,2,2,3,3,3,2,2,1,2,2,2,2,3,1,2,2,2,2,1,2,1,1,2,2,1,2,2,2,2,2,2,2,1,2,1
3,0,2,2,1,0,2,2,2,1,1,2,0,2,2,2,1,2,2,2,2,2,2,2,1,2,1,3,3,3,1,3,3,2,2,3,1,2,1,3,2,2,3,2,2,2,3,3,3,2,2,3,2,2,2,2,1
3,0,3,2,2,0,2,2,2,1,1,2,0,2,2,2,1,2,2,2,2,2,2,1,1,2,2,2,2,2,2,1,1,1,2,1,1,3,1,3,3,3,2,3,2,2,2,2,2,2,3,1,2,2,2,2,2
3,0,2,1,1,0,2,2,1,1,1,1,0,1,1,1,2,1,2,0,2,1,1,1,1,1,2,2,1,2,1,3,3,3,1,1,3,3,3,2,3,1,2,2,3,3,2,2,2,3,2,2,2,2,2,2,1
3,0,2,3,2,1,2,2,3,1,1,2,1,2,2,2,1,2,2,0,2,2,2,1,1,2,2,2,2,2,1,2,2,3,2,2,2,1,2,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2,1,2,2
3,0,2,3,1,0,2,3,3,1,1,1,1,2,2,2,1,2,2,0,2,2,2,2,1,2,1,2,2,2,1,1,1,1,1,2,2,1,2,2,2,1,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2
The classes are in the first column, from 1 to 3. The ? values I will replace with the mean of their column. The prior of each class can be obtained by counting: (rows in class)/(total rows in the first column). That part is simple, but the conditionals?
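For example, computing the priors by counting would look something like this (a sketch; `data_all` is a hypothetical name for the full 32-by-57 matrix, including the class column):

```matlab
labels = data_all(:, 1);              % first column holds the class, 1 to 3
classes = unique(labels);
priors = zeros(numel(classes), 1);
for c = 1:numel(classes)
    priors(c) = sum(labels == classes(c)) / numel(labels);
end
% for the data above this gives 9/32, 13/32 and 10/32
```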
        class
      /  / \  \
    x1 x2 ... x_i ... xn
Bayes' rule: p(c|x) = p(x|c)p(c)/p(x). Thanks.
EDIT: What I think I need is someone who can explain the process of getting the conditionals from data, with a simple everyday example (apples, say) if possible. If I need to build a CPT (conditional probability table), give me pointers on how to do it; I'm mostly a programmer.
Here is brute-force code, just to be sure that we are talking about the same problem.
% Step 1: read the data into a 32-by-56 matrix (excluding the column of
% classes), replacing "?" with NaN. Name it "data".
% Check the values in data:
unique(data(~isnan(data)))
% These are 0, 1, 2 and 3
% Step 2: find the mean of each variable, ignoring the NaN values
data_mean = nanmean(data);
% Step 3: replace missing values with the column sample mean
data_new = data;
for hh = 1:56
    inds = isnan(data(:, hh));
    data_new(inds, hh) = data_mean(hh);
end
% Only NaN values have been replaced:
find(isnan(data(:))) % indices of NaN values in data
find(data_new(:) ~= data(:)) % indices of data_new different from data
% Step 4: count the outcomes of each variable within each class, and
% normalize the counts to get the conditional probabilities p(outcome | class)
n = [0, 9, 22, 32]; % cumulative row counts per class (9, 13 and 10 rows)
probs = zeros(56, 3, 4);
for hh = 1:56 % for each variable
    for ii = 1:3 % for each class
        inds = (n(ii)+1):n(ii+1); % rows belonging to class ii
        for jj = 1:4 % for each outcome (0, 1, 2 and 3)
            probs(hh, ii, jj) = sum(data_new(inds, hh) == jj-1) / numel(inds);
        end
    end
end
% Note: a missing value replaced by a non-integer mean matches no outcome,
% so that row simply does not contribute to any bin of its variable.
% The conditional probabilities of the outcomes given the class, for the
% first variable, are
squeeze(probs(1, :, :))
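Once probs holds p(outcome | class) rather than raw counts (i.e. each count divided by its class size), a new sample can be classified with the naive Bayes product (a sketch; the priors 9/32, 13/32 and 10/32 come from counting the first column, and in practice one would sum log-probabilities instead to avoid underflow):

```matlab
x = data_new(2, :);                    % classify row 2, which has no missing values
priors = [9, 13, 10] / 32;             % class priors, by counting
posterior = zeros(1, 3);
for ii = 1:3
    p = priors(ii);
    for hh = 1:56
        p = p * probs(hh, ii, x(hh) + 1);  % p(x_hh | class ii)
    end
    posterior(ii) = p;                 % unnormalized: p(x) cancels in the argmax
end
[~, predicted_class] = max(posterior)
```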
Related
Which bins are occupied in a 3D histogram in MATLAB
I have 3D data from which I need to calculate properties. To reduce computation I wanted to discretize the space, calculate the properties from each bin instead of from the individual data points, and then reassign the property calculated for the bin back to the data points. I also only want to process the bins that actually contain points. Since there is no 3D binning function in MATLAB, what I do is use histcounts over each dimension and then search for the unique bins that have been assigned to the data points:

a5pre = compositions(:,1);
a7pre = compositions(:,2);
a8pre = compositions(:,3);
%% BINNING
a5pre_edges = [0, linspace(0.005, 0.995, 19), 1];
a5pre_val = (a5pre_edges(1:end-1) + a5pre_edges(2:end))/2;
a5pre_val(1) = 0; a5pre_val(end) = 1;
a7pre_edges = [0, linspace(0.005, 0.995, 49), 1];
a7pre_val = (a7pre_edges(1:end-1) + a7pre_edges(2:end))/2;
a7pre_val(1) = 0; a7pre_val(end) = 1;
a8pre_edges = a7pre_edges;
a8pre_val = a7pre_val;
[~,~,bin1] = histcounts(a5pre, a5pre_edges);
[~,~,bin2] = histcounts(a7pre, a7pre_edges);
[~,~,bin3] = histcounts(a8pre, a8pre_edges);
bins = [bin1, bin2, bin3];
[A,~,C] = unique(bins, 'rows', 'stable');
a5pre = a5pre_val(A(:,1));
a7pre = a7pre_val(A(:,2));
a8pre = a8pre_val(A(:,3));

It seems that the unique function is pretty time-consuming, so I was wondering if there is a faster way to do it, knowing that the rows can only contain integers - or a totally different approach. Best regards
My own attempt with a manual binner:

function [comps, C] = compo_binner(x, y, z, e1, e2, e3, v1, v2, v3)
C = NaN(length(x), 1);
comps = NaN(length(x), 3);
id = 1;
for i = 1:numel(x)
    B_temp(1,1) = v1(sum(x(i) > e1));
    B_temp(1,2) = v2(sum(y(i) > e2));
    B_temp(1,3) = v3(sum(z(i) > e3));
    C_id = sum(ismember(comps, B_temp), 2) == 3;
    if sum(C_id) > 0
        C(i) = find(C_id);
    else
        comps(id,:) = B_temp;
        id = id + 1;
        C_id = sum(ismember(comps, B_temp), 2) == 3;
        C(i) = find(C_id > 0);
    end
end
comps(any(isnan(comps), 2), :) = [];
end

But it's way slower than the histcounts/unique version. I can't avoid the find function, and that's a function you surely want to avoid in a loop when speed matters...
If I understand correctly you want to compute a 3D histogram. If there's no built-in tool to compute one, it is simple to write one:

function [H, lindices] = histogram3d(data, n)
% histogram3d 3D histogram
% H = histogram3d(data, n) computes a 3D histogram from (x,y,z) values
% in the Nx3 array `data`. `n` is the number of bins between 0 and 1.
% It is assumed all values in `data` are between 0 and 1.
assert(size(data,2) == 3, 'data must be Nx3');
H = zeros(n, n, n);
indices = floor(data * n) + 1;
indices(indices > n) = n;
lindices = sub2ind(size(H), indices(:,1), indices(:,2), indices(:,3));
for ii = 1:size(data,1)
    H(lindices(ii)) = H(lindices(ii)) + 1;
end
end

Now, given your compositions array, and binning each dimension into 20 bins, we get:

[H, indices] = histogram3d(compositions, 20);
idx = find(H);
[x, y, z] = ind2sub(size(H), idx);
reduced_compositions = ([x, y, z] - 0.5) / 20;

The bin centers for H are at ((1:20)-0.5)/20. On my machine this runs in a fraction of a second for 5 million input points.

Now, for each compositions(ii,:), you have a number indices(ii), which matches another number idx(jj), corresponding to reduced_compositions(jj,:). One easy way to make the assignment of results is as follows:

H(H > 0) = 1:numel(idx);
indices = H(indices);

Now for each compositions(ii,:), your closest match in the reduced set is reduced_compositions(indices(ii),:).
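As an aside, the accumulation loop inside histogram3d can be replaced by a single accumarray call (a sketch; the result should be identical, and it is often faster):

```matlab
% inside histogram3d, instead of the for-loop over rows:
H = reshape(accumarray(lindices, 1, [n^3, 1]), [n, n, n]);
```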
How to parallelize operations on a huge array
I couldn't find any relevant topics, so I'm posting this one: how can I parallelize operations/calculations on a huge array? The problem is that I use arrays of size 10000000x10, which is basically small enough to operate on inline, but running under parfor causes a not-enough-memory error. The code goes:

function aggregatedRes = umbrellaFct(preparedInputsAsaCellArray)
% Description: function used to parallelize the calculation
% preparedInputsAsaCellArray - cell array of size 1x10; for example the
% first cell {1,1} would be: {array, corr, df}
% array - an array 1e7 by 10, with data from different regions to be aggregated
% corr - correlation matrix
% df - degrees of freedom as an integer value

% create a function handle from the child function
fcnHndl = @childFct;
% for each available cell - calculate and aggregate
output = cell(1, numel(preparedInputsAsaCellArray));
parfor j = 1:numel(preparedInputsAsaCellArray)
    output{j} = fcnHndl(preparedInputsAsaCellArray{j}{:});
end
% extract results
for i = 1:numel(preparedInputsAsaCellArray)
    aggregatedRes(:,i) = output{i};
end
end

And the child function used in the umbrella function:

function aggregated = childFct(array, corr, df)
% Description:
% array - an array 1e7 by 10, with data from different regions to be aggregated
% corr - correlation matrix
% df - degrees of freedom as an integer value

% get the number of cases for the multivariate numbers
cases = length(array(:,1));
% preallocate space
s = zeros(length(array(:,1)), length(array(1,:)));
% calc multivariate numbers
u = mvtrnd(corr, df, cases);
clear corr cases
% calc Student's t cumulative distribution
u = tcdf(u, df);
clear df
% double sort
[~, sorted] = sort(u);
clear u
[~, corrMatrix] = sort(sorted);
clear sorted
for jj = 1:length(array(1,:))
    s(:,jj) = array(corrMatrix(:,jj), jj);
end
clear array corrMatrix jj
aggregated = sum(s, 2);
end

I already tried with distributed memory but ultimately failed. I will appreciate any help or hint!
Edit: The logic behind the functions is to calculate and aggregate data from different regions. In total there are ten arrays, all of size 1e7x10. My idea was to use parfor to calculate and aggregate them simultaneously, to save time. It works fine for smaller arrays (like 1e6x10) but runs out of memory for 1e7x10 (with more than 2 workers). I suspect the way I used and implemented parfor could be wrong and inefficient.
Verify Law of Large Numbers in MATLAB
The problem: if a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1, 2, ..., N, i.e. the expected value of one die. For example, the expected value of a 6-sided die is 3.5. Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of the integers from 1 to N. My code:

function dice_diff = loln(N)
% the mean of the integers from 1 to N
A = 1:N;
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, every time you ask a question, make sure it is as clear as possible, to make it easier for other users to read. If you check how randi works, you can see this:

R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an M-by-N-by-P-by-... array.
randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.

So, if you want to use randi in your problem, you have to use it like this:

V = randi(N, 1e8, 1);

and you need some more changes:

function dice_diff = loln(N)
% the mean of the integers from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8, 1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end

For future problems, try using the command

help randi

and MATLAB will explain how randi (or any other function) works. Make sure to check that the code above gives the desired result.
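A quick sanity check of the corrected function (a sketch; with 1e8 rolls, the sample mean of a fair die typically lands within a fraction of 1e-3 of the true mean):

```matlab
d = loln(6);      % 6-sided die
abs(d) < 1e-2     % true with overwhelming probability
```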
As pointed out, take a closer look at the use of randi(). From the general case

X = randi([LowerInt, UpperInt], NumRows, NumColumns); % UpperInt > LowerInt

you can adapt to dice rolling with

Rolls = randi([1 NumSides], NumRolls, NumSamplePaths);

as an example. Exchanging NumRolls and NumSamplePaths will yield Rolls.', or transpose(Rolls).

According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. Notice that as NumRolls gets larger, the sample mean converges to the true mean. The image below shows this for two sample paths.

To get the sample mean for each number of dice rolls, I used arrayfun() with

CumulativeAvg1 = arrayfun(@(jj) mean(Rolls(1:jj,1)), [1:NumRolls]);

which is equivalent to using the cumulative sum, cumsum(), to get the same result:

CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent

% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides], NumRolls, NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(@(jj) mean(Rolls(1:jj,1)), [1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(@(jj) mean(Rolls(1:jj,2)), [1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls, CumulativeAvg1, 'b--', 'LineWidth', 1.5, 'DisplayName', 'Sample Path 1')
plot(1:NumRolls, CumulativeAvg2, 'r--', 'LineWidth', 1.5, 'DisplayName', 'Sample Path 2')
yline(ExpVal, 'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls, CumulativeAvgError1, 'b--', 'LineWidth', 1.5, 'DisplayName', 'Sample Path 1')
plot(1:NumRolls, CumulativeAvgError2, 'r--', 'LineWidth', 1.5, 'DisplayName', 'Sample Path 2')
yline(0, 'k-')
title('Error')
xlabel('Number of Trials')
Speeding up the conditional filling of huge sparse matrices
I was wondering if there is a way of speeding up (maybe via vectorization?) the conditional filling of huge sparse matrices (e.g. ~1e10 x 1e10). Here's sample code where I have a nested loop and fill a sparse matrix only if a certain condition is met:

% We are given the following cell arrays of the same size:
% all_arrays_1
% all_arrays_2
% all_mapping_arrays
N = 1e10;
% the number of nnz (non-zeros) is unknown until the loop finishes
huge_sparse_matrix = sparse([], [], [], N, N);
n_iterations = numel(all_arrays_1);
for iteration = 1:n_iterations
    array_1 = all_arrays_1{iteration};
    array_2 = all_arrays_2{iteration};
    mapping_array = all_mapping_arrays{iteration};
    n_elements_in_array_1 = numel(array_1);
    n_elements_in_array_2 = numel(array_2);
    for element_1 = 1:n_elements_in_array_1
        element_2 = mapping_array(element_1);
        % sanity check:
        if element_2 <= n_elements_in_array_2
            item_1 = array_1(element_1);
            item_2 = array_2(element_2);
            huge_sparse_matrix(item_1, item_2) = 1;
        end
    end
end

I am struggling to vectorize the nested loop. As far as I understand, filling a sparse matrix element by element is very slow when the number of entries to fill is large (~100M). I need to work with a sparse matrix since it has dimensions in the 10,000M x 10,000M range. However, this way of filling a sparse matrix in MATLAB is very slow.

Edits: I have updated the names of the variables to reflect their nature better. There are no function calls.

Addendum: This code builds the adjacency matrix of a huge graph. The variable all_mapping_arrays holds mapping arrays (~adjacency relationships) between nodes of the graph in a local representation, which is why I need array_1 and array_2 to map the adjacency to a global representation.
I think it will be the incremental update of the sparse matrix, rather than the loop-based conditional, that is slowing things down.

When you add a new entry to a sparse matrix via something like A(i,j) = 1, it typically requires that the whole matrix data structure is re-packed. This is an expensive operation. If you're interested, MATLAB uses a CCS data structure (compressed column storage) internally, which is described under the Data Structure section here. Note the statement:

This scheme is not efficient for manipulating matrices one element at a time

Generally, it's far better (faster) to accumulate the non-zero entries in the matrix as a set of triplets and then make a single call to sparse. For example (warning - brain-compiled code!!):

% Inputs:
% N
% prev_array and next_array
% n_labels_prev and n_labels_next
% mapping

% allocate space for matrix entries as a set of "triplets"
ii = zeros(N,1);
jj = zeros(N,1);
xx = zeros(N,1);
nn = 0;
for next_label_ix = 1:n_labels_next
    prev_label = mapping(next_label_ix);
    if prev_label <= n_labels_prev
        prev_global_label = prev_array(prev_label);
        next_global_label = next_array(next_label_ix);
        % reallocate triplets on demand
        if (nn + 1 > length(ii))
            ii = [ii; zeros(N,1)];
            jj = [jj; zeros(N,1)];
            xx = [xx; zeros(N,1)];
        end
        % append a new triplet and increment the counter
        ii(nn + 1) = next_global_label; % row index
        jj(nn + 1) = prev_global_label; % col index
        xx(nn + 1) = 1.0;               % coefficient
        nn = nn + 1;
    end
end
% we may have over-allocated our triplets, so trim the arrays
% based on our final counter
ii = ii(1:nn);
jj = jj(1:nn);
xx = xx(1:nn);
% just make a single call to "sparse" to pack the triplet data
% as a sparse matrix object
sp_graph_adj_global = sparse(ii, jj, xx, N, N);

I'm allocating in chunks of N entries at a time. Assuming that you know a lot about the structure of your matrix you might be able to use a better value here. Hope this helps.
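Since the inner loop in the question has no cross-iteration dependencies, the triplet collection can also be vectorized per cell, leaving only the outer loop (a sketch using the variable names from the question; sparse sums duplicate triplets, so the final comparison collapses them back to 1):

```matlab
ii_all = []; jj_all = [];
for iteration = 1:n_iterations
    array_1 = all_arrays_1{iteration}(:);
    array_2 = all_arrays_2{iteration}(:);
    mapping_array = all_mapping_arrays{iteration}(:);
    valid = mapping_array <= numel(array_2);     % the sanity check, vectorized
    ii_all = [ii_all; array_1(valid)];
    jj_all = [jj_all; array_2(mapping_array(valid))];
end
% one call to sparse; ~= 0 turns summed duplicates back into logical 1s
huge_sparse_matrix = sparse(ii_all, jj_all, 1, N, N) ~= 0;
```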
Vectorizing for loops in MATLAB
I'm not too sure if this is possible, but my understanding of MATLAB could certainly be better. I have some code I wish to vectorize as it's causing quite a bottleneck in my program. It's part of an optimisation routine which has many possible configurations of Short Term Average (STA), Long Term Average (LTA) and Sensitivity (OnSense) to run through.

Time is in vector format, FL2onSS is the main data (an Nx1 double), FL2onSSSTA is its STA (an NxSTA double), and FL2onSSThresh is its threshold value (an NxLTAxOnSense double).

The idea is to calculate a Red alarm matrix which will be 4D - alarmState x STA x LTA x OnSense - that is used throughout the rest of the program.

Red = zeros(length(FL2onSS), length(STA), length(LTA), length(OnSense), 'double');
for i = 1:length(STA)
    for j = 1:length(LTA)
        for k = 1:length(OnSense)
            Red(:,i,j,k) = calcRedAlarm(Time, FL2onSS, FL2onSSSTA(:,i), FL2onSSThresh(:,j,k));
        end
    end
end

I've currently got this repeating a function in an attempt to get a bit more speed out of it, but obviously it will be better if the entire thing can be vectorised. In other words, I do not need to keep the function if there is a better solution.

function [Red] = calcRedAlarm(Time, FL2onSS, FL2onSSSTA, FL2onSSThresh)
% Calculate alarms
% An alarm triggers when STA > Threshold
zeroSize = length(FL2onSS);
% Preallocate
Red = zeros(zeroSize, 1, 'double');
for i = 2:zeroSize
    % Because time chunks are butted up against each other, alarms can
    % go off when they shouldn't. To fix this, timeDiff is calculated to
    % check whether the last date differs from the current one by 5
    % seconds. If it doesn't, don't generate an alarm, as there is either
    % a validity or a time gap.
    timeDiff = etime(Time(i,:), Time(i-1,:));
    if FL2onSSSTA(i) > FL2onSSThresh(i) && FL2onSSThresh(i) ~= 0 && timeDiff == 5
        % if the short-term average is > threshold, trigger
        Red(i) = 1;
    elseif FL2onSSSTA(i) < FL2onSSThresh(i) && FL2onSSThresh(i) ~= 0 && timeDiff == 5
        % if the short-term average is < threshold, turn off
        Red(i) = 0;
    else
        % otherwise keep the current state
        Red(i) = Red(i-1);
    end
end
end

The code is simple enough so I won't explain it any further. If you need elucidation on what a particular line is doing, let me know.
The trick is to bring all your data to the same form, using mostly repmat and permute. Then the logic is the simple part. I needed a nasty trick to implement the last part (if none of the conditions hold, use the last result). Usually that sort of logic is done using cumsum. I had to use another matrix of 2.^n to make sure the values that are defined are used (so that +1,+1,-1 will really give 1,1,0) - just look at the code :)

%// define size variables for better readability
N = length(Time);
M = length(STA);
O = length(LTA);
P = length(OnSense);

%// transform the main data to the same dimensions (3D matrices)
%// note that I flatten FL2onSSThresh to 2D first, to make things simpler.
%// anyway you don't use the fact that it's 3D except for traversing it.
FL2onSSThresh2 = reshape(FL2onSSThresh, [N, O*P]);
FL2onSSThresh3 = repmat(FL2onSSThresh2, [1, 1, M]);
FL2onSSSTA3 = permute(repmat(FL2onSSSTA, [1, 1, O*P]), [1, 3, 2]);
timeDiff = diff(datenum(Time))*24*60*60;
timeDiff3 = repmat(timeDiff, [1, O*P, M]);

%// we also remove the 1st plane from each of the matrices (the vector
%// equivalent of running i=2:zeroSize)
FL2onSSThresh3 = FL2onSSThresh3(2:end, :, :);
FL2onSSSTA3 = FL2onSSSTA3(2:end, :, :);
Red3 = zeros(N-1, O*P, M, 'double');

%// now the logic in vector form
%// note the change of && (logical operator) to & (elementwise operator)
Red3((FL2onSSSTA3 > FL2onSSThresh3) & (FL2onSSThresh3 ~= 0) & (timeDiff3 == 5)) = 1;
Red3((FL2onSSSTA3 < FL2onSSThresh3) & (FL2onSSThresh3 ~= 0) & (timeDiff3 == 5)) = -1;

%// now you have a matrix with +1 where the alarm should start, and -1 where it should end.
%// add the 0s at the beginning
Red3 = [zeros(1, O*P, M); Red3];
%// reshape back to the same shape
Red2 = reshape(Red3, [N, O, P, M]);
Red2 = permute(Red2, [1, 4, 2, 3]);

%// and now a nasty trick to convert the start/end data to 1 where the alarm is on, and 0 where it is off.
Weights = 2.^repmat((1:N)', [1, M, O, P]);
Red = (sign(cumsum(Weights.*Red2))+1) == 2;
%// and we are done.
%// print sum(Red(:) ~= OldRed(:)), where OldRed is Red calculated in non-vector form, to test this.