Updating histogram in a for-loop without growing y-data - matlab

I have had zero luck finding this elsewhere on the site, so here's my problem. I loop through about a thousand .mat files, each with about 10,000 data points. I'm trying to create an overall histogram of this data, but it isn't feasible to concatenate all of it and pass it to hist.
I was hoping to create N and Bin variables on each loop iteration using hist(y), and then update N and Bin on the next iteration using hist(y_new), and so on. That way the source data doesn't grow, and when the loop finally ends I can just use bar(). If this method won't work, I am very open to other solutions.
Also, it is probably not safe to assume that the x data will remain constant across iterations. I'm using R2012a.
Thanks for any help!!

I think the best solution here is to loop through your files twice: once to set the bins and once to do the histogram. But, if this is impossible in your case, here's a one shot solution that requires you to set the bin width beforehand.
clear; close all;
rng('default') % for reproducibility
% make example data
N = 10; % number of data files
M = 5; % length of data files
xs = cell(1,N);
for i = 1:N
xs{i} = trnd(1,1,M);
end
% parameters
width = 2;
% main
for i = 1:length(xs)
x = xs{i}; % "load data"
range = [min(x) max(x)];
binsPos = 0:width:range(2)+width;
binsNeg = fliplr( 0:-width:range(1)-width );
newBins = [binsNeg(1:end-1) binsPos];
newCounts = histc(x, newBins);
newCounts(end) = []; % last bin should always be zero, see help histc
if i == 1
counts = newCounts;
bins = newBins;
else
% combine new and old counts
allBins = min(bins(1), newBins(1)) : width : max(bins(end), newBins(end));
allCounts = zeros(1,length(allBins)-1);
allCounts(find(allBins==bins(1)) : find(allBins==bins(end-1))) = counts;
allCounts(find(allBins==newBins(1)) : find(allBins==newBins(end-1))) = ...
allCounts(find(allBins==newBins(1)) : find(allBins==newBins(end-1))) + newCounts;
bins = allBins;
counts = allCounts;
end
end
% check
figure
bar(bins(1:end-1) + width/2, counts)
xFull = [xs{:}];
[fullCounts] = histc(xFull, bins);
fullCounts(end) = [];
figure
bar(bins(1:end-1) + width/2, fullCounts)
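The two-pass alternative mentioned at the top of this answer could look roughly like the sketch below. It rests on assumptions that are not in the question: the cell array xs stands in for loading the files, nBins = 20 is an arbitrary choice, and histc is used because it is available in R2012a.

```matlab
% Hypothetical stand-in for the data files: 10 "files" of 1000 points each
xs = arrayfun(@(k) randn(1,1000), 1:10, 'UniformOutput', false);

% Pass 1: find the global range without keeping any data around
lo = inf; hi = -inf;
for i = 1:numel(xs)
    x = xs{i};                    % "load data"
    lo = min(lo, min(x));
    hi = max(hi, max(x));
end

% Fixed bin edges shared by every file (nBins is an arbitrary choice)
nBins = 20;
edges = linspace(lo, hi, nBins+1);

% Pass 2: accumulate counts file by file
counts = zeros(1, nBins);
for i = 1:numel(xs)
    c = reshape(histc(xs{i}, edges), 1, []);
    c(end-1) = c(end-1) + c(end); % histc counts x == edges(end) separately
    counts = counts + c(1:end-1);
end

bar(edges(1:end-1) + diff(edges)/2, counts)
```

Only the running minimum, maximum and the counts vector are kept between files, so memory use does not grow with the number of files.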

Related

What is the better way to change the percentages of the training and the testing during the splitting process?

Using PCA and the Yale face database, I'm trying to do face recognition in MATLAB by randomly splitting the data into 20% for training and 80% for testing. It gives this error:
Index in position 2 exceeds array bounds (must not exceed 29)
The following is the code; I'm hoping to get help:
dataset = load('yale_FaceDataset.mat');
trainSz = round(dataset.samples*0.2);
testSz = round(dataset.samples*0.8);
trainSetCell = cell(1,trainSz*dataset.classes);
testSetCell = cell(1,testSz*dataset.classes);
j = 1;
k = 1;
m = 1;
for i = 1:dataset.classes
% training set
trainSetCell(k:k+trainSz-1) = dataset.images(j:j+trainSz-1);
trainLabels(k:k+trainSz-1) = dataset.labels(j:j+trainSz-1);
k = k+trainSz;
% test set
testSetCell(m:m+testSz-1) = dataset.images(j+trainSz:j+dataset.samples-1);
testLabels(m:m+testSz-1) = dataset.labels(j+trainSz:j+dataset.samples-1);
m = m+testSz;
j = j+dataset.samples;
end
% convert the data from a cell into a matrix format
numImgs = length(trainSetCell);
trainSet = zeros(numImgs,numel(trainSetCell{1}));
for i = 1:numImgs
trainSet(i,:) = reshape(trainSetCell{i},[],1);
end
numImgs = length(testSetCell);
testSet = zeros(numImgs,numel(testSetCell{1}));
for i = 1:numImgs
testSet(i,:) = reshape(testSetCell{i},[],1);
end
%% applying PCA
% compute the mean face
mu = mean(trainSet)';
% centre the training data
trainSet = trainSet - (repmat(mu,1,size(trainSet,1)))';
% generate the eigenfaces(features of the training set)
eigenfaces = pca(trainSet);
% set the number of principal components
Ncomponents = 100;
% Out of the generated components, we keep "Ncomponents"
eigenfaces = eigenfaces(:,1:Ncomponents);
% generate training features
trainFeatures = eigenfaces' * trainSet';
% Subspace projection
% centre features
testSet = testSet - (repmat(mu,1,size(testSet,1)))';
% subspace projection
testFeatures = inv(eigenfaces'*eigenfaces) * eigenfaces' * testSet';
mdl = fitcdiscr(trainFeatures',trainLabels);
labels = predict(mdl,testFeatures');
% find the images that were recognised and their respect. labels
correctRec = find(testLabels == labels');
correctLabels = labels(correctRec);
% find the images that were NOT recognised and their respect. labels
falseRec = find(testLabels ~= labels');
falseLabels = labels(falseRec);
% compute and display the recognition rate
result = length(correctRec)/length(testLabels)*100;
fprintf('The recognition rate is: %0.3f \n',result);
% divide the images into : recognised and unrecognised
correctTest = testSetCell(correctRec);
falseTest = testSetCell(falseRec);
% display some recognised samples and their respective labels
imgshow(correctTest(1:8),correctLabels(1:8));
% display all unrecognised samples and their respective labels
imgshow(falseTest(1:length(falseTest)), falseLabels(1:length(falseTest)));
It would help if you also provided the line number and the full error message, and stripped your code down to the essentials. I guess the PCA part is not necessary here, as the error is probably raised in your loop. That is because you increment j by j = j+dataset.samples; and then use it on the next iteration for indexing j:j+trainSz-1, which must then exceed dataset.samples...
In any case, there is no randomness in your indexing. It is easiest to use the built-in cvpartition function:
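% split data
% note: 'HoldOut',p reserves the fraction p for testing, so .2 gives an
% 80/20 train/test split; use .8 for 20% training and 80% testing
cvp = cvpartition(Lbl,'HoldOut',.2);
lgTrn = cvp.training;
lgTst = cvp.test;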
You may provide the number of observations as the first input (in place of Lbl), or the actual class vector, in which case cvpartition picks random subsets that reflect the original distribution of the individual classes.
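As a rough illustration of how the partition can be used for the 20/80 split from the question, here is a minimal sketch. The label vector is made up (15 classes with 11 images each is an assumption, not taken from the dataset file), and cvpartition requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical label vector: 15 classes with 11 images each
labels = repmat((1:15)', 11, 1);

% 'HoldOut',0.8 reserves about 80% of the observations for testing;
% passing a label vector makes the split stratified by class
cvp = cvpartition(labels, 'HoldOut', 0.8);
trainIdx = training(cvp);   % logical index of the training observations
testIdx  = test(cvp);       % logical index of the test observations

trainLabels = labels(trainIdx);
testLabels  = labels(testIdx);
fprintf('%d training and %d test observations\n', nnz(trainIdx), nnz(testIdx));
```

The same logical indices can then be applied to the image cell array, so no manual bookkeeping of j, k and m is needed.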

Matlab generating random numbers and overlap check

I wrote code that generates a random set of rods in MATLAB within a specified domain and then saves the output to a text file. I would like help adding the following options to the code:
(i) If a randomly generated rod exceeds the specified domain, its length should be shortened so that it stays inside the domain.
(ii) I would like to avoid overlap between a newly generated rod and the previous ones; in case of overlap, another location should be generated for the new rod.
I can't figure out how to do it. It would be a great help if someone could help me write code for these two options.
Thank you
% myrandom.m
% Units are mm.
% domain size
bx = 160;
by = 40;
bz = 40;
lf = 12; % rod length
nf = 500; % Number of rods
rns = rand(nf,3); % Start
rne = rand(nf,3)-0.5; % End
% Start Points
for i = 1:nf
rns(i,1) = rns(i,1)*bx;
rns(i,2) = rns(i,2)*by;
rns(i,3) = rns(i,3)*bz;
end
% Unit Deltas
delta = zeros(nf,1);
for i = 1:nf
temp = rne(i,:);
delta(i) = norm(temp);
end
% Length Deltas
rne = lf*rne./delta;
% End Points
rne = rns + rne;
fileID = fopen('scfibers.txt','w');
for i = 1:nf
fprintf(fileID,'%12.8f %12.8f %12.8f\r\n',rns(i,1),rns(i,2),rns(i,3));
fprintf(fileID,'%12.8f %12.8f %12.8f\r\n\r\n',rne(i,1),rne(i,2),rne(i,3));
end
fclose(fileID);
I would start by writing a function that creates the random rods:
function [rns,rne] = myrandom(domain,len,N)
rns = rand(N,3).*domain; % Start --> rns = bsxfun(@times,rand(N,3),domain)
rne = rand(N,3)-0.5; % End
% Unit Deltas
delta = zeros(N,1);
for k = 1:N
delta(k) = norm(rne(k,:));
end
% Length Deltas
rne = len*rne./delta; % --> rne = len*bsxfun(@rdivide,rne,delta)
% End Points
rne = rns + rne;
% remove rods that exceed the domain (end point above the upper or below the lower bound):
notValid = any(rne>domain | rne<0,2); % --> notValid = any(bsxfun(@gt,rne,domain) | rne<0,2);
rns(notValid,:)=[];
rne(notValid,:)=[];
end
This function takes the domain as [bx by bz], the length of the rods as len, and the number of rods to generate as N. Note that by using element-wise multiplication (.*) I have eliminated the first for loop.
In case you use a MATLAB version prior to R2016b, you need to use bsxfun:
In MATLAB® R2016b and later, the built-in binary functions listed in this table independently support implicit expansion.
The affected lines are marked with --> in the code (with the alternative).
The last three lines in the function remove from the result all the rods that exceed the domain size (I hope I understood you correctly on this).
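If option (i) is meant literally, that is, the rod should be shortened rather than removed, one possible approach is to scale the rod back to the point where it first leaves the box. The sketch below handles a single rod and assumes the domain spans 0 to domain(k) in each coordinate and that the start point lies inside it; the example points are made up.

```matlab
domain = [160 40 40];
s = [155 20 20];           % start point, assumed to lie inside the domain
e = [167 20 20];           % end point, sticks out of the domain in x
d = e - s;                 % direction of the rod (length included)

% Fraction of the rod at which each coordinate hits a face of the box
tHi = (domain - s) ./ d;   % upper faces, only relevant where d > 0
tLo = (0 - s) ./ d;        % lower faces, only relevant where d < 0
tHi(d <= 0) = 1;           % coordinate cannot reach the upper face
tLo(d >= 0) = 1;           % coordinate cannot reach the lower face

t = min([1 tHi tLo]);      % largest fraction of the rod that stays inside
eClipped = s + t*d;        % shortened end point, here on the face x = 160
```

A rod that already fits gets t = 1 and is left unchanged.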
Next, I call this function within a script:
% domain size
bx = 160;
by = 40;
bz = 40;
domain = [bx by bz];
lf = 12; % rod length
nf = 500; % Number of rods
[rns,rne] = myrandom(domain,lf,nf);
u = unique([rns rne],'rows');
remain = nf-size(u,1);
while remain>0
[rns_temp,rne_temp] = myrandom(domain,lf,remain);
rns = [rns;rns_temp];
rne = [rne;rne_temp];
u = unique([rns rne],'rows');
remain = nf-size(u,1);
end
After the basic definitions, the function is called and returns rns and rne, which probably contain fewer than nf rods. Then we check for duplicates and store all unique rods in u. We calculate how many rods remain to be generated, and use a while loop to generate new rods as needed. In each iteration of the loop, we append the newly created rods to those already in rns and rne, check how many unique rods we now have, and quit the loop once there are enough (at that point you can add the printing to the file).
Note that:
1. I was not sure what you mean by "in case of overlap generate another place for the new rod". Do you want to end up with more than nf rods if some are duplicates, of which nf are unique (this is what the code above does)? Or do you want to remove the duplicates and be left with only nf unique rods? In the latter case, I would move the unique step into myrandom, the function that creates the rods.
2. The while loop as written above is not efficient, since no memory is preallocated. I'm not sure that preallocation is possible if you just want to create more rods but keep the duplicates, but if not (the second option in 1 above) and if you are going to use this a lot, then preallocating is highly recommended.

Fast way to get mean values of rows accordingly to subscripts

I have data which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix whose last column contains the subscripts.
Now I want to calculate nanmean() for each subscript. I also want to save the number of rows for each subscript. I have some 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see the commented-out values in the definitions of these variables).
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = @(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: the original code checks, for each of the numel(uniqueSubs) subscripts (10,000 in this case), whether each of the N rows (1,000,000 here) belongs to that subscript, which results in 10^10 checks. The proposed code sorts the rows (sorting is N log N, thus on the order of 6,000,000 operations here) and then loops once over the full array without additional checks.
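To make the boundary bookkeeping concrete, here is the same sort-and-diff idea on a made-up five-row matrix (first column data, second column subscript). The values are illustrative only.

```matlab
% Tiny example: first column is data, second column is the subscript
M = [1 3; 2 1; 3 3; 4 2; 5 1];

M = sortrows(M, 2);                         % rows are now grouped by subscript
tmp = [0; find(diff(M(:,2))); size(M,1)];   % group boundaries: [0; 2; 3; 5]

avg = zeros(numel(tmp)-1, 1);
for iSub = 2:numel(tmp)
    % rows tmp(iSub-1)+1 .. tmp(iSub) all share one subscript
    avg(iSub-1) = mean(M(tmp(iSub-1)+1:tmp(iSub), 1));
end
% avg is [3.5; 4; 2] for subscripts 1, 2 and 3
```

Each group is a contiguous block of rows after sorting, so every row is touched exactly once in the loop.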
For completeness, here is the original code along with my version; the output shows that the two give the same result:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another
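For MATLAB versions without findgroups and splitapply, a loop-free grouping can also be sketched with accumarray. This is a different technique from the two answers above, shown only as a possible alternative; it is not benchmarked here, the sizes are scaled down, and the sums are accumulated in a different order than nanmean uses, so results may differ in the last digits.

```matlab
% Scaled-down version of the simulated data
N = 1000; K = 20;
subs = randi([1 K], N, 1);
X = randn(N, 5);
X(X < -1.2) = nan;

avg = nan(K, 5);
for c = 1:5
    ok = ~isnan(X(:,c));                        % ignore NaNs, like nanmean
    s = accumarray(subs(ok), X(ok,c), [K 1]);   % per-subscript sum
    n = accumarray(subs(ok), 1, [K 1]);         % per-subscript valid count
    avg(:,c) = s ./ n;                          % 0/0 gives NaN for empty groups
end
rowsPerSub = accumarray(subs, 1, [K 1]);        % number of rows per subscript
```

The loop runs over the 5 data columns only, not over the subscripts, so its cost does not grow with K.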

Dimensions issue

Finding maximum values of wave heights and wave lengths
dwcL01 through dwcL10 are arrays of <3001x2 double> with output from a numerical wave model.
Part of my script:
%% Plotting results from SWASH
% Examination of phase velocity on deep water with different number of layers
% Wave height 3 meters, wave period 8 sec on a depth of 30 meters
clear all; close all; clc;
T=8;
L0=1.56*T^2;
%% Loading results tables.
load dwcL01.tbl; load dwcL02.tbl; load dwcL03.tbl; load dwcL04.tbl;
load dwcL05.tbl; load dwcL06.tbl; load dwcL07.tbl; load dwcL08.tbl;
load dwcL09.tbl; load dwcL10.tbl;
M(:,:,1) = dwcL01; M(:,:,2) = dwcL02; M(:,:,3) = dwcL03; M(:,:,4) = dwcL04;
M(:,:,5) = dwcL05; M(:,:,6) = dwcL06; M(:,:,7) = dwcL07; M(:,:,8) = dwcL08;
M(:,:,9) = dwcL09; M(:,:,10) = dwcL10;
%% Finding position of wave crest using diff and sign.
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
The loop
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
works fine for ii = 1, but I get the following error for ii = 2:
Index exceeds matrix dimensions.
Error in mkPlot (line 19)
Wc(:,:,i) = M(Tp,:,i);
Since the different setups don't have the same number of wave crests, M(Tp,1,ii) naturally has different dimensions. How do I work around this issue? Can it be done in a for loop? Please feel free to email me or otherwise ask for further information.
The problem is that Tp is a three-dimensional array. I need to index Tp(:,:,ii) for the current scenario. With this change, and by defining Wc as a cell array, I solved my issue.
for ii = 1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc{ii} = M(Tp(:,:,ii),1,ii); % crest positions; the length differs per setup
L{ii} = diff(Wc{ii}); % wave lengths between successive crests
end
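The crest detection and the cell storage can be checked on a small made-up example. The two series below are hypothetical and only chosen so that they have a different number of crests.

```matlab
x = (0:6)';                                   % positions
series = {[0 1 0 2 0 1 0]', [0 1 2 3 2 1 0]'}; % two made-up surface elevations

crestX  = cell(1, numel(series));
spacing = cell(1, numel(series));
for ii = 1:numel(series)
    y = series{ii};
    % slope changes sign from + to - at a crest
    isCrest = diff(sign(diff([y(1); y]))) < 0;
    crestX{ii}  = x(isCrest);                 % variable length, hence a cell
    spacing{ii} = diff(crestX{ii});
end
% crestX{1} is [1; 3; 5] with spacing [2; 2]; crestX{2} is 3 with empty spacing
```

Because each cell holds its own vector, a setup with a single crest (and therefore no spacing) causes no dimension mismatch.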

Rolling window for averaging using MATLAB

I have the following code, pasted below. I would like to change it to only average the 10 most recently filtered images and not the entire group of filtered images. The line I think I need to change is: Yout(k,p,q) = (Yout(k,p,q) + (y.^2))/2;, but how do I do it?
j=1;
K = 1:3600;
window = zeros(1,10);
Yout = zeros(10,column,row);
figure;
y = 0; %# Preallocate memory for output
%Load one image
for i = 1:length(K)
disp(i)
str = int2str(i);
str1 = strcat(str,'.mat');
load(str1);
D{i}(:,:) = A(:,:);
%Go through the columns and rows
for p = 1:column
for q = 1:row
if(mean2(D{i}(p,q))==0)
x = 0;
else
if(i == 1)
meanvalue = mean2(D{i}(p,q));
end
%Calculate the temporal mean value based on previous ones.
meanvalue = (meanvalue+D{i}(p,q))/2;
x = double(D{i}(p,q)/meanvalue);
end
%Filtering for 10 bands, based on the previous state
for k = 1:10
[y, ZState{k}] = filter(bCoeff{k},aCoeff{k},x,ZState{k});
Yout(k,p,q) = (Yout(k,p,q) + (y.^2))/2;
end
end
end
% for k = 2:10
% subplot(5,2,k)
% subimage(Yout(k)*5000, [0 100]);
% colormap jet
% end
% pause(0.01);
end
disp('Done Loading...')
The best way to do this (in my opinion) would be to use a circular buffer to store your images. In a circular buffer, or ring buffer, the oldest data element in the array is overwritten by the newest element pushed into the array. The basics of making such a structure are described in the short MathWorks video "Implementing a simple circular buffer".
For each iteration of your main loop that deals with a single image, just load a new image into the circular buffer and then use MATLAB's built-in mean function to take the average efficiently.
If you need to apply a window function to the data, then make a temporary copy of the frames multiplied by the window function and take the average of the copy at each iteration of the loop.
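A minimal sketch of such a circular buffer is shown below. The frame size, the number of iterations and the constant-valued frames are placeholders, not taken from the question; in the real loop, frame would be the filtered image for iteration i.

```matlab
nKeep = 10;                       % number of most recent frames to average
rows = 4; cols = 5;               % hypothetical frame size
buf = zeros(rows, cols, nKeep);   % preallocated ring buffer

for i = 1:25
    frame = i * ones(rows, cols); % stand-in for the filtered image

    slot = mod(i-1, nKeep) + 1;   % 1..nKeep, wraps around and overwrites the oldest
    buf(:,:,slot) = frame;

    nValid = min(i, nKeep);       % buffer is only partly filled at the start
    avg = sum(buf, 3) / nValid;   % unused slots are still zero, so sum is safe
end
% after the loop, avg holds the mean of frames 16..25
```

Memory use stays fixed at nKeep frames, regardless of how many images are processed.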
The line
Yout(k,p,q) = (Yout(k,p,q) + (y.^2))/2;
calculates a kind of Moving Average for each of the 10 bands over all your images.
This line calculates a moving average of meanvalue over your images:
meanvalue=(meanvalue+D{i}(p,q))/2;
For both you will want to add a buffer structure that keeps only the last 10 images.
To simplify it, you can also just keep all in memory. Here is an example for Yout:
Change this line: (Add one dimension)
Yout = zeros(3600,10,column,row);
And change this:
% preallocate once, before the loop over the images
YoutAvg = zeros(10,column,row);
[...]
for q = 1:row
[...]
%filtering for 10 bands, based on the previous state
for k = 1:10
[y, ZState{k}] = filter(bCoeff{k},aCoeff{k},x,ZState{k});
Yout(i,k,p,q) = y.^2;
end
% average the (at most) 10 most recent images for this pixel
start = max(1, i-10+1); % MATLAB indices start at 1, not 0
for k = 1:10
YoutAvg(k,p,q) = mean(Yout(start:i,k,p,q));
end
end
Then, to display, use
subimage(squeeze(YoutAvg(k,:,:))*5000, [0 100]);
You would do something similar for meanvalue.