Matlab creating median on entire movie (Avoid memory issue) - matlab

My computer is not very powerful, to say the least.
Yet I want to compute the per-pixel median over an entire movie.
I was able to do it for a sequence of frames held in memory, but I am not sure how to do it when reading a batch of frames at a time. How do I weight the median?
(For example, I'll read 100 frames each time, but the median has to be updated from something like current median * 100 * number of batches read so far + 100 * current image.)
I have this code:
mov = VideoReader('MVI_3478.MOV');
seq = read(mov, [1 frames]);
% create background
channels = size(seq, 3);
height = size(seq,1);
width = size(seq,2);
BG = zeros(height, width, channels, 'uint8');
for c = 1:channels
for y = 1:height
for x = 1:width
BG(y,x,c) = median(seq(y,x,c,:));
end
end
end
My question is: given that I will add another loop around everything, how do I weight the median?
Thanks!

The median cannot be calculated this way. The required information is lost.
Example:
median([1,2,3,4,5,6,7]) is 4
median([1,2,3,3,5,6,7]) is 3
median([1,2,3])=2
median([4,5,6,7])=5.5
median([3,5,6,7])=5.5
Thus, for both sequences the two subsequences give the same partial results, 2 and 5.5, while the overall median is 4 in one case and 3 in the other.
The only possibility I see is some binary search approach:
smaller=0
larger=0
equal=0
el=numel(s)
while(smaller>=el/2||larger>el/2||equal==0)
guess=..
smaller=0
larger=0
equal=0
for c = 1:channels
for y = 1:height
for x = 1:width
s=seq(y,x,c,:)
smaller=smaller+numel(s(s<guess));
larger=larger+numel(s(s>guess));
equal=equal+numel(s(s==guess));
end
end
end
end
This is only a sketch, the code has to be completed. Guess has to be filled with some binary search strategy.
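One way the sketch could be completed, assuming uint8 pixel values: bisect the value range 0..255 separately for every pixel, counting in each pass how many frames are at or below the current guess. Eight passes over the movie are enough, and only one chunk of frames is in memory at a time. The names chunkSize, lo, hi and cnt are my own, a synthetic array stands in for the movie, and the comparison relies on implicit expansion (R2016b or later). For an even number of frames this returns the lower of the two middle values rather than their average, which is what median would give.

```matlab
rng(0);
seq = uint8(randi([0 255], 6, 5, 3, 101));   % synthetic stand-in: height x width x channels x frames
nFrames   = size(seq, 4);
chunkSize = 25;                               % frames held in memory at once (assumed value)
k = ceil(nFrames / 2);                        % rank of the (lower) median

lo = zeros(size(seq,1), size(seq,2), size(seq,3));   % per-pixel lower bound of the median
hi = 255 * ones(size(lo));                           % per-pixel upper bound of the median

for pass = 1:8                                % 8 halvings cover the 256 possible values
    guess = floor((lo + hi) / 2);
    cnt = zeros(size(lo));                    % frames with value <= guess, per pixel
    for first = 1:chunkSize:nFrames
        last  = min(first + chunkSize - 1, nFrames);
        chunk = double(seq(:, :, :, first:last));    % with a real movie: read(mov, [first last])
        cnt = cnt + sum(chunk <= guess, 4);
    end
    found = cnt >= k;                         % the median is <= guess for these pixels
    hi(found)  = guess(found);
    lo(~found) = guess(~found) + 1;
end
BG = uint8(lo);                               % lo == hi everywhere after the last pass
```

The price is that the movie is read eight times, so this trades disk time for memory.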

With a large number of frames, calculating the median progressively is a problem, because the median is a global order statistic and cannot be built up from partial medians. The classical method exploits the fact that we are working with 8-bit grayscale values (256 levels). For every pixel position (x,y) one maintains a histogram with 256 bins, where each bin must be able to count up to n (as there are n frames).
Thus at each update we will have:
value = double(p(x,y,i)) + 1; %bin index for the ith frame; +1 because indices start at 1 and value 0 needs a bin
H(x,y,value) = H(x,y,value) + 1; %updating your histogram
Then take the cumulative sum of the histogram counts and pick the first bin at which it reaches half the number of frames: https://math.stackexchange.com/questions/202302/how-to-calculate-median-and-standard-deviation-from-histogram
The size of each counter can be decided from the number of frames n in the video: N = ceil(log2(n+1)) bits. The median search is now simple, since it is a constant-time search within a 256-bin histogram. This also helps when combining many histograms (the counts just add), since the search remains constant time, independent of the number of frames.
Thus the total size of your histograms would be 256*X*Y*N bits, where X and Y are the dimensions of your image.
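A minimal sketch of this histogram approach, assuming grayscale uint8 frames that arrive one at a time. The synthetic frames array and the names rr, cc and bin are mine, not from the question, and an odd frame count is used so that the median is an actual data value.

```matlab
rng(1);
frames = uint8(randi([0 255], 4, 5, 51));   % synthetic grayscale movie: height x width x frames
[h, w, n] = size(frames);
H = zeros(h, w, 256, 'uint32');             % one 256-bin histogram per pixel

[rr, cc] = ndgrid(1:h, 1:w);                % row and column index of every pixel
for i = 1:n
    value = double(frames(:, :, i)) + 1;    % bin index 1..256 (grey level 0 goes into bin 1)
    idx = sub2ind(size(H), rr, cc, value);  % linear index of each pixel's bin
    H(idx) = H(idx) + 1;                    % each pixel hits exactly one bin, so indices never collide
end

% Median: first bin whose cumulative count reaches half of the frames.
C   = cumsum(H, 3);
bin = sum(C < ceil(n / 2), 3) + 1;          % number of bins still below the target, plus one
BG  = uint8(bin - 1);                       % back from bin index to grey level
```

Note that the histograms themselves are not small: with uint32 counters a 1920x1080 grayscale frame needs about 2 GB, so on a weak machine it may be worth using uint16 counters (enough for up to 65535 frames) or processing the image in tiles.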

How do you apply a custom median filter with threshold?

I am trying to create a custom median filter for my image processing where for a 3x3 neighbourhood, the central pixel (being changed) is excluded. My kernel is therefore
1 1 1
1 0 1
1 1 1
But I only want to change the central pixel to the median of the surrounding pixels if its value deviates from them by more than some threshold. E.g. if the pixel is more than 10 times the median of the surrounding pixels, then the central pixel value is changed to the median.
I've looked at using ordfilt2 and I can create a median filter with it. But I am not sure how I can implement the threshold condition. I am essentially trying to remove any outliers within my image which meet the threshold condition within my kernel.
Thanks for any help.
You don't have a single function for doing that, but ordfilt2 is a good start.
N = uint8([1 1 1 ; 1 0 1 ; 1 1 1]); % neighborhood, faster with integer class
J = (ordfilt2(I,4,N) + ordfilt2(I,5,N))/2; % median of even set (if I is uint8, convert it to double first, since the sum saturates at 255)
M = I>J+10; % put here your threshold method
Out = I;
Out(M) = J(M);
Rem: question already asked here, but without any good answer IMO.
I suggest the following approach:
%defines input
A = repmat(1:5,5,1);
%step 1: median filtering, ignoring the central pixel
fun = @(x) median([x(1:ceil(length(x(:))/2-1)),x(ceil(length(x(:))/2+1):end)]);
filteredA = nlfilter(A,[3 3],fun);
%step 2: changing each pixel, only if it is 10 times bigger (or smaller) than the median
result = A;
changeMask = (A./filteredA)>10 | (A./filteredA)<0.1;
result(changeMask ) = filteredA(changeMask);

Length scaling orientation data in MATLAB

Problem
I have a data set describing geological structures. Each structure has a row with two attributes - its length and orientation (0-360 degrees).
Within this data set, there are two types of structure.
Type 1: fewer data points, but the structures are physically larger (large length, and so more significant).
Type 2: more data points, but the structures are physically smaller (small length, and so less significant).
I want to create a rose plot to show the spread of the structures' orientations. However, I want this plot to also represent the significance of the structures in combination with the direction they face - taking into account the lengths.
Is it possible to scale this by length in MATLAB somehow so that the subset which is less numerous is not under represented, when the structures are large?
Example
A data set might contain:
10 structures orientated North-South, 50km long.
100 structures orientated East-West, 0.5km long.
In this situation the East-West population would look to be more significant than the North-South population based on absolute numbers. However, in reality the length of the members contributing to this population are much smaller and so the structures are less significant.
Code
This is the code I have so far:
load('WG_rose_data.xy')
azimuth = WG_rose_data(:,2);
length = WG_rose_data(:,1);
rose(azimuth,20);
Where WG_rose_data.xy is a data file with 2 columns containing the length and azimuth (orientation) data for the geological structures.
For each row in your data, you could duplicate it a given number of times, according to its length value. Therefore, if you had a structure with length 50, it counts for 50 data points, whereas a structure with length 1 only counts as 1 data point. Of course you have to round your lengths since you can only have integer numbers of rows.
This could be achieved like so, with your example data in the matrix d
% Set up example data: 10 large vertical structures, 100 small ones perpendicular
d = [repmat([0, 50], 10, 1); repmat([90, .5], 100, 1)];
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for ii = 1:size(d,1)
% make d(ii,2) = length copies of d(ii,1) = orientation
d1(end+1:end+ceil(d(ii,2))) = d(ii,1);
end
Output rose plot:
You could fine tune how to duplicate the data to achieve the desired balance of actual data and length weighting.
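As an alternative to duplicating rows, the lengths could be summed per direction bin, which avoids rounding short structures up to length 1. This is only a sketch under my own assumptions: 24 bins, the example data from above, and polarhistogram (R2016b or later) standing in for rose. The names edges, bin and total are mine.

```matlab
% Example data from above: [azimuth in degrees, length in km]
d = [repmat([0, 50], 10, 1); repmat([90, 0.5], 100, 1)];
az  = deg2rad(d(:,1));                       % rose/polarhistogram work in radians
len = d(:,2);

nBins = 24;
edges = linspace(0, 2*pi, nBins + 1);
bin   = discretize(mod(az, 2*pi), edges);    % direction bin of each structure
total = accumarray(bin, len, [nBins 1]);     % summed length per bin instead of a count

% Plot precomputed bin values as a rose-style diagram.
polarhistogram('BinEdges', edges, 'BinCounts', total);
```

With the example data the North-South bin sums to 500 km and the East-West bin to 50 km, so the larger structures dominate the plot even though they are fewer.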
Thanks for all the help with this. This code is my final working version for reference:
clear all
close all
% Input dataset
original_data = load('WG_rose_data.xy');
d = [];
%reformat azimuth
d(:,1)= original_data(:,2);
%reformat length
d(:,2)= original_data(:,1);
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for a = 1:size(d,1)
d1(end+1:end+ceil(d(a,2))) = d(a,1);
end
%create opposite directions for rose diagram
length_d1_azi = length(d1);
d1_op_azi=zeros(1,length_d1_azi);
for i = 1:length_d1_azi
d1_op_azi(i)=d1(i)-180;
if d1_op_azi(i) < 1;
d1_op_azi(i) = 360 - (d1_op_azi(i)*-1);
end
end
%join calculated opposites to original input
new_length = length_d1_azi*2;
all=zeros(new_length,1);
for i = 1:length_d1_azi
all(i)=d1(i);
end
for j = length_d1_azi+1:new_length;
all(j)=d1_op_azi(j-length_d1_azi);
end
%convert input array into radians to plot
d1_rad=degtorad(all);
rose(d1_rad,24)
set(gca,'View',[-90 90],'YDir','reverse');

How can I detect the minimum and maximum values every 50 rows

I'm trying to detect peak values in MATLAB using the findpeaks function. The problem is that my data consists of 4200 rows and I just want to detect the minimum and maximum point in every 50 rows. Afterwards I'll use this code for real-time accelerometer data.
This is my code:
[peaks,peaklocations] = findpeaks( filteredX, 'minpeakdistance', 50 );
plot( x, filteredX, x( peaklocations ), peaks, 'or' )
So you want to first reshape your vector into 50 sample rows and then compute the peaks for each row.
A = randn(4200,1);
B = reshape (A,[50,size(A,1)/50]); %//which gives B the structure of 50*84 Matrix
pks=zeros(50,size(A,1)/50); %//pre-define and set to zero/NaN for stability
pklocations = zeros(50,size(A,1)/50); %//pre-define and set to zero/NaN for stability
for i = 1:size(A,1)/50
[p,loc] = findpeaks(B(:,i)); %//peaks of segment i, you can alter the parameters of the findpeaks function.
pks(1:numel(p),i) = p;
pklocations(1:numel(loc),i) = loc;
end
This generates two matrices, pks and pklocations, with one column per segment. The downside, of course, is that you do not know in advance how many peaks each segment will yield, and every column of a matrix must have the same length, so I padded them with zeros; you can pad with NaN if you prefer.
EDIT, since the OP is looking for only 1 maximum and 1 minimum for each 50 samples, this can easily be satisfied by the min/max function in MATLAB.
A = randn(4200,1);
B = reshape (A,[50,size(A,1)/50]); %//which gives B the structure of 50*84 Matrix
[pks,pklocations] = max(B);
[trghs,trghlocations] = min(B);
I guess alternatively, you could do a max(pks), but it is simply making it complicated.
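Note that max and min applied to B return row numbers within each 50-sample block, not positions in the original vector. A small sketch of the conversion, with a synthetic signal standing in for filteredX and names (blockLen, offset, maxIdx, minIdx) of my own choosing; it assumes the signal length is a multiple of the block length:

```matlab
rng(2);
filteredX = randn(4200, 1);                  % synthetic stand-in for the filtered signal
blockLen  = 50;
nBlocks   = numel(filteredX) / blockLen;     % assumes the length divides evenly

B = reshape(filteredX, blockLen, nBlocks);   % one block per column
[maxVals, maxRow] = max(B, [], 1);           % maximum of each block and its row within the block
[minVals, minRow] = min(B, [], 1);

% Convert within-block row numbers to indices into the original vector.
offset = (0:nBlocks-1) * blockLen;
maxIdx = maxRow + offset;
minIdx = minRow + offset;
```

The markers can then be drawn on top of the signal as in the question, e.g. plot(x, filteredX, x(maxIdx), maxVals, 'or', x(minIdx), minVals, 'ob').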

Histogram Equalization method without use of histeq

I am new to Matlab and am trying to implement code to perform the same function as histeq without actual use of the function. In my code the image colour I get changes drastically when it should not change that much. The average intensity in the image (ranging between 0 and 255) is 105.3196. The image is of an open source pollen particle.
Any help would be much appreciated. The sooner the better! Please could any help be simplified as my Matlab understanding is limited. Thanks.
clc;
clear all;
close all;
pollenJpg = imread ('pollen.jpg', 'jpg');
greyscalePollen = rgb2gray (pollenJpg);
histEqPollen = histeq(greyscalePollen);
averagePollen = mean2 (greyscalePollen)
sizeGreyScalePollen = size(greyscalePollen);
rowsGreyScalePollen = sizeGreyScalePollen(1,1);
columnsGreyScalePollen = sizeGreyScalePollen(1,2);
for i = (1:rowsGreyScalePollen)
for j = (1:columnsGreyScalePollen)
if (greyscalePollen(i,j) > averagePollen)
greyscalePollen(i,j) = greyscalePollen(i,j) + (0.1 * averagePollen);
if (greyscalePollen(i,j) > 255)
greyscalePollen(i,j) = 255;
end
elseif (greyscalePollen(i,j) < averagePollen)
greyscalePollen(i,j) = greyscalePollen(i,j) - (0.1 * averagePollen);
if (greyscalePollen(i,j) > 0)
greyscalePollen(i,j) = 0;
end
end
end
end
figure;
imshow (pollenJpg);
title ('Original Image');
figure;
imshow (greyscalePollen);
title ('Attempted Histogram Equalization of Image');
figure;
imshow (histEqPollen);
title ('True Histogram Equalization of Image');
To implement the equalisation algorithm described on the Wikipedia page, follow these steps:
Decide on a binSize to group greyscale values. (This is a tweakable, the larger the bin, the less accurate the result from the ideal case, but I think it can cause problems if chosen too small on real images).
Then, calculate the probability of a pixel being a shade of grey:
pixelCount = imageWidth * imageHeight
histogram = all zero
for each pixel in image at coordinates i, j
histogram[floor(pixel / binSize) + 1] += 1 / pixelCount // 1-based arrays, not 0-based
// Note a technicality here: you may need to
// write special code to handle pixels of 255,
// because they will fall in their own bin. Or instead use rounding with an offset.
The histogram in this calculation is scaled (divided by the pixel count) so that the values make sense as probabilities. You can of course factor the division out of the for loop.
Now you need to calculate the accumulative sum of this:
histogramSum = all zero // length of histogramSum must be one bigger than histogram
for i = 1 .. length(histogram)
histogramSum[i + 1] = histogramSum[i] + histogram[i]
Now you have to invert this function and this is the tricky part. The best is to not calculate an explicit inverse, but calculate it on the spot, and apply it on the image. The basic idea is to search for the pixel value in the histogramSum (find the closest index below), and then do a linear interpolation between the index and the next index.
foreach pixel in image at coordinates i, j
hIndex = findIndex(pixel, histogramSum) // You have to write findIndex, it should be simple
equalisationFactor = (pixel - histogramSum[hIndex])/(histogramSum[hIndex + 1] - histogramSum[hIndex]) * binSize
// This above is the linear interpolation step.
// Notice the technicality that you need to handle:
// histogramSum[hIndex + 1] may be out of bounds
equalisedImage[i, j] = pixel * equalisationFactor
Edit: without drilling into the maths, I can't be 100% sure, but I think that division by 0 errors are possible. These can occur if one bin is empty, so consecutive sums are equal. So you need special code to handle this case too. The best you can do is take the value for the factor as halfway between hIndex, hIndex + n, where n is the highest value for which histogramSum[hIndex + n] == histogramSum[hIndex].
And that should be it, once you have dealt with all the technicalities.
The above algorithm is slow (especially in the findIndex step). You may be able to optimize this with a special lookup datastructure. But only do that when it's working, and only if necessary.
One more thing about your Matlab code: the rows and columns are inverted. Because of the symmetry in the algorithm, the result is the same, but it can cause puzzling bugs in other algorithms, and be very confusing if you examine pixel values during debugging. In the pseudocode above I used them the same as you, though.
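For comparison, here is a compact MATLAB sketch of the cumulative-histogram mapping given on the Wikipedia page, using one bin per grey level so that no interpolation is needed. It is not a translation of the pseudocode above and is not claimed to match histeq exactly. The synthetic low-contrast image and the names counts, cdf, cdfMin and lut are my own assumptions.

```matlab
rng(3);
img = uint8(randi([80 140], 64, 64));        % synthetic low-contrast image (stand-in for pollen.jpg)

counts = accumarray(double(img(:)) + 1, 1, [256 1]);   % histogram, one bin per grey level
cdf    = cumsum(counts) / numel(img);                   % cumulative distribution, ends at 1

cdfMin = min(cdf(counts > 0));               % CDF value at the darkest level actually present
lut = uint8(round((cdf - cdfMin) / (1 - cdfMin) * 255));   % lookup table: old level -> new level

equalised = lut(double(img) + 1);            % apply the table to every pixel
```

The darkest level present maps to 0 and the brightest to 255. A constant image would make 1 - cdfMin zero, so that case needs its own guard.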
Relatively few (5) lines of code can do this. I used a low contrast file called 'pollen.jpg' that I found at http://commons.wikimedia.org/wiki/File%3ALepismium_lorentzianum_pollen.jpg
I read it in using your code, run all the above, then do the following:
% find out the index of pixels sorted by intensity:
[gv gi] = sort(greyscalePollen(:));
% create a table of "approximately equal" intensity values:
N = numel(gv);
newVals = repmat(0:255, [ceil(N/256) 1]);
% perform lookup:
% the pixels in sorted order need new values from "equal bins" table:
newImg = zeros(size(greyscalePollen));
newImg(gi) = newVals(1:N);
% if the size of the image doesn't divide into 256, the last bin will have
% slightly fewer pixels in it than the others
When I run this algorithm, and then create a composite of the four images (original, your attempt, my attempt, and histeq), you get the following:
I think it's convincing. The images are not exactly identical - I believe that is because the matlab histeq routine ignores all pixels with value 0. Since it is fully vectorized it is also pretty fast (although not nearly as fast as histeq, by about a factor 15 on my image).
EDIT: a bit of explanation might be in order. The repmat command I use to create the newVals matrix creates a matrix that looks like this:
0 1 2 3 4 ... 255
0 1 2 3 4 ... 255
0 1 2 3 4 ... 255
...
0 1 2 3 4 ... 255
Since matlab stores matrices in "first index first" order, if you read this matrix with a single index (as I do in the line newVals(1:N)), you access first all the zeros, then all the ones, etc:
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 ...
So - when I know the indices of the pixels in the order of their intensity (as returned by the second argument of the sort command, which I called gi), then I can easily assign the value 0 to the first N/256 pixels, the value 1 to the next N/256 etc, with the command I used:
newImg(gi) = newVals(1:N);
I hope this makes the code a little easier to understand.

How to master the random generator in MATLAB for application to bootstrap artificial neural network models

I am working with hydrological time series data and I am attempting to construct Bootstrap Artificial Neural Network models. In order to provide an uncertainty assessment using confidence intervals, one must make sure when resampling/Bootstrapping the original time series data set, that every value in the original time series is held back at least twice within all bootstrap samples in order to calculate the variance and confidence intervals at that point in time.
To give some background:
I am using a hydrological time series that contains Standard Precipitation Index values at monthly time steps, this time series spans 429 (rows) x 1 (column), let's call this time series vector X. All elements/values of X are normalized and standardized between 0 and 1.
Time series X is then trained against some Target values (same length and conditions as X) in a Neural Network to produce new estimates of the Target values, we'll call this output vector, O (same length and conditions as X).
I am now to take X and resample it ii =1:1:200 times (i.e. Bootstrap size = 200) for length(429) with replacement. Let's call the matrix where all the bootstrap samples are placed, M. I use B = randsample(X, length(X), true) and fill M using a for loop such that M(:,ii) = B. Note: I also make sure to include rng('shuffle') after my randsample statement to keep the RNG moving to new states in hopes that it will provide more random results.
Now I am to test how "well" my data was resampled for use in creating confidence intervals.
My procedure is as follows:
1. Generate a for loop to create M using above procedure
2. Create a new variable Xc, this will hold all values of X that were not resampled in bootstrap sample ii for ii = 1:1:200
3. For j=1:1:length(X) fill 'Xc' using the Xc(j,ii) = setdiff(X, M(:,ii)), if element j exists in M(:,ii) fill Xc(j,ii) with NaN.
4. Xc is now a matrix the same size and dimensions as M. Count the amount of NaN values in each row of Xc and place in vector CI.
5. If any row in CI is > [Bootstrap sample size, for this case (200) - 1], then no confidence interval can be created at this point.
When I run this I find that the values chosen from my set X are almost always repeated, i.e. the same values of X are used to generate all the samples in M. It's roughly the same ~200 data points in my original time series that are always chosen to create the new bootstrap samples.
How can I effectively alter my program or use any specific functions that will allow me to avoid the negative solution in (5)?
Here is an example of my code, but please keep in mind the variables used in the script may differ from my text in here.
Thank you for the help and please see the code below.
for ii = 1:1:Blen % for loop to create 'how many bootstraps we desire'
B = randsample(Xtrain, wtrain, true); % bootstrap resamples of data series 'X' for 'how many elements' with replacement
rng('shuffle');
M(:,ii) = B; % creates a matrix of all bootstrap resamples with respect to the amount created by the for loop
[C,IA] = setdiff(Xtrain,B); % creates a vector containing all elements of 'Xtrain' that were not included in bootstrap sample 'ii' and the location of each element
[IAc] = setdiff(k,IA); % creates a vector containing locations of elements of 'Xtrain' used in bootstrap sample 'ii' --> ***IA + IAc = wtrain***
for j = 1:1:wtrain % for loop that counts each row of vector
if ismember(j,IA)== 1 % if the count variable is equal to a value of 'IA'
XC(j,ii) = Xtrain(j,1); % place variable in matrix for sample 'ii' in position 'j' if statement above is true
else
XC(j,ii) = NaN; % hold position with a NaN value to state that this value has been used in bootstrap sample 'ii'
end
dum1(:,ii) = wtrain - sum(isnan(XC(:,ii))); % dummy variable to permit transposing of 'IAs' limited by 'isnan' --> used to calculate amt of elements in IA
dum2(:,ii) = sum(isnan(XC(:,ii))); % dummy variable to permit transposing of 'IAsc' limited by 'isnan'
IAs = transpose(dum1) ; % variable counting amount of elements not resampled in 'M' at set 'i', ***i.e. counts 'IA' for each resample set 'i'
IAsc = transpose(dum2) ; % variable counting amount of elements resampled in 'M' at set 'i', ***i.e. counts 'IAc' for each resample set 'i'
chk = isnan(XC); % returns 1 in position of NaN and 0 in position of actual value
chks = sum(chk,2); % counts how many NaNs are in each row for length of time training set
chks_cnt = sum(chks(:)<(Blen-1)); % counts how many values of the original time series that can be provided a confidence interval, should = wtrain to provide complete CIs
end
end
This doesn't appear to be a problem with randsample, but rather a problem in your other code somewhere. randsample does the right thing. For example:
x = (1:10)';
nSamples = 10;
for iter = 1:100;
data(:,iter) = randsample(x,nSamples ,true);
end;
hist(data(:)) %this is approximately uniform
randsample samples quite randomly...
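One place to look in the other code: setdiff compares values and returns them sorted and without repetitions, so time steps that happen to share a value cannot be told apart. A sketch that tracks row indices instead, assuming the same sizes as in the question (429 time steps, 200 bootstrap samples); the synthetic series and the names idx, inBag, heldOut and nUsable are mine. It also seeds the generator once before the loop rather than calling rng('shuffle') on every iteration, which should not be needed.

```matlab
rng(4);                                % seed once, before any sampling
X    = rand(429, 1);                   % synthetic stand-in for the SPI series
n    = numel(X);
Blen = 200;                            % number of bootstrap samples

idx = randi(n, n, Blen);               % row indices drawn with replacement, one column per sample
M   = X(idx);                          % bootstrap samples, same layout as M in the question

inBag = false(n, Blen);
for ii = 1:Blen
    inBag(idx(:, ii), ii) = true;      % mark every time step that was drawn in sample ii
end

heldOut = sum(~inBag, 2);              % number of samples that left each time step out
nUsable = sum(heldOut >= 2);           % time steps held back at least twice
```

On average each time step is left out of roughly 37% of the samples, so with 200 samples every time step should be held back far more than twice.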