How to do a median projection of a large image stack in Matlab

I have a large stack of 800 16-bit grayscale images, each 2048x2048 px. They are read from a single BigTIFF file, and the whole stack barely fits into my RAM (8 GB).
Now I need to do a median projection. That means I want to compute the median of each pixel across all 800 frames. The Matlab median function fails because there is not enough memory left to make a copy of the whole array for the function call. What would be an efficient way to compute the median?
I have tried using a for loop to compute the median one pixel at a time, but this is still terribly slow.

Iterating over blocks, as @Shai suggests, may be the most straightforward solution. If you run into this problem frequently, you may want to consider converting the image to a mat-file, so that you can access the pixels as an n-D array directly from disk.
%# convert to mat file (run once)
matObj = matfile('dest.mat', 'Writable', true);
matObj.data(2048, 2048, numSlices) = 0; %# preallocates the array on disk
for t = 1:numSlices
    matObj.data(:,:,t) = imread(tiffFile, 'Index', t);
end
%# load a block of the matfile to take the median (run as part of a loop over blocks)
medianOfBlock = median(matObj.data(1:128, 1:128, :), 3);
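Once the mat-file exists, the projection can be assembled tile by tile. A minimal sketch, assuming the image side divides evenly by the tile size:
tile = 128;
medProj = zeros(2048, 2048);
for r = 1:tile:2048
    for c = 1:tile:2048
        block = matObj.data(r:r+tile-1, c:c+tile-1, :); % one tile, all slices
        medProj(r:r+tile-1, c:c+tile-1) = median(double(block), 3);
    end
end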

I bet that the distributions of the individual pixel values over the stack (i.e. the histograms of the pixel jets) are sparse.
If that's the case, the amount of memory needed to keep all the pixel histograms is much less than 2K x 2K x 64k: you can use a compact hash map to represent each histogram, and update them loading the images one at a time. When all updates are done, you go through your histograms and compute the median of each.
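A minimal sketch of this idea for a single pixel jet, using containers.Map as the compact histogram (the jet variable stands in for the 800 values of one pixel across the stack; all names here are illustrative):
jet = randi([0, 65535], 1, 800);                 % stand-in data for one pixel
h = containers.Map('KeyType', 'double', 'ValueType', 'double');
for v = jet                                      % update the sparse histogram
    if isKey(h, v), h(v) = h(v) + 1; else, h(v) = 1; end
end
ks = sort(cell2mat(keys(h)));                    % occurring gray levels, sorted
cs = cumsum(cell2mat(values(h, num2cell(ks))));
lo = ks(find(cs >= 400, 1, 'first'));            % 400th smallest value
hi = ks(find(cs >= 401, 1, 'first'));            % 401st smallest value
med = (lo + hi) / 2;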

If you have access to the Image Processing Toolbox, Matlab has a tool for handling large images called blockproc.
From the docs:
To avoid these problems, you can process large images incrementally: reading, processing, and finally writing the results back to disk, one region at a time. The blockproc function helps you with this process.
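Note that blockproc operates on a single large 2-D image, so it won't compute the stack median by itself, but it illustrates the incremental read-process-write pattern. A minimal sketch, with the file name and the 3x3 median filter as placeholders:
fun = @(blockStruct) medfilt2(blockStruct.data, [3 3]); % runs on one tile at a time
blockproc('largeImage.tif', [512 512], fun, 'Destination', 'filtered.tif');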

I will try my best to help, although I have neither an 800-frame TIFF stack nor an 8 GB machine to test on, so treat this as a sketch of an approach.
First, 800 * 2048 * 2048 * 2 bytes (16 bit) = 6.7 GB, not including headers. With your 8 GB of RAM that can in principle fit at once, but other running programs and fragmentation of contiguous memory easily get in the way. Anyway, let's treat the problem as though Matlab can't load the whole stack into memory.
As Jonas suggests, imread supports loading a TIFF image by index. It also supports a PixelRegion parameter, so you can read just a part of each slice if you want to combine this with Shai's block idea, as shown below.
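For example, reading only the top-left 128x128 region of slice t (tiffFile as in the code above):
blk = imread(tiffFile, 'Index', t, 'PixelRegion', {[1 128], [1 128]});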
I came up with a median algorithm that doesn't need all the data at once: it merely scans through a sequence of unordered values, one at a time, while maintaining 256 counters.
data = randi([0,255], 1, 800);           % stand-in for one pixel's 800 values
bins = num2cell(zeros(256,1,'uint16'));  % one counter per gray level
for ii = 1:800
    bins{data(ii)+1} = bins{data(ii)+1} + 1;
end
% clearvars data
s = cumsum(cell2mat(bins));
if any(s == 400)
    % the 400th and 401st smallest values fall in different bins: average them
    med = ( find(s == 400, 1, 'first') + ...
            find(s > 400, 1, 'first') ) / 2 - 1;
else
    % both middle values fall in the same bin
    med = find(s > 400, 1, 'first') - 1;
end
It's not very efficient, not least because of the for loop. But the benefit is that instead of keeping 800 raw values in memory, only 256 counters are kept; since the counters need uint16, they are roughly equivalent to 512 raw values. If you are confident that no gray level occurs more than 255 times among the 800 samples of any pixel, you can use uint8 counters instead and halve that memory.
The above code is for one pixel. I'm still thinking about how to expand it to a 2048x2048 version, something like:
for ii = 1:800
    img_data = randi([0,255], 2048, 2048); % stands in for one loaded frame
    % (update the counters with this frame's values)
end
By doing so, for each iteration, you only need these kept in memory:
One frame of image;
A set of counters;
A few supplemental variables, with size comparable to one frame of image.
I use a cell array to store the counters. According to this post, a cell array can be pre-allocated while its elements are still stored non-contiguously in memory. That means the 256 counters (512 * 2048 * 2048 bytes in total) can be stored in separate chunks, which is quite reasonable for your 8 GB RAM. Obviously my sample code does not take advantage of this, though, since it preallocates with bins = num2cell(zeros(...)).
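For what it's worth, here is a sketch of a full-frame version using one plain uint16 counter array instead of a cell array, still assuming 8-bit data as in the demo (tiffFile as in Jonas' code; the last step takes the lower median for brevity):
nBins = 256;
counts = zeros(2048, 2048, nBins, 'uint16');  % ~2 GB of counters
[r, c] = ndgrid(1:2048, 1:2048);              % row/column index of every pixel
for t = 1:800
    img = imread(tiffFile, 'Index', t);       % one frame in memory at a time
    idx = sub2ind(size(counts), r, c, double(img) + 1);
    counts(idx) = counts(idx) + 1;            % vectorized histogram update
end
cs = cumsum(counts, 3);
[~, medIdx] = max(cs >= 400, [], 3);          % first bin reaching count 400
medProj = medIdx - 1;                         % bin index back to gray level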

Related

averaging over 3rd dim, within a cell array, as efficiently as possible

I haven't found a good solution for this problem; it takes forever, and as far as I can tell this is mainly due to the data being stored in a cell array.
I process movie data in this format:
[data{1:4}] = deal(int16(randi([0 255],200,400,100))); %200px x 400px, 100 frames, 4 similar movies
Where data represents 4 different but similar movies in a cell array. Now I would like to take the average of the 4 variables data{1:4}, frame by frame. This is what I came up with:
data_avg = zeros(200, 400, size(data{1},3), 'int16'); % preallocate the result
for frame = 1:size(data{1},3)
    tmp = zeros(200, 400, 'int16');
    for ind = 1:4
        tmp = tmp + data{ind}(:,:,frame);
    end
    data_avg(:,:,frame) = tmp ./ 4;
end
Is there a more efficient way to do this (faster, without doubling the RAM usage)? I haven't found one.
The fastest approach is simply:
data_avg= (data{1}+data{2}+data{3}+data{4})/4;
no need for a for loop.
This is slower: mean(double(cat(4,data{:})),4), because Matlab's mean carries some overhead and the double conversion quadruples the memory footprint. Your for loop falls in between.
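If you want to verify this on your own machine, a minimal timing sketch with timeit might look like:
[data{1:4}] = deal(int16(randi([0 255], 200, 400, 100)));
f1 = @() (data{1} + data{2} + data{3} + data{4}) / 4;
f2 = @() mean(double(cat(4, data{:})), 4);
fprintf('direct sum: %.4f s, mean over cat: %.4f s\n', timeit(f1), timeit(f2));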

Memory Limitation when Extracting VLAD from SIFT Descriptors in VLFeat with Matlab

I recently asked how to extract VLAD from SIFT descriptors in VLFeat with Matlab here.
However, I am running up against memory limitations. I have 64GB RAM and 64GB Swap.
all_descr = single([sift_descr{:}]);
... produces a memory error:
Requested 128x258438583 (123.2GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a
long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
What is the correct way to extract VLAD when we have a very large training dataset? For example, I could subset the SIFT descriptors before running vl_kmeans, like this:
all_descr = single([sift_descr{1:100000}]);
This works within memory, but how will it affect my VLAD features? Thank you.
The main issue here is not how to extract the VLAD for such a matrix: you calculate the VLAD vector of each image by looping over all images and computing each one individually, so the memory problem doesn't appear there.
You run out of memory when you try to concatenate the SIFT descriptors of all images, in order to cluster them into visual words using a nearest neighbor search:
all_descr = single([sift_descr{:}]);
centroids = vl_kmeans(all_descr, 64);
The simplest way I can think of is to switch to MATLAB's own kmeans function, which is included in the Statistics and Machine Learning toolbox.
It comes with support for Tall Arrays, i.e. MATLAB's datatype for arrays which don't fit into the memory.
To do that, you can save the SIFT descriptors of each image into a CSV file using csvwrite:
for k = 1:size(filelist, 1)
    I = imread([repo filelist(k).name]);
    I = single(rgb2gray(I));
    [f, d] = vl_sift(I);                                  % d is 128 x n
    csvwrite(sprintf('sift/im_%d.csv', k), single(d.'));  % one row per descriptor
end
Then, you can load the descriptors as a Tall Array by using the datastore function and converting it to tall. As the result will be a table, you can use table2array to convert it to a tall array.
ds = datastore('sift/*.csv');
all_descriptors = table2array(tall(ds));
Finally, you can call the kmeans function on that and get your centroids:
[~, centroids] = kmeans(all_descriptors, 64);
Now you can proceed with calculating the VLAD vector as usual.
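For completeness, a sketch of that last step with VLFeat, assuming centroids from the kmeans call above is an in-memory 64 x 128 matrix (kmeans returns centers as rows, while VLFeat expects them as columns, hence the transpose):
centers = single(centroids.');                    % 128 x 64, one center per column
kdtree = vl_kdtreebuild(centers);
vlad = cell(1, numel(sift_descr));
for k = 1:numel(sift_descr)
    descrs = single(sift_descr{k});               % 128 x n descriptors of image k
    nn = vl_kdtreequery(kdtree, centers, descrs); % nearest center per descriptor
    assign = zeros(size(centers,2), size(descrs,2), 'single');
    assign(sub2ind(size(assign), double(nn), 1:size(descrs,2))) = 1;
    vlad{k} = vl_vlad(descrs, centers, assign);   % 128*64 x 1 VLAD vector
end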

Matlab Horzcat - Out of memory

Any trick to avoid an out of memory error in matlab?
I am assuming that the reason it shows up is that matlab is inefficient in using horzcat and actually needs to temporarily duplicate the matrices.
I have a matrix A with size 108977555 x 25. I want to merge this with three vectors d, m and y with size 108977555 x 1 each.
My machine has 32 GB RAM, and the matrix and vectors above occupy 18 GB.
Now I want to run the following command:
A = [A(:,1:3), d, m, y, A(:,5:end)];
But that yields the error:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Any trick to do this merge?
Working with Large Data Sets. If you are working with large data sets, you need to be careful when increasing the size of an array to avoid getting errors caused by insufficient memory. If you expand the array beyond the available contiguous memory of its original location, MATLAB must make a copy of the array and set this copy to the new value. During this operation, there are two copies of the original array in memory.
Restart matlab; I often find it doesn't fully clean up its memory, or the memory gets fragmented, leading to lower maximal array sizes.
Change your datatype (if you can). E.g. if you're only dealing with numbers 0-255, use uint8; the memory size will be reduced by a factor of 8 compared to an array of doubles.
Start off with A already large enough (i.e. 108977555x27 instead of 108977555x25, since your expression drops column 4) and insert in place:
A(:, 4) = d;
clear d
A(:, 5) = m;
clear m
A(:, 6) = y;
Merge the data into one datatype to reduce the total memory requirement, e.g. a date easily fits into a single uint32.
Leave the data separated; think about why you want the data in one matrix in the first place, and whether that is really necessary.
Use C-code to do the data allocation yourself (only if you're really desperate)
Further reading: https://nl.mathworks.com/help/matlab/matlab_prog/memory-allocation.html
Even if you could make it work using Gunther's suggestions, the merged matrix would just sit there occupying more than half of your available memory. What are you planning to do with it then? Even a simple B = A + 1 doesn't fit. The only things you can do are reductions like sum, or operations on parts of the array.
So you should consider moving to tall arrays and related big-data concepts, which are meant exactly for working with such large datasets.
https://www.mathworks.com/help/matlab/tall-arrays.html
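A minimal tall-array sketch, assuming A has first been exported to one or more CSV shards (the file pattern is a placeholder):
ds = datastore('A_part*.csv');      % hypothetical shard files on disk
tA = table2array(tall(ds));         % tall numeric array, evaluated lazily
colMeans = gather(mean(tA, 1));     % nothing is computed until gather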
You can first try the efficient memory management strategies as mentioned on the official mathworks site : https://in.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Use single (4 bytes) or some other smaller data type instead of double (8 bytes) if your code can work with it.
If possible use block processing (like rows or columns) i.e. store blocks as separate mat files and load and access only those parts of the matrix which are required.
Use the matfile command for loading large variables in parts. Perhaps something like this:
save('A.mat','A','-v7.3')
oldMat = matfile('A.mat');
clear A
newMat = matfile('Anew.mat','Writable',true); % empty matfile
for i = 1:27
    if i < 4,  newMat.A(:,i) = oldMat.A(:,i); end
    if i == 4, newMat.A(:,i) = d; end
    if i == 5, newMat.A(:,i) = m; end
    if i == 6, newMat.A(:,i) = y; end
    if i > 6,  newMat.A(:,i) = oldMat.A(:,i-2); end % old column 4 is dropped
end

4 dimensional matrix

I need to use a 4-dimensional matrix as an accumulator for voting on 4 parameters. Each parameter varies in the range 1-300. For that, I define Acc = zeros(300,300,300,300) in MATLAB, and somewhere I use, for example:
Acc(4,10,120,78) = Acc(4,10,120,78) + 1;
However, MATLAB raises an error because of the memory limitation:
??? Error using ==> zeros
Out of memory. Type HELP MEMORY for your options.
Below you can see part of my code:
I = imread('image.bmp'); % I is a logical 300x300 image
Acc = zeros(100,100,100,100);
for i = 1:300
    for j = 1:300
        if I(i,j) == 1
            for x0 = 3:3:300
                for y0 = 3:3:300
                    for a = 3:3:300
                        b = abs(j-y0) / sqrt(1 - ((i-x0)^2) / (a^2));
                        b1 = floor(b/3);
                        if b1 == 0
                            b1 = 1;
                        end
                        a1 = ceil(a/3);
                        Acc(x0/3, y0/3, a1, b1) = Acc(x0/3, y0/3, a1, b1) + 1;
                    end
                end
            end
        end
    end
end
As @Rasman mentioned, you probably want to use a sparse representation of the matrix Acc.
Unfortunately, the sparse function is geared toward 2D matrices, not arbitrary n-D.
But that's ok, because we can take advantage of sub2ind and linear indexing to go back and forth to 4D.
Dims = [300, 300, 300, 300]; % it will be a 300 by 300 by 300 by 300 matrix
Acc = sparse([], [], [], prod(Dims), 1, ExpectedNumElts);
Here ExpectedNumElts should be some number like 30 or 9000 or however many non-zero elements you expect for the matrix Acc to have. We notionally think of Acc as a matrix, but actually it will be a vector. But that's okay, we can use sub2ind to convert 4D coordinates into linear indices into the vector:
ind = sub2ind(Dims, 4, 10, 120, 78);
Acc(ind) = Acc(ind) + 1;
You may also find the functions find, nnz, spy, and spfun helpful.
edit: see lambdageek for the exact same answer with a bit more elegance.
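Applied to the voting loop from the question (which bins each parameter down to 100 values), the change might look like:
Dims = [100, 100, 100, 100];
Acc = sparse([], [], [], prod(Dims), 1, 1e6); % 1e6 is just a guess at nnz
% ...and inside the innermost loop, replace the 4-D update with:
ind = sub2ind(Dims, x0/3, y0/3, a1, b1);
Acc(ind) = Acc(ind) + 1;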
The other answers are guiding you to use a sparse matrix instead of your current dense solution. This is made a little more difficult by the fact that current Matlab doesn't support N-dimensional sparse arrays. One way to implement it is to
replace
zeros(100,100,100,100)
with
sparse(100*100*100*100,1)
This will store all your counts in a sparse array; as long as most entries remain zero, you will be fine memory-wise.
then to access this data, instead of:
Acc(h,i,j,k) = Acc(h,i,j,k) + 1;
use:
index = h + 100*(i-1) + 100*100*(j-1) + 100*100*100*(k-1); % 1-based linear index, equivalent to sub2ind
Acc(index,1) = Acc(index,1) + 1;
See Avoiding 'Out of Memory' Errors
Your zeros(300,300,300,300) statement alone would require 300^4 * 8 bytes, i.e. around 65 GB of RAM.
Solutions to 'Out of Memory' problems fall into two main categories:

Maximizing the memory available to MATLAB (i.e., removing or increasing limits) on your system via operating system selection and system configuration. These usually have the greatest overall applicability but are potentially the most disruptive (e.g. using a different operating system). These techniques are covered in the first two sections of this document.

Minimizing the memory used by MATLAB by making your code more memory efficient. These are all algorithm and application specific and therefore are less broadly applicable. These techniques are covered in later sections of this document.
In your case the latter seems to be the solution: try reducing the amount of memory used/required.

Loading multiple images in MATLAB

Here is the desired workflow:
I want to load 100 images into the MATLAB workspace
Run a bunch of my code on the images
Save my output (the output returned by my code is an integer array) in a new array
By the end I should have a data structure storing the output of the code for images 1-100.
How would I go about doing that?
If you know the name of the directory they are in, or if you cd to that directory, then use dir to get the list of image names.
Now it is simply a for loop to load in the images. Store the images in a cell array. For example...
D = dir('*.jpg');
imcell = cell(1, numel(D));
for i = 1:numel(D)
    imcell{i} = imread(D(i).name);
end
BEWARE that these 100 images can take up a lot of memory. For example, a single 1Kx1K image requires 3 megabytes to store as uint8 RGB values. That may not seem like a huge amount, but 100 such images already require 300 MB of RAM. The real issue comes about if your operations on these images turn them into doubles: they will then take up 2.4 GIGAbytes of memory. This will quickly eat up your RAM, especially if you are not using a 64-bit version of MATLAB.
Assuming that your images are named in a sequential way, you could do this:
N = 100;
IMAGES = cell(1, N);
FNAMEFMT = 'image_%d.png';
% Load images
for i = 1:N
    IMAGES{i} = imread(sprintf(FNAMEFMT, i));
end
% Run code
RESULT = cell(1, N);
for i = 1:N
    RESULT{i} = someImageProcessingFunction(IMAGES{i});
end
The cell array RESULT then contains the output for each image.
Be aware that depending on the size of your images, prefetching the images might make you run out of memory.
As many have said, this can get pretty big. Is there a reason you need ALL of these in memory when you are done? Could you write the individual results out as files as you finish with them, so that you never have more than one input and one output image in memory at a given time?
IMWRITE would be good to get them out of memory when you are done.
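A minimal sketch of that streaming pattern, reusing FNAMEFMT and the placeholder someImageProcessingFunction from the answer above:
for i = 1:N
    img = imread(sprintf(FNAMEFMT, i));        % load one image
    out = someImageProcessingFunction(img);    % run your code on it
    imwrite(out, sprintf('result_%d.png', i)); % write the result, then move on
end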