Loading multiple images in MATLAB

Here is the desired workflow:
I want to load 100 images into the MATLAB workspace.
Run a bunch of my code on the images.
Save my output (my code returns an integer array) in a new array.
By the end I should have a data structure storing the output of the code for images 1-100.
How would I go about doing that?

If you know the name of the directory they are in, or if you cd to that directory, then use dir to get the list of image names.
Now it is simply a for loop to load in the images. Store the images in a cell array. For example...
D = dir('*.jpg');                 % all JPEG files in the current directory
imcell = cell(1, numel(D));
for i = 1:numel(D)
    imcell{i} = imread(D(i).name);
end
BEWARE that these 100 images may take up a fair amount of memory. For example, a single 1Kx1K image requires 3 megabytes to store as uint8 RGB values. That may not seem like much, but 100 of these images will require 300 MB of RAM. The real issue comes if your operations on these images convert them to doubles: they will then take up 2.4 gigabytes of memory. This will quickly eat up the RAM you have, especially if you are not using a 64-bit version of MATLAB.
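If you want to check that arithmetic for your own image sizes, a small sketch like the following works (the 1024x1024x3 dimensions are just the example figures from above):
h = 1024; w = 1024; channels = 3; nImages = 100;
bytesUint8  = h*w*channels*1;   % 1 byte per sample as uint8
bytesDouble = h*w*channels*8;   % 8 bytes per sample as double
fprintf('One image: %.1f MB (uint8), %.1f MB (double)\n', bytesUint8/2^20, bytesDouble/2^20);
fprintf('All %d images: %.1f MB (uint8), %.1f GB (double)\n', nImages, nImages*bytesUint8/2^20, nImages*bytesDouble/2^30);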

Assuming that your images are named in a sequential way, you could do this:
N = 100;
IMAGES = cell(1,N);
FNAMEFMT = 'image_%d.png';
% Load images
for i = 1:N
    IMAGES{i} = imread(sprintf(FNAMEFMT, i));
end
% Run code
RESULT = cell(1,N);
for i = 1:N
    RESULT{i} = someImageProcessingFunction(IMAGES{i});
end
The cell array RESULT then contains the output for each image.
Be aware that depending on the size of your images, prefetching the images might make you run out of memory.
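Since the original question stores an integer array per image, one extra step you may want is collapsing the cell array into a single numeric matrix. This is only a sketch, and it assumes every output of someImageProcessingFunction is a row vector of the same length:
% Each RESULT{i} assumed to be a 1-by-M row vector of the same length M;
% the result is an N-by-M matrix with one row per image.
RESULT_MATRIX = cell2mat(RESULT(:));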

As many have said, this can get pretty big. Is there a reason you need ALL of these in memory when you are done? Could you write the individual results out as files when you are done with them such that you never have more than the input and output images in memory at a given time?
IMWRITE would be good to get them out of memory when you are done.
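A minimal sketch of that streaming approach, assuming the image_%d.png naming from the answer above and a hypothetical processing function processOneImage, so that only one input and one output image are ever in memory at once:
outDir = 'results';
if ~exist(outDir, 'dir'), mkdir(outDir); end
for i = 1:100
    img = imread(sprintf('image_%d.png', i));                     % load one input image
    out = processOneImage(img);                                   % placeholder for your own code
    imwrite(out, fullfile(outDir, sprintf('result_%d.png', i)));  % write result and forget it
    % (if the result is not image-like, save(...) to a .mat file works just as well)
end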

Related

Good use of memory

If I create a cell array with 1000 matrices (each matrix of size 800x1280), will clearing each matrix after using it speed up calculations?
Example:
A = cell(1000,1);
for i = 1:1000
    A{i} = rand(800,1280);
end
image = A{1};
image2 = A{2}; % I will use image and image2 with other functions
A{1} = [];
A{2} = [];
EDIT
The real use of the cell will be like:
A = cell(1000,1);
parfor i = 1:1000
    A{i} = function_that_creates_image(800,1280); % image of size 800x1280 px
end
for i = 1:number_of_images % number_of_images = 1000 in this case
    image1 = A{1};
    image2 = A{2};
    A{1} = [];
    A{2} = [];
    % image1 and image2 will then be used in the next lines
    % next lines of code
end
I noticed that computing the elements of A in a parfor loop is faster than computing each element one at a time inside a regular for loop.
If you want to use less memory and speed up calculations, it's wiser to avoid using cells. Luckily, it's very easy in your case, since all your matrices are the same size, so you could use an ND-array.
A = zeros(800,1280,1000);
for k = 1:size(A,3)
    A(:,:,k) = function_that_creates_image(800,1280);
end
image = A(:,:,1);
image2 = A(:,:,2); % I will use image and image2 with other functions
EDIT:
If you want to further process each image, I would save them to a file within the parfor, so you will have 1000 .mat files at the end of the first loop:
parfor k = 1:number_of_images
    A = function_that_creates_image(800,1280);
    save(['images_dir\image' num2str(k) '.mat'], 'A'); % note: inside parfor, save usually has to be wrapped in a small helper function to avoid a transparency error
end
then you can load them as needed for processing using load:
for k = 1:number_of_images-1
    image1 = load(['images_dir\image' num2str(k) '.mat']);   % load returns a struct; the image itself is image1.A
    image2 = load(['images_dir\image' num2str(k+1) '.mat']);
    % do what you want with those images...
end
This way you only keep 2 images in memory at a time, and on the next iteration they are replaced by the next images.
If everything fits in memory (you need at least 16 GB to hold the data and work on parts of it; to work on the full beast at once you should have 32 GB), clearing these entries won't change anything at all. If it doesn't fit, I would assume/hope Matlab and Windows are smart enough to optimize which chunk is held in memory and which is put on disk, so again deleting won't help. But you might not want to rely on that.
What you can do is store A{i} = 'path-to-file'; and load each image into memory only for the time it is needed. But why do you even need to first load all the images and then work on them one by one? It would be much better for memory to simply create image1 = rand(...); and image2 = rand(...); in the loop itself and reuse them; there is no need for A at all.
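A rough sketch of the path-in-the-cell idea, assuming the images were already written to .mat files as in the answer above (so the variable saved inside each file is named A):
paths = cell(1000,1);
for i = 1:1000
    paths{i} = fullfile('images_dir', ['image' num2str(i) '.mat']); % path only, negligible memory
end
for i = 1:999
    s1 = load(paths{i});     % load returns a struct
    s2 = load(paths{i+1});
    image1 = s1.A;           % 'A' is the variable name used when saving
    image2 = s2.A;
    % ... use image1 and image2, then let them be overwritten next iteration ...
end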
In general, tall arrays are the memory-friendly solution to reach for when you want to work with more data than fits in RAM at once: https://www.mathworks.com/help/matlab/tall-arrays.html

Read 2D grey images and combine them into a 3D matrix

I am having some problems in MATLAB 2015a (Windows 10 x64, 16 GB RAM).
There is a bunch of images (1280x960, 8-bit) that I want to load into a 3D matrix. In theory that matrix should take ~1.2 GB for 1001 images.
What I have so far is:
values = zeros(960, 1280, 1001, 'uint8');
for i = Start:Steps:End
    file = strcat(folderStr, filenameStr, num2str(i), '.png');
    img = imread(file);
    values(:,:,i-Start+1) = img;
end
This code works for a small number of images, but using it for all 1001 images I get an "Out of memory" error.
Another problem is the speed.
Reading and storing 50 images takes ~2 s, but reading 100 images takes ~48 s.
What I thought this method does is allocate the memory once and then fill in the "z-slices" of the matrix picture by picture. But obviously it holds more memory than needed to perform that single task.
Is there any method to store the grey values of a 2D picture sequence in a 3D matrix in MATLAB without wasting that much time and those resources?
Thank you
The only possibility I can see is that your indexes are bad, but I can only guess because the values of Start, Steps and End are not given. If End is 100000000, Start is 1 and Steps is 100000000, you are only reading 2 images, but you are accessing values(:,:,100000000), thus making the variable incredibly huge. That is, most likely, your problem.
To solve this, create a new variable:
imagenames = Start:Steps:End; % note: using End as a variable name is asking for trouble; something like 'ending' is safer
for ii = 1:numel(imagenames)
    file = strcat(folderStr, filenameStr, num2str(imagenames(ii)), '.png');
    img = imread(file);
    values(:,:,ii) = img;
end
As Shai suggests, have a look at fullfile for cleaner file-name building.
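For example (a one-line sketch using the variables from the loop above), fullfile inserts the correct file separator for you:
file = fullfile(folderStr, [filenameStr num2str(imagenames(ii)) '.png']);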

How to work around cell arrays that exceed memory capacity?

I have a problem with insufficient memory (RAM) when reading meteorological data (GRIB files), amounting to 35 GB, into a MATLAB cell array.
How can I work around my RAM restrictions when I load big data sets?
I have tried to preallocate the cell arrays, but that does not help. Loading stops at about 70% of the data set.
Here is the FOR-loop that errors:
% load grib files
for ii = 1:number_files
    waitbar(ii/number_files, h);
    file_name = [fname, '\', num2str(ii), '.grb'];
    grib_struct = read_grib(file_name, -1);
    Temp{ii} = single(grib_struct(1,1).fltarray);
    Rad_direct{ii} = single(grib_struct(1,2).fltarray);
    Rad_diff{ii} = single(grib_struct(1,3).fltarray);
    fclose('all');
end
Thanks!
You can use the matfile command to work directly against the file system. It stores all the data you put into it directly on disk rather than in RAM. It will be slow, but it is possible.
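A minimal sketch of what that could look like for the loop above, assuming every fltarray has the same length (a fixed grid), so each field can be written column by column into a MAT-file instead of growing cell arrays in RAM (grib_data.mat is a placeholder name; read_grib and the field layout are taken from the question):
m = matfile('grib_data.mat', 'Writable', true);   % data go to disk, not RAM
for ii = 1:number_files
    file_name = fullfile(fname, [num2str(ii) '.grb']);
    grib_struct = read_grib(file_name, -1);
    temp = single(grib_struct(1,1).fltarray(:));
    M = numel(temp);
    m.Temp(1:M, ii)       = temp;                                % column ii = file ii
    m.Rad_direct(1:M, ii) = single(grib_struct(1,2).fltarray(:));
    m.Rad_diff(1:M, ii)   = single(grib_struct(1,3).fltarray(:));
    fclose('all');
end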

Quickest way to search txt/bin/etc file for numeric data greater than specified value

I have a 37,000,000x1 double array saved in a MAT-file under a structure labelled r. I can point to this file using matfile(...) and then just use the find(...) command to find all values above a threshold val.
This finds all the values greater than or equal to 0.004, but given the size of my data it takes some time.
I want to reduce that time and have considered using bin files (apparently they are better than txt files in terms of not losing precision?), but I'm not knowledgeable about the syntax/method.
I've managed to save the data into a bin file, but what is the quickest way to search through this large file?
The only output data I want are the actual values greater than my specified value.
Is using a bin file best? Or a matfile?
I don't want to load the entire file into MATLAB; I want to conserve MATLAB memory, as other programs may need the space and I don't want memory errors again.
As @OlegKomarov points out, a 37,000,000-element array of doubles is not very big. Your real problem may be that you don't have enough RAM and/or are using a 32-bit version of Matlab. The find function requires additional memory for the input and for the output array of indices.
If you want to load and process your data in chunks, you can use the matfile function. Here's a small example:
fname = [tempname '.mat'];                  % Use temp directory file for example
matObj = matfile(fname,'Writable',true);    % Create MAT-file
matObj.r = rand(37e4,1e2);                  % Write random data to r variable in file
szR = size(matObj,'r');                     % Get dimensions of r variable in file
idx = [];
for i = 1:szR(2)
    idx = [idx; find(matObj.r(:,i)>0.999)]; % Find indices of r greater than 0.999
end
delete(fname);                              % Delete example file
This will save you memory, but it is definitely not faster than storing everything in memory and calling find once. File access is always slower (though an SSD helps a bit). The code above grows the idx variable dynamically, but the memory is only re-allocated a few times in large chunks, which can be quite fast in current versions of Matlab.

How to do a median projection of a large image stack in Matlab

I have a large stack of 800 16-bit grayscale images of 2048x2048 px each. They are read from a single BigTIFF file and the whole stack barely fits into my RAM (8 GB).
Now I need to do a median projection. That means I want to compute the median of each pixel across all 800 frames. The Matlab median function fails because there is not enough memory left to make a copy of the whole array for the function call. What would be an efficient way to compute the median?
I have tried using a for loop to compute the median one pixel at a time, but this is still terribly slow.
Iterating over blocks, as @Shai suggests, may be the most straightforward solution. If you do have this problem frequently, you may want to consider converting the image to a mat-file, so that you can access the pixels as an n-d array directly from disk.
%# convert to mat file
matObj = matfile('dest.mat', 'Writable', true);
matObj.data(2048,2048,numSlices) = 0;              % preallocate the array on disk
for t = 1:numSlices
    matObj.data(:,:,t) = imread(tiffFile, 'Index', t);
end
%# load a block of the matfile to take median (run as part of a loop)
medianOfBlock = median(matObj.data(1:128,1:128,:), 3);
I bet that the distributions of the individual pixel values over the stack (i.e. the histograms of the pixel jets) are sparse.
If that's the case, the amount of memory needed to keep all the pixel histograms is much less than 2K x 2K x 64k: you can use a compact hash map to represent each histogram, and update them loading the images one at a time. When all updates are done, you go through your histograms and compute the median of each.
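A sketch of that idea for a single pixel jet, using containers.Map as the compact histogram. The variable names and the randi stand-in data are illustrative; nFrames = 800 is even, so the median is the average of the 400th and 401st order statistics:
nFrames = 800;
pixelValues = randi([0 65535], 1, nFrames, 'uint16');         % stand-in for one pixel across the stack

h = containers.Map('KeyType','double','ValueType','double');  % sparse histogram: value -> count
for t = 1:nFrames
    v = double(pixelValues(t));
    if isKey(h, v), h(v) = h(v) + 1; else, h(v) = 1; end
end

keysSorted = sort(cell2mat(keys(h)));                    % occurring gray levels, ascending
counts = cell2mat(values(h, num2cell(keysSorted)));      % their counts, in the same order
c = cumsum(counts);
lowIdx  = find(c >= nFrames/2,     1, 'first');          % 400th order statistic
highIdx = find(c >= nFrames/2 + 1, 1, 'first');          % 401st order statistic
medValue = (keysSorted(lowIdx) + keysSorted(highIdx)) / 2;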
If you have access to the Image Processing Toolbox, Matlab has a set of tools for handling large images built around blockproc.
From the docs:
To avoid these problems, you can process large images incrementally: reading, processing, and finally writing the results back to disk, one region at a time. The blockproc function helps you with this process.
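blockproc is not a stack-median in itself, but as a generic illustration of processing a large on-disk TIFF tile by tile (the file names and the median filter here are only placeholders):
fun = @(blockStruct) medfilt2(blockStruct.data, [3 3]);   % any per-block operation
blockproc('bigInput.tif', [1024 1024], fun, ...           % reads the file in 1024x1024 tiles
          'Destination', 'bigOutput.tif');                % result streamed back to disk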
I will try my best to provide help (if any), because I don't have an 800-frame TIFF stack, nor an 8GB computer, but I want to see whether my thinking can form a solution.
First, 800*2048*2048 bytes = 3.2 GB at 8 bits per pixel, not including headers (roughly twice that, ~6.4 GB, for the 16-bit images in the question). With your 8GB RAM it may be possible to store it all at once, but other running programs can easily chop up the contiguous memory. Anyway, let's treat the problem as if Matlab can't load it as a whole into memory.
As Jonas suggests, imread supports loading a TIFF image by index. It also supports a PixelRegion parameter, so you can also consider accessing parts of the image through this parameter if you want to use Shai's idea.
I came up with a median algorithm that doesn't use all the data at the same time; it merely scans once through a sequence of unordered data, one element at a time, while keeping a memory of 256 counters.
data = randi([0,255], 1, 800);
bins = num2cell(zeros(256,1,'uint16'));
for ii = 1:800
    bins{data(ii)+1} = bins{data(ii)+1} + 1;
end
% clearvars data
s = cumsum(cell2mat(bins));
if any(s==400)
    med = ( find(s==400, 1, 'first') + ...
            find(s>400, 1, 'first') ) / 2 - 1;
else
    med = find(s>400, 1, 'first') - 1;
end
It's not very efficient, not least because it uses a for loop. But the benefit is that instead of keeping 800 raw values in memory, only 256 counters are kept; the counters need to be uint16, so they take roughly as much space as 512 raw uint8 values. If you are confident that for any pixel the same grayscale level never occurs more than 255 times among the 800 samples, you can use uint8 counters and halve that memory.
The above code is for one pixel. I'm still thinking about how to expand it to a 2048x2048 version, along these lines:
for ii = 1:800
    img_data = randi([0,255], 2048, 2048);
    % (do stats stuff)
end
By doing so, for each iteration, you only need these kept in memory:
One frame of image;
A set of counters;
A few supplemental variables, with size comparable to one frame of image.
I use a cell array to store the counters. According to this post, a cell array can be pre-allocated while its elements can still be stored non-contiguously in memory. That means the 256 counters (512*2048*2048 bytes for the full frame) can be stored separately, which is quite reasonable for your 8GB RAM. But obviously my sample code does not make use of this, since it builds bins from one contiguous array via num2cell(zeros(...)).
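For what it's worth, here is one possible full-frame version of the counting idea, kept as a sketch and still assuming 8-bit pixel values as above: a 2048x2048x256 uint16 counter cube (the 2 GB just mentioned), updated one frame at a time, with the median then read off the running counts (the lower median, since the frame count is even).
nFrames = 800;  H = 2048;  W = 2048;
counts = zeros(H, W, 256, 'uint16');                 % one 256-bin histogram per pixel (~2 GB)
base = reshape(1:H*W, H, W);                         % linear index of each pixel within bin 1

for t = 1:nFrames
    frame = randi([0 255], H, W, 'uint8');           % stand-in; e.g. imread(tiffFile, 'Index', t)
    lin = base + double(frame) * (H*W);              % shift each pixel into the bin of its value
    counts(lin) = counts(lin) + 1;                   % one increment per pixel per frame
end

% Median per pixel: first gray level whose running count reaches half the frames.
acc = zeros(H, W, 'uint16');
done = false(H, W);
medImg = zeros(H, W, 'uint8');
for v = 1:256
    acc = acc + counts(:,:,v);
    newly = ~done & (acc >= nFrames/2);
    medImg(newly) = v - 1;
    done = done | newly;
end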