MATLAB: making a histogram plot from csv files read and put into cells? - matlab

Unfortunately I am not too tech proficient and only have a basic MATLAB/programming background...
I have several csv data files in a folder, and would like to make a histogram plot of all of them simultaneously in order to compare them. I am not sure how to go about doing this. Some digging online gave a script:
d=dir('*.csv'); % return the list of csv files
for i=1:length(d)
m{i}=csvread(d(i).name); % put into cell array
end
The problem is I cannot now simply write histogram(m(i)) command, because m(i) is a cell type not a csv file type (I'm not sure I'm using this terminology correctly, but MATLAB definitely isn't accepting the former).
I am not quite sure how to proceed. In fact, I am not sure what exactly is the nature of the elements m(i) and what I can/cannot do with them. The histogram command wants a matrix input, so presumably I would need a 'vector of matrices' and a command which plots each of the vector elements (i.e. matrices) on a separate plot. I would have about 14 altogether, which is quite a lot and would take a long time to load, but I am not sure how to proceed more efficiently.
Generalizing the question:
I will later be writing a script to reduce the noise and smooth out the data in the csv file, and binarise it (the csv files are for noisy images with vague shapes, and I want to distinguish these shapes by setting a cut off for the pixel intensity/value in the csv matrix, such as to create a binary image showing these shapes). Ideally, I would like to apply this to all of the images in my folder at once so I can shift out which images are best for analysis. So my question is, how can I run a script with all of the csv files in my folder so that I can compare them all at once? I presume whatever technique I use for the histogram plots can apply to this too, but I am not sure.
It should probably be better to write a script which:
-makes a histogram plot and/or runs the binarising script for each csv file in the folder
-and puts all of the images into a new, designated folder, so I can sift through these.
I would greatly appreciate pointers on how to do this. As I mentioned, I am quite new to programming and am getting overwhelmed when looking at suggestions, seeing various different commands used to apparently achieve the same thing- reading several files at once.

The function csvread returns natively a matrix. I am not sure but it is possible that if some elements inside the csv file are not numbers, Matlab automatically makes a cell array out of the output. Since I don't know the structure of your csv-files I will recommend you trying out some similar functions(readtable, xlsread):
M = readtable(d(i).name) % Reads table like data, most recommended
M = xlsread(d(i).name) % Excel like structures, but works also on similar data
Try them out and let me know if it worked. If not please upload a file sample.

The function csvread(filename)
always return the matrix M that is numerical matrix and will never give the cell as return.
If you have textual data inside the .csv file, it will give you an error for not having the numerical data only. The only reason I can see for using the cell array when reading the files is if the dimensions of individual matrices read from each file are different, for example first .csv file contains data organised as 3xA, and second .csv file contains data organised as 2xB, so you can place them all into a single structure.
However, it is still possible to use histogram on cell array, by extracting the element as an array instead of extracting it as cell element.
If M is a cell matrix, there are two options for extracting the data:
M(i) and M{i}. M(i) will give you the cell element, and cannot be used for histogram, however M{i} returns element in its initial form which is numerical matrix.
TL;DR use histogram(M{i}) instead of histogram(M(i)).

Related

MATLAB: How to convert data from an excel file (several sheets) to a matrix?

I try to process data from an excel file from several sheets (~200). Luckily the data is always at the same position in each sheet. I wrote the following code for this purpose which unfortunately does not work since I have problems with converting the cell entries to a matrix. Any idea how to solve that? Thanks!
[~, sheet_name] = xlsfinfo('mydata.xlsx');
for k=1:numel(sheet_name)
data{k}=xlsread('mydata.xlsx',sheet_name{k},'A7:A14');
b{k}=cell2mat(data{k}) %this line does not work...
end```
You are not using the cell2mat command on the whole cell.
data{k}
will already give you a matrix that contains the 7 entries of the sheet k.
You either need to use cell2mat on the whole cell after the loop:
b=cell2mat(data)
or
b(k)=data{k}
within the loop.

Is there a way to stack multiple 2x2 matrices in MATLAB into a multidimensional array in a simpler way (e.g. without using "cat" or "reshape")?

I am receiving a text file with 1000 matrices of size 2x2 each day from someone, in the following format (only 3 matrices are shown here instead of 1000):
0.96875000 0.03125000
0.03125000 0.96875000
0.96875000 0.01562500
0.03125000 0.98437500
0.99218800 0.03125000
0.00781250 0.96875000
I need to make a 2x2x1000 array in MATLAB. Ideally I could do something simple like:
[0.96875000 0.03125000
0.03125000 0.96875000;
0.96875000 0.01562500
0.03125000 0.98437500;
0.99218800 0.03125000
0.00781250 0.96875000]
After reading the MATLAB documentation on multidimensional arrays and the MATLAB documentation for the cat function, I figured out that I could make the required array in the following way (the first argument of cat is 3 because I'm concatenating the 2x2 matrices along the 3rd dimension):
cat(3,...
[0.96875000 0.03125000
0.03125000 0.96875000],...
[0.96875000 0.01562500
0.03125000 0.98437500],...
[0.99218800 0.03125000
0.00781250 0.96875000])
But that does not work if I put spacing between the lines as in my "ideal" example above, and the need for all the commas and dots makes it a bit uglier in my opinion.
While writing this question, I have discovered that I can run my "ideal" example and then use reshape, which I prefer over my solution using the cat function. For this, I don't even need the semi-colons. However Cris Luengo correctly pointed out in the comments that reshape is not enough and permute is also needed, and then Luis Mendo pointed out in chat that the solution is not so simple:
permute(reshape(ideal.',2,2,[]),[2 1 3])
Andras Deak has done what we thought was impossible, which is to remove the transpose, but the solution is still quite complicated, and was not easy to engineer:
permute(reshape(ideal,2,[],2),[1 3 2])
Ideally one would not need to use cat or reshape to make a 3D array, when the original data is already so nicely formatted in what the human eye can already see is a 3D array of several 2x2 matrices.
Is there a simpler way to build the 3D array in MATLAB using the data in the format I have?
So far I have done the following on my own:
Searched online and found the above two MATLAB documentation articles which lead me to the above solution using cat
Came up with the above solution using reshape while writing this question, then it got improved by Cris and Luis in the comments and chat 😊.
Also: I tried saving the data in a .txt file and clicked import in MATLAB, knowing that the import GUI gives some options for how the data is to be organized in the resulting MATLAB array, but there did not seem to be any option to make this a 3D array.
Indeed there is no "direct" way to import this text as a 3D matrix. This is the easiest way I can come up with:
Save the input as a .txt file
Use the import tool (Import Data button in the Variable toolbar) to import the data as a Mx2 matrix. Choose "Numeric Matrix" as "Output Type". And you can "exclude rows with" "blank cells" to avoid the empty rows.
Besides reshape() and permute(), using cell array to format it as below might be more intuitive and less error prone to someones.
% The number of 2x2 matrices
N = size(m,1)/2;
% Split each 2x2 matrix into a cell
c = mat2cell(m, 2*ones(1,N), 1);
% Concatenate along the 3rd dimension
output3DMatrix = cat(3, c{:});

Manipulating large sets of data in Matlab, asking for advice on a few things, cells and numeric array operations, with performance in mind

This is a cross-post from here:
Link to post in the Mathworks community
Currently I'm working with large data sets, I've saved those data set as matlab files with the two biggest files being 9.5GB and 5.9GB.
These files contain a cell array each of 1x8 (this is done for addressibility and to prevent mixing up data from each of the 8 cells and I specifically wanted to avoid eval).
Each cell then contains a 3D double matrix, for one it's 1001x2002x201 and the other it is 2003x1001x201 (when process it I chop of 1 row at the end to get it to 2002).
Now I'm already running my script and processing it on a server (64 cores and plenty of RAM, matlab crashed on my laptop, as I need more than 12GB ram on windows). Nonetheless it still takes several hours to finish running my script and I still need to do some extra operations on the data which is why I'm asking advice.
For some of the large cell arrays, I need to find the maximum value of the entire set of all 8 cells, normally I would run a for loop to get the maximum of each cel and store each value in a temporay numeric array and then use the function max again. This will work for sure I'm just wondering if there's a better more efficient way.
After I find the maximum I need to do a manipulation over all this data as well, normally I would do something like this for an array:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, adress each cell and run this, however I'm not sure how efficient that would be though. Do you think there's a better way then a for loop that's not extremely complicated/difficult to implement?
There's one more thing I need to do which is really important, each cell as I said before is a slice (consider it time), while inside each slide is the value for a 3D matrix/plot. Now I need to integrate the data so that I get more slices. The reason I need to do this that I need to create slices/frames/plots to create a movie/gif. I'm planning on plotting the 3d data using scatter3 where this data is represented by color. I plan on using alpha values to make it see through so that one can actually see the intensity in this 3d plot. However I understand how to use griddata but apparently it's quite slow. Some of the other methods where hard to understand. Thus what would be the best way to interpolate these (time) slices in an efficient way over the different cells in the cell array? Please explain it if you can, preferably with an example.
I've added a pic for the Linux server info I'm running it on below, note I can not update the matlab version unfortunately, it's R2016a:
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2,L04==2)) % check if this section needs to be executed based on parameters set at top of file
load('../loadfilewithpathnameonmypc.mat')
E_field_650nm_intAll=cell(1,8); %create empty cell array
parfor ee=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
E_field_650nm_intAll{ee}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
for qq=1:2:xres
tt=(qq+1)/2; %consecutive number instead of spacing 2
T1=griddata(Xsall{ee},Ysall{ee},EfieldsAll{ee}(:,:,qq)',XIT,ZIT,'natural'); %change data on non-uniform grid to uniform gridded data
E_field_650nm_intAll{ee}(:,:,tt)=T1; %fill up each cell with uniform data
end
end
clear T1
clear qq tt
clear ee
save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
if ~exist('E_field_650nm_intAll','var') % if variable not in workspace load it
load('../loadanotherfilewithpathnameonmypc.mat');
end
parfor tt=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
CFxLight{tt}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cells 1 to 8
for qq=1:xres
CFs=Cafluo3D{tt}(1:lxq2,:,qq)'; %get matrix slice and tranpose matrix for point-wise multiplication
CFxLight{tt}(:,:,qq)=CFs.*E_field_650nm_intAll{tt}(:,:,qq); %point-wise multiple the two large matrices for each cell and put in new cell array
end
end
clear CFs
clear qq tt
save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end

Import multiple tab delimited files into matlab from different subdirectories

Sorry I am new to matlab.
What I have: A folder containing about 80 subfolders, labeled Day01, Day02, Day03, etc. Each subfolder has a file called "sample_ids.txt" It is a n x m matrix in a tab delimited format.
What I need: 1 data structure that is an array of matrices, where each matrix is the data from "sample_ids.txt" and it should be in the alphabetical order of Day01, Day02, Day03, etc.
I have no idea how to get from point A to point B. Any guidance would be greatly appreciated.
You can decompose this problem into two parts: finding the files, and reading them into memory.
Finding the files is pretty easy, and has already been covered on StackOverflow.
For loading them into memory, you want a multidimensional array, which is as simple as creating a regular array and start using more index dimensions: A = ones(2); A(:,:,2) = ones(2); will, for example, give you a 3-dimensional array of size 2-by-2-by-2, with ones all over.
What you want, is probably want something like this:
A = [] % No prealocation. Fix for speed-up.
files = dir('./Day*/sample_ids.txt');
for file = files
temp = load(file.name);
A(:,:,size(A,3)+1) = temp;
end
disp(A) % display the contents of A afterards...
I haven't tested this code extensively, but it should work OK.
A few important points:
All files must contain matrices of the exact same dimensions - MATLAB can't handle arrays that have different dimensions in different layers (at least not with regular arrays - you could use cell arrays, but that quickly becomes more complicated...). Think of it as trying to build a matrix from vectors of different lengths.
If you have a lot of data, and you know how much, you can save a lot of time by pre-allocating A. This is as easy as A = zeros(k,l,m) for m datafiles with k rows and l columns in each. If you do this, you'll also have to figure out the index of the current file, so you can use that as the third index in the assignment (on the second line in the loop block). I leave this as an internet research excersize :)

MATLAB Saving and Loading Feature Vectors

I am trying to load feature vectors into classifiers such as a k-nearest neighbors classifier.
I have my code for GLCM, so I get contrast, correlation, energy, homogeneity in numbers (feature vectors).
My question is, how can I save every set of feature vectors from all the training images? I have seen somewhere that people had a .set file to load into classifiers (may be it is a special case for the particular classifier toolbox).
load 'mydata.set';
for example.
I suppose it does not have to be a .set file.
I'd just need a way to store all the feature vectors from all the training images in a separate file that can be loaded.
I've google,
and I found this that may be useful
but I am not entirely sure.
Thanks for your time and help in advance.
Regards.
If you arrange your feature vectors as the columns of an array called X, then just issue the command
save('some_description.mat','X');
Alternatively, if you want the save file to be readable, say in ASCII, then just use this instead:
save('some_description.txt', 'X', '-ASCII');
Later, when you want to re-use the data, just say
var = {'X'}; % <-- You can modify this if you want to load multiple variables.
load('some_description.mat', var{:});
load('some_description.txt', var{:}); % <-- Use this if you saved to .txt file.
Then the variable named 'X' will be loaded into the workspace and its columns will be the same feature vectors you computed before.
You will want to replace the some_description part of each file name above and instead use something that allows you to easily identify which data set's feature vectors are saved in the file (if you have multiple data sets). Your array of feature vectors may also be called something besides X, so you can change the name accordingly.