I have a question about a varying number of for loops in MATLAB. I have generated some data folders, each one containing some .mat files. What I need to do is generate all possible combinations of those files across the folders (but not of the files within the same folder). So basically the algorithm is:
For i = 1:number of files in folder 1
    For j = 1:number of files in folder 2
        ............................
        For m = 1:number of files in folder n
            Read file i in folder 1
            Read file j in folder 2
            ......................
            Read file m in folder n
            Result file = sum of data in those files % at last we have an n-dimensional matrix
        end;
    end;
end;
If the number of folders were fixed, this would be done, but the number of folders varies depending on the input parameter, so I cannot find a suitable solution.
I have also read about recursive algorithms, but it is not clear to me how to apply one here.
For example, using a recursive algorithm:
I have a vector that gives the number of files in each folder, A = [2,3,4] (3 folders).
Function Recursive(n, A) % n = 3
    if (n > 1)
        Recursive(n-1, A)
    else
        for i = 1:A(n)
            Read file i in folder n;
        end;
    end
Here we cannot access the previous loop variables (j, k, ...), so this is useless.
Please give me some suggestions.
Firstly, reading is slow. You shouldn't do it any more than you absolutely have to. If you can freely store everything in memory, do that. If you can't, but you're actually doing something like summing the data in the files, calculate the sum of each file and store those (probably in a cell array with one vector per folder). You also want to do that (work with summary statistics, not whole .mat files) if you're calling functions recursively - you don't want to pour your whole .mat files into a new function call for every iteration of the innermost loop.
For the actual loop, I would suggest using a main loop whose counter runs from 1 to the product of the numbers of files in the folders. Inside that loop, I would test which index needs updating. Something like
BigOutputArray = zeros([A 1]);   % n-dimensional output, one element per combination
cp = cumprod(A);                 % cumulative products (renamed so the cumprod function isn't shadowed)
S = zeros(n,1);                  % current summary value taken from each folder
currindex = ones(n,1);           % current file index within each folder
for ii = 1:n
    S(ii) = SumsOfMatFiles{ii}(1);
end
for ii = 1:prod(A)
    BigOutputArray(ii) = sum(S); % linear indexing visits combinations in the same order
    if ii == prod(A)
        break                    % all combinations stored; nothing left to advance
    end
    % the first index that does not roll over at this step gets incremented
    jj = max([0, find(mod(ii, cp) == 0)]) + 1;
    currindex(jj) = currindex(jj) + 1;
    S(jj) = SumsOfMatFiles{jj}(currindex(jj));
    for kk = 1:jj-1              % reset all faster-varying indices to the first file
        currindex(kk) = 1;
        S(kk) = SumsOfMatFiles{kk}(1);
    end
end
This assumes SumsOfMatFiles is a cell array containing n vectors of sums, and that there's no interaction between the .mat files. If there is, replace S with a cell array containing the n current sets of .mat file contents, and replace sum(S) with the appropriate function, if necessary expanding the cell with S{:}.
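For completeness, building SumsOfMatFiles could look something like the sketch below; the folder naming scheme (folder1 ... foldern) and the assumption that each .mat file holds a single numeric variable are illustrative, not from the question:
SumsOfMatFiles = cell(n,1);
for ii = 1:n
    files = dir(fullfile(sprintf('folder%d', ii), '*.mat'));  % assumed folder names
    sums = zeros(numel(files), 1);
    for kk = 1:numel(files)
        s = load(fullfile(files(kk).folder, files(kk).name)); % struct of file contents
        fn = fieldnames(s);
        sums(kk) = sum(s.(fn{1})(:));  % total of the file's (assumed single) variable
    end
    SumsOfMatFiles{ii} = sums;
end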
I am in favor of converting m dynamically nested loops, each one of length n:
for i1 = v{1}
    ...
    for im = v{m}
        f(i1,...,im) = F(i1,...,im);
    end
    ...
end
Into an m-by-n^m list of combinations:
I = combvec(v{:});               % m-by-n^m matrix, one combination per column
f = zeros(1, size(I,2));
for k = 1:size(I,2)
    args = num2cell(I(:,k));     % expand the k-th combination into m arguments
    f(k) = F(args{:});
end
f = reshape(f, [cellfun(@numel, v) 1]); % n-by-n-by-...-by-n (m dimensions)
v is a variable-size cell array with m elements, defining the loops: v{i} is the index row vector for the i-th loop.
The loop index vectors do not all need to have the same length n; the code is the same for that case, as can be seen.
Both versions return the same array f, with n^m elements (n-by-n-by-...-by-n across m dimensions when every loop has length n).
The central cost is always evaluating the function F, which must take a dynamic number of arguments (supplied here by expanding a cell array).
Both loops are O(n^m) in complexity, but the dynamically nested loop cannot be programmed in any known way without recursion.
Both the cost of processing and storing the index list I and the loop overhead are insignificant compared with evaluating the function F.
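As a toy usage example (combvec ships with the Deep Learning Toolbox; the function F here is made up purely for illustration):
v = {1:2, 1:2, 1:2};                   % m = 3 loops, each of length n = 2
F = @(i1, i2, i3) i1*100 + i2*10 + i3; % made-up function of m arguments
I = combvec(v{:});                     % 3-by-8 matrix of index combinations
f = zeros(1, size(I,2));
for k = 1:size(I,2)
    args = num2cell(I(:,k));           % split the k-th column into 3 arguments
    f(k) = F(args{:});
end
f = reshape(f, [cellfun(@numel, v) 1]); % 2-by-2-by-2 result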
I'm working with h5 files that have tens of thousands of datasets, each containing a vector of numerical values, all of the same size. My goal is to read the datasets and create one large matrix from these vectors. The datasets are named from "0" to "xxxxx" (some large number). I was able to read them and get the matrix, but it takes forever to do so. I was wondering if you can take a look at my code and suggest a way to make it run faster.
Here is how I do it right now:
t = [];
for i = 0:40400              % there are 40401 datasets in this particular file
    j = int2str(i);
    p = '/mesh/';            % the parent group
    s = strcat(p,j);         % to create the full path of a dataset, e.g. '/mesh/0'
    r = h5read('temp.h5',s); % the file name is temp and s has the dataset path
    t = [t;r];
end
In this particular case, there are 40401 datasets, each an 80802x1 vector of numerical values, so eventually I want to create an 80802x40401 matrix. This code takes over a day to finish. I think one of the reasons it is slow is that MATLAB accesses the h5 file in every iteration. I would appreciate it if some of you have tips for speeding up the code.
When I copied your code into an editor, I got a red underline under the t with the warning:
The variable t appears to change size on every loop iteration. Consider preallocating for speed.
You should allocate the final memory for t before starting the loop with the function zeros, and then write into it by index (e.g. t(:,i+1) = r;) instead of concatenating:
t = zeros(80802,40401);
You should also read this: Programming Patterns: Maximizing Code Performance by Optimizing Memory Access:
Preallocate arrays before accessing them within loops
Store and access data in columns
Avoid creating unnecessary variables
Maybe p = '/mesh/'; is useless inside the loop and can be moved outside it, since it doesn't change. It could be even better not to have p at all and directly write s = strcat('/mesh/',j);
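Putting those pieces together, a minimal sketch of the preallocated loop (reusing the sizes and paths from the question) could be:
t = zeros(80802, 40401);              % allocate the full matrix once
for i = 0:40400
    s = strcat('/mesh/', int2str(i)); % dataset path, e.g. '/mesh/0'
    t(:, i+1) = h5read('temp.h5', s); % write into a column, no reallocation
end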
I am really new to MATLAB, so I am trying to learn the very basics. I have 8 tsv files with names like 2004.07.01.0000.tsv and 2004.07.01.0300.tsv, where each file has 72 rows and 144 columns. I am trying to automatically import all of those files into MATLAB in matrix form to calculate the mean, median, and skewness (for data correction). What I did was import one file (2004.07.01.0000.tsv) using the MATLAB GUI, and then I generated a function called importfile. I am trying to use a for loop to access all the data in those files but I could not figure it out. I tried (not sure at all):
for fileNum = 1:8
    startRow = 1;
    endRow = 72;
    filename
    a = importfile(filename, startRow, endRow);
end
If your importfile() function works correctly, then in this manner at every for-loop iteration you'll overwrite a with the most recently imported file. You should concatenate all your files (i.e. matrices) instead.
A matrix concatenation can be done either horizontally (appending columns) or vertically (appending rows). As I understand it, you want a vertical concatenation, in order to generate a single matrix with 144 columns and as many rows as all your files contain together.
Thus you should change the loop as follows
myMatrix = [];
for fileNum = 1:8
    startRow = 1;
    endRow = 72;
    % filename still has to be set to the current file's name here (see the update below)
    myMatrix = [myMatrix ; importfile(filename, startRow, endRow)];
end
The vertical concatenation can be done by means of the ; operator, thus an instruction like A=[B ; C] will create a matrix A by concatenating matrices B and C. In your case you initialize myMatrix as empty and then you will vertically concatenate (in an iterative fashion) all outputs from importfile(), that are your .tsv files.
At the end of the loop, myMatrix should have size N-by-M, where M is 144 and N is the sum of the number of rows across all your files (8*72 = 576).
Update
If you have to pass the filename explicitly to the importfile() function, you can create a cell array of strings in which each element is a filename. Thus in our case the cell array will be something like:
filenames={'filename1.tsv','filename2.tsv',...,'filename8.tsv'};
Obviously you must replace the strings inside the cell with the proper filenames. Finally, you can slightly edit the loop as follows:
myMatrix = [];
for fileNum = 1:8
    startRow = 1;
    endRow = 72;
    myMatrix = [myMatrix ; importfile(filenames{fileNum}, startRow, endRow)];
end
In this manner, at every loop iteration the fileNum-th filename will be given as input to importfile() and hopefully it'll be loaded.
For this to work you should (let's keep things simple)
place your MATLAB script and the function importfile() in the same folder containing your .tsv files
set said folder as the Current Folder
Alternatively, if you keep the .tsv files in one folder and your scripts in another, then the Current Folder must be the folder containing your scripts, and the filenames inside the cell array filenames must contain the entire path, not just the bare filenames.
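As a side note, here is a small sketch of how the filename list could be built automatically with dir instead of typed out, assuming the .tsv files sit in the Current Folder and follow the naming pattern from the question:
tsvFiles = dir('2004.07.01.*.tsv');  % pattern assumed from the question
filenames = {tsvFiles.name};         % cell array of the matching filenames
myMatrix = [];
for fileNum = 1:numel(filenames)
    myMatrix = [myMatrix ; importfile(filenames{fileNum}, 1, 72)];
end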
Assuming I have X .mat files under a directory called my_experiment_1/data, I know I can list them with
experiment1_files = dir('my_experiments/data/*.mat');
Now I would like to open them in a for loop using the .name field returned by dir:
for count = 1:N
    % load data-set
    load(experiment1_files(count).name);
    ...
end
and perform a bunch of operations with the matrices of each file.
Question: what is the way to compute the number of files within a directory in MATLAB (meaning, the number N in the for loop above)?
As stated in the documentation of dir, it returns an N-by-1 struct array where the number of items N corresponds to the number of files and folders it retrieved from the path you pass to dir.
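So the loop bound N is just the number of elements in that struct array, e.g. via numel (length works equally well). A small sketch; note that .name holds only the filename, so the folder is prepended with fullfile in case it isn't the Current Folder:
experiment1_files = dir('my_experiments/data/*.mat');
N = numel(experiment1_files);        % number of matched .mat files
for count = 1:N
    % load data-set; prepend the folder since .name holds only the filename
    load(fullfile('my_experiments/data', experiment1_files(count).name));
end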
I have 50 matrices contained in one folder, all of dimension 181 x 360. How do I cycle through that folder and take the average of each corresponding data point across all 50 matrices?
If the matrices are contained within MATLAB variables stored using save('filename','VariableName'), then they can be opened using load('filename.mat').
As such, you can use the result of filesInDirectory = dir; to get a list of all your files, using a search pattern if appropriate, like files = dir('*.mat');
Next you can use your load command, and then whos to see which variables were loaded. You should consider storing these names so you can easily clear the variables after each iteration of your loop.
Once you have your matrix loaded (one at a time), you can take averages as you need, probably summing a value across multiple loop iterations, then dividing by a total counter you've been keeping (using perhaps count = count + size(MatrixVar, dimension);).
If you need all of the matrices loaded at once, then you can modify the above idea, to load using a loop, then average outside of the loop. In this case, you may need to take care - but 50*181*360 isn't too bad I suspect.
A brief introduction to the load command can be found at this link. It talks mainly about opening one matrix, then plotting the values, but there are some comments about dealing with headers, if needed, and different ways in which you can open data, if load is insufficient. It doesn't talk about binary files, though.
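For the element-wise average the question asks about, a hedged sketch of the load-based route could look like this; the variable name M inside each .mat file is an assumption:
files = dir('*.mat');
Total = zeros(181, 360);                % element-wise running sum
for k = 1:numel(files)
    s = load(files(k).name);            % load into a struct to avoid name clashes
    Total = Total + s.M;                % 'M' is an assumed variable name
end
ElementwiseMean = Total / numel(files); % 181-by-360 matrix of means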
Note on binary files, based on a comment on the OP's question:
If the file can be opened using
FID = fopen('filename.dat');
Data = fread(FID, 'float');
fclose(FID);
then you can replace the steps referring to load above, and instead use a loop to find filenames using dir, open the matrices using fopen and fread, then average as needed, finally closing the files and clearing the matrices.
In this case, the file identifier is probably the only part you'll need to change during the loop (although your running total will grow, if that's how you want to average your data).
Reshaping the matrix, or transposing it, might make the code clearer (which is good!), but might not be necessary depending on what you're trying to average - it may be that selecting only a subsection of the matrix is sufficient.
Possible example code?
Assuming that all of the files in the current directory are to be opened, and that no files are elsewhere, you could try something like:
listOfFiles = dir('*.dat');
Total = 0;                       % running sum, initialized before the loop
Counter = 0;                     % running element count
for f = 1:size(listOfFiles,1)
    FID = fopen(listOfFiles(f).name);
    Data = fread(FID, 'float');
    fclose(FID);
    % Reshape if needed?
    Total = Total + sum(Data(:));    % this might vary, depending on what you want to average etc.
    Counter = Counter + numel(Data); % adds the 181*360 elements of each matrix, in this case
end
Av = Total/Counter;
Sorry, I am new to MATLAB.
What I have: a folder containing about 80 subfolders, labeled Day01, Day02, Day03, etc. Each subfolder has a file called "sample_ids.txt", which is an n x m matrix in a tab-delimited format.
What I need: one data structure that is an array of matrices, where each matrix holds the data from "sample_ids.txt", in the alphabetical order of Day01, Day02, Day03, etc.
I have no idea how to get from point A to point B. Any guidance would be greatly appreciated.
You can decompose this problem into two parts: finding the files, and reading them into memory.
Finding the files is pretty easy, and has already been covered on StackOverflow.
For loading them into memory, you want a multidimensional array, which is as simple as creating a regular array and starting to use more index dimensions: A = ones(2); A(:,:,2) = ones(2); will, for example, give you a 3-dimensional array of size 2-by-2-by-2, with ones all over.
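That inline example, spelled out:
A = ones(2);         % start with a 2-by-2 matrix of ones
A(:,:,2) = ones(2);  % assigning to a third index grows it to 2-by-2-by-2
size(A)              % returns [2 2 2]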
What you probably want is something like this:
A = []; % no preallocation (consider fixing that for a speed-up, see below)
files = dir('./Day*/sample_ids.txt');
for f = 1:numel(files)
    % build the full path, since .name alone does not include the DayXX subfolder;
    % load reads the tab-delimited numeric file straight into a matrix
    temp = load(fullfile(files(f).folder, files(f).name));
    A(:,:,f) = temp;
end
disp(A) % display the contents of A afterwards...
I haven't tested this code extensively, but it should work OK.
A few important points:
All files must contain matrices of the exact same dimensions - MATLAB can't handle arrays that have different dimensions in different layers (at least not with regular arrays - you could use cell arrays, but that quickly becomes more complicated...). Think of it as trying to build a matrix from vectors of different lengths.
If you have a lot of data, and you know how much, you can save a lot of time by pre-allocating A. This is as easy as A = zeros(k,l,m) for m data files with k rows and l columns in each. If you do this, you'll also need the index of the current file as the third index in the assignment (the loop counter f above already serves that purpose). I leave the rest as an internet research exercise :)
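For reference, a minimal sketch of that preallocated variant, assuming every sample_ids.txt has the same dimensions:
files = dir('./Day*/sample_ids.txt');
first = load(fullfile(files(1).folder, files(1).name));       % probe the first file for its size
A = zeros(size(first,1), size(first,2), numel(files));        % k-by-l-by-m, allocated once
for f = 1:numel(files)
    A(:,:,f) = load(fullfile(files(f).folder, files(f).name));
end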