I've been processing two lists of 855 4000x4000 matrices. One is a list of 855 matrices of some value; the other is a list of coordinates (another 855 4000x4000 matrices). It's important to do everything within one loop, so that I don't end up with thousands of useless variables. For every file, the code cuts the coordinate data (read: puts NaN where I don't need data), then cuts the data related to those coordinates. Then it gathers all the values into one matrix. The code is:
for x = 1:length(list_with_par)
    cd 'D:\Coord'
    par_lon = ncread(list_with_coordinates(x,:), 'longitude');
    par_lon(par_lon>=15) = nan;
    par_lon(par_lon<=-18) = nan;
    par_lat = ncread(list_with_coordinates(x,:), 'latitude');
    par_lat(par_lat>=84) = nan;
    par_lat(par_lat<=76) = nan;
    cd 'D:\Par'
    par = ncread(list_with_par(x,:), 'PAR');
    for i = 1:size(ncread(list_with_par(x,:),'PAR'),1) %size(,1)
        for z = 1:size(ncread(list_with_par(x,:),'PAR'),2) %size(,2)
            if isnan(par_lon(i,z))
                par(i,z) = nan;
            end
            if isnan(par_lat(i,z))
                par(i,z) = nan;
            end
        end
    end
    if size(par,2) < size(PAR_main,2)
        left_cells = size(PAR_main,2) - size(par,2);
        temp_cell = NaN(4865,left_cells);
        C2 = cat(2,par,temp_cell);
    end
    if size(par,2) == size(PAR_main,2)
        C2 = par(:,:,1);
    end
    PAR_main(:,:,x) = C2(:,:,1);
end
But suddenly an error pops up after 4-5 hours of processing.
Error using netcdflib
The NetCDF library encountered an error during execution of 'open' function - 'HDF error (NC_EHDFERR)'.
Error in netcdf.open (line 67)
[varargout{:}] = netcdflib ( 'open', filename, varargin{1} );
Error in internal.matlab.imagesci.nc/openToRead (line 1278)
this.ncRootid = netcdf.open(this.Filename,'NOWRITE');
Error in internal.matlab.imagesci.nc (line 121)
this.openToRead();
Error in ncread (line 61)
ncObj = internal.matlab.imagesci.nc(ncFile);
What might be the issue?
I'm not really familiar with ncread (and associated functions), but there are two things that jump out at me that appear to be very inefficient:
In your loops over 'i' and 'z', is there a reason to read in the data again to determine its size instead of just using the 'par' variable that you already saved?
for i = 1:size(par,1)
for z = 1:size(par,2)
For that matter, unless I am missing something specific to this set of functions, you should be able to skip the loops over 'i' and 'z' completely and vectorize the calculation:
par(isnan(par_lon))= nan;
par(isnan(par_lat)) = nan;
This is certainly slowing down your code significantly. It is hard to say beyond that, but I could definitely see how millions of extraneous file reads could cause issues, either with temporary file addresses and the like or with memory leaks.
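The masking can also be collapsed into a single logical mask per file. A minimal sketch, using small made-up arrays in place of the ncread results and assuming the intended window is -18 < longitude < 15 and 76 < latitude < 84:

```matlab
% Small synthetic stand-ins for one file's coordinates and data.
par_lon = [-20  0 10;  5 16 -5];
par_lat = [ 80 77 85; 75 80 80];
par     = [  1  2  3;  4  5  6];

% One logical mask marks every point outside the lon/lat window.
outside = par_lon >= 15 | par_lon <= -18 | par_lat >= 84 | par_lat <= 76;

% A single vectorized assignment replaces both nested loops.
par(outside) = NaN;
disp(par)
```

With this approach each file is read once per variable, so the number of netCDF opens drops from thousands per file to three.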
% 3. Calculation of strain energy density
% CALCULATION OF STRAIN-ENERGY-DENSITY FOR EACH LOAD CASE
% u=1/2*sigma*epsilon
for p = 1:N_ele
    uLS1(p) = 1/2*(sigma_1(p,2:7)*epsilon_1(p,2:7)');
    uLS2(p) = 1/2*(sigma_2(p,2:7)*epsilon_2(p,2:7)');
    uLS3(p) = 1/2*(sigma_3(p,2:7)*epsilon_3(p,2:7)');
end
% AVERAGE OF ALL LOAD CASES
sed(:,a) = (uLS1' + uLS2' + uLS3')/3; %11 ... line
Error on command window:
"Unrecognized function or variable 'uLS1'."
Error in main_file (line 86)
sed(:,a) = (uLS1' + uLS2' + uLS3')/3;
Regarding the error: The variable sed must have N_ele rows, such that size(sed,1) = N_ele. If N_ele changes with every iteration a, then you can use a cell array instead of a numeric array, i.e., sed{a} = (uLS1' + uLS2' + uLS3')/3;.
Regarding the warning: Preallocate the arrays uLS1, uLS2, and uLS3 before the for-loop when you know the size they will have, i.e.,
uLS1 = zeros(1, N_ele);
uLS2 = zeros(1, N_ele);
uLS3 = zeros(1, N_ele);
If you don't know their sizes in advance, you have the choice to ignore Matlab's warning and proceed as is.
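The loop can also be avoided altogether. A minimal sketch, assuming sigma_1 and epsilon_1 are N_ele-by-7 numeric arrays as the indexing in the question suggests (the random data below is made up for illustration):

```matlab
N_ele     = 5;                    % made-up element count
sigma_1   = rand(N_ele, 7);       % stand-in stress table
epsilon_1 = rand(N_ele, 7);       % stand-in strain table

% Row-wise dot product over columns 2..7, one value per element.
uLS1 = 0.5 * sum(sigma_1(:,2:7) .* epsilon_1(:,2:7), 2);

% Loop version from the question, with preallocation, for comparison.
uLoop = zeros(1, N_ele);
for p = 1:N_ele
    uLoop(p) = 1/2*(sigma_1(p,2:7)*epsilon_1(p,2:7)');
end
assert(max(abs(uLS1 - uLoop')) < 1e-12);
```

Here uLS1 comes out as an N_ele-by-1 column, so the transposes in the averaging line would no longer be needed if all three load cases were computed this way.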
I have to read and then process a huge amount of data (Matrix: ~40.000.000x19).
The first step is to read the data:
Array = load('vort.dat');
The file 'vort.dat' contains ~40.000.000 (imax*jmax*kmax) lines and 19 columns. The first line is:
3.53080034E-03 0.00000000 1.25000002E-02 63.0216064 -3.03968048 -358.802948 -744.902588 -2.51340670E-10 2.11566061E-04 18.6898212 72.3569489 0.727692425 0.754972637 0.661218643 1.50408816 1.87408039E-03 5.69900125E-03 0.00000000 0.00000000
Then I loop over the Array and store the various values for the post-processing in separate arrays:
imax=511;
jmax=160;
kmax=399;
for q=1:length(Array(:,1))
    Rp(k,j,i)=Array(q,1);
    yp(k,j,i)=(0.5-Rp(k,j,i))*360;
    ...
    % index variables
    k=k+1;
    if(k>kmax)
        k=1;
        i=i+1;
        if(i>imax)
            i=1;
            j=j+1;
            if(j>jmax)
                j=1;
            end
        end
    end
end
Then the post-processing starts!
The problem is that MATLAB crashes without a warning during the data processing or during the plotting of figures!
I already set the stack size to unlimited (ulimit -s unlimited).
My second idea was to work with memmapfile. It looks like it is working, but the plots from the post-processing show that it does not read the right data!
%%% Array = load('vort.dat');
m=memmapfile('data.dat','Format',{'double',[imax*jmax*kmax 19], 'x'},'repeat', 1);
Array=m.data.x;
If you're running out of memory during pre-processing, you may want to clear MATLAB's memory before loading massive amounts of data:
clear('all');
Array = load('vort.dat');
%'Here continues the pre-processing'
If you're running out of memory during post-processing, you may want to clear massive variables once they're not used anymore. For example, since Array is not used anymore after pre-processing, begin your post-processing with:
clear('Array');
or, simpler:
Array = 0;
Given the size of your matrix, this should free enough memory to allow you to carry on with post-processing and reporting.
So, as an example, the script would look like:
%//Preparing
clear('all'); %//Start with fresh memory
dbstop('if', 'error'); %//Trap uncaught exceptions
%//Loading
A = load('vort.dat');
%//Pre-processing, vectorized operations
I = 511;
J = 160;
K = 399;
Rp = permute(reshape(A(:,1),K,I,J), [1 3 2]);
yp = (0.5 - Rp)*360;
%//...
%//Post-processing
clear('A'); %//and other vars not needed anymore
%//...
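To check that the reshape/permute line reproduces the index order of the original loop (k fastest, then i, then j), the two can be compared on a tiny array. This is only a sketch: the sizes and the col vector are made-up stand-ins for imax, jmax, kmax and A(:,1).

```matlab
I = 3; J = 2; K = 4;            % small stand-ins for imax, jmax, kmax
col = (1:I*J*K)';               % stand-in for A(:,1)

% Loop version with the same index bookkeeping as in the question.
RpLoop = zeros(K, J, I);
k = 1; i = 1; j = 1;
for q = 1:numel(col)
    RpLoop(k, j, i) = col(q);
    k = k + 1;
    if k > K
        k = 1; i = i + 1;
        if i > I
            i = 1; j = j + 1;
        end
    end
end

% Vectorized version: reshape to (k,i,j), then swap dimensions 2 and 3.
RpVec = permute(reshape(col, K, I, J), [1 3 2]);
assert(isequal(RpLoop, RpVec));
```

If the assertion passes for the small case, the same call applies unchanged to the full-size column.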
I am working on code to extract my AR(1)-GARCH(1) parameters, which I estimated using an AR(1)-GJR(1,1) model, into individual matrices so that I can use them as variables in my calculations. As I have 16 time series variables, I combine the code with a loop in the following way:
for i=1:nIndices
    AA_ARCH(:,i) = cell2mat(fit{i}.Variance.ARCH)';
end;
My problem is that for some variables there are no estimates, so for AA_ARCH(:,i) the dimension is lower than nIndices. Naturally, when I try to export the estimates in the loop, which specifies the dimension of (:,i) and nIndices, MATLAB reports a dimension mismatch. I would like to tell MATLAB to replace the NaN with 0 instead of leaving the spot empty, so that it is able to produce a (1,nIndices) matrix from AA_ARCH.
I thought of something like this:
fit{i}.Variance.Leverage(isnan(fit{i}.Variance.Leverage))=0
but I wasn't able to combine this part with the previous code.
I would be very happy about any hints!
Best, Carolin
UPDATE:
Here is a fully runnable version of my code which reproduces my problem. Notice that the code produces a dimension mismatch error because there is no ARCH and GARCH estimate in the fit.gjr(1,1) for time series 1. For these missing values I would like to have 0 as a placeholder in the extracted matrix.
returns = randn(2,750)';
T = size(returns,1);
nIndices = 2;
model = arima('AR', NaN, 'Variance', gjr(1,1));
residuals = NaN(T, nIndices);
variances = NaN(T, nIndices);
fit = cell(nIndices,1);
options = optimset('fmincon');
options = optimset(options, 'Display' , 'off', 'Diagnostics', 'off', ...
'Algorithm', 'sqp', 'TolCon' , 1e-7);
for i = 1:nIndices
    fit{i} = estimate(model, returns(:,i), 'print', false, 'options', options);
    [residuals(:,i), variances(:,i)] = infer(fit{i}, returns(:,i));
end
for i=1:nIndices
    AA_beta(:,i) = cell2mat(fit{i}.AR)';
    AA_GARCH(:,i) = cell2mat(fit{i}.Variance.GARCH)';
    AA_ARCH(:,i) = cell2mat(fit{i}.Variance.ARCH)';
    AA_Leverage(:,i) = cell2mat(fit{i}.Variance.Leverage)';
end;
I have some general things to say about the code, but first a solution to your problem:
You can put a simple if/else structure in your loop to handle the case of an empty array:
for ind1=1:nIndices
    AA_beta(:,ind1) = cell2mat(fit{ind1}.AR)'; %//'
    %// GARCH
    if isempty(cell2mat(fit{ind1}.Variance.GARCH)') %//'
        AA_GARCH(1,ind1) = 0;
    else
        AA_GARCH(:,ind1) = cell2mat(fit{ind1}.Variance.GARCH)'; %//'
    end
    %// ARCH (same exact code, should probably be exported to a function)
    if isempty(cell2mat(fit{ind1}.Variance.ARCH)') %//'
        AA_ARCH(1,ind1) = 0;
    else
        AA_ARCH(:,ind1) = cell2mat(fit{ind1}.Variance.ARCH)'; %//'
    end
    AA_Leverage(:,ind1) = cell2mat(fit{ind1}.Variance.Leverage)'; %//'
end;
Side note: I initially tried something like this: soz = @(A)isempty(A)*0+~isempty(A)*A; as an inline replacement for the if/else, but it turns out that MATLAB doesn't handle [] + 0 the way I wanted (it results in [] instead of 0, unlike other languages like JS).
As for the other things I have to say:
I am a firm supporter of the notion that one shouldn't use i,j as loop indices, as this may cause compatibility problems in some cases where complex numbers are involved (e.g. if your loop index is i, then 1*i now refers to the loop index instead of to the square root of -1).
Part of your problem was that the arrays you were writing into weren't preallocated - which also means the correct datatype was unknown to MATLAB at the time of their creation. Besides the obvious performance hit this entails, it could also result in errors like the one you encountered here. If, for example, you used cells for AA_beta etc. then they could contain empty values, which you could later replace with whichever placeholder your heart desired using a combination of cellfun and isempty. Bottom line: lint (aka the colorful square on the top right of the editor window) is your friend - don't ignore it :)
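The cell-based approach mentioned above could look roughly like this. It is a minimal sketch, assuming each series yields at most one ARCH coefficient; the estimates cell is a made-up stand-in for the values of cell2mat(fit{n}.Variance.ARCH), so the snippet runs without the toolbox:

```matlab
% Hypothetical per-series ARCH estimates; the first series has none.
estimates = {[], 0.12, 0.07};

% Find the empty entries and replace them with the placeholder 0.
isMissing = cellfun(@isempty, estimates);
estimates(isMissing) = {0};

% Every cell now holds a scalar, so the cells concatenate cleanly.
AA_ARCH = cell2mat(estimates);
disp(AA_ARCH)
```

Collecting into a cell first defers the dimension check until after the placeholders are in, which is what makes the final concatenation safe.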
I have this file which is a series of x, y, z coordinates of over 34 million particles and I am reading them in as follows:
parfor i = 1:Ntot
    x0(i,1)=fread(fid, 1, 'real*8')';
    y0(i,1)=fread(fid, 1, 'real*8')';
    z0(i,1)=fread(fid, 1, 'real*8')';
end
Is there a way to read this in without a loop? It would greatly speed up the read. I just want three vectors with x, y, z. Thanks. Other suggestions are welcome.
I do not have a machine with Matlab and I don't have your file to test either but I think coordinates = fread (fid, [3, Ntot], 'real*8') should work fine.
Maybe fread is the function you are looking for.
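A sketch of how that single fread call could be used, assuming the file stores the values interleaved as x1 y1 z1 x2 y2 z2 and so on. The temporary file and its name are made up here so that the snippet is self-contained:

```matlab
Ntot  = 5;                                   % small stand-in particle count
xyz   = rand(3, Ntot);                       % column n holds particle n
fname = fullfile(tempdir, 'particles_demo.bin');   % hypothetical file name

% Write a test file; fwrite stores column by column, i.e. x1 y1 z1 x2 ...
fid = fopen(fname, 'w');
fwrite(fid, xyz, 'real*8');
fclose(fid);

% Read everything in one call, then split the rows into three vectors.
fid = fopen(fname, 'r');
coordinates = fread(fid, [3, Ntot], 'real*8');
fclose(fid);
x0 = coordinates(1,:)';
y0 = coordinates(2,:)';
z0 = coordinates(3,:)';

assert(isequal([x0 y0 z0], xyz'));
delete(fname);
```

For the real file this needs enough memory to hold the full 3-by-Ntot array at once.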
You're right. Reading data in larger batches is usually a key part of speeding up file reads. Another part is pre-allocating the destination variables, for example with a zeros call.
I would do something like this:
%Pre-allocate
x0 = zeros(Ntot,1);
y0 = zeros(Ntot,1);
z0 = zeros(Ntot,1);
%Define a desired batch size. make this as large as you can, given available memory.
batchSize = 10000;
%Use while to step through file
indexCurrent = 1; %indexCurrent is the next element which will be read
while indexCurrent <= Ntot
    %At the end of the file, we may need to read less than batchSize
    currentBatch = min(batchSize, Ntot-indexCurrent+1);
    %Load a batch of data
    tmpLoaded = fread(fid, currentBatch*3, 'real*8')';
    %Deal the fread data into the desired three variables
    x0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(1:3:end);
    y0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(2:3:end);
    z0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(3:3:end);
    %Update index variable
    indexCurrent = indexCurrent + batchSize;
end
Of course, make sure you test, as I have not. I'm always suspicious of off-by-one errors in this sort of work.
I have the following 10-fold implementation. I am using a data set published by UCI Machine Learning. Here is the link for the data set:
Here are my dimensions
x =
data: [178x13 double]
labels: [178x1 double]
This is the error that I am getting
Index exceeds matrix dimensions.
Error in GetTenFold (line 33)
results_cell{i,2} = shuffledMatrix(testRows ,:);
This is my code:
%Function that accept data file as a name and the number of folds
%For the cross fold
function [results_cell] = GetTenFold(dataFile, x)
    %loading the data file
    dataMatrix = load(dataFile);
    %combine the data and labels as one matrix
    X = [dataMatrix.data dataMatrix.labels];
    %getting the length of the matrix
    dataRowNumber = length(dataMatrix.data);
    %shuffle the matrix while keeping rows intact
    shuffledMatrix = X(randperm(size(X,1)),:);
    crossValidationFolds = x;
    %Assigning number of rows per fold
    numberOfRowsPerFold = dataRowNumber / crossValidationFolds;
    crossValidationTrainData = [];
    crossValidationTestData = [];
    %Assigning 10X2 cell to hold each fold as training and test data
    results_cell = cell(10,2);
    %starting from the first row and segment it based on folds
    i = 1;
    for startOfRow = 1:numberOfRowsPerFold:dataRowNumber
        testRows = startOfRow:startOfRow+numberOfRowsPerFold-1;
        if (startOfRow == 1)
            trainRows = (max(testRows)+1:dataRowNumber);
        else
            trainRows = [1:startOfRow-1 max(testRows)+1:dataRowNumber];
            i = i + 1;
        end
        %for i=1:10
        results_cell{i,1} = shuffledMatrix(trainRows ,:);
        results_cell{i,2} = shuffledMatrix(testRows ,:); %This is where I am getting my dimension error
        %end
        %crossValidationTrainData = [crossValidationTrainData ; shuffledMatrix(trainRows ,:)];
        %crossValidationTestData = [crossValidationTestData ;shuffledMatrix(testRows ,:)];
    end
end
You're looping over 1:numberOfRowsPerFold:dataRowNumber, which is 1:(178/x):178, and i increments on every iteration after the first. So that's one way you can get the index out of bounds error on results_cell.
Another way to get the error is that testRows selects rows out of bound of shuffledMatrix.
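Note that with 178 rows and 10 folds, numberOfRowsPerFold is 17.8, so the row ranges built from it are fractional. One way to sidestep this is to give every row a fold number instead of slicing contiguous ranges. This is a sketch of an alternative partitioning, not a fix of the original function; X is a made-up stand-in for [data labels]:

```matlab
nRows  = 178;                 % number of samples, as in the question
nFolds = 10;
X = rand(nRows, 14);          % stand-in for [dataMatrix.data dataMatrix.labels]

% Give each row a fold number 1..nFolds; fold sizes differ by at most one.
foldId = mod((0:nRows-1)', nFolds) + 1;
foldId = foldId(randperm(nRows));        % shuffle the assignment

results_cell = cell(nFolds, 2);
for f = 1:nFolds
    isTest = (foldId == f);
    results_cell{f,1} = X(~isTest, :);   % training rows for fold f
    results_cell{f,2} = X(isTest, :);    % test rows for fold f
end

% Every row lands in exactly one test set.
assert(sum(cellfun(@(t) size(t,1), results_cell(:,2))) == nRows);
```

Because the masks are logical, no index can ever fall outside the matrix, whatever the ratio of rows to folds.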
Learn to debug
To pause the code and start debugging when the error occurs, run dbstop if error before executing your code. This way MATLAB enters debug mode upon encountering an error and you can inspect the state of variables right before things mess up.
(to disable this debugging mode, run dbclear if error.)