Why is MATLAB failing to successfully read in binary files? - matlab

Matlab is failing to read in the specified number of elements from a file. I have a simple program that needs to read in two files, perform a linear operation on the data and write a combined result to a third file.
My questions are: 1) Why does Matlab fail to read the specified number of elements and 2) is there a workaround for this? Any of your thoughts will be helpful.
Some details on the input files:
they are large (~18GB)
they are both the same size (exactly)
Details on the procedure (steps 2-4 are conditioned on an feof check of both files):
1. Open the input and output files for reading and writing (resp.)
2. Read in N floats (N*4 bytes) from each of the input files
3. Perform an operation on the data (say 0.5*(datin1+datin2))
4. Write the result to the output file.
Granted, this is all very simple and in most cases in the past this has worked well. Unfortunately, at some point in the cycle, MATLAB doesn't get all N floats from one of the files and gives a matrix dimension error on step 3.
CODE SNIP:
N = 2048;
fidin1 = fopen('file1.dat','r','l');
fidin2 = fopen('file2.dat','r','l');
fidout = fopen('outfile.dat','w','l');
%# I could do some assertions on the file sizes,
%# but I know they are the same size (w/o question).
while(~feof(fidin1) && ~feof(fidin2))
datin1 = fread(fidin1,N,'float=>single',0,'l');
datin2 = fread(fidin2,N,'float=>single',0,'l');
%# the following line produces an error after 100
%# or more iterations in to the procedure
datout = 0.5*(datin1+datin2);
fwrite(fidout,datout,'float',0,'l');
end
UPDATE 1
The error message I'm receiving is:
???Error using ==> plus
Matrix dimension must agree.
UPDATE 2
I followed a suggestion and included ferror checks after each read, and magically the problem went away. So now a modification to my questions: What could be the root of the problem here? Is this simply a timing issue or a bug?
Here is a snip of the updated code (showing only a portion of the code). I'm sure there are better ways to do this. Regardless, the addition of these checks allowed Matlab to complete all the reads from each of the files successfully.
[datin1 count1]= fread(fidin1,N,'float=>single',0,'l');
[msg errn1]=ferror(fidin1);
if errn1
pos1 = ftell(fidin1);
error('Error at Position %d in file. %d bytes were read.',...
pos1,count1);
end
[datin2 count2]= fread(fidin2,N,'float=>single',0,'l');
[msg errn2]=ferror(fidin2);
if errn2
pos2 = ftell(fidin2);
error('Error at Position %d in file. %d bytes were read.',...
pos2,count2);
end
%# the following line produces an error after 100
%# or more iterations in to the procedure
datout = 0.5*(datin1+datin2);
fwrite(fidout,datout,'float',0,'l');

Have you specifically looked at the datin1 and datin2 variables at the time the error occurs? Try going to 'Debug-->Stop if Errors/Warnings...' then select 'Always stop if error (dbstop if error)'. Run your program and then once it crashes, look at datin1 and datin2. Hopefully that will explain why adding them together is not working.
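As a complement to inspecting the variables in the debugger, the element counts returned by fread can be compared directly. The following is only a minimal sketch under stated assumptions: the two temporary files and their 5000-element contents are made up for the demonstration and stand in for file1.dat and file2.dat. It does not explain why a short read happens, it only makes one visible before the addition fails.

```matlab
% Sketch: read two same-sized float32 files in chunks and report a short read.
% The temporary files and their contents are hypothetical demo data.
N  = 2048;
f1 = [tempname '.dat'];
f2 = [tempname '.dat'];
fid = fopen(f1,'w','l'); fwrite(fid, single(1:5000),    'float'); fclose(fid);
fid = fopen(f2,'w','l'); fwrite(fid, single(5000:-1:1), 'float'); fclose(fid);

fidin1 = fopen(f1,'r','l');
fidin2 = fopen(f2,'r','l');
total  = 0;
while true
    % The second output is the number of ELEMENTS actually read (not bytes).
    [datin1, count1] = fread(fidin1, N, 'float=>single');
    [datin2, count2] = fread(fidin2, N, 'float=>single');
    if count1 ~= count2
        error('Short read: %d vs %d elements near byte %d.', ...
              count1, count2, ftell(fidin1));
    end
    if count1 == 0
        break;              % both files are exhausted
    end
    datout = 0.5*(datin1 + datin2);
    total  = total + numel(datout);
end
fclose(fidin1);
fclose(fidin2);
delete(f1);
delete(f2);
```

Testing the counts rather than feof also avoids a final iteration in which both reads return nothing.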

Related

(Matlab) Option to turn pause on and off from output callback of system(string)?

For an aerospace course on aeroelasticity I am doing an assignment with Nastran in Matlab (by using system(command) and a bdf as input file).
I have attached a piece of my code as explanation. In this case the program Nastran produces a punch file (text) with displacements.
Currently the problem is that Matlab disregards the time Nastran needs for the analysis to produce this punch file and continues on with the loop. The punch file has not been created yet, so Matlab throws an error saying it does not exist and stops the loop.
I "have" a workaround for this: setting the pause times manually, found by running it by hand for increasing mesh sizes. This gives me at least some data on mesh convergence, but it is not a suitable method for the rest of the assignment as it will take way too much time, so it must be automated.
I was thinking of setting a condition that temporarily pauses the loop if the punch file does not exist and resumes once it exists. However, I got stuck with using a pause condition inside a while loop altogether, and it does not seem like a solution to me.
Do you have any suggestions on how to get around this problem, or do you know of a way to send a callback from system(nastran) that I can use to create a condition to control the loop, or something in that direction?
The following is a piece of code of the created function which turns out the Residual Mean squared error of the mesh which I use to see if the mesh converges:
%% Run Nastran
system('"C:\Users\$$$$\AppData\Roaming\MSC.Software\MSC_Nastran_Student_Edition\2014\Nastran\bin\nastranw.exe" nastranfile.bdf mem=1gb'); % Run the created bdf file
pause(15);
%% Read results and save all relevant results
fpc = fopen('nastranfile.pch','r')
line = 0;
for j=1:6;
line = line+1;
fgets(fpc);
end
line;
counter=0;
data = [];
while ~feof(fpc)
counter= counter+1;
str = fgets(fpc);
line=line+1;
str = str(61:73);
data(counter) = str2num(str)
fgets(fpc);
line=line+1;
end
line;
fclose(fpc);
% Find RMSE
mdl = fitlm(1:length(data),data);
RMEA = mdl.Rsquared.Adjusted;
RMSE = mdl.RMSE;
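The "pause until the punch file exists" idea from the question could be sketched roughly as below, in place of the fixed pause(15). This is only an illustration under stated assumptions: outFile is a hypothetical stand-in for nastranfile.pch, the file is created up front to simulate the solver having finished, and the timeout and polling interval are arbitrary. A file that exists is not necessarily completely written, so a real script may need a further check.

```matlab
% Sketch: poll for an output file with a timeout instead of a fixed pause.
outFile = [tempname '.pch'];          % hypothetical stand-in for nastranfile.pch
fid = fopen(outFile,'w');             % simulate the solver producing its output
fprintf(fid,'dummy\n');
fclose(fid);

timeout  = 5;                         % seconds to wait before giving up (arbitrary)
interval = 0.1;                       % seconds between checks (arbitrary)
t0    = tic;
found = false;
while toc(t0) < timeout
    if exist(outFile,'file') == 2     % 2 means "a file with this name exists"
        found = true;
        break;
    end
    pause(interval);
end
if ~found
    error('Timed out waiting for %s', outFile);
end
delete(outFile);
```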

Looping over a Matlab structure

I am currently processing a bunch of files that I have imported into a structure, but have hit a bump in the road while trying to loop over the data.
First of all, here is my structure:
Ice
1.1 az160, az240, az300...
1.1.2 zen15, zen30,zen45...
1.1.2.1 Data
1.1.2.2 Textdata
I am trying to extract a value from each "textdata" cell array and use it to divide a column in data of the same structure. To do so, I am looping through the structure in the following way:
az_names = fieldnames(ice)
for m = 1:numel(az_names)
snames = fieldnames(ice.(az_names{m}))
for k = 1:numel(snames)
inttime = strrep(ice.(az_names{m}).(snames{k}).textdata(9,1), 'Integration > Time (usec): ','');
inttime = strrep(inttime, ' (USB2+E00040)','');
integration = cellfun(@str2num,inttime)
ice.(az_names{m}).(snames{k}).data(:,4) = ice.(az_names{m}).(snames{k}).data(:,3)/integration % line 17
end
end
I get the following error:
Index exceeds matrix dimensions
Edit: Matlab gives me the error at line 17. If I run the code up to "integration" and also write:
ice.(az_names{m}).(snames{k}).data(:,4)
I don't get a problem, Matlab prints to screen the right number and the data column.
I thought this would loop through each field in the structure and do the operation (dividing a column of values by a number), but I seem to be missing a point here. Can anybody see my mistake?
Regards,
If the error occurs when you try to execute this fragment:
ice.(az_names{m}).(snames{k}).data(:,4)
Then the cause is quite simple.
The variables m and k seem to be handled properly (due to the numel in the loop they should never be too big), meaning that the 4 is simply too big.
Check
ice.(az_names{m}).(snames{k}).data(:,4)
Or even more directly
size(ice.(az_names{m}).(snames{k}).data)
And you should find that the second element of the size is less than 4.
On second thought, this fragment works:
a.b=1;
a.b(:,4)=1
So I suspect that the error occurs when trying to read in this part:
ice.(az_names{m}). (snames{k}).data(:,3)
Meaning that the second element of the size should even be less than 3.
Also I would recommend removing the space.
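The difference between assigning to a missing column and reading from one can be checked with a small stand-alone sketch. The structs a and b and their two-column data are made up for illustration and are not the asker's ice structure.

```matlab
% Sketch: writing to a non-existent column grows the array,
% reading from a non-existent column raises an indexing error.
a.data = [1 2; 3 4];          % only two columns
a.data(:,4) = 1;              % assignment pads with zeros up to column 4
sz = size(a.data);            % [2 4]

b.data = [1 2; 3 4];          % again only two columns
readFailed = false;
try
    col = b.data(:,3);        % column 3 does not exist, so this errors
catch
    readFailed = true;
end
```

So an "Index exceeds matrix dimensions" error on that line points at the right-hand side, where data(:,3) is read.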

Determine number of bytes in a line of a binary file Matlab

I am working on importing some data interpretation of binary files from Fortran to MATLAB and have come across a bit of an issue.
In the Fortran file I am working with the following check is performed
CHARACTER*72 PICTFILE
CHARACTER*8192 INLINE
INTEGER NPX
INTEGER NLN
INTEGER BYTES
c This is read from another file but I'll just hard code it for now
NPX = 1024
NLN = 1024
bytes=2
open(unit=10, file=pictfile, access='direct', recl=2*npx, status='old')
read(10,rec=nln, err=20) inline(1:2*npx)
go to 21
20 bytes=1
21 continue
close(unit=10)
where nln is the number of lines in the file being read, and npx is the number of integers contained in each line. This check basically determines whether each of those integers is 1 byte or 2 bytes. I understand the Fortran code well enough to figure that out, but now I need to figure out how to perform this check in MATLAB. I have tried using the fgetl command on the file and then reading the length of the characters contained but the length never seems to be more than 4 or 5 characters, when even if each integer is 1 byte the length should be somewhere around 1000.
Does someone know a way that I can automatically perform this check in MATLAB?
So what we figured out was that the check is simply to see if the file is the correct size. In Matlab this can be done as
fullpath=which(file); %extracting the full file path
s=dir(fullpath); %extracting information about the file
fid=fopen(file,'r'); %opening image file
if s.bytes/NLN==2*NPX %if the file is NLN*NPX*2 bytes
for n=1:NLN %for each line
dn(n,:) = (fread(fid, NPX, '*uint16','b'))'; %reading in lines into DN
end
elseif s.bytes/NLN==NPX %Else if the file is NLN*NPX bytes
for n=1:NLN %for each line
dn(n,:) = (fread(fid, NPX, '*uint8','b'))'; %reading in lines into DN
end
else %If the file is neither something went wrong
error('Invalid file. The file is not the correct size specified by the SUM file')
end
where file contains the filename, NLN contains the number of lines, and NPX contains the number of columns. Hope this helps anyone who may have a similar question, but be warned: this will only work if every entry in your file has the same number of bytes, and if you know the total number of entries there should be!
Generally speaking, binary files don't have line lengths; only text files have line lengths. MATLAB's fgetl will read until it finds bytes that happen to match newline characters. It then removes them and returns the result. A binary read, on the other hand, should read a block of length 2*npx and return the result. It looks like you want to use fread to get a block of data like this:
inline = fread(fileID,2*npx)
Your Fortran code is requesting to read record nln. If the code you have shared reads all the records starting at the first one and working up, then you can just put the above code in a loop for nln=1:maxValue. However, if you really do want to yank out record nln you need to fseek to that position first. Fortran record numbers start at 1, so record nln begins at byte offset (nln-1)*2*npx (assuming recl is measured in bytes):
fseek(fileID, (nln-1)*2*npx, -1);
inline = fread(fileID,2*npx)
so you get something like the following:
Either reading them all in a loop:
fileID = fopen(pictfile);
nln = 0;
while ~feof(fileID)
nln = nln+1;
inline = fread(fileID,2*npx);
end
fclose(fileID);
or picking out only record number nln:
fileID = fopen(pictfile);
nln = 7;
fseek(fileID, (nln-1)*2*npx, -1); % records are 1-based, so record nln starts at (nln-1)*2*npx
inline = fread(fileID,2*npx);
fclose(fileID);
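Both ideas in this thread, inferring the bytes per value from the file size and seeking to a given record, can be tried on a small file. The sketch below is only an illustration: the temporary file, its 3 records of 4 big-endian uint16 values, and the names nrec, rec and bytesPerValue are all made up for the demonstration.

```matlab
% Sketch: infer bytes per value from the file size, then read one record.
npx  = 4;                                   % values per record (demo value)
nrec = 3;                                   % number of records (demo value)
fname = [tempname '.bin'];
fid = fopen(fname,'w','b');                 % big-endian, as in the '*uint16','b' reads
fwrite(fid, uint16(1:npx*nrec), 'uint16');
fclose(fid);

info = dir(fname);
bytesPerValue = info.bytes / (npx*nrec);    % 2 for this demo file

rec = 2;                                    % 1-based record number, as in Fortran
fid = fopen(fname,'r','b');
fseek(fid, (rec-1)*bytesPerValue*npx, 'bof');   % byte offset where record rec starts
row = fread(fid, npx, '*uint16')';          % one record as a row vector
fclose(fid);
delete(fname);
```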

Interpret the matlab code

I'm a Java programmer with no MATLAB background, so I'm really clueless about these lines of MATLAB code. When I run the code I get an error:
??? Undefined function or variable 'nfile'.
Error in ==> texture_id at 29
fprintf(' \nneural network processing \n',nfile);
I understand that 'path' is a variable that stores a string and 'demo' is a boolean, but for the other lines, I don't want to assume what they do. Can you please help me and explain each line?
Here's the code:
path = 'C:\Users\Dais\Documents\MATLAB\Data Sets\';
demo = true;
elfile = dir('*.jpg');
[lu ri] = size(elfile); feat=zeros(lu,29); nomf=cell(lu,1);
for nfi = 1:lu
nfile = elfile(nfi).name;
fprintf(' feature extraction file: %s \n',nfile);
nomf{nfi} = upper(nfile);
feat(nfi,:) = feature_ex([path nfile],demo);
end
fprintf(' \nneural network processing \n',nfile);
I would guess that what's happening here is that elfile = dir('*.jpg'); does not find any JPEGs in the current directory, so lu is 0, the loop body never runs, and nfile is never assigned. Place a breakpoint there in the code and check this. The way I would set up the loop would be something like this:
for nfi=1:numel(elfile)
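That guess can be reproduced in isolation. The sketch below is only an illustration: it uses a freshly created, empty temporary folder (a made-up stand-in for a directory without .jpg files) and shows that the loop body is skipped and nfile never comes into existence.

```matlab
% Sketch: with no matching files, dir returns an empty struct array,
% the loop runs zero times, and nfile is never assigned.
d = tempname;                          % hypothetical empty folder for the demo
mkdir(d);
elfile = dir(fullfile(d,'*.jpg'));
n = numel(elfile);                     % 0 when nothing matches

ranLoop = false;
for nfi = 1:n                          % 1:0 is empty, so the body is skipped
    nfile = elfile(nfi).name;
    ranLoop = true;
end
nfileDefined = exist('nfile','var') == 1;   % false if nfile was never assigned
rmdir(d);
```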
As @Rody Oldenhuis said, use doc and help to learn more about each function (or press F1 when the cursor is in the function name), but this should get you started:
%Looks for all files with extension .jpg in current directory
elfile = dir('*.jpg');
%lu and ri hold the rows, column lengths of elfile respectively
[lu ri] = size(elfile);
%creates an array of zeros of dimensions lu rows by 29 columns
feat=zeros(lu,29);
%creates an empty cell array (doc cell) of dimensions lu rows by 1 column
nomf=cell(lu,1);
for nfi = 1:lu %look through all files
nfile = elfile(nfi).name; %get index nfi file
fprintf(' feature extraction file: %s \n',nfile); %print string
nomf{nfi} = upper(nfile); %upper case
feat(nfi,:) = feature_ex([path nfile],demo); %some external function
end
fprintf(' \nneural network processing \n',nfile); %print string
Rather than explain everything about MATLAB, I'll say this: MATLAB is interactive! And one of the reasons you pay good money for MATLAB is that the documentation is awesome, and getting help is super easy.
For instance, you can type help <command> on the MATLAB command line, and get a short help on that command, or doc <command> to get the complete documentation, often with examples and demonstrations. The whole documentation is also online, should you prefer Google and being in a browser.
Should you have a script or function or class that has problems, you can issue dbstop if error, so that you drop into the debugger when an error occurs, and then you can view the contents of all variables just prior to the error, type new commands to investigate the error, etc. You can set breakpoints by clicking on the line number next to where you want to break, dbstep then makes a single step, dbup moves you up a level, etc. Have a look at doc dbstop.
You can select portions of code and press F9, which will execute those lines of code. Note that that is equivalent to copy-pasting the code to the command window and running it, so you will often have problems with undefined variables (and similar problems) that way (this or something similar is what I suspect happened in your particular case, as the code you posted should not give that error).

MATLAB error message when plotting big data files?

I have written code to plot data from very large .txt files (20Gb to 60Gb). The .txt files contain two columns of data, that represent the outputs of two sensors from an experiment that I did. The reason the data files are so large is that the data was recorded at 4M samples/s.
The code works well for plotting relatively small .txt files (10Gb), however when I try to plot my larger data files (60Gb) I get the following error message:
Attempted to access TIME(0); index must be a
positive integer or logical.
Error in textscan_loop (line 17)
TIME =
((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift
Time along
The basic idea behind my code is to conserve RAM by reading Nlines of data from .txt on disk to Matlab variable C in RAM, plotting C then clearing C. This process occurs in loop so the data is plotted in chunks until the end of the .txt file is reached. The code can be found below:
Nlines = 1e6; % set number of lines to sample per cycle
sample_rate = (1); %sample rate
DECE= 1000;% decimation factor
TIME = (0:sample_rate:sample_rate*((Nlines)-1));%first instance of time vector
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');
while(~feof(fid))
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point you need the memory!
clearvars C ;
TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along
plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
hold on;
clearvars d;
end
fclose(fid);
I think the while loop does around 110 cycles before the code stops executing and the error message is displayed. I know this because the graph shows around 110e6 data points and the loop processes 1e6 data points at a time.
If anyone knows why this error might be occurring please let me know.
Cheers,
Jim
The error that you encounter is in fact not in the plotting, but in the line referenced in the error message (the one that updates TIME).
Though I have been unable to reproduce the exact error, I suspect it to be related to this:
Time = 1:0
Time(end)
In any case, the way forward is clear. You need to run this code with dbstop if error and observe all relevant variables in the line that throws the error.
From here you will likely figure out what is causing the problem, hopefully just something simple like your code being unable to deal with data size that is an exact multiple of 1000 or so.
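One way the empty-vector suspicion above could play out is sketched below. It rests on an assumption that the source does not confirm: that at some point textscan returns zero rows while feof is still false. The variable d here is a made-up empty result, not real sensor data.

```matlab
% Sketch: an empty chunk makes the new TIME vector empty,
% and the next TIME(end) then indexes position 0 and errors.
sample_rate = 1;
TIME = 0:sample_rate:9;                % time vector of the previous chunk
d = zeros(0,2);                        % assumption: a read that returned no rows

% Same update as in the question; with size(d,1) == 0 this becomes 10:1:9.
TIME = (TIME(end)+sample_rate):sample_rate:(sample_rate*size(d,1) + TIME(end));
timeIsEmpty = isempty(TIME);

failed = false;
try
    t = TIME(end);                     % end is 0 for an empty vector
catch
    failed = true;
end
```

If this is what happens, checking isempty(d) after each textscan call and leaving the loop would avoid the error, though the reason for the empty read would still need to be found with dbstop if error.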
Trying to use plot for big data is problematic as matlab is trying to plot every single data point.
Obviously the screen will not display all of these points (many will overlap), and therefore it is recommended to plot only the relevant points. One could subsample and do this manually as you seem to have tried, but fortunately we have a ready to use solution for this:
The Plot (Big) File Exchange Submission
Here is the introduction:
This simple tool intercepts data going into a plot and reduces it to
the smallest possible set that looks identical given the number of
pixels available on the screen. It then updates the data as a user
zooms or pans. This is useful when a user must plot a very large
amount of data and explore it visually.
This works with MATLAB's built-in line plot functions, allowing the
functionality of those to be preserved.
Instead of:
plot(t, x);
One could use:
reduce_plot(t, x);
Most plot options, such as multiple series and line properties, can be
passed in too, such that 'reduce_plot' is largely a drop-in
replacement for 'plot'.
h = reduce_plot(t, x(1, :), 'b:', t, x(2, :), t, x(3, :), 'r--*');
This function works on plots where the "x" data is always increasing,
which is the most common, such as for time series.