How to read a binary file byte by byte in MATLAB

I have been struggling with this bug. I am using MATLAB to read a binary file that contains three columns of numbers in float format.
I am reading one number at a time using this line:
pt(j) = fread(fid,1,'float','a');
I have found that sometimes (rarely) MATLAB reads 5 bytes instead of 4 for a float, and that messes up the rest of the reading. I am not sure whether the file is corrupted or MATLAB has a bug there. When I exported the file as text and read the text version, everything worked well.
As a workaround, here is what I did:
cur = ftell(fid);
if (cur - prev)~= 4
pt(j) = 0; % I'm throwing this reading away for the sake of saving the rest of the data. This is not ideal
cur = prev +4;
fseek(fid, cur,'bof');
end
prev = cur;
I tried different combinations of formats (float32, float64, etc.), but nothing works: MATLAB always reads 5 bytes instead of 4 at this particular location.
EDIT:
I solved it based on Chris's answer. I was using this command to open the file:
fid = fopen(fname,'rt');
I replaced it with this:
fid = fopen(fname,'r');

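A byte that is skipped only sometimes, and rarely, suggests that you are on Windows and have opened the file in text mode. See the permission parameter of the fopen function.
When opening a file in text mode on Windows, the sequence \r\n (13,10) is replaced with \n (10). This happens before fread gets to it. So whenever the bytes 13,10 happen to occur inside your binary data, fread consumes 5 bytes from the file to return 4, which is exactly the symptom you see.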
So, when opening the file, don't do:
fid = fopen('name', 'rt');
The t here indicates "text". Instead, do:
fid = fopen('name', 'r');
To make this explicit, you can add b to the permissions. This is not documented, but is supposed to mean "binary", and makes the call similar to what you'd do in C or in the POSIX fopen():
fid = fopen('name', 'rb');
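A minimal sketch of the point above, assuming a little-endian layout and a hypothetical file name in tempdir: it writes two float32 values, the first of which contains the byte pair 13,10, then reads them back one at a time in binary mode and records ftell after each read. In binary mode the position should advance by exactly 4 bytes per value on every platform; opening with 'rt' on Windows is where we would expect the 5-byte jump from the question to reappear.

```matlab
% Sketch: bytes 13,10 (CR,LF) inside binary data survive when the file is
% opened in binary mode. The file name 'crlf_demo.bin' is hypothetical.
bytes = uint8([13 10 32 65, 0 0 192 63]);  % two little-endian float32 values
fname = fullfile(tempdir, 'crlf_demo.bin');

fid = fopen(fname, 'w');                   % 'w' without 't' writes raw bytes
fwrite(fid, bytes, 'uint8');
fclose(fid);

fid = fopen(fname, 'r', 'ieee-le');        % binary mode, little-endian
vals = zeros(1, 2);
pos  = zeros(1, 2);
for j = 1:2
    vals(j) = fread(fid, 1, 'float32');    % one float per call, as in the question
    pos(j)  = ftell(fid);                  % expected to advance by exactly 4 bytes
end
fclose(fid);
delete(fname);

fprintf('values: %.6f %.6f, positions: %d %d\n', vals, pos);
```

The first value is 0x41200A0D, slightly above 10, chosen only because its bytes include 13,10.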

Related

Read hex from a file in MATLAB

What I have
A txt file like:
D091B
E7E1F
20823
...
What I need
To read them and store them as char, just as they are in the file: N lines (I don't know how many), with 5 characters (5 columns) on each one.
What have I tried
fichero = fopen('PS.txt','r');
sizeDatos = [[] 5]; % Several Options, read below
resultados=fscanf(fichero, '%s', sizeDatos); % Here too
fclose(fichero);
I've tried the snippet above to read my txt file, but I didn't manage to get it. The most I've obtained is by using:
sizeDatos = [1 Inf];
So I got all my hex characters into an array, with no spaces.
As you can see, I've tried several options, changing the fscanf size parameter and also trying to tell the format string that it should recognize new lines, for example by using \n. None of them have worked for me.
Any idea how I can get it? I've read the fscanf documentation page, but it didn't inspire me to try anything different.
One possible solution is to use textscan, which returns a cell array:
fileId = fopen('PS.txt');
C = textscan(fileId, '%s');
To show the contents of the cell array, you can use:
celldisp(C)
Or you can convert it to other types.
Don't forget to close your file after using it.
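A short sketch of the textscan approach that also gives the N-by-5 char layout asked for in the question. It assumes every line holds exactly 5 hex characters; the file is written to tempdir with the three sample lines so the example runs on its own, and the hex2dec step is optional.

```matlab
% Sketch: read each line of a hex file as a 5-character row and optionally
% convert it to a number. 'PS.txt' mirrors the question; here it is written
% to tempdir with the three sample lines so the example is self-contained.
fname = fullfile(tempdir, 'PS.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'D091B\nE7E1F\n20823\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%s');      % C{1} is an N-by-1 cell array of strings
fclose(fid);                  % close the file once it has been read
delete(fname);

hexChars = char(C{1});        % N-by-5 char matrix, one file line per row
values   = hex2dec(hexChars); % optional: numeric value of each line (column)
```

char pads rows with spaces if lines differ in length, so the 5-column shape holds only when every line really has 5 characters.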

How to delete the first block of bytes of a file in MATLAB

I want to delete the first block of bytes in a file in MATLAB (e.g., delete the first 50 bytes of a text file).
Is that possible in MATLAB? If so, how can I achieve that?
Do you want to do this with or without loading the file into memory? If you can do this in memory, one possible way is to open the file, use fseek to skip the first few bytes, use fread to read the rest of the data into memory, and save that to a new file using fwrite.
In Linux / Mac OS, there are efficient ways to do this without having to load the file in memory. For example, see here: https://unix.stackexchange.com/questions/6852/best-way-to-remove-bytes-from-the-start-of-a-file
However, if you're on Windows, you can't avoid doing a byte copy, which ultimately means doing this in memory. From what I have seen on Windows, the only way is to do a byte copy where the input pointer starts after however many bytes you want to skip.
See for example here: What is the most efficient way to remove first N bytes from a file on Windows?, and also here: http://blogs.msdn.com/b/oldnewthing/archive/2010/12/01/10097859.aspx
According to these posts, you have no choice but to do a byte copy. Therefore, if you want to do the same in MATLAB, you'll have to do what I described above.
Since you're working in MATLAB, here is some example code to do what I have outlined above:
fid = fopen('data', 'r'); %// Open up data file
fid2 = fopen('dataout', 'w'); %// File to save - new file with skipped bytes
skip = 50; %// Determine how many bytes you want to skip over
fseek(fid, skip, 'bof'); %// Skip over bytes - 'bof' means from beginning of file
A = fread(fid, Inf, '*uint8'); %// Read the rest of the data as raw bytes
fwrite(fid2, A, 'uint8'); %// Write data to new file
%// Close the files
fclose(fid);
fclose(fid2);
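For files too large to hold in memory at once, the same byte copy can be done in fixed-size chunks. This is a sketch under stated assumptions: the file names and the tiny chunk size are hypothetical, chosen so the demo is small, and like the code above it writes a new file rather than modifying the original in place.

```matlab
% Sketch: drop the first SKIP bytes of a file by copying the rest in chunks,
% so memory use stays bounded even for very large files. The file names and
% the tiny chunk size are hypothetical, chosen only for this demo.
src = fullfile(tempdir, 'skip_demo_in.bin');
dst = fullfile(tempdir, 'skip_demo_out.bin');
fid = fopen(src, 'w');
fwrite(fid, uint8(0:99), 'uint8');           % 100-byte test file
fclose(fid);

skip      = 50;   % number of leading bytes to drop
chunkSize = 16;   % bytes per read; something like 2^20 is more sensible for real files

fin  = fopen(src, 'r');
fout = fopen(dst, 'w');
fseek(fin, skip, 'bof');                     % input pointer starts after the skipped bytes
while true
    buf = fread(fin, chunkSize, '*uint8');   % '*uint8' keeps the bytes as uint8
    if isempty(buf)
        break;                               % nothing left to copy
    end
    fwrite(fout, buf, 'uint8');
end
fclose(fin);
fclose(fout);

fid = fopen(dst, 'r');
out = fread(fid, Inf, '*uint8');             % read back the result to check it
fclose(fid);
delete(src);
delete(dst);
```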

Determine the number of bytes in a line of a binary file in MATLAB

I am porting code that interprets binary files from Fortran to MATLAB and have come across a bit of an issue.
In the Fortran code I am working with, the following check is performed:
CHARACTER*72 PICTFILE
CHARACTER*8192 INLINE
INTEGER NPX
INTEGER NLN
INTEGER BYTES
c This is read from another file but I'll just hard code it for now
NPX = 1024
NLN = 1024
bytes=2
open(unit=10, file=pictfile, access='direct', recl=2*npx, status='old')
read(10,rec=nln, err=20) inline(1:2*npx)
go to 21
20 bytes=1
21 continue
close(unit=10)
where nln is the number of lines in the file being read, and npx is the number of integers contained in each line. This check basically determines whether each of those integers is 1 byte or 2 bytes. I understand the Fortran code well enough to figure that out, but now I need to figure out how to perform this check in MATLAB. I have tried using fgetl on the file and then taking the length of the characters it returns, but the length never seems to be more than 4 or 5 characters, even though it should be around 1000 even if each integer is only 1 byte.
Does someone know a way that I can automatically perform this check in MATLAB?
So what we figured out was that the check simply tests whether the file is the correct size. In MATLAB this can be done as follows:
fullpath = which(file);       % extract the full file path
s = dir(fullpath);            % extract information about the file (s.bytes is its size)
fid = fopen(fullpath, 'r');   % open the image file
if s.bytes/NLN == 2*NPX       % if the file is NLN*NPX*2 bytes
    dn = zeros(NLN, NPX, 'uint16');
    for n = 1:NLN             % for each line
        dn(n,:) = (fread(fid, NPX, '*uint16', 0, 'b'))'; % read the line into dn
    end
elseif s.bytes/NLN == NPX     % else if the file is NLN*NPX bytes
    dn = zeros(NLN, NPX, 'uint8');
    for n = 1:NLN             % for each line
        dn(n,:) = (fread(fid, NPX, '*uint8', 0, 'b'))';  % read the line into dn
    end
else                          % if the file is neither, something went wrong
    fclose(fid);
    error('Invalid file. The file is not the correct size specified by the SUM file')
end
fclose(fid);
where file contains the file name, NLN the number of lines, and NPX the number of columns. I hope this helps anyone with a similar problem, but be warned: this only works if every entry in your file has the same number of bytes and you know in advance how many entries there should be!
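A sketch of the same idea written closer to the Fortran logic, with hypothetical test values for NPX, NLN and the file name. The Fortran read of record NLN succeeds whenever the file holds at least NLN records of 2*NPX bytes, so this version uses >= rather than the exact-size test above. It also reads the whole image with one fread call instead of a loop, relying on fread filling an [NPX NLN] array column by column.

```matlab
% Sketch: reproduce the Fortran size check and read the whole image with a
% single fread call. NPX, NLN and the file name are hypothetical test values.
NPX = 4;
NLN = 3;
fname = fullfile(tempdir, 'pict_demo.img');
fid = fopen(fname, 'w');
fwrite(fid, uint16(1:NPX*NLN), 'uint16', 0, 'b');   % 2-byte, big-endian test image
fclose(fid);

info = dir(fname);
if info.bytes >= 2*NPX*NLN    % record NLN of length 2*NPX exists -> 2 bytes
    bytes = 2;
    prec  = '*uint16';
else                          % the Fortran read would fail -> fall back to 1 byte
    bytes = 1;
    prec  = '*uint8';
end

fid = fopen(fname, 'r');
% fread fills the NPX-by-NLN array column by column, so each column holds one
% line of the file; the transpose gives one line per row.
dn = fread(fid, [NPX NLN], prec, 0, 'b').';
fclose(fid);
delete(fname);
```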
Generally speaking, binary files don't have line lengths; only text files do. MATLAB's fgetl reads until it finds a newline character (in a binary file, that is simply the first byte that happens to equal 10), then removes it and returns the result. For the binary file, on the other hand, you should read a block of length 2*npx and return the result. It looks like you want to use fread to get a block of data like this:
inline = fread(fileID,2*npx)
Your Fortran code is requesting to read record nln. If the code you have shared reads all the records starting at the first one and working up, then you can just put the above code in a loop for nln=1:maxValue. However, if you really do want to yank out record nln, you need to fseek to that position first. Fortran direct-access records are numbered from 1, so record nln starts (nln-1)*2*npx bytes into the file:
fseek(fileID, (nln-1)*2*npx, 'bof');
inline = fread(fileID,2*npx)
so you get something like the following:
Either reading them all in a loop:
fileID = fopen(pictfile);
nln = 0;
while true
    inline = fread(fileID, 2*npx);
    if numel(inline) < 2*npx, break; end  % stop at the end of the file
    nln = nln + 1;
    % process record nln here
end
fclose(fileID);
or picking out only record number nln:
fileID = fopen(pictfile);
nln = 7;
fseek(fileID, (nln-1)*2*npx, 'bof');
inline = fread(fileID,2*npx);
fclose(fileID);
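A self-contained check of the record offset, assuming records of 2*npx bytes numbered from 1 as in Fortran direct access. The file name and its contents are hypothetical test data, and the bytes are kept as uint8 so they can be compared directly.

```matlab
% Sketch: fetch one fixed-length record from a file laid out like a Fortran
% direct-access file (records of 2*npx bytes, numbered from 1).
npx = 3;
fname = fullfile(tempdir, 'records_demo.bin');
fid = fopen(fname, 'w');
fwrite(fid, uint8(1:3*2*npx), 'uint8');    % three records of 6 bytes each
fclose(fid);

nln = 2;                                   % record to fetch (1-based)
fid = fopen(fname, 'r');
status = fseek(fid, (nln-1)*2*npx, 'bof'); % record nln starts after nln-1 records
rec = fread(fid, 2*npx, '*uint8');         % the bytes of record nln
fclose(fid);
delete(fname);
```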

Memory map file in MATLAB?

I have decided to use memmapfile because my data (typically 30 GB to 60 GB) is too big to fit in a computer's memory.
My data files consist of two columns of data that correspond to the outputs of two sensors, and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.Data(1)
I used the above code to memory map my data to a variable "m", but I have no idea which data format to use ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact, I tried all of the data formats listed that MATLAB supports, but when I use m.Data(index) I never get a pair of numbers (two columns of data), which is what I expected; also, the number is different depending on the format I use.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files; that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:57)).' % read the first three lines (3*19 chars). NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values from the text file directly with f = fscanf(fid,'%f',Nlines*2).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this, you can read Nlines pairs of values at a time, process them, and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use the same call to get the next lines. It's thus easy to write a loop to process the whole file, testing with feof(fid) whether you are at the end of the file.
An even easier way is to use textscan. To slightly adapt an existing example:
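A sketch of that loop, assuming, as above, that every line is exactly 19 bytes (two 8-character numbers, a tab and CR LF). The file name, its contents and the small Nlines value are hypothetical test data. Instead of feof, it stops when fread returns nothing, which also handles a file whose length is an exact multiple of the chunk size.

```matlab
% Sketch: process a fixed-width, two-column text file Nlines at a time with
% fread + sscanf. The file name, its contents and Nlines are hypothetical.
fname = fullfile(tempdir, 'two_col_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%8.6f\t%8.6f\r\n', [0.398516 0.063440; 0.399611 0.063284; 0.398985 0.061253].');
fclose(fid);

Nlines  = 2;              % lines per chunk (use a much larger value in practice)
lineLen = 19;             % 8 + 1 (tab) + 8 + 2 (CR LF) bytes per line
d = zeros(0, 2);          % accumulated result, only for this small demo
fid = fopen(fname, 'r');
while true
    c = fread(fid, Nlines*lineLen, '*char')';  % next chunk as a char row
    if isempty(c)
        break;                                 % nothing left to read
    end
    f = sscanf(c, '%f');                       % parse all numbers in the chunk
    d = [d; reshape(f, 2, []).'];              % two values per line
    % in a real program, process this chunk here instead of accumulating it
end
fclose(fid);
delete(fname);
```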
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
fmt = '%f\t%f'; % named fmt to avoid shadowing MATLAB's format function
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, fmt, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again that the fread followed by sscanf approach will be fastest. However, the fread method will fail as soon as a single line in the text file doesn't exactly match your format; textscan, on the other hand, is forgiving of whitespace changes and thus more robust.

Reading an unformatted Fortran file in MATLAB: which precision?

I have just written out a file:
real*8 :: vol_cell
real*8, dimension(256,256,256) :: dense
[... some operations]
open(unit=8,file=fname,form="unformatted")
write(8)dense(:,:,:)/vol_cell
close(8)
My code to read this in in Matlab:
fid = fopen(fname,'r');
mesh_raw = fread(fid,256*256*256,'double');
fclose(fid);
The min and max values clearly show that it is not being read correctly (the true min is 0 and the max is a largish positive real*8):
min =
3.3622e+38
max =
-3.3661e+38
What precision do I need to set in Matlab to make it read in the unformatted Fortran file?
A somewhat related question: the MATLAB code I am using reads binary files OK, but not unformatted files. I am generating this new data on my Mac OS X using gfortran, which doesn't recognize form="binary", so I can't do it that way. Do I need to add some library, or is this an endian problem?
===== Progress =====
OK progress. Instead of my ndim*ndim*ndim matrix I just wrote out the x values (column vector) as such:
open(unit=8,file=fnamex,form="unformatted")
write(8)x0
close(8)
Then Matlab reads:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
x0 = fread(fid,Ntot,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
THAT worked. Then I tried the original ndim**3 matrix, reading it with:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
But that gives me garbage. Perhaps the problem is here:
real*4, dimension(:), allocatable :: x0
real*8, dimension(256,256,256) :: dense
Do I need to change mesh_raw = fread(fid,ndim*ndim*ndim,'float32'); to make sure it reads a real*8? What would that be? Surely just using 'real*8' verbatim should work, since 'real*4' works for the x vector? It does read "dense", but the min/max/average values are wrong.
Your Fortran code shows you writing what is known as an unformatted sequential file. This is a record-based file format. A typical implementation (it is compiler and platform specific) is for the compiler to write additional record-structure information to the file; often (gfortran included) the record length is written at the start and end of each record. Your original MATLAB code does not appear to take that into account.
Fortran 2003 introduced stream access (add the ACCESS='STREAM' specifier to the OPEN statement). gfortran supports this feature; FORM='BINARY' is a non-standard synonym on some compilers. An unformatted file created with stream access has no record structure; it is a stream of bytes akin to a C stream. This may be more appropriate for you.
This is most likely an endian problem; a rough guess on my part gave much more reasonable numbers. I'm not sure exactly what the solution is, so I'm going to give you three possible solutions, one of which should fix the problem. Which one depends on your source file.
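A minimal sketch of reading one such record, assuming gfortran's default layout of a 4-byte record length before and after the payload and a little-endian file. The file name and the five real*8 values are hypothetical test data; the leading marker tells you how many bytes to read, and comparing it with the trailing marker is a cheap sanity check on precision and byte order.

```matlab
% Sketch: read one record of a Fortran unformatted sequential file, assuming
% gfortran's default layout: a 4-byte record length before AND after the
% payload. Here we fake such a file holding five real*8 values.
fname = fullfile(tempdir, 'fortran_seq_demo.bin');
payload = [0 1.5 2.5 3.5 1e3];
nbytes  = int32(8*numel(payload));        % record length in bytes
fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, nbytes, 'int32');             % leading record marker
fwrite(fid, payload, 'float64');          % the real*8 data
fwrite(fid, nbytes, 'int32');             % trailing record marker
fclose(fid);

fid  = fopen(fname, 'r', 'ieee-le');
head = fread(fid, 1, 'int32');            % should equal 8*number of values
vals = fread(fid, head/8, 'float64');     % real*8 -> 'float64' (same as 'double')
tail = fread(fid, 1, 'int32');            % must match the leading marker
fclose(fid);
delete(fname);
if head ~= tail
    error('Record markers differ: wrong precision or byte order?');
end
```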
The trick is simply to pass a machine format as the third argument to fopen, using one of the following:
fid = fopen(fname,'r','n'); %Native format (usually the default)
fid = fopen(fname,'r','l'); %Little-endian
fid = fopen(fname,'r','b'); %Big-endian
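Rather than trying each byte order by hand, the record marker can be used to guess it. This sketch assumes a file holding a single record with 4-byte markers, so the leading marker must equal the file size minus 8; the file name and its contents are hypothetical test data written big-endian. The byte order is passed as the third (machinefmt) argument to fopen.

```matlab
% Sketch: guess the byte order of a single-record Fortran unformatted file.
% With 4-byte markers, the leading marker must equal (file size - 8).
% File name and contents are hypothetical test data written big-endian.
fname = fullfile(tempdir, 'endian_demo.bin');
fid = fopen(fname, 'w', 'ieee-be');
fwrite(fid, 24, 'int32');
fwrite(fid, [1 2 3], 'float64');
fwrite(fid, 24, 'int32');
fclose(fid);

info  = dir(fname);
fmts  = {'ieee-le', 'ieee-be'};
order = '';
for k = 1:numel(fmts)
    fid = fopen(fname, 'r', fmts{k});     % machine format is the 3rd argument
    marker = fread(fid, 1, 'int32');
    fclose(fid);
    if marker == info.bytes - 8           % plausible record length found
        order = fmts{k};
        break
    end
end
if isempty(order)
    error('No byte order gives a consistent record marker.');
end

fid = fopen(fname, 'r', order);
fread(fid, 1, 'int32');                   % skip the leading marker
vals = fread(fid, (info.bytes - 8)/8, 'float64');
fclose(fid);
delete(fname);
```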
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
This is correct, except that since you are writing real*8 in Fortran, you need:
mesh_raw = fread(fid,ndim*ndim*ndim,'double');