I have just written out a file:
real*8 :: vol_cel
real*8, dimension(256,256,256) :: dense
[... some operations]
open(unit=8,file=fname,form="unformatted")
write(8)dense(:,:,:)/vol_cell
close(8)
My code to read this in in Matlab:
fid = fopen(fname,'r');
mesh_raw = fread(fid,256*256*256,'double');
fclose(fid);
The min and max values clearly show that it is not reading it in correctly (Min is 0 and max is a largish positive real*8).
min =
3.3622e+38
max =
-3.3661e+38
What precision do I need to set in Matlab to make it read in the unformatted Fortran file?
A somewhat related question: This Matlab code I am using reads binary files OK but not unformatted files. Though I am generating this new data on my Mac OSX using gfortran. It doesn't recognize form="binary" so I can't do it that way. Do I need to add some library or this an endian problem?
===== Progress =====
OK progress. Instead of my ndim*ndim*ndim matrix I just wrote out the x values (column vector) as such:
open(unit=8,file=fnamex,form="unformatted")
write(8)x0
close(8)
Then Matlab reads:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
x0 = fread(fid,Ntot,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
THAT worked. Then I tried the original ndim**3 matrix, I tried to read with:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
But that gives me garbage. Perhaps here:
real*4, dimension(:), allocatable :: x0
real*8, dimension(256,256,256) :: dense
Do I need to change: mesh_raw = fread(fid,ndim*ndim*ndim,'float32'); to make sure it is reading a real*8? What would that be? Surely just using 'real*8' verbatim should work? I mean 'real*4' for the x vector works. I mean it reads "dense" but the min/max/average values are wrong.
Your Fortran code shows you writing what is known as an unformatted sequential file. This is a record based file format. Typical implementation (Fortran compiler/platform specific) is for the compiler to write addition record structure information to the file - often (gfortran included) the record length is written at the start and end of each record. Your original Matlab code does not appear to take that into account.
Fortran 2003 introduced stream access (add the ACCESS='STREAM' specifier to the OPEN statement). gfortran supports this feature, FORM='BINARY' is a non-standard synonym on some compilers. A unformatted file created with stream access has no record structure - it is a stream of bytes akin to a C stream. This may be more appropriate for you.
This is most likely an endian problem, as a rough ordered guess put a much more reasonable number on my part. I'm not sure what the solution is exaclty, so I'm going to give you 3 possible solutions, one of which should fix the problem. It depends on your source file.
The trick is simply to change the fopen statement to one of the following:
fid = fopen(fname,'rn'); %Native format (Default usually)
fid = fopen(fname,'rl'); %Little Endian
fid = fopen(fname,'rb'); %Big Endian
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
this is correct, except since you are writing real*8 in fortran, you need to have
mesh_raw = fread(fid,ndim*ndim*ndim,'double');
Related
I have been struggling with this bug. When using MATLAB to read a binary file that contains three columns of numbers in float formats.
I am reading one number at a time using this line.
pt(j) = fread(fid,1,'float','a');
I have found that sometimes (rarely) MATLAB instead of reading four bytes for a float, it uses 5 bytes. And it misses up the rest of the reading. I am not sure if the file is corrupted or MATLAB has a bug there. When I printed the file as a txt and read it in txt everything works well.
As a work around here is what I did:
cur = ftell(fid);
if (cur - prev)~= 4
pt(j) = 0; % I m throwing this reading away for the sake of saving the rest of the data. This is not ideal
cur = prev +4;
fseek(fid, cur,'bof');
end
prev = cur;
I tried different combinations of different formats float32 float64 etc... nothing works MATLAB always read 5 bytes instead of 4 at this particular location.
EDIT:
To solve it based on Chris's answer. I was using this command to open the file.
fid = fopen(fname,'rt');
I replaced it with this
fid = fopen(fname,'r');
Sometimes, rarely, skipping a byte. It sounds to me like you are on Windows, and have opened the file in text mode. See the permissions parameter to the fopen function.
When opening a file in text mode on Windows, the sequence \r\n (13,10) is replaced with \n (10). This happens before fread gets to it.
So, when opening the file, don't do:
fid = fopen('name', 'rt');
The t here indicates "text". Instead, do:
fid = fopen('name', 'r');
To make this explicit, you can add b to the permissions. This is not documented, but is supposed to mean "binary", and makes the call similar to what you'd do in C or in the POSIX fopen():
fid = fopen('name', 'rb');
Here is the code:
x = rand(5)*100;
save('pqfile.txt','x','-ascii','-tabs')
The above works, but:
x = rand(5)*100;
x = uint8(x);
save('pqfile.txt','x','-ascii','-tabs')
says:
Warning: Attempt to write an unsupported data type to an ASCII file.
Variable 'x' not written to file.
Does anyone know why this happens? How come I can't save the data when it is uint8. I have to read data into a VHDL testbench so was experimenting. I guess the only option is to save my 8 bit unsigned integer values in 2d array using printf then read into the test bench.
ASCII option
The save method is somewhat restrictive in what it can support, and then it uses floating point notation to represent your numbers which bloats your file when dealing with a limited range of numbers like you are (i.e. uint8, 0 to 255).
Check out dlmwrite as an alternative (documentation here).
It takes the filename to write/save to, the variable to store, and some additional parameters, like the delimiter you want to separate your values with.
For your example, it looks like this
x = rand(5)*100;
x = uint8(x);
dlmwrite('pqfile.txt',x,'\t');
Binary option
If you are looking to stored your uint8 data as single bytes then you probably want go with a custom binary file instead instead of ASCII. (Yes, you can convert uint8 to single ASCII characters but you run into issues with these values being interpreted with your delimiters; newlines or tabs.)
fid=fopen('pqfile.dat','wb');
if(fid>2)
fwrite(fid,size(x),'*uint8'); % Note: change data type here you are dealing with more than 255 rows or columns
fwrite(fid,x','*uint8'); % Transpose x (with x') so it is stored in row order.
fclose(fid);
else
fprintf(1,'Could not open the file for writing.\n');
end
I'm not sure what type of parser you are using for your VHDL, but this will pack your data into a file with a short header of the expected dimensions followed by one long row of your serialized data.
To read it back in with MATLAB, you can do this:
fid = fopen('pqfile.dat','rb');
szX = fread(fid,2,'uint8');
x = fread(fid,szX,'*uint8')'; % transpose back if you are dealing with matlab.
fclose(fid);
The transpose operations are necessary for MATLAB because it reads data column-wise, whereas most other languages (in my experience) read row-wise.
I have a binary square matrix with complex values, stored in a .bin format file. I have tried to read this 100-by-100 matrix with a Matlab script:
i=fopen('matrix.bin','r')
A=fread(i,[100 100]
This code does not correctly read the complex values contained in A. I only get a 100-by-100 matrix of integers.
MATLAB fread support ANSI C types, but there is no native ANSI C types that represent complex numbers. Most likely, a complex number is stored as a pair of real and imaginary numbers.
Without information as to how the binary file is saved, you can still perform some test to figure this out. If the complex number is represented as a real part and an imaginary part, and both in double precision, then a single complex number would take up 8 + 8 = 16 bytes. We can test this by navigating to the end of the file, and see how many bytes there are.
fID = fopen('matrix.bin','r')
fseek(fID, 0, 'eof') % Go to the end of file
ftell(fID) % Tell current position in the open file
fclose(fID)
If this number is equal to 16 * 100 * 100 = 160000, then you're in luck. There is no extra stuff saved in this file, and you can simply read the data by this code:
fID = fopen('matrix.bin','r')
data = []
for ii = 1:10000
data = [data; fread(fID, 2, 'double')']
end
fclose(fID)
You'll end up with a 10000*2 array, with each row representing a complex number. If the file size is 80000, then both real and imaginary part could be saved in single data type. If file size is some other number, then it probably means some additional information is stored in the binary. You'll have to know what additional information is stored so you can read the file correctly.
I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist two columns of data that correspond to the outputs of two sensors and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m" but I have no idea what data format to use (int8', 'int16', 'int32', 'int64','uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values the text file directly with f = fscanf(fid,'%f',Nlines*19).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use same call to get next lines. Its thus easy to write a loop to process the whole file, testing with feof(fid) if you are at the end of the file.
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again however that the fread followed by sscanf will be fastest. Note however that the fread method would die as soon as there is one line in the text file that doesn't exactly match your format. textscan is forgiving of whitespace changes on the other hand and thus more robust.
My knowledge of matlab is merely on a need to know basis, so this is probably an elementary question. Nevertheless here it comes:
I have got a file containing data (16-bit integers) stored in binary format. How do I read it into a vector /an array in matlab? How do I write this data to a file in matlab? Is there any smart tweak to increase the performance speed when reading/writing a huge amount of data (gigabytes)?
As Bill the Lizard wrote you can use fread to load the data into a vector. I just want to expand a little on his answer.
Reading Data
>> fid=fopen('data.bin','rb') % opens the file for reading
>> A = fread(fid, count, 'int16') % reads _count_ elements and stores them in A.
The commands fopen and fread default to Little-endian[1] encoding for the integers. If your file is Big-endian encoded you will need to change the fread to
>> A = fread(fid, count, 'int16', 'ieee-be');
Also, if you want to read the whole file set
>> count=inf;
and if you want to read the data into matrix with n columns use
>> count=[n inf];
Writing Data
As for witting the data to a file. The command, fwrite, in Bill's answer will write to a binary file. If you want to write the data to a text file you can use dlmwrite
>> dlmwrite('data.csv',A,',');
References
[1] http://en.wikipedia.org/wiki/Endianness
Update
The machine format (IE, ieee-be,
ieee-le, vaxd etc.) of the binary data can be specified in either the
fopen or the fread commands in Matlab. Details of the supported
machine format can be found in
Matlab's documentation of fopen.
Scott French's comment to Bill's
answer
suggests reading the data into an
int16 variable. To do this use
>> A = int16(fread(fid,count,precision,machineFormat));
where count is the size/shape of
the data to be read, precision is
the data format, and machineformat
is the encoding of each byte.
See commands fseek to move around the file. For example,
>> fseek(fid,0,'bof');
will rewind the file to the beginning where bof stands for beginning of file.
Assuming you know how many values you have stored in the file, you can do something like this to read the data into an array.
fid = fopen('data.bin','rb')
A = fread(fid, count, 'int16')
To write data to a file do this:
fid = fopen('data.bin','w')
count = fwrite(fid, A, 'int16')
The fwrite function returns the number of elements (not bytes) written to the file.
As far as performance tuning goes, you can read data in chunks to only use as much as you need to process. This is the same in any language, and there's no way to speed it up that's specific to Matlab.
I usually hate seeing links in a response, but this looks pretty close:
http://www.mathworks.com/support/tech-notes/1400/1403.html
As to the second part of performance tuning, it's been 6 years since I've used Matlab, so I don't know.
HTH