Loading huge binary file partially into Matlab - matlab

I have a huge binary file with double precision numbers and I would like to load parts of it into Matlab. Is there a way to do this?
One way would be if I could convert it to a .mat file (without loading it in Matlab first), but I haven't been able to figure out how (or if it's actually possible).
Any ideas?
PS: I was thinking of using C++ to do the conversion, but that turns out to be problematic because I'm using a Linux build of C++ (through Cygwin) and a Windows version of Matlab.

If you know what parts of the file you want to load, you can use fseek followed by fread (both preceded by fopen, of course).
For example, jump a few thousand bytes into a file and read a certain number of bytes, as doubles:
fid = fopen('binary.dat','r');
fseek(fid, 3000, 'bof');     % jump 3000 bytes from the beginning of the file
A = fread(fid, N, 'double'); % read N doubles starting at that offset
fclose(fid);                 % don't forget to close the file
See the section of documentation called Reading Portions of a File for more information.
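If you think of the offset in elements rather than bytes, multiply the element index by 8, since each double occupies 8 bytes. A minimal sketch (the file name and the counts k and N are placeholders):
k = 1e6;                     % number of doubles to skip
N = 1000;                    % number of doubles to read
fid = fopen('binary.dat', 'r');
fseek(fid, k*8, 'bof');      % each double is 8 bytes
A = fread(fid, N, 'double'); % A now holds doubles k+1 .. k+N
fclose(fid);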

Related

matlab cannot read text file containing ^* as power of 10

I need to read text files into Matlab. In the text files there are numbers like 5.875489^*-6, which is indeed 0.000005875489. Matlab cannot read this format, and since there are too many files, I cannot change the format in all the files manually. So, I wonder if there are any tips to make Matlab read the files as they are?
Any help and guide is highly appreciated.
Marilla.
As pointed out by @vu1p3n0x, it would probably be easier to replace ^* by e using a replace-all. Alternatively, if that is impractical, you could read in the mantissa and exponent separately and perform the exponentiation in Matlab:
Raw = textscan(fid, '%f^*%f');   % mantissa, literal '^*', exponent
Result = Raw{1}.*10.^Raw{2};     % recombine: mantissa * 10^exponent
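Alternatively, the replace-all idea can be done entirely in Matlab. A hedged sketch (the file name is a placeholder, and it assumes the files contain only whitespace-separated numbers):
txt = fileread('myfile.txt');    % read the whole file as one char array
txt = strrep(txt, '^*', 'e');    % 5.875489^*-6 becomes 5.875489e-6
values = sscanf(txt, '%f');      % parse everything into a numeric column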

How to delete first block of bytes of a file in matlab

I want to delete the first block of bytes in a file in Matlab (e.g. delete the first 50 bytes of a text file).
Is that possible in Matlab? If so, how can I achieve it?
Do you want to do this with or without loading the file into memory? If you can do it in memory, one possible way is to open the file, skip the first few bytes with fseek, read the rest of the data with fread, and save it back into a new file using fwrite.
In Linux / Mac OS, there are efficient ways to do this without having to load the file in memory. For example, see here: https://unix.stackexchange.com/questions/6852/best-way-to-remove-bytes-from-the-start-of-a-file
However, if you're in Windows, you can't escape doing a byte copy which ultimately means doing this in memory. From what I have seen with Windows, the only way is to do a byte copy where the input pointer starts at however many bytes you want to skip over.
See for example here: What is the most efficient way to remove first N bytes from a file on Windows?, and also here: http://blogs.msdn.com/b/oldnewthing/archive/2010/12/01/10097859.aspx
As those posts explain, you have no choice but to do a byte copy. Therefore, if you want to do the same in MATLAB, you'll have to follow the approach I described above.
Since you're working in MATLAB, here is some example code to do what I have outlined above:
fid = fopen('data', 'r'); %// Open up data file
fid2 = fopen('dataout', 'w'); %// File to save - new file with skipped bytes
skip = 50; %// Determine how many bytes you want to skip over
fseek(fid, skip, 'bof'); %// Skip over bytes - 'bof' means from beginning of file
A = fread(fid); %// Read the rest of the data
fwrite(fid2, A); %// Write data to new file
%// Close the files
fclose(fid);
fclose(fid2);

Memory map file in MATLAB?

I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist of two columns of data that correspond to the outputs of two sensors, and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory-map my data to a variable m, but I have no idea what data format to use ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', or 'double').
In fact, I tried all of the data formats that MATLAB supports, but when I use m.data(index number) I never get a pair of numbers (2 columns of data), which is what I expected; the numbers also differ depending on the format I use.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:3*19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
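To generalize the indexing above, here is a small sketch that pulls out an arbitrary block of lines in one go (it assumes the fixed 19-byte line length holds throughout the file; lineFrom and lineTo are placeholder names):
lineFrom = 101; lineTo = 200;              % 1-based range of lines to extract
idx = (lineFrom-1)*19 + 1 : lineTo*19;     % byte positions of those lines
d = reshape(sscanf(char(m.Data(idx)).', '%f'), 2, []).';  % two-column block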
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values from the text file directly with f = fscanf(fid,'%f',Nlines*2).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply issue the same call again to get the next lines. It's thus easy to write a loop to process the whole file, testing with feof(fid) whether you are at the end of the file.
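A minimal sketch of such a loop, under the same assumption of fixed 19-character lines (Nlines is a placeholder chunk size):
Nlines = 10000;                            % pairs of values per chunk
fid = fopen('RTL5_57.txt');
while ~feof(fid)
    c = fread(fid, Nlines*19, '*char');    % raw characters for up to Nlines lines
    d = reshape(sscanf(c, '%f'), 2, []).'; % parse and reshape into two columns
    % process d here
end
fclose(fid);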
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again, however, that fread followed by sscanf will be fastest. Also note that the fread method will fail as soon as one line in the text file doesn't exactly match the assumed layout, whereas textscan is forgiving of whitespace changes and thus more robust.

importing C written binary files into Matlab

I've read the post "Read and write from/to a binary file in Matlab" but I still have doubts. I have a binary file of long double values created with fwrite in C and in Matlab I'm using
fid = fopen('vz3.dat', 'r')
mydata = fread(fid, 'double')
where vz3.dat is my file. But I'm getting garbage values in Matlab. According to
[cinfo, maxsize, ordering] = computer
in Matlab, my computer is a little-endian system (byte ordering system). Any suggestions?
By the way, does a binary file necessarily have to end in .bin? I'm using the .dat extension. Is it OK to do so?
Thanks a lot
To open the file as little-endian, use
fid = fopen('vz3.dat','r','l');
It doesn't matter what the file is called, by the way.
In case you have to use a file handle opened elsewhere, you can also use the optional machine-format argument of fread.
The documentation is available on the MathWorks site.
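For example, a minimal sketch of that second option, assuming the file really contains plain 8-byte doubles (note that fread's skip argument comes before the machine format):
fid = fopen('vz3.dat', 'r');                       % opened with the default (native) ordering
mydata = fread(fid, Inf, 'double', 0, 'ieee-le');  % skip = 0, read as little-endian doubles
fclose(fid);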

Read and write from/to a binary file in Matlab

My knowledge of Matlab is merely on a need-to-know basis, so this is probably an elementary question. Nevertheless, here it comes:
I have got a file containing data (16-bit integers) stored in binary format. How do I read it into a vector/array in Matlab? How do I write this data to a file in Matlab? Is there any smart tweak to increase performance when reading/writing a huge amount of data (gigabytes)?
As Bill the Lizard wrote you can use fread to load the data into a vector. I just want to expand a little on his answer.
Reading Data
>> fid=fopen('data.bin','rb') % opens the file for reading
>> A = fread(fid, count, 'int16') % reads _count_ elements and stores them in A.
By default, fopen and fread use the byte ordering[1] native to your machine (little-endian on most modern systems). If your file is big-endian encoded you will need to change the fread to
>> A = fread(fid, count, 'int16', 'ieee-be');
Also, if you want to read the whole file set
>> count=inf;
and if you want to read the data into a matrix with n rows use
>> count=[n inf];
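For instance, a short sketch that reads a whole int16 file into a two-row matrix and then transposes it so each channel ends up in its own column (the file name, and n = 2, are assumptions for illustration):
n = 2;                             % assumed number of values per record
fid = fopen('data.bin', 'rb');
A = fread(fid, [n inf], 'int16');  % n rows, as many columns as the file provides
fclose(fid);
A = A.';                           % one column per channel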
Writing Data
As for writing the data to a file: the fwrite command in Bill's answer will write to a binary file. If you want to write the data to a text file you can use dlmwrite:
>> dlmwrite('data.csv',A,',');
References
[1] http://en.wikipedia.org/wiki/Endianness
Update
The machine format (i.e., ieee-be, ieee-le, vaxd etc.) of the binary data can be specified in either the fopen or the fread command in Matlab. Details of the supported machine formats can be found in Matlab's documentation of fopen.
Scott French's comment to Bill's answer suggests reading the data into an int16 variable. To do this use
>> A = int16(fread(fid,count,precision,machineFormat));
where count is the size/shape of the data to be read, precision is the data format, and machineFormat is the byte ordering of the data.
See the fseek command to move around the file. For example,
>> fseek(fid,0,'bof');
will rewind the file to the beginning where bof stands for beginning of file.
Assuming you know how many values you have stored in the file, you can do something like this to read the data into an array.
fid = fopen('data.bin','rb')
A = fread(fid, count, 'int16')
To write data to a file do this:
fid = fopen('data.bin','w')
count = fwrite(fid, A, 'int16')
The fwrite function returns the number of elements (not bytes) written to the file.
As far as performance tuning goes, you can read data in chunks to only use as much as you need to process. This is the same in any language, and there's no way to speed it up that's specific to Matlab.
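As an illustration, a minimal sketch of chunked reading (the chunk size and file name are placeholders):
chunk = 1e6;                        % number of int16 values per chunk
fid = fopen('data.bin', 'rb');
while ~feof(fid)
    A = fread(fid, chunk, 'int16'); % read up to 'chunk' values
    % process A here
end
fclose(fid);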
I usually hate seeing links in a response, but this looks pretty close:
http://www.mathworks.com/support/tech-notes/1400/1403.html
As to the second part of performance tuning, it's been 6 years since I've used Matlab, so I don't know.
HTH