The question may be naive, but answers could help me.
A measurement is recorded in binary format, with a header that contains all information about the data and the data itself (i.e. a series of doubles).
The measurement data can be exported in csv format from the application, but it takes ages.
What do you have to pay attention to when trying to read data from a binary file? Is this process even feasible using Matlab to import as an array or labview (export as .txt maybe?)
Binary .rec file format may refer to various things (audio/video encoding format of Topfield based on MPEG4-TS, proprietary audio encoding, and even MRI scanner from Phillips) ...
If it refers to MRI scanner you may find some direct reader on fileexchange: Matlab PAR REC Reader
If it refer to something else, you may parse binary file header and data by yourself using the low level routine: fread
Edit
Not knowing the exact file format for your recorded sensor displacement, here is dummy example with fread for reading large rec file block-by-block supposing header contains just the length of data, and that data is just a serie of double values:
function [] = DummyReadRec()
%[
% Open rec file for reading
[fid, errmsg] = fopen('dummy.rec', 'r');
if (fid < 0), error(errmsg); end
cuo = onCleanup(#()fclose(fid));
% Read header (here supposing it is only an integer giving length of data)
reclenght = fread(fid, 1, 'uint32');
% Read data block-by-block (here supposing it is only double values)
MAX_BLOCK_LENGTH = 512;
blockCount = ceil(reclenght / MAX_BLOCK_LENGTH);
for bi = 1:blockCount,
% Will read a maximum of 'MAX_BLOCK_LENGTH' (or less if we're on the last block)
[recdata, siz] = fread(fid, [1 MAX_BLOCK_LENGTH], 'double');
% Do something with this block (fft or whatever)
offset = (bi-1)*MAX_BLOCK_LENGTH;
position = (offset+1):(offset+siz);
plot(position, 20*log10(abs(fft(recdata))));
drawnow();
end
%]
end
The answer is going to depend on the format of your binary file and how large it is.
I have done many conversion of various binary files all with differing layouts. If the file will fit into memory then you can just use fread as long as you know the layout of the binary file. Below is an example of reading a header & simple data block. It would of course have to be modified depending on the layout of your file. Depending on recording equipment & computer type you may also need to make use of the machinefmt ('ieee-le' or 'ieee-be') options of fread ... that has burned me before.
%Open the File for reading
fid = fopen(yourRECfile,'r');
%Read the Header ... your layout will be different
header.MajorRel = fread(fid,1,'uint16'); %Major File Rev #
header.MinorRel = fread(fid,1,'uint16'); %Minor File Rev #
header.IRIGStart = fread(fid,1,'double'); %Start time in secs
header.Flags = fread(fid,1,'uint32'); %Flags
%Read everything else from there until end of file as a series of doubles.
data = fread(fid,inf,'double');
fclose(fid);
If the file does not fit into memory you will either need to process it in blocks or look into using memmapfile.
Related
I have a binary square matrix with complex values, stored in a .bin format file. I have tried to read this 100-by-100 matrix with a Matlab script:
i=fopen('matrix.bin','r')
A=fread(i,[100 100]
This code does not correctly read the complex values contained in A. I only get a 100-by-100 matrix of integers.
MATLAB fread support ANSI C types, but there is no native ANSI C types that represent complex numbers. Most likely, a complex number is stored as a pair of real and imaginary numbers.
Without information as to how the binary file is saved, you can still perform some test to figure this out. If the complex number is represented as a real part and an imaginary part, and both in double precision, then a single complex number would take up 8 + 8 = 16 bytes. We can test this by navigating to the end of the file, and see how many bytes there are.
fID = fopen('matrix.bin','r')
fseek(fID, 0, 'eof') % Go to the end of file
ftell(fID) % Tell current position in the open file
fclose(fID)
If this number is equal to 16 * 100 * 100 = 160000, then you're in luck. There is no extra stuff saved in this file, and you can simply read the data by this code:
fID = fopen('matrix.bin','r')
data = []
for ii = 1:10000
data = [data; fread(fID, 2, 'double')']
end
fclose(fID)
You'll end up with a 10000*2 array, with each row representing a complex number. If the file size is 80000, then both real and imaginary part could be saved in single data type. If file size is some other number, then it probably means some additional information is stored in the binary. You'll have to know what additional information is stored so you can read the file correctly.
I have a very large matrix (M X N). I want to divide matrix into 10 equal parts (almost) and save each of them into a separate file say A1.txt, A2.txt, etc. or .mat format. How can I do this ?
Below is a code to divide a matrix into 10 equal parts and data_size is (M / 10).
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
save data(i).mat data
% What should I write here in order to save data into separate file data1.mat, data2.mat etc.
end
You said you wanted it in either txt format or mat format. I'll provide both solutions, and some of this is attributed to Daniel in his comment in your post above.
Saving as a text file
You can use fopen to open a file up for writing. This returns an ID to the file that you want to write to. After this, use fprintf and specify the ID to the file that you want to write to, and the data you want to write to this file. As such, with sprintf, generate the text file name you want, then use fprintf to write data to your file. It should be noted that writing matrices to fprintf in MATLAB assume column major format. If you don't want your data written this way and want it done in row-major, you need to transpose your data before you write this to file. I'll provide both methods in the code depending on what you want.
After you're done, use fclose to close the file noting that you have finished writing to it. Therefore, you would do this:
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
filename = sprintf('A%d.txt', i); %// Generate file name
fid = fopen(filename, 'w'); % // Open file for writing
fwrite(fid, '%f ', data.'); %// Write to file - Transpose for row major!
%// fwrite(fid, '%f ', data); %// Write to file - Column major!
fclose(fid); %// Close file
end
Take note that I space separated the numbers so you can open up the file and see how these values are written accordingly. I've also used the default precision and formatting by just using %f. You can play around with this by looking at the fprintf documentation and customizing the precision and leading zero formatting to your desire.
Saving to a MAT file
This is actually a more simpler approach. You would still use sprintf to save your data, then use the save command to save your workspace variables to file. Therefore, your loop would be this:
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
filename = sprintf('A%d.mat', i); %// Generate file name
save(filename, 'data');
end
Take note that the variable you want to save must be a string. This is why you have to put single quotes around the data variable as this is the variable you are writing to file.
You can use
save(['data' num2str(i) '.mat'], 'data');
where [ ] is used to concatenate strings and num2str to convert an integer to a string.
I have very large data files (typically 30Gb to 60Gb) in .txt format. I want to find a way to to automatically decimate the files without importing them to memory first.
My .txt files consist of two columns of data, this is an example file:
https://www.dropbox.com/s/87s7qug8aaipj31/RTL5_57.txt
What I have done so far is to import the data to a variable "C" then down sample the data. The problem with this method is that the variable "C" often fills the memory capacity of MATLAB before the program has change to decimate:
function [] = textscan_EPS(N,D,fileEPS )
%fileEPS: .txt address
%N: number of lines to read
%D: Decimation factor
fid = fopen(fileEPS);
format = '%f\t%f';
C = textscan(fid, format, N, 'CollectOutput', true);% this variable exceeds memory capacity
d = downsample(C{1},D);
plot(d);
fclose(fid);
end
How can I modify this line:
C = textscan(fid, format, N, 'CollectOutput', true);
So that it effectively decimates the data at this instance by importing every other line of or every 3rd line ect.. of the .txt file from disk to variable "C" in memory.
Any help would be much appreciated.
Cheers,
Jim
PS
An alternative method that I have been playing around with uses "fread" but it encouters the same problem:
function [d] = fread_EPS(N,D,fileEPS)
%N: number of lines to read
%D: decimation factor
%fileEPS: location of .txt fiel
%read in the data as characters
fid = fopen(fileEPS);
c = fread(fid,N*19,'*char');% EWach line of .txt has 19 characters
%Parse and read the data into floading point numbers
f=sscanf(c,'%f');
%Reshape the data into a two column format
format long
d=decimate((flipud(rot90(reshape(f,2,[])))),D); %reshape for 2 colum format, rotate 90, flip veritically,decimation factor
I believe that textscan is the way to go, however you may need to take an intermediate step. Here is what I would do assuming you can easily read N lines at a time:
Read in N lines with textscan(fileID,formatSpec,N)
Sample from these lines, store the result (file or variable) and drop the rest
As long as there are lines left continue with step 1
Optional, depending on your storage method: combine all stored results into one big sample
It should be possible to just read 1 line each time, and decide whether you want to keep/discard it. Though this should consume minimal memory I would try to do a few thousand each time to get reasonable performance.
I ended up writing the code below based on Dennis Jaheruddin's advice. It appears to work well for large .txt files (10GB to 50Gb). The code is also inspired by another post:
Memory map file in MATLAB?
Nlines = 1e3; % set numbe of lines to sample per cycle
sample_rate = (1/1.5e6); %data sample rate
DECE= 1;% decimation factor
start = 40; %start of plot time
finish = 50; % end plot time
TIME = (0:sample_rate:sample_rate*((Nlines)-1));
format = '%f\t%f';
fid = fopen('C:\Users\James Archer\Desktop/RTL5_57.txt');
while(~feof(fid))
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point you need the memory!
clearvars C ;
TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along
if ((TIME(end)) > start) && ((TIME(end)) < finish);
plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
end
hold on;
clearvars d;
end
fclose(fid);
older versions of MATLAB do not process this code well, the following message appears:
Caught std::exception Exception message is: bad allocation
But MATLAB 2013 works just fine
I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist two columns of data that correspond to the outputs of two sensors and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m" but I have no idea what data format to use (int8', 'int16', 'int32', 'int64','uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values the text file directly with f = fscanf(fid,'%f',Nlines*19).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use same call to get next lines. Its thus easy to write a loop to process the whole file, testing with feof(fid) if you are at the end of the file.
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again however that the fread followed by sscanf will be fastest. Note however that the fread method would die as soon as there is one line in the text file that doesn't exactly match your format. textscan is forgiving of whitespace changes on the other hand and thus more robust.
My knowledge of matlab is merely on a need to know basis, so this is probably an elementary question. Nevertheless here it comes:
I have got a file containing data (16-bit integers) stored in binary format. How do I read it into a vector /an array in matlab? How do I write this data to a file in matlab? Is there any smart tweak to increase the performance speed when reading/writing a huge amount of data (gigabytes)?
As Bill the Lizard wrote you can use fread to load the data into a vector. I just want to expand a little on his answer.
Reading Data
>> fid=fopen('data.bin','rb') % opens the file for reading
>> A = fread(fid, count, 'int16') % reads _count_ elements and stores them in A.
The commands fopen and fread default to Little-endian[1] encoding for the integers. If your file is Big-endian encoded you will need to change the fread to
>> A = fread(fid, count, 'int16', 'ieee-be');
Also, if you want to read the whole file set
>> count=inf;
and if you want to read the data into matrix with n columns use
>> count=[n inf];
Writing Data
As for witting the data to a file. The command, fwrite, in Bill's answer will write to a binary file. If you want to write the data to a text file you can use dlmwrite
>> dlmwrite('data.csv',A,',');
References
[1] http://en.wikipedia.org/wiki/Endianness
Update
The machine format (IE, ieee-be,
ieee-le, vaxd etc.) of the binary data can be specified in either the
fopen or the fread commands in Matlab. Details of the supported
machine format can be found in
Matlab's documentation of fopen.
Scott French's comment to Bill's
answer
suggests reading the data into an
int16 variable. To do this use
>> A = int16(fread(fid,count,precision,machineFormat));
where count is the size/shape of
the data to be read, precision is
the data format, and machineformat
is the encoding of each byte.
See commands fseek to move around the file. For example,
>> fseek(fid,0,'bof');
will rewind the file to the beginning where bof stands for beginning of file.
Assuming you know how many values you have stored in the file, you can do something like this to read the data into an array.
fid = fopen('data.bin','rb')
A = fread(fid, count, 'int16')
To write data to a file do this:
fid = fopen('data.bin','w')
count = fwrite(fid, A, 'int16')
The fwrite function returns the number of elements (not bytes) written to the file.
As far as performance tuning goes, you can read data in chunks to only use as much as you need to process. This is the same in any language, and there's no way to speed it up that's specific to Matlab.
I usually hate seeing links in a response, but this looks pretty close:
http://www.mathworks.com/support/tech-notes/1400/1403.html
As to the second part of performance tuning, it's been 6 years since I've used Matlab, so I don't know.
HTH