I think the solution will be quite simple for somebody with some MATLAB knowhow however I do not know how to do it.
I have a binary file that I am reading with fread and I am reading the first 4 bytes of this file followed by the next 2 bytes.
I basically want this process of reading 4 bytes followed by 2 bytes repeated till the end of the file is reached.
So the number of bytes read is 4,2,4,2,4,2......
I have the following to read the first pair of data and I want this to repeat.
fileID = fopen('MyBinaryFile');
4bytes = fread(fileID, 4);
fseek(fileID, 4, 0);
2bytes = fread(fileID, 2);
Thanks in advance for any help and suggestions
I take it this is a variant of your former question MATLAB reading a mixed data type binary file.
Your goal is to read a binary file containing mixed data type. In your case it contains 2 columns:
1x single value (4 bytes) and 1x int16 value (2 bytes).
There are several ways to read this type of file. They differ in speed because some ways minimize disk access but require more temporary memory, and other way use just the memory needed but require more disk access (= slower).
Ultimately, the 3 ways I'm going to show you produce exactly the same result.
The direct answer to this question is the version #3 below, but I encourage you to have a look at the 2 other options described here, they are both really worth understanding.
For the purpose of the example, I had to create a binary file as you described. This is done this way:
%% // write example file
A = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
B = int16(-5:5) ; %// a few "int16" data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'single');
fwrite(fileID,B(il),'int16');
end
fclose(fileID);
This create a 2 column binary file, the columns being:
11 values of type float going from -3 to 1.
11 values of type int16 going from -5 to +5.
For future reference:
>> disp(A)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(B)
-5 -4 -3 -2 -1 0 1 2 3 4 5
In each of the solution below, the first column will be read in a variable called varSingle, and the second column in a variable called varInt16.
1) Read all data in one go - convert to proper type after
%% // SOLUTION 1 (fastest) : Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
colSize = [4 2] ; %// number of byte for each column [4 byte single, 2 byte int16]
R = reshape( R , sum(colSize) , [] ) ; %// reshape data into a matrix (6 is because 4+2byte=6 byte per column)
temp = R(1:4,:) ; %// extract data for first column into temporary variable (OPTIONAL)
varSingle = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
temp = R(5:end,:) ; %// extract data for second column
varInt16 = typecast( temp(:) , 'int16' ) ; %// convert into "int16"
This is my favourite method. Specially for speed because it minimizes the read/seek operations on disk, and most post calculations are done in memory (much much faster than disk operations).
Note that the temporary variable I used was only for clarity/verbose, you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is it got even faster since 2014b.
2) Read column by column (using "skipvalue") - 2 pass approach
%% // SOLUTION 2 : Read column by column (using "skipvalue") - 2 pass approach
col1size = 4 ; %// size of data in column 1 (in [byte])
col2size = 2 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
varSingle = fread(fileID,'*single',col2size) ; %// read all "float" values, skipping all "int16"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
varInt16 = fread(fileID,'*int16',col1size) ; %// read all "int16" values, skipping all "float"
fclose(fileID);
That works too. It works fine ... but it is going to be slower than method 1 above, because you will have to scan the file twice. It may be a good option if the file is very large and method 1 above fail because of an out of memory error.
3) Read element by element
%% // SOLUTION 3 : Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
varSingle=[];varInt16=[];
while ~feof(fileID)
try
varSingle(end+1) = fread(fileID, 1, '*single' ) ;
varInt16(end+1) = fread(fileID, 1, '*int16' ) ;
catch
disp('reached End Of File')
end
end
fclose(fileID);
That does work too, and if you were writing C code it would be more than ok. But in Matlab this is not the recommended way to go (your choice ultimately)
As promised, the 3 methods above will give you exactly what we wrote in the file at the beginning:
>> disp(varSingle)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(varInt16)
-5 -4 -3 -2 -1 0 1 2 3 4 5
fileID = fopen('MyBinaryFile');
kk=1;
while ~feof(fileID)
bytes4(kk) = fread(fileID, 4);
fseek(fileID, 4, 0);
bytes2(kk) = fread(fileID, 2);
kk=kk+1;
end
the while loop condition is ~feof, which stands for End-Of-File. So as long as you haven't reached the end of your file it runs.
I added the kk just so you store everything and not just overwrite them each loop iteration.
If you want to get the data without loops, there are MATLABish ways to that:
%'Sizes'
T = 4; %'Time record size'
D = 2; %'Date record size'
R = T+D; %'Record size'
%'Open file'
f = fopen('MyBinaryFile', 'rb');
if f < 0
error('Could not open file.');
end;
%'Read the entire file at once, and close file'
buf = fread(f, Inf, '*uint8');
fclose(f);
%'Ignore the last unpadded bytes, and reshape by the size of 1 record'
buf = reshape(buf(1:R*fix(numel(buf)/R)), R, []);
%'Pinpoint the data'
time_bytes = buf( 1: T, :);
date_bytes = buf(T+1:T+D, :);
Related
Suppose I've a text file 'Data.txt' such as:-
S.No. Date Amplitude
The subsequent columns are separated by space rather than commas. Now, going through the tutorial provided in MATLAB webpage, I made a program such as:-
fileID = fopen('Data.txt','r');
formatSpec = '%d %d %f';
sizeA = [3 Inf];
A = fscanf(fileID,formatSpec,sizeA);
A=A';
This program works successfully in extracting data from the text file, after making some additional manipulations.
My question is- why it was suggested to keep
sizeA = [3 Inf];
rather than [Inf 3]
because this resulted in an initial generation of A matrix with a data row corresponding to S.No. 1 as column 1, next data row (corresponding to S.No. 2) as column 2, and so on.
At last, A was transposed to get a similar view of data, as given in the text file.
This transpose was necessary, I get it, but why an initial size was taken as such?
I took
sizeA = [Inf 3];
and it resulted in an error.
So basically, I wanted to know the reason of such a size selection, and why not the other way around.
I have a matrix as shown in the image. In this matrix if any of the values in one row is found in another row we remove the shorter row. For example row 2 to row 5 all contain 3, therefore I want to keep only row 5(the row with most non-zero values) and remove all other rows...please suggest a solution.
Thanks
I believe the below code should work. The idea is to sort the matrix first according to the number of elements in the rows, then loop and remove the rows that have matches. Probably not the most efficient code but should work in principle.. see the comments for more explanation
% generating the data
M = zeros(6, 10);
M(2,1:3) = [3 8 10];
M(3,1:4) = [3 8 10 9];
M(4,1:5) = [3 8 10 9 7];
M(5,1:6) = [3 8 10 9 7 4];
M(6,1) = [5];
% sorting according to the number of non-zero elements
nr_of_nonzero = sum(M~=0, 2);
[~, sort_indices] = sort(nr_of_nonzero);
M_sorted = M(sort_indices,:);
M_sorted(M_sorted==0)=NaN; % should not compare 0s (?)
% get rid of the matches
for i=1:size(M_sorted, 1)-1
for j=(i+1):size(M_sorted, 1)
[C,ia,ib] = intersect(M_sorted(i,:),M_sorted(j,:));
if numel(C)>0
M_sorted(i,:) = NaN;
end
break;
end
end
% reorder
M(sort_indices,:) = M_sorted;
% remove all NaN rows
M(all(isnan(M),2),:) = [];
% back to 0s
M(isnan(M)) = 0;
I'm not doing all the code here, but here's the steps that I would take to solve it. You will likely have to try different ways of doing it to obtain the intended result (i.e. vector operations, while loop, for loop, etc.).
Problem
Rows are repetitive and need to be reduced in a more compact form.
Solution
Look up mat2str.
Convert your vectors (rows) to strings. This can be done with temporary values like tmpstr1 = mat2str(yourMatrix(rowToBeCompared, :));
Parse the first string from beginning to end, while parsing the second string in the same way to make comparisons.
use strcmp to see if the string characters (or strings themselves) are the same: http://www.mathworks.com/help/matlab/ref/strcmp.html
Delete a row if you find it appropriate with yourMatrix[rowToDelete, : ] = [];
Try that and see if it works.
Note - Expansion of step 3:
if we have variable a = '[ab+11]';, we can select individual characters from the string like:
a(4)
ans = '+'
a(5)
ans = '1'
a(1)
and = '['
Therefore, you can parse the string with a loop:
for n = 1 : length(a)
if a(n) == '1' || a(n) == '0'
str(n) = a(n);
end
end
Like Sardar Usama said, it's helpful to provide the code so that we can copy and paste into our own MATLAB workspaces.
I have a huge text file in the following format:
1 2
1 3
1 10
1 11
1 20
1 376
1 665255
2 4
2 126
2 134
2 242
2 247
First column is the x coordinate while second column is the y coordinate.
It indicates that if I had to construct a Matrix
M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;
This text file is huge and can't be brought to main memory at once. I must read it line by line. And save it in a sparse matrix.
So I need the following function:
function mat = generate( path )
fid = fopen(path);
tline = fgetl(fid);
% initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
mat = sparse(1);
while ischar(tline)
tline = fgetl(fid);
if ischar(tline)
C = strsplit(tline);
end
mat(C{1}, C{2}) = 1;
end
fclose(fid);
end
But unfortunately besides the first row it just puts trash in my sparse mat.
Demo:
1 7
1 9
2 4
2 9
If I print the sparse mat I get:
(1,1) 1
(50,52) 1
(49,57) 1
(50,57) 1
Any suggestions ?
Fixing what you have...
Your problem is that C is a cell array of characters, not numbers. You need to convert the strings you read from the file into integer values. Instead of strsplit you can use functions like str2num and str2double. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:
C = str2num(tline);
Then you just index C like an array instead of a cell array:
mat(C(1), C(2)) = 1;
Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
A simpler alternative...
You don't even have to bother with all that mess above, since you can replace your entire function with a much simpler approach using dlmread and sparse like so:
data = dlmread(filePath);
mat = sparse(data(:, 1), data(:, 2), 1);
clear data; % Save yourself some memory if you don't need it any more
I am having trouble combining repeated elements of my Matlab "data" variable. I can easily combine the values using unique and sort.
[sorted,idx] = sort(data);
[~,ij] = unique(sorted,'first');
Indx = (sort(idx(ij)));
However, by doing this I am combining ALL repeated values. What I really want to do is combine only groups of repeating elements. For example take this:
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;]
Combine duplicate groups of elements:
data = [1;2;3;4;3;2;1;4;]
I need to combine the groups of repeating elements wile still preserving the order. It would also be helpful to return the index because I need to average data in another variable based on the index of combination.
For example:
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;]
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0;]
dataCombined = [1; 2; 3; 4; 3; 2; 1; 4; ]
data2average = [4.33; 4; 6.33 4.2 5; 5.25; 5.25; 3.5; ]
Can anyone give suggestions?
SOLUTION:
Thank you all for your answers. MZimmerman6's solution worked well for me. I wanted to show what I did in order to average the values in "data2" array.
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;];
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0;];
change = diff(data)~=0;
indices = [1,find(change)'+1];
compressed = data(indices)';
numberOfRepeatingGroups = size(indices);
for i=1:numberOfRepeatingGroups(1,2)
if(i == 1)
dataToAverage = data2(indices(1,1):(indices(1,2)-1));
elseif (i == numberOfRepeatingGroups(1,2))
dataToAverage = data2(indices(1,i):end);
else
dataToAverage = data2(indices(1,i):(indices(1,(i+1))-1));
end
data2Averaged(1,i) = mean(dataToAverage(:));
end
data2Averaged =
4.3333 4.0000 6.3333 4.2000 5.0000 5.2500 5.2500 3.5000
You can use a derivative to find fluctuations in your data arrays, which would indicate a change in grouping. Anywhere where the derivative is not 0, there is a change, either positive or negative. Find where these changes occur, and then grab the corresponding indices. Something like below.
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;];
change = diff(data)~=0;
indices = [1,find(change)'+1];
compressed = data(indices)';
and the result will be
compressed =
1 2 3 4 3 2 1 4
And of course you can use the indices variable for whatever you need as well.
Note
On the third line, we add index 1 because technically the start of the array is a change, and then we add 1 to the find command because we are using find on the derivative, so the returned change array will be 1 shorter than the original.
I will never stop recommending this run-length encoding/deconding utility from the File Exchange: rude().
% Run-length encode preserving order
[len,val] = rude(data);
len =
3 3 3 5 2 4 4 4
val =
1 2 3 4 3 2 1 4
Now, to calculate the mean, first re-label each subsequence with rude(), then use accumarray()
% Decode and re-label each subsequence
subs = rude(len,1:numel(len))';
% Take average on each re-labelled subsequence
accumarray(subs,data2,[],#mean)
ans =
4.3333
4.0000
6.3333
4.2000
5.0000
5.2500
5.2500
3.5000
I have been using the load function until now to load my space separated files into arrays in matlab. However this wastes lots of space for me since my values are either 0 or 1. Thus instead of writing files like
0 1 1 0
1 1 1 0
I removed the spaces to create half as big files like:
0110
1110
However now load doesn't work correctly any more (creates a matrix I think with only the first number, so 2x1 instead of 2x4).
I looked around with importdata, reading the file line by line and lots of other stuff but I couldn't find a clear solution.
So essentially I want to read a matrix from a file which doesn't have a delimiter. Every number is an element of the array
Does anyone know of a clean way to do this?
Thanks
Here is one way:
data.txt
0110
1110
MATLAB
%# read file lines as cell array of strings
fid = fopen('data.txt','rt');
C = textscan(fid, '%s', 'Delimiter','');
fclose(fid);
%# extract digits
C = cell2mat(cellfun(#(s)s-'0', C{1}, 'Uniform',false));
result:
C =
0 1 1 0
1 1 1 0
If you are really concerned about memory, maybe you cast as boolean: C = logical(C) as 0/1 are the possible values.
Adapted from the example here:
function ret = readbin()
fid = fopen('data.txt');
ret = [];
tline = fgetl(fid);
while ischar(tline)
if isempty(ret)
ret = tline;
else
ret = [ret; tline];
end
tline = fgetl(fid);
end
% Turn char '0' into numerical 0
ret = ret - 48;
fclose(fid);
end
Subtracting 48 (ASCII code for '0') you get a numeric matrix with 1's and 0's in the appropriate places. This is the output I get at the end of the function:
K>> ret
ret =
0 1 1 0
1 1 1 0
K>> class(ret)
ans =
double
Inspired from this question: Convert decimal number to vector, here is a proposition, but I don't think it will work for large "numbers":
arrayfun(#str2double, str);
An alternate form of Amro's solution would be either
fid = fopen('data.txt','r');
C = textscan(fid, '%1d%1d%1d%1d');
fclose(fid);
M = cell2mat(C)
Or
fid = fopen('data.txt','r');
M_temp = textscan(fid, '%1d');
fclose(fid);
M = reshape(M_temp{1}, 2,4)
These are slightly different from Amro's solution as the first reads in a %1d format number (1 character integer) and returns a cell array C that can be converted using a much simpler cell2mat command. The second bit of code reads in only a vector of those values then reshapes the result---which works okay if you already know the size of the data but may need additional work if not.
If your actual files are very large you may find that one of these ways is faster, although it is hard to tell without actually running them side-by-side.