fread and fwrite in matlab not exact - matlab

I would like to save float numbers in a binary file and read them afterwards for further processing. Unfortunately, fwrite and afterwards fread does change the number.
Following simple example:
% Number to store
A = 0.123456789101112
% Generate and open txt file
fid = fopen('test_fread.txt','w','b');
% write A into test_fread.txt
fwrite(fid,A,'float32');
% close file
fclose(fid)
% open txt file
fid = fopen('test_fread.txt','r','b');
% read the file
fread(fid,'float32')
ans = 0.123456791043282
The answer is different than the input. How can I fix this? What should I search for? Is it rounding, precision or another problem?

Floating point numbers are never exact. single precision floats (float32) only have 6-9 decimals of precision. For the pure decimal case this translates to the issue that you're seeing. The effect is more exaggerated if you also have an integer component:
% Sample number
A = 123456789.123456789;
% Write, rewind, and read back in
fID = fopen('test_fread.txt', 'w+', 'b');
fwrite(fID, A, 'float32');
frewind(fID);
B = fread(fID,'float32');
fclose(fID);
fprintf('A: %15.15f\nB: %15.15f\n', A, B);
Which returns:
A: 123456789.123456790000000
B: 123456792.000000000000000
Note that MATLAB casts B as a double here.
MATLAB's default data type, double (float64) has double the bits available, which will give you 15-17 significant decimal digits. Using the previous example we can try:
% Sample number
A = 123456789.123456789;
% Write, rewind, and read back in
fID = fopen('test_fread.txt', 'w+', 'b');
fwrite(fID, A, 'float64');
frewind(fID);
B = fread(fID,'float64');
fclose(fID);
fprintf('A: %15.15f\nB: %15.15f\n', A, B);
Which returns:
A: 123456789.123456790000000
B: 123456789.123456790000000
Yay.
If you need better precision in MATLAB than double you will need to look into using vpa, which is part of the Symbolic Math Toolbox, or a similar package that offers higher/variable precision.

Related

Matlab fwrite precision conversion

I want to write some data to a binary file in single precision. The data is originally in double precision. Is there any difference between converting the data to single by calling the single command before fwrite and just letting Matlab do the conversion in the fwrite call?
Case 1
data1 % double precision
fwrite(fid,data1,'single');
Case 2
data2=single(data1);
fwrite(fid,data2,'single');
In the 2nd case, is Matlab doing any modifications to data2 before writing it since it is already in single format ? Will there be any difference in data written to the two files ?
Let's try this:
data1 = 1.555555555555555555;
data2 = single(data1);
fid = fopen('C:\Some\Address\data1.bin', 'w');
fwrite(fid, data1, 'single');
fclose(fid);
fid = fopen('C:\Some\Address\data2.bin', 'w');
fwrite(fid, data2, 'single');
fclose(fid);
% Lets read them back (note that MATLAB stores them in a double-precision variable by default)
fid = fopen('C:\Some\Address\data1.bin', 'r');
data1 = fread(fid, 'single');
fclose(fid);
fid = fopen('C:\Some\Address\data2.bin', 'r');
data2 = fread(fid, 'single');
fclose(fid);
format long;
[data1 data2] % or use fprintf to see the values
ans =
1.555555582046509 1.555555582046509
To your questions:
In the 2nd case, is Matlab doing any modifications to data2 before
writing it since it is already in single format ?
I don't think so but cannot be confident without knowing what is going on under the hood of fwrite.
Will there be any difference in data written to the two files ?
According to the test above I don't believe so,
You should use formatSpec to specify the format:
Like this
A = [6.6,1.11111];
formatSpec = '%4.5f'; % modify it accordingly.
fprintf(formatSpec ,A);

Optimizing reading the data in Matlab

I have a large data file with a text formatted as a single column with n rows. Each row is either a real number or a string with a value of: No Data. I have imported this text as a nx1 cell named Data. Now I want to filter out the data and to create a nx1 array out of it with NaN values instead of No data. I have managed to do it using a simple cycle (see below), the problem is that it is quite slow.
z = zeros(n,1);
for i = 1:n
if Data{i}(1)~='N'
z(i) = str2double(Data{i});
else
z(i) = NaN;
end
end
Is there a way to optimize it?
Actually, the whole parsing can be performed with a one-liner using a properly parametrized readtable function call (no iterations, no sanitization, no conversion, etc...):
data = readtable('data.txt','Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No data');
Here is the content of the text file I used as a template for my test:
9.343410
11.54300
6.733000
-135.210
No data
34.23000
0.550001
No data
1.535000
-0.00012
7.244000
9.999999
34.00000
No data
And here is the output (which can be retrieved in the form of a vector of doubles using data.Var1):
ans =
9.34341
11.543
6.733
-135.21
NaN
34.23
0.550001
NaN
1.535
-0.00012
7.244
9.999999
34
NaN
Delimiter: specified as a line break since you are working with a single column... this prevents No data to produce two columns because of the whitespace.
Format: you want numerical values.
TreatAsEmpty: this tells the function to treat a specific string as empty, and empty doubles are set to NaN by default.
If you run this you can find out which approach is faster. It creates an 11MB text file and reads it with the various approaches.
filename = 'data.txt';
%% generate data
fid = fopen(filename,'wt');
N = 1E6;
for ct = 1:N
val = rand(1);
if val<0.01
fwrite(fid,sprintf('%s\n','No Data'));
else
fwrite(fid,sprintf('%f\n',val*1000));
end
end
fclose(fid)
%% Tommaso Belluzzo
tic
data = readtable(filename,'Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No Data');
toc
%% Camilo Rada
tic
[txtMat, nLines]=txt2mat(filename);
NoData=txtMat(:,1)=='N';
z = zeros(nLines,1);
z(NoData)=nan;
toc
%% Gelliant
tic
fid = fopen(filename,'rt');
z= textscan(fid, '%f', 'Delimiter','\n', 'whitespace',' ', 'TreatAsEmpty','No Data', 'EndOfLine','\n','TextType','char');
z=z{1};
fclose(fid);
toc
result:
Elapsed time is 0.273248 seconds.
Elapsed time is 0.304987 seconds.
Elapsed time is 0.206315 seconds.
txt2mat is slow, even without converting resulting string matrix to numbers it is outperformed by readtable and textscan. textscan is slightly faster than readtable. Probably because it skips some of the internal sanity checks and does not convert the resulting data to a table.
Depending of how big are your files and how often you read such files, you might want to go beyond readtable, that could be quite slow.
EDIT: After tests, with a file this simple the method below provide no advantages. The method was developed to read RINEX files, that are large and complex in the sense that the are aphanumeric with different numbers of columns and different delimiters in different rows.
The most efficient way I've found, is to read the whole file as a char matrix, then you can easily find you "No data" lines. And if your real numbers are formatted with fix width you can transform them from char into numbers in a way much more efficient than str2double or similar functions.
The function I wrote to read a text file into a char matrix is:
function [txtMat, nLines]=txt2mat(filename)
% txt2mat Read the content of a text file to a char matrix
% Read all the content of a text file to a matrix as wide as the longest
% line on the file. Shorter lines are padded with blank spaces. New lines
% are not included in the output.
% New lines are identified by new line \n characters.
% Reading the whole file in a string
fid=fopen(filename,'r');
fileData = char(fread(fid));
fclose(fid);
% Finding new lines positions
newLines= fileData==sprintf('\n');
linesEndPos=find(newLines)-1;
% Calculating number of lines
nLines=length(linesEndPos);
% Calculating the width (number of characters) of each line
linesWidth=diff([-1; linesEndPos])-1;
% Number of characters per row including new lines
charsPerRow=max(linesWidth)+1;
% Initializing output var with blank spaces
txtMat=char(zeros(charsPerRow,nLines,'uint8')+' ');
% Computing a logical index to all characters of the input string to
% their final positions
charIdx=false(charsPerRow,nLines);
% Indexes of all new lines
linearInd = sub2ind(size(txtMat), (linesWidth+1)', 1:nLines);
charIdx(linearInd)=true;
charIdx=cumsum(charIdx)==0;
% Filling output matrix
txtMat(charIdx)=fileData(~newLines);
% Cropping the last row coresponding to new lines characters and transposing
txtMat=txtMat(1:end-1,:)';
end
Then, once you have all your data in a matrix (let's assume it is named txtMat), you can do:
NoData=txtMat(:,1)=='N';
And if your number fields have fix width, you can transform them to numbers way more efficiently than str2num with something like
values=((txtMat(:,1:10)-'0')*[1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2]);
Where I've assumed the numbers have 7 digits and two decimal places, but you can easily adapt it for your case.
And to finish you need to set the NaN values with:
values(NoData)=NaN;
This is more cumbersome than readtable or similar functions, but if you are looking to optimize the reading, this is WAY faster. And if you don't have fix width numbers you can still do it this way by adding a couple lines to count the number of digits and find the place of the decimal point before doing the conversion, but that will slow down things a little bit. However, I think it will still be faster.

Plot in MATLAB using data from a txt file

I need to read data from a file and plot a graph with its data. The problem is:
(1) I can't change the format of data in the file
(2) The format contains information and characters that I don't know how to deal with.
Here is a part of the data file, it's in a txt format:
Estation;Date;Time;Temp1;Temp2;Pressure;
83743;01/01/2016;0000;31.9;25.3;1005.1;
83743;01/01/2016;1200;31.3;26.7;1005.7;
83743;01/01/2016;1800;33.1;25.4;1004.3;
83743;02/01/2016;0000;26.1;24.2;1008.6;
What I'm trying to do is to plot the Date and Time against Temp1 and Temp2, not worrying about Pressure. The first column can be neglected as well. How can I extract the Date, Time and Temps into and matrix so I can plot them? All I did so far was this:
fileID = fopen('teste.txt','r');
[A] = fscanf(fileID, ['%d' ';']);
fclose(fileID);
disp(A);
Which just reads the first value, 83743.
To build on m7913d's answer:
fileID = fopen('MyFile.txt','r');
A = fscanf(fileID, ['%s' ';']); % read the header line
B = fscanf(fileID, '%d;%d/%d/%d;%d;%f;%f;%f;', [8,inf]); % read all the data into B (the date is parsed into three columns)
fclose(fileID);
B = B.'; % transpose B
% C is just for verification, can be omitted
C = datetime([B(:,4:-1:2) B(:,5)/100zeros(numel(B(:,1)),2)],'InputFormat','yyyy-MM-dd HH:mm:ss');
D = datenum(C); % Get the date in a MATLAB usable format
Titles = strsplit(A,';'); % Get column names
figure;
hold on % hold the figure for multiple plots
plot(D,B(:,6),'r')
plot(D,B(:,7),'b')
datetick('x') % Set a date tick as axis
legend(Titles{4},Titles{5}); % uses titles for legend
note the parsing of the date into C: first is the date as given by you in dd-MM-yyyy format, which I flip to the official standard of yyyy-MM-dd, then your hour, which needs to be divided by 100, then a 0 for both minutes and seconds. You might need to rip those apart when you don't have exactly hourly data. Finally transform to a regular datenum, which MATLAB can use for processing.
Which results in:
You might want to play around with the datetick format, as it's got lots of options which might appeal to you.
fileID = fopen('input.txt','r');
[A] = fscanf(fileID, ['%s' ';']); % read the header line
[B] = fscanf(fileID, '%d;%d/%d/%d;%d;%f;%f;%f;', [8,inf]); % read all the data into B (the date is parsed into three columns)
fclose(fileID);
disp(B');
Note that %d reads an integer (not a double) and %f reads a floating point number.
See fscanf for more details.

Change default NaN representation of fprintf() in Matlab

I am trying to export data from Matlab in format that would be understood by another application... For that I need to change the NaN, Inf and -Inf strings (that Matlab prints by default for such values) to //m, //inf+ and //Inf-.
In general I DO KNOW how to accomplish this. I am asking how (and whether it is possible) to exploit one particular thing in Matlab. The actual question is located in the last paragraph.
There are two approaches that I have attempted (code bellow).
Use sprintf() on data and strrep() the output. This is done in line-by-line fashion in order to save memory. This solution takes almost 10 times more time than simple fprintf(). The advantage is that it has low memory overhead.
Same as option 1., but the translation is done on the whole data at once. This solution is way faster, but vulnerable to out of memory exception. My problem with this approach is that I do not want to unnecessarily duplicate the data.
Code:
rows = 50000
cols = 40
data = rand(rows, cols); % generate random matrix
data([1 3 8]) = NaN; % insert some NaN values
data([5 6 14]) = Inf; % insert some Inf values
data([4 2 12]) = -Inf; % insert some -Inf values
fid = fopen('data.txt', 'w'); %output file
%% 0) Write data using default fprintf
format = repmat('%g ', 1, cols);
tic
fprintf(fid, [format '\n'], data');
toc
%% 1) Using strrep, writing line by line
fprintf(fid, '\n');
tic
for i = 1:rows
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data(i, :)), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
end
toc
%% 2) Using strrep, writing all at once
fprintf(fid, '\n');
format = [format '\n'];
tic
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data'), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
toc
Output:
Elapsed time is 1.651089 seconds. % Regular fprintf()
Elapsed time is 11.529552 seconds. % Option 1
Elapsed time is 2.305582 seconds. % Option 2
Now to the question...
I am not satisfied with the memory overhead and time lost using my solutions in comparison with simple fprintf().
My rationale is that the 'NaN', 'Inf' and '-Inf' strings are simple data saved in some variable inside the *printf() or *2str() implementation. Is there any way to change their value at runtime?
For example in C# I would change the System.Globalization.CultureInfo.NumberFormat.NaNSymbol, etc. as explaind here.
In the limited case mentioned in comments that a number of (unknown, changing per data set) columns may be entirely NaN (or Inf, etc), but that there are not unwanted NaN values otherwise, another possibility is to check the first row of data, assemble a format string which writes the \\m strings directly, and use that while telling fprintf to ignore the columns that contain NaN or other unwanted values.
y = ~isnan(data(1,:)); % find all non-NaN
format = sprintf('%d ',y); % print a 1/0 string
format = strrep(format,'1','%g');
format = strrep(format,'0','//m');
fid = fopen('data.txt', 'w');
fprintf(fid, [format '\n'], data(:,y)'); %pass only the non-NaN data
fclose(fid);
By my check with two columns of NaN this fprintf is pretty much the same as your "regular" fprintf and quicker than the loop - not taking into account the initialisation step of producing format. It would be fiddlier to set it up to automatically produce the format string if you also have to take +/- Inf into account, but certainly possible. There is probably a cleaner way of producing format as well.
How it works:
You can pass in a subset of your data, and you can also insert any text you like into a format string, so if every row has the same desired "text" in the same spot (in this case NaN columns and our desired replacement for "NaN"), we can put the text we want in that spot and then just not pass those parts of the data to fprintf in the first place. A simpler example for trying out on the command line:
x = magic(5);
x(:,3)=NaN
sprintf('%d %d ihatethrees %d %d \n',x(:,[1,2,4,5])');

How to import non-comma/tab separated ASCII numerical data and render into vector (in matlab)

I want to import 100,000 digits of pi into matlab and manipulate it as a vector. I've copied and pasted these digits from here and saved them in a text file. I'm now having a lot of trouble importing these digits and rendering them as a vector.
I've found this function which calculates the digits within matlab. However, even with this I'm having trouble turning the output into a vector which I could then, for example, plot. (Plus it's rather slow for 100,000 digits, at least on my computer.)
Use textscan for that:
fid = fopen('100000.txt','r');
PiT = textscan(fid,'%c',Inf);
fclose(fid);
PiT is a cell array, so convert it to a vector of chars:
PiT = cell2mat(PiT(1));
Now, you want a vector of int, but you have to discard the decimal period to use the standard function:
Pi = cell2mat(textscan(PiT([1,3:end]),'%1d', Inf));
Note: if you delete (manually) the period, you can do that all in once:
fid = fopen('100000.txt','r');
Pi = cell2mat(textscan(fid,'%1d',Inf));
fclose(fid);
EDIT
Here is another solution, using fscanf, as textscan may not return a cell-type result.
Read the file with fscanf:
fid = fopen('100000.txt','r');
Pi = fscanf(fid,'%c');
fclose(fid);
Then take only the digits and convert the string as digits:
Pi = int32(Pi((Pi>='0')&(Pi<='9')))-int32('0');
Function int32 may be replaced by other conversion functions (e.g. double).
It can be done in one line.
When you modify your text file to just include the post-comma digits 14... you can use fscanf to get the vector you want.
pi = fscanf( fopen('pi.txt'), '%1d' ); fclose all
or without modifying the file:
pi = str2double( regexp( fscanf( fopen('pi.txt') ,'%1c'), '[0-9]', 'match'));