I need to read data from a file and plot a graph with its data. The problem is:
(1) I can't change the format of data in the file
(2) The format contains information and characters that I don't know how to deal with.
Here is part of the data file (it's a .txt file):
Estation;Date;Time;Temp1;Temp2;Pressure;
83743;01/01/2016;0000;31.9;25.3;1005.1;
83743;01/01/2016;1200;31.3;26.7;1005.7;
83743;01/01/2016;1800;33.1;25.4;1004.3;
83743;02/01/2016;0000;26.1;24.2;1008.6;
What I'm trying to do is to plot the Date and Time against Temp1 and Temp2, not worrying about Pressure. The first column can be neglected as well. How can I extract the Date, Time and Temps into a matrix so I can plot them? All I have done so far is this:
fileID = fopen('teste.txt','r');
[A] = fscanf(fileID, ['%d' ';']);
fclose(fileID);
disp(A);
Which just reads the first value, 83743.
To build on m7913d's answer:
fileID = fopen('MyFile.txt','r');
A = fscanf(fileID, ['%s' ';']); % read the header line
B = fscanf(fileID, '%d;%d/%d/%d;%d;%f;%f;%f;', [8,inf]); % read all the data into B (the date is parsed into three columns)
fclose(fileID);
B = B.'; % transpose B
% C holds the parsed dates and times as datetime values (year, month, day, hour, 0 min, 0 s)
C = datetime([B(:,4:-1:2) B(:,5)/100 zeros(size(B,1),2)],'Format','yyyy-MM-dd HH:mm:ss');
D = datenum(C); % serial date numbers, which plot and datetick can use
Titles = strsplit(A,';'); % Get column names
figure;
hold on % hold the figure for multiple plots
plot(D,B(:,6),'r')
plot(D,B(:,7),'b')
datetick('x') % Set a date tick as axis
legend(Titles{4},Titles{5}); % uses titles for legend
Note the parsing of the date into C: first comes the date, which you gave as dd/MM/yyyy and which I flip to the ISO standard order yyyy-MM-dd, then your hour, which needs to be divided by 100, then 0 for both minutes and seconds. If your data is not exactly hourly, you will need to split the HHMM value into hours and minutes (for example floor(B(:,5)/100) and mod(B(:,5),100)) instead of dividing by 100. Finally, C is transformed to a regular datenum, which MATLAB can use for processing.
This produces a plot of Temp1 (red) and Temp2 (blue) against the date.
You might want to play around with the datetick format, as it's got lots of options which might appeal to you.
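For data that is not exactly hourly, here is a minimal sketch of splitting the HHMM time into hours and minutes, assuming the Time column is read as an integer (as %d does in the fscanf call above). It uses datenum's numeric form instead of datetime, and the two rows of B are made up for illustration:

```matlab
% Two made-up rows in the same column layout as B:
% station, day, month, year, HHMM, Temp1, Temp2, Pressure
B = [83743 1 1 2016    0 31.9 25.3 1005.1;
     83743 1 1 2016 1230 31.3 26.7 1005.7];

hh = floor(B(:,5) / 100);   % hours part of HHMM (1230 -> 12)
mi = mod(B(:,5), 100);      % minutes part of HHMM (1230 -> 30)

% Serial date numbers (days); year, month, day are columns 4, 3, 2
D = datenum(B(:,4), B(:,3), B(:,2), hh, mi, 0);
```

Dividing by 100 alone would turn 1230 into 12.3 hours (12:18), whereas this keeps 12:30 intact.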
fileID = fopen('input.txt','r');
[A] = fscanf(fileID, ['%s' ';']); % read the header line
[B] = fscanf(fileID, '%d;%d/%d/%d;%d;%f;%f;%f;', [8,inf]); % read all the data into B (the date is parsed into three columns)
fclose(fileID);
disp(B');
Note that %d reads an integer and %f reads a floating-point number; fscanf still stores both in the double array B.
See fscanf for more details.
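To see how the [8,inf] size argument lays out B, the same format string can be tried on an in-memory string with sscanf, which follows the same format conventions as fscanf. The two lines below are copied from the sample file in the question:

```matlab
% Two data lines from the question, joined by a newline
str = sprintf('83743;01/01/2016;0000;31.9;25.3;1005.1;\n83743;01/01/2016;1200;31.3;26.7;1005.7;');

% Each record fills one column of an 8-row matrix:
% station, day, month, year, HHMM, Temp1, Temp2, Pressure
B = sscanf(str, '%d;%d/%d/%d;%d;%f;%f;%f;', [8, Inf]);
B = B.';   % one row per record
```

After the transpose, columns 6 and 7 hold Temp1 and Temp2, which is why those are the columns passed to plot.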
I have a large data file with text formatted as a single column with n rows. Each row is either a real number or a string with the value No Data. I have imported this text as an nx1 cell named Data. Now I want to filter the data and create an nx1 array from it, with NaN values instead of No Data. I have managed to do it using a simple loop (see below), but the problem is that it is quite slow.
z = zeros(n,1);
for i = 1:n
if Data{i}(1)~='N'
z(i) = str2double(Data{i});
else
z(i) = NaN;
end
end
Is there a way to optimize it?
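As a point of comparison, str2double also accepts a cell array and returns NaN for any entry it cannot convert, so the loop above may collapse to a single vectorized call. A small sketch with made-up entries:

```matlab
% Made-up sample mirroring the question's nx1 cell array
Data = {'9.343410'; 'No Data'; '-135.210'; '0.550001'};

% str2double works element-wise on a cell array; 'No Data' is not a
% number, so it becomes NaN without any explicit check
z = str2double(Data);
```

This removes the explicit loop; how it compares with the file-reading approaches in the answers would need timing on real data.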
Actually, the whole parsing can be performed with a one-liner using a properly parameterized readtable call (no iterations, no sanitization, no conversion, etc.):
data = readtable('data.txt','Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No data');
Here is the content of the text file I used as a template for my test:
9.343410
11.54300
6.733000
-135.210
No data
34.23000
0.550001
No data
1.535000
-0.00012
7.244000
9.999999
34.00000
No data
And here is the output (which can be retrieved in the form of a vector of doubles using data.Var1):
ans =
9.34341
11.543
6.733
-135.21
NaN
34.23
0.550001
NaN
1.535
-0.00012
7.244
9.999999
34
NaN
Delimiter: specified as a line break since you are working with a single column; this prevents No data from producing two columns because of the whitespace.
Format: '%f', since you want numerical values.
ReadVariableNames: false, since the file has no header line.
TreatAsEmpty: this tells the function to treat a specific string as empty, and empty numeric fields are set to NaN by default.
If you run this you can find out which approach is faster. It creates an 11MB text file and reads it with the various approaches.
filename = 'data.txt';
%% generate data
fid = fopen(filename,'wt');
N = 1E6;
for ct = 1:N
val = rand(1);
if val<0.01
fwrite(fid,sprintf('%s\n','No Data'));
else
fwrite(fid,sprintf('%f\n',val*1000));
end
end
fclose(fid)
%% Tommaso Belluzzo
tic
data = readtable(filename,'Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No Data');
toc
%% Camilo Rada
tic
[txtMat, nLines]=txt2mat(filename);
NoData=txtMat(:,1)=='N';
z = zeros(nLines,1);
z(NoData)=nan;
toc
%% Gelliant
tic
fid = fopen(filename,'rt');
z= textscan(fid, '%f', 'Delimiter','\n', 'whitespace',' ', 'TreatAsEmpty','No Data', 'EndOfLine','\n','TextType','char');
z=z{1};
fclose(fid);
toc
result:
Elapsed time is 0.273248 seconds.
Elapsed time is 0.304987 seconds.
Elapsed time is 0.206315 seconds.
txt2mat is slow: even without converting the resulting char matrix to numbers, it is outperformed by both readtable and textscan. textscan is slightly faster than readtable, probably because it skips some internal sanity checks and does not convert the result to a table.
Depending on how big your files are and how often you read them, you might want to go beyond readtable, which can be quite slow.
EDIT: After testing, with a file this simple the method below provides no advantage. It was developed to read RINEX files, which are large and complex in the sense that they are alphanumeric, with different numbers of columns and different delimiters in different rows.
The most efficient way I've found is to read the whole file as a char matrix; then you can easily find your "No data" lines. And if your real numbers are formatted with a fixed width, you can convert them from char to numbers much more efficiently than with str2double or similar functions.
The function I wrote to read a text file into a char matrix is:
function [txtMat, nLines]=txt2mat(filename)
% txt2mat Read the content of a text file to a char matrix
% Read all the content of a text file to a matrix as wide as the longest
% line on the file. Shorter lines are padded with blank spaces. New lines
% are not included in the output.
% New lines are identified by \n characters; the file must end with a newline.
% Reading the whole file in a string
fid=fopen(filename,'r');
fileData = char(fread(fid));
fclose(fid);
% Finding new lines positions
newLines= fileData==sprintf('\n');
linesEndPos=find(newLines)-1;
% Calculating number of lines
nLines=length(linesEndPos);
% Calculating the width (number of characters) of each line
linesWidth=diff([-1; linesEndPos])-1;
% Number of characters per row including new lines
charsPerRow=max(linesWidth)+1;
% Initializing output var with blank spaces
txtMat=char(zeros(charsPerRow,nLines,'uint8')+' ');
% Computing a logical index to all characters of the input string to
% their final positions
charIdx=false(charsPerRow,nLines);
% Indexes of all new lines
linearInd = sub2ind(size(txtMat), (linesWidth+1)', 1:nLines);
charIdx(linearInd)=true;
charIdx=cumsum(charIdx)==0;
% Filling output matrix
txtMat(charIdx)=fileData(~newLines);
% Cropping the last row (corresponding to the new line characters) and transposing
txtMat=txtMat(1:end-1,:)';
end
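For small inputs, the same kind of blank-padded char matrix can be built with char on a cell array of lines. This is a sketch for checking the layout txt2mat is expected to return, not a replacement for it on big files; the sample lines are taken from the test file shown earlier:

```matlab
% A few lines in the format of the test file, ending with a newline
txt = sprintf('9.343410\nNo data\n-135.210\n');

% Split into lines (dropping the final newline) and pad to equal width
lines = strsplit(txt(1:end-1), sprintf('\n'));
txtMat = char(lines);          % 3x8 char matrix, 'No data' padded with a blank

% Rows that start with 'N' are the "No data" lines
NoData = txtMat(:,1) == 'N';
```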
Then, once you have all your data in a matrix (let's assume it is named txtMat), you can do:
NoData=txtMat(:,1)=='N';
And if your number fields have a fixed width, you can convert them to numbers much more efficiently than with str2num, using something like
values=((txtMat(:,1:10)-'0')*[1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2]);
Here I've assumed the numbers have 7 integer digits, a decimal point (weighted by 0) and two decimal places, zero-padded and without a sign, but you can easily adapt it for your case.
And to finish you need to set the NaN values with:
values(NoData)=NaN;
This is more cumbersome than readtable or similar functions, but if you are looking to optimize the reading, this is WAY faster. And if you don't have fixed-width numbers, you can still do it this way by adding a couple of lines to count the number of digits and find the position of the decimal point before the conversion, but that will slow things down a little. However, I think it will still be faster.
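A small worked example of the fixed-width conversion, assuming zero-padded, unsigned numbers with 7 integer digits, a decimal point and 2 decimals (the layout described above). The values are made up:

```matlab
% Made-up fixed-width lines; char() pads the 'No data' row with blanks
txtMat = char({'0000012.34'; 'No data'; '0001234.50'});

% Column weights: 7 integer digits, 0 for the '.' column, 2 decimals
w = [1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2];

NoData = txtMat(:,1) == 'N';
values = (txtMat(:,1:10) - '0') * w;   % digit values times their weights
values(NoData) = NaN;                  % the 'No data' row is garbage until replaced
```

A minus sign or leading blanks would break this arithmetic ('-' - '0' is -3 and ' ' - '0' is -16), so such fields need to be handled before the multiplication.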
I have two-column data with mmyyyy and SPI (Standardized Precipitation Index) variables. The first two samples have no data (NAN). The file is:
011982 NAN
021982 NAN
031982 -1.348
.
.
.
122013 1.098
I load the time and SPI data into MATLAB, then I would like to plot it but it is not working.
I would like to plot line graph but I really have no idea how to plot time in x-axis and I would like my x-axis to show only the year.
Using the new datetime data type in MATLAB (added in R2014b), this should be easy.
Here is an example. First we load the data into a MATLAB table:
% import data from file
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%{MMyyyy}D %f');
fclose(fid);
% create table
t = table(C{:}, 'VariableNames',{'date','SPI'});
You get something like this:
>> t(1:10,:)
ans =
date SPI
______ ________
011982 NaN
021982 NaN
031982 2.022
041982 1.5689
051982 0.75813
061982 -0.74338
071982 -1.7323
081982 -2.4466
091982 -0.86604
101982 0.085698
Next, to plot the data against date and time, it's as easy as calling plot:
plot(t.date, t.SPI)
xlabel('Date'), ylabel('Standardized Precipitation Index')
By default, plot chooses tick mark locations based on the range of data. When you zoom in and out of a plot, the tick labels automatically adjust to the new axis limits.
But if you want, you can also specify a custom format for the datetime tick labels. Note that when you do this, the plot always formats the tick labels according to the specified value; they won't adjust on zoom:
plot(t.date, t.SPI, 'DatetimeTickFormat','yyyy')
I'm adding another answer that works in older MATLAB versions without table or datetime data types.
Like before, we first import the data from the file, but this time we read the dates as strings and then convert them to serial date numbers using the datenum function (defined as the number of days since "January 0, 0000"):
% import data from file
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s %f');
fclose(fid);
% create matrix
t = [datenum(C{1},'mmyyyy') C{2}];
The data looks like this:
>> format long
>> t(1:10,:)
ans =
1.0e+05 *
7.239120000000000 NaN
7.239430000000000 NaN
7.239710000000000 0.000005606888474
7.240020000000000 0.000009156147863
7.240320000000000 0.000004504804864
7.240630000000000 0.000008359005819
7.240930000000000 0.000007436313932
7.241240000000000 0.000002800134237
7.241550000000000 0.000005261613664
7.241850000000000 0.000001809901372
Next we plot the data like before, but this time we use the datetick function to format the x-axis as dates ('yyyy' for years):
plot(t(:,1), t(:,2))
datetick('x', 'yyyy')
xlabel('Date'), ylabel('Standardized Precipitation Index')
Unfortunately, the tick labels will not automatically update when you zoom in and out. The good news is that there are solutions on the File Exchange that address this, for example datetickzoom and datetick2.
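A quick check of what datenum(..., 'mmyyyy') returns, using a few strings in the file's first-column format: each one maps to the first day of that month, and the year can be recovered with datevec. The sample strings are illustrative:

```matlab
% Strings in the file's mmyyyy format
s = {'011982'; '021982'; '031982'; '122013'};

d = datenum(s, 'mmyyyy');   % serial date numbers, first day of each month
v = datevec(d);             % rows of [year month day hour minute second]
yr = v(:,1);                % just the year of each sample
```

If you want exactly one tick per year, one option to try is setting the axis XTick to datenum(unique(yr),1,1) before calling datetick('x','yyyy','keepticks'), since 'keepticks' tells datetick to keep the existing tick positions.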
I am trying to export data from Matlab in a format that can be understood by another application. For that I need to change the NaN, Inf and -Inf strings (which Matlab prints by default for such values) to //m, //inf+ and //inf-.
In general I DO KNOW how to accomplish this. I am asking how (and whether it is possible) to exploit one particular thing in Matlab. The actual question is located in the last paragraph.
There are two approaches that I have attempted (code below).
1. Use sprintf() on the data and strrep() the output. This is done line by line in order to save memory. The advantage is that it has low memory overhead, but in the timings below it takes about 7 times as long as a plain fprintf().
2. Same as option 1, but the translation is done on the whole data at once. This solution is much faster, but vulnerable to an out-of-memory error. My problem with this approach is that I do not want to unnecessarily duplicate the data.
Code:
rows = 50000
cols = 40
data = rand(rows, cols); % generate random matrix
data([1 3 8]) = NaN; % insert some NaN values
data([5 6 14]) = Inf; % insert some Inf values
data([4 2 12]) = -Inf; % insert some -Inf values
fid = fopen('data.txt', 'w'); %output file
%% 0) Write data using default fprintf
format = repmat('%g ', 1, cols);
tic
fprintf(fid, [format '\n'], data');
toc
%% 1) Using strrep, writing line by line
fprintf(fid, '\n');
tic
for i = 1:rows
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data(i, :)), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
end
toc
%% 2) Using strrep, writing all at once
fprintf(fid, '\n');
format = [format '\n'];
tic
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data'), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
toc
fclose(fid); % close the output file
Output:
Elapsed time is 1.651089 seconds. % Regular fprintf()
Elapsed time is 11.529552 seconds. % Option 1
Elapsed time is 2.305582 seconds. % Option 2
Now to the question...
I am not satisfied with the memory overhead and time lost using my solutions in comparison with simple fprintf().
My rationale is that the 'NaN', 'Inf' and '-Inf' strings are simple data saved in some variable inside the *printf() or *2str() implementation. Is there any way to change their value at runtime?
For example, in C# I would change System.Globalization.CultureInfo.NumberFormat.NaNSymbol, etc.
In the limited case mentioned in the comments, where some columns (unknown, and changing per data set) may be entirely NaN (or Inf, etc.) but there are no unwanted NaN values otherwise, another possibility is to check the first row of data, assemble a format string which writes the //m strings directly, and use that while telling fprintf to ignore the columns that contain NaN or other unwanted values.
y = ~isnan(data(1,:)); % find all non-NaN
format = sprintf('%d ',y); % print a 1/0 string
format = strrep(format,'1','%g');
format = strrep(format,'0','//m');
fid = fopen('data.txt', 'w');
fprintf(fid, [format '\n'], data(:,y)'); %pass only the non-NaN data
fclose(fid);
By my check with two columns of NaN this fprintf is pretty much the same as your "regular" fprintf and quicker than the loop - not taking into account the initialisation step of producing format. It would be fiddlier to set it up to automatically produce the format string if you also have to take +/- Inf into account, but certainly possible. There is probably a cleaner way of producing format as well.
How it works:
You can pass in a subset of your data, and you can also insert any text you like into a format string, so if every row has the same desired "text" in the same spot (in this case NaN columns and our desired replacement for "NaN"), we can put the text we want in that spot and then just not pass those parts of the data to fprintf in the first place. A simpler example for trying out on the command line:
x = magic(5);
x(:,3)=NaN
sprintf('%d %d ihatethrees %d %d \n',x(:,[1,2,4,5])')
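Extending the same idea to +Inf and -Inf columns, which the answer notes is fiddlier but possible, the per-column format can be assembled from the first row with logical indexing on a cell array of tokens. This is a sketch that assumes each column is uniformly NaN, Inf, -Inf or finite, as in the comment-thread case; the data is made up:

```matlab
% Made-up data: columns 3, 4, 5 are entirely NaN, Inf and -Inf
data = [1 2 NaN Inf -Inf;
        3 4 NaN Inf -Inf];

first = data(1,:);                        % classify columns by their first row
tokens = repmat({'%g'}, 1, numel(first)); % default: print the value
tokens(isnan(first)) = {'//m'};
tokens(first == Inf) = {'//inf+'};
tokens(first == -Inf) = {'//inf-'};
fmt = [strjoin(tokens, ' ') '\n'];        % e.g. '%g %g //m //inf+ //inf-\n'

keep = isfinite(first);                   % only the finite columns are passed in
out = sprintf(fmt, data(:, keep).');      % use fprintf(fid, ...) to write a file
```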
I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell, whose first element is a{1}='ÿþ2'; the others are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After @macduff's answer, I tried copy-pasting the data reported above into a new file and using:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
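The 'ÿþ' at the start of a{1} is what the bytes FF FE look like when shown as single-byte characters, and FF FE is the byte-order mark of UTF-16 little-endian text. So one plausible explanation is that the automatically generated files are UTF-16 encoded, which a copy-paste into a new file would likely undo. A minimal sketch under that assumption, which also assumes every character is plain ASCII: read the raw bytes, drop the BOM and the zero high bytes, and hand the resulting text to textscan. The demo part writes its own small UTF-16LE sample to a temporary file:

```matlab
% --- Demo only: write a small UTF-16LE file with a byte-order mark ---
fname = [tempname '.csv'];               % temporary file, hypothetical name
sample = sprintf('2009-04-29 01:01:42.000;16271.1;16271.1\n2009-04-29 02:01:42.000;2.5;16273.6\n');
lo = uint8(sample);                      % ASCII codes = low bytes of UTF-16LE
u16 = [uint8([255 254]), reshape([lo; zeros(size(lo), 'uint8')], 1, [])];
fid = fopen(fname, 'w');
fwrite(fid, u16, 'uint8');
fclose(fid);

% --- Read raw bytes and undo the UTF-16LE encoding ---
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8').';     % raw bytes as a row vector
fclose(fid);
if numel(bytes) >= 2 && bytes(1) == 255 && bytes(2) == 254
    % BOM FF FE found; assumes all characters are ASCII, so every
    % second byte (the high byte) is zero and can be dropped
    txt = char(bytes(3:2:end));
else
    txt = char(bytes);
end
delete(fname);

% textscan also accepts a char array in place of a file identifier
a = textscan(txt, '%s %f %f', 'Delimiter', ';');
```

For the real files you would skip the demo part and point fname at the file from the question. If the files can contain non-ASCII characters, dropping the high bytes is not safe and a proper UTF-16 decoder would be needed.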
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with one 4x1 cell array containing strings and one 4x2 array containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
fname = 'prova.csv'; % file to parse
%% Optional: preallocate for speed if you have large files
n = 1000; % <# of rows in file, if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s', fname); % exit point
end
cntr = 0;
while true
tline = fgetl(fid); % get one line
if ~ischar(tline), break; end % fgetl returns -1 at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); % split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); % datevec gives a standard 6-element time vector
value1(cntr) = str2double(splitLine{2}); % the fields are text, so convert them to numbers
value2(cntr) = str2double(splitLine{3});
end
fclose(fid);
%% Drop unused preallocated rows, then concatenate if you like
timestamp = timestamp(1:cntr,:);
value1 = value1(1:cntr);
value2 = value2(1:cntr);
result = [timestamp value1 value2];
I have some problems reading data from a text file and plotting it. The text file contains:
Date; Time; Temp °C
05.08.2011; 11:00:47;23.75
05.08.2011; 11:01:21;23.69
05.08.2011; 11:01:56;25.69
05.08.2011; 11:02:16;23.63
05.08.2011; 11:02:50;23.63
05.08.2011; 11:03:24;23.63
I want to plot the Temperature values with elapsed minutes.
First I used
[a,b]=textread('file1.txt','%s %s','headerlines',1)
to read the data as strings, and I get
'17:09:16;21.75'
After that I used
a= strread('17:08:00;21.81','%s','delimiter', ';')
to get
'17:08:00'
'21.81'
But after this I haven't been able to figure out how to move forward with these two strings, especially the time.
I want to plot temperature with time on the x-axis, but not the clock time: the elapsed time, which in this case is 2 min 37 s.
Help needed
Thanks Aabaz, that's really a big favor. I don't know why I couldn't figure it out; I spent so much time on it.
I have about 50 files containing this data. If I want to loop this code over them, how can I accomplish it? The file names are ROM IDs, like 1AHJDDHUD1224.txt.
How would I pass the file names in the loop? Do I have to rename the files and then pass them to the loop? I don't know.
One more question: what if I want the values to be plotted every 60 seconds, i.e. the graph is plotted as soon as data is available in the text file, and then updated every 60 seconds as more values are added to the file?
Consider the following code. It cycles through all .dat files in a specific directory, reads the data files, then plots them with the x-axis formatted as date/time (for files named like 1AHJDDHUD1224.txt, change the pattern to '*.txt'):
%# get a list of files
BASE_DIR = 'C:\Users\Amro\Desktop';
files = dir( fullfile(BASE_DIR,'*.dat') );
files = {files.name};
%# read all files first
dt = cell(numel(files),1);
temps = cell(numel(files),1);
for i=1:numel(files)
%# read data file
fname = fullfile(BASE_DIR,files{i});
fid = fopen(fname);
C = textscan(fid, '%s %s %f', 'delimiter',';', 'HeaderLines',1);
fclose(fid);
%# datetime and temperature
dt{i} = datenum( strcat(C{1},{' '},C{2}), 'dd.mm.yyyy HH:MM:SS' ); %# assumes day.month.year dates
temps{i} = C{3};
end
Now we can plot the data (say we had 16 files, so we lay out the subplots as 4-by-4):
figure
for i=1:16
subplot(4,4,i), plot(dt{i}, temps{i}, '.-')
xlabel('DateTime'), ylabel('Temp °C')
datetick('x','HH:MM:SS')
end
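Since the question asks for elapsed time rather than clock time, here is a minimal sketch of turning the date and time strings into minutes since the first sample. It assumes day.month.year dates (the question doesn't say which order is used) and takes the first and last rows of the sample file:

```matlab
% Date and time strings in the question's format (first and last sample rows)
dstr = {'05.08.2011'; '05.08.2011'};
tstr = {'11:00:47'; '11:03:24'};

% Assumption: the files use day.month.year; swap to 'mm.dd.yyyy' if not
dn = datenum(strcat(dstr, {' '}, tstr), 'dd.mm.yyyy HH:MM:SS');

elapsedMin = (dn - dn(1)) * 24 * 60;   % days -> minutes since the first sample
```

In the file loop above, the equivalent would be (dt{i} - dt{i}(1))*24*60, plotted against temps{i} without the datetick call.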
You can merge the date and time strings with sprintf and convert them to serial date numbers (in days) with datenum; multiplying by 86400 then gives seconds. The rest is easy. Here is how it could work:
fid=fopen('data','r');
header=fgetl(fid);
data=textscan(fid,'%s','delimiter',';');
fclose(fid);
data=data{:};
day=data(1:3:end);
hour=data(2:3:end);
temp=str2double(data(3:3:end));
time=cellfun(@(d,h) sprintf('%s %s',d,h),day,hour,'uniformoutput',0); % pair each date with its own time
% timev=datevec(time,'mm.dd.yyyy HH:MM:SS');
timen=datenum(time,'mm.dd.yyyy HH:MM:SS');
seconds=timen*86400;
plot(seconds-seconds(1),temp);
You may want to check the date format as I did not know which format you were using, so I guessed it was mm.dd.yyyy HH:MM:SS (see Matlab date specifiers)