Plotting data - time plotting - MATLAB

I want to plot some data in Matlab, but I'm having problems with properly displaying time.
Time is in the format:
HH:MM:SS.milliseconds
So for example: 11:16:41.835
I read my .txt file (tab delimited), and now I need to plot the data. All the data vectors are the same length, and I reckon the problem is with the time format. Any advice? plot(time, data1) doesn't work either.

Consider using MATLAB's built-in datevec, for example:
dvec = datevec('11:21:02.647', 'HH:MM:SS.FFF')
dvec =
   1.0e+003 *
    2.0100    0.0010    0.0010    0.0110    0.0210    0.0026
More info can be found in the datevec documentation.
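To actually plot against those times, one approach (a sketch, assuming your time strings ended up in a cell array called timeStrings and your values in data1 after reading the file; both names are placeholders) is to convert the strings to serial date numbers with datenum and label the axis with datetick:

```matlab
% Sketch: convert time strings to serial date numbers, then plot.
% 'timeStrings' and 'data1' are assumed to come from your .txt file.
t = datenum(timeStrings, 'HH:MM:SS.FFF');
plot(t, data1)
datetick('x', 'HH:MM:SS.FFF')  % show readable times on the x-axis
```

The serial date numbers are plain doubles, so plot handles them directly; datetick only changes how the tick labels are rendered.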

Related

MATLAB CSV Import Warping Data

When I import my data (numerical matrix of NYSE stock data), the data isn't loaded properly:
the final part of my CSV data disp() displayed should be -
9.76, 10, 9.99, 9.94, 9.97,9.944,9.95,10,9.956,10.01
What I get when I call the disp(importDataResult) is -
0.0100 0.0099 0.0099 0.0100 etc..
Have you got any idea why when I import the data it is transformed completely? The below link contains my zipped CSV file so you can see the problem (I completely understand if you can't be bothered checking this out, but I'd be interested to know if the same problem applies to others' MATLAB / computers).
https://www.sendspace.com/file/slif0y
The code I'm using is:
function [ c ] = CreateCov_Test()
c = csvread('nyse_data_matrix_no_tags.csv');
disp(c);
end
Here is a screenshot of the issue:
https://s32.postimg.org/os74qfrlx/matlab_screen.png
Thank you very much!
MATLAB is not transforming any data. How MATLAB displays variables is controlled with format, the default being format short.
An excerpt from the documentation:
format may be used to switch between different output display formats of all float variables as follows:
format SHORT Scaled fixed point format with 5 digits.
So what does "Scaled fixed point format with 5 digits" mean? Well, let's see:
>> a = [0.1 10000 100]
>> disp(a)
1.0e+04 *
0.0000 1.0000 0.0100
Note the 1.0e+04 *; it's a common multiplier for all data in the matrix. When displaying a large matrix, this multiplier is easy to overlook (as in your case), which admittedly can be rather confusing.
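If you want to see the unscaled values, you can switch the display format; the stored data is untouched either way. A minimal sketch:

```matlab
a = [0.1 10000 100];
format short g   % 'g' picks a per-element notation, no common multiplier
disp(a)
format short     % restore the default scaled display
```

format long behaves similarly but with 15 digits; none of these commands change the values in memory, only how they are printed.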

MATLAB reading large text log files with errors

I'm trying to analyze a large text log file (11 GB). All of the data are numerical values, and a snippet of the data is listed below.
-0.0623 0.0524 -0.0658 -0.0015 0.0136 -0.0063 0.0259 -0.003
-0.0028 0.0403 0.0009 -0.0016 -0.0013 -0.0308 0.0511 0.0187
0.0894 0.0368 0*0243 0.0279 0.0314 -0.0212 0.0582 -0.0403 //<====row 3, weird ASCII char * is present
-0.0548 0.0132 0.0299 0.0215 0.0236 0.0215 0.003 -0.0641
-0.0615 0.0421 0.0009 0.0457 0.0018 -0.0259 0.041 0.031
-0.0793 0.01 //<====row 6, the data is misaligned here
0.0278 0.0053 -0.0261 0.0016 0.0233 0.0719
0.0143 0.0163 -0.0101 -0.0114 -0.0338 -0.0415 0.0143 0.129
-0.0748 -0.0432 0.0044 0.0064 -0.0508 0.0042 0.0237 0.0295
0.040 -0.0232 -0.0299 -0.0066 -0.0539 -0.0485 -0.0106 0.0225
Every set of data consists of 2048 rows, and each row has 8 columns.
Here comes the problem: when the data was converted from binary files to text files by the logging software, a lot of it was corrupted. Take the data above as an example: in row 3, column 3, there is a "*" present in the data. And in row 6, one row of data is broken into two rows, one with 2 values and the other with 6 values.
I am currently struggling to read this large text file with MATLAB. Since the file itself is so large, I can only use textscan to read the data.
for example:
C = textscan(fd,'%f%f%f%f%f%f%f%f',1,'Delimiter','\t ');
However, I cannot use '%f' as the format, since the data contains several weird ASCII characters such as "*" or "!". These corrupted entries cannot be parsed as floating point numbers.
So I choose to use:
C = textscan(fd,'%s%s%s%s%s%s%s%s',1,'Delimiter','\t ');
and then I convert those strings into doubles to be processed. However, this runs into the problem of broken lines. When it reaches row 6, it gives:
[-0.0793],[0.01],[],[],[],[],[],[];
[0.0278],[0.0053],[-0.0261],[0.0016],[0.0233],[0.0719],[0.0143],[0.0163];
while it is supposed to look like:
-0.0793 0.01 0.0278 0.0053 -0.0261 0.0016 0.0233 0.0719 ===> one set
0.0143 0.0163 -0.0101 -0.0114 -0.0338 -0.0415 0.0143 0.129 ===> another set
Then the data will be offset by one row and the columns are messed up.
Then I try to do:
C = textscan(fd,'%s',1,'Delimiter','\t ');
to read one element at one time. If this element is NaN, it will textscan the next one until it sees something other than NaN. Once it obtains 2048 non-empty elements, it will store those 2048 data into a matrix to be processed. After being processed, this matrix is cleared.
This method works well for the first 20% of the whole file....BUT,
since the file itself is 11GB which is very large, after reading about 20% of the file, MATLAB shows:
Error using ==> textscan
Out of memory. Type HELP MEMORY for your options.
(some people suggest using %f while doing textscan, but it won't work because there are some ASCII chars which are causing problem)
Any suggestions to deal with this file?
EDIT:
I have tried:
C = textscan(fd,'%s%s%s%s%s%s%s%s',2048,'Delimiter','\t ');
Although the result is incorrect due to the misalignment of the data (as in row 6), this code indeed does not cause the "Out of memory" problem. The out-of-memory problem only occurs when I try to use
C = textscan(fd,'%s',1,'Delimiter','\t ');
to read the data one entry at a time. Does anyone have any idea why this memory problem happens?
This might seem silly, but are you preallocating an array for this data? If the only issue (as it seems to be) with your last function is memory, perhaps
C = zeros(2048,8);
will alleviate your problem. Try inserting that line before you call textscan. I know that MATLAB often exhorts programmers to preallocate for speed; this is just a shot in the dark, but preallocating memory may fix your issue.
Edit: also see this MATLAB Central discussion of a similar issue. It may be that you will have to run the file in chunks, and then concatenate the arrays when each chunk is finished.
Try something like the code below. It preallocates space and reads numRow* numColumns from the textfile at a time. If you can initialize the bigData matrix then it shouldn't run out of memory ... I think.
Note: I used 9 for the number of rows since your sample data had 9 complete rows; you will want to use 2048, I presume. This also might need some end-of-file checks, error handling, etc. Also, any numbers with odd ASCII text in them will turn into NaN.
Note 2: This still might not work, or be very, very slow. I had a similar problem reading large text files (10-20 GB) that were slightly more complicated. I had to abandon reading them in MATLAB. Instead I used Perl for an initial pass that output to binary, then used MATLAB to read the binary back into data. The two-step approach ended up saving lots and lots of runtime. Link in case you are interested
function bigData = readData(fileName)
fid = fopen(fileName,'r');
numBlocks = 1; %Somehow determine the number of blocks??? Not sure if you know a way to determine this
r = 9; %Replace 9 with your block size, 2048
c = 8;
bigData = zeros(r*numBlocks,c);
for k = 1:numBlocks
    [dataBlock, rFlag] = readDataBlock(fid,r,c);
    if rFlag
        %Or throw some kind of error.
        break
    end
    bigData((k-1)*r+1:k*r,:) = dataBlock;
end
fclose(fid);

function [dataBlock, rFlag] = readDataBlock(fid,r,c)
C = textscan(fid,'%s',r*c,'Delimiter','\t '); %read r*c elements, i.e. one block
dataBlock = [];
if numel(C{1}) == r*c
    %the elements were read row by row, so reshape column-major and transpose
    dataBlock = reshape(str2double(C{1}),c,r).';
    rFlag = false;
else
    rFlag = true;
    % ?? Throw an error or whatever is appropriate
end
While I don't really know how to solve your problems with the broken data, I can give some advice on how to process big text data: read it in batches of multiple lines, and write the output directly to the hard drive. In your case the second part might be unnecessary; once everything is working, you could try keeping data as an in-memory variable instead.
The code was originally written for a different purpose; I deleted the parser for my problem and replaced it with parsed_data=buffer; %TODO
outputfile='out1.mat';
inputfile='logfile1';
batchsize=1000; %process 1000 lines at once
data=matfile(outputfile,'writable',true); %Simply delete this line if you want "data" to be a variable in your memory
h=dir(inputfile);
bytes_to_read=h.bytes;
data.out={};
input=fopen(inputfile);
buffer={};
while ftell(input)<bytes_to_read
tmp=textscan(input,'%s',batchsize-numel(buffer));
buffer=[buffer;tmp{1}]; %append the newly read lines to the buffer
parsed_data=buffer; %TODO;
data.out(end+1,1)={parsed_data};
buffer={}; %In the default case you empty your buffer here.
%If an incomplete line was partially read here, leave it in the buffer.
end
fclose(input);

Strange fluctuating behaviour in time difference between successive frames read

Intuitively I would assume that using the function readFrame(videoObj) to read each frame would change the attribute videoObj.CurrentTime as such:
videoObj.CurrentTime = videoObj.CurrentTime + 1/videoObj.FrameRate
I.e. whenever each frame is extracted from videoObj, videoObj.CurrentTime is set to the time value in the video file where the next frame is located.
However, by observing how the attribute videoObj.CurrentTime changes throughout the reading of the video file, I see that the above is almost correct. (see the bottom of this question)
So, does anyone have an idea of why the time difference between successive frames is fluctuating?
Here is the code used to plot the time difference between successive frames during reading.
video = VideoReader('filename');
time = zeros(video.FrameRate * video.Duration,2);
time(1,1) = video.CurrentTime;
i = 2;
while hasFrame(video)
frame = readFrame(video);
time(i,1) = video.CurrentTime;
time(i,2) = time(i,1)-time(i-1,1);
i = i + 1;
end
figure;plot(time(:,1),time(:,2),'*')
xlabel('elapsed time in the video')
ylabel('time difference between frames')
And here is a plot over the variable time, http://oi62.tinypic.com/2ykf7zm.jpg. (SO wouldn't let me upload the figure here, because I don't have enough reputation points).
If you feel uncomfortable clicking the link, here are some relevant snippets of the variable time. The left column is the elapsed time in the video, i.e. videoObj.CurrentTime during the reading, and the right column is the increase in videoObj.CurrentTime in each reading.
0 0
0.0660 0.0660
0.1330 0.0670
0.2000 0.0670
0.2660 0.0660
0.3330 0.0670
0.4000 0.0670
0.4660 0.0660
0.5330 0.0670
0.6000 0.0670
...
1.4660 0.0660
1.5330 0.0670
1.6000 0.0670
1.7330 0.1330
1.8000 0.0670
1.8660 0.0660
1.9330 0.0670
2 0.0670
As you can see, the time difference between readings fluctuates between 0.0660 and 0.0670. This can probably be explained by floating-point rounding of the timestamps (but isn't a precision on the order of 10^-3 very bad anyway?). BUT, at some points the difference is twice as large as it should be, with a value of 0.1330. I.e. it seems that readFrame(videoObj) skips a frame. How can I make sure that this doesn't happen?
The equation
videoObj.CurrentTime = videoObj.CurrentTime + 1/videoObj.FrameRate
is valid only if the video stream has a fixed frame rate. There are numerous video files out there where the frame-rate of the video varies within the video stream. One of the reasons for doing this is improved compression (for a slowly varying scene, you can lower the frame-rate). The FrameRate reported in this case is the average frame rate.
The file you are working with could be a variable frame-rate file.
Hope this helps.
Dinesh
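As a sanity check, you can compare each CurrentTime delta against the nominal frame period, 1/FrameRate, and flag deltas that are much larger; a large delta suggests a skipped or variable-rate frame. A minimal sketch ('myvideo.avi' is a placeholder filename):

```matlab
% Sketch: flag frame-to-frame time deltas well above the nominal period.
v = VideoReader('myvideo.avi');
nominal = 1 / v.FrameRate;       % average frame period for this file
prev = v.CurrentTime;
while hasFrame(v)
    readFrame(v);
    dt = v.CurrentTime - prev;
    if dt > 1.5 * nominal
        fprintf('possible skipped frame near t = %.3f s (dt = %.3f s)\n', prev, dt);
    end
    prev = v.CurrentTime;
end
```

The 1.5x threshold is an arbitrary choice here; for a true variable frame-rate file you would expect several such messages rather than none.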

Excess of data when reading file in MatLab

I'm faced with a problem: my txt file has 1000 double values, but when I load the file, I get 2000 values, and the extra values are zeros.
Help me please, I've just started learning MATLAB.
clear all
file = 'C:\data.txt';
x = load(file);
x = sort(x(:));
x
x = 1.0e+010 *
0.0000
0.0000
0.0000
0.0000
...
Use the MATLAB function textscan. This function goes line by line and parses the data into C as a cell array. You can also limit how much it reads with the parameter N.
C = textscan(fileID,formatSpec,N)
The Matlab website has many examples on how to call this function.
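For this case, a minimal sketch (assuming the file holds whitespace-separated doubles, and using the path from the question):

```matlab
% Sketch: read at most the 1000 expected values, then sort them.
fileID = fopen('C:\data.txt', 'r');
C = textscan(fileID, '%f', 1000);  % N = 1000 caps the number of values read
fclose(fileID);
x = sort(C{1});                    % C{1} is a column vector of doubles
```

If you still see trailing zeros, inspect the end of the file itself; textscan will not pad the result the way a fixed-size preallocated array would.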

How do you create a 2X5 matrix from a text file in matlab

I am trying to obtain a 5x2 matrix from a text file.
For Example :
0_0 1_0
0_200 1_200
0_400 1_400
0_600 1_600
0_800 1_800
This is the code I am currently using:
[filename,pathname]= uigetfile({'*.txt'});
set(handles.temp1,'String',fullfile(pathname,filename));
chosenfile=get(handles.temp1,'String');
fid=fopen(chosenfile);
allcoordinates=textscan(fid,'%s,%s','whitespace','\n');
fclose(fid);
This code produces a 5x1 result as shown below, with the two columns merged:
0_01_0
0_2001_200
0_4001_400
0_6001_600
0_8001_800
Sadly, the approach that works best when interpreting files is to be very conservative about relying on the capabilities of 'canned routines' like textscan, dlmread and similar.
This is not because these routines are implemented badly; it's because there is very little standardization of number formats in text files, and basically everybody just invents a new standard on the spot.
You just can't design a routine that always works correctly for all text files. I think The MathWorks did a very decent job with dlmread and similar; however, you have just presented yet another number format that is overly difficult to interpret in one go with textscan, dlmread or others. Therefore, be conservative: just read the raw strings without too much hassle, and do the conversion yourself.
For example:
%// Read data
fid = fopen('yourFile.txt', 'r');
C = textscan(fid, '%s %s');
fclose(fid);
%// Replace all underscores with '.', and convert to 'double'
C = str2double(strrep([C{:}],'_','.'))
Results:
C =
0 1.0000
0.2000 1.2000
0.4000 1.4000
0.6000 1.6000
0.8000 1.8000