reading specific lines from .txt file - matlab

I have a large text file with lots of information inside. I need to access only the line that starts with '***'. That line contains 17 numbers separated by spaces.
An example of the file:
Msg count = 2629
max send msg count = 34
avg send msg count = 10.27
imbalance send msg count = 3.31
------------------------------
max recv msg count = 35
avg recv msg count = 10.27
imbalance recv msg count = 3.41
***1.100020 306852 1381937 11045 5398.19 2.05 10465 5398.19 1.94 2629 34 10.27 3.31 35 10.27 3.41 0.000000
[INFO] +++ Sat Sep 24 15:15:33 2016
+++ (test.c:816) stat1 end
Is there a way to do this?

Try this code:
infilename = 'nameoffile.txt'; % name of your file
m = memmapfile(infilename); % map the file into memory as uint8 bytes
instrings = strsplit(char(m.Data.'),'\n','CollapseDelimiters',true).';
checkstr = '***';
% find the indices of all lines starting with checkstr
ind = find(strncmp(instrings,checkstr,length(checkstr)));
if isempty(ind)
    fprintf('No lines starting with %s\n',checkstr)
else
    % first line starting with checkstr, parsed into a row of numbers
    firstline = instrings{ind(1)};
    nums = sscanf(firstline(length(checkstr)+1:end),'%f').';
end
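If the file is too large to split in memory comfortably, a line-by-line scan is another option. The following is only a sketch: it writes a small temporary file (the name comes from tempname and stands in for the real log) with a few lines copied from the question, then stops at the first line that starts with '***'. The variable names are illustrative.

```matlab
% Build a small stand-in file from lines shown in the question.
fname = tempname();
fid = fopen(fname,'w');
fprintf(fid,'Msg count = 2629\n');
fprintf(fid,'***1.100020 306852 1381937 11045 5398.19 2.05 10465 5398.19 1.94 2629 34 10.27 3.31 35 10.27 3.41 0.000000\n');
fprintf(fid,'[INFO] +++ Sat Sep 24 15:15:33 2016\n');
fclose(fid);

% Read one line at a time and stop at the first line starting with '***'.
nums = [];
fid = fopen(fname,'r');
tline = fgetl(fid);
while ischar(tline)              % fgetl returns -1 (not a char) at end of file
    if strncmp(tline,'***',3)
        nums = sscanf(tline(4:end),'%f').';   % 1x17 row of numbers
        break
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Because the loop breaks at the first match, the rest of the file is never read.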

How do I print a string in one line in MARIE?

I want to print a set of letters in one line in MARIE. I modified the code to print Hello World and came up with:
ORG 0 / implemented using "do while" loop
WHILE, LOAD STR_BASE / load str_base into ac
ADD ITR / add index to str_base
STORE INDEX / store (str_base + index) into ac
CLEAR / set ac to zero
ADDI INDEX / get the value at ADDR
SKIPCOND 400 / SKIP if ADDR = 0 (or null char)
JUMP DO / jump to DO
JUMP PRINT / JUMP to END
DO, STORE TEMP / output value at ADDR
LOAD ITR / load iterator into ac
ADD ONE / increment iterator by one
STORE ITR / store ac in iterator
JUMP WHILE / jump to while
PRINT, SUBT ONE
SKIPCOND 000
JUMP PR
HALT
PR, OUTPUT
JUMP WHILE
ONE, DEC 1
ITR, DEC 0
INDEX, HEX 0
STR_BASE, HEX 12 / memory location of str
STR, HEX 48 / H
HEX 65 / E
HEX 6C / L
HEX 6C / L
HEX 6F / O
HEX 0 / carriage return
HEX 57 / W
HEX 6F / O
HEX 72 / R
HEX 6C / L
HEX 64 / D
HEX 0 / NULL char
My program ends up halting past two iterations. I can't seem to figure out how to print a set of characters in one line. Thanks.
Your value of STR_BASE is almost certainly incorrect. Based on what is posted here, I would say it needs to be 18 instead of 12. You would also want to deal with the null char that currently sits between "HELLO" and "WORLD": either replace it with a space or remove that line, depending on your intended output.

Read txt files in Matlab that contain symbols in the first two rows

I have data like the following:
MTtmax6000_N1000000_k+0.1_k-T0.001_k-D0.1_kh1.txt
# nMT=1000000 tmax=60000 trelax=10000 k+=0.1 k-T=0.001 k-D=0.1 kh=1
#t (L-L0) L varL NGTP varNGTP Cap varCap
0 0 50090.2 2089.48 0.100257 0.100158 0.104798 0.114295
100 0.897735 50091.1 2109.92 0.099841 0.0998968 0.104373 0.114029
200 1.80163 50092 2130.83 0.099736 0.0995947 0.104204 0.113554
300 2.70513 50092.9 2151.79 0.099775 0.0997319 0.104323 0.113928
400 3.60867 50093.9 2172.17 0.099982 0.0999776 0.104546 0.114294
500 4.50984 50094.8 2192.49 0.100229 0.100263 0.104795 0.114473
600 5.40802 50095.6 2213.72 0.100149 0.100159 0.10463 0.114101
700 6.3161 50096.6 2234.2 0.099856 0.100117 0.10433 0.114139
800 7.21386 50097.5 2254.76 0.099624 0.0997151 0.104171 0.113879
900 8.11601 50098.4 2275.18 0.100183 0.100386 0.104615 0.114237
1000 9.01724 50099.3 2296.13 0.100504 0.100423 0.105058 0.114745
1100 9.92572 50100.2 2317.11 0.100368 0.10056 0.105023 0.115089
1200 10.8262 50101.1 2338.26 0.099476 0.0998665 0.103951 0.113913
1300 11.7243 50102 2359.96 0.099775 0.0997559 0.104246 0.113753
1400 12.6273 50102.9 2381.2 0.100081 0.100099 0.104571 0.11406
1500 13.5297 50103.8 2401.8 0.099702 0.0997495 0.104267 0.114045
1600 14.4281 50104.7 2422.56 0.099792 0.0999496 0.104292 0.113975
1700 15.3369 50105.6 2443.44 0.099912 0.0999296 0.104452 0.114242
I tried to read these data using dlmread, textscan or textread. When I run the code I receive this message:
Error using dlmread (line 139) Mismatch between file and format string. Trouble reading number from file (row 1u, field 1u) ==> # nMT=1000000 tmax=4000 trelax=1000 k+=1 k-T=0.01 k-D=0.1 kh=1\n
I want a command that reads txt files and ignores the first two rows. Any help would be greatly appreciated.
clc;
clear all;
close all;
%%
tic
Values11 = zeros(225,6);
K_minus_t =[0.01];
K_minus_d = [0.1];
%k_plus =[0.1 0.2 0.4 0.7 1 1.1 1.2 1.5 1.7 2 2.5 3 3.5 4 5];
m=length(K_minus_t);
r=length(K_minus_d);
kk=0;
ll=1;
for l=1:r
for j=1:m
h=[1];
k_plus =[1];
K_minus_T =K_minus_t(j);
K_minus_D = K_minus_d(l);
sets = {k_plus, K_minus_T, K_minus_D,h};
[x,y,z r] = ndgrid(sets{:});
cartProd = [x(:) y(:) z(:) r(:)];
nFiles = size(cartProd,1);
filename{nFiles,j}=[];
for i=1:nFiles
%% MT_Sym_N1000000_k+1_k-T0.01_k-D0.1_kh1.txt
filename{i,j} = ['MT_Sym_N1000000_' ...
'k+' num2str(cartProd(i,1)) '_' ...
'k-T' num2str(cartProd(i,2),'%6.3g') '_' ...
'k-D' num2str(cartProd(i,3)) '_' ...
'kh' num2str(cartProd(i,4)) '' ...
'.txt'];
file1=dlmread(filename{i,j})
%% line (length)
t= file1(:,1);
dline= file1(:,2);
[coef_line1,s]= polyfit(t, dline, 1);
coef_line(i,:)= coef_line1;
v1{i}=s.R;
v2{i}=s.df;
v3{i}=s.normr;
Dl(i)=sqrt (v3{i}/length(t));
end
end
end
Use importdata:
x = importdata('file.txt',' ',2); % ' ': column separator; 2: number of header lines
data = x.data; % x.data is what you want
The first line gives a struct x with data, textdata and colheaders fields. The numeric data is in field data, so x.data is what you want.
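As an alternative sketch (not part of the answer above), textscan can skip the two header rows with its 'HeaderLines' option. The temporary file below is a stand-in built from the first rows of the sample data, and the eight-column format is an assumption based on that sample.

```matlab
% Write a small stand-in file: two '#' header rows followed by numeric rows.
fname = tempname();
fid = fopen(fname,'w');
fprintf(fid,'# nMT=1000000 tmax=60000 trelax=10000 k+=0.1 k-T=0.001 k-D=0.1 kh=1\n');
fprintf(fid,'#t (L-L0) L varL NGTP varNGTP Cap varCap\n');
fprintf(fid,'0 0 50090.2 2089.48 0.100257 0.100158 0.104798 0.114295\n');
fprintf(fid,'100 0.897735 50091.1 2109.92 0.099841 0.0998968 0.104373 0.114029\n');
fclose(fid);

% Skip the two header rows and collect the eight numeric columns in one matrix.
fid = fopen(fname,'r');
C = textscan(fid, repmat('%f',1,8), 'HeaderLines',2, 'CollectOutput',true);
fclose(fid);
delete(fname);

data  = C{1};        % N-by-8 numeric matrix
t     = data(:,1);   % time column
dline = data(:,2);   % (L-L0) column
```

The t and dline vectors can then be passed to polyfit as in the question's loop.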

Matlab Code for Reading Text file with inconsistent rows

I am new to Matlab and have been working my way through using Google. But now I have hit the wall it seems.
I have a text file which looks like the following:
Information is for illustration reasons only
Aggregated Results
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
04-Oct-2008; -2.74; -917; 335; 317
Total Period; -0.80; -8612; 10748; 10276
Aggregated Results for location State PA
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
Total Period; -0.80; -8612; 10748; 10276
Results for account A1
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -7.59; -372; 49; 51
Total Period; -0.84; -1262; 1502; 1431
Results for account A2
Date;$/MWh;Total $;Exp. MWh;Act. MWh
01-Oct-2008; -8.00; -392; 49; 51
02-Oct-2008; 0.96; 47; 49; 51
03-Oct-2008; -0.75; -37; 50; 48
04-Oct-2008; 1.28; 53; 41; 40
Total Period; -0.36; -534; 1502; 1431
I want to extract following information in a cell/matrix format so that I can use it later to selectively do operations like average of accounts A1 and A2 or average of PA and A1, etc.
PA -0.8
A1 -0.84
A2 -0.36
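I'd go this way:
fid = fopen(filename,'r');
A = textscan(fid,'%s','delimiter','\r'); % read the file as a cell array of lines
fclose(fid);
A = A{:};
str_i = 'Total Period';
ix = find(strncmp(A,str_i,length(str_i))); % lines starting with 'Total Period'
% convert the text after 'Total Period;' on each matching line to numbers
res = arrayfun(@(i) str2num(A{ix(i)}(length(str_i)+2:end)),1:numel(ix),'UniformOutput',false);
res = cat(2,res{:}); % one column per 'Total Period' line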
This way you'll get all the numerical values after a string 'Total Period' in a matrix, so that you may pick the values you need.
Similarly you may operate with strings PA, A1 and A2.
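One way to pair each label with its 'Total Period' value is sketched below. It assumes the section headers always read "Results for location State X" or "Results for account X", as in the sample; the lines are typed in directly here instead of being read from the file, and the variable names are illustrative.

```matlab
% Lines copied from the question's sample (header and Total Period rows only).
lines = { ...
    'Aggregated Results', ...
    'Total Period; -0.80; -8612; 10748; 10276', ...
    'Aggregated Results for location State PA', ...
    'Total Period; -0.80; -8612; 10748; 10276', ...
    'Results for account A1', ...
    'Total Period; -0.84; -1262; 1502; 1431', ...
    'Results for account A2', ...
    'Total Period; -0.36; -534; 1502; 1431'};

labels  = {};   % section names such as 'PA', 'A1'
values  = [];   % first number on each section's 'Total Period' line
current = '';   % label of the section currently being scanned
for k = 1:numel(lines)
    tok = regexp(lines{k}, 'Results for (?:location State|account) (\w+)', 'tokens', 'once');
    if ~isempty(tok)
        current = tok{1};                          % a new labelled section starts
    elseif strncmp(lines{k}, 'Total Period', 12) && ~isempty(current)
        nums = sscanf(lines{k}(14:end), '%f;');    % numbers after 'Total Period;'
        labels{end+1} = current;
        values(end+1) = nums(1);
        current = '';                              % wait for the next header
    end
end
```

The unlabelled "Aggregated Results" block is skipped because no label has been seen yet. Averages can then be taken by name, for example mean(values(ismember(labels, {'A1','A2'}))).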
Matlab is not that nice when it comes to dealing with messy data. You may want to preprocess it a bit first.
However, here is an easy general way to import mixed numeric and non-numeric data in Matlab for a limited number of normal sized files.
Step 1: Copy the contents of the file into excel and save it as xls or xlsx
Step 2: Use xlsread
[NUM,TXT,RAW]=xlsread('test.xlsx')
From there the parsing should be manageable.
Hopefully they will add non-numeric support to csvread or dlmread in the future.

Creating open-high-low-close (ohlc) bars from tick data in Matlab

I have a CSV file 'XPQ12.csv' of futures tick data in the following form:
20090312 30:14.0 717.25 1 E
20090312 30:15.0 718.47 1 E
20090312 30:17.0 717.25 1 E
20090312 30:32.0 718.42 1 E
20090312 30:49.0 715.32 1 E
20090312 30:58.0 717.57 1 E
20090312 31:06.0 716.65 3 E
20090312 31:12.0 718.35 2 E
20090312 31:45.0 721.14 1 E
20090312 31:52.0 719.24 1 E
20090312 32:11.0 717.02 6 E
20090312 32:29.0 717.14 1 E
20090312 32:35.0 717.34 1 E
20090312 32:55.0 717.26 1 E
(The first column is the yearmonthdate, the second column is the minute:second:tenthofsecond, the third column is the price, the fourth column is the number of contracts traded, and the fifth indicates if the trade was electronic or in a pit). In my actual data set, I may have thousands of price quotes within any given minute.
I read the file using the following code:
fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r');
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')
Which outputs:
20090312
30
14
0
717.25
1
69
20090312
30
15
0
718.47
3
69
.
.
.
(the 69s are the matlab representation for E I believe)
Now I want to cut this up into one minute ohlc bars, so that for each minute, I record what the first, highest, lowest, and last price was within that minute. I'd really like to know the best way to go about this.
My original idea was to store the sequence of minutes in a vector d, and while working through the data, each time the number at the end of d changed I would record the corresponding price as an open, record the previous price as a close for the last bar, and find the largest and smallest prices within each open and close.
c(2) is the first minute, so I said:
d(1)=c(2);
and then noting that I'd always be counting by 7 before getting to the next minute, I said:
Nrows = numel(textread('XPQ12.csv','%1c%*[^\n]')); % counts rows in file
for i=1:Nrows
if mod(i-2,7)== 0;
d(end+1)=c(i);
end
end
which should fill up d with all the minutes:
30
30
30
30
30
30
31
31
31
31
32
32
32
32
in the case of the example data. I'm kind of lost what to do from here, or if what I'm doing is on the right track.
From where you are:
Minutes = c(2:7:end);
Prices = c(5:7:end);
% trim to a common length in case the last record is incomplete
if (length(Prices)>length(Minutes))
    Prices=Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes=Minutes(1:length(Prices));
end
% each time the minute wraps from 59 to 0, add 60 to everything that follows
OverflowValues=1+find(Minutes(2:end)==0 & Minutes(1:end-1)==59);
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end)=Minutes(OverflowValues(v):end)+60;
end
MinuteValues=unique(Minutes); % computed after the wrap-around correction
Highs=zeros(1,length(MinuteValues));
Lows=zeros(1,length(MinuteValues));
First=zeros(1,length(MinuteValues));
Last=zeros(1,length(MinuteValues));
for v=1:length(MinuteValues)
    Highs(v) = max(Prices(Minutes==MinuteValues(v)));
    Lows(v) = min(Prices(Minutes==MinuteValues(v)));
    First(v) = Prices(find(Minutes==MinuteValues(v),1,'first'));
    Last(v) = Prices(find(Minutes==MinuteValues(v),1,'last'));
end
Using textread would make this easier for you, as mentioned.
(If you are lost at this stage, I don't think accumarray, as mentioned in the comments, is the best place to start!)
By the way, this assumes that the minute counter can simply be extended past 60 and that there are no hours in the data somewhere. Otherwise this won't work at all.
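For readers who do want to try the accumarray route, here is a minimal sketch using the minute and price columns from the question's sample. It assumes the ticks are already in time order, since accumarray is only documented to preserve the order of values within a group when the subscripts are sorted. The variable names are illustrative.

```matlab
% Minute and price columns taken from the question's sample data.
Minutes = [30 30 30 30 30 30 31 31 31 31 32 32 32 32].';
Prices  = [717.25 718.47 717.25 718.42 715.32 717.57 716.65 718.35 ...
           721.14 719.24 717.02 717.14 717.34 717.26].';

% Map every tick to a bar number (1, 2, 3, ...), one bar per distinct minute.
[MinuteValues, ~, bar] = unique(Minutes);
bar = bar(:);   % accumarray expects a column of subscripts

Open  = accumarray(bar, Prices, [], @(p) p(1));    % first price in each minute
High  = accumarray(bar, Prices, [], @max);
Low   = accumarray(bar, Prices, [], @min);
Close = accumarray(bar, Prices, [], @(p) p(end));  % last price in each minute

ohlc = [MinuteValues Open High Low Close];         % one row per minute
```

Each row of ohlc holds the minute followed by its open, high, low and close prices.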

Changing format of many files in Excel

I have a folder filled with thousands of csv files. When I open one file, the data looks like:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column(time) has minutes and seconds before the decimal point. If I select the second column, right click and click format cells, select time, and change to 13:30:55 mode, the same data looks like:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a matlab function that reads these files, but it needs to be able to read the hours, so it can only be used after I change the format to display the hours. Now I have to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my matlab function that reads the file looks like:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E
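For a record of the type
20010402, 09:30:24.456, 4.235, 1, E
you can use a format like this, where the seconds are read together with their fractional part and the trailing letter is skipped:
fmt = '%f%f:%f:%f%f%f%*s';
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);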
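An alternative that stays close to the fscanf call in the question is sketched below. It assumes every record has the comma-separated layout shown in Notepad, and it reads the seconds with %f so the fractional part is kept. The records are parsed from a string with sscanf here; fscanf(fid, ...) accepts the same format.

```matlab
% Two records copied from the Notepad view in the question.
str = sprintf('20111206,09:50:56.411,4.320,1,E\n20111206,10:02:10.167,4.300,1,E\n');

% date, hour:minute:seconds, price, volume; the trailing letter is skipped by %*c
c = sscanf(str, '%d,%d:%d:%f,%f,%d,%*c');
datamat = reshape(c, 6, []).';   % one row per record

yyyymmdd = datamat(:,1);
hr  = datamat(:,2);
mn  = datamat(:,3);
sec = datamat(:,4);              % seconds including the fractional part
pp  = datamat(:,5);              % price
vv  = datamat(:,6);              % volume
```

This reads the raw CSV directly, so the files do not need to be reformatted in Excel first.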