Extracting numeric data from text in data files in Matlab

I have a .txt data file which has a few rows of text comments in the beginning, followed by the columns of actual data. It looks something like this:
lens (mm): 150
Power (uW): 24.4
Inner circle: 56x56
Outer Square: 256x320
remarks: this run looks good
2.450000E+1 6.802972E+7 1.086084E+6 1.055582E-5 1.012060E+0 1.036552E+0
2.400000E+1 6.866599E+7 1.088730E+6 1.055617E-5 1.021491E+0 1.039043E+0
2.350000E+1 6.858724E+7 1.086425E+6 1.055993E-5 1.019957E+0 1.036474E+0
2.300000E+1 6.848760E+7 1.084434E+6 1.056495E-5 1.017992E+0 1.034084E+0
By using importdata, Matlab automatically separates the text data from the actual data. But how do I extract the numeric values from the text (which is stored in cell format)? What I want to achieve:
1. Extract the numbers (e.g. 150, 24.4).
2. If possible, extract the names ('lens', 'Power').
3. If possible, extract the units ('mm', 'uW').
Item 1 is the most important; 2 and 3 are optional. I am also happy to change the format of the text comments if that simplifies the code.

Let's say your sample data is saved as demo.txt, you can do the following:
function q47203382
%% Reading from file:
COMMENT_ROWS = 5;
% Read info rows:
fid = fopen('demo.txt','r'); % open for reading
txt = textscan(fid,'%s',COMMENT_ROWS,'delimiter', '\n'); txt = txt{1};
fclose(fid);
% Read data rows (skipping the comment rows):
numData = dlmread('demo.txt',' ',COMMENT_ROWS,0);
%% Processing:
desc = cell(COMMENT_ROWS,1);
unit = cell(2,1); % only the first two rows carry a unit
quant = cell(COMMENT_ROWS,1);
for ind1 = 1:numel(txt)
    if ind1 <= 2
        [desc{ind1}, unit{ind1}, quant{ind1}] = readWithUnit(txt{ind1});
    else
        [desc{ind1}, quant{ind1}] = readWOUnit(txt{ind1});
    end
end
%% Display:
disp(desc);
disp(unit);
disp(quant);
disp(mat2str(numData));
end

function [desc, unit, quant] = readWithUnit(str)
% Split e.g. 'lens (mm): 150' into 'lens', 'mm' and '150'.
tmp = strsplit(str,{' ','(',')',':'});
[desc, unit, quant] = tmp{:};
end

function [desc, quant] = readWOUnit(str)
% Split e.g. 'Inner circle: 56x56' into 'Inner circle' and '56x56'.
tmp = strtrim(strsplit(str,': '));
[desc, quant] = tmp{:};
end
We read the data in two stages: textscan for the comment rows in the beginning, and dlmread for the following numeric data. Then, it's a matter of splitting the text in order to obtain the various bits of information.
Here's the output of the above:
>> q47203382
'lens'
'Power'
'Inner circle'
'Outer Square'
'remarks'
'mm'
'uW'
'150'
'24.4'
'56x56'
'256x320'
'this run looks good'
[24.5 68029720 1086084 1.055582e-05 1.01206 1.036552;
24 68665990 1088730 1.055617e-05 1.021491 1.039043;
23.5 68587240 1086425 1.055993e-05 1.019957 1.036474;
23 68487600 1084434 1.056495e-05 1.017992 1.034084]
(I took the liberty to format the output a bit for easier viewing.)
See also: str2double.
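To follow up on the str2double pointer, here is a minimal sketch of how the quantity strings could be turned into numbers. It assumes the header lines look exactly like the sample above; the variable names (headerLines, values, dims) are made up for illustration and are not part of the answer's code.

```matlab
% Header lines copied from the sample file (normally these come from textscan).
headerLines = {'lens (mm): 150'; 'Power (uW): 24.4'; 'Inner circle: 56x56'};

% Take everything after the last ':' as the quantity string.
quant = strtrim(regexp(headerLines, '[^:]*$', 'match', 'once'));

% str2double returns NaN for text that is not a plain scalar number.
values = str2double(quant);

% Sizes such as '56x56' need their own parsing, e.g. with sscanf.
dims = sscanf(quant{3}, '%dx%d').';
```

Here values holds 150 and 24.4 for the first two rows and NaN for the '56x56' entry, which is why the size string is parsed separately.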

Related

Saving a literal file name as a variable in Matlab

My goal is to load the files listed in a table, extract their data, and save the results in a variable named after the file. The table entries are user-entered strings naming the files to load. For example, when A, B, C (strings) are listed in the table, my code finds the corresponding files (e.g. A.txt) and loads their data. After the data has been collected, the results are saved under the name from the table, like this: A (variable) = result_data(:,4). Here is my code. Please let me know where it goes wrong. (Note that the table is an nx1 cell array from a uitable.)
function pushbutton1_Callback(hObject, eventdata, handles)
data = get(handles.uitable,'data'); % get strings in table
for i = 1:size(data(:,1)) % count the #strings
fid = fopen([ data(i),'.csv' ]); %load the data and extract what I need
...
fclose(fid);
data(i) = result(row_1:row_2 , 4) % this is the result_data
% data(i) is variable string, so I am not sure whether to use the eval function.
end
Without having your table to debug further, here are my suggestions. data is probably a cell array, since you are pulling it from a uitable as below.
data = get(handles.uitable,'data'); % get strings in table
So this line should error:
fid = fopen([ data(i),'.csv' ]);
Change it to this:
fid = fopen([ data{i},'.csv' ]);
or this:
fid = fopen([ char(data(i)),'.csv' ]);
When saving your results under a variable name that matches your string, I would suggest using a structure with dynamic field names instead of a bare variable; otherwise you will probably have to use eval, which should be avoided.
So this (which doesn't do what you are asking for):
data(i) = result(row_1:row_2 , 4) % this is the result_data
Should become:
outData.(data{i}) = result(row_1:row_2 , 4) % this is the result_data
If data is a cell array containing {'A','B','C',...} as you said, then outData would have the form below, with one field per result:
outData.A
outData.B
outData.C
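The dynamic field name idea can be tried out in isolation with a minimal sketch like the one below. The names and result variables are hypothetical stand-ins for the table contents and the extracted data, and the call to matlab.lang.makeValidName (available from R2014a) is an extra safeguard that the answer itself does not mention.

```matlab
% Hypothetical stand-ins for the table contents and the extracted results.
names  = {'A'; 'B'; 'C'};
result = magic(4);

outData = struct();
for k = 1:numel(names)
    % Field names must be valid MATLAB identifiers; makeValidName adjusts
    % entries that are not (for example names containing spaces).
    fieldName = matlab.lang.makeValidName(names{k});
    outData.(fieldName) = result(:, 4) * k;  % placeholder for the real data
end
```

Afterwards fieldnames(outData) lists one field per table entry, and outData.(names{k}) retrieves a result without any call to eval.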

MATLAB Parse Data file

I have a text file that contains many lines of header and overhead information. Following that, there are repetitive blocks of data that I am interested in capturing. However, the first block is a bit different than the ones that follow it. The file structure is as follows:
**Header and overhead:**
...
...
...
SPD -> PX: SS3Data[07]: Recv Data
Sync: 0xXXXXXXXX
Chan: N
ID: N
Seq: N
SS: N
Words: N
Time: 0xXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
PX: SS3Data[07]: Recv Data
Sync: 0xXXXXXXXX
Chan: N
ID: N
Seq: N
SS: N
Words: N
Time: 0xXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
PX: SS3Data[07]: Recv Data
Sync: 0xXXXXXXXX
Chan: N
ID: N
Seq: N
SS: N
Words: N
Time: 0xXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
I'd like to be able to capture the data in the said Blocks and store them in a structure such as the following:
S.Block1.Sync
S.Block1.Chan
S.Block1.ID
S.Block1.Seq
S.Block1.SS
S.Block1.Words
S.Block1.Time
S.Block1.Data
.
.
.
S.BlockN.Sync
S.BlockN.Chan
S.BlockN.ID
S.BlockN.Seq
S.BlockN.SS
S.BlockN.Words
S.BlockN.Time
S.BlockN.Data
The X's following the Time field are hex characters. There are 64 characters in the first line and 32 in the second.
The best approach for this problem depends on the size of your files. If you can read the file in all at once, I would suggest the following approach using textscan():
% read file => stored in "data.txt"
fID = fopen(fullfile(cd, 'data.txt'), 'r');
tmp = textscan(fID, '%s');
fclose(fID);
lines = tmp{1};
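% textscan with '%s' splits on whitespace, so "lines" actually holds tokens,
% not whole lines. Find the value that follows each 'Chan:' label. You might
% want to add some checks, e.g. that the labels are always in this order and
% that the distance between consecutive blocks is the same.
chan_row = find(strcmpi(lines, 'Chan:'))+1;
% save in a table; the offsets assume the block layout shown in the question
tbl = table();
tbl.Sync = lines(chan_row-2);
tbl.Chan = lines(chan_row);
tbl.ID = lines(chan_row+2);
tbl.Seq = lines(chan_row+4);
tbl.SS = lines(chan_row+6);
tbl.Words = lines(chan_row+8);
tbl.Time = lines(chan_row+10);
% the hex data spans two lines, i.e. two tokens, so join them
tbl.Data = strcat(lines(chan_row+11), lines(chan_row+12));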
Note that I store the results in a table. This is likely much easier to process than your struct with the id number in the name. Per field, you might want to do some additional transformations, such as converting particular fields to categoricals.
If that is not feasible because the file is very large, you could try fopen() in combination with fgetl() to read in line-by-line instead.
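For the line-by-line route, a rough sketch with fopen() and fgetl() could look like the following. It writes a small made-up sample file first so that it runs on its own; the file contents, the variable names, and the rule "a block starts at a line containing 'Recv Data'" are assumptions based on the layout in the question, not something tested against the real file.

```matlab
% Write a tiny sample file so the sketch is self-contained (made-up values).
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'some header\nPX: SS3Data[07]: Recv Data\nSync: 0x1234ABCD\nChan: 2\nABCDEF\n0123\n');
fprintf(fid, 'PX: SS3Data[07]: Recv Data\nSync: 0x1234ABCD\nChan: 5\nFFFF\n');
fclose(fid);

blocks = struct('Sync', {}, 'Chan', {}, 'Data', {});  % empty struct array
n = 0;
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                    % fgetl returns -1 at end of file
    tline = strtrim(tline);
    tok = regexp(tline, '^(\w+):\s*(\S+)$', 'tokens', 'once');
    if ~isempty(strfind(tline, 'Recv Data'))
        n = n + 1;                     % a new block starts here
        blocks(n).Data = '';
    elseif n > 0 && ~isempty(tok)
        blocks(n).(tok{1}) = tok{2};   % "Label: value" line
    elseif n > 0
        blocks(n).Data = [blocks(n).Data, tline];  % hex payload line
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Only one line is held in memory at a time, and a struct array such as blocks(k).Chan is usually easier to loop over than numbered fields like S.Block1.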
If this is only a one-time data capture rather than something that's actually part of your code, just ctrl-c the data in Notepad, then in Matlab click on your variable workspace and ctrl-v. It should bring up a window that will let you import the data easily enough.

for loop+structure allocation in matlab

This is a problem I am working on in Matlab.
I am looping through a series of *.ALL files and stripping the field name by a delimiter '.'. The first field is the network, second station name, third location code and fourth component. I pre-allocate my structure based on the number of files (3) I run through which for this example is a 3x3x3 structure that I would like to define as struct(station,loc code,component). You can see these definitions in my code example below.
I would like to loop through the station, loc code, and component and fill their values in the structure. The only problem is for some reason the way I've defined the loop it's actually looping through the files more than once. I only want to loop through each file once and extract the station, comp, and loc code from it and put it inside the structure. Because it's looping through the files more than once it's taken like 10 minutes to fill the structure. This is not very efficient at all. Can someone point out the culprit line for me or tell me what I'm doing incorrectly?
Here's my code below:
close all;
clear;
[files]=dir('*.ALL');
for i = 1:length(files)
fields=textscan(files(i).name, '%s', 'Delimiter', '.');
net{i,1}=fields{1}{:};
sta{i,1}=fields{1}{2};
loc{i,1}=fields{1}{3};
comp{i,1}=fields{1}{4};
data = [];
halfhour(1:2) = struct('data',data);
hour(1:24) = struct('halfhour',halfhour);
day(1:366) = struct('hour',hour);
PSD_YE_DBT(1:length(files),1:length(files),1:length(files)) = ...
struct('sta','','loc','','comp','','allData',[],'day',day);
end
for s=1:1:length(sta)
for l=1:1:length(loc)
for c=1:1:length(comp)
tempFileName = strcat(net{s},'.',sta{s},'.',loc{l},'.',comp{c},'.','ALL');
fid = fopen(tempFileName);
PSD_YE_DBT(s,l,c).sta = sta{s};
PSD_YE_DBT(s,l,c).loc = loc{l};
PSD_YE_DBT(s,l,c).comp = comp{c};
end
end
end
Example file names for the three files I'm working with are:
XO.GRUT.--.HHE.ALL
XO.GRUT.--.HHN.ALL
XO.GRUT.--.HHZ.ALL
Thanks in advance!

Import datasheet iteration matlab - name to exist outside of function

I have some simple column-formatted data in text files named like:
Hm_slit_01.txt...Hm_slit_21.txt; Hs_slit_01.txt...Hs_slit_23.txt; Sm_slit_01.txt...Sm_slit_27.txt; Ss_slit_01.txt...Ss_slit_32.txt etc...
and I need to import it as datasheets into Matlab, ideally with the same name but without the .txt.
So I have created a function that takes a prefix (Hm, Hs, ...) and a final number to iterate the file naming up to. BUT it doesn't save the datasheet externally in the workspace.
Here is the code.
function [datasheet] = imp_vrad(prefix,numslits)
%[data] = imp_vrad(prefix,numslits)
% imports data with filename <prefix>_slit_<##>.txt
% to be a matlab datasheet data = <prefix>_slit_<##>
% here ## starts from '01' and goes to 'numslits'
% FILES MUST BE IN WORKING DIRECTORY!
for currslit=1:numslits
if currslit < 10
filename = [prefix '_slit_0' num2str(currslit) '.txt']; %names file
data = [prefix '_slit_0' num2str(currslit)]; %names datasheet
else
filename = [prefix '_slit_' num2str(currslit) '.txt']; %names file
data = [prefix '_slit_' num2str(currslit)]; %names datasheet
end
disp(['importing ' filename ' as ' data])
importdata(filename); %imports the data
end
end
The line in question is the last line. I have also tried data=importdata(filename); to no avail.
Thanks in advance!
I'm not sure if there's a way to change the scope of a variable in Matlab. You can define variables dynamically using eval, but that still won't get us past the scope problem.
You have two options: if the names matter to you, you could put the imported tables in a struct, akin to what import does; or, if you only care about the index, you could import them into an array (perhaps multidimensional).
For the former you could do something like
function [datasheet] = imp_vrad(prefix,numslits)
%[datasheet] = imp_vrad(prefix,numslits)
% imports data with filenames <prefix>_slit_<##>.txt
% into a struct with labels <prefix>_slit_<##>
% here ## starts from '01' and goes to 'numslits'
% FILES MUST BE IN WORKING DIRECTORY!
%
datasheet = struct();
for currslit=1:numslits
    dataname = sprintf('%s_slit_%02d',prefix,currslit); % zero-padded name
    filename = [dataname,'.txt'];
    disp(['importing ' filename ' as ' dataname])
    data = importdata(filename); %imports the data
    datasheet.(dataname) = data; % dynamic field name
end
end
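The second option mentioned above, keeping only the index, could be sketched with a cell array as follows. This is an illustration under assumptions: it first creates two small sample files with made-up numbers in a temporary folder so that it runs on its own, and the names folder and sheets are invented for the example.

```matlab
% Create two small sample files in a temporary folder (made-up data).
folder = tempname;
mkdir(folder);
prefix = 'Hm';
numslits = 2;
for k = 1:numslits
    sampleFile = fullfile(folder, sprintf('%s_slit_%02d.txt', prefix, k));
    dlmwrite(sampleFile, k * [1 2; 3 4], ' ');
end

% Import each file into one cell of a cell array, indexed by slit number.
sheets = cell(numslits, 1);
for k = 1:numslits
    filename = fullfile(folder, sprintf('%s_slit_%02d.txt', prefix, k));
    sheets{k} = importdata(filename);
end

rmdir(folder, 's');   % clean up the temporary files
```

A cell array is used rather than a plain multidimensional array so that the files do not all need the same number of rows.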

How can I read a simple txt file in Matlab in the fortran way (i.e. I want to keep reading after the newline)

I have to read the simple text file shown at the end of this post (it is just a structured grid). In Fortran this is easy to do; you just write:
read(fileunit,*)
read(fileunit,*) mc,nc
do j = 1, nc
read (fileunit, *) dummy, dummy, (xcor(j,i), i=1,mc)
enddo
Is there an equivalent function in Matlab that reads element by element and keeps reading after the newline, like in Fortran? I could not find one; functions such as fscanf, textscan etc. seem to read line by line, and then I have to parse each line. Here is the file. Thanks for any help. A.
Gridfile version 8.675.44
8 3
eta= 1 0.00000000000000000E+00 1.50000000000000000E+02
4.50000000000000000E+02 6.00000000000000000E+02
4.50000000000000000E+02 6.00000000000000000E+02
4.50000000000000000E+02 6.00000000000000000E+02
eta= 2 0.00000000000000000E+00 1.50000000000000000E+02
3.00000000000000000E+02 4.50000000000000000E+02
7.50000000000000000E+02 9.00000000000000000E+02
4.50000000000000000E+02 6.00000000000000000E+02
eta= 3 0.00000000000000000E+00 1.50000000000000000E+02
3.00000000000000000E+02 4.50000000000000000E+02
7.50000000000000000E+02 9.00000000000000000E+02
4.50000000000000000E+02 6.00000000000000000E+02
There are many ways to do this, but perhaps you will like the way fscanf works, as in this example. After the file is opened by something like fin = fopen('gridfile.txt') and the header swallowed, you can use fscanf(fin, 'eta= %d'), and then fscanf(fin, '%f'), which will read the entire block. fscanf does not stop at the end of a line unless instructed to do so. Taken together, a solution could look like
fin = fopen('gridfile.txt');
fgetl(fin);
% read data counts
cnt = fscanf(fin, '%d %d', 2);
mc = cnt(1);
nc = cnt(2);
xcor = zeros(nc, mc);
% read blocks of data
for j = 1 : nc
fscanf(fin, '%s %s', 2);
xcor(j, :) = fscanf(fin, '%f', mc)';
end
fclose(fin);
fscanf keeps matching the format specifier as long as possible, and returns only when no further consecutive matches can be found. The above example uses this in two places: first, to extract the dimensions cnt, in your example (8, 3), and second, to read eight consecutive floating-point values per record.
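The matching behaviour can be checked without a file by using sscanf, the string counterpart of fscanf. The snippet below is a small illustration with made-up numbers, not part of the solution above.

```matlab
% Five numbers spread over three lines.
txt = sprintf('0.0 1.5\n3.0 4.5\n6.0\n');

% The format is reapplied across newlines until nothing matches any more.
allVals = sscanf(txt, '%f');        % reads all five values

% A size argument stops the read early, like the 'mc' argument above.
first3 = sscanf(txt, '%f', 3);      % reads only the first three values

% Reading stops at the first text that does not match the format.
partial = sscanf('1 2 abc 3', '%f');
```

The last call returns only the two numbers before 'abc', which is the same mechanism that ends each record at the next 'eta=' label in the grid file.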