Concatenating text from different text files into one - matlab

Suppose there are five text files. Contents of files are textfile1 = i saw an alligator, textfile2 = alligator was sitting near a tree, textfile3 = alligator was sleeping, textfile4 = parrot was flying, textfile5 = parrot was flying.
I used the code below to find paths of textfiles containing word alligator:
sdirectory = 'C:\Users\anurag\Desktop\Animals\Annotations\';
textfiles = dir([sdirectory '*.eng']);
num_of_files = length(textfiles);
C = cell(num_of_files, 1);
for w = 1:length(textfiles)
file = [sdirectory textfiles(w).name];
%// load string from a file
STR = importdata(file);
BL = cellfun(#lower, STR, 'uni',0);
%// extract string between tags
%// assuming you want to remove the angle brackets
B = regexprep(BL, '<.*?>','');
B(strcmp(B, '')) = [];
%// split each string by delimiters and add to C
tmp = regexp(B, '/| ', 'split');
C{w} = [tmp{:}];
end
where = [];
for j = 1:length(C)
file1 = [sdirectory textfiles(j).name];
if find(ismember(C{j},'alligator'))
where = [where num2str(j) '.eng, '];
disp(file1)
end
end
Finally the variable file1 will show the path of textfile containing word alligator one by one. Is there any way to concatenate the strings of textfiles containing the required word into a cell array.

Related

how to read broken numbers on two lines in a text file in matlab?

how to read broken numbers on two lines in matlab?
I am generating some results in text files that are being broken into two lines. Example:
text x = 1.
2345 text
What would a code look like to read the value x = 1.2345?
Suppose the value of x = 1.2345 is in file named name.txt.
When it doesn't break the values I'm looking for:
text x = 1.2345 text
I use the following (working) code:
buffer = fileread('name.txt') ;
search = 'x = ' ;
local = strfind(buffer, search);
xvalue = sscanf(buffer(local(1,1)+numel(search):end), '%f', 1);
You can remove line breaks (and other "white space", if needed) before parsing the string:
>> str = sprintf('text x = 1.\n2345 text')
str =
'text x = 1.
2345 text'
>> str = regexprep(str, '\n', '')
str =
'text x = 1.2345 text'

MATLAB - replace the string in a text file with index and delete those line without indexed

I have a text file with data structure like this
30,311.263671875,158.188034058,20.6887207031,17.4877929688,0.000297248129755,aeroplane
30,350.668334961,177.547393799,19.1939697266,18.3677368164,0.00026999923648,aeroplane
30,367.98135376,192.697219849,16.7747192383,23.0987548828,0.000186387239864,aeroplane
30,173.569274902,151.629364014,38.0069885254,37.5704650879,0.000172595537151,aeroplane
30,553.904602051,309.903320312,660.893981934,393.194030762,5.19620243722e-05,aeroplane
30,294.739196777,156.249740601,16.3522338867,19.8487548828,1.7795707663e-05,aeroplane
30,34.1946258545,63.4127349854,475.104492188,318.754821777,6.71026540999e-06,aeroplane
30,748.506652832,0.350944519043,59.9415283203,28.3256549835,3.52978979379e-06,aeroplane
30,498.747009277,14.3766479492,717.006652832,324.668731689,1.61551643174e-06,aeroplane
30,81.6389465332,498.784301758,430.23046875,210.294677734,4.16855394647e-07,aeroplane
30,251.932098389,216.641052246,19.8385009766,20.7131652832,3.52147743106,bicycle
30,237.536972046,226.656692505,24.0902862549,15.7586669922,1.8601918593,bicycle
30,529.673400879,322.511322021,25.1921386719,21.6920166016,0.751171214506,bicycle
30,255.900146484,196.583847046,17.1589355469,27.4430847168,0.268321367912,bicycle
30,177.663650513,114.458488464,18.7516174316,16.6759414673,0.233057001606,bicycle
30,436.679382324,273.383331299,17.4342041016,19.6081542969,0.128449092153,bicycle
I want to index those file with a label file.and the result will be something like this.
60,509.277435303,284.482452393,26.1684875488,31.7470092773,0.00807665128377,15
60,187.909835815,170.448471069,40.0388793945,58.8763122559,0.00763951029512,15
60,254.447280884,175.946624756,18.7212677002,21.9440612793,0.00442053096776,15
However there might be some class that is not in label class and I need to filter those line out so I can use load() to load in.(you can't have char inside that text file and execute load().
here is my implement:
function test(vName,meta)
f_dt = fopen([vName '.txt'],'r');
f_indexed = fopen([vName '_indexed.txt'], 'w');
lbls = loadlbl()
count = 1;
while(true),
if(f_dt == -1),
break;
end
dt = fgets(f_dt);
if(dt == -1),
break
else
dt_cls = strsplit(dt,','){7};
dt_cls = regexprep(dt_cls, '\s+', '');
cls_idx = find(strcmp(lbls,dt_cls));
if(~isempty(cls_idx))
dt = strrep(dt,dt_cls,int2str(cls_idx));
fprintf(f_indexed,dt);
end
end
end
fclose(f_indexed);
if(f_dt ~= -1),
fclose(f_dt);
end
end
However it work very very slow because the text file contains 100 thousand of lines. Is it anyway that I could do this task smarter and faster?
You may use textscan, and get the indices/ line numbers of the labels you want. After knowing the line numbers, you can extract what you want.
fid = fopen('data.txt') ;
S = textscan(fid,'%s','delimiter','\n') ;
S = S{1} ;
fclose(fid) ;
%% get bicycle lines
idx = strfind(S, 'bicycle');
idx = find(not(cellfun('isempty', idx)));
S_bicycle = S(idx)
%% write this to text file
fid2 = fopen('text.txt','wt') ;
fprintf(fid2,'%s\n',S_bicycle{:});
fclose(fid2) ;
From S_bicycle, you can extract your numbers.

Matlab; how to extract information from a header's file (text file)

I have many text files that have 35 lines of header followed by a large matrix with data of an image (that info can be ignored and do not need to read it at the moment). I want to be able to read the header lines and extract information contained on those lines. For instance the first few lines of the header are..
File Version Number: 1.0
Date: 06/05/2015
Time: 10:33:44 AM
===========================================================
Beam Voltage (-kV) = 13.000
Filament (W) = 4.052
Cond. (-kV) = 8.885
CenterX1 (V) = 10.7
CenterY1 (V) = -45.9
Objective (%) = 71.40
OctupoleX = -0.4653
OctupoleY = -0.1914
Angle (deg) = 0.00
.
I would like to be able to open this text file and read the vulue of the day and time the file was created, filament power, the condenser voltage, the angle, etc.. and save these in variables or send them to a text box on a GUI program.
I have tried several things but since the values I want to extract some times are after a '=' or after a ':' or simply after a '' then I do not know how to approach this. Perhaps reading each line and look for a match of a word?
Any help would be much appreciated.
Thanks,
Alex
This is not particularly difficult, and one of the ways to do it would be to parse line-by-line as you suggested. Something like this:
MAX_LINES_TO_READ = 35;
fid = fopen('input.txt');
lineCount = 0;
dateString = '';
beamVoltage = 0;
while ~eof(fid)
line = fgetl(fid);
lineCount = lineCount + 1;
%//check conditions for skipping loop body
if isempty(line)
continue
elseif lineCount > MAX_LINES_TO_READ
break
end
%//find headers you are interested in
if strfind(line, 'Date')
%//find the first location of the header separator
idx = find(line, ':', 1);
%//extract substring starting from 1 char after separator
%//note: the trim is to get rid of leading/trailing whitespace
dateString = strtrim(line(idx + 1 : end));
elseif strfind(line, 'Beam Voltage')
idx = find(line, '=', 1);
beamVoltage = str2double(line(idx + 1 : end));
end
end
fclose(fid);

import data and Looping for the file

My purpose is to split the string into part then check whether the 'f11_data' contain those split word. if yes then return 0, if no then return 1. I have 100 strings, but it doesn't make sense to type the str/needles 100 inside my code times. How do I use looping to do that? I'm facing a problem using importdata.
str1 = 'http://en.wikipedia.org/wiki/hostname';
str2 = 'http://hello/world/hello';
str3 = 'http://hello/asd/wee';
f11_data1 = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In computer networking, a hostname (archaically nodename .....';
f11_data2 = 'hell';
f11_data3 = 'hello .....';
needles1 = strcat('\<', regexpi(str1,'[:/.]*','split'), '\>')
needles2 = strcat('\<', regexpi(str2,'[:/.]*','split'), '\>')
needles3 = strcat('\<', regexpi(str3,'[:/.]*','split'), '\>')
~cellfun('isempty', regexpi(f11_data1, needles1, 'once'))
~cellfun('isempty', regexpi(f11_data2, needles2, 'once'))
~cellfun('isempty', regexpi(f11_data3, needles3, 'once'))
This is how I modified the above code using a loop:
data = importdata('URL')
needles = regexp(data,'[:/.]*','split') %// note the different search string
for i = 1:2
A11_data = needles{i};
data2 = importdata(strcat('f11_data', int2str(i)));
%feature11_data=(~cellfun('isempty', regexpi(data2, needles, 'once')))
%feature11(i)=feature11_data
~cellfun('isempty', regexpi(data2, needles, 'once'))
end
I get the error :
"
??? Error using ==> regexpi
All cells for regexpi must be strings.
Error in ==> f11_test2 at 14
~cellfun('isempty', regexpi(haystack, needles, 'once')) "
It's hard to tell what you are trying to do but, if you want to do something equivalent,
for i = 1:3
str = importdata(URL_LIST[i])
needle = strcat('\<', regexp(str,'[:/.]*','split'), '\>');
haystack = importdata(HAYSTACK_LIST[i]);
~cellfun('isempty', regexpi(haystack, needle, 'once'))
end
Obviously you need to define URL_LIST and HAYSTACK_LIST.

Read with order files in subfolders in matlab

I've got a folder which contains subfolders with text files. I want to read those file with the same order as they are in the subfolders. I've got a problem with that. I use the following matlab code:
outNames = {};
k=1;
feature = zeros(619,85);
fileN = cell(619,1);
for i=1:length(nameFolds)
dirList = dir(strcat(path, num2str(cell2mat(nameFolds(i,1)))));
names = {dirList.name};
outNames = {};
for j=1:numel(names)
name = names{j};
if ~isequal(name,'.') && ~isequal(name,'..')
[~,name] = fileparts(names{j});
outNames{end+1} = name;
fileName = strcat(path, num2str(cell2mat(nameFolds(i,1))), '\', name, '.descr' );
feature(k,:) = textread(fileName);
fileN{k} = [fileName num2str(k)];
k= k+1;
end
end
end
In one subfolder I've got the above text file names:
AnimalPrint_tiger_test_01.descr
AnimalPrint_tiger_test_02.descr
AnimalPrint_tiger_test_03.descr
AnimalPrint_tiger_test_04.descr
AnimalPrint_tiger_test_05.descr
AnimalPrint_tiger_test_06.descr
AnimalPrint_tiger_test_07.descr
AnimalPrint_tiger_test_08.descr
AnimalPrint_tiger_test_09.descr
AnimalPrint_tiger_test_10.descr
AnimalPrint_tiger_test_11.descr
AnimalPrint_tiger_test_12.descr
AnimalPrint_tiger_test_13.descr
AnimalPrint_tiger_test_14.descr
AnimalPrint_tiger_test_15.descr
AnimalPrint_zebra_test_1.descr
AnimalPrint_zebra_test_2.descr
AnimalPrint_zebra_test_3.descr
AnimalPrint_zebra_test_4.descr
AnimalPrint_zebra_test_5.descr
AnimalPrint_zebra_test_12.descr
But it seems that it reads first the AnimalPrint_zebra_test_12.descr and after the AnimalPrint_zebra_test_1.descr and the rest. Any idea why this happens?
dir sorts the files according to their names, for instance
test_1
test_12 % 1 followed by 2
test_2
test_3
You may want to build your own order with ['test_' num2str(variable) '.descr'] that concatenates test_ with an incrementing variable.