How to display names after regexp? - matlab

I am trying to find all files in a directory whose names contain 'hello'. I have the following code:
fileData = dir();
m_file_idx = 1;
fileNames = {fileData.name};
index = regexp(fileNames,'\w*hello\w*','match');
inFiles = fileNames(~cellfun(@isempty,index));
For example, if my directory has 3 files with the word hello in their names, inFiles is displayed as
inFiles =
[1x23 char] [1x26 char] [1x25 char]
Instead, I want inFiles to show the names of the files, e.g. thisishello.m, hiandhello.txt.
How can I do this in a simple way?

This code:
fileData = dir();
fileNames = {fileData.name};
disp('The full directory...')
disp(fileNames)
index = regexp(fileNames,'\w*hello\w*','match');
inFiles = fileNames(~cellfun(@isempty,index));
disp('Print out the file names')
inFiles{:}
generates this output:
>> script
The full directory...
Columns 1 through 6
'.' '..' 'andsevenyears.txt' 'fourscore.txt' 'hello1.txt' 'hello2.txt'
Column 7
'script.m'
Print out the file names
ans =
hello1.txt
ans =
hello2.txt
To me it looks as if you were having some issues with understanding cell arrays. Here's a specific tutorial that works through them. (jerad's link also looks like a good resource)
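As a minimal sketch of the same idea, the filtered names can also be printed one per line with fprintf. The file list below is hypothetical and stands in for {fileData.name}; the 'once' option is an optional simplification, not part of the code above.

```matlab
% Hypothetical file list standing in for {fileData.name}
fileNames = {'.', '..', 'thisishello.m', 'hiandhello.txt', 'script.m'};

% With 'once', regexp returns one char vector per name (empty if no match)
hits = regexp(fileNames, 'hello', 'match', 'once');
inFiles = fileNames(~cellfun(@isempty, hits));

% inFiles{:} expands to a comma-separated list, so fprintf prints each name
fprintf('%s\n', inFiles{:});
```

inFiles is still a cell array here; only the way it is displayed changes.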

I think what's going on here is that when an element of a cell array is longer than a certain length (appears to be 19 characters for strings), MATLAB doesn't print the actual element; it prints a description of the content instead (in this case, "[1x23 char]").
For example:
>> names = {'1234567890123456789' 'bar' 'car'}
names =
'1234567890123456789' 'bar' 'car'
>> names = {'12345678901234567890' 'bar' 'car'}
names =
[1x20 char] 'bar' 'car'
celldisp might work better for your situation:
>> celldisp(names)
names{1} =
12345678901234567890
names{2} =
bar
names{3} =
car

Related

Retrieving text file content in MATLAB

I have a .txt file which includes 300 lines. For example the first line is:
ANSWER: correct: yes, time: 6.880674, guess: Lay, action: Lay, file: 16
or the second line is:
ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18
Only the 'time' and 'file' values are numbers; the others are strings.
I want to store the values of "correct", "time", "guess", "action" and "file" from all 300 lines in separate variables (such as arrays).
How can I do this in MATLAB?
Option 1:
You can use textscan with the following formatSpec:
formatSpec = 'ANSWER: correct:%s time:%f guess:%s action:%s file: %f';
data = textscan(fileID,formatSpec,'Delimiter',',');
where fileID is the file identifier obtained by fopen.
Option 2:
Another option is to use readtable, with the formatting above (directly with the file name, no fileID):
data = readtable('53485991.txt','Format',formatSpec,'Delimiter',',',...
'ReadVariableNames',false);
% the next lines are just to give the table variables some meaningful names:
varNames = strsplit(formatSpec,{'ANSWER',':','%s',' ','%f'});
data.Properties.VariableNames = varNames(2:end-1);
The result (ignore the values, as I messed up that example a little bit while playing with it):
data =
4×5 table
correct time guess action file
_______ ______ _______________ ______ ____
'yes' 6.8888 'Lay' 'Lay' 16
'no' 7.8762 'Put on top' 'Stir' 18
'no' 7.1503 'Put on bottom' 'Stir' 3
'no' 7.151 'go' 'Stir' 270
The advantage in option 2 is that a table is a much more convenient way to hold these data than a cell array (which is the output of textscan).
Use fgetl to get a line of the file and a while loop to read all of the lines.
For each line, use regexp to split the string into cells at the : and , delimiters. Then, use strip to remove leading and trailing whitespace from each cell.
Here is the solution:
f = fopen('a.txt');
aline = fgetl(f);
i = 1;
while ischar(aline)
content = strip(regexp(aline,':|,','split'));
correct{i} = content{3};
time(i) = str2double(content{5});
guess{i}= content{7};
action{i} = content{9};
file(i) = str2double(content{11});
i = i + 1;
aline = fgetl(f);
end
fclose(f);
Example:
Suppose a.txt file looks like this
ANSWER: correct: yes, time: 6.880674, guess: Lay, action: Lay, file: 16
ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18
After executing the script, the results are
correct =
1×2 cell array
'yes' 'no'
time =
6.8807 7.1504
guess =
1×2 cell array
'Lay' 'Put on top'
action =
1×2 cell array
'Lay' 'Stir'
file =
16 18
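To see why the loop reads elements 3, 5, 7, 9 and 11, it may help to split a single line by hand. This is only a sketch of that one step; strtrim is used here in place of strip on the assumption that it behaves the same for this input and is available in older releases.

```matlab
aline = 'ANSWER: correct: no, time: 7.150422, guess: Put on top, action: Stir, file: 18';

% Split at every ':' or ',' and trim the surrounding whitespace
content = strtrim(regexp(aline, ':|,', 'split'));

% Odd positions from 3 onward hold the values, even positions the labels
labels = content(2:2:end);   % 'correct' 'time' 'guess' 'action' 'file'
values = content(3:2:end);   % 'no' '7.150422' 'Put on top' 'Stir' '18'
time   = str2double(values{2});
```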

iterating only through the files names in a directory using MATLAB

How to only access the file names in a directory?
>> files = dir('*.png');
>> disp(class(dir('*.png')))
struct
>> fields
fields =
'name'
'date'
'bytes'
'isdir'
'datenum'
>> for i=1:numel(fields)
files.(fields{i}.name)
end
Struct contents reference from a non-struct array object.
>> for i=1:numel(fields)
files.(fields{i}).name
end
Expected one output from a curly brace or dot indexing expression, but there were 11 results.
File names are in the field called name of the struct array returned by dir. So:
files = dir('*.png');
for k = 1:numel(files)
f = files(k).name; % f contains the name of each file
end
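The "Expected one output" error in the question arises because files.name on a struct array expands to a comma-separated list with one value per file. A minimal sketch of collecting that list into a cell array follows; the scratch folder and the two empty .png files are hypothetical and created only so the example is self-contained.

```matlab
% Hypothetical scratch folder with two empty .png files, for illustration only
folder = tempname;
mkdir(folder);
fclose(fopen(fullfile(folder, 'a.png'), 'w'));
fclose(fopen(fullfile(folder, 'b.png'), 'w'));

files = dir(fullfile(folder, '*.png'));

% files.name expands to one value per file; the braces collect them in a cell
names = {files.name};
for k = 1:numel(names)
    fprintf('%s\n', names{k});
end
```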
You can use ls like this
list=ls('*.png');
for ii=1:size(list,1)
s = strtrim(list(ii,:)); % a string containing the name of each file
end
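ls works with char arrays instead of cells. On Windows it returns a padded character matrix with one name per row, which is why the loop indexes rows and calls strtrim. On Unix-like systems ls returns a single character vector instead, so this loop assumes Windows.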

Matlab - How can I find the lowest common directory of an arbitrary group of files?

I have a cell array of full file names and I want to find the lowest common directory where it makes sense to store accumulated data and what not.
Here is an example hierarchy of test data:
C:\Test\Run1\data1
C:\Test\Run1\data2
C:\Test\Run1\data3
C:\Test\Run2\data1
C:\Test\Run2\data2
.
.
.
In Matlab, the paths are stored in a cell array as follows (each run shares a row):
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
I want to write a routine that outputs the common path C:\Test\Run1 so that I can store relevant plots in a new directory there.
C:\Test\Run1\Accumulation_Plots
C:\Test\Run2\Accumulation_Plots
.
.
.
Previously, I was only concerned with two files in an x-by-2 cell, so the regimen below worked; however, strcmp lost its appeal since I can't (AFAIK) index the whole cell at once.
d = 1;
while strcmp(filePaths{1}(1:d),filePaths{2}(1:d))
d = d + 1;
end
common_directory = filePaths{1}(1:d-1);
mkdir(common_directory,'Accumulation_Plots');
As suggested by @nekomatic, I'm posting my comment as an answer.
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
% Sort the file paths
temp = sort(filePaths(:));
% Take the first and the last one, and split by '\'
first = strsplit(temp{1}, '\');
last = strsplit(temp{end}, '\');
% Compare them up to the depth of the smallest. Find the 'first N matching values'
sizeMin = min(numel(first), numel(last));
N = find(~[cellfun(@strcmp, first(1:sizeMin), last(1:sizeMin)) 0], 1, 'first') - 1;
% Get the smallest common path
commonPath = strjoin(first(1:N), '\');
You just need to compare the first d characters of any path in the array - e.g. path 1 - with the first d characters of the other paths. The longest common base path can't be longer than path 1 and it can't be shorter than the shortest common base path between path 1 and any other path.
There must be several ways you could do that, but a concise one is using strfind to match the strings and cellfun with isempty to check which ones didn't match:
% filePaths should contain at least two paths
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
path1 = filePaths{1};
filePaths = filePaths(2:end);
% find longest common left-anchored substring
d = 1;
while ~any(cellfun(@isempty, strfind(filePaths, path1(1:d))))
d = d + 1;
end
% find common base path from substring
[common_directory, ~, ~] = fileparts(path1(1:d));
Your code leaves d containing the length of the longest common left-anchored substring between the paths, but that might be longer than the common base path; fileparts extracts the actual base path from that substring.
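Note that strfind matches the substring anywhere in each path, and path1(1:d) can run past the end of path1 if every path starts with all of path1. A possible variant, sketched here with strncmp as a swapped-in technique rather than the method of the answer above, keeps the comparison anchored at the first character and bounds d. It also cuts at the last backslash by hand instead of calling fileparts, on the assumption that the paths always use '\' as the separator.

```matlab
filePaths = {...
    'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
    'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
path1 = filePaths{1};

% Grow the prefix while every path still starts with path1(1:d+1)
d = 0;
while d < numel(path1) && all(strncmp(filePaths(:), path1, d + 1))
    d = d + 1;
end
prefix = path1(1:d);

% Cut back to the last separator so a partial folder name is dropped
lastSep = find(prefix == '\', 1, 'last');
common_directory = prefix(1:lastSep - 1);
```

For the example data the prefix is C:\Test\Run, and cutting at the last separator leaves C:\Test.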

Extract specific column information from table in MATLAB

I have several *.txt files with 3 columns information, here just an example of one file:
namecolumn1 namecolumn2 namecolumn3
#----------------------------------------
name1.jpg someinfo1 name
name2.jpg someinfo2 name
name3.jpg someinfo3 name
othername1.bmp info1 othername
othername2.bmp info2 othername
othername3.bmp info3 othername
I would like to extract from "namecolumn1" only the entries whose value in column 3 is name.
My code looks like this:
file1 = fopen('test.txt','rb');
c = textscan(file1,'%s %s %s','Headerlines',2);
tf = strcmp(c{3}, 'name');
info = c{1}{tf};
The problem is that when I do disp(info) I get only the first entry from the table, name1.jpg, and I would like to have all of them:
name1.jpg
name2.jpg
name3.jpg
You're pretty much there. What you're seeing is an example of MATLAB's Comma Separated List, so MATLAB is returning each value separately.
You can verify this by entering c{1}{tf} in the command line after running your script, which returns:
>> c{1}{tf}
ans =
name1.jpg
ans =
name2.jpg
ans =
name3.jpg
Though sometimes we'd want to concatenate them, I think in the case of character arrays it is more difficult to work with than retaining the cell arrays:
>> info = [c{1}{tf}]
info =
name1.jpgname2.jpgname3.jpg
versus
>> info = c{1}(tf)
info =
'name1.jpg'
'name2.jpg'
'name3.jpg'
The former would require you to reshape the result (and whitespace pad, if the strings are different lengths), whereas you can index the strings in a cell array directly without having to worry about any of that (e.g. info{1}).
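If the goal is to select on column 1 itself (names that start with name) rather than on column 3, a sketch along the same lines could use strncmp. The col1 variable below is a hypothetical stand-in for c{1} as returned by textscan.

```matlab
% Hypothetical stand-in for c{1} from textscan
col1 = {'name1.jpg'; 'name2.jpg'; 'name3.jpg'; 'othername1.bmp'};

% Compare only the first 4 characters of each entry with 'name'
tf = strncmp(col1, 'name', 4);

% Parentheses keep the result as a cell array
info = col1(tf);
fprintf('%s\n', info{:});
```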

matlab How to remove .jpg from file name

I am looping through a lot of files and I need to remove the '.jpg' from each name.
Example file name:
20403y.jpg
but I just need the
20403y
All the file names end with 'y' if that helps.
One way is with regular expressions:
filename = 'myfilename.jpg';
pattern = '\.jpg$'; % escape the dot and anchor the match at the end of the name
replacement = '';
regexprep(filename,pattern,replacement)
Result:
ans =
myfilename
If you have the filenames in a cell array feed the cell array to regexprep. As the documentation explains, "If str is a cell array of strings, then the regexprep return value s is always a cell array of strings having the same dimensions as str."
Example:
myfilenames = {'myfilename.jpg' 'afilename.jpg' 'anotherfilename.jpg' };
newfilenames = regexprep(myfilenames,'\.jpg$','');
Result:
newfilenames =
'myfilename' 'afilename' 'anotherfilename'
files = dir('*y.jpg');
% Loop through each
for id = 1:length(files)
% Get the file name (minus the extension)
[p, f] = fileparts(files(id).name); % f will just give you file name
% Use following to rename the files
% I think you don't want to rename them
% movefile(files(id).name, f);
end
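When the names are already in a cell array, fileparts can be applied to all of them at once with cellfun. This is a sketch with hypothetical file names; the second one is included only to show that dots inside the base name are kept.

```matlab
% Hypothetical file names; the second has a dot inside the base name
myfilenames = {'20403y.jpg', 'my.file.jpg'};

% The second output of fileparts is the name without folder or extension
[~, base] = cellfun(@fileparts, myfilenames, 'UniformOutput', false);

fprintf('%s\n', base{:});
```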