Separating an array based on whether it contains a phrase or not

I am really just a noob at Matlab, so please don't get upset if I use wrong syntax. I am currently writing a small program in which I put all .xlsx file names from a certain directory into an array. I now want to separate the files into two different arrays based on their name. This is what I tried:
files = dir('My_directory\*.xlsx')
file_number = 1;
file_amount = length(files);
while file_number <= file_amount;
file_name = files(file_number).name;
filescs = [];
filescwf = [];
if strcmp(file_name,'*cs.xlsx') == 1;
filescs = [filescs,file_name];
else
filescwf = [filescwf,file_name];
end
file_number = file_number + 1
end
The idea here is that strcmp(file_name,'*cs.xlsx') checks if file_name contains 'cs' at the end. If it does, it is put into filescs, if it doesn't it is put into filescwf. However, this does not seem to work...
Any thoughts?

strcmp(file_name,'*cs.xlsx') checks whether file_name is identical to *cs.xlsx. If there is no file by that name (hint: few file systems allow '*' as part of a file name), it will always be false. (btw: there is no need for the '==1' comparison or the semicolon on the respective line)
You can use array indexing here to extract the relevant part of the filename you want to compare. For example, file_name(1:5) will give you the first 5 characters, and file_name(end-5:end) will give you the last 6.
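A minimal sketch of that suffix check, assuming the file names are already in a cell array (the sample names below are made up for illustration). It also creates the two result lists before the loop, since the question's code resets filescs and filescwf on every iteration, and it uses cell arrays so that the names are not glued together into one long character vector.

```matlab
% Hypothetical file names standing in for the result of dir(...)
file_names = {'report_cs.xlsx', 'report_wf.xlsx', 'cs.xlsx', 'a.xlsx'};
suffix = 'cs.xlsx';
n = numel(suffix);

filescs = {};    % names ending with the suffix
filescwf = {};   % all other names
for k = 1:numel(file_names)
    name = file_names{k};
    % Guard against names shorter than the suffix before indexing from the end
    if numel(name) >= n && strcmp(name(end-n+1:end), suffix)
        filescs{end+1} = name;
    else
        filescwf{end+1} = name;
    end
end
```

Each list can then be indexed as filescs{1}, filescs{2}, and so on.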

strcmp doesn't work with wildcards such as *cs.xlsx. See this question for an alternative approach.

You can use regexp to check the final letters of each of your files, then cellfun to apply regexp to all your filenames.
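Here, getIndex will be true for all the files ending with cs.xlsx. The (?=$) part makes sure that cs.xlsx is at the end, and the ~ turns the "no match" result of isempty into "match found".
files = dir('*.xlsx');
filenames = {files.name}'; % get filenames
getIndex = ~cellfun(@isempty, regexp(filenames, 'cs\.xlsx(?=$)')); % true where the name ends with cs.xlsx
list1 = filenames(getIndex); % names ending with cs.xlsx
list2 = filenames(~getIndex); % all other names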
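If a recent MATLAB release is available (endsWith was introduced in R2016b), the same split can probably be written without a regular expression. A short sketch, using made-up file names in place of the dir output:

```matlab
% Hypothetical file names standing in for {files.name}'
filenames = {'a_cs.xlsx'; 'b_wf.xlsx'; 'c_cs.xlsx'};

% endsWith returns one logical value per file name
getIndex = endsWith(filenames, 'cs.xlsx');
list1 = filenames(getIndex);    % names ending with cs.xlsx
list2 = filenames(~getIndex);   % all other names
```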

Related

Saving figure without providing filename [duplicate]

This question is about MATLAB:
I'm running a loop, and each iteration produces a new set of data that I want to save in a new file each time. I also overwrite old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I'm struggling with here is the iteration, so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I was trying various combinations of () [] {} as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?

Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10,
filename = ['string_new.' num2str(j) '.mat'];
disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat

You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration.

sprintf is very useful for this:
for ii=5:12
filename = sprintf('data_%02d.mat',ii)
end
This assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
Notice the zero padding. In general, sprintf is useful whenever you want parameterized, formatted strings.
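To tie this back to the question, the generated name can be passed straight to save inside the loop. A small sketch, assuming each iteration's result lives in a variable called data and that writing into tempdir is acceptable (both are placeholders, not part of the original question):

```matlab
out_dir = tempdir;   % placeholder output folder
for j = 1:3
    data = rand(2) * j;   % stand-in for the data produced in iteration j
    filename = fullfile(out_dir, sprintf('string_new.%d.mat', j));
    save(filename, 'data');   % writes string_new.1.mat, string_new.2.mat, ...
end
```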

For creating a name based on an already existing file, you can use regexp to detect the '_new.(number).mat' and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new\.\d+\.mat$'); % start index of '_new.(number).mat', empty if absent
if isempty(im) % original file, no _new.(j) detected
newname = [original_filename(1:end-4) '_new.1.mat'];
else
num = str2double(original_filename(im(end)+5:end-4));
newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above function, starting with 'data.string.mat'

Matlab - How can I find the lowest common directory of an arbitrary group of files?

I have a cell array of full file names and I want to find the lowest common directory where it makes sense to store accumulated data and what not.
Here is an example hierarchy of test data:
C:\Test\Run1\data1
C:\Test\Run1\data2
C:\Test\Run1\data3
C:\Test\Run2\data1
C:\Test\Run2\data2
.
.
.
In Matlab, the paths are stored in a cell array as follows (each run shares a row):
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
I want to write a routine that outputs the common path C:\Test\Run1 so that I can store relevant plots in a new directory there.
C:\Test\Run1\Accumulation_Plots
C:\Test\Run2\Accumulation_Plots
.
.
.
Previously, I was only concerned with two files in an x-by-2 cell, so the regimen below worked; however, strcmp lost its appeal since I can't (AFAIK) index the whole cell at once.
d = 1;
while strcmp(filePaths{1}(1:d),filePaths{2}(1:d))
d = d + 1;
end
common_directory = filePaths{1}(1:d-1);
mkdir(common_directory,'Accumulation_Plots');

As suggested by @nekomatic, I'm posting my comment as an answer.
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
% Sort the file paths
temp = sort(filePaths(:));
% Take the first and the last one, and split by '\'
first = strsplit(temp{1}, '\');
last = strsplit(temp{end}, '\');
% Compare them up to the depth of the smallest. Find the 'first N matching values'
sizeMin = min(numel(first), numel(last));
N = find(~[cellfun(@strcmp, first(1:sizeMin), last(1:sizeMin)) 0], 1, 'first') - 1;
% Get the smallest common path
commonPath = strjoin(first(1:N), '\');

You just need to compare the first d characters of any path in the array - e.g. path 1 - with the first d characters of the other paths. The longest common base path can't be longer than path 1 and it can't be shorter than the shortest common base path between path 1 and any other path.
There must be several ways you could do that, but a concise one is using strfind to match the strings and cellfun with isempty to check which ones didn't match:
% filePaths should contain at least two paths
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
path1 = filePaths{1};
filePaths = filePaths(2:end);
% find longest common left-anchored substring
d = 1;
while d <= numel(path1) && ~any(cellfun(@isempty, strfind(filePaths, path1(1:d))))
d = d + 1;
end
% find common base path from substring
[common_directory, ~, ~] = fileparts(path1(1:d-1));
The loop leaves d one past the length of the longest common left-anchored substring between the paths, so path1(1:d-1) is that substring. It might still be longer than the common base path; fileparts extracts the actual base path from it.
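The same comparison can also be done per directory component instead of per character, which avoids stopping in the middle of a folder name. A minimal sketch under the assumption that all paths use '\' as separator; the variable names parts and common are made up for this example, and the last component of each path is treated like any other:

```matlab
filePaths = {...
    'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
    'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};

% Split every path into its components ('\\' is the escaped form of a backslash)
parts = cellfun(@(p) strsplit(p, '\\'), filePaths(:), 'UniformOutput', false);

% Start with the components of the first path and shrink as needed
common = parts{1};
for k = 2:numel(parts)
    n = min(numel(common), numel(parts{k}));
    same = strcmp(common(1:n), parts{k}(1:n));
    firstDiff = find(~same, 1);
    if isempty(firstDiff)
        common = common(1:n);             % everything up to n matched
    else
        common = common(1:firstDiff-1);   % keep only the matching head
    end
end
commonPath = strjoin(common, '\\');
```

For the example hierarchy this should leave 'C:\Test' in commonPath. Applying the loop to a single row of filePaths would give the per-run directory instead.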

Matlab | How to load/use files with consecutive names (abc1, abc2, abc3) and then pass on to the next series (cba1, cba2, cba3)?

I have a folder containing a series of data with file names like this:
abc1
abc2
abc3
bca1
bca2
bca3
bca4
bca5
cba1
... etc
My goal is to load all the relevant files for each file name, so all the "abc" files, and plot them in one graph. Then move on to the next file name, and do the same, and so forth. Is there a way to do this?
This is what I currently have to load and run through all the files, grab the data in them and get their name (without the .mat extension) to be able to save the graph with the same filename.
dirName = 'C:\DataDirectory';
files = dir( fullfile(dirName,'*.mat') );
files = {files.name}';
data = cell(numel(files),1);
for i=1:numel(files)
fname = fullfile(dirName,files{i});
disp(fname);
files{i} = files{i}(1:length(files{i})-4);
disp(files{i});
[Rest of script]
end

You already found out about the cool features of dir, and have a cell array files, which contains all file names, e.g.
files =
'37abc1.mat'
'37abc2.mat'
'50bca1.mat'
'50bca2.mat'
'1cba1.mat'
'1cba2.mat'
The main task now is to find all prefixes, 37abc, 50bca, 1cba, ... which are present in files. This can be done using a regular expression (regexp). The Regexp Pattern can look like this:
'([\d]*[\D]*)[\d]*.mat'
i.e. take any number of digits ([\d]*), then any number of non-digit characters ([\D]*), and keep those (by putting them in parentheses). Next, there will be any number of digits ([\d]*), followed by the text .mat.
We call the regexp function with that pattern:
pre = regexp(files,'([\d]*[\D]*)[\d]*.mat','tokens');
resulting in a cell array (one cell for each entry in files), where each cell contains another cell array with the prefix of that file. To convert this to a simple not-nested cell array, we call
pre = [pre{:}];
pre = [pre{:}];
resulting in
pre =
'37abc' '37abc' '50bca' '50bca' '1cba' '1cba'
To remove duplicate entries, we use the unique function (which also sorts the result):
pre = unique(pre);
pre =
'1cba' '37abc' '50bca'
which leaves us with all prefixes that are present. Now you can loop through each of these prefixes and apply your stuff. Everything put together is:
% Find all files
dirName = 'C:\DataDirectory';
files = dir( fullfile(dirName,'*.mat') );
files = {files.name}';
% Find unique prefixes
pre = regexp(files,'([\d]*[\D]*)[\d]*.mat','tokens');
pre = [pre{:}]; pre = [pre{:}];
pre = unique(pre);
% Loop through prefixes
for ii=1:numel(pre)
% Get files with this prefix
curFiles = dir(fullfile(dirName,[pre{ii},'*.mat']));
curFiles = {curFiles.name}';
% Loop through all files with this prefix
for jj=1:numel(curFiles)
% Here the magic happens
end
end
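As a variation on the steps above, the grouping can be done without calling dir a second time by using the third output of unique. This is only a sketch: the file list is typed in by hand instead of coming from dir, and the anchored pattern with an escaped dot is a slightly stricter version of the one used above.

```matlab
% Hypothetical file list standing in for {files.name}'
files = {'37abc1.mat'; '37abc2.mat'; '50bca1.mat'; '50bca2.mat'; '1cba1.mat'; '1cba2.mat'};

% One token per file: leading digits plus the non-digit part
tok = regexp(files, '^(\d*\D*)\d*\.mat$', 'tokens', 'once');
prefixes = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% groupIdx(k) is the position of files{k}'s prefix within pre
[pre, ~, groupIdx] = unique(prefixes);

for ii = 1:numel(pre)
    curFiles = files(groupIdx == ii);   % all files sharing this prefix
    fprintf('%s: %s\n', pre{ii}, strjoin(curFiles', ', '));
end
```

Inside the loop, curFiles holds the names to load and plot together for one prefix.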

Sorry, I misunderstood your question, I found this solution:
file = dir('*.mat')
matching = regexp({file.name}, '^[a-zA-Z_]+[0-9]+\.mat$', 'match', 'once'); %// Match once on file name, must be a series of A-Z a-z chars followed by numbers.
matching = matching(~cellfun('isempty', matching));
str = unique(regexp(matching, '^[a-zA-Z_]*', 'match', 'once'));
str = str(~cellfun('isempty', str));
group = cell(size(str));
for is = 1:length(str)
ismatch = strncmp(str{is}, matching, length(str{is}));
group{is} = matching(ismatch);
end
Answer came from this source: Matlab Central

Create a list and append file names to this list

I am trying to append filenames in a directory to a list for later processing. The code below does not work.
files = dir( fullfile(home,'*.csv') );
files = {files.name}'; %'# file names
symbolsList = [];
filedata = cell(numel(files),1); %# store file contents
for i=1:numel(files)
[pathstr,name,ext] = fileparts(files{i});
symbolsList(end + 1) = name; % THIS GIVES ERROR
end

In your code, symbolsList starts out as an empty array, and the statement where the error appears tries to store an entire name in a single element of it. You are probably getting a dimension mismatch error because a name will most likely have more than one character, yet you are trying to fit many characters into a single spot in that array. That's probably not what you want.
You want each "space" to have a name. Because each name will most likely not have the same amount of characters, you should probably use a cell array instead:
files = dir( fullfile(home,'*.csv') );
files = {files.name}'; %'# file names
symbolsList = cell(numel(files),1); %// Change
filedata = cell(numel(files),1); %# store file contents
for i=1:numel(files)
[pathstr,name,ext] = fileparts(files{i});
symbolsList{i} = name; %// Change
end
Take note that I've pre-allocated the cell array and for each file you want to look at, I've indexed into the right cell and placed the name there. This is preferred over concatenation primarily due to efficiency. To access the ith name, simply do:
name_to_choose = symbolsList{i};
Minor Note
filedata in your code isn't being used anywhere at all. Are you sure you put all of your code up?
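If the loop is only there to strip the extensions, cellfun can probably do the same in one call. A short sketch with made-up file names in place of the dir output; 'UniformOutput', false is needed because the names differ in length:

```matlab
% Hypothetical file names standing in for {files.name}'
files = {'AAPL.csv'; 'MSFT.csv'; 'GOOG.csv'};

% fileparts returns [path, name, ext]; only the name is kept here
[~, symbolsList] = cellfun(@fileparts, files, 'UniformOutput', false);
```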

What do I have to add at the beginning of this loop?

How can I read the following files using a for loop? (Can the loop ignore the characters in the filenames?)
abc-1.TXT
cde-2.TXT
ser-3.TXT
wsz-4.TXT
aqz-5.TXT
iop-6.TXT
What do I have to add at the beginning of this loop ??
for i = 1:1:6
nom_fichier = strcat(['MyFile\.......' num2str(i) '.TXT']);

You can avoid constructing the filenames by using the DIR command. For instance:
myfiles = dir('*.txt');
for i = 1:length(myfiles)
nom_fichier = myfiles(i).name;
...do processing here...
end
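If the loop should still run over the numbers 1 to 6, a wildcard can stand in for the unknown letters in each name. The sketch below is self-contained only because it first writes a few demo files into a temporary folder; the folder name MyFile_demo and the file contents are assumptions made for illustration.

```matlab
% Create a few demo files (placeholder data) in a temporary folder
folder = fullfile(tempdir, 'MyFile_demo');
if ~exist(folder, 'dir'), mkdir(folder); end
names = {'abc-1.TXT', 'cde-2.TXT', 'ser-3.TXT'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, '%d\n', k);
    fclose(fid);
end

% Look up each file by its number, ignoring the letters in front
found = cell(1, numel(names));
for i = 1:numel(names)
    match = dir(fullfile(folder, sprintf('*-%d.TXT', i)));
    if isempty(match)
        continue   % no file with this number
    end
    found{i} = match(1).name;
    nom_fichier = fullfile(folder, found{i});
    data = load(nom_fichier);   % reads the numeric text file
end
```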

First of all, why would you use strcat here? This is, by itself, a SINGLE string. All concatenation has already been done by the brackets [].
['MyFile\.......' num2str(i) '.TXT']
Next, I'm not certain what is your question here. Is it how to load in the data? If the files are simply delimited numbers, with the same number of them on each line, then load will suffice to load them in, or perhaps you may need textread.
My guess is you do not know how to build the main part of the file name. You might do it this way:
Names = {'abc' 'cde' 'ser' 'wsz' 'aqz' 'iop'};
for i = 1:6
fn = ['MyFile',filesep,Names{i},'-',num2str(i),'.TXT'];
data = load(fn);
% do other stuff ...
end
If you don't want to create a variable with the names by typing them in, then use dir, perhaps like this to create a list of text file names:
Names = dir('MyFile\*.TXT');
