Find duplicates in the MATLAB path

Duplicates in the MATLAB path are a hassle, because you cannot control which one gets executed. A first step in handling duplicates is to find them. How can I find duplicate .m files in my MATLAB path?

Well, in itself it's not a herculean task. We just have to list all .m files in the path and find multiple occurrences of the same file name. We can use a mix of the path, what, and unique functions.
function find_duplicate()
P=path;
P=strsplit(P, pathsep());
% mydir='/home/myusername/matlabdir';
% P=P(strncmpi(mydir,P,length(mydir)));
P=cellfun(@(x) what(x),P,'UniformOutput',false);
P=vertcat(P{:});
Q=arrayfun(@(x) x.m,P,'UniformOutput',false); % Q is a cell of cells of strings
Q=vertcat(Q{:});
R=arrayfun(@(x) repmat({x.path},size(x.m)),P,'UniformOutput',false); % R is a cell of cells of strings
R=vertcat(R{:});
[C,ia,ic]=unique(Q);
for c=1:numel(C)
ind=strcmpi(C{c},Q);
if sum(ind)>1
fprintf('duplicate %s at paths\n\t',C{c});
fprintf('%s\n\t',R{ind});
fprintf('\n');
end
end
end
Rather than handling the complete MATLAB path, one can restrict the search for duplicates to one's own folder. To do that, uncomment the two commented lines (the definition of mydir and the filter that follows it) and replace the directory name with one of your choice.
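The core of the function is the grouping step. As a minimal sketch with made-up data (the names and folders arrays below are hypothetical stand-ins for Q and R), the same idea can be written with the third output of unique plus accumarray, which counts occurrences in one pass instead of comparing every unique name against the whole list. Note that this sketch compares names case-sensitively, whereas the function above uses strcmpi.

```matlab
% Hypothetical file names and the folders they were found in
names   = {'foo.m'; 'bar.m'; 'foo.m'; 'baz.m'; 'bar.m'};
folders = {'/a';    '/a';    '/b';    '/b';    '/c'};

% groupIdx(k) is the position of names{k} in the sorted list uniqueNames
[uniqueNames, ~, groupIdx] = unique(names);
counts   = accumarray(groupIdx(:), 1);   % number of occurrences per unique name
dupNames = uniqueNames(counts > 1);      % names that occur more than once

for k = 1:numel(dupNames)
    locs = folders(strcmp(dupNames{k}, names));   % folders holding this name
    fprintf('duplicate %s at paths\n', dupNames{k});
    fprintf('\t%s\n', locs{:});
end
```

For a single known function name, which(name, '-all') also lists every location MATLAB can see.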

To analyze a given folder (recursively), you can proceed as follows.
folder = 'C:\Users\Luis\Desktop'; %// folder to be analyzed
[ success, files, id ] = fileattrib([folder filesep '*']); %// this is recursive; uses the folder defined above
[fullNames{1:numel(files)}] = deal(files.Name);
isMFile = cellfun(@(s) all(s(end-1:end)=='.m'), fullNames);
fullNames = fullNames(isMFile); %// keep only m-files
F = numel(fullNames);
start = cellfun(@(s) find(s==filesep,1,'last'), fullNames);
names = arrayfun(@(k) fullNames{k}(start(k)+1:end), 1:F, 'uni', 0); %// file name
[ii, jj] = ndgrid(1:F); %// generate all pairs
equal = arrayfun(@(n) strcmp(names{ii(n)},names{jj(n)}), 1:F^2); %// test each
%// pair of files
equal = reshape(equal,F,F) - eye(F); %// equality with oneself doesn't count
isDuplicate = any(equal); %// it is a duplicate if it has some equal file
duplicates = fullNames(isDuplicate); %// cell array with full names of duplicates
To test the whole path, use the above code in a loop over all folders in the path. You can do it along the following lines (I haven't tested it, as I don't have the strsplit function):
p = path;
p = strsplit(p, pathsep); %// pathsep is ';' on Windows and ':' on Unix-like systems
duplicates = {};
for kk = 1:numel(p) %// loop over every folder, not only the last one
folder = p{kk};
[ success, files, id ] = fileattrib([folder filesep '*']);
if ~success, continue, end %// skip folders that cannot be listed
fullNames = {files.Name}; %// rebuild the list on each pass so no stale entries remain
isMFile = cellfun(@(s) all(s(end-1:end)=='.m'), fullNames);
fullNames = fullNames(isMFile);
F = numel(fullNames);
start = cellfun(@(s) find(s==filesep,1,'last'), fullNames);
names = arrayfun(@(k) fullNames{k}(start(k)+1:end), 1:F, 'uni', 0);
[ii, jj] = ndgrid(1:F);
equal = arrayfun(@(n) strcmp(names{ii(n)},names{jj(n)}), 1:F^2);
equal = reshape(equal,F,F) - eye(F);
isDuplicate = any(equal);
duplicates = [duplicates, fullNames(isDuplicate)]; %// append to previous ones
end

Related

Matlab: Error using readtable (line 216) Input must be a row vector of characters or string scalar

I got the error "Error using readtable (line 216) Input must be a row vector of characters or string scalar" when I tried to run this code in MATLAB:
clear
close all
clc
D = 'C:\Users\Behzad\Desktop\New folder (2)';
filePattern = fullfile(D, '*.xlsx');
file = dir(filePattern);
x={};
for k = 1 : numel(file)
baseFileName = file(k).name;
fullFileName = fullfile(D, baseFileName);
x{k} = readtable(fullFileName);
fprintf('read file %s\n', fullFileName);
end
% allDates should be out of the loop because it's not necessary to be in the loop
dt1 = datetime([1982 01 01]);
dt2 = datetime([2018 12 31]);
allDates = (dt1 : calmonths(1) : dt2).';
allDates.Format = 'MM/dd/yyyy';
% 1) pre-allocate a cell array that will store
% your tables (see note #3)
T2 = cell(size(x)); % this should work, I don't know what x is
% the x is xlsx files and have different sizes, so I think it should be in
% a loop?
% creating loop
for idx = 1:numel(x)
T = readtable(x{idx});
% 2) This line should probably be T = readtable(x(idx));
sort = sortrows(T, 8);
selected_table = sort (:, 8:9);
tempTable = table(allDates(~ismember(allDates,selected_table.data)), NaN(sum(~ismember(allDates,selected_table.data)),size(selected_table,2)-1),'VariableNames',selected_table.Properties.VariableNames);
T2 = outerjoin(sort,tempTable,'MergeKeys', 1);
% 3) You're overwriting the variabe T2 on each iteration of the i-loop.
% to save each table, do this
T2{idx} = fillmissing(T2, 'next', 'DataVariables', {'lat', 'lon', 'station_elevation'});
end
Here x holds the contents of each xlsx file read in the first loop. My xlsx files have different numbers of rows and columns. I want the second loop to process all the xlsx files in the directory.
Do you know what the problem is, and how to fix it?
readtable expects a file name as its input and returns a table. In your code you have the following:
x{k} = readtable(fullFileName);
All fine: you are reading the tables and storing their contents in x. Later in your code you continue with:
T = readtable(x{idx});
You have already read the table, so what you wrote is effectively T = readtable(readtable(fullFileName)), which passes a table where a file name is expected. Just use T = x{idx};

Save matrices of a loop iteration in one matrix

I have a loop that produces a 100x10 matrix in every iteration, and I want to save all the matrices from this loop in one matrix. Assuming the loop has 5 iterations, I want to end up with a 500x10 matrix (after appending all 5 matrices from the loop).
for ii = 1:numfiles
str = fullfile(PathName,FileName{ii});
file_id = fopen(str);
data = fread (file_id)';
....
s = zeros (100, 10);
%doing some stuffs
save('s_all', 's','-append');
end
I have used save('s_all', 's','-append');
but it doesn't append the matrices. How can I do that?
As you can read in the documentation:
save(filename,variables,'-append') adds new variables to an existing file. If a variable already exists in a MAT-file, then save overwrites it with the value in the workspace.
Therefore, save only adds a new variable to the .mat file; it does not append data to the end of a variable that is already stored in the file.
Solution 1:
To write the matrix to a file, it is better to use dlmwrite, like the following:
dlmwrite(filename,s,'-append');
You can find more details here.
In a complete case you can do:
filename = 's_all.csv';
for ii = 1:numfiles
str = fullfile(PathName,FileName{ii});
file_id = fopen(str);
data = fread (file_id)';
% ...
s = zeros (100, 10);
%doing some stuffs
dlmwrite(filename,s,'-append');
end
Solution 2:
The other solution is to load the stored matrix on each iteration, attach the new matrix to it, and then save the result back to the file.
filename = 'file.mat';
% suppose originMatrix is an empty matrix or a matrix with columns size 10
for ii = 1:numfiles
load(filename,'originMatrix');
s = zeros (100, 10);
%doing some stuffs
originMatrix = [originMatrix; s];
save(filename,'originMatrix','-append');
end
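If the combined matrix fits in memory, a third option is to build it in the workspace and call save once after the loop, which avoids reloading the file on every iteration. A minimal sketch, assuming five iterations and using a placeholder in place of the real per-file computation (numfiles, rowsPer, cols and the output location are assumptions made for the example):

```matlab
numfiles = 5;                            % assumed number of iterations
rowsPer  = 100;                          % rows produced per iteration
cols     = 10;                           % columns produced per iteration

s_all = zeros(numfiles*rowsPer, cols);   % preallocate the full 500x10 result
for ii = 1:numfiles
    s = ii * ones(rowsPer, cols);        % placeholder for the real computation
    rows = (ii-1)*rowsPer + (1:rowsPer); % row block that belongs to iteration ii
    s_all(rows, :) = s;
end

% Write the combined matrix once, after the loop (temporary folder used here)
save(fullfile(tempdir, 's_all.mat'), 's_all');
```

Preallocating s_all also avoids growing the matrix on every pass, which matters when the number of files is large.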

How to force Matlab to read files in a folder serially?

I have files in a folder that are numbered from writer_1 to writer_20. I wrote a code to read all the files and store them in cells. But the problem is that the files are not read serially.
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []; %Remove these two from list
training = [];
for i = 1:length(folders)
current_folder = [Path_training folders(i).name '\'];
.
.
.
.
.
Here folders(1).name is writer_1 and folders(2).name is writer_10
I know that dir will return its results as the explorer does but is there any way to force it to go numerically?
I'm training an SVM based on these numbers and this problem is making it difficult.
I don't know of any direct solutions to the problem you are having.
I found a solution for a problem similar to yours, here
List = dir('*.png');
Name = {List.name};
S = sprintf('%s,', Name{:}); % '10.png,100.png,1000.png,20.png, ...'
D = sscanf(S, '%d.png,'); % [10; 100; 1000; 20; ...]
[sortedD, sortIndex] = sort(D); % Sort numerically
sortedName = Name(sortIndex); % Apply sorting index to original names
Differences are:
You are dealing with directories instead of files
Your directories have other text in their names in addition to the numbers
Approach #1
%// From your code
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
%// Convert folders struct to a cell array with all of the data from dir
folders_cellarr = struct2cell(folders)
%// Get filenames
fn = folders_cellarr(1,:)
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// Re-arrange folders based on the sorted indices
folders = folders(ind)
Approach #2
If you would like to avoid struct2cell, here's an alternative approach -
%// Get filenames
fn = cellstr(ls(Path_training))
fn(ismember(fn,{'.', '..'}))=[]
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// List directory and re-arrange the elements based on the sorted indices
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
folders = folders(ind)
Please note that strjoin is a recent addition to MATLAB Toolbox. So, if you are on an older version of MATLAB, here's the source code link from MATLAB File-exchange.
Stealing a bit from DavidS and with the assumption that your folders all are of the form "writer_XX" with XX being digits.
folders = dir([pwd '\temp']);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
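% extract the number from each name; 'once' returns a cell array of strings
% rather than a cell array of cells, which str2double would turn into NaN
foldersNumCell = regexp({folders.name}, '\d+', 'match', 'once');
% convert from cell array of strings to double
foldersNumber = str2double(foldersNumCell);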
% get sort order
[garbage,sortI] = sort(foldersNumber);
% rearrange the structure
folders = folders(sortI);
The advantage of this is that it avoids a for loop. In reality, though, it only makes a difference if you have tens of thousands of folders. (I created 50,000 folders labeled 'writer_1' to 'writer_50000'. The difference in execution time was about 1.2 seconds.)
Here is a slightly different way (edited to fix a bug and to implement the suggestion of @Divakar to eliminate the for loop)
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
%// Get folder numbers as cell array of strings
folder_nums_cell = regexp({folders.name}, '\d*', 'match');
%// Convert cell array to vector of numbers
folder_nums = str2double(vertcat(folder_nums_cell{:}));
%// Sort original folder array
[~,inds] = sort(folder_nums);
folders = folders(inds);
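To see the effect of the numeric sort without touching the file system, here is a minimal sketch on a hard-coded list of names (the list is hypothetical, and it assumes each name contains exactly one run of digits):

```matlab
% Names in the order dir would typically return them
names = {'writer_1', 'writer_10', 'writer_11', 'writer_2', 'writer_20', 'writer_3'};

% 'once' gives one string per name instead of a nested cell
numStr = regexp(names, '\d+', 'match', 'once');
nums   = str2double(numStr);      % numeric value of each suffix

[~, order]  = sort(nums);         % numeric sort order
sortedNames = names(order);       % writer_1, writer_2, writer_3, writer_10, ...
```

The same order vector can be used to index the struct array returned by dir, as in folders = folders(order).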

Matlab: Renaming files in a folder sequentially

If there are the following files in the folder C:\test\:
file1.TIF, file2.TIF .... file100.TIF
Can MATLAB automatically rename them to:
file_0001.TIF, file_0002.TIF, .... file_0100.TIF?
No-loop approach -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be prepended to each filename (4 digits in total, as in file_0001)
new_filename = strcat('file_',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
Edit 1:
For a case when you have the filenames as file27.TIF, file28.TIF, file29.TIF and so on and you would like to rename them as file0001.TIF, file0002.TIF, file0003.TIF and so on respectively, try this -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
file_ID_doublearr = str2double(file_ID)
file_ID_doublearr = file_ID_doublearr - min(file_ID_doublearr)+1
file_ID = strtrim(cellstr(num2str(file_ID_doublearr)))
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be prepended to each filename
new_filename = strcat('file',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
A slightly more robust method:
dirlist = dir(fullfile(mypath,'*.TIF'));
fullnames = {dirlist.name}; % Get rid of one layer of cell array-ness
[~,fnames,~] = cellfun(@fileparts,fullnames,'UniformOutput',false); % Create cell array of the file names from the output of dir()
fnums = cellfun(@str2double,regexprep(fnames,'[^0-9]','')); % Delete any character that isn't a digit, returns the numbers as a vector of doubles
fnames = regexprep(fnames,'[0-9]',''); % Delete any character that is a number
for ii = 1:length(dirlist)
newname = sprintf('%s_%04d.TIF',fnames{ii},fnums(ii)); % Create new file name
oldfile = fullfile(mypath,dirlist(ii).name); % Generate full path to old file
newfile = fullfile(mypath,newname); % Generate full path to new file
movefile(oldfile, newfile); % Rename the files
end
Though this will accommodate filenames of any length, it does assume that there are no digits in the filename other than the counter at the end. MATLAB likes to throw things into nested cell arrays, so I incorporated cellfun in a couple of places to bring things into more manageable formats. It also allows us to vectorize some of the code.
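The zero padding itself can be left to sprintf's %04d format. A minimal sketch that only builds the new names from a hypothetical list and does not touch any files (it assumes every name is a non-digit prefix followed by a counter):

```matlab
oldNames = {'file1.TIF', 'file27.TIF', 'file100.TIF'};   % hypothetical names
newNames = cell(size(oldNames));

for k = 1:numel(oldNames)
    [~, base, ext] = fileparts(oldNames{k});              % e.g. 'file27' and '.TIF'
    tok = regexp(base, '^(\D*)(\d+)$', 'tokens', 'once'); % {prefix, digits}
    % %04d pads the counter with leading zeros to four digits
    newNames{k} = sprintf('%s_%04d%s', tok{1}, str2double(tok{2}), ext);
end
```

Each pair oldNames{k}, newNames{k} could then be passed to movefile together with the folder, as in the answers above.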

Subset folder contents Matlab

I have about 1500 images within a folder named 3410001ne => 3809962sw. I need to subset about 470 of these files to process with Matlab code. Below is the section of code prior to my for loop which lists all of the files in a folder:
workingdir = 'Z:\project\code\';
datadir = 'Z:\project\input\area1\';
outputdir = 'Z:\project\output\area1\';
cd(workingdir) %points matlab to directory containing code
files = dir(fullfile(datadir, '*.tif'))
fileIndex = find(~[files.isdir]);
for i = 1:length(fileIndex)
fileName = files(fileIndex(i)).name;
Files also have ordinal directions attached (e.g. 3410001ne, 3410001nw), however, not all directions are associated with each root. How can I subset the folder contents to include 470 of 1500 files ranging from 3609902sw => 3610032sw? Is there a command where you can point Matlab to a range of files in a folder, rather than the entire folder? Thanks in advance.
Consider the following:
%# generate all possible file names you want to include
ordinalDirections = {'n','s','e','w','ne','se','sw','nw'};
includeRange = 3609902:3610032;
s = cellfun(@(d) cellstr(num2str(includeRange(:),['%d' d '.tif'])), ... %# include the extension so the names match the dir output
ordinalDirections, 'UniformOutput',false);
s = sort(vertcat(s{:}));
%# get image filenames from directory
files = dir(fullfile(datadir, '*.tif'));
files = {files.name};
%# keep only subset of the files matching the above
files = files(ismember(files,s));
%# process selected files
for i=1:numel(files)
fname = fullfile(datadir,files{i});
img = imread(fname);
end
Something like this maybe could work.
list = dir(fullfile(datadir,'*.tif')); %get list of files
fileNames = {list.name}; % Make cell array with file names
%Make cell array with the range of wanted files.
wantedNames = arrayfun(@num2str,3609902:3610032,'uniformoutput',0);
%Loop through the list with filenames and compare to wantedNames.
idx = cell(numel(fileNames),numel(wantedNames)); % preallocate
for i=1:length(fileNames)
% Each row in idx will be as long as wantedNames. A cell is empty if
% fileNames{i} does not match and holds the match position otherwise.
idx(i,:) = regexp(fileNames{i},wantedNames);
end
idx = ~(cellfun('isempty',idx)); %look for nonempty cells.
idx = logical(sum(idx,2)); %Sum each row; true where the file matched some wanted name
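Another route is to read the leading number from each file name and compare it against the range directly, which avoids building the list of all possible names. A minimal sketch on a hypothetical list of names (it assumes every name starts with the numeric root; names without leading digits become NaN and are dropped by the comparison):

```matlab
% Hypothetical file names as dir would return them
fileNames = {'3609901ne.tif', '3609902sw.tif', '3609950n.tif', ...
             '3610032sw.tif', '3610033se.tif'};

% Leading run of digits of each name, converted to a number
ids = str2double(regexp(fileNames, '^\d+', 'match', 'once'));

% Keep only the names whose root lies in the wanted range
inRange = ids >= 3609902 & ids <= 3610032;
subset  = fileNames(inRange);
```

With the real folder, fileNames would come from {files.name} after the call to dir.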