I have a folder structure in which I want to detect duplicate folder names, e.g.:
C:\APP10001.001
C:\APP10001.002
C:\APP10002.001
C:\APP10003.002
C:\APP10003.003
C:\APP10003.004
C:\APP10004.002
In this case I want to detect that there are folders with the same name (without the folder extension), namely C:\APP10001 and C:\APP10003.
Output should look like:
APP10001 2
APP10003 3
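One possible way to get that output, sketched in Python since the question does not name a language. It assumes the "extension" is everything after the last dot; the helper name duplicate_bases is made up for this illustration.

```python
from collections import Counter


def duplicate_bases(folder_names):
    """Count folder names by their part before the last dot; keep counts > 1."""
    counts = Counter(name.rsplit(".", 1)[0] for name in folder_names)
    return {base: n for base, n in counts.items() if n > 1}


# Folder names from the question, without the C:\ prefix.
names = [
    "APP10001.001", "APP10001.002", "APP10002.001", "APP10003.002",
    "APP10003.003", "APP10003.004", "APP10004.002",
]

for base, n in sorted(duplicate_bases(names).items()):
    print(base, n)
```

For a real drive, the list could be built with os.listdir on the parent folder, filtered to directories.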
Let's say we have a variable holding a file name, like the examples below:
file1_name_cr_001.csv
file2_name1_name2.nn.123.456_updt_000.csv
filename_2012.444.1234_utc_del_004.csv
The last 8 characters always have a fixed length (e.g. _001.csv, _000.csv, _004.csv). We only need to extract the values cr, updt, and del.
How can we get everything before _cr, _updt, or _del as a single value?
Any suggestions?
The output should look like this:
file1_name/cr/001
file2_name1_name2.nn.123.456/updt/000
filename_2012.444.1234_utc/del/004
I have reproduced the above and got the below results.
First, I took a sample file name in set variable.
Then I took the string from the start up to length - 8 and stored it in a variable (before_8):
@substring(variables('sample'),0,sub(length(variables('sample')),8))
For the end folder:
@replace(split(substring(variables('sample'),sub(length(variables('sample')),8), 8),'.')[0],'_','')
For the start folder:
@substring(variables('before_8'), 0, lastIndexOf(variables('before_8'), '_'))
For the middle folder:
@split(variables('before_8'), '_')[sub(length(split(variables('before_8'), '_')), 1)]
Resulting folder structure:
@concat(variables('start'),'/',variables('middle'),'/',variables('end'))
Use this variable as the folder path in the copy activity and it will generate the folder structure for you.
For multiple file names, first store all the file names in an array, then use a ForEach and perform the same operations inside it.
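To check the slicing logic outside the pipeline, the same steps can be mirrored in a few lines of Python. This is only a sketch for verifying the expected output, not pipeline code; the function name to_folder_path is made up here, and it relies on the stated assumption that the last 8 characters are always of the form _NNN.csv.

```python
def to_folder_path(filename):
    """Mirror the expression steps: start / middle / end."""
    before_8 = filename[:-8]                           # everything except "_NNN.csv"
    end = filename[-8:].split(".")[0].replace("_", "")  # "NNN"
    start, _, middle = before_8.rpartition("_")         # split on the last underscore
    return f"{start}/{middle}/{end}"


samples = [
    "file1_name_cr_001.csv",
    "file2_name1_name2.nn.123.456_updt_000.csv",
    "filename_2012.444.1234_utc_del_004.csv",
]

# Equivalent of the ForEach over an array of file names.
for name in samples:
    print(to_folder_path(name))
```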
I have a perl script that reads a .txt and a .bam file, and creates an output called output.txt.
I have a lot of files that are all in different folders, but are only slightly different in the filename and directory path.
All of my txt files are in different subfolders called PointMutation, with the full path being
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
The text in the brackets is the part that changes, but all of my txt files are somewhere under the Patient folder.
My .bam file is located in a subfolder named DNA with a full path of
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA
Currently I run the script from the terminal like this:
cd /Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
perl ~/Desktop/Scripts/Perl.pl "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation/txtfile.txt" "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA/bamfile.bam"
With only one or two files that is fairly easy, but I would like to automate it as the number of files grows. Also, once I have run a folder I don't want to run it again, even though I will keep getting more data for the same patient. Is there a way to block a folder from being read?
I would do something like:
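use strict;
use warnings;

# The part of the path that changes spans two levels (e.g. "Plate 1/P1H10"),
# so glob two directory levels below Patient.
for my $dir (glob "/Volumes/Lab/Data/Darwin/Patient/*/*/") {
    # skip if not a directory
    next unless -d $dir;
    # $dir already ends with a slash
    my $txt = "${dir}PointMutation/txtfile.txt";
    my $bam = "${dir}SequencingData/DNA/bamfile.bam";
    # ... your magical stuff here
}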
This is assuming that all directories under /Volumes/Lab/Data/Darwin/Patient/ follow the convention.
That said, a more robust long-term way of organizing analyses with lots of different files all over the place is either 1) to keep all files needed for each analysis under one directory, or 2) to create meta files (I'd use JSON or YAML) that list the necessary file names.
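As for not re-running folders that are already done, one option is to keep a record of processed sample directories and skip them when building the list of runs. The sketch below shows that idea in Python rather than Perl; plan_runs and the sample name P1H11 are hypothetical, and the record could live in a JSON meta file as suggested above.

```python
from pathlib import PurePosixPath


def plan_runs(sample_dirs, already_done):
    """Return (txt, bam) path pairs for sample directories not yet processed."""
    runs = []
    for sample in sample_dirs:
        if sample in already_done:
            continue  # processed in an earlier session, skip it
        base = PurePosixPath(sample)
        txt = base / "PointMutation" / "txtfile.txt"
        bam = base / "SequencingData" / "DNA" / "bamfile.bam"
        runs.append((str(txt), str(bam)))
    return runs


root = "/Volumes/Lab/Data/Darwin/Patient/Plate 1"
samples = [f"{root}/P1H10", f"{root}/P1H11"]  # P1H11 is a made-up second sample
done = {f"{root}/P1H10"}                      # e.g. loaded from a JSON file

for txt, bam in plan_runs(samples, done):
    print(txt, bam)
```

After each successful run, the sample directory would be added to the record and the record saved, so the next session skips it.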
I'd like to remove '-2' from the filenames looking like this:
EID-NFBSS-2FE454B7-2_TD.eeg
EID-NFBSS-2FE454B7-2_TD.vhdr
EID-NFBSS-2FE454B7-2_TD.vmrk
EID-NFBSS-3B3BF9FA-2_BU.eeg
EID-NFBSS-2FE454B7-2_PO.txt
As you can see, the file names differ and there are several kinds of extensions as well. All I want to do is remove '-2' from all of the filenames. I was trying to use this:
pattern = '-2';
replacement = '';
regexprep(filename,pattern,replacement)
and I got the results in the console, but after many attempts I have no idea how to tell MATLAB to rename the files in place.
@excaza hit it right on the money. You'll have to probe your desired directory for a list of files via dir, then loop through each filename, strip the -2 from it, and use movefile to rename the file. movefile moves the file rather than copying it, so there is no old file left to delete afterwards.
Something like this comes to mind:
%// Get all files in this directory
d = fullfile('path', 'to', 'folder', 'here');
directory = dir(d);
%// For each entry in this directory...
for ii = 1 : numel(directory)
    %// Skip subfolders (this also skips '.' and '..')
    if directory(ii).isdir
        continue;
    end
    %// Get the relative filename
    name = directory(ii).name;
    %// Remove the -2 that sits right before the underscore. A bare '-2'
    %// pattern would also hit the '-2' in IDs such as '-2FE454B7'.
    name_after = regexprep(name, '-2(?=_)', '');
    %// If the string has changed after this...
    if ~strcmp(name, name_after)
        %// Get the absolute path to both the original file and
        %// the new file name
        fullname = fullfile(d, name);
        fullname_after = fullfile(d, name_after);
        %// Rename the file (movefile leaves no copy behind)
        movefile(fullname, fullname_after);
    end
end
The logic behind this is quite simple. First, the string d determines the directory where you want to search for files. fullfile is used to construct your path from its parts. This matters because it keeps the code platform agnostic: the character that separates directories differs between operating systems. For example, on Windows the character is \ while on Mac OS and Linux it's /. I don't know which platform you're running, so fullfile is going to be useful here. Simply take each part of your directory and pass them as separate strings into fullfile, replacing 'path', 'to', 'folder', 'here' with your desired path.
Now, use dir to find all files in the directory of your choice. Next, we iterate over all of the entries, skipping any subfolders. For each file, we get the relative filename. dir returns information about each file, and the field that matters most here is name. However, this field is relative, which means it holds only the filename itself, without the full path to where the file is stored. After that, we use regexprep, as you have already done, to strip the -2. The pattern '-2(?=_)' only matches a -2 that is immediately followed by an underscore, so the -2 at the start of an ID such as 2FE454B7 is left alone.
The next point is important. If the filename has changed, we rename the file on disk. fullfile helps establish the absolute paths to both the old and the new file name, in the off chance that you are running this script from a directory that doesn't contain the files you're looking for. movefile then renames the file in one step; because it moves rather than copies, nothing is left over to delete.
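One thing worth checking before renaming anything is which part of each name the pattern actually touches, since the sample names contain -2 in two places. The snippet below is a Python sketch of that check, not MATLAB; strip_marker and rename_all are names invented for this illustration, and the pattern assumes the unwanted -2 always sits right before an underscore.

```python
import re
from pathlib import Path


def strip_marker(name):
    """Remove a '-2' only when it is immediately followed by an underscore."""
    return re.sub(r"-2(?=_)", "", name)


def rename_all(folder):
    """Rename every file in folder whose name changes under strip_marker."""
    for entry in Path(folder).iterdir():
        if entry.is_file():
            new_name = strip_marker(entry.name)
            if new_name != entry.name:
                entry.rename(entry.with_name(new_name))


# Dry run on the names from the question: print old and new side by side.
for old in ["EID-NFBSS-2FE454B7-2_TD.eeg", "EID-NFBSS-3B3BF9FA-2_BU.eeg"]:
    print(old, "->", strip_marker(old))
```

A plain replacement of every -2 would also eat the start of the ID 2FE454B7, which is why the lookahead is used here.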
I apologize in advance that this question is not specific. But my goal is to take a bunch of image files, which are currently named as: 0.tif, 1.tif, 2.tif, etc... and rename them just as numbers to 000.tif, 001.tif, 002.tif, ... , 010.tif, etc...
The reason I want to do this is that I am trying to load the images into MATLAB for batch processing, but MATLAB does not order them correctly. I use the dir command as dir('*.tif') to get all the images and load them into an array of files that I can iterate over and process, but in this array element 1 is 0.tif, element 2 is 1.tif, element 3 is 10.tif, element 4 is 100.tif, and so on.
I want to keep the ordering of the elements as I process them. However, I do not care if I have to change the order of the elements BEFORE processing them (i.e. I can make it work to rename, for example, 2.tif to 10.tif if I had to) but I am looking for a way to convert the file names the way I initially described.
If there is a better way to get MATLAB to properly order the files when it loads them into the array using dir, please let me know, because that would be much easier.
Thanks!!
You can do this without having to rename the files, if you want. When you grab the files using dir, you'll have a list of files like so:
files =
'0.tif'
'1.tif'
'10.tif'
...
You can grab just the numeric part using regexp:
nums = regexp(files,'\d+','match');
nums = str2double([nums{:}]);
nums =
0 1 10 11 12 ...
regexp returns its matches as a cell array; the second line converts them to actual numbers.
We can now get an actual numeric order by sorting the resulting array:
[~,order] = sort(nums);
and then put the files in the correct order:
files = files(order);
This should (I haven't tested it, I don't have a folder full of numerically labelled files handy) produce a list of files like so:
files=
'0.tif'
'1.tif'
'2.tif'
'3.tif'
...
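The same idea (sort by the number embedded in the name instead of by the text) can be tried quickly in Python to see the ordering it produces. This is a sketch of the technique, not MATLAB code; numeric_key is a name made up here, and names without digits are simply sorted first.

```python
import re


def numeric_key(name):
    """Sort key: the first run of digits in the name, as an integer."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else -1


# The order a plain text sort gives, as described in the question.
files = ["0.tif", "1.tif", "10.tif", "100.tif", "2.tif", "3.tif"]

ordered = sorted(files, key=numeric_key)
print(ordered)
```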
This partially depends on the version of MATLAB you have. The code below uses strfind; on a very old version that only has findstr, swap that in instead.
num_files_to_rename = numel(name_array);
for ii = 1:num_files_to_rename
    % in my test I used cells to store my strings; you may need to
    % change the bracket type for your application
    curr_file = name_array{ii};
    % locates the period in the file name (assumes there is only one)
    period_idx = strfind(curr_file, '.');
    % takes everything to the left of the period (excluding the period)
    file_number = str2double(curr_file(1:period_idx-1));
    % zero-pads the number to 3 digits and keeps the .tif extension
    % (no semicolon, so the new name is printed for checking)
    new_file_name = sprintf('%03d.tif', file_number)
    % you can uncomment this after you are sure it works as you planned
    % movefile(curr_file, new_file_name);
end
The actual rename operation, movefile, is commented out for now. Make sure the output names are what you expect before uncommenting it and renaming all the files.
EDIT: There is no real error checking in this code. It just assumes every file name has one and only one period, and an actual number as the name.
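The name computation on its own is easy to check in isolation before touching any files. Here is a small Python sketch of the zero-padding step, under the same assumption that the part before the extension is a plain number; padded_name is a hypothetical helper, and it raises ValueError for any name that breaks that assumption.

```python
from pathlib import PurePath


def padded_name(name, width=3):
    """Zero-pad the numeric stem of a file name, keeping its extension."""
    path = PurePath(name)
    number = int(path.stem)  # ValueError if the stem is not a number
    return f"{number:0{width}d}{path.suffix}"


for original in ["0.tif", "7.tif", "10.tif", "100.tif"]:
    print(original, "->", padded_name(original))
```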
The Batch file below renames the files the way you want:
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%f in ('dir /B *.tif') do (
    set "name=00%%~Nf"
    ren "%%f" "!name:~-3!.tif"
)
Note that this solution preserves the order of your original files, even if there are missing numbers in the sequence.
I have one dir with 50 folders, and each folder has 50 files. I have a script to read all files in each folder and save the results, but I need to type the folder name every time. Is there any loop or batch tools I can use? Any suggestions or code greatly appreciated.
There may be a cleaner way to do it, but the output of the dir command can be assigned to a variable. This gives you a struct array, with the pertinent fields being name and isdir. For instance, assuming that the top-level directory (the one with the 50 folders) only has folders in it, the following will give you the first folder's name:
folderList = dir();
folderList(3).name
(Note that the first two entries in the folderList struct are usually "." (the current directory) and ".." (the parent directory), so if you want the first directory with files in it you have to go to the third entry.) If you wish to go through the folders one by one, you can do something like the following:
folderList = dir();
for i = 3:length(folderList)
    curr_directory = pwd;
    cd(folderList(i).name); % changes directory to the next working directory
    % operate with files as if you were in that directory
    cd(curr_directory); % return to the top-level directory
end
If the top-level directory contains files as well as folders, then you need to check the isdir field of each entry in the folderList struct: if it is 1, the entry is a directory; if it is 0, it is a file.
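For comparison, here is the same walk (visit every subfolder, skip plain files at the top level, list the files inside each folder) as a Python sketch. It is not MATLAB; files_by_folder and the demo names run01, run02, data.txt and notes.txt are made up for the illustration, and the demo runs on a temporary directory so it touches nothing real.

```python
import tempfile
from pathlib import Path


def files_by_folder(top):
    """Map each subfolder name under top to the sorted file names inside it."""
    result = {}
    for entry in sorted(Path(top).iterdir()):
        if entry.is_dir():  # the equivalent of checking isdir
            result[entry.name] = sorted(
                f.name for f in entry.iterdir() if f.is_file()
            )
    return result


# Demo on a throwaway tree: two folders with one file each, plus a
# top-level file that should be ignored.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("run01", "run02"):
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / "data.txt").write_text("x")
    (Path(tmp) / "notes.txt").write_text("top-level file, skipped")
    demo = files_by_folder(tmp)

print(demo)
```

Working with full paths like this avoids changing the current directory inside the loop.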