I have this folder organization
root/folder_1/file1_1 --up to-- file_5693
root/folder_2/file2_1 --up to-- file_100
root/folder_3/file3_1 --up to-- file_600
root/folder_4/file4_1 --up to-- file_689
I'd like to select a number (for example, 1000) of random files from each folder and put them all together in an output folder, but for folders with fewer than 200 files I'd like to copy all files.
root_2/output:
file1_350
.
.
.
file2_1 --> file2_100
.
.
.
etc
How can I do this?
I tried to list all folder names in the directory with the dir command, but the folder numbers are not sequential. Any help?
I might be misunderstanding, but I do not see a reason for ordering the folder names, since you will copy from all of them anyway.
The following script copies files out of every subfolder of the root directory.
You only need to change the following four variables: ROOT_DIR, OUT_DIR, THRESHOLD_COPY and N_RANDOM_COPY.
% Define
ROOT_DIR = './'; % where the subdirectories are located
OUT_DIR = './root2'; % copy destination
THRESHOLD_COPY = 200; % folders with fewer files than this are copied in full
N_RANDOM_COPY = 100; % number of files to copy from larger folders
if ~exist(OUT_DIR, 'dir'), mkdir(OUT_DIR); end
dirList = dir(ROOT_DIR);
% keep only real subdirectories; '.' and '..' are not guaranteed to come first
[~, outName] = fileparts(OUT_DIR); % skip the output folder if it lives inside ROOT_DIR
isSubdir = [dirList.isdir] & ~ismember({dirList.name}, {'.', '..', outName});
dirs = dirList(isSubdir);
for dirIterator = transpose(dirs)
    subdirList = dir(fullfile(ROOT_DIR, dirIterator.name));
    subfileList = subdirList(~[subdirList.isdir]); % struct array of files only
    nFiles = numel(subfileList);
    if nFiles >= THRESHOLD_COPY
        % random sample without replacement, capped at the number of files available
        copyIndices = randperm(nFiles, min(N_RANDOM_COPY, nFiles));
    else
        copyIndices = 1:nFiles;
    end
    for copyIndex = copyIndices
        copyfile(fullfile(ROOT_DIR, dirIterator.name, subfileList(copyIndex).name), ...
            fullfile(OUT_DIR, subfileList(copyIndex).name), ...
            'f');
    end
end
Could you please help me to read the data from a table.txt in a series of subfolders from a directory? In all the subfolders, the output to read has the same name, 'table.txt'. I want to process the data and save the output in the same folder.
I can process it using the following code.
a = readmatrix('table.txt');
a4 = a(:,4);
a4 = a4 - mean(a4);
N = 2^(nextpow2(length(a4)));
freq = (abs(fftshift(fft(a4,N))));
t=[0:1e-12:20e-9].';
ts=t(2)-t(1);
F = ((-N/2:N/2-1)/N)*(1/ts);
fmr=[(F(N/2+1:end)/1e9)' freq(N/2+1:end)];
writematrix(fmr, 'fmr.csv');
cd folder
But how do I perform the same action on all the subfolders?
Could somebody please help me out?
You can use the "find files in subfolders" behaviour of dir (the recursive ** wildcard, available since R2016b). Something like this:
allTables = dir('**/table.txt');
for ii = 1:numel(allTables)
    thisFolder = allTables(ii).folder;
    inFile = fullfile(thisFolder, allTables(ii).name);
    a = readmatrix(inFile);
    % do stuff ...
    fmr = ...
    outFile = fullfile(thisFolder, 'fmr.csv');
    writematrix(fmr, outFile);
end
Is it possible to create a fileDatastore of .mat files, filtering the files by filename pattern?
So far I got this:
fds = fileDatastore(dir_save,'ReadFcn',@load,'FileExtensions','.mat','IncludeSubfolders',true);
f=1;
while hasdata(fds)
    disp(num2str(progress(fds)*100))
    dataarray = read(fds);
    if ~isempty(strfind(fds.Files{f},myPattern))
        %% do something
    end
    f=f+1;
end
But some mat files I will not be using are really large and therefore slow down the process.
I cannot move all the files to 1 directory because my directory structure is like:
d01/file1.mat
d01/myPatternFile.mat
d01/othefile.mat
d02/file1.mat
d02/myPatternFile.mat
d02/othefile.mat
etc
You can use wildcards in the call to fullfile.
Using an example from the documentation of fileDatastore:
% No filtering
>> fds = fileDatastore(fullfile(matlabroot,'toolbox','matlab','demos'),'ReadFcn',@load,'FileExtensions','.mat')
fds =
FileDatastore with properties:
Files: {
'E:\MATLAB64\R2018b\toolbox\matlab\demos\accidents.mat';
'E:\MATLAB64\R2018b\toolbox\matlab\demos\airfoil.mat';
'E:\MATLAB64\R2018b\toolbox\matlab\demos\airlineResults.mat'
... and 37 more
}
UniformRead: 0
ReadFcn: @load
AlternateFileSystemRoots: {}
% Filtering for .mat files starting with "w"
>> fds = fileDatastore(fullfile(matlabroot,'toolbox','matlab','demos','w*'),'ReadFcn',@load,'FileExtensions','.mat')
fds =
FileDatastore with properties:
Files: {
'E:\MATLAB64\R2018b\toolbox\matlab\demos\west0479.mat';
'E:\MATLAB64\R2018b\toolbox\matlab\demos\wind.mat'
}
UniformRead: 0
ReadFcn: @load
AlternateFileSystemRoots: {}
For checking over different folders, use two wildcards:
>> !mkdir d01
>> !mkdir d02
>> !touch d01/file1.mat
>> !touch d01/myPatternFile.mat
>> !touch d02/file2.mat
>> !touch d02/myPatternFile.mat
>> fileDatastore(fullfile(pwd,'d*\myPattern*'),'ReadFcn',@load,'FileExtensions','.mat')
ans =
FileDatastore with properties:
Files: {
'H:\Documents\56133896\d01\myPatternFile.mat';
'H:\Documents\56133896\d02\myPatternFile.mat'
}
UniformRead: 0
ReadFcn: @load
AlternateFileSystemRoots: {}
I'm new to Python and want to write a script to recursively delete files in a directory which are older than 2 years and have a specific extension like .zip, .txt etc.
I know this isn't GitHub, but I spent quite some time trying to figure this out. I have to admit the answer isn't that obvious, but I found it eventually.
Luckily I'm using Python 3.7 as well, because I didn't see your tag at the bottom of the post.
Features
- Deletes matching files from a directory and its subdirectories
- Lets you change the extension to whatever you want, e.g. txt, bat, png, jpg
- Lets you change the folder to clean up, e.g. from your C drive to your pictures folder
The Program
import datetime
import glob
import os
import sys

os.chdir("C:\\Users\\")  # ------> PLEASE CHANGE THIS TO PREVENT YOUR C DRIVE GETTING DESTROYED THIS IS JUST AN EXAMPLE
src = os.getcwd()  # the scan starts from the current working directory
filedate = '2019'  # files whose creation timestamp contains this year are selected

def find(path, *exts):
    # build one glob pattern per (directory, pattern) pair, e.g. C:\Users\me\*.txt
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for pattern in f_filter for f in glob.iglob(pattern)]

def created(f):
    # note: getctime is the creation time on Windows but the last metadata change on Unix
    return datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S')

print(src)
my_files = find(src, '*.py', '*.txt')  # you can also add patterns like '*.jpg' etc.
to_delete = []  # files that match the date filter
clrd = 0  # total size in bytes of the matching files
for f in my_files:
    if filedate in created(f):
        print(' | CREATED:', created(f), '|', 'Folder:', '[', os.path.basename(os.path.dirname(f)), ']', 'File:', os.path.basename(f), ' Bytes:', os.stat(f).st_size)
        to_delete.append(f)
        clrd += os.stat(f).st_size

def delete():
    if not to_delete:
        print("No files match the given criteria!")
        sys.exit()
    while True:  # keep asking until the answer is yes or no
        x = input("Delete {} file(s)? >>> ".format(len(to_delete))).lower()
        if x == 'yes':
            for f in to_delete:  # remove every matching file, not just the last one
                os.remove(f)
            print("You have cleared {} bytes of data".format(clrd))
            sys.exit()
        if x == 'no':
            print('Aborting...')
            sys.exit()
        print("type yes or no")

delete()
On its own
This is for applying it to your own code
import datetime
import glob
import os

os.chdir('C:\\Users')
src = os.getcwd()
filedate = '2019'  # year to match in the creation timestamp

def find(path, *exts):
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for pattern in f_filter for f in glob.iglob(pattern)]

my_files = find(src, '*.py', '*.txt')  # to add extensions, pass more patterns such as '*.zip'
for f in my_files:
    if filedate in datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S'):
        os.remove(f)
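The snippets above select files by the year in their creation timestamp, whereas the question asks for files older than two years. A minimal sketch of that age-based variant follows. It assumes that "older" means the last-modified time (st_mtime) and that a year is 365 days; the function name delete_old_files and its parameters are made up for this illustration. It defaults to a dry run so nothing is removed until you have checked the returned list.

```python
import time
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: leap days are ignored


def delete_old_files(root, extensions, years=2, dry_run=True, now=None):
    """Return the files under root that have one of the given extensions and
    were last modified more than `years` years ago.

    With dry_run=False the matching files are also deleted.
    (Hypothetical helper, not part of any library.)
    """
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    wanted = {ext.lower() for ext in extensions}  # e.g. {'.zip', '.txt'}

    # Collect first, delete afterwards, so the directory walk is not disturbed.
    matched = [
        path
        for path in Path(root).rglob('*')  # rglob descends into all subdirectories
        if path.is_file()
        and path.suffix.lower() in wanted
        and path.stat().st_mtime < cutoff
    ]
    if not dry_run:
        for path in matched:
            path.unlink()
    return matched
```

A possible way to use it: call delete_old_files(some_folder, ['.zip', '.txt']) first, inspect the returned paths, and only then repeat the call with dry_run=False.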
I need to loop through each file that are in the following subfolders:
/Testing
/Training
/Validation
This would be similar to the code below, except that it would loop through every file in those three subfolders (right now it loops through files 1 to 92, but the files are now split up into these three folders).
for i=1:92
str = sprintf('load data%i.mat', i);
eval(str);
Info.data=Info.data(:,[1,2,3,5,6,7,9,10,11]);
str = sprintf('save data%i.mat', i);
eval(str);
end
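p1 = pwd;
p2 = {'Testing' 'Training' 'Validation'};
for i = 1:length(p2)
    folder = fullfile(p1, p2{i});
    files = dir(fullfile(folder, '*.mat'));
    for file = files'
        fname = fullfile(folder, file.name);
        S = load(fname); % load into a struct so loop variables are not saved back into the file
        % column selection; adjust it to your data (the question uses [1,2,3,5,6,7,9,10,11])
        S.Info.data = S.Info.data(:,[9,10,11]);
        save(fname, '-struct', 'S');
    end
end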
I used to have Matlab and loaded all txt-files from directory "C:\folder\" into Matlab with the following code:
myFolder = 'C:\folder\';
filepattern = fullfile(myFolder, '*.txt');
files = dir(filepattern);
for i=1:length(files)
eval(['load ' myFolder,files(i).name ' -ascii']);
end
If C:\folder\ contains A.txt, B.txt, C.txt, I would then have matrices A, B and C in the workspace.
The code doesn't work in Octave, maybe because of "fullfile"? Anyway, with the following code I get matrices with the names C__folder_A, C__folder_B, C__folder_C. However, I need matrices called A, B, C.
myFolder = 'C:\folder\';
files = dir(myFolder);
for i=3:length(files)
eval(['load ' myFolder,files(i).name ' -ascii']);
end
Can you help me?
Thanks,
Martin
PS: The loop starts with 3 because files(1).name = . and files(2).name = ..
EDIT:
I have just found a solution. It's not elegant, but it works.
I just add the path in which the files are with "addpath", then I don't have to give the full name of the directory in the loop.
myFolder = 'C:\folder\';
addpath(myFolder)
files = dir(myFolder);
for i=3:length(files)
eval(['load ' files(i).name ' -ascii']);
end
It's usually bad design to load files into variables whose names are generated dynamically; you should load them into a cell array instead. That said, this should work:
files = glob('C:\folder\*.txt');
for i=1:numel(files)
  [~, name] = fileparts (files{i});
  % single quotes inside the generated command keep backslashes in the path literal
  eval(sprintf('%s = load(''%s'', ''-ascii'');', name, files{i}));
endfor
The function scanFiles recursively searches the given directory (initialPath) and its subdirectories for file names with the given extensions. The parameter fileHandler is a function that receives a struct describing each matching file, which you can use to process it (e.g. read text, load an image, etc.).
Source
function scanFiles(initialPath, extensions, fileHandler)
  persistent total = 0;
  persistent depth = 0; depth++;
  initialDir = dir(initialPath);
  printf('Scanning the directory %s ...\n', initialPath);
  for idx = 1 : length(initialDir)
    curDir = initialDir(idx);
    curPath = strcat(curDir.folder, '\', curDir.name);
    % the negative lookahead is meant to skip the '.' and '..' entries
    if regexp(curDir.name, "(?!(\\.\\.?)).*") * curDir.isdir
      scanFiles(curPath, extensions, fileHandler);
    % (?i:...) makes the extension match case-insensitive
    elseif regexp(curDir.name, cstrcat("\\.(?i:", extensions, ")$"))
      total++;
      file = struct("name",curDir.name,
                    "path",curPath,
                    "parent",regexp(curDir.folder,'[^\\\/]*$','match'),
                    "bytes",curDir.bytes);
      fileHandler(file);
    endif
  end
  if!(--depth)
    printf('Total number of files:%d\n', total);
    total=0;
  endif
endfunction
Usage
# txt
# textFileHandlerFunc=@(file)fprintf('%s',fileread(file.path));
# scanFiles("E:\\Examples\\project\\", "txt", textFileHandlerFunc);
# images
# imageFileHandlerFunc=@(file)imread(file.path);
# scanFiles("E:\\Examples\\project\\datasets\\", "jpg|png", imageFileHandlerFunc);
# list files
fileHandlerFunc=@(file)fprintf('path=%s\nname=%s\nsize=%d bytes\nparent=%s\n\n',
                file.path,file.name,file.bytes,file.parent);
scanFiles("E:\\Examples\\project\\", "txt", fileHandlerFunc);