How to process the data from a table.txt file in a series of folders and save the output in the same folder using MATLAB? - matlab

Could you please help me read the data from a table.txt in a series of subfolders of a directory? In every subfolder, the file to read has the same name, 'table.txt'. I want to process the data and save the output in the same folder.
I can process a single file with the following code.
a = readmatrix('table.txt');
a4 = a(:,4);                        % fourth column of the table
a4 = a4 - mean(a4);                 % remove the mean (DC offset)
N = 2^(nextpow2(length(a4)));       % FFT length: next power of two
freq = (abs(fftshift(fft(a4,N))));  % magnitude spectrum, zero frequency centred
t=[0:1e-12:20e-9].';                % time axis, 1 ps step
ts=t(2)-t(1);                       % sampling interval
F = ((-N/2:N/2-1)/N)*(1/ts);        % frequency axis in Hz
fmr=[(F(N/2+1:end)/1e9)' freq(N/2+1:end)];   % non-negative frequencies in GHz vs. amplitude
writematrix(fmr, 'fmr.csv');
cd folder                           % then change to the next folder by hand and repeat
But how do I perform the same action on all the subfolders automatically?
Could somebody please help me out?

You can use the "find files in subfolders" behaviour of dir. Something like this:
allTables = dir('**/table.txt');
for ii = 1:numel(allTables)
    thisFolder = allTables(ii).folder;
    inFile = fullfile(thisFolder, allTables(ii).name);
    a = readmatrix(inFile);
    % do stuff ...
    fmr = ...
    outFile = fullfile(thisFolder, 'fmr.csv');
    writematrix(fmr, outFile);
end
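If it helps, here is the same loop with the processing from the question dropped in where the placeholder is; nothing new is added, it is just the code from above moved inside the loop (untested):
allTables = dir('**/table.txt');
for ii = 1:numel(allTables)
    thisFolder = allTables(ii).folder;
    a = readmatrix(fullfile(thisFolder, allTables(ii).name));

    % spectrum of column 4, exactly as in the question
    a4 = a(:,4);
    a4 = a4 - mean(a4);
    N = 2^(nextpow2(length(a4)));
    freq = abs(fftshift(fft(a4,N)));
    t = [0:1e-12:20e-9].';
    ts = t(2)-t(1);
    F = ((-N/2:N/2-1)/N)*(1/ts);
    fmr = [(F(N/2+1:end)/1e9)' freq(N/2+1:end)];

    % write the result next to the table.txt it came from
    writematrix(fmr, fullfile(thisFolder, 'fmr.csv'));
end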

Related

Trying to install a corpus for countVectorizer in sklearn package

I am trying to load a corpus from my local drive into Python in one go with a for loop, then read each text file and save it for analysis with CountVectorizer. But I am only getting the last file. How do I get the results from all of the files stored for analysis with CountVectorizer?
This code only brings out the text from the last file in the folder.
import glob, os
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

folder_path = "folder"
#import and read all files in animal_corpus
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        txt = f.read()
        print(txt)

MyList = [txt]   # <-- only ever holds the text of the last file read
## Create a CountVectorizer object that you can use
MyCV1 = CountVectorizer()
## Call your MyCV1 on the data
DTM1 = MyCV1.fit_transform(MyList)
## get col names
ColNames = MyCV1.get_feature_names()
print(ColNames)
## convert DTM to DF
MyDF1 = pd.DataFrame(DTM1.toarray(), columns=ColNames)
print(MyDF1)
This code works, but it would not work for the huge corpus that I am preparing it for.
#import and read text files
f1 = open("folder/animal_1.txt",'r')
f1r = f1.read()
f2 = open("folder/animal_2.txt",'r')
f2r = f2.read()
f3 = open("folder/animal_3.txt",'r')
f3r = f3.read()
#reassemble corpus in python
MyCorpus=[f1r, f2r, f3r]
## Create a CountVectorizer object that you can use
MyCV1 = CountVectorizer()
## Call your MyCV1 on the data
DTM1 = MyCV1.fit_transform(MyCorpus)
## get col names
ColNames=MyCV1.get_feature_names()
print(ColNames)
## convert DTM to DF
MyDF2 = pd.DataFrame(DTM1.toarray(), columns=ColNames)
print(MyDF2)
I figured it out. Just gotta keep grinding.
MyCorpus=[]
#import and read all files in animal_corpus
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        txt = f.read()
        MyCorpus.append(txt)

Recursively delete files older than 2 years (with specific extension like .zip, .log etc)

I'm new to Python and want to write a script to recursively delete files in a directory which are older than 2 years and have a specific extension like .zip, .txt etc.
I know this isn't GitHub, but I spent quite some time trying to figure it out, and I have to admit the answer isn't that obvious; I found it
eventually. I have no idea why I spent half an hour on this random program, but I did.
It's lucky I'm using Python 3.7 as well, because I didn't see your tag at the bottom of the post. Below is what I've titled The Program.
Features
- Deletes all matching files from the directory and its subdirectories
- Lets you change the extension to whatever you want, e.g. txt, bat, png, jpg
- Lets you change the folder you want cleared, e.g. from your C drive to Pictures
The Program
import glob, os, sys, re, datetime

os.chdir("C:\\Users\\")  # ------> PLEASE CHANGE THIS TO PREVENT YOUR C DRIVE GETTING DESTROYED, THIS IS JUST AN EXAMPLE
src = os.getcwd()  # scans src, which must be set to the current working directory
cn = 0
filedate = '2019'
clrd = 0

def random_function_name():
    print("No files match the given criteria!")
    return

def find(path, *exts):
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d + e for d in dirs for e in exts]
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

print(src)
my_files = find(src, '\*py', '\*txt')  # you can also add patterns like '\*txt', '\*jpg' etc.

for f in my_files:
    cn += 1
    if filedate in datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S'):
        print(' | CREATED:', datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S'), '|',
              'Folder:', '[', os.path.basename(os.path.dirname(f)), ']',
              'File:', os.path.split(os.path.abspath(f))[1], ' Bytes:', os.stat(f).st_size)
        clrd += os.stat(f).st_size

        def delete():
            if cn != 0:
                x = str(input("Delete {} file(s)? >>> ".format(cn)))
                if x.lower() == 'yes':
                    os.remove(f)
                    print("You have cleared {} bytes of data".format(clrd))
                    sys.exit()
                if x.lower() == 'no':
                    print('Aborting...')
                    sys.exit()
                if x != 'yes' or 'no':
                    if x != '':
                        print("type yes or no")
                        delete()
                    else: delete()
            if cn == 0:
                print(str("No files to delete."))
                sys.exit()

        delete()

    if filedate not in datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S'):
        sys.setrecursionlimit(2500)
        random_function_name()
On its own
This is for applying it to your own code
import glob, os, sys, re, datetime

os.chdir('C:\\Users')
src = os.getcwd()
filedate = '2019'  # year to match on, as in the full program above

def find(path, *exts):
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d + e for d in dirs for e in exts]
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_files = find(src, '\*py', '\*txt')  # to add extensions do \*extension
for f in my_files:
    if filedate in datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y/%m/%d|%H:%M:%S'):
        os.remove(f)

Loop through all files in subfolders

I need to loop through each file in the following subfolders:
/Testing
/Training
/Validation
This would be similar to the code below, except it would loop through every file in those three subfolders (right now it loops through files 1 to 92, but they are now split up into these three folders).
for i=1:92
    str = sprintf('load data%i.mat', i);
    eval(str);
    Info.data=Info.data(:,[1,2,3,5,6,7,9,10,11]);
    str = sprintf('save data%i.mat', i);
    eval(str);
end
p1 = pwd;
p2 = {'\Testing' '\Training' '\Validation'};
for i = 1:length(p2)
    cd([p1, p2{i}])
    files = dir('*.mat');
    for file = files'
        load(file.name);
        Info.data = Info.data(:,[9,10,11]);
        save(file.name);
    end
    cd(p1);   % go back up only after all files in this subfolder are done
end
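If you prefer not to cd around at all, the recursive dir pattern from the first answer on this page can be combined with fullfile. A rough sketch, assuming each .mat file contains a variable named Info; loading into a struct and saving with the '-struct' option keeps the workspace clean and avoids eval:
matFiles = dir(fullfile(pwd, '**', '*.mat'));   % finds the data*.mat files in Testing, Training and Validation
for k = 1:numel(matFiles)
    thisFile = fullfile(matFiles(k).folder, matFiles(k).name);
    S = load(thisFile);                         % variables in the .mat file become fields of S
    S.Info.data = S.Info.data(:,[9,10,11]);
    save(thisFile, '-struct', 'S');             % write the modified variables back to the same file
end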

Octave: Load all files from specific directory

I used to have Matlab and loaded all txt-files from directory "C:\folder\" into Matlab with the following code:
myFolder = 'C:\folder\';
filepattern = fullfile(myFolder, '*.txt');
files = dir(filepattern);
for i=1:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
If C:\folder\ contains A.txt, B.txt, C.txt, I would then have matrices A, B and C in the workspace.
The code doesn't work in Octave, maybe because of "fullfile"? Anyway, with the following code I get matrices named C__folder_A, C__folder_B, C__folder_C. However, I need matrices called A, B, C.
myFolder = 'C:\folder\';
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
Can you help me?
Thanks,
Martin
PS: The loop starts with 3 because files(1).name = . and files(2).name = ..
EDIT:
I have just found a solution. It's not elegant, but it works.
I just add the folder containing the files to the search path with "addpath"; then I don't have to give the full directory name in the loop.
myFolder = 'C:\folder\';
addpath(myFolder)
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' files(i).name ' -ascii']);
end
It's usually bad design to load files into variables whose names are generated dynamically; you should load them into a cell array instead. But this should work:
files = glob('C:\folder\*.txt')
for i=1:numel(files)
    [~, name] = fileparts(files{i});
    eval(sprintf('%s = load("%s", "-ascii");', name, files{i}));
endfor
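For the cell-array approach mentioned above, a minimal sketch (untested; glob is Octave-specific, in MATLAB use dir as in the other snippets) could be:
files = glob('C:\folder\*.txt');
data = cell(numel(files), 1);
names = cell(numel(files), 1);
for i = 1:numel(files)
    [~, names{i}] = fileparts(files{i});   % 'A', 'B', 'C', ...
    data{i} = load(files{i}, '-ascii');    % matrix read from the i-th text file
end
Each matrix is then available as data{i}, with its original file name in names{i}, and no eval is needed.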
The function scanFiles searches for file names with the given extensions in the starting directory (initialPath) and its subdirectories recursively. The parameter fileHandler is a function that you can use to process each populated file structure (e.g. read text, load an image, etc.).
Source
function scanFiles(initialPath, extensions, fileHandler)
    persistent total = 0;
    persistent depth = 0; depth++;
    initialDir = dir(initialPath);
    printf('Scanning the directory %s ...\n', initialPath);
    for idx = 1 : length(initialDir)
        curDir = initialDir(idx);
        curPath = strcat(curDir.folder, '\', curDir.name);
        if regexp(curDir.name, "(?!(\\.\\.?)).*") * curDir.isdir
            scanFiles(curPath, extensions, fileHandler);
        elseif regexp(curDir.name, cstrcat("\\.(?i:)(?:", extensions, ")$"))
            total++;
            file = struct("name", curDir.name,
                          "path", curPath,
                          "parent", regexp(curDir.folder, '[^\\\/]*$', 'match'),
                          "bytes", curDir.bytes);
            fileHandler(file);
        endif
    end
    if !(--depth)
        printf('Total number of files:%d\n', total);
        total = 0;
    endif
endfunction
Usage
# txt
# textFileHandlerFunc = @(file) fprintf('%s', fileread(file.path));
# scanFiles("E:\\Examples\\project\\", "txt", textFileHandlerFunc);
# images
# imageFileHandlerFunc = @(file) imread(file.path);
# scanFiles("E:\\Examples\\project\\datasets\\", "jpg|png", imageFileHandlerFunc);
# list files
fileHandlerFunc = @(file) fprintf('path=%s\nname=%s\nsize=%d bytes\nparent=%s\n\n',
                                  file.path, file.name, file.bytes, file.parent);
scanFiles("E:\\Examples\\project\\", "txt", fileHandlerFunc);

Selecting Random files from a folder tree

I have this folder organization
root/folder_1/file1_1 --up to-- file_5693
root/folder_2/file2_1 --up to-- file_100
root/folder_3/file3_1 --up to-- file_600
root/folder_4/file4_1 --up to-- file_689
I'd like to select a number (for example 1000) of random files from each folder and put them all together in an output folder, but for folders with fewer than 200 files I'd like to copy all the files.
root_2/output:
file1_350
.
.
.
file2_1 --> file2_100
.
.
.
etc
How can I do this?
I tried to list all the folder names in the directory with the dir command, but the folder numbers are not sequential. Any help?
I might misunderstand, but I do not see a reason for ordering the folder names, since you will copy them anyway.
The following is a script for copying the files inside the folders that sit in the root directory.
You just need to change the four variables ROOT_DIR, OUT_DIR, THRESHOLD_COPY and N_RANDOM_COPY.
% Define
ROOT_DIR = './';          % where the subdirectories are located
OUT_DIR = './root2';      % copy destination
THRESHOLD_COPY = 200;     % threshold for copying all files
N_RANDOM_COPY = 100;      % number of files that you want to copy

dirList = dir(ROOT_DIR);
dirList = dirList(3:end); % first two are ./ and ../
dirOnlyIndicators = cell2mat({dirList.isdir});
dirs = dirList(dirOnlyIndicators);

for dirIterator = transpose(dirs)
    subdirList = dir([ROOT_DIR dirIterator.name]);
    fileIndicators = ~cell2mat({subdirList.isdir});
    subfileList = subdirList(fileIndicators);   % struct array of the files in this folder
    nFiles = sum(fileIndicators);

    copyIndices = [];
    if nFiles > THRESHOLD_COPY
        copyIndices = randperm(nFiles);
        copyIndices = copyIndices(1:N_RANDOM_COPY);
    else
        copyIndices = 1:nFiles;
    end

    for copyIndex = copyIndices
        copyfile([ROOT_DIR dirIterator.name '/' subfileList(copyIndex).name],...
                 [OUT_DIR '/' subfileList(copyIndex).name],...
                 'f');
    end
end