Is there a Pythonic way to add a string, made from each file's name without its extension, to the start of that file in a directory? (Python 3.7)

I have written code that appends a string built from each file's name to that file. Instead of adding just one line, though, it adds a line for every file in the folder, and the lines end up after the existing data in every file. All I want is to add each file's own string at the start of that file.
With my code, all three lines are written at the end of every file, i.e.:
previous data...
parent a A B C D
parent b A B C D
parent c A B C D
This is my code
import os
import glob
os.chdir("C://Users//folder_naming_test_python//")
files = os.listdir()
#print("files=" )
#print(files)
d = []
for k in os.listdir():
    d.append( k.split('.')[0])
#print("names=")
#print(d)
prefix = 'parent '
postfix = ' A B C D'
Headers = list(map(lambda orig_string :prefix + orig_string + postfix, d))
#print("Headers = ")
#print(Headers)
array_len = len(Headers)
for file in files:
    for i in range(array_len):
        f = open(file, 'a+')
        a = f.read()
        f.seek(0)
        f.write(Headers[i]+'\n')
        f.close()
    f = open(file, 'r')
    print(f.read())
For example, say my input is 3 files in a folder, named a.txt, b.txt and c.txt.
Whatever the data in the files, I expect the first line of each file to be its own header, followed by the existing data:
parent a A B C D in a.txt,
parent b A B C D in b.txt,
parent c A B C D in c.txt.
(Note: the a, b and c strings have to go in their individual files, not all together in every file.)
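The behaviour of the code above can be checked with a small experiment. Two things combine. First, the nested `for i in range(array_len)` loop writes every header into every file. Second, a file opened with `'a+'` starts positioned at its end, so `f.read()` returns an empty string. On POSIX systems, at least, every `write()` is also appended at the end regardless of `f.seek(0)`. The throwaway demo below shows the second effect in a temporary directory. The helper name `demo_append_mode` is hypothetical.

```python
import os
import tempfile


def demo_append_mode():
    """Return (what read() saw, final file content) after seek(0) + write in 'a+' mode."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "a.txt")
        with open(path, "w") as f:
            f.write("previous data...\n")
        with open(path, "a+") as f:
            seen = f.read()                 # 'a+' starts at end of file, so this is ''
            f.seek(0)                       # moves the position, but...
            f.write("parent a A B C D\n")   # ...in append mode the write still lands at the end
        with open(path) as f:
            return seen, f.read()


if __name__ == "__main__":
    seen, content = demo_append_mode()
    print(repr(seen))
    print(content, end="")
```

To put a header at the top, the file has to be read and then rewritten (or opened in `'r+'` mode, as in the answer), not appended to.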

It can be done quite easily using f-strings, with pathlib's Path.stem giving the file name without its extension:
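import os
from pathlib import Path

for filename in os.listdir():          # entries in the current working directory
    if not os.path.isfile(filename):   # skip subdirectories, which open() cannot read
        continue
    with open(filename, "r+") as f:
        content = f.read()             # keep the existing data
        f.seek(0, 0)                   # rewind to the start of the file
        f.write(f"parent {Path(filename).stem} A B C D\n")
        f.write(content)               # rewrite the original data after the header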
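The same approach can also be written as a reusable function with `pathlib`. This is a minimal sketch: the name `prepend_headers` and its parameters are hypothetical, and it assumes the files are small text files that fit in memory. Note that `Path.stem` strips only the last suffix (`a.tar.gz` gives `a.tar`), whereas the question's `k.split('.')[0]` would give `a`.

```python
from pathlib import Path


def prepend_headers(directory, prefix="parent ", postfix=" A B C D"):
    """Insert '<prefix><stem><postfix>' as the first line of every regular file in directory.

    Returns the list of headers written, in the order the files were processed.
    """
    written = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():          # skip subdirectories
            continue
        header = f"{prefix}{path.stem}{postfix}"
        original = path.read_text()     # read the whole file first...
        path.write_text(header + "\n" + original)  # ...then rewrite it with the header on top
        written.append(header)
    return written
```

Each file gets only its own header, because the header is built inside the per-file loop. Running it twice adds a second header line, so check for an existing header first if the script may be re-run.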


Trying to load a corpus for CountVectorizer in the sklearn package

I am trying to load a corpus from my local drive into Python in one go, using a for loop to read each text file and save it for analysis with CountVectorizer. But I am only getting the last file. How do I store the contents of all the files for analysis with CountVectorizer?
This code only picks up the text from the last file in the folder:
folder_path = "folder"
#import and read all files in animal_corpus
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        txt = f.read()
        print(txt)
MyList= [txt]
## Create a CountVectorizer object that you can use
MyCV1 = CountVectorizer()
## Call your MyCV1 on the data
DTM1 = MyCV1.fit_transform(MyList)
## get col names
ColNames=MyCV1.get_feature_names()
print(ColNames)
## convert DTM to DF
MyDF1 = pd.DataFrame(DTM1.toarray(), columns=ColNames)
print(MyDF1)
This code works, but it would not scale to the huge corpus I am preparing it for:
#import and read text files
f1 = open("folder/animal_1.txt",'r')
f1r = f1.read()
f2 = open("folder/animal_2.txt",'r')
f2r = f2.read()
f3 = open("folder/animal_3.txt",'r')
f3r = f3.read()
#reassemble corpus in python
MyCorpus=[f1r, f2r, f3r]
## Create a CountVectorizer object that you can use
MyCV1 = CountVectorizer()
## Call your MyCV1 on the data
DTM1 = MyCV1.fit_transform(MyCorpus)
## get col names
ColNames=MyCV1.get_feature_names()
print(ColNames)
## convert DTM to DF
MyDF2 = pd.DataFrame(DTM1.toarray(), columns=ColNames)
print(MyDF2)
I figured it out: create the list before the loop and append each file's text to it, instead of overwriting txt on every iteration.
import glob
import os

folder_path = "folder"
MyCorpus = []
#import and read all files in animal_corpus
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        txt = f.read()
    MyCorpus.append(txt)   # one document per file
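Below is a slightly more complete stdlib-only sketch of the same loop. The function name `load_corpus` and the UTF-8 encoding are assumptions. The file names are sorted because `glob.glob` returns them in arbitrary order, and sorting keeps the document order (and so the rows of the document-term matrix) reproducible.

```python
import glob
import os


def load_corpus(folder_path, pattern="*.txt"):
    """Read every file matching pattern in folder_path.

    Returns (names, texts): the file names without extension and the file contents,
    both in sorted file-name order.
    """
    names, texts = [], []
    for filename in sorted(glob.glob(os.path.join(folder_path, pattern))):
        with open(filename, "r", encoding="utf-8") as f:
            texts.append(f.read())          # one document per file
        names.append(os.path.splitext(os.path.basename(filename))[0])
    return names, texts
```

The `texts` list can then be passed to scikit-learn as before, e.g. `DTM = CountVectorizer().fit_transform(texts)`, with `names` as the DataFrame index. In recent scikit-learn releases `get_feature_names()` has been removed in favour of `get_feature_names_out()`. For a very large corpus, `CountVectorizer(input='filename')` also accepts the file paths directly instead of their contents.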

MUMPS Address Validation

I am working on prerequisite questions for a class I am trying to attend, which involve revising two pieces of code. I have completed one and I am stuck on this one. I am trying to read an abbreviated address line, in this case FL33606. I can read the address, but I get an undefined error at the quit command "Q:done". Can someone help me identify what is wrong?
N prompt,val, done
S prompt="Enter State and Zip (StateZip): "
F W !,prompt R val Q:val="" D Q:done
. I val'="?2A5N" W !,"Invalid entry" Q
. S done=1
I val="" q
W !,"Valid Entry: ",val
Q
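There are two errors:
- the done variable must be defined (S done=0) before the first read, otherwise Q:done refers to an undefined variable
- the pattern should not be in quotes, and the pattern-match operator is ?, not =, so val'="?2A5N" becomes val'?2A5N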
S prompt="Enter State and Zip (StateZip): "
S done=0
F W !,prompt R val Q:val="" D Q:done
. I val'?2A5N W !,"Invalid entry" Q
. S done=1
I val="" q
W !,"Valid Entry: ",val
Q
Why do you use abbreviated commands and dotted blocks?
Isn't this much more readable?
Set prompt = "Enter State and Zip (StateZip): "
For {
    Write !,prompt
    Read val
    Quit:val=""
    Quit:val?2A5N
    Write !,"Invalid entry"
}
If val="" Quit
Write !,"Valid Entry: ",val
Quit
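The pattern test `val?2A5N` is true when the whole value is exactly two alphabetic characters followed by exactly five digits. A rough Python translation of the same prompt loop is sketched below for readers who don't know MUMPS pattern matching. It is only an illustration, not ObjectScript, and the names `is_state_zip` and `prompt_state_zip` are hypothetical. Like the MUMPS `A` code, `[A-Za-z]` also accepts lower-case letters; `[A-Z]` (MUMPS `2U`) would require upper case.

```python
import re

# Rough Python equivalent of the MUMPS pattern 2A5N: two letters, then five digits.
STATE_ZIP = re.compile(r"[A-Za-z]{2}[0-9]{5}")


def is_state_zip(val):
    """Mimic MUMPS `val?2A5N`: the pattern must match the whole string."""
    return STATE_ZIP.fullmatch(val) is not None


def prompt_state_zip(read=input, write=print):
    """Re-prompt until a valid entry or an empty line, like the For loop above."""
    prompt = "Enter State and Zip (StateZip): "
    while True:
        val = read(prompt)
        if val == "":
            return ""                     # empty entry quits, as Quit:val="" does
        if is_state_zip(val):
            write("Valid Entry: " + val)
            return val
        write("Invalid entry")
```

`read` and `write` are parameters only so that the loop can be exercised without a terminal.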

Recursively delete files older than 2 years (with specific extension like .zip, .log etc)

I'm new to Python and want to write a script to recursively delete files in a directory which are older than 2 years and have a specific extension like .zip, .txt etc.
I know this isn't GitHub, but I spent quite some time trying to figure this out. I have to admit the answer isn't that obvious, but I found it eventually.
Luckily I'm using Python 3.7 as well, because I didn't see your tag at the bottom of the post.
Features
- Deletes all matching files from the directory and its subdirectories
- Lets you change the extensions to whatever you want, e.g. txt, bat, png, jpg
- Lets you change the folder to be cleaned, e.g. from your C drive to your pictures folder
The Program
import datetime
import glob
import os
import sys

os.chdir("C:\\Users\\")  # ------> PLEASE CHANGE THIS TO PREVENT YOUR C DRIVE GETTING DESTROYED, THIS IS JUST AN EXAMPLE
src = os.getcwd()  # scans src, which is set to the current working directory
filedate = '2019'  # files whose timestamp falls in this year are deleted


def find(path, *exts):
    """Return all files under path (recursively) matching any pattern in exts, e.g. '*.py'."""
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for pattern in f_filter for f in glob.glob(pattern)]


def timestamp(f):
    # getctime is the creation time on Windows, but the last metadata change on Unix
    return datetime.datetime.fromtimestamp(os.path.getctime(f))


print(src)
my_files = find(src, '*.py', '*.txt')  # you can also add patterns like '*.jpg' etc.
matches = [f for f in my_files if timestamp(f).strftime('%Y') == filedate]

if not matches:
    print("No files match the given criteria!")
    sys.exit()

clrd = 0
for f in matches:
    size = os.stat(f).st_size
    print(' | CREATED:', timestamp(f).strftime('%Y/%m/%d|%H:%M:%S'), '|',
          'Folder: [', os.path.basename(os.path.dirname(f)), ']',
          'File:', os.path.basename(f), ' Bytes:', size)
    clrd += size

while True:  # keep asking until the user types yes or no
    x = input("Delete {} file(s)? >>> ".format(len(matches))).strip().lower()
    if x == 'yes':
        for f in matches:  # delete every matching file, not just the last one
            os.remove(f)
        print("You have cleared {} bytes of data".format(clrd))
        break
    if x == 'no':
        print('Aborting...')
        break
    print("type yes or no")
On its own
This is for applying it to your own code
import datetime
import glob
import os

src = 'C:\\Users'   # change this to the folder you want to clean
filedate = '2019'   # year of the files to delete

def find(path, *exts):
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for pattern in f_filter for f in glob.glob(pattern)]

my_files = find(src, '*.py', '*.txt')  # to add extensions, add '*.extension'
for f in my_files:
    if datetime.datetime.fromtimestamp(os.path.getctime(f)).strftime('%Y') == filedate:
        os.remove(f)
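The code above deletes files whose timestamp falls in a given year (`filedate`), which is not quite the same as "older than 2 years". The sketch below is an age-based alternative, not the answer's method. It assumes "2 years" means 2 × 365 days. It uses the modification time (`os.path.getmtime`), because `os.path.getctime` is the creation time on Windows but the last metadata change on Unix. The function names and the `dry_run` flag are hypothetical.

```python
import os
import time

# "Older than 2 years" approximated as 2 * 365 days (leap days ignored).
TWO_YEARS = 2 * 365 * 24 * 60 * 60


def old_files(root, extensions, max_age=TWO_YEARS, now=None):
    """Yield files under root (recursively) whose extension is in `extensions`
    and whose last-modification time is more than `max_age` seconds ago."""
    now = time.time() if now is None else now
    exts = tuple(e.lower() for e in extensions)      # e.g. (".zip", ".log")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > max_age:
                yield path


def delete_old_files(root, extensions, max_age=TWO_YEARS, dry_run=True):
    """Delete the matching files (or only print them when dry_run is True).

    Returns the sorted list of matching paths.
    """
    matches = sorted(old_files(root, extensions, max_age))
    for path in matches:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
    return matches
```

`dry_run` defaults to True so that a first run only lists what would be removed. Call it again with `dry_run=False` once the list looks right.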

Loop through all files in subfolders

I need to loop through every file in the following subfolders:
/Testing
/Training
/Validation
This would be similar to the code below, except that it would loop through every file in those three subfolders (right now it loops through files 1 to 92, but the files are now split up among these three folders).
for i=1:92
    str = sprintf('load data%i.mat', i);
    eval(str);
    Info.data=Info.data(:,[1,2,3,5,6,7,9,10,11]);
    str = sprintf('save data%i.mat', i);
    eval(str);
end
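p1 = pwd;
p2 = {'Testing' 'Training' 'Validation'};
for i = 1:length(p2)
    folder = fullfile(p1, p2{i});
    files = dir(fullfile(folder, '*.mat'));
    for file = files'
        fname = fullfile(folder, file.name);
        load(fname);                        % restores the Info struct saved in the file
        Info.data = Info.data(:, [9,10,11]);
        save(fname, 'Info');                % save only Info, not every workspace variable
    end
end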

Octave: Load all files from specific directory

I used to have Matlab and loaded all txt-files from directory "C:\folder\" into Matlab with the following code:
myFolder = 'C:\folder\';
filepattern = fullfile(myFolder, '*.txt');
files = dir(filepattern);
for i=1:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
If C:\folder\ contains A.txt, B.txt, C.txt, I would then have matrices A, B and C in the workspace.
The code doesn't work in Octave, maybe because of fullfile? Anyway, with the following code I get matrices named C__folder_A, C__folder_B and C__folder_C. However, I need matrices called A, B and C.
myFolder = 'C:\folder\';
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
Can you help me?
Thanks,
Martin
PS: The loop starts with 3 because files(1).name = . and files(2).name = ..
EDIT:
I have just found a solution. It's not elegant, but it works.
I just add the folder containing the files with addpath, so I don't have to give the full directory name in the loop.
myFolder = 'C:\folder\';
addpath(myFolder)
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' files(i).name ' -ascii']);
end
Loading files into variables whose names are generated dynamically is usually bad design, and you should load them into a cell array instead, but this should work:
files = glob('C:\folder\*.txt');
for i=1:numel(files)
    [~, name] = fileparts(files{i});
    % single quotes keep the backslashes in Windows paths literal
    eval(sprintf('%s = load(''%s'', ''-ascii'');', name, files{i}));
endfor
The function scanFiles searches for files with the given extensions in the starting directory (initialPath) and its subdirectories, recursively. The parameter fileHandler is a function that is called on each matching file's struct, which you can use to process the file (e.g. read text, load an image, etc.).
Source
function scanFiles(initialPath, extensions, fileHandler)
  persistent total = 0;
  persistent depth = 0; depth++;
  initialDir = dir(initialPath);
  printf('Scanning the directory %s ...\n', initialPath);
  for idx = 1 : length(initialDir)
    curDir = initialDir(idx);
    curPath = fullfile(curDir.folder, curDir.name);
    if curDir.isdir && !any(strcmp(curDir.name, {'.', '..'}))
      % recurse into subdirectories, skipping the . and .. entries
      scanFiles(curPath, extensions, fileHandler);
    elseif !curDir.isdir && !isempty(regexp(curDir.name, ['\.(?:' extensions ')$'], 'ignorecase'))
      total++;
      file = struct("name",curDir.name,
                    "path",curPath,
                    "parent",regexp(curDir.folder,'[^\\\/]*$','match'),
                    "bytes",curDir.bytes);
      fileHandler(file);
    endif
  end
  if !(--depth)
    printf('Total number of files:%d\n', total);
    total = 0;
  endif
endfunction
Usage
# txt
# textFileHandlerFunc=@(file)fprintf('%s',fileread(file.path));
# scanFiles("E:\\Examples\\project\\", "txt", textFileHandlerFunc);
# images
# imageFileHandlerFunc=@(file)imread(file.path);
# scanFiles("E:\\Examples\\project\\datasets\\", "jpg|png", imageFileHandlerFunc);
# list files
fileHandlerFunc=@(file)fprintf('path=%s\nname=%s\nsize=%d bytes\nparent=%s\n\n', ...
  file.path,file.name,file.bytes,file.parent);
scanFiles("E:\\Examples\\project\\", "txt", fileHandlerFunc);