After reading .csv files in a directory, I want to save each of them as an .html file using its original file name. However, my code below carries the original extension (.csv) over into the new file names.
For example,
Original files: File1.csv, File2.csv
Result files: File1.csv.html, File2.csv.html
I want to remove ".csv" from the new file names.
import pandas as pd
import glob, os
os.chdir(r"C:\Users\.....\...")
for counter, file in enumerate(glob.glob("*.csv")):
    df = pd.read_csv(file, skipinitialspace=False, sep=',', engine='python')
    df.to_html(os.path.basename(file) + ".html")
The following attempt removed ".csv" but also ".html":
df.to_html(os.path.basename(file + ".html").split('.')[0])
My expectation is:
File1.html, File2.html
EDIT:
Another post, "How to get the filename without the extension from a path in Python?", suggested how to list existing files without their extensions. My issue, however, is to read existing files in a directory and save them under their original file names (minus the original extension) with a new extension.
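A minimal sketch of one common fix, assuming the goal is simply to swap the last extension for ".html": os.path.splitext splits "File1.csv" into ("File1", ".csv"), so the stem can be reused with the new suffix. The earlier attempt lost ".html" because .split('.')[0] was applied after ".html" had already been appended, and it keeps only the text before the first dot. The helper names html_name and convert_directory are hypothetical, and pandas is imported inside the conversion function only so the naming helper can be used on its own.

```python
import glob
import os


def html_name(csv_path):
    """Return the output file name: same base name, extension replaced by .html."""
    # splitext strips only the last extension, so "my.data.csv" -> "my.data"
    stem, _ext = os.path.splitext(os.path.basename(csv_path))
    return stem + ".html"


def convert_directory(directory):
    """Hypothetical helper: write one HTML table per .csv file in `directory`."""
    import pandas as pd  # imported here so html_name() can be used without pandas

    for csv_path in glob.glob(os.path.join(directory, "*.csv")):
        df = pd.read_csv(csv_path)
        # File1.csv -> File1.html, written next to the source file
        df.to_html(os.path.join(directory, html_name(csv_path)))
```

Called with your folder, convert_directory would write File1.html and File2.html next to File1.csv and File2.csv. pathlib offers the same idea via Path(csv_path).with_suffix(".html").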
I have a directory with several .txt files. I want to read all these files into a dataframe, but want to exclude one problematic file. Is there a way I can do this?
The files are named #100.1-YYYY1HH10MM.txt, #101.1-YYYY11HH20MM.txt, #102.1-YYYY9HH5MM.txt etc. You'll note that the file names are prefixed with an incremental number e.g. #100.1, #102.1 etc. If I want to read all these files except say file number #350.1, how can I do that? Not sure if I can use regex here.
from pyspark.sql.functions import *
filename = '/mnt/directory/*.txt' #Read all TXT files in the folder
filename = '/mnt/directory/#{1[0-4,7-9],[0,2-3][0-9]}.1.txt' #Try regex to filter out one file
If your lists are not too big, then using glob and a loop can be a simple solution:
import glob
dont_want = ['#350.1']
files = []
for x in glob.glob("path/*.txt"):
    # keep x only if it contains none of the unwanted prefixes
    if not any(y in x for y in dont_want):
        files.append(x)
# spark is assumed to be an existing SparkSession
df = spark.read.csv(files)
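Because the number sits at a fixed place in each name, another option is to parse the prefix with Python's re module and compare it exactly, so an excluded number never matches a longer prefix that merely contains it. This is a sketch assuming every name starts with "#<digits>.<digits>-" as in the examples; the function name exclude_numbers is hypothetical, and names that do not follow the pattern are kept.

```python
import os
import re

# Matches the numeric prefix of names such as "#100.1-YYYY1HH10MM.txt".
PREFIX_RE = re.compile(r"^#(\d+\.\d+)-")


def exclude_numbers(paths, excluded):
    """Return the paths whose '#NNN.N' prefix is not in `excluded`.

    Files whose names do not start with the '#NNN.N-' pattern are kept.
    """
    kept = []
    for path in paths:
        match = PREFIX_RE.match(os.path.basename(path))
        if match and match.group(1) in excluded:
            continue  # exact match on the prefix, e.g. "350.1"
        kept.append(path)
    return kept
```

For example, files = exclude_numbers(glob.glob("/mnt/directory/*.txt"), {"350.1"}) builds a list that can then be passed to spark.read.csv, as in the answer above.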
I hope one of you is willing to help a complete Python beginner.
I have managed to create my first script, which appends multiple Excel files in a folder into one merged file. So far so good!
But I also need the script to create an additional column and fill it with the last two characters of the file name of each file it appends.
My script looks like this for now:
import glob
import os
import pandas as pd
# folder containing the Excel files to be merged
path = "C:\\Users\\123\\OneDrive\\Descriptions\\Translated"
# read all the files with extension .xlsx i.e. excel
filenames = glob.glob(os.path.join(path, "*.xlsx"))
print('File names:', filenames)
# list that collects one data frame per Excel file
frames = []
# for loop to iterate all excel files
for file in filenames:
    # sheet_name=None reads every sheet into a dict of data frames,
    # which concat then stacks into a single frame
    df = pd.concat(pd.read_excel(file, sheet_name=None), ignore_index=True, sort=False)
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate the list once instead
outputxlsx = pd.concat(frames, ignore_index=True)
outputxlsx.to_excel(os.path.join(path, "Merged.xlsx"), index=False)
print('Final Excel sheet now generated at the same location:', path)
The files in the folder are named like this:
CZ, PL, TR_cs-CZ
CZ, PL, TR_pl-PL
CZ, PL, TR_tr-TR
So the last column should be:
CZ
PL
TR
Thank you!!
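A minimal sketch, assuming the two-letter code is always the last two characters of the name once the extension is removed: os.path.splitext drops ".xlsx", and slicing with [-2:] keeps the last two characters. Assigning a single string to a new column fills that column for every row read from the file. The helper names and the column name "Country" are hypothetical; pandas is imported inside the merge function only so the helper can be used without it.

```python
import glob
import os


def last_two_chars(path):
    """Return the last two characters of the file name, ignoring its extension."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem[-2:]


def merge_with_code_column(folder, column="Country"):
    """Hypothetical helper: merge every .xlsx file in `folder` into one data frame."""
    import pandas as pd  # imported here so last_two_chars() can be used without pandas

    frames = []
    for file in glob.glob(os.path.join(folder, "*.xlsx")):
        # sheet_name=None returns a dict of all sheets; concat stacks them
        df = pd.concat(pd.read_excel(file, sheet_name=None), ignore_index=True, sort=False)
        # a scalar assignment puts the same value in every row of the new column
        df[column] = last_two_chars(file)
        frames.append(df)
    # pd.concat raises ValueError if frames is empty (no .xlsx files found)
    return pd.concat(frames, ignore_index=True)
```

One caveat with writing the result into the same folder: a Merged.xlsx left over from an earlier run would be picked up by the *.xlsx pattern and tagged "ed", so it may be worth writing the output elsewhere or skipping that name.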
I am a real newbie in MATLAB programming. I have a problem writing code to import multiple CSV files from a certain folder into one:
This is my code:
%% Importing multiple CSV files
myDir = uigetdir; %gets directory
myFiles = dir(fullfile(myDir,'*.csv')); %gets all csv files in struct
for k = 1:length(myFiles)
data{k} = csvread(myFiles{k});
end
I use uigetdir so that data can be selected from any folder, because I am trying to build an automated program that others can use flexibly. The code I run only finds the directory and lists the files; it does not merge the CSV files into one or read the result in "import data". I want them merged and read as one file.
My merged file should be semicolon-delimited and consist of 47 CSV files merged together, looking like this (the picture shows one of my CSV files):
[image: my merged file]
I have been working on this for a whole day, but I always get an error. Please help me :( Thank you very much in advance for your help.
As the error message states, you're attempting to reference myFiles as a cell array when it is not. The output of dir is a structure array, which is indexed with parentheses, as in myFiles(k), not with the curly braces used for cell arrays.
You want to do something like the following:
for k = 1:numel(myFiles)
filepath = fullfile(myFiles(k).folder, myFiles(k).name);
data{k} = csvread(filepath);
end
I'm trying to process a list of files that start with the same string, but only the .mat files. In my folder I have log files with names such as:
CADS3P5Ph1_LKS_20141210_EVAL_103443_001.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_001_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_103443_002.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_002_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_103443_003.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_003_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_104236_001.avi
CADS3P5Ph1_LKS_20141210_EVAL_104236_001_MeasData.mat
I only need to process the files that have the same timestamp (e.g. 103443_xxx).
I built a search pattern with a wildcard:
filename = 'CADS3P5Ph1_LKS_20141210_EVAL_103443_001_MeasData.mat';
general_name = filename(1:end - 17); % drop "_001_MeasData.mat" (17 characters)
general_name = strcat(general_name,'*','');
So when I call dir(general_name), it finds all the files that start with "CADS3P5Ph1_LKS_20141210_EVAL_103443".
How do I get only the .mat files, and not the .avi files?
I tried
dir(general_name && *.mat)
Is there a way to make something like this work?
Thanks!
Using strcat to add the .mat extension after the wildcard should work. Since general_name already ends with '*', appending '.mat' is enough:
dir(strcat(general_name,'.mat'))
I have 4 folders in the same directory where each folder contains ~19 .xls files. I have written the code below to obtain the name of each of the folders and the name of each .xls file within the folders.
path='E:\Practice';
folder = path;
dirListing = dir(folder);
dirListing=dirListing(3:end);%first 2 are just pointers
for i=1:length(dirListing);
f{i} = fullfile(path, dirListing(i,1).name);%obtain the name of each folder
files{i}=dir(fullfile(f{i},'*.xls'));%find the .xls files
for j=1:length(files{1,i});
File_Name{1,i}{j,1}=files{1,i}(j,1).name;%find the name of each .xls file
end
end
Now I'm trying to import the data from Excel into MATLAB using xlsread. What I'm struggling with is how to load the data into MATLAB within a loop when the Excel files are in different directories (different folders).
This leaves me with a 1x4 cell named File_Name, where each cell refers to a different folder under 'path' and contains the names of the spreadsheets to be imported. The size of each cell varies because the number of spreadsheets in each folder varies.
Any ideas?
Thanks in advance.
I'm not sure I understand your problem, but all you have to do is concatenate the string that contains the directory (f{i}) with the file name. Modifying your code:
for i=1:length(dirListing);
f{i} = fullfile(path, dirListing(i,1).name);%obtain the name of each folder
files{i}=dir(fullfile(f{i},'*.xls'));%find the .xls files
for j=1:length(files{1,i});
File_Name{1,i}{j,1}=files{1,i}(j,1).name;%find the name of each .xls file
fullpath = [f{i} '/' File_Name{1,i}{j,1}];
disp(['Reading file: ' fullpath])
x = xlsread(fullpath);
end
end
This works on *nix systems. You may have to join the filenames with a '\' on Windows. I'll find a more elegant way and update this posting.
Edit: The command filesep gives the forward or backward slash, depending on your system. The following should give you the full path:
fullpath = [f{i} filesep File_Name{1,i}{j,1}];
Take a look at rdir, a helper function written by a member of the MATLAB community.
It recursively searches through directories to find files that match a certain pattern, which makes it very handy for matching files.
You should be able to find all your files in a single call to this function. Then you can loop through the results of the rdir function, loading the files one at a time into whatever data structure you want.