How to rename a file inside a zip file without extracting it using Matlab commands - matlab

I have a bunch of zip archives that I have to extract in order to read the data (stored in a single file in each archive). The problem is that, because of some error, some of these archives contain two files with the same name instead of one. When I use the Matlab command "unzip", one of the files is overwritten by the other. These two files are not the same: one of them has the information I need, and the other one is almost empty. So I would like to rename these two files to file_a and file_b, extract them, and once both are extracted, keep only the larger one.
Do you know if there is any way to rename files inside a zip?

I made a function which modifies the file names inside the zip file so they can be uncompressed seamlessly.
The function locates the file names in the zip file and replaces the first letter of each one it encounters with the next letter of the sequence "A, B, C, D, etc.".
function differentiateFileNames(zipFilename)
%% get the filenames contained in the zip file
filenames = getZipFileNames(zipFilename) ;
nFiles = numel(filenames) ;
%% Find the positions of the file name fields
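% read the full file as raw bytes (one char per byte), so that the indices
% found below are true byte offsets whatever the default text encoding is
fid = fopen(zipFilename,'r') ;
str = fread(fid,[1 Inf],'uint8=>char') ;
fclose(fid) ;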
% if all filenames are identical, we only need to search for the first name
% in our list
idx = strfind( str , filenames{1} ) ;
%% group indices by physical file
% Each filename appears twice in the zip file:
% ex for 2 files: file1 ... file2 ... file1 ...file2
idx = reshape(idx,nFiles,2)-1 ;
%% Now modify each filename
% (replace the first character of each filename)
fid = fopen(zipFilename,'r+') ;
for k=1:nFiles
char2write = uint8('A'+(k-1)) ; % will be: A, B, C, D, etc ...
fseek(fid,idx(k,1),'bof') ;
fwrite(fid,char2write,'uint8') ;
fseek(fid,idx(k,2),'bof') ;
fwrite(fid,char2write,'uint8') ;
end
fclose(fid) ;
end
function filenames = getZipFileNames(zipFilename)
zipFile = [] ; % defined up front so the catch block can test it safely
try
% Create a Java file of the ZIP filename.
zipJavaFile = java.io.File(zipFilename);
% Create a Java ZipFile and validate it.
zipFile = org.apache.tools.zip.ZipFile(zipJavaFile);
% Extract the entries from the ZipFile.
entries = zipFile.getEntries;
catch exception
if ~isempty(zipFile)
zipFile.close;
end
error(message('MATLAB:unzip:invalidZipFile', zipFilename));
end
cleanUpObject = onCleanup(@()zipFile.close);
k = 0 ;
filenames = {} ;
while entries.hasMoreElements
k=k+1;
filenames{k,1} = char(entries.nextElement.getName) ;
end
zipFile.close
end
Be aware that this script assumes that all the files in the zip file have the same name. When it locates the file name positions, it only searches for the first file name found.
The subfunction getZipFileNames is just a stripped-down copy of parts of unzip.m, with only the content needed to read the file names contained in the zip file.
For testing:
I made a zip file containing 2 files:
New Text Document1.txt
New Text Document2.txt
I modified the file names inside the zip file with a hex editor, in order to have:
New Text Document1.txt
New Text Document1.txt
so both files have the same name in the archive. If I try to unzip that file, as you described, I only get one file in the output (the last file overwrites the other).
If I run differentiateFileNames(zipFilename), then unzip the file, I get 2 files in the output directory:
Aew Text Document1.txt
Bew Text Document1.txt
I know it can look a bit cryptic, but it ensures the files are differentiated. If you want, as an exercise, it wouldn't take much to extend the script to directly unzip the files, find the largest one, delete the other, then rename the remaining file with the proper original name.
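The extension suggested above could look roughly like the following sketch. It assumes that differentiateFileNames has already been run on the archive and that the archive contains only files, no subfolders. The function name keepLargestFile and its arguments are hypothetical and not part of the original answer.

```matlab
function keptFile = keepLargestFile(zipFilename, outputDir, originalName)
% Hypothetical helper (not from the original answer): extract an archive
% already processed by differentiateFileNames, keep only the largest
% extracted file and give it back its original name.
extracted = unzip(zipFilename, outputDir) ;   % cell array of extracted paths
sizes = zeros(numel(extracted),1) ;
for k = 1:numel(extracted)
    info = dir(extracted{k}) ;                % file size is in info.bytes
    sizes(k) = info.bytes ;
end
[~,iMax] = max(sizes) ;
% delete every extracted file except the largest one
for k = 1:numel(extracted)
    if k ~= iMax
        delete(extracted{k}) ;
    end
end
% rename the remaining file to the original name
keptFile = fullfile(outputDir, originalName) ;
movefile(extracted{iMax}, keptFile) ;
end
```

A call would then be something like keepLargestFile('data.zip', 'out', 'New Text Document1.txt'), where all three arguments are placeholders.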

Related

Converting multiple .txt files to .mat in the same folder

I have many .txt files that contain n rows and 7 columns each delimited with whitespace. I want to convert each file to .mat file and save that in the same folder.
I tried this but it's not working:
files = dir('*.txt');
for file = files'
data=importdata(file.name);
save(file.name, 'data');
end
While this works for a single file, I want to do it programmatically since the number of .txt files I have is very large:
data=importdata('myfile.txt');
save('myfile', 'data');
Thank you for your help
This should work
files = dir('*.txt');
for idx = 1:length(files)
file_name = files(idx).name;
fprintf("Processing File %s\n",file_name);
data=importdata(file_name);
[filepath,name,ext] = fileparts(fullfile(pwd,file_name));
save([name '.mat'],'data');
end
dir returns a structure array that you need to index into, so the for loop starts at 1 and keeps going until all the elements returned by dir have been processed.
Note that in the code I've also added a line that splits the file name (e.g. file1.txt) into the name and the extension. This is so we only use the name part and not the extension when creating the mat file.
@scotty3785's answer worked well, and this also worked for me in case somebody needs it:
files = dir('*.txt');
for i=1:length(files)
data=importdata(files(i).name);
save(erase(files(i).name,".txt"), 'data');
end

Loop through .fig files and group them into folders based on the file name

I have a lot of .fig files that are named like this: 20160922_01_id_32509055.fig, 20160921_02_id_53109418.fig and so on.
So I thought I would create a script that loops through all the .fig files in the folder and groups (copies) them into other folders based on the last number in the file name. Each folder is created based on the id number. Is this possible?
I have been looking on other solutions involving looping through folders but I am totally fresh. This would make it easier for me to check the .fig files while I am learning to do other stuff in Matlab.
All is possible with MATLAB! We can use dir to get all .fig files, then use regexp to get the numeric part of each filename and then use copyfile to copy the file to its new home. If you want to move it instead, you can use movefile.
% Define where the files are now and where you want them.
srcdir = '/my/input/directory';
outdir = '/my/output/directory';
% Find all .fig files in the source directory
figfiles = dir(fullfile(srcdir, '*.fig'));
figfiles = {figfiles.name};
for k = 1:numel(figfiles)
% Extract the last numeric part from the filename
numpart = regexp(figfiles{k}, '(?<=id_)\d+', 'match', 'once');
% Determine the folder we are going to put it in
destination = fullfile(outdir, numpart);
% Make sure the folder exists
if ~exist(destination, 'dir')
mkdir(destination)
end
% Copy the file there!
copyfile(fullfile(srcdir, figfiles{k}), destination)
end
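To see what the regular expression in the loop does on its own, here is a small check using one of the file names from the question. The lookbehind (?<=id_) matches only the digits that directly follow "id_".

```matlab
% Extract the id from one of the example file names in the question
name = '20160922_01_id_32509055.fig' ;
numpart = regexp(name, '(?<=id_)\d+', 'match', 'once') ;
disp(numpart)   % 32509055
```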
Here's an example how to identify and copy the files. I'll let you do the for loop :)
>> Figs = dir('*.fig'); % I had two .fig files on my desktop
>> Basename = strsplit(Figs(1).name, '.');
>> Id = strsplit(Basename{1}, '_');
>> Id = Id{3};
>> mkdir(fullfile('./',Id));
>> copyfile(Figs(1).name, fullfile('./',Id));
Play with the commands to see what they do. It should be straightforward :)

Matlab:renaming Files in a Sequential Order

I have a number of text files with no Sequential Order :
010010.txt 010030.txt 010070.txt
How could I change the file names to:
text01.txt text02.txt ....
Is it possible not to overwrite the old directory but to create a new one instead?
I have used the following script. It runs, but the numbering goes from text001.txt to text021.txt and then to text041.txt.
Any idea?
directory = 'C:\test\'; %//' Directory with txt files
filePattern = fullfile(directory, '*.txt'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.txt','') %// Get numbers associated with each file
file_ID_doublearr = str2double(file_ID)
file_ID_doublearr = file_ID_doublearr - min(file_ID_doublearr)+1
file_ID = strtrim(cellstr(num2str(file_ID_doublearr)))
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be pre-appended to each filename
new_filename = strcat('file',str_zeros,file_ID,'.txt') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
That looks pretty complicated. I would simply make a system call to move all of the files to a new directory, then sequentially rename each file one at a time with additional system calls. It also looks like you're using Windows, so I'll provide a solution for that platform. You have the beginning right where you are reading in the files from a source directory.
directory = 'C:\test\'; %// Directory with txt files
directoryToCopyOver = 'C:\out\'; %// Directory where you want to copy the files over
%// Copy source directory to target directory
system(['xcopy ' directory ' ' directoryToCopyOver]);
filePattern = fullfile(directoryToCopyOver, '*.txt'); %//' files pattern with absolute paths
names = dir(filePattern); %// Find all files with above pattern
%// For each file we have...
for idx = 1 : numel(names)
name = names(idx).name; %// Get a name of a file
%// Rename this file to textxx.txt
outName = sprintf('text%2.2d.txt', idx);
%// Call system and rename the file. Note that ren only accepts a path
%// on the source; the new name must be a bare file name
system(['ren "' fullfile(directoryToCopyOver, name) '" "' outName '"']);
end
An important thing to note is that I use system to make system calls to your Windows command prompt. I use xcopy to copy a whole directory from one location to another. In this case, this would be your source directory over to a new target directory. After I do this, I invoke MATLAB's dir to determine all of the file names that match the particular pattern you have laid out, which is all of the text files.
Then, for each text file name we have, we read in this name, then create an output name of the form textxx.txt, where xx is a number running from 1 to the number of text files we have, and then I invoke the Windows command prompt command ren to rename the file from the original name to the new name. Also, take a look at sprintf from MATLAB. It is designed to create strings using format specifiers. If you see how I called it, %2.2d means that I am expecting the number to be two digits long, and should the number be less than two digits, the remaining places are filled with zeros. If you want to increase the number of digits, simply increase both numbers. For example, if you want to have 4 digits, use %4.4d, and so on. This will properly create the right string so that we can rename the right file in this new directory.
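As a quick illustration of the zero padding described above, the sketch below builds a few names with sprintf. The %04d form is an alternative spelling that is not used in the answer, and the variable names are made up for the example.

```matlab
% Zero-padded file names built with sprintf
name2 = sprintf('text%2.2d.txt', 7) ;   % format used in the answer, two digits
name4 = sprintf('text%4.4d.txt', 7) ;   % same idea with four digits
alt4  = sprintf('text%04d.txt', 7) ;    % zero flag plus width: 'text0007.txt'
disp({name2 ; name4 ; alt4})
```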
Hope this helps!

possible to extract files according filename listed in a text file by using matlab?

I have thousands of files in a folder, but I only need to extract a few hundred of them into a new folder, according to the file names listed in a text file. The file names in the text file are listed as a column. Is it possible to do this with Matlab? What code do I need to write? Thanks.
example:
filenames.txt is in the C:\matlab
folder include thousand files is named as BigFiles also in C:\matlab
files to be extracted from BigFiles folder is listed in column as below:
filenames.txt
a1sndh
sd3rfe
rgd4de
sd5erw
please advise...thanks...
Enumerate all files in a folder of a specific type (if needed) using:
%main directory to process
directory = 'to_process';
%enumerate all files (.m in this case)
files = dir(fullfile(directory,'*.m'));
numfiles = length(files);
fprintf('Found %i files\n',numfiles)
Then you could load the single column using one of the many file I/O functions in Matlab.
Then just loop through all the input names, check each one against the names of the files found above (files(i).name), and move the file if there is a match.
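The matching step can also be done without an explicit loop. The sketch below is only an illustration: the file names and the .dat extension are made up, and in practice available would come from {files.name} and wanted from the text file.

```matlab
% Hypothetical data: names found by dir, and names read from the list
available = {'a1sndh.dat', 'zzz999.dat', 'sd3rfe.dat'} ;
wanted = {'a1sndh' ; 'sd3rfe' ; 'rgd4de'} ;
% Strip the extensions so the names can be compared with the list
[~, baseNames] = cellfun(@fileparts, available, 'UniformOutput', false) ;
% Keep only the files whose base name appears in the list
toMove = available(ismember(baseNames, wanted)) ;
disp(toMove)
```

Each entry of toMove can then be passed to movefile or copyfile together with the source and destination folders.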
EDIT:
From what I understood, you are looking for a solution along the lines:
filenames.txt
a.txt
b.txt
c.txt
.
.
.
moveMyFiles.m
%# read filenames listed in a text file
fid = fopen('C:\matlab\filenames.txt');
fList = textscan(fid, '%s');
fList = fList{1};
fclose(fid);
%# source/destination folder names
sourceDir = 'C:\matlab\BigFiles';
destDir = 'C:\matlab\out';
if ~exist(destDir,'dir')
mkdir(destDir);
end
%# move files one by one
for i=1:numel(fList)
movefile(fullfile(sourceDir,fList{i}), fullfile(destDir,fList{i}));
end
You can replace the MOVEFILE function by COPYFILE if you simply want to copy the files instead of moving them...

MATLAB - read files from directory?

I wish to read files from a directory and iteratively perform an operation on each file. This operation does not require altering the file.
I understand that I should use a for loop for this. Thus far I have tried:
FILES = ls('path\to\folder');
for i = 1:size(FILES, 1);
STRU = pdbread(FILES{i});
end
The error returned here suggests to me, a novice, that listing a directory with ls() does not assign the contents to a data structure.
Secondly I tried creating a file containing on each line a path to a file, e.g.,
C:\Documents and Settings\My Documents\MATLAB\asd.pdb
C:\Documents and Settings\My Documents\MATLAB\asd.pdb
I then read this file using the following code:
fid = fopen('paths_to_files.txt');
FILES = textscan(fid, '%s');
FILES = FILES{1};
fclose(fid);
This code reads the file but creates a newline where a space exists in the pathway, i.e.
'C:\Documents'
'and'
'Setting\My'
'Documents\MATLAB\asd.pdb'
Ultimately, I then intended to use the for loop
for i = 1:size(FILES, 1)
PDB = pdbread(char(FILES{i}));
to read each file but pdbread() throws an error proclaiming that the file is of incorrect format or does not exist.
Is this due to the newline separation of paths when the pathway file is read in?
Any help or suggestions greatly appreciated.
Thanks,
S :-)
First Get a list of all files matching your criteria:
( in this case pdb files in C:\My Documents\MATLAB )
matfiles = dir(fullfile('C:', 'My Documents', 'MATLAB', '*.pdb'))
Then read in a file as follows:
( Here i can vary from 1 to the number of files )
data = load(fullfile('C:', 'My Documents', 'MATLAB', matfiles(i).name))
Repeat this until you have read all your files.
A simpler alternative if you can rename your files is as follows:-
First save the required files as 1.pdb, 2.pdb, 3.pdb,... etc.
Then the code to read them iteratively in Matlab is as follows:
for i = 1:n % n is the number of files
str = fullfile('C:\My Documents\MATLAB', [int2str(i) '.pdb']);
data = load(str);
% use our logic here
% before proceeding to the next file
end
I copy this from yahoo answers! It worked for me
% copy-paste the following into your command window or your function
% first, you have to find the folder
folder = uigetdir; % check the help for uigetdir to see how to specify a starting path, which makes your life easier
% get the names of all files. dirListing is a struct array.
dirListing = dir(folder);
% loop through the files and open. Note that dir also lists the directories, so you have to check for them.
for d = 1:length(dirListing)
if ~dirListing(d).isdir
fileName = fullfile(folder,dirListing(d).name); % use full path because the folder may not be the active path
% open your file here
fid = fopen(fileName);
% do something
fclose(fid);
end % if-clause
end % for-loop
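Regarding the second attempt in the question, where textscan splits each path at the spaces: a possible fix is to tell textscan that the newline is the only delimiter. The sketch below writes a small list file first so that it runs on its own. The temporary file location and the second path are made up for the example.

```matlab
% Create a small list file for the demonstration (hypothetical paths)
listFile = fullfile(tempdir, 'paths_to_files.txt') ;
fid = fopen(listFile, 'w') ;
fprintf(fid, '%s\n', 'C:\Documents and Settings\My Documents\MATLAB\asd.pdb') ;
fprintf(fid, '%s\n', 'C:\Documents and Settings\My Documents\MATLAB\qwe.pdb') ;
fclose(fid) ;
% Read one path per line; with newline as the only delimiter,
% the spaces stay inside each path
fid = fopen(listFile) ;
FILES = textscan(fid, '%s', 'Delimiter', '\n') ;
FILES = FILES{1} ;
fclose(fid) ;
disp(FILES)
```

Each FILES{i} then holds one complete path and can be passed to pdbread as in the question.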