I wish to use the Perl module IO::Uncompress::AnyUncompress, which is documented here: http://perldoc.perl.org/IO/Uncompress/AnyUncompress.html.
However, this documentation seems to gloss over the fact that a compressed archive (.zip, .7z) contains a tree of compressed files. I would like to extract only a single file from the archive, not the full archive. For example:
my $archivename = 'archive.7z';
my $filetoextract = './bin/file.lib';
my $archive = IO::Uncompress::AnyUncompress->new($archivename);
my $filecontent = $archive->extract($filetoextract);
However, the API does not seem to have such an extract() function, nor a function that would return the list of files contained in the archive.
Have I missed something?
IO::Uncompress::AnyUncompress only deals with a single compressed byte stream; it has no notion of the file tree inside an archive. You'll need an archive-aware module like Archive::Any, Archive::Any::Lite, or Archive::Libarchive::XS.
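As a minimal sketch: Archive::Any can list the members of an archive (its extract() unpacks the whole tree to a directory), and pulling a single member into memory is shown here with Archive::Zip, which is not in the list above and handles .zip but not .7z:
use Archive::Any;
use Archive::Zip qw(:ERROR_CODES);
# List the paths stored in the archive.
my $archive = Archive::Any->new('archive.zip');
my @members = $archive->files;    # e.g. 'bin/file.lib'
# Pull a single member into memory (zip archives only).
my $zip = Archive::Zip->new();
$zip->read('archive.zip') == AZ_OK or die "cannot read archive.zip";
my $filecontent = $zip->contents('bin/file.lib');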
The goal of my code is to look into a certain folder and write the names of all the files in that folder that aren't empty to one new text file, and the names of all the empty files (no text) to another file. My current code is only able to write the names of all the files (regardless of their content) to a single new file. I want to know how to set up an if statement that checks the content of each file.
function ListFile
    dirName = '';
    files = dir(fullfile(dirName, '*.txt'));
    files = {files.name};
    [fid, msg] = fopen('output.txt', 'w+t');
    assert(fid >= 0, msg)
    fprintf(fid, '%s\n', files{:});
    fclose(fid);
EDIT: The linked solution in Stewie Griffin's comment is way better. Use this!
A simple approach would be to iterate all files, open them, and check their content. Caveat: If you have large files, this approach might be memory intensive.
Possible code for that could look like this:
function ListFile
    dirName = '';
    files = dir(fullfile(dirName, '*.txt'));
    files = {files.name};
    fidEmpty = fopen('output_empty_files.txt', 'w+t');
    fidNonempty = fopen('output_nonempty_files.txt', 'w+t');
    for iFile = 1:numel(files)
        content = fileread(files{iFile})   % no terminating semicolon: echoes the content (the debugging output shown below)
        if (isempty(content))
            fprintf(fidEmpty, '%s\n', files{iFile});
        else
            fprintf(fidNonempty, '%s\n', files{iFile});
        end
    end
    fclose(fidEmpty);
    fclose(fidNonempty);
I have two non-empty files nonempty1.txt and nonempty2.txt as well as two empty files empty1.txt and empty2.txt. Running this code, I get the following outputs.
Debugging output from fileread:
content =
content =
content = Test
content = Another test
Content of output_empty_files.txt:
empty1.txt
empty2.txt
Content of output_nonempty_files.txt:
nonempty1.txt
nonempty2.txt
Matlab isn't really the optimal tool for this task (although it is capable). To generate the files you're looking for, a command line tool would be much more efficient.
For example, using GNU find you could do
find . -type f -not -empty -ls > notemptyfiles.txt
find . -type f -empty -ls > emptyfiles.txt
to create the text files you desire. Something comparable is possible using the Windows command line. You could also call these commands from within Matlab if you want to, using the system function. This would be much faster than iterating over the files from within Matlab.
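For example, calling the first of the two find commands through system might look like this (a minimal sketch; it assumes GNU find is available on your system's path):
[status, result] = system('find . -type f -not -empty -ls > notemptyfiles.txt');
assert(status == 0, 'find failed: %s', result)   % a status of 0 means success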
I have a perl script that reads a .txt and a .bam file, and creates an output called output.txt.
I have a lot of files that are all in different folders, but are only slightly different in the filename and directory path.
All of my txt files are in different subfolders called PointMutation, with the full path being
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
The text in brackets is the part that changes, but the Patient subfolder contains all of my txt files.
My .bam file is located in a subfolder named DNA with a full path of
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA
Currently, the way I run this script is to go to the terminal:
cd /Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
perl ~/Desktop/Scripts/Perl.pl "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation/txtfile.txt" "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA/bamfile.bam"
With only one or two files that is fairly easy, but I would like to automate it once the number of files grows. Also, once I have run the script I don't want to run it again, but I will get more information from the same patient; is there a way to block a folder from being read?
I would do something like:
for my $dir (glob "/Volumes/Lab/Data/Darwin/Patient/*/") {
    # skip anything that is not a directory
    next unless -d $dir;
    my $txt = "$dir/PointMutation/txtfile.txt";
    my $bam = "$dir/SequencingData/DNA/bamfile.bam";
    # ... your magical stuff here
}
This is assuming that all directories under /Volumes/Lab/Data/Darwin/Patient/ follow the convention.
That said, a more long-term/robust way of organizing analyses with lots of different files all over the place is either 1) to organize all files necessary for each analysis under one directory, or 2) to create meta files (I'd use JSON/YAML) which contain the necessary file names.
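A minimal sketch of option 2, using the core JSON::PP module (the analysis.json file name and the done flag are hypothetical choices; the done flag doubles as the "block a folder from being read" marker asked about above):
use strict;
use warnings;
use JSON::PP;
# One meta file per analysis, e.g. analysis.json:
#   { "txt": "/path/to/txtfile.txt", "bam": "/path/to/bamfile.bam", "done": 0 }
open my $fh, '<', 'analysis.json' or die "analysis.json: $!";
my $meta = JSON::PP->new->decode(do { local $/; <$fh> });
close $fh;
unless ($meta->{done}) {
    # run the analysis with the recorded file names
    # ($ENV{HOME} replaces ~, which system() does not expand in list form)
    system('perl', "$ENV{HOME}/Desktop/Scripts/Perl.pl", $meta->{txt}, $meta->{bam});
    # mark the analysis as done so it is skipped on the next run
    $meta->{done} = 1;
    open my $out, '>', 'analysis.json' or die "analysis.json: $!";
    print {$out} JSON::PP->new->pretty->encode($meta);
    close $out;
}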
I want to read multiple files from a folder, but this code does not work properly:
direction = dir('data');
for i = 3:length(direction)
    Fold_name = strcat('data\', direction(i).name);
    filename = fullfile(Fold_name);
    fileid = fopen(filename);
    data = fread(fileid)';
end
I modified your algorithm to make it simpler.
Just use this form:
folder = 'address\datafolder\';   % provide the folder address where your data is located
then:
filenames = dir([folder, '*.txt']);   % whatever your data format is, you can specify it here in case you have other files you do not want to import; this example uses .txt files
for k = 1:numel(filenames)
    % do your code
end
It should work. It's a much more efficient method, as it applies to any folder without you worrying about names, number order, etc., unless you want to specify certain files with the same format within the folder. I would recommend using a separate folder to put your files in.
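Put together with your fopen/fread loop, a minimal sketch could look like this (the folder name is a placeholder, and each file is closed after reading):
folder = 'address\datafolder\';                 % placeholder: your data folder
filenames = dir(fullfile(folder, '*.txt'));
data = cell(1, numel(filenames));
for k = 1:numel(filenames)
    fileid = fopen(fullfile(folder, filenames(k).name));
    data{k} = fread(fileid)';                   % raw bytes, as in your original code
    fclose(fileid);                             % close each file after reading
end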
In case you want to keep access to all the files after reading:
direction = dir('data');
for i = 3:length(direction)
    Fold_name = strcat('data\', direction(i).name);
    filename = fullfile(Fold_name);
    fileid(i) = fopen(filename);
    data{i-2} = fread(fileid(i))';
end
% note: the handles in fileid stay open; call fclose on each when you are done
I'm trying to process a list of files that start with the same string, but only the .mat files. In my folder I have log files with names such as:
CADS3P5Ph1_LKS_20141210_EVAL_103443_001.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_001_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_103443_002.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_002_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_103443_003.avi
CADS3P5Ph1_LKS_20141210_EVAL_103443_003_MeasData.mat
CADS3P5Ph1_LKS_20141210_EVAL_104236_001.avi
CADS3P5Ph1_LKS_20141210_EVAL_104236_001_MeasData.mat
I only need to process the files that have the same timestamp (e.g. 103443_xxx).
I made a variable using a wildcard:
filename = 'CADS3P5Ph1_LKS_20141210_EVAL_103443_001_MeasData.mat';
general_name = filename(1:end - 17);   % strip '_001_MeasData.mat'
general_name = strcat(general_name, '*');
So when I do dir(general_name), it finds all the files that start with "CADS3P5Ph1_LKS_20141210_EVAL_103443".
How do I get only the .mat files, and not the .avi files?
I tried
dir(general_name && *.mat)
Is there a way to make something like this work?
Thanks!
Using strcat with general_name and the wildcard character for .mat extensions should work:
dir(strcat(general_name,'*.mat'))
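Put together, a minimal sketch (the file name comes from the question; the trailing '*' from the earlier strcat call is dropped here, since the .mat pattern supplies its own wildcard):
filename = 'CADS3P5Ph1_LKS_20141210_EVAL_103443_001_MeasData.mat';
general_name = filename(1:end - 17);           % keeps '...EVAL_103443'
matfiles = dir(strcat(general_name, '*.mat')); % only the .mat files for that timestamp
for k = 1:numel(matfiles)
    % process matfiles(k).name here
end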
I have 4 folders in the same directory, where each folder contains ~19 .xls files. I have written the code below to obtain the name of each folder and the name of each .xls file within the folders.
path = 'E:\Practice';
folder = path;
dirListing = dir(folder);
dirListing = dirListing(3:end);   % first 2 entries are just the '.' and '..' pointers
for i = 1:length(dirListing)
    f{i} = fullfile(path, dirListing(i,1).name);   % obtain the name of each folder
    files{i} = dir(fullfile(f{i}, '*.xls'));       % find the .xls files
    for j = 1:length(files{1,i})
        File_Name{1,i}{j,1} = files{1,i}(j,1).name;   % find the name of each .xls file
    end
end
Now I'm trying to import the data from excel into matlab by using xlsread. What I'm struggling with is knowing how to load the data into matlab within a loop where the excel files are in different directories (different folders).
This leaves me with a 1x4 cell named File_Name, where each cell refers to a different folder located under 'path', and each cell in turn contains the names of the spreadsheets to be imported. The sizes of the cells vary, as the number of spreadsheets in each folder varies.
Any ideas?
Thanks in advance.
I'm not sure if I'm understanding your problem, but all you have to do is concatenate the strings that contain the directory (f{i}) and the file name. Modifying your code:
for i = 1:length(dirListing)
    f{i} = fullfile(path, dirListing(i,1).name);   % obtain the name of each folder
    files{i} = dir(fullfile(f{i}, '*.xls'));       % find the .xls files
    for j = 1:length(files{1,i})
        File_Name{1,i}{j,1} = files{1,i}(j,1).name;   % find the name of each .xls file
        fullpath = [f{i} '/' File_Name{1,i}{j,1}];
        disp(['Reading file: ' fullpath])
        x = xlsread(fullpath);
    end
end
This works on *nix systems. You may have to join the filenames with a '\' on Windows. I'll find a more elegant way and update this posting.
Edit: The command filesep gives the forward or backward slash, depending on your system. The following should give you the full path:
fullpath = [f{i} filesep File_Name{1,i}{j,1}];
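Alternatively, since fullfile is already used elsewhere in this code and inserts the correct separator itself, an equivalent form would be:
fullpath = fullfile(f{i}, File_Name{1,i}{j,1});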
Take a look at the rdir helper function, written by a member of the Matlab community.
It allows you to recursively search through directories to find files that match a certain pattern. This is a super handy function to use when looking to match files.
You should be able to find all your files in a single call to this function. Then you can loop through the results of the rdir function, loading the files one at a time into whatever data structure you want.
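For example, a minimal sketch (rdir is a third-party helper, not a built-in, so the exact pattern syntax may differ from this assumed form):
files = rdir('E:\Practice\**\*.xls');   % '**' recurses into subfolders
data = cell(1, numel(files));
for k = 1:numel(files)
    data{k} = xlsread(files(k).name);   % rdir's name field includes the path prefix
end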