Perl - concatenate files with a similar name pattern and write the concatenated file names to a list

I have a directory with multiple sub-directories in it, and each subdir has a fixed set of files, one for each category, like:
1) Main_dir
   1.1) Subdir1 with files
        - Test.1.age.txt
        - Test.1.name.txt
        - Test.1.place.csv
        ..........
   1.2) Subdir2 with files
        - Test.2.age.txt
        - Test.2.name.txt
        - Test.2.place.csv
        .........
There are around 20 folders with 10 files in each. I need to first concatenate the files in each category, like Test.1.age.txt and Test.2.age.txt, into a Combined.age.txt file, and once all the concatenation is done I want to print these filenames to a new Final_list.txt file, like:
./Main_dir/Combined.age.txt
./Main_dir/Combined.name.txt
I am able to read all the files from all subdirs into an array, but I am not sure how to do a pattern search for the similar file names. I should be able to figure out the printout part of the code myself. Can anyone please share how to do this pattern search for the concatenation? My code so far:
use warnings;
use strict;
use File::Spec;
use Data::Dumper;
use File::Basename;

foreach my $file (@files) {
    print "$file\n";
}

my $testdir = './Main_dir';
my @Comp_list = glob("$testdir/test_dir*/*.txt");
I am trying to do the pattern search on the array contents in @Comp_list, which I surely need to learn:
foreach my $f1 (@Comp_list) {
    if ( $f1 =~ /^\..*\.txt$/ ) {    # still needs a pattern that picks out the category, e.g. "age"
        print $f1;                   # check if reading the file right
        # push it to a file using concatfile()
    }
}
Thanks a lot!

This should work for you. I've only tested it superficially, as it would take me a while to create some test data, so as you have some to hand I'm hoping you'll report back with any problems.
The program segregates all the files found by the equivalent of your glob call, and puts them in buckets according to their type. I've assumed that the names are exactly as you've shown, so the type is the penultimate field when the file name is split on dots; i.e. the type of Test.1.age.txt is age.
Having collected all of the file lists, I've used a technique that was originally designed to read through all of the files specified on the command line. If @ARGV is set to a list of files then an <ARGV> operation will read through all of those files as if they were one, and so they can easily be copied to a new output file.
If you need the files concatenated in a specific order then I will have to amend my solution. At present they will be processed in the order that glob returns them -- probably in lexical order of their file names, but you shouldn't rely on that.
use strict;
use warnings 'all';
use v5.14.0;    # For the autoflush method

use File::Spec::Functions 'catfile';

use constant ROOT_DIR => './Main_dir';

my %files;

my $pattern = catfile(ROOT_DIR, 'test_dir*', '*.txt');

for my $file ( glob $pattern ) {
    my @fields = split /\./, $file;
    my $type   = lc $fields[-2];
    push @{ $files{$type} }, $file;
}

STDOUT->autoflush;    # Get prompt reports of progress

for my $type ( keys %files ) {

    my $outfile = catfile(ROOT_DIR, "Combined.$type.txt");
    open my $out_fh, '>', $outfile or die qq{Unable to open "$outfile" for output: $!};

    my $files = $files{$type};

    printf qq{Writing aggregate file "%s" from %d input file%s ... },
            $outfile,
            scalar @$files,
            @$files == 1 ? '' : 's';

    local @ARGV = @$files;
    print $out_fh $_ while <ARGV>;

    print "complete\n";
}

I think it's easier if you categorize the files first; then you can work with them.
use warnings;
use strict;
use File::Spec;
use Data::Dumper;
use File::Basename;

my %hash = ();
my $testdir = './main_dir';
my @comp_list = glob("$testdir/**/*.txt");

foreach my $file (@comp_list) {
    $file =~ /(\w+\.\d\..+\.txt)/;
    next if not defined $1;
    my @tmp = split( /\./, $1 );
    if ( not defined $hash{ $tmp[-2] } ) {
        $hash{ $tmp[-2] } = [$file];
    }
    else {
        push @{ $hash{ $tmp[-2] } }, $file;
    }
}
print Dumper(\%hash);
Files:
main_dir
├── sub1
│   ├── File.1.age.txt
│   └── File.1.name.txt
└── sub2
    ├── File.2.age.txt
    └── File.2.name.txt
Result:
$VAR1 = {
          'age' => [
                     './main_dir/sub1/File.1.age.txt',
                     './main_dir/sub2/File.2.age.txt'
                   ],
          'name' => [
                      './main_dir/sub1/File.1.name.txt',
                      './main_dir/sub2/File.2.name.txt'
                    ]
        };
You can then create a loop to concatenate and combine the files.
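For instance, a minimal sketch of such a loop, reusing %hash and $testdir from above (the Combined.<type>.txt output names follow the question's convention):

# Sketch: concatenate each category's files into main_dir/Combined.<type>.txt
foreach my $type ( sort keys %hash ) {
    my $outfile = "$testdir/Combined.$type.txt";
    open my $out, '>', $outfile or die "Cannot open $outfile: $!";
    foreach my $infile ( @{ $hash{$type} } ) {
        open my $in, '<', $infile or die "Cannot open $infile: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out;
}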

Related

Need to loop through a directory and all of its subdirectories to find files of a certain size in Perl

I am attempting to loop through a directory and all of its sub-directories to see if the files within those directories are a certain size. But I am not sure whether the entries in the @files array still contain the file size so that I can compare the size (i.e. size <= value_size). Can someone offer any guidance?
use strict;
use warnings;
use File::Find;
use DateTime;

my @files;
my $dt = DateTime->now;
my $date = $dt->ymd;
my $start_dir = "/apps/trinidad/archive/in/$date";
my $empty_file = 417;

find( \&wanted, $start_dir );

for my $file ( @files )
{
    if ( `ls -ltr | awk '{print $5}'` <= $empty_file )
    {
        print "The file $file appears to be empty please check within the folder if this is empty"
    }
    else
        return;
}
exit;

sub wanted {
    push @files, $File::Find::name unless -d;
    return;
}
I think you could use this code instead of shelling out to awk.
(I don't understand why my $empty_file = 417; is considered an empty file size.)
if (-s $file <= $empty_file)
Also notice that you are missing an open and close brace for your else branch.
(I'm also unsure why you want to return: the first file found that is not 'empty' branches to the return, which doesn't do anything useful here because return is only meant to return from a subroutine.)
The exit is unnecessary, and so is the return in the wanted function.
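Putting those points together, the loop could be reduced to something like this (a sketch only, keeping the original message wording):

for my $file (@files) {
    if ( -s $file <= $empty_file ) {
        print "The file $file appears to be empty please check within the folder if this is empty\n";
    }
}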
Update: A File::Find::Rule solution could be used. Here is a small program that captures all files less than 14 bytes in my current directory and all of its subdirectories.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Find::Rule;

my $dir = '.';
my @files = find( file => size => "<14", in => $dir );
say -s $_, " $_" for @files;

Recursive grep in Perl

I am new to Perl. I have a directory structure, and in each directory there is a log file. I want to grep a pattern from each file and do post-processing. Right now I am grepping the pattern from those files using Unix grep, putting the results into a text file, and reading that text file to do the post-processing, but I want to automate the task of reading each file and grepping the pattern from it. In the code below, mdp_cgdis_1102.txt holds the grepped patterns from the directories. I would really appreciate any help.
#!/usr/bin/perl
use strict;
use warnings;

open FILE, 'mdp_cgdis_1102.txt' or die "Cannot open file $!";
my @array = <FILE>;

my @arr;
my @brr;
foreach my $i (@array) {
    @arr = split( /\//, $i );
    @brr = split( /\:/, $i );
    print " $arr[0] --- $brr[2]";
}
It is unclear to me which part of the process needs automating. I'll go by "want to automate reading each file and grepping pattern from that file," whereby you presumably already have a list of files. If you actually need to build the file list as well, see the added code below.
One way: pull all patterns from each file and store that in a hash (filename => arrayref-with-patterns)
my %file_pattern;

foreach my $file (@filelist) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    $file_pattern{$file} = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}
The [ ] takes a reference to the list returned by grep, i.e. constructs an "anonymous array", and that (reference) is assigned as a value to the $file key.
Now you can process your patterns, per log file
foreach my $filename (sort keys %file_pattern) {
    print "Processing log $filename.\n";
    my @patterns = @{ $file_pattern{$filename} };
    # Process the list of patterns in this log file
}
ADDED
In order to build the list of files @filelist used above from a known list of directories, use the core File::Find module, which recursively scans the supplied directories and applies the supplied subroutines.
use File::Find;

find( { wanted => \&process_logs, preprocess => \&select_logs }, @dir_list );
Your subroutine process_logs() is applied to each file/directory that passed preprocessing by the second sub, with its name available as $File::Find::name, and in it you can either populate the hash with patterns-per-log as shown above, or run complete processing as needed.
Your subroutine select_logs() contains code to filter the log files out of all the files in each directory that File::Find would normally process, so that process_logs() only gets the log files.
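As a rough sketch of what those two subroutines could look like, assuming the log files end in .log (the extension is an assumption here) and reusing %file_pattern and $pattern from above:

# Filter each directory listing: keep log files, and keep subdirectories
# so that File::Find can continue to descend into them
sub select_logs {
    return grep { -d or /\.log$/ } @_;
}

# Called for every entry that survives preprocessing
sub process_logs {
    return unless -f;    # skip directories, process plain files only
    open my $fh, '<', $_ or die "Can't open $File::Find::name: $!";
    $file_pattern{ $File::Find::name } = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}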
Another way would be to use the other invocation
find( \&process_all, @dir_list );
where now the sub process_all() is applied to all entries (files and directories) found and thus this sub itself needs to ensure that it only processes the log files. See linked documentation.
The equivalent of
find ... -name '*.txt' -type f -exec grep ... {} +
is
use File::Find::Rule qw( );

my $base_dir_qfn = ...;
my $re = qr/.../;

my @log_qfns =
    File::Find::Rule
        ->name(qr/\.txt\z/)
        ->file
        ->in($base_dir_qfn);

my $success = 1;
for my $log_qfn (@log_qfns) {
    open(my $fh, '<', $log_qfn)
        or do {
            $success = 0;
            warn("Can't open log file \"$log_qfn\": $!\n");
            next;
        };

    while (<$fh>) {
        print if /$re/;
    }
}

exit(1) if !$success;
Use File::Find to traverse the directory. In a loop, go through all the logfiles:

- Open the file
- Read it line by line
- For each line, do a regular expression match ( if ($line =~ /pattern/) ), or use if (index($line, $searchterm) >= 0) if you are looking for a certain static string
- If you find a match, print the line
- Close the file

I hope that gives you enough pointers to get started. You will learn more if you find out how to do each of these steps in Perl by yourself (I pointed out the hard ones).
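For reference, a minimal sketch of those steps using File::Find (the .log extension and the example pattern are assumptions for illustration):

use strict;
use warnings;
use File::Find;

my $pattern = qr/ERROR/;    # example pattern, adjust as needed

find( sub {
    return unless -f && /\.log$/;              # only plain files named *.log
    open my $fh, '<', $_ or die "Can't open $File::Find::name: $!";
    while ( my $line = <$fh> ) {
        print $line if $line =~ $pattern;      # print matching lines
    }
    close $fh;
}, '.' );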

How to open a directory and read the files inside that directory using Perl

I am trying to unzip files and count the matching characters in them, and after that I need to concatenate the files based on file names. I successfully achieved the first two steps, but I am having trouble with the third objective. This is the script I am using.
#!/usr/bin/perl
use strict;
use warnings;

print "Enter file name for Unzip\n";
print "File name: ";
chomp( my $Filename = <> );
system("gunzip -r ./$Filename\*\n");

print "Enter match characters";
chomp( my $match = <> );
system("grep -c '$match' ./$Filename/* > $Filename/output");

open my $fh, "/home/final_stage/test_(copy)";
if ( my $file = "sra_*_*_*_R1" )
{
    print $file;
}

system("mkdir $Filename/R1\n");
system("mkdir $Filename/R2\n");
Based on matching the file name pattern sra_*_*_*_R1 I have to concatenate those files and put the output in the R1 folder, and files matching sra_*_*_*_R2 go in the R2 folder.
Help me to complete this work; all suggestions are welcome!
#!/usr/bin/perl
use strict;
use warnings;
use Path::Class;
use autodie;    # die if problem reading or writing a file

my $dir  = dir("/tmp");             # /tmp
my $file = $dir->file("file.txt");

# Read in the entire contents of a file
my $content = $file->slurp();

# openr() returns an IO::File object to read from
my $file_handle = $file->openr();

# Read in a line at a time
while ( my $line = $file_handle->getline() )
{
    print $line;
}
Enjoy your day!

Reduce folder lists to lowest common folder

I have a giant list of file paths that are simply too large for our SCM to process. I need to whittle them down based on the lowest common level folder. For example, given the following paths:
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
Based on that, I would like to arrive at this:
//folder1/folder2
//folder1/folder3
//folderx/foldery
The folder list will be read from a text file, and is around 2M lines long.
Any help would be greatly appreciated.
This looks to be a good use for split() and hashes:
use strict;
use warnings;

my %seen;

foreach my $path ( @paths ) {
    $path =~ s|^//||;    # Strip off leading //
    my @elems = split( '/', $path );
    $seen{ $elems[0] }{ $elems[1] }++;
}

foreach my $rootpath ( sort keys %seen ) {
    foreach my $secondpath ( sort keys %{ $seen{$rootpath} } ) {
        print "//" . $rootpath . "/" . $secondpath . "\n";
    }
}
If you only want to print out paths that have been seen twice or more, insert a next unless $seen{$rootpath}{$secondpath} > 1; before the print().
I haven't tested this so there could be syntax errors, but the code gives the general gist.
How about:
#!/usr/local/bin/perl
use strict;
use warnings;
use 5.010;
my %out;

while (<DATA>) {
    chomp;
    m#^(//[^/]+/[^/]+)#;
    $out{$1} = 1;
}

say for keys %out;
__DATA__
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
output:
//folderx/foldery
//folder1/folder3
//folder1/folder2

How can I list files under a directory with a specific name pattern using Perl?

I have a directory /var/spool and inside that, directories named
a b c d e f g h i j k l m n o p q r s t u v x y z
And inside each "letter directory", a directory called "user" and inside this, many directories called auser1 auser2 auser3 auser4 auser5 ...
Every user directory contains mail messages and the file names have the following format: 2. 3. 4. 5. etc.
How can I list the email files for every user in every directory in the following way:
/var/spool/a/user/auser1/11.
/var/spool/a/user/auser1/9.
/var/spool/a/user/auser1/8.
/var/spool/a/user/auser1/10.
/var/spool/a/user/auser1/2.
/var/spool/a/user/auser1/4.
/var/spool/a/user/auser1/12.
/var/spool/b/user/buser1/12.
/var/spool/b/user/buser1/134.
/var/spool/b/user/buser1/144.
etc.
I need those files, and then to open every single file to modify the header and body. That part I already have, but I need the first part.
I am trying this:
my $dir = "/var/spool";
opendir( DIR, $dir ) || die "Could not open the directory $dir\n";
while ( my $filename = readdir(DIR) ) {
    my @directorios1 = `ls -l "$dir/$filename"`;
    print("@directorios1\n");
}
closedir(DIR);
But it does not work the way I need it to.
You can use File::Find.
As others have noted, use File::Find:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

find( \&find_emails => '/var/spool' );

sub find_emails {
    return unless /\A[0-9]+[.]\z/;
    return unless -f $File::Find::name;
    process_an_email($File::Find::name);
    return;
}

sub process_an_email {
    my ($file) = @_;
    print "Processing '$file'\n";
}
Use File::Find to traverse a directory tree.
For a fixed level of directories, sometimes it's easier to use glob than File::Find:
while (my $file = </var/spool/[a-z]/user/*/*>) {
    print "Processing $file\n";
}
People keep recommending File::Find, but the other piece that makes it easy is my File::Find::Closures, which provides the convenience functions for you:
use File::Find;
use File::Find::Closures qw( find_by_regex );

my( $wanted, $reporter ) = find_by_regex( qr/^\d+\.\z/ );

find( $wanted, @directories_to_search );

my @files = $reporter->();
You don't even need to use File::Find::Closures. I wrote the module so that you could lift out the subroutine you wanted and paste it into your own code, perhaps tweaking it to get what you needed.
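Lifted out and simplified, such a pair of closures might look roughly like this (a sketch, not the module's actual source):

use File::Find;

# Build a (wanted, reporter) pair by hand: wanted collects matching paths
# into a lexical array shared by both closures, reporter returns them later
sub my_find_by_regex {
    my ($regex) = @_;
    my @found;
    return (
        sub { push @found, $File::Find::name if /$regex/ },
        sub { return @found },
    );
}

my ( $wanted, $reporter ) = my_find_by_regex( qr/^\d+\.\z/ );
find( $wanted, '/var/spool' );
my @files = $reporter->();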
Try this:
sub browse($);

sub browse($)
{
    my $path = $_[0];

    # append a / if missing
    if ( $path !~ /\/$/ )
    {
        $path .= '/';
    }

    # loop through the files contained in the directory
    for my $eachFile ( glob( $path . '*' ) )
    {
        # if the file is a directory
        if ( -d $eachFile )
        {
            # browse directory recursively
            browse($eachFile);
        }
        else
        {
            # your file processing here
        }
    }
} # browse