I want to get the folder's last modified date and time through Perl. My code is:
my @dirs = grep { -d } glob "$SOME_DIR/*";
foreach my $dir (@dirs)
{
print($dir);
print((stat $dir)[9]); # this is the line with the problem
}
But it printed nothing except the subfolders under $SOME_DIR. I am SURE the $SOME_DIR path exists, since print($dir) works. Would anyone know why it can't get the last modified time of a directory? Thanks!
For me, with $SOME_DIR = ".";, I got the output:
./c-vs-c++1369283477./computist-1.dSYM1381934424./computist-2.dSYM1381934897./ll3.dSYM1381816690./syncio.dSYM1381984813./xs.dSYM1381986208
This mildly revised code:
#!/usr/bin/env perl
use strict;
use warnings;
my $SOME_DIR = ".";
my @dirs = grep { -d } glob "$SOME_DIR/*";
foreach my $dir (@dirs)
{
printf "%-20s - %d\n", $dir, (stat $dir)[9];
}
gave the output:
./c-vs-c++ - 1369283477
./computist-1.dSYM - 1381934424
./computist-2.dSYM - 1381934897
./ll3.dSYM - 1381816690
./syncio.dSYM - 1381984813
./xs.dSYM - 1381986208
You need to demonstrate what is in your $SOME_DIR. For example, you might use:
system "ls -l $SOME_DIR";
to show what you should be seeing.
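As an aside, the number printed is the modification time in epoch seconds. If you want it human-readable, here is a minimal sketch using the core POSIX module (the strftime format is just one choice):
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime);
my $SOME_DIR = ".";
foreach my $dir (grep { -d } glob "$SOME_DIR/*") {
    my $mtime = (stat $dir)[9];    # epoch seconds
    printf "%-20s - %s\n", $dir, strftime("%Y-%m-%d %H:%M:%S", localtime $mtime);
}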
Try this:
#!/usr/bin/perl
my @list = `ls -ltr abcdpathtodir | grep ^d | awk '{\$1=\$2=\$3=\$4=\$5=""; print \$0}'`;
foreach my $i (@list)
{
print " ---$i\n";
}
Or another way:
my @m = `ls -ltr dir | grep ^d | awk '{print \$6,\$7,\$8,\$9}'`;
foreach my $i (@m)
{
print "$i\n";
}
My requirement is to check whether a nested directory structure contains any binary file.
The directory structure looks something like this:
DIR-A
|
|--DIR-X
| |
| |--DIR-X1
| |--DIR-X2
|
|--DIR-Y
| |
| |--DIR-Y1
| |--DIR-Y2
| |--DIR-Y3
|
|--DIR-Z
| |
| |--DIR-Z1
| |--DIR-Z2
| |--DIR-Z3
At any point in time there can be more directories at level 1 or level 2, e.g. DIR-P or DIR-Q at level 1, and DIR-X3 or DIR-Y4 at level 2.
I have written some sample code, but it exits as soon as it finds DIR-X1. Ideally it should exit only if there is a binary file inside a directory.
#!/usr/bin/perl
my $someDir = "/path/of/DIR-A";
my #files = ();
my $file;
my $i=0;
opendir(DIR, $someDir) or die "Can't open $someDir: $!\n";
@files = readdir(DIR);
closedir(DIR);
foreach $file (@files)
{
unless ($file =~ /^[.][.]?\z/)
{
print "$i : $file \n";
$i++;
last;
}
}
if ($i != 0)
{
print "The directory contains files! \n";
exit 1;
}
else
{
print "This DIR-A is Empty! \n";
exit 0;
}
Please suggest how I can get to the expected solution below:
read DIR-A
print SUCCESS, if none of the nested directories have a binary file.
print ERROR, if at least one of the nested directories has a binary file.
Thanks!
Use File::Find::Rule
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find::Rule;
my $someDir = "/path/of/DIR-A";
my @files = File::Find::Rule->file()
->name('*.bin')
->in($someDir);
This will get you all files with the extension '.bin'.
If you need to perform a per-file test to check that they are 'binary', then you can use grep on your list of @files.
my @files = grep {-B} File::Find::Rule->file()
->in($someDir);
print "Binary files found\n" if #files;
Also:
use strict; use warnings;. It's good.
Code formatting is a really good thing. perltidy -pbp makes it easy.
I am unclear as to what a binary file is for your test. I am assuming that any file found in the directory structure traversed is a binary file. Using File::Find, which is a core module:
use File::Find;
my $error = 0;
find(\&wanted, @ARGV);
if( $error ) {
print "ERROR, $error files found\n";
}
else {
print "SUCCESS\n";
}
sub wanted {
if( -f $_ ) {
$error++;
}
}
You may add any test to the wanted function. The find function will invoke the provided function for each file found in the list of directories that is also passed, and those directories will be traversed recursively in depth-first order (much like the find command does). By passing it @ARGV you may invoke the script with a list of directories as required (maybe using shell expansion like DIR-*).
The test function will get the file name being traversed in $_, while the current working directory is set to the directory that contains the file.
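For example, a minimal sketch of a wanted function that only counts binary files, using Perl's built-in -B heuristic (adjust the test to whatever "binary" means for you):
sub wanted {
    if ( -f $_ && -B $_ ) {    # regular file whose contents look binary
        $error++;
    }
}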
You can use the script below to find recursively whether a binary file exists.
#! /usr/bin/env perl
use warnings;
use strict;
use File::Find;
my $path="/path/of/DIR-A";
sub find_binary {
my $file = $File::Find::name;
if (-B $file && ! -d $file) {
print "ERROR: At least one of the nested directories has a binary file : $file\n";
exit;
}
}
find(\&find_binary,$path);
print("SUCCESS: None of the nested directories have a binary file. \n");
Use (warning: my) module File::Globstar:
use v5.10;
use File::Globstar qw(globstar);
my $dir = $ARGV[0] // '.';
say join "\n", grep { -B && ! -d } globstar "$dir/**";
If you want the list of files in a list, assign it instead of printing it:
my @nondirs = grep { -B && ! -d } globstar "$dir/**";
If you know the extension of the files, you can also do this:
my @nondirs = grep { -B && ! -d } globstar "$dir/**/*.png";
Note that the file test -B produces a truthy value for empty files which is maybe not what you want. In that case change the test to -B && -s && ! -d.
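That is, the adjusted filter would look like:
my @nondirs = grep { -B && -s && ! -d } globstar "$dir/**";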
I use Perl scripts to run commands for several users in several VOBs on ClearCase. I have a list of the VOBs, which I read in from a text file. I then loop over that list and run whatever ClearCase command I am trying to do. However, this time the script does not seem to work. If I print the command to the screen, then go and copy and paste it at the prompt, it works fine. It just will not execute from the Perl script. The only difference I saw was the fmt characters, but even when I remove those it does not execute. I tried first putting the fmt on the line directly, then tried setting the format codes in variables. You will see the first comment line is the one that failed, but I left it there as an example of what I tried. The last two comments are from another script that I run like this that does work.
Code:
#! /usr/local/bin/perl -w
use strict;
open(VOBS,"vobs.txt") || die "Can't open: $!\n";
my $u = '%u';
my $a ='%Ad';
my $n ='%N/n';
my $user='john';
my $ct = '/usr/atria/bin/cleartool';
while(my $newvobs=<VOBS>){
chomp($newvobs);
my $tag = $newvobs;
print "\n $tag \n";
print " $ct lstype -kind brtype -invob $tag | grep $user ";
`$ct lstype -kind brtype -invob $tag | grep $user`;
# `/usr/atria/bin/cleartool lstype -kind brtype -invob $tag -fmt '%u %Ad %N/\n' `;
# print "\n cleartool rmtag -view $tag \n";
#`/usr/atria/bin/cleartool rmtag -view $tag `;
}
close(VOBS);
Actually your program runs, but it does not print anything.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my $cmd = "cat";
`$cmd $0 | grep warning`;
Output: (nothing)
Easiest fix: change the last line to
print `$cmd $0 | grep warning`;
Output:
use warnings;
print `$cmd $0 | grep warning`;
If you need the exit code, replace the last line with
my $exit = system("$cmd $0 | grep warning");
print $exit;
Output:
use warnings;
my $exit = system("$cmd $0 | grep warning");
0
Or use open to process output:
open my $fh, "$cmd $0 | grep warning|" or die;
while (<$fh>) { print $_; }
close $fh;
Output:
use warnings;
open my $fh, "$cmd $0 | grep warning|" or die;
But I could suggest something like below: using AUTOLOAD, the ClearCase commands can be used as if they were internal Perl commands.
#!/usr/bin/perl
use strict;
use warnings;
sub AUTOLOAD {
(my $sub = $::AUTOLOAD) =~ s/.*:://;
print "---\n";
system("time $sub #_");
print "---\n";
}
my $cmd = "cat";
eval "$cmd($0)";
Output:
---
#!/usr/bin/perl
use strict;
use warnings;
sub AUTOLOAD {
(my $sub = $::AUTOLOAD) =~ s/.*:://;
print "---\n";
system("time $sub #_");
print "---\n";
}
cat($0);
0.00user 0.00system 0:00.00elapsed 400%CPU (0avgtext+0avgdata 2112maxresident)k
0inputs+0outputs (0major+174minor)pagefaults 0swaps
---
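Applied to the original problem, the same trick could wrap cleartool subcommands. A sketch (assuming /usr/atria/bin/cleartool as in the question; the DESTROY guard keeps Perl's object cleanup from shelling out):
#!/usr/bin/perl
use strict;
use warnings;
sub AUTOLOAD {
    (my $sub = $::AUTOLOAD) =~ s/.*:://;
    return if $sub eq 'DESTROY';    # ignore object destruction
    system("/usr/atria/bin/cleartool $sub @_");
}
my $tag = 'some_vob';               # hypothetical VOB tag
lstype('-kind', 'brtype', '-invob', $tag);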
For reference, here is a perl script which uses -fmt, "Finding the latest baseline for a component":
Example:
ccperl StreamComp.pl mystream@\pvobtag | findstr {component}
Script "streamcomp.pl":
#!/usr/bin/perl -w
my $cmdout = `cleartool desc -fmt '%[latest_bls]CXp' stream:$ARGV[0]`;
my @baselines = split(/,/,$cmdout);
foreach $baseline (@baselines)
{
$compname=`cleartool desc -fmt '%[component]p' $baseline`;
printf("%-30s \t %s\n", $compname, $baseline);
}
It is a program working for a ClearCase UCM environment, but it could give you an idea of the kind of working statements (try without grep first) you could try to reproduce in your own base ClearCase program.
Fred here again with a little issue I'm having that I hope you guys can help me with.
I'm reviewing for midterms and going over an old file I found on here and I wanted to get it working. I can't find it on here anymore but I still have the source code so I'll make another question on it.
So here was his assignment:
Write a Perl script that will compare two directories for differences in regular files. All regular files with the same names should be tested with the Unix utility /usr/bin/diff -q, which will determine whether they are identical. A file in dir1 which does not have a similarly named file in dir2 will have its name printed after the string <<<, while a file in dir2 without a corresponding dir1 entry will be prefixed with the string >>>. If two files have the same name but are different, then the file name will be surrounded by > <.
Here is the script:
#!/usr/bin/perl -w
use File::Basename;
@files1 = `/usr/bin/find $ARGV[0] -print`;
chop @files1;
@files2 = `/usr/bin/find $ARGV[1] -print`;
chop @files2;
statement:
for ($i=1; @files1 >= $i; $i++) {
for ($x=1; @files2 >= $x; $x++) {
$file1 = basename($files1[$i]);
$file2 = basename($files2[$x]);
if ($file1 eq $file2) {
shift @files1;
shift @files2;
$result = `/usr/bin/diff -q $files1[$i] $files2[$x]`;
chop $result;
if ($result eq "Files $files1[$i] and $files2[$x] differ") {
print "< $file1 >\n";
next statement;
} else {
print "> $file1 <\n";
}
} else {
if ( !-e "$files1[$i]/$file2") { print ">>> $file2\n";}
unless ( -e "$files2[$x]/$file1") { print "<<< $file1\n";}
}
}
}
This is the output:
> file2 <
>>> file5
<<< file1
The output should be:
> file1 <
> file2 <
<<< file4
>>> file5
I already checked the files to make sure that they all match and such but still having problems. If anyone can help me out I would greatly appreciate it!
First off, always use these:
use strict;
use warnings;
It comes with a short learning curve, but they more than make up for it in the long run.
Some notes:
You should use the File::Find module instead of using a system call.
You start your loops at array index 1. In Perl, the first array index is 0, so you skip the first element.
Your loop condition is wrong. @files >= $x means you will iterate to 1 more than the max index (normally). You want either $x < @files or $x <= $#files (see the sketch after this list).
You should use chomp, which is a safer version of chop.
Altering the arrays you are iterating over is a sure way to cause yourself some confusion.
Why use if (! -e ...) and then unless (-e ...)? That surely just adds confusion.
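A minimal sketch of the usual 0-based iteration (the file names are made up):
my @files = ('a.txt', 'b.txt', 'c.txt');
for my $i (0 .. $#files) {    # visits every element exactly once
    print "$i: $files[$i]\n";
}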
And this part:
$file1 = basename($files1[$i]);
...
if ( !-e "$files1[$i]/$file2" )
Assuming @files1 contains file names and not just directories, this will never match anything. For example:
$file2 = basename("dir/bar.html");
$file1 = basename("foo/bar.html");
-e "foo/bar.html/bar.html"; # does not compute
I would recommend using hashes for the lookup, assuming you only want to match against identical file names and missing file names:
use strict;
use warnings;
use File::Find;
use List::MoreUtils qw(uniq);
my (%files1, %files2);
my ($dir1, $dir2) = @ARGV;
find( sub { $files1{$_} = $File::Find::name if -f }, $dir1);
find( sub { $files2{$_} = $File::Find::name if -f }, $dir2);
my @all = uniq(keys %files1, keys %files2);
for my $file (@all) {
my $result;
if ($files1{$file} && $files2{$file}) { # file exists in both dirs
$result = qx(/usr/bin/diff -q $files1{$file} $files2{$file});
# ... etc
} elsif ($files1{$file}) { # file only exists in dir1
} else { # file only exists in dir2
}
}
In the find() subroutine, $_ represents the base name, and $File::Find::name the name including path (which is suitable for use with diff). The -f check will assert that you only include regular files in your hash.
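One way the comparison branch might be completed (a sketch; diff -q exits with status 1 when the files differ, and system exposes that via $?):
system('/usr/bin/diff', '-q', $files1{$file}, $files2{$file});
if ( $? >> 8 ) {
    print "> $file <\n";    # same name, different contents
}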
I am trying to sort some data in bash. Data looks like below.
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
After sorting, it should look like
20110724.gz 3475
20110725.gz 3649
20110726.gz 1896
Basically, for a given date, the data are summed up. Can somebody help? Thanks.
hmm, hopefully I figure it out in a few days.
Here's a quick and dirty perl oneliner:
$ perl -e 'my %h = (); while (<>) { chomp; my ($fname, $count) = split; $h{$fname} += $count;} foreach my $k (sort keys %h) {print $k, " ", $h{$k}, "\n"}' < datafile
Here's a perl solution.
Usage: script.pl input.txt > output.txt
Code:
use warnings;
use strict;
use ARGV::readonly;
my %sums;
while (<>) {
my ($date, $num) = split;
$sums{$date} += $num;
}
for my $date (sort keys %sums) {
print "$date $sums{$date}\n";
}
Or as a one-liner:
$ perl -we 'my %h; while(<>) { ($d,$n)=split; $h{$d}+=$n; } print "$_ $h{$_}\n" for sort keys %h;' data2.txt
In case you do need a numerical sort on the dates:
sort { substr($a,0,8) <=> substr($b,0,8) } keys %sums;
You don't need Perl for doing that. Some shell trickery will help :)
sort -n -k1,8 <file | while true ; do
if ! read line ; then
test -n "$accfile" && echo $accfile $value
break
fi
line=$(echo $line | tr -s ' ' )
curfile=$(echo $line | cut -d\ -f1)
curvalue=$(echo $line | cut -d\ -f2)
if [ $curfile != "$accfile" ] ; then
# new file, output the last if not empty
test -n "$accfile" && echo $accfile $value
accfile=$curfile
value=$curvalue
else
value=$(expr $value \+ $curvalue)
fi
done
The -k parameter tells sort which fields to use for sorting. As the dates are written in a number-ordered format, a numeric sort (-n) works.
I have a folder with a large number of files, some of which have exactly the same contents. I want to remove files with duplicate contents, meaning if two or more files with duplicate content are found, I'd like to keep one of these files and delete the others.
Following is what I came up with, but I don't know if it works :) as I didn't try it yet.
How would you do it? Perl or general algorithm.
use strict;
use warnings;
my @files = <"./files/*.txt">;
my $current = 0;
while( $current <= $#files ) {
# read contents of $files[$current] into $contents1 scalar
my $compareTo = $current + 1;
while( $compareTo <= $#files ) {
# read contents of $files[$compareTo] into $contents2 scalar
if( $contents1 eq $contents2 ) {
splice(@files, $compareTo, 1);
# delete $files[$compareTo] here
}
else {
$compareTo++;
}
}
$current++;
}
Here's a general algorithm (edited for efficiency now that I've shaken off the sleepies -- and I also fixed a bug that no one reported)... :)
It's going to take forever (not to mention a lot of memory) if I compare every single file's contents against every other. Instead, why don't we compare their sizes first, and then compare checksums only for those files of identical size.
So when we calculate each file's size, we can use a hash table to do our matching for us, storing the matches together in arrayrefs:
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my %files_by_size;
foreach my $file (@ARGV)
{
push @{$files_by_size{-s $file}}, $file; # store filename in the bucket for this file size (in bytes)
}
Now we just have to pull out the potential duplicates and check if they are the same (by creating a checksum for each, using Digest::MD5), using the same hashing technique:
while (my ($size, $files) = each %files_by_size)
{
next if @$files == 1;
my %files_by_md5;
foreach my $file (@$files)
{
open my $filehandle, '<', $file or die "Can't open $file: $!";
# enable slurp mode
local $/;
my $data = <$filehandle>;
close $filehandle;
my $md5 = md5_hex($data);
push @{$files_by_md5{$md5}}, $file; # store filename in the bucket for this MD5
}
while (my ($md5, $files) = each %files_by_md5)
{
next if @$files == 1;
print "These files are equal: " . join(", ", @$files) . "\n";
}
}
-fini
Perl, with Digest::MD5 module.
use Digest::MD5 ;
%seen = ();
while( <*> ){
-d and next;
$filename="$_";
print "doing .. $filename\n";
$md5 = getmd5($filename) ."\n";
if ( ! defined( $seen{$md5} ) ){
$seen{$md5}="$filename";
}else{
print "Duplicate: $filename and $seen{$md5}\n";
}
}
sub getmd5 {
my $file = "$_";
open(FH,"<",$file) or die "Cannot open file: $!\n";
binmode(FH);
my $md5 = Digest::MD5->new;
$md5->addfile(FH);
close(FH);
return $md5->hexdigest;
}
If Perl is not a must and you are working on *nix, you can use shell tools
find /path -type f -print0 | xargs -0 md5sum | \
awk '($1 in seen){ print "duplicate: "$2" and "seen[$1] } \
( ! ($1 in seen ) ) { seen[$1]=$2 }'
md5sum *.txt | perl -ne '
chomp;
($sum, $file) = split(" ");
push @{$files{$sum}}, $file;
END {
foreach (keys %files) {
shift @{$files{$_}};
unlink @{$files{$_}} if @{$files{$_}};
}
}
'
Perl is kinda overkill for this:
md5sum * | sort | uniq -w 32 -D | cut -b 35- | tr '\n' '\0' | xargs -0 rm
(If you are missing some of these utilities or they don't have these flags/functions,
install GNU findutils and coreutils.)
Variations on a theme:
md5sum *.txt | perl -lne '
my ($sum, $file) = split " ", $_, 2;
unlink $file if $seen{$sum} ++;
'
No need to go and keep a list, just to remove one from the list and delete the rest; simply keep track of what you've seen before, and remove any file matching a sum that's already been seen. The 2-limit split is to do the right thing with filenames containing spaces.
Also, if you don't trust this, just change the word unlink to print and it will output a list of files to be removed. You can even tee that output to a file, and then rm $(cat to-delete.txt) in the end if it looks good.
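For example, a dry-run variant might look like this (a sketch; to-delete.txt is just an assumed scratch file):
md5sum *.txt | perl -lne '
    my ($sum, $file) = split " ", $_, 2;
    print $file if $seen{$sum}++;
' | tee to-delete.txt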
a bash script is more expressive than perl in this case:
md5sum * |sort -k1|uniq -w32 -d|cut -f2 -d' '|xargs rm
I'd recommend that you do it in Perl, and use File::Find while you're at it.
Who knows what you're doing to generate your list of files, but you might want to combine it with your duplicate checking.
perl -MFile::Find -MDigest::MD5 -e '
my %m;
find(sub{
if(-f && -r){
open(F,"<",$_) or return;  # open by basename; find() has chdir'ed into the file's directory
binmode F;
$d=Digest::MD5->new->addfile(F);
if(exists($m{$d->hexdigest})){
$m{$d->hexdigest}[5]++;
push @{$m{$d->hexdigest}[0]}, $File::Find::name;
}else{
$m{$d->hexdigest} = [[$File::Find::name],0,0,0,0,1];
}
close F
}},".");
foreach $d (keys %m) {
if ($m{$d}[5] > 1) {
print "Probable duplicates: ".join(" , ",$m{$d}[0])."\n\n";
}
}'
Here is a way of filtering by size first and by md5 checksum second:
#!/usr/bin/perl
use strict; use warnings;
use Digest::MD5 qw( md5_hex );
use File::Slurp;
use File::Spec::Functions qw( catfile rel2abs );
use Getopt::Std;
my %opts;
getopt('de', \%opts);
$opts{d} = '.' unless defined $opts{d};
$opts{d} = rel2abs $opts{d};
warn sprintf "Checking %s\n", $opts{d};
my $files = get_same_size_files( \%opts );
$files = get_same_md5_files( $files );
for my $size ( keys %$files ) {
for my $digest ( keys %{ $files->{$size}} ) {
print "$digest ($size)\n";
print "$_\n" for #{ $files->{$size}->{$digest} };
print "\n";
}
}
sub get_same_md5_files {
my ($files) = @_;
my %out;
for my $size ( keys %$files ) {
my %md5;
for my $file ( @{ $files->{$size} } ) {
my $contents = read_file $file, {binmode => ':raw'};
push @{ $md5{ md5_hex($contents) } }, $file;
}
for my $k ( keys %md5 ) {
delete $md5{$k} unless @{ $md5{$k} } > 1;
}
$out{$size} = \%md5 if keys %md5;
}
return \%out;
}
sub get_same_size_files {
my ($opts) = @_;
my $checker = defined($opts->{e})
? sub { scalar ($_[0] =~ /\.$opts->{e}\z/) }
: sub { 1 };
my %sizes;
my @files = grep { $checker->($_) } read_dir $opts->{d};
for my $file ( @files ) {
my $path = catfile $opts->{d}, $file;
next unless -f $path;
my $size = (stat $path)[7];
push @{ $sizes{$size} }, $path;
}
for my $k (keys %sizes) {
delete $sizes{$k} unless @{ $sizes{$k} } > 1;
}
return \%sizes;
}
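A possible invocation (a sketch; dups.pl is a hypothetical script name, -d names the directory to scan and -e an optional extension filter, per the getopt('de', ...) call above):
perl dups.pl -d /path/to/files -e txt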
You might want to have a look at how I find duplicate files and remove them, though you will have to modify it to your needs.
http://priyank.co.in/remove-duplicate-files