how do I check if a domain name is available (in bulk)? - perl

I am looking to do a bulk domain name lookup to see if some domain names are available for purchase. I couldn't find a perl module, but it seems like there should be a way to do this in perl. I am looking for something free. thx!

From here: http://www.webhostingtalk.com/showthread.php?t=625723
Here’s a quick perl script that requires Net :: DNS (fairly common module).
#!/usr/bin/perl
# Domains Available
# Josh Skidmore <josh#vorcado.com>
# 05 August 2007 | 11:40p EST
# Requirements
use Net::DNS;
# Variables
%VAR = (
db => './domains.txt',
);
# Open file
open (DB,$VAR{'db'});
my (#domains) = <DB>;
close (DB);
# Test domains
foreach my $domain (#domains)
{
chomp($domain);
my ($available) = &check_domain(domain => $domain);
if ($available)
{
print "$domain is available.<br />\n";
}
else
{
print "$domain is NOT available<br />\n";
}
}
sub check_domain {
# Test domain for existance
# Josh Skidmore <josh#vorcado.com>
# 05 August 2007 | 11:42p EST
# Variables
my (%DATA) = #_ ;
my ($available) = 0;
# Start Net::DNS
my $res = Net::DNS::Resolver->new;
$res->udp_timeout(2);
$res->tcp_timeout(2);
my ($domain) = $res->search($DATA{'domain'});
if ($domain)
{
($available) = 1;
}
# Output
return ($available);
}

There are plenty of whois-like modules on CPAN. Net::Whois::Parser looks promising, for one.
Barring that, the whois command is available in Linux and other Unix-y systems (including Cygwin for windows). There's also a whois script in Perl on CPAN. Running those programs is the easy part. The tricky part, which Perl can definitely help with, is parsing the output from those programs.

Related

Subroutine returning a string warns about "Useless use of a constant in void context"

I have a simple function blurb in a module that returns some text
package Il::NetApp::Dox::FlashCache;
use strict;
use warnings;
no if $] >= 5.018, warnings => "experimental::smartmatch";
use Carp;
use Carp::Heavy;
use Data::Dumper;
use FindBin qw($Bin);
use Il::SysMon::Tools 3.2.1 qw( :debug :special_chars :text_format :help_snippets);
use Il::NetApp::Tools qw( :help_snippets );
# ===========================================================================
# = Texte - ab hier wird übersetzt =
# ===========================================================================
# ===========================================================================
# Markdown Syntax in blurb, extra und examples=>txt!
#
# Verwendbare Variablen in den Texten:
#
# $VERBOSE_HINT = Hinweis -v einzusetzen
#
# ===========================================================================
sub blurb {
q{Checks several metrics of NetApps FlashCache (PAM II).}; # Line 27
}
sub extra {
<<END_EXTRA,
This plugin checks various performance counters of the NetApp-system.
A list of supported counters is printed via the `--counter` switch.
$HELP_DISCOVER_COUNTERS
END_EXTRA
}
#
# Examples: Hier ist jeweils nur txt => zu übersetzen
#
sub simple_examples {
my $examples =
[
{
cmd => q{--explore=counters},
txt => q{List all available and supported counters on the target system.}
},
{
cmd => q{-z hit_percent -w 0 -c 0},
txt => q{Monitor the hitrate for trendanalyses but do not alarm.}
},
]
; # Ende von my $examples =
return $examples;
}
# sub advanced_examples {
# my $examples =
# [
# {
# cmd => q{},
# txt => q{}
# },
# ]
# ; # Ende von my $examples =
# return $examples;
# }
# ===========================================================================
# = ENDE der Texte - ab hier ist nichts mehr zu übersetzen =
# ===========================================================================
1; # return true
On one server we are getting occasional warnings:
Useless use of a constant ("Checks several metrics of "...) in void context at .../lib/Il/NetApp/Dox/FlashCache.pm line 27.
A Perl subroutine returns the value of the last statement executed if it is an expression, and this technique has worked before. I can't reproduce the problem with Perl v5.10.1 or v5.18.2.
The site having these warnings is running Perl v5.16.3
# perl --version
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 33 registered patches, see perl -V for more detail)
Could this be a bug in a specific Perl version?
Void context is a context where there's nothing to consume what's been returned.
The warnings occurs where the sub is defined with an empty prototype:
use warnings;
sub blurb () { q(some string) }
blurb();
Can you show NetApp/Dox/FlashCache.pm line 27?
Oh no, finally we have found the reason for that error message "Useless use of a constant" on some customers server. The problem had been introduced by a bug in the distribution chain on the way to the customer.
The code we have sent was (example, reduced):
#!/usr/local/bin/perl -w
use warnings;
use strict;
use feature ":5.10";
say blurb();
say "\n\n\nDESCRIPTION and EXAMPLES\n\n" . extra();
sub blurb {
q{I am the blurb - print me wherever you want me to be shown.};
}
sub extra {
<<END;
Some extra documentation - spanning several lines including indents.
EXAMPLES:
# so something
do_something.pl -H <hostname>
Does something with hostname.
HINT: Da also lag der Hund begraben (German saying, "that's where the dog is buried" translates to "there's the fly in the ointment")
END
}
The code which had actually been run on the customers server and which caused that error was slightly different:
#!/usr/local/bin/perl -w
use warnings;
use strict;
use feature ":5.10";
say blurb();
say "\n\n\nDESCRIPTION and EXAMPLES\n\n" . extra();
sub blurb {
q{I am the blurb - print me wherever you want me to be shown.};
my $status;
}
sub extra {
<<END;
Some extra documentation - spanning several lines including indents.
EXAMPLES:
# so something
do_something.pl -H <hostname>
Does something with hostname.
HINT: Da also lag der Hund begraben (German saying, "that's where the dog is buried" translates to "there's the fly in the ointment")
END
}
The cause for the error-message was the additional line in the blurb-sub.
Conclusions
Beside of fixing the distribution chain future code will be made a bit more robust by either storing the text in a variable and returning that variable later or leaving out the last semicolon which would have thrown a clear error during compile-time and would have saved hours of debugging.
#!/usr/local/bin/perl -w
use warnings;
use strict;
use feature ":5.10";
say blurb();
say "\n\n\nDESCRIPTION and EXAMPLES\n\n" . extra();
sub blurb {
q{I am the blurb - print me wherever you want me to be shown.}
}
sub extra {
my $extra =
<<END;
Some extra documentation - spanning several lines including indents.
EXAMPLES:
# so something
do_something.pl -H <hostname>
Does something with hostname.
HINT: Da also lag der Hund begraben (German saying, "that's where the dog is buried" translates to "there's the fly in the ointment")
END
return $extra;
}
Sorry for the first miss-leading code-example and thanks for all your comments!

Trying to input variable into url and having encoding issues

I am new to Perl and trying to make a script that takes input from the user and then get XML data from a website based on that input together with a url and then relay it back to the user.
But I have had some issues now with make a usable link based on the input from the user.
This is my code in full:
use strict;
use warnings;
my $row = 0;
use XML::LibXML;
print "\n\n\nOn what place do you need a weather report for? -> ";
chomp( my $ort = <> );
my $url = join('', "http://www.yr.no/place/Sweden/Västra_Götaland/",$ort,"/forecast_hour_by_hour.xml");
my $dom = XML::LibXML->load_xml(location => $url);
print "\n\nSee below the weather for ", $ort, ":\n\n";
foreach my $weatherdata ($dom->findnodes('//time')) {
if($row != 10){
my $temp = $weatherdata->findvalue('./temperature/#value');
my $value = $weatherdata->findvalue('./#from');
my $valuesub = substr $value, 11, 5;
print "At ", $valuesub, " the temperature will be: ", $temp, "C\n";
$row++;
}
}
print "\n\n";
If I write a place I want the weather info on. For example:
Mellerud
Then it takes that and I get a response from the link with propper data.
However. If I Write
Åmål
Its not making any sense to the script. I now get:
Could not create file parser context for file
"http://www.yr.no/place/Sweden/V├ñstra_G├Âtaland/Åmål/forecast_hour_by_hour.xml":
No error at test4.pl line 14
If I replace ",$ort," and just add Åmål I get the propper result.
I have been searching for different types of encoding for this, but I have not found a solution that works.
Once again I would like to point out that I am really new to this. I might miss something really simple. My apologies for that.
::EDIT 1::
After suggestion from #zdim I added use open ':std', ':encoding(UTF-8)';
This added some different results, but does only generate more error as following here:
Also I am running this in Windows CMD under administrator privileges.
According to #zdim its running fine in linux with xterm for input, v5.16.
Is there a way to make it work in Windows?
The problem is that CMD.exe is limited to 8-bit codepages. The "Å" and "å" characters are mapped (in Swedish Windows) to positions in the upper 8-bit range of codepage 850 that are illegal code points in Unicode.
If you need to output non-7-bit-ASCII characters, consider running PowerShell ISE. If you set it up correctly, it can cope with any character (in output) that the font you're using supports. The big downside is that PowerShell ISE is not a console, and therefore doesn't allow input from console/keyboard using STDIN. You can work around this by supplying your input as arguments, from a pipe, in a setting file, or thru graphical UI query elements.
To set up Windows PowerShell ISE to work with UTF8:
Set PowerShell to allow running local unsigned user scripts by running (in administrator elevated PowerShell):
Set-ExecutionPolicy RemoteSigned
Create or edit the file "<Documents>\WindowsPowerShell\Microsoft.PowerShellISE_profile.ps1" and add something like:
perl -w -e 'print qq!Initializing with Perl...\n!;'
[System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8;
(You need the Perl bit (or something equivalent) there to allow for the
modification of the encoding.)
In PowerShell ISE's options, set the font to Consolas.
In your perl scripts, always do:
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');
My solution to the OP's problem:
use strict;
use warnings;
my $row = 0;
use XML::LibXML;
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');
#ARGV or die "No arguments!\n";
my $ort = shift #ARGV;
print "\n\n\nGetting weather report for \"$ort\"\n";
my $url = join('', "http://www.yr.no/place/Sweden/Västra_Götaland/",$ort,"/forecast_hour_by_hour.xml");
my $dom = XML::LibXML->load_xml(location => $url);
print "\n\nSee below the weather for ", $ort, ":\n\n";
foreach my $weatherdata ($dom->findnodes('//time')) {
if($row != 10){
my $temp = $weatherdata->findvalue('./temperature/#value');
my $value = $weatherdata->findvalue('./#from');
my $valuesub = substr $value, 11, 5;
print "At ", $valuesub, " the temperature will be: ", $temp, "C\n";
$row++;
}
}
print "\n\n";
Output:
(run at around 2018-06-09T14:05 UTC; 16:05 CEST (which is Sweden's time zone)):
PS (censored)> perl -w $env:perl5lib\Tests\Amal-Test.pl "Åmål"
Getting weather report for "Åmål"
See below the weather for Åmål:
At 17:00 the temperature will be: 27C
At 18:00 the temperature will be: 26C
At 19:00 the temperature will be: 25C
At 20:00 the temperature will be: 23C
At 21:00 the temperature will be: 22C
At 22:00 the temperature will be: 21C
At 23:00 the temperature will be: 20C
At 00:00 the temperature will be: 19C
At 01:00 the temperature will be: 18C
At 02:00 the temperature will be: 17C
Another note:
Relying on data to always be in an exact position in a string might not be the best idea.
Instead of:
my $valuesub = substr $value, 11, 5;
maybe consider matching it with a regular expression instead:
if ($value =~ /T((?:[01]\d|2[0-3]):[0-5]\d):/) {
my $valuesub = $1;
print "At ", $valuesub, " the temperature will be: ", $temp, "C\n"; }
else {
warn "Malformed value: $value\n";
}

Perl - Trouble with my unzip system call for zip file crack

I am a junior currently taking a scripting languages class that is suppose to spit us out with intermediate level bash, perl, and python in one semester. Since this class is accelerated, we speed through topics quickly and our professor endorses using forums to supplement our learning if we have questions.
I am currently working on our first assignment. The requirement is to create a very simple dictionary attack using a provided wordlist "linux.words" and a basic bruteforce attack. The bruteforce needs to compensate for any combination of 4 letter strings.
I have used print statements to check if my logic is sound, and it seems it is. If you have any suggestions on how to improve my logic, I am here to learn and I am all ears.
This is on Ubuntu v12.04 in case that is relevant.
I have tried replacing the scalar within the call with a straight word like unicorn and it runs fine, obviously is the wrong password, and it returns correctly. I have done this both in terminal and in the script itself. My professor has looked over this for a good 15 minutes he could spare, before referring me to forum, and said it looked good. He suspected that since I wrote the code using Notepad++ there might be hidden characters. I rewrote the code straight in the terminal using vim and it gave the same errors above. The code pasted is below is from vim.
My actual issue is that my system call is giving me problems. It returns the help function for unzip showing usages and other help material.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
#Prototypes
sub brute();
sub dict();
sub AddSlashes($);
### ADD SLASHES ###
sub AddSlashes($)
{
my $text = shift;
$text =~ s/\\/\\\\/g;
$text =~ s/'/\\'/g;
$text =~ s/"/\\"/g;
$text =~ s/\\0/\\\\0/g;
return $text;
}
### BRUTEFORCE ATTACK ###
sub brute()
{
print "Bruteforce Attack...\n";
print "Press any key to continue.\n";
if (<>)
{
#INCEPTION START
my #larr1 = ('a'..'z'); #LEVEL 1 +
foreach (#larr1)
{
my $layer1 = $_; #LEVEL 1 -
my #larr2 = ('a'..'z'); #LEVEL 2 +
foreach (#larr2)
{
my $layer2 = $_; # LEVEL 2 -
my#larr3 = ('a'..'z'); #LEVEL 3 +
foreach (#larr3)
{
my $layer3 = $_; #LEVEL 3 -
my#larr4 = ('a'..'z'); #LEVEL 4 +
foreach (#larr4)
{
my $layer4 = $_;
my $pass = ("$layer1$layer2$layer3$layer4");
print ($pass); #LEVEL 4 -
}
}
}
}
}
}
### DICTIONARY ATTACK ###
sub dict()
{
print "Dictionary Attack...\n"; #Prompt User
print "Provide wordlist: ";
my $uInput = "";
chomp($uInput = <>); #User provides wordlist
(open IN, $uInput) #Bring in wordlist
or die "Cannot open $uInput, $!"; #If we cannot open file, alert
my #dict = <IN>; #Throw the wordlist into an array
foreach (#dict)
{
print $_; #Debug, shows what word we are on
#next; #Debug
my $pass = AddSlashes($_); #To store the $_ value for later use
#Check pass call
my $status = system("unzip -qq -o -P $pass secret_file_dict.zip > /dev/null 2>&1"); #Return unzip system call set to var
#Catch the correct password
if ($status == 0)
{
print ("Return of unzip is ", $status, " and pass is ", $pass, "\n"); #Print out value of return as well as pass
last;
}
}
}
### MAIN ###
dict();
exit (0);
Here is my error
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
aerify
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
-p extract files to pipe, no messages -l list files (short format)
-f freshen existing files, create none -t test compressed archive data
-u update files, create if necessary -z display archive comment only
-v list verbosely/show version info -T timestamp archive to latest
-x exclude files that follow (in xlist) -d extract files into exdir
modifiers:
-n never overwrite existing files -q quiet mode (-qq => quieter)
-o overwrite files WITHOUT prompting -a auto-convert any text files
-j junk paths (do not make directories) -aa treat ALL files as text
-U use escapes for all non-ASCII Unicode -UU ignore any Unicode fields
-C match filenames case-insensitively -L make (some) names lowercase
-X restore UID/GID info -V retain VMS version numbers
-K keep setuid/setgid/tacky permissions -M pipe through "more" pager
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
aerifying
It is obviously not complete. In the main I will switch the brute(); for dict(); as needed to test. Once I get the system call working I will throw that into the brute section.
If you need me to elaborate more on my issue, please let me know. I am focused here on learning, so please add idiot proof comments to any thing you respond to me with.
First: DO NOT USE PERL'S PROTOTYPES. They don't do what you or your professor might wish they do.
Second: Don't write homebrew escaping routines such as AddSlashes. Perl has quotemeta. Use it.
Your problem is not with the specific programming language. How much time your professor has spent on your problem, how many classes you take are irrelevant to the problem. Focus on the actual problem, not all the extraneous "stuff".
Such as, what is the point of sub brute? You are not calling it in this script, it is not relevant to your problem, so don't post it. Narrow down your problem to the smallest relevant piece.
Don't prompt for the wordlist file in the body of dict. Separate the functionality into bite sized chunks so in each context you can focus on the problem at hand. Your dict_attack subroutine should expect to receive either a filehandle or a reference to an array of words. To keep memory footprint low, we'll assume it's a filehandle (so you don't have to keep the entire wordlist in memory).
So, your main looks like:
sub main {
# obtain name of wordlist file
# open wordlist file
# if success, call dict_attack with filehandle
# dict_attack returns password on success
}
Now, you can focus on dict_attack.
#!/usr/bin/perl
use strict;
use warnings;
main();
sub dict_attack {
my $dict_fh = shift;
while (my $word = <$dict_fh>) {
$word =~ s/\A\s+//;
$word =~ s/\s+\z//;
print "Trying $word\n";
my $pass = quotemeta( $word );
my $cmd = "unzip -qq -o -P $pass test.zip";
my $status = system $cmd;
if ($status == 0) {
return $word;
}
}
return;
}
sub main {
my $words = join("\n", qw(one two three four five));
open my $fh, '<', \$words or die $!;
if (my $pass = dict_attack($fh)) {
print "Password is '$pass'\n";
}
else {
print "Not found\n";
}
return;
}
Output:
C:\...> perl y.pl
Trying one
Trying two
Trying three
Trying four
Trying five
Password is 'five'

ksh perl script.. if condition

Friends...
I have got bash script which calls perl script and emails logfile result everytime.
I want to change my bash script such that it should only email if there is value in perl subroutine row counter (rcounter++) and not all time.
any tips on how to change .ksh file?
.ksh
#!/bin/ksh
d=`date +%Y%m%d`
log_dir=$HOME
output_file=log.list
if ! list_tables -login /#testdb -outputFile $output_file
then
mailx -s "list report : $d" test#mail < $output_file
fi
=======Below if condition also works for me=============================
list_tables -login /#testdb -outputFile $output_file
if ["$?" -ne "0"];
then
mailx -s "list report : $d" test#mail < $output_file
fi
========================================================================
Perl Script: list_tables
use strict;
use Getopt::Long;
use DBI;
use DBD::Oracle qw(:ora_types);
my $exitStatus = 0;
my %options = ()
my $oracleLogin;
my $outputFile;
my $runDate;
my $logFile;
my $rcounter;
($oracleLogin, $outputFile) = &validateCommandLine();
my $db = &attemptconnect($oracleLogin);
&reportListTables($outputFile);
$db->$disconnect;
exit($rcounter);
#---------------------------
sub reportListTables {
my $outputFile = shift;
if ( ! open (OUT,">" . $outputfile)) {
&logMessage("Error opening $outputFile");
}
print OUT &putTitle;
my $oldDB="DEFAULT";
my $dbcounter = 0;
my $i;
print OUT &putHeader();
#iterate over results
for (my $i=0; $i<=$lstSessions; $i++) {
# print result row
print OUT &putRow($i);
$dbCounter++;
}
print OUT &putFooter($dbCounter);
print OUT " *** Report End \n";
closeOUT;
}
#------------------------------
sub putTitle {
my $title = qq{
List Tables: Yesterday
--------------
};
#------------------------------
sub putHeader {
my $header = qq{
TESTDB
==============
OWNER Table Created
};
#------------------------------
sub putRow {
my $indx = shift;
my $ln = sprintf "%-19s %-30s %-19s",
$lstSessions[$indx]{owner},
$lstSessions[$indx]{object_name},
$lstSessions[$indx]{created};
return "$ln\n";
}
#------------------------------
sub getListTables {
my $runDt = shift;
$rcounter = 0;
my $SQL = qq{
selct owner, object_name, to_char(created,'MM-DD-YYYY') from dba_objects
};
my $sth = $db->prepare ($SQL) or die $db->errstr;
$sth->execute() or die $db->errstr;;
while (my #row = $sth->fethcrow_array) {
$lstSessions[$rcounter] {owner} =$row[0];
$lstSessions[$rcounter] {object_name} =$row[1];
$lstSessions[$rcounter] {created} =$row[2];
&logMessage(" Owner: $lstSessions[$rcounter]{owner}");
&logMessage(" Table: $lstSessions[$rcounter]{object_name}");
&logMessage(" created: $lstSessions[$rcounter]{created}");
$rcounter++;
}
&logMessage("$rcounter records found...");
}
thanks..
also happy to include mail-x part in perl if that makes life more easy..
I am not sure I understood your question correctly. Also, your code is incomplete. So there's some guessing involved.
You cannot check the value of a local Perl variable from the caller's side.
But if your question is if the Perl code added anything to the logfile, the solution is simple: Delete the "rcounter records found..." line (which doesn't make sense anyway since it is always executed, whether the query returned results or not). Then, let the shell script backup the logfile before the call to Perl, and make a diff afterwards, sending the mail only if diff tells you there has been output added to the logfile.
If this doesn't help you, please clarify the question.
EDIT (from comments below):
Shell scripting isn't that difficult. Right now, your Perl script ends with:
$db->($exitStatus);
That is your exit code. You don't check that in your shell script anyway, so you could change it to something more useful, like the number of data rows written. A primitive solution would be to make $rcounter global (instead of local to getListTables()), by declaring it at the top of the Perl script (e.g. after my $logFile;). Then you could replace the "exitStatus" line above with simply:
$rcounter;
Voila, your Perl script now returns the number of data rows written.
In Perl, a return code of 0 is considered a failure, any other value is a success. In shell, it's the other way around - but luckily you don't have to worry about that as Perl knows that and "inverts" (negates) the return code of a script when returning to the calling shell.
So all you need is making the mailing depend on a non-zero return of Perl:
if list_tables -login /#testdb -outputFile $output_file
then
mailx -s "list report : $d" test#mail < $output_file
fi
A comment on the side: It looks to me as if your programming skill isn't up to par with the scope of the problem you are trying to solve. If returning a value from Perl to bash gives you that much trouble, you should probably spend your time with tutorials, not with getting input from a database and sending emails around. Learn to walk before you try to fly...

Perl script to find all unowned files and directories on Unix - How can I optimize further? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
Following my findings and suggestions in my other post How to exclude a list of full directory paths in find command on Solaris, I have decided to write a Perl version of this script and see how I could optimize it to run faster than a native find command. So far, the results are impressive!
The purpose of this script is to report all unowned files and directories on a Unix system for audit compliance. The script has to accept a list of directories and files to exclude (either by full path or wildcard name), and must take as little processing power as possible. It is meant to be run on hundreds of Unix system that we (the company I work for) support, and has be able to run on all those Unix systems (multiple OS, multiple platforms: AIX, HP-UX, Solaris and Linux) without us having to install or upgrade anything first. In other words, it has to run with standard libraries and binaries we can expect on all systems.
I have not yet made the script argument-aware, so all arguments are hard-coded in the script. I plan on having the following arguments in the end and will probably use getopts to do it:
-d = comma delimited list of directories to exclude by path name
-w = comma delimited list of directories to exclude by basename or wildcard
-f = comma delimited list of files to exclude by path name
-i = comma delimited list of files to exclude by basename or wildcard
-t:list|count = Defines the type of output I want to see (list of all findinds, or summary with count per directory)
Here is the source I have done so far:
#! /usr/bin/perl
use strict;
use File::Find;
# Full paths of directories to prune
my #exclude_dirs = ('/dev','/proc','/home');
# Basenames or wildcard names of directories I want to prune
my $exclude_dirs_wildcard = '.svn';
# Full paths of files I want to ignore
my #exclude_files = ('/tmp/test/dir3/.svn/svn_file1.txt','/tmp/test/dir3/.svn/svn_file2.txt');
# Basenames of wildcard names of files I want to ignore
my $exclude_files_wildcard = '*.tmp';
my %dir_globs = ();
my %file_globs = ();
# Results will be sroted in this hash
my %found = ();
# Used for storing uid's and gid's present on system
my %uids = ();
my %gids = ();
# Callback function for find
sub wanted {
my $dir = $File::Find::dir;
my $name = $File::Find::name;
my $basename = $_;
# Ignore symbolic links
return if -l $name;
# Search for wildcards if dir was never searched before
if (!exists($dir_globs{$dir})) {
#{$dir_globs{$dir}} = glob($exclude_dirs_wildcard);
}
if (!exists($file_globs{$dir})) {
#{$file_globs{$dir}} = glob($exclude_files_wildcard);
}
# Prune directory if present in exclude list
if (-d $name && in_array(\#exclude_dirs, $name)) {
$File::Find::prune = 1;
return;
}
# Prune directory if present in dir_globs
if (-d $name && in_array(\#{$dir_globs{$dir}},$basename)) {
$File::Find::prune = 1;
return;
}
# Ignore excluded files
return if (-f $name && in_array(\#exclude_files, $name));
return if (-f $name && in_array(\#{$file_globs{$dir}},$basename));
# Check ownership and add to the hash if unowned (uid or gid does not exist on system)
my ($dev,$ino,$mode,$nlink,$uid,$gid) = stat($name);
if (!exists $uids{$uid} || !exists($gids{$gid})) {
push(#{$found{$dir}}, $basename);
} else {
return
}
}
# Standard in_array perl implementation
sub in_array {
my ($arr, $search_for) = #_;
my %items = map {$_ => 1} #$arr;
return (exists($items{$search_for}))?1:0;
}
# Get all uid's that exists on system and store in %uids
sub get_uids {
while (my ($name, $pw, $uid) = getpwent) {
$uids{$uid} = 1;
}
}
# Get all gid's that exists on system and store in %gids
sub get_gids {
while (my ($name, $pw, $gid) = getgrent) {
$gids{$gid} = 1;
}
}
# Print a list of unowned files in the format PARENT_DIR,BASENAME
sub print_list {
foreach my $dir (sort keys %found) {
foreach my $child (sort #{$found{$dir}}) {
print "$dir,$child\n";
}
}
}
# Prints a list of directories with the count of unowned childs in the format DIR,COUNT
sub print_count {
foreach my $dir (sort keys %found) {
print "$dir,".scalar(#{$found{$dir}})."\n";
}
}
# Call it all
&get_uids();
&get_gids();
find(\&wanted, '/');
print "List:\n";
&print_list();
print "\nCount:\n";
&print_count();
exit(0);
If you want to test it on your system, simply create a test directory structure with generic files, chown the whole tree with a test user you create for this purpose, and then delete the user.
I'll take any hints, tips or recommendations you could give me.
Happy reading!
Try starting with these, then see if there's anything more you can do.
Use hashes instead of the arrays that need to be searched using in_array(). This is so you can do a direct hash lookup in one step instead of converting the entire array to a hash for every iteration.
You don't need to check for symlinks because they will be skipped since you have not set the follow option.
Maximise your use of _; avoid repeating IO operations. _ is a special filehandle where the file status information is cached whenever you call stat() or any file test. This means you can call stat _ or -f _ instead of stat $name or -f $name. (Calling -f _ is more than 1000x faster than -f $name on my machine because it uses the cache instead of doing another IO operation.)
Use the Benchmark module to test out different optimisation strategies to see if you actually gain anything. E.g.
use Benchmark;
stat 'myfile.txt';
timethese(100_000, {
a => sub {-f _},
b => sub {-f 'myfile.txt'},
});
A general principle of performance tuning is find out exactly where the slow parts are before you try to tune it (because the slow parts might not be where you expect them to be). My recommendation is to use Devel::NYTProf, which can generate an html profile report for you. From the synopsis, on how to use it (from the command line):
# profile code and write database to ./nytprof.out
perl -d:NYTProf some_perl.pl
# convert database into a set of html files, e.g., ./nytprof/index.html
# and open a web browser on the nytprof/index.html file
nytprofhtml --open