Perl File Globbing Oddities - perl

I'm writing a script that will loop through a range of numbers, build a glob pattern, and test if a file exists in a directory based on the glob.
The images are NASCAR car number images and follow this pattern:
1_EARNHARDTGANASSI_256.TGA
2_PENSKERACING_256.TGA
Here is a snippet of the script that I am using:
foreach $currCarNum (0..101) {
    if (glob("//headshot01/CARS/${currCarNum}_*_256.TGA")) {
        print("Car image $currCarNum exists\n");
    } else {
        print("Car image $currCarNum doesn't exist\n");
    }
}
The problem is that some images that exist in the directory, and that should match the glob pattern, are reported as missing.
For example, the file with the following name returns as not existing:
2_PENSKERACING_256.TGA
Whereas, the following returns as existing:
1_EARNHARDTGANASSI_256.TGA
If I use the same file glob pattern in DOS or Cygwin, both files are listed properly.
Are file glob patterns interpreted differently in Perl? Is there something I am missing?

You need to call glob in list context instead of scalar context. Try this for your if statement; it worked for me when I tested it:
if (my @arr = glob("//headshot01/CARS/${currCarNum}_*_256.TGA")) {

From perldoc perlop:
A (file)glob evaluates its (embedded)
argument only when it is starting a
new list. All values must be read
before it will start over. In list
context, this isn't important because
you automatically get them all anyway.
However, in scalar context the
operator returns the next value each
time it's called, or undef when the
list has run out.
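
The list-context fix can be tried without the network share. The following is a minimal sketch, not the asker's actual setup: it assumes a temporary directory (standing in for //headshot01/CARS) whose path contains no whitespace, and the helper name car_image_exists is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for //headshot01/CARS: a temporary directory
# holding two empty files named like the car images in the question.
my $dir = tempdir(CLEANUP => 1);
for my $name ('1_EARNHARDTGANASSI_256.TGA', '2_PENSKERACING_256.TGA') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# Assigning to an array puts glob in list context, so every call expands
# its pattern afresh and returns all matches at once.
sub car_image_exists {
    my ($car_dir, $num) = @_;
    my @matches = glob("$car_dir/${num}_*_256.TGA");
    return scalar @matches;
}

for my $num (0 .. 3) {
    my $state = car_image_exists($dir, $num) ? "exists" : "doesn't exist";
    print "Car image $num $state\n";
}
```

With these two files, cars 1 and 2 are reported as existing and cars 0 and 3 as missing. In the original scalar-context loop, the call for car 2 would instead receive the undef that marks the end of car 1's list.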

Related

My Perl variable to variable substitutions do not work

I have a substitution to make in a Perl script, which I do not seem to get working. I have a string in a text file which has the form:
T+30H
The string T+30H has to be written in many files and has to change from file to file. The number has two digits, sometimes three. First I define the variable:
my $wrffcr=qr{T+\d+H};
After reading the file containing the string, I have the following substitution command (starting with the file capture)
@scrptlines=<$NCLSCRPT>;
foreach $scrptlines (@scrptlines) {
    $scrptlines =~ s/$wrffcr/T+$fcrange2[$jj]H/g;
}
$fcrange2[$jj] is defined, and I confirm its value by printing it just before the above 4 lines of code.
print "$fcrange2[$jj]\n";
When I run my script, nothing changes for this particular substitution. I suspect it is to do with the way I define the string to be substituted.
I will appreciate any assistance.
Zilore Mumba
Watch out for the first + in my $wrffcr=qr{T+\d+H};. It makes the pattern match one or more Ts rather than a T followed by a literal +, so it never matches T+30H. You probably want
my $wrffcr=qr{T\+\d+H};
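
A small sketch of the difference, using in-memory lines instead of the asker's file. The sample lines and the $new_hours variable are made up for illustration; $new_hours stands in for $fcrange2[$jj].

```perl
use strict;
use warnings;

my $unescaped = qr{T+\d+H};    # one or more Ts, then digits, then H
my $escaped   = qr{T\+\d+H};   # T, a literal +, digits, then H

# The unescaped pattern cannot match because "+" is not a digit.
print "unescaped: ", ("T+30H" =~ $unescaped ? "match" : "no match"), "\n";
print "escaped:   ", ("T+30H" =~ $escaped   ? "match" : "no match"), "\n";

# Hypothetical file contents and replacement value.
my @lines     = ("title = T+30H\n", "label = T+120H\n");
my $new_hours = 48;

# ${new_hours} keeps Perl from reading the variable name as $new_hoursH.
s/$escaped/T+${new_hours}H/g for @lines;
print @lines;
```

In the replacement part of s///, the + needs no escaping because that side is treated as an interpolated string, not as a pattern.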

What does $variable{$2}++ mean in Perl?

I have a two-column data set in a tab-separated .txt file, and the perl script reads it as FH and this is the immediate snippet of code that follows:
while(<FH>)
{
chomp;
s/\r//;
/(.+)\t(.+)/;
$uniq_tar{$2}++;
$uniq_mir{$1}++;
push @{$mir_arr{$1}}, $2;
push @{$target{$2}}, $1;
}
When I try to print any of the above 4 variables, it says the variables are uninitialized.
And, when I tried to print $uniq_tar{$2}++; and $uniq_mir{$1}++;
It just prints some numbers which I cannot understand.
I would just like to know what this part of the code does in general:
$uniq_tar{$2}++;
The while loop puts each line of your file, in turn, into Perl's special variable $_.
/.../ is the match operator. By default it works on $_.
/(.+)\t(.+)/ is a regular expression inside the match operator. If the regex matches what is in $_, then the bits of the matching string that are inside the two pairs of parentheses are stored in Perl's special variables $1 and $2.
You have hashes called %uniq_tar and %uniq_mir. You access individual elements in a hash using $hashname{key}. So, $uniq_tar{$2} is finding the value in %uniq_tar associated with the key that is stored in $2 (that is, the part of your record after the tab).
$variable++ increments the number in $variable. So $uniq_tar{$2}++ increments the value that we found in the previous paragraph.
So, as zdim says, it's a frequency counter. You read each line in the file, and extract the bits of data before and after the tab in the line. You then increment the values in two hashes to count the number of occurrences of each of the strings.
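
The counting can be seen in isolation with a few in-memory records. This is only a sketch: the record contents are invented, and it skips lines that do not match rather than reusing stale $1 and $2 as the original loop would.

```perl
use strict;
use warnings;

# Hypothetical two-column, tab-separated records (one with a stray \r).
my @records = ("miR-1\tGENE_A", "miR-1\tGENE_B", "miR-2\tGENE_A\r");

my (%uniq_mir, %uniq_tar);
for my $line (@records) {
    (my $clean = $line) =~ s/\r//;            # strip a carriage return, if any
    next unless $clean =~ /(.+)\t(.+)/;       # $1 = first column, $2 = second
    $uniq_mir{$1}++;                          # count occurrences of column 1
    $uniq_tar{$2}++;                          # count occurrences of column 2
}

for my $key (sort keys %uniq_tar) {
    print "$key seen $uniq_tar{$key} time(s)\n";
}
```

Here GENE_A is counted twice and GENE_B once. Printing the expression $uniq_tar{$2}++ itself shows the count before the increment, which is why the asker saw bare numbers.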

How to perl convert xml (name with pattern) to json?

The following command converts test.xml to JSON:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test.xml "))'
But I need to convert any XML file whose name matches the pattern /tmp/test-*.xml (for example test-1.xml, test-2.xml, test-3.xml, test-4.xml). If I use:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test-*.xml "))'
I get the following message:
File does not exist: /tmp/test-*.xml at -e line 1
How do I do it?
There are several problems with what you're trying to do:
XML::Simple isn't simple. It's for simple XML. It'll mangle your XML and give inconsistent results. See: Why is XML::Simple "Discouraged"?
XML is fundamentally more complicated than JSON, so there's no linear transformation. You need to figure out what you'd do with attributes and duplicate elements for a start.
File does not exist: /tmp/test-*.xml at -e line 1 means that no file with that literal name exists, because XMLin doesn't expand wildcards. You'll have to process one file at a time.
The first two points are solvable, provided you accept that this cannot be a generic solution - to give a moderately general solution, we'll need an example of your source XML. But it won't be a one liner.
You seem to be asking how to find files matching a file glob.
You could use
my @qfns = glob("/tmp/test-*.xml");
If you just want the first matching file, use
my ($qfn) = glob("/tmp/test-*.xml");
Do not use the following, since glob acts as an iterator in scalar context.
my $qfn = glob("/tmp/test-*.xml"); # XXX
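
The two recommended forms can be compared on throwaway files. This is a sketch under assumptions: the files live in a temporary directory (whose path is assumed to contain no whitespace) rather than /tmp, and their contents are placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical test-1.xml .. test-3.xml in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 3) {
    open my $fh, '>', "$dir/test-$n.xml" or die "Cannot create file: $!";
    print {$fh} "<root/>\n";
    close $fh;
}

# Array assignment: list context, all matching names.
my @qfns = glob("$dir/test-*.xml");

# Parenthesised assignment: still list context, keeps only the first name.
my ($first) = glob("$dir/test-*.xml");

printf "%d files matched\n", scalar @qfns;
print "first match: $first\n";
```

Each name in @qfns can then be passed to XMLin (or any other parser) one at a time inside a loop.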
You can try this using glob and map functions.
perl -MJSON::Any -MXML::Simple -le'local $,="\n"; print map { JSON::Any->new()->objToJson(XMLin($_)) } glob "/path/to/my/test*.xml"'

perl quoting in ftp->ls with wildcard

contents of remote directory mydir :
blah.myname.1.txt
blah.myname.somethingelse.txt
blah.myname.randomcharacters.txt
blah.notmyname.1.txt
blah.notmyname.2.txt
...
In Perl, I want to download all of the files with myname in the name.
I am failing really hard with the appropriate quoting. Please help.
Failed code:
my @files;
@files = $ftp->ls( '*.myname.*.txt' ); # finds nothing
@files = $ftp->ls( '.*.myname.*.txt' ); # finds nothing
etc..
How do I put the wildcards so that they are interpreted by the ls, but not by perl? What is going wrong here?
I will assume that you are using the Net::FTP package. Then this part of the docs is interesting:
ls ( [ DIR ] )
Get a directory listing of DIR, or the current directory.
In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
This means that if you call this method with no arguments, you get a list of all files from the current directory, else from the directory specified.
There is no word about any patterns, which is not surprising: FTP is just a protocol to transfer files, and this module is only a wrapper around that protocol.
You can do the filtering easily with grep:
my @interesting = grep /pattern/, $ftp->ls();
To select all files that contain the character sequence myname, use grep /myname/, LIST.
To select all files that contain the character sequence .myname., use grep /\.myname\./, LIST.
To select all files that end with the character sequence .txt, use grep /\.txt$/, LIST.
The LIST is either the $ftp->ls or another grep, so you can easily chain multiple filtering steps.
Of course, Perl Regexes are more powerful than that, and we could do all the filtering in a single /\.myname\.[^.]+\.txt$/ or something, depending on your exact requirements. If you are desperate for a globbing syntax, there are tools available to convert glob patterns to regex objects, like Text::Glob, or even to do direct glob matching:
use Text::Glob qw(match_glob);
my @interesting = match_glob "*.myname.*.txt", $ftp->ls;
However, that is inelegant, to say the least, as regexes are far more powerful and absolutely worth learning.
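
The grep filters can be tried without an FTP connection. In this sketch a hard-coded list stands in for the result of $ftp->ls, using the file names from the question.

```perl
use strict;
use warnings;

# Stand-in for $ftp->ls(): the remote listing from the question.
my @listing = qw(
    blah.myname.1.txt
    blah.myname.somethingelse.txt
    blah.myname.randomcharacters.txt
    blah.notmyname.1.txt
    blah.notmyname.2.txt
);

# A bare /myname/ also matches the "notmyname" files.
my @loose = grep { /myname/ } @listing;

# Requiring the surrounding dots and the .txt suffix excludes them.
my @interesting = grep { /\.myname\.[^.]+\.txt$/ } @listing;

printf "loose filter:  %d files\n", scalar @loose;
printf "strict filter: %d files\n", scalar @interesting;
print "$_\n" for @interesting;
```

The loose filter keeps all five names, while the strict one keeps only the three blah.myname.* files.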

How does this Perl one liner to check if a directory is empty work?

I got this strange line of code today, it tells me 'empty' or 'not empty' depending on whether the CWD has any items (other than . and ..) in it.
I want to know how it works because it makes no sense to me.
perl -le 'print+(q=not =)[2==(()=<.* *>)].empty'
The bit I am interested in is <.* *>. I don't understand how it gets the names of all the files in the directory.
It's a golfed one-liner. The -e flag means to execute the rest of the command line as the program. The -l enables automatic line-end processing.
The <.* *> portion is a glob containing two patterns to expand: .* and *.
This portion
(q=not =)
is a list containing a single value: the string "not " (with a trailing space). The q=...= is an alternate string delimiter, apparently used because the single-quote is being used to quote the one-liner.
The [...] portion is the subscript into that list. The value of the subscript will be either 0 (the value "not ") or 1 (nothing, which prints as the empty string) depending on the result of this comparison:
2 == (()=<.* *>)
There's a lot happening here. The comparison tests whether or not the glob returned a list of exactly two items (assumed to be . and ..) but how it does that is tricky. The inner parentheses denote an empty list. Assigning to this list puts the glob in list context so that it returns all the files in the directory. (In scalar context it would behave like an iterator and return only one at a time.) The assignment itself is evaluated in scalar context (being on the right hand side of the comparison) and therefore returns the number of elements assigned.
The leading + is to prevent Perl from parsing the list as arguments to print. The trailing .empty concatenates the string "empty" to whatever came out of the list (i.e. either "not " or the empty string).
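
For comparison, here is a longer-hand sketch of the same check. It is not a literal expansion of the one-liner: instead of assuming the glob returns exactly . and .., it filters those two entries out, and it takes a directory argument (assumed to contain no whitespace) rather than using the CWD. The name dir_state is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# List assignment evaluated in scalar context yields the number of
# elements on the right-hand side, which is the trick behind (()=<...>).
my $count = () = ('a', 'b', 'c');
print "count: $count\n";

sub dir_state {
    my ($dir) = @_;
    # Two patterns in one glob: dot files and everything else.
    my @entries = glob("$dir/.* $dir/*");
    # Ignore the . and .. entries if the glob returned them.
    my @real = grep { !m{/\.\.?\z} } @entries;
    return @real ? 'not empty' : 'empty';
}

my $probe  = tempdir(CLEANUP => 1);
my $before = dir_state($probe);

open my $fh, '>', "$probe/.hidden" or die "Cannot create file: $!";
close $fh;
my $after = dir_state($probe);

print "before: $before\nafter:  $after\n";
```

Filtering . and .. explicitly trades brevity for not depending on how many dot entries the glob happens to return.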
<.* *>
is a glob consisting of two patterns: .* matches all file names that start with . and * matches all file names that do not start with . (this is different from the usual DOS/Windows conventions).
(()=<.* *>)
evaluates the glob in list context, returning all the file names that match.
Then, the comparison with 2 puts it into scalar context so 2 is compared to the number of files returned. If that number is 2, then the only directory entries are . and .., period. ;-)
<.* *> means (glob(".*"), glob("*")). glob expands file patterns the same way the shell does.
I find that the B::Deparse module helps quite a bit in deciphering some stuff that throws off most programmers' eyes, such as the q=...= construct:
$ perl -MO=Deparse,-p,-q,-sC 2>/dev/null << EOF
> print+(q=not =)[2==(()=<.* *>)].empty
> EOF
use File::Glob ();
print((('not ')[(2 == (() = glob('.* *')))] . 'empty'));
Of course, this doesn't instantly produce "readable" code, but it surely converts some of the stumbling blocks.
The documentation for that feature is here. (Scroll near the end of the section)