Perl quoting in ftp->ls with wildcard

Contents of remote directory mydir:
blah.myname.1.txt
blah.myname.somethingelse.txt
blah.myname.randomcharacters.txt
blah.notmyname.1.txt
blah.notmyname.2.txt
...
In Perl, I want to download all of the files with myname in their name.
I am failing really hard with the appropriate quoting. Please help.
Failed code:
my @files;
@files = $ftp->ls( '*.myname.*.txt' ); # finds nothing
@files = $ftp->ls( '.*.myname.*.txt' ); # finds nothing
etc.
How do I write the wildcards so that they are interpreted by ls, but not by Perl? What is going wrong here?

I will assume that you are using the Net::FTP package. Then this part of the docs is interesting:
ls ( [ DIR ] )
Get a directory listing of DIR, or the current directory.
In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
This means that if you call this method with no arguments, you get a list of all files from the current directory, else from the directory specified.
There is no word about any patterns, which is not surprising: FTP is just a protocol to transfer files, and this module is only a wrapper around that protocol.
You can do the filtering easily with grep:
my @interesting = grep /pattern/, $ftp->ls();
To select all files that contain the character sequence myname, use grep /myname/, LIST.
To select all files that contain the character sequence .myname., use grep /\.myname\./, LIST.
To select all files that end with the character sequence .txt, use grep /\.txt$/, LIST.
The LIST is either the $ftp->ls or another grep, so you can easily chain multiple filtering steps.
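As a minimal sketch of such chaining, the snippet below filters a hard-coded list that stands in for the return value of $ftp->ls (the names are the ones from the question; no FTP connection is made):

```perl
use strict;
use warnings;

# Stand-in for the list that $ftp->ls would return (names from the question).
my @listing = qw(
    blah.myname.1.txt
    blah.myname.somethingelse.txt
    blah.myname.randomcharacters.txt
    blah.notmyname.1.txt
    blah.notmyname.2.txt
);

# Chain two filters: keep names ending in .txt, then those containing ".myname."
my @interesting = grep { /\.myname\./ } grep { /\.txt$/ } @listing;

print "$_\n" for @interesting;
```

The escaped dots matter here: an unescaped /.myname./ would also match blah.notmyname.1.txt, because . matches any character.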
Of course, Perl Regexes are more powerful than that, and we could do all the filtering in a single /\.myname\.[^.]+\.txt$/ or something, depending on your exact requirements. If you are desperate for a globbing syntax, there are tools available to convert glob patterns to regex objects, like Text::Glob, or even to do direct glob matching:
use Text::Glob qw(match_glob);
my @interesting = match_glob "*.myname.*.txt", $ftp->ls;
However, that is inelegant, to say the least, as regexes are far more powerful and absolutely worth learning.

Related

Select files using include and exclude array of recursive glob patterns

I've been given two file glob parameters in JSON, include and exclude, in the following format:
{
include: ['**/*.md', '**/swagger/*.json', '**/*.yml', 'somedir/*.yml'],
exclude: ['**/obj/**', 'otherdir/**', '**/includes/**']
}
I'm tasked with walking a directory tree to select files according to the include and exclude rules in these formats; this has to be written as a Powershell script.
I've been trying to find a built-in command that supports the double-asterisk, recursive file glob pattern; additionally, since Powershell is converting the JSON to an object, it would be nice if the command parameters could accept an array as input.
I've looked at Get-ChildItem, but I'm not sure that I can mimic the glob resolution behavior using -include, -exclude, and/or -filter. I've also looked at Resolve-Path, but I'm not sure if the wildcards will work correctly (and I might have to manually exclude paths).
How can I select paths using multiple recursive wildcard file globs in Powershell while excluding other globs? Is there a Powershell command that supports this?
Thank you!
EDIT:
In these glob patterns, the single asterisk is a regular wildcard. The double asterisk (**), however, is a known standard which denotes a recursive directory search.
For example: the pattern dir1/*/file.txt would match:
dir1/dir2/file.txt
dir1/dir3/file.txt
...but not:
dir1/dir2/dir3/file.txt
The pattern dir1/**/file.txt would match everything that the above selector would, but it would also match:
dir1/dir3/dir4/file.txt
dir1/dir7/dir9/dir23/dir47/file.txt
and so on. So, an exclude glob pattern like **/obj/** basically means "exclude anything found in any obj folder found at any point in the directory hierarchy, no matter how deep".
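The matching rules described in the EDIT can be checked with a small sketch. It is written in Perl rather than PowerShell and is not an answer to the question itself; recursive_glob_to_regex is a hypothetical helper, and treating **/ as "zero or more directories" is an assumption that the question does not spell out. The path src/obj/debug/readme.md is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a glob containing * and ** into an anchored regex.
sub recursive_glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    while (length $glob) {
        if    ($glob =~ s{^\*\*/}{}) { $re .= '(?:.*/)?' }  # **/ : zero or more directories (assumption)
        elsif ($glob =~ s{^\*\*}{})  { $re .= '.*' }        # **  : anything, including /
        elsif ($glob =~ s{^\*}{})    { $re .= '[^/]*' }     # *   : anything except /
        else {
            $glob =~ s{^(.)}{}s;                            # any other character is literal
            $re .= quotemeta($1);
        }
    }
    return qr/^$re$/;
}

my $single  = recursive_glob_to_regex('dir1/*/file.txt');
my $double  = recursive_glob_to_regex('dir1/**/file.txt');
my $exclude = recursive_glob_to_regex('**/obj/**');

for my $path (qw(dir1/dir2/file.txt dir1/dir2/dir3/file.txt src/obj/debug/readme.md)) {
    printf "%-26s single=%d double=%d excluded=%d\n", $path,
        ($path =~ $single  ? 1 : 0),
        ($path =~ $double  ? 1 : 0),
        ($path =~ $exclude ? 1 : 0);
}
```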

How to exclude some files using the File::Find::Rule module in Perl?

I am trying to exclude files whose names contain digits, but my code does not do that. Here $output is my directory location; the directory contains multiple folders and subfolders.
From those folders and subfolders I want to pick my .ml files, listing only the .ml files whose names consist of letters.
If a file name looks like ev4.html, ev8.html and so on, it should be omitted.
Because those file names contain digits, I want to exclude every file whose name contains digits and print the expected output.
Here is my code:
use strict;
use warnings;
use File::Find::Rule;
my $output="/home/location/radio/datas";
my @files=File::Find::Rule->file()
    ->name('*.ml')
    #->name(qr/\^ev\d+/)->prune->discard
    ->in($output);
for my $file(@files)
{
    print "file:$file\n";
}
Obtained output:
file:/dacr/dacr.ml
file:/DV/DV.ml
file:DV/ev4/ev4.ml
Expected Output:
file:/dacr/dacr.ml
file:/DV/DV.ml
Your attempt was almost correct, but your regular expression is wrong, and the prune and discard will remove all files, not only the ones for the regex.
my @files=File::Find::Rule->file()
    ->name('*.ml')
    ->name(qr/\^ev\d+/) # wrong regex
    ->prune->discard # throws away all files
    ->in($output);
The correct regular expression to match file names that contain any digit is simply \d. Your pattern \^ev\d+ instead asks for a literal ^, followed by the letters ev and one or more digits.
To make File::Find::Rule take all files that end in .ml and then not the ones that have a digit, use not.
my @files=File::Find::Rule->file()
    ->name('*.ml')
    ->not( File::Find::Rule->name(qr/\d/) )
    ->in($output);
This will get all .ml files, and discard any file that has any digit in the name.
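If File::Find::Rule is not installed, roughly the same selection can be sketched with the core File::Find module instead. This is a swapped-in technique, not the method used above, and the temporary directory tree below is a made-up stand-in for the layout in the question:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree mirroring the question's layout, built in a temp directory.
my $output = tempdir(CLEANUP => 1);
for my $rel (qw(dacr/dacr.ml DV/DV.ml DV/ev4/ev4.ml)) {
    my ($dir) = $rel =~ m{^(.*)/};
    make_path("$output/$dir");
    open my $fh, '>', "$output/$rel" or die "Cannot create $rel: $!";
    close $fh;
}

my @files;
find(sub {
    return unless -f;            # plain files only ($_ is the base name here)
    return unless /\.ml\z/;      # name must end in .ml
    return if /\d/;              # skip names that contain any digit
    push @files, $File::Find::name;
}, $output);

print "file:$_\n" for sort @files;
```

As in the File::Find::Rule version, the digit test is applied to the file name only, not to the directories leading to it.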

How to convert XML files (names matching a pattern) to JSON in Perl?

The following converts test.xml to JSON:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test.xml"))'
But I need to convert every XML file (for example test-1.xml, test-2.xml, test-3.xml, test-4.xml, etc.) whose name matches the pattern /tmp/test-*.xml. If I use:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test-*.xml"))'
I get the following message:
File does not exist: /tmp/test-*.xml at -e line 1
How do I do it?
There are problems with what you're trying to do:
XML::Simple isn't simple. It's for simple XML. It'll mangle your XML and give inconsistent results. See: Why is XML::Simple "Discouraged"?
XML is fundamentally more complicated than JSON, so there's no linear transformation. You need to figure out what you'd do with attributes and duplicate elements for a start.
The message "File does not exist: /tmp/test-*.xml at -e line 1" means exactly what it says: no file with that literal name exists, because XMLin doesn't expand wildcards. You'll have to process one file at a time.
The first two points are solvable, provided you accept that this cannot be a generic solution. To give a moderately general solution, we'd need an example of your source XML. But it won't be a one-liner.
You seem to be asking how to find files matching a file glob.
You could use
my @qfns = glob("/tmp/test-*.xml");
If you just want the first matching file, use
my ($qfn) = glob("/tmp/test-*.xml");
Do not use the following, since glob acts as an iterator in scalar context.
my $qfn = glob("/tmp/test-*.xml"); # XXX
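The difference between these forms can be tried out with a small sketch. It creates three empty files in a temporary directory (a stand-in for /tmp/test-*.xml) and assumes the temporary path contains no whitespace or glob metacharacters:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical test files; assumes the temp path has no spaces or glob characters.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 3) {
    open my $fh, '>', "$dir/test-$n.xml" or die "Cannot create file: $!";
    close $fh;
}

my @qfns  = glob("$dir/test-*.xml");   # list context: every match
my ($qfn) = glob("$dir/test-*.xml");   # list assignment: keeps the first match only

# Scalar context: each call hands back the next match until the list runs out.
my @seen;
while (my $next = glob("$dir/test-*.xml")) {
    push @seen, $next;
}

print "all:   @qfns\n";
print "first: $qfn\n";
print "iterated over ", scalar(@seen), " names\n";
```

The while loop is the one place where the scalar-context iterator is useful; a single scalar assignment leaves the iterator half-consumed for the next call at the same spot in the code.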
You can try this using glob and map functions.
perl -MJSON::Any -MXML::Simple -le'local $,="\n"; print map { JSON::Any->new()->objToJson(XMLin($_)) } glob "/path/to/my/test*.xml"'

Perl File Globbing Oddities

I'm writing a script that will loop through a range of numbers, build a glob pattern, and test if a file exists in a directory based on the glob.
The images are NASCAR car number images, and follow the following pattern:
1_EARNHARDTGANASSI_256.TGA
2_PENSKERACING_256.TGA
Here is a snippet of the script that I am using:
foreach $currCarNum (0..101) {
    if (glob("//headshot01/CARS/${currCarNum}_*_256.TGA")) {
        print("Car image $currCarNum exists\n");
    } else {
        print("Car image $currCarNum doesn't exist\n");
    }
}
The problem I'm having is that images that exist in the directory, and that should match the file glob pattern, are reported as not existing.
For example, the file with the following name returns as not existing:
2_PENSKERACING_256.TGA
Whereas, the following returns as existing:
1_EARNHARDTGANASSI_256.TGA
If I use the same file glob pattern in DOS or Cygwin, both files are listed properly.
Are file glob patterns interpreted differently in Perl? Is there something I am missing?
You need to evaluate glob in list context instead of scalar context. Try this for your if statement; it worked for me when I tested it.
if (my @arr = glob("//headshot01/CARS/${currCarNum}_*_256.TGA")) {
From perldoc perlop:
A (file)glob evaluates its (embedded)
argument only when it is starting a
new list. All values must be read
before it will start over. In list
context, this isn't important because
you automatically get them all anyway.
However, in scalar context the
operator returns the next value each
time it's called, or undef when the
list has run out.
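A minimal sketch of this behaviour, assuming a temporary directory with two made-up files named like the ones in the question: the first loop calls glob in scalar context, so the second pass should still be draining the list started by the first pass, while the second loop forces list context with an empty-list assignment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for //headshot01/CARS; assumes a temp path without spaces.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(1_EARNHARDTGANASSI_256.TGA 2_PENSKERACING_256.TGA)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my (@scalar_hits, @list_hits);

for my $n (1 .. 2) {
    # Scalar (boolean) context: one iterator is shared across loop passes.
    push @scalar_hits, $n if glob("$dir/${n}_*_256.TGA");
}

for my $n (1 .. 2) {
    # List context via the empty-list assignment: re-evaluated on every pass.
    push @list_hits, $n if (() = glob("$dir/${n}_*_256.TGA"));
}

print "scalar context found: @scalar_hits\n";
print "list context found:   @list_hits\n";
```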

How does this Perl one liner to check if a directory is empty work?

I got this strange line of code today, it tells me 'empty' or 'not empty' depending on whether the CWD has any items (other than . and ..) in it.
I want to know how it works because it makes no sense to me.
perl -le 'print+(q=not =)[2==(()=<.* *>)].empty'
The bit I am interested in is <.* *>. I don't understand how it gets the names of all the files in the directory.
It's a golfed one-liner. The -e flag means to execute the rest of the command line as the program. The -l enables automatic line-end processing.
The <.* *> portion is a glob containing two patterns to expand: .* and *.
This portion
(q=not =)
is a list containing a single value, the string "not " (with a trailing space). The q=...= is an alternate string delimiter, apparently used because the single quote is already being used to quote the one-liner.
The [...] portion is the subscript into that list. The value of the subscript will be either 0 (the value "not ") or 1 (nothing, which prints as the empty string) depending on the result of this comparison:
2 == (()=<.* *>)
There's a lot happening here. The comparison tests whether or not the glob returned a list of exactly two items (assumed to be . and ..) but how it does that is tricky. The inner parentheses denote an empty list. Assigning to this list puts the glob in list context so that it returns all the files in the directory. (In scalar context it would behave like an iterator and return only one at a time.) The assignment itself is evaluated in scalar context (being on the right hand side of the comparison) and therefore returns the number of elements assigned.
The leading + is to prevent Perl from parsing the list as arguments to print. The trailing .empty concatenates the string "empty" to whatever came out of the list (i.e. either "not " or the empty string).
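The steps above can be unrolled into a longer sketch. dir_state is a hypothetical name, and the temporary directory is only there so the function can be tried on a directory that is known to be empty; like the one-liner, it assumes that exactly two matches means . and .. are the only entries.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# The golfed expression, unrolled. Inspects the current working directory.
sub dir_state {
    my $count  = () = glob('.* *');       # list assignment in scalar context: number of matches
    my $index  = ($count == 2) ? 1 : 0;   # exactly two matches are assumed to be . and ..
    my $prefix = ('not ')[$index];        # index 1 is past the end of the list, so undef
    return ($prefix // '') . 'empty';
}

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";

my $before = dir_state();                 # nothing created yet
open my $fh, '>', 'somefile' or die "Cannot create somefile: $!";
close $fh;
my $after = dir_state();                  # one file present

chdir $start or die "Cannot chdir back: $!";
print "$before\n$after\n";
```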
<.* *>
is a glob consisting of two patterns: .* matches all file names that start with . and * matches all other file names, i.e. those that do not start with . (this is different from the usual DOS/Windows conventions).
(()=<.* *>)
evaluates the glob in list context, returning all the file names that match.
Then, the comparison with 2 puts it into scalar context so 2 is compared to the number of files returned. If that number is 2, then the only directory entries are . and .., period. ;-)
<.* *> means (glob(".*"), glob("*")). glob expands file patterns the same way the shell does.
I find that the B::Deparse module helps quite a bit in deciphering some stuff that throws off most programmers' eyes, such as the q=...= construct:
$ perl -MO=Deparse,-p,-q,-sC 2>/dev/null << EOF
> print+(q=not =)[2==(()=<.* *>)].empty
> EOF
use File::Glob ();
print((('not ')[(2 == (() = glob('.* *')))] . 'empty'));
Of course, this doesn't instantly produce "readable" code, but it surely converts some of the stumbling blocks.
The documentation for that feature is here. (Scroll near the end of the section)
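For comparison, a more conventional way to ask the same question is sketched below. It uses opendir and readdir instead of glob, so it is a different technique from the one-liner, and is_empty_dir is a hypothetical name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Returns true when the directory holds nothing besides . and ..
sub is_empty_dir {
    my ($path) = @_;
    opendir my $dh, $path or die "Cannot open $path: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return !@entries;
}

# Try it on a throwaway directory, before and after creating a file.
my $dir          = tempdir(CLEANUP => 1);
my $empty_before = is_empty_dir($dir);
open my $fh, '>', "$dir/somefile" or die "Cannot create file: $!";
close $fh;
my $empty_after = is_empty_dir($dir);

print $empty_before ? "empty\n" : "not empty\n";
print $empty_after  ? "empty\n" : "not empty\n";
```

Filtering out . and .. by name avoids the one-liner's assumption that a count of exactly two can only mean those two entries.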