How to exclude some files using the File::Find::Rule module in Perl?

I tried to exclude the files whose names contain digits, but my code doesn't do it. Here $output is my directory location; the directory contains multiple folders and subfolders.
From those folders and subfolders I want to pick my .ml files, and only the .ml files whose names consist of letters alone should be listed.
If the file names look like ev4.html, ev8.html and so on, they should be omitted.
Because some file names contain digits, I want to exclude every file whose name contains a digit and print the expected output.
Here is my code:
use strict;
use warnings;
use File::Find::Rule;
my $output="/home/location/radio/datas";
my @files=File::Find::Rule->file()
->name('*.ml')
#->name(qr/\^ev\d+/)->prune->discard
->in($output);
for my $file(@files)
{
print "file:$file\n";
}
Obtained output:
file:/dacr/dacr.ml
file:/DV/DV.ml
file:DV/ev4/ev4.ml
Expected Output:
file:/dacr/dacr.ml
file:/DV/DV.ml

Your attempt was almost correct, but your regular expression is wrong, and prune and discard will throw away all files, not only the ones matching the regex.
my @files=File::Find::Rule->file()
->name('*.ml')
->name(qr/\^ev\d+/) # wrong regex
->prune->discard # throws away all files
->in($output);
The correct regular expression to match file names that contain any digit is simply \d. Your pattern \^ev\d+ instead asks for a literal ^, followed by the letters ev, followed by one or more digits.
To make File::Find::Rule take all files that end in .ml and then drop the ones that have a digit, use ->not with a nested rule:
my @files=File::Find::Rule->file()
->name('*.ml')
->not( File::Find::Rule->name(qr/\d/) )
->in($output);
This will get all .ml files, and discard any file that has any digit in the name.
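If File::Find::Rule is not installed, roughly the same filter can be written with the core File::Find module. This is a swapped-in technique rather than the answer's code, and the helper name ml_files_without_digits is hypothetical. Like the ->not( File::Find::Rule->name(qr/\d/) ) rule, it tests only the file name, so a digit in a directory name (such as ev4/) does not by itself exclude a file.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Basename qw(basename);

# Hypothetical helper: collect plain files under $root whose *file name*
# ends in .ml and contains no digit (directory names are not checked).
sub ml_files_without_digits {
    my ($root) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # keep full paths in $File::Find::name
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                my $name = basename($path);
                push @found, $path if $name =~ /\.ml\z/ && $name !~ /\d/;
            },
        },
        $root,
    );
    return sort @found;
}

my $output = "/home/location/radio/datas";    # directory from the question
if (-d $output) {
    print "file:$_\n" for ml_files_without_digits($output);
}
```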

Related

My Perl variable to variable substitutions do not work

I have a substitution to make in a Perl script that I can't seem to get working. I have a string in a text file which has the form:
T+30H
The string T+30H has to be written into many files, and it changes from file to file: the number is sometimes two digits and sometimes three. First I define the variable:
my $wrffcr=qr{T+\d+H};
After reading the file containing the string, I run the following substitution (starting with reading the file into an array):
@scrptlines=<$NCLSCRPT>;
foreach $scrptlines (@scrptlines) {
$scrptlines =~ s/$wrffcr/T+$fcrange2[$jj]H/g;
}
$fcrange2[$jj] is defined, and I confirm its value by printing it just before the four lines of code above:
print "$fcrange2[$jj]\n";
When I run my script, nothing changes for this particular substitution. I suspect it is to do with the way I define the string to be substituted.
I will appreciate any assistance.
Zilore Mumba
Watch out for the first + in my $wrffcr=qr{T+\d+H};. In a regex + is a quantifier, so T+ matches one or more Ts, not a T followed by a literal +. You probably want
my $wrffcr=qr{T\+\d+H};
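A minimal sketch of the corrected substitution, wrapped in a hypothetical helper set_forecast_hours. It also shows why nothing changed: with the original pattern, T+ consumes the T, then \d+ meets the literal + instead of a digit, so T+30H never matches. Only the pattern needs the backslash; in the replacement part of s///, + is an ordinary character.

```perl
use strict;
use warnings;

my $broken = qr{T+\d+H};     # T+  = one or more T characters
my $fixed  = qr{T\+\d+H};    # T\+ = a T followed by a literal plus sign

# Hypothetical helper: replace every "T+<digits>H" token in $text with T+<$hours>H.
sub set_forecast_hours {
    my ($text, $hours) = @_;
    (my $copy = $text) =~ s/$fixed/T+${hours}H/g;
    return $copy;
}

for my $line ("valid T+30H", "valid T+120H") {
    printf "%-14s broken matches: %-3s fixed: %s\n",
        $line, ($line =~ $broken ? "yes" : "no"), set_forecast_hours($line, 48);
}
```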

Perl: Run script on multiple files in multiple directories

I have a perl script that reads a .txt and a .bam file, and creates an output called output.txt.
I have a lot of files that are all in different folders, but are only slightly different in the filename and directory path.
All of my txt files are in different subfolders called PointMutation, with the full path being
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
The text in brackets is the part that changes, but the Patient folder contains all of my txt files.
My .bam file is located in a subfolder named DNA with a full path of
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA
Currently I run this script from the terminal like this:
cd /Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
perl ~/Desktop/Scripts/Perl.pl "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation/txtfile.txt" "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA/bamfile.bam"
With only one or two files this is fairly easy, but I would like to automate it once there are many more files. Also, once I have processed a folder I don't want to process it again, but I will keep getting more data for the same patient. Is there a way to block a folder from being read?
I would do something like:
for my $dir (glob "/Volumes/Lab/Data/Darwin/Patient/*/"){
# skip if not a directory
if (! -d $dir) {
next;
}
my $txt = "$dir/PointMutation/txtfile.txt";
my $bam = "$dir/SequencingData/DNA/bamfile.bam";
# ... you magical stuff here
}
This is assuming that all directories under /Volumes/Lab/Data/Darwin/Patient/ follow the convention.
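A sketch of how that loop could be fleshed out, under several assumptions: the variable part is two levels deep (<plate>/<well>, e.g. Plate 1/P1H10), the inputs are literally named txtfile.txt and bamfile.bam, and Perl.pl writes output.txt into the current directory, as in the question. For the "don't read a folder twice" part it uses one simple option: an empty marker file, processed.done, written after a successful run. Both find_jobs and processed.done are illustrative names, not an existing convention.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Assumed layout: $root/<plate>/<well>/PointMutation/txtfile.txt
#                 $root/<plate>/<well>/SequencingData/DNA/bamfile.bam
# Returns [dir, txt, bam] for each well folder that has both inputs
# and no hypothetical "processed.done" marker.
sub find_jobs {
    my ($root) = @_;
    my @jobs;
    # bsd_glob does not split its pattern on whitespace, so a $root with spaces works.
    my @dirs = sort { $a cmp $b } grep { -d } bsd_glob("$root/*/*");
    for my $dir (@dirs) {
        my $txt = "$dir/PointMutation/txtfile.txt";
        my $bam = "$dir/SequencingData/DNA/bamfile.bam";
        next if -e "$dir/processed.done";    # already handled
        next unless -f $txt && -f $bam;      # incomplete folder
        push @jobs, [$dir, $txt, $bam];
    }
    return @jobs;
}

for my $job (find_jobs("/Volumes/Lab/Data/Darwin/Patient")) {
    my ($dir, $txt, $bam) = @$job;
    chdir "$dir/PointMutation" or do { warn "cannot cd into $dir: $!\n"; next };
    # List-form system() passes each path as a single argument, spaces included.
    if (system($^X, "$ENV{HOME}/Desktop/Scripts/Perl.pl", $txt, $bam) == 0) {
        open my $done, '>', "$dir/processed.done" or warn "cannot mark $dir: $!\n";
    }
    else {
        warn "Perl.pl failed for $dir\n";
    }
}
```

Deleting a folder's processed.done marker would make it eligible again on the next run.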
That said, a more robust long-term way to organize analyses with many files spread all over the place is either 1) to keep all files needed for each analysis under one directory, or 2) to create metadata files (I'd use JSON or YAML) that list the necessary file names.

How to trim the file extensions from a filename

I have a foreach loop that gets a list of objects in Folder4 after trimming the full path where the objects reside.
Here is sample code:
$row.Path = $path.InnerText.Replace("/Folder1/Folder2/folder3/folder4/","")
Sample Output:
usp_StoredProcedurename.prc,
fn_FunctionName.udf
File.sql
The last thing I need to do is to remove any extension, i.e. .prc, .pdf, .udf, .sql, etc.
Here is the complete foreach:
You are probably looking for the static GetFileNameWithoutExtension method. To use it, you have to pass a single file or path to it:
[System.IO.Path]::GetFileNameWithoutExtension("usp_StoredProcedurename.prc")
Depending on the actual contents of $row.Path, you could split it into individual names, strip each one, and join them back together if you want.
Alternatively, you could use a regex to remove the file extensions for all files within your string at once:
$row.Path -replace '\..*'
Be aware that this regex removes everything from the first dot to the end of the line, so a name like my.backup.sql becomes my.
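For comparison, the same two approaches in Perl, the language of the rest of this page (the helper names are hypothetical). File::Basename's fileparse with a suffix pattern behaves much like GetFileNameWithoutExtension, dropping the directory and only the last extension, while s/\..*// mirrors the -replace '\..*' shortcut and its caveat.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Like -replace '\..*': drop everything from the first dot to the end of the line.
sub strip_from_first_dot {
    my ($name) = @_;
    (my $copy = $name) =~ s/\..*//;
    return $copy;
}

# Closer to GetFileNameWithoutExtension: drop the directory part and only
# the last extension, keeping earlier dots in the name.
sub strip_last_extension {
    my ($path) = @_;
    my ($base) = fileparse($path, qr/\.[^.]*/);
    return $base;
}

for my $name ("usp_StoredProcedurename.prc", "fn_FunctionName.udf", "File.sql", "my.backup.sql") {
    printf "%-28s %-24s %s\n", $name, strip_from_first_dot($name), strip_last_extension($name);
}
```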

What is the right regex to match a relative path to an image file?

I have this path: ../../Capture.jpg. So far I've come up with this incomplete regex: '[../]+'. I want to check whether the user enters a path of the form ../../<image file name>. The file extension can be jpg, png, etc.
Your [../]+ is not sufficient or correct for the job at hand if you REALLY want to match a bunch of ../ at the start of a filename: [../] is a character class, so it matches any run of . and / characters in any order (./.// matches just as well as ../../).
It's not completely clear what you want to do exactly, but the following will match one or more ../ at the start of a string:
/^((?:\.\.\/)+)/
basically:
^ to anchor to the start of the string being tested - will not match any ../ INSIDE the string
( and the balancing ) at the end: capture the contents within. All your ../../ will be available in a variable called $1
then I'm using (?: ) to wrap the next content. This groups the bit inside, but does NOT save the value inside a $1, $2, etc. More information soon...
The REAL pattern of interest is
\.\.\/
Since . is a metacharacter (it matches any character) and / is the pattern delimiter here, both need 'escaping' with a backslash. This tells Perl that the . and / do NOT have their special meaning at this point.
I've used the (?: ) wrapper to group them together, so that the + operates on all 3 characters of interest. The + operator means "one or more repetitions".
So, my pattern will match one or more repetitions of ../ which are anchored to the start of the string. Furthermore, the exact contents matched will be available in $1 if you are interested in doing something with that (eg count how many ../ you have)
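A small sketch of the "count how many ../" idea using that pattern and its $1 capture; count_updirs is a hypothetical name. The loop also shows why the question's [../]+ is too loose: inside a character class . and / are just two allowed characters, so any run of them matches.

```perl
use strict;
use warnings;

# Hypothetical helper: number of leading "../" segments, via the $1 capture.
sub count_updirs {
    my ($path) = @_;
    return 0 unless $path =~ /^((?:\.\.\/)+)/;
    return length($1) / 3;    # each "../" is three characters long
}

for my $p ("../../Capture.jpg", "Capture.jpg", "././/Capture.jpg") {
    my $loose = $p =~ m{^[../]+} ? "yes" : "no";    # the question's character class
    printf "%-18s ../ count: %d   [../]+ matches: %s\n", $p, count_updirs($p), $loose;
}
```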
Please ask if you have further questions, or I have misunderstood your goals.
EDIT: to suit your new requirements, and add a bit of bonus:
m!^\.\./\.\./(([^/]+)\.([^./]+))$!
Note first that I've used m!pattern! instead of /pattern/. When Perl sees /pattern/ it treats it as m/pattern/, but with the explicit m you can use an alternative delimiter character to wrap the pattern. This is useful if you actually want to use / in your pattern without having to go nuts with backslashes.
so:
^ exactly match only from the start
followed by exactly ../../
next I've used ( ) wrappers to capture the bits following. Explanation after...
ignoring the ( and ) now:
[^/]+ one or more repetitions (+) of any character that isn't /
\. literally a dot - the one before the extension
[^./]+ one or more repetitions of any character that isn't . or /
Notice how the [^/]+ allows for any character including . but prevents another directory part from sneaking in. Thus, the filename could be foo.bar.jpg and it will be collected properly.
Notice how [^./]+ allows for any character in the extension except a dot - and also excluding / to prevent another directory segment from sneaking in.
Finally, $ is used to ensure we've reached the end of the string.
as for the captures:
$1 will contain all of foo.bar.jpg
$2 will contain foo.bar
$3 will contain jpg (not .jpg) but I'll leave it up to you to figure out what to change if you wish to capture the dot as well.
FINALLY - in a typical script, you might do something like:
if($filename =~ m!^\.\./\.\./(([^/]+)\.([^./]+))$!) {
print "You correctly entered ../../$1 giving basename=$2 and extension=$3 - Bravo!\n";
}
else {
print "you've failed to read the instructions properly\n";
}
As a bonus, I even tested that, and found 2 spolling mistaiks you'll never have to see
cheers.
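# convert relative file paths to Markdown links ...
# file paths and names with letters, digits, - and _ are supported
# as written, this matches only one leading ../ followed by at least one directory
# (e.g. ../img/Capture.jpg, not ../../Capture.jpg), with a space on each side
$str =~ s! (\.\.\/([a-zA-Z0-9_\-\/\\]*)[\/\\]([a-zA-Z0-9_\-]*)\.([a-zA-Z0-9]*)) ! [$3]($1) !gm;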
If you don't care about the path prefix, use:
$path =~ /\.(jpg|png)$/
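or, without a regex (this avoids the smartmatch operator ~~, which is experimental and deprecated as of Perl 5.38):
grep { substr($path, -4) eq $_ } ('.jpg', '.png')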
With exactly '../../', use:
$path =~ m!^\.\./\.\./[^/]*\.(jpg|png)$!
With any number of '../'s, use:
$path =~ m!^(\.\./)*[^/]*\.(jpg|png)$!
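A quick way to try the last pattern on a few inputs; is_relative_image and the sample paths are hypothetical. Note that (\.\./)* also allows zero repetitions, so a bare Capture.png is accepted, and because [^/]* cannot cross a /, a subdirectory such as ../img/Capture.jpg is rejected. Use (\.\./)+ if at least one ../ is required.

```perl
use strict;
use warnings;

# Hypothetical helper: zero or more "../" followed by a file name ending in .jpg or .png.
sub is_relative_image {
    my ($path) = @_;
    return $path =~ m!^(\.\./)*[^/]*\.(jpg|png)$! ? 1 : 0;
}

for my $p ("../../Capture.jpg", "Capture.png", "../img/Capture.jpg", "../../Capture.gif") {
    printf "%-20s %s\n", $p, is_relative_image($p) ? "ok" : "rejected";
}
```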

perl quoting in ftp->ls with wildcard

Contents of remote directory mydir:
blah.myname.1.txt
blah.myname.somethingelse.txt
blah.myname.randomcharacters.txt
blah.notmyname.1.txt
blah.notmyname.2.txt
...
In Perl, I want to download all the files whose names contain myname.
I am failing really hard at getting the quoting right. Please help.
Failed code:
my @files;
@files = $ftp->ls( '*.myname.*.txt' ); # finds nothing
@files = $ftp->ls( '.*.myname.*.txt' ); # finds nothing
etc.
How do I put the wildcards so that they are interpreted by the ls, but not by perl? What is going wrong here?
I will assume that you are using the Net::FTP package. Then this part of the docs is interesting:
ls ( [ DIR ] )
Get a directory listing of DIR, or the current directory.
In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
This means that if you call this method with no arguments, you get a list of all files from the current directory, else from the directory specified.
There is no word about any patterns, which is not surprising: FTP is just a protocol to transfer files, and this module is only a wrapper around that protocol.
You can do the filtering easily with grep:
my @interesting = grep /pattern/, $ftp->ls();
To select all files that contain the character sequence myname, use grep /myname/, LIST.
To select all files that contain the character sequence .myname., use grep /\.myname\./, LIST.
To select all files that end with the character sequence .txt, use grep /\.txt$/, LIST.
The LIST is either the $ftp->ls or another grep, so you can easily chain multiple filtering steps.
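A sketch that chains two greps as described, applied to the listing from the question (pick_mine and download_mine are hypothetical names). The .myname. form matters here: a bare /myname/ would also pick up blah.notmyname.1.txt. download_mine uses the Net::FTP methods new, login, cwd, ls and get; the host, credentials and directory are placeholders, and the sub is not called in this sketch.

```perl
use strict;
use warnings;
use Net::FTP;

# Chain two greps: names containing ".myname." that also end in ".txt".
sub pick_mine {
    return grep { /\.txt$/ } grep { /\.myname\./ } @_;
}

# Hypothetical host, credentials and directory; not called here.
sub download_mine {
    my ($host, $user, $pass, $dir) = @_;
    my $ftp = Net::FTP->new($host) or die "Cannot connect to $host: $@";
    $ftp->login($user, $pass) or die "Login failed: ", $ftp->message;
    $ftp->cwd($dir)           or die "cwd failed: ", $ftp->message;
    for my $name (pick_mine($ftp->ls)) {
        $ftp->get($name) or warn "get $name failed: ", $ftp->message;
    }
    $ftp->quit;
}

my @listing = qw(blah.myname.1.txt blah.myname.somethingelse.txt
                 blah.myname.randomcharacters.txt blah.notmyname.1.txt
                 blah.notmyname.2.txt);
print "$_\n" for pick_mine(@listing);
```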
Of course, Perl Regexes are more powerful than that, and we could do all the filtering in a single /\.myname\.[^.]+\.txt$/ or something, depending on your exact requirements. If you are desperate for a globbing syntax, there are tools available to convert glob patterns to regex objects, like Text::Glob, or even to do direct glob matching:
use Text::Glob qw(match_glob);
my @interesting = match_glob "*.myname.*.txt", $ftp->ls;
However, that is inelegant, to say the least, as regexes are far more powerful and absolutely worth learning.