Glob pattern to check if sibling folder exists - import.meta.glob

Specifically, this is actually for Vite's import.meta.glob, meaning it's really for fast-glob (https://github.com/mrmlnc/fast-glob#pattern-syntax), but anyway... here's the question:
Is it possible to check if a sibling or parent folder exists to match a file using a glob pattern?
For example, let's say I have a folder structure like this
/src/lib/hi.txt
/src/foo/hi.txt
/src/foo/bar/hi.txt
/src/baz/hi.txt
/src/qux/bar/hi.txt
/src/qux/hi.txt
I want to find any file that sits directly next to a sibling folder named bar. In this example, the glob would match /src/foo/hi.txt and /src/qux/hi.txt.
I imagined a glob like /src/**/bar/../hi.txt would work, but bar and .. just cancel each other out in fast-glob, and I end up matching all of the hi.txt files.
Is this something glob supports?
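For reference, this is the selection logic I am trying to express, written as a plain post-filter instead of a glob (a Python sketch of the logic only, not fast-glob or Vite's API):
from pathlib import Path

# Keep each hi.txt that has a sibling directory named "bar".
matches = [
    str(p)
    for p in Path("/src").rglob("hi.txt")
    if (p.parent / "bar").is_dir()
]
# With the structure above this yields /src/foo/hi.txt and /src/qux/hi.txt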

Related

How to fetch file path dynamically using pyspark

I have multiple files in my folder. I want to pattern-match to check whether a given file is present, and if it is, store the whole file path in a variable.
How can I achieve this in PySpark?
Since you want to store the whole path in a variable, you can achieve this with a combination of dbutils and regular-expression pattern matching.
We can use dbutils.fs.ls(path) to return the list of files present in a folder (storage account or DBFS). Assign its return value to a variable called files.
# my sample path: a mounted storage account folder
files = dbutils.fs.ls("/mnt/repro")
Loop through this list. Using Python's re.match(), check whether the current item's file name matches your pattern; if it does, append its path to your result list.
from re import match

matched_files = []
for file in files:
    # print(file)
    if match("sample.*csv", file.name):  # "sample.*csv" is the pattern to be matched
        matched_files.append(file.path)

# print("Matched files: ", matched_files)

Perl: Run script on multiple files in multiple directories

I have a Perl script that reads a .txt and a .bam file and creates an output called output.txt.
I have a lot of files that are all in different folders, and their filenames and directory paths differ only slightly.
All of my txt files are in different subfolders called PointMutation, with the full path being
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
The text in brackets is the part that changes, but the Patient folder contains all of my txt files.
My .bam file is located in a subfolder named DNA with a full path of
/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA
Currently, I run this script from the terminal:
cd /Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation
perl ~/Desktop/Scripts/Perl.pl "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/PointMutation/txtfile.txt" "/Volumes/Lab/Data/Darwin/Patient/[Plate 1/P1H10]/SequencingData/DNA/bamfile.bam"
With only one or two files that is fairly easy, but I would like to automate it as the number of files grows. Also, once I have run the script on a folder I don't want to run it again, even though I will keep getting more data from the same patient. Is there a way to block a folder from being read?
I would do something like:
for my $dir (glob "/Volumes/Lab/Data/Darwin/Patient/*/") {
    # skip if not a directory
    if (! -d $dir) {
        next;
    }
    # to avoid re-reading a finished folder, you could also skip it once its
    # output exists, e.g.: next if -e "$dir/PointMutation/output.txt";
    my $txt = "$dir/PointMutation/txtfile.txt";
    my $bam = "$dir/SequencingData/DNA/bamfile.bam";
    # ... your magical stuff here
}
This is assuming that all directories under /Volumes/Lab/Data/Darwin/Patient/ follow the convention.
That said, a more long-term/robust way of organizing analyses with lots of different files all over the place is either 1) to organize all files necessary for each analysis under one directory, or 2) to create meta files (I'd use JSON/YAML) which contain the necessary file names; see the sketch below.
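A hypothetical per-analysis meta file of that kind might look like this (the file names are taken from the question; the processed flag is just one way to mark folders that should not be run again):
{
  "txt": "/Volumes/Lab/Data/Darwin/Patient/Plate 1/P1H10/PointMutation/txtfile.txt",
  "bam": "/Volumes/Lab/Data/Darwin/Patient/Plate 1/P1H10/SequencingData/DNA/bamfile.bam",
  "processed": false
}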

Select files using include and exclude array of recursive glob patterns

I've been given two file glob parameters in JSON, include and exclude, in the following format:
{
  include: ['**/*.md', '**/swagger/*.json', '**/*.yml', 'somedir/*.yml'],
  exclude: ['**/obj/**', 'otherdir/**', '**/includes/**']
}
I'm tasked with walking a directory tree to select files according to the include and exclude rules in these formats; this has to be written as a PowerShell script.
I've been trying to find a built-in command that supports the double-asterisk, recursive file glob pattern; additionally, since PowerShell converts the JSON to an object, it would be nice if the command parameters could accept an array as input.
I've looked at Get-ChildItem, but I'm not sure that I can mimic the glob resolution behavior using -Include, -Exclude, and/or -Filter. I've also looked at Resolve-Path, but I'm not sure the wildcards will work correctly (and I might have to manually exclude paths).
How can I select paths using multiple recursive wildcard file globs in PowerShell while excluding other globs? Is there a PowerShell command that supports this?
Thank you!
EDIT:
In these glob patterns, the single asterisk is a regular wildcard. The double asterisk (**), however, is a widely supported convention that denotes a recursive directory search.
For example: the pattern dir1/*/file.txt would match:
dir1/dir2/file.txt
dir1/dir3/file.txt
...but not:
dir1/dir2/dir3/file.txt
The pattern dir1/**/file.txt would match everything that the above selector would, but it would also match:
dir1/dir3/dir4/file.txt
dir1/dir7/dir9/dir23/dir47/file.txt
and so on. So, an exclude glob pattern like **/obj/** basically means "exclude anything found in any obj folder found at any point in the directory hierarchy, no matter how deep".
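For comparison, Python's pathlib implements the same convention (a quick sketch; note that pathlib's ** also matches zero intermediate directories, so it would match dir1/file.txt as well):
from pathlib import Path

root = Path("dir1")

# Exactly one directory level between dir1 and file.txt:
one_level = sorted(root.glob("*/file.txt"))

# Any number of directory levels (in pathlib, including zero):
any_level = sorted(root.glob("**/file.txt"))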

How to exclude some files using the File::Find::Rule module in Perl?

I have tried to exclude the files whose names contain digits, but it is not happening in my code. Here $output is my directory location; the directory contains multiple folders and subfolders.
From those folders and subfolders I want to pick up my .ml files, and only the .ml files named with letters alone should be listed.
If a file name comes with digits (ev4.ml, ev8.ml, and so on), it should be omitted.
Because some of the file names come with digits, I want to exclude those files and print the expected output.
Here is my code:
use strict;
use warnings;
use File::Find::Rule;

my $output = "/home/location/radio/datas";
my @files = File::Find::Rule->file()
    ->name('*.ml')
    # ->name(qr/\^ev\d+/)->prune->discard
    ->in($output);
for my $file (@files) {
    print "file:$file\n";
}
Obtained output:
file:/dacr/dacr.ml
file:/DV/DV.ml
file:/DV/ev4/ev4.ml
Expected Output:
file:/dacr/dacr.ml
file:/DV/DV.ml
Your attempt was almost correct, but your regular expression is wrong, and the prune and discard will remove all files, not just the ones matching the regex.
my @files = File::Find::Rule->file()
    ->name('*.ml')
    ->name(qr/\^ev\d+/)  # wrong regex
    ->prune->discard     # throws away all files
    ->in($output);
The correct regular expression to match names that contain any digit is simply \d. Your regex instead says: a literal ^, the letters ev, and one or more digits.
To make File::Find::Rule take all files that end in .ml and then drop the ones that have a digit, use not:
my @files = File::Find::Rule->file()
    ->name('*.ml')
    ->not( File::Find::Rule->name(qr/\d/) )
    ->in($output);
This will get all .ml files, and discard any file that has any digit in the name.

perl quoting in ftp->ls with wildcard

Contents of remote directory mydir:
blah.myname.1.txt
blah.myname.somethingelse.txt
blah.myname.randomcharacters.txt
blah.notmyname.1.txt
blah.notmyname.2.txt
...
In Perl, I want to download all of the files with myname.
I am failing really hard with the appropriate quoting. Please help.
Failed code:
my @files;
@files = $ftp->ls( '*.myname.*.txt' );  # finds nothing
@files = $ftp->ls( '.*.myname.*.txt' ); # finds nothing
# etc.
How do I put the wildcards so that they are interpreted by the ls, but not by Perl? What is going wrong here?
I will assume that you are using the Net::FTP package. Then this part of the docs is interesting:
ls ( [ DIR ] )
Get a directory listing of DIR, or the current directory.
In an array context, returns a list of lines returned from the server. In a scalar context, returns a reference to a list.
This means that if you call this method with no arguments, you get a list of all files in the current directory; otherwise, from the specified directory.
There is no word about any patterns, which is not surprising: FTP is just a protocol to transfer files, and this module is only a wrapper around that protocol.
You can do the filtering easily with grep:
my @interesting = grep /pattern/, $ftp->ls();
To select all files that contain the character sequence myname, use grep /myname/, LIST.
To select all files that contain the character sequence .myname., use grep /\.myname\./, LIST.
To select all files that end with the character sequence .txt, use grep /\.txt$/, LIST.
The LIST is either the $ftp->ls or another grep, so you can easily chain multiple filtering steps.
Of course, Perl regexes are more powerful than that, and we could do all the filtering in a single /\.myname\.[^.]+\.txt$/ or something similar, depending on your exact requirements. If you are desperate for a globbing syntax, there are tools available to convert glob patterns to regex objects, like Text::Glob, or even to do direct glob matching:
use Text::Glob qw(match_glob);
my @interesting = match_glob ".*.myname.*.txt", $ftp->ls;
However, that is inelegant, to say the least, as regexes are far more powerful and absolutely worth learning.