Is there a Perl module that collapses file system paths such as a/b/.. or a//b? - perl

I'm writing a program where I have to remove redundancy in paths, e.g.
a/b/.. -> a
a//b -> a/b
a/./b -> a/b
Does any existing module do this?
Update: This normalization/canonicalization is described by RFC 3986. I only need the path segment normalization part.
Of course, this is simple to implement. I'm still wondering if it's already been packaged into some module.

The form of the path, and the meaning of or relationship between the elements of a URL's hierarchy, is not specified in the standard. Depending on the server, there may be no hierarchy at all: elements separated by / could be treated as positional, or their order could have no meaning at all. Because of that, there's no specific module to handle that task for URLs.
However, if you're absolutely sure about how the target server works, you can simply adapt File::Spec to your needs: extract the path from the URL (for example with URI), process it as if it were a file path, and then put it back.
Considering your comment that you'll be working with regular file names on the file system, you don't even need to extract anything from a path - File::Spec is enough for all your needs.
If you wish to work around File::Spec (by design) not resolving .., use splitpath from it to extract the directory part of the name, splitdir to split it into directories, and then just iterate over that array, splice'ing out two elements (the .. entry and the one before it) each time you encounter a .. entry. Use catdir and catfile to pack the results back.
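The splice idea can be sketched independently of File::Spec. Below is a minimal, purely lexical sketch in Python, used here only for illustration; collapse_path is a hypothetical name, not part of any module. It assumes /-separated paths and does not consult the file system, so it gives the wrong answer when a directory in front of a .. is a symlink, which is the reason File::Spec leaves .. alone.

```python
def collapse_path(path):
    """Collapse '.', '..' and empty segments in a '/'-separated path (lexical only)."""
    is_absolute = path.startswith("/")
    kept = []
    for segment in path.split("/"):
        if segment in ("", "."):
            # 'a//b' and 'a/./b' both reduce to 'a/b'
            continue
        if segment == "..":
            if kept and kept[-1] != "..":
                # 'a/b/..' -> 'a': drop the directory in front of '..'
                kept.pop()
            elif not is_absolute:
                # a leading '..' in a relative path cannot be resolved lexically
                kept.append("..")
            # in an absolute path, '..' at the root is simply dropped
            continue
        kept.append(segment)
    result = "/".join(kept)
    if is_absolute:
        return "/" + result
    return result or "."


for example in ("a/b/..", "a//b", "a/./b"):
    print(example, "->", collapse_path(example))
```

Python's own posixpath.normpath performs a similar lexical collapse, which is handy for cross-checking the sketch.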

Related

Is there a way to use babel plugins relying on the current filename with babel-loader?

I wrote a plugin for babel that relies on the opts.filename and opts.filenameRelative properties. It seems to be working within babel-loader for the purposes of analyzing the adjacent files, but the filename itself seems to be modified.
I'm wondering if there's a way, using babel-loader, to get access to the full source file path to use for generating a legible id and hash.
babel-loader does indeed pass the filename into the transform function. In my particular case, I was preprocessing typescript files with awesome-typescript-loader, and that was messing with the file path.

How to import files relative to main file, instead of current directory? ((Chez) Scheme)

For example, in my main.scm file I have (load "util.scm"). util.scm is a file in the same folder as main.scm. Both files are located in ~/documents/myproject/.
Now when I'm in this directory and run $ chez-scheme main.scm, everything works fine. However, if I'm in my home directory and run $ chez-scheme documents/myproject/main.scm, it complains that it cannot find the file util.scm. I suppose this is because the current directory was my home directory, and util.scm is indeed not there; it is actually in documents/myproject/. That being said, I'm used (in other languages) to these paths being looked up relative to the file containing the import instruction, and I'd like to have that here as well. I've tried prefixing the path with ./ and defining the file as a library and doing (import (util)), but neither works outside of documents/myproject/. Is there any way to get this to work as I intend?
I assume this is Chez-Scheme-specific. If not I'd prefer an answer that is implementation-neutral.
load is kind of awkward in R5RS: the report states that system interfaces are outside its scope, yet it includes load, which is a half-hearted solution. The report does not say whether the path is resolved relative to the current directory or to the file the load form originates from, so in order to be portable I guess you are required to run your script from its own directory, so that the loaded file is relative to both.
Since Chez Scheme implements R6RS, load is not really the right form to use. R6RS removed load in favor of libraries. You should make your file a library and consult how to install it. In some systems that is just placing the files in the right path, adding the library location to a configuration, or running an install script. How one uses the library is the same in all implementations: by using import.
According to the Chez documentation, you can pass --libdirs to it to give it one or more paths to consider when loading libraries. You can see the paths it scans by evaluating (library-directories).
There are several different ways to accomplish what (I think) you are trying to do, but eventually they all boil down to letting Chez know where to look for things. When given relative paths, include and load use the source-directories parameter to search for the requested file. Libraries have their path automatically prepended to source-directories while they are being loaded or compiled, so if your main.scm were a library definition then it would find util.scm as you expect.
However, it sounds like main.scm isn't a library, it's a top-level program. Unfortunately, Chez doesn't have a command line option to set the source-directories like it does for library directories. That leaves you with a bit less flexibility. Any of the following will work:
Make util.scm a library and invoke Chez with the --libdirs option to let it know where to look for libraries.
Set source-directories and load main.scm from inside the REPL rather than from the command line.
Write a wrapper shell script that does the above by echoing the commands into scheme so you don't have to type it yourself. (Only suitable if you don't also need to then type into the scheme session).
Write a wrapper shell script that cds into your project directory before running scheme (and presumably cds back to the original directory when it's done).
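The last option (start the interpreter from inside the project directory) can also be sketched as a small launcher instead of a shell script. This is only an illustration in Python: plan_invocation and run_from_project_dir are hypothetical names, and the executable name chez-scheme is taken from the question and may be scheme or something else on your system.

```python
import os
import subprocess


def plan_invocation(main_path, executable="chez-scheme"):
    """Return (argv, cwd) for running main_path from inside its own directory."""
    main_path = os.path.abspath(main_path)
    project_dir = os.path.dirname(main_path)
    # Inside project_dir the program is addressed by its bare file name,
    # so a relative (load "util.scm") resolves next to main.scm.
    return [executable, os.path.basename(main_path)], project_dir


def run_from_project_dir(main_path, executable="chez-scheme"):
    """Run the program with the child's working directory set to the project directory."""
    argv, cwd = plan_invocation(main_path, executable)
    # cwd= only affects the child process; the caller's directory is unchanged,
    # so there is nothing to change back afterwards.
    return subprocess.run(argv, cwd=cwd).returncode
```

Calling run_from_project_dir("documents/myproject/main.scm") from the home directory would then behave like running the program from inside documents/myproject/.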

Add Perl module relative to script

I'm trying to add the module File-Copy-Recursive to my script, as I have already done with another module, but when I try to use it I get an error I cannot explain:
use lib "./cpan";
use Recursive qw(dircopy);
dircopy($path1, $path2);
the error i get is: Undefined subroutine &main::dircopy called at ...
I don't understand it, the module clearly has the function dircopy in it.
As other answers have already stated, this isn't working because you've moved the module's location in the include directory from File/Copy/Recursive.pm to just Recursive.pm.
Here's why that doesn't work:
A Perl module (file with a .pm extension) and a Perl package (collection of code under a specific namespace) are two completely different things. Normally, we'll put a package into a module which happens to have the same name, but this is really just to help us humans maintain our sanity. perl doesn't care one way or the other - one module can contain multiple packages, one package can be split across multiple files, and the names of the packages and the modules can be completely unrelated for all perl cares.
But, still... there's that convention of using the same name for both, which the use command exploits to make things a little more convenient. Behind the scenes, use Module; means BEGIN { require Module; Module->import; } (where require Module loads the file Module.pm) - note that it calls import on the module name, not the name of the package contained within the module!
And that's the key to your issue. Even though you've moved the file out of the File/Copy/ directory, its contents still specify package File::Copy::Recursive, so that's where all of its code ends up. use Recursive attempts to call Recursive->import, which doesn't exist, so nothing gets imported. The dircopy function would be imported by File::Copy::Recursive->import, but that never gets called.
So, yeah. Move ./cpan/Recursive.pm to ./cpan/File/Copy/Recursive.pm so that the package name and the module name will match up again and sanity will be restored. (If you've been paying attention, you should be able to come up with at least two or three other ways to get this working, but moving the file to the proper place under ./cpan really is your best option if you need to keep the File::Copy::Recursive source in a subdirectory of your project's code.)
Use FindBin for relative lib path:
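use FindBin;
# Resolve cpan/ relative to the script's own directory, not the current directory
use lib "$FindBin::Bin/cpan";
use File::Copy::Recursive qw(dircopy);
You have to keep the whole directory tree under ./cpan, and the use line has to name File::Copy::Recursive in full.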
Files under ./cpan dir:
find ./cpan/
./cpan/File/Copy/Recursive.pm
The module name in Perl comes not only from the path, but also from its package declaration. You installed the module to ./cpan, but the package name specified is still File::Copy::Recursive.
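The path half of this can be sketched as follows. This is an illustration in Python of how a module name is turned into a relative file path and searched for under the include directories; module_to_relpath and find_module are hypothetical names, not Perl internals.

```python
import os


def module_to_relpath(module_name):
    """Map a Perl-style module name to the relative path that gets searched for."""
    # File::Copy::Recursive -> File/Copy/Recursive.pm
    return "/".join(module_name.split("::")) + ".pm"


def find_module(module_name, include_dirs):
    """Return the first matching file under include_dirs, or None if there is none."""
    relpath = module_to_relpath(module_name)
    for directory in include_dirs:
        candidate = os.path.join(directory, relpath)
        if os.path.isfile(candidate):
            return candidate
    return None


print(module_to_relpath("File::Copy::Recursive"))
print(module_to_relpath("Recursive"))
```

With the file stored as ./cpan/Recursive.pm, only the name Recursive is found under ./cpan, while the code inside the file still lives in the package File::Copy::Recursive, which is the mismatch described above.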

What is the difference between require and load in common lisp?

I'm going through Practical Common Lisp, I'm almost finished, and one question that has not been answered for me so far (or maybe I just missed it) is the difference between "require" and "load".
So what is the difference?
Thanks.
require is used for modules, which can each consist of one or many files.
load is used to load an arbitrary single file.
The require function tests whether a module is already present (using a case-sensitive comparison); if the module is not present, require proceeds to load the appropriate file or set of files. The pathname argument, if present, is a single pathname or a list of pathnames whose files are to be loaded in order, left to right. If the pathname argument is nil or is not provided, the system will attempt to determine, in some system-dependent manner, which files to load. This will typically involve some central registry of module names and the associated file lists.
Source: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node119.html
The load function loads the file named by filename into the Lisp environment. It is assumed that a text (character) file can be automatically distinguished from an object (binary) file by some appropriate implementation-dependent means, possibly by the file type. The defaults for filename are taken from the variable *default-pathname-defaults*. If the filename (after the merging in of the defaults) does not explicitly specify a type, and both text and object types of the file are available in the file system, load should try to select the more appropriate file by some implementation-dependent means.
Source: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node217.html
The difference is that (require) loads a module if it has not been loaded already; (load) loads a file.
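The difference in behaviour can be modelled in a few lines. The following is a toy sketch in Python, not how any Lisp implements it: load and require here are stand-in functions, and the module is marked as present inside require itself, whereas in Common Lisp that is normally done by a provide form in the loaded file.

```python
loaded_files = []        # log of every file that was "loaded", in order
present_modules = set()  # stands in for the registry of modules already present


def load(filename):
    """Always load the file, even if it was loaded before."""
    loaded_files.append(filename)


def require(module_name, pathnames):
    """Load the module's files only if the module is not already present."""
    if module_name in present_modules:  # case-sensitive comparison
        return False
    for pathname in pathnames:          # loaded in order, left to right
        load(pathname)
    present_modules.add(module_name)
    return True


require("util", ["util-a.lisp", "util-b.lisp"])  # loads both files
require("util", ["util-a.lisp", "util-b.lisp"])  # does nothing the second time
load("util-a.lisp")                              # loads the file again regardless
print(loaded_files)
```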

Why do directory listings contain the current (.) and parent (..) directory?

Whenever I list the contents of a directory with a function like readdir, the returned file names also include "." and "..". I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files, but I always have to filter them out because they are not actual objects in the directory I am listing. Is there a good reason for functions like readdir to include them? Do some operating systems or file systems contain more or different virtual file names? Is there a better way to filter them out other than by doing string comparison with "." and ".."?
Update: thank you all for answering. I suppose I always thought that things like ./ and ../ were mere conventions that could be handled by searching and replacing. I find it a bit surprising, though probably more efficient and transparent, to have them be part of the file system itself.
One question remains, though: since . and .. are arbitrary names for these links, are there file systems that use different ones?
. and .. are actually hard links in filesystems. They are needed so that you can specify relative paths, based on some reference path (consider "../sibling/file.txt"). Since these hard links are actually existing in the filesystem, it makes sense for readdir to tell you about them. (actually the term hard link just means some name that is indistinguishable from the actual directory referred to: they both point to the same inode in the filesystem).
Best way is to just strcmp and ignore them, if you don't want to list them.
Originally they were hard links, and the number of special cases in the filesystem code for . and .. was minimal. That's not true for all modern filesystems, however.
But the conventions have been established, so even filesystems where these two directory entries don't actually exist still report their existence through APIs like readdir. Changing this now would break a lot of code.
"I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files"
They are. While you may perceive the file system as a hierarchy of "folders" "containing" folders, it is actually a doubly linked tree [1], with directories being nodes and files being leaves. So, . and .. are needed links for accessing the leaves of the current node and for traversing the tree, and they are the same thing as all the other links.
When you call readdir, you get all the places you can directly go to from the current node. If you do not want to list places that you perceive as "up", you have to sort them out yourself. You should write a little function for that, perhaps called readdir_down. I do not know in which order readdir lists the directories; perhaps you can just throw away the first two entries, but that order is not guaranteed, so comparing the names is safer.
[1] This is a first approximation; "hard links" are also possible, which make the tree actually a net.
One reason is that without them there is no way to get to the parent directory. Or get a handle to the current directory.
Without them, we cannot do such things as:
./run_this
Indeed, we couldn't add '.' to the $PATH, meaning we couldn't ever execute files that weren't already in the path.
These are normal directories, they are "hard links" to the current directory and directory above. They are present in all directories (even at the root level, where .. is exactly the same as .).
When using ls, you can filter out . and .. with ls -A (note the capital -A).
When applying a command to all dot-files, but not . or .., I often use .??* which matches only dot-files with a name of three characters or more.
touch .??*
Note this pattern also excludes any other file that begins with dot and is only two characters long (e.g. .x) but those files are uncommon.
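The behaviour of the .??* pattern can be checked without touching any files. This is a small sketch using Python's fnmatch module on a made-up list of names; the names are examples only.

```python
from fnmatch import fnmatchcase

# Hypothetical directory entries, including the two special ones
names = [".", "..", ".x", ".ab", ".bashrc", "notes.txt"]

# '.??*' means: a literal dot, then two arbitrary characters, then anything
matched = [name for name in names if fnmatchcase(name, ".??*")]
print(matched)  # prints ['.ab', '.bashrc']
```

As noted above, the two-character name .x is skipped along with . and ..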
When using programmatic file-listers like readdir() I do have to exclude . and .. manually. These two entries often come first in the list returned by readdir(), but that order is not guaranteed, so it is safer to filter them out by name than to shift off the first two entries:
@files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);   # get rid of . and ..
# go on with your business
They are reported because they are stored in the directory listing. That's the way unices have always worked.
Because on Unix-like operating systems, the directory-listing commands include those, and you use them to move up and down in the filesystem hierarchy.
Something like grep { not /^\.{1,2}\z/ } readdir HANDLE should work for you.
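The same filter can be sketched on a plain list of names. This Python sketch uses hypothetical helper names and a made-up listing; it also shows why the dot has to be escaped in the pattern, since an unescaped . matches any character and would throw away every one- or two-character file name. (Python's own os.listdir already omits . and .., so the list here stands in for what a raw readdir returns.)

```python
import re


def drop_dot_entries(names):
    """Keep everything except the two special entries, by exact comparison."""
    return [name for name in names if name not in (".", "..")]


def drop_by_pattern(names, pattern):
    """Keep names that do not match the given regular expression in full."""
    return [name for name in names if not re.fullmatch(pattern, name)]


raw = [".", "..", ".x", "ab", "file.txt"]

print(drop_dot_entries(raw))             # ['.x', 'ab', 'file.txt']
print(drop_by_pattern(raw, r"\.{1,2}"))  # ['.x', 'ab', 'file.txt']
print(drop_by_pattern(raw, r".{1,2}"))   # ['file.txt'], too much was dropped
```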
There is no good reason a directory scan should return these filenames.