Is File::Spec really necessary? - perl

I know all about the history of different OSes having different path formats, but at this point in time there seems to be a general agreement (with one sorta irrelevant holdout*) about how paths work. I find the whole File::Spec route of path management to be clunky and a useless pain.
Is it really worth having this baroque set of functions to manipulate paths? Please convince me I am being shortsighted.
* Irrelevant because even MS Windows allows forward slashes in paths, which means the only funky thing is the volume at the start and that has never really been a problem for me.

Two major systems have volumes. What's the parent of C:? In unix, it's C:/... In Windows, it's C:... (Unfortunately, most people misuse File::Spec to the point of breaking this.)
There are three different sets of path separators among the major systems. The fact that Windows supports "/" can simplify building paths, but it doesn't help in parsing them or in canonicalising them.
File::Spec also provides functions that would be useful even if every system used the same style of paths, such as abs2rel, which turns an absolute path into a relative one.
That said, I never use File::Spec. I use Path::Class instead. Without sacrificing any usability or usefulness, Path::Class provides a much better interface. And it doesn't let users mishandle volumes.
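For illustration, roughly what that looks like (an untested sketch; the example paths are made up, the method names are from the Path::Class docs):

use Path::Class qw(file dir);

my $file = file('foo', 'bar', 'baz.txt');   # a Path::Class::File object
my $dir  = dir('foo', 'bar');               # a Path::Class::Dir object

print $file->dir,      "\n";   # foo/bar (foo\bar on Windows)
print $file->basename, "\n";   # baz.txt
print $dir->parent,    "\n";   # foo

my $abs = $file->absolute;     # resolved against the current directory
my $rel = $abs->relative;      # ...and back to a relative path

Stringifying the objects (as print does above) gives you a native path for whatever OS you're on.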

For usual file management inside Perl, no, File::Spec is not necessary; using forward slashes everywhere is much less painful and works on Win32 anyway.
cpanminus is a good example: it is used by lots of people and has proven to work well on the Win32 platform. It doesn't use File::Spec for most file path manipulation and just uses forward slashes - that was even suggested by experienced Perl-Win32 developers.
The only place I had to use File::Spec's catfile in cpanm, though, is where I extract file paths from a perl error message (Can't locate File\Path.pm blah blah) and build a file path to pass to the command line (i.e. cmd.exe).
Meanwhile, File::Spec provides useful functions such as canonpath and rel2abs - not "necessary" per se, but really useful.
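A quick sketch of the kind of thing I mean (the file names here are just made up):

use File::Spec;

# join path components using whatever the current OS expects
my $path = File::Spec->catfile('lib', 'File', 'Spec.pm');
# "lib/File/Spec.pm" on Unix, "lib\File\Spec.pm" on Win32

# canonpath is a logical cleanup only: it collapses "." and doubled
# slashes, but never touches the filesystem
my $clean = File::Spec->canonpath('foo/.//bar');    # "foo/bar"

# rel2abs resolves a relative path against the current directory by default
my $abs = File::Spec->rel2abs('data/input.txt');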

Yes, absolutely.
Golden rule of programming: never hard-code string literals.
Edit: One of the best ways to avoid porting issues is to avoid OS-specific constants, especially in the form of inline literals,
e.g. drive + ":/" + path + "/" + filename
It is bad practice, yet we all commit these atrocities in the haste of the moment or because it doesn't matter for that piece of code. File::Spec is there for when a programmer is adhering to gospel programming.
In addition, it provides the values of special and often-used system directories, e.g. tmpdir or devnull, which can vary from one distribution/OS to another.
If anything, it could probably do with some other members added to it, like one pointing to the user's home directory.
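For example (the scratch-file name below is only an illustration):

use File::Spec;

my $tmp  = File::Spec->tmpdir;    # e.g. /tmp on most Unix systems
my $null = File::Spec->devnull;   # /dev/null on Unix, "nul" on Win32

# build a scratch path under the system temp directory
my $scratch = File::Spec->catfile($tmp, "myapp.$$.tmp");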

makepp (makepp.sourceforge.net) has a makefile variable $/ which is either / or \ (the latter on non-Cygwin Windows). The reason is that Windows accepts / in filenames, but not in command names (where it starts an option).

From http://perldoc.perl.org/File/Spec.html:
catdir
Concatenate two or more directory names to form a complete path ending with a directory. But remove the trailing slash from the resulting string, because it doesn't look good, isn't necessary and confuses OS/2. Of course, if this is the root directory, don't cut off the trailing slash :-)
So, for example, in this case I wouldn't need a regex to remove the trailing slash if I used catdir.
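Roughly (untested; Unix-style output shown):

use File::Spec;

# catdir joins the names and leaves no trailing slash to strip afterwards
my $dir = File::Spec->catdir('foo', 'bar', 'baz');   # "foo/bar/baz"

# even a component with a trailing slash comes out canonical
my $also = File::Spec->catdir('foo', 'bar/');        # "foo/bar"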

Related

NetLogo: primitives or extension primitives to determine operating system?

I was curious whether anybody is aware of a built-in NetLogo or NetLogo extension primitive that allows you to determine the operating system the user is currently running. I wish to alter directory separators according to the user's operating system, and being able to determine this information would be incredibly useful.
If there isn't any such thing, I'll get to building it!
No built-in primitive exists.
It would probably be possible to hack something kludgy together in pure NetLogo by using file-exists? to test for the presence or absence of certain OS-specific files, for example /etc/passwd on Unix-like systems (including Mac OS X). As for the best files to use for this, I don't know, but it wouldn't surprise me if there was already an SO answer on this (since it isn't a NetLogo-specific question).
I thought maybe https://github.com/NetLogo/Shell-Extension/ had it. But I see now that although it has primitives for reading and setting environment variables, it doesn't have similar primitives for Java system properties, which is what you need here (System.getProperty("os.name")). It would make a nice addition to the extension, I think.
re: "alter directory separators according to the user's operating system" specifically:
If you need to deal with paths that are coming from the operating system, then yeah, you need to be prepared to deal with platform-specific separators.
If you're only sending paths to the operating system, you may not need to worry about it. I haven't used Windows in a long time, but iirc it might just work to use forward slash.
If you're doing pathname manipulation, you'll probably want to check out Charles Staelin's pathdir extension, https://github.com/cstaelin/Pathdir-Extension. It includes a pathdir:get-separator primitive, as well as lots of other useful-looking stuff.

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages: it's not system-independent, and there may be security risks as well. What do you think? If there is no way around it and you have to use shell commands from Perl, what is the safest way to execute that particular command (like checking pid, uid etc.)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid having the shell interpret the arguments, so use the list (array) form of system (see also exec) rather than a single string containing the command plus arguments (which means the shell is used to process the command line).
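A minimal sketch of the difference (the grep invocation is only an illustration):

my $file = $ARGV[0];    # imagine this comes from user input

# risky: the whole string goes through the shell, so metacharacters
# in $file (spaces, ;, |, ...) are interpreted by the shell
system("grep -c foo $file");

# safer: the list form bypasses the shell entirely
system('grep', '-c', 'foo', $file) == 0
    or die "grep failed with status $?";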
Executing external commands can be expensive simply because it involves forking a new process and watching for its output if you need it.
Probably more importantly, should the external process fail for any reason, it may be difficult for your script to work out what happened. Worse still, surprisingly often an external process gets stuck forever, and so will your script. You can use special tricks like opening a pipe and watching for output in a loop, but this is itself error-prone.
Perl is very capable of doing many things. So, if you stick to using only native Perl constructs and modules to accomplish your tasks, not only will it be faster because you never fork, it will also be more reliable and easier to catch errors by looking at the native Perl objects and structures returned by library routines. And of course, it will be automatically portable to different platforms.
If your script runs under elevated permissions (like root or under sudo), you should be very careful about which external programs you execute. One simple way to ensure basic security is to always specify commands by full name, like /usr/bin/grep (but still think twice, and just do the grep in Perl itself!). However, even this may not be enough if an attacker is using the LD_PRELOAD mechanism to inject rogue shared libraries.
If you are willing to go very secure, consider enabling taint checking with the -T flag, like this:
#!/usr/bin/perl -T
The taint flag will also be enabled automatically by Perl if your script is determined to have different real and effective user or group ids.
Taint mode will severely limit your ability to do many things (like calling system()) without Perl complaining - see more at http://perldoc.perl.org/perlsec.html#Taint-mode - but it will give you much higher confidence in your script's security.
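As a rough sketch of what working under -T looks like (the allowed-characters pattern and the ls call are illustrative choices, not a recipe):

#!/usr/bin/perl -T
use strict;
use warnings;

# taint mode refuses to run external commands with an untrusted PATH
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

my $name = $ARGV[0];    # tainted: it came from outside the program

# untaint by extracting only the characters we explicitly allow
my ($safe) = $name =~ /\A([\w.-]+)\z/
    or die "Unsafe file name: $name";

system('/bin/ls', '-l', $safe) == 0
    or die "ls failed with status $?";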
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

Why is "package" keyword sometimes separated by a comment from the package name?

Analyzing sources of CPAN modules I can see something like this:
...
package # hide from PAUSE
Try::Tiny::ScopeGuard;
...
Obviously, it's taken from Try::Tiny, but I have seen this kind of comment between the package keyword and the package identifier in other modules too.
Why is this technique used? What is its goal, and what benefits does it have?
It is indeed a hack to hide a package from PAUSE's indexer.
When a distribution is uploaded to PAUSE, the indexer will examine each file in the upload, looking for the names of packages that are included in the distribution. Any indexed packages can show up in CPAN search results.
There are many reasons for not wanting the indexer to discover your packages. Your distribution may have many small or insignificant packages that would clutter up the search results for your module. You may have packages defined in your t (test) directory or some other non-standard directory that are not meant to be installed as part of the distribution. Your distribution may include files from a completely different distribution (that somebody else wrote).
The hack works because the indexer strictly looks for the keyword package and an expression that looks like a package name on the same line.
Nowadays, you can include a META.yml file with your distribution. The PAUSE indexer will look for and respect a no_index specification in this file. But this is a relatively new capability of the indexer so older modules and old-timer CPAN contributors will still use the line break hack.
Here's an example of a no_index spec from Forks::Super:
no_index:
  directory:
    - t
    - inc
  package:
    - Sys::CpuAffinity
    - Signals::XSIG
    - Signals::XSIG::Default
    - Signals::XSIG::TieArray56
Sys::CpuAffinity and Signals::XSIG are separate distributions that are also packaged with Forks::Super. Some of the test scripts contain package declarations (e.g., Arbitrary::Test::Package) that shouldn't be indexed.
Okay, here's another shot at this phenomenon... I've been whacky-hacking Perl for a dozen years and I've rarely seen this packy hack; possibly I simply ignored it and never bothered to investigate. One thing seems clear, though: there's some hackish processing going on at PAUSE, crafted in the good ol' Perl'n'UNIX school of thought, that without a shadow of a doubt involves line-oriented text parsing. They parse those Perl files, possibly even using grep rather than perl itself, who knows, to extract package names and then kick off some procedure or gather some stats or whatnot. To trip up this procedure and hack around its ways, the author splits the package declaration across two lines, so the hacky packy grep job doesn't have a clue that there's a package declared right under its nose, the programmer is happy about his hacky skills, and the PAUSE stats or whatever it is they're cobbling together come out as they should. Does that make sense?

How to split long Perl code into several files without too much manual editing?

How do I split a long Perl script into two or more different files that can all access the same variables - without having to rename all shared variables from e.g. $count to $::count (or $main::count which is the same)?
In other words, what's the best and simplest way to split the Perl script into several files without having to import a lot of variables/functions and/or do a lot of manual editing.
I assume it has something to do with making the code part of the same package/scope/namespace, but my experiments so far have failed.
I am not sure it makes a difference, but the script is used for web/CGI purposes and will be running under mod_perl.
EDIT - Background:
I kind of knew I would get that response. The reason I want to split up the file is the following:
Currently I have a single very old and very long Perl file. I know it is not following Perl best practices but it works.
The problem is, I need to distribute the data files it uses between different web servers, first of all for performance reasons. There will be one "master" server and one or several "slaves".
About 20% of the mentioned Perl file contains shared functions, 40% is code that needs to run on the master server and 40% on the slave servers. Therefore, I would like to split the code into three files: 1. shared, 2. master-only, 3. slave-only. On the master server, 1 and 2 will be loaded; on the slaves, 1 and 3 will be loaded.
I assume this approach would use less process RAM and, more importantly, I would minimize the risk of not splitting the code correctly (e.g. a slave process calling a master data file). I don't see a great need for modularization, as the system works and the code does not need a lot of changes or exchanges with other projects.
EDIT 2 - Solution:
Found the solution I was looking for here:
http://www.perlmonks.org/?node_id=95813
In cases where the main package is in ownership of the variable, the actual word 'main' can be omitted to yield something like: $::var
It is possible to get around having to fully qualify variable names when strict is in use. Applying a simple use vars to your script, with the variable names as its arguments, will get around explicit package names.
Actually, I ended up repeating the our ($count, etc...) statement for the needed variables instead of use vars ();
Do let me know if I am missing something vital - apart from not going with modules! :)
@Axeman: Thanks, I will accept your answer, both for your effort and for sending me in the right direction.
Unless you put different package statements in their files, they will all be treated as if they had package main; at the top. So assuming that the scripts use package variables, you shouldn't have to do anything. If you have declared them with my (that is, if they are lexically scoped variables) then you would have to make sure that all references to the variables are in the same file.
But splitting scripts up for length is a rotten substitute for modularization. Yes, splitting helps keep individual files short, but modularization is the proper way to keep code length down: for all the reasons you would want to keep code length down, modularization does it best.
If chopping the files by length could really work for you, then you could create a script like this:
do '/path/to/bin/part1.pl';
do '/path/to/bin/part2.pl';
do '/path/to/bin/part3.pl';
...
But I kind of suspect that if the organization of this code is as bad as you're (sort of) indicating, it might suffer from some of the same reinventing of the wheel that I've seen in Perl-ignorant scripts. Just offhand (I might be wrong), I'm thinking you would be surprised how much could be chopped from the length simply by substituting better-tested Perl library idioms for for-looping and while-ing everything.
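If it helps, here's a minimal sketch of the do-file layout with shared package variables (all file, variable and subroutine names here are hypothetical):

# --- shared.pl: loaded on every server ---
our $count = 0;                 # package variable in main::, visible to all files
sub bump_count { $count++ }
1;                              # do() treats a false return value as failure

# --- master.pl: master-only code ---
our $count;                     # the same main:: variable, re-declared for strict
sub master_work { bump_count(); print "count is now $count\n" }
1;

# --- main script ---
use strict;
use warnings;

do './shared.pl' or die "shared.pl failed: " . ($@ || $!);
do './master.pl' or die "master.pl failed: " . ($@ || $!);

master_work();                  # prints "count is now 1"

The slave script would do the same thing, loading shared.pl and a slave.pl instead.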

What is a good method for inventing a command name?

We're struggling to come up with a command name for our all purpose "developer helper" tool, which we are using on our project. It's like a wrapper for our existing tools like cmake and hg. The purpose of the command is really just to make our lives easier by combining multiple commands into one (for example, publishing packages). For example, we have commands like:
do conf
do build
do install
do publish
We've considered a few ambiguous names like do (as above) and run, but obviously do is a bash keyword and run is pretty ambiguous.
We'd like our command to be just 2 characters long, preferably - but perhaps we're asking the impossible? Is there a practical way to check the availability of command names (other than just typing them into your terminal), or is it just a case of choosing one and hoping nobody else will use it? Are we worrying about nothing?
Since it's a "developer helper" tool why not use hm [run|build|port|deploy|test], Help Me ...
Give it a verbose name, then let everyone alias it to whatever they want. Make sure you use the verbose name in other scripts so that it removes ambiguity.
This way, each user gets to use whatever makes sense to him/her, and the scripts are more readable and more easily searchable (for example, grepping for "our_cool_tool" will usually yield better results than grepping for "run").
How many 2-character words are useful in this context? I think you need four characters. With that in mind, here are some suggestions.
omni
torq
fluf
mega
spif
crnk
splt
argh
quat
drul
scud
prun
sqat
zoom
sizl
I have more if you need them.
Pick one: http://en.wikipedia.org/wiki/List_of_all_two-letter_combinations
To check the availability of command names, I suggest looking for all two-letter filenames that are in the directories in your path. You can use a script like this
for item in `echo $PATH | sed 's/:/ /g'` ; do
  ls -1d $item/??
done
It won't show builtins in your shell (like "do" as you mentioned) but it's a good start.
Change ?? to ??? for three-letter files, etc.
I'm going to vote for qp (quick package?) since it's easy to pronounce, easy to type, and easy to remember where the keys are on the keyboard.
I use "asd". it's short and most developers type it without thinking
(oh, and you can always claim later that it stands for some "Advanced Script for Developers" if you need to justify yourself a few years from now)
How about fu? As in Kung Fu. It's a special purpose tool. And it's really easy to type.
I think that run is a good name; at least anybody who downloads your project will know what to do. Calling it without parameters should reveal your options.
Even 'do' will do, I think you can use backquotes to run it from bash scripts.
Also remember that running the tools without parameters will tell you what options you have.
Use makefiles to do everything for you.
How about calling it something descriptive, like 'build_runner', and then just aliasing it to 'br' (or preferred acronym) in your .bashrc?
There is a really crappy tool called cleartool (part of clearcase), and people will alias it on their machine to "ct". Perhaps you can have a longer command and suggest users alias it.
It would probably be best to do something like ire_and_curses suggested, name it descriptively then alias it to a 2 letter command. If I was choosing, I would name it dev_help and alias it to dh.
I think you're worrying about nothing. Install the program as 'the-command-to-do-everything-and-if-you-dont-make-your-own-alias-for-it-you-should'. I don't think that will be too long for any modern filesystem, but you might need to shorten it to 'tctdeaiydmyoafiys'. See what common aliases are used, and then change the program's name to that. In other words: don't decide, let natural selection decide for you. If you are working with a team of < 10, this should not even remotely cause any problems.
Call it devtool, and alias it to dt.
Custom tools like that I like to start with the prefix 'jj-'. I can type (with big index-finger power) 'jj ' and see all my personal commands. Also, they group together in alphabetical lists. 'J' is not a very common character for built-in commands, but you can pick your own.
Since you want two characters, you can use just 'zz', or something starting with 'z'.
Are you sure you want to put all your functionality in one command? That might be simultaneously over-constraining and over-loading the interface a little.