How to detect which HPC scheduler (Torque, Sun Grid Engine etc) am I using? - hpc

I need to run a different script depending on the type of scheduler, which necessitates a reliable way to detect if the scheduler is Torque, SGE or something else. Something like $SHELL telling which shell I am using. or something like name.
I am aware of environmental variables the two systems set, but they don't offer me a reliable or an elegant way - given the commands the env. variables are named similarly or identically.. there needs to several ifs and buts before we can conclude which is it.

Set an environment variable explicitly in your .bashrc that you can later query.
e.g.
export RUNNING_ON="moms_gpu_cluster5"
export THIS_SYSTEMS_SCHEDULER="SGE"
You won't have to rely on what the sysadmin gives you, or what the scheduler does or doesn't do.

Related

Powershell Remove-Variable cmdlet, do I need to call it at the end of each function/scriptblock?

This is a generic question, no code.
Not sure if I need to remove local variables as I thought it should be done by the Powershell Engine.
I had a script to gather info from WMI and used a lot of local variables. The output was messed up when running multiple times, but it got fixed after I clean up all local variables at the end of function/scriptblock.
Any thoughts/idea would be appreciated.
The trouble do not come from the fact that you do not remove your vars, but by at least from two beginers errors (or done by lazy developpers like me, supposing I'am a developper).
we forget to itialize our vars before using them.
we do not test every returns of our functions or CmdLets calls.
Once these two things done (code multipled by two at least) you can restart your script without cleaning anything except the processed datas.
For me scripting is most of the time done on a corner of a table even not push in a source repository.
So start scripting and ask yourself fewer questions.

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages of it. It's not system independent plus security risks might also be there. What do you think? If there is no way and you have to use the shell commands from Perl then what is the safest way to execute that particular command (like checking pid, uid etc)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid the shell interpreting the arguments, so use the array form of system (see also exec), etc, rather than a single string for the command plus arguments (which means the shell is used to process the command line).
Executing external commands can be expensive simply because it involves forking new process and watching for its output if you need it.
Probably more importantly, should external process fail for any reason, it may be difficult to understand what happened by means of your script. Worse still, surprisingly often external process can be stuck forever, so will be your script. You can use special tricks like opening pipe and watching for output in loop, but this itself is error-prone.
Perl is very capable of doing many things. So, if you stick to using only Perl native constructs and modules to accomplish your tasks, not only it will be faster because you never fork, but it will be more reliable and easier to catch errors by looking at native Perl objects and structures returned by library routines. And of course, it will be automatically portable to different platforms.
If your script runs under elevated permissions (like root or under sudo), you should be very careful as to what external programs you execute. One of the simple ways to ensure basic security is to always specify commands by full name, like /usr/bin/grep (but still think twice and just do grep by Perl itself!). However, even this may not be enough if attacker is using LD_PRELOAD mechanism to inject rogue shared libraries.
If you are willing to go very secure, it is suggested to use tainted check by using -T flag like this:
#!/usr/bin/perl -T
Taint flag will be also enabled by Perl automatically if your script was determined to have different real and effective user or group ids.
Tainted mode will severely limit your ability to do many things (like system() call) without Perl complaining - see more at http://perldoc.perl.org/perlsec.html#Taint-mode, but it will give you much higher security confidence.
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

Argument passing strategy - environment variables vs. command line

Most of the applications we developers write need to be externally parametrized at startup. We pass file paths, pipe names, TCP/IP addresses etc. So far I've been using command line to pass these to the appplication being launched. I had to parse the command line in main and direct the arguments to where they're needed, which is of course a good design, but is hard to maintain for a large number of arguments. Recently I've decided to use the environment variables mechanism. They are global and accessible from anywhere, which is less elegant from architectural point of view, but limits the amount of code.
These are my first (and possibly quite shallow) impressions on both strategies but I'd like to hear opinions of more experienced developers -- What are the ups and downs of using environment variables and command line arguments to pass arguments to a process? I'd like to take into account the following matters:
design quality (flexibility/maintainability),
memory constraints,
solution portability.
Remarks:
Ad. 1. This is the main aspect I'm interested in.
Ad. 2. This is a bit pragmatic. I know of some limitations on Windows which are currently huge (over 32kB for both command line and environment block). I guess this is not an issue though, since you just should use a file to pass tons of arguments if you need.
Ad. 3. I know almost nothing of Unix so I'm not sure whether both strategies are as similarily usable as on Windows. Elaborate on this if you please.
1) I would recommend avoiding environmental variables as much as possible.
Pros of environmental variables
easy to use because they're visible from anywhere. If lots of independent programs need a piece of information, this approach is a whole lot more convenient.
Cons of environmental variables
hard to use correctly because they're visible (delete-able, set-able) from anywhere. If I install a new program that relies on environmental variables, are they going to stomp on my existing ones? Did I inadvertently screw up my environmental variables when I was monkeying around yesterday?
My opinion
use command-line arguments for those arguments which are most likely to be different for each individual invocation of the program (i.e. n for a program which calculates n!)
use config files for arguments which a user might reasonably want to change, but not very often (i.e. display size when the window pops up)
use environmental variables sparingly -- preferably only for arguments which are expected not to change (i.e. the location of the Python interpreter)
your point They are global and accessible from anywhere, which is less elegant from architectural point of view, but limits the amount of code reminds me of justifications for the use of global variables ;)
My scars from experiencing first-hand the horrors of environmental variable overuse
two programs we need at work, which can't run on the same computer at the same time due to environmental clashes
multiple versions of programs with the same name but different bugs -- brought an entire workshop to its knees for hours because the location of the program was pulled from the environment, and was (silently, subtly) wrong.
2) Limits
If I were pushing the limits of either what the command line can hold, or what the environment can handle, I would refactor immediately.
I've used JSON in the past for a command-line application which needed a lot of parameters. It was very convenient to be able to use dictionaries and lists, along with strings and numbers. The application only took a couple of command line args, one of which was the location of the JSON file.
Advantages of this approach
didn't have to write a lot of (painful) code to interact with a CLI library -- it can be a pain to get many of the common libraries to enforce complicated constraints (by 'complicated' I mean more complex than checking for a specific key or alternation between a set of keys)
don't have to worry about the CLI libraries requirements for order of arguments -- just use a JSON object!
easy to represent complicated data (answering What won't fit into command line parameters?) such as lists
easy to use the data from other applications -- both to create and to parse programmatically
easy to accommodate future extensions
Note: I want to distinguish this from the .config-file approach -- this is not for storing user configuration. Maybe I should call this the 'command-line parameter-file' approach, because I use it for a program that needs lots of values that don't fit well on the command line.
3) Solution portability: I don't know a whole lot about the differences between Mac, PC, and Linux with regard to environmental variables and command line arguments, but I can tell you:
all three have support for environmental variables
they all support command line arguments
Yes, I know -- it wasn't very helpful. I'm sorry. But the key point is that you can expect a reasonable solution to be portable, although you would definitely want to verify this for your programs (for example, are command line args case sensitive on any platforms? on all platforms? I don't know).
One last point:
As Tomasz mentioned, it shouldn't matter to most of the application where the parameters came from.
You should abstract reading parameters using Strategy pattern. Create an abstraction named ConfigurationSource having readConfig(key) -> value method (or returning some Configuration object/structure) with following implementations:
CommandLineConfigurationSource
EnvironmentVariableConfigurationSource
WindowsFileConfigurationSource - loading from a configuration file from C:/Document and settings...
WindowsRegistryConfigurationSource
NetworkConfigrationSource
UnixFileConfigurationSource - - loading from a configuration file from /home/user/...
DefaultConfigurationSource - defaults
...
You can also use Chain of responsibility pattern to chain sources in various configurations like: if command line argument is not supplied, try environment variable and if everything else fails, return defauls.
Ad 1. This approach not only allows you to abstract reading configuration, but you can easily change the underlying mechanism without any affect on client code. Also you can use several sources at once, falling back or gathering configuration from different sources.
Ad 2. Just choose whichever implementation is suitable. Of course some configuration entries won't fit for instance into command line arguments.
Ad 3. If some implementations aren't portable, have two, one silently ignored/skipped when not suitable for a given system.
I think this question has been answered rather well already, but I feel like it deserves a 2018 update. I feel like an unmentioned benefit of environmental variables is that they generally require less boiler plate code to work with. This makes for cleaner more readable code. However a major disadvatnage is that they remove a layers of isolation from different applications running on the same machine. I think this is where Docker really shines. My favorite design pattern is to exclusively use environment variables and run the application inside of a Docker container. This removes the isolation issue.
I generally agree with previous answers, but there is another important aspect: usability.
For example, in git you can create a repository with the .git directory outside of that. To specify that, you can use a command line argument --git-dir or an environmental variable GIT_DIR.
Of course, if you change the current directory to another repository or inherit environmental variables in scripts, you get a mistake. But if you need to type several git commands in a detached repository in one terminal session, this is extremely handy: you don't need to repeat the git-dir argument.
Another example is GIT_AUTHOR_NAME. It seems that it even doesn't have a command line partner (however, git commit has an --author argument). GIT_AUTHOR_NAME overrides the user.name and author.name configuration settings.
In general, usage of command line or environmental arguments is equally simple on UNIX: one can use a command line argument
$ command --arg=myarg
or an environmental variable in one line:
$ ARG=myarg command
It is also easy to capture command line arguments in an alias:
alias cfg='git --git-dir=$HOME/.cfg/ --work-tree=$HOME' # for dotfiles
alias grep='grep --color=auto'
In general most arguments are passed through the command line. I agree with the previous answers that this is more functional and direct, and that environmental variables in scripts are like global variables in programs.
GNU libc says this:
The argv mechanism is typically used to pass command-line arguments specific to the particular program being invoked. The environment, on the other hand, keeps track of information that is shared by many programs, changes infrequently, and that is less frequently used.
Apart from what was said about dangers of environmental variables, there are good use cases of them. GNU make has a very flexible handling of environmental variables (and thus is very integrated with shell):
Every environment variable that make sees when it starts up is transformed into a make variable with the same name and value. However, an explicit assignment in the makefile, or with a command argument, overrides the environment. (-- and there is an option to change this behaviour) ...
Thus, by setting the variable CFLAGS in your environment, you can cause all C compilations in most makefiles to use the compiler switches you prefer. This is safe for variables with standard or conventional meanings because you know that no makefile will use them for other things.
Finally, I would stress that the most important for a program is not programmer, but user experience. Maybe you included that into the design aspect, but internal and external design are pretty different entities.
And a few words about programming aspects. You didn't write what language you use, but let's imagine your tools allow you the best possible argument parsing. In Python I use argparse, which is very flexible and rich. To get the parsed arguments, one can use a command like
args = parser.parse_args()
args can be further split into parsed arguments (say args.my_option), but I can also pass them as a whole to my function. This solution is absolutely not "hard to maintain for a large number of arguments" (if your language allows that). Indeed, if you have many parameters and they are not used during argument parsing, pass them in a container to their final destination and avoid code duplication (which leads to inflexibility).
And the very final comment is that it's much easier to parse environmental variables than command line arguments. An environmental variable is simply a pair, VARIABLE=value. Command line arguments can be much more complicated: they can be positional or keyword arguments, or subcommands (like git push). They can capture zero or several values (recall the command echo and flags like -vvv). See argparse for more examples.
And one more thing. Your worrying about memory is a bit disturbing. Don't write overgeneral programs. A library should be flexible, but a good program is useful without any arguments. If you need to pass a lot, this is probably data, not arguments. How to read data into a program is a much more general question with no single solution for all cases.

avoiding exploit in perl variable extrapolation from file

I am optimizing a very time/memory consuming program by running it over a dataset and under multiple parameters. For each "run", I have a csv file, "setup.csv" set up with "runNumber","Command" for each run. I then import this into a perl script to read the command for the run number I would like, extrapolate the variables, then execute it on the system via the system command. Should I be worried about the potential for this to be exploited, (I am worried right now)? If so, what can I do to protect our server? My plan now is to change the file permissions of the "setup.csv" to read only and ownership to root, then go in as root whenever I need to append another run to the list.
Thank you very much for your time.
Run your code in taint mode with -T. That will force you to carefully launder your data. Only pass through strings that are ones you are expecting. Do not launder with .*, but rather check against a list of good strings.
Ideally, there a list of known acceptable values, and you validate against that.
Either way, you want to avoid the shell by using the multi-argument form of system or by using IPC::System::Simple's systemx.
If you can't avoid the shell, you must properly convert the text to pass to the command into shell literals.
Even then, you have to be careful of values that start with -. Lots of tools accept -- to denote the end options, allowing other values to be passed safely.
Finally, you might want to make sure the args don't contain the NUL character (\0).
systemx('tool', '--', #args)
Note: Passing arbitrary strings is not possible in Windows. Extra validation is required.

How can I control an interactive Unix application programmatically through Perl?

I have inherited a 20-year-old interactive command-line unix application that is no longer supported by its vendor. We need to automate some tasks in this application.
The most troublesome of these is creating thousands of new records with slightly different parameters (e.g. different identifiers, different names). The records have to be created in sequence, one at a time, which would take many months (and therefore dollars) to do manually. In most cases, creating a record has a very predictable pattern of keying in commands, reading responses, keying in further commands, etc. However, some record creation operations will result in error conditions ('record with this identifier already exists') that require a different set of commands to be exit gracefully.
I can see a few different ways to do this:
Named pipes. Write a Perl script that runs the target application with STDIN and STDOUT set to named pipes then sends the target application the sequence of commands to create a record with the required parameters, and then instructs the target application to exit and shut down. We then run the script as many times as required with different parameters.
Application. Find another Unix tool that can be used to script interactive programs. The only ones I have been able to find though are expect, but this does not seem top be maintained; and chat, which I recall from ages ago, and which seems to do more-or-less what I want, but appears to be only for controlling modems.
One more potential complication: I think the target application was written for a VT100 terminal and it uses some sort of escape sequences to do things like provide highlighting.
My question is what approach should I take? One of these, or something completely different? I quite like the idea of using named pipes and then having a Perl script that opens the FIFOs and reads and writes as required, as it provides a lot of flexibility, but from what I have read it seems like there's a lot of potential problems if I go down this path.
Thanks in advance.
I'd definitely stick to Perl for the extra flexibility, as chaos suggested. Are you aware of the Expect perl module? It's a lot nicer than the named pipe approach.
Note also with named pipes, you can't force the output coming back from your legacy application to be unbuffered, which could be annoying. I think Expect.pm uses pseudo-ttys to get around this problem, but I'm not sure. See the discussion in perlipc in the section "Bidirectional Communication with Another Process" for more details.
expect is a lot more solid than you're probably giving it credit for, but if I were you I'd still go with the Perl option, wanting to have a full and familiar programming language for managing the process and having confidence that whatever weird issues arise, there will be ways of addressing them.
Expect, either with the Tcl or Perl implementations, would be my first attempt. If you are seeing odd sequences in the output because it's doing odd terminal things, just filter those from the output before you do your matching.
With named pipes, you're going to end up reinventing Expect anyway.