How do I run subprocesses that use temp dirs in Haskell?

I've got a Haskell program that needs to execute a separate (third party) binary; this binary will write its output to a file provided as a command-line argument (it does not seem willing to write to STDOUT). I see that System.Cmd will allow me to call this binary, but I'm quite mystified by the type of withTemporaryDirectory. Namely:
withTemporaryDirectory :: FilePath -> (FilePath -> IO a) -> IO a
whereas System.Cmd merely gives me:
rawSystem :: String -> [String] -> IO ExitCode
(as well as system, which isn't usefully different in this case).
I'm just stuck figuring out how to wire these up; I want to create a temp directory (this binary likes to vomit all over its CWD), run the binary, read from its output file (I'll know its name as I provide that as an argument to the binary in question) and then blow away the temp directory and its contents.
So, should I be writing a function whose type is (FilePath -> IO a) in order to do all the stuff I described? Are there any good examples anyone can provide to this effect?
In this case, the binary being used is PsiPred (protein secondary structure prediction) and while its source is available, I'd rather not have to modify it. This software we're working on is a computational biology program for remote homology detection in proteins.

FilePath is a type synonym for String. withTemporaryDirectory behaves as if you called mkdtemp(3) with its first argument, and then passed the resulting directory path to the second argument (a function that runs an IO action given that path). After the inner function finishes, the directory is deleted.
In your case, I would use withTemporaryDirectory and, inside the function you pass to it, change directory to the temporary one, actually run PsiPred, and then change back to your old one, along the lines of the sketch below.
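Putting the pieces together: a minimal sketch, assuming withTemporaryDirectory as typed above (import it from wherever your version of it lives) and a hypothetical command line of psipred <input> <output>; check PsiPred's real invocation before relying on this.

    import Control.Exception (bracket, evaluate)
    import System.Cmd (rawSystem)
    import System.Directory (canonicalizePath, getCurrentDirectory, setCurrentDirectory)
    import System.Exit (ExitCode(..))

    -- Run an IO action with the working directory temporarily changed,
    -- restoring the old one afterwards even if the action throws.
    withCurrentDir :: FilePath -> IO a -> IO a
    withCurrentDir dir act =
        bracket getCurrentDirectory setCurrentDirectory
                (\_ -> setCurrentDirectory dir >> act)

    runPsiPred :: FilePath -> IO (Either ExitCode String)
    runPsiPred input = do
        absInput <- canonicalizePath input        -- must survive the chdir below
        withTemporaryDirectory "psipred.XXXXXX" $ \tmp ->
            withCurrentDir tmp $ do
                let outFile = "out.ss2"           -- hypothetical output file name
                code <- rawSystem "psipred" [absInput, outFile]
                case code of
                    ExitSuccess -> do
                        s <- readFile outFile
                        _ <- evaluate (length s)  -- force the lazy read before the
                                                  -- temporary directory is deleted
                        return (Right s)
                    failure -> return (Left failure)

Note the evaluate: readFile is lazy, so without forcing the contents you could end up reading from a file that has already been deleted along with the directory.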

System.Cmd is part of the process package, which also contains the module System.Process with more general versions of system, namely createProcess and runProcess. Both allow you to specify a working directory and much more. See System.Process.
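For instance, a small helper that runs a command with the child's working directory set, leaving your own CWD untouched (the CWD is process-global, so changing it is risky in concurrent code):

    import System.Exit (ExitCode)
    import System.Process (CreateProcess(cwd), createProcess, proc, waitForProcess)

    -- Run cmd with args, with the child's working directory set to dir;
    -- the parent process never changes its own CWD.
    runInDir :: FilePath -> FilePath -> [String] -> IO ExitCode
    runInDir dir cmd args = do
        (_, _, _, ph) <- createProcess (proc cmd args) { cwd = Just dir }
        waitForProcess ph

Combined with withTemporaryDirectory, this replaces the chdir dance in the previous answer entirely.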

Related

Programmable arguments in perl pipes

I'm gradually working my way up the perl learning curve (with thanks to contributors to this REALLY helpful site), but am struggling with how to approach this particular issue.
I'm building a Perl utility which utilises three (C++) third-party programmes. Normally these are run as: A $file_list | B -args | C $file_out
where process A reads multiple files, process B modifies each individual file and process C collects all input files in the pipe and produces a single output file, with a null input file signifying the end of the input stream.
The input files are large(ish) at around 100 MB and around 10 in number. The processes are CPU intensive and the whole process needs to be applied to thousands of groups of files each day, so the simple solution of reading and writing intermediate files to disk is simply too inefficient. In addition, the process above is only part of a processing sequence, where the input files are already in memory and the output file also needs to be in memory for further processing.
There are a number of solutions to this already well documented and I have a prototype version utilising IPC::Open3(). So far, so good. :)
However - when piping each file to process A through process B I need to modify the arguments in process B for each input file without interrupting the forward flow to process C. This is where I come unstuck and am looking for some suggestions.
As further background:
Running on Ubuntu 16.04 LTS (currently within VirtualBox) and Perl v5.22.1.
The programme will run on (and within) a single machine by one user (me!), i.e. no external network communication or multi-user or public requirement, so simplicity of programming is preferred over strong security.
Since the process must run repeatedly without interruption, robust/reliable I/O handling is required.
I have access to the source code of each process, so that could be modified (although I'd prefer not to).
My apologies for the lack of "code to date", but I thought the question was more one of "How do I approach this?" rather than "How do I get my code to work?".
Any pointers or help would be very much appreciated.
You need a fourth program (call it D) that determines what the arguments to B should be and executes B with those arguments and with D's stdin and stdout connected to B's stdin and stdout. You can then replace B with D in your pipeline.
What language you use for D is up to you.
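Since the language for D is up to you, here is a minimal sketch of it in Haskell; the command name "B" and the argument-deciding logic are placeholders for your real process and per-run logic:

    import System.Process (callProcess)

    -- Placeholder: decide B's arguments for this run, e.g. from D's own
    -- command line, an environment variable, or a config file.
    decideArgs :: IO [String]
    decideArgs = return ["-args"]

    main :: IO ()
    main = do
        args <- decideArgs
        -- callProcess runs B with D's stdin/stdout/stderr inherited, so the
        -- data keeps flowing from A through B to C unchanged.
        callProcess "B" args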
If you're looking to feed output from different programs into the pipes, I'd suggest what you want to look at is ... well, pipe.
This lets you set up a pipe that works much like the ones you get from IPC::Open3, but with a bit more control over what you read from and write into it.
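For comparison, here is what that wiring looks like with Haskell's process library, which exposes the same primitive as createPipe (the sort command is just a stand-in):

    import System.IO (hClose, hPutStrLn)
    import System.Process (CreateProcess(std_in), StdStream(UseHandle), createPipe,
                           createProcess, proc, waitForProcess)

    -- Create a pipe ourselves and hand its read end to a child as stdin,
    -- keeping full control over what we write into the other end.
    main :: IO ()
    main = do
        (readEnd, writeEnd) <- createPipe
        (_, _, _, ph) <- createProcess (proc "sort" []) { std_in = UseHandle readEnd }
        hPutStrLn writeEnd "banana"
        hPutStrLn writeEnd "apple"
        hClose writeEnd            -- closing our end signals EOF to the child
        _ <- waitForProcess ph
        return ()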

How could I run a single line of code (not script) from command prompt?

Simple question here, I just can't seem to phrase it for Google in a way it can understand.
Say I wanted to execute a line of actual programming code (C++ or Java or Python... etc.) like SetCursorPos or printf from the command prompt command line. I vaguely imagine I would have to invoke the compiler and pass the command to it as a parameter, whereupon it would be converted into machine code and passed to... where exactly?
Okay so that was kind of two questions.
How to run actual code from the command line and
what exactly is happening when a fully compiled program, or converted line of code (presuming these are essentially binary containers at that point), is executed?
Question one takes priority, obviously. Unfortunately, I cannot find any documentation on it, just a bunch of stuff vaguely related to it.
How to run actual code from the command line
Without delving into the vast amounts of blurriness between them, there are two major categories of language implementations: interpreters and compilers.
With many interpreters (or implementations with implicit compilation, such as V8 JavaScript's JIT compiler, or pretty much anything with a REPL), running a single line from the command line should be fairly trivial. CPython (the standard implementation of Python) has the -c command option:
$ python -c 'print("Hello, world!")'
Hello, world!
Language implementations with explicit compilation steps tend to be decidedly less simple. In particular, the compiler would need to accept source either directly from the argument list or from standard input (via piping or redirection). On the output side, the compiler would have to support either executing the resulting program immediately or writing it to standard output, so that an operating system feature (if it exists) can execute it from a pipe.
To my knowledge, most explicit compilers are not designed with such usage in mind. In such cases, your best bet is to see if there is a REPL available for the language in question, preferably one as compatible with your compiler as possible, or to create (or find) a wrapper that makes it look like your language has a REPL. The wrapper would:
Accept input along the lines of CPython above.
Create a temporary source file behind the scenes with the code to be run and any necessary boilerplate.
Pass that file to the compiler.
Automatically run the resulting executable.
Delete the source file and executable. These may be cleaned up by the operating system later instead, if they're in a temp directory.
From the point of view of the user, this should look pretty similar to the CPython example, as they wouldn't have to interact with or see the compiler or temporary files; a sketch of such a wrapper follows.
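A sketch of such a wrapper, written in Haskell around a hypothetical C toolchain (the gcc invocation and the main() boilerplate are assumptions; substitute your own compiler and scaffolding):

    import System.Directory (getTemporaryDirectory, removeFile)
    import System.FilePath ((</>))
    import System.Process (callProcess)

    -- Wrap one C statement in boilerplate, compile it, run it, clean up.
    runOneLiner :: String -> IO ()
    runOneLiner stmt = do
        tmp <- getTemporaryDirectory
        let src = tmp </> "oneliner.c"
            exe = tmp </> "oneliner"
        writeFile src ("#include <stdio.h>\nint main(void){" ++ stmt ++ ";return 0;}\n")
        callProcess "gcc" [src, "-o", exe]   -- compile the temporary source
        callProcess exe []                   -- run the resulting executable
        mapM_ removeFile [src, exe]          -- delete source and executable

For example, runOneLiner "printf(\"Hello, world!\\n\")" then behaves much like the CPython example above. (Incidentally, GHC accepts one-liners directly: ghc -e 'putStrLn "Hello, world!"'.)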

At which lines in my MATLAB code a variable is accessed?

I am defining a variable at the beginning of my source code in MATLAB. Now I would like to know at which lines this variable affects something. In other words, I would like to see all lines in which that variable is read. This wish not only includes all accesses in the current function, but also possible accesses in sub-functions that use this variable as an input argument. In this way, I can see at a glance where a change to this variable takes effect.
Is there any possibility to do so in MATLAB? A graphical marking of the corresponding lines would be nice but a command line output might be even more practical.
You can always use "Find Files" to search for a certain keyword or expression. In my R2012a/Windows version it is in Edit > Find Files..., with the keyboard shortcut [CTRL] + [SHIFT] + [F].
The result will be a list of lines where the searched string is found, in all the files found in the specified folder. Please check out the options in the search dialog for more details and flexibility.
Later edit: thanks to @zinjaai, I noticed that @tc88 required that this tool should also track the effect of the variable inside functions/sub-functions. I think this is:
1) very difficult to achieve. The problem of running through all the possible values and branching on every possible conditional expression is... well, it is hard. I think it is halting-problem-hard.
2) in 90% of cases, the assumption that the output of a function is influenced by its input holds. But the input and the output are part of the same statement (assigning the result of a function), so looking for where the variable is used as an argument should suffice to identify which output variables are affected.
3) there are perverse cases where functions will alter arguments that are handle-type (because the argument is not copied, but referenced). This side effect breaks assumption 2, and is one of the main reasons for point 1. Outlining the cases when these side effects take place is, again, hard, and it is better to assume that all of them are modified.
4) some other cases are inherently undecidable, because they don't depend on the computer's state, but on the state of the "outside world". Example: suppose one calls uigetfile. The function returns a char type when the user selects a file, and a double type when the user chooses not to select a file. Obviously the two cases will be treated differently. How could you know which variables are created/modified before the user decides?
In conclusion: I think that human intuition, plus the MATLAB Debugger (for run time), the Find Files tool (for quickly searching where a variable is used), and depfun (for quickly identifying function dependencies), is way cheaper. But I would like to be wrong. :-)

In Powershell, is it a good practice to write multiple functions in a single script file?

I was told to write multiple actions using a PowerShell script: actions such as app pool creation, SQL updates, file editing, etc.
This is the first time I am going to write something of this bulk in a script.
So I would like to know the best practice before writing it.
Is it good practice to write all the functions in a single file?
I am thinking I may need to write at least 10 functions, each of perhaps 10 lines of code.
Consider modules: the simplest format is a manifest (.psd1) and a single script file (.psm1) containing all the functions, aliases, ... the module exports (plus any internal helpers).
In this case you are clearly putting multiple connected functions in one file. Even if much of the code is only dot-sourced into the script module, they are still logically one entity.
On the other hand, using scripts on your path that execute without having to be loaded beforehand tends (as per Adriano's comment to the question) to support one function per file (at script scope rather than as a function statement).
Therefore: there is no one "good practice": it all depends on the details of the circumstance.
Be pragmatic; truth comes from action, not from words ;O)
So begin at the beginning:
1) Check whether the thing you want to do already exists somewhere on the internet, e.g. PoshCode (if so, you can adapt it).
2) Think about your functions (not too much), with one objective: reusing code (write your algorithm in pseudo-code).
3) Search the internet for functions that already exist.
4) Write all functions in the same file as the main code so you can test them. During this phase you'll discover new functions to add, and parameters to add to or remove from existing ones.
5) Once you have tested your code, put the reusable functions (and the ones they depend on) into one or more modules.
My solution would be to create a custom module to which you can add functions later.
Save your single file with all the functions as mymodule.psm1 in a mymodule folder under one of the paths in $env:PSModulePath.
Then run Import-Module mymodule (or better, call it in your $profile so it's ready when the console comes up).

Program to change/obfuscate all hashes (MD5/SHA1) in a directory tree?

A) Are there any FOSS programs out there that can change the hashes of all files in a directory tree?
B) Failing that, what methods could be used to develop this capability in a (crappy) self-written program without requiring the program to be sophisticated and content-aware?
C) [Answered] Is there any (roughly) universally safe location within a file (for example, around EOF?) where one could simply append/add pseudorandom data so the resulting hash is different?
Many thanks.
There is no universally safe location. You would have to inspect every single file and handle its type accordingly, plus there are some formats (mostly proprietary) that are too rigid to modify in any way.
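For part B, if you accept that blind appending will corrupt rigid formats (as noted above), the naive approach is a short recursive walk. A sketch in Haskell, assuming the random and bytestring packages are available:

    import Control.Monad (forM_)
    import qualified Data.ByteString as BS
    import System.Directory (doesDirectoryExist, listDirectory)
    import System.FilePath ((</>))
    import System.Random (randomRIO)

    -- Append one pseudorandom byte to every file under root, changing every
    -- hash. WARNING: this corrupts any format that does not tolerate
    -- trailing junk; per the answer above, there is no universally safe spot.
    scramble :: FilePath -> IO ()
    scramble root = do
        entries <- listDirectory root
        forM_ entries $ \e -> do
            let p = root </> e
            isDir <- doesDirectoryExist p
            if isDir
                then scramble p
                else do
                    b <- randomRIO (0, 255) :: IO Int
                    BS.appendFile p (BS.singleton (fromIntegral b))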