Mathematica in batch mode from the command line on Mac OS X - command-line

I'd like to start writing some unit tests for my Mathematica programs and control everything from the command line with some Makefiles.
It seems like Mathematica can be run from the command line but I can't see any basic instructions on getting started with doing this on Mac OS X — has anyone done this before?
Update:
Creating a test file like this:
Print["hello"];
x := 1;
y = x+1;
z = y+1;
Print["y="ToString#y];
Print["z="ToString#z];
Quit[];
And running it with
/Applications/Mathematica.app/Contents/MacOS/MathKernel -noprompt < test.m
is the closest I can get to some sort of batch processing. The output looks ugly, though; newlines are added for every line of the script!
"hello"
"y=2"
"z=3"
Is this the closest I can get to a script that can still output information to the console? I'm only using Mathematica 6, but I hope that doesn't make a difference.

This, finally, gives output like I'd expect it to:
/Applications/Mathematica.app/Contents/MacOS/MathKernel -noprompt -run "<<test.m"
Makes sense, I suppose. Adding this to my .bash_profile allows easy execution (as in mma test.m):
mma () { /Applications/Mathematica.app/Contents/MacOS/MathKernel -noprompt -run "<<$1" ; }
See also dreeves's mash Perl script, which may offer advantages over this approach.

With some experimentation, I found that /Applications/Mathematica.app/Contents/MacOS/MathKernel can be launched from the command line. It doesn't seem to accept the usual -h or --help command-line flags, though.

Thanks to Pillsy and Will Robertson for the MASH plug! Here's the relevant StackOverflow question: Call a Mathematica program from the command line, with command-line args, stdin, stdout, and stderr
If you don't use MASH, you may want to use the following utility functions that MASH defines.
For example, the standard Print will print strings with quotation marks -- not usually what you want in scripts.
ARGV = args = Drop[$CommandLine, 4];     (* Command line args.           *)
pr = WriteString["stdout", ##]&;         (* More                         *)
prn = pr[##, "\n"]&;                     (* convenient                   *)
perr = WriteString["stderr", ##]&;       (* print                        *)
perrn = perr[##, "\n"]&;                 (* statements.                  *)
EOF = EndOfFile;                         (* I wish mathematica           *)
eval = ToExpression;                     (* weren't so damn              *)
re = RegularExpression;                  (* verbose!                     *)
read[] := InputString[""];               (* Grab a line from stdin.      *)
doList[f_, test_] :=                     (* Accumulate list of what f[]  *)
  Most@NestWhileList[f[]&, f[], test];   (* returns while test is true.  *)
readList[] := doList[read, #=!=EOF&];    (* Slurp list'o'lines from stdin *)
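With these definitions loaded, output goes to stdout and stderr without the quotation marks that Print adds in batch mode. A quick illustration (my own example; it assumes the definitions above are in scope):
prn["1 + 1 = ", 1 + 1]      (* writes  1 + 1 = 2  and a newline to stdout *)
perrn["log: starting run"]  (* same, but to stderr *)
eval["2^10"]                (* returns 1024 *)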
To use MASH, just grab the Perl file, mash.pl, and then make your test.m like the following:
#!/usr/bin/env /path/to/mash.pl
prn["hello"];
x := 1;
y = x+1;
z = y+1;
prn["y=", y];
prn["z=", z];

Related

Can Julia macros be used to generate code based on specific function implementation?

I am fairly new to Julia and I am learning about metaprogramming.
I would like to write a macro that receives a function as input and returns another function based on the implementation details of its input.
For example, given:
function f(x)
    x + 100
end

function g(x)
    f(x)*x
end

function h(x)
    g(x) - 0.5*f(x)
end
I would like to write a macro that returns something like this:
function h_traced(x)
    f = x + 100
    println("loc 1 x: ", x)
    g = f * x
    println("loc 2 x: ", x)
    res = g - 0.5 * f
    println("loc 3 x: ", x)
    return res
end
Now both code_lowered and code_typed seem to give me back the AST in the form of CodeInfo; however, when I try to use it programmatically in my macro I get an empty object.
macro myExpand(f)
    body = code_lowered(f)
    println("myExpand body length: ", length(body))
end
called like this
@myExpand :(h)
however the same call outside the macro works ok.
code_lowered(h)
Finally, even the following returns an empty CodeInfo.
macro myExpand(f)
    body = code_lowered(Symbol("h"))
    println("myExpand body length: ", length(body))
end
This might be incredibly trivial, but I could not work out myself why the h symbol does not resolve to the function defined. Am I missing something about the scope of symbols?
I find it useful to think about macros as a way to transform an input syntax into an output syntax.
So you could very well define a macro @my_macro such that
@my_macro function h(x)
    g(x) - 0.5*f(x)
end
would expand to something like
function h_traced(x)
    println("entering function: x=", x)
    g(x) - 0.5*f(x)
end
But to such a macro, h is merely a name, an identifier (technically, a Symbol) that can be transformed into h_traced. h is not the function that is bound to this name (in the same way as x = 2 involves binding a name x to an integer value 2, but x is not 2; x is merely a name that can be used to refer to 2). In contrast to this, when you call code_lowered(h), h gets evaluated first, and code_lowered is passed its value (which is a function) as its argument.
Back to our macro: expanding to an expression that involves the definitions of g and f goes way further than a mere syntax transformation: we're leaving the purely syntactic domain, since such a transformation would need to "understand" that these are functions, look up their definitions, and so on.
You are right to think about code_lowered and friends: this is IMO the adequate level of abstraction for what you're trying to achieve. You should probably look into tools like Cassette.jl or IRTools.jl. That being said, if you're still relatively new to Julia, you might want to get a bit more used to the language before delving too deeply into such topics.
You don't need a macro, you need a generated function. They can not only return code (Expr), but also IR (lowered code). Usually, for this kind of thing, people use Base.uncompressed_ast, not code_lowered. Both Cassette and IRTools simplify the implementation for you, in different ways.
The basic idea is:
1. Have a generated function that takes a function and its arguments.
2. In that generated function, get the IR of the input function and modify it to your purposes.
3. Return the new IR from the generated function. It will then be compiled and called on the original arguments.
A short demonstration with IRTools:
julia> IRTools.@dynamo function traced(args...)
           ir = IRTools.IR(args...)
           p = IRTools.Pipe(ir)
           for (v, stmt) in p
               IRTools.insertafter!(p, v, IRTools.xcall(println, "loc $v"))
           end
           return IRTools.finish(p)
       end
julia> function h(x)
           sin(x) - 0.5*cos(x)
       end
h (generic function with 1 method)
julia> @code_ir traced(h, 1)
1: (%1, %2)
%3 = Base.getfield(%2, 1)
%4 = Base.getfield(%2, 2)
%5 = Main.sin(%4)
%6 = (println)("loc %3")
%7 = Main.cos(%4)
%8 = (println)("loc %4")
%9 = 0.5 * %7
%10 = (println)("loc %5")
%11 = %5 - %9
%12 = (println)("loc %6")
return %11
julia> traced(h, 1)
loc %3
loc %4
loc %5
loc %6
0.5713198318738266
The rest is left as an exercise. The numbers of the variables are off because they are, of course, shifted during the transformation. You'd have to add some bookkeeping for that, or use the substitute function on Pipe in some way (but I never quite understood it). If you need the names of the variables, you can get the IR with slots preserved by using a different method of the IR constructor.
(And now the advertisement: I have written something like this. It's currently quite inefficient, but you might get some ideas from it.)

How to create autonumbered predicates in .logic files in LogiQL?

I am trying to set up a project with autoNumbered predicates. I couldn't use the lang:autoNumbered option in .logic files as it gave me the error that it expected a constraint or a lang:ordered.
So I rewrote my code in a .lb file, which worked. The code is as follows:
create --unique

addblock <doc>
  node(n), node_id(n:id) -> int(id).
  lang:autoNumbered(`node_id).

  cons_node[] = n -> node(n).
  lang:constructor(`cons_node).

  node_has_label[l] = n -> string(l), node(n).
  node_attribute[n, k] = v -> node(n), string(k), string(v).

  node_attribute_id(id, att, val) <- node_id(n:id), node_attribute[n, att] = val.
</doc>

exec <doc>
  +node(n), +cons_node[] = n,
  +node_attribute[n, "label"] = "Person",
  +node_attribute[n, "name"] = "Alice".
</doc>

echo --- node_att_table:
print node_attribute_id
close --destroy
Now I want to move this into a node.logic and a separate data file. How do I do this while keeping the lang:autoNumbered and lang:constructor commands?
EDIT:
This is the code that I have tried to run:
block(`node) {
  export(`{
    node(n), node_id(n:id) -> int(id).
    lang:autoNumbered(`node_id).

    cons_node[] = n -> node(n).
    lang:constructor(`cons_node).

    node_attribute(n, k; v) -> node(n), string(k), string(v).
  })
} <-- .
And I get the error
error parsing block: expected a constraint or lang:ordering pragma (Error BLOCK_PARSE)
on the lang:autoNumbered and lang:constructor lines when I run lb config && make.
Extra info: I use Vagrant to run LogicBlox and am basing my examples on this blog post: https://developer.logicblox.com/2014/01/structuring-and-compiling-logicblox-applications/
I'm not sure what your original problem was, but this actually should work fine :). You should be able to put the logic in a .logic file and use the addblock --file option; the same applies to the exec logic. Using the <doc> tags versus separate files is basically equivalent, so the result should be identical to including the logic inline as you did. If you want to load the data from a CSV file, file predicates should work: https://developer.logicblox.com/content/docs4/core-reference/webhelp/predicates.html#file-predicates
Maybe you tried it earlier from the command line, and the back-tick caused some issues due to its special meaning in the shell?

How can I determine disk space in MATLAB

Is there any function in MATLAB that determines free disk space? I have written a temporary function that uses the MS-DOS dir command and parses the last line of its output. I think it's working as expected, but I suspect (1) it won't work on other systems (OS X, Linux, Unix, etc.) and (2) it can also fail on different Windows versions. Perhaps someone could improve it to make it more generic? Thanks
The code:
function out = freediskspace
    [~, d] = dos('dir');                      % capture the output of dir
    C = textscan(d, '%s', 'Delimiter', '\n');
    C = C{1}{end};                            % last line: "... Dir(s)  x,xxx bytes free"
    C = strrep(C, ',', '');                   % strip thousands separators
    r = regexp(C, '\d+', 'match');
    out = str2double(r{2});                   % second number is the free byte count
end
You can use a Java call (this works on both Linux and Windows; I have not checked OS X, but it should be fine).
function free = getFreeSpace(path)
    if nargin < 1 || isempty(path)
        path = '.';
    end
    free = java.io.File(path).getFreeSpace();
end
For example,
>> f = getFreeSpace('C:\')
f =
3.9338e+11

Matlab ShortEng number format via sprintf() and fprintf()?

I like using MATLAB's shortEng notation in the interactive Command Window:
>> a = 123e-12;
>> disp(a);
1.2300e-10 % Scientific notation. Urgh!
>> format shortEng;
>> disp(a);
123.0000e-012 % Engineering notation! :-D
But I want to use fprintf:
>> format shortEng;
>> fprintf('%0.3e', a);
1.2300e-10 % Scientific. Urgh!
How do I print values with fprintf or sprintf with Engineering formatting using the MATLAB Format Operators?
I know I could write my own function to format the values into strings, but I'm looking for something already built into MATLAB.
NOTE: "Engineering" notation differs from "Scientific" in that the exponent is always a multiple of 3.
>> fprintf('%0.3e', a); % This is Scientific notation.
1.230000e-10
There is no way to get this format directly from an fprintf format specifier. A workaround is to use the output of disp as the string to be printed. But disp doesn't return a string; it writes directly to standard output. So how can this be done?
Here's where evalc (eval with capture of output) comes to the rescue:
% Create helper function
sdisp = @(x) strtrim(evalc(sprintf('disp(%g)', x)));

% Test helper function
format shortEng;
a = 123e-12;
fprintf(1, 'Test: %s', sdisp(a));
This is a workaround, of course, and can backfire in multiple ways because of the untested inputs to the helper function. But it illustrates the point, and it is one of the rare occasions where the reviled eval function family is actually irreplaceable.
You can use the following utility:
http://www.people.fas.harvard.edu/~arcrock/lib118/numutil/unpacknum.m
It unpacks a number according to a given integer N, making sure that the exponent is a multiple of N; with N=3 you get engineering notation.
More specifically, unpacknum takes 3 arguments: the number x, the base (10 for engineering notation), and the value N (3 for engineering notation). It returns the pair (f,e), which you can use with fprintf().
Check the unpacknum help for a quick example.
This function converts a value into a string in engineering notation:
function sNum = engn(value)
    exp = floor(log10(abs(value)));
    if ((exp < 3) && (exp >= 0))
        exp = 0;                             % Display without exponent
    else
        while (mod(exp, 3))
            exp = exp - 1;
        end
    end
    frac = value/(10^exp);                   % Adjust fraction to exponent
    if (exp == 0)
        sNum = sprintf('%+8.5G', frac);
    else
        sNum = sprintf('%+8.5GE%+.2d', frac, exp);
    end
end
You can fine-tune the format to your liking. Usage in combination with fprintf is easy enough:
fprintf('%s\t%s\n', engn(543210.123), engn(-0.0000567)) % +543.21E+03 -56.7E-06
fprintf('%s\t%s\n', engn(-321.123), engn(876543210)) % -321.12 +876.54E+06
You can use the following utility posted to the MATLAB file exchange:
num2eng
It offers extensive control over the formatting of the output string and full input checking, so is more flexible and less prone to error than the simpler evalc approach suggested by user2271770.
It can also output strings using SI prefixes instead of engineering notation, if you prefer.

Performance difference between functions and pattern matching in Mathematica

So Mathematica is different from other dialects of Lisp because it blurs the lines between functions and macros. In Mathematica, if a user wanted to write a mathematical function, they would likely use pattern matching like f[x_] := x*x instead of f = Function[{x}, x*x], though both would return the same result when called with f[x]. My understanding is that the first approach is something equivalent to a Lisp macro, and in my experience it is favored because of the more concise syntax.
So I have two questions: is there a performance difference between executing functions and the pattern-matching/macro approach? Though part of me wouldn't be surprised if functions were actually transformed into some version of macros to allow features like Listable to be implemented.
The reason I care about this question is because of the recent set of questions (1) (2) about trying to catch Mathematica errors in large programs. If most of the computations were defined in terms of Functions, it seems to me that keeping track of the order of evaluation and where the error originated would be easier than trying to catch the error after the input has been rewritten by the successive application of macros/patterns.
The way I understand Mathematica is that it is one giant search-and-replace engine. All functions, variables, and other assignments are essentially stored as rules, and during evaluation Mathematica goes through this global rule base and applies rules until the resulting expression stops changing.
It follows that the fewer times you have to go through the list of rules, the faster the evaluation. Looking at what happens using Trace (using gdelfino's functions g and h):
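For reference, the definitions being traced here are (reconstructed from the Trace output that follows):
g[x_] := x x
h = Function[{x}, x x];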
In[1]:= Trace@(#*#)&@x
Out[1]= {x x, x^2}

In[2]:= Trace@g@x
Out[2]= {g[x], x x, x^2}

In[3]:= Trace@h@x
Out[3]= {{h, Function[{x}, x x]}, Function[{x}, x x][x], x x, x^2}
it becomes clear why anonymous functions are fastest and why using Function introduces additional overhead over a simple SetDelayed. I recommend looking at the introduction of Leonid Shifrin's excellent book, where these concepts are explained in some detail.
I have on occasion constructed a Dispatch table of all the functions I need and manually applied it to my starting expression. This provides a significant speed increase over normal evaluation as none of Mathematica's inbuilt functions need to be matched against my expression.
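As a minimal sketch of that idea (my own illustration, with made-up rules myF and myG standing in for a real rule set):
rules = Dispatch[{myF[x_] :> x^2 + 1, myG[x_] :> 2 x}];
expr = myF[3] + myG[4];
expr //. rules
(* 18 *)
Because the rules are applied explicitly with //., only the rules in the Dispatch table are matched against the expression, not every definition in the session.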
My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
Not really. Mathematica is a term rewriter, as are Lisp macros.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach?
Yes. Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the Sqrt function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the Sqrt function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
In[1] := xs = N@Range[100000];
In[2] := Sqrt[xs]; // AbsoluteTiming
Out[2] = {0.0060000, Null}
We might define a global rewrite rule that has terms of the form sqrt[x] rewritten to Sqrt[x] such that the square root will be calculated:
In[3] := Clear[sqrt];
sqrt[x_] := Sqrt[x];
Map[sqrt, xs]; // AbsoluteTiming
Out[3] = {0.4800007, Null}
Note that this is ~80× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol sqrt with a lambda function that invokes Sqrt:
In[4] := Clear[sqrt];
sqrt = Function[{x}, Sqrt[x]];
Map[sqrt, xs]; // AbsoluteTiming
Out[4] = {0.0500000, Null}
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution is looking up the rewrite rule sqrt[x_] :> Sqrt[x] in the inner loop (for each element of the array) whereas the fast third solution looks up the value Function[...] of the symbol sqrt once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop calling sqrt written in C. So searching the global rewrite rules is extremely expensive and term rewriting is expensive.
If so, why is Sqrt ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for Sqrt with two lookups for sqrt and Sqrt in the inner loop but this is not so because Sqrt has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
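Both lookup paths can be inspected directly. A quick check (the DownValues line assumes the In[3] definition sqrt[x_] := Sqrt[x] above; the attribute list is what recent Mathematica versions report):
DownValues[sqrt]
(* {HoldPattern[sqrt[x_]] :> Sqrt[x]} *)
Attributes[Sqrt]
(* {Listable, NumericFunction, Protected} *)
The Listable attribute is what lets Sqrt[xs] thread over the packed array inside optimized internal code rather than visiting the rewrite table once per element.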
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
> let xs = [|1.0..100000.0|];;
...
The built-in sqrt function can be converted into a closure and given to the map function like this:
> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
This takes 6ms just like Sqrt[xs] in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
This is similar to the DownValues data structure in Mathematica, except that we aren't searching through multiple rules to find the first that matches the function arguments.
The program then becomes:
> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the DownValues associated with a symbol in the symbol itself in order to avoid the hash table lookup.
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
> type expr =
    | Float of float
    | Symbol of string
    | Packed of float []
    | Apply of expr * expr [];;
Note that Packed implements Mathematica's packed lists, i.e. unboxed arrays.
The following init function constructs a List with n elements using the function f, returning a Packed if every return value was a Float or a more general Apply(Symbol "List", ...) otherwise:
> let init n f =
    let rec packed ys i =
      if i=n then Packed ys else
        match f i with
        | Float y ->
            ys.[i] <- y
            packed ys (i+1)
        | y ->
            Apply(Symbol "List", Array.init n (fun j ->
              if j<i then Float ys.[j]
              elif j=i then y
              else f j))
    packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr
The following rule function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
> let rec rule = function
    | Apply(Symbol "Sqrt", [|Float x|]) ->
        Float(sqrt x)
    | Apply(Symbol "Map", [|f; Packed xs|]) ->
        init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
    | f -> f;;
val rule : expr -> expr
Note that the type of this function expr -> expr is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
We've recovered the performance of Map[Sqrt, xs] in Mathematica!
We can even recover the performance of Sqrt[xs] by adding an appropriate rule:
| Apply(Symbol "Sqrt", [|Packed xs|]) ->
Packed(Array.map sqrt xs)
I wrote an article on term rewriting in F#.
Some measurements
Based on @gdelfino's answer and comments by @rcollyer I made this small program:
j = # # + # # &;
g[x_] := x x + x x ;
h = Function[{x}, x x + x x ];
anon = Table[Timing[Do[ # # + # # &[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
jj = Table[Timing[Do[ j[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
gg = Table[Timing[Do[ g[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
hh = Table[Timing[Do[ h[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
ListLinePlot[ {anon, jj, gg, hh},
PlotStyle -> {Black, Red, Green, Blue},
PlotRange -> All]
The results are, at least for me, very surprising:
[timing plot omitted]
Any explanations? Please feel free to edit this answer (comments are a mess for long text)
Edit
Tested with the identity function f[x] = x to isolate the parsing from the actual evaluation. Results (same colors):
[timing plot omitted]
Note: results are very similar to the plot above for constant functions (f[x] := 1).
Pattern matching seems faster:
In[1]:= g[x_] := x*x
In[2]:= h = Function[{x}, x*x];
In[3]:= Do[h[RandomInteger[100]], {1000000}] // Timing
Out[3]= {1.53927, Null}
In[4]:= Do[g[RandomInteger[100]], {1000000}] // Timing
Out[4]= {1.15919, Null}
Pattern matching is also more flexible as it allows you to overload a definition:
In[5]:= g[x_] := x * x
In[6]:= g[x_,y_] := x * y
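With both definitions in place, g[2] matches the one-argument rule and gives 4, while g[2, 3] matches the two-argument rule and gives 6; the pattern matcher picks whichever definition fits the call.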
For simple functions you can compile to get the best performance:
In[7]:= k = Compile[{x}, x*x];
In[8]:= Do[k[RandomInteger[100]], {100000}] // Timing
Out[8]= {0.083517, Null}
You can use the function recordSteps in the previous answer to see what Mathematica actually does with Function. It treats it just like any other head. I.e., suppose you have the following:
f = Function[{x}, x + 2];
f[2]
It first transforms f[2] into
Function[{x}, x + 2][2]
At the next step, x+2 is transformed into 2+2. Essentially, "Function" evaluation behaves like an application of pattern-matching rules, so it shouldn't be surprising that it isn't faster.
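You can confirm this chain with Trace (a quick check using the same f):
f = Function[{x}, x + 2];
Trace[f[2]]
(* {{f, Function[{x}, x + 2]}, Function[{x}, x + 2][2], 2 + 2, 4} *)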
You can think of everything in Mathematica as an expression, where evaluation is the process of rewriting parts of the expression in a predefined sequence. This applies to Function just as to any other head.