sympy derivative with boolean - boolean

I am trying to take the derivative of a function including a boolean variable with sympy.
My expected result:
Two different derivatives, depending on the boolean being either True or False (i.e. 1 or 0).
Example:
import sympy as sy
c, x = sy.symbols("c x", positive=True, real=True)
bo = sy.Function("bo")
fct1 = sy.Function("fct1")
fct2 = sy.Function("fct2")
FOC2 = sy.Function("FOC2")
y = 5
a = 2
b = 4
def fct1(x):
    return -0.004*x**2 + 0.25*x + 4
# the following gives the smaller positive intercept with the x-axis
# this intercept is the threshold value for the boolean function, bo
min(sy.solve(fct1(x)-y, x))
def bo(x):
    if fct1(x) <= y:
        return 1
    else:
        return 0
def fct2(c, x):
    return a + b*c + bo(x)*c
def FOC2(c, x):
    return sy.diff(fct2(c, x), c)
print(FOC2(c, x))
The min-function after the comments shows me the threshold of x for bo being True or False would be 4.29..., thus positive and real.
Output:
TypeError: cannot determine truth value of Relation
I understand that the truth value depends on x, which is a symbol. Thus, without knowing x one cannot determine bo.
But how would I get my expected result, where bo is symbolic?

First off, I would advise you to carefully consider what is going on in your code the way it is pasted above. You first define a few sympy functions, e.g.
fct1 = sy.Function("fct1")
So after this, fct1 is an undefined sympy.Function - undefined in the sense that it is neither specified what its arguments are, nor what the function looks like.
However, then you define same-named functions explicitly, as in
def fct1(x):
    return -0.004*x**2 + 0.25*x + 4
Note however, that at this point, fct1 ceases to be a sympy.Function, or any sympy object for that matter: you overwrite the old definition, and it is now just a regular python function!
This is also the reason that you get the error: when you call bo(x), python tries to evaluate
-0.004*x**2 + 0.25*x + 4 <= 5
and return a value according to your definition of bo(). But python does not know whether the above is true (or how to make that comparison), so it complains.
I would suggest 2 changes:
Instead of python functions, as in the code, you could simply use sympy expressions, e.g.
fct1 = -0.004*x**2 + 0.25*x + 4
To get the truth value of your condition, I would suggest using the Heaviside function, which evaluates to 0 for a negative argument and to 1 for a positive one. Its implementation in sympy is sympy.Heaviside.
Your code could then look as follows:
import sympy as sy
c, x = sy.symbols("c x", positive=True, real=True)
y = 5
a = 2
b = 4
fct1 = -0.004*x**2 + 0.25*x + 4
bo = sy.Heaviside(y - fct1)
fct2 = a + b*c + bo * c
FOC2 = sy.diff(fct2, c)
print(FOC2)
Two comments on the line
bo = sy.Heaviside(y - fct1)
(1) The current implementation does not evaluate sympy.Heaviside(0) by default; this is because there are differing definitions around (some define it to be 1, others 1/2). You'd want it to be 1, in accordance with the (weak) inequality in the OP. In sympy 1.1, this can be achieved by passing an additional argument to Heaviside, namely whatever you want Heaviside(0) to evaluate to:
bo = sy.Heaviside(y - fct1, 1)
This is not supported in older versions of sympy.
(2) You will get your FOC2, again involving a Heaviside term. What I like about this is that you can keep working with this expression, say if you wanted to take a second derivative and so on. If, for the sake of readability, you would prefer a piecewise expression, no problem. Just replace the corresponding line with
bo = sy.Heaviside(y - fct1)._eval_rewrite_as_Piecewise(y-fct1)
which will translate to a piecewise function automatically. Note that under older versions this implicitly uses Heaviside(0) = 0.5, so it is best to use (1) and (2) together:
bo = sy.Heaviside(y - fct1, 1)._eval_rewrite_as_Piecewise(y-fct1)
Unfortunately, I don't have a working sympy 1.1 at hand right now and can only test the old code.
One more note concerning sympy's piecewise functions: they are much more readable when using sympy's LaTeX printing, which you enable by inserting
sy.init_printing()
early in the code.
(Disclaimer: I am by no means an expert in sympy, and there might be other, preferable solutions out there. Just trying to make a suggestion!)
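As an alternative to the Heaviside route above, the condition can also be kept symbolic with sympy.Piecewise directly. This is only a minimal sketch, not the answer's method: the names condition, raised, foc_low and foc_high are made up for illustration, and the sample points x = 1 and x = 10 are picked arbitrarily on either side of the threshold.

```python
import sympy as sy

c, x = sy.symbols("c x", positive=True)
y, a, b = 5, 2, 4

fct1 = -0.004*x**2 + 0.25*x + 4

# A symbolic comparison is a Relational object, not a Python bool,
# so using it in an `if` raises the TypeError from the question.
condition = fct1 <= y
try:
    bool(condition)
    raised = False
except TypeError:
    raised = True

# Piecewise keeps the condition symbolic: 1 where fct1 <= y, else 0.
bo = sy.Piecewise((1, condition), (0, True))
fct2 = a + b*c + bo*c
FOC2 = sy.diff(fct2, c)

foc_low = FOC2.subs(x, 1)    # fct1(1) = 4.246 <= 5, so bo = 1
foc_high = FOC2.subs(x, 10)  # fct1(10) = 6.1 > 5, so bo = 0
print(FOC2)
print(foc_low, foc_high)
```

The derivative stays a single expression in x, and substituting a concrete x picks the branch.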

Related

application of minimize_scalar

I am trying to replicate the results posted here (How to use scipy.optimize minimize_scalar when objective function has multiple arguments?) using a different structure. The goal is exactly the same but I just want to code it in a different way. Here is my code:
import numpy as np
from scipy.optimize import minimize_scalar
def mini(g,a,b,args):
    object=lambda w1: g(w1, *args)
    result=minimize_scalar(object, bounds=(a, b))
    minzer, minval=result.x, result.fun
    return minzer,minval
def errorr(w0,w1,x,y):
    y_pred = w0 + w1*x
    mse = ((y-y_pred)**2).mean()
    return mse
x = np.array([1,2,3])
y = np.array([52,54,56])
w0=50
mini(errorr, -5, 5, (w0,x,y))
However, the results obtained using my code are quite different from those in the original post. Where did I make the mistake in my code that caused the different results? Thanks!
Since you use lambda w1: g(w1, *args), you are minimizing with respect to the first function argument w0. To minimize with respect to w1, you can write lambda w1: g(args[0], w1, *args[1:]) instead.
However, please avoid shadowing Python built-in names with variable names (e.g. object). In addition, a lambda function is an anonymous function, so assigning it to a variable contradicts its purpose. Consequently, I'd propose either
def mini(g,a,b,args):
    def obj_fun(w1): return g(args[0], w1, *args[1:])
    result = minimize_scalar(obj_fun, bounds=(a, b))
    return result.x, result.fun
or
def mini(g,a,b,args):
    result = minimize_scalar(lambda w1: g(args[0], w1, *args[1:]), bounds=(a, b))
    return result.x, result.fun
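For completeness, here is the first variant put together with the data from the question, as a rough end-to-end sketch. One assumption on my side: I pass method="bounded" explicitly so that the bounds are honoured regardless of the scipy version; the original code relies on the default. With w0 fixed at 50, the data fit exactly for w1 = 2, so the minimizer should land close to 2.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def errorr(w0, w1, x, y):
    y_pred = w0 + w1 * x
    return ((y - y_pred) ** 2).mean()

def mini(g, a, b, args):
    # args = (w0, x, y); the scalar being optimised goes into the second slot.
    def obj_fun(w1):
        return g(args[0], w1, *args[1:])
    # method="bounded" is an explicit choice made for this sketch.
    result = minimize_scalar(obj_fun, bounds=(a, b), method="bounded")
    return result.x, result.fun

x = np.array([1, 2, 3])
y = np.array([52, 54, 56])
w0 = 50
minimizer, minval = mini(errorr, -5, 5, (w0, x, y))
print(minimizer, minval)
```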

scipy.optimize failure with a "vectorized" implementation

I have an optimization problem (1d) coded in 2 ways: one using a for loop and another using numpy arrays. The for loop version works fine but the numpy one fails.
Actually it is a bit more complicated: the numpy version can work with different starting points (!!) or if I choose another optimization algorithm like CG.
The 2 versions (functions and gradients) are giving the same results and the returned types are also the same as far as I can tell.
Here is my example, what am I missing?
import numpy as np
from scipy.optimize import minimize
# local params
v1 = np.array([1., 1.])
v2 = np.array([1., 2.])
# local functions
def f1(x):
    s = 0
    for i in range(len(v1)):
        s += (v1[i]*x-v2[i])**2
    return 0.5*s/len(v1)
def df1(x):
    g = 0
    for i in range(len(v1)):
        g += v1[i]*(v1[i]*x-v2[i])
    return g/len(v1)
def f2(x):
    return 0.5*np.sum((v1*x-v2)**2)/len(v1)
def df2(x):
    return np.sum(v1*(v1*x-v2))/len(v1)
x0 = 10. # x0 = 2 works
# tests...
assert np.abs(f1(x0)-f2(x0)) < 1.e-6 and np.abs(df1(x0)-df2(x0)) < 1.e-6 \
    and np.abs((f1(x0+1.e-6)-f1(x0))/(1.e-6)-df1(x0)) < 1.e-4
# BFGS for f1: OK
o = minimize(f1, x0, method='BFGS', jac=df1)
if not o.success:
    print('FAILURE', o)
else:
    print('SUCCESS min = %f reached at %f' % (f1(o.x[0]), o.x[0]))
# BFGS for f2: failure
o = minimize(f2, x0, method='BFGS', jac=df2)
if not o.success:
    print('FAILURE', o)
else:
    print('SUCCESS min = %f reached at %f' % (f2(o.x[0]), o.x[0]))
The error I get is
A1 = I - sk[:, numpy.newaxis] * yk[numpy.newaxis, :] * rhok
IndexError: invalid index to scalar variable.
but it doesn't really help me, since it can work with some other starting values.
I am using a fresh Python install (python 3.5.2, scipy 0.18.1 and numpy 1.11.3).
The solver expects the return value of jacobian df2 to be the same shape as its input x. Even though you passed in a scalar here, it's actually converted into a single element ndarray. Since you used np.sum, your result became scalar and that causes strange things to happen.
Enclose the scalar result of df2 with np.array, and your code should work.
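A minimal sketch of that fix, using the question's v1, v2, f2 and df2. I use np.atleast_1d here, which is one way of doing the wrapping (np.array([...]) would work as well); grad_shape is just a helper name for illustration. The objective is minimal at x = 1.5, so that is where BFGS should end up.

```python
import numpy as np
from scipy.optimize import minimize

v1 = np.array([1., 1.])
v2 = np.array([1., 2.])

def f2(x):
    return 0.5 * np.sum((v1 * x - v2) ** 2) / len(v1)

def df2(x):
    # np.sum collapses to a scalar; wrap it so the gradient has shape (1,),
    # matching the shape of the one-element array x the solver passes in.
    return np.atleast_1d(np.sum(v1 * (v1 * x - v2)) / len(v1))

x0 = np.array([10.])
grad_shape = df2(x0).shape
o = minimize(f2, x0, method='BFGS', jac=df2)
print(grad_shape, o.success, o.x)
```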

How does rowfun know to reference variables inside a table

From the documentation, we see the following example:
g = gallery('integerdata',3,[15,1],1);
x = gallery('uniformdata',[15,1],9);
y = gallery('uniformdata',[15,1],2);
A = table(g,x,y)
func = @(x, y) (x - y);
B = rowfun(func,A,...
'GroupingVariable','g',...
'OutputVariableName','MeanDiff')
When the function func is applied to A in rowfun how does it know that there are variables in A called x and y?
EDIT: I feel that my last statement must not be true, as you do not get the same result if you did A = table(g, y, x).
I am still very confused by how rowfun can use a function that does not actually use any variables defined within the calling environment.
Unless you specify the table variables (columns) and their order with the Name/Value argument InputVariables, Matlab simply passes the columns in table order (first column as first input, second column as second input, etc.), skipping any grouping columns.
Consequently, for better readability and maintainability of your code, I consider it good practice to always specify InputVariables explicitly.
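To illustrate the positional binding described above, here is a toy model in Python. It is not how Matlab implements rowfun: rowfun_like, its parameters and the small tables are hypothetical, and the grouping column is simply skipped rather than used for grouping. The point is only that the argument names of func never matter, the column order does.

```python
def rowfun_like(func, table, grouping=None, input_variables=None):
    """Apply func to each row, binding columns to arguments by position."""
    if input_variables is None:
        # Default: every column except the grouping one, in table order.
        input_variables = [name for name in table if name != grouping]
    columns = [table[name] for name in input_variables]
    return [func(*row) for row in zip(*columns)]

def diff(x, y):
    # The parameter names are irrelevant; only the position counts.
    return x - y

A = {"g": [1, 2], "x": [10, 20], "y": [1, 2]}
A_swapped = {"g": [1, 2], "y": [1, 2], "x": [10, 20]}

default_order = rowfun_like(diff, A, grouping="g")
swapped_order = rowfun_like(diff, A_swapped, grouping="g")
explicit = rowfun_like(diff, A_swapped, grouping="g",
                       input_variables=["x", "y"])
print(default_order, swapped_order, explicit)
```

Swapping the columns flips the sign of the result, while naming the inputs explicitly makes the call independent of the table layout.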

What are # and : used for in Qbasic?

I have legacy code that does math calculations. It is reportedly written in QBasic, and runs under VB6 successfully. I plan to rewrite the code in a newer language/platform, for which I must first work backwards and come up with a detailed algorithm from the existing code.
The problem is that I can't understand the syntax of a few lines:
Dim a(1 to 200) as Double
Dim b as Double
Dim f(1 to 200) as Double
Dim g(1 to 200) as Double
For i = 1 to N
a(i) = b: a(i+N) = c
f(i) = 1#: g(i) = 0#
f(i+N) = 0#: g(i+N) = 1#
Next i
Based on my work with VB5 like 9 years ago, I am guessing that a, f and g are Double arrays indexed from 1 to 200. However, I am completely lost about this use of # and : together inside the body of the for-loop.
: is the statement separator; it allows you to chain multiple statements on the same line. a(i) = b: a(i+N) = c is equivalent to:
a(i)=b
a(i+N)=c
# is a type specifier. It specifies that the number it follows should be treated as a double.
I haven't programmed in QBasic for a while but I did extensively in highschool. The # symbol indicates a particular data type. It is to designate the RHS value as a floating point number with double precision (similar to saying 1.0f in C to make 1.0 a single-precision float). The colon symbol is similar to the semicolon in C, as well, where it delimits different commands. For instance:
a(i) = b: a(i+N) = c
is, in C:
a[i] = b; a[i+N] = c;
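Since the goal is to port the code, here is roughly what the loop does, written out in Python. N, b and c are not defined in the snippet from the question, so the values below are made up; I also keep index 0 unused so that the 1-based indices of the original carry over unchanged.

```python
N = 3            # hypothetical value; not given in the question
b, c = 2.5, 7.0  # hypothetical values; not given in the question
SIZE = 200

# One extra slot so that indices 1..200 can be used as in the BASIC code.
a = [0.0] * (SIZE + 1)
f = [0.0] * (SIZE + 1)
g = [0.0] * (SIZE + 1)

for i in range(1, N + 1):
    # "a(i) = b: a(i+N) = c" is two statements on one line.
    a[i] = b
    a[i + N] = c
    # "1#" and "0#" are the double-precision literals 1.0 and 0.0.
    f[i] = 1.0
    g[i] = 0.0
    f[i + N] = 0.0
    g[i + N] = 1.0

print(a[1:2 * N + 1], f[1:2 * N + 1], g[1:2 * N + 1])
```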

Performance difference between functions and pattern matching in Mathematica

So Mathematica is different from other dialects of lisp because it blurs the lines between functions and macros. In Mathematica if a user wanted to write a mathematical function they would likely use pattern matching like f[x_]:= x*x instead of f=Function[{x},x*x] though both would return the same result when called with f[x]. My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach? Though part of me wouldn't be surprised if functions were actually transformed into some version of macros to allow features like Listable to be implemented.
The reason I care about this question is because of the recent set of questions (1) (2) about trying to catch Mathematica errors in large programs. If most of the computations were defined in terms of Functions, it seems to me that keeping track of the order of evaluation and where the error originated would be easier than trying to catch the error after the input has been rewritten by the successive application of macros/patterns.
The way I understand Mathematica is that it is one giant search replace engine. All functions, variables, and other assignments are essentially stored as rules and during evaluation Mathematica goes through this global rule base and applies them until the resulting expression stops changing.
It follows that the fewer times you have to go through the list of rules the faster the evaluation. Looking at what happens using Trace (using gdelfino's function g and h)
In[1]:= Trace@(#*#)&@x
Out[1]= {x x,x^2}
In[2]:= Trace@g@x
Out[2]= {g[x],x x,x^2}
In[3]:= Trace@h@x
Out[3]= {{h,Function[{x},x x]},Function[{x},x x][x],x x,x^2}
it becomes clear why anonymous functions are fastest and why using Function introduces additional overhead over a simple SetDelayed. I recommend looking at the introduction of Leonid Shifrin's excellent book, where these concepts are explained in some detail.
I have on occasion constructed a Dispatch table of all the functions I need and manually applied it to my starting expression. This provides a significant speed increase over normal evaluation as none of Mathematica's inbuilt functions need to be matched against my expression.
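The "apply rules until the expression stops changing" picture can be sketched in a few lines of Python. This is a toy model and certainly not Mathematica's actual evaluator: expressions are nested tuples, rules are plain functions that return a replacement or None, and all names (rewrite_once, evaluate, g_rule, times_rule) are invented for the illustration. The recorded trace plays the role of the Trace output above.

```python
def rewrite_once(expr, rules):
    """Rewrite sub-expressions first, then try each rule on the result."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite_once(part, rules) for part in expr)
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return expr

def evaluate(expr, rules):
    """Apply the rules until the expression stops changing."""
    trace = [expr]
    while True:
        new = rewrite_once(expr, rules)
        if new == expr:
            return expr, trace
        trace.append(new)
        expr = new

def g_rule(expr):
    # Toy counterpart of g[x_] := x*x
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "g":
        return ("Times", expr[1], expr[1])
    return None

def times_rule(expr):
    # Toy counterpart of the built-in multiplication of numbers
    if (isinstance(expr, tuple) and expr[0] == "Times"
            and all(isinstance(arg, (int, float)) for arg in expr[1:])):
        product = 1
        for arg in expr[1:]:
            product *= arg
        return product
    return None

result, trace = evaluate(("g", 3), [g_rule, times_rule])
print(result, trace)
```

Every pass walks the rule list again, which is why fewer passes and fewer rules to scan mean faster evaluation in this model.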
My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
Not really. Mathematica is a term rewriter, as are Lisp macros.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach?
Yes. Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the Sqrt function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the Sqrt function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
In[1] := xs = N@Range[100000];
In[2] := Sqrt[xs]; // AbsoluteTiming
Out[2] = {0.0060000, Null}
We might define a global rewrite rule that has terms of the form sqrt[x] rewritten to Sqrt[x] such that the square root will be calculated:
In[3] := Clear[sqrt];
sqrt[x_] := Sqrt[x];
Map[sqrt, xs]; // AbsoluteTiming
Out[3] = {0.4800007, Null}
Note that this is ~100× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol sqrt with a lambda function that invokes Sqrt:
In[4] := Clear[sqrt];
sqrt = Function[{x}, Sqrt[x]];
Map[sqrt, xs]; // AbsoluteTiming
Out[4] = {0.0500000, Null}
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution is looking up the rewrite rule sqrt[x_] :> Sqrt[x] in the inner loop (for each element of the array) whereas the fast third solution looks up the value Function[...] of the symbol sqrt once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop calling sqrt written in C. So searching the global rewrite rules is extremely expensive and term rewriting is expensive.
If so, why is Sqrt ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for Sqrt with two lookups for sqrt and Sqrt in the inner loop but this is not so because Sqrt has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
> let xs = [|1.0..100000.0|];;
...
The built-in sqrt function can be converted into a closure and given to the map function like this:
> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
This takes 6ms just like Sqrt[xs] in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
This is similar to the DownValues data structure in Mathematica, except that we aren't searching multiple resulting rules for the first to match on the function arguments.
The program then becomes:
> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the DownValues associated with a symbol in the symbol itself in order to avoid the hash table lookup.
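A rough Python sketch of that alternative, under the assumption that a symbol is an ordinary object holding its own list of rules. The Symbol class and its methods are hypothetical and only meant to show the shape of the idea, not how Mathematica stores DownValues internally.

```python
import math

class Symbol:
    """A named symbol that carries its own rewrite rules (hypothetical design)."""

    def __init__(self, name):
        self.name = name
        self.down_values = []  # list of (predicate, replacement function)

    def define(self, matches, replace):
        self.down_values.append((matches, replace))

    def apply(self, arg):
        # Try this symbol's own rules in order; no global table is consulted.
        for matches, replace in self.down_values:
            if matches(arg):
                return replace(arg)
        return (self.name, arg)  # no rule matched: stay unevaluated

sqrt_sym = Symbol("sqrt")
sqrt_sym.define(lambda v: isinstance(v, float) and v >= 0, math.sqrt)

values = [1.0, 4.0, 9.0]
roots = [sqrt_sym.apply(v) for v in values]
unevaluated = sqrt_sym.apply("x")
print(roots, unevaluated)
```

Once the caller holds a reference to the symbol, the inner loop no longer needs a lookup by name.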
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
> type expr =
    | Float of float
    | Symbol of string
    | Packed of float []
    | Apply of expr * expr [];;
Note that Packed implements Mathematica's packed lists, i.e. unboxed arrays.
The following init function constructs a List with n elements using the function f, returning a Packed if every return value was a Float or a more general Apply(Symbol "List", ...) otherwise:
> let init n f =
    let rec packed ys i =
      if i=n then Packed ys else
        match f i with
        | Float y ->
            ys.[i] <- y
            packed ys (i+1)
        | y ->
            Apply(Symbol "List", Array.init n (fun j ->
              if j<i then Float ys.[j]
              elif j=i then y
              else f j))
    packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr
The following rule function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
> let rec rule = function
    | Apply(Symbol "Sqrt", [|Float x|]) ->
        Float(sqrt x)
    | Apply(Symbol "Map", [|f; Packed xs|]) ->
        init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
    | f -> f;;
val rule : expr -> expr
Note that the type of this function expr -> expr is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
We've recovered the performance of Map[Sqrt, xs] in Mathematica!
We can even recover the performance of Sqrt[xs] by adding an appropriate rule:
| Apply(Symbol "Sqrt", [|Packed xs|]) ->
    Packed(Array.map sqrt xs)
I wrote an article on term rewriting in F#.
Some measurements
Based on @gdelfino's answer and comments by @rcollyer I made this small program:
j = # # + # # &;
g[x_] := x x + x x ;
h = Function[{x}, x x + x x ];
anon = Table[Timing[Do[ # # + # # &[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
jj = Table[Timing[Do[ j[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
gg = Table[Timing[Do[ g[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
hh = Table[Timing[Do[ h[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
ListLinePlot[ {anon, jj, gg, hh},
PlotStyle -> {Black, Red, Green, Blue},
PlotRange -> All]
The results are, at least for me, very surprising:
Any explanations? Please feel free to edit this answer (comments are a mess for long text)
Edit
Tested with the identity function f[x] = x to isolate the parsing from the actual evaluation. Results (same colors):
Note: results are very similar to this Plot for constant functions (f[x]:=1);
Pattern matching seems faster:
In[1]:= g[x_] := x*x
In[2]:= h = Function[{x}, x*x];
In[3]:= Do[h[RandomInteger[100]], {1000000}] // Timing
Out[3]= {1.53927, Null}
In[4]:= Do[g[RandomInteger[100]], {1000000}] // Timing
Out[4]= {1.15919, Null}
Pattern matching is also more flexible as it allows you to overload a definition:
In[5]:= g[x_] := x * x
In[6]:= g[x_,y_] := x * y
For simple functions you can compile to get the best performance:
In[7]:= k[x_] = Compile[{x}, x*x]
In[8]:= Do[k[RandomInteger[100]], {100000}] // Timing
Out[8]= {0.083517, Null}
You can use the function recordSteps from a previous answer to see what Mathematica actually does with Functions. It treats Function just like any other Head. I.e., suppose you have the following
f = Function[{x}, x + 2];
f[2]
It first transforms f[2] into
Function[{x}, x + 2][2]
At the next step, x+2 is transformed into 2+2. Essentially, "Function" evaluation behaves like an application of pattern matching rules, so it shouldn't be surprising that it's not faster.
You can think of everything in Mathematica as an expression, where evaluation is the process of rewriting parts of the expression in a predefined sequence; this applies to Function just as it does to any other head.
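The steps described for f[2] can be mimicked with a small Python sketch. This is only a toy model with invented names (substitute, evaluate_call) and a tuple encoding of expressions; it handles a single-parameter Function whose body is a Plus of numbers, nothing more.

```python
def substitute(body, bindings):
    """Replace parameter names in the body by their bound values."""
    if isinstance(body, tuple):
        return tuple(substitute(part, bindings) for part in body)
    return bindings.get(body, body)

def evaluate_call(symbol_values, name, arg):
    """Record the rewriting steps for name[arg], as described above."""
    steps = [(name, arg)]                      # f[2]
    function = symbol_values[name]             # f -> Function[{x}, x + 2]
    steps.append((function, arg))              # Function[{x}, x + 2][2]
    _, params, body = function
    expr = substitute(body, {params[0]: arg})  # 2 + 2
    steps.append(expr)
    if expr[0] == "Plus":
        expr = sum(expr[1:])                   # 4
        steps.append(expr)
    return steps

f = ("Function", ("x",), ("Plus", "x", 2))
steps = evaluate_call({"f": f}, "f", 2)
for step in steps:
    print(step)
```

Looking up the value of f is one extra rewriting step compared to a direct pattern definition, which is consistent with Function not being faster.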