Simple small problem takes a lot of time to prove unsatisfiability - minizinc

My code generates MiniZinc problems on the fly, and recently I ran into the following issue:
var int: a;
var int: b;
constraint a < b;
constraint a > b;
solve satisfy;
It takes an enormous amount of time (~2 minutes on my machine) to prove unsatisfiability.
I tried different solvers and search strategies but failed to speed up the solving.
Unfortunately, I can't put tight bounds on the variables and/or use MIP because of the specifics of my area. For example, I can have other constraints like a * b < 100 or a == some arbitrary number.
Any thoughts? I will be very thankful for any advice.
best,

The issue is, as you mention, that there are no bounds on the variables, which is not good for CP solvers. Is there no way you can give the variables a large domain? For example, with
var -100000..100000: a;
var -100000..100000: b;
then Gecode takes 0.5s to yield UNSAT.
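Since you generate the models programmatically anyway, the generator can emit such bounds itself. Here is a minimal Python sketch (the file name is made up, and it assumes the minizinc command-line tool and the Gecode backend are installed):

import subprocess

LB, UB = -100000, 100000  # a deliberately generous domain

model_lines = [
    f"var {LB}..{UB}: a;",
    f"var {LB}..{UB}: b;",
    "constraint a < b;",
    "constraint a > b;",
    "solve satisfy;",
]
with open("generated.mzn", "w") as f:
    f.write("\n".join(model_lines))

out = subprocess.run(["minizinc", "--solver", "gecode", "generated.mzn"],
                     capture_output=True, text=True)
print(out.stdout)  # prints =====UNSATISFIABLE===== for this model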
For your original problem (with var int), I tested some SAT-based / non-CP solvers, which detect UNSAT quite fast:
geas: 5.282s
picatSAT: 0.115s
optimathSAT: 0.519s
mistral: 0.177s
oscar_cbls: 1.376s
I also tested the SAT-hybrid solvers OR-tools and Chuffed, but they were quite a bit slower on this specific problem.
However, that is only half of the problem, since the solver should also be fast on non-UNSAT instances, right? There the SAT solvers might not be the fastest.
One idea is to use a portfolio solver such as SunnyCP, which runs a pool of different FlatZinc solvers on the problem in parallel. (Unfortunately, my SunnyCP installation does not work right now, so I can't test it.)

Related

Ways to Improve Universal Differential Equation Training with sciml_train

About a month ago I asked a question about strategies for better convergence when training a neural differential equation. I've since gotten that example to work using the advice I was given, but when I applied the same advice to a more difficult model, I got stuck again. All of my code is in Julia, primarily making use of the DiffEqFlux library. In an effort to keep this post as brief as possible, I won't share all of the code for everything I've tried, but if anyone wants access to it to troubleshoot, I can provide it.
What I'm Trying to Do
The data I'm trying to learn comes from an SIRx model:
function SIRx!(du, u, p, t)
    β, μ, γ, a, b = Float32.([280, 1/50, 365/22, 100, 0.05])
    S, I, x = u
    du[1] = μ*(1-x) - β*S*I - μ*S
    du[2] = β*S*I - (μ+γ)*I
    du[3] = a*I - b*x
    nothing
end;
The initial condition I used was u0 = Float32.([0.062047128, 1.3126149f-7, 0.9486445]);. I generated data from t=0 to 25, sampled every 0.02 (in training, I only use every 20th point or so for speed, and using more doesn't improve results). The data looks like this: [plot: Training Data]
The UDE I'm training is
function SIRx_ude!(du, u, p, t)
    μ, γ = Float32.([1/50, 365/22])
    S, I, x = u
    du[1] = μ*(1-x) - μ*S + ann_dS(u, @view p[1:lenS])[1]
    du[2] = -(μ+γ)*I + ann_dI(u, @view p[lenS+1:lenS+lenI])[1]
    du[3] = ann_dx(u, @view p[lenI+1:end])[1]
    nothing
end;
Each of the neural networks (ann_dS, ann_dI, ann_dx) is defined using FastChain(FastDense(3, 20, tanh), FastDense(20, 1)). I tried using a single neural network with 3 inputs and 3 outputs, but it was slower and didn't perform any better. I also tried normalizing the inputs to the network first, but it doesn't make a significant difference apart from slowing things down.
What I've Tried
Single shooting
The network just fits a line through the middle of the data. This happens even when I weight the earlier data points more heavily in the loss function. [plot: Single-shot Training]
Multiple Shooting
The best result I had was with multiple shooting. As seen here, it's not simply fitting a straight line, but it's not exactly fitting the data either. [plot: Multiple Shooting Result] I've tried continuity terms ranging from 0.1 to 100 and group sizes from 3 to 30, and it doesn't make a significant difference.
Various Other Strategies
I've also tried iteratively growing the fit, 2-stage training with a collocation, and mini-batching as outlined here: https://diffeqflux.sciml.ai/dev/examples/local_minima, https://diffeqflux.sciml.ai/dev/examples/collocation/, https://diffeqflux.sciml.ai/dev/examples/minibatch/. Iteratively growing the fit works well the first couple of iterations, but as the length increases it goes back to fitting a straight line again. 2-stage collocation training works really well for stage 1, but it doesn't actually improve performance on the second stage (I've tried both single and multiple shooting for the second stage). Finally, mini-batching worked about as well as single-shooting (which is to say not very well) but much more quickly.
My Question
In summary, I have no idea what to try. There are so many strategies, each with so many parameters that can be tweaked. I need a way to diagnose the problem more precisely so I can better decide how to proceed. If anyone has experience with this sort of problem, I'd appreciate any advice or guidance I can get.
This isn't a great SO question because it's more exploratory. Did you lower your ODE tolerances? That would improve your gradient calculation which could help. What activation function are you using? I would use something like softplus instead of tanh so that you don't have the saturating behavior. Did you scale the eigenvalues and take into account the issues explored in the stiff neural ODE paper? Larger neural networks? Different learning rates? ADAM? Etc.
This is much better suited for a forum for discussion like the JuliaLang Discourse. We can continue there since walking through this will not be fruitful without some back and forth.

scipy.integrate.odeint time-dependent step size

I have the following problem:
I have to use an ODE solver to solve a chemical reaction equation. The rate constants are functions of time and can change suddenly (a pulse from an electric discharge).
One way to handle this is to keep the step size very small, hmax < dt. This results in a high computational effort and is very time consuming. My question is: is there an efficient way to make this work? I thought about defining hmax(puls_ON) with puls_ON=True during the pulse and puls_ON=False between pulses. However, since dt increases over time, it may not even recognize the pulse, because the time interval keeps growing, hmax=hmax(t).
A time grid would be the best option, I think, but I don't think this is possible with odeint?
Or is it possible to somehow force the solver to integrate at a specific point in time (e.g. t0 ->(hmax=False)->tpuls_1_start->(hmax=dt)->tpuls_1_end->(hmax=False)->puls_2_start.....)?
thx
There is an optional parameter tcrit for odeint that you could try:
Vector of critical points (e.g. singularities) where integration care should be taken.
I don't know what it actually does but it may help to not simply step over the pulse.
If that does not work, you can of course manually split your integration into different intervals: integrate until tpuls_1_start, then restart the integration using the result of the previous interval as the initial values.
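A minimal sketch of that manual splitting (a made-up single-species reaction with hypothetical pulse times; the tcrit variant would simply pass tcrit=[1.0, 1.001] to a single odeint call instead of splitting):

import numpy as np
from scipy.integrate import odeint

def rates(y, t, k):
    # placeholder right-hand side; k is the (possibly pulsed) rate constant
    return -k * y

y0 = [1.0]
t_edges = [0.0, 1.0, 1.001, 2.0]   # hypothetical: pulse lasts from t=1.0 to t=1.001
k_values = [1.0, 50.0, 1.0]        # rate constant before / during / after the pulse

ts, ys = [], []
for t_start, t_end, k in zip(t_edges[:-1], t_edges[1:], k_values):
    t = np.linspace(t_start, t_end, 201)
    y = odeint(rates, y0, t, args=(k,))
    ts.append(t)
    ys.append(y)
    y0 = y[-1]                     # restart from the end of the previous interval

t_full = np.concatenate(ts)
y_full = np.concatenate(ys)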

Using Gurobi to run a MIQP: how can I improve time performance?

I am using Gurobi to run a MIQP (Mixed Integer Quadratic Program) with linear constraints in Matlab. The solver is very slow, and I would like your help in understanding whether I can do something about it.
These are the lines I use to launch the problem:
clear model;
clear params;
model.A=[Aineq; Aeq];
model.rhs=[bineq; beq];
model.sense=[repmat('<', size(Aineq,1),1); repmat('=', size(Aeq,1),1)];
model.Q=Q;
model.obj=c;
model.vtype=type;
model.lb=total_lb;
model.ub=total_ub;
params.MIPGap=10^(-1);
result=gurobi(model,params);
This is a screenshot of the output in the Matlab window.
Question 1: This is the first time I am trying to run a MIQP, and I would like your advice on what I can do to improve performance. Let me tell you what I have tried so far:
I cheated by imposing params.MIPGap=10^(-1). This shortens the node exploration phase. What are the cons of doing this?
I have big-M coefficients, and I have tightened them to the smallest possible values.
I have tried setting params.ScaleFlag=2; params.ObjScale=2, but it makes things slower.
I have changed params.method, but it does not seem to help (unless you have some specific recommendation).
I have increased params.Threads, but it does not seem to help.
Question 2 (minor): Why do I get a negative objective in the root simplex log? How can the objective function be negative?
Without having the full model here, there is not much advice to give. Tight big-M formulations are important, but you said you have checked them already. Sometimes splitting them up might help, but this is a complex field.
What might give great benefits for some problems is the Gurobi parameter tuning tool. Try exporting your model and feeding it to the tuning tool. It automatically tries different combinations of the hundreds of tuning parameters and might give some nice results.
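For example, a minimal sketch using the Python API (the file name is hypothetical; from Matlab you could first export the model, e.g. with gurobi_write(model, 'model.mps'), and the same tuning is also available from the command line via grbtune):

import gurobipy as gp

m = gp.read("model.mps")        # hypothetical exported model file
m.Params.TuneTimeLimit = 3600   # optional: cap the tuning time (seconds)
m.tune()                        # tries different parameter combinations

if m.tuneResultCount > 0:
    m.getTuneResult(0)          # load the best parameter set found
    m.write("tuned.prm")        # save it so it can be reused in future runs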
Regarding the question about negative objectives in the simplex logs, I can think of a couple of possible explanations. First, note that the negative objective values occur in the presence of dual infeasibilities in the dual simplex run. In such a case, I'm not sure exactly what the primal objective values correspond to. Second, if you have a MIQP with products of binaries in the objective, Gurobi may convexify the objective in a way that makes it possible for a negative objective to appear in the reformulated model even when the original model must have a nonnegative objective in any feasible solution.

What is the ideal value of loss function for a GAN

The GAN originally proposed by I. J. Goodfellow uses the following loss functions:
D_loss = - log[D(X)] - log[1 - D(G(Z))]
G_loss = - log[D(G(Z))]
So the discriminator tries to minimize D_loss and the generator tries to minimize G_loss, where X and Z are the training input and noise input, respectively. D(.) and G(.) are the maps of the discriminator and generator neural networks, respectively.
As the original paper says, when the GAN is trained for enough steps it reaches a point where neither the generator nor the discriminator can improve, and D(Y) is 0.5 everywhere, where Y is any input to the discriminator. When the GAN is sufficiently trained to reach this point,
D_loss = - log(0.5) - log(1 - 0.5) = 0.693 + 0.693 = 1.386
G_loss = - log(0.5) = 0.693
So why can we not use the D_loss and G_loss values as a metric for evaluating a GAN?
If the two loss functions deviate from these ideal values, then the GAN surely needs more training or a better-designed architecture. Theorem 1 in the original paper shows that these are the optimal values for D_loss and G_loss, so why can't they be used as an evaluation metric?
I think this question belongs on Cross-Validated, but anyway:
I struggled with this for quite some time, and wondered why the question wasn't asked.
What follows is where I'm currently at. Not sure if it'll help you, but it is some of my intuition.
G and D losses are good indicators of failure cases...
Of course, if G loss is a really big number and D is zero, then nothing good is happening in your GAN.
... but not good indicators of performance.
I've trained a bunch of GANs and have almost never seen the "0.5/0.5 case" except on very simple examples. Most of the time, you're happy when the outputs D(x) and D(G(z)) (and therefore the losses) are more or less stable. So don't take these values as a "gold standard".
A key intuition I was missing was the simultaneity of G and D training. At the beginning, sure, G is really bad at generating stuff, but D is also really bad at discriminating. As time passes, G gets better, but D also gets better. So after many epochs, we can assume that D is really good at discriminating between fake and real. Therefore, even if G "fools" D only 5% of the time (i.e. D(x)=0.95 and D(G(z))=0.05), it can mean that G is actually pretty good, because it sometimes fools a really good discriminator.
As you know, there are no reliable metrics of image quality besides looking at the images for the moment, but I've found that for my use cases, G could produce great images while fooling D only a few percent of the time.
A corollary to this simultaneous training is what happens at the beginning of training: you can have D(X)=0.5 and D(G(Z))=0.5 and still have G produce almost random images; it's just that D is not yet good enough to tell them apart from real images.
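To put rough numbers on the 5% example above (a quick sanity check with the loss definitions from the question, not a metric):

import numpy as np

# hypothetical late-training values: a strong D that G fools only ~5% of the time
D_x, D_Gz = 0.95, 0.05

D_loss = -np.log(D_x) - np.log(1 - D_Gz)  # ~0.10, far below the "ideal" 1.386
G_loss = -np.log(D_Gz)                    # ~3.0, far above the "ideal" 0.693
print(D_loss, G_loss)

Both losses are far from the equilibrium values, even though this G might be producing perfectly good samples.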
I see it's been a couple of months since you posted this question. If you've gained intuition in the meantime, I'd be happy to hear it!

3D matrix summation (complex double) very slow - possible reasons?

This might sound a little like a beginner's question but bear with me.
I have two large matrices A and B with dimensions 1502x128x128 (complex double).
I am trying to add them together, but it seems to take forever for some reason.
I was wondering if you could point me to a faster way of doing this.
So far I have tried the following two scripts:
First:
C=zeros(1502,128,128);
C=[A+B]
Second:
C=zeros(1502,128,128);
for ss=1:128
    C(:,:,ss)=squeeze(A(:,:,ss))+squeeze(B(:,:,ss));
end
Is it slow because the data is complex and there is no way around that, or what are your thoughts on this?
Thank you
How much free memory do you have?
A = rand(1502,128,128)+1i*rand(1502,128,128);
B = rand(1502,128,128)+1i*rand(1502,128,128);
tic
C = A+B;
toc
takes:
Elapsed time is 0.211576 seconds.
and it occupies about 1.2 GB RAM.
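For reference, that figure is just the raw array size: a complex double element takes 16 bytes, so each 1502x128x128 array needs about 1502 * 128 * 128 * 16 bytes ≈ 0.39 GB, and holding A, B and C at the same time needs roughly 3 * 0.39 GB ≈ 1.2 GB.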
I can't imagine a reason other than memory issues.
Preallocating (C=zeros(1502,128,128);) is not necessary in this case. But you could try to clear C at the beginning.