CP-SAT to balance task assignment - or-tools

Dear all,
I'm wondering if CP-SAT has any built-in function that allows me to balance the number of tasks assigned to the users.
Currently I compute the variance of the number of tasks per user and minimize it in the objective function, so it would be great to have a built-in function for the variance, but I'm also wondering whether there are alternatives.
Thanks

There is no built-in alternative.
If the number of tasks and the number of workers are fixed, you can compute the average load per worker and, using an epsilon variable, add the following constraints:
for all workers:
    model.Add(sum(assigned_tasks) <= average + epsilon)
    model.Add(sum(assigned_tasks) >= average - epsilon)
model.Minimize(epsilon)
This is less precise, but much faster.
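A minimal, self-contained sketch of this idea in Python (the problem sizes and variable names below are made up for illustration; this is one way to encode it, not a built-in API):

from ortools.sat.python import cp_model

num_tasks, num_workers = 12, 4                 # hypothetical sizes
model = cp_model.CpModel()

# x[t][w] == 1 iff task t is assigned to worker w.
x = [[model.NewBoolVar(f"x_{t}_{w}") for w in range(num_workers)]
     for t in range(num_tasks)]

# Each task is assigned to exactly one worker.
for t in range(num_tasks):
    model.Add(sum(x[t]) == 1)

average = num_tasks // num_workers             # integer average load per worker
epsilon = model.NewIntVar(0, num_tasks, "epsilon")

# Keep every worker's load within epsilon of the average, then minimize epsilon.
for w in range(num_workers):
    load = sum(x[t][w] for t in range(num_tasks))
    model.Add(load <= average + epsilon)
    model.Add(load >= average - epsilon)
model.Minimize(epsilon)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("epsilon =", solver.Value(epsilon))
    for w in range(num_workers):
        print(f"worker {w} load:", sum(solver.Value(x[t][w]) for t in range(num_tasks)))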

Related

How to calculate an exponentially weighted moving average on a conditional basis?

I am using MATLAB R2020a on macOS. I have a signal 'cycle_periods' consisting of the cycle periods of an ECG signal, on which I would like to compute an exponentially weighted mean such that older values are weighted less than newer ones. However, I would like this to be done on an element-by-element basis, so that a given element is only included in the overall weighted mean if the weighted mean including the current sample does not exceed 1.5 times, or fall below 0.5 times, the weighted mean without it.
I have used the dsp.MovingAverage function as shown below to calculate the weighted mean, but I am really unsure as to how to manipulate the function to include my conditions.
% Exponentially weighted moving mean for stable cycle periods
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
mean_cycle_period_exp = movavgExp(cycle_period_stable);
I would very much appreciate any help regarding this matter, thanks in advance.
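For what it's worth, the element-by-element rule described above amounts to a plain loop with an accept/reject test on each candidate update. Here is a rough sketch in Python (the conditional_ewma name and the simple alpha recursion are illustrative only, and are not equivalent to dsp.MovingAverage's ForgettingFactor weighting):

import numpy as np

def conditional_ewma(x, alpha=0.9, low=0.5, high=1.5):
    # Running exponentially weighted mean that skips any sample whose inclusion
    # would push the mean above `high` times or below `low` times its current value.
    mean = None
    means = []
    for value in x:
        if mean is None:
            mean = float(value)                      # initialise with the first sample
        else:
            candidate = alpha * value + (1 - alpha) * mean
            if low * mean <= candidate <= high * mean:
                mean = candidate                     # accept: sample included in the mean
            # otherwise reject: the mean is left unchanged
        means.append(mean)
    return np.array(means)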

Normalization method for Convolutional Neural Network

There are three common image data normalization methods:
1. X = (X - X.mean) / X.std
2. X /= 255. # based on (X - min) / (max - min), which maps the data into [0, 1]
3. X = 2 * (X - min) / (max - min) - 1 # maps the data into [-1, 1]
I have found that different CNN tutorials and posts use one or another of these to normalize data, but I am a bit confused: how should I choose one in different situations?
Thanks for any explanations in advance.
Broadly speaking, the reason we normalize the images is to make the model converge faster. When the data is not normalized, the shared weights of the network have different calibrations for different features, which can make the cost function converge very slowly and ineffectively. Normalizing the data makes the cost function much easier to optimize.
Exactly which normalization method you choose depends on the data you are dealing with and the assumptions you make about that data. All three of the above methods are based on two ideas: centering and scaling. Method 2 involves only scaling the data into a particular range; this makes sure that the various features are on a similar scale and hence gives stable gradients. Method 1 involves centering the data around the mean and then dividing each dimension by its standard deviation, so that all dimensions carry equal importance for the learning algorithm. This normalization is more effective when you have reason to believe that different dimensions of the data have vastly different ranges; bringing them into the same range makes parameter sharing effective. Method 3 can be seen as doing somewhat the same job as Method 1.
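As a concrete illustration of the three methods, here is a minimal NumPy sketch (the array shape and values are made up):

import numpy as np

# Hypothetical batch of 8-bit images, cast to float for the arithmetic below.
X = np.random.randint(0, 256, size=(16, 32, 32, 3)).astype(np.float32)

X_std = (X - X.mean()) / X.std()                     # 1. zero mean, unit variance
X_01  = X / 255.0                                    # 2. scaled into [0, 1]
X_pm1 = 2 * (X - X.min()) / (X.max() - X.min()) - 1  # 3. scaled into [-1, 1]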

How to understand AnyLogic source generated pattern?

What is the meaning of setting up a new Source element from the PML library with the following parameters?
Arrivals defined by: interarrival time;
Interarrival time: exponential(1) - minutes;
How often does it generate new agents? I have read the documentation but I didn't understand it.
I'm not sure whether you're interested in all the statistical details of the distribution, so I'll keep it simple.
When you have an exponential distribution exponential(lambda) you have the following:
Expected value E(X) (mean): 1/lambda
Variance V(X): 1/lambda^2
Standard Deviation = sqrt(Variance)
Therefore the standard deviation is the same as the mean.
Those are the basics about the exponential distribution.
In your case, you are using an exponential distribution with lambda = 1, which means the mean will also be 1, so on average your interarrival time will be 1 minute.
If you had lambda = 2 your average interarrival time would be 1/2 = 0.5.
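If you want to convince yourself numerically, here is a quick check outside AnyLogic (a Python/NumPy sketch; the seed and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
lam = 1.0                                      # the argument of exponential(lambda)
samples = rng.exponential(scale=1.0 / lam, size=100_000)
print(samples.mean())                          # ~1.0 minute between arrivals
print(samples.std())                           # ~1.0, equal to the mean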
Hope that helps

Tolerances in Numerical quadrature - MATLAB

What is the difference between AbsTol and RelTol in MATLAB when performing numerical quadrature?
I have a triple integral that is supposed to produce a number between 0 and 1, and I am wondering what the best tolerances for my application would be.
Any ideas on decreasing the execution time of integral3 would also be welcome.
Also does anyone know whether integral3 or quadgk is faster?
When performing the integration, MATLAB (or most any other integration software) computes a low-order solution qLow and a high-order solution qHigh.
There are a number of different methods of computing the true error (i.e., how far either qLow or qHigh is from the actual solution qTrue), but MATLAB simply computes an absolute error as the difference between the high and low order integral solutions:
errAbs = abs(qLow - qHigh).
If the integral is truly a large value, that difference may be large in an absolute sense but not a relative sense. For example, errAbs might be 1E3 while qTrue is 1E12; in that case, the method could be said to converge in a relative sense, since at least 8 digits of accuracy have been reached.
So MATLAB also considers the relative error:
errRel = abs(qLow - qHigh)/abs(qHigh).
You'll notice I'm treating qHigh as qTrue since it is our best estimate.
Over a given sub-region, if the error estimate falls below either the absolute limit or the relative limit times the current integral estimate, the integral is considered converged. If not, the region is divided, and the calculation repeated.
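Schematically, the per-sub-region test looks like this (a Python sketch of the idea, not MATLAB's actual source; the function name is mine):

def accept_subregion(q_low, q_high, abs_tol, rel_tol):
    # Accept the sub-region if the low/high-order estimates agree to within
    # either the absolute tolerance or the relative tolerance times |qHigh|.
    err_abs = abs(q_low - q_high)
    return err_abs <= max(abs_tol, rel_tol * abs(q_high))
    # If this returns False, the region is subdivided and re-integrated.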
For the integral function and the integral2/integral3 functions with the iterated method, the low-high solutions are a Gauss-Kronrod 7-15 pair (the same 7th-order/15th-order pair used by quadgk).
For the integral2/integral3 functions with the tiled method, the low-high solutions are a Gauss-Kronrod 3-7 pair (I've never used this option, so I'm not sure how it compares to others).
Since all of these methods come down to a Gauss-Kronrod quadrature rule, I'd say sticking with integral3 and letting it do the adaptive refinement as needed is the best course.

How can I efficiently model the sum of Bernoulli random variables?

I am using Perl to model a random variable (Y) which is the sum of some ~15-40k independent Bernoulli random variables (X_i), each with a different success probability (p_i). Formally, Y=Sum{X_i} where Pr(X_i=1)=p_i and Pr(X_i=0)=1-p_i.
I am interested in quickly answering queries such as Pr(Y<=k) (where k is given).
Currently, I use random simulations to answer such queries. I randomly draw each X_i according to its p_i, then sum all the X_i values to get Y'. I repeat this process a few thousand times and return the fraction of runs in which Y' <= k.
Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.
Can you think of a reasonable way to get the exact probability?
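The simulation procedure described above is compact enough to sketch (shown in Python/NumPy for brevity, even though the question uses Perl; the probabilities below are made up):

import numpy as np

rng = np.random.default_rng(42)
p = rng.uniform(0.0, 0.1, size=20_000)        # hypothetical success probabilities p_i

def estimate_pr_y_le_k(k, n_sims=2_000):
    draws = rng.random((n_sims, p.size)) < p   # one row of Bernoulli draws X_i per run
    y = draws.sum(axis=1)                      # Y' for each run
    return np.mean(y <= k)                     # fraction of runs with Y' <= k

print(estimate_pr_y_le_k(k=1000))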
First, I would avoid using the rand built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range of rand on Windows has cardinality 32,768).
To use the Monte-Carlo approach, I would start with a known good random generator, such as Rand::MersenneTwister or just use one of Random.org's services and pre-compute a CDF for Y assuming Y is pretty stable. If each Y is only used once, pre-computing the CDF is obviously pointless.
To quote Wikipedia:
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials.
In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p1, …, pn. (emphasis mine)
Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:
and we discuss several of its advantages regarding computing speed and implementation and in simplifying analysis, with examples of the latter including the computation of moments and the development of new trigonometric identities for the binomial coefficient and the binomial cumulative distribution function (cdf).
As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html
If so, you can use Statistics::Distrib::Normal.
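If you go the normal-approximation route, the parameters follow directly from the p_i: the mean is sum(p_i) and the variance is sum(p_i * (1 - p_i)). A quick sketch (Python/SciPy here rather than Statistics::Distrib::Normal, with made-up probabilities):

import numpy as np
from scipy.stats import norm

p = np.random.default_rng(0).uniform(0.0, 0.1, size=20_000)   # hypothetical p_i
mu = p.sum()
sigma = np.sqrt((p * (1 - p)).sum())

def approx_pr_y_le_k(k):
    # Continuity-corrected normal approximation to Pr(Y <= k).
    return norm.cdf((k + 0.5 - mu) / sigma)

print(approx_pr_y_le_k(1000))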
To obtain the exact solution you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive but must be calculated only if the p_i change.
Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.
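To make the convolution idea concrete, here is a small sketch that folds in one Bernoulli at a time (Python/NumPy for clarity; it is O(n^2) in the number of variables, and the probabilities are made up):

import numpy as np

def poisson_binomial_pmf(p):
    # Exact PMF of Y = sum of independent Bernoulli(p_i), built by repeated convolution.
    pmf = np.array([1.0])                      # distribution of the empty sum: P(Y = 0) = 1
    for pi in p:
        new = np.zeros(pmf.size + 1)
        new[:-1] += pmf * (1 - pi)             # X_i = 0: success count unchanged
        new[1:]  += pmf * pi                   # X_i = 1: success count shifted by one
        pmf = new
    return pmf

pmf = poisson_binomial_pmf([0.1, 0.5, 0.3])    # hypothetical p_i
cdf = np.cumsum(pmf)                           # Pr(Y <= k) is cdf[k]
print(cdf)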