Optimize on two correlated arrays in OR-Tools

Is there some general rule on how to model soft constraints of correlation between two arrays of dependent variables in OR-Tools?
I am trying to solve a somewhat complex shift scheduling problem and I cannot wrap my head around it. The staff is split into two teams, and the general rule is that on any given day only people from one team work, except when there is a need to cover vacation or sick days. I imagine something like this:
         day 1   day 2   day 3   day 4   day 5
team 1     x       x       -       -       x
team 2     -       -       x       x       -

x = work, '-' = rest
and then each worker works primarily on the days where their team works except when there is a need to cover for somebody from the other team:
                   day 1   day 2   day 3   day 4   day 5
worker 1, team 1     x       x       -       -       x
worker 2, team 1     x       x       -       -       -
worker 3, team 2     -       -       x       x       x
worker 4, team 2     -       -       x       x       -
Notes:
day 5 is an example where worker 2 had to take a day off and worker 3 from the other team covers.
there are also other complications (shifts, skills, etc.) for workers, omitted for simplicity
Now if I have the below arrays, how do I tell OR-Tools to assign workers from the working team first and only cover with others when it is not possible to meet some of the other constraints, i.e. how do I model the soft constraint between workers and teams?
team_assignments = {}
for d in range(num_days):
    for t in range(num_teams):
        team_assignments[d, t] = model.NewBoolVar(f'day_team:{d},{t}')

work = {}
for d in range(num_days):
    for e in range(num_employees):
        work[d, e] = model.NewBoolVar(f'emp_day:{e}_{d}')
In other words, how do I express the penalty if workers 3 and 4 are working on days 1, 2, and 5?
Note this is doing a hard constraint where I need a soft one:
for d in range(num_days):
    for e in range(num_emp):
        model.Add(work[d, e] == team_assignments[d, emp_team[e]])

This seems to do the trick:
for d in range(num_days):
    for e in range(num_emp):
        t = emp_team[e]
        # 1 if employee e works on day d even though their team t is off that day
        works_with_other_team = model.NewBoolVar(f'works_other_team:{e}_{d}')
        model.Add(works_with_other_team == True).OnlyEnforceIf(
            [work[d, e], team_assignments[d, t].Not()])
        obj_bool_vars.append(works_with_other_team)
        obj_bool_coeffs.append(3)  # pick an appropriate penalty
and then minimize over var*coeff
model.Minimize(
    sum(obj_bool_vars[i] * obj_bool_coeffs[i] for i in range(len(obj_bool_vars)))
)
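For completeness, here is a minimal sketch of how solving the model and reading the assignments back might look (assuming from ortools.sat.python import cp_model and the variables defined above):

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for d in range(num_days):
        # workers assigned to work on day d, including any cross-team covers
        on_duty = [e for e in range(num_emp) if solver.Value(work[d, e])]
        print(f'day {d}: workers {on_duty}')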
Initially I was trying to do something like the below, and it was failing:
model.Add(works_with_other_team == work[d,e] & team_assignments[d,t].Not())
When this failed with NotImplementedError: calling and on a linear expression is not supported, please use CpModel.AddBoolAnd, I went on trying to implement it with AddBoolAnd or AddBoolOr, but the much more elegant solution is to use OnlyEnforceIf.
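For reference, the OnlyEnforceIf line above encodes the implication (work[d,e] AND NOT team_assignments[d,t]) => works_with_other_team, which is all the minimization needs. A sketch of how the same thing could be written with AddBoolOr, plus the optional reverse direction with AddBoolAnd, using the same variables:

# The implication as a single clause: NOT work[d,e] OR team_assignments[d,t] OR works_with_other_team
model.AddBoolOr([work[d, e].Not(), team_assignments[d, t], works_with_other_team])
# Optional reverse direction: force the helper back to 0 when there is nothing to penalize
# (the objective already takes care of this, so this constraint is not strictly needed).
model.AddBoolAnd([work[d, e], team_assignments[d, t].Not()]).OnlyEnforceIf(works_with_other_team)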

Related

How can I partition data based on date column to get subsets of max 1 year (365 consecutive days)

I'm quite new to R, so I still struggle a bit with what may be some of the basics. I have movement data for several individuals, and some of them have been monitored for over a year. In these cases, I'm trying to partition the data into bins of max. one year (i.e. 365 consecutive days) per individual. For example, if one individual was tracked for 800 days, I would need to partition the data into the first 365 days, the second 365 days, and the remaining 70 days. Or, if one individual started on 2019/04/17, I'd like to subset the data until 2020/04/16, with the next subset beginning on 2020/04/17.
Individuals were added at different times, so not all monitoring periods begin on the same day, and per individual there can be more than one observation per day, so many rows share the same date. Naturally I want to use the timestamp column for this, but I have been looking for ways and can't seem to find one. Is there a way to tell R to pick the first date and extract the next 365 days?
I could manually calculate each bin and partition the data by hand, but I was wondering if there is a simpler way to do this. I can, however, separate the data per individual.
Thanks!
My data looks something like this
Date.and.time Ind Lat Long
2019-04-02 08:54:03 Animal_1 Y X
2019-04-02 09:01:13 Animal_2 Y X
2019-04-02 15:45:22 Animal_1 Y X
2019-04-03 17:31:50 Animal_1 Y X
.
.
.
2021-10-14 12:34:56 Animal_1 Y X
2021-10-15 16:05:50 Animal_20 Y X
2021-10-15 22:29:37 Animal_15 Y X

Calculating group means with own group excluded in MATLAB

To be generic the issue is: I need to create group means that exclude own group observations before calculating the mean.
As an example: let's say I have firms, products and product characteristics. Each firm (f=1,...,F) produces several products (i=1,...,I). I would like to create a group mean for a certain characteristic of the product i of firm f, using all products of all firms, excluding firm f product observations.
So I could have a dataset like this:
firm prod width
1 1 30
1 2 10
1 3 20
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
To reproduce the table:
firm=[1,1,1,2,2,2,3,3]
prod=[1,2,3,1,2,4,2,4]
hp=[30,10,20,25,15,40,10,35]
x=[firm' prod' hp']
Then I want to estimate a mean which will use values of all products of all other firms, that is excluding all firm 1 products. In this case, my grouping is at the firm level. (This mean is to be used as an instrumental variable for the width of all products in firm 1.)
So, the mean that I should find is: (25+15+40+10+35)/5=25
Then repeat the process for other firms.
firm prod width mean_desired
1 1 30 25
1 2 10 25
1 3 20 25
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
I guess my biggest difficulty is to exclude the own firm values.
This question is related to this page here: Calculating group mean/medians in MATLAB where group ID is in a separate column. But in that question, the own group is not excluded.
p.s.: just out of curiosity if anyone works in economics, I am actually trying to construct Hausman or BLP instruments.
Here's a way that avoids loops, but may be memory-expensive. Let x denote your three-column data matrix.
m = bsxfun(@ne, x(:,1).', unique(x(:,1))); % or m = ~sparse(x(:,1), 1:size(x,1), true);
result = m*x(:,3);
result = result./sum(m,2);
This creates a zero-one matrix m such that each row of m multiplied by the width column of x (second line of code) gives the sum of widths over all other firms' products. m is built by comparing each entry in the firm column of x with the unique values of that column (first line). Then, dividing by the respective count of other firms' observations (third line) gives the desired result.
If you need the results repeated as per the original firm column, use result(x(:,1))

How to calculate within a factor in Tableau

Apologies if this question is trivially easy, I'm still learning Tableau.
I have data where the variables Set and Subset are arranged by week (W1 to W52) and by Source (A or B). So if I put Week into Rows and create the calculated fields
SUM(Set)
SUM(Subset)
Rate = {INCLUDE Source: SUM(Subset) / SUM(Set)}
I get data that look like this:
Week   SUM(Set)        SUM(Subset)     Rate
       A       B       A       B       A        B
W1     1234    123     567     56      45.95%   45.53%
So far, so good. But what I really want is the percentage difference between Rate(A) and Rate(B) by week:
Diff = (Rate.A - Rate.B) / Rate.B
I could do this in a second if I were using Excel or R, but I can't seem to figure out how Tableau does it. Help?
There's a built-in table calculation, "Percent Difference"; you can deploy it by setting Compute Using to Table (across) and Relative to Previous. For that you need continuous measures.
Something like this will be the calculation:
(ZN(SUM([Quantity])) - LOOKUP(ZN(SUM([Quantity])), -1)) / ABS(LOOKUP(ZN(SUM([Quantity])), -1))
Create two such calculations, one for "Set" and one for "Subset".

Wrong partitions with MATLAB's cvpartition

I am having trouble with the cvpartition function of MATLAB. I want to perform 5-fold cross-validation (for classification) with a dataset that has 134 instances from class 1 (negative) and 19 instances from class 2 (positive).
With 5-fold CV one should have something like 4 - 4 - 4 - 4 - 3 positive instances partitioned along the 5 folds, or close to that (5 - 4 - 3 - 4 - 3 would also be OK). I make 30 repetitions of the 5-fold CV, and sometimes MATLAB builds partitions like 1 - 5 - 5 - 4 - 4 or even 5 - 5 - 5 - 4 - 0, that is, one of the folds has no positive instances! How is this possible and how can I correct it? At the very least it should guarantee that both classes are represented in each fold...
This causes problems when trying to compute Precision, Recall, F-measure and so on...
LS
Are you using the stratified form of cross-validation that cvpartition provides?
Use the second syntax described in the documentation page, i.e. c = cvpartition(group,'kfold',k) rather than c = cvpartition(n,'kfold',k). Here group is a vector (or categorical array, cell array of strings etc) of class labels, and will stratify the selection of observations into folds rather than just splitting everything randomly into groups.

Why does crossvalind fail?

I am using the crossvalind function on very small data... However, I observe that it gives me incorrect results. Is this supposed to happen?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4. Is this a bug? I expected at least 2 elements per set, but it gives me 0 in one... and it happens a lot, that is, the values are not uniformly distributed across the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is, e.g., the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example would correspond to a two-class problem (I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds) - my hypothesis would be that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1 - but I haven't looked in the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out = 1:11
    fold_number = (1:11) ~= leave_out;
    <code here; where fold_number is 0, this is the leave-one-out example. fold_number = 1 means that the example is in the main fold.>
end