Multi pickup locations for a single delivery - or-tools

I have a list of orders, each consisting of a disjunction of pickup nodes and a disjunction of delivery nodes (using AddDisjunction with a positive penalty and a max cardinality of 1).
Some of these orders form groups that must be delivered to the same location at the same time by the same vehicle, or not at all.
AddPickupAndDelivery/AddPickupAndDeliverySets cannot be used on the same node/disjunction twice, so I cannot merge the delivery disjunctions into one and link all pickup disjunctions to it.
I have tried setting the NextVar of one delivery disjunction to the other delivery disjunction; however, the other disjunction was still sometimes reached without the first one (but not vice versa).
I have tried combining the NextVar method with a penalty for reaching only some of the delivery nodes, in two different ways:
first, by using AddSoftSameVehicleConstraint; however, it did not penalize unperformed nodes;
second, by creating a new dimension with positive arc values for reaching every disjunction in the NextVar chain except the first one, and a negative arc value for reaching the first one, which can only be reached if all the other disjunctions were reached. Combined with SetSpanCostCoefficientForAllVehicles and a large cumul var start value at the start nodes, the idea was that reaching only some of the nodes would induce a positive span, while reaching all of them would reset the span back to 0.
However, at this point the algorithm stopped reaching any nodes. I presume this is because the local search operators never insert several nodes in a single move, so every single-node insertion increases the cost. Is there a way of implementing multiple pickups for a single delivery that respects the constraints I have stated, using the Python version of OR-tools?


Why Add Disjunction in VRP solver Not Working As Expected?

Cross-posted from the OR-tools Google group.
I am working with a multi-vehicle VRP with due dates over 5 periods (e.g. nodes due at time t=0 get a penalty cost of 10000 and all other nodes 1000, and so on until period 4). Initially I followed the exact steps laid out here, using "AddDisjunction" to set priorities for certain nodes, and so expected that the solution would always pick up more, rather than fewer, nodes. However, in my example code you'll see that the solver drops multiple nodes with smaller demand and picks up nodes with larger demand instead. I ran into the same issue on a single-vehicle problem, but there I was able to use AddSoftSameVehicleConstraint as a workaround.
My code here: I would direct you to the cell titled "Basic SVRP" onwards; the cells before it are for data generation. The most important thing in the output is that all nodes starting with "a" or "b" have a demand of only 1-2 units, while "c" and "d" nodes have demands of 4-8 units. Therefore, "a" and "b" nodes should ideally be dropped only as a last resort.
Any help here to rectify this would be greatly appreciated, happy to simplify/clarify where needed.

Load factor in separate chaining?

Why is it recommended to have a load factor of 1.0 in separate chaining?
I've seen plenty of people saying that it is recommended, but not given a clear explanation of why.
With open addressing, I know the load factor should be between 0.5 and 0.7 because it should be a fast operation to find an unoccupied index when dealing with collisions. But I can't see why a load factor of 1 should be better in separate chaining. I mean, if I have a table of size 100, isn't there still a chance that all 100 elements hash to the same index and get placed in the same list? So I really can't comprehend why this specific load factor for separate chaining should be 1.
tl;dr: To save memory by not leaving slots unoccupied, and to speed up access by minimizing the number of list traversal operations.
If you understand the load factor as n_used_slots / n_total_slots:
Having a load factor of 1 just describes the ideal situation for a well-implemented hash table using Separate Chaining collision handling: no slots are left empty.
The other classical approach, Open Addressing, requires the table to always have a free slot available when adding a new item. Resizing the table is far too costly to do for each item, but we are also restricted on memory and wouldn’t want to have too many unused slots lying around. One has to find a balance between speed (few table resizes, quick inserts and lookups) and memory (few empty slots) [as ever so often in programming]. The ideal load factor is based on this idea of balancing and can be estimated based on the actual hash function, the value domain and other factors.
With Separate Chaining, on the other hand, we usually expect from the start to have (way) more items than available hash table slots. If a collision occurs, we need to add the item to the linked list stored in a specific slot. Since searching in a linked list is costly, we would like to minimize list traversal operations. For that, the best case is to have all slots filled with lists of ideally the same length! Having all slots filled corresponds to a load factor of 1.
To put it another way: A load factor < 1 means that there are empty slots and items had to be added to a linked list in another slot, increasing the number of list traversal operations and wasting some memory.
Concerning your example of a table with size 100: yes, there is a chance that all items collide and occupy just one single slot. In that case, the effective load factor would be 0.01 and performance would be heavily impacted.
If you understand the load factor as n_items / n_total_slots:
In that case, the load factor can be larger than 1. A factor < 1 means you have empty slots, while factor > 1 means that there are slots holding more than one item and consequently, list traversals are required. In the first case, you are wasting space and in the second case list traversals lead to a (small) performance hit, depending on the size of the lists.
Example: A load factor of 10 means that on average each slot holds 10 items. Searching for an item that is present therefore traverses about 5 list nodes on average ((10 + 1) / 2 = 5.5), while an unsuccessful search traverses all 10.
A load factor of 1 means you waste no space and have the fastest lookup, if you use a decent hash function that ensures a regular and evenly balanced usage of slots.
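To make the two definitions above concrete, here is a minimal Java sketch of a separate-chaining table (a hypothetical `ChainedTable` class, not any library's API) that reports both load factors and the average number of list nodes a successful lookup visits, assuming every stored key is looked up equally often. Integer keys are hashed with `Integer.hashCode` modulo the table size.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical minimal separate-chaining table, used only to measure load factors.
final class ChainedTable {
    private final List<LinkedList<Integer>> slots = new ArrayList<>();
    private int items = 0;

    ChainedTable(int nSlots) {
        for (int i = 0; i < nSlots; i++) slots.add(new LinkedList<>());
    }

    void insert(int key) {
        // Integer.hashCode(key) is the key itself; floorMod keeps the index non-negative.
        slots.get(Math.floorMod(Integer.hashCode(key), slots.size())).add(key);
        items++;
    }

    // Definition 1: n_used_slots / n_total_slots
    double usedSlotFactor() {
        long used = slots.stream().filter(chain -> !chain.isEmpty()).count();
        return (double) used / slots.size();
    }

    // Definition 2: n_items / n_total_slots
    double itemFactor() {
        return (double) items / slots.size();
    }

    // Average nodes visited by a successful search: finding all k keys of a chain
    // costs 1 + 2 + ... + k = k(k + 1) / 2 node visits in total.
    double avgSuccessfulSearchCost() {
        long total = 0;
        for (LinkedList<Integer> chain : slots) {
            long k = chain.size();
            total += k * (k + 1) / 2;
        }
        return (double) total / items;
    }

    public static void main(String[] args) {
        ChainedTable spread = new ChainedTable(4);
        for (int k = 0; k < 8; k++) spread.insert(k);       // keys 0..7: every chain has 2 items
        ChainedTable clumped = new ChainedTable(4);
        for (int k = 0; k < 16; k += 4) clumped.insert(k);  // keys 0,4,8,12: all land in slot 0
        for (ChainedTable t : new ChainedTable[] {spread, clumped}) {
            System.out.printf("used=%.2f items=%.2f cost=%.2f%n",
                    t.usedSlotFactor(), t.itemFactor(), t.avgSuccessfulSearchCost());
        }
    }
}
```

The clumped table has an item load factor of 1.0, but only a quarter of its slots are used (used-slot factor 0.25) and a lookup visits 2.5 nodes on average, versus 1.5 for the evenly spread table: the "everything collides" case from the question in miniature.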

AlphaGo Zero board evaluation function uses multiple time steps as an input... Why?

According to the AlphaGo Cheat Sheet, AlphaGo Zero uses a sequence of consecutive board configurations to encode its game state.
In theory, all the necessary information is contained in the latest state, and yet they include the previous 7 configurations.
Why did they choose to inject so much complexity?
What are they looking for?
The sole reason is that in all three games (Go, Chess, and Shogi) there is a repetition rule. This means that the game is not fully observable from the current board position. In other words, there may be two identical positions with two very different evaluations. For example, in one Go position there may be a winning move, but in an identical Go position that move is either illegal, or one of the next few moves in the would-be-winning continuation creates an illegal position.
You could try feeding in only the current board position and handling repetitions in the tree only. But I think this would be weaker because the evaluation function would be wrong in some cases, leading to a horizon effect if that branch of the tree had not been explored deeply enough to correct the problem.
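To make the repetition argument concrete, here is a toy Java sketch of a positional-superko check (a swapped-in illustration, not AlphaGo Zero's code): a move is illegal if the position it creates already occurred earlier in the game. Positions are hypothetical opaque labels standing in for full board states; move generation and captures are out of scope. Both games reach the same current position "C", yet the same move is legal in one and illegal in the other, so a function of the current board alone cannot distinguish them.

```java
import java.util.List;

// Toy positional-superko check. Position labels are hypothetical stand-ins for boards.
final class SuperkoDemo {
    // Legal only if the resulting position has not appeared earlier in this game.
    static boolean isLegal(List<String> positionsSoFar, String positionAfterMove) {
        return !positionsSoFar.contains(positionAfterMove);
    }

    public static void main(String[] args) {
        // Both games end in the same current position "C" ...
        List<String> gameA = List.of("E", "A", "B", "C");
        List<String> gameB = List.of("E", "R", "D", "C");
        // ... and the candidate move would produce position "R" in both.
        String afterMove = "R";
        System.out.println("game A: legal = " + isLegal(gameA, afterMove)); // true
        System.out.println("game B: legal = " + isLegal(gameB, afterMove)); // false
    }
}
```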

How should one set up the immediate reward in a RL program?

I want my RL agent to reach the goal as quickly as possible and, at the same time, to minimize the number of times it uses a specific resource T (which is sometimes necessary, though).
I thought of setting up the immediate rewards as -1 per step, an additional -1 if the agent uses T and 0 if it reaches the goal.
But the additional -1 is completely arbitrary; how do I decide how much punishment the agent should get for using T?
You should use a reward function which mimics your own values. If the resource is expensive (valuable to you), then the punishment for consuming it should be harsh. The same thing goes for time (which is also a resource if you think about it).
If the ratio between the two punishments (the one for time consumption and the one for resource consumption) is in accordance with how you value these resources, then the agent will act precisely in your interest. If you get it wrong (because maybe you don't know the precise cost of the resource or the precise cost of slow learning), then it will strive for a pseudo-optimal solution rather than an optimal one, which in a lot of cases is okay.
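A small worked example of the ratio argument, as a hypothetical Java sketch: assume an undiscounted episode and two made-up routes to the goal, a 10-step detour that avoids T and a 6-step shortcut that uses T once. With the step penalty fixed at 1, the shortcut has the higher return only while the penalty for T is below the 4 steps it saves; at exactly 4 the agent is indifferent.

```java
// Undiscounted return under the question's scheme: -stepCost per step,
// an extra -resourceCost each time T is used, 0 for reaching the goal.
final class RewardDemo {
    static double episodeReturn(int steps, int usesOfT, double stepCost, double resourceCost) {
        return -stepCost * steps - resourceCost * usesOfT;
    }

    public static void main(String[] args) {
        // Hypothetical routes: a detour without T vs. a shortcut that uses T once.
        int detourSteps = 10, shortcutSteps = 6;
        for (double c : new double[] {1.0, 4.0, 5.0}) {
            double detour = episodeReturn(detourSteps, 0, 1.0, c);
            double shortcut = episodeReturn(shortcutSteps, 1, 1.0, c);
            System.out.printf("T penalty %.1f: detour %.1f, shortcut %.1f%n", c, detour, shortcut);
        }
    }
}
```

Only the ratio `resourceCost / stepCost` matters for which route is preferred, which is why it should reflect how many steps of delay one use of T is worth to you.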

Mahout K-means has different behavior based on the number of mapping tasks

I experience a strange situation when running Mahout K-means:
Using a pre-selected set of initial centroids, I run K-means on a SequenceFile generated by lucene.vector. The run is for testing purposes, so the file is small (around 10 MB, ~10,000 vectors).
When K-means is executed with a single mapper (the default, given that the Hadoop split size in my cluster is 128 MB), it reaches a given clustering result in 2 iterations (Case A).
However, I wanted to test if there would be any improvement/deterioration in the algorithm's execution speed by firing more mapping tasks (the Hadoop cluster has in total 6 nodes).
I therefore set the -Dmapred.max.split.size parameter to 5242880 bytes, in order to make Mahout fire 2 mapping tasks (Case B).
I indeed succeeded in starting two mappers, but the strange thing was that the job finished after 5 iterations instead of 2, and that even at the first assignment of points to clusters, the mappers made different choices compared to the single-mapper execution. What I mean is that after close inspection of the clusterDump for the first iteration in both cases, I found that in case B some points were not assigned to their closest cluster.
Could this behavior be justified by the existing K-means Mahout implementation?
From a quick look at the sources, I see two problems with the Mahout k-means implementation.
First of all, the way the S0, S1, S2 statistics are kept is probably not numerically stable for large data sets. Oh, and since k-means does not even use S2, it is also unnecessarily slow. I bet a good implementation can beat this version of k-means by a factor of 2-5 at least.
For small data sets split onto multiple machines, there seems to be an error in the way they compute their means. Ouch. This will amplify if the reducer is applied to more than one input, in particular when the partitions are small. To be more verbose, the cluster mean apparently is initialized with the previous mean instead of the 0 vector. Now if you reduce 't' copies of it, the resulting sum will be off by 't' times the previous mean.
Initialization of AbstractCluster:
Update of the mean:
getS1().assign(x, Functions.PLUS);
Merge of multiple copies of a cluster:
Finalization to new center:
So with this approach, the center will be offset from the proper value by the previous center times t / n where t is the number of splits, and n the number of objects.
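To check the arithmetic of the t / n offset, here is a hypothetical one-dimensional Java model of the accumulation pattern described above: each split's running sum is seeded with the previous center instead of 0, the partial sums are merged, and the total is divided by the number of points. It is not Mahout's code; the class and method names are made up.

```java
// Hypothetical 1-D model of the suspected bug; not Mahout code.
final class OffsetMeanDemo {
    static double buggyCenter(double[][] splits, double previousCenter) {
        double s0 = 0, s1 = 0;
        for (double[] split : splits) {
            double partialS1 = previousCenter; // suspected bug: should start at 0.0
            for (double x : split) partialS1 += x;
            s1 += partialS1;                   // merge: add up the partial S1 statistics
            s0 += split.length;                // S0 counts only the real points
        }
        return s1 / s0;                        // finalization: S1 / S0
    }

    static double trueMean(double[][] splits) {
        double sum = 0, count = 0;
        for (double[] split : splits) {
            for (double x : split) sum += x;
            count += split.length;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        double[][] oneSplit = {{1, 2, 3, 4}};
        double[][] twoSplits = {{1, 2}, {3, 4}};
        double previous = 10.0;
        System.out.println("true mean:  " + trueMean(twoSplits));               // 2.5
        System.out.println("t=1 center: " + buggyCenter(oneSplit, previous));   // 5.0
        System.out.println("t=2 center: " + buggyCenter(twoSplits, previous));  // 7.5
    }
}
```

With n = 4 and a previous center of 10, the offsets are 10 * 1/4 = 2.5 and 10 * 2/4 = 5, so the result depends on the number of splits, which matches the observation that changing the number of mappers changes the clustering.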
To fix the numerical instability (which arises whenever the data set is not centered on the 0 vector), I recommend replacing the S1 statistic with the true mean rather than S0*mean. Both S1 and S2 can be incrementally updated at little cost using the incremental mean formula, which AFAICT was used in the original "k-means" publication by MacQueen (which actually describes an online k-means, while this is Lloyd-style batch iterations). Well, for an incremental k-means you obviously need the updatable mean vector anyway... I believe the formula was also discussed by Knuth in his essential books. I'm surprised that Mahout does not seem to use it. It's fairly cheap (just a few more CPU instructions, no additional data, so it all happens in the CPU cache line) and gives you extra precision when you are dealing with large data sets.
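A minimal Java sketch of that idea, assuming the standard Welford update for a running mean and sum of squared deviations (M2), plus the Chan et al. pairwise merge for combining partial results such as per-mapper statistics. This is a swapped-in, one-dimensional illustration, not Mahout's API; `RunningStats` is a hypothetical name. The sample data have a large offset and a small spread, the case where raw sums lose precision.

```java
// Running statistics kept as (count, mean, M2) instead of raw sums S0, S1, S2.
final class RunningStats {
    long n = 0;
    double mean = 0.0;
    double m2 = 0.0; // sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;         // incremental mean update
        m2 += delta * (x - mean);  // Welford: old deviation times new deviation
    }

    // Pairwise merge (Chan et al.), e.g. for combining per-split partial statistics.
    static RunningStats merge(RunningStats a, RunningStats b) {
        RunningStats r = new RunningStats();
        r.n = a.n + b.n;
        if (r.n == 0) return r;
        double delta = b.mean - a.mean;
        r.mean = a.mean + delta * b.n / r.n;
        r.m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / r.n;
        return r;
    }

    double populationVariance() {
        return n > 0 ? m2 / n : Double.NaN;
    }

    public static void main(String[] args) {
        double[] data = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}; // large offset, small spread
        RunningStats all = new RunningStats(), left = new RunningStats(), right = new RunningStats();
        double s0 = 0, s1 = 0, s2 = 0; // raw-sum statistics, for comparison
        for (int i = 0; i < data.length; i++) {
            all.add(data[i]);
            (i < 2 ? left : right).add(data[i]);
            s0 += 1;
            s1 += data[i];
            s2 += data[i] * data[i];
        }
        System.out.println("incremental: " + all.populationVariance());               // 22.5
        System.out.println("merged:      " + merge(left, right).populationVariance()); // 22.5
        System.out.println("raw sums:    " + (s2 - s1 * s1 / s0) / s0);
    }
}
```

The merged result does not depend on how the data were split, and no statistic is ever seeded with a previous center, so both problems described above are avoided in this model.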