Is it a fallacy to use Little's Law to prove that you need WIP limits in Kanban?

I have heard the argument that Little's Law proves that you should limit the work in process in your Kanban columns, because the more work that arrives, the longer it will take to move through the system.
Is this an accurate conclusion? If I have N items of work in the backlog, they all need to flow to "Done". So if I limit WIP to n, that will ensure that I am finishing the n items faster, but it also ensures that I have more work (N-n) left to do. So I am not convinced by the Little's Law argument.
What am I missing?

Simply put, Little's Law is a queueing-theory result that describes the behaviour of queues. It says nothing about WIP limits, and it may not be as simple as increasing or decreasing WIP to get better lead times.
Here is a blog post that discusses Little's Law in a bit more detail.
https://agileramblings.com/2012/12/11/littles-law-its-not-about-the-numbers/
The important part of Little's Law from a Kanban perspective is that it relates three quantities (average WIP, average throughput, and average lead time) and tells us with certainty what will happen to the other two parameters in the equation if we modify any one of the three.
The other important component of Little's Law is that it is only useful if the five assumptions about the queue hold. You'll find the assumptions listed in the blog post above. If you cannot satisfy the five assumptions required for the proof to work, the numbers that come out of your Little's Law calculations at any point in time will be off, and they will therefore provide questionable value to your decision-making process.
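To make the relationship concrete, here is a small Python sketch of the equation itself (the WIP levels and the throughput below are made-up numbers, not data from a real board):

    # Little's Law: average WIP = average throughput * average lead time.
    # Rearranged for Kanban: average lead time = average WIP / average throughput.

    def average_lead_time(avg_wip, avg_throughput_per_day):
        """Average lead time in days implied by Little's Law."""
        return avg_wip / avg_throughput_per_day

    throughput = 2.0  # items finished per day (assumed constant)
    for wip in (4, 8, 16):
        print(f"WIP={wip:2d} -> average lead time = {average_lead_time(wip, throughput):.1f} days")

With throughput held fixed, doubling the average WIP doubles the average lead time. Whether a WIP limit is the right lever is a separate question, and the numbers are only trustworthy when the five assumptions above hold.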

In Paxos, why can't we use random backoff to avoid collision?

I understand that the heart of the Paxos consensus algorithm is that there is only one "majority" in any given set of nodes; therefore, if a proposer gets accepted by a majority, there cannot be another majority that accepts a different value, given that any acceptor can only accept a single value.
So the simplest "happy path" of a consensus algorithm is just for any proposer to ping a majority of acceptors and see if it can get them to accept its value, and if so, we're done.
The collision comes when concurrent proposers lead to a case where no majority of nodes agrees on a value. This can be demonstrated with the simplest case of 3 nodes: every node tries to get 2 nodes to accept its value, but due to concurrency every node ends up only getting itself to "accept" its value, and therefore no majority agrees on anything.
Paxos then introduces a two-phase algorithm to solve this problem.
But why can't we just back off a random amount of time and retry, until eventually one proposer succeeds in grabbing a majority opinion? This can be demonstrated to succeed eventually, since every proposer will back off a random amount of time if it fails to grab a majority.
I understand that this is not going to be ideal in terms of performance. But let's get performance out of the way first and only look at the correctness. Is there anything I'm missing here? Is this a correct (basic) consensus algorithm at all?
The designer of Paxos is a mathematician first, and he leaves the engineering to others.
As such, Paxos is designed for the general case to prove consensus is always safe, irrespective of any message delays or colliding back-offs.
And now the sad part: the FLP impossibility result is a proof that any system with this guarantee may run into an infinite loop.
Raft is also designed with this guarantee and thus has the same mathematical flaw.
But, the author of Raft also made design choices to specialize Paxos so that an engineer could read the description and make a well-functioning system.
One of these design choices is the well-used trick of exponential random backoff to get around the FLP result in a practical way. This trick does not take away the mathematical possibility of an infinite loop, but does make its likelihood extremely, ridiculously, very small.
You can tack on this trick to Paxos itself, and get the same benefit (and as a professional Paxos maintainer, believe me we do), but then it is not Pure Paxos.
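Roughly, "tacking on the trick" looks like the following Python sketch, where propose() is a hypothetical callable standing in for a full Paxos proposal round (none of this is code from a real Paxos implementation):

    import random
    import time

    def propose_with_backoff(propose, max_attempts=10, base_delay=0.05, cap=2.0):
        """Retry a hypothetical propose() callable with randomized exponential backoff.

        propose() is assumed to return True once a majority of acceptors has
        accepted the proposal, and False on a collision or timeout.
        """
        for attempt in range(max_attempts):
            if propose():
                return True   # a majority accepted: the value is chosen
            # Collision: sleep a random amount, doubling the window each attempt.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
        return False          # still possible in theory (FLP), just increasingly unlikely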
Just to reiterate, the Paxos protocol was designed to be in its most basic form SO THAT mathematicians could prove broad statements about consensus. Any practical implementation details are left to the engineers.
Here is a case where a liveness issue in Raft caused a 6-hour outage: https://decentralizedthoughts.github.io/2020-12-12-raft-liveness-full-omission/.
Note 1: Yes, I said that the Raft author specialized Paxos. Raft can be mapped onto the more general Vertical Paxos model, which in turn can be mapped onto the Paxos model. As can any system that implements consensus.
Note 2: I have worked with Lamport a few times. He is well aware of these engineering tricks, and he assumes everyone else is, too. Thus he focuses on the math of the problem in his papers, and not the engineering.
The logic you are describing is how leader election is implemented in Raft:
when there is no leader (or the leader goes offline), every node waits a random delay
after the random delay, the node contacts every other node and proposes "let me be the leader"
if the node gets a majority of the votes, it considers itself the leader, which is equivalent to saying "the cluster reached consensus on who is the leader"
if the node did not get a majority, then after a timeout and another random delay it tries again
Raft also has a concept of a term, but at a high level the randomized waits are the feature that helps reach consensus faster.
Answering your question "why can't we...": we can, it will just be a different protocol. A rough sketch of the election loop follows below.
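Here is a minimal Python sketch of one election round (request_vote(), the peer list, and the timeout range are illustrative assumptions, not actual Raft library calls):

    import random
    import time

    def run_election(peers, request_vote, term):
        """One round of a Raft-style election; request_vote(peer, term) -> bool is assumed."""
        # Randomized election timeout: nodes wake at different times, so usually
        # only one candidate asks for votes in a given term.
        time.sleep(random.uniform(0.150, 0.300))
        votes = 1  # vote for ourselves
        for peer in peers:
            if request_vote(peer, term):
                votes += 1
        if votes > (len(peers) + 1) // 2:
            return "leader"   # majority reached: the cluster agrees on the leader
        return "retry"        # split vote: back off again and start a new term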

Checking for convergence in complex hierarchical models in JAGS

I have estimated a complex hierarchical model with many random effects, but I don't really know what the best approach is to checking for convergence. I have complex longitudinal data from a few hundred individuals and estimate quite a few parameters for every individual. Because of that, I have way too many traceplots to inspect visually. Or should I really spend a day going through all the traceplots? What would be a better way to check for convergence? Do I have to calculate Gelman and Rubin's Rhat for every parameter on the person level? And when can I conclude that the model converged? When absolutely all of the thousands of parameters reached convergence? Is it even sensible to expect that? Or is there something like "overall convergence"? And what does it mean when some person-level parameters did not converge? Does it make sense to use autorun.jags from the R2jags package with such a model, or will it just run forever? I know these are a lot of questions, but I just don't know how to approach this.
The measure I am using for convergence is a potential scale reduction factor (psrf)* using the gelman.diag function from the R package coda.
Nevertheless, I also quickly visually inspect all the traceplots, even though I have tens/hundreds of them. It can be really fast if you put them in PNG files and then quickly go through them using e.g. IrfanView (let me know if you need me to expand on this).
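For what it's worth, here is a minimal Python/matplotlib sketch of that dump-everything-to-PNG workflow (the samples layout, a dict of parameter name to an array of shape (chains, iterations), is an assumption; with JAGS output you would first extract the chains from your mcmc object):

    import os
    import matplotlib.pyplot as plt

    def save_traceplots(samples, out_dir="traceplots"):
        """Write one small PNG traceplot per parameter for quick flipping."""
        os.makedirs(out_dir, exist_ok=True)
        for name, chains in samples.items():
            fig, ax = plt.subplots(figsize=(6, 2))
            for chain in chains:          # overlay all chains for this parameter
                ax.plot(chain, linewidth=0.5)
            ax.set_title(name)
            fig.savefig(os.path.join(out_dir, f"{name}.png"), dpi=100)
            plt.close(fig)                # essential when writing thousands of figures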
The reason you should inspect the traceplots is well described by an example from Marc Kery (author of great Bayesian books): see "Never blindly trust Rhat for convergence in a Bayesian analysis" (the email includes a self-explanatory figure, not reproduced here).
That example talks about the Rhat statistic; the psrf from gelman.diag is essentially the same diagnostic, so the caveat applies here too, and it is better to also check the chains.
*) Gelman, A. & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
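If you want to screen thousands of parameters automatically before deciding which traceplots to stare at, here is a hedged NumPy sketch of the Gelman-Rubin psrf computation (the samples layout and the conventional 1.1 cut-off are assumptions; in practice gelman.diag from coda gives you the same numbers):

    import numpy as np

    def psrf(chains):
        """Gelman-Rubin potential scale reduction factor for one parameter.

        chains: array of shape (n_chains, n_iterations), post burn-in.
        """
        chains = np.asarray(chains, dtype=float)
        n = chains.shape[1]
        chain_means = chains.mean(axis=1)
        W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
        B = n * chain_means.var(ddof=1)         # between-chain variance
        var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
        return np.sqrt(var_hat / W)

    def flag_nonconverged(samples, threshold=1.1):
        """Return {name: psrf} for parameters above the usual 1.1 cut-off."""
        values = {name: psrf(draws) for name, draws in samples.items()}
        return {name: r for name, r in values.items() if r > threshold}

Screening like this only narrows down which traceplots to look at first; for the reason in the Kery example above, it does not replace looking at them.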

How should one set up the immediate reward in an RL program?

I want my RL agent to reach the goal as quickly as possible and at the same time to minimize the number of times it uses a specific resource T (which sometimes though is necessary).
I thought of setting up the immediate rewards as -1 per step, an additional -1 if the agent uses T, and 0 if it reaches the goal.
But the additional -1 is completely arbitrary; how do I decide how much punishment the agent should get for using T?
You should use a reward function which mimics your own values. If the resource is expensive (valuable to you), then the punishment for consuming it should be harsh. The same thing goes for time (which is also a resource if you think about it).
If the ratio between the two punishments (the one for time consumption and the one for resource consumption) is in accordance with how you value these resources, then the agent will act precisely in your interest. If you get it wrong (because maybe you don't know the precise cost of the resource nor the precise cost of slow learning), then it will strive for a pseudo-optimal solution rather than an optimal one, which in a lot of cases is okay.
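A minimal Python sketch of that idea (the step cost of 1 and the resource cost are placeholders; the only real decision is their ratio):

    def immediate_reward(reached_goal, used_resource_T,
                         step_cost=1.0, resource_cost=1.0):
        """Per-step reward: penalize time and (optionally) use of resource T.

        resource_cost / step_cost encodes how much one use of T is worth
        relative to one time step; that ratio is the real design decision.
        """
        if reached_goal:
            return 0.0
        reward = -step_cost
        if used_resource_T:
            reward -= resource_cost
        return reward

    # Example: if one use of T is "worth" three extra steps to you,
    # set resource_cost = 3 * step_cost.
    print(immediate_reward(False, True, step_cost=1.0, resource_cost=3.0))  # -4.0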

Boolean expression simplification

I am trying to simplify a Boolean expression with exactly 39 inputs and about 500-800 million terms (as in that many and/not/or statements).
A perfect simplification is not needed, but a good one would be nice.
I am aware of K-maps, Quine–McCluskey, and the Espresso algorithm. However, based on what I have read, these methods would take way too long to simplify a circuit of this size.
I would need to simplify this expression as much as possible within a 24 hour period.
After searching Google, I find it difficult to find any resources for simplifying an expression of quite this magnitude! Are there any resources or libraries out there that can at least simplify this to some extent within a 24-hour period?
A greedy heuristic called Simplify is described in the somewhat dated book
Robert K. Brayton, Gary D. Hachtel, C. McMullen, Alberto Sangiovanni-Vincentelli
Logic Minimization Algorithms for VLSI Synthesis
You can find the chapter online.
Simplify is based on the unate paradigm. In divide-and-conquer style, it recursively applies Shannon's expansion theorem to split the function into smaller sub-functions. The heuristic rule is to split by the most binate variable first, i.e. the variable which separates the largest number of terms.
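To illustrate the splitting step only (this is a toy Python sketch, not the book's implementation, and it would not be applied naively to hundreds of millions of terms): represent the cover as a list of cubes, pick the most binate variable, and recurse on the two Shannon cofactors f = x*f(x=1) + x'*f(x=0).

    from collections import defaultdict

    # A cube is a dict {variable: 0 or 1}; a cover (sum of products) is a list of cubes.

    def cofactor(cover, var, value):
        """Cubes of `cover` restricted to var = value, with var removed."""
        result = []
        for cube in cover:
            if var in cube and cube[var] != value:
                continue                  # this cube vanishes under the assignment
            result.append({v: b for v, b in cube.items() if v != var})
        return result

    def most_binate_variable(cover):
        """Variable appearing in both polarities in the largest number of cubes."""
        pos, neg = defaultdict(int), defaultdict(int)
        for cube in cover:
            for v, b in cube.items():
                (pos if b else neg)[v] += 1
        binate = [v for v in pos if v in neg]
        if not binate:
            return None                   # cover is unate: the recursion bottoms out
        return max(binate, key=lambda v: pos[v] + neg[v])

    def simplify(cover):
        """Recursive Shannon split; the real Simplify also reduces the unate
        leaves (single-cube containment etc.) before merging the cofactors."""
        var = most_binate_variable(cover)
        if var is None:
            return cover                  # placeholder for the unate-leaf simplification
        f1 = simplify(cofactor(cover, var, 1))
        f0 = simplify(cofactor(cover, var, 0))
        return [{**c, var: 1} for c in f1] + [{**c, var: 0} for c in f0]

    # Tiny made-up example: f = a·b + a'·c + b·c
    print(simplify([{"a": 1, "b": 1}, {"a": 0, "c": 1}, {"b": 1, "c": 1}]))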
A second approach could be to use graph partitioning tools like METIS to split the terms into independent (or at least loosely related) subsets. But I am not aware that this has been tried successfully for logic synthesis tasks. My favorite search engine is sceptical and does not return any hits.
A more recent algorithm based on Binary Decision Diagrams was published in
Olivier Coudert: Doing Two-Level Logic Minimization 100 Times Faster
The paper lists examples with a very high number of terms, similar to the task at hand.
A somewhat related simplification technique is BDD sweeping, as described in "A Study of Sweeping Algorithms in the Context of Model Checking".
This is a duplicate question. See https://stackoverflow.com/a/60535990/1531728 for resources about logic optimization, or the simplification of Boolean expressions.

Do heuristics in constraint satisfaction problems ensure no backtracking (when there exists a solution)?

I'm doing a map-coloring problem with Scheme, and I used minimum remaining values (select the vertex with the fewest legal colors) and the degree heuristic (select the vertex that has the largest number of neighbors). If there exists a solution for a certain configuration, will these heuristics ensure that it won't need to backtrack?
Let's do a simple theoretical analysis.
1. Graph coloring is NP-complete for general graphs (already when asking whether 3 colors suffice). This means there is no known polynomial-time algorithm.
2. Your heuristic is computable in polynomial time.
3. Assume you need no backtracking. Then you make n steps, each of which requires polynomial time (n is the number of vertices). Thus you could solve the problem in polynomial time.
4. Either you have proven P=NP, or your assumption is wrong.
I leave it up to you to decide which option in point (4) is more plausible.
In general: no, MRV and your other heuristic will not guarantee a straight walk to the goal. (I imagine they might if your problem has some very specific structure, but don't count on it until you've seen the theorem.)
Heuristics prune the search space, or change the order of the search to make early termination more likely. This is not the same thing as eliminating backtracking, but it is a related concept.
We prune some spaces because we are confident that the solution does not lie in those branches of the search tree, or change the order because we have some reason to believe that it will be quicker if we look in some subtrees before others.
We also cut ourselves off from backtracking because we are confident that the solution is in the branch of the space we are in now (so that if we don't find it in this subtree, we can declare failure and don't bother).
Both kinds of strategies are ultimately about searching less of the space somehow and getting to the answer (positive or negative) without searching everything.
MRV and the degree heuristic are about reordering the sub-searches, not about avoiding backtracking. Heuristics can be right and make for a short search, but that's not the same thing as eliminating backtracking (e.g. the "cut" operator in Prolog). When you find what you're looking for, you can declare success, and of course that eliminates further backtracking. But real backtracking elimination means making a decision not to backtrack no matter what, before the search completes.
E.g. if you're doing a depth-first search, and you find what you're looking for by dumb luck without backtracking, we cannot say that dumb luck is a fence operation that eliminates backtracking. :)
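For completeness, here is a small Python (rather than Scheme) sketch of MRV with the degree tie-breaker inside an ordinary backtracking colorer; the graph and colors are made up. The heuristics only choose which vertex to try next: the backtracking machinery stays in place in case a choice leads to a dead end.

    # Backtracking map coloring with MRV + degree heuristic (illustrative only).

    def legal_colors(v, graph, colors, assignment):
        return [c for c in colors
                if all(assignment.get(u) != c for u in graph[v])]

    def select_vertex(graph, colors, assignment):
        unassigned = [v for v in graph if v not in assignment]
        # MRV: fewest legal colors; tie-break by degree (most neighbors).
        return min(unassigned,
                   key=lambda v: (len(legal_colors(v, graph, colors, assignment)),
                                  -len(graph[v])))

    def color(graph, colors, assignment=None):
        assignment = {} if assignment is None else assignment
        if len(assignment) == len(graph):
            return assignment
        v = select_vertex(graph, colors, assignment)
        for c in legal_colors(v, graph, colors, assignment):
            assignment[v] = c
            result = color(graph, colors, assignment)
            if result is not None:
                return result
            del assignment[v]          # backtracking still happens here
        return None                    # dead end: caller must try another color

    # Tiny made-up example: a 4-cycle is 2-colorable.
    graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
    print(color(graph, ["red", "green"]))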