proof of time complexity of union find with path compression - disjoint-sets

The Wikipedia page states that:
If m operations, either Union or Find, are applied to n elements, the total run time is O(m log* n).
The detailed analysis arriving at this result is given on that page.
My questions are:
1. Shouldn't the bound be O((m + n) log* n) instead of O(m log* n)?
2. Is the average time complexity of 1000 Find operations the same as the time complexity of each individual Find?

Disclaimer: I'm still trying to understand these proofs myself, so I make no claim to being an expert! I think I may have some insights, though.
1) I think they have assumed that n = O(m), thereby making O((m + n)lg*(n)) into O(mlg*(n)). In Tarjan's original proof of the inverse Ackermann function bound (found here: https://dl.acm.org/doi/pdf/10.1145/321879.321884?download=true) he assumes that the number m of FIND operations exceeds n. In Introduction to Algorithms (CLRS, ch. 21), the bound they prove is for m operations of which n are MAKE-SET. So people seem to assume that m will be asymptotically greater than or equal to n.
2) What they have proved is an amortized cost for each operation. This is an analysis technique that bounds, to within a constant factor, the time taken by a whole sequence of operations, from which you can trivially compute the average time per operation. There are several different ways to go about it (I believe this is an example of aggregate analysis?). It's worth looking into!
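For reference, here is a minimal Python sketch of the structure under discussion, assuming union by rank plus path compression (the variant these bounds apply to):

class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n
    def find(self, x):
        # Path compression: hang each node on the query path directly off the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

A single find can still cost O(log n), but the compression it performs cheapens later operations; that trade is exactly what the amortized O(log* n) (indeed O(alpha(n))) per operation captures.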


Finding Conditional Moments in a Markov Process

This question combines math and programming. I will first describe the general problem and then give an example that is (hopefully) simpler to understand.
General Question: Consider a Markov-chain process with N states and transition matrix Π. Each state is associated with a value x_n (n in {1,…,N}). Our goal is to find the unconditional average of the first two moments (mean and variance) along T-period paths, conditional on (i) the path starting in a subset of states, N_0, (ii) ending in a subset of states, N_T, and (iii) not passing through a subset of states, N_not, in any of the periods between 1 and T-1. By the unconditional average of these two moments I basically mean the average these moments would have under the stationary distribution. To be more concrete, let me illustrate the goal of the exercise in a simple case.
Simple Example: Consider a 3-state Markov-chain process with transition matrix Π, and let the three states be denoted A, B, and C. Each of these states is associated with some value (x_A, x_B, and x_C, respectively). We are interested in what happens along paths that satisfy the following condition: the path starts at state A, after 3 periods is in either state B or C, and between periods 1 and 3 never goes through state A again. Denote this condition by (#). So, for example, a path we are interested in would be {A,B,B,C}, with the associated values {x_A, x_B, x_B, x_C}. We are interested in the mean and standard deviation along such paths. In particular, we would like to find the unconditional average of these first two moments over paths that satisfy (#).
Let me now propose a solution based on simulating the process. Since both T and N are quite large, this solution is too slow for my purpose.
Simulation Solution: Starting from some initial point simulate the process for a very long time period, and drop the first τ periods. Extract all paths along the simulation that satisfy condition (#) and compute the mean and std along each of these paths. Finally, simply take the average across these paths.
I'm hoping there is a better and more efficient way to achieve the goal. Since I want the solution to be accurate and T and N are large, the simulation takes a long time.
I would love to hear your thoughts and if you know of efficient methods to achieve this goal. Please let me know if something is not clear and I'll try to clarify it.
Thank you!!!
I think I know how to do this if N_0 consists of one state; let's call that state A.
The long run probability of being in A is pi(A) and can be obtained by solving pi = pi*P, with P the transition matrix.
The other thing you need to calculate is the probability of those transient paths. You probably need to introduce a modified P where all states i in the set N_not are absorbing (i.e. P[i,i] = 1 and P[i,j] = 0 for j ≠ i). Then, starting from a vector p(0) which has a 1 in the element corresponding to state A and 0 elsewhere, you can keep calculating p(n) = p(n-1)*P to get the probabilities of your transient paths.
Multiply the result of that by pi(A) to get the unconditional probability.
You can probably do something like this as well when N_0 is a set, but I don't know how you should select p(0) in that case.
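Here is a minimal numeric sketch of that recipe in Python. The 3-state transition matrix is made up for illustration; state A is index 0, N_not = {A}, N_T = {B, C}, and T = 3 as in the example:

import numpy as np
# Made-up 3-state chain: A = 0, B = 1, C = 2.
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
# Modified chain: states in N_not become absorbing, so any probability
# mass that touches a forbidden state never re-enters a surviving path.
N_not = [0]
P_mod = P.copy()
for i in N_not:
    P_mod[i, :] = 0.0
    P_mod[i, i] = 1.0
# Propagate p(n) = p(n-1) @ P_mod. The first step uses the original P,
# since occupying A at period 0 is allowed.
T = 3
p = np.zeros(3)
p[0] = 1.0
p = p @ P
for _ in range(T - 1):
    p = p @ P_mod
# Probability, conditional on starting in A, of ending in N_T = {B, C}
# without revisiting A; multiplied by pi[A] for the unconditional weight.
print(pi[0] * (p[1] + p[2]))

The same forward propagation should also yield the conditional moments if, alongside p, you carry probability-weighted running sums of x and x squared per current state, which avoids simulation entirely.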

How is O(n)/n=1 in aggregate method of amortized analysis

How is O(n)/n = 1 in the aggregate method of amortized analysis, as given in the Coursera course on data structures, Lesson 5 (Amortized Analysis: Aggregate Method)?
Short answer
O(n)/n = cn/n = c = O(1)
Long answer
We use amortized analysis in order to analyze the cost of a sequence of operations rather than the cost of a single operation. In the latter case we use asymptotic analysis (some of the asymptotic notations are Theta, Big O, Big Omega, little o and little omega), but that doesn't work so well when we come across a sequence of operations and want to understand the cost of that sequence.
The reason is that if we apply "regular" asymptotic analysis, the worst-case upper bound it gives might be too pessimistic. The classical example is inserting into a dynamic array: you insert elements into a dynamically allocated array, and when it's full, you allocate a new array (twice as big, for example) and copy all the elements over. Most of the insertions work in constant time (O(1)), but when you need to reallocate the array it takes linear time (O(n)), because you have to copy all the elements.
So imagine that you insert n elements and need to reallocate the array only once. You then have n operations, each of which is O(n) in the worst case, so the worst-case cost of the sequence is O(n^2). That seems too pessimistic, considering that most of your operations are O(1) in the worst case and only one of them is O(n).
We define the amortized cost of a sequence of n operations as (cost of n operations) / n. In your case the cost of n operations is O(n), which equals cn (where c is some constant) just by the definition of Big O notation; divide it by n and you get just c, which is O(1) because, once again, c is just a constant.
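You can watch the aggregate bound emerge with a toy doubling array. This is a sketch: the counter only tallies element copies, which dominate the cost:

class DynamicArray:
    def __init__(self):
        self.capacity, self.size, self.copies = 1, 0, 0
    def append(self, x):
        if self.size == self.capacity:
            self.copies += self.size  # reallocation copies every element: O(n)
            self.capacity *= 2        # the new array is twice as big
        self.size += 1                # the ordinary O(1) part of an insertion

n = 1_000_000
arr = DynamicArray()
for i in range(n):
    arr.append(i)
# Copies total 1 + 2 + 4 + ... < 2n, so (cost of n operations) / n is
# bounded by a constant: the amortized O(1) described above.
print(arr.copies / n)  # stays below 2 no matter how large n gets

Note that with doubling you actually pay for several reallocations, not just one, yet the geometric series still sums to O(n) overall.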

Genetic algorithm encoding technique to be used in this scenario

The problem is to find the optimum quantities that incur the minimum total cost across a number of warehouses, using a genetic algorithm.
Let's say there are n warehouses. Associated with each warehouse are a few factors:
LCosti: loading cost for warehouse i
HCosti: holding cost for warehouse i
TCosti: transportation cost for warehouse i
OCosti: ordering cost for warehouse i
Each warehouse has a quantity Qi associated with it that must satisfy these 4 criteria:
loading constraint: Qi * LCosti >= Ai for warehouse i
holding constraint: Qi * HCosti >= Bi for warehouse i
transportation constraint: Qi * TCosti >= Ci for warehouse i
ordering constraint: Qi * OCosti >= Di for warehouse i
where A, B, C and D are constants for each of the warehouses.
Another important criterion is that each Qi must satisfy:
Qi <= Demandi
where Demandi is the demand in warehouse i (a different quantity from the constant Di above).
And the equation of total cost is:
Total cost = sum(Qi * (LCosti + HCosti + TCosti) + OCosti / Qi)
How do I encode a chromosome for this problem? What I am thinking is that, by taking for each warehouse the largest of the four minimum allowable values for Qi together with the demand constraint, I can get a feasible range for Qi. Then I can randomly generate values in that range for the initial population. But how do I perform crossover and mutation in this scenario? How do I encode the chromosomes?
Generally, in constrained problems you have basically three possible approaches (regarding evolutionary algorithms):
1. Incorporate constraint violation into fitness
You can design your fitness as a sum of the actual objective and penalties for violation of constraints. The extreme case is a "death penalty", i.e. any individual which violates any constraint in any way receives the worst possible fitness.
This approach is usually very easy to implement; however, it has a big drawback: it often penalizes solutions that have good building blocks but violate the constraints too much.
2. Correction operators, resistant encoding
If it is possible for your problem, you can implement "correction operators": operators that take a solution that violates constraints and transform it into another one that does not, preserving as much structure from the original solution as possible. A similar idea is to use an encoding that guarantees the solution will always be feasible, i.e. a decoding algorithm that always produces a valid solution.
If it is possible, this is probably the best approach you can take. However, it is often quite hard to implement, or not possible without major changes to the solutions, which can significantly slow the search down or even make it useless.
3. Multi-objective approach
Use some multi-objective (MO) algorithm, e.g. NSGA-II, turn your measure(s) of constraint violation into objectives, and optimize all the objectives at once. MO algorithms usually provide a Pareto front of solutions: a set of solutions that lie on the frontier of the objective-violation tradeoff space.
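For your problem specifically, approach 2 looks easy, because the constraints simply pin each Qi into an interval. A rough Python sketch (all cost, constraint and demand numbers below are invented placeholders):

import random
# Invented placeholder data for two warehouses.
LCost, HCost, TCost, OCost = [2.0, 3.0], [1.0, 2.0], [4.0, 1.5], [50.0, 80.0]
A, B, C, D = [10.0, 12.0], [5.0, 8.0], [20.0, 6.0], [30.0, 40.0]
Demand = [40.0, 60.0]
n = len(Demand)
# The four ">=" constraints give each gene a lower bound, demand an upper one.
lo = [max(A[i] / LCost[i], B[i] / HCost[i], C[i] / TCost[i], D[i] / OCost[i])
      for i in range(n)]
hi = Demand
def total_cost(q):  # the objective from the question
    return sum(q[i] * (LCost[i] + HCost[i] + TCost[i]) + OCost[i] / q[i]
               for i in range(n))
def random_individual():  # initial population: uniform inside the feasible box
    return [random.uniform(lo[i], hi[i]) for i in range(n)]
def crossover(p1, p2):
    # Arithmetic crossover: a convex combination of feasible parents
    # stays inside [lo, hi], so no repair is needed.
    a = random.random()
    return [a * x + (1 - a) * y for x, y in zip(p1, p2)]
def mutate(q, rate=0.2):
    # Gaussian perturbation, clamped back into range
    # (a trivial correction operator in the sense above).
    return [min(hi[i], max(lo[i], g + random.gauss(0.0, 0.1 * (hi[i] - lo[i]))))
            if random.random() < rate else g for i, g in enumerate(q)]

Because the encoding itself guarantees feasibility here, plain total_cost can serve as the fitness; if you later add constraints that couple warehouses, the penalty approach from option 1 still applies.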
Using Differential Evolution you can keep the same representation and avoid the double conversion (integer -> binary, binary -> integer).
The mutation operation is:
V(g+1, i) = X(g, r1) + F ⋅ (X(g, r2) − X(g, r3))
where:
i, r1, r2, r3 are indices of vectors in the population, all distinct from one another
F is a random constant in the [0, 1.5] range
V (the mutant vector) is recombined with elements of a target vector (X(g, i)) to build a trial vector u(g+1, i). The selection process keeps the better of the trial vector and the target vector (see the references below for further details).
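In code, one generation of the classic DE/rand/1/bin scheme looks roughly like this. This is a sketch, not Storn and Price's reference implementation, and F and CR are fixed here for simplicity:

import random
def de_generation(pop, fitness, F=0.8, CR=0.9):
    dim, new_pop = len(pop[0]), []
    for i, target in enumerate(pop):
        # r1, r2, r3: distinct indices, all different from i.
        r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
        # Mutation: V = X(r1) + F * (X(r2) - X(r3)).
        mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Binomial crossover mixes mutant and target into the trial vector;
        # index jrand guarantees at least one gene comes from the mutant.
        jrand = random.randrange(dim)
        trial = [mutant[d] if (random.random() < CR or d == jrand) else target[d]
                 for d in range(dim)]
        # Greedy selection: the better of trial and target survives.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop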
The interesting aspects of this approach are:
you don't have to redesign the encoding. You need a different mutation / recombination operator and (perhaps) you have to cast some reals to integers, but it's simple and fast;
for constraint management you can adopt the techniques described in zegkljan's answer;
DE has been shown to be effective on a large range of optimization problems and it seems to be suitable for your problem.
References:
the question Explain the Differential Evolution method and an old Dr. Dobb's article (by Kenneth Price and Rainer Storn) as an introduction;
Storn's page for more details and many code examples.

What happens behind MATLAB's factor() function?

Mainly, why is it so fast (for big numbers)? The documentation only tells me how to use it. For example, it needs at most one second to find the largest prime factor of 1234567890987654, which, to me, seems insane.
>> max(factor(1234567890987654))
ans =
69444443
The largest factor that needs to be tried is sqrt(N), or 35136418 in this case. Also, even the most elementary optimizations would skip all even numbers > 2, leaving only 17568209 candidates to be tested. Once the candidate 17777778 (and its cofactor 69444443) is found, the algorithm would be wise enough to stop.
This can be somewhat easily improved further by a modified sieve to skip multiples of small primes 2,3,5[,7].
Basically even the sqrt(N) optimization is enough for the noted performance, unless you are working on an exceptionally old CPU (8086).
It's interesting to look at the source code of the factor and primes functions.
factor(N) essentially calls primes to find all primes up to sqrt(N). Once they have been identified, it tests them one by one to see if they divide N.
primes(n) uses the sieve of Eratosthenes: for each identified prime, remove all its multiples, again exploiting the sqrt bound to reduce the work.
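That strategy can be sketched in a few lines; the following is a rough Python reconstruction of what is described above, not MATLAB's actual source:

import math
def primes_upto(n):
    # Sieve of Eratosthenes, as primes(n) does.
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):  # the sqrt bound again
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]
def factor(N):
    # Trial division by every prime up to sqrt(N).
    out = []
    for p in primes_upto(math.isqrt(N)):
        while N % p == 0:
            out.append(p)
            N //= p
    if N > 1:  # whatever remains above 1 is itself a prime factor
        out.append(N)
    return out
print(max(factor(1234567890987654)))  # 69444443, matching the MATLAB output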

Why is merge sort's worst case still n log n?

It was a question on my final I took earlier and I had no idea how to answer it.
Well it was
What is Merge sort's worst case runtime but MORE IMPORTANTLY, why?
The divide-and-conquer contributes a log(n) factor: you divide the array in half log(n) times, and each time you do, for each segment, you have to merge two sorted arrays. Merging two sorted arrays is O(n): the algorithm just walks along both arrays, always advancing in the one whose current element is smaller.
The recurrence you get is r(n) = O(n) + r(roundup(n/2)) + r(rounddown(n/2)).
The problem is that you can't use the Master Theorem to solve this, due to the rounding. Hence you can either do the math or use a little hack-like solution: if your input size isn't a power of two, just "blow it up" to the next power of two. Then you can use the Master Theorem on r(n) = O(n) + 2r(n/2). Obviously this leads to O(n log n). The merge() function itself is in O(n), because in the worst case you need n - 1 comparisons.
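For completeness, a short sketch that mirrors this analysis (halve, recurse, then an O(n) merge):

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2  # the divide step: log(n) levels of halving
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge: walk both sorted halves, always taking the smaller head;
    # at most len(a) - 1 comparisons, i.e. O(n) work per level.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]

Note that the worst case matches the best case up to constants: even on already sorted input this version still does work on log(n) levels of O(n) each, which is why merge sort's worst case stays n log n.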