Could you explain why the expected running time of randomized quicksort is Θ(n log n)? - quicksort

What is the difference between expected running time and running time, and could you explain why the expected running time of randomized quicksort is Θ(n log n)?

Expected Running Time:
Often, "expected running time" just means the average running time over random inputs. But for a randomized algorithm, which is the case here, it means the running time on a given input, averaged over the random choices made by the algorithm.
Proving the running time:
As for proving the Θ(n log n) running time of quicksort, that involves more complicated mathematics; here is a reference from CMU that proves the Θ(n log n) bound (https://www.cs.cmu.edu/~avrim/451f11/lectures/lect0906.pdf).
Rather than getting into the intricate mathematics, I am only going to focus on how we can understand this running time intuitively.
Good pivots: If each pivot has rank somewhere in the middle 50 percent, that is, between the 25th percentile and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. Each recursive call then receives at most 3n/4 elements, so the recursion depth is at most log_{4/3} n = O(log n), and since each level of the recursion does O(n) total work, the running time is Θ(n log n). Make sure you understand why quicksort with such guaranteed good pivots runs in Θ(n log n).
Now let's talk about the expected running time of randomized quicksort...
When the pivot is chosen uniformly at random, it is not guaranteed to land in the middle 50 percent, but it does so about half of the time. Imagine flipping a fair coin until you get k heads: on average, you only need to flip 2k times. By the same token, quicksort's recursion needs, in expectation, about twice as many levels as it would with guaranteed good pivots, which is still O(log n). Each level of the call tree does O(n) work, so the expected total work is still Θ(n log n).
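To make this concrete, here is a minimal sketch of randomized quicksort in MATLAB (rqsort is just an illustrative name); the only randomness is the pivot choice, so the Θ(n log n) expectation holds for every input:

    % Minimal randomized quicksort sketch. The pivot is drawn uniformly at
    % random from the array, so the expected recursion depth is O(log n)
    % for every input, as argued above.
    function s = rqsort(a)
        if numel(a) <= 1
            s = a;                      % base case: already sorted
            return
        end
        p = a(randi(numel(a)));         % uniformly random pivot value
        % Recurse on the strictly smaller and strictly larger elements;
        % elements equal to the pivot are already in their final place.
        s = [rqsort(a(a < p)), a(a == p), rqsort(a(a > p))];
    end

For example, rqsort([3 1 4 1 5]) returns [1 1 3 4 5].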

Related

Tuning gain tables to match two curves

I have two data sets; let us name them "actual speed" and "desired speed". My main objective is to match the actual speed with the desired speed.
But to do that in my case, I need to tune an FF (1x10), an Integral (10x8), and a Proportional (10x8) gain table.
My approach till now has been as follows:
First, start the iteration with 0.1 as the initial value in the first cell (FF[0]) of the FF table.
Then find the R-square or correlation between the two data sets (i.e. actual speed and desired speed).
Increment the value of the first cell (FF[0]) by 0.25 and then again compute the R-square or correlation of the two data sets.
Once the cell (FF[0]) value reaches 2 (the gain's maximum value, already defined by the lab), evaluate the R-square values and write back into FF[0] the gain value that gives the minimum error between the two curves (see the sketch after this list).
Then tune the Integral and Proportional tables in the same way for the same RPM range.
Once they are tuned, go to the next higher RPM range and repeat steps 2-5 (RPM ranges: 800-1000; 1000-1200; ...; 3000-3200).
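For illustration, here is a rough MATLAB sketch of steps 1-4 for a single cell. writeGain and measureSpeeds are hypothetical placeholders for however your GUI writes a gain to the controller and records the two real-time curves:

    % Sweep one FF cell from 0.1 to the lab maximum of 2 in steps of 0.25
    % and keep the value that best matches the two curves.
    candidates = 0.1:0.25:2.0;
    err = zeros(size(candidates));
    for i = 1:numel(candidates)
        writeGain('FF', 1, candidates(i));    % hypothetical: set FF(1) on the controller
        [actual, desired] = measureSpeeds();  % hypothetical: record both curves
        R = corrcoef(actual, desired);        % correlation between the curves
        err(i) = 1 - R(1, 2);                 % smaller means a better match
    end
    [~, best] = min(err);
    writeGain('FF', 1, candidates(best));     % write back the best gain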
Now the problem is that this process takes far too long to complete. For example, it takes around 1 hour to tune one cell of FF, which is very slow.
If possible, please suggest any other approach I can try to tune the tables. I am using MATLAB R2010a and I can't switch to any other version of MATLAB because my controller can communicate with this version only. I also can't use any app for tuning, since my GUI is already communicating with the controller and those two data sets are being generated in real time.
In the given figure, let us take the (X1,Y1) curve as the desired speed and the (X2,Y2) curve as the actual speed.

Calculate the execution time of a script beforehand with MATLAB

Good morning,
I have a question about the execution time of a script in MATLAB. Is it possible to know in advance how long the execution of a script will take, before running it (an estimated time, for example)? I know that with the tic and toc commands, among others, it is possible to know the time at the end, but I don't know whether it is possible to know it beforehand.
Thanks in advance,
It is not too hard to make an estimate of how long your calculation will take.
You already know how to record calculation times with tic and toc, so now you can do this:
Start with a small-scale test (for example, n = 1) and record the calculation time.
Multiply n by a constant k (I usually choose 2 or 10 for easy calculations) and record the calculation time again.
Keep multiplying n by k until you find a consistent relation: 'if I multiply my input size by k, my calculation time changes like so ...'
Now you can extrapolate your estimated calculation time (see the sketch after this list) by:
calculating how many times you need to multiply the input size of your biggest small-scale example to get to your real data size;
applying the consistent relation you found exactly that many times to the calculation time of your biggest small-scale example.
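A minimal MATLAB sketch of this procedure, where the sort is only a stand-in for your real calculation (replace work with whatever your script does as a function of the input size n, and nReal with your actual data size):

    work = @(n) sort(rand(n, 1));       % stand-in for your real calculation
    k = 2;                              % size multiplier between tests
    sizes = 1e5 * k.^(0:5);             % small-scale test sizes
    times = zeros(size(sizes));
    for i = 1:numel(sizes)
        tic
        work(sizes(i));
        times(i) = toc;                 % record the calculation time
    end
    % Fit t = c * n^p on a log-log scale; the slope p is the consistent
    % relation: multiplying n by k multiplies the time by roughly k^p.
    c = polyfit(log(sizes), log(times), 1);
    p = c(1);
    nReal = 1e8;                        % your real data size (assumed here)
    tEstimate = times(end) * (nReal / sizes(end))^p;
    fprintf('Estimated time for n = %g: %.1f s\n', nReal, tEstimate);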
Of course this combines well with some common sense: if you do certain things t times, they will take about t times as long. That is handy when you have to perform a certain calculation a million times: just interrupt the loop after a minute or so, and if it is still on the first ten calculations, you may want to give up!

matlab minimum spanning tree keeps busy

I use the grMinSpanTree function from a MATLAB toolbox. But when the number of nodes is high, the code execution doesn't come to an end; it remains busy forever.
I have tried a lot of samples and they all work well when the number of nodes is below 4000. But when I try one with 8000 nodes, it runs for several hours and still gives no result.
I am only a beginner in graph theory and MATLAB. Is there any reason that might cause an infinite loop?
If E is the number of edges and V the number of vertices, this greedy algorithm runs in O(E * V).
Therefore the running time grows quadratically as E and V increase; there is no infinite loop.
In addition, the memory needed also grows and may force your computer to swap, which increases the overall time dramatically.
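If your MATLAB release is new enough (R2015b or later), you could also try the built-in graph functions, which should scale far better than O(E * V). A minimal sketch on a random sparse graph with 8000 nodes (the graph here is just for illustration):

    n = 8000;                             % number of vertices
    m = 5 * n;                            % number of edges (sparse graph)
    s = randi(n, m, 1);
    t = randi(n, m, 1);
    w = rand(m, 1);
    keep = s ~= t;                        % drop self-loops
    G = simplify(graph(s(keep), t(keep), w(keep)));  % merge duplicate edges
    tic
    T = minspantree(G, 'Type', 'forest'); % spanning forest if G is disconnected
    toc

On a graph of this size, the built-in function should finish in a fraction of a second.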

MATLAB: Slow convergence of convex optimization algorithm

I want to speed up the convergence of a convex optimization problem in MATLAB.
My objective function is convex with three parameters, and I am using gradient ascent for the maximization.
Right now I am writing the iteration manually, with the termination condition being that the difference between the new parameter value and the old parameter value is very small (around 0.0000001). I cannot terminate based on the number of iterations because that doesn't guarantee it has converged to the optimum solution.
So it takes a lot of time to converge - almost 2 days! Is there any way to speed this up?
Actually my objective function has only three parameters. I know that my first parameter's value should be greater than that of the second.
So, starting from the initial condition, the second parameter's value starts increasing rapidly. After it has reached a certain point, the first parameter's value starts increasing rapidly, while the second parameter's value starts decreasing slowly. Eventually, the first parameter's value is greater than that of the second.
Is there any way to speed up the process? 2 days is a very long time. Furthermore, calculating the gradient is also time-consuming: it needs a lot of matrix computations.
I don't want to start with predefined parameter values such as the first parameter's value being greater than the second's. It's also not necessary that the first parameter always ends up greater than the second; I just know which parameter's value should be greater. Any suggestions?
If the calculation of gradients is very slow and you still want a manual implementation, you could try the following; it takes more steps, but each step is so simple that it could be a lot quicker overall:
Define a step size.
Try all the points where each of your variables moves -1, 0, or +1 times the step size (3^3 = 27 possibilities).
Pick the best one.
If the best one is your previous point, multiply the step size by a factor of 0.5.
Of course the success of this process depends on the properties of your function. Furthermore, it should be noted that a much simpler solution could be to relax the desired difference in your termination condition to something like 0.0001.
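Here is a minimal MATLAB sketch of that procedure for three parameters; the objective f below is just a toy stand-in for your real (slow) function:

    f = @(x) -((x(1) - 3)^2 + (x(2) - 1)^2 + (x(3) + 2)^2);  % toy concave objective
    x = [0; 0; 0];                    % starting point
    step = 1;                         % initial step size
    tol = 1e-4;                       % stop when the step gets this small
    [d1, d2, d3] = ndgrid(-1:1, -1:1, -1:1);
    dirs = [d1(:), d2(:), d3(:)]';    % all 27 moves, including "stay put"
    while step > tol
        cand = bsxfun(@plus, x, step * dirs);  % 3 x 27 candidate points
        vals = zeros(1, 27);
        for k = 1:27
            vals(k) = f(cand(:, k));  % one function evaluation per move
        end
        [~, idx] = max(vals);
        if all(dirs(:, idx) == 0)     % the best candidate is the current point
            step = step / 2;          % halve the step size
        else
            x = cand(:, idx);         % move to the best neighbour
        end
    end
    disp(x')                          % ends up close to [3 1 -2]

No gradients are needed, only function evaluations, which is the whole point when the gradient is the expensive part.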

Amdahl's law example

Can someone please help me with this example and show me how to work out the second part?
The question is:
i. If one third of a weather prediction algorithm is inherently serial and the remainder parallelizable, what is the minimum number of cores needed to guarantee a 150% speedup over a single-core implementation?
ii. Your boss revises the figure to 200%. What is your new answer?
Thanks very much in advance !!
Guess: if the algorithm is 1/3 serial and 2/3 parallel... I would think that each core you added would give you a 66% increase in performance... So for a 150% increase you'd need 3 more cores, and for a 200% increase you'd need 4.
This is a guess. Your textbook might be more helpful :)
If the algorithm runs on a single core and takes 90 minutes, then 30 minutes is spent in the serial part and 60 minutes in the parallel part.
Add a CPU:
30 minutes for the serial part and 30 for the parallel part (the 60 minutes of parallel work is split evenly across the two cores).
90 / 60 = 1.5, i.e. a 150% speedup.
I am a bit late, but here are the answers:
1) 150% speedup -> at least 2 cores are required, as dbasnett said;
2) 200% speedup -> at least 4 cores are required, based on Amdahl's law:

S(N) = 1 / ((1 - P) + P / N)

Here, 90 minutes overall are required to perform the calculation. P is the enhanced (parallelizable) fraction of the algorithm, which is 2/3 (60 of the 90 minutes), and N is the number of cores. When there is one core only:

S(1) = 1 / ((1 - 2/3) + (2/3) / 1) = 1

You get 1, which means 100%: the algorithm performs the standard way, without multi-core acceleration and therefore without any parallelization speedup.
Now we must find the number of cores N for which the equation equals 2, where 2 means the algorithm runs in half the time (45 minutes instead of 90) and therefore with a 200% speedup:

1 / (1/3 + (2/3) / N) = 2

Since:

1/3 + 2 / (3N) = 1/2, i.e. 2 / (3N) = 1/6

we see that:

N = 4

So with 4 cores computing the parallelizable 2/3 of the algorithm in parallel, you get a 200% speedup. The same goes for 150%: setting the equation equal to 1.5 gives N = 2, as dbasnett already told you.
Pretty simple.
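As a quick MATLAB check of these numbers:

    P = 2 / 3;                    % parallelizable fraction of the algorithm
    N = 1:8;                      % number of cores
    S = 1 ./ ((1 - P) + P ./ N);  % Amdahl's law speedup
    disp([N; S])                  % S = 1.5 at N = 2 and S = 2 at N = 4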
Note that a complex algorithm may imply a further division of its parallelizable part (and in theory you can have a different number of processing units per parallelizable part running concurrently).
You can further look at Wikipedia (there's also an example):
http://en.wikipedia.org/wiki/Amdahl%27s_law#Description
Anyway, the principle is the same:
Let T be the time the algorithm needs to execute, A the serial part, B its parallelizable part, and N the number of parallel CPUs, so that T = A + B. You can divide B into further small sections and perform the calculations of each part on its own processing units:

time(N) = A + B / N, with e.g. B = C + D + G

You may, for C, D and G, e.g. adopt M CPUs instead of N (the speedup will of course differ if M != N).
And in the end you arrive at a point where having more CPUs doesn't matter anymore, since:

B / N -> 0 as N -> infinity

Your algorithm's speedup will at most tend to the total execution time (T) divided by the execution time of the serial part alone (A):

maximum speedup = T / A = (A + B) / A

Therefore parallel computation really comes in handy only when the serial part of your algorithm has a low execution time.