Quicksort pivot choice

I've read that the pivot can be the median of 3 numbers: bottom, middle and top. But could that generate overflow? What happens if the median returns a value larger than the array size?
I assume that this choice rests on the assumption that the array values can't be larger than the array size.
I think I'm confused about what a pivot really is.

The pivot is just the value that you compare other values against - lower values go to the left, higher to the right. The pivot can be chosen by taking any of the existing values in the array. If the array is completely unsorted, it won't matter which value you choose. If it is somewhat sorted, you should choose a value from the middle of the array.
UPDATE: Some reading informs me that a better pivot choice may be to take the median of 3 values in the array (such as middle, bottom and top, or 3 random positions). Some people advocate taking the median of 5 values. The worst-case performance of quicksort occurs when the pivot is close to the smallest or largest value in the array, and this tactic is intended to defend against that occurring. It is just an optimisation for certain kinds of data - it is not a necessity.
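For illustration, here is a rough Python sketch of median-of-three pivot selection (the helper name and example data are made up for this answer). Note that the pivot is one of the array's values; it is compared against other values and never used as an index, so its magnitude relative to the array size cannot cause any overflow:

```python
def median_of_three(arr, lo, hi):
    """Pick a pivot *value* as the median of the first, middle and last
    elements of arr[lo..hi]. The result is always one of the existing
    values in the array."""
    mid = (lo + hi) // 2
    a, b, c = arr[lo], arr[mid], arr[hi]
    # The median of three values is the one that is neither the min nor the max.
    return sorted([a, b, c])[1]

# The values can be far larger than the array size; that is irrelevant,
# because the pivot is only ever compared against other values.
data = [1000000, 3, 42, 7, 999999]
print(median_of_three(data, 0, len(data) - 1))  # -> 999999 (median of 1000000, 42, 999999)
```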

Related

Quicksort, given 3 values, how can I get to 9 operations?

Well, I want to use quicksort on 3 given values (it doesn't matter which values). How can I get to the worst case, which is 9 operations?
Can anyone draw a tree and show how it leads to n log n and n^2 operations? I've tried to find one on the internet, but I still didn't manage to draw one properly to show that.
The worst-case complexity of quicksort depends on the chosen pivot. If the pivot chosen is always the leftmost or the rightmost element, then the worst case occurs in the following situations:
1) The array is already sorted in the same order.
2) The array is already sorted in reverse order.
3) All elements are the same (a special case of cases 1 and 2).
Since these cases occur very frequently, the pivot is often chosen randomly. Choosing the pivot randomly reduces the chances of hitting the worst case.
The analysis of the quicksort algorithm is explained in this blog post by Khan Academy.
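To see the difference concretely, here is a small sketch (assuming a textbook Lomuto-style partition that always uses the leftmost element as pivot) that counts element comparisons. On an already-sorted input the count grows roughly as n^2/2; on shuffled input it stays around n log n:

```python
import random

def quicksort_count(arr, lo=0, hi=None, counter=None):
    """Quicksort with the leftmost element as pivot; returns the number
    of element comparisons performed (a rough proxy for 'operations')."""
    if counter is None:
        counter = [0]
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        pivot = arr[lo]
        i = lo
        for j in range(lo + 1, hi + 1):
            counter[0] += 1            # one comparison against the pivot
            if arr[j] < pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[lo], arr[i] = arr[i], arr[lo]   # put the pivot in its final place
        quicksort_count(arr, lo, i - 1, counter)
        quicksort_count(arr, i + 1, hi, counter)
    return counter[0]

n = 200
print(quicksort_count(list(range(n))))   # already sorted: n*(n-1)/2 = 19900 comparisons
shuffled = list(range(n))
random.shuffle(shuffled)
print(quicksort_count(shuffled))         # random order: on the order of n log n comparisons
```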

Median filter's output is wrong. What is the right median filter algorithm?

I want to write a 1-D median filter for eliminating glitches from a signal.
I wrote my median filter code in MATLAB and compared it with the output of the medfilt1 function. My median filter is not working.
-- My median filter order is 8.
In my implementation,
when data comes in, I fill an array (the size of the array is 8).
When the incoming data count reaches 8, I take the middle value and write this
middle value to the median filter output array. Then I wait for the next 8 data points; when I have those 8 data points, I again take the middle value and write it to the median filter output array, and so on. (I implemented a sorting algorithm and tested it; it is working correctly.)
Here are my screenshots:
my incoming data is red,
MATLAB's medfilt1 output is green,
my median filter's output is blue.
Overall picture
Blown-up image
I think my algorithm is wrong, but I don't know what the right algorithm is.
Your implementation is wrong, probably in two ways (hard to tell, as you didn't show us your code).
You should be sliding the window 1 element at a time, not jumping 8 elements at a time. That is, you should drop just the oldest element and add just the newest element before taking the median. (Note that your output has a frequency 8 times too high because you are replacing all 8 elements at once.)
You say that you take the middle value. The middle value of an unsorted buffer is not the median - but perhaps you forgot to tell us that you sort the window first?
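For illustration only, here is a rough Python sketch of the sliding approach (the question is in MATLAB, but the idea is the same). The edge handling here is a simplification - I believe medfilt1 treats the signal as zero beyond the endpoints, and with an even order the two middle values are averaged, so check the documentation of your MATLAB version before comparing outputs sample by sample:

```python
from statistics import median

def median_filter_1d(signal, order):
    """Sliding-window median filter: move the window one sample at a time
    and output the median of the current window. Near the ends this sketch
    simply shrinks the window instead of zero-padding."""
    half = order // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(median(signal[lo:hi]))   # median() sorts the window internally
    return out

# A glitchy signal: the isolated spikes are removed, the slow trend is kept.
sig = [1, 1, 1, 9, 1, 1, 2, 2, 2, -7, 2, 2]
print(median_filter_1d(sig, 5))
```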

How to use Morton order (Z-order curve) in range search?

How to use Morton Order in range search?
From the Wikipedia article, in the paragraph "Use with one-dimensional data structures for range searching", it says:
"the range being queried (x = 2, ..., 3, y = 2, ..., 6) is indicated
by the dotted rectangle. Its highest Z-value (MAX) is 45. In this
example, the value F = 19 is encountered when searching a data
structure in increasing Z-value direction. ......BIGMIN (36 in the
example).....only search in the interval between BIGMIN and MAX...."
My questions are:
1) Why is F 19? Why should F not be 16?
2) How do I get the BIGMIN?
3) Are there any blog posts demonstrating how to do the range search?
EDIT: The AWS Database Blog now has a detailed introduction to this subject.
This blog post does a reasonable job of illustrating the process.
When searching the rectangular space x=[2,3], y=[2,6]:
The minimum Z-value (12) is found by interleaving the bits of the lowest x and y values: 2 and 2, respectively.
The maximum Z-value (45) is found by interleaving the bits of the highest x and y values: 3 and 6, respectively.
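For concreteness, here is a tiny sketch of that bit interleaving (assuming the convention used in the Wikipedia figure, with x bits in the even positions and y bits in the odd positions), which reproduces those two values:

```python
def morton_encode(x, y, bits=3):
    """Interleave the bits of x and y into one Z-value:
    x bits go to the even bit positions, y bits to the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bit -> even position
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bit -> odd position
    return z

print(morton_encode(2, 2))  # 12, the minimum Z-value of the query rectangle
print(morton_encode(3, 6))  # 45, the maximum Z-value of the query rectangle
```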
Having found the min and max Z values (12 and 45), we now have a linear range that we can iterate across that is guaranteed to contain all of the entries inside of our rectangular space. The data within the linear range is going to be a superset of the data we actually care about: the data in the rectangular space. If we simply iterate across the entire range, we are going to find all of the data we care about and then some. You can test each value you visit to see if it's relevant or not.
An obvious optimization is to try to minimize the amount of superfluous data that you must traverse. This is largely a function of the number of 'seams' that you cross in the data -- places where the 'Z' curve has to make large jumps to continue its path (e.g. from Z value 31 to 32 below).
This can be mitigated by employing the BIGMIN and LITMAX functions to identify these seams and navigate back to the rectangle. To minimize the amount of irrelevant data we evaluate, we can:
Keep a count of the number of consecutive pieces of junk data we've visited.
Decide on a maximum allowable value (maxConsecutiveJunkData) for this count. The blog post linked at the top uses 3 for this value.
If we encounter maxConsecutiveJunkData pieces of irrelevant data in a row, we initiate BIGMIN and LITMAX. Importantly, at the point at which we've decided to use them, we're now somewhere within our linear search space (Z values 12 to 45) but outside the rectangular search space. In the Wikipedia article, they appear to have chosen a maxConsecutiveJunkData value of 4; they started at Z=12 and walked until they were 4 values outside of the rectangle (beyond 15) before deciding that it was now time to use BIGMIN. Because maxConsecutiveJunkData is left to your tastes, BIGMIN can be used on any value in the linear range (Z values 12 to 45). Somewhat confusingly, the article only shows the area from 19 on as crosshatched because that is the subrange of the search that will be optimized out when we use BIGMIN with a maxConsecutiveJunkData of 4.
When we realize that we've wandered outside of the rectangle too far, we can conclude that the rectangle is non-contiguous along the Z curve. BIGMIN and LITMAX are used to identify the nature of the split. BIGMIN is designed to, given any value in the linear search space (e.g. 19), find the next smallest value that will be back inside the half of the split rectangle with larger Z values (i.e. jumping us from 19 to 36). LITMAX is similar, helping us to find the largest value that will be inside the half of the split rectangle with smaller Z values. The implementations of BIGMIN and LITMAX are explained in depth in the zdivide function explanation in the linked blog post.
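The real BIGMIN/LITMAX algorithms compute their results directly from the bits of the Z-value; purely to illustrate what they are supposed to return, here is a self-contained brute-force sketch (using the same made-up interleaving convention as above) that reproduces the numbers from the Wikipedia example:

```python
def morton_encode(x, y, bits=3):
    # Interleave bits: x in even positions, y in odd positions.
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def morton_decode(z, bits=3):
    # Inverse of morton_encode: split a Z-value back into (x, y).
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

def in_rect(z, xmin, xmax, ymin, ymax):
    x, y = morton_decode(z)
    return xmin <= x <= xmax and ymin <= y <= ymax

def bigmin_bruteforce(z, xmin, xmax, ymin, ymax):
    """Smallest Z-value greater than z whose point lies inside the rectangle.
    (The real BIGMIN derives this from the bits of z without searching.)"""
    z += 1
    while not in_rect(z, xmin, xmax, ymin, ymax):
        z += 1
    return z

def litmax_bruteforce(z, xmin, xmax, ymin, ymax):
    """Largest Z-value smaller than z whose point lies inside the rectangle."""
    z -= 1
    while not in_rect(z, xmin, xmax, ymin, ymax):
        z -= 1
    return z

# Query rectangle x=[2,3], y=[2,6]; the scan left the rectangle at F = 19.
print(bigmin_bruteforce(19, 2, 3, 2, 6))  # 36: resume the scan here
print(litmax_bruteforce(19, 2, 3, 2, 6))  # 15: upper end of the lower sub-range
```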
It appears that the quoted example in the Wikipedia article has not been edited to clarify the context and assumptions. The approach used in that example is applicable to linear data structures that only allow sequential (forward and backward) seeking; that is, it is assumed that one cannot randomly seek to a storage cell in constant time using its morton index alone.
With that constraint, one's strategy begins with the full range between the minimum Morton index (16) and the maximum Morton index (45). To optimize, one tries to find and eliminate large swaths of subranges that lie outside the query rectangle. The hatched area in the diagram refers to what would have been accessed (sequentially) if such optimization (eliminating subranges) had not been applied.
After discussing the main optimization strategy for linear sequential data structures, the article goes on to talk about other data structures with better seeking capability.

What element of the array would be the median if the size of the array was even and not odd?

I read that it's possible to make quicksort run in O(n log n).
The algorithm says to choose the median as the pivot at each step.
But suppose we have this array:
10 8 39 2 9 20
Which value will be the median?
In maths, if I remember correctly, the median is (39+2)/2 = 41/2 = 20.5.
I don't have a 20.5 in my array though.
Thanks in advance.
You can choose either of them; in the limit, as the input scales up, it does not matter which one you pick.
We're talking about the exact wording of the description of an algorithm here, and I don't have the text you're referring to. But I think in context by "median" they probably meant not the mathematical median of the values in the list, but rather the middle point in the list, i.e. the median INDEX, which in this case would be 3 or 4. As coffNjava says, you can take either one.
The median is actually found by sorting the array first, so in your example the median is found by arranging the numbers as 2 8 9 10 20 39, and the median would be the mean of the two middle elements, (9+10)/2 = 9.5, which doesn't help you at all. Using the true median is sort of an ideal situation; it would only be practical if the array were already at least partially sorted, I think.
With an even-length array, you can't find an exact middle pivot point, so I believe you can use either of the two middle numbers. It'll throw off the efficiency a bit, but not substantially, unless you always end up sorting even-length arrays.
Finding the median of an unsorted set of numbers can be done in O(N) time, but it's not really necessary to find the true median for the purposes of quicksort's pivot. You just need to find a pivot that's reasonable.
As the Wikipedia entry for quicksort says:
In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by R. Sedgewick).
Finding the median of three values is much easier than finding it for the whole collection of values, and for collections that have an even number of elements, it doesn't really matter which of the two 'middle' elements you choose as the potential pivot.
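As a footnote to the O(N) remark above, here is a rough quickselect sketch (expected linear time with a random pivot; a guaranteed O(N) version would use median-of-medians instead) applied to the array from the question. For an even-length array you can ask for either of the two middle order statistics:

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(N) time."""
    pivot = random.choice(values)
    lt = [v for v in values if v < pivot]
    eq = [v for v in values if v == pivot]
    gt = [v for v in values if v > pivot]
    if k < len(lt):
        return quickselect(lt, k)
    if k < len(lt) + len(eq):
        return pivot
    return quickselect(gt, k - len(lt) - len(eq))

data = [10, 8, 39, 2, 9, 20]                  # sorted: 2 8 9 10 20 39
print(quickselect(data, len(data) // 2 - 1))  # lower middle value -> 9
print(quickselect(data, len(data) // 2))      # upper middle value -> 10
```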

Is there any formula to calculate the number of passes that a quicksort algorithm will take?

While working with the quicksort algorithm, I wondered whether any formula or similar might be available for finding the number of passes that a particular set of values will take to become completely sorted in ascending order.
Is there any formula to calculate the number of passes that a quicksort algorithm will take?
Any given set of values will require a different number of operations, depending on the pivot selection method and the actual values being sorted.
So... no, unless the approximation 'between O(N log N) and O(N^2)' is good enough.
The fact that one has to distinguish the average case from the worst case should be enough to show that the only way to determine the number of operations is to actually run the quicksort.
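If you need the exact count for one particular input, the practical approach is to instrument the sort and count as it runs. A rough sketch (using the last element as pivot; other pivot rules give different counts):

```python
def quicksort_instrumented(arr, lo=0, hi=None, stats=None):
    """Sort arr in place while counting partitioning passes and comparisons.
    The counts depend on both the input order and the pivot rule, which is
    why there is no closed-form formula for an arbitrary input."""
    if stats is None:
        stats = {"passes": 0, "comparisons": 0}
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        stats["passes"] += 1
        pivot = arr[hi]                     # last element as pivot (Lomuto partition)
        i = lo - 1
        for j in range(lo, hi):
            stats["comparisons"] += 1
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        quicksort_instrumented(arr, lo, i, stats)
        quicksort_instrumented(arr, i + 2, hi, stats)
    return stats

print(quicksort_instrumented([5, 3, 8, 1, 9, 2]))  # {'passes': 3, 'comparisons': 10}
print(quicksort_instrumented([1, 2, 3, 4, 5, 6]))  # already sorted: {'passes': 5, 'comparisons': 15}
```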