Trying to use quickselect by hand - quickselect

I have a subarray {8,9,7}. Assume the pivot that was picked is 8. Running Quickselect on this array by hand is giving me trouble.
The left pointer starts from the left looking for elements greater than 8 and finds 9. The right pointer starts from the right looking for elements smaller than 8 and finds 7. 7 and 9 swap places, giving {8,7,9}. Now the left pointer finds 9 again and the right pointer finds 7, but they have crossed each other, so we don't perform that swap. Instead the left pointer is swapped with the pivot, creating the array {9,7,8}. This is not good, since the smaller elements are no longer to the left of the pivot. So what did I do wrong?

This is much later, so you no doubt already figured it out, but for posterity:
The first part of your description above matches the first partition step in QuickSelect (or QuickSort) using the element at index zero (value 8 here) as the pivot.
The {8,7,9} arrangement is correct: the two parts {8,7} and {9} meet the partition criterion for a pivot value of 8. These are then recursively processed to complete the sort, if sorting. Of course, if you're (quick)selecting, you only process the part that contains the index you seek.
The "left pointer is swapped with the pivot" step does not apply here. You should only do that if you use the variation where you move the pivot to the front or the back, if it's not already there, and then exclude the pivot index from the partitioning process. Only if you did that would the pivot value need to be moved to where the two partitions meet.
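For illustration, the variation in which the pivot sits at the front and is excluded from the scan could be sketched in Python roughly as below. The function names are made up for this example, and `k` is assumed to be a valid zero-based index. Note that in this sketch the pivot is finally swapped with the position of the right pointer, not the left one, because once the pointers have crossed the right pointer rests on the last element that is no greater than the pivot.

```python
def partition_pivot_first(a, lo, hi):
    """Partition a[lo..hi] in place around a[lo]; return the pivot's final index."""
    pivot = a[lo]
    i, j = lo + 1, hi  # the pivot index itself is excluded from the scan
    while True:
        # Left pointer stops at an element greater than the pivot.
        while i <= j and a[i] <= pivot:
            i += 1
        # Right pointer stops at an element smaller than the pivot.
        while i <= j and a[j] >= pivot:
            j -= 1
        if i > j:  # pointers have crossed: no more swaps
            break
        a[i], a[j] = a[j], a[i]
    # Move the pivot to where the two partitions meet.
    a[lo], a[j] = a[j], a[lo]
    return j


def quickselect(values, k):
    """Return the k-th smallest element (k is zero-based, 0 <= k < len(values))."""
    a = list(values)  # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        p = partition_pivot_first(a, lo, hi)
        if k == p:
            return a[p]
        if k < p:
            hi = p - 1  # only the left part can contain index k
        else:
            lo = p + 1  # only the right part can contain index k


data = [8, 9, 7]
print(partition_pivot_first(data, 0, 2), data)
```

With the subarray from the question, the scan first produces 8, 7, 9 and the final swap then places the pivot between the two parts.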

Changing max node capacity in M-tree affects the results

Posting the code for the entire tree for this problem would be pointless (too long and chaotic), and I've tried to fix this problem for a while now, so I don't really want some concrete solution, but more like ideas as to why this might be happening. So:
I have a dataset of 1,000,000 coordinates and I insert them into the tree. I then do a range search, and for MaxCapacity=10 (or any number >= 10) I get the correct results. If I switch to MaxCapacity=4, the results are wrong. But if I shrink the dataset to about 20,000 coordinates, the results are correct again for MaxCapacity=4.
So to me this looks like an incorrect split algorithm that only shows for small MaxCapacity values and large datasets, where we have an enormous number of splits. But the algorithm checks out for almost everything, so I can't really find a mistake there. Any other ideas? The tree is written in Scala. The promotion policy promotes the two points that are furthest away from each other, and for the split policy we iterate through the entries of the overflown node and put each entry into the group of the promoted point it is closer to.
I don't know if anyone will be interested in this, but I found the cause. I thought the problem was in the split, but I was wrong. The problem was in the recursive insert algorithm, in how I chose which node to descend into next in order to place the entry. I was choosing this node by calculating the distance between each node's center and the entry's point, and picking the node with the minimum distance.
This works fine if the entry happens to lie inside the radius of one or more nodes; in that case the minimum distance works as intended. But what if the entry doesn't lie inside any node's radius? In that case we have to expand a radius to contain the entry, so we need to find the node whose radius would expand the least if the entry were added to its children. A node's distance from the entry point might be the minimum while the expansion it needs is catastrophically big. I had not considered this case, and as a result entries were placed in the wrong nodes, causing huge expansions and, in turn, huge overlaps. When I implemented this case, the problem was fixed!
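A minimal Python sketch of the selection rule described above follows (the original tree is in Scala and its code is not shown, so the `Node` class and `choose_subtree` function here are hypothetical names, and Euclidean distance is an assumption). It prefers a node that already covers the point and otherwise picks the node whose radius would need the smallest enlargement.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical routing entry: a covering ball with a center and a radius.
    center: tuple
    radius: float


def choose_subtree(nodes, point):
    """Pick the node to descend into when inserting `point`."""
    with_dist = [(math.dist(n.center, point), n) for n in nodes]
    # Case 1: some node already covers the point; take the closest center.
    covering = [(d, n) for d, n in with_dist if d <= n.radius]
    if covering:
        return min(covering, key=lambda pair: pair[0])[1]
    # Case 2: no node covers the point; minimise the radius enlargement
    # (distance minus current radius), not the raw distance.
    return min(with_dist, key=lambda pair: pair[0] - pair[1].radius)[1]


a = Node(center=(0.0, 0.0), radius=1.0)   # distance 4, enlargement 3
b = Node(center=(10.0, 0.0), radius=5.0)  # distance 6, enlargement 1
chosen = choose_subtree([a, b], (4.0, 0.0))
print(chosen is b)
```

In this example the center of `a` is nearer to the point, yet `b` is chosen because it needs the smaller enlargement, which is the case the minimum-distance rule gets wrong.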

Quicksort, given 3 values, how can I get to 9 operations?

I want to use quicksort on 3 given values, no matter which values. How can I get to the worst case, which is 9 operations?
Can anyone draw a tree and show how it gives n log n and n^2 operations? I've tried searching the internet, but I still didn't manage to draw one properly to show that.
The worst-case complexity of quicksort depends on the chosen pivot. If the pivot chosen is the leftmost or the rightmost element, then the worst case occurs in the following situations:
1) Array is already sorted in same order.
2) Array is already sorted in reverse order.
3) All elements are the same (a special case of cases 1 and 2).
Since these cases occur frequently in practice, the pivot is often chosen randomly, which reduces the chance of hitting the worst case.
The analysis of the quicksort algorithm is explained in this blog post by Khan Academy.
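To make the three cases concrete, here is a small Python sketch that counts element-versus-pivot comparisons for a simple, not in-place quicksort that always takes the leftmost element as the pivot. The function name and the counting convention (one comparison per non-pivot element per partition step) are assumptions of this example. Under that convention the worst case on n values costs n(n-1)/2 comparisons, so n^2 describes the growth rate rather than the exact count: for 3 values this sketch reports 3, not 9.

```python
def quicksort_count(values):
    """Return (sorted list, number of element-vs-pivot comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    pivot, rest = values[0], values[1:]  # leftmost element is the pivot
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quicksort_count(smaller)
    right, right_count = quicksort_count(larger)
    # Each element of `rest` is compared against the pivot once.
    return left + [pivot] + right, len(rest) + left_count + right_count


print(quicksort_count([1, 2, 3]))              # already sorted
print(quicksort_count([3, 2, 1]))              # reverse sorted
print(quicksort_count([7, 7, 7]))              # all equal
print(quicksort_count([4, 2, 6, 1, 3, 5, 7]))  # pivots split evenly
print(quicksort_count(list(range(1, 8))))      # same 7 values, sorted
```

The recursion tree for a sorted input is a single chain of depth n, because one side of every partition is empty, while evenly splitting pivots give a tree of depth about log n.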

MATLAB: Dividing Items using a For-loop

I needed some help with a problem I'd been assigned in class. It's our introduction to for loops. Here is the problem:
Consider the following riddle.
This is all I have so far:
function pile = IslandBananas(numpeople, numbears)
for pilesize=1:10000000
end
I would really appreciate your input. Thank you!
I will help you, but you need to try harder than that. And also, you only need one for loop. First, think about how you would construct this algorithm. Well you know you have to use a for loop so that is a start. So let's think about what is going on in the problem.
1) You have a pile.
2) First night someone takes the pile and divides it into 3 and finds that one is left over, this means mod(pile,3) = 1.
3) But he discards the extra banana. This means (pile-1).
4) He takes a third of it, leaving two-thirds left. This means (2/3)*(pile-1).
5) In the morning they take the pile and divide it into 3 and find again that one is left over, so this means mod((2/3)*(pile-1),3) = 1.
6) But they discard the extra banana. This means (2/3)*(pile-1)-1.
7) Finally, they have to each have at least one banana if it is to be the smallest pile possible. Thus, the smallest pile must be such that (1/3)*((2/3)*(pile-1)-1) = 1.
I have essentially given you the answer, the rest you can write with the formula (1/3)*((2/3)*(pile-1)-1) and a simple if statement to test for the smallest possible integer which is 1. This can be done in four lines inside of your for loop.
Now, expanding this to any number of people and any number of bears requires two simple substitutions in that formula! If your teacher demands it, this can easily be split into two nested for loops.
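As a rough check of steps 1) to 7), the three-way case can be sketched in Python (not MATLAB) as below. The riddle text itself is not reproduced above, so this follows only the steps as listed; the function name and the search limit are assumptions, and the generalisation to `numpeople` and `numbears` is deliberately left out.

```python
def smallest_pile(limit=10_000):
    """Smallest pile satisfying steps 1-7 for the three-way split, or None."""
    for pile in range(1, limit + 1):
        # Steps 2-3: dividing by 3 leaves one banana, which is discarded.
        if pile % 3 != 1:
            continue
        # Step 4: a third is taken, two-thirds remain (exact, since pile-1 divides by 3).
        remaining = 2 * (pile - 1) // 3
        # Steps 5-6: in the morning, dividing by 3 again leaves one to discard.
        if remaining % 3 != 1:
            continue
        # Step 7: each share must be at least one banana.
        share = (remaining - 1) // 3
        if share >= 1:
            return pile
    return None


print(smallest_pile())
```

Working in whole numbers with `%` and `//` avoids the rounding issues that can arise from testing `(2/3)*(pile-1)` with floating-point arithmetic.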

What element of the array would be the median if the size of the array was even and not odd?

I read that it's possible to make quicksort run at O(nlogn)
the algorithm says on each step choose the median as a pivot
but, suppose we have this array:
10 8 39 2 9 20
which value will be the median?
In math if I remember correct the median is (39+2)/2 = 41/2 = 20.5
I don't have a 20.5 in my array though
thanks in advance
You can choose either of them; if you consider the input as a limit, it does not matter as it scales up.
We're talking about the exact wording of the description of an algorithm here, and I don't have the text you're referring to. But I think in context by "median" they probably meant, not the mathematical median of the values in the list, but rather the middle point in the list, i.e. the median INDEX, which in this case would be 3 or 4. As coffNjava says, you can take either one.
The median is actually found by sorting the array first, so in your example, the median is found by arranging the numbers as 2 8 9 10 20 39 and the median would be the mean of the two middle elements, (9+10)/2 = 9.5, which doesn't help you at all. Using the median is sort of an ideal situation, but would work if the array were at least already partially sorted, I think.
With an even-length array, there is no single middle element to use as the pivot, so I believe you can use either of the two middle numbers. It'll throw off the efficiency a bit, but not substantially unless you always ended up sorting even-length arrays.
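As a quick illustration using Python's standard `statistics` module, the mathematical median of the example array is the mean of the two middle values, while `median_low` and `median_high` return the two actual elements that either could serve as a pivot:

```python
import statistics

data = [10, 8, 39, 2, 9, 20]

print(sorted(data))                   # [2, 8, 9, 10, 20, 39]
print(statistics.median(data))        # mean of the two middle values
print(statistics.median_low(data))    # lower of the two middle values
print(statistics.median_high(data))   # higher of the two middle values
```

Only the last two results are guaranteed to be members of the array, which is what a pivot needs to be.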
Finding the median of an unsorted set of numbers can be done in O(N) time, but it's not really necessary to find the true median for the purposes of quicksort's pivot. You just need to find a pivot that's reasonable.
As the Wikipedia entry for quicksort says:
In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by R. Sedgewick).
Finding the median of three values is much easier than finding it for the whole collection of values, and for collections that have an even number of elements, it doesn't really matter which of the two 'middle' elements you choose as the potential pivot.
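A minimal Python sketch of the median-of-three choice could look like this. The function name is made up for the example, and it returns an index into the partition so the pivot is always an existing element:

```python
def median_of_three_index(a, lo, hi):
    """Return the index (lo, mid or hi) holding the median of those three values."""
    mid = lo + (hi - lo) // 2  # for an even-length range this is the lower middle
    # Order the three candidate indices by the values they point at.
    candidates = sorted([lo, mid, hi], key=lambda i: a[i])
    return candidates[1]


data = [10, 8, 39, 2, 9, 20]
p = median_of_three_index(data, 0, len(data) - 1)
print(p, data[p])
```

For the array from the question, the three candidates are 10, 39 and 20, so the chosen pivot is 20.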

Quicksort pivot choice

I've read that the pivot can be the median of 3 numbers: bottom, middle and top. But could that generate overflow? What happens if the median returns a value larger than the array size?
I assume that this choice relies on the array values not being larger than the array size.
I think I'm confused about what a pivot really is.
The pivot is just the value that you compare other values against: lower values go to the left, higher values to the right. The pivot can be chosen by taking any of the existing values in the array. If the array is completely unsorted, it won't matter which value you choose. If it is somewhat sorted, you should choose a value from the middle of the array.
UPDATE: Some reading informs me that a better pivot choice may be to choose the median value of 3 values in the array (such as middle, bottom and top or 3 random positions). Some people advocate taking the median of 5 values. The worst-case performance of quicksort occurs when pivot is close to the smallest or largest value in the array, and this tactic is intended to defend against that occurring. This is just an optimisation for certain kinds of data - it is not a necessity.