Is there any formula to calculate the no of passes that a Quick Sort algorithm will take? - quicksort

While working with Quick Sort algorithm I wondered whether any formula or some kind of stuff might be available for finding the no of passes that a particular set of values may take to completely sorted in ascending order.
Is there any formula to calculate the no of passes that a Quick Sort algorithm will take?

Any given set of values will have a different number of operations, based on pivot value selection method, and the actual values being sorted.
So...no, unless the approximations of 'between O(N log(N)) and O(N^2)' is good enough.
That one has to qualify the average versus worst case should be enough to show that the only way to determine the number of operations is to actually run the quicksort.

Related

Given an array, find top 'x' in O(1)

I'm gonna write the problem as I found it and I will then explain what confuses me.
"A teacher is marking his students' work from 0-10 but he only marks with an 8 or above for a certain number 'x'(x=15 for example) of the 'n' students. You are given an array with all the students' marks in random order. Find the 'x' best marks in O(1)."
We certainly have been taught hashing but this requires me to store all the data in a hash table which is definitely not O(1). Maybe we don't have to take the 'conversion' into account? If we do , maybe the coversion combined with the search time after will lead to a method different than hashing.
In that case, leaving O(1) aside , what is the fastest algorithm including both the conversion and the search time?
Simple: It's not possible.
O(1) can only achieved if all of input size, number of necessary comparisons and output size are constants. You may argue that x could be treated as constant, but it still doesn't work:
You need to inspect every single input element, all n of them, as the random input order does not even allow any heuristics to guess where the xth element would be, even if you already had correctly guessed the other x-1 elements already in constant time.
As the problem is stated, there is no solution which can do it in the upper bounds of O(1) or O(x).
Let's just assume your instructor corrects his mistake, and gives you a revised version which correctly states O(n) as the required upper bound.
In that case your hash approach is (almost) correct. The catch of using a hash function, is that you now need to account for potential collisions on the hash function, which are the reason why hash maps don't work strictly in O(1), but rather only "on average" in O(1).
As you know all possible values (grades from 0-10), you can just allocate buckets with a known index. Inside each bucket you may use linked lists, as they also allow constant time insertions and linear time iteration.

Is QuickSort really the fastet sorting technique

Hello all this is my very first question here. I am new to datastructure and algorithms my teacher asked me to compare time complexity of different algorithms including: merge sort, heap sort, insertion sort, and quick sort. I search over internet and find out that quick sort is the fastest of all but my version of quick sort is the slowest of all (it sort 100 random integers in almost 1 second while my other sorting algorithms took almost 0 second). I tweak my quick sort logic many times (taking first value as pivot than tried to take middle value as pivot but in vain) I finally search the code over internet and there was not much difference in my code and code on internet. Now I really am confused that if this is behaviour of quick sort is natural (I mean whatever your logic is you are gonna get same results.) or there are some specific situations where you should use quick sort. In the end I know my question is not clear (I don't know how to ask besides my english is also not very good.) I hope someone can help me I really wanted to attach picture of awkward result I am having but I can't (reputation < 10).
Theoretically, quicksort is supposed to be the fastest algorithm for sorting, with a runtime of O(nlogn). It's worst case would be O(n^2), but only occurs if there are repeated values are equal to the pivot.
In your situation, I can only assume that your pivot value is not ideal in your data array, but is still able to sort the values using that pivot. Otherwise, your quicksort implementation is unfortunately incorrect.
Quicksort has O(n^2) worst-case runtime and O(nlogn) average case runtime. A good reason why Quicksort is so fast in practice compared to most other O(nlogn) algorithms such as Heapsort, is because it is relatively cache-efficient. Its running time is actually O(n/Blog(n/B)), where B is the block size. Heapsort, on the other hand, doesn't have any such speedup: it's not at all accessing memory cache-efficiently.
The value you choose as pivot may not be appropriate hence your sorting may be taking some time.You can avoid quicksort’s worst-case run time of O(n^2) almost entirely by using an appropriate choice of the pivot – such as picking it at random.
Also , the best and worst case often are extremes rarely occurring in practice.But any average case analysis assume some distribution of inputs. For sorting, the typical choice is the random permutation model (as assumed on Wikipedia).

Efficient Function to Map (or Hash) Integers and Integer Ranges into Index

We are looking for the computationally simplest function that will enable an indexed look-up of a function to be determined by a high frequency input stream of widely distributed integers and ranges of integers.
It is OK if the hash/map function selection itself varies based on the specific integer and range requirements, and the performance associated with the part of the code that selects this algorithm is not critical. The number of integers/ranges of interest in most cases will be small (zero to a few thousand). The performance critical portion is in processing the incoming stream and selecting the appropriate function.
As a simple example, please consider the following pseudo-code:
switch (highFrequencyIntegerStream)
case(2) : func1();
case(3) : func2();
case(8) : func3();
case(33-122) : func4();
...
case(10,000) : func40();
In a typical example, there would be only a few thousand of the "cases" shown above, which could include a full range of 32-bit integer values and ranges. (In the pseudo code above 33-122 represents all integers from 33 to 122.) There will be a large number of objects containing these "switch statements."
(Note that the actual implementation will not include switch statements. It will instead be a jump table (which is an array of function pointers) or maybe a combination of the Command and Observer patterns, etc. The implementation details are tangential to the request, but provided to help with visualization.)
Many of the objects will contain "switch statements" with only a few entries. The values of interest are subject to real time change, but performance associated with managing these changes is not critical. Hash/map algorithms can be re-generated slowly with each update based on the specific integers and ranges of interest (for a given object at a given time).
We have searched around the internet, looking at Bloom filters, various hash functions listed on Wikipedia's "hash function" page and elsewhere, quite a few Stack Overflow questions, abstract algebra (mostly Galois theory which is attractive for its computationally simple operands), various ciphers, etc., but have not found a solution that appears to be targeted to this problem. (We could not even find a hash or map function that considered these types of ranges as inputs, much less a highly efficient one. Perhaps we are not looking in the right places or using the correct vernacular.)
The current plan is to create a custom algorithm that preprocesses the list of interesting integers and ranges (for a given object at a given time) looking for shifts and masks that can be applied to input stream to help delineate the ranges. Note that most of the incoming integers will be uninteresting, and it is of critical importance to make a very quick decision for as large a percentage of that portion of the stream as possible (which is why Bloom filters looked interesting at first (before we starting thinking that their implementation required more computational complexity than other solutions)).
Because the first decision is so important, we are also considering having multiple tables, the first of which would be inverse masks (masks to select uninteresting numbers) for the easy to find large ranges of data not included in a given "switch statement", to be followed by subsequent tables that would expand the smaller ranges. We are thinking this will, for most cases of input streams, yield something quite a bit faster than a binary search on the bounds of the ranges.
Note that the input stream can be considered to be randomly distributed.
There is a pretty extensive theory of minimal perfect hash functions that I think will meet your requirement. The idea of a minimal perfect hash is that a set of distinct inputs is mapped to a dense set of integers in 1-1 fashion. In your case a set of N 32-bit integers and ranges would each be mapped to a unique integer in a range of size a small multiple of N. Gnu has a perfect hash function generator called gperf that is meant for strings but might possibly work on your data. I'd definitely give it a try. Just add a length byte so that integers are 5 byte strings and ranges are 9 bytes. There are some formal references on the Wikipedia page. A literature search in ACM and IEEE literature will certainly turn up more.
I just ran across this library I had not seen before.
Addition
I see now that you are trying to map all integers in the ranges to the same function value. As I said in the comment, this is not very compatible with hashing because hash functions deliberately try to "erase" the magnitude information in a bit's position so that values with similar magnitude are unlikely to map to the same hash value.
Consequently, I think that you will not do better than an optimal binary search tree, or equivalently a code generator that produces an optimal "tree" of "if else" statements.
If we wanted to construct a function of the type you are asking for, we could try using real numbers where individual domain values map to consecutive integers in the co-domain and ranges map to unit intervals in the co-domain. So a simple floor operation will give you the jump table indices you're looking for.
In the example you provided you'd have the following mapping:
2 -> 0.0
3 -> 1.0
8 -> 2.0
33 -> 3.0
122 -> 3.99999
...
10000 -> 42.0 (for example)
The trick is to find a monotonically increasing polynomial that interpolates these points. This is certainly possible, but with thousands of points I'm certain you'ed end up with something much slower to evaluate than the optimal search would be.
Perhaps our thoughts on hashing integers can help a little bit. You will also find there a hashing library (hashlib.zip) based on Bob Jenkins' work which deals with integer numbers in a smart way.
I would propose to deal with larger ranges after the single cases have been rejected by the hashing mechanism.

algorithm to compare numbers within a certain distance from each other

So I have an array of numbers that look something like
1,708,234
2,802,532
11,083,432
5,098,123
5,777,111
I want to find out when two numbers are within a certain distance from each other (say 1,500,000) so I can group them into the same location and have just one UI element represent both for the level of zoom I am looking at. How would one go about doing this smartly or efficiently. I'm thinking I would just start with the first entry, loop through all the elements, and if one was close to another, flag those two and put it in a dictionary of some sort. That would be my brute force method, but I'm thinking there has to be a better way.
I'm coding in obj-c btw if that makes or breaks any design decisions.
How many numbers are we dealing with here? If it's small enough:
Sort the numbers (generally n-log-n)
Run through each number, n, and compare its bigger neighbor, n+1, to see if it's within your range.
Repeat for n+2, n+3, until the number is no longer within your range.
Your brute force method there is O((n/2)^2). This method will bring it to O(n + n log(n)), or O(n log n) on the average case.

What element of the array would be the median if the the size of the array was even and not odd?

I read that it's possible to make quicksort run at O(nlogn)
the algorithm says on each step choose the median as a pivot
but, suppose we have this array:
10 8 39 2 9 20
which value will be the median?
In math if I remember correct the median is (39+2)/2 = 41/2 = 20.5
I don't have a 20.5 in my array though
thanks in advance
You can choose either of them; if you consider the input as a limit, it does not matter as it scales up.
We're talking about the exact wording of the description of an algorithm here, and I don't have the text you're referring to. But I think in context by "median" they probably meant, not the mathematical median of the values in the list, but rather the middle point in the list, i.e. the median INDEX, which in this cade would be 3 or 4. As coffNjava says, you can take either one.
The median is actually found by sorting the array first, so in your example, the median is found by arranging the numbers as 2 8 9 10 20 39 and the median would be the mean of the two middle elements, (9+10)/2 = 9.5, which doesn't help you at all. Using the median is sort of an ideal situation, but would work if the array were at least already partially sorted, I think.
With an even numbered array, you can't find an exact pivot point, so I believe you can use either of the middle numbers. It'll throw off the efficiency a bit, but not substantially unless you always ended up sorting even arrays.
Finding the median of an unsorted set of numbers can be done in O(N) time, but it's not really necessary to find the true median for the purposes of quicksort's pivot. You just need to find a pivot that's reasonable.
As the Wikipedia entry for quicksort says:
In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by R. Sedgewick).
Finding the median of three values is much easier than finding it for the whole collection of values, and for collections that have an even number of elements, it doesn't really matter which of the two 'middle' elements you choose as the potential pivot.