Is QuickSort really the fastest sorting technique?

Hello all, this is my very first question here. I am new to data structures and algorithms. My teacher asked me to compare the time complexity of different algorithms, including merge sort, heap sort, insertion sort, and quick sort. I searched the internet and found that quick sort is said to be the fastest of all, but my version of quick sort is the slowest of all (it sorts 100 random integers in almost 1 second, while my other sorting algorithms take almost 0 seconds). I tweaked my quick sort logic many times (taking the first value as the pivot, then trying the middle value as the pivot, but in vain). I finally looked up the code on the internet, and there was not much difference between my code and the code online. Now I am really confused: is this behaviour of quick sort natural (I mean, whatever your logic is, will you get the same results?), or are there specific situations where you should use quick sort? I know my question is not clear (I don't know how to ask it better, and my English is also not very good). I hope someone can help me. I really wanted to attach a picture of the awkward result I am getting, but I can't (reputation < 10).

In practice, quicksort is usually the fastest of the standard comparison sorts, with an average-case runtime of O(n log n). Its worst case is O(n^2), but that only occurs when the chosen pivots consistently split the array very unevenly, for example when the input is already sorted and the first element is always picked as the pivot, or when many values equal to the pivot meet a naive partition scheme.
In your situation, I can only assume that your pivot choice is a poor fit for your data, even though it still sorts the values correctly. Otherwise, your quicksort implementation is unfortunately incorrect.

Quicksort has O(n^2) worst-case runtime and O(n log n) average-case runtime. A good reason why quicksort is so fast in practice compared to most other O(n log n) algorithms, such as heapsort, is that it is relatively cache-efficient: its cache complexity is O((n/B) log(n/B)) block transfers, where B is the block size. Heapsort, on the other hand, gets no such speedup: it does not access memory in a cache-friendly way at all.
The value you choose as the pivot may not be appropriate for your data, which is likely why your sort is taking so long. You can avoid quicksort's worst-case running time of O(n^2) almost entirely with an appropriate choice of pivot, such as picking it at random.
Also, the best and worst cases are extremes that rarely occur in practice, and any average-case analysis assumes some distribution of inputs. For sorting, the typical choice is the random permutation model (as assumed on Wikipedia).
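To illustrate the randomized-pivot suggestion above, here is a minimal Java sketch of quicksort with a randomly chosen pivot and a Lomuto partition; the class and method names are just for the example, not your code:

```java
import java.util.Random;

public class RandomPivotQuickSort {
    private static final Random RNG = new Random();

    public static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        // Pick a random pivot so no fixed input can consistently force bad splits.
        int pivotIndex = lo + RNG.nextInt(hi - lo + 1);
        swap(a, pivotIndex, hi);              // move pivot to the end (Lomuto scheme)
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);              // sort elements smaller than the pivot
        quickSort(a, p + 1, hi);              // sort elements larger than the pivot
    }

    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);                       // put the pivot into its final position
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Called as quickSort(arr, 0, arr.length - 1), this keeps the expected running time at O(n log n) regardless of how the input happens to be ordered.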

Related

Hash table O(1) amortized or O(1) average amortized?

This question may seem a bit pedantic, but I've been really trying to dive deeper into amortized analysis and am a bit confused as to why insert for a hash table is O(1) amortized. (Note: I'm not talking about table doubling; I understand that.)
Using this definition, "Amortized analysis gives the average performance (over time) of each operation in the worst case", it seems like the worst case for N inserts into a hash table would be a collision on every operation. I believe universal hashing guarantees collisions at a rate of 1/m when the load factor is kept low, but isn't it still theoretically possible to get a collision on every insert?
It seems like, technically, the right statement is that a hash table's insert is O(1) average amortized.
Edit: You can assume the hash table uses basic chaining, where the element is placed at the end of the corresponding linked list. The real meat of my question is about amortized analysis of probabilistic algorithms.
Edit 2:
I found this post on quicksort:
"Also there's a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost Θ(n^2) dollars, but the probability that this will happen approaches zero as n grows large." I think this probably answers my question.
You could theoretically get a collision on every insert, but that would mean you had a poorly performing hash function that failed to spread values across the "buckets" for keys. A theoretically perfect hash function would always put a new value into a new bucket, so that each key would refer to its own bucket. (I am assuming a chained hash table and referring to the chain field as a "bucket", just how I was taught.) A theoretically worst-case function would stick all keys into the same bucket, leading to a chain of length N in that bucket.
The idea behind the amortization is that, given a reasonably good hash function, you should end up with linear total time for a sequence of inserts, i.e. constant amortized time per insert, because the number of times an insertion costs more than O(1) is greatly dwarfed by the number of times it is simple and O(1). That is not to say that insertion is free of computation (the hash function still has to be calculated, and in some special cases hash functions can be heavier than just looking through a list).
At the end of the day this brings us to an important concept in big-O, which is that when calculating time complexity you need to look at the most frequently executed action. In this case that is the insertion of a value that does not collide with another hash.
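As a concrete reference point, here is a minimal Java sketch of the chaining scheme being discussed; the class and method names are invented for the example, and table resizing is deliberately left out, as in the question. The frequent case is the constant-time hash, scan of a short chain, and append; the rare expensive case is scanning a long chain of colliding keys:

```java
import java.util.LinkedList;

public class ChainedHashTable<K, V> {
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Expected O(1) when the hash spreads keys evenly and the load factor is low;
    // worst case O(chain length) if many keys collide into the same bucket.
    public void insert(K key, V value) {
        int index = Math.floorMod(key.hashCode(), buckets.length);
        LinkedList<Entry<K, V>> chain = buckets[index];
        for (Entry<K, V> e : chain) {          // scan the chain for an existing key
            if (e.key.equals(key)) {
                e.value = value;               // update in place on a duplicate key
                return;
            }
        }
        chain.add(new Entry<>(key, value));    // otherwise append to the end of the chain
    }

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }
}
```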

The best threshold for a Mergesort to switch to an insertion sort at?

I've been working on a piece of Java code to determine the best threshold for a mergesort to switch to insertion sort at, and my results have been less than satisfactory.
The tests I'm running take nearly an hour and produce data that doesn't really show any particular pattern to me. So I'm hoping to ask what I should expect for the best threshold. Should it be constant? Should it be N/(some number)? Is it constant after a certain N value? Roughly what would you expect?
(If it matters, I am comparing Integer objects in Java.)
It depends somewhat on your actual hardware.
The best approach is to benchmark on your target hardware.
It's usually between 10 and 50, but test between 10 and 100.
In implementations I worked on some time ago, the threshold was 22 items.
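For concreteness, a rough Java sketch of the kind of hybrid being benchmarked is shown below; the cutoff constant is the thing you are tuning (22 here simply echoes the value mentioned above, not a universal answer):

```java
public class HybridMergeSort {
    // Sublists at or below this size are handled by insertion sort; tune on your hardware.
    private static final int CUTOFF = 22;

    public static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {
            insertionSort(a, lo, hi);          // small sublists: insertion sort wins
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)              a[k] = aux[j++];
            else if (j > hi)          a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }
}
```

Benchmark by calling sort(a, new int[a.length], 0, a.length - 1) while varying CUTOFF on your target hardware.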

Improve hashing using genetic programming/algorithms

I'm writing a program to significantly reduce the number of collisions that occur when using hash functions like 'key mod table_size'. For this I would like to use genetic programming/algorithms, but I don't know much about them. Even after reading many articles and examples, I don't know what, in my case (i.e., in the program definition), the fitness function and target would be (the target is usually the required result), what would serve as the population/individuals and parents, etc.
Please help me identify the above, and with a few code/pseudo-code snippets if possible, as this is my project.
It's not necessary to use genetic programming/algorithms; anything using evolutionary programming/algorithms would work.
Thanks.
My advice would be: don't do this that way. The literature on hash functions is vast and we more or less understand what makes a good hash function. We know enough mathematics not to look for them blindly.
If you need a hash function to use, there is plenty to choose from.
However, if this is your uni project and you cannot possibly change the subject or steer it in a more manageable direction, then, as you noticed, there will be the complex issue of getting the fitness function and mutation operators right. As far as I can tell off the top of my head, there are no obvious candidates.
You may look up e.g. 'strict avalanche criterion' and try to see if you can reason about it in terms of fitness and mutations.
Another question is how you want to represent your function. Just a boolean expression? Something built from word operations like AND, XOR, NOT, ROT?
Depending on your constraints (or rather, assumptions), the question of fitness and mutation will be different.
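To make the representation question concrete, one individual in the search space might be a small, fixed pipeline of word operations whose constants and ordering the algorithm mutates. The Java sketch below is purely illustrative; the constants are arbitrary placeholders, not a recommended hash:

```java
public class CandidateHash {
    // One arbitrary individual: a mix of rotate, XOR, multiply and shift steps.
    // A genetic algorithm would mutate the constants and the order of operations.
    static int candidate(int key) {
        int h = key;
        h ^= Integer.rotateLeft(h, 13);   // ROT + XOR step
        h *= 0x5bd1e995;                  // multiply by an odd constant
        h ^= (h >>> 15);                  // mix high bits into the low bits
        return h;
    }
}
```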
Broadly, fitness is clearly to minimize the number of collisions in your 'hash modulo table-size' model. The obvious approach is to take a suitably large and (very important) representative distribution of keys and chuck them through your candidate function, then pass the results through 'hash modulo table-size' for one or more values of table-size and evaluate some measure of the 'niceness' of the resulting distribution(s). So what it boils down to is which table sizes to try and which niceness measure to apply.
Niceness is context-dependent. You might measure 'fullest bucket' as a measure of 'worst case' insert/find time, or the sum of squares of the bucket sizes as a measure of 'average' insert/find time, assuming look-ups are uniformly distributed amongst the keys.
Finally, you would need to decide what table size (or sizes) to test at. Conventional wisdom often uses primes, because hash modulo a prime tends to be nicely sensitive to all the bits of the hash, whereas something like hash modulo 2^n only involves the lower n bits. To keep computation down, you might consider the series of the next prime larger than each power of two: 5 (> 2^2), 11 (> 2^3), 17 (> 2^4), etc., up to and including the first power of 2 greater than your 'sample' size.
There are other ways of considering fitness, but without a practical application the question is (of course) ill-defined. If your 'space' of potential hash functions doesn't all have the same execution time, you should also factor in 'cost': it's fairly easy to define very good hash functions, but execution time can be a significant factor.
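A fitness evaluation along those lines might look roughly like the Java sketch below: hash a representative sample of keys with a candidate function, bucket them modulo one or more table sizes, and score each distribution by the sum of squared bucket sizes (lower is better). The method names and the choice of measure are illustrative assumptions, not a prescription:

```java
import java.util.function.IntUnaryOperator;

public class HashFitness {
    // Score one candidate hash over a sample of keys for one table size.
    // Sum of squared bucket sizes: a perfectly uniform spread minimizes it,
    // a single overloaded bucket maximizes it.
    static long sumOfSquaredBucketSizes(IntUnaryOperator candidate, int[] sampleKeys, int tableSize) {
        int[] bucketCounts = new int[tableSize];
        for (int key : sampleKeys) {
            int h = candidate.applyAsInt(key);                 // candidate under test
            bucketCounts[Math.floorMod(h, tableSize)]++;
        }
        long score = 0;
        for (int count : bucketCounts) {
            score += (long) count * count;
        }
        return score;
    }

    // Aggregate over several table sizes, e.g. primes just above powers of two.
    static long totalFitness(IntUnaryOperator candidate, int[] sampleKeys, int[] tableSizes) {
        long total = 0;
        for (int size : tableSizes) {
            total += sumOfSquaredBucketSizes(candidate, sampleKeys, size);
        }
        return total;
    }
}
```

You could evaluate totalFitness with table sizes such as {5, 11, 17, 37} and let the genetic algorithm minimize the returned score.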

Is there any formula to calculate the number of passes that a Quick Sort algorithm will take?

While working with the Quick Sort algorithm, I wondered whether any formula or similar might be available for finding the number of passes that a particular set of values will take to be completely sorted in ascending order.
Is there any formula to calculate the number of passes that a Quick Sort algorithm will take?
Any given set of values will take a different number of operations, depending on the pivot selection method and on the actual values being sorted.
So... no, unless the approximation 'between O(N log N) and O(N^2)' is good enough.
The fact that one has to qualify average versus worst case should be enough to show that the only way to determine the number of operations is to actually run the quicksort.
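If you need the exact count for a specific input, the practical approach is to instrument the sort itself and count as it runs. A rough Java sketch, counting one 'pass' per partition call (only one possible definition of a pass) and assuming a last-element pivot:

```java
public class QuickSortPassCounter {
    private long passes = 0;               // one "pass" counted per partition call

    public long countPasses(int[] a) {
        passes = 0;
        quickSort(a, 0, a.length - 1);
        return passes;
    }

    private void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        passes++;                          // this partition step is one pass
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    private int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];                 // last-element pivot; other choices change the count
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}
```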

Time complexity of QuickSort+Insertion sort hybrid algorithm?

I am implementing an algorithm that performs quicksort with leftmost pivot selection up to a certain limit, and when the sublists become almost sorted, I use insertion sort to sort those elements.
For leftmost pivot selection, I know the average-case complexity of quicksort is O(n log n) and the worst-case complexity, i.e. when the list is almost sorted, is O(n^2). On the other hand, insertion sort is very efficient on an almost-sorted list of elements, with a complexity of O(n).
So I think the complexity of this hybrid algorithm should be O(n). Am I correct?
The most important thing for the performance of quicksort is picking a good pivot. This means choosing an element that is as close to the median of the elements you're sorting as possible.
The worst case of O(n^2) in quicksort comes about from consistently choosing 'bad' pivots for every partition pass. This causes the partitions to be extremely lopsided rather than balanced, e.g. a 1 : n-1 element partition ratio.
I don't see how adding insertion sort into the mix as you've described would help or mitigate this problem: with leftmost pivot selection on an (almost) sorted input, the partitioning work is still O(n^2), so the hybrid is not O(n).
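For reference, the hybrid being discussed would look roughly like the Java sketch below (leftmost pivot, switching to insertion sort below a small, assumed cutoff). On already-sorted input the partitioning work is quadratic long before insertion sort ever runs, which is the problem described above:

```java
public class HybridQuickSort {
    // Below this sublist size, fall back to insertion sort (assumed value for illustration).
    private static final int CUTOFF = 16;

    public static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {
            insertionSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);      // leftmost element as pivot
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // Leftmost pivot: on already-sorted input every partition splits 0 : n-1,
    // so the recursion still does O(n^2) work before insertion sort ever runs.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;
        return i;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int k = lo + 1; k <= hi; k++) {
            int key = a[k];
            int j = k - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}
```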