Time complexity of QuickSort+Insertion sort hybrid algorithm? - quicksort

I am implementing an algorithm that performs quicksort with leftmost pivot selection up to a certain limit and, once the list becomes almost sorted, switches to insertion sort to finish sorting the elements.
With leftmost pivot selection, I know that the average-case complexity of quicksort is O(n log n) and that the worst-case complexity, i.e. when the list is almost sorted, is O(n^2). On the other hand, insertion sort is very efficient on an almost sorted list, with a complexity of O(n).
So I think the complexity of this hybrid algorithm should be O(n). Am I correct?

The most important factor in the performance of quicksort is picking a good pivot. This means choosing an element that is as close as possible to the median of the elements you're sorting.
The worst case of O(n^2) in quicksort comes from consistently choosing 'bad' pivots on every partition pass. This causes the partitions to be extremely lopsided rather than balanced, e.g. a 1 : n-1 element partition ratio.
I don't see how adding insertion sort into the mix as you've described would help or mitigate this problem.
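To make the point concrete, here is a minimal sketch of such a hybrid, assuming a Lomuto-style partition with the leftmost element as pivot and a hypothetical `cutoff` of 16 below which insertion sort takes over. The function name and the comparison counter are my own additions for illustration, not the asker's code.

```python
def hybrid_sort(items, cutoff=16):
    """Sort a copy of items; return (sorted_list, comparison_count)."""
    a = list(items)
    comparisons = 0

    def insertion_sort(lo, hi):
        nonlocal comparisons
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo:
                comparisons += 1
                if a[j] <= key:
                    break
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def partition(lo, hi):
        # Leftmost element is the pivot (the choice described in the question).
        nonlocal comparisons
        pivot = a[lo]
        boundary = lo
        for i in range(lo + 1, hi + 1):
            comparisons += 1
            if a[i] < pivot:
                boundary += 1
                a[boundary], a[i] = a[i], a[boundary]
        a[lo], a[boundary] = a[boundary], a[lo]
        return boundary

    def quicksort(lo, hi):
        # Partition until the range is small, then hand over to insertion sort.
        while hi - lo + 1 > cutoff:
            p = partition(lo, hi)
            # Recurse into the smaller side so the call stack stays shallow.
            if p - lo < hi - p:
                quicksort(lo, p - 1)
                lo = p + 1
            else:
                quicksort(p + 1, hi)
                hi = p - 1
        insertion_sort(lo, hi)

    quicksort(0, len(a) - 1)
    return a, comparisons
```

On already sorted input, each partition pass in this sketch peels off only the pivot, so the comparison count roughly quadruples when n doubles. The insertion sort only ever sees ranges of at most `cutoff` elements, so it cannot turn the whole algorithm into O(n).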

Related

quick sort is slower than merge sort

I think quicksort is less efficient when sorting an array with duplicate data, right? When the data type is char, the bigger the array (over 100000 elements), the closer the running time gets to the order of n^2.
And assuming there is no duplicate data, to get the best case of a quicksort where the first element is used as the pivot, I think we can take an already sorted array and recursively swap the first and middle elements, dividing it the way merge sort does. Right? Is there a general best case?
Lomuto partition scheme, which scans from one end to the other during partition, is slower with duplicates. If all the values are the same, then each partition step splits it into sizes 1 and n-1, a worst case scenario.
Hoare partition scheme, which scans from both ends towards each other until the indexes (or iterators or pointers) cross, is usually faster with duplicates. Even though duplicates result in more swaps, each swap occurs just after the two values have been read and compared to the pivot, so they are still in the cache for the swap (assuming the object size is not huge). As the number of duplicates increases, the splitting improves towards the ideal case where each partition step splits the data into two equal halves. I ran a benchmark sorting 16 million 64-bit integers: with random data it took about 1.37 seconds, improving as the number of duplicates increased, and with all values the same it took about 0.288 seconds.
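The difference between the two schemes on duplicate-heavy data can be seen in a small sketch. The function names are mine, and the pivot choices (last element for Lomuto, middle element for Hoare) are assumptions picked because they are common textbook variants; this is not the benchmark code.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] around a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around the middle value.

    Returns j such that every element of a[lo..j] is <= every
    element of a[j+1..hi].
    """
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        # Scan right past elements smaller than the pivot.
        i += 1
        while a[i] < pivot:
            i += 1
        # Scan left past elements larger than the pivot.
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        # Both scans stop on elements equal to the pivot, so equal
        # values get swapped and the split point moves to the middle.
        a[i], a[j] = a[j], a[i]
```

With eight equal values, `lomuto_partition` returns index 0 (one side is empty and the other holds the remaining seven elements), while `hoare_partition` returns 3, which splits the range into two halves of four.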
Another alternative is a 3 way partition, which splits a partition into elements < pivot, elements == pivot, elements > pivot. If all the elements are the same, it's done in O(n) time. For n elements with only k possible values, then time complexity is O(n ⌈log3(k)⌉), and since k is constant, the time complexity is still O(n).
Wiki links:
https://en.wikipedia.org/wiki/Quicksort#Repeated_elements
https://en.wikipedia.org/wiki/Dutch_national_flag_problem
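A 3-way partition quicksort along the lines described above might look like the following sketch. The function name is hypothetical, and the random pivot is my assumption (any pivot rule works with the same partition loop).

```python
import random


def quicksort_3way(a, lo=0, hi=None):
    """Sort list a in place using a 3-way (Dutch national flag) partition."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Now a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
        # The middle block is finished; recurse into the smaller outer
        # block and loop on the larger one to keep the stack shallow.
        if lt - lo < hi - gt:
            quicksort_3way(a, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort_3way(a, gt + 1, hi)
            hi = lt - 1
```

If all elements are equal, the first partition pass puts everything in the middle block and the sort finishes after a single O(n) scan.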

Hash table O(1) amortized or O(1) average amortized?

This question may seem a bit pedantic, but I've been really trying to dive deeper into amortized analysis and am a bit confused as to why insert for a hash table is O(1) amortized. (Note: I'm not talking about table doubling; I understand that.)
Using this definition, "Amortized analysis gives the average performance (over time) of each operation in the worst case." It seems like the worst case for N inserts into a hash table would result in a collision for every operation. I believe universal hashing guarantees collisions at a rate of 1/m when the load factor is kept low, but isn't it still theoretically possible to get a collision for every insert?
It seems like technically the average amortized analysis for hashtable's insert is O(1).
Edit: You can assume the hashtable uses basic chaining where the element is placed at the end of the corresponding linked list. The real meat of my question refers to amortized analysis on probabilistic algorithms.
Edit 2:
I found this post on quicksort,
"Also there’s a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost (n^2) dollars, but the probability that this will happen approaches zero as n grows large." I think this probably answers my question.
You could theoretically get a collision on every insert, but that would mean you had a poorly performing hash function that failed to spread values across the "buckets" for keys. A theoretically perfect hash function would always put a new value into a new bucket, so that each key would refer to its own bucket. (I am assuming a chained hash table and referring to the chain field as a "bucket", which is just how I was taught.) A theoretically worst-case function would stick all keys into the same bucket, leading to a chain of length N in that bucket.
The idea behind the amortization is that, given a reasonably good hash function, you should end up with constant time per insert (linear total time for N inserts), because the number of times that an insertion costs more than O(1) is greatly dwarfed by the number of times that insertion is simple and O(1). That is not to say that insertion is without any calculation (the hash function still has to be calculated, and in some special cases hash functions can be more calculation-heavy than just looking through a list).
At the end of the day this brings us to an important concept in big-O, which is the idea that when calculating time complexity you need to look at the most frequently executed action. In this case that is the insertion of a value that does not collide with another hash.
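The two extremes can be compared with a small sketch of a chained table that appends at the end of the chain, as in the question's edit. The class, the `steps` counter, and the helper function are hypothetical names for illustration, and the table deliberately never resizes.

```python
class ChainedTable:
    """Fixed-size chained hash table that counts chain entries examined."""

    def __init__(self, num_buckets, hash_fn=hash):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn
        self.steps = 0  # chain entries examined across all inserts

    def insert(self, key, value):
        chain = self.buckets[self.hash_fn(key) % len(self.buckets)]
        # Walk the chain to the end, updating in place if the key exists.
        for entry in chain:
            self.steps += 1
            if entry[0] == key:
                entry[1] = value
                return
        chain.append([key, value])


def count_steps(n, num_buckets, hash_fn=hash):
    """Insert keys 0..n-1 and return the total number of chain steps."""
    table = ChainedTable(num_buckets, hash_fn)
    for key in range(n):
        table.insert(key, key)
    return table.steps
```

With a hash function that sends every key to bucket 0, for example `count_steps(1000, 2048, lambda k: 0)`, the i-th insert walks a chain of i earlier entries, so the total is n(n-1)/2 steps. With keys spread evenly over the buckets, the total stays small compared to that.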

What is the runtime for initializing a hash table with n elements?

Is it O(n) or O(n log n)? I have n elements that I need to set up in a hash table. What are the worst-case and average runtimes?
Worst case is unlimited. You need to calculate hash codes and may have to compare elements, and the time for that is not limited.
Assuming that calculating hashes and comparing elements take constant time, the worst case for inserting n elements is O(n^2). What saves you is the fact that the worst case would be exceedingly rare, assuming a halfway decent hash function. Average time for a decent implementation is O(n).

Is QuickSort really the fastest sorting technique

Hello all, this is my very first question here. I am new to data structures and algorithms, and my teacher asked me to compare the time complexity of different algorithms, including merge sort, heap sort, insertion sort, and quick sort. I searched the internet and found that quick sort is supposed to be the fastest of all, but my version of quick sort is the slowest of all (it sorts 100 random integers in almost 1 second, while my other sorting algorithms took almost 0 seconds). I tweaked my quick sort logic many times (taking the first value as pivot, then trying the middle value as pivot, but in vain). I finally searched for the code on the internet, and there was not much difference between my code and the code I found. Now I am really confused about whether this behaviour of quick sort is natural (I mean, whatever your logic is, you are going to get the same results) or whether there are specific situations where you should use quick sort. I know my question is not very clear (I don't know how else to ask, and my English is also not very good). I hope someone can help me. I really wanted to attach a picture of the awkward results I am getting, but I can't (reputation < 10).
Theoretically, quicksort is supposed to be the fastest algorithm for sorting, with a runtime of O(n log n). Its worst case is O(n^2), but that only occurs when the pivots are consistently poor, for example when many repeated values are equal to the pivot.
In your situation, I can only assume that your pivot value is not ideal for your data array, but that the values can still be sorted using that pivot. Otherwise, your quicksort implementation is unfortunately incorrect.
Quicksort has O(n^2) worst-case runtime and O(n log n) average-case runtime. One good reason why quicksort is so fast in practice compared to most other O(n log n) algorithms such as heapsort is that it is relatively cache-efficient. Counted in block transfers, its cost is O((n/B) log(n/B)), where B is the block size. Heapsort, on the other hand, gets no such speedup: it does not access memory in a cache-efficient way at all.
The value you choose as pivot may not be appropriate, hence your sorting may be taking some time. You can avoid quicksort's worst-case runtime of O(n^2) almost entirely by using an appropriate choice of pivot, such as picking it at random.
Also, the best and worst cases are often extremes that rarely occur in practice. But any average-case analysis assumes some distribution of inputs. For sorting, the typical choice is the random permutation model (as assumed on Wikipedia).
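For reference, a random-pivot quicksort can be sketched as below. This is only an illustration under my own assumptions (a Lomuto-style partition, a hypothetical function name), not a fix for the asker's code, which was not posted.

```python
import random


def quicksort_random(a, lo=0, hi=None):
    """Sort list a in place, picking a uniformly random pivot per partition."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Move a randomly chosen element to the end and use it as the pivot.
        r = random.randint(lo, hi)
        a[r], a[hi] = a[hi], a[r]
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        # Recurse into the smaller side and loop on the larger one.
        if i - lo < hi - i:
            quicksort_random(a, lo, i - 1)
            lo = i + 1
        else:
            quicksort_random(a, i + 1, hi)
            hi = i - 1
```

Because the pivot position no longer depends on the input order, already sorted input is no worse than any other permutation of distinct values. Sorting 100 integers with a sketch like this should take far less than a second, so a one-second runtime usually points to a bug or to a problem in how the time is measured.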

Separate chain Hashing for avoiding Hash collision

My knowledge of hash tables is limited and I am currently learning it. I have a question on Hash collision resolution by open hashing or separate chain hashing.
I understand that the hash buckets in this case hold a pointer to a linked list in which all the elements that hash to the same bucket are linked, so the search complexity would be on the order of O(n), where n is the number of elements in the linked list. Is there a way to make this simpler?
Also, if there is a constraint on the size of the linked list, say it can hold only 5 elements at most, and more than 5 elements hash into the same bucket, what would be the best way to handle this scenario?
Any pointers for learning more on the above and any help would be greatly appreciated.
Hash collisions shouldn't be too common, otherwise you're doing something wrong (e.g. a bad hash function or not a big enough hash table). So the number of elements in each linked-list should be minimal and the O(n) complexity shouldn't be too bad.
You could theoretically replace it with one of many other data structures. A binary search tree, for example, would get O(log n) search time (assuming the items are comparable), but then insert time will be up to O(log n) instead of O(1), and it would take more space.
There should be no maximum on the number of elements in a list. If there were, you could probably resort to probing (e.g. linear probing), but deletions could be a nightmare as you may need to move elements around quite a bit.
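As a rough illustration of why deletions are awkward with probing, here is a sketch of linear probing that marks removed slots with a tombstone instead of moving elements around. Tombstones are a different technique from the shifting mentioned above, and the class name, the fixed capacity, and the lack of resizing are all simplifying assumptions.

```python
_EMPTY = object()    # slot has never been used
_DELETED = object()  # tombstone: slot was used, then removed


class LinearProbingSet:
    """Fixed-capacity set using linear probing with tombstone deletion."""

    def __init__(self, capacity=16):
        self.slots = [_EMPTY] * capacity
        self.count = 0

    def _probe(self, key):
        # Visit every slot once, starting at the key's home position.
        start = hash(key) % len(self.slots)
        for offset in range(len(self.slots)):
            yield (start + offset) % len(self.slots)

    def add(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = idx
                break
            if slot is _DELETED:
                # Remember the tombstone but keep scanning for the key.
                if first_free is None:
                    first_free = idx
            elif slot == key:
                return  # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key
        self.count += 1

    def __contains__(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:
                return False  # a never-used slot ends the probe sequence
            if slot is not _DELETED and slot == key:
                return True
        return False

    def remove(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:
                return
            if slot is not _DELETED and slot == key:
                # Leave a tombstone so later keys in the run stay reachable.
                self.slots[idx] = _DELETED
                self.count -= 1
                return
```

If a removed slot were simply reset to empty, a lookup for a key stored further along the same probe run would stop too early. The tombstone avoids that at the cost of slots that stay occupied until they are reused.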