What is the runtime for initializing a hash table with n elements? - hash

Is it O(n) or O(n logn)? I have n elements that I need to setup in a hash table, what is the worst-case and average runtime?

Worst case is unlimited. You need to calculate hash codes and may have to compare elements, and the time for that is not limited.
Assuming that calculating hashes and comparing elements is constant time, for insertion the worst case is O (n^2). What saves you is the fact that the worst case would be exceedingly rare, assuming a halfway decent has function. Average time for a decent implementation is O (n).

Related

Amortization complexity when resizing arrays by a constant?

I know that when you resize an array by a scalar (like doubling the length of the array, then copying all elements into the new big array) the amortized time complexity is O(1).
But why is it the case that when you do it with a constant (say, resizing it by +10 each time) not O(1) as well?
Edit: https://www.cs.utexas.edu/~slaberge/docs/topics/amortized/dynamic_arrays/ this site seems to explain it, but I am very confused on the math. Where does big $N$ come from? I thought we were dealing with k?
If every kth consecutive insertions cost as much as the number of elements that are already in the array (denote by n+N*k where n is the initial size of the array) then you got sequences of this type:
n O(1) Operations
Expensive operation of O(n)
k O(1) Operations
Expensive operation of O(n+k)
k O(1) Operations
Expensive operation of (n+2k)
k O(1) Operations
Expensive operation of O(n+3k)
See where this is going? each expensive insertion happens every k insertions (expect first time) and costs as the current number of elements.
This means that after, lets simplify, n+A*k insertions we had A copies of n elements, and also we had A-1 copy of the first set of k elements, A-2 copies of the second set of k elements, and so on..
This sums up to O(An + A^2 * k). And because we did n+Ak, we can divide to get amortized cost.
This gives us (An + A^2 * k)/(n+Ak)=A
So, this implies that we are amortized dependent in this array on the NUMBER OF INSERTIONS, which is bad because we won't be able to state that this array, does a constant work in average.

quick sort is slower than merge sort

I think the speed of quick sort is less efficient when arranging an array with duplicate data, right? when datatype is char, the bigger the array(over 100000), the closer it gets to the n^2 order.
and assuming there is no duplicate data, to get the best case of a quick sort where the first element is placed as a pivot, first elementsI think we can recursively change the first and intermediate elements by dividing the already aligned array like a merge sort. right? is there general best case?
Lomuto partition scheme, which scans from one end to the other during partition, is slower with duplicates. If all the values are the same, then each partition step splits it into sizes 1 and n-1, a worst case scenario.
Hoare partition scheme, which scans from both both ends towards each other until the indexes (or iterators or pointers) cross, is usually faster with duplicates. Even though duplicates result in more swaps, each swap occurs just after reading and comparing two values to the pivot and are still in the cache for the swap (assuming object size is not huge). As the number of duplicates increases, the splitting improves towards the ideal case where each partition step splits the data into two equal halves. I ran a benchmark sorting 16 million 64 bit integers: with random data, it took about 1.37 seconds, improving with duplicates and with all values the same, it took about about 0.288 seconds.
Another alternative is a 3 way partition, which splits a partition into elements < pivot, elements == pivot, elements > pivot. If all the elements are the same, it's done in O(n) time. For n elements with only k possible values, then time complexity is O(n ⌈log3(k)⌉), and since k is constant, the time complexity is still O(n).
Wiki links:
https://en.wikipedia.org/wiki/Quicksort#Repeated_elements
https://en.wikipedia.org/wiki/Dutch_national_flag_problem

Hash table O(1) amortized or O(1) average amortized?

This question may seem a bit pedantic but i've been really trying to dive deeper into Amortized analysis and am a bit confused as to why insert for a hash table is O(1) amortized.(Note: Im not talking about table doubling, I understand that)
Using this definition, "Amortized analysis gives the average performance (over time) of each operation in the worst case." It seems like the worst case for N inserts into a hashtable would result in a collision for every operation. I believe universal hashing guarantees collision at a rate of 1/m when the load balance is kept low, but isn't it still theoretically possible to get a collision for every insert?
It seems like technically the average amortized analysis for hashtable's insert is O(1).
Edit: You can assume the hashtable uses basic chaining where the element is placed at the end of the corresponding linked list. The real meat of my question refers to amortized analysis on probabilistic algorithms.
Edit 2:
I found this post on quicksort,
"Also there’s a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost (n^2) dollars, but the probability that this will happen approaches zero as n grows large." I think this probably answers my question.
You could theoretically get a collision every insert but that would mean that you had a poor performing hashing function that failed to space out values across the "buckets" for keys. A theoretically perfect hash function would always put a new value into a new bucket so that each key would refer to it's own bucket. (I am assuming a chained hash table and referring to the chain field as a "bucket", just how I was taught). A theoretically worst case function would stick all keys into the same bucket leading to a chain in that bucket of length N.
The idea behind the amortization is that given a reasonably good hashing function you should end up with a linear time for insert because the amount of times that insertion is > O(1) would be greatly dwarfed by the number of times that insertion is simple and O(1). That is not to say that insertion is without any calculation (the hash function still has to be calculated and in some special cases hash functions can be more calc heavy than just looking through a list).
At the end of the day this brings us to an important concept in big-O which is the idea that when calculating time complexity you need to look at the most frequently executed action. In this case that is the insertion of a value that does not collide with another hash.

How is O(n)/n=1 in aggregate method of amortized analysis

How is O(n)/n=1 in aggregate method of amortized analysis as given in the coursera course on data structures in lesson 5-Amortized analysis:Aggregate method?
Short answer
O(n)/n = cn/n = c = O(1)
Long answer
We use amortized analysis in order to analyze the cost of a sequence of operations rather that the cost of a single operation. In the last case we use asymptotic analysis (some of the asymptotic notations are: Theta, Big O, Big Omega, Little O and Little Omega), but it doesn't work that well when we come across a sequence of operations and want to understand the cost of that sequence.
The reason is that if we apply "regular" asymptotic analysis, our, for example, asymptotical upper bound in the worst case analysis might be too pessimistic. Classical example is inserting into a dynamic array. You insert elements into a dynamically allocated array and when it's full, you define new array (twice as big, for example) and copy all the elements. The thing is most of the insertions will work in constant time (or in O(1)), but when you need to redefine your array, it will take linear time (O(n)), because you need to copy all the elements.
So imagine that you insert n elements and you need to redefine your array only once, then you have n operations, each operation is O(n) in the worst case, hence the cost of the sequence of operations in the worst case is O(n^2), which seems too pessimistic considering the fact that most of your operations are O(1) in the worst case and only one of them is O(n).
We define the amortized cost of a sequence of operations as (cost of n operations) / n. In your case the cost of n operations is O(n) which is equal to cn (where c is some constant) just by the definition of the Big O notation, divide it by n and you get just c, which is equal to O(1) because, once again, c is just some constant.

Time complexity of QuickSort+Insertion sort hybrid algorithm?

I am implementing an algorithm that perform Quick sort with Leftmost pivot selection up to a certain limit and when the list of arrays becomes almost sorted, I will use Insertion sort to sort those elements.
For left most pivot selection,I know the Average case complexity of Quick sort is O(nlogn) and worst case complexity ,i.e. when the list is almost sorted, is O(n^2). On the other hand, Insertion sort is very efficient on almost sorted list of elements with a complexity is O(n).
SO I think the complexity of this hybrid algorithm should be O(n). Am I correct?
The most important thing for the performance of qsort is picking a good pivot above all. This means choosing an element that's as close to the average of the elements you're sorting as possible.
The worse case of O(n2) in qsort comes about from consistently choosing 'bad' pivots every time for each partition pass. This causes the partitions to be extremely lopsided rather than balanced eg. 1 : n-1 element partition ratio.
I don't see how adding insertion sort into the mix as you've describe would help or mitigate this problem.