I know that when you resize an array by a constant factor (like doubling the length of the array, then copying all elements into the new, bigger array), the amortized time complexity is O(1).
But why is it not O(1) as well when you resize by a constant amount (say, growing the array by +10 each time)?
Edit: https://www.cs.utexas.edu/~slaberge/docs/topics/amortized/dynamic_arrays/ This site seems to explain it, but I am very confused by the math. Where does big $N$ come from? I thought we were dealing with $k$?
If every k-th insertion costs as much as the number of elements already in the array (which is n+N*k after N resizes, where n is the initial size of the array), then you get sequences of this type:
n O(1) Operations
Expensive operation of O(n)
k O(1) Operations
Expensive operation of O(n+k)
k O(1) Operations
Expensive operation of O(n+2k)
k O(1) Operations
Expensive operation of O(n+3k)
See where this is going? Each expensive insertion happens every k insertions (except the first time) and costs as much as the current number of elements.
This means that after, let's simplify, n+A*k insertions we had A copies of the initial n elements, A-1 copies of the first set of k elements, A-2 copies of the second set of k elements, and so on.
This sums up to O(An + A^2 * k). And because we did n + Ak insertions, we can divide to get the amortized cost per insertion.
This gives us (An + A^2 * k)/(n + Ak) = A.
So the amortized cost of this array depends on the NUMBER OF INSERTIONS, which is bad, because we cannot claim that this array does constant work per insertion on average.
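To see the difference concretely, here is a minimal Python sketch (my own illustration, not from the linked page) that counts how many element copies each growth policy performs per insertion:

```python
def copies_per_insert(n_inserts, grow):
    """Simulate n_inserts appends and count how many elements get copied."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_inserts):
        if size == capacity:          # array is full: resize and copy everything
            copies += size
            capacity = grow(capacity)
        size += 1
    return copies / n_inserts

for n in (1_000, 10_000, 100_000):
    doubling = copies_per_insert(n, lambda c: 2 * c)    # geometric growth
    plus_ten = copies_per_insert(n, lambda c: c + 10)   # constant growth
    print(f"n={n:>7}  doubling: {doubling:.2f} copies/insert  "
          f"+10: {plus_ten:.2f} copies/insert")
```

With doubling, the copies-per-insert ratio stays below 2 no matter how large n gets; with +10 it keeps growing roughly like n/20, which is exactly the "depends on the number of insertions" problem described above.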
I think quicksort is less efficient when sorting an array with duplicate data, right? When the data type is char, the bigger the array (over 100,000 elements), the closer it gets to O(n^2).
And assuming there is no duplicate data, to get the best case of a quicksort where the first element is chosen as the pivot, I think we can recursively swap the first and middle elements by splitting the already sorted array like a merge sort, right? Is there a general best case?
Lomuto partition scheme, which scans from one end to the other during partition, is slower with duplicates. If all the values are the same, then each partition step splits it into sizes 1 and n-1, a worst case scenario.
Hoare partition scheme, which scans from both ends towards each other until the indexes (or iterators or pointers) cross, is usually faster with duplicates. Even though duplicates result in more swaps, each swap occurs just after reading and comparing two values to the pivot, so they are still in the cache for the swap (assuming object size is not huge). As the number of duplicates increases, the splitting improves towards the ideal case where each partition step splits the data into two equal halves. I ran a benchmark sorting 16 million 64-bit integers: with random data it took about 1.37 seconds, improving with duplicates, and with all values the same it took about 0.288 seconds.
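For reference, here is a rough Python sketch of the Hoare scheme described above (my own illustration; it is not the benchmark code, which isn't shown here):

```python
def hoare_partition(a, lo, hi):
    """Scan from both ends toward each other until the indexes cross."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j                 # split point: [lo..j] and [j+1..hi]
        a[i], a[j] = a[j], a[i]      # duplicates trigger swaps, but they
                                     # also keep the split balanced

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort(a, lo, p)
        quicksort(a, p + 1, hi)
```

With all-equal input, the two indexes meet near the middle, so the splits stay roughly balanced even though every comparison says "equal".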
Another alternative is a 3-way partition, which splits a partition into elements < pivot, elements == pivot, elements > pivot. If all the elements are the same, it's done in O(n) time. For n elements with only k possible values, the time complexity is O(n ⌈log3(k)⌉), and since k is constant, the time complexity is still O(n).
Wiki links:
https://en.wikipedia.org/wiki/Quicksort#Repeated_elements
https://en.wikipedia.org/wiki/Dutch_national_flag_problem
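And a minimal sketch of the 3-way (Dutch national flag) partition mentioned above, assuming an in-place sort of a Python list:

```python
def quicksort_3way(a, lo=0, hi=None):
    """Quicksort with a 3-way partition:
    elements < pivot | elements == pivot | elements > pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort_3way(a, lo, lt - 1)   # strictly-less part
    quicksort_3way(a, gt + 1, hi)   # strictly-greater part; the equal
                                    # block a[lt..gt] is already in place
```

If every element equals the pivot, the first partition classifies the whole array as "equal" and there is nothing left to recurse on, which is where the O(n) behaviour for all-duplicate input comes from.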
For clarification, the first(where:) method keeps iterating through the sequence until it finds an element that satisfies the predicate, and returns it.
Based on that I would assume that it is not O(n) (linear time), because at some point it doesn't have to iterate through the whole sequence to its end.
You could check: What is the difference between filter(_:).first and first(where:)?
I'm not sure if it could be something related to O(log n); AFAIK that has something to do with splitting into halves...
It would be great if someone could describe how we can determine the time complexity for such a process.
We are usually interested in the worst case running time of a program. Based on that, it should be O(n) as the worst case is when it iterates through all the elements.
On average you'll only have to check 1/2 of the values, so you'd think first(where:) would be O(1/2 N). But O() notation ignores constants. O(N) means it grows linearly as the number of elements grows. For 10 items, you'd check 5 on average, for 100, you'd check 50, for 1000, you'd check 500 on average. Connect the points (10,5), (100,50), (1000, 500). That's a straight line.
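A rough Python analogue of the scan (not Swift's actual implementation, just a model of the documented "stop at the first match" behaviour) makes the best, average, and worst cases explicit:

```python
def first_where(sequence, predicate):
    """Return the first element satisfying predicate, or None.
    Best case: 1 check (match at the front). Worst case: n checks."""
    checks = 0
    for element in sequence:
        checks += 1
        if predicate(element):
            print(f"found after {checks} checks")
            return element
    print(f"no match after {checks} checks")   # scanned the whole sequence
    return None

first_where(range(1000), lambda x: x == 0)     # 1 check: best case
first_where(range(1000), lambda x: x == 999)   # 1000 checks: the O(n) worst case
```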
How is O(n)/n = O(1) in the aggregate method of amortized analysis, as given in the Coursera course on data structures, Lesson 5: Amortized Analysis, Aggregate Method?
Short answer
O(n)/n = cn/n = c = O(1)
Long answer
We use amortized analysis in order to analyze the cost of a sequence of operations rather than the cost of a single operation. For a single operation we use asymptotic analysis (some of the asymptotic notations are: Theta, Big O, Big Omega, Little O and Little Omega), but it doesn't work that well when we come across a sequence of operations and want to understand the cost of that sequence.
The reason is that if we apply "regular" asymptotic analysis, our asymptotic upper bound from a worst-case analysis might be too pessimistic. The classical example is inserting into a dynamic array. You insert elements into a dynamically allocated array, and when it's full, you allocate a new array (twice as big, for example) and copy all the elements. Most of the insertions work in constant time (O(1)), but when you need to reallocate the array, it takes linear time (O(n)), because you need to copy all the elements.
So imagine that you insert n elements and need to reallocate the array only once. Then you have n operations, each of which is O(n) in the worst case, hence the cost of the sequence of operations in the worst case is O(n^2). That seems too pessimistic, considering that most of your operations are O(1) in the worst case and only one of them is O(n).
We define the amortized cost of a sequence of operations as (cost of n operations) / n. In your case the cost of n operations is O(n), which means it is at most cn (where c is some constant) just by the definition of Big O notation; divide it by n and you get just c, which is O(1) because, once again, c is just a constant.
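For the doubling strategy in the example, the aggregate sum works out like this (assuming the array starts at capacity 1 and doubles whenever it is full):

$$
\text{total cost of } n \text{ insertions} \;\le\; \underbrace{n}_{\text{the writes}} \;+\; \underbrace{\left(1 + 2 + 4 + \cdots + 2^{\lfloor \log_2 n \rfloor}\right)}_{\text{elements copied over all resizes}} \;<\; n + 2n \;=\; 3n \;=\; O(n)
$$

$$
\text{amortized cost} \;=\; \frac{O(n)}{n} \;\le\; \frac{cn}{n} \;=\; c \;=\; O(1)
$$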
I've been given two homework problems on the complexity of hash tables, but I'm struggling to understand the difference between them.
They are as follows:
Consider a hash function which is to take n inputs and map them to a table of size m.
Write the complexity of insert, search, and deletion for the hash function which distributes all n inputs evenly over the buckets of the hash table.
Write the complexity of insert, search, and deletion for the (supposedly perfect but unrealistic) hash function which will never hash two items to the same bucket, i.e. this hash function will never result in a collision.
These two questions seem quite similar to me and I'm not really sure of their differences.
For question one, since the n inputs are distributed evenly we can assume there will be zero or one items in each bucket, so all of insert, search and delete will be O(1). Is this correct?
How then does question two differ in any way? If the function never results in a collision then all the items will be spread evenly so wouldn't this result in O(1) for each operation?
Is my thinking correct for these problems or am I missing something?
EDIT:
I believe I've identified where I've gone wrong. O(1) is correct for every operation in question two, because the hash function is ideal and never results in a collision.
However, for question one the items are spread evenly, but that does NOT mean there is only 1 item in each bucket; every bucket could have 20 items in a linked list, for example. So insertion would still be O(1).
But what about search? It would be O(1) plus the cost of searching the linked list. We don't know its length; we only know the items are spread evenly. Can we get an expression for the length in terms of n (number of inputs) and m (size of table)?
Your edit is on the right track.
Can we get an expression for the length in terms of n (number of inputs) and m (size of table)?
For question one: if the hash table's sizing is constrained in some way so that the load factor (i.e. the number of items per bucket) n/m is greater than 1 and not constant nor within constant bounds, then you can postulate a relationship m = f(n); the load factor is then n / f(n), so the complexity of search and deletion is O(n/f(n)) too.
In the second case, the complexity is always O(1).
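A minimal sketch of a separately chained table (the chaining itself is an assumption; the question doesn't say how collisions are resolved) shows where the n/m term comes from:

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket is a Python list."""
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key, value):
        # O(1): append to the bucket's chain without checking for duplicates
        self.buckets[hash(key) % self.m].append((key, value))

    def search(self, key):
        # O(1) to find the bucket, plus O(length of chain) to scan it.
        # With n items spread evenly over m buckets, the chain length is
        # about n/m, so search (and delete) cost O(1 + n/m).
        for k, v in self.buckets[hash(key) % self.m]:
            if k == key:
                return v
        return None
```

If the table is resized so that m grows proportionally to n, the load factor n/m stays bounded by a constant and the expected cost stays O(1); if m is fixed, the chains grow like n/m and so does the search cost.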
Is it O(n) or O(n log n)? I have n elements that I need to put into a hash table; what is the worst-case and average runtime?
Worst case is unlimited. You need to calculate hash codes and may have to compare elements, and the time for that is not limited.
Assuming that calculating hashes and comparing elements is constant time, for insertion the worst case is O(n^2). What saves you is the fact that the worst case would be exceedingly rare, assuming a halfway decent hash function. Average time for a decent implementation is O(n).
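A hedged Python sketch of that worst case (the all-collisions hash is deliberately pathological, not something a reasonable hash function would produce):

```python
def build_table(keys, bucket_index, m):
    """Insert distinct keys into m chained buckets, counting how many key
    comparisons a duplicate check would make (one per element already chained)."""
    buckets = [[] for _ in range(m)]
    comparisons = 0
    for key in keys:
        chain = buckets[bucket_index(key) % m]
        comparisons += len(chain)   # a duplicate check scans the whole chain
        chain.append(key)
    return comparisons

n = 10_000
# decent hash, table sized to the data: these keys spread out, chains stay short
print(build_table(range(n), hash, m=2 * n))
# pathological hash, everything collides: 0 + 1 + ... + (n-1), about n^2 / 2 comparisons
print(build_table(range(n), lambda key: 0, m=2 * n))
```

The second call is the O(n^2) scenario from the answer above; with a decent hash function and a sensibly sized table, the per-insert work stays bounded and the total is O(n) on average.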