What is the time complexity if we apply merge sort to an array in which all the elements are the same?

The given array is 1,1,1,1,1,1,1,1,1,1. If we apply merge sort to this array, what will be the time complexity in big-O notation?

The time complexity of generic merge sort does not depend on the contents of the array: it performs O(N log N) comparisons and moves regardless of the values. Some optimized versions detect special cases, such as input that is already sorted, and execute in linear time, O(N).
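To see why the generic algorithm stays at O(N log N), here is a minimal merge sort sketch of my own (not from the answer) that counts comparisons; on the all-equal array the count changes only by a constant factor, because every merge still walks the halves:

```scala
object MergeSortCount {
  var comparisons = 0L // demo-only counter, an assumption of this sketch

  def mergeSort(a: Vector[Int]): Vector[Int] =
    if (a.length <= 1) a
    else {
      val (l, r) = a.splitAt(a.length / 2)
      merge(mergeSort(l), mergeSort(r))
    }

  private def merge(l: Vector[Int], r: Vector[Int]): Vector[Int] = {
    val out = Vector.newBuilder[Int]
    var i = 0; var j = 0
    while (i < l.length && j < r.length) {
      comparisons += 1
      if (l(i) <= r(j)) { out += l(i); i += 1 }
      else { out += r(j); j += 1 }
    }
    out ++= l.drop(i); out ++= r.drop(j) // one side is already empty
    out.result()
  }

  def main(args: Array[String]): Unit = {
    mergeSort(Vector.fill(10)(1)) // the array from the question
    println(s"comparisons on all-equal input: $comparisons")
  }
}
```

Running it shows the comparison count grows like N log N whether the values are equal or shuffled; only an explicit special-case check (e.g. "is the last element of the left half <= the first of the right half?") can short-circuit the merges and bring the total down to O(N).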

Related

Why are bloom filters not implemented like count-min sketch?

So I only recently learned about these, but from what I understood, counting Bloom filters are very similar to count-min sketches, the difference being that the former uses a single array for all hash functions and the latter uses an array per hash function.
If using separate arrays for each hash function results in fewer collisions and reduces false positives, why are counting Bloom filters not implemented that way?
Though both are space-efficient probabilistic data structures, a Bloom filter and a count-min sketch solve different use cases.
A Bloom filter is used to test whether an element is a member of a set or not. It answers a boolean membership query and can give false positives: it might report that a given element is present when in fact it is not. See here for working details: https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/
A count-min sketch keeps track of counts, i.e., how many times an element has been seen. See here for working details: https://www.geeksforgeeks.org/count-min-sketch-in-java-with-examples/
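To make the two query styles concrete, here is a minimal sketch of my own (not from either linked article), using MurmurHash3 with different seeds as the hash functions; the array sizes and hash counts are arbitrary placeholders rather than tuned parameters:

```scala
import scala.util.hashing.MurmurHash3

// Bloom filter: one shared bit array, membership queries only.
class BloomFilter(m: Int, k: Int) {
  private val bits = new Array[Boolean](m)
  private def idx(x: String, i: Int) = Math.floorMod(MurmurHash3.stringHash(x, i), m)
  def add(x: String): Unit = for (i <- 0 until k) bits(idx(x, i)) = true
  // true may be a false positive; false is definite
  def mightContain(x: String): Boolean = (0 until k).forall(i => bits(idx(x, i)))
}

// Count-min sketch: one counter row PER hash function, frequency queries.
class CountMinSketch(w: Int, d: Int) {
  private val rows = Array.ofDim[Long](d, w)
  private def idx(x: String, i: Int) = Math.floorMod(MurmurHash3.stringHash(x, i), w)
  def add(x: String): Unit = for (i <- 0 until d) rows(i)(idx(x, i)) += 1
  // never underestimates the true count; may overestimate on collisions
  def estimate(x: String): Long = (0 until d).map(i => rows(i)(idx(x, i))).min
}
```

The structural difference the OP asks about is visible in the fields: the Bloom filter ORs everything into one array, while the sketch keeps its rows separate so the minimum over rows bounds the error.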
I would like to add to @roottraveller's answer and try to answer the OP's question. First, I find the following resource really helpful for understanding the basic difference between a Bloom filter, a counting Bloom filter, and a count-min sketch: https://octo.vmware.com/bloom-filter/
As can be found in the document:
A Bloom filter is used to test whether an element is a member of a set or not.
A count-min sketch is a probabilistic data structure that serves as a frequency table of events in a stream of data.
A counting Bloom filter is an extension of the Bloom filter that allows deletion of elements by storing the frequency of occurrence.
So, in short, a counting Bloom filter only supports deletion of elements and cannot return the frequency of elements; only the count-min sketch can return frequencies. And, to answer the OP's question, sketches are a family of probabilistic data structures that deal with data streams in efficient space and time, and they have always been constructed using an array per hash function. (https://www.sciencedirect.com/science/article/abs/pii/S0196677403001913)
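A counting Bloom filter, for comparison, could look like the following sketch (again my own illustration, under the same hashing assumptions as above): it replaces the bit array with counters so that removal becomes possible, but the query is still only membership:

```scala
import scala.util.hashing.MurmurHash3

// Counting Bloom filter: ONE shared counter array for all k hash
// functions. Counters from different elements overlap, which is why it
// supports deletion but is not a reliable per-element frequency table.
class CountingBloomFilter(m: Int, k: Int) {
  private val counts = new Array[Int](m)
  private def idx(x: String, i: Int) = Math.floorMod(MurmurHash3.stringHash(x, i), m)
  def add(x: String): Unit    = for (i <- 0 until k) counts(idx(x, i)) += 1
  def remove(x: String): Unit = for (i <- 0 until k) counts(idx(x, i)) -= 1
  def mightContain(x: String): Boolean = (0 until k).forall(i => counts(idx(x, i)) > 0)
}
```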

Is there a solution to creating a perfect hash table for non-finite inputs?

So hash tables are really cool for constant-time lookups of data in sets, but as I understand it, they are limited by possible hashing collisions, which add small amounts of extra lookup time.
It seems to me like any hashing function that supports a non-finite range of inputs is really a heuristic for reducing collisions. Are there any absolute limitations on creating a perfect hash table for any range of inputs, or is it just something that no one has figured out yet?
I think this depends on what you mean by "any range of inputs."
If your goal is to create a hash function that can take in anything and never produce a collision, then there's no way to do what you're asking. This is a consequence of the pigeonhole principle - if you have n objects that can be hashed, you need at least n distinct outputs for your hash function or you're forced to get at least one hash collision. If there are infinitely many possible input objects, then no finite hash table could be built that will always avoid collisions.
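As a concrete illustration of the pigeonhole principle (my example, not the answerer's): the JVM's String.hashCode squeezes infinitely many possible strings into only 2^32 values, so distinct strings must collide:

```scala
// Two different strings with the same JVM hash code:
// "Aa" -> 'A'*31 + 'a' = 65*31 + 97 = 2112
// "BB" -> 'B'*31 + 'B' = 66*31 + 66 = 2112
println("Aa".hashCode == "BB".hashCode) // true
println("Aa" == "BB")                   // false
```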
On the other hand, if your goal is to build a hash table where lookups are worst-case O(1) (that is, you only have to look at a fixed number of locations to find any element), then there are many different options available. You could use a dynamic perfect hash table or a cuckoo hash table, both of which support worst-case O(1) lookups and expected O(1) insertions and deletions. These hash tables work by using a variety of different hash functions rather than any one fixed hash function, which helps circumvent the above restriction.
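For a flavor of how that works, here is a toy cuckoo hashing sketch of my own (an illustration under simplifying assumptions, not a production design: it uses two fixed MurmurHash3 seeds and gives up instead of rehashing when insertion appears to cycle):

```scala
import scala.util.hashing.MurmurHash3

class CuckooTable(capacity: Int) {
  private val t1 = Array.fill[Option[String]](capacity)(None)
  private val t2 = Array.fill[Option[String]](capacity)(None)
  private def h1(x: String) = Math.floorMod(MurmurHash3.stringHash(x, 1), capacity)
  private def h2(x: String) = Math.floorMod(MurmurHash3.stringHash(x, 2), capacity)

  // Worst-case O(1): a key can only ever live in t1(h1) or t2(h2).
  def contains(x: String): Boolean =
    t1(h1(x)).contains(x) || t2(h2(x)).contains(x)

  // Evict-and-reinsert; a real table would rehash on failure.
  def insert(x: String, maxKicks: Int = 32): Boolean = {
    if (contains(x)) return true
    var cur = x
    var useFirst = true
    var kicks = 0
    while (kicks < maxKicks) {
      val slot    = if (useFirst) h1(cur) else h2(cur)
      val table   = if (useFirst) t1 else t2
      val evicted = table(slot)
      table(slot) = Some(cur)
      evicted match {
        case None    => return true
        case Some(e) => cur = e; useFirst = !useFirst; kicks += 1
      }
    }
    false // likely cycle: give up (production code rehashes here)
  }
}
```

The point is in contains: no matter how insertions shuffled things around, a lookup probes at most two locations.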
Hope this helps!

Separate chain hashing for avoiding hash collisions

My knowledge of hash tables is limited, as I am currently learning about them. I have a question about hash collision resolution by open hashing, or separate chain hashing.
I understand that the hash buckets in this case hold the pointer to the linked list where all the elements that map to the same bucket are linked, so the search complexity would be on the order of O(n), where n is the number of elements in the linked list. Is there a way to make this simpler?
Also, if there is a constraint on the size of the linked list, say it can hold only 5 elements max, and more than 5 elements hash into the same bucket, what would be the best way to handle this scenario?
Any pointers for learning more on the above and any help would be greatly appreciated.
Hash collisions shouldn't be too common; otherwise you're doing something wrong (e.g. a bad hash function or too small a hash table). So the number of elements in each linked list should be minimal and the O(n) complexity shouldn't be too bad.
You could theoretically replace it with one of many other data structures. A binary search tree, for example, would get O(log n) search time (assuming the items are comparable), but then insert time will be up to O(log n) instead of O(1), and it would take more space.
There should be no maximum on the number of elements in a list. If there were, you could probably resort to probing (e.g. linear probing), but deletions could be a nightmare as you may need to move elements around quite a bit.
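For reference, separate chaining in miniature might look like this (a sketch of mine with no resizing; the name ChainedMap is made up for the illustration):

```scala
import scala.collection.mutable.ListBuffer

// Each bucket holds a list of key/value pairs; a lookup scans exactly
// one bucket, so it costs O(1 + bucket length). Real tables also resize
// when the load factor (size / buckets.length) grows, keeping buckets short.
class ChainedMap[K, V](numBuckets: Int = 16) {
  private val buckets = Array.fill(numBuckets)(ListBuffer.empty[(K, V)])
  private def bucketOf(k: K) = buckets(Math.floorMod(k.hashCode, numBuckets))

  def put(k: K, v: V): Unit = {
    val b = bucketOf(k)
    b.indexWhere(_._1 == k) match {
      case -1 => b += ((k, v))   // new key: append to the chain
      case i  => b(i) = (k, v)   // existing key: overwrite in place
    }
  }
  def get(k: K): Option[V] = bucketOf(k).collectFirst { case (`k`, v) => v }
}
```

Real implementations keep the load factor low by resizing, and some (e.g. Java 8's HashMap) convert long chains into balanced trees, which is exactly the O(log n) fallback suggested above.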

Time complexity of QuickSort+Insertion sort hybrid algorithm?

I am implementing an algorithm that performs quicksort with leftmost pivot selection up to a certain limit, and when the array becomes almost sorted, I will use insertion sort to sort those elements.
For leftmost pivot selection, I know the average-case complexity of quicksort is O(n log n) and the worst-case complexity, i.e. when the list is almost sorted, is O(n^2). On the other hand, insertion sort is very efficient on an almost sorted list of elements, with a complexity of O(n).
So I think the complexity of this hybrid algorithm should be O(n). Am I correct?
The most important thing for the performance of quicksort, above all, is picking a good pivot: an element as close to the median of the elements you're sorting as possible.
The worst case of O(n^2) in quicksort comes about from consistently choosing 'bad' pivots in every partition pass. This makes the partitions extremely lopsided rather than balanced, e.g. a 1 : n-1 element partition ratio.
I don't see how adding insertion sort into the mix as you've described would help or mitigate this problem.
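For reference, the standard form of this hybrid (the cutoff scheme used by many library sorts) looks like the sketch below; this is my illustration with an arbitrary cutoff of 16, not the OP's code. Note that with a leftmost pivot the worst case is still O(n^2): on already sorted input every partition is 1 : n-1, and the insertion-sort phase only saves work on the final small subarrays.

```scala
object HybridSort {
  val Cutoff = 16 // arbitrary; libraries typically use single-digit to ~50

  def sort(a: Array[Int]): Unit = quick(a, 0, a.length - 1)

  private def quick(a: Array[Int], lo: Int, hi: Int): Unit =
    if (hi - lo < Cutoff) insertion(a, lo, hi) // small range: finish cheaply
    else {
      val p = partition(a, lo, hi) // leftmost element as pivot
      quick(a, lo, p - 1)
      quick(a, p + 1, hi)
    }

  private def partition(a: Array[Int], lo: Int, hi: Int): Int = {
    val pivot = a(lo)
    var i = lo + 1; var j = hi
    while (i <= j) {
      if (a(i) <= pivot) i += 1
      else { swap(a, i, j); j -= 1 }
    }
    swap(a, lo, j) // place the pivot between the partitions
    j
  }

  private def insertion(a: Array[Int], lo: Int, hi: Int): Unit =
    for (i <- lo + 1 to hi) {
      val x = a(i); var j = i - 1
      while (j >= lo && a(j) > x) { a(j + 1) = a(j); j -= 1 }
      a(j + 1) = x
    }

  private def swap(a: Array[Int], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }
}
```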

When should I choose Vector in Scala?

It seems that Vector was late to the Scala collections party, and all the influential blog posts had already left.
In Java ArrayList is the default collection - I might use LinkedList but only when I've thought through an algorithm and care enough to optimise. In Scala should I be using Vector as my default Seq, or trying to work out when List is actually more appropriate?
As a general rule, default to using Vector. It’s faster than List for almost everything and more memory-efficient for larger-than-trivial sized sequences. See this documentation of the relative performance of Vector compared to the other collections. There are some downsides to going with Vector. Specifically:
Updates at the head are slower than List (though not by as much as you might think)
Another downside before Scala 2.10 was that pattern matching support was better for List, but this was rectified in 2.10 with generalized +: and :+ extractors.
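Concretely, the generalized extractors mean the idiomatic head/tail match is no longer List-only (a small illustration of mine):

```scala
// Since Scala 2.10, +: and :+ work as extractors on any Seq:
def describe[A](xs: Seq[A]): String = xs match {
  case head +: tail => s"head=$head, ${tail.length} more"
  case _            => "empty"
}

describe(Vector(1, 2, 3)) // "head=1, 2 more"
describe(List(1, 2, 3))   // "head=1, 2 more"
```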
There is also a more abstract, algebraic way of approaching this question: what sort of sequence do you conceptually have? Also, what are you conceptually doing with it? If I see a function that returns an Option[A], I know that function has some holes in its domain (and is thus partial). We can apply this same logic to collections.
If I have a sequence of type List[A], I am effectively asserting two things. First, my algorithm (and data) is entirely stack-structured. Second, I am asserting that the only things I’m going to do with this collection are full, O(n) traversals. These two really go hand-in-hand. Conversely, if I have something of type Vector[A], the only thing I am asserting is that my data has a well defined order and a finite length. Thus, the assertions are weaker with Vector, and this leads to its greater flexibility.
Well, a List can be incredibly fast if the algorithm can be implemented solely with ::, head and tail. I had an object lesson in that very recently, when I beat Java's split by generating a List instead of an Array, and couldn't beat it with anything else.
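A sketch of the kind of ::-only algorithm meant here (my reconstruction, not the answerer's actual code): split a string by scanning from the right and prepending each piece, so the list is built with nothing but O(1) cons operations:

```scala
def splitToList(s: String, sep: Char): List[String] = {
  @annotation.tailrec
  def loop(end: Int, acc: List[String]): List[String] = {
    val start = s.lastIndexOf(sep, end - 1) // rightmost separator before `end`
    if (start < 0) s.substring(0, end) :: acc
    else loop(start, s.substring(start + 1, end) :: acc)
  }
  loop(s.length, Nil)
}

splitToList("a,b,c", ',') // List("a", "b", "c")
```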
However, List has a fundamental problem: it doesn't work with parallel algorithms. I cannot split a List into multiple segments, or concatenate it back, in an efficient manner.
There are other kinds of collections that can handle parallelism much better -- and Vector is one of them. Vector also has great locality -- which List doesn't -- which can be a real plus for some algorithms.
So, all things considered, Vector is the best choice unless you have specific considerations that make one of the other collections preferable -- for example, you might choose Stream if you want lazy evaluation and caching (Iterator is faster but doesn't cache), or List if the algorithm is naturally implemented with the operations I mentioned.
By the way, it is preferable to use Seq or IndexedSeq unless you want a specific piece of API (such as List's ::), or even GenSeq or GenIndexedSeq if your algorithm can be run in parallel.
Some of the statements here are confusing or even wrong, especially the idea that immutable.Vector in Scala is anything like an ArrayList.
List and Vector are both immutable, persistent (i.e. "cheap to get a modified copy") data structures.
There is no reasonable default choice as there might be for mutable data structures; rather, it depends on what your algorithm is doing.
List is a singly linked list, while Vector is a base-32 integer trie, i.e. it is a kind of search tree with nodes of degree 32.
Using this structure, Vector can provide most common operations reasonably fast, i.e. in O(log_32(n)). That works for prepend, append, update, random access and decomposition into head/tail. Iteration in sequential order is linear.
List, on the other hand, provides only linear iteration, plus constant-time prepend and head/tail decomposition. Everything else generally takes linear time.
This might make it look as if Vector were a good replacement for List in almost all cases, but prepend, decomposition and iteration are often the crucial operations on sequences in a functional program, and the constants of these operations are (much) higher for Vector due to its more complicated structure.
In a few measurements I made, iteration is about twice as fast for List, prepend is about 100 times faster on List, decomposition into head/tail is about 10 times faster on List, and generation from a Traversable is about 2 times faster for Vector. (This is probably because Vector can allocate arrays of 32 elements at once when you build it up using a builder, instead of prepending or appending elements one by one.)
Of course, all operations that take linear time on lists but effectively constant time on vectors (such as random access or append) will be prohibitively slow on large lists.
So which data structure should we use?
Basically, there are four common cases:
We only need to transform sequences by operations like map, filter, fold, etc.: basically, it does not matter; we should program our algorithm generically and might even benefit from accepting parallel sequences. For sequential operations List is probably a bit faster, but you should benchmark if you have to optimize.
We need a lot of random access and different updates: use Vector; List will be prohibitively slow.
We operate on lists in a classical functional way, building them by prepending and iterating by recursive decomposition: use List; Vector will be slower by a factor of 10-100 or more.
We have a performance-critical algorithm that is basically imperative and does a lot of random access, something like an in-place quicksort: use an imperative data structure, e.g. ArrayBuffer, locally, and copy your data from and to it (a sketch of this pattern follows the list).
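That last pattern might look like this minimal sketch (my illustration, using the Scala 2.13 API):

```scala
import scala.collection.mutable.ArrayBuffer

// Copy in, mutate locally, copy back out: the buffer never escapes,
// so the function stays observably pure.
def sortedVector(xs: Vector[Int]): Vector[Int] = {
  val buf = ArrayBuffer.from(xs) // local mutable working copy
  buf.sortInPlace()              // imperative, random-access-heavy step
  buf.toVector                   // back to an immutable structure
}
```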
For immutable collections, if you want a sequence, your main decision is whether to use an IndexedSeq or a LinearSeq, which give different guarantees for performance. An IndexedSeq provides fast random-access of elements and a fast length operation. A LinearSeq provides fast access only to the first element via head, but also has a fast tail operation. (Taken from the Seq documentation.)
For an IndexedSeq you would normally choose a Vector. Ranges and WrappedStrings are also IndexedSeqs.
For a LinearSeq you would normally choose a List or its lazy equivalent Stream. Other examples are Queues and Stacks.
So in Java terms, ArrayList is used similarly to Scala's Vector, and LinkedList similarly to Scala's List. But in Scala I would tend to use List more often than Vector, because Scala has much better support for functions that traverse the sequence, like mapping, folding and iterating. You will tend to use these functions to manipulate the list as a whole, rather than randomly accessing individual elements.
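That traversal style looks the same for either type, which is why the choice can often stay abstract (a trivial example of mine):

```scala
// Whole-sequence operations; no random access, so Seq is enough.
val xs: Seq[Int] = List(1, 2, 3, 4) // or Vector(1, 2, 3, 4)
val result = xs.map(_ * 2).filter(_ > 2).foldLeft(0)(_ + _) // 18
```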
In situations which involve a lot of random access and random mutation, a Vector (or, as the docs say, a Seq) seems to be a good compromise. This is also what the performance characteristics suggest.
Also, the Vector class seems to play nicely in distributed environments without much data duplication because there is no need to do a copy-on-write for the complete object. (See: http://akka.io/docs/akka/1.1.3/scala/stm.html#persistent-datastructures)
If you're programming immutably and need random access, Seq is the way to go (unless you want a Set, which you often actually do). Otherwise List works well, except that its operations can't be parallelized.
If you don't need immutable data structures, stick with ArrayBuffer since it's the Scala equivalent to ArrayList.