Does Quicksort preserve the original order if the array is already sorted?

I have an array that is already sorted in most cases, but not always, so I still need to run a sorting algorithm to guarantee that the elements are in ascending order.
I know that QuickSort is not stable, so the relative order of elements with the same value may change. But I need to know if it preserves the original order of the elements in an array that is ALREADY sorted.
I'm using C++, so I can simply use std::stable_sort (MergeSort) instead of std::sort (QuickSort). But this is more a matter of curiosity than efficiency, as I couldn't find an answer to my question.
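Not an answer, but one way to observe the behavior empirically: a minimal sketch (the struct and values are made up for illustration) that tags each element with its original index and sorts an already-sorted input containing duplicate keys. Note that std::sort makes no promise about the relative order of equal elements, even on sorted input, so whatever it prints is only an observation about one implementation; std::stable_sort does give that guarantee.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Each record carries its original position so we can see whether
// equal keys keep their relative order after sorting.
struct Record {
    int key;
    int original_index;
};

int main() {
    // Already sorted by key, with duplicate keys.
    std::vector<Record> v = {{1, 0}, {2, 1}, {2, 2}, {3, 3}, {3, 4}};

    // std::sort gives no guarantee for equal keys, even on sorted input;
    // std::stable_sort does. Swap the call to compare the two.
    std::sort(v.begin(), v.end(),
              [](const Record& a, const Record& b) { return a.key < b.key; });

    for (const auto& r : v)
        std::cout << r.key << " (originally at index " << r.original_index << ")\n";
}
```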

Related

Unusual hash map implementation

Does anyone know the original hash table implementation?
Every implementation I've found is based on the separate chaining or open addressing methods.
Chaining, by Hans Peter Luhn, in 1953.
https://en.wikipedia.org/wiki/Hash_table#History
The first implementation, if not the most common, is probably the one that uses an array (resized as needed) where each entry points to a list of elements.
The hash code, taken modulo the size of the array, gives the integer index of the list in which the element being searched for should be located. In case of a hash code collision, elements accumulate in the list of the corresponding entry.
So, once the hash code is computed, we have O(1) access to the array entry and O(N) for the actual search of the element in the list by checking for actual equality. The value of N must be kept low for obvious performance reasons.
When collisions become frequent, we resize the array, increasing the number of entries and thereby decreasing collisions, since the hash code is now taken modulo a larger number than before.
Some more complicated implementations convert the lists to trees if they become too long, so the equality search goes from O(N) to O(log N).
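For illustration, here is a minimal separate-chaining sketch along those lines; the class and method names are made up for this example, not taken from any particular library. It is an array of buckets, each bucket a list of key/value pairs, with a crude resize when the load grows.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal separate-chaining hash table: an array of buckets,
// each bucket a list of (key, value) pairs.
class ChainedHashMap {
public:
    explicit ChainedHashMap(std::size_t bucket_count = 8)
        : buckets_(bucket_count) {}

    void put(const std::string& key, int value) {
        auto& bucket = bucket_for(key);
        for (auto& kv : bucket) {
            if (kv.first == key) { kv.second = value; return; }  // overwrite existing key
        }
        bucket.emplace_back(key, value);
        ++size_;
        if (size_ > buckets_.size())          // crude load-factor check
            rehash(buckets_.size() * 2);
    }

    // Returns nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const auto& bucket = bucket_for(key);
        for (const auto& kv : bucket)
            if (kv.first == key) return &kv.second;   // O(length of this list)
        return nullptr;
    }

private:
    using Bucket = std::list<std::pair<std::string, int>>;

    Bucket& bucket_for(const std::string& key) {
        return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
    }
    const Bucket& bucket_for(const std::string& key) const {
        return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
    }

    // Growing the array spreads the same elements over more buckets,
    // shortening the per-bucket lists.
    void rehash(std::size_t new_count) {
        std::vector<Bucket> old = std::move(buckets_);
        buckets_ = std::vector<Bucket>(new_count);
        for (auto& bucket : old)
            for (auto& kv : bucket)
                bucket_for(kv.first).push_back(std::move(kv));
    }

    std::vector<Bucket> buckets_;
    std::size_t size_ = 0;
};
```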

Why are bloom filters not implemented like count-min sketch?

So I only recently learned about these, but from what I understood counting bloom filters are very similar to count-min sketches. The difference being that the former use a single array for all hash functions and the latter use an array per hash function.
If using separate arrays for each hash function results in fewer collisions and reduces false positives, why are counting bloom filters not implemented that way?
Though both are space-efficient probabilistic data structures, BloomFilter and Count-min-sketch solve different use cases.
BloomFilter is used to test whether an element is a member of a set or not. It gives boolean answers with false positives: it may report that a given element is already present when actually it is not. See here for working details: https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/
Count-min-sketch keeps track of counts, i.e., how many times an element is present in a set. See here for working details: https://www.geeksforgeeks.org/count-min-sketch-in-java-with-examples/
I would like to add to @roottraveller's answer and try to answer the OP's question. First, I find the following resource really helpful for understanding the basic difference between Bloom Filter, Counting Bloom Filter and Count-min Sketch: https://octo.vmware.com/bloom-filter/
As can be found in the document:
Bloom Filter is used to test whether an element is a member of a set or not
Count-min-sketch is a probabilistic data structure that serves as a frequency table of events in a stream of data
Counting Bloom Filter is an extension of the Bloom filter that allows deletion of elements by storing the frequency of occurrence
So, in short, a Counting Bloom Filter only supports deletion of elements and cannot return the frequency of elements; only the CM sketch can return frequencies. And, to answer the OP's question, sketches are a family of probabilistic data structures that deal with data streams with efficient space and time complexity, and they have always been constructed using an array per hash function. (https://www.sciencedirect.com/science/article/abs/pii/S0196677403001913)
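To make the structural difference concrete, here is a minimal sketch of both structures. The names are made up for this example and the hash family is deliberately simplistic (real implementations use properly independent hash functions): the counting Bloom filter shares one counter array across all k hash functions, while the count-min sketch keeps one counter row per hash function and reports the minimum across rows.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative hash family: mix the item's hash with the function index.
static std::size_t mixed_hash(const std::string& item, std::size_t i) {
    return std::hash<std::string>{}(item) ^ (i * 0x9e3779b97f4a7c15ULL);
}

// Counting Bloom filter: ONE shared counter array, indexed by k hash functions.
// Supports insert/remove and a membership test, but not reliable frequencies.
class CountingBloomFilter {
public:
    CountingBloomFilter(std::size_t size, std::size_t num_hashes)
        : counters_(size, 0), k_(num_hashes) {}

    void insert(const std::string& item) {
        for (std::size_t i = 0; i < k_; ++i)
            ++counters_[mixed_hash(item, i) % counters_.size()];
    }

    void remove(const std::string& item) {
        for (std::size_t i = 0; i < k_; ++i)
            --counters_[mixed_hash(item, i) % counters_.size()];
    }

    // "Probably present" if every counter touched by the item is non-zero.
    bool might_contain(const std::string& item) const {
        for (std::size_t i = 0; i < k_; ++i)
            if (counters_[mixed_hash(item, i) % counters_.size()] == 0)
                return false;
        return true;
    }

private:
    std::vector<uint32_t> counters_;
    std::size_t k_;
};

// Count-min sketch: one counter ROW per hash function.
// The estimate is the minimum across rows, so it never underestimates.
class CountMinSketch {
public:
    CountMinSketch(std::size_t width, std::size_t depth)
        : width_(width), rows_(depth, std::vector<uint64_t>(width, 0)) {}

    void add(const std::string& item) {
        for (std::size_t i = 0; i < rows_.size(); ++i)
            ++rows_[i][mixed_hash(item, i) % width_];
    }

    uint64_t estimate(const std::string& item) const {
        uint64_t best = UINT64_MAX;
        for (std::size_t i = 0; i < rows_.size(); ++i)
            best = std::min(best, rows_[i][mixed_hash(item, i) % width_]);
        return best;
    }

private:
    std::size_t width_;
    std::vector<std::vector<uint64_t>> rows_;
};
```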

Given an array, find top 'x' in O(1)

I'm gonna write the problem as I found it and I will then explain what confuses me.
"A teacher is marking his students' work from 0-10 but he only marks with an 8 or above for a certain number 'x'(x=15 for example) of the 'n' students. You are given an array with all the students' marks in random order. Find the 'x' best marks in O(1)."
We have certainly been taught hashing, but this requires me to store all the data in a hash table, which is definitely not O(1). Maybe we don't have to take the 'conversion' into account? If we do, maybe the conversion combined with the search time afterwards will lead to a method different from hashing.
In that case, leaving O(1) aside, what is the fastest algorithm including both the conversion and the search time?
Simple: It's not possible.
O(1) can only be achieved if the input size, the number of necessary comparisons and the output size are all constants. You may argue that x could be treated as constant, but it still doesn't work:
You need to inspect every single input element, all n of them, as the random input order does not allow any heuristic to guess where the xth element would be, even if you had already correctly identified the other x-1 elements in constant time.
As the problem is stated, there is no solution which can do it in the upper bounds of O(1) or O(x).
Let's just assume your instructor corrects his mistake, and gives you a revised version which correctly states O(n) as the required upper bound.
In that case your hash approach is (almost) correct. The catch of using a hash function is that you now need to account for potential collisions, which are the reason why hash maps don't work strictly in O(1), but rather only in O(1) "on average".
As you know all possible values (grades from 0-10), you can just allocate buckets with a known index. Inside each bucket you may use linked lists, as they also allow constant time insertions and linear time iteration.
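A rough sketch of that bucket approach (the function name and types are illustrative only), assuming integer grades in 0..10:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Grades are known to lie in 0..10, so we can bucket them directly
// and then walk the buckets from 10 downward to collect the best x marks.
std::vector<int> top_x_marks(const std::vector<int>& marks, std::size_t x) {
    std::array<std::vector<int>, 11> buckets;   // one bucket per grade 0..10
    for (int m : marks)
        buckets[m].push_back(m);                // O(n) insertions, O(1) each

    std::vector<int> result;
    for (int grade = 10; grade >= 0 && result.size() < x; --grade)
        for (int m : buckets[grade]) {
            if (result.size() == x) break;
            result.push_back(m);
        }
    return result;                              // overall O(n + x)
}
```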

How does Top-K sort algorithm work in MongoDB

Based on the answer and from MongoDB Documentation, I understood that MongoDB is able to sort a large data set and provide sorted results when limit() is used.
However, querying the same data set with sort() alone results in a memory exception.
From the second answer in the above post, the poster mentions that the whole collection is scanned, sorted, and the top N results are returned. I would like to know how the collection is sorted when I use limit().
From the documentation I found that when limit() is used it performs a Top-K sort; however, there is not much explanation about it available anywhere. I would like to see some references about the Top-K sort algorithm.
In general, you can do an efficient top-K sort with a min-heap of size K. The min-heap represents the largest K elements seen so far in the data set. It also gives you constant-time access to the smallest element of those top K elements.
As you scan over the data set, if a given element is larger than the smallest element in the min-heap (i.e. the smallest of the largest top K so far), you replace the smallest from the min-heap with that element and re-heapify (O(lg K)).
At the end, you're left with the top K elements of the entire data set, without having had to sort them all (worst-case running time is O(N lg K)), using only Θ(K) memory.
I actually learnt this in school for a change :-)
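For reference, a minimal sketch of that top-K scan (not MongoDB's actual implementation), using std::priority_queue as the min-heap:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keep a min-heap of the K largest elements seen so far.
// The heap's top is the smallest of those K, so a new element only enters
// if it beats that smallest one. Runs in O(N log K) time, Θ(K) memory.
std::vector<int> top_k(const std::vector<int>& data, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int value : data) {
        if (heap.size() < k) {
            heap.push(value);
        } else if (value > heap.top()) {
            heap.pop();          // drop the current smallest of the top K
            heap.push(value);    // O(log K)
        }
    }
    std::vector<int> result;
    while (!heap.empty()) {      // produces the top K in ascending order
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```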

example where quicksort crashes

I know that quicksort is not a stable method, meaning that for equal elements the relative order of array members may not be preserved. I need an example of an array (in which elements are repeated several times) for which quicksort does not keep the original relative order (for example with the three-way partitioning method). I could not find such an example of an array on the internet; could you help me?
Sure, I could use other sorting methods for this problem (like heap sort, merge sort, etc.), but my goal is to know, with a real-world example, what kind of data poses a risk for quicksort, because as we know it is one of the most useful methods and is used often.
Quicksort shouldn't crash no matter what array it is given.
When a sorting algorithm is called 'stable' or 'not stable', it does not refer to safety of the algorithm or whether or not it crashes. It is related to maintaining relative order of elements that have the same key.
As a brief example, if you have:
[9, 5, 7, 5, 1]
Then a 'stable' sorting algorithm should guarantee that in the sorted array the first 5 is still placed before the second 5. Even though for this trivial example there is no difference, there are examples in which it makes a difference, such as when sorting a table based on one column (you want the other columns to stay in the same order as before).
See more here: http://en.wikipedia.org/wiki/Stable_sort#Stability
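To see what that means for quicksort specifically, here is a minimal sketch of a three-way (Dutch national flag) partition quicksort on (key, label) pairs; the input values and labels are made up for illustration. With this particular input and this particular partition scheme, the three elements with key 2 come out labelled a, c, b instead of their original a, b, c, which is exactly the kind of reordering a stable sort rules out.

```cpp
#include <iostream>
#include <utility>
#include <vector>

using Item = std::pair<int, char>;  // (key, label); the label records original order

// Plain three-way (Dutch national flag) partition quicksort on the key only.
void quicksort3(std::vector<Item>& a, int lo, int hi) {
    if (lo >= hi) return;
    const int pivot = a[lo].first;
    int lt = lo, gt = hi, i = lo;
    while (i <= gt) {
        if (a[i].first < pivot)      std::swap(a[lt++], a[i++]);
        else if (a[i].first > pivot) std::swap(a[i], a[gt--]);
        else                         ++i;
    }
    quicksort3(a, lo, lt - 1);
    quicksort3(a, gt + 1, hi);
}

int main() {
    // Keys with duplicates; labels a, b, c mark the original order of the 2s.
    std::vector<Item> v = {{2, 'a'}, {2, 'b'}, {1, 'x'}, {2, 'c'}, {1, 'y'}};
    quicksort3(v, 0, static_cast<int>(v.size()) - 1);
    for (const auto& [key, label] : v)
        std::cout << key << label << " ";   // here the 2s come out as a, c, b
    std::cout << "\n";
}
```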