This is not a code question; I'm just trying to understand the concept of hashmaps and time complexity.
I think I know how HashMaps/HashSets etc. work, and I think I understand why HashMap.get runs in constant time. But in a very big HashMap, the indices where the values get stored should overlap. When two hash codes resolve to the same index, they get stored in a linked list at that index, right?
Couldn't all elements end up stored at one index as a linked list? Then shouldn't HashMap.get run in O(n) in the worst case?
Yes, the worst-case time complexity of HashMap.get is O(n), where n is the number of elements stored in one bucket. The worst case occurs when all n elements of the HashMap get stored in the same bucket.
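To see why a single crowded bucket makes lookups linear, here is a minimal sketch of separate chaining in Java. It is a toy, not java.util.HashMap: the class name ChainedMap, the pluggable hash function and the fixed bucket count are assumptions made for illustration. Forcing every key to hash to 0 puts all 100 entries into one chain, so get() has to walk that chain.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.ToIntFunction;

// Toy hash map using separate chaining: each bucket is a linked list of entries.
class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> buckets = new ArrayList<>();
    private final ToIntFunction<K> hash; // pluggable so collisions can be forced

    ChainedMap(int capacity, ToIntFunction<K> hash) {
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
        this.hash = hash;
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(hash.applyAsInt(key), buckets.size()));
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = bucketFor(key);
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        chain.add(new Entry<>(key, value));
    }

    // Walks the chain: cheap when chains are short, O(n) when every key landed in this bucket.
    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int longestChain() {
        int max = 0;
        for (LinkedList<Entry<K, V>> chain : buckets) max = Math.max(max, chain.size());
        return max;
    }

    public static void main(String[] args) {
        ChainedMap<Integer, String> bad = new ChainedMap<>(16, k -> 0);           // every key collides
        ChainedMap<Integer, String> good = new ChainedMap<>(16, k -> k.hashCode());
        for (int i = 0; i < 100; i++) { bad.put(i, "v" + i); good.put(i, "v" + i); }
        System.out.println(bad.longestChain());  // 100: one chain holds everything
        System.out.println(good.longestChain()); // 7: keys spread over 16 buckets
        System.out.println(bad.get(99));         // v99, found after walking the whole chain
    }
}
```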
However, this can be improved if each bucket is stored as a balanced binary search tree instead of a linked list, so that a lookup inside the bucket becomes a binary search. In that case the worst-case time complexity becomes O(log n). Since Java 8, java.util.HashMap does exactly this for buckets that grow too large.
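The following sketch (hypothetical class names) pushes the real java.util.HashMap into the worst case by giving every key the same hash code. In OpenJDK 8 and later, a bucket whose chain grows past 8 entries (once the table has at least 64 slots) is converted into a red-black tree, and keys that implement Comparable can then be located in O(log n) within that bucket. The demo only shows that lookups stay correct; it does not measure timing.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Every BadKey reports the same hash code, so all entries land in one bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // deliberate worst case
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // Lets a treeified bucket order keys that share a hash code.
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new BadKey(i), i);
        // Lookups stay correct; only their cost depends on how the bucket is stored.
        System.out.println(map.get(new BadKey(9_999))); // 9999
        System.out.println(map.size());                 // 10000
    }
}
```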
The Service Fabric documentation doesn't explicitly define the ordering of keys in a Reliable Dictionary during enumeration. A quick test shows that enumeration returns the keys in key order, regardless of insertion order.
Is the key ordering intentional? Can I write my services assuming that the first key will always be the smallest value?
What is the data structure that powers the key index?
If it's not a well-known data structure, what is the time complexity of add/delete/get/update?
Is efficient enumeration in reverse order possible?
Are key-range queries possible?
Writes and Point Gets:
Commits go into a hashtable initially and then get moved into a sorted data structure after a checkpoint. So your adds/updates/deletes have a best-case runtime of O(1) and a worst-case runtime of O(log n) (the validation that checks for the presence of your key might make it O(log n), since we do not do blind writes).
Gets might be O(1) or O(log n), depending on whether you are reading from a recent commit or from an older commit.
Enumeration:
To make enumeration efficient, the contents of the hashtable get added into a temporary sorted data structure until they are moved into the main sorted data structure after the checkpoint. So it is O(log n).
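To make the costs above concrete, here is a hypothetical two-tier index in Java: a HashMap for commits since the last checkpoint and a TreeMap as the main sorted structure. It only illustrates the described behaviour under assumed names (TwoTierIndex, checkpoint, keysInOrder); it is not Service Fabric's implementation, and it omits deletes and transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-tier key index; NOT Service Fabric's code. Keys and values are assumed non-null.
class TwoTierIndex<K extends Comparable<K>, V> {
    private final Map<K, V> recent = new HashMap<>();           // commits since the last checkpoint
    private final TreeMap<K, V> checkpointed = new TreeMap<>(); // main sorted structure

    // O(1) if the key is in the hashtable, otherwise an O(log n) tree search.
    private boolean contains(K key) {
        return recent.containsKey(key) || checkpointed.containsKey(key);
    }

    // No blind writes: adds and updates first validate whether the key exists.
    boolean add(K key, V value) {
        if (contains(key)) return false;
        recent.put(key, value);
        return true;
    }

    boolean update(K key, V value) {
        if (!contains(key)) return false;
        recent.put(key, value); // newest version stays in the hashtable until the next checkpoint
        return true;
    }

    V get(K key) {
        V v = recent.get(key);                        // recent commit: O(1)
        return v != null ? v : checkpointed.get(key); // older commit: O(log n)
    }

    void checkpoint() {
        checkpointed.putAll(recent); // move recent commits into the sorted structure
        recent.clear();
    }

    // Ascending order: sort the recent tier into a temporary tree, then merge it with the main tier.
    List<K> keysInOrder() {
        Iterator<K> a = new TreeMap<K, V>(recent).keySet().iterator();
        Iterator<K> b = checkpointed.keySet().iterator();
        List<K> out = new ArrayList<>();
        K x = a.hasNext() ? a.next() : null;
        K y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            int c = (x == null) ? 1 : (y == null) ? -1 : x.compareTo(y);
            if (c <= 0) {
                out.add(x);
                if (c == 0) y = b.hasNext() ? b.next() : null; // updated key present in both tiers
                x = a.hasNext() ? a.next() : null;
            } else {
                out.add(y);
                y = b.hasNext() ? b.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TwoTierIndex<Integer, String> idx = new TwoTierIndex<>();
        idx.add(5, "e");
        idx.add(1, "a");
        idx.checkpoint();
        idx.add(3, "c");
        idx.update(5, "E");
        System.out.println(idx.keysInOrder()); // [1, 3, 5]
        System.out.println(idx.get(5));        // E
    }
}
```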
Key range queries are possible. You can use the overload that takes in a filter.
In our next version, we will expose an API that takes a start and end of range, with sorting (ascending and descending).
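The kinds of query discussed here (filter, start/end range, reverse order) can be illustrated with java.util.TreeMap standing in for a sorted key index. This is not the Reliable Dictionary API; filterKeys is a hypothetical helper that mimics a filter-based overload by visiting every key in order, whereas subMap seeks directly to the start of the range.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// java.util.TreeMap as a stand-in for a sorted key index (not the Reliable Dictionary API).
class RangeQueries {
    // Filter-style query: visits every key in ascending order and keeps those the predicate accepts.
    static List<Integer> filterKeys(NavigableMap<Integer, String> index, Predicate<Integer> filter) {
        return index.keySet().stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int k : new int[] {42, 7, 19, 3, 88, 56}) index.put(k, "v" + k); // insertion order is irrelevant

        System.out.println(index.keySet());                              // [3, 7, 19, 42, 56, 88]
        System.out.println(filterKeys(index, k -> k >= 10 && k < 60));   // [19, 42, 56]
        // Start/end range: O(log n) to locate the start, then only the range is walked.
        System.out.println(index.subMap(10, true, 60, false).keySet());  // [19, 42, 56]
        // Reverse enumeration.
        System.out.println(index.descendingMap().keySet());              // [88, 56, 42, 19, 7, 3]
    }
}
```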
What is the right data structure for a queue that supports Enqueue, Dequeue, Peek, Min, and Max operations and performs all of them in O(1) time?
The most obvious data structure is a linked list, but the Min and Max operations would be O(n). A priority queue is another seemingly perfect choice, but Enqueue and Dequeue should work in the normal fashion of a queue (FIFO).
Another option that comes to mind is a heap, but I cannot quite figure out how one could design a queue with Min and Max operations using heaps.
Any help is much appreciated.
The data structure you seek cannot be designed if min() and max() actually change the structure. If min() and max() are similar to peek() and provide read-only access, then you should follow the steps in this question, adding another deque, similar to the one used for min(), for the max() operation. The rest of this answer assumes that min() and max() actually remove the corresponding elements.
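Assuming min() and max() are read-only, the deque technique can be sketched in Java: alongside the FIFO queue, keep one deque of candidate minimums and one of candidate maximums. Each element enters and leaves each deque at most once, so every operation is amortized O(1). The class name MinMaxQueue and the int element type are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// FIFO queue with read-only min()/max(); all operations amortized O(1).
class MinMaxQueue {
    private final Deque<Integer> items = new ArrayDeque<>();
    private final Deque<Integer> mins = new ArrayDeque<>(); // non-decreasing from head to tail
    private final Deque<Integer> maxs = new ArrayDeque<>(); // non-increasing from head to tail

    void enqueue(int x) {
        items.addLast(x);
        // Drop candidates that can never be the min/max again: x arrived later and outlives them.
        while (!mins.isEmpty() && mins.peekLast() > x) mins.pollLast();
        mins.addLast(x);
        while (!maxs.isEmpty() && maxs.peekLast() < x) maxs.pollLast();
        maxs.addLast(x);
    }

    int dequeue() {
        int x = items.removeFirst(); // throws NoSuchElementException when empty
        if (mins.peekFirst() == x) mins.pollFirst();
        if (maxs.peekFirst() == x) maxs.pollFirst();
        return x;
    }

    int peek() { return items.getFirst(); }
    int min()  { return mins.getFirst(); }
    int max()  { return maxs.getFirst(); }

    public static void main(String[] args) {
        MinMaxQueue q = new MinMaxQueue();
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) q.enqueue(x);
        System.out.println(q.min() + " " + q.max()); // 1 8
        for (int i = 0; i < 4; i++) q.dequeue();     // removes 5, 4, 7, 1
        System.out.println(q.min() + " " + q.max()); // 3 8
    }
}
```

Keeping equal values in the deques (strict comparisons when popping) is what lets duplicates be dequeued one at a time without losing the current min or max.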
Since you require enqueue() and dequeue(), elements must be added and removed in order of arrival (FIFO). A simple double-ended queue (either linked or using a circular vector) would provide this in O(1).
But the elements being added could change the current min() and max(); and when they are removed, the old min() and max() values should be restored... unless they were removed in the interim. This restriction forces you to keep elements sorted somehow. Any sorting structure (min-heap, max-heap, balanced binary tree, ...) will require at least O(log n) to find the position of a new arrival.
Your best bet is to pair a balanced binary tree (for min() and max()) with a doubly-linked list. Your tree nodes would store a set of pointers to the list nodes, sorted by whatever key you use in min() and max(). In Java:
// N: your node class; each node exposes a key of type K (Comparable), used for min() and max()
LinkedList<N> list;           // nodes sorted by arrival
TreeMap<K, HashSet<N>> tree;  // nodes grouped and sorted by K
On enqueue(), you would add a new node to the end of list, and add that same node, by its key, to the HashSet in its node in tree. O(log n).
On dequeue(), you would remove the node from the start of list, and from its HashSet in its node in tree. O(log n).
On min(), you would look up the first element in the tree: O(log n) with java.util.TreeMap, or O(1) if you cache the leftmost node. If you need to remove it, you have the pointer into the linked list, so removal is O(1) on that side (with your own list implementation, since java.util.LinkedList does not expose its nodes); but it costs O(log n) to re-balance the tree if it was the last element with that specific K.
On max(), the same logic applies, except that you would be looking for the last element in the tree. So O(log n).
On peek(), looking at but not extracting the first element in the queue is O(1).
You can simplify this (by removing the HashSet) if you know that all keys will be unique. However, this does not affect the asymptotic costs: they would all remain the same.
In practice, the difference between O(log n) and O(1) is often small; C++'s std::map, for example, is tree-based (O(log n)) rather than hash-based.
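A sketch of this tree-based design in Java, with two substitutions: arrival order is kept in a TreeMap keyed by a sequence number instead of a linked list (java.util.LinkedList cannot unlink an arbitrary element in O(1) because it does not expose its nodes), and the per-key group is a TreeSet of sequence numbers so the oldest duplicate is removed first. Every operation stays O(log n). The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// FIFO dequeue() plus removeMin()/removeMax() by value; every operation is O(log n).
// Assumes the queue is non-empty whenever peek/min/max/remove* is called.
class TreeBackedMinMaxQueue {
    private long nextSeq = 0;
    private final TreeMap<Long, Integer> byArrival = new TreeMap<>();        // seq -> value (FIFO order)
    private final TreeMap<Integer, TreeSet<Long>> byValue = new TreeMap<>(); // value -> seqs holding it

    void enqueue(int value) {
        long seq = nextSeq++;
        byArrival.put(seq, value);
        byValue.computeIfAbsent(value, v -> new TreeSet<>()).add(seq);
    }

    int peek() { return byArrival.firstEntry().getValue(); }
    int min()  { return byValue.firstKey(); }
    int max()  { return byValue.lastKey(); }

    int dequeue() {
        Map.Entry<Long, Integer> oldest = byArrival.pollFirstEntry();
        unlinkFromValue(oldest.getValue(), oldest.getKey());
        return oldest.getValue();
    }

    int removeMin() { return removeValue(byValue.firstKey()); }
    int removeMax() { return removeValue(byValue.lastKey()); }

    // Removes the oldest element holding this value from both structures.
    private int removeValue(int value) {
        long seq = byValue.get(value).first();
        byArrival.remove(seq);
        unlinkFromValue(value, seq);
        return value;
    }

    private void unlinkFromValue(int value, long seq) {
        TreeSet<Long> seqs = byValue.get(value);
        seqs.remove(seq);
        if (seqs.isEmpty()) byValue.remove(value); // it was the last element with this key
    }

    public static void main(String[] args) {
        TreeBackedMinMaxQueue q = new TreeBackedMinMaxQueue();
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) q.enqueue(x);
        System.out.println(q.removeMin() + " " + q.removeMax()); // 1 8
        System.out.println(q.dequeue() + " " + q.peek());        // 5 4
    }
}
```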
Any comparison-based data structure that supports Insert together with RemoveMin (or RemoveMax) must spend Ω(log n) amortized time per operation, because otherwise it could be used to sort faster than O(n log n); this is the cost of maintaining the elements in partially sorted order. The data structures that achieve this are called priority queues.
The basic priority queue supports Insert, Max, and RemoveMax. There are a number of ways to build them, but binary heaps work best.
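A minimal array-backed binary max-heap in Java, sketching the basic priority queue (Insert, Max, RemoveMax). The class name and the int element type are illustrative assumptions; java.util.PriorityQueue is the library equivalent.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Array-backed binary max-heap: insert and removeMax are O(log n), max is O(1).
class MaxHeap {
    private int[] a = new int[8];
    private int size = 0;

    void insert(int x) {
        if (size == a.length) a = Arrays.copyOf(a, 2 * size);
        a[size] = x;
        siftUp(size++);
    }

    int max() {
        if (size == 0) throw new NoSuchElementException();
        return a[0];
    }

    int removeMax() {
        int top = max();
        a[0] = a[--size]; // move the last leaf to the root...
        siftDown(0);      // ...and push it down to restore heap order
        return top;
    }

    int size() { return size; }

    private void siftUp(int i) {
        while (i > 0 && a[(i - 1) / 2] < a[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, largest = i;
            if (l < size && a[l] > a[largest]) largest = l;
            if (r < size && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            swap(i, largest);
            i = largest;
        }
    }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        MaxHeap h = new MaxHeap();
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) h.insert(x);
        StringBuilder out = new StringBuilder();
        while (h.size() > 0) out.append(h.removeMax()).append(' ');
        System.out.println(out.toString().trim()); // 8 7 5 4 3 1
    }
}
```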
Supporting all of Insert, Min, RemoveMin, Max, and RemoveMax with a single priority queue is more complex. A way to do this with a single data structure, adapted from a binary heap, is described in the paper:
Atkinson, Michael D., et al. "Min-max heaps and generalized priority queues." Communications of the ACM 29.10 (1986): 996-1000.
It is fast and memory-efficient, but requires a good amount of care to implement correctly.
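A compact Java sketch of the min-max heap idea from that paper: even levels of the array-backed tree are min levels and odd levels are max levels, so the minimum sits at the root and the maximum is one of the root's two children. This is an illustrative transcription of the push-up and trickle-down procedures, not code from the paper; the class name and int element type are assumptions.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Min-max heap sketch: even levels (root = level 0) are min levels, odd levels are max levels.
// min()/max() are O(1); insert/removeMin/removeMax are O(log n).
class MinMaxHeap {
    private int[] a = new int[8];
    private int size = 0;

    int size() { return size; }
    int min() { checkNonEmpty(); return a[0]; }
    int max() { checkNonEmpty(); return a[maxIndex()]; }
    int removeMin() { checkNonEmpty(); return removeAt(0); }
    int removeMax() { checkNonEmpty(); return removeAt(maxIndex()); }

    void insert(int x) {
        if (size == a.length) a = Arrays.copyOf(a, 2 * size);
        a[size] = x;
        pushUp(size++);
    }

    // The maximum is the root if it is alone, otherwise the larger of the root's children.
    private int maxIndex() {
        if (size == 1) return 0;
        if (size == 2) return 1;
        return a[1] >= a[2] ? 1 : 2;
    }

    private int removeAt(int i) {
        int removed = a[i];
        a[i] = a[--size];             // fill the hole with the last element
        if (i < size) trickleDown(i); // then restore the min-max ordering below i
        return removed;
    }

    private static boolean isMinLevel(int i) {
        return (31 - Integer.numberOfLeadingZeros(i + 1)) % 2 == 0; // floor(log2(i + 1)) is even
    }

    private void pushUp(int i) {
        if (i == 0) return;
        int p = (i - 1) / 2;
        if (isMinLevel(i)) {
            if (a[i] > a[p]) { swap(i, p); pushUpGrand(p, false); } // belongs among the maxima
            else pushUpGrand(i, true);
        } else {
            if (a[i] < a[p]) { swap(i, p); pushUpGrand(p, true); }  // belongs among the minima
            else pushUpGrand(i, false);
        }
    }

    // Moves a[i] up through its grandparents (same level type) while it is smaller (min) or larger (max).
    private void pushUpGrand(int i, boolean minLevel) {
        while (i >= 3) {
            int g = ((i - 1) / 2 - 1) / 2;
            if (minLevel ? a[i] < a[g] : a[i] > a[g]) { swap(i, g); i = g; } else return;
        }
    }

    private void trickleDown(int i) {
        boolean minLevel = isMinLevel(i);
        while (true) {
            int m = bestDescendant(i, minLevel); // smallest (min level) or largest (max level) descendant
            if (m < 0 || !(minLevel ? a[m] < a[i] : a[m] > a[i])) return;
            swap(m, i);
            if (m <= 2 * i + 2) return;          // m was a child: the swap cannot break order below it
            int p = (m - 1) / 2;                 // m is a grandchild: compare it with its parent
            if (minLevel ? a[m] > a[p] : a[m] < a[p]) swap(m, p);
            i = m;
        }
    }

    // Best among the (up to) two children and four grandchildren of i, or -1 if i is a leaf.
    private int bestDescendant(int i, boolean minLevel) {
        int best = -1;
        for (int c : new int[] {2 * i + 1, 2 * i + 2, 4 * i + 3, 4 * i + 4, 4 * i + 5, 4 * i + 6}) {
            if (c < size && (best < 0 || (minLevel ? a[c] < a[best] : a[c] > a[best]))) best = c;
        }
        return best;
    }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    private void checkNonEmpty() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
    }

    public static void main(String[] args) {
        MinMaxHeap h = new MinMaxHeap();
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) h.insert(x);
        System.out.println(h.min() + " " + h.max()); // 1 8
        h.removeMin();
        h.removeMax();
        System.out.println(h.min() + " " + h.max()); // 3 7
    }
}
```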
This structure DOES NOT exist!
There is a simple way to prove this conclusion.
As we all know, comparison-based sorting needs Ω(n log n) comparisons in the worst case.
But if the structure you describe existed (with max() and min() removing the element they return), there would be a solution for sorting:
Enqueue every element one by one: O(n) in total.
Dequeue the max (or min) element one by one: O(n) in total.
That would mean sorting could be solved in O(n), which is IMPOSSIBLE.
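The reduction in this argument can be made concrete with java.util.PriorityQueue, which is a binary heap: n inserts followed by n remove-min calls return the input in sorted order. At O(log n) per call this costs O(n log n); if both operations were O(1), the same loop would sort in O(n), which the comparison-sorting lower bound rules out. sortViaQueue is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// n inserts followed by n remove-min calls sort the input.
class SortByQueue {
    static List<Integer> sortViaQueue(List<Integer> input) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : input) pq.add(x);            // n "enqueues", O(log n) each
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll()); // n "remove min" calls, O(log n) each
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortViaQueue(List.of(5, 4, 7, 1, 8, 3))); // [1, 3, 4, 5, 7, 8]
    }
}
```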
Assumptions:
That you only care about performance, not about space/memory/...
That the index is a set, not a list (it will work for a list, but may need some extra love).
A solution:
You could keep a queue and a hash table side by side.
Example:
Let's say the order is 5 4 7 1 8 3
Queue -> 547183
Hash table -> 134578
Enqueue:
1) Take your object and insert it into the hash table in the right bucket. Min/Max will always be the first and last index (see sorted hash tables).
2) Next, insert into your queue like normal.
3) You can / should link the two. One idea would be to use the hash table value as a pointer to the queue.
With a large hash table, both operations will be O(1).
Dequeue:
1) Pop the first element: O(1)
2) Remove the element from the hash table: O(1)
Min / Max:
1) Look at your hash table. Depending on the language used, you could in theory find it by looking at the head of the table, or the tail of the table.
For a better explanation of sorted hash tables, see https://stackoverflow.com/questions/2007212
Note:
I would like to note that there is no "normal" data structure I know of that does what you require. However, that does not mean it is impossible. If you attempt to implement this data structure, you will most likely have to tailor it to your needs and will not be able to use currently available libraries. You may have to use a very low-level language like assembly to achieve this, but C or Java might also work if you're good with those languages.
Good luck
EDITED:
I did not explain sorted hash tables, so I added a link to another SO question that explains them.
In these slides, the author says that a capped collection is perfect for logging because it is fast thanks to natural ordering. Could you please explain why it is fast?
Natural order means "return the data in the same order it is stored on disk, no sorting necessary". This is fast. Unfortunately, it is usually not a "meaningful" order at all. To get a meaningful order, you have to sort by data in a field, and this implies either in-memory sorting, or random access through an index (which is slower than sequential access).
In a capped collection, natural order happens to be the same order as document creation.
So if you want log entries in chronological order, a capped collection can provide that cheaply.
Unless one is explicitly created, there is no index on the collection, which means insertion is very quick. Think of it as appending to a list, as opposed to inserting an element into a sorted data structure.
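As an analogy only (this is not how MongoDB actually stores capped collections), a fixed-size ring buffer shows why appends and natural-order reads are cheap: each append writes the next slot, the oldest entry is overwritten once the buffer is full, and reading in natural order is a sequential scan starting at the oldest slot. CappedLog and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Analogy: a fixed-size ring buffer, not MongoDB's storage engine.
class CappedLog {
    private final String[] slots;
    private int next = 0;  // slot the next append writes to
    private int count = 0; // number of live entries

    CappedLog(int capacity) { slots = new String[capacity]; }

    // O(1): no index to maintain, no sorted position to search for.
    void append(String entry) {
        slots[next] = entry;
        next = (next + 1) % slots.length;
        if (count < slots.length) count++;
    }

    // "Natural order": read the slots sequentially, starting at the oldest entry.
    List<String> inNaturalOrder() {
        List<String> out = new ArrayList<>(count);
        int start = (next - count + slots.length) % slots.length;
        for (int i = 0; i < count; i++) out.add(slots[(start + i) % slots.length]);
        return out;
    }

    public static void main(String[] args) {
        CappedLog log = new CappedLog(3);
        for (String e : new String[] {"boot", "login", "query", "logout"}) log.append(e);
        System.out.println(log.inNaturalOrder()); // [login, query, logout]
    }
}
```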
I have an algorithm that runs many iterations, each of which scores the items in a collection and removes the one with the highest score.
I could populate a Vector with the initial population, continually replacing it as a var, or choose a mutable collection as a val. Which of the mutable collections would best fit the bill?
You could consider a DoubleLinkedList, which has a convenient remove() method to remove the current list cell.
I think a Map (or its close relative, the Set) might do well. It doesn't have indexed access, but that doesn't seem to be what you want. If you go for a TreeMap, you'll even get an ordered collection.
However, might I point out that your algorithm seems to call for a heap? A heap is optimized for repeatedly finding/removing the maximum element (or the minimum, if you invert the comparison used to build the heap). Scala's standard library includes a heap-backed scala.collection.mutable.PriorityQueue, and a heap is also easy to implement with an array.
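A sketch of the heap-based loop, written in Java to match the other examples in this document; scala.collection.mutable.PriorityQueue plays the same role in Scala. It assumes an item's score does not change once computed (if scores are recomputed every iteration, the heap would have to be rebuilt). Item and drainInScoreOrder are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Repeatedly remove the highest-scoring item using a heap-backed priority queue.
class HighestScoreFirst {
    static final class Item {
        final String name;
        final double score;
        Item(String name, double score) { this.name = name; this.score = score; }
    }

    // Returns item names in the order they are removed (highest score first).
    static String drainInScoreOrder(List<Item> items) {
        PriorityQueue<Item> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Item it) -> it.score).reversed());
        heap.addAll(items);
        StringBuilder order = new StringBuilder();
        while (!heap.isEmpty()) order.append(heap.poll().name); // O(log n) per removal
        return order.toString();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("a", 0.3), new Item("b", 0.9), new Item("c", 0.5));
        System.out.println(drainInScoreOrder(items)); // bca
    }
}
```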