Reliable Dictionary Performance Semantics - azure-service-fabric

The Service Fabric documentation doesn't explicitly define the ordering of keys in a Reliable Dictionary during enumeration. A quick test shows that enumeration proceeds in key order, regardless of insertion order.
Is the key-ordering intentional? Can I write my services assuming that the first key will always be the smallest value?
What is the data structure that powers the key index?
If it's not a well-known data structure, what are the time complexities of add/delete/get/update?
Is efficient reverse enumeration possible?
Are key-range queries possible?

Writes and Point Gets:
Commits go into a hashtable initially and then get moved into a sorted data structure after a checkpoint. So your Adds/Updates/Deletes have a best-case runtime of O(1) and a worst-case runtime of O(log n) (the validation that checks for the presence of your key may make it O(log n), since we do not do blind writes).
Gets might be O(1) or O(log n), depending on whether you are reading from a recent commit or from an older commit.
Enumeration:
To make enumeration efficient, the contents of the hashtable are added to a temporary sorted data structure until they are moved into the main sorted data structure after the checkpoint. So enumeration is O(log n).
Key-range queries are possible: you can use the overload that takes a filter.
In our next version, we will expose an API with a start range, an end range, and a sort direction (ascending or descending).
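To make this concrete, here is a minimal Java sketch of the two-tier layout described above: a hash-based write cache in front of a consolidated sorted store. It is purely illustrative (the actual Reliable Dictionary is a .NET component and this is not its code), and all names are invented:

import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual sketch only: a write cache (hash table, O(1)) in front of a
// consolidated sorted store (tree, O(log n)), mirroring the layout the
// answer describes. Not the actual Reliable Dictionary implementation.
class TwoTierStore<K extends Comparable<K>, V> {
    private final Map<K, V> writeCache = new HashMap<>();       // recent commits
    private final TreeMap<K, V> consolidated = new TreeMap<>(); // post-checkpoint

    void put(K key, V value) {
        writeCache.put(key, value);  // O(1) best case; checking for the key in
                                     // the sorted store would add O(log n)
    }

    V get(K key) {
        V recent = writeCache.get(key);                    // O(1) on recent commits
        return recent != null ? recent : consolidated.get(key); // O(log n) otherwise
    }

    void checkpoint() {
        consolidated.putAll(writeCache);  // merge into the sorted structure
        writeCache.clear();
    }

    // Key-range query over consolidated data: O(log n) seek, then sequential scan.
    SortedMap<K, V> range(K fromInclusive, K toExclusive) {
        return consolidated.subMap(fromInclusive, toExclusive);
    }
}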

Related

A question about HashMap's time complexity

This is not a code question; I am just trying to understand the concept of hashmaps and time complexity.
I think I know how HashMaps/Sets etc. work, and I think I understand why HashMap.get runs in constant time, but with a very big HashMap the indices where the values get stored should overlap. When two hash codes resolve to the same index, the entries get stored in a linked list at that index, right?
Couldn't all elements end up stored at one index as a linked list? Shouldn't HashMap.get then run in O(n) in the worst case?
Yes, the worst-case time complexity for HashMap is O(n), where n is the number of elements stored in one bucket. The worst case occurs when all n elements of the HashMap end up in one bucket.
However, this can be improved if a balanced binary search tree is used for each bucket instead of a linked list: the worst-case time complexity then drops to O(log n). (Java 8's HashMap does exactly this, converting a bucket to a red-black tree once it grows past a threshold.)
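To see the worst case concretely, here is a small self-contained Java example: a key type with a deliberately constant hashCode forces every entry into a single bucket (the key class is invented for illustration):

import java.util.HashMap;
import java.util.Map;

// A key whose hashCode is constant: every entry lands in the same bucket,
// so each lookup has to search within that one bucket.
final class BadKey {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }  // all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey(i), i);  // every put hits the same bucket
        }
        // The get must search the single bucket: O(n) with a linked list,
        // O(log n) once the bucket is treeified (Java 8+).
        System.out.println(map.get(new BadKey(99_999)));
    }
}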

MATLAB Data Structure with O(log(N)) Insertion, Deletion and Smallest Key Lookup?

I am trying to implement a queuing system with out-of-order insertion. Keys would correspond to the times at which members leave the queue; thus, by finding the smallest key, one can find the next member to leave.
If this were C++, a std::map would solve it. MATLAB offers a map structure (containers.Map); however, it is a hashmap. It supports O(1) insertion and deletion, but finding the smallest key requires traversing all keys (O(N)). Another way would be to use a vector as a queue: the smallest key would always be at the front, so access would be O(1), but insertion into the middle of the queue would shift everything above the inserted key by one position (O(N)).
So the question is:
Is there a way to implement a map in MATLAB with similar behaviour to std::map?
Is there another data structure that would give O(log(N)) performance for these operations?
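For comparison, here is a hedged Java sketch of the behaviour the question is after, using TreeMap (a red-black tree): O(log N) insertion and deletion, plus cheap smallest-key lookup, i.e. what std::map provides (names are illustrative):

import java.util.Map;
import java.util.TreeMap;

// Sketch of std::map-style semantics: keys are departure times,
// and the next member to leave is always at the smallest key.
public class DepartureQueue {
    public static void main(String[] args) {
        TreeMap<Double, String> queue = new TreeMap<>();
        queue.put(5.0, "member A");  // out-of-order insertion: O(log N)
        queue.put(2.0, "member B");
        queue.put(9.0, "member C");

        Map.Entry<Double, String> next = queue.firstEntry(); // smallest key
        System.out.println(next.getValue() + " leaves at t=" + next.getKey());

        queue.pollFirstEntry();  // remove the departing member: O(log N)
    }
}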

What is the right data structure for a queue that supports Min and Max operations in O(1) time?

What is the right data structure for a queue that supports Enqueue, Dequeue, Peek, Min, and Max operations, and performs all of them in O(1) time?
The most obvious data structure is a linked list, but the Min and Max operations would be O(n). A priority queue is another natural choice, but Enqueue and Dequeue should work in the normal FIFO fashion of a queue.
Another option that comes to mind is a heap, but I cannot quite figure out how one can design a queue with Min and Max operations using heaps.
Any help is much appreciated.
The data structure you seek cannot be designed if min() and max() actually change the structure. If min() and max() are similar to peek() and provide read-only access, then you should follow the steps in this question, adding another deque, similar to the one used for min(), for use in max() operations. The rest of this answer assumes that min() and max() actually remove the corresponding elements.
Since you require enqueue() and dequeue(), elements must be added and removed by order of arrival (FIFO). A simple double-ended queue (either linked or using a circular vector) would provide this in O(1).
But the elements to be added could change the current min() and max(); however, when removed, the old min() and max() values should be restored... unless they were removed in the interim. This restriction forces you to keep elements sorted somehow. Any sorting structure (min-heap, max-heap, balanced binary tree, ...) will require at least O(log n) to find the position of a new arrival.
Your best bet is to pair a balanced binary tree (for min() and max()) with a doubly-linked list. Your tree nodes would store a set of pointers to the list nodes, sorted by whatever key you use in min() and max(). In Java:
// N: your node class; it exposes a key of type K (Comparable) used by min()/max()
LinkedList<N> list;          // nodes in arrival order
TreeMap<K, HashSet<N>> tree; // node sets sorted by K
On enqueue(), you would add a new node to the end of list, and add that same node, by its key, to its HashSet in tree. O(log n).
On dequeue(), you would remove the node from the front of list, and from its HashSet in tree. O(log n).
On min(), you would look up the first entry in tree: O(log n) for a plain balanced tree (implementations that cache the leftmost node make it O(1)). If you need to remove it, you already hold the pointer to the linked-list node, so that side is O(1); removing the key from the tree is O(log n) when it was the last element with that specific K.
On max(), the same logic applies, except that you look up the last entry in tree. So O(log n).
On peek(), looking at but not extracting the first element in the queue would be O(1).
You can simplify this (replacing each HashSet with a single node reference) if you know that all keys will be unique. However, this does not affect the asymptotic costs: they all remain the same.
In practice, the difference between O(log n) and O(1) is so small that the default map implementation in C++'s STL is O(log n)-based (a tree instead of a hash).
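Here is a compact runnable sketch of this pairing, simplified so that min() and max() are read-only (like peek()); supporting removal through min()/max() would require a hand-rolled doubly-linked list so the tree entries can unlink list nodes in O(1), as described above. Class and method names are illustrative:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// FIFO deque for arrival order, paired with a TreeMap that counts
// occurrences of each key to answer min()/max() from the sorted side.
class MinMaxQueue<K extends Comparable<K>> {
    private final Deque<K> fifo = new ArrayDeque<>();           // arrival order
    private final TreeMap<K, Integer> counts = new TreeMap<>(); // sorted by key

    void enqueue(K key) {                     // O(log n)
        fifo.addLast(key);
        counts.merge(key, 1, Integer::sum);
    }

    K dequeue() {                             // O(log n)
        K key = fifo.removeFirst();
        if (counts.merge(key, -1, Integer::sum) == 0) {
            counts.remove(key);               // last occurrence of this key
        }
        return key;
    }

    K peek() { return fifo.peekFirst(); }     // O(1)
    K min()  { return counts.firstKey(); }    // O(log n): walk to leftmost node
    K max()  { return counts.lastKey(); }     // O(log n): walk to rightmost node
}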
Any data structure that can retrieve Min or Max in O(1) time needs to spend at least O(log n) on every Insert and Remove to maintain elements in partially sorted order. The data structures that do achieve this are called priority queues.
The basic priority queue supports Insert, Max, and RemoveMax. There are a number of ways to build them, but binary heaps work best.
Supporting all of Insert, Min, RemoveMin, Max, and RemoveMax with a single priority queue is more complex. A way to do this with a single data structure, adapted from a binary heap, is described in the paper:
Atkinson, Michael D., et al. "Min-max heaps and generalized priority queues." Communications of the ACM 29.10 (1986): 996-1000.
It is fast and memory-efficient, but requires a good amount of care to implement correctly.
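For the basic priority queue described above, Java's standard PriorityQueue (a binary heap) is sufficient. The quick sketch below uses two separate heaps to expose both ends, which is exactly the duplication that the single min-max heap from the cited paper avoids:

import java.util.Collections;
import java.util.PriorityQueue;

// Two binary heaps side by side: one ordered for Min, one for Max.
public class HeapDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<>(Collections.reverseOrder());
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) {
            minHeap.add(x);   // O(log n) insert into each heap
            maxHeap.add(x);
        }
        System.out.println(minHeap.peek()); // 1: Min in O(1)
        System.out.println(maxHeap.peek()); // 8: Max in O(1)
        minHeap.poll();                     // RemoveMin in O(log n)
    }
}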
This structure DOES NOT exist!
There is a simple way to prove this claim.
As we all know, comparison-based sorting has an Ω(n log n) lower bound.
But if the structure you describe existed, it would yield an O(n) sorting algorithm:
Enqueue every element one by one, costing O(n) in total.
Dequeue the max (or min) element one by one, costing O(n) in total.
That would mean comparison-based sorting can be done in O(n), which is impossible. (The argument assumes that min() and max() actually remove the extreme element, and that keys are only compared.)
Assumptions:
1) You only care about performance, not about space / memory / etc.
2) The index is a set, not a list (this will work for a list, but may need some extra love).
A solution:
You could run a queue and a hash table side by side.
Example:
Let's say the insertion order is 5 4 7 1 8 3
Queue -> 5 4 7 1 8 3
Hash table -> 1 3 4 5 7 8
Enqueue:
1) Take your object and insert it into the hash table, in the right bucket; Min / Max will always be at the first and last index (see sorted hash tables).
2) Next, insert into your queue like normal.
3) You can / should link the two. One idea would be to use the hash table value as a pointer to the queue.
With a large hash table, both operations will be O(1).
Dequeue:
1) Pop the first element: O(1)
2) Remove the element from the hash table: O(1)
Min / Max:
1) Look at your hash table. Depending on the language used, you could in theory find it by looking at the head of the table, or the tail of the table.
For a better explanation of sorted hash tables, see https://stackoverflow.com/questions/2007212
Note:
I would like to note that there is no "normal" data structure I know of that will do what you require. However, that does not mean it is not possible. If you attempt to implement the data structure, you will most likely have to build it for your own needs rather than use existing libraries. You may have to look at using a very low-level language like assembly to achieve this, though C or Java might work if you're good with those languages.
Good luck
EDITED:
I did not explain sorted hash tables, so I added a link to another SO question that explains them.

Why is natural order speedy?

In these slides, the author says that a capped collection is perfect for logging because it is speedy thanks to natural ordering. Could you please explain why it is speedy?
Natural order means "return the data in the same order it is stored on disk, no sorting necessary". This is fast. Unfortunately, it is usually not a "meaningful" order at all. To get a meaningful order, you have to sort by a field, which implies either in-memory sorting or random access through an index (which is slower than sequential access).
In a capped collection, natural order happens to be the same order as document creation.
So if you want log entries in chronological order, a capped collection can provide that cheaply.
Unless explicitly created, there is no index on the collection, which means insertion is very quick. Think of it as appending to a list, as opposed to inserting an element into a sorted data structure.
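As a concrete sketch with the MongoDB Java driver (the connection string, database, and collection names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import org.bson.Document;

// A capped collection stores documents in insertion order on disk, so a
// plain find() with no sort returns log entries chronologically.
public class CappedLogDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost")) {
            MongoDatabase db = client.getDatabase("demo");
            db.createCollection("log",
                    new CreateCollectionOptions().capped(true)
                                                 .sizeInBytes(1024 * 1024));
            MongoCollection<Document> log = db.getCollection("log");
            log.insertOne(new Document("msg", "started"));   // fast append:
            log.insertOne(new Document("msg", "finished"));  // no index to update
            for (Document d : log.find()) {  // natural order == creation order
                System.out.println(d.getString("msg"));
            }
        }
    }
}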

Scala working with very large list

I have a List with about 1176^3 (roughly 1.6 billion) elements.
Running something like
val x = list.length
takes hours.
With 1,271,256 elements in the list it is fine, just a few seconds.
Does anyone have an idea how to speed it up?
List is possibly the wrong data structure for a length operation as it is O(n) - it takes longer to complete the longer the list is.
Vector is possibly a better data structure to use if you need to invoke length, as its storage supports random access and an effectively constant-time length.
This, of course, does not mean that List is a poor structure to use, just in this case it might not be preferable.
To add to gpampara's answer, in cases like these you may actually be able to justify using an Array, since it has the lowest overhead per item stored and O(1) access to elements and length determination (since it's recorded in the array header itself).
Array has many down-sides, but I consider it justifiable when memory overhead is a primary consideration (and when a fixed-size collection whose size is known at the time of creation is feasible).
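To see why length is O(n) on this kind of structure, here is a minimal Java sketch of a cons list, the same shape as Scala's List: a singly-linked chain with no cached size, in contrast to an array, whose length is stored in its header:

// Cons-list sketch mirroring Scala's List: computing the length means
// walking every cell, so it costs O(n).
final class Cons {
    final int head;
    final Cons tail;
    Cons(int head, Cons tail) { this.head = head; this.tail = tail; }

    static int length(Cons xs) {
        int n = 0;
        for (Cons p = xs; p != null; p = p.tail) n++;  // full traversal
        return n;
    }

    public static void main(String[] args) {
        Cons list = null;
        for (int i = 0; i < 1_000_000; i++) list = new Cons(i, list);
        System.out.println(length(list));  // O(n) walk over the whole chain
        int[] array = new int[1_000_000];
        System.out.println(array.length);  // O(1): read from the array header
    }
}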