A question about HashMap's time complexity

This is not a code question; I am just trying to understand the concept of hash maps and time complexity.
I think I know how HashMaps/HashSets etc. work, and I think I understand why HashMap.get runs in constant time. But when we have a very big HashMap, the indices where the values get stored should overlap. When two hash codes resolve to the same index, the entries get stored in a linked list at that index, right?
Couldn't it be that all elements get stored at one index as a linked list? Shouldn't HashMap.get then run in O(n) in the worst case?

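Yes, the worst-case time complexity of HashMap.get is O(n), where n is the number of elements stored in one bucket. The worst case occurs when all n elements of the HashMap end up in the same bucket.
However, this can be improved by storing each bucket as a balanced binary search tree instead of a linked list; the worst-case lookup then becomes O(log n). Java 8's HashMap does this: when an element is added to a bucket that already holds 8 or more entries (TREEIFY_THRESHOLD), the bucket is converted into a red-black tree, provided the table has at least 64 buckets (otherwise the table is resized instead). The O(log n) bound relies on the colliding keys being Comparable; if they are not, the tree cannot always order them and a lookup may still degrade toward O(n).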
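A minimal Java sketch of the worst case: the hypothetical BadKey class below returns the same hashCode() for every instance, so all entries share one bucket. Lookups still return the correct values; only their cost changes. BadKey also implements Comparable, which (assuming a Java 8+ HashMap) lets the treeified bucket order the colliding keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type: hashCode() is deliberately constant, so every
// entry collides and lands in the same bucket (the worst case above).
final class BadKey implements Comparable<BadKey> {
    final int id;

    BadKey(int id) { this.id = id; }

    @Override
    public int hashCode() { return 42; } // every key collides

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    // Lets a treeified bin (Java 8+) order colliding keys by id.
    @Override
    public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
}

class CollisionDemo {
    // Fills a HashMap with n colliding keys, mapping key i to i * i.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i * i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(10_000);
        System.out.println(map.size());                 // prints 10000
        System.out.println(map.get(new BadKey(9_999))); // prints 99980001
    }
}
```

Dropping `implements Comparable<BadKey>` still gives correct results, but the colliding bucket can then no longer be searched by key order.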

Related

Implementing Hash Table with binary search tree

This is the controversial line from Cracking the Coding Interview on hash tables.
Another common implementation (besides linked-list) for a hash table is to use a BST as the underlying data structure.
I know this question has been asked before... it's so confusing because everyone is giving two different answers. For example
Why implement a Hashtable with a Binary Search Tree?
The highest-voted answer in this post says that the quoted statement above is talking about a hash table implementation using a binary search tree, without an underlying array. I understood that since each element inserted gets a hash value (an integer), the elements form a total order (every pair can be compared with < and >). Therefore, we can simply use a binary search tree to hold the elements of the hash table.
On the other hand, others say
Hash table - implementing with Binary Search Tree
the book is saying that we should handle collisions with a binary search tree. So there is an underlying array, and when collisions occur because multiple elements get the same hash value and are placed in the same slot of the array, that's where the BST comes in.
So each slot in the array will be a pointer to a BST, which holds elements with the same hash value.
I'm leaning towards the second post's argument because the first post does not really explain how such implementation of a hash table can handle collisions. And I don't think it can achieve expected O(1) insert/delete/lookup time.
But for the second post, if we have multiple elements that get the same hash value and placed in a BST, I'm not sure how these elements are ordered (how can they be compared against each other?)
Please, help me put an end to this question once and for all!
the first post does not really explain how such implementation of a hash table can handle collisions
With a BST, you can use a hashing function that produces no duplicate keys, so there would be no collisions. The advantage here isn't speed but reduced memory consumption and better worst-case guarantees. If you're writing software for a critical real-time system, you might not be able to tolerate an O(n) resizing of your hash table.
if we have multiple elements that get the same hash value and placed in a BST, I'm not sure how these elements are ordered (how can they be compared against each other?)
Rehash with another function.
In the end, it all depends on what your data structure is used for (Is memory vs. speed more important? Is amortized performance vs worst-case performance more important? etc.)
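Under the second reading (an array of slots, each holding a BST), the ordering question has a simple answer if keys are comparable: colliding elements are ordered by comparing the keys themselves, not their identical hash values. The class below is a hypothetical sketch of that design, not the book's code; it uses java.util.TreeMap (a red-black tree) as the per-slot BST and assumes keys implement Comparable.

```java
import java.util.TreeMap;

// Hypothetical hash table whose slots are balanced BSTs (java.util.TreeMap).
// Keys must be Comparable: colliding keys are ordered by compareTo, not by hash.
class TreeBucketTable<K extends Comparable<K>, V> {
    private final TreeMap<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    TreeBucketTable(int capacity) {
        buckets = (TreeMap<K, V>[]) new TreeMap[capacity];
    }

    // Maps a key to a slot; floorMod keeps the index non-negative.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) {
            buckets[i] = new TreeMap<>();
        }
        boolean existed = buckets[i].containsKey(key);
        buckets[i].put(key, value); // O(log k) in a slot holding k keys
        if (!existed) {
            size++;
        }
    }

    V get(K key) {
        TreeMap<K, V> bucket = buckets[indexFor(key)];
        return bucket == null ? null : bucket.get(key);
    }

    int size() { return size; }

    // Number of keys stored in the slot that `key` hashes to.
    int bucketSize(K key) {
        TreeMap<K, V> bucket = buckets[indexFor(key)];
        return bucket == null ? 0 : bucket.size();
    }
}
```

For example, the strings "Aa" and "BB" both have hash code 2112, so in a 16-slot table they land in the same TreeMap and are kept in String.compareTo order. A lookup costs O(1) to find the slot plus O(log k) within a slot holding k keys.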

What is the right data structure for a queue that support Min, Max operations in O(1) time?

What is the right data structure for a queue that supports Enqueue, Dequeue, Peek, Min, and Max operations and performs all of them in O(1) time?
The most obvious data structure is a linked list, but Min and Max would be O(n). A priority queue is another natural choice, but Enqueue and Dequeue should work in the normal fashion of a queue (FIFO).
Another option that comes to mind is a heap, but I cannot quite figure out how to design a queue with Min and Max operations using heaps.
Any help is much appreciated.
The data structure you seek cannot be designed, if min() and max() actually change the structure. If min() and max() are similar to peek(), and provide read-only access, then you should follow the steps in this question, adding another deque similar to the one used for min() operations for use in max() operation. The rest of this answer assumes that min() and max() actually remove the corresponding elements.
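For the read-only case, the deque technique referred to above can be sketched as follows: next to the FIFO queue, keep one deque of candidate minimums in non-decreasing order and one of candidate maximums in non-increasing order. The MinMaxQueue class is a hypothetical illustration for int values. Each element is added to and removed from each helper deque at most once, so enqueue() is amortized O(1) and dequeue(), peek(), min() and max() are O(1).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical FIFO queue with read-only min() and max() (they do not remove).
class MinMaxQueue {
    private final Deque<Integer> items = new ArrayDeque<>();
    private final Deque<Integer> mins = new ArrayDeque<>(); // non-decreasing
    private final Deque<Integer> maxs = new ArrayDeque<>(); // non-increasing

    // Amortized O(1): discard candidates that can never be the min/max again.
    void enqueue(int x) {
        items.addLast(x);
        while (!mins.isEmpty() && mins.peekLast() > x) mins.pollLast();
        mins.addLast(x);
        while (!maxs.isEmpty() && maxs.peekLast() < x) maxs.pollLast();
        maxs.addLast(x);
    }

    // O(1): if the departing element is the current min/max, drop it there too.
    int dequeue() {
        requireNonEmpty();
        int x = items.pollFirst();
        if (mins.peekFirst() == x) mins.pollFirst();
        if (maxs.peekFirst() == x) maxs.pollFirst();
        return x;
    }

    int peek() { requireNonEmpty(); return items.peekFirst(); }

    int min() { requireNonEmpty(); return mins.peekFirst(); }

    int max() { requireNonEmpty(); return maxs.peekFirst(); }

    private void requireNonEmpty() {
        if (items.isEmpty()) throw new NoSuchElementException("queue is empty");
    }
}
```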
Since you require enqueue() and dequeue(), elements must be added and removed by order of arrival (FIFO). A simple double-ended queue (either linked or using a circular vector) would provide this in O(1).
But the elements to be added could change the current min() and max(); however, when they are removed, the old min() and max() values should be restored... unless they were removed in the interim. This restriction forces you to keep the elements sorted somehow. Any sorting structure (min-heap, max-heap, balanced binary tree, ...) will require Ω(log n) time to find the position of a new arrival.
Your best bet is to pair a balanced binary tree (for min() and max()) with a doubly-linked list. Your tree nodes would store a set of pointers to the list nodes, sorted by whatever key you use in min() and max(). In Java:
// N: your node class; it exposes a key of type K (Comparable), used for min() and max()
LinkedList<N> list;          // sorted by arrival
TreeMap<K, HashSet<N>> tree; // sorted by K; each key maps to the nodes that share it
On enqueue(), add a new node to the end of list, and add that same node to the HashSet stored under its key in tree: O(log n).
On dequeue(), remove the node from the start of list and from its HashSet in tree: O(log n).
On min(), look at the first element in the tree: O(1) if the tree caches its leftmost node (Java's TreeMap walks down from the root, which is O(log n)). If you need to remove it, you have the pointer to the list node, so O(1) on that side; but O(log n) to rebalance the tree if it was the last element with that specific K.
On max(), the same logic applies, except that you look at the last element in the tree. So O(log n).
On peek(), looking at (but not extracting) the first element in the queue is O(1).
You can simplify this (by removing the HashSet) if you know that all keys will be unique. However, this does not change the asymptotic costs: they all remain the same.
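A runnable Java sketch of this design follows, with a few stated substitutions. java.util.LinkedList does not expose its nodes, so removing an arbitrary element from it is O(n); the sketch therefore keeps arrival order in a LinkedHashSet, which offers O(1) removal of any element and O(1) access to the oldest one. The per-key collection is a HashSet, and the class and method names (MinMaxRemovingQueue, removeMin, removeMax) are hypothetical. With TreeMap, firstEntry() and lastEntry() are O(log n), so enqueue, dequeue, removeMin and removeMax are all O(log n) here, while peek is O(1).

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical FIFO queue whose removeMin()/removeMax() extract the element.
class MinMaxRemovingQueue<K extends Comparable<K>> {
    // Identity-based node (no equals/hashCode override), so equal keys
    // enqueued twice remain distinct entries.
    private static final class Node<T> {
        final T key;
        Node(T key) { this.key = key; }
    }

    private final LinkedHashSet<Node<K>> arrival = new LinkedHashSet<>(); // oldest first
    private final TreeMap<K, Set<Node<K>>> byKey = new TreeMap<>();      // sorted by key

    void enqueue(K key) {
        Node<K> node = new Node<>(key);
        arrival.add(node);
        byKey.computeIfAbsent(key, k -> new HashSet<>()).add(node);
    }

    K dequeue() {
        Iterator<Node<K>> it = arrival.iterator();
        if (!it.hasNext()) throw new NoSuchElementException("queue is empty");
        Node<K> node = it.next();
        it.remove();
        unlinkFromTree(node);
        return node.key;
    }

    K peek() {
        if (arrival.isEmpty()) throw new NoSuchElementException("queue is empty");
        return arrival.iterator().next().key;
    }

    K removeMin() { return removeOneOf(byKey.firstEntry()); }

    K removeMax() { return removeOneOf(byKey.lastEntry()); }

    boolean isEmpty() { return arrival.isEmpty(); }

    // With duplicate keys, an arbitrary node holding that key is removed.
    private K removeOneOf(Map.Entry<K, Set<Node<K>>> entry) {
        if (entry == null) throw new NoSuchElementException("queue is empty");
        Node<K> node = entry.getValue().iterator().next();
        arrival.remove(node);
        unlinkFromTree(node);
        return node.key;
    }

    // Drops the key from the tree once no node holds it any more.
    private void unlinkFromTree(Node<K> node) {
        Set<Node<K>> nodes = byKey.get(node.key);
        nodes.remove(node);
        if (nodes.isEmpty()) byKey.remove(node.key);
    }
}
```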
In practice, the difference between O(log n) and O(1) is so small that C++'s standard std::map is tree-based (O(log n)) rather than hash-based; the hash-based std::unordered_map was only added in C++11.
Any comparison-based data structure that supports Insert together with extracting the Min or Max (RemoveMin/RemoveMax) must spend Ω(log n) amortized time per operation to keep its elements in partially sorted order. The data structures that do achieve this are called priority queues.
The basic priority queue supports Insert, Max, and RemoveMax. There are a number of ways to build them, but binary heaps work best.
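A binary max-heap supporting Insert, Max and RemoveMax fits in a few dozen lines. IntMaxHeap below is a hypothetical, illustrative class; in Java, java.util.PriorityQueue (whose head is the least element by default, reversible with Comparator.reverseOrder()) provides the same costs: O(log n) to insert or remove the top element and O(1) to peek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical binary max-heap stored implicitly in a list:
// the children of index i live at 2i + 1 and 2i + 2.
class IntMaxHeap {
    private final List<Integer> a = new ArrayList<>();

    int size() { return a.size(); }

    // O(1): the maximum is always at the root.
    int max() {
        if (a.isEmpty()) throw new NoSuchElementException("heap is empty");
        return a.get(0);
    }

    // O(log n): append, then sift up while the parent is smaller.
    void insert(int x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) < a.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // O(log n): move the last element to the root, then sift down.
    int removeMax() {
        int top = max();
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = l + 1, big = i;
                if (l < a.size() && a.get(l) > a.get(big)) big = l;
                if (r < a.size() && a.get(r) > a.get(big)) big = r;
                if (big == i) break;
                swap(i, big);
                i = big;
            }
        }
        return top;
    }

    private void swap(int i, int j) {
        Integer t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }

    public static void main(String[] args) {
        IntMaxHeap heap = new IntMaxHeap();
        for (int x : new int[] {5, 4, 7, 1, 8, 3}) heap.insert(x);
        StringBuilder sb = new StringBuilder();
        while (heap.size() > 0) sb.append(heap.removeMax()).append(' ');
        System.out.println(sb.toString().trim()); // prints 8 7 5 4 3 1
    }
}
```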
Supporting all of Insert, Min, RemoveMin, Max, and RemoveMax with a single priority queue is more complex. A way to do this with a single data structure, adapted from a binary heap, is described in the paper:
Atkinson, Michael D., et al. "Min-max heaps and generalized priority queues." Communications of the ACM 29.10 (1986): 996-1000.
It is fast and memory-efficient, but requires a good amount of care to implement correctly.
This structure DOES NOT exist!
There is a simple way to prove this conclusion, assuming min() and max() remove the element they return.
As we all know, comparison-based sorting needs Ω(n log n) comparisons.
But if the structure you describe existed, it would give a sorting algorithm:
Enqueue every element one by one: O(n).
Remove the max (or min) element n times: O(n).
This means sorting could be done in O(n), which is IMPOSSIBLE for comparison-based sorting.
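The reduction can be made concrete. Assuming min()/max() remove the element they return, n enqueues followed by n removals of the minimum yield the input in sorted order. The sketch below uses java.util.PriorityQueue as the stand-in structure (PqSort is a hypothetical helper); because its add() and poll() each cost O(log n), the whole sort costs O(n log n). If both operations were O(1), the same loop would sort in O(n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper illustrating the reduction: n inserts followed by
// n remove-min calls return the input in ascending order.
class PqSort {
    static List<Integer> sort(List<Integer> input) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : input) {
            pq.add(x);          // n inserts, O(log n) each with a binary heap
        }
        List<Integer> out = new ArrayList<>(input.size());
        while (!pq.isEmpty()) {
            out.add(pq.poll()); // n remove-min calls, O(log n) each
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of(5, 4, 7, 1, 8, 3))); // prints [1, 3, 4, 5, 7, 8]
    }
}
```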
Assumptions:
that you only care about performance and not about space / memory / ...
that the index is a set, not a list (it will work for a list, but may need some extra love).
A solution:
You could keep a queue and a hash table side by side.
Example:
Let's say the arrival order is 5 4 7 1 8 3:
Queue -> 5 4 7 1 8 3
Hash table -> 1 3 4 5 7 8
Enqueue:
1) Take your object and insert it into the right bucket of the hash table. Min/Max will always be at the first and last index (see sorted hash tables).
2) Next, insert into your queue like normal.
3) You can / should link the two. One idea would be to use the hash table value as a pointer to the queue.
With a large enough hash table, both operations will be O(1).
Dequeue:
1) Pop the first element: O(1)
2) Remove the element from the hash table: O(1)
Min / Max:
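1) Look at your hash table. Depending on the language used, you could in theory find it by looking at the head or the tail of the table. This only works if the table keeps its keys in sorted order; with an ordinary hash function, bucket order has nothing to do with key order.
For a better explanation of sorted hash tables, see https://stackoverflow.com/questions/2007212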
Note:
I would like to note that there is no "normal" data structure that I know of that will do what you require. However, that does not mean it is not possible. If you attempt to implement this data structure, you will most likely have to build it for your needs and will not be able to use existing libraries. You may have to use a very low-level language like assembly to achieve this, but C or Java might work too if you're good with those languages.
Good luck
EDITED:
I did not explain sorted hash tables, so I added a link to another SO question that explains them.

Scala working with very large list

I have a List with about 1176^3 elements.
Doing something like
val x = list.length
takes hours.
When the list has 1271256 elements, it is fine: just a few seconds.
Does anyone have an idea how to speed it up?
List is possibly the wrong data structure for a length operation as it is O(n) - it takes longer to complete the longer the list is.
Vector is possibly a better data structure if you need to call length: it keeps track of its own size, and it supports random access in effectively constant time.
This, of course, does not mean that List is a poor structure to use, just in this case it might not be preferable.
To add to gpampara's answer, in cases like these you may actually be able to justify using an Array, since it has the lowest overhead per item stored and O(1) access to elements and length determination (since it's recorded in the array header itself).
Array has many down-sides, but I consider it justifiable when memory overhead is a primary consideration (and when a fixed-size collection whose size is known at the time of creation is feasible).
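To make the O(n) versus O(1) length difference concrete, here is a Java analogue (not Scala's actual implementation). Cons is a hypothetical immutable cons list, shaped like Scala's List: its cells only know their head and tail, so computing the length walks every cell. A Java array, like a Scala Array, stores its length, so reading it is O(1).

```java
// Hypothetical immutable cons list shaped like Scala's List: each cell only
// knows its head and its tail, so nothing records the total length.
final class Cons {
    final int head;
    final Cons tail; // null represents the empty list

    Cons(int head, Cons tail) {
        this.head = head;
        this.tail = tail;
    }

    // O(n): must visit every cell to count them.
    static int length(Cons list) {
        int n = 0;
        for (Cons c = list; c != null; c = c.tail) {
            n++;
        }
        return n;
    }

    // Builds the list 0, 1, ..., n - 1 by prepending from the back.
    static Cons range(int n) {
        Cons list = null;
        for (int i = n - 1; i >= 0; i--) {
            list = new Cons(i, list);
        }
        return list;
    }
}

class LengthDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        Cons list = Cons.range(n);
        int[] array = new int[n];
        System.out.println(Cons.length(list)); // walks 1,000,000 cells; prints 1000000
        System.out.println(array.length);      // reads a stored field; prints 1000000
    }
}
```

If you must stay with List, computing length once and reusing the value at least avoids paying for the O(n) walk repeatedly.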

Appropriate collection type for selecting a random element efficiently in Scala

For a project I am working on, I need to keep track of up to several thousand objects. The collection I choose needs to support insertion, selection, and deletion of random elements. My algorithm performs each of these operations several times, so I would like a collection that can do all these in constant time.
Is there such a collection? If not, what are some trade-offs with existing collections? I am using Scala 2.9.1.
EDIT: By "random", I mean mathematically/probabilistically random, i.e., I would like to select elements randomly from the collection using Random or some other appropriate generator.
Define "random". If you mean indexed, then there's no such collection. You can have insertion/deletion in constant time if you give up the "random element" requirement, i.e., if you accept non-constant lookup of the element to be deleted or of the insertion point. Or you can have constant lookup without constant insertion/deletion.
The collection that best approaches that requirement is the Vector, which provides O(log n) for these operations.
On the other hand, if you have the element which you'll be looking up or removing, then just pick a HashMap. It's not precisely constant time, but it is a fair approximation. Just make sure you have a good hash function.
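If element order does not matter, a common technique gives expected O(1) insertion, deletion and uniformly random selection: store the elements in an array buffer, keep a hash map from each element to its index, and delete by moving the last element into the hole. The Java sketch below is hypothetical (RandomBag is not a library class) and assumes distinct elements with well-behaved hashCode() and equals(); in Scala the same idea would pair an ArrayBuffer with a mutable HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical bag of distinct elements with expected O(1) add, remove and
// uniformly random pick. Order is not preserved: removal swaps the target
// with the last element so the list never has holes.
class RandomBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();
    private final Random random;

    RandomBag(Random random) { this.random = random; }

    boolean add(T x) {
        if (indexOf.containsKey(x)) return false;
        indexOf.put(x, items.size());
        items.add(x);
        return true;
    }

    boolean remove(T x) {
        Integer i = indexOf.remove(x);
        if (i == null) return false;
        T last = items.remove(items.size() - 1); // O(1): removes the tail
        if (i < items.size()) {                  // x was not the last element
            items.set(i, last);
            indexOf.put(last, i);
        }
        return true;
    }

    T pickRandom() {
        if (items.isEmpty()) throw new NoSuchElementException("bag is empty");
        return items.get(random.nextInt(items.size()));
    }

    int size() { return items.size(); }

    boolean contains(T x) { return indexOf.containsKey(x); }
}
```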
As a starting point, take a look at The Scala 2.8 Collections API, especially the Performance Characteristics section.

Seq for fast random access and fast growth in Scala

What would be the best Scala collection (in 2.8+), mutable or immutable, for the following scenario:
Sequentially ordered, so I can access items by position (a Seq)
Need to insert items frequently, so the collection must be able to grow without too much penalty
Random access, frequently need to remove and insert items at arbitrary indexes in the collection
Currently I seem to be getting good performance with the mutable ArrayBuffer, but is there anything better? Is there an immutable alternative that would do as well? Thanks in advance.
Mutable: ArrayBuffer
Immutable: Vector
If you insert items at random positions more than log(N)/N of the time that you access them, then you should probably use immutable.TreeSet as all operations are O(log(N)). If you mostly do accesses or add to the (far) end, ArrayBuffer and Vector work well.
Vector. IndSeq from scalaz should be even better.