I need to sort an array that holds numeric elements. However, I'm not sure if it would be efficient to use the sort method since I don't know time complexity of this method. Apple document doesn't mention it either. So I would like to know what the time complexity of this method is and what kind of sorting algorithm it uses.
Swift uses Introsort. Looking at the source code we see that the chosen pivot is the first element.
link to wikipedia: https://en.wikipedia.org/wiki/Introsort
from the wikipedia page:
(...), one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.
Looking at that wikipedia page, it is entirely predictable, given the implementation choice, that Swift's sorting performance is worst for sorted inputs. Its average performance is O(n log n) which happens to also be its worst case.
Related
I have a table in my DB called text. It will have something like this is an example of lite coin. I want to query this for litecoin and things that are close (like lite coin). Is there some way to do this generically as I will have multiple queries. Maybe something with a max Levenshtein distance?
There is a core extension to PostgreSQL which implements the Levenshtein distance. For strings of very unequal length, as in your example, the distance will of necessity be large. So you would have to implement some normalization method, unless all phrases being searched within are the same length.
I don't think Levenshtein is indexable. You could instead look into trigram distance, which is indexable.
+1 on the trigram suggestion. Trigrams in Postgres are excellent and, for sure, indexible. Depending on the index option you choose (GIN or GiST), you get access to different operators. If I remember correctly off the top of my head, GiST gives you distance tolerances for the words, and lets you search for them in order. You can specify the number of words expected between two searches words, and more. (If I'm remembering correctly.) Both GIN and GiST are worth experimenting with.
Levenshtein compares two specific strings, so it doesn't lend itself to indexing. What would you index? The comparison string is unknown in advance. You could index every string by every string in a column and, apart from the O(aaaargh!) complexity, you still might not have unything like your search string in the index.
Tip: If you must use Levenshtein, and it is pretty great where it's useful, you can eliminate many rows from your comparison cheaply. If you've got a 10 character search string and want strings only with a distance of 2, you can eliminate shorter and longer strings from consideration without fear of losing any matches.
You might find that you want to apply Levenshtein (or Jaccard, etc.) to possible matches found by the trigrams. But, honestly, Levenshtein is, by nature, biased towards strings in the same order. That's okay for lite coin/light coin/litecoin, but not helpful when the words can be in any order, like with first and last name, much address data, and many, many phrase-like searches.
The other thing to consider, depending on your range of queries, are full text searches with tsvectors. These are also indexable, and also support a range of operators.
This is the controversial line from Cracking the Coding Interview on hash tables.
Another common implementation(besides linked-list) for a hash table is to use a BST as the underlying data structure.
I know this question has been asked before... it's so confusing because everyone is giving two different answers. For example
Why implement a Hashtable with a Binary Search Tree?
The highest voted answer in this post says that the quoted statement above is saying talking about a hash table implementation using a binary search tree, without an underlying array. I understood that since each element inserted gets a hash value (an integer), the elements form a total order (every pair can be compared with < and >). Therefore, we can simply use a binary search tree to hold the elements of the hash table.
On the other hand, others say
Hash table - implementing with Binary Search Tree
the book is saying that we should handle collisions with a binary search tree. So there is an underlying array and when collisions because multiple elements get the same hash value and get placed in the same slot in the array, that's where the BST comes in.
So each slot in the array will be a pointer to a BST, which holds elements with the same hash value.
I'm leaning towards the second post's argument because the first post does not really explain how such implementation of a hash table can handle collisions. And I don't think it can achieve expected O(1) insert/delete/lookup time.
But for the second post, if we have multiple elements that get the same hash value and placed in a BST, I'm not sure how these elements are ordered (how can they be compared against each other?)
Please, help me put an end to this question once and for all!
the first post does not really explain how such implementation of a hash table can handle collisions
With a BST, you can use a hashing function that would produce no duplicate keys so there would be no collisions. The advantage here isn't speed but to reduce memory consumption, and to have better worst-case guarantees. If you're writing software for a critical real-time system, you might not be able to tolerate a O(n) resizing of your hash table.
if we have multiple elements that get the same hash value and placed in a BST, I'm not sure how these elements are ordered (how can they be compared against each other?)
Rehash with another function.
In the end, it all depends on what your data structure is used for (Is memory vs. speed more important? Is amortized performance vs worst-case performance more important? etc.)
What is the right data structure for a queue that support Enque, Dequeue, Peak, Min, and Max operation and perform all these operations in O(1) time.
The most obvious data structure is linked list but Min, Max operations would be O(n). Priority Queue is another perfect choice but Enqueue, Dequeue should works in the normal fashion of a Queue. (FIFO)
And another option that comes to mind is a Heap, but I can not quite figure out how one can design a queue with Min, Max operation using Heaps.
Any help is much appreciated.
The data structure you seek cannot be designed, if min() and max() actually change the structure. If min() and max() are similar to peek(), and provide read-only access, then you should follow the steps in this question, adding another deque similar to the one used for min() operations for use in max() operation. The rest of this answer assumes that min() and max() actually remove the corresponding elements.
Since you require enqueue() and dequeue(), elements must be added and removed by order of arrival (FIFO). A simple double-ended queue (either linked or using a circular vector) would provide this in O(1).
But the elements to be added could change the current min() and max(); however, when removed, the old min() and max() values should be restored... unless they were removed in the interim. This restriction forces you to keep elements sorted somehow. Any sorting structure (min-heap, max-heap, balanced binary tree, ...) will require at least O(log n) to find the position of a new arrival.
Your best bet is to pair a balanced binary tree (for min() and max()) with a doubly-linked list. Your tree nodes would store a set of pointers to the list nodes, sorted by whatever key you use in min() and max(). In Java:
// N your node class; can return K, comparable, used for min() and max()
LinkedList<N> list; // sorted by arrival
TreeMap<K,HashMap<N>> tree; // sorted by K
on enque(), you would add a new node to the end of list, and add that same node, by its key, to the HashMap in its node in tree. O(log n).
on dequeue(), you would remove the node from the start of list, and from its HashMap in its node in tree. O(log n).
on min(), you would look for the 1st element in the tree. O(1). If you need to remove it, you have the pointer to the linked list, so O(1) on that side; but O(log n) to re-balance the tree if it was the last element with that specific K.
on max(), the same logic applies; except that you would be looking for the last element in the tree. So O(log n).
on peek(), looking at but not extracting the 1st element in the queue would be O(1).
You can simplify this (by removing the HashMap) if you know that all keys will be unique. However, this does not impact asymptotic costs: they would all remain the same.
In practice, the difference between O(log n) and O(1) is so low that the default map implementation in C++'s STL is O(log n)-based (Tree instead of Hash).
Any data structure that can retrieve Min or Max in O(1) time needs to spend at least O(log n) on every Insert and Remove to maintain elements in partially sorted order. The data structures that do achieve this are called priority queues.
The basic priority queue supports Insert, Max, and RemoveMax. There are a number of ways to build them, but binary heaps work best.
Supporting all of Insert, Min, RemoveMin, Max, and RemoveMax with a single priority queue is more complex. A way to do this with a single data structure, adapted from a binary heap, is described in the paper:
Atkinson, Michael D., et al. "Min-max heaps and generalized priority queues." Communications of the ACM 29.10 (1986): 996-1000.
It is fast and memory-efficient, but requires a good amount of care to implement correctly.
This structure DOES NOT exist!
There is a simple way to approve this conclusion.
As we all know,the complexity of sorting problem is O(nlogn).
But if the structure you said exists,there will be a solution for sorting:
Enque every element one by one costs O(n)
Dequeue every max(or min) element one by one costs O(n)
which means the sorting problem can be solved by O(n).But it is IMPOSSIBLE.
Assumptions:
that you only care about performance and not about space / memory / ...
A solution:
That the index is a set, not a list (will work for list, but may need some extra love)
You could do a queue and a hash table side by side.
Example:
Lets say the order is 5 4 7 1 8 3
Queue -> 547813
Hash table -> 134578
Enqueue:
1) Take your object, and insert into the hash table in the right bucket Min / Max will always be the first and last index. (see sorted hash tables)
2) Next, insert into your queue like normal.
3) You can / should link the two. One idea would be to use the hash table value as a pointer to the queue.
Both operations with large hash table will be O(1)
Dequeue:
1) Pop the fist element O(1)
2) remove element from hash table O(1)
Min / Max:
1) Look at your hash table. Depending on the language used, you could in theory find it by looking at the head of the table, or the tail of the table.
For a better explanation of sorted hash tables, https://stackoverflow.com/questions/2007212
Note:
I would like to note, that there is no "normal" data structure that will do what you are requiring that I know of. However, that does not mean it is not possible. If you are going to attempt to implement the data structure, most likely you will have to do it for your needs and will not be able to use current libraries available. You may have to look at using a very low level language like assembly in order to achieve this, but maybe C or Java might be able to if your good with those languages.
Good luck
EDITED:
I did not explain sorted hash tables, so added a link to another SO to explain them.
I have a List with about 1176^3 positions.
Making smth like
val x = list.length
takes hours ..
When in list is 1271256 positions is ok, just few seconds.
Any one have idea how to speed up it ?
List is possibly the wrong data structure for a length operation as it is O(n) - it takes longer to complete the longer the list is.
Vector is possibly a better data structure to use if you are needing to invoke length as its storage supports random access in a finite time.
This, of course, does not mean that List is a poor structure to use, just in this case it might not be preferable.
To add to gpampara's answer, in cases like these you may actually be able to justify using an Array, since it has the lowest overhead per item stored and O(1) access to elements and length determination (since it's recorded in the array header itself).
Array has many down-sides, but I consider it justifiable when memory overhead is a primary consideration (and when a fixed-size collection whose size is known at the time of creation is feasible).
For a project I am working on, I need to keep track of up to several thousand objects. The collection I choose needs to support insertion, selection, and deletion of random elements. My algorithm performs each of these operations several times, so I would like a collection that can do all these in constant time.
Is there such a collection? If not, what are some trade-offs with existing collections? I am using Scala 2.9.1.
EDIT: By "random", I mean mathematically/probabilistically random, i.e., I would like to select elements randomly from the collection using Random or some other appropriate generator.
Define "random". If you mean indexed, then there's no such collection. You can have insertion/deletion in constant time if you give up the "random element" requirement -- ie, you have have non-constant lookup of the element which will be deleted or which will be the point of insertion. Or you can have constant lookup without constant insertion/deletion.
The collection that best approaches that requirement is the Vector, which provides O(log n) for these operations.
On the other hand, if you have the element which you'll be looking up or removing, then just pick a HashMap. It's not precisely constant time, but it is a fair approximation. Just make sure you have a good hash function.
As a starting point, take a look at The Scala 2.8 Collections API especially at Performance Characteristics.