In Python, there is itertools.cycle, which takes an iterable and makes an iterable iterator that repeatedly yields the contents from the source.
I would like to replicate this behavior in Swift.
A candidate for replicating this behavior would be the standard library's repeatElement(_:count:): repeatElement(seq, count: 5).flatMap({$0}) creates an array of the elements of seq five times. But this does not meet my requirements: it only repeats seq a finite number of times, and it allocates an Array of length 5 * seq.length when only a cache the length of seq is actually needed.
So the question is: how can I create an infinite Sequence by repeating the elements of a source Sequence? The solution should not have a space cost of more than O(n). (An O(1) cost would be impossible to guarantee in Swift, as a Sequence makes no guarantee that it can be iterated multiple times.)
What's wrong with manually implementing a Sequence backed by array access modulo the array's length? With a bit of care your index need never overflow, and you can cycle on for as long as you like.
Note that the warning in the API docs reads more like a friendly reminder that reusability isn't a mandatory part of the Sequence contract; it doesn't prevent your particular implementation from being reusable across multiple loops.
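A minimal sketch of that approach in Swift (the Cycle name and the eager Array copy are this sketch's own choices, not a standard API; buffering the source is exactly the O(n) space the question allows):

struct Cycle<Element>: Sequence, IteratorProtocol {
    private let buffer: [Element]
    private var index = 0

    init<S: Sequence>(_ source: S) where S.Element == Element {
        buffer = Array(source)  // one pass over the source; O(n) space
    }

    mutating func next() -> Element? {
        guard !buffer.isEmpty else { return nil }
        defer { index = (index + 1) % buffer.count }  // modulo access: the index never overflows
        return buffer[index]
    }
}

let firstSeven = Array(Cycle([1, 2, 3]).prefix(7))  // [1, 2, 3, 1, 2, 3, 1]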
I've often heard that hash functions can't be reversed, with the analogy: take a number and add all the digits together, e.g. 412 => 7, but you can't get your original value (412) back from 7. While this does make sense, wouldn't it also imply that there are multiple inputs that give the same output?
A hash function is a one-way cryptographic algorithm that maps an input of any size to an output of a fixed length of bits. The resulting output is known as a hash digest, hash value, or hash code.
So answering your question: yes, there must be multiple inputs that give the same output. There are infinitely many possible inputs but only finitely many fixed-length outputs, so collisions are unavoidable by the pigeonhole principle; a good cryptographic hash function merely makes such collisions computationally infeasible to find. Where the analogy (take a number and add all the digits together) really falls short is a different part of the definition: with a real hash function, even a tiny change to the pattern of characters/numbers produces a completely different hash.
Even if something tiny changes in the input (you capitalize a letter, or swap in an exclamation mark where there was a period), the result is an entirely new hash value. That's the whole idea: no matter how big or small the change, a completely different hash value is created.
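To see this avalanche behavior concretely, here is a small Scala sketch using the JDK's MessageDigest (the helper name sha256Hex is this example's own):

import java.security.MessageDigest

// Hex-encode the SHA-256 digest of a string.
def sha256Hex(s: String): String =
  MessageDigest.getInstance("SHA-256")
    .digest(s.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString

println(sha256Hex("hello world."))  // ends with a period
println(sha256Hex("hello world!"))  // one character changed: an entirely different digest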
I've been wondering why the signatures for fold and fold[Left|Right] are different (apart from the name, of course).
Subtle and very important difference:
fold can be executed in parallel, because the seed element can be passed to an arbitrary number of workers. In other words, one invocation does not depend on the result of the previous invocation.
On the other hand, foldLeft and foldRight must be executed sequentially, because for the accumulator of type B to be available for the second element, it must first be computed from the first element of the sequence.
Less important and more obvious difference:
Note that the seed argument to fold must match the type of the elements in the collection (more precisely, its type must be a supertype of the element type). foldLeft and foldRight don't have this restriction; they always return a value of the same type as the seed.
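A small sketch of both points, with the Scala 2.x standard library signatures for reference:

// def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
// def foldLeft[B](z: B)(op: (B, A) => B): B

val xs = List(1, 2, 3, 4)

// fold: the seed shares the element type (A1 >: A), so independent chunks
// can be folded separately and combined, e.g. by a parallel collection.
val sum = xs.fold(0)(_ + _)                        // 10

// foldLeft: the accumulator may be any type B, but step n needs the result
// of step n - 1, so evaluation is inherently sequential, left to right.
val digits = xs.foldLeft("")((acc, n) => acc + n)  // "1234"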
So I have this large sequence, with a lot of repeats, and I need to convert it into a sequence with no repeats. What I have been doing so far is converting the sequence to a set and then back to a sequence. Conversion to the set gets rid of the duplicates, and then I convert back into a sequence. However, this is very slow; I'm given to understand that when converting to a set, every pair of elements is compared, which makes the complexity O(n^2), and that is not acceptable. Since I have access to a computer with thousands of cores (through my university), I was wondering whether making things parallel would help.
Initially I thought I'd use Scala Futures to parallelize the code in the following manner: group the elements of the sequence into smaller subgroups by their hash code. That way, I have subcollections of the original sequence such that no element appears in two different subcollections and every element is covered. Now I convert these smaller subcollections to sets, then back to sequences, and concatenate them. This way I'm guaranteed to get a sequence with no repeats.
But I was wondering if applying the toSet method on a parallel sequence already does this. I thought I'd test this out in the Scala interpreter, but I got roughly the same time for the conversion to a parallel set as for the conversion to the non-parallel set.
I was hoping someone could tell me whether conversion to parallel sets works this way or not. I'd be much obliged. Thanks.
EDIT: Is performing a toSet on a parallel collection faster than performing toSet on a non-parallel collection?
.distinct on some of the Scala collection types is O(n) (as of Scala 2.11). It uses a hash set to record what has already been seen, and with this it linearly builds up the result:
def distinct: Repr = {
  val b = newBuilder
  val seen = mutable.HashSet[A]()
  for (x <- this) {
    if (!seen(x)) {  // O(1) expected membership test
      b += x         // keep only the first occurrence
      seen += x
    }
  }
  b.result()
}
(newBuilder produces a mutable builder, something like a mutable list, that assembles the result collection.)
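So for the use case above this is a one-liner. Note also that the toSet round trip is effectively O(n) as well (hash-based, not pairwise comparison); its real drawback is that it loses the original order:

val xs = Seq(3, 1, 3, 2, 1, 2, 3)

xs.distinct     // Seq(3, 1, 2): O(n) expected, keeps first-occurrence order
xs.toSet.toSeq  // same elements, but the original order is not preserved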
Just thinking outside the box: would it be possible to prevent the creation of these duplicates in the first place, instead of trying to get rid of them afterwards?
Suppose I want to groupBy on an iterator; the compiler complains that "value groupBy is not a member of Iterator[Int]". One way would be to convert the iterator to a list, which I want to avoid. I want to do the groupBy such that the input is Iterator[A] and the output is Map[B, Iterator[A]], so that a part of the iterator is loaded only when its elements are accessed, without loading the whole list into memory. I also know the possible set of keys, so I can say whether a particular key exists.
def groupBy[A, B](iter: Iterator[A], f: A => B): Map[B, Iterator[A]] = {
  ???
}
One possibility is to convert the Iterator to a view and then call groupBy:
iter.toTraversable.view.groupBy(_.whatever)
I don't think this is doable without storing results in memory (and in this case switching to a list would be much easier). Iterator implies that you can make only one pass over the whole collection.
For instance, let's say you have the sequence 1 2 3 4 5 6 and you want to groupBy odd and even numbers:
groupBy(it, v => v % 2 == 0)
Then you could query the result with either true or false to get an iterator. The problem: should you loop over one of those two iterators to the end, you couldn't do the same for the other one (as you cannot reset an iterator in Scala).
This would be doable if the elements were sorted according to the same rule you're using in groupBy.
As said in other responses, the only way to achieve a lazy groupBy on an Iterator is to buffer elements internally. The worst-case memory cost is O(n). If you know in advance that the keys are well distributed in your iterator, the buffer can be a viable solution.
The solution is relatively complex, but a good starting point is a few methods from the Iterator trait in the Scala source code:
The partition method, which uses the buffered method to keep the head value in memory, plus one internal queue (lookahead) for each of the two produced iterators.
The span method, which also uses buffered, this time with a single queue for the leading iterator.
The duplicate method. Perhaps less interesting, but we can again observe another use of a queue to store the gap between the two produced iterators.
In the groupBy case, we will have a variable number of produced iterators instead of two in the above examples. If requested, I can try to write this method.
Note that you have to know the list of keys in advance. Otherwise, you will need to traverse (and buffer) the entire iterator to collect the different keys to build your Map.
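A minimal sketch of that buffering approach, assuming the keys are known in advance and every element maps to one of them (the groupBy signature and the per-key queues are this sketch's own design, not standard API):

import scala.collection.mutable

def groupBy[A, B](iter: Iterator[A], f: A => B, keys: Set[B]): Map[B, Iterator[A]] = {
  // One buffer per key: elements pulled from the source while looking for
  // some other key are parked here, so worst-case memory is O(n).
  val buffers: Map[B, mutable.Queue[A]] =
    keys.map(k => k -> mutable.Queue.empty[A]).toMap

  def iteratorFor(key: B): Iterator[A] = new Iterator[A] {
    private val buffer = buffers(key)

    // Pull from the shared source until an element for this key shows up.
    private def fill(): Unit =
      while (buffer.isEmpty && iter.hasNext) {
        val a = iter.next()
        buffers(f(a)).enqueue(a)
      }

    def hasNext: Boolean = { fill(); buffer.nonEmpty }
    def next(): A = { fill(); buffer.dequeue() }
  }

  keys.map(k => k -> iteratorFor(k)).toMap
}

For example, grouping numbers by parity:

val grouped = groupBy(Iterator(1, 2, 3, 4, 5, 6), (n: Int) => n % 2 == 0, Set(true, false))
grouped(false).take(2).toList  // List(1, 3); the 2 seen along the way is parked for grouped(true)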
I am comparing a number of different methods for organizing the nodes at the "frontier" in Dijkstra's single-source shortest path algorithm. One of the implementations that I am playing around with uses q: scala.collection.mutable.Queue.
Essentially, each time I add a node to q, I sort q. This method, as expected, takes significantly longer than using scala.collection.mutable.PriorityQueue or the MinHeap that I implemented. My question is: what kind of sort does Queue use when I call q.sorted? I am specifically interested in the time complexity of the sorted implementation.
I have tried looking at the API (http://www.scala-lang.org/api/2.10.2/index.html#scala.collection.mutable.Queue) and code (https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/mutable/Queue.scala#L1) but haven't been able to track this down.
Thank you in advance for your help.
Queue inherits the sorted method from SeqLike. And as you can see there, it copies the elements into a new array, sorts that array via java.util.Arrays.sort, and then builds a new collection of the original type. For an array of objects, Arrays.sort is a stable merge sort (TimSort), so each q.sorted call is O(n log n).
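Sorting on every insertion therefore makes each insertion effectively O(n log n), versus O(log n) with a heap. A sketch of the frontier using the standard mutable.PriorityQueue (the Node type here is made up for the example):

import scala.collection.mutable

case class Node(id: Int, dist: Int)

// PriorityQueue dequeues its maximum, so reverse the by-distance ordering
// to get min-first behavior.
val frontier = mutable.PriorityQueue.empty[Node](Ordering.by((n: Node) => n.dist).reverse)

frontier.enqueue(Node(1, 7))  // O(log n) per insertion
frontier.enqueue(Node(2, 3))
frontier.enqueue(Node(3, 5))
frontier.dequeue()            // Node(2, 3): the smallest distance comes out first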