scala seq.combinations returns out of order results - scala

scala docs say that combinations method should do the following
Iterates over combinations. A _combination_ of length n is a subsequence of the original sequence, with the elements taken in order.
https://www.scala-lang.org/api/2.13.3/scala/collection/immutable/List.html
but running
"aba".combinations(3).toList // return List("aab")
this is obviously out of order, it is expected to return List("aba"). As a side note, this is how it works in other languages, like python itertools.combinations, where we get List("aba") as expected. Am I misunderstanding how this should work, or is it a bug?
https://scastie.scala-lang.org/TdluwxaiQ6CGkckU4Uk3AA

Related

Behavior of scala fold in parallel collections

Let's run the following line of code several times:
Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
The results are quite interesting:
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res10: Int = 8
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res11: Int = 20
However clearly it should be like in its sequential version:
scala> Set(1,2,3,4,5,6,7).fold(0)(_ - _)
res15: Int = -28
I understand that operation - is non-associative on integers and that's the reason behind such behavior, but my question is quite simple: doesn't it mean that fold should not be parallelized in .par implementation of collections?
When you look at the standard library documentation, you see that fold is undeterministic here:
Folds the elements of this sequence using the specified associative binary operator.
The order in which operations are performed on elements is unspecified and may be nondeterministic.
As an alternative, there's foldLeft:
Applies a binary operator to a start value and all elements of this sequence, going left to right.
Applies a binary operator to a start value and all elements of this collection or iterator, going left to right.
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.
As Set is not an ordered collection, there's no canonical order in which the elements could be folded, so the standard library allows itself to be undeterministic even for foldLeft. If you would use an ordered sequence here, foldLeft would be deterministic in that case.
The scaladoc does say:
The order in which the elements are reduced is unspecified and may be nondeterministic.
So, as you stated, a binary operation applied in ParSet#fold that is not associative is not guaranteed to produce a deterministic result. The above text is warning is all you get.
Does that mean ParSet#fold (and cousins) should not be parallelized? Not exactly. If your binary operation is commutative and you don't care about non-determinism of side-effects (not that a fold should have any), then there isn't a problem. However, you are hit with the caveat of needing to tread carefully around parallel collections.
Whether or not it is correct is more of a matter of opinion. One could argue that if a method can result in accidental non-determinism, that it should not exist in a language or library. But the alternative is to clip out functionality so that ParSet is missing functionality that is present in most of the other collection implementations. You could use the same line of thinking to also suggest the removal of Stream#foreach to prevent people from accidentally triggering infinite loops on infinite streams, but should you?
It is useful to parallelize fold operation with high workloads, however, to guarantee a deterministic output from calling of collection.par.fold(z)(f), the following conditions must hold:
1- f(f(a,b),c) == f(a,f(b,c)) // Associativity
2- f(z,a) == f(a,z) == a , where z is the neutral element for f (like 0 for sum, and 1 for multiplication).
Fabian's answer suggests using foldLeft instead. Although this is deterministic, using .par with it won't really parallelize anything. because foldLeft is sequential by nature.

How to select the second smallest element from sorted list?

How can I select the second smallest element after that a list has been sorted?
With this code I get an error and I do not understand why.
object find_the_median {
val L = List(2,4,1,2,5,6,7,2)
L(2)
L.sorted(2) // FIXME returns an error
}
It's because sorted receives implicitly an Ordering argument, and when you do it like L.sorted(2) the typechecker thinks you want to pass 2 as an Ordering. So one way to do it in one line is:
L.sorted.apply(2)
or to avoid the apply pass the ordering explicitly:
L.sorted(implicitly[Ordering[Int]])(2)
which I admit is somewhat confussing so I think the best one is in two lines:
val sorted = L.sorted
sorted(2)
(You may also want to adhere to the Scala convention of naming variables with lowercase).

comparing an element with all elements in a list

I'm learning Scala now, and I have a scenario where I have to compare an element (say num) with all the elements in a list.
Assume,
val MyList = List(1, 2, 3, 4)
If num is equal to anyone the elements in the list, I need to return true. I know to do it recursively using the head and tail functions, but is there a simpler way to it (I think I'll be able to do it using foreach, but I'm not sure how to implement it exactly)?
There is number of possibilities:
val x = 3
MyList.contains(x)
!MyList.forall(y => y != x) // early exit, basically the same as .contains
If you plan to do it frequently, you may consider to convert your list to Set, cause every .contains lookup on list in worst case is proportional to number of elements, whereas on Set it is effectively constant
val mySet = MyList.toSet
mySet.contains(x)
or simply:
mySet(x)
A contains method is pretty standard for lists in any language. Scala's List has it too:
http://www.scala-lang.org/api/current/scala/collection/immutable/List.html
As others have answered, the contains method on the list will do exactly this, and it's the most understandable/performant way.
Looking at your closing comments though, you wouldn't be able to do it (in an elegant fashion) with foreach, since that returns Unit. Foreach "does" something for each element, but you don't get any result back. It's useful for logging/println statements, but it doesn't act as a transformation.
If you want to run a function on every element individually, you would use map, which returns a List of the results of applying the function. So assuming num = 3, then MyList.map(_ == num) would return List(false, false, true, false). Since you're looking for a single result, and not a list of results, then this is not what you're after.
In order to collapse a sequence of things into a single result, you would use a fold over the data. Folding involves a function that takes two arguments (the result so far, and the current thing in the list) and returns the new running result. So that this can work on the very first element, you also need to provide the initial value to use for the ongoing result (usually some sort of zero).
In your particular case, then, you want a Boolean answer at the end - "was an element found that was equal to num". So the running result would be "have I seen an element so far that was equal to num". Which means the initial value is false. And the function itself should return true if an element has already been seen, or if the current element is equal to num.
Putting this together, it would look like this:
MyList.foldLeft(false) { case (runningResult, listElem) =>
// return true if runningResult is true, or if listElem is the target number
runningResult || listElem == num
}
This doesn't have the nice aspect of stopping as soon as the target value has been found - and it's nowhere near as concise as calling MyList.contains. But as an instructional example, this is how you could implement this yourself from the primitive functional operations on a list.
List has a method for that:
val found = MyList.contains(num)

How to correctly get current loop count from a Iterator in scala

I am looping over the following lines from a csv file to parse them. I want to identify the first line since its the header. Whats the best way of doing this instead of making a var counter holder.
var counter = 0
for (line <- lines) {
println(CsvParser.parse(line, counter))
counter++
}
I know there is got to be a better way to do this, newbie to Scala.
Try zipWithIndex:
for (line <- lines.zipWithIndex) {
println(CsvParser.parse(line._1, line._2))
}
#tenshi suggested the following improvement with pattern matching:
for ((line, count) <- lines.zipWithIndex) {
println(CsvParser.parse(line, count))
}
I totally agree with the given answer, still that I've to point something important out and initially I planned to put in a simple comment.
But it would be quite long, so that, leave me set it as a variant answer.
It's prefectly true that zip* methods are helpful in order to create tables with lists, but they have the counterpart that they loop the lists in order to create it.
So that, a common recommendation is to sequence the actions required on the lists in a view, so that you combine all of them to be applied only producing a result will be required. Producing a result is considered when the returnable isn't an Iterable. So is foreach for instance.
Now, talking about the first answer, if you have lines to be the list of lines in a very big file (or even an enumeratee on it), zipWithIndex will go through all of 'em and produce a table (Iterable of tuples). Then the for-comprehension will go back again through the same amount of items.
Finally, you've impacted the running lenght by n, where n is the length of lines and added a memory footprint of m + n*16 (roughtly) where m is the lines' footprint.
Proposition
lines.view.zipWithIndex map Function.tupled(CsvParser.parse) foreach println
Some few words left (I promise), lines.view will create something like scala.collection.SeqView that will hold all further "mapping" function producing new Iterable, as are zipWithIndex and map.
Moreover, I think the expression is more elegant because it follows the reader and logical.
"For lines, create a view that will zip each item with its index, the result as to be mapped on the result of the parser which must be printed".
HTH.

How to delete elements from a transformed collection using a predicate?

If I have an ArrayList<Double> dblList and a Predicate<Double> IS_EVEN I am able to remove all even elements from dblList using:
Collections2.filter(dblList, IS_EVEN).clear()
if dblList however is a result of a transformation like
dblList = Lists.transform(intList, TO_DOUBLE)
this does not work any more as the transformed list is immutable :-)
Any solution?
Lists.transform() accepts a List and helpfully returns a result that is RandomAccess list. Iterables.transform() only accepts an Iterable, and the result is not RandomAccess. Finally, Iterables.removeIf (and as far as I see, this is the only one in Iterables) has an optimization in case that the given argument is RandomAccess, the point of which is to make the algorithm linear instead of quadratic, e.g. think what would happen if you had a big ArrayList (and not an ArrayDeque - that should be more popular) and kept removing elements from its start till its empty.
But the optimization depends not on iterator remove(), but on List.set(), which is cannot be possibly supported in a transformed list. If this were to be fixed, we would need another marker interface, to denote that "the optional set() actually works".
So the options you have are:
Call Iterables.removeIf() version, and run a quadratic algorithm (it won't matter if your list is small or you remove few elements)
Copy the List into another List that supports all optional operations, then call Iterables.removeIf().
The following approach should work, though I haven't tried it yet.
Collection<Double> dblCollection =
Collections.checkedCollection(dblList, Double.class);
Collections2.filter(dblCollection, IS_EVEN).clear();
The checkCollection() method generates a view of the list that doesn't implement List. [It would be cleaner, but more verbose, to create a ForwardingCollection instead.] Then Collections2.filter() won't call the unsupported set() method.
The library code could be made more robust. Iterables.removeIf() could generate a composed Predicate, as Michael D suggested, when passed a transformed list. However, we previously decided not to complicate the code by adding special-case logic of that sort.
Maybe:
Collection<Double> odds = Collections2.filter(dblList, Predicates.not(IS_EVEN));
or
dblList = Lists.newArrayList(Lists.transform(intList, TO_DOUBLE));
Collections2.filter(dblList, IS_EVEN).clear();
As long as you have no need for the intermediate collection, then you can just use Predicates.compose() to create a predicate that first transforms the item, then evaluates a predicate on the transformed item.
For example, suppose I have a List<Double> from which I want to remove all items where the Integer part is even. I already have a Function<Double,Integer> that gives me the Integer part, and a Predicate<Integer> that tells me if it is even.
I can use these to get a new predicate, INTEGER_PART_IS_EVEN
Predicate<Double> INTEGER_PART_IS_EVEN = Predicates.compose(IS_EVEN, DOUBLE_TO_INTEGER);
Collections2.filter(dblList, INTEGER_PART_IS_EVEN).clear();
After some tries, I think I've found it :)
final ArrayList<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
Iterables.removeIf(Iterables.transform(ints, intoDouble()), even());
System.out.println(ints);
[1,3,5]
I don't have a solution, instead I found some kind of a problem with Iterables.removeIf() in combination with Lists.TransformingRandomAccessList.
The transformed list implements RandomAccess, thus Iterables.removeIf() delegates to Iterables.removeIfFromRandomAccessList() which depends on an unsupported List.set() operation.
Calling Iterators.removeIf() however would be successful, as the remove() operation IS supported by Lists.TransformingRandomAccessList.
see: Iterables: 147
Conclusion: instanceof RandomAccess does not guarantee List.set().
Addition:
In special situations calling removeIfFromRandomAccessList() even works:
if and only if the elements to erase form a compact group at the tail of the List or all elements are covered by the Predicate.