Accessing previous output while operator chaining in Scala

How can I access the output of the previous operation in a chain in order to perform the next operation? For example:
scala> List(1,4,3,4,4,5,6,7)
res0: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> res0.removeDuplicates.slice(0, ???.size -2)
In the line above, I need to perform a slice operation after removing duplicates. How can I access the output of .removeDuplicates so that I can use its size in the slice operation?
I need to do this in a single step, not in multiple steps like:
scala> res0.removeDuplicates
res1: List[Int] = List(1, 4, 3, 5, 6, 7)
scala> res1.slice(0, res1.size -2)
res2: List[Int] = List(1, 4, 3, 5)
I want to access intermediate results in the final operation; removeDuplicates is just an example.
list.op1().op2().op3().finalop() Here I want to access the output of op1, op2 and op3 in finalop.

Wrapping it in an Option may be one option (no pun intended):
val finalResult = Some(foo).map { foo =>
foo.op1(foo.stuff)
}.map { foo =>
foo.op2(foo.stuff)
}.map { foo =>
foo.op3(foo.stuff)
}.get.finalOp
You can make the wrapping part implicit to make it a little nicer:
object Tapper {
implicit class Tapped[T](val v: T) extends AnyVal {
def tap[R](f: T => R) = f(v)
}
}
import Tapper._
val finalResult = foo
.tap(f => f.op1(f.stuff))
.tap(f => f.op2(f.stuff))
.tap(f => f.finalOp(f.stuff))
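On Scala 2.13 or later, the standard library's scala.util.chaining offers a pipe method that behaves like the hand-written tap above, so the helper may not be needed. The sketch below applies it to the list from the question. It assumes distinct as a stand-in for removeDuplicates, and the object and value names are made up for illustration.

```scala
import scala.util.chaining._

object PipeExample {
  val xs: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)

  // pipe hands the intermediate result to the lambda as `d`,
  // so its size is available inside the same chained expression.
  val result: List[Int] =
    xs.distinct.pipe(d => d.slice(0, d.size - 2))

  def main(args: Array[String]): Unit =
    println(result) // List(1, 4, 3, 5)
}
```

Each further pipe call in the chain sees the value produced by the previous step, which is the "previous output" the question asks for.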

With a for comprehension it is possible to compose operations in a quite readable way, with the ability to access intermediate results:
val res = for {
ls1 <- Option(list.op1)
ls2 = ls1.op2() // Possible to access list, ls1
ls3 = ls2.op3() // Possible to access list, ls1, ls2
} yield ls3.finalOp() // Possible to access list, ls1, ls2, ls3
For example:
scala> val ls = List(1,1,2,2,3,3,4,4)
ls: List[Int] = List(1, 1, 2, 2, 3, 3, 4, 4)
scala> :paste
// Entering paste mode (ctrl-D to finish)
for {
ls1 <- Option(ls.map(_ * 2))
ls2 = ls1.map(_ + ls1.size)
ls3 = ls2.filter(_ < ls1.size + ls2.size)
} yield ls3.sum
// Exiting paste mode, now interpreting.
res15: Option[Int] = Some(72)

You will not need to know the length if you use dropRight:
scala> val a = List(1,4,3,4,4,5,6,7)
a: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> a.dropRight(2)
res0: List[Int] = List(1, 4, 3, 4, 4, 5)
So do this: res0.removeDuplicates.dropRight(2)

If you really need it in one function, you can write a custom foldLeft, something like this:
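import scala.collection.mutable.HashSet
var count = 0
val found = HashSet[Int]()
res0.foldLeft(List[Int]()) { (z, i) =>
  // keep only the first 4 distinct elements, in order of appearance
  if (!found.contains(i) && count < 4) {
    found += i
    count += 1
    z :+ i
  } else z // every branch must return the accumulator
}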
However I don't really see the problem in chaining calls like in res0.removeDuplicates.slice. One benefit of functional programming is that our compiler can optimize in situations like this where we just want a certain behavior and don't want to specify the implementation.

You want to process some data through a series of transformations: someData -> op1 -> op2 -> op3 -> finalOp. However, inside op3, you would like to have access to intermediate results from the processing done in op1. The key here is to pass to the next function in the processing chain all the information that will be required downstream.
Let's say that your input is xs: Seq[String] and op1 is of type (xs: Seq[String]) => Seq[String]. You want to modify op1 to return case class ResultWrapper(originalInputLength: Int, deduplicatedItems: Seq[String], somethingNeededInOp5: SomeType). If all of your ops pass along what the other ops need down the line, you will get what you need. It's not very elegant, because there is coupling between your ops: the upstream needs to save the info that the downstream needs. They are not really "different operations" any more at this point.
One thing you can do is to use a Map[A,B] as your "result wrapper". This way, there is less coupling between ops, but less type safety as well.
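A minimal sketch of the result-wrapper idea, assuming a trimmed-down ResultWrapper with only two fields. The op bodies here are invented purely for illustration and are not taken from the question.

```scala
object WrapperExample {
  // Carries op1's output together with the extra information
  // that a later step will need (here: the original input length).
  final case class ResultWrapper(originalInputLength: Int,
                                 deduplicatedItems: Seq[String])

  def op1(xs: Seq[String]): ResultWrapper =
    ResultWrapper(xs.length, xs.distinct)

  // op2 transforms the items but passes the saved length along untouched.
  def op2(w: ResultWrapper): ResultWrapper =
    w.copy(deduplicatedItems = w.deduplicatedItems.map(_.toUpperCase))

  // finalOp can use information that was only known before op1 ran.
  def finalOp(w: ResultWrapper): String = {
    val removed = w.originalInputLength - w.deduplicatedItems.length
    s"${w.deduplicatedItems.mkString(",")} ($removed duplicates removed)"
  }

  val out: String = finalOp(op2(op1(Seq("a", "b", "a", "c"))))
  // out == "A,B,C (1 duplicates removed)"
}
```

The coupling mentioned above is visible here: op1 has to know that finalOp will want the original length.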

Related

apply/get methods in Scala

If we go by the definition in "Programming in Scala" book:
When you apply parentheses surrounding one or more values to a
variable, Scala will transform the code into an invocation of a method
named apply on that variable
Then what about accessing the elements of an array? E.g., is x(0) transformed to x.apply(0)? (Let's assume that x is an array.) I tried to execute the above line, and it threw an error. I also tried x.get(0), which threw an error as well.
Can anyone please help?
() implies apply().
Array example:
scala> val data = Array(1, 1, 2, 3, 5, 8)
data: Array[Int] = Array(1, 1, 2, 3, 5, 8)
scala> data.apply(0)
res0: Int = 1
scala> data(0)
res1: Int = 1
Not related, but a safer alternative is to use lift:
scala> data.lift(0)
res4: Option[Int] = Some(1)
scala> data.lift(100)
res5: Option[Int] = None
Note: scala.Array can be mutated:
scala> data(0) = 100
scala> data
res7: Array[Int] = Array(100, 1, 2, 3, 5, 8)
Here you cannot use apply; think of apply as a getter, not a mutator:
scala> data.apply(0) = 100
<console>:13: error: missing argument list for method apply in class Array
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
data.apply(0) = 100
^
Use .update if you want to mutate:
scala> data.update(0, 200)
scala> data
res11: Array[Int] = Array(200, 1, 2, 3, 5, 8)
User-defined apply method:
scala> object Test {
|
| case class User(name: String, password: String)
|
| object User {
| def apply(): User = User("updupd", "password")
| }
|
| }
defined object Test
scala> Test.User()
res2: Test.User = User(updupd,password)
If you add an apply method to an object, you can apply that object (like you can apply functions).
The way to do that is to apply the object as if it were a function, directly with (), without a "dot".
val array:Array[Int] = Array(1,2,3,4)
array(0) == array.apply(0)
For
x(1)=200
which you mention in the comment, the answer is different. It also gets translated to a method call, but not to apply; instead it's
x.update(1, 200)
Just like apply, this will work with any type which defines a suitable update method.
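To illustrate that both rewrites work for user-defined types, here is a small sketch. Grid is a hypothetical class invented for this example; only the apply/update desugaring itself comes from the answers above.

```scala
object ApplyUpdateExample {
  // A hypothetical 2D grid backed by a flat array.
  final class Grid(width: Int, height: Int) {
    private val cells = Array.fill(width * height)(0)

    // g(x, y) is rewritten by the compiler to g.apply(x, y)
    def apply(x: Int, y: Int): Int = cells(y * width + x)

    // g(x, y) = v is rewritten by the compiler to g.update(x, y, v)
    def update(x: Int, y: Int, value: Int): Unit =
      cells(y * width + x) = value
  }

  val g = new Grid(3, 2)
  g(1, 1) = 42          // calls g.update(1, 1, 42)
  val read: Int = g(1, 1) // calls g.apply(1, 1)
}
```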

Scala method to side effect on map and return it

What is the best way to apply a function to each element of a Map and at the end return the same Map, unchanged, so that it can be used in further operations?
I'd like to avoid:
myMap.map(el => {
effectfullFn(el)
el
})
to achieve syntax like this:
myMap
.mapEffectOnKV(effectfullFn)
.foreach(println)
map is not what I'm looking for, because I have to specify what comes out of the map (as in the first code snippet), and I don't want to do that.
I want a special operation that knows/assumes that the map elements should be returned without change after the side-effect function has been executed.
In fact, this would be so useful to me, I'd like to have it for Map, Array, List, Seq, Iterable... The general idea is to peek at the elements to do something, then automatically return these elements.
The real case I'm working on looks like this:
calculateStatistics(trainingData, indexMapLoaders)
.superMap { (featureShardId, shardStats) =>
val outputDir = summarizationOutputDir + "/" + featureShardId
val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
}
Once I have calculated the statistics for each shard, I want to append the side effect of saving them to disk, and then just return those statistics, without having to create a val and having that val's name be the last statement in the function, e.g.:
val stats = calculateStatistics(trainingData, indexMapLoaders)
stats.foreach { (featureShardId, shardStats) =>
val outputDir = summarizationOutputDir + "/" + featureShardId
val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
}
stats
It's probably not very hard to implement, but I was wondering if there was something in Scala already for that.
A function cannot be effectful by definition, so I wouldn't expect anything convenient in the Scala standard library. However, you can write a wrapper:
def tap[T](effect: T => Unit)(x: T) = {
effect(x)
x
}
Example:
scala> Map(1 -> 1, 2 -> 2)
.map(tap(el => el._1 + 5 -> el._2))
.foreach(println)
(1,1)
(2,2)
You can also define an implicit:
implicit class TapMap[K,V](m: Map[K,V]){
def tap(effect: ((K,V)) => Unit): Map[K,V] = m.map{x =>
effect(x)
x
}
}
Examples:
scala> Map(1 -> 1, 2 -> 2).tap(el => el._1 + 5 -> el._2).foreach(println)
(1,1)
(2,2)
To abstract more, you can define this implicit on TraversableOnce, so it would be applicable to List, Set and so on if you need it:
implicit class TapTraversable[Coll[_], T](m: Coll[T])(implicit ev: Coll[T] <:< TraversableOnce[T]){
def tap(effect: T => Unit): Coll[T] = {
ev(m).foreach(effect)
m
}
}
scala> List(1,2,3).tap(println).map(_ + 1)
1
2
3
res24: List[Int] = List(2, 3, 4)
scala> Map(1 -> 1).tap(println).toMap //`toMap` is needed here for the same reason it is needed when you do `.map(f).toMap`
(1,1)
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1)
scala> Set(1).tap(println)
1
res6: scala.collection.immutable.Set[Int] = Set(1)
It's more useful, but requires some "mumbo-jumbo" with types, as Coll[_] <: TraversableOnce[_] doesn't work (Scala 2.12.1), so I had to use evidence for that.
You can also try the CanBuildFrom approach: How to enrich a TraversableOnce with my own generic map?
The overall recommendation for dealing with passthrough side effects on iterators is to use streams (scalaz/fs2/monix) and Task; they have an observe function (or some analog of it) that does what you want, asynchronously if needed.
My answer from before you provided an example of what you want:
You can represent an effectful computation without side effects and have distinct values that represent the state before and after:
scala> val withoutSideEffect = Map(1 -> 1, 2 -> 2)
withoutSideEffect: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)
scala> val withSideEffect = withoutSideEffect.map(el => el._1 + 5 -> (el._2 + 5))
withSideEffect: scala.collection.immutable.Map[Int,Int] = Map(6 -> 6, 7 -> 7)
scala> withoutSideEffect //unchanged
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)
scala> withSideEffect //changed
res1: scala.collection.immutable.Map[Int,Int] = Map(6 -> 6, 7 -> 7)
Looks like the concept you're after is similar to the Unix tee
utility--take an input and direct it to two different outputs. (tee
gets its name from the shape of the letter 'T', which looks like a
pipeline from left to right with another line branching off downwards.)
Here's the Scala version:
package object mypackage {
implicit class Tee[A](a: A) extends AnyVal {
def tee(f: A => Unit): A = { f(a); a }
}
}
With that, we can do:
calculateStatistics(trainingData, indexMapLoaders) tee { stats =>
stats foreach { case (featureShardId, shardStats) =>
val outputDir = summarizationOutputDir + "/" + featureShardId
val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
}
}
Note that as defined, Tee is very generic--it can do an effectful
operation on any value and then return the original passed-in value.
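On Scala 2.13 or later, scala.util.chaining provides a tap method with the same shape as tee, so the implicit class may be unnecessary there. A small sketch follows; the log buffer stands in for a real side effect such as writing to disk, and all names are illustrative.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.chaining._

object TapExample {
  // Stand-in for a real side effect (e.g. writing statistics to disk).
  val log: ListBuffer[String] = ListBuffer.empty[String]

  val result: Map[Int, Int] =
    Map(1 -> 10, 2 -> 20)
      // tap runs the effect and returns the original Map unchanged
      .tap(_.foreach { case (k, v) => log += s"$k=$v" })
      // so the chain can simply continue with further operations
      .map { case (k, v) => k -> (v + 1) }
}
```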
Call foreach on your Map with your effectful function. Your original Map will not be changed, as Maps in Scala are immutable.
val myMap = Map(1 -> 1)
myMap.foreach(effectfullFn)
If you are trying to chain this operation, you can use map:
myMap.map(el => {
effectfullFn(el)
el
})

How to convert Iterable[Try[U]] to Iterable[U], keeping only the successes?

I tried
val tryValues : Iterable[Try[Int]] = ...
val successValues = tryValues.filter(_.isSuccess).map(_.get)
but the compiler gives a warning that map may throw an exception.
Is there a warning-free way to do this?
You want to use collect to pattern match out all the values which are Success, and discard anything else.
val successValues: Iterable[Int] = tryValues collect { case Success(x) => x }
collect accepts a PartialFunction as an argument. Any values from the collection that the PartialFunction is defined for will be mapped, and the rest will be discarded.
Example:
scala> val tryValues = List(1, 1, 0, 1, 1).map(x => Try(1 / x))
tryValues: List[scala.util.Try[Int]] = List(Success(1), Success(1), Failure(java.lang.ArithmeticException: / by zero), Success(1), Success(1))
scala> val successValues = tryValues collect { case Success(x) => x }
successValues: List[Int] = List(1, 1, 1, 1)
Another option here, if you don't care to log anything about the failures, is to flatMap using toOption on the Try, like so:
val successValues = tryValues.flatMap(_.toOption)
The following is a for-comprehension approach:
val successValues = for { Success(n) <- tryValues } yield n
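The collect and flatMap approaches can be compared side by side in a self-contained sketch. The input values are made up, and the failures value is an extra illustration of how the discarded errors could still be kept if logging matters.

```scala
import scala.util.{Failure, Success, Try}

object TryExample {
  // 10 / 0 throws ArithmeticException, which Try captures as a Failure.
  val tryValues: Iterable[Try[Int]] = List(1, 0, 2).map(x => Try(10 / x))

  // Keep only the successful values.
  val viaCollect: Iterable[Int] = tryValues.collect { case Success(x) => x }

  // Same result: a Failure becomes None and is dropped by flatMap.
  val viaFlatMap: Iterable[Int] = tryValues.flatMap(_.toOption)

  // The errors can be collected the same way if they need to be logged.
  val failures: Iterable[Throwable] = tryValues.collect { case Failure(e) => e }
}
```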

ScalaTest matcher syntax for checking whether one collection contains the elements of another

In ScalaTest it's easy to check whether a container has certain elements:
val theList = List(1, 2, 3, 4, 5)
theList should contain allOf(5, 3, 1) // passes
However, if you already have a list containing those elements you want to check for, it's not obvious how to make use of it. The code below doesn't compile, because allOf() only takes collection elements, not collections, and expects at least two of them.
val theList = List(1, 2, 3, 4, 5)
val expected = List(5, 3, 1)
theList should contain allOf(expected) // doesn't compile
Since a Scala List doesn't have containsAll(), you can't even do this:
val theList = List(1, 2, 3, 4, 5)
theList.containsAll(expected) should be(true) // doesn't compile
Right now I'm doing the following, but I'm not happy with it:
for(x <- expected) {
theList should contain(x)
}
Is there a more fluent / Scala-ish / standard way to make this assertion?
You can use implicit classes to add the missing method:
trait AllElementsOf {
implicit class AllElementsOf[L <: GenTraversable[_]](resultOfContainWord: ResultOfContainWord[L]) {
def allElementsOf(l: L)(implicit aggregating: Aggregating[L]) = {
val list = l.toList
assume(list.size >= 2, "Expected a list with at least 2 elements")
resultOfContainWord.allOf(list(0), list(1), list.drop(2):_*)
}
}
}
class AllOfListSpec extends FlatSpec with ShouldMatchers with AllElementsOf {
"list" should "contain all of another list" in {
val theList = List(1, 2, 3, 4, 5)
val expected = List(5, 3, 1)
theList should contain allElementsOf expected
}
}
Update: an official allElementsOf will be in ScalaTest 3.0.

How to use takeWhile with an Iterator in Scala

I have an Iterator of elements and I want to consume them until a condition is met in the next element, like:
val it = List(1,1,1,1,2,2,2).iterator
val res1 = it.takeWhile( _ == 1).toList
val res2 = it.takeWhile(_ == 2).toList
res1 gives the expected List(1,1,1,1), but res2 returns List(2,2) because the iterator had to consume the element at position 4 in order to check it.
I know that the list will be ordered, so there is no point in traversing the whole list as partition does. I would like to finish as soon as the condition is not met. Is there any clever way to do this with Iterators? I cannot call toList on the iterator because it comes from a very big file.
The simplest solution I found:
val it = List(1,1,1,1,2,2,2).iterator
val (r1, it2) = it.span( _ == 1)
println(s"group taken is: ${r1.toList}\n rest is: ${it2.toList}")
output:
group taken is: List(1, 1, 1, 1)
rest is: List(2, 2, 2)
Very short, but from then on you have to use the new iterator.
With any immutable collection it would be similar:
use takeWhile when you want only some prefix of the collection,
use span when you also need the rest.
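As a sketch of how span chains across several groups without losing the boundary element, each call below works on the remainder returned by the previous one. The input list is made up for illustration.

```scala
object SpanExample {
  val it: Iterator[Int] = List(1, 1, 1, 1, 2, 2, 2, 3).iterator

  // span returns (prefix, rest); after this call, `it` itself
  // should no longer be used directly.
  val (ones, rest1) = it.span(_ == 1)
  val onesList: List[Int] = ones.toList

  // Continue from the remainder: the first 2 is still available.
  val (twos, rest2) = rest1.span(_ == 2)
  val twosList: List[Int] = twos.toList

  val remaining: List[Int] = rest2.toList
}
```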
With my other answer (which I've left separate as they are largely unrelated), I think you can implement groupWhen on Iterator as follows:
def groupWhen[A](itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
@annotation.tailrec
def groupWhen0(acc: Iterator[List[A]], itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
val (dup1, dup2) = itr.duplicate
val pref = ((dup1.sliding(2) takeWhile { case Seq(a1, a2) => p(a1, a2) }).zipWithIndex collect {
case (seq, 0) => seq
case (Seq(_, a), _) => Seq(a)
}).flatten.toList
val newAcc = if (pref.isEmpty) acc else acc ++ Iterator(pref)
if (dup2.nonEmpty)
groupWhen0(newAcc, dup2 drop (pref.length max 1))(p)
else newAcc
}
groupWhen0(Iterator.empty, itr)(p)
}
When I run it on an example:
println( groupWhen(List(1,1,1,1,3,4,3,2,2,2).iterator)(_ == _).toList )
I get List(List(1, 1, 1, 1), List(2, 2, 2))
I had a similar need, but the solution from @oxbow_lakes does not take into account the situation when the list has only one element, or even if the list contains elements that are not repeated. Also, that solution doesn't lend itself well to an infinite iterator (it wants to "see" all the elements before it gives you a result).
What I needed was the ability to group sequential elements that match a predicate, but also include the single elements (I can always filter them out if I don't need them). I needed those groups to be delivered continuously, without having to wait for the original iterator to be completely consumed before they are produced.
I came up with the following approach which works for my needs, and thought I should share:
implicit class IteratorEx[+A](itr: Iterator[A]) {
def groupWhen(p: (A, A) => Boolean): Iterator[List[A]] = new AbstractIterator[List[A]] {
val (it1, it2) = itr.duplicate
val ritr = new RewindableIterator(it1, 1)
override def hasNext = it2.hasNext
override def next() = {
val count = (ritr.rewind().sliding(2) takeWhile {
case Seq(a1, a2) => p(a1, a2)
case _ => false
}).length
(it2 take (count + 1)).toList
}
}
}
The above is using a few helper classes:
abstract class AbstractIterator[A] extends Iterator[A]
/**
* Wraps a given iterator to add the ability to remember the last 'remember' values
* From any position the iterator can be rewound (can go back) at most 'remember' values,
* such that when calling 'next()' the memoized values will be provided as if they have not
* been iterated over before.
*/
class RewindableIterator[A](it: Iterator[A], remember: Int) extends Iterator[A] {
private var memory = List.empty[A]
private var memoryIndex = 0
override def next() = {
if (memoryIndex < memory.length) {
val next = memory(memoryIndex)
memoryIndex += 1
next
} else {
val next = it.next()
memory = memory :+ next
if (memory.length > remember)
memory = memory drop 1
memoryIndex = memory.length
next
}
}
def canRewind(n: Int) = memoryIndex - n >= 0
def rewind(n: Int) = {
require(memoryIndex - n >= 0, "Attempted to rewind past 'remember' limit")
memoryIndex -= n
this
}
def rewind() = {
memoryIndex = 0
this
}
override def hasNext = memoryIndex < memory.length || it.hasNext
}
Example use:
List(1,2,2,3,3,3,4,5,5).iterator.groupWhen(_ == _).toList
gives: List(List(1), List(2, 2), List(3, 3, 3), List(4), List(5, 5))
If you want to filter out the single elements, just apply a filter or withFilter after groupWhen
Stream.continually(Random.nextInt(100)).iterator
.groupWhen(_ + _ == 100).withFilter(_.length > 1).take(3).toList
gives: List(List(34, 66), List(87, 13), List(97, 3))
You could use the toStream method on Iterator.
Stream is a lazy equivalent of List.
As you can see from the implementation of toStream, it creates a Stream without traversing the whole Iterator.
Stream keeps all evaluated elements in memory. You should keep references to the Stream confined to a local scope to prevent memory leaks.
With Stream you should use span like this:
val (res1, rest1) = stream.span(_ == 1)
val (res2, rest2) = rest1.span(_ == 2)
I'm guessing a bit here, but from the statement "until a condition is met in the next element", it sounds like you might want to look at the groupWhen method on ListOps in scalaz:
scala> import scalaz.syntax.std.list._
import scalaz.syntax.std.list._
scala> List(1,1,1,1,2,2,2) groupWhen (_ == _)
res1: List[List[Int]] = List(List(1, 1, 1, 1), List(2, 2, 2))
Basically this "chunks" up the input sequence upon a condition (a (A, A) => Boolean) being met between an element and its successor. In the example above the condition is equality, so, as long as an element is equal to its successor, they will be in the same chunk.
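For readers without scalaz on the classpath, the chunking behaviour described above can be approximated for a plain List with the standard library. This is a sketch written for this note, not scalaz's implementation, and it is strict, so it does not address the big-file Iterator case.

```scala
object GroupWhenExample {
  // Groups consecutive elements for which p(element, successor) holds.
  def groupWhen[A](xs: List[A])(p: (A, A) => Boolean): List[List[A]] =
    xs.foldRight(List.empty[List[A]]) {
      // x belongs to the group that starts with its successor `head`
      case (x, (head :: tail) :: groups) if p(x, head) =>
        (x :: head :: tail) :: groups
      // otherwise x starts a new group of its own
      case (x, groups) =>
        List(x) :: groups
    }

  val grouped: List[List[Int]] =
    groupWhen(List(1, 1, 1, 1, 2, 2, 2))(_ == _)
  // grouped == List(List(1, 1, 1, 1), List(2, 2, 2))
}
```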