Bind extra information to a future sequence - scala

Say I have been given a list of futures, each one linked to a key, such as:
val seq: Seq[(Key, Future[Value])]
And my goal is to produce a list of key value tuples once all futures have completed:
val complete: Seq[(Key, Value)]
I am wondering whether this can be achieved with a Future.sequence call. For example, I know I can do the following:
val complete = Future.sequence(seq.map(_._2)).recover {
case NonFatal(e) => Seq() // fall back to an empty sequence if any future fails
}
But this only gives me a sequence of Value objects, and I lose the pairing between Key and Value. The problem is that Future.sequence expects a sequence of Futures.
How could I augment this to maintain the key/value pairing in my complete sequence?

How about transforming your Seq[(Key, Future[Value])] into a Seq[Future[(Key, Value)]] first?
val seq: Seq[(Key, Future[Value])] = ??? // however you obtain it
val futurePair: Seq[Future[(Key, Value)]] = for {
(key, value) <- seq
} yield value.map(v => (key, v))
Now you can use sequence to get Future[Seq[(Key, Value)]].
val complete: Future[Seq[(Key, Value)]] = Future.sequence(futurePair)
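For reference, the two steps above can be put together into a self-contained sketch. The Key and Value aliases and the sample data are hypothetical stand-ins for the question's types, and Await is used only to show the result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the question's Key and Value types.
type Key = String
type Value = Int

val seq: Seq[(Key, Future[Value])] =
  Seq("one" -> Future(1), "two" -> Future(2))

// Move each key inside its future, then turn Seq[Future[_]] into Future[Seq[_]].
val complete: Future[Seq[(Key, Value)]] =
  Future.sequence(seq.map { case (key, fut) => fut.map(value => (key, value)) })

// Blocking only to inspect the result; real code would keep composing the future.
val result: Seq[(Key, Value)] = Await.result(complete, 5.seconds)
```

Note that if any one of the futures fails, the combined future fails as well, so a recover step may still be needed.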

Just a different expression of the other answer, using unzip and zip.
scala> val vs = Seq(("one",Future(1)),("two",Future(2)))
vs: Seq[(String, scala.concurrent.Future[Int])] = List((one,scala.concurrent.impl.Promise$DefaultPromise@4e38d975), (two,scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3))
scala> val (ks, fs) = vs.unzip
ks: Seq[String] = List(one, two)
fs: Seq[scala.concurrent.Future[Int]] = List(scala.concurrent.impl.Promise$DefaultPromise@4e38d975, scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3)
scala> val done = (Future sequence fs) map (ks zip _)
done: scala.concurrent.Future[Seq[(String, Int)]] = scala.concurrent.impl.Promise$DefaultPromise@56913163
scala> done.value
res0: Option[scala.util.Try[Seq[(String, Int)]]] = Some(Success(List((one,1), (two,2))))
or maybe save on zippage:
scala> val done = (Future sequence fs) map ((ks, _).zipped)
done: scala.concurrent.Future[scala.runtime.Tuple2Zipped[String,Seq[String],Int,Seq[Int]]] = scala.concurrent.impl.Promise$DefaultPromise@766a52f5
scala> done.value.get.get.toList
res1: List[(String, Int)] = List((one,1), (two,2))
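A third spelling, not taken from either answer, is Future.traverse from the standard library, which does the map and the sequence in one call. This is a minimal sketch reusing the sample data above; the name keyed is only illustrative.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val vs: Seq[(String, Future[Int])] = Seq(("one", Future(1)), ("two", Future(2)))

// traverse applies the function to every element and collects the resulting futures.
val done: Future[Seq[(String, Int)]] =
  Future.traverse(vs) { case (k, f) => f.map(v => (k, v)) }

// Blocking only to inspect the result.
val keyed: Seq[(String, Int)] = Await.result(done, 5.seconds)
```

This avoids the separate unzip and zip, so keys and values cannot get out of step.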

Related

Merging two arrays in Scala

My requirement is as follows:
arr1 : Array[(String, String)] = Array((bangalore,Kanata), (kannur,Kerala))
arr2 : Array[(String, String)] = Array((001,anup), (002,sithu))
should give me
Array((001,anup,bangalore,Kanata), (002,sithu,kannur,Kerala))
I tried this:
val arr3 = arr2.map(field=>(field,arr1))
but it didn't work.
@nicodp's answer addresses your question very nicely: zip and then map will give you the resulting array.
Recall that if one list is longer than the other, its remaining elements are ignored.
My attempt addresses a related case, where the tuples in the two arrays have different sizes.
Consider:
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup", "ramakrishan"), ("002","sithu", "bhattacharya"))
Zipping and then mapping over the tuples gives:
arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
Array[(String, String, String, String)] = Array((bangalore,Kanata,001,anup), (kannur,Kerala,002,sithu))
// This ignores the last field of arr2
While mapping, you can convert each tuple to an iterator and build a list from it. That way you do not have to keep track of whether you have a Tuple2 or a Tuple3, at the cost of the element type widening to Any:
arr1.zip(arr2).map{ case(k,v) => List(k.productIterator.toList, v.productIterator.toList).flatten }
// Array[List[Any]] = Array(List(bangalore, Kanata, 001, anup, ramakrishan), List(kannur, Kerala, 002, sithu, bhattacharya))
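When both arrays hold plain pairs, as in the question, a pattern match in the map keeps the static String types and can also produce the field order the question asked for (arr2's fields first). A minimal sketch, with illustrative names inside the pattern:

```scala
val arr1 = Array(("bangalore", "Kanata"), ("kannur", "Kerala"))
val arr2 = Array(("001", "anup"), ("002", "sithu"))

// Zip arr2 with arr1 so the id and name come first, then flatten each nested pair.
val merged: Array[(String, String, String, String)] =
  arr2.zip(arr1).map { case ((id, name), (city, state)) => (id, name, city, state) }
```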
You can do a zip followed by a map:
scala> val arr1 = Array((1,2),(3,4))
arr1: Array[(Int, Int)] = Array((1,2), (3,4))
scala> val arr2 = Array((5,6),(7,8))
arr2: Array[(Int, Int)] = Array((5,6), (7,8))
scala> arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
res1: Array[(Int, Int, Int, Int)] = Array((1,2,5,6), (3,4,7,8))
The map acts as a flatten for tuples: it takes values of type ((A, B), (C, D)) and maps them to (A, B, C, D).
What zip does is... meh, let's see its type:
def zip[B](that: GenIterable[B]): List[(A, B)]
So, from there, we can argue that it takes an iterable collection (which can be another list) and returns a list that pairs up the corresponding elements of both this: List[A] and that: List[B]. Recall that if one list is longer than the other, its remaining elements are ignored. You can read more about list functions in the documentation.
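The truncation is easy to see with two lists of different lengths. The sketch below also shows zipAll, a standard-library method that pads the shorter side instead of dropping elements; the sample values and placeholder strings are made up for illustration.

```scala
val cities = List("bangalore", "kannur", "mysore")
val ids = List("001", "002")

// zip stops at the shorter list, so "mysore" is dropped.
val truncated: List[(String, String)] = cities.zip(ids)

// zipAll pads whichever side runs out with the given placeholder.
val padded: List[(String, String)] = cities.zipAll(ids, "<no city>", "<no id>")
```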
I agree that the cleanest solution is to use the zip method from collections:
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup"), ("002","sithu"))
arr1.zip(arr2).foldLeft(List.empty[Any]) {
case (acc, (a, b)) => acc ::: List(a.productIterator.toList ++ b.productIterator.toList)
}

How to do flatten in scala horizontally?

I am trying some basic logic using Scala. I tried the code below, but it throws an error.
scala> val data = ("HI",List("HELLO","ARE"))
data: (String, List[String]) = (HI,List(HELLO, ARE))
scala> data.flatmap( elem => elem)
<console>:22: error: value flatmap is not a member of (String, List[String])
data.flatmap( elem => elem)
Expected Output :
(HI,HELLO,ARE)
Could someone help me fix this issue?
You are trying to flatMap over a tuple, which won't work. The following will work:
val data = List(List("HI"),List("HELLO","ARE"))
val a = data.flatMap(x => x)
This is trivial in Scala:
val data = ("HI",List("HELLO","ARE"))
println( data._1 :: data._2 )
What exact data structure are you working with?
If you are clear about your data structure:
type rec = (String, List[String])
val data : rec = ("HI",List("HELLO","ARE"))
val f = ( v: (String, List[String]) ) => v._1 :: v._2
f(data)
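To get from that list to text shaped like the expected output, mkString with a prefix, separator, and suffix is one option. This is a small sketch; the names flat and rendered are illustrative, and the result is a String rather than a tuple.

```scala
val data: (String, List[String]) = ("HI", List("HELLO", "ARE"))

// Prepend the tuple's first element to the list held in its second element.
val flat: List[String] = data match {
  case (head, rest) => head :: rest
}

// Render with parentheses and commas to mimic tuple-style output.
val rendered: String = flat.mkString("(", ",", ")")
```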
A couple of observations:
Currently there is no flatten method for tuples (unless you use shapeless).
flatMap cannot be directly applied to a list of elements which are a mix of elements and collections.
In your case, you can make element "HI" part of a List:
val data = List(List("HI"), List("HELLO","ARE"))
data.flatMap(identity)
Or, you can define a function to handle your mixed element types accordingly:
val data = List("HI", List("HELLO","ARE"))
def flatten(l: List[Any]): List[Any] = l.flatMap{
case x: List[_] => flatten(x)
case x => List(x)
}
flatten(data)
You are trying to call flatMap on a Tuple2, which the current API does not provide.
If you don't want to change your input, you can extract the first value from the Tuple2 and then the individual elements of the list in its second value, as below:
val data = ("HI",List("HELLO","ARE"))
val output = (data._1, data._2(0), data._2(1))
println(output)
If that's what you want:
val data = ("HI",List("HELLO","ARE").mkString(","))
println(data)
>>(HI,HELLO,ARE)

Applying a partial function on a tuple field, maintaining the tuple structure

I have a PartialFunction[String,String] and a Map[String,String].
I want to apply the partial function to the map values and collect the entries for which it is applicable.
i.e. given:
val m = Map( "a"->"1", "b"->"2" )
val pf : PartialFunction[String,String] = {
case "1" => "11"
}
I'd like to somehow combine _._2 with pf and be able to do this:
val composedPf : PartialFunction[(String,String),(String,String)] = /*someMagicalOperator(_._2,pf)*/
val collected : Map[String,String] = m.collect( composedPf )
// collected should be Map( "a"->"11" )
So far the best I got was this:
val composedPf = new PartialFunction[(String,String),(String,String)]{
override def isDefinedAt(x: (String, String)): Boolean = pf.isDefinedAt(x._2)
override def apply(v1: (String, String)): (String,String) = v1._1 -> pf(v1._2)
}
Is there a better way?
Here is the magical operator:
val composedPf: PartialFunction[(String, String), (String, String)] =
{case (k, v) if pf.isDefinedAt(v) => (k, pf(v))}
Another option, without creating a composed function, is this:
m.filter(e => pf.isDefinedAt(e._2)).mapValues(pf)
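Both options above spell out an isDefinedAt check followed by a separate application of pf. A variation that is not from the answers is to go through pf.lift, which returns an Option, and turn the result back into a partial function with Function.unlift, both from the standard library. A minimal sketch:

```scala
val m = Map("a" -> "1", "b" -> "2")
val pf: PartialFunction[String, String] = { case "1" => "11" }

// lift turns pf into String => Option[String]; unlift turns the
// Option-returning function on pairs back into a PartialFunction.
val composedPf: PartialFunction[(String, String), (String, String)] =
  Function.unlift((kv: (String, String)) => pf.lift(kv._2).map(out => kv._1 -> out))

val collected: Map[String, String] = m.collect(composedPf)
```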
There is a function in Scalaz that does exactly that: second
scala> m collect pf.second
res0: scala.collection.immutable.Map[String,String] = Map(a -> 11)
This works, because PartialFunction is an instance of Arrow (a generalized function) typeclass, and second is one of the common operations defined for arrows.

Simplest way to extract Option from Scala collections

Imagine you have a Map[Option[Int], String] and you want a Map[Int, String], discarding the entries that have None as the key.
Another, similar example is a List[(Option[Int], String)] that should become a List[(Int, String)], again discarding the tuples that have None as the first element.
What's the best approach?
collect is your friend here:
example data definition
val data = Map(Some(1) -> "data", None -> "")
solution for Map
scala> data collect { case ( Some(i), s) => (i,s) }
res4: scala.collection.immutable.Map[Int,String] = Map(1 -> data)
the same approach works for a list of tuples
scala> data.toList collect { case ( Some(i), s) => (i,s) }
res5: List[(Int, String)] = List((1,data))
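The same pattern can be tried outside the REPL with explicit types. This is a small sketch with made-up sample data; the list has several entries to show that order is preserved and only the None entries are dropped.

```scala
val byOptKey: Map[Option[Int], String] = Map(Some(1) -> "data", None -> "")
val entries: List[(Option[Int], String)] = List(Some(1) -> "a", None -> "b", Some(2) -> "c")

// The pattern matches only Some keys, so None entries are skipped.
val fromMap: Map[Int, String] = byOptKey.collect { case (Some(i), s) => (i, s) }
val fromList: List[(Int, String)] = entries.collect { case (Some(i), s) => (i, s) }
```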

How to convert RDD[(String, String)] into RDD[Array[String]]?

I am trying to append the filename to each record in the file. I thought it would be easy to do if the RDD held arrays.
Some help with converting RDD type or solving this problem would be much appreciated!
With the (String, String) type:
scala> myRDD.first()(1)
<console>:24: error: (String, String) does not take parameters
myRDD.first()(1)
With the Array[String] type:
scala> myRDD.first()(1)
res1: String = abcdefgh
My function:
import scala.util.matching.Regex
def appendKeyToValue(x: Array[Array[String]]): Unit = {
for (i <- 0 to (x.length - 1)) {
val key = x(i)(0)
val pattern = new Regex("\\.")
// replace every dot in the filename with a pipe
val key2 = pattern.replaceAllIn(key, "|")
val tempvalue = x(i)(1)
val finalval = tempvalue.split("\n")
for (ab <- 0 to (finalval.length - 1)) {
val result = (key2+"|"+finalval(ab))
}
}
}
If you have an RDD[(String, String)], you can access the first tuple field of the first tuple by calling
val firstTupleField: String = myRDD.first()._1
If you want to convert an RDD[(String, String)] into an RDD[Array[String]] you can do the following
val arrayRDD: RDD[Array[String]] = myRDD.map(x => Array(x._1, x._2))
You may also employ a partial function to destructure the tuples:
val arrayRDD: RDD[Array[String]] = myRDD.map { case (a,b) => Array(a, b) }
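For the underlying goal of prefixing every record with its filename, the conversion to arrays may not be needed at all. The sketch below uses a plain Seq as a stand-in for the RDD so that it runs without Spark, and it assumes each pair is (filename, file contents); prefixRecords and the sample data are hypothetical. Passing the same function to flatMap on an RDD[(String, String)] should yield an RDD[String].

```scala
// A plain Seq stands in for RDD[(String, String)]; pairs are assumed
// to be (filename, file contents).
val files: Seq[(String, String)] = Seq(
  "a.txt" -> "r1\nr2",
  "b.txt" -> "r3"
)

// Hypothetical helper: one output line per record, prefixed with the key.
def prefixRecords(pair: (String, String)): Seq[String] = {
  val (fileName, contents) = pair
  val key = fileName.replace(".", "|") // the question replaces dots with pipes
  contents.split("\n").toSeq.map(record => key + "|" + record)
}

val tagged: Seq[String] = files.flatMap(prefixRecords)
```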