How to flatten horizontally in Scala? - scala

I am trying some basic logic using Scala. I tried the code below, but it throws an error.
scala> val data = ("HI",List("HELLO","ARE"))
data: (String, List[String]) = (HI,List(HELLO, ARE))
scala> data.flatmap( elem => elem)
<console>:22: error: value flatmap is not a member of (String, List[String])
data.flatmap( elem => elem)
Expected Output :
(HI,HELLO,ARE)
Could someone help me fix this issue?

You are trying to flatMap over a tuple, which won't work. The following will work:
val data = List(List("HI"),List("HELLO","ARE"))
val a = data.flatMap(x => x)

This is trivial in Scala:
val data = ("HI",List("HELLO","ARE"))
println( data._1 :: data._2 )
What exact data structure are you working with?
If you are clear about your data structure:
type rec = (String, List[String])
val data : rec = ("HI",List("HELLO","ARE"))
val f = ( v: (String, List[String]) ) => v._1 :: v._2
f(data)

A couple of observations:
Currently there is no flatten method for tuples (unless you use shapeless).
flatMap cannot be directly applied to a list of elements which are a mix of elements and collections.
In your case, you can make element "HI" part of a List:
val data = List(List("HI"), List("HELLO","ARE"))
data.flatMap(identity)
Or, you can define a function to handle your mixed element types accordingly:
val data = List("HI", List("HELLO","ARE"))
def flatten(l: List[Any]): List[Any] = l.flatMap{
case x: List[_] => flatten(x)
case x => List(x)
}
flatten(data)

You are trying to call flatMap on a Tuple2, which is not available in the current API.
If you don't want to change your input, you can extract the first value from the Tuple2 and then extract the elements of its second value, as below:
val data = ("HI",List("HELLO","ARE"))
val output = (data._1, data._2(0), data._2(1))
println(output)
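Indexing with data._2(0) and data._2(1) assumes the list has at least two elements and throws otherwise. As a hedged alternative, the same extraction can be sketched with a pattern match that only succeeds for exactly two elements. The Option wrapper and the name output are choices made for this illustration, not something from the question.

```scala
val data = ("HI", List("HELLO", "ARE"))

// Destructure the tuple and the list in one pattern.
// The match yields None instead of throwing when the list
// does not hold exactly two elements.
val output: Option[(String, String, String)] = data match {
  case (first, second :: third :: Nil) => Some((first, second, third))
  case _                               => None
}

println(output)
```

If the list length is not fixed, data._1 :: data._2 (a List rather than a tuple) is the safer shape for the result.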

If that's what you want:
val data = ("HI",List("HELLO,","ARE").mkString(""))
println(data)
>>(HI,HELLO,ARE)

Related

How to properly iterate over Array[String]?

I have a function in Scala that I pass arguments to. I use it like this:
val evega = concat.map(_.split(",")).keyBy(_(0)).groupByKey().map{case (k, v) => (k, f(v))}
My function f is:
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
implicit val localDateOrdering: Ordering[LocalDate] = Ordering.by(_.toEpochDay)
def f(v: Array[String]): Int = {
val parsedDates = v.map(LocalDate.parse(_, formatter))
parsedDates.max.getDayOfYear - parsedDates.min.getDayOfYear}
And this is the error I get:
found : Iterable[Array[String]]
required: Array[String]
I already tried using:
val evega = concat.map(_.split(",")).keyBy(_(0)).groupByKey().map{case (k, v) => (k, for (date <- v) f(date))}
But I get massive errors.
Just to get a better picture, data in concat is:
1974,1974-06-22
1966,1966-07-20
1954,1954-06-19
1994,1994-06-27
1954,1954-06-26
2006,2006-07-04
2010,2010-07-07
1990,1990-06-30
...
It is type RDD[String].
How can I properly iterate over that and get a single Int from that function f?
The RDD types along your pipeline are:
concat.map(_.split(",")) gives an RDD[Array[String]]
for instance Array("1954", "1954-06-19")
concat.map(_.split(",")).keyBy(_(0)) gives RDD[(String, Array[String])]
for instance ("1954", Array("1954", "1954-06-19"))
concat.map(_.split(",")).keyBy(_(0)).groupByKey() gives RDD[(String, Iterable[Array[String]])]
for instance ("1954", Iterable(Array("1954", "1954-06-19"), Array("1954", "1954-06-24")))
Thus when you map at the end, the type of values is Iterable[Array[String]].
Since your input is "1974,1974-06-22", one solution is to replace your keyBy transformation with a map:
concat.map(_.split(",")).map(x => x(0) -> x(1)).groupByKey().map{case (k, v) => (k, f(v.toArray))}
Indeed, .map(x => x(0) -> x(1)) (instead of .map(x => x(0) -> x), for which keyBy(_(0)) is syntactic sugar) uses the second element of the split array as the value instead of the array itself. This second step therefore gives RDD[(String, String)] rather than RDD[(String, Array[String])]. After groupByKey the values are an Iterable[String], so they are converted with toArray to match the Array[String] parameter of f.
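The same key/value reshaping can be tried without a cluster. The following is a minimal sketch that uses plain Scala collections as a stand-in for the RDD (groupBy replaces groupByKey, which is an assumption of this illustration, not the Spark API). The names lines, byEpochDay and spans are made up for the example, and f is changed to accept an Iterable[String] directly.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

// Explicit ordering for LocalDate, passed by hand to min/max.
val byEpochDay: Ordering[LocalDate] = Ordering.by((d: LocalDate) => d.toEpochDay)

// Day-of-year span between the latest and earliest date of one group.
def f(v: Iterable[String]): Int = {
  val parsedDates = v.map(LocalDate.parse(_, formatter))
  parsedDates.max(byEpochDay).getDayOfYear - parsedDates.min(byEpochDay).getDayOfYear
}

// Local stand-in for the RDD[String] named concat in the question.
val lines = List("1974,1974-06-22", "1954,1954-06-19", "1954,1954-06-26")

val spans: Map[String, Int] =
  lines
    .map(_.split(","))
    .map(x => x(0) -> x(1))            // (year, date) pairs, not (year, array)
    .groupBy(_._1)                     // year -> list of (year, date) pairs
    .map { case (k, pairs) => k -> f(pairs.map(_._2)) }
```

With this input the 1954 group spans 7 days and the single-entry 1974 group spans 0.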

Scala - pattern head/tail on Map

I'm trying to write a tail-recursive method, but I'm using a Map and I don't know how to use pattern matching to check whether the Map is empty/null and to get its head/tail:
def aa(a:Map[String, Seq[Operation]]): Map[String, (Seq[Operation], Double)] = {
def aaRec(xx:Map[String, Seq[Operation]],
res:Map[String, (Seq[Operation], Double)],
acc:Double = 0): Map[String, (Seq[Operation], Double)] = xx match {
case ? =>
res
case _ =>
val head = xx.head
val balance = head._2.foldLeft(acc)(_ + _.amount)
aaRec(xx.tail, res + (head._1 -> (head._2, balance)), balance)
}
aaRec(a, Map[String, (Seq[Operation], Double)]())
}
}
What is the correct syntax for case empty map and case h :: t?
Thanks in advance
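A Map has no guaranteed order, so its head and tail are not meaningful the way they are for a List (the methods exist, but which entry comes first is unspecified). Map also provides no unapply/unapplySeq extractor, so you can't destructure a Map with a pattern such as h :: t.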
I think going with a foldLeft might be your best option.
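A foldLeft version might look like the sketch below. It threads the same two pieces of state as the recursive helper (the result map and the running balance) through the fold. Operation is a hypothetical stand-in here, since the question does not show its definition; only an amount field is assumed.

```scala
// Hypothetical stand-in for the Operation type used in the question.
final case class Operation(amount: Double)

def aa(a: Map[String, Seq[Operation]]): Map[String, (Seq[Operation], Double)] = {
  val empty = Map.empty[String, (Seq[Operation], Double)]
  // The accumulator pairs the result built so far with the running balance.
  val (result, _) = a.foldLeft((empty, 0.0)) {
    case ((res, acc), (key, ops)) =>
      val balance = ops.foldLeft(acc)(_ + _.amount)
      (res.updated(key, (ops, balance)), balance)
  }
  result
}
```

Because the balance carries over from one entry to the next, the result depends on the iteration order of the Map, which is not guaranteed in general. If the order matters, an ordered collection such as a Seq of pairs is probably a better input type.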
I'm not sure if it's possible to pattern match on a map, but this code could be rewritten using basic combinator methods:
def aa(a:Map[String, Seq[Operation]]): Map[String, (Seq[Operation], Double)] =
a.mapValues(seq => (seq, seq.map(_.amount).sum))

applying partial function on a tuple field, maintaining the tuple structure

I have a PartialFunction[String,String] and a Map[String,String].
I want to apply the partial function to the map values and collect the entries for which it is applicable.
i.e. given:
val m = Map( "a"->"1", "b"->"2" )
val pf : PartialFunction[String,String] = {
case "1" => "11"
}
I'd like to somehow combine _._2 with pf and be able to do this:
val composedPf : PartialFunction[(String,String),(String,String)] = /*someMagicalOperator(_._2,pf)*/
val collected : Map[String,String] = m.collect( composedPf )
// collected should be Map( "a"->"11" )
So far the best I got was this:
val composedPf = new PartialFunction[(String,String),(String,String)]{
override def isDefinedAt(x: (String, String)): Boolean = pf.isDefinedAt(x._2)
override def apply(v1: (String, String)): (String,String) = v1._1 -> pf(v1._2)
}
Is there a better way?
Here is the magical operator:
val composedPf: PartialFunction[(String, String), (String, String)] =
{case (k, v) if pf.isDefinedAt(v) => (k, pf(v))}
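If the same lifting is needed for several partial functions, the guard-based pattern can be wrapped in a small generic helper. This is only a sketch; the name onSecond is invented for the illustration and is not a standard library method.

```scala
// Lift a partial function on values to one on (key, value) pairs,
// leaving the key untouched. `onSecond` is a hypothetical helper name.
def onSecond[K, A, B](pf: PartialFunction[A, B]): PartialFunction[(K, A), (K, B)] = {
  case (k, v) if pf.isDefinedAt(v) => (k, pf(v))
}

val m = Map("a" -> "1", "b" -> "2")
val pf: PartialFunction[String, String] = {
  case "1" => "11"
}

// Type arguments are spelled out so that inference does not depend on collect.
val collected: Map[String, String] = m.collect(onSecond[String, String, String](pf))
```

Here collected contains only the entry for "a", mapped to "11", because pf is not defined at "2".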
Another option, without creating a composed function, is this:
m.filter(e => pf.isDefinedAt(e._2)).mapValues(pf)
There is a function in Scalaz that does exactly that: second
scala> m collect pf.second
res0: scala.collection.immutable.Map[String,String] = Map(a -> 11)
This works, because PartialFunction is an instance of Arrow (a generalized function) typeclass, and second is one of the common operations defined for arrows.

Bind extra information to a future sequence

Say I have been given a list of futures, each one linked to a key, such as:
val seq: Seq[(Key, Future[Value])]
And my goal is to produce a list of key value tuples once all futures have completed:
val complete: Seq[(Key, Value)]
I am wondering if this can be achieved using a sequence call. For example I know I can do the following:
val complete = Future.sequence(seq.map(_._2)).onComplete {
case Success(s) => s
case Failure(NonFatal(e)) => Seq()
}
But this only returns a sequence of Value objects, and I lose the pairing information between Key and Value. The problem is that Future.sequence expects a sequence of Futures.
How could I augment this to maintain the key/value pairing in my complete sequence?
Thanks
Des
How about transforming your Seq[(Key, Future[Value])] to Seq[Future[(Key, Value)]] first?
val seq: Seq[(Key, Future[Value])] = // however your implementation is
val futurePair: Seq[Future[(Key, Value)]] = for {
(key, value) <- seq
} yield value.map(v => (key, v))
Now you can use sequence to get Future[Seq[(Key, Value)]].
val complete: Future[Seq[(Key, Value)]] = Future.sequence(futurePair)
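Put together as something that can be run, the approach might look like the following sketch. It assumes String keys and Int values in place of the abstract Key and Value, uses the global execution context, and blocks with Await only so the result can be inspected. The recover step mirrors the question's wish to fall back to an empty sequence on failure.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Concrete stand-ins for Key and Value (assumptions for this example).
val seq: Seq[(String, Future[Int])] = Seq("one" -> Future(1), "two" -> Future(2))

// Move the key inside each future so the pairing survives sequencing.
val futurePairs: Seq[Future[(String, Int)]] =
  seq.map { case (key, fut) => fut.map(v => (key, v)) }

val complete: Future[Seq[(String, Int)]] =
  Future.sequence(futurePairs).recover {
    case NonFatal(_) => Seq.empty[(String, Int)] // any failed future yields an empty result
  }

// Blocking is for demonstration only.
val result: Seq[(String, Int)] = Await.result(complete, 5.seconds)
```

Future.sequence keeps the order of the input sequence, so result pairs each key with its own value regardless of which future finishes first.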
Just a different expression of the other answer, using unzip and zip.
scala> val vs = Seq(("one",Future(1)),("two",Future(2)))
vs: Seq[(String, scala.concurrent.Future[Int])] = List((one,scala.concurrent.impl.Promise$DefaultPromise@4e38d975), (two,scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3))
scala> val (ks, fs) = vs.unzip
ks: Seq[String] = List(one, two)
fs: Seq[scala.concurrent.Future[Int]] = List(scala.concurrent.impl.Promise$DefaultPromise@4e38d975, scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3)
scala> val done = (Future sequence fs) map (ks zip _)
done: scala.concurrent.Future[Seq[(String, Int)]] = scala.concurrent.impl.Promise$DefaultPromise@56913163
scala> done.value
res0: Option[scala.util.Try[Seq[(String, Int)]]] = Some(Success(List((one,1), (two,2))))
or maybe save on zippage:
scala> val done = (Future sequence fs) map ((ks, _).zipped)
done: scala.concurrent.Future[scala.runtime.Tuple2Zipped[String,Seq[String],Int,Seq[Int]]] = scala.concurrent.impl.Promise$DefaultPromise@766a52f5
scala> done.value.get.get.toList
res1: List[(String, Int)] = List((one,1), (two,2))

How to invoke map() from a ProductIterator (Tuple)

Given following placeholder logging method:
def testshow(value: Any) = value.toString
In the following code snippet:
case t : Product =>
t.productIterator.foreach( a => println(a.toString))
val lst = t.productIterator.map(a => testshow(a))
val lst2 = t.productIterator.map(_.toString)
lst.mkString("(",",",")")
lst2.mkString("(",",",")")
And given an input tuple :
(Some(OP(_)),Some(a),1)
The println successfully shows entries for the given tuple.
Some(OP(_))
Some(a)
1
lst2 (with toString) says: Non-empty iterator. However the list "lst" says:
empty iterator
So what is wrong with the syntax to invoke the map() method on the productIterator?
Note: if putting "toString" in place of testshow this works properly.
Update: A "self contained" snippet does work. It is still not clear why the above code does not.
def testshow(value: Any) = "TestShow%s".format(value.toString)
val obj = ("abc",123,"def")
obj match {
case t : Product =>
t.productIterator.foreach( a => println(a.toString))
val lst = t.productIterator.map(a => testshow(a))
val lst2 = t.productIterator.map(_.toString)
println("testshow list %s".format(lst.mkString("(",",",")")))
println("toString list %s".format(lst2.mkString("(",",",")")))
}
Output:
abc
123
def
testshow list (TestShowabc,TestShow123,TestShowdef)
toString list (abc,123,def)
Iterators can be traversed only once, then they are exhausted. Mapping an iterator produces another iterator. If you see your iterator empty, you must have forced a traversal.
scala> case class Foo(a: Int, b: Int)
defined class Foo
scala> Foo(1, 2).productIterator.map(_.toString)
res1: Iterator[String] = non-empty iterator
It is non-empty. Are you sure you used a fresh iterator? Because if you used the same iterator for the first foreach loop, then it would be empty if you tried to map the same iterator afterwards.
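The exhaustion effect is easy to reproduce in isolation. The sketch below reuses one iterator after a foreach and compares it with a fresh one; the names it, afterForeach and fresh are chosen for this illustration only.

```scala
val tuple = ("abc", 123, "def")

// One iterator, traversed fully by foreach.
val it = tuple.productIterator
it.foreach(_ => ())

// Mapping the already-consumed iterator yields nothing.
val afterForeach = it.map(_.toString).mkString("(", ",", ")")

// A fresh iterator from the same tuple still has all elements.
val fresh = tuple.productIterator.map(_.toString).mkString("(", ",", ")")

println(afterForeach) // ()
println(fresh)        // (abc,123,def)
```

Calling productIterator again, as the question's code does for each use, returns a new iterator every time, which is why that code should not hit this problem.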
Edit: The shape of the map function argument has nothing to do with this:
def testshow(value: Any) = value.toString
case class OP(x: Any)
def test(x: Any) = x match {
case t: Product =>
val lst = t.productIterator.map(a => testshow(a))
lst.mkString("(", ",", ")")
case _ => "???"
}
test((Some(OP(_)),Some('a'),1)) // "(Some(<function1>),Some(a),1)"
Looks like an IntelliJ bug. I just changed the name of the "lst" variable to "lst3" and it works. I repeated the rename back and forth and the bug is reproducible. There are no other occurrences of "lst" in the entire file, and in any case it was a local variable.