I have a file parser that produces a collection of elements all belonging to the same trait. It is similar to the following.
trait Data {
  val identifier: String
}
case class Meta(identifier: String, props: Properties) extends Data
case class Complete(identifier: String, contents: Map[String, Any]) extends Data
case class Partial(identifier: String, name: String, value: Any) extends Data
...
def parse(file: File): Iterator[Data] = ... // this isn't relevant
What I am attempting to do is traverse the collection in a functional manner since I am processing a lot of data and want to be as memory conscious as possible. The collection when it is returned from the parse method is a mix of Complete, Meta, and Partial elements. The logic is that I need to pass the Complete and Meta elements through unchanged, while collecting the Partial elements and grouping on the identifier to create Complete elements.
With just a collection of Partial elements (Iterator[Partial]), I can do the following:
partialsOnly
  .groupBy(_.identifier)
  .map { case (ident, parts) =>
    Complete(ident, parts.map(p => p.name -> p.value).toMap)
  }
Is there a functional way, somewhat similar to scan that will accumulate elements, but only some elements, while letting the rest through unchanged?
You can use the partition function to split a collection in two based on a predicate.
val (partial: List[Data], completeAndMeta: List[Data]) =
  parse(file).toList.partition {
    case _: Partial => true
    case _ => false
  }
From there, you want to make sure you can process partial as a List[Partial], ideally without tripping compiler warnings about type erasure or doing messy casts. You can do this with a call to collect, using a partial function that only accepts Partials.
val partials: List[Partial] = partial.collect { case p: Partial => p }
Unfortunately, when used on an Iterator, partition may need to buffer arbitrary amounts of data, so isn't necessarily the most memory efficient technique. If memory management is a huge concern, you may need to sacrifice functional purity. Alternately, if you add some way of knowing when a Partial is completed, you can accumulate them in a Map via a foldLeft and emit the final value as they finish.
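As a rough illustration of that last idea, here is a minimal sketch (the regroup name is just for illustration, not from the question) that buffers Partials by identifier in a foldLeft, passes the other elements through, and only emits the grouped Complete elements once the input is exhausted; note that it still buffers everything, so it mainly shows the shape of the fold rather than solving the memory concern:
// Sketch only: buffers everything, so it is not memory friendly; it just shows the fold shape.
def regroup(data: Iterator[Data]): Iterator[Data] = {
  val (grouped, passedThrough) =
    data.foldLeft((Map.empty[String, Map[String, Any]], List.empty[Data])) {
      case ((partials, out), p: Partial) =>
        val contents = partials.getOrElse(p.identifier, Map.empty[String, Any]) + (p.name -> p.value)
        (partials.updated(p.identifier, contents), out)
      case ((partials, out), other) =>
        (partials, other :: out)
    }
  passedThrough.reverseIterator ++ grouped.iterator.map { case (id, contents) => Complete(id, contents) }
}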
Recursion might be a functional way to solve your problem:
def parse(list: List[Data]): (List[Data], List[Data]) =
  list match {
    case (x: Partial) :: xs =>
      val (partials, rest) = parse(xs)
      (x :: partials, rest) // instead of creating a list, you can join partials here
    case x :: xs =>
      val (partials, rest) = parse(xs)
      (partials, x :: rest)
    case _ => (Nil, Nil)
  }
val (partials, rest) = parse(list)
Unfortunately, this function is not tail recursive, so it might blow up the stack for longer lists.
You can solve it by using Eval from cats:
import cats.Eval

def parse2(list: List[Data]): Eval[(List[Data], List[Data])] =
  Eval.now(list).flatMap {
    case (x: Partial) :: xs =>
      parse2(xs).map {
        case (partials, rest) => (x :: partials, rest) // instead of creating a list, you can join partials here
      }
    case x :: xs =>
      parse2(xs).map {
        case (partials, rest) => (partials, x :: rest)
      }
    case _ => Eval.now((Nil, Nil))
  }
val (partialsResult, restResult) = parse2(longList).value
This solution is stack safe because the recursion is tracked on the heap rather than the call stack.
And here's a version which also groups the partials:
def parse3(list: List[Data]): Eval[(Map[String, List[Partial]], List[Data])] =
  Eval.now(list).flatMap {
    case (x: Partial) :: xs =>
      parse3(xs).map {
        case (partials, rest) =>
          val newPartials = x :: partials.getOrElse(x.identifier, Nil)
          (partials + (x.identifier -> newPartials), rest)
      }
    case x :: xs =>
      parse3(xs).map {
        case (partials, rest) => (partials, x :: rest)
      }
    case _ => Eval.now((Map.empty[String, List[Partial]], Nil))
  }
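If you then want the grouped Partials turned back into Complete elements, a possible follow-up (reusing longList from above and the case classes from the question) might look like this:
val (groupedPartials, rest) = parse3(longList).value

val completed: List[Data] = groupedPartials.map {
  case (id, parts) => Complete(id, parts.map(p => p.name -> p.value).toMap)
}.toList

val result: List[Data] = rest ++ completed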
I'm micro-optimising some code as a challenge.
I have a list of objects with a list of keys in each of them.
What's the most efficient way of grouping them by key, with each object appearing in every group for which it has a key?
This is what I have, but I have a feeling it can be improved.
I have many objects (100k+), each has ~2 keys, and there are fewer than 50 possible keys.
I've tried parallelising the list with listOfObjs.par, but there doesn't seem to be much of an improvement overall.
case class Obj(value: Option[Int], key: Option[List[String]])
listOfObjs
.filter(x => x.key.isDefined && x.value.isDefined)
.flatMap(x => x.key.get.map((_, x.value.get)))
.groupBy(_._1)
If you have that many objects, the logical next step would be to distribute the work using a MapReduce framework. At the end of the day, you still need to go over every single object to determine the group it belongs in, and your worst case will be bottlenecked by that.
The best you can do here is to replace these three operations with a fold so you only iterate through the collection once.
Edit: Updated the order based on Luis' recommendation in the comments
listOfObjs.foldLeft(Map.empty[String, List[Int]]) { (acc, obj) =>
  (obj.key, obj.value) match {
    case (Some(k), Some(v)) =>
      k.foldLeft(acc)((a, ky) => a + (ky -> (v +: a.getOrElse(ky, List.empty))))
    case _ => acc
  }
}
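As a quick sanity check with some made-up objects (note that, because of the prepend, the order inside each group is the reverse of the input order):
val listOfObjs = List(
  Obj(Some(1), Some(List("a", "b"))),
  Obj(Some(2), Some(List("a"))),
  Obj(None, None)
)
// The fold above yields: Map("a" -> List(2, 1), "b" -> List(1))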
I got the impression you are looking for a fast alternative; thus a little bit of encapsulated mutability can help.
So, what about something like this:
import scala.collection.{immutable, mutable}

def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  val iter =
    objects.iterator.flatMap {
      case Obj(Some(value), Some(keys)) =>
        keys.iterator.map(key => key -> value)
      case _ =>
        Iterator.empty[(String, Int)]
    }

  val m = mutable.Map.empty[String, mutable.Builder[Int, List[Int]]]

  iter.foreach {
    case (k, v) =>
      m.get(k) match {
        case Some(builder) =>
          builder.addOne(v)
        case None =>
          m.update(key = k, value = List.newBuilder[Int].addOne(v))
      }
  }

  immutable.Map.from(m.view.mapValues(_.result()))
}
Or if you don't care about the order of the elements of each group we can simplify and speed up the code a lot:
def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  val iter = objects.iterator.flatMap {
    case Obj(Some(value), Some(keys)) =>
      keys.iterator.map(key => key -> value)
    case _ =>
      Iterator.empty[(String, Int)]
  }

  val m = mutable.Map.empty[String, List[Int]]

  iter.foreach {
    case (k, v) =>
      m.updateWith(key = k) {
        case Some(list) => Some(v :: list)
        case None => Some(v :: Nil)
      }
  }

  m.to(immutable.Map)
}
I have a list of tuples that looks like this:
Seq("ptxt"->"how","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
On the keys, I want to merge each ptxt with all the list entries that come after it.
e.g.
Create a new Seq that looks like this:
Seq("how you doing", "whats up", "this is cool")
You could fold your Seq with foldLeft:
val s = Seq("ptxt"->"how ","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
val r: Seq[String] = s.foldLeft(List[String]()) {
  case (xs, ("ptxt", s)) => s :: xs
  case (x :: xs, ("list", s)) => (x + s) :: xs
}.reverse
If you don't care about the order, you can omit reverse.
The foldLeft function takes two arguments: the initial value, and a function taking two arguments, namely the previous result and the current element of the sequence. The result of each call is then fed to the next call as the first argument.
For example, for numbers, foldLeft would just create a sum of all elements, starting from the left.
List(5, 4, 8, 6, 2).foldLeft(0) { (result, i) =>
result + i
} // 25
For our case, we start with an empty list. Then we provide a function which handles two cases using pattern matching.
Case when the key is "ptxt". In this case, we just prepend the value to the list.
case (xs, ("ptxt", s)) => s :: xs
Case when the key is "list". Here we take the first string from the list (using pattern matching), concatenate the value to it, and then put it back with the rest of the list.
case (x :: xs, ("list", s)) => (x + s) :: xs
At the end, since we were prepending elements, we need to reverse the list. Why prepend rather than append? Because append on an immutable list is O(n) while prepend is O(1), so it's more efficient.
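Putting it together, the fold on the sample input above produces:
r // List("how you doing", "whats up", "this is cool")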
Here is another solution:
val data = Seq("ptxt"->"how","list"->"you doing","ptxt"->"whats", "list" -> "up","ptxt"-> "this ", "list"->"is cool")
First group Keys and Values:
val grouped = data.groupBy(_._1)
  .map { case (k, l) => k -> l.map { case (_, v) => v.trim } }
// > Map(list -> List(you doing, up, is cool), ptxt -> List(how, whats, this))
Then zip and concatenate the two values:
grouped("ptxt").zip(grouped("list"))
  .map { case (a, b) => s"$a $b" }
// > List(how you doing, whats up, this is cool)
Disclaimer: This only works if there is always key, value, key, value, ... in the list - I had to adjust the input data.
If you change Seq for List, you can solve that with a simple tail-recursive function.
(The code uses Scala 2.13, but can be rewritten to use older Scala versions if needed)
def mergeByKey[K](list: List[(K, String)]): List[String] = {
  @annotation.tailrec
  def loop(remaining: List[(K, String)], acc: Map[K, StringBuilder]): List[String] =
    remaining match {
      case Nil =>
        acc.valuesIterator.map(_.result()).toList
      case (key, value) :: tail =>
        loop(
          remaining = tail,
          acc = acc.updatedWith(key) {
            case None => Some(new StringBuilder(value))
            case Some(oldValue) => Some(oldValue.append(value))
          }
        )
    }

  loop(remaining = list, acc = Map.empty)
}
val data = List("ptxt"->"how","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
mergeByKey(data)
// res: List[String] = List("howwhats upthis ", "you doingis cool")
Or a one-liner using groupMap (inspired by pme's answer):
data.groupMap(_._1)(_._2).view.mapValues(_.mkString).valuesIterator.toList
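Here groupMap groups by the first tuple element and keeps only the second, so for the sample data it behaves roughly like this groupBy/map combination:
data.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
// Map("ptxt" -> List("how", "whats up", "this "), "list" -> List("you doing", "is ", "cool"))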
Adding another answer since I don't have enough reputation points to add a comment. This is just an improvement on Krzysztof Atłasik's answer. To compensate for the case where the Seq starts with a "list", you might want to add another case:
case (xs, ("list", s)) if xs.isEmpty => xs
So the final code could be something like:
val s = Seq("list"->"how ","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
val r: Seq[String] = s.foldLeft(List[String]()) {
  case (xs, ("list", s)) if xs.isEmpty => xs
  case (xs, ("ptxt", s)) => s :: xs
  case (x :: xs, ("list", s)) => (x + s) :: xs
}.reverse
I have n as 3 and ls as List(a,b,c,d,e), so the input is (3, List(a,b,c,d,e)). I want to split it into two parts, such as List(a,b,c) and List(d,e). The Scala program for this is below.
I don't understand val (pre, post). Why is it used, and what do we get from it? Can someone please elaborate?
def splitRecursive[A](n: Int, ls: List[A]): (List[A], List[A]) = (n, ls) match {
  case (_, Nil) => (Nil, Nil)
  case (0, list) => (Nil, list)
  case (n, h :: tail) =>
    val (pre, post) = splitRecursive(n - 1, tail)
    (h :: pre, post)
}
Your splitRecursive function returns a pair of lists. To get the two lists out of the pair, you can either fetch them like this:
val result = splitRecursive(n - 1, tail)
val pre = result._1
val post = result._2
Or you can use destructuring to get them without first having to bind the pair to result. That is what the syntax in splitRecursive is doing.
val (pre, post) = splitRecursive(n - 1, tail)
It is simply a convenient way to get the elements out of a pair (or some other structure that can be destructured).
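With the example from the question:
val (pre, post) = splitRecursive(3, List("a", "b", "c", "d", "e"))
// pre  == List("a", "b", "c")
// post == List("d", "e")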
I've been parsing a proprietary file format that has sections and each section has a number of records. The sections can be in any order and the records can be in any order. The order is not significant. While sections should not be duplicated, I can't guarantee that.
I've been using parboiled2 to generate the AST using a format like the following:
oneOrMore( Section1 | Section2 | Section3 )
Where every section generates a case class. They don't inherit from anything, resulting in a Seq[Any].
These section case classes also contain a Seq[T] of records specific to the section type.
I would like to transform the Seq[Any] into a
case class (section1:Seq[T1], section2:Seq[T2], section3:Seq[T3] )
Does someone have a clever and easy-to-read technique for that, or should I make some mutable collections and use a foreach with a match?
I always feel like I am missing some Scala magic when I fall back to a foreach with vars.
EDIT 1:
It was brought up that I should extend a common base class, and it is true that I could. But I don't see what that changes about the solution if I still have to use a match to identify the type. I want to separate out the different case class types; for instance, below I want to collect all the B's, C's, E's, and F's into a Seq[B], Seq[C], Seq[E], and Seq[F].
class A()
case class B(v:Int) extends A
case class C(v:String) extends A
case class E(v:Int)
case class F(v:String)
val a:Seq[A] = B(1) :: C("2") :: Nil
val d:Seq[Any] = E(3) :: F("4") :: Nil
a.head match {
case B(v) => v should equal (1)
case _ => fail()
}
a.last match {
case C(v) => v should equal ("2")
case _ => fail()
}
d.head match {
case E(v) => v should equal (3)
case _ => fail()
}
d.last match {
case F(v) => v should equal ("4")
case _ => fail()
}
EDIT 2: Folding solution
case class E(v:Int)
case class F(v:String)
val d:Seq[Any] = E(3) :: F("4") :: Nil
val Ts = d.foldLeft((Seq[E](), Seq[F]())) { (c, r) =>
  r match {
    case e: E => c.copy(_1 = c._1 :+ e)
    case e: F => c.copy(_2 = c._2 :+ e)
  }
}
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )
EDIT 3: Exhaustivity
sealed trait A //sealed is important
case class E(v:Int) extends A
case class F(v:String) extends A
val d:Seq[Any] = E(3) :: F("4") :: Nil
val Ts = d.foldLeft((Seq[E](), Seq[F]())) { (c, r) =>
  r match {
    case e: E => c.copy(_1 = c._1 :+ e)
    case e: F => c.copy(_2 = c._2 :+ e)
  }
}
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )
While this could be done with shapeless to make a more terse solution (as Travis pointed out), I chose to go with a pure Scala solution based on Travis' feedback.
Here is an example of using foldLeft to manipulate a tuple housing strongly typed Seqs. Unfortunately, every possible type requires a case in the match, which can become tedious if there are many types.
Also note that if the base class is sealed, the match will give an exhaustivity warning in the event a type was missed, making this operation type safe.
sealed trait A //sealed is important
case class E(v:Int) extends A
case class F(v:String) extends A
val d:Seq[A] = E(3) :: F("4") :: Nil
val Ts = d.foldLeft((Seq[E](), Seq[F]())) { (c, r) =>
  r match {
    case e: E => c.copy(_1 = c._1 :+ e)
    case e: F => c.copy(_2 = c._2 :+ e)
  }
}
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )
I am trying to write a recursive function in Scala that takes in a list of Strings and returns a list with alternating elements from the original list:
For example:
List a = {"a","b","c"}
List b = {"a","c"}
The head should always be included.
def removeAlt(list:List[String], str:String):List[String]=lst match{
case Nil=> List()
case => head::tail
if(head == true)
removeAlternating(list,head)
else
head::removeAlternating(list,head)
I get a stack overflow error.
I understand that the code is incorrect, but I am trying to understand the logic of how to accomplish this with only recursion and no built-in classes.
def remove[A](xs: List[A]): List[A] = xs match {
  case Nil => Nil
  case x :: Nil => List(x)
  case x :: y :: t => x :: remove(t)
}
If the list is empty, return an empty list.
If we're at the last element of the list, return that.
Otherwise, there must be two or more elements: prepend the first element to the alternating elements of the rest of the list (and omit the second element).
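For example:
remove(List("a", "b", "c"))      // List("a", "c")
remove(List("a", "b", "c", "d")) // List("a", "c")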
Great exercise. This is what I came up with. It is not super optimized or anything:
def altList[T](rest: List[T], skip: Boolean): List[T] = {
  rest match {
    case Nil => Nil
    case a :: tail if skip == false => a :: altList(tail, true)
    case a :: tail if skip == true => altList(tail, false)
  }
}
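Called with skip = false so that the head is kept:
altList(List("a", "b", "c"), skip = false) // List("a", "c")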
A bit shorter alternative:
def remove[A](xs:List[A]):List[A] = xs match {
case x::_::t => x :: remove(t)
case _ => xs
}
UPDATE
What is not so good about the above approach is the eventual stack overflow for long lists, so I would suggest tail recursion:
import scala.annotation.tailrec
def remove[A](xs: List[A]): List[A] = {
  @tailrec
  def remove_r(xs: List[A], ys: List[A]): List[A] = xs match {
    case x :: _ :: t => remove_r(t, x :: ys)
    case _ => xs ++ ys
  }
  remove_r(xs, Nil).reverse
}
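The tail-recursive version gives the same result, e.g.:
remove(List("a", "b", "c", "d", "e")) // List("a", "c", "e")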