Fold left to create an immutable list - Scala

I am trying to create a list of strings and then concatenate them using mkString:
List("a", "b1")
.foldLeft(ListBuffer.empty[String]) { (a, l) =>
{
if (StringUtils.isNotBlank(l))
a += "abc" +l
else a
}
}
.mkString(";")
Output
abca;abcb1
I want to use an immutable list instead of a mutable one.
Solution tried
List("a", "b1").
foldLeft(List[String]())((b,a) => b:+"abc"+a).mkString(";")
I can perform the empty check as shown below. Can this be refactored to get rid of the if and else?
List("a", "b1","","c2").
foldLeft(List[String]())((b,a) =>
if (StringUtils.isNotBlank(a))
b:+"abc"+a
else b
).mkString(";")
Can anyone help

List("a", "b1").foldLeft("") { case (acc, el) => acc + el }
You're slightly misusing foldLeft. The key thing to remember is that you pass in a function that takes an accumulator and the "current element", as well as a seed value, and feeds in the result of the "current" step as the "seed" or accumulator for the next step.
Reading from the top:
Take my list of List("a", "b1")
Starting from the empty string "" as the accumulator
For every element in the list, call the function against the "current" value of the accumulator.
In the above case, concatenate the "current" element to the existing accumulator.
Pass the result to the next step as the seed value.
There's no += as in your example, because you're not mutating the value. Instead, the return value of the "current" step becomes the accumulator for the next step, so everything stays immutable.
In effect:
- Step 0: acc = "", el = "a", so you get "" + "a" = "a" (this is the value of acc at the next step)
- Step 1: acc = "a", el = "b1", so you get "a" + "b1" = "ab1"
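To see the intermediate accumulator values from the steps above, one option is scanLeft, which applies the same function as foldLeft but keeps every intermediate result. This is only a sketch; the object name FoldTrace is illustrative and not part of the original answer.

```scala
object FoldTrace {
  val items = List("a", "b1")

  // scanLeft returns the seed followed by each intermediate accumulator
  val steps: List[String] = items.scanLeft("")((acc, el) => acc + el)

  // foldLeft returns only the final accumulator
  val result: String = items.foldLeft("")((acc, el) => acc + el)
}
```

Here steps holds the seed plus the value of acc after Step 0 and Step 1, while result is just the last of those values.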
It's also worth noting that the empty string "" is the zero element for string concatenation, so there's no value in checking for empty.
For your specific example:
List("a", "b1").foldLeft("") { case (acc, el) =>
if (el.isEmpty) acc else acc + "abc" + el
}
In your case, collect is probably better
l.collect {
case s if s.nonEmpty => "abc" + s
} mkString ";"
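As a minimal sketch of the collect approach applied to the list from the question: the guard takes the place of the if/else, and blank entries are simply dropped. The helper name prefixAndJoin is hypothetical, and trim.nonEmpty is an assumption that roughly stands in for StringUtils.isNotBlank so the example needs no extra dependency.

```scala
object PrefixJoin {
  // Hypothetical helper: prefix every non-blank string, then join with ";"
  def prefixAndJoin(items: List[String], prefix: String = "abc"): String =
    items
      .collect { case s if s.trim.nonEmpty => prefix + s } // blank entries are skipped
      .mkString(";")
}
```

Because collect builds a new immutable List, no ListBuffer or += is needed.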

Related

How to add elements to an array in Scala, and find variable type?

I have the below block of Scala code as part of my data processing pipeline. From what I understand so far, the UDF takes one argument, file_contents, of type String. The UDF then does a bunch of string processing, including a split.
The code works without any errors, but I'm trying to edit in the following way and struggling, mostly due to my inexperience with Scala, and the difficulty in finding answers online.
I want to be able to add 2 empty strings and 2 zeros to info based on the length of info: if the length of info is 28, add these four values; otherwise continue as is. How can I accomplish this in the code below? I want to add this logic before val param_data.
I also have the following questions about this code if someone doesn't mind answering.
If the split converts the string to an Array, why am I not able to print the length of it using println(info)? This line instead seems to be printing a very large number which I believe is the summed length of all the strings.
How do you know what is being returned by this UDF? I don't see a return statement like in Python, etc.
def extract_FileContent_test = udf((file_contents: String) => {
val info = (file_contents.replace("\",\"", " ")
.replace("\"", "")
.replaceAll(" ", "|")
.replaceAll(" : \r\n", " : empty\r\n")
.replaceAll("\r\n", "|")
.replaceAll(" : ", "|")
.replaceAll(": ", "|")
.split("\\|")
.map(x => x.trim.replaceAll(" -", ""))
.filterNot(s => s == ""))
println(info.length)
// type info : Array[String]
// type sec_index : Array[Int]
val sec_index = info.zipWithIndex.filter(_._1.startsWith("---")).map(_._2)
if (sec_index.length > 2) {
// parse meta_data (beam tuning context) and param_data (beam tuning parameter) separately
val meta_data = (info.slice(0, sec_index(0)).toList.grouped(2)
.filter(l => l.length == 2)
.filter(l => l(1) != "Start" && l(1) != "")
.map { case List(a, b) => b }
.toArray.mkString(",")
)
// println(meta_data)
val param_data = (info.slice(sec_index(0) + 1, sec_index(1)).toList.grouped(3)
.filter(l => l.length == 3)
.filter { case List(a, b, c) => Try(c.split(" ")(0).toDouble).isSuccess }
.map { case List(a, b, c) => Array(a, c.split(" ")(0)).mkString(",") }
.toArray)
// println(param_data)
/* one meta data will have > 100 param
so besides meta columns, we add 2 columns for param_name, param_value
*/
param_data.map(meta_data + "," + _)
}
else {
Array[String]()
}
})
To get the length of info, use info.length.
In Scala the last expression is the return value. Here it is the if (sec_index.length > 2) expression, so the UDF returns either an empty Array[String] or the result of the final param_data.map.
To add data to info, you can do something like
val info_with_filler = if (info.length == 28) info ++ List("", "", "0", "0") else info
and then use info_with_filler in the rest of the code instead of info.
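A small standalone sketch of the padding step, following the question's condition (length exactly 28). The names PadInfo, padIfLength, and expectedLength are hypothetical. It also illustrates the point about return values: the if/else is the last expression in the method, so its value is what the method returns.

```scala
object PadInfo {
  // Hypothetical helper: append two empty strings and two "0" strings
  // when the array has exactly `expectedLength` elements.
  def padIfLength(info: Array[String], expectedLength: Int = 28): Array[String] =
    if (info.length == expectedLength) info ++ Array("", "", "0", "0")
    else info // last expression of each branch is the result, no `return` needed
}
```

Inside the UDF this would be called right after info is built, with the padded array used in place of info from then on.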

How can I emit periodic results over an iteration?

I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arises I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?
Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})
I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required.
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator( source: Stream[Char], emmitBoundaryFunction: Char => Boolean): Stream[String] = {
def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
sourceStream.isEmpty match {
case true => collector.mkString.reverse #:: Stream.empty[String]
case false => {
val char = sourceStream.head
emmitBoundaryFunction(char) match {
case true =>
collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
case false =>
loop(sourceStream.tail, char :: collector)
}
}
}
loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg
It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
Here you keep collecting characters in a temporary string and emit it when you run into a character that signals a boundary. Since I used List and prepend to it, you'll have to reverse at the end if you want the results in order.
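The same fold can be sketched with the final reverse included. The names EmitLines and emitOnNewline are hypothetical, and emitting any text left over after the last newline is an extra assumption that the snippet above does not make.

```scala
object EmitLines {
  // Hypothetical helper: emit the text collected so far at every newline.
  def emitOnNewline(source: String): List[String] = {
    val (emitted, pending) =
      source.foldLeft((List.empty[String], "")) {
        case ((acc, current), '\n') => (current :: acc, "")   // emit and reset
        case ((acc, current), ch)   => (acc, current + ch)    // keep collecting
      }
    // Assumption: trailing text without a final newline is emitted as well
    val all = if (pending.nonEmpty) pending :: emitted else emitted
    all.reverse // results were prepended, so restore source order
  }
}
```

Prepending to the List and reversing once at the end keeps each step cheap compared with appending to the List on every emit.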

Append auto-incrementing suffix to duplicated elements of a List

Given the following list :
val l = List("A", "A", "C", "C", "B", "C")
How can I add an auto-incrementing suffix to every element so that I end up with a list containing no duplicates, like the following (the ordering doesn't matter):
List("A0", "A1", "C0", "C1", "C2", "B0")
I found it out by myself just after having written this question
val l = List("A", "A", "C", "C", "B", "C")
l.groupBy(identity) // Map(A->List(A,A),C->List(C,C,C),B->List(B))
.values.flatMap(_.zipWithIndex) // List((A,0),(A,1),(C,0),(C,1),(C,2),(B,0))
.map{ case (str, i) => s"$str$i"}
If there is a better solution (using foldLeft maybe) please let me know
In a single pass straightforward way :
import scala.collection.mutable
def transformList(list : List[String]) : List[String] = {
val buf: mutable.Map[String, Int] = mutable.Map.empty
list.map {
x => {
val i = buf.getOrElseUpdate(x, 0)
val result = s"${x.toString}$i"
buf.put(x, i + 1)
result
}
}
}
transformList( List("A", "A", "C", "C", "B", "C"))
Perhaps not the most readable solution, but...
def appendCount(l: List[String]): List[String] = {
// Since we're doing zero-based counting, we need to use `getOrElse(e, -1) + 1`
// to indicate a first-time element count as 0.
val counts =
l.foldLeft(Map[String, Int]())((acc, e) =>
acc + (e -> (acc.getOrElse(e, -1) + 1))
)
val (appendedList, _) =
l.foldRight((List[String](), counts)) { case (e, (li, m)) =>
// Prepend the element with its count to the accumulated list.
// Decrement that element's count within the map of element counts
(s"$e${m(e)}" :: li, m + (e -> (m(e) - 1)))
}
appendedList
}
The idea here is that you create a count of each element in the list. You then iterate from the back of the list of original values and append the count to the value while decrementing the count map.
You need to define a helper here because foldRight will require both the new List[String] and the counts as an accumulator (and, as such, will return both). You'll just ignore the counts at the end (they'll all be -1 anyway).
I'd say your way is probably more clear. You'll need to benchmark to see which is faster if that's a concern.
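Since the question asks about foldLeft, here is one possible single-pass sketch that uses only immutable collections and keeps the original order. The names SuffixDuplicates and appendIndex are hypothetical; this is an alternative to the answers above, not a restatement of them.

```scala
object SuffixDuplicates {
  // Hypothetical helper: append a per-value running index to each element.
  def appendIndex(items: List[String]): List[String] = {
    val (_, reversed) =
      items.foldLeft((Map.empty[String, Int], List.empty[String])) {
        case ((seen, out), item) =>
          val n = seen.getOrElse(item, 0) // how many times item was seen so far
          (seen.updated(item, n + 1), s"$item$n" :: out)
      }
    reversed.reverse // elements were prepended, so restore input order
  }
}
```

The accumulator carries both the counts and the output list, which is the same idea as the foldRight version but without a separate counting pass.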

How to skip keys in map function on map in scala

Given a Map[String, String], I want to know how to skip a key when mapping over it:
val m = Map("1"-> "1", "2"-> "2")
m.map[(String, String), Map[String, String]].map{
case(k,v)=>
if (v == "1") {
// Q1: how to skip this key
// Do not need to return anything
} else {
// If the value is value that I want, apply some other transformation on it
(k, someOtherTransformation(v))
}
}
.collect does exactly what you want. It takes a partial function; if the function is not defined for some element (a pair, for a Map), that element is dropped:
Map("1"-> "1", "2"-> "2").collect { case (k, v) if v != "1" => (k, v * 2) }
//> scala.collection.immutable.Map[String,String] = Map(2 -> 22)
Here the partial function is defined only for v != "1" (because of the guard), hence the element with v == "1" is dropped.
You could put a "guard" on your case clause ...
case (k,v) if v != "1" => // apply some transformation on it
case (k,v) => (k,v) // leave as is
... or simply leave the elements you're not interested in unchanged.
case (k,v) => if (v == "1") (k,v) else // apply some transformation on it
The output of map is a new collection the same size as the input collection with all/some/none of the elements modified.
Victor Moroz's answer is good for this case, but for cases where you can't make the decision on whether to skip immediately in the pattern match, use flatMap:
Map("1"-> "1", "2"-> "2").flatMap {
case (k,v) =>
val v1 = someComplexCalculation(k, v)
if (v1 < 0) {
None
} else {
// If the value is value that I want, apply some other transformation on it
Some((k, someOtherTransformation(v1)))
}
}
Why not use .filterNot to remove all unwanted values (according to your condition) and then a .map?
Sample code:
Map("1"-> "1", "2" -> "2").filterNot(_._2 == "1").map(someFunction)
//someFunction -> whatever you would implement
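To compare the approaches side by side, here is a small sketch on the map from the question. The transform function is a hypothetical stand-in for someOtherTransformation (it just repeats the string twice), and the names SkipKeys, viaCollect, and viaFilterThenMap are illustrative.

```scala
object SkipKeys {
  // Hypothetical stand-in for someOtherTransformation
  def transform(v: String): String = v * 2

  // collect: the guard drops entries, the body transforms the rest
  def viaCollect(m: Map[String, String]): Map[String, String] =
    m.collect { case (k, v) if v != "1" => (k, transform(v)) }

  // filterNot then map: same result in two separate steps
  def viaFilterThenMap(m: Map[String, String]): Map[String, String] =
    m.filterNot(_._2 == "1").map { case (k, v) => (k, transform(v)) }
}
```

Both produce the same Map; collect does it in one traversal, while filterNot followed by map may read more clearly when the condition and the transformation are unrelated.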

Scala - Automatic Iterator inside pattern match

I have array data like this: [("Bob",5),("Andy",10),("Jim",7),...(x,y)].
How do I do pattern matching in Scala so that the cases are matched automatically based on the array data I have provided (instead of defining each case one by one)?
I mean, not like this (pseudocode):
val x = y.match {
case "Bob" => get and print Bob's Score
case "Andy" => get and print Andy's Score
..
}
but
val x = y.match {
case automatically defined by given Array => print each'score
}
Any ideas? Thanks in advance.
If printing and storing results in an array is your main concern, then the following will work well:
val ls = Array(("Bob",5),("Andy",10),("Jim",7))
ls.map({case (x,y) => println(y); y}) // print and store the score in an array
I'm a bit confused about the question; however, if you just wish to print all the data in the array, I would do it like this:
val list = Array(("Foo",3),("Tom",3))
list.foreach{
case (name,score) =>
println(s"$name scored $score")
}
//output:
//Foo scored 3
//Tom scored 3
Consider
val xs = Array( ("Bob",5),("Andy",10),("Jim",7) )
for ( (name,n) <- xs ) println(s"$name scores $n")
and also
xs.foreach { t => println(s"${t._1} scores ${t._2}") }
xs.foreach { t => println(t._1 + " scores " + t._2) }
xs.foreach(println)
A simple way to print the contents of xs,
println( xs.mkString(",") )
where mkString creates a string out of xs and separates each item by a comma.
Miscellaneous notes
To illustrate pattern matching on Scala Array, consider
val x = xs match {
case Array( t @ ("Bob", _), _*) => println("xs starts with " + t._1)
case Array() => println("xs is empty")
case _ => println("xs does not start with Bob")
}
In the first case we extract the first tuple, and neglect the rest. In the first tuple we match against string "Bob" and neglect the second item. Moreover, we bind the first tuple to tag t, which is used in the printing where we refer to its first item.
The second case matches an empty array, and the last case covers everything not matched by the first two.
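A variant of the match above that returns a String instead of printing makes the three cases easy to check. The names ArrayMatch and describe are hypothetical; the patterns themselves are the same as in the snippet above, with the @ binder naming the first tuple.

```scala
object ArrayMatch {
  // Hypothetical helper: describe an array of (name, score) pairs
  def describe(xs: Array[(String, Int)]): String = xs match {
    case Array(t @ ("Bob", _), _*) => s"xs starts with ${t._1}" // first tuple bound to t
    case Array()                   => "xs is empty"
    case _                         => "xs does not start with Bob"
  }
}
```

Returning a value rather than calling println inside each case also means the result of the match is meaningful, instead of being Unit.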