How to get more functional Scala code when manipulating a string by counting its length? - scala

I am quite new to Scala and functional programming.
I wrote the simple code below, which manipulates a string by counting its comma-delimited parts.
When the 4th comma-delimited part is empty, I concatenate only three columns; otherwise I concatenate all the columns including the value, as in the first snippet below.
But I don't think this is in the spirit of functional programming, because I use an if statement to check whether the input contains a value.
How can I change it to more Scala-like code?
str = "aa,bb,1668268540040,34.0::aa,bb,1668268540040"
val parts = str.split("::")
for (case <- parts) {
val ret = case.map(c => if (c.value.isEmpty) {
c.columnFamily + "," + c.qualifier + "," + c.ts
} else {
c.columnFamily + "," + c.qualifier + "," + c.ts + "," + c.value
})
}
str = "aa,bb,1668268540040,34.0::aa,bb,166826434343"
val parts = str.split("::")
for (part <- parts) {
val elem = part.split(",", 4)
if (elem.length == 4) {
val Array(f, q, t, v) = elem
state.put(f + ":" + q, (v, t.toLong))
} else {
val Array(f, q, t) = elem
state.put(f + ":" + q, ("", t.toLong))
}
}

@LeviRamsey's comment actually tells you everything, but to make your code more Scala-like, you should avoid mutable data structures in the first place (which is what you're doing with state, which I assume is a mutable Map) and use immutable ones. As for the if-else part, it's actually fine in FP, but in Scala you can pattern match on a list rather than checking the length manually and destructuring Arrays. Something like this:
parts.foldLeft(Map.empty[String, (String, Long)]) {
case (state, part) =>
part.split(",", 4).toList match {
case f :: q :: t :: v :: Nil =>
state.updated(f + ":" + q, (v, t.toLong))
case f :: q :: t :: Nil =>
state.updated(f + ":" + q, ("", t.toLong))
case _ => state // or whatever you want to do when the split yields neither 4 nor 3 elements
}
}
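For reference, here is a minimal self-contained sketch of the same fold applied to the sample string from the question. The name parseState is hypothetical, and the sketch matches on the Array returned by split directly instead of converting it to a List; treat it as one possible shape, not the only one.

```scala
// parseState is a hypothetical name; the logic follows the fold shown above.
def parseState(input: String): Map[String, (String, Long)] =
  input.split("::").foldLeft(Map.empty[String, (String, Long)]) { (state, part) =>
    part.split(",", 4) match {
      case Array(f, q, t, v) => state.updated(s"$f:$q", (v, t.toLong))
      case Array(f, q, t)    => state.updated(s"$f:$q", ("", t.toLong))
      case _                 => state // malformed record: left out
    }
  }

val state = parseState("aa,bb,1668268540040,34.0::aa,bb,166826434343")
// Both records share the key "aa:bb", so the second one overwrites the first.
```

Because updated returns a new Map each time, nothing is mutated; the last record for a given key wins, which matches what state.put did in the question.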

Related

How not to count space as a char using foldLeft

I have an exercise where I have to check how many DIFFERENT chars are used in a string.
I don't know how not to count space as a char. I wanted to put if inside the foldLeft, like:
str.foldLeft(lista)((acc, char) => if (char != ' ') char :: acc)
but then the compiler reports: required List[Char], found Any.
What do you think about my function in general? Could I make it faster or more efficient?
def countChars(str: String): Int = {
val lista = List[Char]()
val letters = str.foldLeft(lista)((acc, char) => char :: acc)
val result = letters.toSet.size
return result
}
println(countChars("hello world"))
I am not sure if this is "faster or more efficiently", maybe a little bit, but certainly reads better and is more idiomatic :)
str.iterator.filterNot(_.isWhitespace).distinct.size
Or alternatively
str.iterator.distinct.count(!_.isWhitespace)
If you are set on using foldLeft, then folding into a Set would save you a few ticks compared to creating a list first and then converting it into a set:
str.foldLeft(Set.empty[Char]) {
case (s, ' ') => s
case (s, c) => s + c
}.size
Or you could get rid of the space afterwards:
(str.foldLeft(Set.empty[Char])(_ + _) - ' ').size
First of all you can use Set as accumulator from the start. Then you can use if-else and return current accumulator if current char should not be counted:
val set = Set[Char]()
val uniqueCount = str.foldLeft(set)((acc, char) => if (char != ' ') acc + char else acc)
.size
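As a side note on the original error, here is a small sketch of why the if without an else fails to type-check. It assumes Scala 2, which is what the reported "found: Any" message suggests.

```scala
val str = "hello world"

// Without an else branch the compiler supplies `else ()`, so the lambda's
// result type is the common supertype of List[Char] and Unit, which is Any.
// str.foldLeft(List.empty[Char])((acc, c) => if (c != ' ') c :: acc) // does not compile

// With both branches returning a List[Char], the fold type-checks.
val letters: List[Char] =
  str.foldLeft(List.empty[Char])((acc, c) => if (c != ' ') c :: acc else acc)

val distinctCount = letters.toSet.size // 7: h, e, l, o, w, r, d
```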
You can use pattern matching. Then you don't need the if...else.
def countChars(str: String): Int =
str.foldLeft(Set[Char]()){
case (acc, ' ') => acc
case (acc, c) => acc + c
}.size
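One difference between the answers above is worth spelling out: the isWhitespace variants skip every whitespace character, while the variants matching on ' ' skip only the space. A small sketch with a made-up sample string containing a tab:

```scala
val sample = "a\tb c" // contains a tab and a space

// Drops every whitespace character (space, tab, newline, ...).
val ignoringWhitespace = sample.filterNot(_.isWhitespace).distinct.length // 3

// Drops only the space character, so the tab still counts.
val ignoringSpaceOnly = sample.foldLeft(Set.empty[Char]) {
  case (seen, ' ') => seen
  case (seen, c)   => seen + c
}.size // 4
```

Which behaviour is right depends on what the exercise means by "space".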

Use foldLeft to replace occurrences of character in String

I'm trying to learn functional Scala and working on a simple problem - replace occurrences of \' or \\ contained in a String:
Here is my code so far:
val data : String = "\' this is a test \\ "
data.toCharArray.foldLeft(""){ (x, y) => x match {
case Nil => y :: Nil
case head :: tail =>
if head == '\'' ''
else if head == '\\' ''
else head :: tail
}
There are multiple errors:
I've not understood something fundamental with fold?
Simple examples of foldLeft such as:
val sum = prices.foldLeft(0.0)(_ + _)
are understandable, but I'm unsure how to use foldLeft in a context where there are conditions. In the problem I posted, the condition is a match on a character.
There are several issues here, starting with some syntactic problems, like missing parentheses around the conditionals. The first real substantive issue is that the initial value (the "" in foldLeft("")) must be the same type as the accumulator, and as the return type. You seem to want a List[Char] as the return type, so you'll need to use something like List.empty[Char] as the initial value.
Next I'd strongly recommend using names like acc and c instead of x and y to indicate more clearly which is the accumulator and which is the current value.
Another issue is that '' also isn't valid Scala syntax—there is no empty character literal. I'll use '_' as the replacement just for the sake of example.
A working implementation might look like this:
val data: String = "\' this is a test \\ "
data.toCharArray.foldLeft(List.empty[Char]) { (acc, c) =>
c match {
case '\'' => acc :+ '_'
case '\\' => acc :+ '_'
case other => acc :+ other
}
}
Which yields:
val data: String = "' this is a test \ "
val res1: List[Char] = List(_, , t, h, i, s, , i, s, , a, , t, e, s, t, , _, )
Which I think is what you're aiming for?
As a footnote, I'm assuming this is just an exercise, but it's worth noting that using a left fold for an operation like this is extremely inefficient, since you're building up a list by appending.
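To illustrate the efficiency point, here is a sketch of two cheaper alternatives: prepending and reversing once at the end, or skipping the fold entirely with map. The '_' replacement character is the same placeholder used above.

```scala
val data = "' this is a test \\ "

// Prepending to a List is constant time; appending with :+ copies the list.
val replaced: List[Char] =
  data.foldLeft(List.empty[Char]) { (acc, c) =>
    val next = if (c == '\'' || c == '\\') '_' else c
    next :: acc
  }.reverse

// For a one-to-one replacement, map expresses the same thing without a fold.
val viaMap: String = data.map(c => if (c == '\'' || c == '\\') '_' else c)
```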
There are several errors in this code:
you haven't closed the lambda's brace
you use List pattern matching on what is actually a String, because x here is the result so far (so "" initially) and y is the current element of data (a Char)
This code should look like this:
val data : String = "\' this is a test \\ "
data.toCharArray.foldLeft("") { (result, ch) =>
if (ch == '\'' || ch == '\\') result
else result + ch
}
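If the goal is only to drop those two characters, the fold can also be replaced by filterNot. This is a sketch of an alternative, not what the question asked for, since the exercise was about foldLeft.

```scala
val data = "' this is a test \\ "

// Keep every character except the single quote and the backslash.
val cleaned: String = data.filterNot(c => c == '\'' || c == '\\')

// The fold from the answer above, for comparison.
val viaFold: String = data.foldLeft("") { (result, ch) =>
  if (ch == '\'' || ch == '\\') result else result + ch
}
```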

flatMap in scala, the compiler says it's wrong

I have a file which contains lines which contain items separated by ","
for example:
2 1,3
3 2,5,7
5 4
Now I want to flatMap this file into an RDD like this:
2 1
2 3
3 2
3 5
3 7
5 4
How can I implement this in Scala?
val pairs = lines.flatMap { line =>
val a = line.split(" ")(0)
val partb = line.split(" ")(1)
for (b <- partb.split(",")) {
yield a + " " + b
}
}
Is this correct?
Thank you for clarifying your code example. In your case, the only problem is the location of your yield keyword. Move it to before the curly braces, like so:
for (b <- partb.split(",")) yield {
a + " " + b
}
You need to write yield first, THEN the expression to return:
yield {a}
The way you are doing it now is a for loop, not a for comprehension, so the compiler will complain about the yield keyword, and even if it did not, the loop would return Unit.
val pairs = lines.flatMap { line =>
for (a <- line.split(",")) yield {
a
}
}
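The difference between the two forms is easier to see next to what the compiler rewrites them to. A minimal sketch on a plain Array, with the prefix "2 " hard-coded for illustration:

```scala
val items = "1,3".split(",")

// With yield: a for comprehension, rewritten by the compiler to map.
val viaFor = for (b <- items) yield "2 " + b
val viaMap = items.map(b => "2 " + b)

// Without yield: a for loop, rewritten to foreach, whose result is Unit.
val loopResult: Unit = for (b <- items) println("2 " + b)
```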
In addition to moving yield so that the for expression returns a collection, as already explained, consider this possible refactoring where we extract the first two entries from split,
val pairs = lines.flatMap { line =>
val Array(a, partb, _*) = line.split(" ")
for (b <- partb.split(","))
yield a + " " + b
}
and yet more concise is
val pairs = lines.flatMap { line =>
val Array(a, tail @ _*) = line.split(" |,")
for (t <- tail) yield s"$a $t"
}
where we split by either " " or "," and extract the head and the tail, then we apply string interpolation to produce the desired result.
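To try this without Spark, here is a sketch that uses a plain List in place of the RDD (an assumption made only so the snippet runs on its own); the flatMap body has the same shape as in the answers above.

```scala
// A plain List stands in for the RDD here (an assumption for the sketch).
val lines = List("2 1,3", "3 2,5,7", "5 4")

val pairs: List[String] = lines.flatMap { line =>
  // Split once on the space: the key, then the comma-separated values.
  val Array(a, partb) = line.split(" ", 2)
  partb.split(",").map(b => s"$a $b").toList
}
// pairs == List("2 1", "2 3", "3 2", "3 5", "3 7", "5 4")
```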

Fold left and fold right

I am trying to learn how to use fold left and fold right. This is my first time learning functional programming. I am having trouble understanding when to use fold left and when to use fold right. It seems to me that a lot of the time the two functions are interchangeable. For example (in Scala), the two functions:
val nums = List(1, 2, 3, 4, 5)
val sum1 = nums.foldLeft(0) { (total, n) =>
total + n
}
val sum2 = nums.foldRight(0) {(total, n) =>
total + n
}
both yield the same result. Why and when would I choose one or the other?
foldLeft and foldRight differ in the way the function is nested.
foldleft: (((...) + a) + a) + a
foldright: a + (a + (a + (...)))
Since the function you are using is addition, both of them give the same result. Try using subtraction.
Moreover, the choice between foldLeft and foldRight is often not about the result, since in many cases both yield the same one. It depends on which way you want your function to be applied.
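The subtraction suggestion can be checked directly. A small sketch using the list from the question:

```scala
val nums = List(1, 2, 3, 4, 5)

// foldLeft nests to the left: ((((0 - 1) - 2) - 3) - 4) - 5
val left = nums.foldLeft(0)((acc, n) => acc - n) // -15

// foldRight nests to the right: 1 - (2 - (3 - (4 - (5 - 0))))
val right = nums.foldRight(0)((n, acc) => n - acc) // 3
```

Note the argument order: foldLeft passes the accumulator first, foldRight passes it second.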
Since the operator you are using is both associative and commutative (for example, a + b = b + a), foldLeft and foldRight give the same result. That is not true in general, as the examples below show: for string concatenation the + operator is not commutative, i.e. "a" + "b" != "b" + "a".
val listString = List("a", "b", "c") // : List[String] = List(a,b,c)
val leftFoldValue = listString.foldLeft("z")((acc, el) => acc + el) // : String = zabc
val rightFoldValue = listString.foldRight("z")((el, acc) => el + acc) // : String = abcz
Or in shorthand:
val leftFoldValue = listString.foldLeft("z")(_ + _) // : String = zabc
val rightFoldValue = listString.foldRight("z")(_ + _) // : String = abcz
Explanation:
foldLeft evaluates as ((("z" + "a") + "b") + "c") = (("za" + "b") + "c") = ("zab" + "c") = "zabc"
and foldRight as ("a" + ("b" + ("c" + "z"))) = ("a" + ("b" + "cz")) = ("a" + "bcz") = "abcz"
So in short for operators that are associative and commutative, foldLeft and
foldRight are equivalent (even though there may be a difference in
efficiency).
But sometimes, only one of the two operators is appropriate.
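One case where the direction matters is building a List with ::. A minimal sketch:

```scala
val xs = List(1, 2, 3)

// foldRight with :: rebuilds the list in its original order.
val copied = xs.foldRight(List.empty[Int])((x, acc) => x :: acc) // List(1, 2, 3)

// foldLeft with :: visits elements front to back, so the result is reversed.
val reversed = xs.foldLeft(List.empty[Int])((acc, x) => x :: acc) // List(3, 2, 1)
```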

Why do I have to explicitly state Tuple2(a, b) to be able to use Map add in a foldLeft?

I wish to create a Map keyed by name containing the count of things with that name. I have a list of the things with name, which may contain more than one item with the same name. Coded like this I get an error "type mismatch; found : String required: (String, Int)":
//variation 0, produces error
(Map[String, Int]() /: entries)((r, c) => { r + (c.name, if (r.contains(c.name)) (c.name) + 1 else 1) })
This confuses me as I thought (a, b) was a Tuple2 and therefore suitable for use with Map add. Either of the following variations works as expected:
//variation 1, works
(Map[String, Int]() /: entries)((r, c) => { r + Tuple2(c.name, if (r.contains(c.name)) (c.name) + 1 else 1) })
//variation 2, works
(Map[String, Int]() /: entries)((r, c) => {
val e = (c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
r + e })
I'm unclear on why there is a problem with my first version; can anyone advise? I am using Scala-IDE 2.0.0 beta 2 to edit the source; the error is from the Eclipse Problems window.
When passing a single tuple argument to a method used with operator notation, like your + method, you should use double parentheses:
(Map[String, Int]() /: entries)((r, c) => { r + ((c.name, r.get(c.name).map(_ + 1).getOrElse(1) )) })
I've also changed the computation of the Int, which looks funny in your example…
Because + is also used to concatenate strings with other things. In this case, the parentheses are not being taken to mean a tuple, but to mean a parameter list.
Scala has used + for other stuff, which resulted in all sorts of problems, just like the one you mention.
Replace + with updated, or use -> instead of ,.
r + (c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
is parsed as
r.+(c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
So the compiler looks for a + method with 2 arguments on Map and doesn't find it. The form I prefer over double parentheses (as Jean-Philippe Pellet suggests) is
r + (c.name -> (if (r.contains(c.name)) (c.name) + 1 else 1))
UPDATE:
if Pellet is correct, it's better to write
r + (c.name -> (r.getOrElse(c.name, 0) + 1))
(and of course James Iry's solution expresses the same intent even better).
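Putting the suggestions together, here is a self-contained sketch of the name count. Entry is a hypothetical stand-in for the question's element type, and foldLeft is used in place of the /: operator.

```scala
// Entry is a hypothetical stand-in for the question's elements.
final case class Entry(name: String)

val entries = List(Entry("a"), Entry("b"), Entry("a"))

// Double parentheses: the inner pair is the tuple, the outer pair is the
// argument list of +.
val viaTuple = entries.foldLeft(Map.empty[String, Int]) { (r, c) =>
  r + ((c.name, r.getOrElse(c.name, 0) + 1))
}

// updated takes key and value separately, so no tuple syntax is needed.
val viaUpdated = entries.foldLeft(Map.empty[String, Int]) { (r, c) =>
  r.updated(c.name, r.getOrElse(c.name, 0) + 1)
}
```

Both folds produce a map with "a" counted twice and "b" once.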