I have this function in Groovy:
def tokens = ['Will', 'is', 'coding', 'in', 'groovy']
String sentence = tokens.inject({sent, word -> sent + ' ' + word})
println sentence
with this output:
"Will is coding in groovy"
Inject is to Groovy as fold is to Scala. If you do not set the accumulator value in inject, it defaults to the first item in the list. How can I do that in Scala?
val tokens = List("Will", "is", "coding", "in", "Scala")
val sentence = tokens.foldLeft(""){(sent, word) => sent + " " + word}
println(sentence)
produces this output (note the leading space before the sentence):
" Will is coding in Scala"
I get why it happens, but I'm not sure how to eliminate it while still folding in a similar manner. Is there any way to do this in Scala?
The proper equivalent of Groovy's inject without an initial value is reduce, which likewise starts with the first element of the collection as the accumulator:
scala> val sentence = tokens.reduce { (sent, word) => sent + " " + word }
sentence: String = Will is coding in Scala
Note that this throws an UnsupportedOperationException if tokens is empty. reduceOption is safer if there's any chance of that: it returns None if the collection is empty and a Some containing the result otherwise.
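If you want to keep the folding style from the question, one option is to seed the fold with the first element and fold over the rest, handling the empty case explicitly. The sketch below puts that next to the reduceOption variant; the JoinWords object and its method names are illustrative, not part of any library:

```scala
object JoinWords {
  // Seed foldLeft with the head so no separator is added before the first word.
  def joinByFold(tokens: List[String]): String = tokens match {
    case Nil          => ""
    case head :: tail => tail.foldLeft(head)((sent, word) => sent + " " + word)
  }

  // reduceOption returns None instead of throwing on an empty list.
  def joinByReduce(tokens: List[String]): Option[String] =
    tokens.reduceOption((sent, word) => sent + " " + word)
}
```

For example, JoinWords.joinByReduce(Nil) gives None, while JoinWords.joinByFold(Nil) gives the empty string.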
mkString exists exactly for this use case.
scala> val tokens = List("Will", "is", "coding", "in", "Scala")
tokens: List[String] = List("Will", "is", "coding", "in", "Scala")
scala> tokens.mkString(" ")
res1: String = "Will is coding in Scala"
I'm supposed to remove all the ",", ".", "-" and '"' characters from the strings in a List.
The list looks like this, e.g.: List("This is an exercise, which I have problem with", "and I don't know, how to do it.", "text-,.")
I tried map, but it doesn't compile. I also considered replace, but decided against it, because I would have to call replace for each character I want to ignore, e.g. replace(",", "").replace(".", "") etc., wouldn't I?
Is there a method where I can pass all the characters I want to ignore at once?
My code:
val lines = io.Source.fromResource("ogniem-i-mieczem.txt").getLines.toList
println(lines.map{
case "," => ""
case "." => ""
case "-" => ""
case "''" => ""
})
A simple regex and replaceAllIn() should do it.
val inLst =
List("This is an exercise, which I have problem with"
, "and I don't know, how to do it."
, "text-,.")
inLst.map("[-,.\"]".r.replaceAllIn(_, ""))
//res0: List[String] =
// List(This is an exercise which I have problem with
// , and I don't know how to do it
// , text)
You could apply a regex to that and replace them all at once, something like this:
import scala.util.matching.Regex
val regex = "\\.*,*-*\"*".r
val sampleText = "Hi there. This comma, should be gone. and dots and quotes \"as well."
val result = regex.replaceAllIn(sampleText, "")
println(s"result: $result")
// result: Hi there This comma should be gone and dots and quotes as well
Applied to your sample code, it could look like this:
import scala.util.matching.Regex
val regex = "\\.*,*-*\"*".r
val result = io.Source
.fromResource("ogniem-i-mieczem.txt")
.getLines
.toList
.map { line => regex.replaceAllIn(line, "") }
println(s"Result: $result")
Other answers based on regex are the way to go, but just as a learning exercise, note that strings can be treated as sequences of characters, that is, as collections, so the usual suspects such as map and filter also work:
lines map { _.filterNot { exclusionList.contains } }
where
val exclusionList = Set(',', '.', '-', '"')
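A self-contained sketch of the filterNot approach, applied to the sample list from the question. The StripPunctuation object and strip method are hypothetical names used only for illustration:

```scala
object StripPunctuation {
  // Characters to drop, as in the exclusionList from the answer.
  val exclusionList: Set[Char] = Set(',', '.', '-', '"')

  // A String is a sequence of Chars, so filterNot keeps every character
  // that is not in exclusionList.
  def strip(lines: List[String]): List[String] =
    lines.map(_.filterNot(exclusionList.contains))
}
```

Apostrophes are not in the set, so "don't" survives unchanged.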
I'm pretty new to Scala (and programming in general), but have come up with what feels like a less than perfect solution to an issue I have. I was wondering if anyone has a more elegant/efficient one?
I have a (very large) set of strings, a small example of which is below for replication purposes:
val brands = Set("Brand one", "BRAND-two!!!", "brand_Three1, brandThree2", "brand04")
Now what I want to do is clean up this set so that I have a new clean set where:
any strings separated by commas are split into separate strings
leading spaces and non-alphanumeric characters (other than _ and -) are removed
any string with a space is replaced by three versions of that string (one with no space, one with "-" instead of the space, and one with "_")
The code I have so far does this, but it does it in two steps, thus iterating over the list twice (which is inefficient):
val brands_clean = brands.flatMap(
_.toLowerCase.split(",").map(
_.trim.replaceAll("[^A-Za-z0-9\\-\\_\\s]+", "")
)
)
def spaceVariations(v: String) = if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v)
val brands_final = brands_clean.flatMap(spaceVariations(_))
I have tried incorporating the spaceVariations function directly into the main code by appending a map or flatMap after the replaceAll:
// using the function call
.flatMap(spaceVariations(_))
// or using a function directly within the code
.flatMap {v => if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v) }
but I get the following error:
error: type mismatch;
found : Array[Nothing]
required: scala.collection.GenTraversableOnce[?]
I'm not sure I understand why this doesn't work here, or if there is a better way to achieve what I am trying to achieve?
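brands.flatMap(
_.toLowerCase.split(",").map(
_.trim.replaceAll("""[^\w\s-]""", "")
)
).flatMap(spaceVariations)
Works for me; I'm not sure where (or why) you are getting the error. (I cleaned up your regex a little to make it more concise: [^\w\s-] removes the same characters as your original pattern, since \w already covers letters, digits and _. Keeping \s matters, because spaceVariations needs the spaces.)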
Note that this still traverses the set twice, though. Sets are not lazy in Scala, so the first flatMap completes and builds an intermediate set before the next one starts.
If you want to save a pass, start with an Iterator rather than a Set: iterators are lazy and send each element through the whole chain before moving on to the next one:
brands
.iterator
.flatMap { _.toLowerCase.split(",") }
.map(_.trim)
.map { _.replaceAll("""[^\w\s-]""", "") }
.flatMap(spaceVariations)
.toSet
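Another option, sketched here under the same rules as the question's code, is a for-comprehension. It desugars to nested flatMap calls, so each brand is split, cleaned and expanded before the next brand is processed, with no second pass over the whole set. The CleanBrands object is a hypothetical wrapper; the regex keeps whitespace so that spaceVariations still has spaces to work with:

```scala
object CleanBrands {
  // Same rule as the question's spaceVariations.
  def spaceVariations(v: String): Set[String] =
    if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", ""))
    else Set(v)

  def clean(brands: Set[String]): Set[String] =
    for {
      brand   <- brands                                   // each brand visited once
      part    <- brand.toLowerCase.split(",").toSet       // comma-separated pieces
      cleaned  = part.trim.replaceAll("""[^\w\s-]""", "") // keep word chars, spaces, '-'
      variant <- spaceVariations(cleaned)                 // expand pieces containing spaces
    } yield variant
}
```

The result is a flat Set[String], so no extra flatten step is needed.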
Based on the assumption that your Set will always look like this:
def spaceVariations(v: String) = if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v)
val brands = Set("Brand one", "BRAND-two!!!", "brand_Three1, brandThree2", "brand04")
brands.map( x => if (x.contains(",") ) x.split(",") else x ).flatMap {
case str: String => Array(str)
case a : Array[String] => a
}.map(_.trim.toLowerCase.replaceAll("[^A-Za-z0-9\\-\\_\\s]+", "")).map(spaceVariations(_))
Gives the output:
Set(Set(brand-one, brand_one, brandone), Set(brandthree2), Set(brand04), Set(brand-two), Set(brand_three1))
I'm looking to do the simple task of counting words in a String. The easiest way I've found is to use a Map to keep track of word frequencies. Previously with Haskell, I used its Map's function insertWith, which takes a function that resolves key collisions, along with the key and value pair. I can't find anything similar in Scala's library though; only an add function (+), which presumably overwrites the previous value when re-inserting a key. For my purposes though, instead of overwriting the previous value, I want to add 1 to it to increase its count.
Obviously I could write a function to check if a key already exists, fetch its value, add 1 to it, and re-insert it, but it seems odd that a function like this isn't included. Am I missing something? What would be the Scala way of doing this?
Use a mutable map with a default value and then update it with +=:
import scala.collection.mutable
val count = mutable.Map[String, Int]().withDefaultValue(0)
count("abc") += 1
println(count("abc"))
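If you'd rather stay immutable, a close analogue of Haskell's insertWith (+) for counting is to look up the current count with getOrElse and store the incremented value with updated, folding over the words. A minimal sketch; WordCount and countWords are hypothetical names, and splitting on whitespace is an assumption about what counts as a word:

```scala
object WordCount {
  // For each word: read the current count (0 if absent) and store count + 1.
  def countWords(text: String): Map[String, Int] =
    text.split("\\s+")
      .filter(_.nonEmpty) // drop the empty token produced by leading whitespace or ""
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }
}
```

On Scala 2.13 and later, words.groupMapReduce(identity)(_ => 1)(_ + _) builds the same map in one call.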
If it's a string, why not use the Data.List.Split module (from the split package)? In GHCi, this counts the distinct words:
import Data.List.Split
import Data.List (nub)
let mywords = "he is a good good boy"
length $ nub $ splitOn " " mywords
5
If you want to stick with Scala's immutable style, you could create your own class with immutable semantics:
class CountMap protected(val counts: Map[String, Int]){
def +(str: String) = new CountMap(counts + (str -> (counts(str) + 1)))
def apply(str: String) = counts(str)
}
object CountMap {
def apply(counts: Map[String, Int] = Map[String, Int]()) = new CountMap(counts.withDefaultValue(0))
}
And then you can use it:
val added = CountMap() + "hello" + "hello" + "world" + "foo" + "bar"
added("hello")
>>2
added("qux")
>>0
You might also add apply overloads on the companion object so that you can directly input a sequence of words, or even a sentence:
object CountMap {
def apply(counts: Map[String, Int] = Map[String, Int]()): CountMap = new CountMap(counts.withDefaultValue(0))
def apply(words: Seq[String]): CountMap = CountMap(words.groupBy(w => w).map { case(word, group) => word -> group.length })
def apply(sentence: String): CountMap = CountMap(sentence.split(" ").toSeq)
}
And then you can build one even more easily:
CountMap(Seq("hello", "hello", "world", "world", "foo", "bar"))
Or:
CountMap("hello hello world world foo bar")
How can I build a string (not print it) from the values returned by a foreach or for statement applied to a List?
Let's say I have this:
val names = List("Bob", "Fred", "Joe", "Julia", "Kim")
val x: String = for (name <- names) name //I don't need this println(name)
I am trying to put these returned names into a single String, separated by spaces.
Use the mkString(sep: String) method:
val names = List("Bob", "Fred", "Joe", "Julia", "Kim")
val x = names.mkString(" ") // "Bob Fred Joe Julia Kim"
for (without yield) and foreach return Unit. They are only for side effects like printing. mkString is your best bet here, as Jean said. There's a second version that lets you pass in opening and closing strings, e.g.,
scala> names.mkString("[", ", ", "]")
res0: String = [Bob, Fred, Joe, Julia, Kim]
If you need more flexibility, reduceLeft might be what you want. Here's what mkString is doing, more or less:
"[" + names.reduceLeft((str, name) => str + ", " + name) + "]"
One difference, though: reduceLeft throws an exception for an empty collection, while mkString just returns the empty string (or only the opening and closing strings, e.g. "[]" for the bracketed call above). foldLeft works for empty collections, but getting the delimiters right is tricky.
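One way to get the delimiters right with foldLeft is to carry a flag in the accumulator that records whether an element has been written yet, so the separator only goes between elements. This is a sketch that mimics mkString(start, sep, end); JoinWithFold and join are hypothetical names:

```scala
object JoinWithFold {
  // Accumulator: (text so far, still waiting for the first element?).
  def join(items: List[String], start: String, sep: String, end: String): String = {
    val (body, _) = items.foldLeft(("", true)) { case ((acc, first), item) =>
      if (first) (item, false)          // first element: no separator before it
      else (acc + sep + item, false)    // later elements: separator, then element
    }
    start + body + end                  // an empty list yields just start + end
  }
}
```

A flag is used instead of checking acc.isEmpty so that empty-string elements are still separated correctly.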
I'm using Stanford NLP to split text into sentences, but it splits contractions into separate tokens.
Here is an example of a resulting sentence:
List(I, 'd, like, to, fix, this, sentence, because, it, 's, broken)
My goal is to concatenate contracted words so that the result would look like this:
List(I'd, like, to, fix, this, sentence, because, it's, broken)
Is there an elegant way of doing this in Scala? Basically I'm looking for an expression that iterates through the list, comparing each element with the next one, concatenating them if the condition is met, and returning a result list as in my example.
scala> val l = List("I", "'d", "like", "to fix", "this", "sentence", "because", "it", "'s", "broken")
l: List[String] = List(I, 'd, like, to fix, this, sentence, because, it, 's, broken)
scala> l.reduceRight({(s1,s2) => if (s2.startsWith("'")) s1+s2 else s1+" "+s2}).split(" ").toList
res2: List[String] = List(I'd, like, to, fix, this, sentence, because, it's, broken)
Note that this will raise an exception if the list is empty (due to the use of reduceRight).
You may want to use foldRight or reduceRightOption if this can happen.
An approach that extends the accepted answer to handle cases such as "ca" followed by "n't":
implicit class StanfordNLPConcat(val words: List[String]) extends AnyVal {
def SNLPConcat() = {
val sep = "#"
words.reduce{ (a,v) => if (v.contains("'")) a+v else a+sep+v }.split(sep).toList
}
}
Let
val words = List("I", "'d", "like", "to", "fix", "this", "sentence", "because", "it", "'s", "broken")
and so
words.SNLPConcat()
res: List[String] = List(I'd, like, to, fix, this, sentence, because, it's, broken)
Further,
List("It", "ca", "n't", "be", "wrong").SNLPConcat()
res: List[String] = List(It, can't, be, wrong)
val broken = List("I", "'d", "like", "to", "fix", "this", "sentence", "because", "it", "'s", "broken")
broken.foldLeft(List.empty[String]) { (list, str) =>
if (str.startsWith("'")) {
list.init :+ (list.last + str)
} else {
list :+ str
}
}
(I assumed the "to fix" element in your question was intended to be two elements and that the comma was mistakenly omitted.)
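The foldLeft above appends with :+ and reads list.last, both linear on a List, and calling init and last on an empty accumulator throws if the very first token starts with an apostrophe. A sketch that avoids both by prepending to a reversed accumulator and reversing once at the end; treating tokens that start with an apostrophe or equal "n't" as contraction suffixes is an assumption, and MergeContractions is a hypothetical name:

```scala
object MergeContractions {
  // Assumption: "'d", "'s", ... and "n't" are suffixes to glue onto the previous token.
  private def isSuffix(token: String): Boolean =
    token.startsWith("'") || token == "n't"

  def merge(tokens: List[String]): List[String] =
    tokens.foldLeft(List.empty[String]) {
      // The accumulator is reversed, so its head is the previous token (O(1) access).
      case (prev :: rest, token) if isSuffix(token) => (prev + token) :: rest
      // Ordinary token, or a suffix with nothing before it: keep it as its own element.
      case (acc, token) => token :: acc
    }.reverse
}
```

Because nothing is appended to the end of a List, the whole merge is a single linear pass plus one reverse.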