I'm looking to do the simple task of counting words in a String. The easiest way I've found is to use a Map to keep track of word frequencies. Previously with Haskell, I used its Map's function insertWith, which takes a function that resolves key collisions, along with the key and value pair. I can't find anything similar in Scala's library though; only an add function (+), which presumably overwrites the previous value when re-inserting a key. For my purposes though, instead of overwriting the previous value, I want to add 1 to it to increase its count.
Obviously I could write a function to check if a key already exists, fetch its value, add 1 to it, and re-insert it, but it seems odd that a function like this isn't included. Am I missing something? What would be the Scala way of doing this?
Use a map with default value and then update with +=
import scala.collection.mutable
val count = mutable.Map[String, Int]().withDefaultValue(0)
count("abc") += 1
println(count("abc"))
If it's a string then why not use the split module
import Data.List.Split
let mywords = "he is a good good boy"
length $ nub $ splitOn " " mywords
5
If you want to stick with Scala's immutable style, you could create your own class with immutable semantics:
class CountMap protected(val counts: Map[String, Int]){
def +(str: String) = new CountMap(counts + (str -> (counts(str) + 1)))
def apply(str: String) = counts(str)
}
object CountMap {
def apply(counts: Map[String, Int] = Map[String, Int]()) = new CountMap(counts.withDefaultValue(0))
}
And then you can use it:
val added = CountMap() + "hello" + "hello" + "world" + "foo" + "bar"
added("hello")
>>2
added("qux")
>>0
You might also add apply overloads on the companion object so that you can directly input a sequence of words, or even a sentence:
object CountMap {
def apply(counts: Map[String, Int] = Map[String, Int]()): CountMap = new CountMap(counts.withDefaultValue(0))
def apply(words: Seq[String]): CountMap = CountMap(words.groupBy(w => w).map { case(word, group) => word -> group.length })
def apply(sentence: String): CountMap = CountMap(sentence.split(" "))
}
And then the you can even more easily:
CountMap(Seq("hello", "hello", "world", "world", "foo", "bar"))
Or:
CountMap("hello hello world world foo bar")
Related
I want to build a map however I want to discard all keys with empty values as shown below:
#tailrec
def safeFiltersMap(
map: Map[String, String],
accumulator: Map[String,String] = Map.empty): Map[String, String] = {
if(map.isEmpty) return accumulator
val curr = map.head
val (key, value) = curr
safeFiltersMap(
map.tail,
if(value.nonEmpty) accumulator + (key->value)
else accumulator
)
}
Now this is fine however I need to use it like this:
val safeMap = safeFiltersMap(Map("a"->"b","c"->"d"))
whereas I want to use it like the way we instantiate a map:
val safeMap = safeFiltersMap("a"->"b","c"->"d")
What syntax can I follow to achieve this?
The -> syntax isn't a special syntax in Scala. It's actually just a fancy way of constructing a 2-tuple. So you can write your own functions that take 2-tuples as well. You don't need to define a new Map type. You just need a function that filters the existing one.
def safeFiltersMap(args: (String, String)*): Map[String, String] =
Map(args: _*).filter {
result => {
val (_, value) = result
value.nonEmpty
}
}
Then call using
safeFiltersMap("a"->"b","c"->"d")
I want to evaluate a function passed as a variable string in scala (sorry but i'm new to scala )
def concate(a:String,b:String): String ={
a+" "+b
}
var func="concate" //i'll get this function name from config as string
I want to perform something like
eval(func("hello","world)) //Like in Python
so output will be like
hello world
Eventually I want to execute few in built functions on a string coming from my config and I don't want to hard code the function names in the code.
EDIT
To Be More clear with my exact usecase
I have a Config file which has multiple functions defined in it that are Spark inbuilt functions on Data frame
application.conf looks like
transformations = [
{
"table" : "users",
"function" : "from_unixtime",
"column" : "epoch"
},
{
"table" : "users",
"function" : "yearofweek",
"column" : "epoch"
}
]
Now functions yearofweek and from_unixtime are Spark inbuilt functions now I want to eval my Dataframe by the functions defined in config. #all the functions are applied to a column defined.
the Obvious way is to write an if else and do string comparison calling a particular inbuilt function but that is way to much..
i am looking for a better solution.
This is indeed possible in scala, as scala is JSR 223 compliant scripting language. Here is an example (running with scala 2.11.8). Note that you need to import your method because otherwise the interpreter will not find it:
package my.example
object EvalDemo {
// evalutates scala code and returns the result as T
def evalAs[T](code: String) = {
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox
val toolbox = currentMirror.mkToolBox()
import toolbox.{eval, parse}
eval(parse(code)).asInstanceOf[T]
}
def concate(a: String, b: String): String = a + " " + b
def main(args: Array[String]): Unit = {
var func = "concate" //i'll get this function name from config as string
val code =
s"""
|import my.example.EvalDemo._
|${func}("hello","world")
|""".stripMargin
val result: String = evalAs[String](code)
println(result) // "hello world"
}
}
Have Function to name mapping in the code
def foo(str: String) = str + ", foo"
def bar(str: String) = str + ", bar"
val fmap = Map("foo" -> foo _, "bar" -> bar _)
fmap("foo")("hello")
now based on the function name we get from the config, pass the name to the map and lookup the corresponding function and invoke the arguments on it.
Scala repl
scala> :paste
// Entering paste mode (ctrl-D to finish)
def foo(str: String) = str + ", foo"
def bar(str: String) = str + ", bar"
val fmap = Map("foo" -> foo _, "bar" -> bar _)
fmap("foo")("hello")
// Exiting paste mode, now interpreting.
foo: (str: String)String
bar: (str: String)String
fmap: scala.collection.immutable.Map[String,String => String] = Map(foo -> $$Lambda$1104/1335082762#778a1250, bar -> $$Lambda$1105/841090268#55acec99)
res0: String = hello, foo
Spark offers you a way to write your transformations or queries using SQL. So, you really don't have to worry about Scala functions, casting and evaluation in this case. You just have to parse your config to generate the SQL query.
Let's say you have registered a table users with Spark and want to do a select and transform based on provided config,
// your generated query will look like this,
val query = "SELECT from_unixtime(epoch) as time, weekofyear(epoch) FROM users"
val result = spark.sql(query)
So, all you need to do is - build that query from your config.
I have tuple separated by a coma that looks like this:
("TRN_KEY", "88.330000;1;2")
I would like to add some more info to the second position.
For example:
I would like to add ;99;99 to the 88.330000;1;2 so that at the end it would look like:
(TRN_KEY, 88.330000;1;2;99;99)
One way is to de-compose your tuple and concat the additional string to the second element:
object MyObject {
val (first, second) = ("TRN_KEY","88.330000;1;2")
(first, second + ";3;4"))
}
Which yields:
res0: (String, String) = (TRN_KEY,88.330000;1;2;3;4)
Another way to go is copy to tuple with the new value using Tuple2.copy, as tuples are immutable by design.
You can not modify the data in place as Tuple2 is immutable.
An option would be to have a var and then use the copy method.
In Scala due to structural sharing this is a rather cheap and fast operation.
scala> var tup = ("TRN_KEY","88.330000;1;2")
tup: (String, String) = (TRN_KEY,88.330000;1;2)
scala> tup = tup.copy(_2 = tup._2 + "data")
tup: (String, String) = (TRN_KEY,88.330000;1;2data)
Here is a simple function that gets the job done. It takes a tuple and appends a string to the second element of the tuple.
def appendTup(tup:(String, String))(append:String):(String,String) = {
(tup._1, tup._2 + append)
}
Here is some code using it
val tup = ("TRN_KEY", "88.330000;1;2")
val tup2 = appendTup(tup)(";99;99")
println(tup2)
Here is my output
(TRN_KEY,88.330000;1;2;99;99)
If you really want to make it mutable you could use a case class such as:
case class newTup(col1: String, var col2: String)
val rec1 = newTup("TRN_KEY", "88.330000;1;2")
rec1.col2 = rec1.col2 + ";99;99"
rec1
res3: newTup = newTup(TRN_KEY,88.330000;1;2;99;99)
But, as mentioned above, it would be better to use .copy
I want to search the hashmap after a key and if the key is found give me the last value of the found value of the key. Here is my solution so far:
import scala.collection.mutable.HashMap
object Tmp extends Application {
val hashmap = new HashMap[String, String]
hashmap += "a" -> "288 | object | L"
def findNameInSymboltable(name: String) = {
if (hashmap.get(name) == None)
"N"
else
hashmap.get(name).flatten.last.toString
}
val solution: String = findNameInSymboltable("a")
println(solution) // L
}
Is there maybe a functional style of it which save me the overhead of locs?
Couldn't quite get your example to work. But maybe something like this would do the job?
hashmap.getOrElse("a", "N").split(" | ").last
The "getOrElse" will at least save you the if/else check.
In case your "N" is intended for display and not for computation, you can drag ouround the fact that there is no such "a" in a None until display:
val solution = // this is an Option[String], inferred
hashmap.get("a"). // if None the map is not done
map(_.split(" | ").last) // returns an Option, perhaps None
Which can alse be written:
val solution = // this is an Option[String], inferred
for(x <- hashmap.get("a"))
yield {
x.split(" | ").last
}
And finally:
println(solution.getOrElse("N"))
Here is the sample program I am working on to read a file with list of values one per line. I have to add all these values converting to double and also need to sort the values. Here is what I came up so far and it is working fine.
import scala.io.Source
object Expense{
def main(args: Array[String]): Unit = {
val lines = Source.fromFile("c://exp.txt").getLines()
val sum: Double = lines.foldLeft(0.0)((i, s) => i + s.replaceAll(",","").toDouble)
println("Total => " + sum)
println((Source.fromFile("c://exp.txt").getLines() map (_.replaceAll(",", "").toDouble)).toList.sorted)
}
}
The question here is, as you can see I am reading the file twice and I want to avoid it. As the Source.fromFile("c://exp.txt").getLines() gives you an iterator, I can loop through it only once and next operation it will be null, so I can't reuse the lines again for sorting and I need to read from file again. Also I don't want to store them into a temporary list. Is there any elegant way of doing this in a functional way?
Convert it to a List, so you can reuse it:
val lines = Source.fromFile("c://exp.txt").getLines().toList
As per my comment, convert it to a list, and rewrite as follows
import scala.io.Source
object Expense{
def main(args: Array[String]): Unit = {
val lines = Source.fromFile("c://exp.txt").getLines() map (_.replaceAll(",","").toDouble)
val sum: Double = lines.foldLeft(0.0)((i, s) => i + s)
println("Total => " + sum)
println(lines.toList.sorted)
}
}