What is a more Scala way of writing this code? - scala

I want to strip off the word "America/" from the start of each item in the list, and the code below does just that, but I feel like it can be done in a significantly better way.
var tz = java.util.TimeZone.getAvailableIDs
for(i <- 0 until tz.length) {
if(tz(i).startsWith("America/")) {
tz(i) = tz(i).replaceFirst("America/", "")
}
}

Simple and straightforward:
val tz = java.util.TimeZone.getAvailableIDs.map(_.replaceFirst("^America/", ""))

Very similar to @Noah's answer, but using a for-yield comprehension, so that you can add filters later without extra parentheses.
import java.util.TimeZone
val tz = for(t <- TimeZone.getAvailableIDs) yield t.replaceFirst("^America/", "")

I would use a regex for it:
val pattern = "^America/".r
tz = tz.map(pattern.replaceFirstIn(_, ""))
I wonder whether it is an efficient way, though.

In functional programming, map is preferred over for loops: instead of changing the list in place with a for loop, you map over the data and get a new collection back, which is purer and (IMO) prettier.
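To illustrate the point about mapping versus mutating in place, here is a minimal sketch. The sample IDs are made up and stand in for TimeZone.getAvailableIDs, and the object name MapVsLoop is hypothetical.

```scala
object MapVsLoop {
  // Hypothetical sample data standing in for java.util.TimeZone.getAvailableIDs
  val ids: Array[String] = Array("America/Chicago", "Europe/Paris", "America/Lima")

  // map builds a new array; `ids` itself is never modified
  val stripped: Array[String] = ids.map(_.replaceFirst("^America/", ""))

  def main(args: Array[String]): Unit =
    println(stripped.mkString(", ")) // prints: Chicago, Europe/Paris, Lima
}
```

Because the input stays untouched, both the original and the stripped IDs remain available afterwards.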

This can work:
val tzs = java.util.TimeZone.getAvailableIDs map { tz =>
if(tz.startsWith("America/")) tz.replaceFirst("America/","")
else tz
}

If you only want the American time zones, you could do this:
val americanZones = {
val pattern = "^America/(.*)".r
( java.util.TimeZone.getAvailableIDs
flatMap pattern.findFirstMatchIn
map (_ group 1) )
}

Added an 'if' to the for/yield
val zones = java.util.TimeZone.getAvailableIDs
val formatted_zones = for(i <- 0 until zones.length if zones(i).startsWith("America/")) yield {
zones(i).replaceFirst("America/", "")
}

No regex:
val tz = java.util.TimeZone.getAvailableIDs.map(_ stripPrefix "America/")
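For the variants above that keep only the American zones, collect is one way to filter and strip in a single pass without indexing or a regex. This is only a sketch: the sample IDs and the names AmericanOnly and prefix are made up for illustration.

```scala
object AmericanOnly {
  val prefix = "America/"

  // Hypothetical sample data standing in for java.util.TimeZone.getAvailableIDs
  val ids: Array[String] = Array("America/Chicago", "Europe/Paris", "America/Lima")

  // collect = filter + map in one step: non-matching IDs are dropped,
  // matching ones have the prefix removed
  val american: Array[String] = ids.collect {
    case id if id.startsWith(prefix) => id.stripPrefix(prefix)
  }

  def main(args: Array[String]): Unit =
    println(american.mkString(", ")) // prints: Chicago, Lima
}
```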

Related

Cleanest way to check if a string value is a negative number?

We could make a method:
def isNegNumber(s : String) : Boolean =
(s.head == '-') && s.substring(1).forall(_.isDigit)
Is there a cleaner way to do this?
You can use Try and Option to do this in a safer way. This prevents an error if the string is not a number at all:
import scala.util.Try
val s = "-10"
val t = Try(s.toDouble).toOption
val result = t.fold(false)(_ < 0)
Or even better, based on Luis' comment, starting with Scala 2.13 (simpler and more efficient):
val t = s.toDoubleOption
val result = t.fold(false)(_ < 0)
Use a regex pattern.
"-10" matches "-\\d+" //true
It can easily be adjusted to account for whitespace.
"\t- 7\n" matches raw"\s*-\s*\d+\s*" //true
Another possible solution could be:
def isNegativeNumber(s: String) = s.toDoubleOption.exists(_ < 0)
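Note that these approaches do not accept the same inputs. The sketch below (Scala 2.13+, hypothetical names NegativeCheck, viaRegex and viaDouble) puts the regex and toDoubleOption versions side by side. The isNegNumber method from the question also differs: s.head throws on an empty string, and "-" alone passes because forall is true on an empty string.

```scala
object NegativeCheck {
  // Regex approach: accepts only a minus sign followed by digits, e.g. "-10"
  def viaRegex(s: String): Boolean = s.matches("-\\d+")

  // toDoubleOption approach (Scala 2.13+): accepts anything that parses as a Double
  def viaDouble(s: String): Boolean = s.toDoubleOption.exists(_ < 0)

  def main(args: Array[String]): Unit =
    for (s <- List("-10", "-1.5", "10", "abc", "-", ""))
      println(s"'$s' regex=${viaRegex(s)} double=${viaDouble(s)}")
}
```

Which one is "cleaner" depends on whether decimals such as "-1.5" should count as negative numbers.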

Can I have a condition inside of a where or a filter?

I have a dataframe with many columns, and to explain the situation, let's say, there is a column with letters in it from a-z. I also have a list, which includes some specific letters.
val testerList = List("a","k")
The dataframe has to be filtered, to only include entries with the specified letters in the list. This is very straightforward:
val resultDF = df.where($"column".isin(testerList:_*))
The problem is that the list is passed to this function as a parameter and can be empty. That case could be handled like this (resultDF is defined beforehand as an empty dataframe):
if (!(testerList.isEmpty)) {
resultDF = df.where(some other stuff has to be filtered away)
.where($"column".isin(testerList:_*))
} else {
resultDF = df.where(some other stuff has to be filtered away)
}
Is there a simpler way to do this, something like this:
val resultDF = df.where(some other stuff has to be filtered away)
.where((!(testerList.isEmpty)) && $"column".isin(testerList:_*))
This one throws an error though:
error: type mismatch;
found : org.apache.spark.sql.Column
required: Boolean
.where( (!(testerList.isEmpty)) && (($"agent_account_homepage").isin(testerList:_*)))
^
So, thanks a lot for any kind of ideas for a solution!! :)
What about this?
val filtered1 = df.where(some other stuff has to be filtered away)
val resultDF = if (testerList.isEmpty)
filtered1
else
filtered1.where($"column".isin(testerList:_*))
Or if you don't want filtered1 to be available below and perhaps unintentionally used, it can be declared inside a block initializing resultDF:
val resultDF = {
val filtered1 = df.where(some other stuff has to be filtered away)
if (testerList.isEmpty) filtered1 else filtered1.where($"column".isin(testerList:_*))
}
or if you change the order
val resultDF = (if (testerList.isEmpty)
df
else
df.where($"column".isin(testerList:_*))
).where(some other stuff has to be filtered away)
Essentially, what Spark expects to receive in where is a plain Column object. This means that you can extract all your complicated where logic into a separate function:
def testerFilter(testerList: List[String]): Column = testerList match {
//of course, you have to replace ??? with real conditions
//just append them by joining with "and"
case Nil => $"column".isNotNull and ???
case tl => $"column".isin(tl: _*) and ???
}
And then you just use it like:
df.where(testerFilter(testerList))
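The same idea, building the whole condition as one value and handing it to the filter, can be sketched with plain Scala collections so that it runs without Spark. This is an analogue only: the Entry type, its enabled flag (standing in for "some other stuff") and the sample data are hypothetical, and a function Entry => Boolean plays the role of Spark's Column.

```scala
object PredicateSketch {
  // Hypothetical row type; in Spark this role is played by the DataFrame's rows
  final case class Entry(column: String, enabled: Boolean)

  // Analogue of testerFilter: pick the condition based on the list, return it as a value
  def testerFilter(testerList: List[String]): Entry => Boolean = testerList match {
    case Nil => (e: Entry) => e.enabled                          // no letter restriction
    case tl  => (e: Entry) => e.enabled && tl.contains(e.column) // only the listed letters
  }

  // Hypothetical sample data
  val entries: List[Entry] = List(Entry("a", true), Entry("b", true), Entry("k", false))

  def main(args: Array[String]): Unit = {
    println(entries.filter(testerFilter(Nil)).map(_.column))            // prints: List(a, b)
    println(entries.filter(testerFilter(List("a", "k"))).map(_.column)) // prints: List(a)
  }
}
```

The caller never needs an if/else around the filter call; the empty-list case is decided once, inside the function that builds the condition.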
The solution I use now puts SQL code inside the where clause:
var testerList = s""""""
var cond = testerList.isEmpty().toString
testerList = if (cond == "true") "''" else testerList
val resultDF= df.where(some other stuff has to be filtered away)
.where("('"+cond+"' = 'true') or (agent_account_homepage in ("+testerList+"))")
What do you think?

Can anyone 1-line this Scala code (preferably with FP?)

Sup y'all. The below feels like a tragic waste of Scala. Can anyone save this code?
val tokenSplit = token.split(":")(1)
val candidateEntityName = tokenSplit.substring(0,tokenSplit.length-1)
if(!candidateEntityName.equals(entityName)) removeEnd = true
Here it is (You don't need to use equals):
val removeEnd = token.split(":")(1).init != entityName
It should be something like this (untested):
val removeEnd = !(token.split(":")(1).dropRight(1).equals(entityName))
or:
val removeEnd = !(token.split(":").last.dropRight(1).equals(entityName))
A different solution using a regex to match on the input. It also handles the case where the data isn't as expected (you could of course extend your regex to suit your needs).
val removeEnd = """<(\w+):(\w+)>""".r.findFirstMatchIn(token).map(!_.group(2).equals(entityName)).getOrElse(throw new Exception(s"Can't parse input: $token"))
If you want to default to false:
val removeEnd = """<(\w+):(\w+)>""".r.findFirstMatchIn(token).exists(!_.group(2).equals(entityName))
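A regex can also be used as an extractor in a pattern match, which keeps the parsing and the comparison separate. The sketch below assumes tokens look like "<type:name>", which is only inferred from the regex above; the names TokenCheck and entityOf are hypothetical. Unlike findFirstMatchIn, a pattern match requires the whole token to match.

```scala
object TokenCheck {
  // Assumed token shape "<type:name>", inferred from the regex in the answer above
  private val Token = """<(\w+):(\w+)>""".r

  // None when the token does not have the assumed shape
  def entityOf(token: String): Option[String] = token match {
    case Token(_, name) => Some(name)
    case _              => None
  }

  // Defaults to false for tokens that cannot be parsed
  def removeEnd(token: String, entityName: String): Boolean =
    entityOf(token).exists(_ != entityName)

  def main(args: Array[String]): Unit =
    println(removeEnd("<person:Alice>", "Bob")) // prints: true
}
```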

Scala: how to use immutable values returned from function

I am new to Scala and am trying to use it in a functional way. Here are my questions:
Why can't I create a new binding for the 'cnt' variable with a function's return value using the '<-' operator?
How can I increment an immutable variable in a functional way (similar to Haskell <-)? For the sake of the experiment I don't want to use mutable vars.
import scala.io.Source
object MyProgram {
def main(args: Array[String]): Unit = {
if (args.length > 0) {
val lines = Source.fromFile(args(0)).getLines()
val cnt = 0
for (line <- lines) {
cnt <- readLines(line, cnt)
}
Console.err.println("cnt = "+cnt)
}
}
def readLines(line: String, cnt:Int):Int = {
println(line.length + " " + line)
val newCnt = cnt + 1
return (newCnt)
}
}
As for side effects, I never expected (line <- lines) to be so devastating! It completely exhausts the lines iterator, so running the following snippet gives size = 0:
val lines = Source.fromFile(args(0)).getLines()
var cnt = 0
for (line <- lines) {
cnt = readLines(line, cnt)
}
val size = lines.size
Is it a normal Scala practice to have well-hidden side-effects like this?
You could fold on lines like so:
val lines = Source.fromFile(args(0)).getLines()
val cnt = lines.foldLeft(0) { case (count, line) => readLines(line, count) }
Console.err.println("cnt = "+cnt)
Your readLines method has a side effect (the call to println), but foldLeft guarantees left-to-right processing of the lines, so the output should be the same.
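Here is a self-contained sketch of the fold, using an in-memory iterator instead of a file so it can be run as is. The sample lines and the object name FoldCount are made up, and readLines is reduced to the counting part. It also shows why size was 0 in the question: an Iterator is single-pass, so nothing is left once it has been traversed.

```scala
object FoldCount {
  // Same shape as readLines in the question, minus the println
  def readLines(line: String, cnt: Int): Int = cnt + 1

  // In-memory stand-in for Source.fromFile(...).getLines(); each call yields a fresh iterator
  def lines: Iterator[String] = Iterator("alpha", "beta", "gamma")

  // The count is threaded through the fold; no var is needed
  val cnt: Int = lines.foldLeft(0)((count, line) => readLines(line, count))

  // Traverse one iterator completely, then ask it for its size
  val it: Iterator[String] = lines
  it.foreach(_ => ())
  val leftover: Int = it.size

  def main(args: Array[String]): Unit =
    println(s"cnt = $cnt, leftover = $leftover") // prints: cnt = 3, leftover = 0
}
```

If the lines are needed more than once, converting them to a List (lines.toList) before the first traversal avoids the surprise.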
Why can't I reassign immutable 'cnt' variable with function return value using '<-' operator?
Why would you? If you have Java experience, <- has a similar meaning to : in for(Item x: someCollection). It is just syntactic sugar for taking the current item from a collection and naming it; it is not a bind operator in general.
Moreover, isn't "reassigning an immutable" an oxymoron?
How can I increment an immutable variable in a functional way (similar to Haskell <-)?
Scala people usually use .zipWithIndex but this will work only if you're going to use counter inside for comprehension:
for((x, i) <- lines.zipWithIndex) { println("the counter value is" + i) }
So I think you need to stick with lines.size, use fold/reduce, or use = to assign a new value to a var.
<- is not an operator, just a syntax used in for expressions. You have to use =. If you want to use <- it must be within the for-iteration-expression. And you cannot increment a val. If you want to modify that variable, make it a var.
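To make the "just syntax" point concrete: the compiler rewrites a for expression into ordinary method calls such as map and foreach, so <- only names the current element. A minimal sketch with made-up sample data (the object name ForDesugar is hypothetical):

```scala
object ForDesugar {
  val xs: List[String] = List("a", "bb", "ccc")

  // These two lines mean the same thing; the first is rewritten into the second
  val viaFor: List[Int] = for (x <- xs) yield x.length
  val viaMap: List[Int] = xs.map(x => x.length)

  // zipWithIndex supplies a counter without any mutable variable
  val indexed: List[String] = for ((x, i) <- xs.zipWithIndex) yield s"$i:$x"

  def main(args: Array[String]): Unit = {
    println(viaFor == viaMap) // prints: true
    println(indexed)          // prints: List(0:a, 1:bb, 2:ccc)
  }
}
```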

find by regular expression with Casbah

How can I use regular expressions in Collection#find(/* HERE */), like this:
val coll = MongoConnection()("foo")("bar")
for(x <- coll.find("name" -> ".*son$".r)) {
// some operations...
}
You are close, you just need to wrap your conditions in a MongoDBObject().
We had to pull out the implicit conversions of <key> -> <value> in a bunch of places because they were hard to catch properly and were breaking other code.
They'll probably be back in 2.1.
Do this instead:
val coll = MongoConnection()("foo")("bar")
for(x <- coll.find(MongoDBObject("name" -> ".*son$".r))) {
// some operations...
}
To make the match case-insensitive, appending "/i" to the end of the regex in the answer above will not work in Scala with Casbah.
Use a compiled pattern with the CASE_INSENSITIVE flag instead:
import java.util.regex.Pattern
val EmailPattern = Pattern.compile(companyName,Pattern.CASE_INSENSITIVE)
val q = MongoDBObject("companyName" -> EmailPattern)
val result = MongoFactory.COLLECTION_NAME.findOne(q)
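The flag itself can be tried out without a database. The sketch below uses only java.util.regex and the Scala standard library; the object name CaseInsensitive and the sample pattern are made up. It also shows the inline (?i) flag as an alternative way to build a case-insensitive pattern. Whether Casbah passes an inline flag through to MongoDB in the same way as the explicit flag is an assumption I have not verified.

```scala
import java.util.regex.Pattern

object CaseInsensitive {
  // Explicit flag, as in the answer above
  val viaFlag: Pattern = Pattern.compile(".*son$", Pattern.CASE_INSENSITIVE)

  // Inline (?i) flag on a Scala Regex; .pattern exposes the underlying java.util.regex.Pattern
  val viaInline: Pattern = "(?i).*son$".r.pattern

  // Without any flag the match is case-sensitive
  val plain: Pattern = Pattern.compile(".*son$")

  def main(args: Array[String]): Unit = {
    println(viaFlag.matcher("JOHNSON").matches())   // prints: true
    println(viaInline.matcher("JOHNSON").matches()) // prints: true
    println(plain.matcher("JOHNSON").matches())     // prints: false
  }
}
```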