case map will not work in spark shell - scala

I have code like this:
val pop = sc.textFile("population.csv")
.filter(line => !line.startsWith("Country"))
.map(line => line.split(","))
.map { case Array(CountryName, CountryCode, Year, Value) => (CountryName, CountryCode, Year, Value) }
The file looks like this.
Country Name,Country Code,Year,Value
Arab World,ARB,1960,93485943
Arab World,ARB,1961,96058179
Arab World,ARB,1962,98728995
Arab World,ARB,1963,101496308
Arab World,ARB,1964,104359772
Arab World,ARB,1965,107318159
Arab World,ARB,1966,110379639
Arab World,ARB,1967,113543760
Arab World,ARB,1968,116787194
Up to the .map { case ... } step everything works: I can print the rows with pop.take(10),
and I get Array[Array[String]].
But once the case is added, I get
error: not found: value (all columns)
That is, four separate errors, one each for CountryName, CountryCode, Year, and Value.
I'm not sure what I'm doing wrong.
The data is clean.

You need to use lowercase variable names in pattern matching, for example:
.map { case Array(countryName, countryCode, year, value) => (countryName, countryCode, year, value) }
In Scala pattern matching, identifiers that start with an uppercase letter, as well as identifiers enclosed in backticks (`), are not bound as new variables. They are looked up in the enclosing scope and used as constants to compare against. Here is an example to illustrate:
Array("a") match {
case Array(a) => a
}
will match an array containing any single string, while:
val A = "a"
Array("a") match {
case Array(A) =>
}
will match only the literal "a". Or, equivalently:
val a = "a"
Array("a") match {
case Array(`a`) =>
}
will also match only literal "a".
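A minimal sketch of the same rule applied to rows like the ones in the question, without Spark. The object name PatternNames, the helper names, and the TargetYear constant are made up for illustration; only the CSV rows come from the question.

```scala
object PatternNames {
  // Sample rows copied from the question's population.csv (header removed).
  val rows: List[String] = List(
    "Arab World,ARB,1960,93485943",
    "Arab World,ARB,1961,96058179"
  )

  // Lowercase names bind new variables, so any 4-element row matches.
  // A row with a different number of fields would throw a MatchError.
  def parse(line: String): (String, String, Int, Long) =
    line.split(",") match {
      case Array(countryName, countryCode, year, value) =>
        (countryName, countryCode, year.toInt, value.toLong)
    }

  // A capitalized name is treated as an existing constant and compared
  // with ==, so only rows whose third field equals TargetYear match.
  val TargetYear = "1961" // hypothetical constant, not from the question
  def valueForTargetYear(line: String): Option[Long] =
    line.split(",") match {
      case Array(_, _, TargetYear, value) => Some(value.toLong)
      case _                              => None
    }
}
```

Here parse accepts every well-formed row, while valueForTargetYear returns a value only for the 1961 row, because TargetYear is read as a constant rather than bound as a fresh variable.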

Pattern matching a domain name

I don't use pattern matching as often as I should.
I am matching a domain name for the following:
1. If it starts with www., then remove that portion and return.
www.stackoverflow.com => "stackoverflow.com"
2. If it has either example.com or example.org, strip that out and return.
blog.example.com => "blog"
3. return request.domain
hello.world.com => "hello.world.com"
def filterDomain(request: RequestHeader): String = {
request.domain match {
case //?? case #1 => ?
case //?? case #2 => ?
case _ => request.domain
}
}
How do I reference the value (request.domain) inside the expression and see if it starts with "www." like:
if request.domain.startsWith("www.") request.domain.substring(4)
You can give the value you are pattern matching on a name, and Scala will infer its type. You can also put an if guard in your case expression, as follows:
def filterDomain(request: RequestHeader): String = {
request.domain match {
case domain if domain.startsWith("www.") => domain.drop(4)
case domain if domain.contains("example.org") || domain.contains("example.com") => domain.takeWhile(_ != '.')
case _ => request.domain
}
}
Note that the order of the case expressions matters.
When writing case clauses you can do something like:
case someVar if someVar.length < 2 => someVar.toLowerCase
This should make pretty clear how grabbing matched values works.
So in this case, you would need to write something like:
case d if d.startsWith("www.") => d.substring(4)
If you're dead set on using a regex rather than String methods such as startsWith and contains, you can do the following:
val wwwMatch = "(?:www\\.)(.*)".r
val exampleMatch = "(.*)(?:\\.example\\.(?:(?:com)|(?:org)))(.*)".r
def filterDomain(request: RequestHeader): String = {
request.domain match {
case wwwMatch(d) => d
case exampleMatch(d1, d2) => d1 + d2
case _ => request.domain
}
}
Now, for maintainability's sake, I wouldn't go this way, because a month later, I will look at this and not remember what it's doing, but that's your call.
You don't need pattern matching for that:
request.domain
.stripPrefix("www.")
.stripSuffix(".example.org")
.stripSuffix(".example.com")
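As a rough sketch, the guard-based answer and the stripPrefix/stripSuffix suggestion can be combined. This version assumes a plain String instead of Play's RequestHeader so that it runs without any framework; the object name DomainFilter is hypothetical.

```scala
object DomainFilter {
  // Cases are tried top to bottom, so the "www." rule wins over the others.
  def filterDomain(domain: String): String = domain match {
    case d if d.startsWith("www.")       => d.stripPrefix("www.")
    case d if d.endsWith(".example.com") => d.stripSuffix(".example.com")
    case d if d.endsWith(".example.org") => d.stripSuffix(".example.org")
    case d                               => d // fall through: return the domain unchanged
  }
}
```

With the three inputs from the question this gives "stackoverflow.com", "blog", and "hello.world.com" respectively.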

Split string with default value in scala

I have a list of strings as shown below, which lists fruits and the cost associated with each. In case of no value, it is assumed to be 5:
val stringList: List[String] = List("apples 20", "oranges", "pears 10")
Now I want to split the string to get tuples of the fruit and the cost. What is the scala way of doing this?
stringList.map(query => query.split(" "))
is not what I want.
I found this which is similar. What is the correct Scala way of doing this?
You could use a regular expression and pattern matching:
val Pat = """(.+)\s(\d+)""".r // word followed by whitespace followed by number
def extract(in: String): (String, Int) = in match {
case Pat(name, price) => (name, price.toInt)
case _ => (in, 5)
}
val stringList: List[String] = List("apples 20", "oranges", "pears 10")
stringList.map(extract) // List((apples,20), (oranges,5), (pears,10))
You have two capturing groups in the pattern. These will be extracted as strings, so you have to convert explicitly using .toInt.
You almost have it:
stringList.map(query => query.split(" "))
is what you want. Just add another map to it to turn the arrays into tuples:
.map { parts => parts.head -> parts.lift(1).getOrElse("5").toInt }
or this instead, if you prefer:
.collect {
case Array(a, b) => a -> b.toInt
case Array(a) => a -> 5
}
(.collect will silently skip entries that have no elements or more than two. You can replace it with .map if you would prefer it to throw an error in such cases.)
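Put together as a self-contained sketch, the split-based approach could look like the following. The names FruitPrices, DefaultPrice, and parse are illustrative only, and the choice to throw on malformed input is an assumption, not something the question asks for.

```scala
object FruitPrices {
  val DefaultPrice = 5 // the default cost stated in the question

  // split(" ") returns an Array, so the patterns use Array(...) extractors.
  def parse(entry: String): (String, Int) = entry.split(" ") match {
    case Array(name, price) => name -> price.toInt
    case Array(name)        => name -> DefaultPrice
    case _ =>
      throw new IllegalArgumentException(s"Unexpected entry: $entry")
  }
}
```

Mapping FruitPrices.parse over List("apples 20", "oranges", "pears 10") yields the three (fruit, cost) tuples, with "oranges" picking up the default of 5.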

Concise way to assert a value matches a given pattern in ScalaTest

Is there a nice way to check that a pattern match succeeds in ScalaTest? An option is given in scalatest-users mailing list:
<value> match {
case <pattern> =>
case obj => fail("Did not match: " + obj)
}
However, it doesn't compose (e.g. if I want to assert that exactly 2 elements of a list match the pattern using Inspectors API). I could write a matcher taking a partial function literal and succeeding if it's defined (it would have to be a macro if I wanted to get the pattern in the message as well). Is there a better alternative?
I am not 100% sure I understand the question you're asking, but one possible answer is to use inside from the Inside trait. Given:
case class Address(street: String, city: String, state: String, zip: String)
case class Name(first: String, middle: String, last: String)
case class Record(name: Name, address: Address, age: Int)
You can write:
inside (rec) { case Record(name, address, age) =>
inside (name) { case Name(first, middle, last) =>
first should be ("Sally")
middle should be ("Ann")
last should be ("Jones")
}
inside (address) { case Address(street, city, state, zip) =>
street should startWith ("25")
city should endWith ("Angeles")
state should equal ("CA")
zip should be ("12345")
}
age should be < 99
}
That works for both assertions and matchers. Details here:
http://www.scalatest.org/user_guide/other_goodies#inside
The other option, if you are using matchers and just want to assert that a value matches a particular pattern, is to use the matchPattern syntax:
val name = Name("Jane", "Q", "Programmer")
name should matchPattern { case Name("Jane", _, _) => }
http://www.scalatest.org/user_guide/using_matchers#matchingAPattern
The scalatest-users post you pointed to was from 2011. We have added the above syntax for this use case since then.
Bill
This might not be exactly what you want, but you could write your test assertion using an idiom like this.
import scala.util.{ Either, Left, Right }
// Test class should extend org.scalatest.AppendedClues
val result = value match {
case ExpectedPattern => Right("test passed")
case _ => Left("failure explained here")
}
// getOrElse avoids an exception when the match succeeded and there is no Left value
result shouldBe 'Right withClue(result.left.getOrElse(""))
This approach leverages the fact that a Scala match expression results in a value.
Here's a more concise version that does not require trait AppendedClues or assigning the result of the match expression to a val.
(value match {
case ExpectedPattern => Right("ok")
case _ => Left("failure reason")
}) shouldBe Right("ok")
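The question mentions a helper that takes a partial function literal and succeeds when it is defined. A library-free sketch of that idea is below; it is not part of ScalaTest, and the names PatternCheck, matchesPattern, and countMatching are made up for illustration. The Name case class mirrors the one used in the answer above.

```scala
object PatternCheck {
  case class Name(first: String, middle: String, last: String)

  // True when the partial function literal is defined for the value,
  // i.e. when one of its case patterns matches.
  def matchesPattern[A](value: A)(pattern: PartialFunction[A, Any]): Boolean =
    pattern.isDefinedAt(value)

  // Counts the matching elements, which allows "exactly n" style checks.
  def countMatching[A](values: Seq[A])(pattern: PartialFunction[A, Any]): Int =
    values.count(pattern.isDefinedAt)
}
```

A call such as countMatching(names) { case Name("Jane", _, _) => } returns the number of elements that match, and the result can then be compared with an ordinary assertion. Unlike a macro-based matcher, this sketch cannot include the pattern text in a failure message.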

Scala case match default value

How can I get the default value in match case?
//Just an example, this value is usually not known
val something: String = "value"
something match {
case "val" => "default"
case _ => smth(_) //need to reference the value here - doesn't work
}
UPDATE: I see that my issue was not really understood, which is why I'm showing an example which is closer to the real thing I'm working on:
val db = current.configuration.getList("instance").get.unwrapped()
.map(f => f.asInstanceOf[java.util.HashMap[String, String]].toMap)
.find(el => el("url").contains(referer))
.getOrElse(Map("config" -> ""))
.get("config").get match {
case "" => current.configuration.getString("database").getOrElse("defaultDatabase")
case _ => doSomethingWithDefault(_)
}
something match {
case "val" => "default"
case default => smth(default)
}
Here default is not a keyword, just a variable name, so this will work as well:
something match {
case "val" => "default"
case everythingElse => smth(everythingElse)
}
The "_" in Scala is a love-it-or-hate-it piece of syntax: it can be really useful and yet confusing.
In your example:
something match {
case "val" => "default"
case _ => smth(_) //need to reference the value here - doesn't work
}
the _ means "I don't care about the value or its type", so nothing is bound and there is no identifier you can refer to afterwards.
Therefore, smth(_) does not refer to the matched value (it is read as placeholder syntax for the anonymous function x => smth(x)).
The solution is to give the matched value a name, like:
something match {
case "val" => "default"
case x => smth(x)
}
I believe this is working syntax, and x will match any value other than "val".
One more point: I think you are confusing this with the usage of the underscore in map or flatMap, for example:
val mylist = List(1, 2, 3)
mylist map { println(_) }
Here the underscore refers to each element of the collection.
Of course, the underscore can even be left out:
mylist map { println }
Here's another option:
something match {
case "val" => "default"
case default @ _ => smth(default)
}
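The variants above can be compared side by side in a small sketch. Here smth is a stand-in (it just upper-cases its argument) because the question does not show its definition, and the object and method names are hypothetical.

```scala
object DefaultCase {
  // Stand-in for the question's smth; its real behaviour is not shown there.
  def smth(s: String): String = s.toUpperCase

  // A lowercase name in the last case binds whatever value reached it.
  def withName(something: String): String = something match {
    case "val" => "default"
    case other => smth(other)
  }

  // Binding with @ gives the same result; _ matches anything and
  // the name before @ refers to the matched value.
  def withAt(something: String): String = something match {
    case "val"     => "default"
    case other @ _ => smth(other)
  }
}
```

Both methods return "default" for "val" and pass every other input through smth.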

Scala regex pattern match is working properly or not?

I'm new to Scala. For the following code snippet (from https://issues.scala-lang.org/browse/SI-7210) I was told that "You are checking if "Toronto Raptor" == matchNY." I really don't understand why "Toronto Raptor" is the only string in the for loop that gets matched against the regular expression. Can someone explain this to me, please?
Thanks.
David
val matchNY = "^.*New.*$".r
val teams = List(
"Toronto Raptor",
"New York Nets",
"San Francisco 49ers",
"Dallas Mavericks"
)
for (team <- teams) {
team match {
case `matchNY` => println("Go New York.")
case _ => println("Boo")
}
}
Note-1: The usage of backticks is explained at http://alvinalexander.com/scala/scala-unreachable-code-due-to-variable-pattern-message
I am assuming you meant
val matchNY = "^.*New.*$".r
and not ^.New.$, if you were expecting to match strings containing New.
In Scala, a block of case statements can be thought of as a sequence
of partial functions.
In your example,
case `matchNY` => // ...
Translates to something like:
case x if x == matchNY => // ..
So that will try to match the String "Toronto Raptor" with the Regexp object ^.*New.*$
using equality:
"Toronto Raptor" == ("^.*New.*$".r)
Which doesn't match because a String and a Regexp object are 2 different things.
The same goes for any of the other Strings in the list:
"New York Nets" != ("^.*New.*$".r)
Which doesn't match either. The way to use a regexp as a match in a case statement is:
case matchNY() => // .... Note the ()
Which, under the hood, is (roughly) equivalent to something like
case x if matchNY.unapplySeq(x).isDefined => // ...
Regexps in case statements are implemented as Extractor Objects with
an unapplySeq method. The last expression shows what the previous
translates into.
If matchNY had a capture such as:
val matchNY = "^.*New York\\s(.*)$".r
Then you could use it to extract the captured match:
case matchNY(something) => // 'something' is a String variable
// with value "Nets"
Side Note
Your example could be condensed to
teams foreach {
case matchNY() => println("Go New York.")
case _ => println("Boo")
}
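A small sketch of both extractor forms described above, one without and one with a capture group. The object name RegexMatch, the value name nyTeam, and the returned strings are illustrative; the two regular expressions follow the ones in this answer.

```scala
object RegexMatch {
  val matchNY = "^.*New.*$".r              // no capture groups
  val nyTeam  = """^.*New York\s(.*)$""".r // one capture group

  // A regex in a case pattern must match the whole string. The number of
  // variables in the parentheses must equal the number of capture groups.
  def describe(team: String): String = team match {
    case nyTeam(name) => s"New York team: $name"
    case matchNY()    => "contains New"
    case _            => "Boo"
  }
}
```

The more specific nyTeam case is listed first, because cases are tried in order and matchNY() would also accept every New York team.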
Yes, it is working properly. The magic behind pattern matching is something called extractors.
If you dig through the ScalaDoc of Regex, you will see that it only defines unapplySeq, not unapply.
That means that if you want to use a Regex in pattern matching, you should do the following (note the parentheses after matchNY):
val matchNY = "^.*New.*$".r
val teams = List(
"Toronto Raptor",
"New York Nets",
"San Francisco 49ers",
"Dallas Mavericks"
)
for (team <- teams) {
team match {
case matchNY() => println("Go New York.")
case _ => println("Boo")
}
}
Otherwise, you are simply checking whether each element in the list is == matchNY, which is not what you want anyway.
In your for loop you're literally checking whether each item in the list of teams is equal to the regex matchNY. Every item in the list is checked, not just "Toronto Raptor". This is equivalent to your for loop:
for (team <- teams) {
if (team == matchNY) println("Go New York.")
else println("Boo")
}
Which breaks down to this:
if ("Toronto Raptor" == matchNY) println("Go New York.") else println("Boo")
if ("New York Nets" == matchNY) println("Go New York.") else println("Boo")
if ("San Francisco 49ers" == matchNY) println("Go New York.") else println("Boo")
if ("Dallas Mavericks" == matchNY) println("Go New York.") else println("Boo")
What I think you're looking for is a check of whether something matches your regex. You can do something like this:
for (team <- teams) {
matchNY.findFirstMatchIn(team) match {
case Some(teamMatch) => {
println("Go New York.")
println(teamMatch)
}
case _ => {
println("Boo")
println(team)
}
}
}
Which prints out:
Boo
Toronto Raptor
Go New York.
New York Nets
Boo
San Francisco 49ers
Boo
Dallas Mavericks