Explanation of what is happening with REPL - scala

I've been learning and experimenting with Scala in the REPL.
As a side note, I'm quite impressed so far. I'm finding the language beautiful.
The following happened, and I need an explanation of what is going on.
Thanks in advance for any help.
At the REPL I entered:
def withMarks(mark: String)(body: => Unit): Unit = {
  println(mark + " Init")
  body
  println(mark + " End")
}
val a = "Testing closure with parameter by name as control structure"
withMarks("***") {
  println(a)
  println("more expressions")
}
Everything worked as expected.
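As an aside on why this works: body is a by-name parameter (body: => Unit), so the block is not evaluated at the call site but at the point where body is referenced inside the method. The sketch below is illustrative only; it records events in a ListBuffer instead of printing (an assumption made so the evaluation order can be checked), and ByNameDemo and run are made-up names.

```scala
import scala.collection.mutable.ListBuffer

object ByNameDemo {
  // Collects events instead of printing them, so the evaluation order is observable
  val log: ListBuffer[String] = ListBuffer.empty[String]

  def withMarks(mark: String)(body: => Unit): Unit = {
    log += s"$mark Init"
    body // the block passed by name runs here, between the two marks
    log += s"$mark End"
  }

  def run(): List[String] = {
    log.clear()
    withMarks("***") {
      log += "inside body"
    }
    log.toList
  }
}
```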
Then something happened that I consider weird, probably out of my own ignorance. I entered some more code:
class FileAsIterable{
def iterator = scala.io.Source.fromFile("/Users/MacBookProRetina/Google Drive/NewControl.scala").getLines()
}
val newIterator = new FileAsIterable with Iterable[String]
When evaluating the last line the REPL prints:
newIterator: FileAsIterable with Iterable[String] = (def withMarks(mark: String)(body: => Unit){, println(mark + " Init"), body, println(mark + " End"), }, val a = "Hola Mundo", withMarks("***"){, println(a), })
I keep getting the same result even after restarting the terminal on the Mac and running the Scala REPL from different directories.
I don't know how the newIterator val got connected to the withMarks def.

Never mind. I was just confused by the contents of the file: NewControl.scala is the file that contains the withMarks code.
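What happened, in short: mixing Iterable[String] into FileAsIterable makes the class's iterator method supply the collection's elements, and the REPL displays an Iterable by listing those elements, which here are the lines of the file. A sketch that reproduces this without touching the disk, assuming a hypothetical LinesAsIterable class that reads from a String instead of a file:

```scala
// Hypothetical stand-in for FileAsIterable: reads lines from a String, not a file
class LinesAsIterable(text: String) {
  // A def, so every traversal opens a fresh Source and can be repeated
  def iterator: Iterator[String] = scala.io.Source.fromString(text).getLines()
}

object MixinDemo {
  val contents = "println(\"hello\")\nval a = \"Hola Mundo\""
  // The concrete iterator above implements Iterable's abstract iterator,
  // so the lines of `contents` become the elements of `lines`
  val lines = new LinesAsIterable(contents) with Iterable[String]
}
```

Printing MixinDemo.lines in the REPL therefore shows the two lines of text, just as newIterator showed the lines of NewControl.scala. The exact prefix in front of the parentheses depends on the Scala version.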


Scala: perform multiple maps and flatmaps on a set of strings

I'm pretty new to Scala (and programming in general), but have come up with what feels like a less-than-perfect solution to an issue I have. I was wondering if anyone has a more elegant or efficient one?
I have a (very large) set of strings, a small example of which is below for replication purposes:
val brands = Set("Brand one", "BRAND-two!!!", "brand_Three1, brandThree2", "brand04")
Now what I want to do is clean up this set so that I have a new clean set where:
- any strings separated by commas are split into separate strings
- leading spaces and non-alphanumeric characters (other than _ and -) are removed
- any string with a space is replaced by three versions of that string (one with no space, one with "-" instead of the space, and one with "_")
The code I have so far does this, but it does it in two steps, thus iterating over the set twice (which is inefficient):
val brands_clean = brands.flatMap(
_.toLowerCase.split(",").map(
_.trim.replaceAll("[^A-Za-z0-9\\-\\_\\s]+", "")
)
)
def spaceVariations(v: String) = if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v)
val brands_final = brands_clean.flatMap(spaceVariations(_))
I have tried incorporating the spaceVariations function directly into the main code by appending to the replaceAll a map or flatMap:
// using the function call
.flatMap(spaceVariations(_))
// or using a function directly within the code
.flatMap {v => if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v) }
but I get the following error:
error: type mismatch;
found : Array[Nothing]
required: scala.collection.GenTraversableOnce[?]
I'm not sure I understand why this doesn't work here, or if there is a better way to achieve what I am trying to achieve?
brands.flatMap(
  _.toLowerCase.split(",").map(
    _.trim.replaceAll("""[^\w\s-]""", "")
  )
).flatMap(spaceVariations)
This works for me; I'm not sure where (or why) you are getting the error. I cleaned up your regex a little to make it more concise. Note that \s has to stay in the character class: without it, spaces are stripped before spaceVariations ever sees them.
This still traverses the set twice, though. Sets are not lazy in Scala, so the first flatMap builds a complete intermediate set before the second one starts.
If you want to save a scan, start with an Iterator rather than a set: iterators are lazy and send each element through the whole chain before moving on to the next one:
brands
  .iterator
  .flatMap { _.toLowerCase.split(",") }
  .map(_.trim)
  .map { _.replaceAll("""[^\w\s-]""", "") }
  .flatMap(spaceVariations)
  .toSet
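To see the difference in evaluation order, here is a small sketch that logs each stage as it fires. It uses a List rather than a Set only to keep the trace order deterministic; like Set, List is strict. LazinessDemo, upper and bang are illustrative names.

```scala
import scala.collection.mutable.ListBuffer

object LazinessDemo {
  // Runs two map stages over the same input and records the order in which they fire.
  // Returns (final result, trace of stage calls).
  def trace(lazily: Boolean): (List[String], List[String]) = {
    val log = ListBuffer.empty[String]
    val upper = (s: String) => { log += s"upper($s)"; s.toUpperCase }
    val bang = (s: String) => { log += s"bang($s)"; s + "!" }
    val input = List("a", "b")
    val result =
      if (lazily) input.iterator.map(upper).map(bang).toList // element by element
      else input.map(upper).map(bang)                        // stage by stage
    (result, log.toList)
  }
}
```

Both calls produce the same result; only the interleaving of the stages differs, which is what saves the intermediate collection.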
Based on the assumption that your Set will always look like this:
def spaceVariations(v: String) = if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", "")) else Set(v)
val brands = Set("Brand one", "BRAND-two!!!", "brand_Three1, brandThree2", "brand04")
brands.map( x => if (x.contains(",") ) x.split(",") else x ).flatMap {
case str: String => Array(str)
case a : Array[String] => a
}.map(_.trim.toLowerCase.replaceAll("[^A-Za-z0-9\\-\\_\\s]+", "")).map(spaceVariations(_))
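Gives the output below. Note that the final .map produces a nested Set[Set[String]]; use .flatMap(spaceVariations) instead to get a flat Set[String]: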
Set(Set(brand-one, brand_one, brandone), Set(brandthree2), Set(brand04), Set(brand-two), Set(brand_three1))
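Putting the pieces together, here is a self-contained sketch of the two-step version with a flat result. The object name BrandCleaner is only for illustration. The regex keeps \s in the character class; with """[^\w-]""" alone, spaces would be stripped first and "Brand one" would come out only as "brandone".

```scala
object BrandCleaner {
  def spaceVariations(v: String): Set[String] =
    if (v.contains(" ")) Set(v.replaceAll(" ", "-"), v.replaceAll(" ", "_"), v.replaceAll(" ", ""))
    else Set(v)

  // Split on commas, trim, drop everything except word chars, whitespace and '-',
  // then expand strings containing spaces into their three variants
  def clean(brands: Set[String]): Set[String] =
    brands
      .flatMap(_.toLowerCase.split(",").map(_.trim.replaceAll("""[^\w\s-]""", "")))
      .flatMap(spaceVariations)
}
```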

What is the "Scala" way to write this code?

I have this Scala code that involves Java:
val message = "Our servers encountered an error" +
  (if (ex.getMessage == null) "" else (": " + ex.getMessage))
What is the best way to write it in Scala?
In Scala you generally don't want to deal with null; it is more convenient to wrap the possibly-null value in an Option:
"Our servers encountered an error" +
  Option(ex.getMessage).map(": " + _).getOrElse("")
Probably something along the lines of:
val exMessage = Option(ex.getMessage).fold("")(m => s": $m")
val message = s"Our servers encountered an error$exMessage"
or in one line:
s"Our servers encountered an error${ Option(ex.getMessage).fold("")(m => s": $m") }"
This is not the most efficient way, though, nor will it make your code any clearer.
EDIT:
Like the original, this will probably yield an undesired result if ex.getMessage returns an empty String: the message then ends with a dangling ": ".
Additionally, as has been mentioned, using Option in such a situation is a good way to obfuscate your code. So here is an example that should be more readable:
def errorMessage(ex: Throwable): String = {
val formattedMsg = ex.getMessage match {
case msg: String if msg.nonEmpty => s": $msg"
case _ => ""
}
s"Our servers encountered an error$formattedMsg"
}
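To make the differences concrete, here is a sketch that runs the Option-based version and the pattern-matching version against a null, an empty and a normal message. ErrorMessages, viaOption and viaMatch are illustrative names only.

```scala
object ErrorMessages {
  // Option-based variant: handles null, but not the empty string
  def viaOption(ex: Throwable): String =
    "Our servers encountered an error" + Option(ex.getMessage).map(": " + _).getOrElse("")

  // Pattern-matching variant: a null message fails the type pattern,
  // and the guard rejects empty messages
  def viaMatch(ex: Throwable): String = {
    val formattedMsg = ex.getMessage match {
      case msg: String if msg.nonEmpty => s": $msg"
      case _ => ""
    }
    s"Our servers encountered an error$formattedMsg"
  }
}
```

Only the empty-message case separates the two: viaOption appends a bare ": ", viaMatch appends nothing.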

Text manipulation in Spark and Scala

This is my data:
review/text: The product picture and part number match, but they together do not math the description.
review/text: A necessity for the Garmin. Used the adapter to power the unit on my motorcycle. Works like a charm.
review/text: This power supply did the job and got my computer back online in a hurry.
review/text: Not only did the supply work. it was easy to install, a lot quieter than the PowMax that fried.
review/text: This is an awesome power supply that was extremely easy to install.
review/text: I had my doubts since best buy would end up charging me $60. at the time I bought my camera for the card and the cable.
review/text: Amazing... Installed the board, and that's it, no driver needed. Work great, no error messages.
and I've tried:
import org.apache.spark.{SparkContext, SparkConf}
object test12 {
def filterfunc(s: String): Array[((String))] = {
s.split( """\.""")
.map(_.split(" ")
.filter(_.nonEmpty)
.map(_.replaceAll( """\W""", "")
.toLowerCase)
.filter(_.nonEmpty)
.flatMap(x=>x)
}
def main(args: Array[String]): Unit = {
val conf1 = new SparkConf().setAppName("pre2").setMaster("local")
val sc = new SparkContext(conf1)
val rdd = sc.textFile("data/2012/2012.txt")
val stopWords = sc.broadcast(List[String]("reviewtext", "a", "about", "above", "according", "accordingly", "across", "actually",...)
var grouped_doc_words = rdd.flatMap({ (line) =>
val words = line.map(filterfunc).filter(word_filter.value))
words.map(w => {
(line.hashCode(), w)
})
}).groupByKey()
}
}
and I want to generate this output:
doc1: product picture number match together not math description.
doc2: necessity garmin. adapter power unit my motorcycle. works like charm.
doc3: power supply job computer online hurry.
doc4: not supply work. easy install quieter powmax fried.
...
Some exceptions: 1) negations (not, n't, non, none) must not be removed; 2) all dot (.) symbols must be kept.
My code above doesn't work very well.
Why not just something like this:
This way you don't need any grouping or flatMapping.
EDIT:
I was writing this by hand, and there were indeed some small bugs, but I hoped the idea was clear. Here is tested code:
import org.apache.spark.{SparkConf, SparkContext}

object test12 {
  // Lowercases a line, strips everything except letters, dots and whitespace,
  // turns each dot into its own token, and drops stop words
  def processLine(s: String, stopWords: Set[String]): List[String] = {
    s.toLowerCase()
      .replaceAll("""[^a-zA-Z\.\s]""", "")
      .replaceAll("""\.""", " .")
      .split("\\s+")
      .filter(!stopWords.contains(_))
      .toList
  }

  def main(args: Array[String]): Unit = {
    val conf1 = new SparkConf().setAppName("pre2").setMaster("local")
    val sc = new SparkContext(conf1)
    val rdd = sc.parallelize(
      List(
        "The product picture and part number match, but they together do not math the description.",
        "A necessity for the Garmin. Used the adapter to power the unit on my motorcycle. Works like a charm.",
        "This power supply did the job and got my computer back online in a hurry."
      )
    )
    val stopWords = sc.broadcast(
      Set("reviewtext", "a", "about", "above",
        "according", "accordingly",
        "across", "actually", "..."))
    val grouped_doc_words = rdd.map(processLine(_, stopWords.value))
    grouped_doc_words.collect().foreach(p => println(p))
  }
}
This gives you the following result:
List(the, product, picture, and, part, number, match, but, they, together, do, not, math, the, description, .)
List(necessity, for, the, garmin, ., used, the, adapter, to, power, the, unit, on, my, motorcycle, ., works, like, charm, .)
List(this, power, supply, did, the, job, and, got, my, computer, back, online, in, hurry, .)
Now, if you want a string rather than a list, just do:
grouped_doc_words.map(_.mkString(" "))
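Because the per-line processing does not depend on Spark, it can be checked on its own. The sketch below assumes a small hypothetical stop-word set (the question's real list is elided) that deliberately omits negations such as "not". It also adds two steps that are not in the answer: dropping empty tokens, and re-attaching each dot to the preceding word to match the output format the question asks for.

```scala
object ReviewText {
  // Hypothetical stop words for illustration; negations are left out so they survive
  val stopWords: Set[String] =
    Set("reviewtext", "a", "the", "and", "part", "but", "they", "do")

  // Mirrors the answer's processLine: keep letters, dots and whitespace,
  // make every dot its own token, drop empty tokens and stop words
  def processLine(s: String, stopWords: Set[String]): List[String] =
    s.toLowerCase
      .replaceAll("""[^a-z.\s]""", "")
      .replaceAll("""\.""", " .")
      .split("\\s+")
      .filter(w => w.nonEmpty && !stopWords.contains(w))
      .toList

  // Joins the tokens and glues each dot back onto the word before it
  def toDocLine(s: String): String =
    processLine(s, stopWords).mkString(" ").replace(" .", ".")
}
```

In Spark the same function can be applied with rdd.map(ReviewText.toDocLine).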
I think there is a bug at the marked line:
var grouped_doc_words = rdd.flatMap({ (line) =>
val words = line.map(filterfunc).filter(word_filter.value)) // **
words.map(w => {
(line.hashCode(), w)
})
}).groupByKey()
Here:
line.map(filterfunc)
should be:
filterfunc(line)
Explanation:
line is a String, and map runs over a collection of items. When you call line.map(...), it runs the passed function on each Char of the string, which is not what you want.
scala> val line2 = "This is a long string"
line2: String = This is a long string
scala> line2.map(_.length)
<console>:13: error: value length is not a member of Char
line2.map(_.length)
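A quick way to see the difference outside Spark: map on a String hands the function one Char at a time, while splitting first gives the function whole words. A minimal sketch (StringMapDemo is an illustrative name):

```scala
object StringMapDemo {
  val line = "This is a long string"

  // map on a String visits each Char, so the function receives Chars, not words
  val perChar: String = line.map(_.toUpper)

  // To work on words, split the String first
  val wordLengths: List[Int] = line.split(" ").map(_.length).toList
}
```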
Additionally, I don't know what you are using this for in filterfunc:
.map(_.replaceAll( """\W""", "")
I am not able to run spark-shell properly on my end. Can you please let me know whether these changes fix your problem?

Why is there a syntax error in the following scala code?

def sortAndCountInv[T](vec: Vector[T]): (Int, Vector[T]) = {
val n = vec.length
if (n == 1) {
(0, vec)
}
else {
val (left, right) = vec.splitAt(n / 2)
val (leftInversions, sortedLeft) = sortAndCountInv(left)
val (rightInversions, sortedRight) = sortAndCountInv(right)
val (splitInversions, sortedArray) = countSplitInvAndMerge(left, right)
(leftInversions + rightInversions + splitInversions, sortedArray)
}
}
This code is for counting the number of inversions in a vector.
When I try to compile it, Scala IDE for Eclipse gives me the following error: "illegal start of simple expression", for the line val (left, right) ...
Why does this happen?
It's missing a final closing brace. In general, "stale" errors will show up in an IDE when code is incorrect; when in doubt, it's best to look at just the first error from a command-line compile (Maven or similar).
If it works in the REPL, this is an IDE bug. Try IntelliJ IDEA Community Edition with the Scala plugin. I found it quite nice, although it still has some problems understanding some complicated structures.
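For reference, here is a minimal, compiling sketch of the whole algorithm. countSplitInvAndMerge is not shown in the question, so the version below is an assumption (a standard merge step that counts split inversions). Two further changes are deliberate: the merge is fed sortedLeft and sortedRight (the question passes the unsorted left and right), and the base case is n <= 1 so an empty vector does not recurse forever. The count is a Long to avoid overflow on large inputs, and an implicit Ordering[T] is required to compare elements.

```scala
object Inversions {
  // Returns (number of inversions, sorted copy of vec)
  def sortAndCountInv[T](vec: Vector[T])(implicit ord: Ordering[T]): (Long, Vector[T]) = {
    val n = vec.length
    if (n <= 1) (0L, vec)
    else {
      val (left, right) = vec.splitAt(n / 2)
      val (leftInv, sortedLeft) = sortAndCountInv(left)
      val (rightInv, sortedRight) = sortAndCountInv(right)
      val (splitInv, merged) = countSplitInvAndMerge(sortedLeft, sortedRight)
      (leftInv + rightInv + splitInv, merged)
    }
  }

  // Merges two sorted vectors, counting pairs (x from left, y from right) with x > y
  def countSplitInvAndMerge[T](left: Vector[T], right: Vector[T])(implicit ord: Ordering[T]): (Long, Vector[T]) = {
    val out = Vector.newBuilder[T]
    var i = 0
    var j = 0
    var inv = 0L
    while (i < left.length && j < right.length) {
      if (ord.lteq(left(i), right(j))) {
        out += left(i)
        i += 1
      } else {
        // right(j) is smaller than every element still remaining in left
        inv += left.length - i
        out += right(j)
        j += 1
      }
    }
    out ++= left.drop(i)
    out ++= right.drop(j)
    (inv, out.result())
  }
}
```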

Simple control flow in scalaz effect

Take this simple bit of code:
var line = ""
do {
  println("Please enter a non-empty line: ")
  line = scala.io.StdIn.readLine()
} while (line.isEmpty())
println("You entered a non-empty line: " + line)
It's definitely not particularly elegant, especially with the unfortunate scoping of line; however, I think it's quite simple to read.
Now trying to translate this directly to scalaz effect, I have come up with:
def nonEmptyLine: IO[String] = for {
_ <- putStrLn("Please enter a non-empty line:")
line <- readLn
r <- if (line.isEmpty()) nonEmptyLine else IO(line)
} yield r
(for {
line <- nonEmptyLine
_ <- putStrLn("You entered a non-empty line: " + line)
} yield ()).unsafePerformIO
This makes me feel like I'm missing something, as it doesn't feel like an improvement at all. Is there some higher-order control flow construct I'm missing?
You can make this (at least arguably) a lot prettier by skipping the for notation and using the combinators *> and >>= to pipe everything together:
import scalaz._, Scalaz._, effect._, IO._
val prompt = putStrLn("Please enter a non-empty line:")
def report(line: String) = putStrLn("You entered a non-empty line: " + line)
def nonEmptyLine: IO[String] = prompt *> readLn >>= (
(line: String) => if (line.isEmpty) nonEmptyLine else line.point[IO]
)
And then:
scala> (nonEmptyLine >>= report).unsafePerformIO
Please enter a non-empty line:
You entered a non-empty line: This is a test.
In general, though, I'm not sure you should expect code written using scalaz.effect to be more concise or easier to read than a straightforward imperative solution.
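For comparison, here is a plain-Scala sketch of the same retry loop without scalaz: a tail-recursive function, which removes the var without any effect machinery. The prompt and read parameters are hypothetical, introduced only so the console can be swapped for test input; this is not scalaz API.

```scala
import scala.annotation.tailrec

object RetryDemo {
  // Prompts and reads until a non-empty line arrives; the recursion replaces the do/while
  @tailrec
  def nonEmptyLine(prompt: () => Unit, read: () => String): String = {
    prompt()
    val line = read()
    if (line.isEmpty) nonEmptyLine(prompt, read) else line
  }
}
```

At the console it would be called as RetryDemo.nonEmptyLine(() => println("Please enter a non-empty line:"), () => scala.io.StdIn.readLine()).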