Scala on Eclipse gives errors on Map operations

I am trying to write a word count program using Maps in Scala. From various sources on the internet, I found that calling 'contains', adding elements to the Map with '+', and updating existing values are all valid. But Eclipse gives me errors when I try to use those operations in my code:
object wc {
def main(args:Array[String])={
val story = """ Once upon a time there was a poor lady with a son who was lazy
she was worried how she will grow up and
survive after she goes """
count(story.split("\n ,.".toCharArray()))
}
def count(s:Array[String])={
var count = scala.collection.mutable.Map
for(i <- 0 until s.size){
if(count.contains(s(i))) {
count(s(i)) = count(s(i))+1
}
else count = count + (s(i),1)
}
println(count)
}
}
These are the error messages I get in Eclipse (three screenshots, not preserved).
I tried these operations on REPL and they were working fine without any errors. Any help would be appreciated. Thank you!

You need to instantiate a typed mutable Map. Otherwise count refers to the companion object (of type Map.type), which has no contains method:
def count(s: Array[String]) ={
var count = scala.collection.mutable.Map[String, Int]()
for(i <- 0 until s.size){
if (count.contains(s(i))) {
// count += s(i) -> (count(s(i)) + 1)
// can be rewritten as
count(s(i)) += 1
}
else count += s(i) -> 1
}
println(count)
}
Note: I also fixed up the lines updating count.
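To make the distinction concrete, here is a minimal sketch. The object and method names are made up for illustration. The bare name mutable.Map is the companion object, while mutable.Map[String, Int]() is an empty map instance that supports contains and in-place updates.

```scala
import scala.collection.mutable

object MapInstanceSketch {
  // The bare name refers to the companion object; its type is mutable.Map.type.
  val companion: mutable.Map.type = mutable.Map

  // Counts words with a properly instantiated, typed mutable map.
  def countWords(words: Seq[String]): Map[String, Int] = {
    val counts = mutable.Map[String, Int]()
    for (w <- words) {
      if (counts.contains(w)) {
        counts(w) += 1        // desugars to counts.update(w, counts(w) + 1)
      } else {
        counts += w -> 1      // adds a new key with an initial count of 1
      }
    }
    counts.toMap              // hand back an immutable snapshot
  }
}
```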
Perhaps this is better written as a groupBy:
val a = List("a", "a", "b", "c", "c", "c")
scala> a.groupBy({s: String => s}).mapValues(_.length)
Map("b" -> 1, "a" -> 2, "c" -> 3): Map[String, Int]
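Applied to a block of text like the story in the question, the groupBy approach might look like the sketch below. The object name, the splitting regex and the explicit map step (used here in place of mapValues) are choices made for this example, not part of the original answer.

```scala
object WordCountSketch {
  // Splits on runs of whitespace, commas and periods, then counts each word.
  def wordCount(text: String): Map[String, Int] =
    text
      .split("[\\s,.]+")
      .filter(_.nonEmpty)                 // leading separators yield an empty first token
      .groupBy(identity)                  // word -> all occurrences of that word
      .map { case (word, occurrences) => word -> occurrences.length }
}
```

Calling WordCountSketch.wordCount(story) would then return the counts as an immutable Map[String, Int].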

Related

Cleaner way to find all indices of same value in Scala

I have a textfile like so
NameOne,2,3,3
NameTwo,1,0,2
I want to find the indices of the max value in each line in Scala. So the output of this would be
NameOne,1,2
NameTwo,2
I'm currently using the function below to do this but I can't seem to find a simple way to do this without a for loop and I'm wondering if there is a better method out there.
def findIndices(movieRatings: String): (String) = {
val tokens = movieRatings.split(",", -1)
val movie = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val max = ratings.max
var indices = ArrayBuffer[Int]()
for (i<-0 until ratings.length) {
if (ratings(i) == max) {
indices += (i+1)
}
}
return movie + "," + indices.mkString(",")
}
This function is called as so:
val output = textFile.map(findIndices).saveAsTextFile(args(1))
Just starting to learn Scala so any advice would help!
You can zipWithIndex and use filter:
ratings.zipWithIndex
.filter { case (value, _) => value == max } // zipWithIndex yields (value, index) pairs
.map { case (_, index) => index }
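As a small self-contained sketch of the same idea, filter and map can be merged into a single collect. The helper name indicesOfMax and the use of Seq[Int] are assumptions for the example; the question's code works on strings.

```scala
object ZipWithIndexSketch {
  // Returns the zero-based positions of every occurrence of the maximum value.
  def indicesOfMax(ratings: Seq[Int]): Seq[Int] =
    if (ratings.isEmpty) Seq.empty        // max would throw on an empty sequence
    else {
      val max = ratings.max
      // zipWithIndex pairs each value with its position: (value, index)
      ratings.zipWithIndex.collect { case (value, index) if value == max => index }
    }
}
```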
I noticed that your code doesn't actually produce the expected result from your example input. I'm going to assume that the example given is the correct result.
def findIndices(movieRatings :String) :String = {
val Array(movie, ratings @_*) = movieRatings.split(",", -1)
val mx = ratings.maxOption //Scala 2.13.x
ratings.indices
.filter(x => mx.contains(ratings(x)))
.mkString(s"$movie,",",","")
}
Note that this doesn't address some of the shortcomings of your algorithm:
No comma allowed in movie name.
Only works for ratings from 0 to 9. No spaces allowed.
testing:
List("AA"
,"BB,"
,"CC,5"
,"DD,2,5"
,"EE,2,5, 9,11,5"
,"a,b,2,7").map(findIndices)
//res0: List[String] = List(AA, <-no ratings
// , BB,0 <-comma, no ratings
// , CC,0 <-one rating
// , DD,1 <-two ratings
// , EE,1,4 <-" 9" and "11" under valued
// , a,0 <-comma in name error
// )
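One way to lift the "0 to 9, no spaces" restriction is to compare the ratings as numbers rather than as strings. The sketch below is a variation, not the answer's code, and it assumes every field after the name is an integer that may be padded with spaces; anything else makes toInt throw.

```scala
object NumericMaxIndicesSketch {
  def findIndices(line: String): String = {
    val fields = line.split(",", -1)
    val movie = fields.head
    // Assumption: every field after the name parses as an Int once trimmed.
    val ratings = fields.tail.map(_.trim.toInt)
    if (ratings.isEmpty) movie            // a name with no ratings is returned unchanged
    else {
      val max = ratings.max               // numeric comparison, so 11 beats 9
      val hits = ratings.indices.filter(i => ratings(i) == max)
      hits.mkString(s"$movie,", ",", "")
    }
  }
}
```

The comma-in-name limitation still applies, since the line is split on every comma.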

Speeding up a method that iterates through text and creates a Map[Tuple2[String, String], Int] Scala

I have a method in a Scala program that creates a Map[Tuple2[String, String], Int], but it's running very slowly and can't process much text. I can't figure out how to speed it up and make it more efficient. Any suggestions would be greatly appreciated.
def createTuple(words: List[String]): Map[Tuple2[String, String], Int] = {
var pairCountsImmutable = Map[Tuple2[String, String], Int]()
val pairCounts = collection.mutable.Map(pairCountsImmutable.toSeq: _*)
var i = 0
for (i <- 0 to words.length - 2) {
val currentCount: Int = pairCounts.getOrElse((words(i), words(i + 1)), 0)
if (pairCounts.exists(_ == (words(i), words(i + 1)) -> currentCount)) {
var key = pairCounts(words(i), words(i + 1))
key = key + 1
pairCounts((words(i), words(i + 1))) = key
} else {
pairCounts += (words(i), words(i + 1)) -> 1
}
}
var pairCountsImmutable2 = collection.immutable.Map(pairCounts.toList: _*)
return pairCountsImmutable2
}
Update
I have shamelessly borrowed from the answer by TRuhland to give this improved version of my answer that does not fail with empty or single-element lists:
def createTuple(words: List[String]): Map[Tuple2[String, String], Int] =
words
.zip(words.drop(1))
.groupBy(identity)
.mapValues(_.length)
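A note on versions: in Scala 2.13, mapValues is deprecated and returns a lazy MapView rather than a Map, so the declared return type would need an extra .toMap there. The sketch below sidesteps that by using a plain map over the grouped entries; the object and method names are invented for the example.

```scala
object PairCountSketch {
  // Counts adjacent word pairs; compiles unchanged on Scala 2.12 and 2.13.
  def countPairs(words: List[String]): Map[(String, String), Int] =
    words
      .zip(words.drop(1))                 // drop(1) is safe on empty lists, unlike tail
      .groupBy(identity)                  // pair -> all occurrences of that pair
      .map { case (pair, occurrences) => pair -> occurrences.length }
}
```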
Original
You appear to be counting adjacent pairs of words in a list of words. If so, something like this should work:
def createTuple(words: List[String]): Map[Tuple2[String, String], Int] =
words
.sliding(2)
.map(l => (l(0), l(1)))
.toList
.groupBy(identity)
.mapValues(_.length)
This works as follows
sliding(2) creates a list of adjacent pairs of words
map turns each pair from a List into a tuple
groupBy groups the tuples with the same value
mapValues counts the number of pairs with the same value for each pair
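The steps above can be traced on a tiny input. This is only an illustrative sketch: the value names are made up, collect is used instead of map so that a short window is skipped rather than causing an index error, and the last step uses map in place of mapValues.

```scala
object SlidingStepsSketch {
  val words = List("a", "b", "a", "b")

  // Step 1: windows of two adjacent words.
  val windows: List[List[String]] = words.sliding(2).toList

  // Step 2: turn each two-element window into a tuple.
  val pairs: List[(String, String)] = windows.collect { case List(x, y) => (x, y) }

  // Step 3: group identical tuples together.
  val grouped: Map[(String, String), List[(String, String)]] = pairs.groupBy(identity)

  // Step 4: replace each group by its size.
  val counts: Map[(String, String), Int] = grouped.map { case (p, ps) => p -> ps.length }
}
```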
This may not be quite what you want, but hopefully it gives an idea of how it might be done.
As a general rule, don't iterate through a list using an index, but try to transform the list into something where you can iterate through the values.
Try to not create Maps element-by-element. Use groupBy or toMap.
Your big problem is that words is a List, and yet you are indexing into it with words(i). That's slow. Change it to be a Vector or rework your algorithm to not use indexing.
Also, pairCounts.exists is slow, you should use contains whenever possible, as it is constant time on a Map.
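For illustration, the two checks below give the same answer for a key lookup, but exists walks every entry while contains goes straight to the key. The sample map and value names are hypothetical.

```scala
object ContainsVsExistsSketch {
  val pairCounts: Map[(String, String), Int] =
    Map(("a", "b") -> 2, ("b", "c") -> 1)

  val key = ("a", "b")

  // Linear scan: the predicate is tested against entries one by one.
  val viaExists: Boolean = pairCounts.exists { case (k, _) => k == key }

  // Direct key lookup.
  val viaContains: Boolean = pairCounts.contains(key)
}
```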
If we first reduce your code to its essence:
def createTuple(words: List[String]): Map[(String, String), Int] = {
val pairCounts = collection.mutable.Map[(String, String), Int]()
for (i <- 0 until words.length - 1) {
val pair = (words(i), words(i + 1))
pairCounts += (pair -> (pairCounts.getOrElse(pair, 0) + 1))
}
pairCounts.toMap
}
To improve speed, don't use indexing on list (as mentioned elsewhere):
def createTuple(words: List[String]): Map[(String, String), Int] = {
val map = collection.mutable.Map[(String, String), Int]()
words
.zip(words.drop(1)) // drop(1) rather than tail, which throws on an empty list
.foreach{ pair =>
map += (pair -> (map.getOrElse(pair, 0) + 1)) }
map.toMap
}

What would be the idiomatic way to map this data?

I'm pretty new to Scala and while working I found the need to map some data found within a log file. The log file follows this format (values changed from original):
1343,37284.ab1-tbd,283
1344,37284.ab1-tbd,284
1345,37284.ab1-tbd,0
1346,28374.ab1-tbd,107
1347,28374.ab1-tbd,0
...
The first number is not important, but the number portion of the second field and the third field are what need to be mapped. I need the map to have keys that correspond to the number portion of the second field that map to a list of every 3rd field that follows it. That was a bad explanation, so as an example here is what I would need after parsing the above log:
{
37284 => { 283, 284, 0 }
28374 => { 107, 0 }
}
The solution I came up with is this:
val data = for (line <- Source fromFile "path/to/log" getLines) yield line.split(',')
val ls = data.toList
val keys = ls.map(_(1).split('.')(0).toInt)
val vals = ls.map(_(2).toInt)
val keys2vals = for {
(k, v) <- (keys zip vals).groupBy(_._1)
list = v.map(_._2)
} yield (k, list)
Is there a more idiomatic way to do this in Scala? This seems kind of awkward and convoluted to me. (When explaining, please assume little to no background knowledge of language features, etc.) Also, if later down the line I wanted to exclude the number zero from the mappings, how would I do so?
EDIT:
In addition, how would I similarly turn the data into the form:
{
{ 37284, { 283 ,284, 0 } }
{ 28374, { 107, 0 } }
}
i.e. a List[(Int, List[Int])]? (This form is for use with apache-spark's indexed rdds)
How about:
val assocList = for {
line <- Source.fromFile("path/to/log").getLines
Array(_, snd, thd) = line.split(',')
} yield (snd.split('.')(0).toInt, thd.toInt)
assocList.toList.groupBy(_._1).mapValues(_.map(_._2))
If you want a List[(Int, List[Int])], add .toList.
I might be tempted to write it in fewer lines (arguably clearer too) like this:
val l = List((1343,"37284.ab1-tbd",283),
(1344,"37284.ab1-tbd",284),
(1345,"37284.ab1-tbd",0),
(1346,"28374.ab1-tbd",107),
(1347,"28374.ab1-tbd",0))
// drop the unused data
val m = l.map(a => a._2.split('.')(0).toInt -> a._3)
// transform to Map of key -> matchedValues
m.groupBy(_._1) mapValues (_ map (_._2))
gives:
m: List[(Int, Int)] = List((37284,283), (37284,284), (37284,0), (28374,107), (28374,0))
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(37284 -> List(283, 284, 0), 28374 -> List(107, 0))
"Also, if later down the line I wanted to exclude the number zero from the mappings, how would I do so?" - You could filter the initial list:
val m = l.filter(_._3 != 0).map(a => a._2.split('.')(0).toInt -> a._3)
To convert to List[(Int, List[Int])] you just need to call .toList on the resulting Map.
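Putting the pieces together on raw log lines, a sketch might look like the following. The object name, the dropZeros flag and the choice to skip malformed lines silently are assumptions made for this example; the input is passed in as a sequence of strings rather than read from a file.

```scala
object LogGroupingSketch {
  // Groups the third field by the numeric prefix of the second field.
  def group(lines: Seq[String], dropZeros: Boolean = false): Map[Int, List[Int]] = {
    val pairs = lines.toList
      .map(_.split(','))
      .collect {
        // Lines that do not have exactly three fields are skipped.
        case Array(_, second, third) =>
          (second.takeWhile(_ != '.').toInt, third.trim.toInt)
      }
    val kept = if (dropZeros) pairs.filter(_._2 != 0) else pairs
    kept.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }
  }
}
```

Calling .toList on the result gives the List[(Int, List[Int])] form asked about in the edit to the question.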
val lines = io.Source.fromFile("path/to/log").getLines.toList
lines.map{x=>
val Array(_,second,_,fourth) = x.split("[,.]")
(second.toInt, fourth.toInt) // convert to Int, since the question asks for Int keys and values
}.groupBy(_._1)
.mapValues(_.map(_._2))

Assigning ordinal numeral to sequence without post-decrement in scala

As I'm new to scala I google many things and find good answers in most cases. But I couldn't find an answer to this specific question as googling "post-decrement in scala" only brings dontcha-use-post-decrement-in-scala-because-its-a-functional-language-answers to the top.
So, I really want to know what's the functional way of doing the following:
object A {
val list = List("a", "b", "c")
val map = {
var ord = list.size
Map(list map { x => (x, { val res = ord; ord -= 1; res } ) } : _* )
}
}
class Test extends org.scalatest.FunSuite {
test("") {
println(A.map) // Map(a -> 3, b -> 2, c -> 1)
}
}
It's basically creating a map from a given list and assigning decreasing ordinal numerals to each element of the list (real code is of course more complex than this minimal example).
I'm especially unhappy with var ord = ... (mutable) and { val res = ord; ord -= 1; res } (post-decrement) parts :/ Is there another (prettier) way of doing this?
A quick solution would require reversing the list and then you can use zipWithIndex:
scala> :pa
// Entering paste mode (ctrl-D to finish)
List("a", "b", "c")
.reverse
.zipWithIndex
.toMap
// Exiting paste mode, now interpreting.
res4: scala.collection.immutable.Map[String,Int] = Map(c -> 0, b -> 1, a -> 2)
The version shown allocates the intermediate collections. If that matters, you can add .view before .reverse so the intermediate steps are evaluated lazily.
zip is helpful for this,
list.reverse.zip(1 to list.size).toMap
And yet another variant
list.zip(list.size to 1 by -1).toMap
Similar in performance to the .reverse.zipWithIndex, because both reverse and size are O(N)
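Wrapped in a small generic helper (the name descendingOrdinals is made up here), the descending-range variant reproduces the output the question asks for without any var:

```scala
object OrdinalSketch {
  // Pairs the first element with items.size, the next with items.size - 1, and so on.
  def descendingOrdinals[A](items: List[A]): Map[A, Int] =
    items.zip(items.size to 1 by -1).toMap
}
```

As with any Map built from a list, duplicate elements keep only the ordinal of their last occurrence.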
You can simply write this:
val list = List("a","b","c")
list.map ( elem => (elem, list.size - list.indexOf(elem))).toMap
It will give you the result like this:
scala.collection.immutable.Map[String,Int] = Map(a -> 3, b -> 2, c -> 1)
Try this:
list.toIterator.zip(Iterator.iterate(list.size){_-1}).toMap

Creating a Map by reading elements of List in Scala

I have some records in a List.
Now I want to create a new Map (a mutable Map) from that List, with a unique key for each record. I want to achieve this by reading the List and calling the higher-order method map in Scala.
records.txt is my input file
100,Surender,2015-01-27
100,Surender,2015-01-30
101,Raja,2015-02-19
Expected Output :
Map(0-> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30,2 ->101,Raja,2015-02-19)
Scala Code :
object SampleObject{
def main(args:Array[String]) ={
val mutableMap = scala.collection.mutable.Map[Int,String]()
var i:Int =0
val myList=Source.fromFile("D:\\Scala_inputfiles\\records.txt").getLines().toList;
println(myList)
val resultList= myList.map { x =>
{
mutableMap(i) =x.toString()
i=i+1
}
}
println(mutableMap)
}
}
But I am getting output like below
Map(1 -> 101,Raja,2015-02-19)
I want to understand why it keeps only the last record.
Could someone help me?
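// Scala 2.12 and earlier only: collection.breakOut was removed in 2.13
val mm: Map[Int, String] = Source.fromFile(filename).getLines.toList
.zipWithIndex
.map({ case (line, i) => i -> line })(collection.breakOut)
Here (collection.breakOut) builds the Map directly, avoiding the extra pass that toMap would need. It requires a strict collection, hence the toList: Iterator.map takes no builder argument.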
Consider
(for {
(line, i) <- Source.fromFile(filename).getLines.zipWithIndex
} yield i -> line).toMap
where we read each line, associate an index value starting from zero and create a map out of each association.
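A sketch of the same idea that takes the lines as an argument instead of opening a file, and that also shows one way to end up with the mutable Map the question asks for. The object and method names are hypothetical.

```scala
import scala.collection.mutable

object IndexedLinesSketch {
  // Immutable result: index -> line, with indices starting at zero.
  def toIndexedMap(lines: Iterator[String]): Map[Int, String] =
    lines.zipWithIndex.map { case (line, i) => i -> line }.toMap

  // Mutable result: fill an empty mutable map from the same (index, line) pairs.
  def toMutableIndexedMap(lines: Iterator[String]): mutable.Map[Int, String] = {
    val result = mutable.Map.empty[Int, String]
    result ++= lines.zipWithIndex.map { case (line, i) => i -> line }
    result
  }
}
```

With a file, the argument would be Source.fromFile(filename).getLines, as in the answers above.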