I am trying to write a word count program in Scala. I'm using a string named "file":
file.map( _.split(" ")).flatMap(word => (word, 1)).reduceByKey( _ + _ )
It keeps reporting this error:
value split is not a member of Char
Can't figure out how to solve it!
When you call map on a String it is wrapped with WrappedString which extends AbstractSeq[Char]. Therefore, when you call map it is as if you are doing so on a Seq of Char not a Seq of String.
See the source code: https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/immutable/WrappedString.scala
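A minimal sketch of the difference, assuming a hypothetical sample string (the name `sample` is not from the question). It shows that `map` on a String hands each Char to the function, while splitting first hands it whole words:

```scala
// Hypothetical sample input, used only for illustration.
val sample = "Some test data"

// map on a String visits each Char, so the lambda receives a Char.
val upperChars: String = sample.map(c => c.toUpper)

// Splitting first yields an Array[String]; map then visits whole words.
val wordLengths: Array[Int] = sample.split(" ").map(word => word.length)
```

This is why `_.split(" ")` inside `file.map(...)` fails to compile: the placeholder is a Char, and Char has no split method.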
The code below splits by whitespace and returns the size, a word counter.
val file = "Some test data"
file.split("\\s+").size
To count the number of times each word appears in the string:
val file = "Some test data test"
println(file.split("\\s+").toList.groupBy(w => w).mapValues(_.length))
I found out that the code was fine. Because I was running it on Spark, the result was held in a lazy RDD that I still needed to materialize somehow. I saved it to a text file and the problem was solved. Here is the code:
file.flatMap(line => line.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).saveAsTextFile("OUT.txt")
Thanks.
I'm new to Scala and FP in general, and I'm trying to practice on a dummy example.
val counts = ransomNote.map(e=>(e,1)).reduceByKey{case (x,y) => x+y}
The following error is raised:
Line 5: error: value reduceByKey is not a member of IndexedSeq[(Char, Int)] (in solution.scala)
The example above looks similar to the introductory FP primer on word count. I'd appreciate it if you could point out my mistake.
It looks like you are trying to use a Spark method on a Scala collection. The two APIs have a few similarities, but reduceByKey is not part of the Scala collections API.
In pure Scala you can do it like this:
val counts =
ransomNote.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
(counts, c) => counts.updated(c, counts(c) + 1)
}
foldLeft iterates over the collection from the left, using the empty map of counts as the accumulated state (it returns 0 if no value is found). The function passed as the second argument returns a new map in which the count for the current character is incremented.
Note that accessing a map directly (counts(c)) is likely to be unsafe in most situations, since it throws an exception if no item is found. In this situation it's fine because in this scope I know I'm using a map with a default value. When accessing a map you will more often than not want to use get, which returns an Option. More on that in the official Scala documentation (here for version 2.13.2).
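As a rough illustration of the access styles mentioned above, here is a sketch using a small hypothetical map (the names `plain` and `withDefault` are made up for this example):

```scala
// A plain map with no default value (hypothetical sample data).
val plain: Map[Char, Int] = Map('a' -> 1)

// get returns an Option, so a missing key yields None instead of throwing.
val present: Option[Int] = plain.get('a')
val missing: Option[Int] = plain.get('z')

// getOrElse supplies a fallback at the call site.
val fallback: Int = plain.getOrElse('z', 0)

// With withDefaultValue, direct access no longer throws for missing keys.
val withDefault: Map[Char, Int] = plain.withDefaultValue(0)
val direct: Int = withDefault('z')
```

Calling `plain('z')` on the map without a default would throw a NoSuchElementException, which is the unsafe case described above.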
You can play around with this code here on Scastie.
On Scala 2.13 you can use the new groupMapReduce:
ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
val str = "hello"
val countsMap: Map[Char, Int] = str
.groupBy(identity)
.mapValues(_.length)
println(countsMap)
I am working with a map that has the following key-value pair:
Map(fields -> List(
pangaea_customer_id, email_hash, savings_catcher_balance,
is_savings_catcher_member, billing_zipcode
))
I am trying the code below to get the value of the fields key:
val fields = ValuesMap.get("fields")
But I am not able to convert fields to a comma-separated String.
Please help me with how to do this.
I tried:
val fields = ValuesMap.get("fields").mkString(",")
but it returns
List(pangaea_customer_id, email_hash, savings_catcher_balance,
is_savings_catcher_member, billing_zipcode)
get returns an Option[V] (because the key may be unmapped, and then it needs to return None).
Option can be iterated, just like a List, so you can call mkString on it, but it contains at most one element, so the separator is never used.
Try getOrElse("fields", Seq.empty).mkString(",")
What your version did is:
get("fields") returns Some(List(....))
you call mkString on the Option, which gives you either an empty String (if it was None) or, in your case, the result of toString for the element inside (which is the List as a whole).
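The two behaviours can be compared side by side. This is a sketch with a hypothetical, shortened map (`valuesMap` and its two field names are stand-ins for the data in the question):

```scala
// Hypothetical map mirroring the shape shown in the question.
val valuesMap: Map[String, List[String]] =
  Map("fields" -> List("email_hash", "billing_zipcode"))

// mkString on the Option sees a single element: the List as a whole.
val viaOption: String = valuesMap.get("fields").mkString(",")

// Unwrapping first means mkString joins the List's own elements.
val viaList: String = valuesMap.getOrElse("fields", Seq.empty).mkString(",")

// A missing key produces an empty String in both styles.
val absent: String = valuesMap.getOrElse("other", Seq.empty).mkString(",")
```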
You can try this:
val fields = res8.get("fields").getOrElse(List()).mkString(",")
// output: fields: String = pangaea_customer_id,email_hash,savings_catcher_balance,is_savings_catcher_member,billing_zipcode
I am learning Scala and I have the following issue:
Given a list in input
val listin = List("Apple,January,10",
"Banana,August,15",
"Strawberry,June,20")
and a String val inputstring="Banana,August"
I want to find the price in the row that matches the string.
I wrote the following code :
case class Fruit(name:String, month:String,price:Int)
val splitString=inputstring.split(",")
val listSplit=listin.map(_.split(","))
But I don't know how to test for equality between the string and a line in the list.
The expected result is
val output="Banana_August_15"
Not sure why you want to replace the commas with underscores, or what purpose the case class serves, but this produces the requested result.
listin.filter(_.startsWith(inputstring+","))
.map(_.replaceAllLiterally(",","_"))
//res0: List[String] = List(Banana_August_15)
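If the case class from the question is meant to be used, one possible approach is to parse each row into a Fruit and compare fields instead of raw strings. This is only a sketch under assumptions: rows are expected to look like "name,month,price", malformed rows are silently skipped, and the names `fruits` and `output` are introduced here for illustration.

```scala
import scala.util.Try

case class Fruit(name: String, month: String, price: Int)

val listin = List("Apple,January,10", "Banana,August,15", "Strawberry,June,20")
val inputstring = "Banana,August"

// Parse "name,month,price" rows; rows with a different shape or a
// non-numeric price are dropped (an assumption, not from the question).
val fruits: List[Fruit] = listin.flatMap { row =>
  row.split(",") match {
    case Array(name, month, price) =>
      Try(price.trim.toInt).toOption.map(p => Fruit(name, month, p))
    case _ => None
  }
}

// Match on name and month, then format the first hit with underscores.
val output: Option[String] = inputstring.split(",") match {
  case Array(name, month) =>
    fruits
      .find(f => f.name == name && f.month == month)
      .map(f => s"${f.name}_${f.month}_${f.price}")
  case _ => None
}
```

The result is an Option because the input may match no row; `output.getOrElse("")` or a pattern match can unwrap it.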
I have a string that I need to transform into a "canonical" form, and to do that I need to call replaceAll() many times on the string. I made it work the following way:
val text = "Java Scala Fother Python JS C# Child"
val replacePatterns = List("Java", "Scala", "Python", "JS", "C#")
var replaced = text
for (pattern <- replacePatterns) {
replaced = replaced.replaceAll(pattern, "")
}
This code results in replaced = "Fother Child" as I want, but it looks very imperative and I want to eliminate the accumulator "replaced".
Is there a way in Scala to do this in one line, without vars?
Thanks.
Use a fold over the list of patterns, with the text to be processed as the starting value:
replacePatterns.foldLeft(text){case (res, pattern) => res.replaceAll(pattern, "")}
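One caveat: replaceAll interprets its first argument as a regular expression, so a pattern containing regex metacharacters may not behave as literal text. The sketch below is a variant, not the answer's exact code: it uses String.replace for literal matching and adds a whitespace cleanup step (an assumption about what "canonical" should mean here). The names `stripped` and `canonical` are introduced for illustration.

```scala
val text = "Java Scala Fother Python JS C# Child"
val replacePatterns = List("Java", "Scala", "Python", "JS", "C#")

// String.replace treats the pattern as literal text, not as a regex.
val stripped: String =
  replacePatterns.foldLeft(text)((res, pattern) => res.replace(pattern, ""))

// Collapse the runs of spaces left behind by the removed words.
val canonical: String = stripped.trim.replaceAll("\\s+", " ")
```

Without the last step, the removed words leave their surrounding spaces in place, so the folded string still contains leading and repeated spaces.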
I'm reading a file line by line using this loop:
for(line <- s.getLines()){
mylist += otherFunction(line);
}
where the variable mylist is an ArrayBuffer which stores a collection of custom datatypes. The otherFunction(line); does something like this...
def otherFunction(list:String)={
val line = s.getLine(index);
val t = new CustomType(0,1,line(0));
t
}
and CustomType is defined as...
class CustomType(name:String,id:Int,num:Int){}
I've omitted much of the code, as you can see, because it's not relevant. I can run the rest of my functions and it'll read the file line by line until EOF as long as I comment out the last line of otherFunction(). Why is returning a value from this function to my list causing my for loop to stop?
It's not clear exactly what you're trying to do here. I assume s is a scala.io.Source object. Why does otherFunction take a string argument that it doesn't use? getLine is deprecated, and you don't say where index comes from. Do you really want to refer to the first character in the line String with index 0, and is it really supposed to be an Int? Assuming that this is actually what you want to do, why not just use a map on the iterator?
val list = s.getLines.map(i => new CustomType("0", 1, i(0).asDigit)).toIndexedSeq
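To try the same idea without a file, here is a self-contained sketch. It assumes a stand-in iterator of sample lines instead of s.getLines, and a version of CustomType with public fields so the result can be inspected; both are hypothetical. Empty lines are filtered out because indexing an empty String with (0) would throw.

```scala
// Hypothetical stand-in for the question's CustomType, with public fields.
class CustomType(val name: String, val id: Int, val num: Int)

// Stand-in for s.getLines(): any Iterator[String] behaves the same way.
val lines: Iterator[String] = Iterator("1 first", "", "2 second", "3 third")

// Skip empty lines, then build one CustomType per remaining line.
val list: IndexedSeq[CustomType] =
  lines
    .filter(_.nonEmpty)
    .map(l => new CustomType("0", 1, l(0).asDigit))
    .toIndexedSeq
```

Because the whole transformation is a single expression, there is no mutable ArrayBuffer to append to and no separate helper that has to reach back into the source.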