I'm new to Scala and FP in general, and I'm trying to practice on a dummy example.
val counts = ransomNote.map(e=>(e,1)).reduceByKey{case (x,y) => x+y}
The following error is raised:
Line 5: error: value reduceByKey is not a member of IndexedSeq[(Char, Int)] (in solution.scala)
The above example looks similar to the introductory FP word-count primer. I'd appreciate it if you could point out my mistake.
It looks like you are trying to use a Spark method on a Scala collection. The two APIs have a few similarities, but reduceByKey is not one of them.
In pure Scala you can do it like this:
val counts =
ransomNote.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
(counts, c) => counts.updated(c, counts(c) + 1)
}
foldLeft iterates over the collection from the left, using the empty map of counts as the accumulated state (the map returns 0 if no value is found for a key). The function passed as the second argument updates that state for each character by storing its current count, incremented by one.
Note that accessing a map directly (counts(c)) is unsafe in most situations, since it throws an exception if the key is not found. Here it's fine because, in this scope, I know I'm using a map with a default value. When accessing a map you will more often than not want to use get, which returns an Option. More on that in the official Scala documentation (here for version 2.13.2).
You can play around with this code here on Scastie.
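As a rough illustration of the note on direct access versus get, here is a minimal sketch. The object name and the sample maps are made up for this example and are not part of the original code.

```scala
object SafeLookup {
  // Hypothetical sample data, only for illustration.
  val plain: Map[Char, Int] = Map('a' -> 2)
  val withDefault: Map[Char, Int] = plain.withDefaultValue(0)

  def main(args: Array[String]): Unit = {
    // get never throws: it returns Some(value) or None.
    println(plain.get('a'))          // Some(2)
    println(plain.get('z'))          // None

    // getOrElse supplies a fallback for a missing key.
    println(plain.getOrElse('z', 0)) // 0

    // Direct access is only safe here because of withDefaultValue.
    println(withDefault('z'))        // 0

    // On the plain map, direct access to a missing key throws
    // NoSuchElementException, so it is wrapped in Try here.
    println(scala.util.Try(plain('z')).isFailure) // true
  }
}
```

Note that withDefaultValue only changes direct access: calling get on such a map still returns None for a missing key.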
On Scala 2.13 you can use the new groupMapReduce:
ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
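To make the three parameter lists easier to read, here is a small sketch of the same call wrapped in a function. It assumes Scala 2.13 or later. The names CharCounts and countChars and the input "hello" are made up for the example.

```scala
object CharCounts {
  // Requires Scala 2.13+.
  // 1st list: the grouping key (the character itself),
  // 2nd list: the value each element contributes (1),
  // 3rd list: how to combine the values within one group (sum).
  def countChars(s: String): Map[Char, Int] =
    s.groupMapReduce(identity)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit =
    println(countChars("hello"))
}
```

For "hello" this should yield a map in which 'l' maps to 2 and every other character maps to 1.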
val str = "hello"
val countsMap: Map[Char, Int] = str
.groupBy(identity)
.map { case (c, occurrences) => c -> occurrences.length } // mapValues returns a lazy MapView on 2.13, not a Map
println(countsMap)
I'm a beginner in Scala. I'm mapping a dataset into (k, v) pairs, where kv(0) and kv(1) are Strings and kv(2) is a list. The code is listed below:
val rdd_q1_bs = rdd_business.map(lines => lines.split('^')).map(kv =>
(kv(0), (kv(1), kv(2))))
But here's the problem: kv(2) is an empty list for some records in the dataset, so when I use .collect() to gather all the elements, an out-of-bounds exception can be thrown.
I'm thinking of defining a function that checks the length of kv. Is there a simple way to ignore the exception and keep processing, or to replace kv(2) with a String?
The function lines => lines.split('^') suggests that rdd_business is an RDD[String]. Splitting each string on ^ gives you an RDD[Array[String]], from which you are trying to extract elements with kv(0), kv(1) and kv(2). The exception occurs because some line in rdd_business yields fewer than three fields, for example when it contains only one ^.
In that case you can use Try or Option.
import scala.util.Try
val rdd_q1_bs = rdd_business.map(lines => lines.split('^')).map(kv =>
(kv(0), (kv(1), Try(kv(2)) getOrElse("not found"))))
For better safety you can apply Try or Option to all the elements of the Array:
val rdd_q1_bs = rdd_business.map(lines => lines.split('^')).map(kv =>
(Try(kv(0)) getOrElse("notFound"), (Try(kv(1)) getOrElse("notFound"), Try(kv(2)) getOrElse("not found"))))
You can proceed the same way for Option as well.
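A minimal sketch of the Option variant, using lift on the split array. To keep it runnable without Spark, a plain List[String] stands in for rdd_business. The sample lines and the names SafeFields, parse and field are assumptions made for the example.

```scala
object SafeFields {
  // Hypothetical stand-in for the lines of rdd_business.
  val lines: List[String] = List("id1^name1^a,b", "id2^name2")

  def parse(line: String): (String, (String, String)) = {
    val kv = line.split('^')
    // lift returns None for an out-of-range index instead of throwing.
    def field(i: Int): String = kv.lift(i).getOrElse("not found")
    (field(0), (field(1), field(2)))
  }

  def main(args: Array[String]): Unit =
    lines.map(parse).foreach(println)
}
```

The same parse function should be usable in the map call on the RDD. Unlike Try, lift only guards against a missing index and does not swallow unrelated exceptions.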
I hope the answer is helpful
My Scala list is as below:
List((192.168.11.3,A,1413876302036,-,-,UP,,0.0,0.0,12,0,0,Null0,UP,0,0,4294967295,other), (192.168.11.3,A,1413876302036,-,-,UP,,0.0,0.0,8,0,0,C,DOWN,0,0,100000000,P), (192.168.1.1,A,1413876001775,-,-,UP,,0.0,0.0,12,0,0,E,UP,0,0,4294967295,other), (192.168.1.1,A,1413876001775,-,-,UP,,0.0,0.0,8,0,0,F,DOWN,0,0,100000000,E))
Now I want the following operation. The third element of each entry in the list above is a timestamp, either 1413876302036 or 1413876001775. I want to subtract these values as shown below:
val sub = ((192.168.11.3,A,(1413876302036-1413876001775),-,-,UP,,0.0,0.0,12,0,0,Null0,UP,0,0,4294967295,other),(192.168.1.1,A,(1413876001775-1413876001775),-,-,UP,,0.0,0.0,12,0,0,E,UP,0,0,4294967295,other))
How should this be calculated in Scala?
After 15 minutes of reading your question, I think I still don't understand it, but if I do here is an answer:
val list = List(("192.168.11.3", 'A', 1413876302036L, 0, 0, 0), ("192.168.11.3", 'A', 1413876302036L, 0, 0, 0),
  ("192.168.1.1", 'A', 1413876001775L, 0, 0, 0), ("192.168.1.1", 'A', 1413876001775L, 0, 0, 0))
// Subtract the base timestamp from the third field of every tuple
val newList = list.map {
  case (a, b, value, c, d, e) => (a, b, value - 1413876001775L, c, d, e)
}
I allowed myself to rewrite your example a little. Next time, try to keep it simple and follow the SSCCE rules.
If you want to create a pipe with more than 22 fields from a smaller one in Scalding you are limited by Scala tuples, which cannot have more than 22 items.
Is there a way to use collections instead of tuples? I imagine something like in the following example, which sadly doesn't work:
input.read.mapTo('line -> aLotOfFields) { line: String =>
(1 to 24).map(_.toString)
}.write(output)
Actually, you can. It's in the FAQ: https://github.com/twitter/scalding/wiki/Frequently-asked-questions#what-if-i-have-more-than-22-fields-in-my-data-set
val toFields = (1 to 24).map(f => Symbol("field_" + f)).toList
input
.read
.mapTo('line -> toFields) { line: String =>
new Tuple((1 to 24).map(_.toString).map(_.asInstanceOf[AnyRef]): _*)
}
The last map(_.asInstanceOf[AnyRef]) looks ugly, so if you find a better solution, please let me know.
Wrap your tuples into case classes. This also makes your code more readable than with tuples and more type-safe than with collections.
I'm using Scala's Map#get function, and for every successful lookup it returns the value wrapped as Some[String].
Is there an easy way to remove the Some?
Example:
def searchDefs{
print("What Word would you like defined? ")
val selection = readLine
println(selection + ":\n\t" + definitionMap.get(selection))
}
When I call this method with the following input:
What Word would you like defined? Ontology
The returned value is:
Ontology:
Some(A set of representational primitives with which to model a domain of knowledge or discourse.)
I would like to remove the Some() around that.
Any tips?
There are a lot of ways to deal with the Option type. First of all, however, do realize how much better it is to have this instead of a potential null reference! Don't try to get rid of it simply because you are used to how Java works.
As someone else recently stated: stick with it for a few weeks and you will moan each time you have to get back to a language which doesn't offer Option types.
Now as for your question, the simplest and riskiest way is this:
mymap.get(something).get
Calling .get on a Some object retrieves the object inside. It does, however, give you a runtime exception if you had a None instead (for example, if the key was not in your map).
A much cleaner way is to use Option.foreach or Option.map like this:
scala> val map = Map(1 -> 2)
map: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2)
scala> map.get(1).foreach( i => println("Got: " + i))
Got: 2
scala> map.get(2).foreach( i => println("Got: " + i))
scala>
As you can see, this allows you to execute a statement if and only if you have an actual value. If the Option is None instead, nothing will happen.
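Option.map works along the same lines, except that it transforms the value and hands back a new Option instead of running a side effect. A small sketch, reusing the map from the session above (the names OptionMapDemo and describe are made up for the example):

```scala
object OptionMapDemo {
  val map: Map[Int, Int] = Map(1 -> 2)

  // map transforms the value inside a Some and leaves a None untouched.
  def describe(key: Int): Option[String] =
    map.get(key).map(i => "Got: " + i)

  def main(args: Array[String]): Unit = {
    println(describe(1)) // Some(Got: 2)
    println(describe(2)) // None
  }
}
```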
Finally, it is also popular to use pattern matching on Option types like this:
scala> map.get(1) match {
| case Some(i) => println("Got something")
| case None => println("Got nothing")
| }
Got something
I personally like using .getOrElse(String) and use something like "None" as a default i.e. .getOrElse("None").
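Applied to the question's lookup, this could look roughly like the sketch below. The single-entry definitionMap and the names DefinitionLookup and define are assumptions made for the example. Map also offers getOrElse(key, default) directly, which saves the intermediate Option.

```scala
object DefinitionLookup {
  // Hypothetical one-entry dictionary, for illustration only.
  val definitionMap: Map[String, String] = Map(
    "Ontology" -> "A set of representational primitives with which to model a domain of knowledge or discourse."
  )

  // Falls back to "None" when the word has no definition.
  def define(word: String): String =
    word + ":\n\t" + definitionMap.getOrElse(word, "None")

  def main(args: Array[String]): Unit = {
    println(define("Ontology"))
    println(define("Unknown"))
  }
}
```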
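I faced a similar issue and resolved it by looking the value up directly by key instead of calling get. Note that this throws a NoSuchElementException if the key is missing.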
Solution:
definitionMap(selection)
In modern Scala you can just call map(key).