How do I use the mapWithState function correctly - Scala

I have a Spark Streaming job; the code is below:
val filterActions = userActions.filter(Utils.filterPageType)
val parseAction = filterActions.flatMap(record => ParseOperation.parseMatch(categoryMap, record))
val finalActions = parseAction.filter(record => record.get("invalid") == None)
val userModels = finalActions.map(record => (record("deviceid"), record)).mapWithState(StateSpec.function(stateUpdateFunction))
Everything compiles smoothly except for the mapWithState call. The return type of ParseOperation.parseMatch(categoryMap, record) is ListBuffer[Map[String, Any]]. The error is below:
[INFO] Compiling 9 source files to /Users/spare/project/campaign-project/stream-official-mall/target/classes at 1530404002409
[ERROR] /Users/spare/project/campaign-project/stream-official-mall/src/main/scala/com/shopee/mall/data/OfficialMallTracker.scala:77: error: overloaded method value function with alternatives:
[ERROR] [KeyType, ValueType, StateType, MappedType](mappingFunction: org.apache.spark.api.java.function.Function3[KeyType,org.apache.spark.api.java.Optional[ValueType],org.apache.spark.streaming.State[StateType],MappedType])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[ERROR] [KeyType, ValueType, StateType, MappedType](mappingFunction: org.apache.spark.api.java.function.Function4[org.apache.spark.streaming.Time,KeyType,org.apache.spark.api.java.Optional[ValueType],org.apache.spark.streaming.State[StateType],org.apache.spark.api.java.Optional[MappedType]])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[ERROR] [KeyType, ValueType, StateType, MappedType](mappingFunction: (KeyType, Option[ValueType], org.apache.spark.streaming.State[StateType]) => MappedType)org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[ERROR] [KeyType, ValueType, StateType, MappedType](mappingFunction: (org.apache.spark.streaming.Time, KeyType, Option[ValueType], org.apache.spark.streaming.State[StateType]) => Option[MappedType])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType]
[ERROR] cannot be applied to ((Any, Map[String,Any], org.apache.spark.streaming.State[Map[String,Any]]) => Some[Map[String,Any]])
[ERROR] val userModels = finalActions.map(record => (record("deviceid"), record)).mapWithState(StateSpec.function(stateUpdateFunction))
[ERROR] ^
[ERROR] one error found
What caused the issue, and how should I modify the code?

I fixed it. The cause was that StateSpec.function(stateUpdateFunction) requires the input value type to be Map[String, Any]; before calling it, I added a map step. The code is below:
val finalActions = parseAction.filter(record => record.get("invalid") == None).map(Utils.parseFinalRecord)
val parseFinalRecord = (record: Map[String, Any]) => {
  val recordMap = collection.mutable.Map(record.toSeq: _*)
  logger.info(s"recordMap: ${recordMap}")
  recordMap.toMap
}
it works!
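For future readers: the overload error shows the compiler inferred the key type as Any (record("deviceid") returns Any) and the mapping function's return type as Some[...] rather than a concrete MappedType. A minimal sketch that lines up with the third StateSpec.function overload (assuming string device IDs and a hypothetical merge-style state update) would be:

import org.apache.spark.streaming.{State, StateSpec}

// Hypothetical mapping function: KeyType = String,
// ValueType = StateType = Map[String, Any], MappedType = (String, Map[String, Any]).
def stateUpdateFunction(key: String,
                        value: Option[Map[String, Any]],
                        state: State[Map[String, Any]]): (String, Map[String, Any]) = {
  val merged = state.getOption.getOrElse(Map.empty[String, Any]) ++ value.getOrElse(Map.empty)
  state.update(merged)
  (key, merged) // a concrete MappedType, not Some[...]
}

val userModels = finalActions
  .map(record => (record("deviceid").toString, record)) // pin KeyType to String instead of Any
  .mapWithState(StateSpec.function(stateUpdateFunction _))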

Related

java.lang.NumberFormatException: For input string: "505c621128f97f31c5870f2a9e2d274fa432bd0e"

Running sbt test, I got the following error message:
error] java.lang.NumberFormatException: For input string: "505c621128f97f31c5870f2a9e2d274fa432bd0e"
[error] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
[error] at java.lang.Long.parseLong(Long.java:589)
[error] at java.lang.Long.parseLong(Long.java:631)
[error] at scala.collection.immutable.StringLike.toLong(StringLike.scala:305)
[error] at scala.collection.immutable.StringLike.toLong$(StringLike.scala:305)
[error] at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
[error] at sbt.TestStatus$.$anonfun$read$1(TestStatusReporter.scala:42)
[error] at sbt.TestStatus$.$anonfun$read$1$adapted(TestStatusReporter.scala:42)
[error] at scala.collection.Iterator.foreach(Iterator.scala:937)
[error] at scala.collection.Iterator.foreach$(Iterator.scala:937)
[error] at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
[error] at sbt.TestStatus$.read(TestStatusReporter.scala:42)
[error] at sbt.TestStatusReporter.succeeded$lzycompute(TestStatusReporter.scala:20)
[error] at sbt.TestStatusReporter.succeeded(TestStatusReporter.scala:20)
[error] at sbt.TestStatusReporter.doComplete(TestStatusReporter.scala:31)
[error] at sbt.TestFramework$.$anonfun$createTestTasks$7(TestFramework.scala:240)
[error] at sbt.TestFramework$.$anonfun$createTestTasks$7$adapted(TestFramework.scala:240)
[error] at sbt.TestFramework$.$anonfun$safeForeach$1(TestFramework.scala:150)
[error] at sbt.TestFramework$.$anonfun$safeForeach$1$adapted(TestFramework.scala:149)
[error] at scala.collection.Iterator.foreach(Iterator.scala:937)
[error] at scala.collection.Iterator.foreach$(Iterator.scala:937)
[error] at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
[error] at scala.collection.IterableLike.foreach(IterableLike.scala:70)
[error] at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
[error] at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
[error] at sbt.TestFramework$.safeForeach(TestFramework.scala:149)
[error] at sbt.TestFramework$.$anonfun$createTestTasks$1(TestFramework.scala:226)
[error] at sbt.Tests$.$anonfun$testTask$1(Tests.scala:231)
[error] at sbt.Tests$.$anonfun$testTask$1$adapted(Tests.scala:231)
[error] at sbt.std.TaskExtra$$anon$1.$anonfun$fork$2(TaskExtra.scala:110)
[error] at sbt.std.Transform$$anon$3.$anonfun$apply$2(System.scala:46)
[error] at sbt.std.Transform$$anon$4.work(System.scala:67)
[error] at sbt.Execute.$anonfun$submit$2(Execute.scala:269)
[error] at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:16)
[error] at sbt.Execute.work(Execute.scala:278)
[error] at sbt.Execute.$anonfun$submit$1(Execute.scala:269)
[error] at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:178)
[error] at sbt.CompletionService$$anon$2.call(CompletionService.scala:37)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] java.lang.NumberFormatException: For input string: "505c621128f97f31c5870f2a9e2d274fa432bd0e"
[info] ScalaTest
[info] Run completed in 522 milliseconds.
[info] Total number of tests run: 3
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 3, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 3, Failed 0, Errors 0, Passed 3
As you can see, all tests passed. What is wrong? Hint: I am using IntelliJ.
Update
Here is the code:
import atto._
import Atto._
import cats._
import cats.implicits._
sealed trait PcpPair
case class PcpHead(key: String, value: String) extends PcpPair
case class PcpFieldValue(field: String, value: String) extends PcpPair
case class Pcp(head: List[PcpHead], fv: List[PcpFieldValue], body: String)
object PcpProtocol {
  implicit val pcpProtocol: Protocol[Pcp] = new Protocol[Pcp] {
    override def encode(text: String): ProtocolResult[Pcp] =
      doc
        .parseOnly(text)
        .either
        .flatMap { t =>
          isValidPcp(t._1) match {
            case true  => Right(t)
            case false => Left("It is not a valid PCP protocol.")
          }
        }
        .map(t => Pcp(filterPcpHead(t._1), filterFieldValue(t._1), t._2))

    override def decode(msg: Pcp): String = ???
  }

  private val PcpValidity = "pcp-"

  private val key = stringOf(letter | char('-'))
  private val value = stringOf(notChar('\n'))
  private val kv = (key <~ char(':')) ~ value
  private val kvs = sepBy(kv, char('\n'))
  private val doc = (kvs <~ string("\n\n")) ~ takeText

  private val isValidPcp: List[(String, String)] => Boolean = textList =>
    textList
      .map(kv => kv._1.startsWith(PcpValidity))
      .foldLeft(true)(_ || _)

  private val filterPcpHead: List[(String, String)] => List[PcpHead] = textList =>
    textList
      .filter(text => text._1.contains(PcpValidity))
      .map(text => PcpHead(text._1, text._2))

  private val filterFieldValue: List[(String, String)] => List[PcpFieldValue] = textList =>
    textList
      .filter(text => !text._1.contains(PcpValidity))
      .map(text => PcpFieldValue(text._1, text._2))

  private val decodeHead: List[PcpHead] => String = heads =>
    heads.foldLeft("") { (acc, value) =>
      acc |+| value.key |+| ":" |+| value.value |+| "\n"
    }

  private val decodePair: List[PcpPair] => ((String, PcpPair) => String) => String = pcpList => fnPcp =>
    pcpList.foldLeft("")(fnPcp)
}
and here is the test:
class PcpParserSpec extends FunSpec with Matchers {
  val valid =
    """pcp-action:MESSAGE
      |pcp-channel:apc\:///
      |pcp-body-type:text
      |PUBLICKEY:THISPK
      |TOPIC:SEND
      |
      |Hello Foo""".stripMargin

  describe("Receive message from SAP server") {
    it("should contains pcp-channel:apc") {
      Protocol.encode(valid) should be('right)
    }
    it("should be separated by head and body") {
      Protocol.encode(valid) match {
        case Right(value) => assert(value.head.length > 0)
        case Left(text)   => assert(text.length > 0)
      }
    }
    describe("The body of the message") {
      it("should contains Hello Foo") {
        Protocol.encode(valid) match {
          case Right(value) => assert(value.body == "Hello Foo")
          case Left(text)   => assert(text.length > 0)
        }
      }
    }
  }

  describe("Send message to SAP") {
    it("should encode appropriate PCP protocol") {
    }
  }
}
What you'll probably find is that target/streams/test/test/$global/streams/succeeded_tests (in your project directory) is getting garbled. (Mine was an encoded mess.)
You should be getting something like
#Successful Tests
#Thu Apr 04 07:54:33 NZDT 2019
MyTest=1554317673393
The number it's trying to read is after the =.
The confusing bit (to me) is that it's outputting the summary - mine didn't when it failed parsing.
Before you run the following, could you share (some of) the contents of succeeded_tests (above) in a comment here (I'm curious to see these failures).
sbt clean test should fix you up...
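For context, the stack trace shows sbt's TestStatus.read taking each key=value line of succeeded_tests and calling .toLong on the value, so a garbled value fails exactly as above. A minimal reproduction (hypothetical parsing, mirroring the trace) is:

// succeeded_tests holds "<test name>=<last-success timestamp>" lines;
// the value is parsed with toLong, which chokes on a garbled hash.
val line = "MyTest=505c621128f97f31c5870f2a9e2d274fa432bd0e"
val timestamp = line.split("=", 2)(1).toLong // throws NumberFormatException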

How to pass two values as State in Spark Streaming?

I am trying to implement a Spark Streaming application that reads streaming data from Kafka. The streaming data are (key, value) pairs in the form "String,int", and I want to calculate the average value for each key.
The data looks like this:
x,20
y,10
z,3
...
I want to compute the average value for each key in a stateful manner. Therefore, in the mapping function I intend to save into the State both the running sum of values and the number of times the corresponding key has appeared.
def mappingFunc(key: String, value: Option[Double], state: State[Double], count: State[Int]): (String, Double) = {
  val sum = value.getOrElse(0.0) + state.getOption.getOrElse(0.0)
  val cnt = count.getOption.getOrElse(1) + 1
  state.update(sum)
  count.update(cnt)
  val output = (key, sum / cnt)
  output
}
The compiler reports an error:
[error] /Users/Rabbit/Desktop/KTH_Second_Year/Periods/P1/Data-intensive_Computing/Lab_Assignment/lab3/src/sparkstreaming/KafkaSpark.scala:78: wrong number of type parameters for overloaded method value function with alternatives:
[error] [KeyType, ValueType, StateType, MappedType](mappingFunction: org.apache.spark.api.java.function.Function3[KeyType,org.apache.spark.api.java.Optional[ValueType],org.apache.spark.streaming.State[StateType],MappedType])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[error] [KeyType, ValueType, StateType, MappedType](mappingFunction: org.apache.spark.api.java.function.Function4[org.apache.spark.streaming.Time,KeyType,org.apache.spark.api.java.Optional[ValueType],org.apache.spark.streaming.State[StateType],org.apache.spark.api.java.Optional[MappedType]])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[error] [KeyType, ValueType, StateType, MappedType](mappingFunction: (KeyType, Option[ValueType], org.apache.spark.streaming.State[StateType]) => MappedType)org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType] <and>
[error] [KeyType, ValueType, StateType, MappedType](mappingFunction: (org.apache.spark.streaming.Time, KeyType, Option[ValueType], org.apache.spark.streaming.State[StateType]) => Option[MappedType])org.apache.spark.streaming.StateSpec[KeyType,ValueType,StateType,MappedType]
How can I pass both the sum of values and the count as state in Spark Streaming?
You need to combine the sum and the count into a tuple (Double, Int), which is stored in the state. The following snippet should do the trick:
def mappingFunc(key: String, value: Option[Double], state: State[(Double, Int)]): (String, Double) = {
  val (sum, cnt) = state.getOption.getOrElse((0.0, 0))
  val newSum = value.getOrElse(0.0) + sum
  val newCnt = cnt + 1
  state.update((newSum, newCnt))
  (key, newSum / newCnt)
}
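For completeness, a sketch of wiring this into a stream (assuming lines is a hypothetical DStream[String] of the "key,value" records from Kafka):

import org.apache.spark.streaming.StateSpec

// parse "x,20" into ("x", 20.0), then thread the (sum, count) state through mapWithState
val pairs = lines.map { line =>
  val Array(k, v) = line.split(",")
  (k, v.toDouble)
}
val averages = pairs.mapWithState(StateSpec.function(mappingFunc _))
averages.print() // emits (key, running average) for each record in the batch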

StackOverflow during typeprovider macro expansion

I was playing around a bit with macros, and I thought writing a JSON type provider would be a good start to get a deeper understanding of how it all works, but I hit a weird error that I can't seem to figure out myself. The code is available on GitHub if you want to look at the whole thing: https://github.com/qwe2/json-typeprovider/.
The problematic part: I tried to make it as typesafe as I could, meaning I wanted to implement JSON arrays in such a way that indexing into them returns the correct type (via a subsequent macro invocation). Relevant methods of the code:
The JSON-to-Tree method:
def jsonToTpe(value: JValue): Option[Tree] = value match {
  case JNothing    => None
  case JNull       => None
  case JString(s)  => Some(q"$s")
  case JDouble(d)  => Some(q"$d")
  case JDecimal(d) => Some(q"scala.BigDecimal(${d.toString})")
  case JInt(i)     => Some(q"scala.BigInt(${i.toByteArray})")
  case JLong(l)    => Some(q"$l")
  case JBool(b)    => Some(q"$b")
  case JArray(arr) =>
    val arrTree = arr.flatMap(jsonToTpe)
    val clsName = c.freshName[TypeName](TypeName("harraycls"))
    val hArray =
      q"""
        class $clsName {
          @_root_.com.example.json.provider.body(scala.Array[Any](..$arrTree))
          def apply(i: Int): Any = macro _root_.com.example.json.provider.DelegatedMacros.arrApply_impl

          @_root_.com.example.json.provider.body(scala.Array[Any](..$arrTree))
          def toArray: scala.Array[Any] = macro _root_.com.example.json.provider.DelegatedMacros.selectField_impl
        }
        new $clsName {}
      """
    Some(hArray)
  case JSet(set) => Some(q"scala.Set(..${set.flatMap(jsonToTpe)})")
  case JObject(fields) =>
    val fs = fields.flatMap { case (k, v) =>
      jsonToTpe(v).map(v => q"""
        @_root_.com.example.json.provider.body($v) def ${TermName(k)}: Any =
          macro _root_.com.example.json.provider.DelegatedMacros.selectField_impl""")
    }
    val clsName = c.freshName[TypeName](TypeName("jsoncls"))
    Some(q"""
      class $clsName {
        ..$fs
      }
      new $clsName {}
    """)
}
Reading the annotation:
class body(tree: Any) extends StaticAnnotation

def arrApply_impl(c: whitebox.Context)(i: c.Expr[Int]): c.Tree = {
  import c.universe._

  def bail(msg: String): Nothing = {
    c.abort(c.enclosingPosition, msg)
  }

  def error(msg: String): Unit = {
    c.error(c.enclosingPosition, msg)
  }

  val arrValue = selectField_impl(c)
  val arrElems = arrValue match {
    case q"scala.Array.apply[$tpe](..$elems)($cls)" => elems
    case _ => bail("arr needs to be an array of constants")
  }
  val idx = i.tree match {
    case Literal(Constant(ix: Int)) => ix
    case _ => bail(s"i needs to be a constant Int, got ${showRaw(i.tree)}")
  }
  arrElems(idx)
}

def selectField_impl(c: whitebox.Context): c.Tree = {
  c.macroApplication.symbol.annotations.filter(
    _.tree.tpe <:< c.typeOf[body]
  ).head.tree.children.tail.head
}
As you can see, the approach I tried was basically to shove the actual array into a static annotation and, when indexing into it, dispatch to another macro that can figure out the type. I got the idea from reading about vampire methods.
This is the JSON I'm trying to parse:
[
  {"id": 1},
  {"id": 2}
]
And this is how I invoke it:
val tpe3 = TypeProvider("arrayofobj.json")
println(tpe3.toArray.mkString(", "))
Reading an array of ints or an object of primitive fields works as expected, but an array of objects throws a StackOverflowError during compilation:
[error] /home/isti/projects/json-typeprovider/core/src/main/scala/com/example/Hello.scala:7:14: Internal error: unable to find the outer accessor symbol of object Hello
[error] object Hello extends App {
[error] ^
[error] ## Exception when compiling 1 sources to /home/isti/projects/json-typeprovider/core/target/scala-2.12/classes
[error] null
[error] java.lang.String.valueOf(String.java:2994)
[error] scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
[error] scala.collection.TraversableOnce.$anonfun$addString$1(TraversableOnce.scala:359)
[error] scala.collection.immutable.List.foreach(List.scala:389)
[error] scala.collection.TraversableOnce.addString(TraversableOnce.scala:357)
[error] scala.collection.TraversableOnce.addString$(TraversableOnce.scala:353)
[error] scala.collection.AbstractTraversable.addString(Traversable.scala:104)
[error] scala.collection.TraversableOnce.mkString(TraversableOnce.scala:323)
[error] scala.collection.TraversableOnce.mkString$(TraversableOnce.scala:322)
[error] scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
[error] scala.collection.TraversableOnce.mkString(TraversableOnce.scala:325)
[error] scala.collection.TraversableOnce.mkString$(TraversableOnce.scala:325)
[error] scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
[error] scala.collection.TraversableOnce.mkString(TraversableOnce.scala:327)
[error] scala.collection.TraversableOnce.mkString$(TraversableOnce.scala:327)
[error] scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
[error] xsbt.DelegatingReporter$.makePosition$1(DelegatingReporter.scala:89)
[error] xsbt.DelegatingReporter$.convert(DelegatingReporter.scala:94)
[error] xsbt.DelegatingReporter.info0(DelegatingReporter.scala:125)
[error] xsbt.DelegatingReporter.info0(DelegatingReporter.scala:102)
[error] scala.reflect.internal.Reporter.error(Reporting.scala:84)
[error] scala.reflect.internal.Reporting.globalError(Reporting.scala:69)
[error] scala.reflect.internal.Reporting.globalError$(Reporting.scala:69)
[error] scala.reflect.internal.SymbolTable.globalError(SymbolTable.scala:16)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerSelect(ExplicitOuter.scala:235)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
[error] scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267)
Edit: that's only the top of the stack trace; there are many more frames of scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.outerPath(ExplicitOuter.scala:267).

Scala / Play 2.5: Overloaded method apply with alternatives

I am using Scala & Play 2.5. I am stuck with this error:
Game.scala:99: overloaded method value apply with alternatives:
[error] (block: => play.api.mvc.Result)play.api.mvc.Action[play.api.mvc.AnyContent] <and>
[error] (block: play.api.mvc.Request[play.api.mvc.AnyContent] => play.api.mvc.Result)play.api.mvc.Action[play.api.mvc.AnyContent] <and>
[error] [A](bodyParser: play.api.mvc.BodyParser[A])(block: play.api.mvc.Request[A] => play.api.mvc.Result)play.api.mvc.Action[A]
[error] cannot be applied to (Object)
[error] def start(id: String, apiKey: Option[String]) = Action {
This is the function:
def start(id: String, apiKey: Option[String]) = Action {
  apiKey match {
    case Some(API_KEY) => {
      Server.actor ! Server.Start(id)
      Ok("Started")
    }
    case _ => Future.successful(Unauthorized)
  }
}
The problem is that the result of the match statement has been inferred to be Object: one case returns Result and the other returns Future[Result], so the only common supertype is Object. To fix it, change case _ => Future.successful(Unauthorized) to case _ => Unauthorized.
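With that change applied, both branches return Result, so the block matches the first Action.apply overload. For reference, the corrected action looks like this:

def start(id: String, apiKey: Option[String]) = Action {
  apiKey match {
    case Some(API_KEY) =>
      Server.actor ! Server.Start(id)
      Ok("Started")
    case _ => Unauthorized // now a Result, matching the other branch
  }
}

If the Future was intentional, the usual alternative is Action.async with both branches returning Future[Result].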

Scala map with partial function as value

In Twitter's Scala school collections section, they show a Map with a partial function as a value:
// timesTwo() was defined earlier.
def timesTwo(i: Int): Int = i * 2
Map("timesTwo" -> timesTwo(_))
If I try to compile this with Scala 2.9.1 and sbt, I get the following:
[error] ... missing parameter type for expanded function ((x$1) => "timesTwo".$minus$greater(timesTwo(x$1)))
[error] Map("timesTwo" -> timesTwo(_))
[error] ^
[error] one error found
If I add the parameter type:
Map("timesTwo" -> timesTwo(_: Int))
I then get the following compiler error:
[error] ... type mismatch;
[error] found : Int => (java.lang.String, Int)
[error] required: (?, ?)
[error] Map("timesTwo" -> timesTwo(_: Int))
[error] ^
[error] one error found
I'm stumped. What am I missing?
It thinks you want to do this:
Map((x: Int) => "timesTwo".->(timesTwo(x)))
When you want this:
Map("timesTwo" -> { (x: Int) => timesTwo(x) })
So this works:
Map( ("timesTwo", timesTwo(_)) )
Map("timesTwo" -> { timesTwo(_) })
Note this is not an unusual error; see
https://stackoverflow.com/a/7695459/257449.
Scala underscore - ERROR: missing parameter type for expanded function
(and probably more)
You need to tell scalac that you want to lift the method timesTwo into a function. This can be done with an underscore, as follows:
scala> Map("timesTwo" -> timesTwo _)
res0: scala.collection.immutable.Map[java.lang.String,Int => Int] = Map(timesTwo -> <function1>)
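A quick sanity check of the lifted function (sketch):

def timesTwo(i: Int): Int = i * 2

val f: Int => Int = timesTwo _   // explicit eta-expansion
val m = Map("timesTwo" -> f)     // Map[String, Int => Int]
assert(m("timesTwo")(21) == 42)  // look up the function, then apply it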