DolphinDB error: A metric shouldn't be a constant in reactive state engine - constants

My script is
share streamTable(1:0, `date`time`sym`market`price`qty, [DATE, TIME, SYMBOL, CHAR, DOUBLE, INT]) as trade
outputTable = table(100:0, `date`sym`factor1`flag, [DATE, STRING, DOUBLE, INT])
engine = createReactiveStateEngine(name="test", metrics=[<mavg(price, 3)>, <1>], dummyTable=trade, outputTable=outputTable, keyColumn=["date","sym"], filter=<date between 2012.01.01 : 2012.01.03>, keepOrder=true)
It throws "A metric shouldn't be a constant". How can I pass a constant to the metrics?

Try using <1*1> instead, i.e. metrics=[<mavg(price, 3)>, <1*1>]. Each metric must be an expression rather than a bare constant, so wrapping the constant in a trivial expression satisfies the engine.

Related

Find count in WindowedStream - Flink

I am pretty new to the world of streams and I am facing some issues on my first try.
More specifically, I am trying to implement count and groupBy functionality in a sliding window using Flink.
I've done it on a normal DataStream, but I am not able to make it work on a WindowedStream.
Do you have any suggestions on how I can do it?
val parsedStream: DataStream[(String, Response)] = stream
  .mapWith(_.decodeOption[Response])
  .filter(_.isDefined)
  .map { record =>
    (
      s"${record.get.group.group_country}, ${record.get.group.group_state}, ${record.get.group.group_city}",
      record.get
    )
  }
val result: DataStream[((String, Response), Int)] = parsedStream
  .map((_, 1))
  .keyBy(_._1._1)
  .sum(1)
// The output of result is
// ((us, GA, Atlanta,Response()), 14)
// ((us, SA, Atlanta,Response()), 4)
result
  .keyBy(_._1._1)
  .timeWindow(Time.seconds(5))
  // the following part doesn't compile
  .apply(
    new WindowFunction[(String, Int), (String, Int), String, TimeWindow] {
      def apply(
        key: Tuple,
        window: TimeWindow,
        values: Iterable[(String, Response)],
        out: Collector[(String, Int)]
      ) {}
    }
  )
Compilation Error:
overloaded method value apply with alternatives:
[R](function: (String, org.apache.flink.streaming.api.windowing.windows.TimeWindow, Iterable[((String, com.flink.Response), Int)], org.apache.flink.util.Collector[R]) => Unit)(implicit evidence$28: org.apache.flink.api.common.typeinfo.TypeInformation[R])org.apache.flink.streaming.api.scala.DataStream[R] <and>
[R](function: org.apache.flink.streaming.api.scala.function.WindowFunction[((String, com.flink.Response), Int),R,String,org.apache.flink.streaming.api.windowing.windows.TimeWindow])(implicit evidence$27: org.apache.flink.api.common.typeinfo.TypeInformation[R])org.apache.flink.streaming.api.scala.DataStream[R]
cannot be applied to (org.apache.flink.streaming.api.functions.windowing.WindowFunction[((String, com.flink.Response), Int),(String, com.flink.Response),String,org.apache.flink.streaming.api.windowing.windows.TimeWindow]{def apply(key: String,window: org.apache.flink.streaming.api.windowing.windows.TimeWindow,input: Iterable[((String, com.flink.Response), Int)],out: org.apache.flink.util.Collector[(String, com.flink.Response)]): Unit})
.apply(
This is a simpler example that we can work on
val source: DataStream[(JsonField, Int)] = env.fromElements(("hello", 1), ("hello", 2))
val window2 = source
  .keyBy(0)
  .timeWindow(Time.minutes(1))
  .apply(new WindowFunction[(JsonField, Int), Int, String, TimeWindow] {})
I have tried your code and found the errors; it seems that you have declared the wrong types for your WindowFunction.
The documentation says that the expected type parameters of a WindowFunction are WindowFunction[IN, OUT, KEY, W <: Window]. If you take a look at your code, IN is the type of the data stream that you are computing windows over. The type of that stream is ((String, Response), Int), not (String, Int) as declared in the code.
If you change the part that is not compiling to:
.apply(new WindowFunction[((String, Response), Int), (String, Response), String, TimeWindow] {
  override def apply(key: String, window: TimeWindow, input: Iterable[((String, Response), Int)], out: Collector[(String, Response)]): Unit = ???
})
EDIT: The second example fails for essentially the same reason. When you use keyBy on a tuple stream you have two overloads to choose from: keyBy(fields: Int*), which uses an integer index to access a field of the tuple (this is what you used), and keyBy(fun: T => K), where you provide a function that extracts the key.
There is one important difference between those functions: the first returns the key as a Java Tuple, while the second returns the key with its exact type.
So basically, if you change the KEY type parameter from String to Tuple in your simplified example, it should compile; a sketch contrasting the two variants follows below.
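For illustration, here is a minimal sketch of both keyBy variants applied to the simplified example, assuming the Flink Scala API with the (older) timeWindow call used above; the (String, Int) element type and the pass-through window bodies are placeholders, not taken from the original code:
import org.apache.flink.api.java.tuple.{Tuple => JavaTuple}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
val source: DataStream[(String, Int)] = env.fromElements(("hello", 1), ("hello", 2))

// keyBy(fields: Int*) keys by field index, so the KEY type parameter is the Java Tuple wrapper
source
  .keyBy(0)
  .timeWindow(Time.minutes(1))
  .apply(new WindowFunction[(String, Int), (String, Int), JavaTuple, TimeWindow] {
    override def apply(key: JavaTuple, window: TimeWindow,
                       input: Iterable[(String, Int)],
                       out: Collector[(String, Int)]): Unit =
      input.foreach(out.collect)
  })

// keyBy(fun: T => K) keys by an extractor function, so the KEY type parameter keeps its exact type (String here)
source
  .keyBy(_._1)
  .timeWindow(Time.minutes(1))
  .apply(new WindowFunction[(String, Int), (String, Int), String, TimeWindow] {
    override def apply(key: String, window: TimeWindow,
                       input: Iterable[(String, Int)],
                       out: Collector[(String, Int)]): Unit =
      input.foreach(out.collect)
  })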

Could not find implicit value for parameter num: Numeric[Option[Double]]

I am new to Scala and am trying to get to grips with Option. I am trying to
sum the Double values of the following list according to their String keys:
val chmembers = List(("Db",0.1574), ("C",1.003), ("Db",15.4756), ("D",0.003), ("Bb",1.4278), ("D",13.0001))
After summing, I want to get:
List((D,13.0031), (Db,15.633000000000001), (C,1.003), (Bb,1.4278))
My current code is
def getClassWeights: List[(Option[String], Option[Double])] = {
  chMembers.flatMap(p => Map(p.name -> p.weight))
    .groupBy(_._1).mapValues(_.map(_._2)sum).toList
}
However this will not compile and returns:
'Could not find implicit value for parameter num:
Numeric[Option[Double]]'.
I don't understand where 'Numeric' comes from or how to handle it.
I would be grateful for any suggestions. Thanks in advance.
Numeric is used to perform the sum. It is a type class which implements some common numeric operations for Int, Float, Double etc.
I am not sure why you want to use Option; I do not think it will help you solve your problem. Your code can be simplified to:
val chmembers = List(("Db",0.1574), ("C",1.003), ("Db",15.4756), ("D",0.003), ("Bb",1.4278), ("D",13.0001))
def getClassWeights: List[(String, Double)] = {
  chmembers.groupBy(_._1).mapValues(_.map(_._2).sum).toList
}
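If the keys and weights really do arrive wrapped in Option, one way to make the sum compile is to unwrap the Options before summing, so that an implicit Numeric[Double] can be found; a minimal sketch, with a hypothetical Option-typed input list:
val optMembers: List[(Option[String], Option[Double])] =
  List((Some("Db"), Some(0.1574)), (Some("C"), Some(1.003)), (None, None))

// Keep only the entries where both key and weight are defined, then group and sum plain Doubles
val classWeights: List[(String, Double)] =
  optMembers
    .collect { case (Some(k), Some(w)) => (k, w) }
    .groupBy(_._1)
    .mapValues(_.map(_._2).sum)
    .toList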
you can do it like this:
val chmembers = List(("Db",0.1574), ("C",1.003), ("Db",15.4756), ("D",0.003), ("Bb",1.4278), ("D",13.0001))
chmembers.groupBy(_._1).mapValues(_.map(_._2).sum)
output
//Map(D -> 13.0031, Db -> 15.633000000000001, C -> 1.003, Bb -> 1.4278)

TypedPipe can't coerce strings to DateTime even when given implicit function

I've got a Scalding data flow that starts with a bunch of Pipe separated value files. The first column is a DateTime in a slightly non-standard format. I want to use the strongly typed TypedPipe API, so I've specified a tuple type and a case class to contain the data:
type Input = (DateTime, String, Double, Double, String)
case class LatLonRecord(date : DateTime, msidn : String, lat : Double, lon : Double, cellname : String)
However, Scalding doesn't know how to coerce a String into a DateTime, so I tried adding an implicit function to do the dirty work:
implicit def stringToDateTime(dateStr: String): DateTime =
  DateTime.parse(dateStr, DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S"))
However, I still get a ClassCastException:
val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
lines.map(x => x._1).dump
//cascading.flow.FlowException: local step failed at java.lang.Thread.run(Thread.java:745)
//Caused by: cascading.pipe.OperatorException: [FixedPathTypedDelimite...][com.twitter.scalding.RichPipe.eachTo(RichPipe.scala:509)] operator Each failed executing operation
//Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.joda.time.DateTime
What do I need to do to get Scalding to call my conversion function?
So I ended up doing this:
case class LatLonRecord(date: DateTime, msisdn: String, lat: Double, lon: Double, cellname: String)

object dateparser {
  val format = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S")
  def parse(s: String): DateTime = DateTime.parse(s, format)
}

// changed first column to a String, yuck
type Input = (String, String, Double, Double, String)

val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
val recs = lines.map(v => LatLonRecord(dateparser.parse(v._1), v._2, v._3, v._4, v._5))
But I feel like it's a sub-optimal solution. I welcome better answers from people who, unlike me, have been using Scala for more than, say, 1 week.
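One small tidy-up, offered only as a sketch of the same approach: move the parsing into the case class companion so the map call stays short. The pattern string and field order are taken from the code above; nothing Scalding-specific changes.
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

case class LatLonRecord(date: DateTime, msisdn: String, lat: Double, lon: Double, cellname: String)

object LatLonRecord {
  private val format = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S")

  // Build a record from the raw (String, String, Double, Double, String) row
  def fromRow(row: (String, String, Double, Double, String)): LatLonRecord =
    LatLonRecord(DateTime.parse(row._1, format), row._2, row._3, row._4, row._5)
}

// val recs = lines.map(LatLonRecord.fromRow)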

Spark Streaming groupByKey and updateStateByKey implementation

I am trying to run stateful Spark Streaming computations over (fake) Apache web server logs read from Kafka. The goal is to "sessionize" the web traffic, similar to this blog post.
The only difference is that I want to "sessionize" each page the IP hits, instead of the entire session. I was able to do this reading from a file of fake web traffic using Spark in batch mode, but now I want to do it in a streaming context.
Log files are read from Kafka and parsed into K/V pairs of (String, (String, Long, Long)) or
(IP, (requestPage, time, time)).
I then call groupByKey() on this K/V pair. In batch mode, this would produce a:
(String, CollectionBuffer((String, Long, Long), ...) or
(IP, CollectionBuffer((requestPage, time, time), ...)
In a StreamingContext, it produces a:
(String, ArrayBuffer((String, Long, Long), ...) like so:
(183.196.254.131,ArrayBuffer((/test.php,1418849762000,1418849762000)))
However, as the next microbatch (DStream) arrives, this information is discarded.
Ultimately what I want is for that ArrayBuffer to fill up over time as a given IP continues to interact and to run some computations on its data to "sessionize" the page time.
I believe the operator to make that happen is "updateStateByKey." I'm having some trouble with this operator (I'm new to both Spark & Scala);
any help is appreciated.
Thus far:
val grouped = ipTimeStamp.groupByKey().updateStateByKey(updateGroupByKey)
def updateGroupByKey(
  a: Seq[(String, ArrayBuffer[(String, Long, Long)])],
  b: Option[(String, ArrayBuffer[(String, Long, Long)])]
): Option[(String, ArrayBuffer[(String, Long, Long)])] = {
}
I think you are looking for something like this:
def updateGroupByKey(
  newValues: Seq[(String, ArrayBuffer[(String, Long, Long)])],
  currentValue: Option[(String, ArrayBuffer[(String, Long, Long)])]
): Option[(String, ArrayBuffer[(String, Long, Long)])] = {
  // Collect the buffers from the new values
  val buffs: Seq[ArrayBuffer[(String, Long, Long)]] = for (v <- newValues) yield v._2
  // Prepend the existing state, if there is any
  val buffs2 = if (currentValue.isEmpty) buffs else currentValue.get._2 +: buffs
  // Merge everything into a single buffer under the current key
  if (buffs2.isEmpty) None
  else {
    val key = if (currentValue.isEmpty) newValues(0)._1 else currentValue.get._1
    Some((key, buffs2.foldLeft(new ArrayBuffer[(String, Long, Long)])((v, a) => v ++ a)))
  }
}
Gabor's answer got me started down the right path, but here is an answer that produces the expected output.
First, for the output I want:
(100.40.49.235,List((/,1418934075000,1418934075000), (/,1418934105000,1418934105000), (/contactus.html,1418934174000,1418934174000)))
I don't need groupByKey(). updateStateByKey already accumulates the values into a Seq, so the addition of groupByKey is unnecessary (and expensive). Spark users strongly suggest not using groupByKey.
Here is the code that worked:
def updateValues(
  newValues: Seq[(String, Long, Long)],
  currentValue: Option[Seq[(String, Long, Long)]]
): Option[Seq[(String, Long, Long)]] = {
  Some(currentValue.getOrElse(Seq.empty) ++ newValues)
}
val grouped = ipTimeStamp.updateStateByKey(updateValues)
Here updateStateByKey is passed a function (updateValues) that takes the new values from the current batch (newValues) as well as an Option holding the current state (currentValue), and returns the combination of the two. getOrElse is required because currentValue may occasionally be empty. Credit to https://twitter.com/granturing for the correct code. A sketch of the surrounding wiring is below.
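For context, here is a minimal sketch of how the surrounding job could be wired up; the checkpoint directory, batch interval, and the placeholder for the Kafka parsing step are assumptions here, not taken from the original code. The important detail is that updateStateByKey requires a checkpoint directory so state can be persisted across batches:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

val conf = new SparkConf().setAppName("sessionize")
val ssc = new StreamingContext(conf, Seconds(10))

// updateStateByKey needs a checkpoint directory to persist state across batches
ssc.checkpoint("/tmp/sessionize-checkpoint")

// (IP, (requestPage, time, time)) pairs parsed from the Kafka log lines; the parsing
// itself is out of scope here, so it is left as a placeholder
val ipTimeStamp: DStream[(String, (String, Long, Long))] = ???

def updateValues(newValues: Seq[(String, Long, Long)],
                 currentValue: Option[Seq[(String, Long, Long)]]): Option[Seq[(String, Long, Long)]] =
  Some(currentValue.getOrElse(Seq.empty) ++ newValues)

val sessions = ipTimeStamp.updateStateByKey(updateValues)
sessions.print()

ssc.start()
ssc.awaitTermination()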

Retrieving a list of tuples with the Scala anorm stream API

When I read in the Play! docs I find a way to parse the result to a List[(String, String)]
Like this:
// Create an SQL query
val selectCountries = SQL("Select * from Country")
// Transform the resulting Stream[Row] as a List[(String,String)]
val countries = selectCountries().map(row =>
row[String]("code") -> row[String]("name")
).toList
I want to do this, but my tuple will contain more data.
I'm doing like this:
val getObjects = SQL("SELECT a, b, c, d, e, f, g FROM table")
val objects = getObjects().map(row =>
row[Long]("a") -> row[Long]("b") -> row[Long]("c") -> row[String]("d") -> row[Long]("e") -> row[Long]("f") -> row[String]("g")).toList
Each tuple I get will be in this format, of course, since that is what the code above asks for:
((((((Long, Long), Long), String), Long), Long), String)
But I want this:
(Long, Long, Long, String, Long, Long, String)
What I'm asking is how I should parse the result to generate a tuple like the last one above. I want to do what the documentation does for List[(String, String)], but with more data.
Thanks
You are getting ((((((Long, Long), Long), String), Long), Long), String) because of ->: each call to it wraps two elements into a pair. So with each -> you get a tuple, then you take that tuple and wrap it in a new one, and so on. You need to replace the arrows with commas and wrap the values in parentheses:
val objects = getObjects().map(row =>
  (row[Long]("a"), row[Long]("b"), ..., row[String]("g"))
).toList
But remember that tuples currently can have no more than 22 elements.
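For completeness, a sketch of the same answer with all seven columns written out, using the column names a through g from the question; the query and the row-access style are unchanged:
val getObjects = SQL("SELECT a, b, c, d, e, f, g FROM table")

// One flat 7-tuple per row: (Long, Long, Long, String, Long, Long, String)
val objects: List[(Long, Long, Long, String, Long, Long, String)] =
  getObjects().map { row =>
    (row[Long]("a"), row[Long]("b"), row[Long]("c"), row[String]("d"),
     row[Long]("e"), row[Long]("f"), row[String]("g"))
  }.toList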