aggregateByKey is not updating the value of the initial set - scala

The values in hllTotal are updated, but hllToday remains zero for every key.
Could anyone please help, why hllToday is not getting updated here?
val hllToday: HllSerializable = new HllSerializable(new HLL(13, 5))
val hllTotal: HllSerializable = new HllSerializable(new HLL(13, 5))
val initialSet = (hllToday, hllTotal)
val rdd7 = combinedRdd.map(tuple => {
tuple.segmentId -> (tuple.entryTime, tuple.exitTime, tuple.hll)`
}).aggregateByKey(initialSet)(
(acc: (HllSerializable, HllSerializable), v: (Long, Long, HllSerializable)) => {
if (v._1 == today_ts) {
acc._1.getHll.union(v._3.getHll)
}
acc._2.getHll.union(v._3.getHll)
acc
}, (acc1: (HllSerializable, HllSerializable), acc2: (HllSerializable, HllSerializable)) => {
acc1._1.getHll.union(acc2._1.getHll)
acc1._2.getHll.union(acc2._2.getHll)
acc1
}
)

Apparently this code is never called :
if (v._1 == today_ts) {
acc._1.getHll.union(v._3.getHll)
}
Try to replace if (v._1 == today_ts) with if(true) to see whether hllToday gets updated

Related

Scala functional programming dry run

Could you please help me in understanding the following method:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf[String,Seq[GenericRowWithSchema]](genArr => {
val globId: List[String] =
genArr.toList
.filter(_(0) == custDimIndex)
.map(custDim => custDim(1).toString)
globId match {
case Nil => ""
case x :: _ => x
}
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}
The method applies an UDF to to dataframe. The UDF seems intended to extract a single ID from column of type array<struct>, where the first element of the struct is an index, the second one an ID.
You could rewrite the code to be more readable:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf((genArr : Seq[Row]) => {
genArr
.find(_(0) == custDimIndex)
.map(_(1).toString)
.getOrElse("")
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}
or even shorter with collectFirst:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf((genArr : Seq[Row]) => {
genArr
.collectFirst{case r if(r.getInt(0)==custDimIndex) => r.getString(1)}
.getOrElse("")
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}

Scala Do While Loop Not Ending

I'm new to scala and i'm trying to implement a do while loop but I cannot seem to get it to stop. Im not sure what i'm doing wrong. If someone could help me out that would be great. Its not the best loop I know that but I am new to the language.
Here is my code below:
def mnuQuestionLast(f: (String) => (String, Int)) ={
var dataInput = true
do {
print("Enter 1 to Add Another 0 to Exit > ")
var dataInput1 = f(readLine)
if (dataInput1 == 0){
dataInput == false
} else {
println{"Do the work"}
}
} while(dataInput == true)
}
You're comparing a tuple type (Tuple2[String, Int] in this case) to 0, which works because == is defined on AnyRef, but doesn't make much sense when you think about it. You should be looking at the second element of the tuple:
if (dataInput1._2 == 0)
Or if you want to enhance readability a bit, you can deconstruct the tuple:
val (line, num) = f(readLine)
if (num == 0)
Also, you're comparing dataInput with false (dataInput == false) instead of assigning false:
dataInput = false
Your code did not pass the functional conventions.
The value that the f returns is a tuple and you should check it's second value of your tuple by dataInput1._2==0
so you should change your if to if(dataInput1._2==0)
You can reconstruct your code in a better way:
import util.control.Breaks._
def mnuQuestionLast(f: (String) => (String, Int)) = {
breakable {
while (true) {
print("Enter 1 to Add Another 0 to Exit > ")
f(readLine) match {
case (_, 0) => break()
case (_,1) => println( the work"
case _ => throw new IllegalArgumentException
}
}
}
}

Wait future completion to execute a another one for a sequence

I met a read / write problem this last days and I cannot fix an issue in my test. I have a JSON document based on the following model
package models
import play.api.libs.json._
object Models {
case class Record
(
id : Int,
samples : List[Double]
)
object Record {
implicit val recordFormat = Json.format[Record]
}
}
I have two functions : one to read a record and an another one to update.
case class MongoIO(futureCollection : Future[JSONCollection]) {
def readRecord(id : Int) : Future[Option[Record]] =
futureCollection
.flatMap { collection =>
collection.find(Json.obj("id" -> id)).one[Record]
}
def updateRecord(id : Int, newSample : Double) : Future[UpdateWriteResult]= {
readRecord(id) flatMap { recordOpt =>
recordOpt match {
case None =>
Future { UpdateWriteResult(ok = false, -1, -1, Seq(), Seq(), None, None, None) }
case Some(record) =>
val newRecord =
record.copy(samples = record.samples :+ newSample)
futureCollection
.flatMap { collection =>
collection.update(Json.obj("id" -> id), newRecord)
}
}
}
}
}
Now, I have a List[Future[UpdateWriteResult]] corresponds to a many updates on the document but what I want is that : wait the future is complete to execute the second one then wait the completion of the second to execute the third. I tried to do that with a foldLeft and flatMap like this :
val l : List[Future[UpdateWriteResult]] = ...
println(l.size) // give me 10
l
.foldLeft(Future.successful(UpdateWriteResult(ok = false, -1, -1, Seq(), Seq(), None, None, None))) {
case (cur, next) => cur.flatMap(_ => next)
}
but the document is never updated like excepted : instead to have a document with a samples list of size 10, I got a list of 1 samples. So the read is faster than the write (impression that I have) and also using a combinaison of foldLeft / flatMap seems to do not wait the completion of the current future so how can I fix this issue properly (without Await) ?
Update
val futureCollection = DB.getCollection("foo")
val mongoIO = MongoIO(futureCollection)
val id = 1
val samples = List(1.1, 2.2, 3.3)
val l : List[Future[UpdateWriteResult]] = samples.map(sample => mongoIO.updateRecord(id, sample))
You have to do the foldLeft on the samples:
(mongoIO.updateRecord(id, samples.head) /: samples.tail) {(acc, next) =>
acc.flatMap(_ => mongoIO.updateRecord(id, next))
}
updateRecord is what triggers the future, so you have to make sure not to call it until the previous one finishes.
I took the liberty of making some minor modification to you original sources:
case class MongoIO(futureCollection : Future[JSONCollection]) {
def readRecord(id : Int) : Future[Option[Record]] =
futureCollection.flatMap(_.find(Json.obj("id" -> id)).one[Record])
def updateRecord(id : Int, newSample : Double) : Future[UpdateWriteResult] =
readRecord(id) flatMap {
case None => Future.successful(UpdateWriteResult(ok = false, -1, -1, Nil, Nil, None, None, None))
case Some(record) =>
val newRecord = record.copy(samples = record.samples :+ newSample)
futureCollection.flatMap(_.update(Json.obj("id" -> id), newRecord))
}
}
Then we only need to write a serialized/sequential update:
def update(samples: Seq[Double], id: Int): Future[Unit] = s match {
case sample +: remainingSamples => updateRecord(id, sample).flatMap(_ => update(remainingSamples, id))
case _ => Future.successful(())
}

scopt3 sample script does not compile

I'm trying to use scopt3 in my project but I get compilation errors even for the sample code on scopt3 Github page:
val parser = new scopt.OptionParser[Config]("scopt") {
head("scopt", "3.x")
opt[Int]('f', "foo") action { (x, c) =>
c.copy(foo = x) } text("foo is an integer property")
opt[File]('o', "out") required() valueName("<file>") action { (x, c) =>
c.copy(out = x) } text("out is a required file property")
opt[(String, Int)]("max") action { case ((k, v), c) =>
c.copy(libName = k, maxCount = v) } validate { x =>
if (x._2 > 0) success else failure("Value <max> must be >0")
} keyValueName("<libname>", "<max>") text("maximum count for <libname>")
opt[Seq[File]]('j', "jars") valueName("<jar1>,<jar2>...") action { (x,c) =>
c.copy(jars = x) } text("jars to include")
opt[Map[String,String]]("kwargs") valueName("k1=v1,k2=v2...") action { (x, c) =>
c.copy(kwargs = x) } text("other arguments")
opt[Unit]("verbose") action { (_, c) =>
c.copy(verbose = true) } text("verbose is a flag")
opt[Unit]("debug") hidden() action { (_, c) =>
c.copy(debug = true) } text("this option is hidden in the usage text")
note("some notes.\n")
help("help") text("prints this usage text")
arg[File]("<file>...") unbounded() optional() action { (x, c) =>
c.copy(files = c.files :+ x) } text("optional unbounded args")
cmd("update") action { (_, c) =>
c.copy(mode = "update") } text("update is a command.") children(
opt[Unit]("not-keepalive") abbr("nk") action { (_, c) =>
c.copy(keepalive = false) } text("disable keepalive"),
opt[Boolean]("xyz") action { (x, c) =>
c.copy(xyz = x) } text("xyz is a boolean property"),
checkConfig { c =>
if (c.keepalive && c.xyz) failure("xyz cannot keep alive") else success }
)
}
errors are:
1) not found type Config
I thougth it was com.typesafe.config.Config, but when I import I get "velue copy is not a member of com.typesafe.config.Config". Where does Config come from?
2) not found value foo
All arguments to .copy() method are marked as "not found values" (I suppose is due to the previous error on Config)
I'm on scala 2.11.6 / SBT 0.13.8
Any help?
scopt has nothing to do with typesafe-config at all.
If you carefully read the README, you will notice that Config is defined right before the code you paste here:
case class Config(foo: Int = -1, out: File = new File("."), xyz: Boolean = false,
libName: String = "", maxCount: Int = -1, verbose: Boolean = false, debug: Boolean = false,
mode: String = "", files: Seq[File] = Seq(), keepalive: Boolean = false,
jars: Seq[File] = Seq(), kwargs: Map[String,String] = Map())
It is merely an example of an object that holds all parameters of a program.
The idea is you parametrise OptionParser with any configuration object you wish. E.g.
case class Foo(name: String)
val parser = new scopt.OptionParser[Foo]("foo") {
opt[String]('n', "name") action { (n, foo) => foo.copy(name = n) }
}
val res = parser.parse(args, Foo("default"))

Value * is not a member of AnyVal

This is a fold that I wrote and I get this error:
Error:(26, 42) value * is not a member of AnyVal
(candE.intersect(candR), massE * massR)
^
allAssignmentsTable is a List[Map[Set[Candidate[A]],Double]]
val allAssignmentsTable = hypothesis.map(h => {
allAssignments.map(copySet => {
if(h.getAssignment.keySet.contains(copySet))
(copySet -> h.getAssignment(copySet))
else
(copySet -> 0.0)
}).toMap
})
val aggregated = allAssignmentsTable.foldLeft(initialFold) { (res,element) =>
val allIntersects = element.map {
case (candE, massE) =>
res.map {
case (candR, massR) => candE.intersect(candR), massE * massR
}.toList
}.toList.flatten
val normalizer = allIntersects.groupBy(_._1).filter(_._1.size == 0).map {
case(key, value) => value.foldLeft(0.0)((e,i) => i._2 + e)
}.head
allIntersects.groupBy(_._1).map {
case(key, value) => key -> value.foldLeft(0.0)((e,i) => i._2 + e)
}
}
if I do this: case(candE, massE:Double) then I won't get an error but I will get exception in match.
The problem that you get here:
val aggregated = allAssignmentsTable.foldLeft(initialFold) { (res,element) =>
val allIntersects = element.map {
case (candE, massE) =>
res.map {
case (candR, massR) => candE.intersect(candR), massE * massR
}.toList
}.toList.flatten
is most probably arising from the previous code block:
val allAssignmentsTable = hypothesis.map(h => {
allAssignments.map(copySet => {
if(h.getAssignment.keySet.contains(copySet))
(copySet -> h.getAssignment(copySet))
else
(copySet -> 0.0)
}).toMap
})
My hypothesis is that h.getAssignment(copySet) returns something else instead of Double (which seems to be confirmed by the error message quoted in the OP - (26, 42)etc, neither of these two values look like it is a Double. Therefore, allAssignmentsTable undercover is probably not List[Map[Set[Candidate[A]],Double]] but something else e.g. it has Any instead of Double, therefore operator * cannot be applied.