Constructing phoneVector:
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (p == None) None
else
if (t == None) (p,"Main") else (p,t)
}
).filter(_ != None)
Consider this very simple snippet:
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
//val pKey = pTuple._1.replaceAll("[^\\d]","")
associate() // stub prints "associate"
}
When I run it, I see output like this:
scala.Tuple2
((609) 954-3815,Mobile)
associate
When I uncomment the line with replaceAll(), compile fails:
....scala:57: value _1 is not a member of Product with Serializable
[error] val pKey = pTuple._1.replaceAll("[^\\d]","")
[error] ^
Why does it not recognize pTuple as a Tuple2 and treat it only as Product
OK, this compiles and produces the desired result. But it's too verbose. Can someone please demonstrate a more concise solution for dealing with this typesafe stuff?
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
val pPhone = pTuple match {
case t:Tuple2[_,_] => t._1
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
You can do:
for (pTuple <- phoneVector) {
val pPhone = pTuple match {
case (key, value) => key
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
Or simply phoneVector.map(_._1.replaceAll("[^\\d]",""))
By changing the construction of phoneVector, as wrick's question implied, I've been able to eliminate the match/case stuff because Tuple is assured. Not thrilled by it, but Change is Hard, and Scala seems cool.
Now, it's still possible to slip a None value into either of the Tuple values. My match/case does not check for that, and I suspect that could lead to a runtime error in the replaceAll call. How is that allowed?
def killNS (s:Option[_]) = {
(s match {
case _:Some[_] => s.get
case _ => None
}) match {
case None => None
case "" => None
case s => s
}
}
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (t == None) (p,"Main") else (p,t)
}
).filter(_._1 != None)
println(phoneVector)
println(name)
println
// Create the Neo4j nodes:
for (pTuple <- phoneVector) {
val pPhone = pTuple._1 match { case p:String => p }
val pType = pTuple._2
val pKey = pPhone.replaceAll(",.*","").replaceAll("[^\\d]","")
associate(Map("target"->Map("label"->"Phone","key"->pKey,
"dial"->pPhone),
"relation"->Map("label"->"IS_AT","key"->pType),
"source"->Map("label"->"Person","name"->name)
)
)
}
}
Related
I want to avoid Runtime undefined behaivors as follows:
val jsonExample = Json.toJson(0)
jsonExample.asOpt[Instant]
yield Some(1970-01-01T00:00:00Z)
How can I verify this using partial function with a lift or some other way, to check thatits indeed Instant, or how you recommend to validate?
ex1:
val jsonExample = Json.toJson(Instant.now())
jsonExample match { ... }
ex2:
val jsonExample = Json.toJson(0)
jsonExample match { ... }
Examples for desired output:
validateInstant(Json.toJson(Instant.now())) -> return Some(...)
validateInstant(Json.toJson(0)) -> return None
I can do somthing as follows, maybe some other ideas?
Just wanted to add a note regarding parsing json, there some runtime undefined problems when we are trying to parse .asOpt[T]
for example:
Json.toJson("0").asOpt[BigDecimal] // yields Some(0)
Json.toJson(0).asOpt[Instant] // yields Some(1970-01-01T00:00:00Z)
We can validate it as follows or some other way:
Json.toJson("0") match {
case JsString(value) => Some(value)
case _ => None
}
Json.toJson(0) match {
case JsNumber(value) => Some(value)
case _ => None
}
Json.toJson(Instant.now()) match {
case o # JsString(_) => o.asOpt[Instant]
case _ => None
}
You can use Option:
def filterNumbers[T](value: T)(implicit tjs: Writes[T]): Option[Instant] = {
Option(Json.toJson(value)).filter(_.asOpt[JsNumber].isEmpty).flatMap(_.asOpt[Instant])
}
Then the following:
println(filterNumbers(Instant.now()))
println(filterNumbers(0))
will output:
Some(2021-02-22T10:35:13.777Z)
None
Assume I have a collection of c: List[T] sorted.
And I need to make aggregation with foldLeft/foldRight which outputs (List[T],List[T],List[T]), sorted.
To complete this, I have 2 possibilities:
foldLeft, and then reverse each one of the lists (use ::, otherwise its O(n) each step)
foldRight and remain in the same order (use ::)
I know that foldLeft is tailrec implemented and optimized, but making O(n) more steps to reverse the collections, in this use case - which method will generate me better performance?
Assume that the op is O(1)
Example of the code:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
val (succeed, notApplicable, failed) = actionRules.foldLeft((List[RuleResult](), List[RuleResult](), List[RuleResult]())) { case ( seed # (succeed, notApplicable, failed), actionRule) =>
// Evaluate Single ActionRule
def step(): (List[RuleResult], List[RuleResult], List[RuleResult]) = actionRuleService.eval(actionRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => (RuleResult.fromSucceed(actionRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing #_ :: _, _) => (succeed, RuleResult.fromNotApplicable(actionRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched #_ :: _) => (succeed, notApplicable, RuleResult.fromFailed(actionRule, appConfig.showFailedAppData, unmatched) :: failed)
}
analyzeMode match {
case UntilFirstSucceed => if (succeed.isEmpty) step() else seed
case UntilFirstNotApplicable => if (notApplicable.isEmpty) step() else seed
case UntilFirstFailed => if (failed.isEmpty) step() else seed
case Default => step()
case _ => throw new RuntimeException(s"Unknown mode on analyzePayload with mode = ${analyzeMode}")
}
}
InspectorReport(succeed.reverse, notApplicable.reverse, failed.reverse)
}
Before that, I used tailrec which made bad preformance: (why??)
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
def isCompleted(succeed: Int, notApplicable: Int, failed: Int) = {
analyzeMode match {
case Default => false
case UntilFirstSucceed => if (succeed == 0) false else true
case UntilFirstNotApplicable => if (notApplicable == 0) false else true
case UntilFirstFailed => if (failed == 0) false else true
}
}
#tailrec
def analyzePayloadRec(rules: List[ActionRule])(succeed: List[RuleResult], notApplicable: List[RuleResult], failed: List[RuleResult]): (List[RuleResult], List[RuleResult], List[RuleResult]) = {
if (isCompleted(succeed.size, notApplicable.size, failed.size)) (succeed, notApplicable, failed)
else rules match {
// Base cases:
case Nil => (succeed, notApplicable, failed)
// Evaluate case:
case nextRule :: tail =>
actionRuleService.eval(nextRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => analyzePayloadRec(tail)(RuleResult.fromSucceed(nextRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing #_ :: _, _) => analyzePayloadRec(tail)(succeed, RuleResult.fromNotApplicable(nextRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched #_ :: _) => analyzePayloadRec(tail)(succeed, notApplicable, RuleResult.fromFailed(nextRule, appConfig.showFailedAppData, unmatched) :: failed)
}
}
}
analyzePayloadRec(actionRules.reverse)(Nil, Nil, Nil).toInspectorReport // todo: if the analyzeModes are not Default - consider use Streams for lazy collection
}
Performance:
Until 22:16, this tailrec analyze was in use, then, the above code.
which you can see that generates much better performance - the same data was in use.
The x-axis is timeline and y-axis is time in ms (1.4s)
Any ideas why? tailrec should be faster, that's why I consider using foldRight.
What makes your tailrec version slow is
isCompleted(succeed.size, notApplicable.size, failed.size)
taking the size of a linked list in Scala is linear in the size, and you're computing the size of all three lists (at least two of which you're never using (and you don't use any of them by Default)).
On the nth iteration of the tailrec, you're going to walk n list nodes between the size computations, so n iterations of tailrec is at least O(n^2).
Passing the lists themselves to isCompleted (as you effectively do in the foldLeft version) and checking isEmpty should dramatically speed things up (especially in the Default case).
Assuming we have those case classes:
case class InspectorReport(succeed: List[ActionRule], notApplicable: List[ActionRule], failed: List[ActionRule])
case class ActionRule()
case class EvalResult(success: Boolean, missing: List[String], unmatched: List[String])
And the evaluator:
class ActionRuleService {
def eval(actionRule: ActionRule, payload: JsObject): EvalResult = EvalResult(true, List(), List())
}
val actionRuleService = new ActionRuleService
I would try something like this:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val (success, failed) = evaluated.partition(_._2.success)
val (missing, unmatched) = failed.partition(_._2.missing.nonEmpty)
InspectorReport(success.map(_._1), missing.map(_._1), unmatched.map(_._1))
}
Or:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val success = evaluated.collect {
case (actionRule, EvalResult(true, _, _)) =>
actionRule
}
val missing = evaluated.collect {
case (actionRule, EvalResult(false, missing, _)) if missing.nonEmpty =>
actionRule
}
val unmatched = evaluated.collect {
case (actionRule, EvalResult(false, missing, unmatched)) if missing.isEmpty && unmatched.nonEmpty =>
actionRule
}
InspectorReport(success, missing, unmatched)
}
val date2 = Option(LocalDate.parse("2017-02-01"))
//date1.compareTo(date2)>=0
case class dummy(val prop:Seq[Test])
case class Test(val s :String)
case class Result(val s :String)
val s = "11,22,33"
val t = Test(s)
val dt =Test("2017-02-06")
val list = dummy(Seq(t))
val list2 = dummy(Seq(dt))
val code = Option("22")
val f = date2.flatMap(c => list2
.prop
.find(d=>LocalDate.parse(d.s)
.compareTo(c)>=0))
.map(_ => Result("Found"))
.getOrElse(Result("Not Found"))
code.flatMap(c => list
.prop
.find(_.s.split(",").contains(c)))
.map(_ => Result("Found"))
.getOrElse(Result("Not Found"))
I want to && the conditions below and return Result("Found")/Result("Not Found")
d=>LocalDate.parse(d.s).compareTo(c)>=0)
_.s.split(",").contains(c)
Is there any possible way to achieve the above .In actual scenerio list and list 2 are Future
I tried to make a more realistic example based on Futures. Here is how I would do it:
val date2 = Option(LocalDate.parse("2017-02-01"))
case class Test(s: String)
case class Result(s: String)
val t = Test("11,22,33")
val dt = Test("2017-02-06")
val code = Option("22")
val f1 = Future(Seq(t))
val f2 = Future(Seq(dt))
// Wait for both futures to finish
val futureResult = Future.sequence(Seq(f1, f2)).map {
case Seq(s1, s2) =>
// Check the first part, this will be a Boolean
val firstPart = code.nonEmpty && s1.exists(_.s.split(",").contains(code.get))
// Check the second part, also a boolean
val secondPart = date2.nonEmpty && s2.exists(d => LocalDate.parse(d.s).compareTo(date2.get) >= 0)
// Do the AND logic you wanted
if (firstPart && secondPart) {
Result("Found")
} else {
Result("Not Found")
}
}
// This is just for testing to see we got the correct result
val result = Await.result(futureResult, Duration.Inf)
println(result)
As an aside, your code and date2 values in your example are Options... If this is true in your production code, then we should do a check first to see if they are both defined. If they are not then there would be no need to continue with the rest of the code:
val futureResult = if (date2.isEmpty || code.isEmpty) {
Future.successful(Result("Not Found"))
} else {
Future.sequence(Seq(f1, f2)).map {
case Seq(s1, s2) =>
val firstPart = s1.exists(_.s.split(",").contains(code.get))
val secondPart = s2.exists(d => LocalDate.parse(d.s).compareTo(date2.get) >= 0)
if (firstPart && secondPart) {
Result("Found")
} else {
Result("Not Found")
}
}
}
Use pattern matching on Option instead of using flatMap
e.g.
val x = Some("20")
x match {
case Some(i) => println(i) //do whatever you want to do with the value. And then return result
case None => Result("Not Found")
}
Looking at what you are trying to do, You would have to use pattern matching twice, that too nested one.
Let's say I have the following tuple
(colType, colDocV)
Where colType is a boolean and colDocV is a String
Depending on those two values, I will apply some chunk of code that applies transformations to a Dataframe.
Now, this code works. However, I am not convinced this is the proper way to write functional programming code.
I don't know which of these 3 approaches will improve the quality of the code and remove all if-if else-else :
Should I apply some kind of design pattern and which one?
Should I use some kind of pattern matching?
Should I use some anonymous function?
if (colDocV) {
val newCol = udf(UDFHashCode.udfHashCode).apply(col(columnName))
dataframe.withColumn(columnName, newCol)
} else if (colType.contains("string") || colType.contains("text")) {
val newCol = udf(Entropy.stringEntropyFunc).apply(col(columnName)).cast(DoubleType)
dataframe.withColumn(columnName, newCol)
} else if (colType.contains("date")) {
val newCol = udf(DateUtils.getTimeAsDoubleFunc).apply(col(columnName)).cast(DoubleType)
dataframe.withColumn(columnName, newCol)
} else if (colType.contains("long")) {
dataframe.withColumn(columnName, dataframe(columnName).cast(DoubleType) )
} else {
dataframe.drop(columnName) //Dropping column that cannot be processed
}
You can do this with a match statement and a bunch of regexps.
val str = ".*(?:string|text).*".r
val date = ".*date.*".r
val long = ".*long.*".r
def col(tuple: (Boolean, String)) = tuple match {
case (true, _) => Some(udf(...))
case (_, str()) => Some(udf(...))
case (_, date()) => Some(udf(...))
case (, long()) => Some(udf(...))
case _ => None
}
col(colType -> colDocv)
.fold(dataframe.drop(columnName)) { newCol =>
dataframe.withColumn(columnName, newCol)
}
According to what I understand from your question following can be a solution using match case
def callUdf(colDocV: String, colType: Boolean, dataframe: DataFrame) = (colDocV, colType) match {
case x if (x._1.contains("string") || x._1.contains("text")) => dataframe.withColumn(columnName, udf(Entropy.stringEntropyFunc).apply(col(columnName)).cast(DoubleType))
case x if (x._1.contains("date")) => dataframe.withColumn(columnName, udf(DateUtils.getTimeAsDoubleFunc).apply(col(columnName)).cast(DoubleType))
case x if (x._1.contains("long")) => dataframe.withColumn(columnName, dataframe(columnName).cast(DoubleType) )
case _ => dataframe.drop(columnName)
}
I'm trying to walk through two arrays of potentially different sizes and compose a new array of randomly selected elements from them (for crossover in a genetic algorithm) (childGeneCount is just the length of the longer array).
In the following code snippet, each gene.toString logs, but my code doesn't seem to execute the last log. What dumb thing am I doing?
val genes = for (i <- 0 to childGeneCount) yield {
val gene = if (Random.nextBoolean()) {
if (i < p1genes.length) {
p1genes(i)
} else {
p2genes(i)
}
} else {
if (i < p2genes.length) {
p2genes(i)
} else {
p1genes(i)
}
}
Logger.debug(gene.toString)
gene
}
Logger.debug("crossover finishing - never gets here??")
New to scala, and would be happy for a slap on the wrist accompanied by a "do it this completely different way instead" if appropriate.
You are right, the problem was with "to" should have been "until". I have changed your code a bit to make it more scala like.
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.zipAll(p2genes, None, None)
val matchedGene = for (pair <- genePair) yield {
pair match {
case (p1Gene, None) => p1Gene
case (None, p2Gene) => p2Gene
case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
}
}
println(matchedGene)
The process is:
First zip two dna sequences into one.
Fill the shorter sequence with None.
Now loop over the zipped sequences and populate the new sequence.
Reworked Tawkir's answer, with cleaner None handling:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.map(Some.apply).zipAll(p2genes.map(Some.apply), None, None)
val matchedGene = genePair.map {
case (Some(p1Gene), None) => p1Gene
case (None, Some(p2Gene)) => p2Gene
case (Some(p1Gene), Some(p2Gene)) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)
If you want to avoid wrapping the sequence with Some, another solution is to use a character known not to appear in the sequence as a "none" marker:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val none = '-'
val genePair = p1genes.zipAll(p2genes, none, none)
val matchedGene = genePair.map {
case (p1Gene, `none`) => p1Gene
case (`none`, p2Gene) => p2Gene
case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)
Pretty sure harry0000's answer is correct: I was using "to" like "until", and am so used to exceptions being thrown loudly that I didn't think to look there!
I ended up switching from for/yield to List.tabulate(childGeneCount){ i => {, which fixed the error probably for the same reason.
Since you asked for possible style improvements, here are two suggested implementations. The first one is less idiomatic, but more performant. The second one is prettier but does some more work.
def crossover[E : ClassTag](a: Array[E], b: Array[E]): Array[E] = {
val (larger, smaller) = if(a.length > b.length) (a, b) else (b, a)
val result = Array.ofDim[E](larger.length)
for(i <- smaller.indices)
result(i) = if(Random.nextBoolean()) larger(i) else smaller(i)
for(i <- smaller.length until larger.length)
result(i) = larger(i)
result
}
def crossoverIdiomatic[E : ClassTag](a: Array[E], b: Array[E]): Array[E] = {
val randomPart = (a zip b).map { case (x,y) => if(Random.nextBoolean()) x else y }
val (larger, smaller) = if(a.length > b.length) (a, b) else (b, a)
randomPart ++ larger.drop(smaller.length)
}
val a = Array("1", "2", "3", "4", "5", "6")
val b = Array("one", "two", "three", "four")
// e.g. output: [one,2,three,4,5,6]
println(crossover(a, b).mkString("[", ",", "]"))
println(crossoverIdiomatic(a, b).mkString("[", ",", "]"))
Note that the E : ClassTag are only there to make the compiler happy about using Array[E], if you only need Int for your work, you can drop all the fancy generics.