Have assertion error return respective list position instead of the object? - scala

test("Comparison") {
val list: List[String] = List("Thing", "Entity", "Variable")
val expected: List[String] = List("Thingg", "Entityy", "Variablee")
var expectedPosition = 0
for (item <- list) {
assert(list(expectedPosition) == expected(expectedPosition))
expectedPosition += 1
}
}
In Scala, in order to make my low-level code tests more readable, I thought it would be a good idea to just use one assertion and have it iterate through a loop and increase an accumulator at the end.
This is to test more than one input at a time when the multiple inputs would more or less have similar attributes. When the assertion fails, it comes back as "Thing[]" did not equal "Thing[g]". Instead of it reporting the item in the list that failed, is there a way to get it to directly state the list position without concatenating the list position before the assertion or using a conditional statement that returns the list position? Like I would rather keep it all contained within the assert() error report.

val expected: LazyList[String] = ...
expected.zip(list)
.zipWithIndex
.foreach{case ((exp,itm),idx) =>
assert(exp == itm, s"off at index $idx")
}
//java.lang.AssertionError: assertion failed: off at index 0
expected is lazy so that it is traversed only once even though it gets zipped twice.

Related

Return keyword inside the inline function in Scala

I've heard about to not use Return keyword in Scala, because it might change the flow of the program like;
// this will return only 2 because of return keyword
List(1, 2, 3).map(value => return value * 2)
Here is my case; I've recursive case class, and using DFS to do some calculation on it. So, maximum depth could be 5. Here is the model;
case class Line(
lines: Option[Seq[Line]],
balls: Option[Seq[Ball]],
op: Option[String]
)
I'm using DFS approach to search this recursive model. But at some point, if a special value exist in the data, I want to stop iterating over the data left and return the result directly instead. Here is an example;
Line(
lines = Some(Seq(
Line(None, Some(Seq(Ball(1), Ball(3))), Some("s")),
Line(None, Some(Seq(Ball(5), Ball(2))), Some("d")),
Line(None, Some(Seq(Ball(9))), None)
)),
balls = None,
None
)
In this data, I want to return as like "NOT_OKAY" if I run into the Ball(5), which means I do not need to any operation on Ball(2) and Ball(9) anymore. Otherwise, I will apply a calculation to the each Ball(x) with the given operator.
I'm using this sort of DFS method;
def calculate(line: Line) = {
// string here is the actual result that I want, Boolean just keeps if there is a data that I don't want
def dfs(line: Line): (String, Boolean) = {
line.balls.map{_.map { ball =>
val result = someCalculationOnBall(ball)
// return keyword here because i don't want to iterate values left in the balls
if (result == "NOTREQUIRED") return ("NOT_OKAY", true)
("OKAY", false)
}}.getOrElse(
line.lines.map{_.map{ subLine =>
val groupResult = dfs(subLine)
// here is I'm using return because I want to return the result directly instead of iterating the values left in the lines
if (groupResult._2) return ("NOT_OKAY", true)
("OKAY", false)
}}
)
}
.... rest of the thing
}
In this case, I'm using return keyword in the inline functions, and change the behaviour of the inner map functions completely. I've just read somethings about not using return keyword in Scala, but couldn't make sure this will create a problem or not. Because in my case, I don't want to do any calculation if I run into a value that I don't want to see. Also I couldn't find the functional way to get rid of return keyword.
Is there any side effect like stack exception etc. to use return keyword here? I'm always open to the alternative ways. Thank you so much!

Creating Seq after waiting for all results from map/foreach in Scala

I am trying to loop over inputs and process them to produce scores.
Just for the first input, I want to do some processing that takes a while.
The function ends up returning just the values from the 'else' part. The 'if' part is done executing after the function returns the value.
I am new to Scala and understand the behavior but not sure how to fix it.
I've tried inputs.zipWithIndex.map instead of foreach but the result is the same.
def getscores(
inputs: inputs
): Future[Seq[scoreInfo]] = {
var scores: Seq[scoreInfo] = Seq()
inputs.zipWithIndex.foreach {
case (f, i) => {
if (i == 0) {
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
scores = scores.:+(score)
})
})
} else {
scores = scores.:+(
scoreInfo(
id = "",
score = 5
)
)
}
}
}
Future {
scores
}
}
For what you need, I would drop the mutable variable and replace foreach with map to obtain an immutable list of Futures and recover to handle exceptions, followed by a sequence like below:
def getScores(inputs: Inputs): Future[List[ScoreInfo]] = Future.sequence(
inputs.zipWithIndex.map{ case (input, idx) =>
if (idx == 0)
getGeoScore(input).map(_.getOrElse(defaultScore)).recover{ case e => errorHandling(e) }
else
Future.successful(ScoreInfo("", 5))
})
To capture/print the result, one way is to use onComplete:
getScores(inputs).onComplete(println)
The part your missing is understanding a tricky element of concurrency, and that is that the order of execution when using multiple futures is not guaranteed.
If your block here is long running, it will take a while before appending the score to scores
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
// stick a println("here") in here to see what happens, for demonstration purposes only
scores = scores.:+(score)
})
})
Since that executes concurrently, your getscores function will also simultaneously continue its work iterating over the rest of inputs in your zipWithindex. This iteration, especially since it's trivial work, likely finishes well before the long-running getgeoscore(f) completes the execution of the Future it scheduled, and the code will exit the function, moving on to whatever code is next after you called getscores
val futureScores: Future[Seq[scoreInfo]] = getScores(inputs)
futureScores.onComplete{
case Success(scoreInfoSeq) => println(s"Here's the scores: ${scoreInfoSeq.mkString(",")}"
}
//a this point the call to getgeoscore(f) could still be running and finish later, but you will never know
doSomeOtherWork()
Now to clean this up, since you can run a zipWithIndex on your inputs parameter, I assume you mean it's something like a inputs:Seq[Input]. If all you want to do is operate on the first input, then use the head function to only retrieve the first option, so getgeoscores(inputs.head) , you don't need the rest of the code you have there.
Also, as a note, if using Scala, get out of the habit of using mutable vars, especially if you're working with concurrency. Scala is built around supporting immutability, so if you find yourself wanting to use a var , try using a val and look up how to work with the Scala's collection library to make it work.
In general, that is when you have several concurrent futures, I would say Leo's answer describes the right way to do it. However, you want only the first element transformed by a long running operation. So you can use the future return by the respective function and append the other elements when the long running call returns by mapping the future result:
def getscores(inputs: Inputs): Future[Seq[ScoreInfo]] =
getgeoscore(inputs.head)
.map { optInfo =>
optInfo ++ inputs.tail.map(_ => scoreInfo(id = "", score = 5))
}
So you neither need zipWithIndex nor do you need an additional future or join the results of several futures with sequence. Mapping the future just gives you a new future with the result transformed by the function passed to .map().

how to detect duplicated line using scala akka stream

we have a scala application that read lines from text file and process them using Akka Stream. for better performance we set parallelism to 5. the problem is if the multiple lines contains the same email we only keep one of the line and treated others as duplicated and throw error. I tried to use a java concurrentHashMap to detect duplication but it didn't work, here is my code:
allIdentifiers = new ConcurrentHashMap[String, Int]()
Source(rows)
.mapAsync(config.parallelism.value) {
case (dataRow, index) => {
val eventResendResult: EitherT[Future, NonEmptyList[ResendError], ResendResult] =
for {
cleanedRow <- EitherT.cond[Future](
!allIdentifiers.containsKey(dataRow.lift(emailIndex)), {
allIdentifiers.put(dataRow.lift(emailIndex),index)
dataRow
}, {
NonEmptyList.of(
DuplicatedError(
s"Duplicated record at row $index",
List(identifier)
)
)
}
)
_ = logger.debug(
LoggingMessage(
requestId = RequestId(),
message = s"allIdentifiers: $allIdentifiers"
)
)
... more process step ...
} yield foldResponses(sent)
eventResendResult
.leftMap(errors => ResendResult(errors.toList, List.empty))
.merge
}
}
.runWith(Sink.reduce { (result1: ResendResult, result2: ResendResult) =>
ResendResult(
result1.errors ++ result2.errors,
result1.results ++ result2.results
)
})
we have config.parallelism.value set to 5, means any moment it'll process up to 5 lines concurrently. what I observed is if there are duplicated lines right next to each other, it didn't work, example:
line 0 contains email1
line 1 contains email1
line 2 contains email2
line 3 contains email2
line 4 contains email3
from the log i see the concurrentHashMap was populated with entries, but all lines passed the duplication detect and moved to the next process step.
so Akka Stream's parallelism is not the same thing as java's multithreads? how can i detect duplicated line in this case?
The problem is in the following snippet:
cleanedRow <- EitherT.cond[Future](
!allIdentifiers.containsKey(dataRow.lift(emailIndex)), {
allIdentifiers.put(dataRow.lift(emailIndex),index)
dataRow
}, {
NonEmptyList.of(
DuplicatedError(
s"Duplicated record at row $index",
List(identifier)
)
)
}
)
In particular: imagine two threads simultaneously processing an email which should be deduplicated. It is possible for the following to happen (in order)
The first thread checks containsKey and finds the email is not in the map
The second thread checks containsKey and finds the email is not in the map
The first thread adds the email to the map (based on results from step 1.) and passes the email through
The second thread adds the email to the map (based on results from step 3.) and passes the email through
In other words: you need to atomically check the map for the key and update it. This is a pretty common sort of thing to want, so it is exactly what ConcurrentHashMap's put does: it updates the value at the key and returns the previous value it replaced, if there was one.
I'm not too familiar with the combinators in Cats, so the following might not be idiomatic. However, note how it inserts and checks for a previous value in one atomic step.
cleanedRow <- EitherT(Future.successful {
val previous = allIdentifiers.put(dataRow.lift(emailIndex), index)
Either.cond(
previous != null,
dataRow,
NonEmptyList.of(
DuplicatedError(
s"Duplicated record at row $index",
List(identifier)
)
)
)
})

Groovy/Scala - Abort Early while iterating using an accumulator

I have a fairly large collection that I would like to iterate and find out if the collection contains more than one instance of a particular number. Since the collection is large, i'd like to exit early, i.e not traverse the complete list.
I have a dirty looking piece of code that does this in a non-functional programming way. However, i'm unable to find a functional programming way of doing this (In Groovy or Scala), since I need to do 2 things at the same time.
Accumulate state
Exit Early
The "accumulate state" can be done using the "inject" or "fold" methods in Groovy/Scala but there's no way of exiting early from those methods. Original groovy code is below. Any thoughts?
def collection = [1,2,3,2,4,6,0,65,... 1 million more numbers]
def n = 2
boolean foundMoreThanOnce(List<Integer> collection, Integer n) {
def foundCount = 0
for(Integer i : collection) {
if(i == n) {
foundCount = foundCount + 1
}
if(foundCount > 1) {
return true
}
}
return false
}
print foundMoreThanOnce(collection, n)
One of many possible Scala solutions.
def foundMoreThanOnce[A](collection: Seq[A], target: A): Boolean =
collection.dropWhile(_ != target).indexOf(target,1) > 0
Or a slight variation...
collection.dropWhile(target.!=).drop(1).contains(target)
Scans the collection only until the 2nd target element is found.
Not sure about groovy, but if possible for you to use Java 8 then there is a possibility
collection.stream().filter(z -> {return z ==2;} ).limit(2)
the limit will stop the stream processing as soon as it get 2nd occurrence of 2.
You can use it as below, to ensure there are exact two occurrences
Long occ = collection.stream().filter(z -> {return z ==2;} ).limit(2).count();
if(occ == 2)
return true;

Way to Extract list of elements from Scala list

I have standard list of objects which is used for the some analysis. The analysis generates a list of Strings and i need to look through the standard list of objects and retrieve objects with same name.
case class TestObj(name:String,positions:List[Int],present:Boolean)
val stdLis:List[TestObj]
//analysis generates a list of strings
var generatedLis:List[String]
//list to save objects found in standard list
val lisBuf = new ListBuffer[TestObj]()
//my current way
generatedLis.foreach{i=>
val temp = stdLis.filter(p=>p.name.equalsIgnoreCase(i))
if(temp.size==1){
lisBuf.append(temp(0))
}
}
Is there any other way to achieve this. Like having an custom indexof method that over rides and looks for the name instead of the whole object or something. I have not tried that approach as i am not sure about it.
stdLis.filter(testObj => generatedLis.exists(_.equalsIgnoreCase(testObj.name)))
use filter to filter elements from 'stdLis' per predicate
use exists to check if 'generatedLis' has a value of ....
Don't use mutable containers to filter sequences.
Naive solution:
val lisBuf =
for {
str <- generatedLis
temp = stdLis.filter(_.name.equalsIgnoreCase(str))
if temp.size == 1
} yield temp(0)
if we discard condition temp.size == 1 (i'm not sure it is legal or not):
val lisBuf = stdLis.filter(s => generatedLis.exists(_.equalsIgnoreCase(s.name)))