Lazy filter for one element - scala

I'm refactoring some scala code to teach my coworkers about for-comprehensions, and I've got a line like:
for {
// ...
result <- components.collectFirst({ case section if section.startsWith(DESIRED_SUBSTRING) => section.substring(section.indexOf(DELIM) + 1).trim() == "true" })
} yield result
That's a bit long.
At first, I wished I could just skip the result <- ... followed by the immediate yield, as I can in Haskell, but then I noticed the processing going on inside collectFirst.
So I thought it'd be much easier to read as I should better do this as
for {
// ...
section <- components.filter(_.startsWith(DESIRED_SUBSTRING)).headOption
} yield section.substring(section.indexOf(DELIM) + 1).trim() == "true"
Which works, but it is less efficient, since filter has to process all the elements. I'd like to be able to use a lazy filter:
components.withFilter(_.startsWith(DESIRED_SUBSTRING)).headOption
But FilterMonadic doesn't seem to support headOption, and I can't figure out a way to derive it from the operations it does support. I'm sure there's a way with flatMap and some bf, but I'm too unfamiliar with the scala ecosystem at the moment.
If I want to stick with standard library tricks, am I stuck with
for {
// ...
section <- components.collectFirst({ case section if section.startsWith(DESIRED_SUBSTRING) => section })
} yield section.substring(section.indexOf(DELIM) + 1).trim() == "true"
Or is there something better I can use?

If you use components.find(_.startsWith(DESIRED_SUBSTRING)) that will give you an Option with the first element that meets the condition. Then, you can just map over it with any subsequent processing you need.

Related

Filtering a collection of IO's: List[IO[Page]] scala

I am refactoring a scala http4s application to remove some pesky side effects causing my app to block. I'm replacing .unsafeRunSync with cats.effect.IO. The problem is as follows:
I have 2 lists: alreadyAccessible: IO[List[Page]] and pages: List[Page]
I need to filter out the pages that are not contained in alreadyAccessible.
Then map over the resulting list to "grant Access" in the database to these pages. (e.g. call another method that hits the database and returns an IO[Page].
val addable: List[Page] = pages.filter(p => !alreadyAccessible.contains(p))
val added: List[Page] = addable.map((p: Page) => {
pageModel.grantAccess(roleLst.head.id, p.id) match {
case Right(p) => p
}
})
This is close to what I want; However, it does not work because filter requires a function that returns a Boolean but alreadyAccessible is of type IO[List[Page]] which precludes you from removing anything from the IO monad. I understand you can't remove data from the IO so maybe transform it:
val added: List[IO[Page]] = for(page <- pages) {
val granted = alreadyAccessible.flatMap((aa: List[Page]) => {
if (!aa.contains(page))
pageModel.grantAccess(roleLst.head.id, page.id) match { case Right(p) => p }
else null
})
} yield granted
this unfortunately does not work with the following error:
Error:(62, 7) ';' expected but 'yield' found.
} yield granted
I think because I am somehow mistreating the for comprehension syntax, I just don't understand why I cannot do what I'm doing.
I know there must be a straight forward solution to such a problem, so any input or advice is greatly appreciates. Thank you for your time in reading this!
granted is going to be an IO[List[Page]]. There's no particular point in having IO inside anything else unless you truly are going to treat the actions like values and reorder them/filter them etc.
val granted: IO[List[Page]] = for {
How do you compute it? Well, the first step is to execute alreadyAccessible to get the actual list. In fact, alreadyAccessible is misnamed. It is not the list of accessible pages; it is an action that gets the list of accessible pages. I would recommend you rename it getAlreadyAccessible.
alreadyAccessible <- getAlreadyAccessible
Then you filter pages with it
val required = pages.filterNot(alreadyAccessible.contains)
Now, I cannot decipher what you're doing to these pages. I'm just going to assume you have some kind of function grantAccess: Page => IO[Page]. If you map this function over required, you will get a List[IO[Page]], which is not desirable. Instead, we should traverse with grantAccess, which will produce a IO[List[Page]] that executes each IO[Page] and then assembles all the results into a List[Page].
granted <- required.traverse(grantAccess)
And we're done
} yield granted

Creating Seq after waiting for all results from map/foreach in Scala

I am trying to loop over inputs and process them to produce scores.
Just for the first input, I want to do some processing that takes a while.
The function ends up returning just the values from the 'else' part. The 'if' part is done executing after the function returns the value.
I am new to Scala and understand the behavior but not sure how to fix it.
I've tried inputs.zipWithIndex.map instead of foreach but the result is the same.
def getscores(
inputs: inputs
): Future[Seq[scoreInfo]] = {
var scores: Seq[scoreInfo] = Seq()
inputs.zipWithIndex.foreach {
case (f, i) => {
if (i == 0) {
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
scores = scores.:+(score)
})
})
} else {
scores = scores.:+(
scoreInfo(
id = "",
score = 5
)
)
}
}
}
Future {
scores
}
}
For what you need, I would drop the mutable variable and replace foreach with map to obtain an immutable list of Futures and recover to handle exceptions, followed by a sequence like below:
def getScores(inputs: Inputs): Future[List[ScoreInfo]] = Future.sequence(
inputs.zipWithIndex.map{ case (input, idx) =>
if (idx == 0)
getGeoScore(input).map(_.getOrElse(defaultScore)).recover{ case e => errorHandling(e) }
else
Future.successful(ScoreInfo("", 5))
})
To capture/print the result, one way is to use onComplete:
getScores(inputs).onComplete(println)
The part your missing is understanding a tricky element of concurrency, and that is that the order of execution when using multiple futures is not guaranteed.
If your block here is long running, it will take a while before appending the score to scores
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
// stick a println("here") in here to see what happens, for demonstration purposes only
scores = scores.:+(score)
})
})
Since that executes concurrently, your getscores function will also simultaneously continue its work iterating over the rest of inputs in your zipWithindex. This iteration, especially since it's trivial work, likely finishes well before the long-running getgeoscore(f) completes the execution of the Future it scheduled, and the code will exit the function, moving on to whatever code is next after you called getscores
val futureScores: Future[Seq[scoreInfo]] = getScores(inputs)
futureScores.onComplete{
case Success(scoreInfoSeq) => println(s"Here's the scores: ${scoreInfoSeq.mkString(",")}"
}
//a this point the call to getgeoscore(f) could still be running and finish later, but you will never know
doSomeOtherWork()
Now to clean this up, since you can run a zipWithIndex on your inputs parameter, I assume you mean it's something like a inputs:Seq[Input]. If all you want to do is operate on the first input, then use the head function to only retrieve the first option, so getgeoscores(inputs.head) , you don't need the rest of the code you have there.
Also, as a note, if using Scala, get out of the habit of using mutable vars, especially if you're working with concurrency. Scala is built around supporting immutability, so if you find yourself wanting to use a var , try using a val and look up how to work with the Scala's collection library to make it work.
In general, that is when you have several concurrent futures, I would say Leo's answer describes the right way to do it. However, you want only the first element transformed by a long running operation. So you can use the future return by the respective function and append the other elements when the long running call returns by mapping the future result:
def getscores(inputs: Inputs): Future[Seq[ScoreInfo]] =
getgeoscore(inputs.head)
.map { optInfo =>
optInfo ++ inputs.tail.map(_ => scoreInfo(id = "", score = 5))
}
So you neither need zipWithIndex nor do you need an additional future or join the results of several futures with sequence. Mapping the future just gives you a new future with the result transformed by the function passed to .map().

Groovy/Scala - Abort Early while iterating using an accumulator

I have a fairly large collection that I would like to iterate and find out if the collection contains more than one instance of a particular number. Since the collection is large, i'd like to exit early, i.e not traverse the complete list.
I have a dirty looking piece of code that does this in a non-functional programming way. However, i'm unable to find a functional programming way of doing this (In Groovy or Scala), since I need to do 2 things at the same time.
Accumulate state
Exit Early
The "accumulate state" can be done using the "inject" or "fold" methods in Groovy/Scala but there's no way of exiting early from those methods. Original groovy code is below. Any thoughts?
def collection = [1,2,3,2,4,6,0,65,... 1 million more numbers]
def n = 2
boolean foundMoreThanOnce(List<Integer> collection, Integer n) {
def foundCount = 0
for(Integer i : collection) {
if(i == n) {
foundCount = foundCount + 1
}
if(foundCount > 1) {
return true
}
}
return false
}
print foundMoreThanOnce(collection, n)
One of many possible Scala solutions.
def foundMoreThanOnce[A](collection: Seq[A], target: A): Boolean =
collection.dropWhile(_ != target).indexOf(target,1) > 0
Or a slight variation...
collection.dropWhile(target.!=).drop(1).contains(target)
Scans the collection only until the 2nd target element is found.
Not sure about groovy, but if possible for you to use Java 8 then there is a possibility
collection.stream().filter(z -> {return z ==2;} ).limit(2)
the limit will stop the stream processing as soon as it get 2nd occurrence of 2.
You can use it as below, to ensure there are exact two occurrences
Long occ = collection.stream().filter(z -> {return z ==2;} ).limit(2).count();
if(occ == 2)
return true;

How does the following Java "continue" code translate to Scala?

for (String stock : allStocks) {
Quote quote = getQuote(...);
if (null == quoteLast) {
continue;
}
Price price = quote.getPrice();
if (null == price) {
continue;
}
}
I don't necessarily need a line by line translation, but I'm looking for the "Scala way" to handle this type of problem.
You don't need continue or breakable or anything like that in cases like this: Options and for comprehensions do the trick very nicely,
val stocksWithPrices =
for {
stock <- allStocks
quote <- Option(getQuote(...))
price <- Option(quote.getPrice())
} yield (stock, quote, price);
Generally you try to avoid those situations to begin with by filtering before you even start:
val goodStocks = allStocks.view.
map(stock => (stock, stock.getQuote)).filter(_._2 != null).
map { case (stock, quote) => (stock,quote, quote.getPrice) }.filter(_._3 != null)
(this example showing how you'd carry along partial results if you need them). I've used a view so that results will be computed as-needed, instead of creating a bunch of new collections at each step.
Actually, you'd probably have the quotes and such return options--look around on StackOverflow for examples of how to use those instead of null return values.
But, anyway, if that sort of thing doesn't work so well (e.g. because you are generating too many intermediate results that you need to keep, or you are relying on updating mutable variables and you want to keep the evaluation pattern simple so you know what's happening when) and you can't conceive of the problem in a different, possibly more robust way, then you can
import scala.util.control.Breaks._
for (stock <- allStocks) {
breakable {
val quote = getQuote(...)
if (quoteLast eq null) break;
...
}
}
The breakable construct specifies where breaks should take you to. If you put breakable outside a for loop, it works like a standard Java-style break. If you put it inside, it acts like continue.
Of course, if you have a very small number of conditions, you don't need the continue at all; just use the else of the if-statement.
Your control structure here can be mapped very idiomatically into the following for loop, and your code demonstrates the kind of filtering that Scala's for loop was designed for.
for {stock <- allStocks.view
quote = getQuote(...)
if quoteLast != null
price = quote.getPrice
if null != price
}{
// whatever comes after all of the null tests
}
By the way, Scala will automatically desugar this into the code from Rex Kerr's solution
val goodStocks = allStocks.view.
map(stock => (stock, stock.getQuote)).filter(_._2 != null).
map { case (stock, quote) => (stock,quote, quote.getPrice) }.filter(_._3 != null)
This solution probably doesn't work in general for all different kinds of more complex flows that might use continue, but it does address a lot of common ones.
If the focus is really on the continue and not on the null handling, just define an inner method (the null handling part is a different idiom in scala):
def handleStock(stock: String): Unit {
val quote = getQuote(...)
if (null == quoteLast) {
return
}
val price = quote.getPrice();
if (null == price) {
return
}
}
for (stock <- allStocks) {
handleStock(stock)
}
The simplest way is to embed the skipped-over code in an if with reversed-sense to what you have.
See http://www.scala-lang.org/node/257

Preferred way of processing this data with parallel arrays

Imagine a sequence of java.io.File objects. The sequence is not in any particular order, it gets populated after a directory traversal. The names of the files can be like this:
/some/file.bin
/some/other_file_x1.bin
/some/other_file_x2.bin
/some/other_file_x3.bin
/some/other_file_x4.bin
/some/other_file_x5.bin
...
/some/x_file_part1.bin
/some/x_file_part2.bin
/some/x_file_part3.bin
/some/x_file_part4.bin
/some/x_file_part5.bin
...
/some/x_file_part10.bin
Basically, I can have 3 types of files. First type is the simple ones, which only have a .bin extension. The second type of file is the one formed from _x1.bin till _x5.bin. And the third type of file can be formed of 10 smaller parts, from _part1 till _part10.
I know the naming may be strange, but this is what I have to work with :)
I want to group the files together ( all the pieces of a file should be processed together ), and I was thinking of using parallel arrays to do this. The thing I'm not sure about is how can I perform the reduce/acumulation part, since all the threads will be working on the same array.
val allBinFiles = allBins.toArray // array of java.io.File
I was thinking of handling something like that:
val mapAcumulator = java.util.Collections.synchronizedMap[String,ListBuffer[File]](new java.util.HashMap[String,ListBuffer[File]]())
allBinFiles.par.foreach { file =>
file match {
// for something like /some/x_file_x4.bin nameTillPart will be /some/x_file
case ComposedOf5Name(nameTillPart) => {
mapAcumulator.getOrElseUpdate(nameTillPart,new ListBuffer[File]()) += file
}
case ComposedOf10Name(nameTillPart) => {
mapAcumulator.getOrElseUpdate(nameTillPart,new ListBuffer[File]()) += file
}
// simple file, without any pieces
case _ => {
mapAcumulator.getOrElseUpdate(file.toString,new ListBuffer[File]()) += file
}
}
}
I was thinking of doing it like I've shown in the above code. Having extractors for the files, and using part of the path as key in the map. Like for example, /some/x_file can hold as values /some/x_file_x1.bin to /some/x_file_x5.bin. I also think there could be a better way of handling this. I would be interested in your opinions.
The alternative is to use groupBy:
val mp = allBinFiles.par.groupBy {
case ComposedOf5Name(x) => x
case ComposedOf10Name(x) => x
case f => f.toString
}
This will return a parallel map of parallel arrays of files (ParMap[String, ParArray[File]]). If you want a sequential map of sequential sequences of files from this point:
val sqmp = mp.map(_.seq).seq
To ensure that the parallelism kicks in, make sure you have enough elements in you parallel array (10k+).