foreach in Scala parallel collections - scala

I have this code evaluated with Ammonite:
$ amm
Welcome to the Ammonite Repl 2.5.1 (Scala 2.13.8 Java 17.0.1)
# import $ivy.`org.scala-lang.modules::scala-parallel-collections:1.0.4`
# import scala.collection.parallel.CollectionConverters._
# Seq(1,2).foreach { x => Thread sleep x*1000; println(s"Fin $x") };println("Fin")
Fin 1
Fin 2
Fin
It completes OK.
If I add .par to parallelize it, then it never finishes:
# Seq(1,2).par.foreach { x => Thread sleep x*1000; println(s"Fin $x") };println("Fin")
Is this a bug?
With Scala 2.12 I get the same behaviour.

You're having this problem due to a bug in Scala's lambda encoding that also affects the Scala REPL.
The bug on the Scala side: https://github.com/scala/scala-parallel-collections/issues/34
The corresponding ammonite bug report is here: https://github.com/com-lihaoyi/Ammonite/issues/556
You can work around this in two ways that I'm aware of.
The first is to put your parallel work inside of an object e.g.
object work {
  def execute() = {
    Seq(1, 2).par.foreach { x =>
      Thread.sleep(x * 1000); println(s"Fin $x")
    }
    println("Fin")
  }
}
work.execute()
I believe running Ammonite with amm --class-based should also do the trick, but I'm not able to test this right now.

Related

ZIO 1.0.3 changes the way environments work and now http4s Blaze won't run

I'm using ZIO for the first time, and I started with a boilerplate stub from https://github.com/guizmaii/scala-tapir-http4s-zio/blob/master/src/main/scala/example/HttpApp.scala that uses ZIO version 1.0.0-RC17 to set up and run an http4s Blaze server, including Tapir. That worked out nicely, but later I tried to update to version 1.0.3 to stay up to date, and that version is not compatible with the code in this stub. Specifically:
This is the code that defines the server (some unrelated routing lines cut out of the original):
val prog: ZIO[ZEnv, Throwable, Unit] = for {
  conf <- ZIO.effect(ApplicationConf.build().orThrow())
  _ <- putStrLn(conf.toString)
  server = ZIO.runtime[AppEnvironment].flatMap { implicit rts =>
    val apiRoutes = new ApiRoutes[AppEnvironment]()
    val allTapirRoutes = apiRoutes.getRoutes.foldK
    val httpApp: HttpApp[RIO[AppEnvironment, *]] = allTapirRoutes.orNotFound
    val httpAppExtended = Logger.httpApp(logHeaders = true, logBody = true)(httpApp)
    BlazeServerBuilder[ZIO[AppEnvironment, Throwable, *]]
      .bindHttp(conf.port.port.value, conf.server.value)
      .withHttpApp(httpAppExtended)
      .withoutBanner
      .withSocketKeepAlive(true)
      .withTcpNoDelay(true)
      .serve
      .compile[RIO[AppEnvironment, *], RIO[AppEnvironment, *], ExitCode]
      .drain
  }
  prog <- server.provideSome[ZEnv] { currentEnv =>
    new Clock {
      override val clock: Clock.Service[Any] = currentEnv.clock
    }
  }
} yield prog
prog.foldM(h => putStrLn(h.toString).as(1), _ => ZIO.succeed(0))
This is the main body of the run() method. Running this code never results in the app exiting with code 0 because the Blaze server blocks termination, as expected. The problem is this snippet:
prog <- server.provideSome[ZEnv] { currentEnv =>
  new Clock {
    override val clock: Clock.Service[Any] = currentEnv.clock
  }
}
This doesn't work in 1.0.3 because of the introduction of Has[A]. The compiler now complains that you can't inherit from final class Has so you can't invoke a new Clock.
I tried to remedy this by replacing it with
prog = server.provideSomeLayer[ZEnv]
and replacing the exit code ints with ExitCode objects, which made the code compile, but after this the Blaze server did not seem to initialize or prevent termination of the app. It just finished with exit code 0.
Clearly there's something missing here, and I haven't seen any information on the shift from the older environment system to the new system based on Has[A]. How can I fix this boilerplate so that the Blaze server runs again?
If you are interested in a template tapir-zio-http4s project, I suggest using the one from the Tapir repo: https://github.com/softwaremill/tapir/blob/master/examples/src/main/scala/sttp/tapir/examples/ZioExampleHttp4sServer.scala
It is guaranteed to always compile against the latest Tapir (since it's part of the project).
I also used it myself recently, and it worked.

Understanding methods Vs functions in scala

I am learning the difference between methods and functions. I am following this link
http://jim-mcbeath.blogspot.co.uk/2009/05/scala-functions-vs-methods.html
The article says if you compile the following code:
class test {
  def m1(x: Int) = x + 3
  val f1 = (x: Int) => x + 3
}
We should get two files
1. test.class
2. test$$anonfun$1.class
But I do not get it. Secondly the example says if we execute the following command in REPL, we will get the below
scala> val f1 = (x:Int) => x+3
f1: (Int) => Int = <function>
But I get only this
scala> val f1 = (x:Int) => x+3
f1: Int => Int = $$Lambda$1549/1290654769@6d5254f3
Is it because we are using a different version? Please help.
Scala 2.11 and earlier versions behave as shown in the blog post.
The behavior changed in Scala 2.12. Scala now uses the native lambda support added in Java 8, so it no longer needs to emit a separate .class file for each function value. As a result, the .jar files produced by 2.12 are usually much smaller.
A side effect of this is that Scala no longer overrides toString for function values, so you see the JVM's standard toString output for lambdas.
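If you want to see the difference for yourself, here is a small self-contained sketch (the object name is mine, and the exact lambda toString varies by JVM, so only its shape is described):

```scala
object LambdaDemo {
  def m1(x: Int): Int = x + 3            // a method: compiled as a plain JVM method
  val f1: Int => Int = (x: Int) => x + 3 // a function value: a JVM lambda on 2.12+

  def main(args: Array[String]): Unit = {
    // On Scala 2.12+ this prints the JVM's default lambda toString
    // (something like "LambdaDemo$$$Lambda$.../...@..."), not "<function1>"
    println(f1)
    // Eta-expansion turns the method into a function value on demand:
    val g: Int => Int = m1 _
    println(g(4) == f1(4)) // both compute x + 3
  }
}
```

Since 2.12 compiles f1 to an invokedynamic call site rather than an anonymous class, compiling this file no longer produces a LambdaDemo$$anonfun$1.class next to LambdaDemo.class.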

Unpacking Future[Option[MyType]] in Scala

I am new to Scala and Play Framework so I am not quite sure what is wrong. I am trying to unpack a Future[Option[MyType]] given by a Slick DB controler (Play Framework). MyType is called BoundingBox in the code:
def getBoundingBoxByFileName(name: String) = {
  val selectByName = boundingBoxTableQuery.filter { boundingBoxTable =>
    boundingBoxTable.name === name
  }
  db.run(selectByName.result.headOption)
}
BoundingBox type has a field called product_name. To retrieve this field I do the following:
val boundingBoxFutOpt = BoundingBoxQueryActions.getBoundingBoxByFileName("some_file")
val res = for {
  optBb: Option[db.BoundingBox] <- boundingBoxFutOpt
} yield {
  for (bb: db.BoundingBox <- optBb) yield {
    println(s"${bb.product_name}")
  }
}
This code compiles without errors, but it doesn't print anything. If I change the println statement to print some fixed text (not using the bb reference), that is also not printed, so it seems the println statement is never executed.
I'll appreciate some directions on this problem.
It's likely that your program is terminating before the future has a chance to run the println. I think this will get you what you want:
import scala.concurrent.Await
import scala.concurrent.duration.Duration
// your code here
Await.result(res, Duration.Inf)
In your example you're starting work on another thread but not giving it a chance to finish execution; the Await above blocks until the future is complete.
It's worth noting that you shouldn't use Await in production code, as blocking negates the value of having the work run on a separate thread.
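For non-test code, a non-blocking alternative is to attach a callback instead of awaiting. A minimal sketch, where BoundingBox is a stand-in for the question's db.BoundingBox and the future is stubbed with a ready value:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global

final case class BoundingBox(product_name: String) // stand-in for db.BoundingBox

val boundingBoxFutOpt: Future[Option[BoundingBox]] =
  Future.successful(Some(BoundingBox("some_product"))) // stand-in for the db.run call

// foreach on the Future fires only on success; foreach on the Option
// fires only when a row was found, so nothing blocks and nothing throws.
boundingBoxFutOpt.foreach(_.foreach(bb => println(bb.product_name)))
```

Note that in a short-lived script this callback can still race with JVM exit, which is exactly why the Await shown above is handy for quick experiments.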

Deal with Java NIO Iterator in Scala with Try

I recently learned how to use Scala's native Try type to handle errors. One good thing about Try is that I can use it in a for-comprehension and silently ignore errors.
However, this becomes a slight problem with Java's NIO package (which I really want to use).
val p = Paths.get("Some File Path")
for {
  stream <- Try(Files.newDirectoryStream(p))
  file: Path <- stream.iterator()
} yield file.getFileName
This would have been perfect. I intend to get all file names from a directory, and using a DirectoryStream[Path] is the best way because it scales really well. The NIO docs say DirectoryStream has an iterator() method that returns an iterator. In Java, the enhanced for loop can consume it directly:
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
  for (Path file : stream) {
    System.out.println(file.getFileName());
  }
}
However, Scala does not accept this. I was greeted with the error:
[error] /.../DAL.scala:42: value filter is not a member of java.util.Iterator[java.nio.file.Path]
[error] file:Path <- stream.iterator
I tried to use JavaConverters, which is documented to handle the Java Iterator type (scala.collection.Iterator <=> java.util.Iterator), but when I call stream.iterator().asScala, the method is not found.
What should I do? How do I write nice Scala code while still using NIO package?
I don't quite get why filter is being invoked in this for comprehension, but note that stream.iterator() returns an Iterator[Path], not a Path. My IDE thinks it can apply map to it, but in truth these methods are not defined on java.util.Iterator[java.nio.file.Path], as the compiler confirms:
scala> for {
| stream <- Try(Files.newDirectoryStream(p))
| file <- stream.iterator()
| } yield file
<console>:13: error: value map is not a member of java.util.Iterator[java.nio.file.Path]
file <- stream.iterator()
This for comprehension translates to:
Try(Files.newDirectoryStream(p)).flatMap(stream => stream.iterator().map(...))
Where the second map is not defined. One solution can be found in this SO question, but I can't tell you how to use the iterator in the for comprehension here, since a Java iterator cannot be mapped over and I'm not sure you can convert it inside the comprehension.
Edit:
I managed to find out more about the problem, I tried this for comprehension:
for {
  stream <- Try(Files.newDirectoryStream(p))
  file <- stream.iterator().toIterator
} yield file
This doesn't compile because:
found : Iterator[java.nio.file.Path]
required: scala.util.Try[?]
file <- stream.iterator().toIterator
It translates to:
Try(Files.newDirectoryStream(p)).flatMap(stream => stream.iterator().map(...))
But flatMap actually expects a Try back; in fact, this works:
Try(Files.newDirectoryStream(p)).flatMap(stream => Try(stream.iterator().map(...)))
What I came up with:
import java.nio.file.{Files, Paths}
import scala.collection.JavaConversions._
import scala.util.Try

Try(Files.newDirectoryStream(p))
  .map(stream =>
    stream
      .iterator()
      .toIterator
      .toList
      .map(path => path.getFileName.toString)
  ).getOrElse(List())
Which returns a List[String], unfortunately this is far from being as pretty as your for comprehension, maybe somebody else has a better idea.
I really like what Ende Neu wrote, and it's true that NIO is hard to work with from Scala. I want to preserve the efficiency brought by Java's stream, so I decided to write this function instead. It still uses Try, and I only need to deal with the Success and Failure cases :)
It's not as smooth as I'd hoped, and without an equivalent of Java 7's try-with-resources feature I have to close the stream myself (which is terrible...), but it works out.
def readFileNames(filePath: String): Option[List[Path]] = {
  val p = Paths.get(filePath)
  val stream: Try[DirectoryStream[Path]] = Try(Files.newDirectoryStream(p))
  // A mutable buffer is needed here: `listOfFiles :+ next` on an immutable
  // List builds a new list and discards it, leaving listOfFiles empty.
  val listOfFiles = scala.collection.mutable.ListBuffer[Path]()
  stream match {
    case Success(st) =>
      val iterator = st.iterator()
      while (iterator.hasNext) {
        listOfFiles += iterator.next()
      }
    case Failure(ex) => println(s"The file path is incorrect: ${ex.getMessage}")
  }
  stream.map(ds => ds.close())
  if (listOfFiles.isEmpty) None else Some(listOfFiles.toList)
}
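On Scala 2.13+, scala.util.Using provides the try-with-resources behaviour the function above emulates by hand. A sketch under that assumption (the function name mirrors the one above, but the signature returns Try instead of Option so the error isn't swallowed):

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._
import scala.util.{Try, Using}

// Using closes the DirectoryStream whether or not iteration succeeds,
// and captures any exception (bad path, I/O error) as a Failure.
def readFileNames(filePath: String): Try[List[String]] =
  Using(Files.newDirectoryStream(Paths.get(filePath))) { stream =>
    stream.iterator().asScala.map(_.getFileName.toString).toList
  }
```

The caller can then pattern match on Success/Failure as before, or chain further with map/flatMap.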

Concurrent map/foreach in scala

I have a collection vals: Iterable[T] and a long-running function without any relevant side effects: f: (T => Unit). Right now this is applied to vals in the obvious way:
vals.foreach(f)
I would like the calls to f to be done concurrently (within reasonable limits). Is there an obvious function somewhere in the Scala base library? Something like:
Concurrent.foreach(8 /* Number of threads. */)(vals, f)
While f is reasonably long running, it is short enough that I don't want the overhead of invoking a thread for each call, so I am looking for something based on a thread pool.
Many of the answers from 2009 still use the old scala.actors.Futures._, which is no longer present in newer Scala versions. While Akka is the preferred way, a much more readable alternative is to just use parallel (.par) collections:
vals.foreach { v => f(v) }
becomes
vals.par.foreach { v => f(v) }
Alternatively, using parMap can appear more succinct, though with the caveat that you need to remember the usual Scalaz imports. As usual, there's more than one way to do the same thing in Scala!
Scalaz has parMap. You would use it as follows:
import scalaz.Scalaz._
import scalaz.concurrent.Strategy.Naive
This will equip every functor (including Iterable) with a parMap method, so you can just do:
vals.parMap(f)
You also get parFlatMap, parZipWith, etc.
I like the Futures answer. However, while it will execute concurrently, it will also return asynchronously, which is probably not what you want. The correct approach would be as follows:
import scala.actors.Futures._
vals map { x => future { f(x) } } foreach { _() }
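The same "start everything up front, then force each result" shape can be written with the current scala.concurrent API (scala.actors.Futures was removed long ago); a sketch with stand-in vals and f:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

def f(x: Int): Unit = println(x) // stand-in for the long-running function
val vals = List(1, 2, 3)

// map starts all the futures before foreach waits on any of them,
// so the calls run concurrently but the expression only returns
// once every call has finished.
vals.map(x => Future(f(x))).foreach(fut => Await.result(fut, Duration.Inf))
```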
I had some issues using scala.actors.Futures in Scala 2.8 (it was buggy when I checked). Using the Java libraries directly worked for me, though:
final object Parallel {
  val cpus = java.lang.Runtime.getRuntime().availableProcessors
  import java.util.{Timer, TimerTask}
  def afterDelay(ms: Long)(op: => Unit) =
    new Timer().schedule(new TimerTask { override def run = op }, ms)
  def repeat(n: Int, f: Int => Unit) = {
    import java.util.concurrent._
    val e = Executors.newCachedThreadPool // newFixedThreadPool(cpus+1)
    (0 until n).foreach(i => e.execute(new Runnable { def run = f(i) }))
    e.shutdown()
    e.awaitTermination(Long.MaxValue, TimeUnit.SECONDS) // Math.MAX_LONG doesn't exist; Long.MaxValue does
  }
}
I'd use scala.actors.Futures:
vals.foreach(t => scala.actors.Futures.future(f(t)))
The latest release of Functional Java has some higher-order concurrency features that you can use.
import fjs.F._
import fj.control.parallel.Strategy._
import fj.control.parallel.ParModule._
import java.util.concurrent.Executors._
val pool = newCachedThreadPool
val par = parModule(executorStrategy[Unit](pool))
And then...
par.parMap(vals, f)
Remember to shutdown the pool.
You can use the Parallel Collections from the Scala standard library.
They're just like ordinary collections, but their operations run in parallel. You just need to put a par call before you invoke some collections operation.
import scala.collection._
val array = new Array[String](10000)
for (i <- (0 until 10000).par) array(i) = i.toString
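If you want the question's exact shape (a bounded number of threads, blocking until all calls finish), it can also be sketched with plain scala.concurrent Futures on a fixed pool. The helper name concurrentForeach is mine, not from any library:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Applies f to every element with at most `parallelism` concurrent calls,
// blocking the caller until every call has completed.
def concurrentForeach[T](parallelism: Int)(vals: Iterable[T])(f: T => Unit): Unit = {
  val pool = Executors.newFixedThreadPool(parallelism)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
  try {
    // one Future per element, all scheduled on the fixed pool
    Await.result(Future.traverse(vals.toSeq)(x => Future(f(x))), Duration.Inf)
    ()
  } finally pool.shutdown()
}

// e.g. concurrentForeach(8)(vals)(f)
```

This matches the hypothetical Concurrent.foreach(8)(vals, f) from the question: the pool caps concurrency at 8 without spawning a thread per call.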