Need some explanation to understand the map function in scala - scala

def map[U: ClassTag](f: T => U): RDD[U] = withScope {
val cleanF = sc.clean(f)
new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}
This code snippet is from spark 2.2 source code. I am not professional in scala, so just wondering can anyone explain this code in a programmatic perspective? I am not sure what the square bracket after map does. And also refer to https://www.tutorialspoint.com/scala/scala_functions.htm, a scala function should just have curly bracket after '=', but why in this code snippet there is a function named withScope after the '=' sign?

actually, a scala function can have no bracket after "=". eg.
def func():Int = 1
so you can think withScope{} is a function with return type RDD[U],and the map function is to run the withScope function.
let's see the withScope's source code:
private[spark] def withScope[U](body: => U): U = RDDOperationScope.withScope[U](sc)(body)
see, it's a function here. let's go on:
private[spark] def withScope[T](
sc: SparkContext,
allowNesting: Boolean = false)(body: => T): T = {
val ourMethodName = "withScope"
val callerMethodName = Thread.currentThread.getStackTrace()
.dropWhile(_.getMethodName != ourMethodName)
.find(_.getMethodName != ourMethodName)
.map(_.getMethodName)
.getOrElse {
// Log a warning just in case, but this should almost certainly never happen
logWarning("No valid method name for this RDD operation scope!")
"N/A"
}
withScope[T](sc, callerMethodName, allowNesting, ignoreParent = false)(body)
}
let continue with the withScope at the end:
private[spark] def withScope[T](
sc: SparkContext,
name: String,
allowNesting: Boolean,
ignoreParent: Boolean)(body: => T): T = {
// Save the old scope to restore it later
val scopeKey = SparkContext.RDD_SCOPE_KEY
val noOverrideKey = SparkContext.RDD_SCOPE_NO_OVERRIDE_KEY
val oldScopeJson = sc.getLocalProperty(scopeKey)
val oldScope = Option(oldScopeJson).map(RDDOperationScope.fromJson)
val oldNoOverride = sc.getLocalProperty(noOverrideKey)
try {
if (ignoreParent) {
// Ignore all parent settings and scopes and start afresh with our own root scope
sc.setLocalProperty(scopeKey, new RDDOperationScope(name).toJson)
} else if (sc.getLocalProperty(noOverrideKey) == null) {
// Otherwise, set the scope only if the higher level caller allows us to do so
sc.setLocalProperty(scopeKey, new RDDOperationScope(name, oldScope).toJson)
}
// Optionally disallow the child body to override our scope
if (!allowNesting) {
sc.setLocalProperty(noOverrideKey, "true")
}
body
} finally {
// Remember to restore any state that was modified before exiting
sc.setLocalProperty(scopeKey, oldScopeJson)
sc.setLocalProperty(noOverrideKey, oldNoOverride)
}
}
in the end, it execute body parameter, int this case, body equals to
{
val cleanF = sc.clean(f)
new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}
in conclusion, withScope is a Closure takes a function as argument, it first runs some code itself and the run the argument.

Related

How to perofrm a try and catch in ZIO?

My first steps with ZIO, I am trying to convert this readfile function with a compatible ZIO version.
The below snippet compiles, but I am not closing the source in the ZIO version. How do I do that?
def run(args: List[String]) =
myAppLogic.exitCode
val myAppLogic =
for {
_ <- readFileZio("C:\\path\\to\file.csv")
_ <- putStrLn("Hello! What is your name?")
name <- getStrLn
_ <- putStrLn(s"Hello, ${name}, welcome to ZIO!")
} yield ()
def readfile(file: String): String = {
val source = scala.io.Source.fromFile(file)
try source.getLines.mkString finally source.close()
}
def readFileZio(file: String): zio.Task[String] = {
val source = scala.io.Source.fromFile(file)
ZIO.fromTry[String]{
Try{source.getLines.mkString}
}
}
The simplest solution for your problem would be using bracket function, which in essence has similar purpose as try-finally block. It gets as first argument effect that closes resource (in your case Source) and as second effect that uses it.
So you could rewrite readFileZio like:
def readFileZio(file: String): Task[Iterator[String]] =
ZIO(Source.fromFile(file))
.bracket(
s => URIO(s.close),
s => ZIO(s.getLines())
)
Another option is to use ZManaged which is a data type that encapsulates the operation of opening and closing resource:
def managedSource(file: String): ZManaged[Any, Throwable, BufferedSource] =
Managed.make(ZIO(Source.fromFile(file)))(s => URIO(s.close))
And then you could use it like this:
def readFileZioManaged(file: String): Task[Iterator[String]] =
managedSource(file).use(s => ZIO(s.getLines()))

MVar tryPut returns true and isEmpty also returns true

I wrote simple callback(handler) function which i pass to async api and i want to wait for result:
object Handlers {
val logger: Logger = Logger("Handlers")
implicit val cs: ContextShift[IO] =
IO.contextShift(ExecutionContext.Implicits.global)
class DefaultHandler[A] {
val response: IO[MVar[IO, A]] = MVar.empty[IO, A]
def onResult(obj: Any): Unit = {
obj match {
case obj: A =>
println(response.flatMap(_.tryPut(obj)).unsafeRunSync())
println(response.flatMap(_.isEmpty).unsafeRunSync())
case _ => logger.error("Wrong expected type")
}
}
def getResponse: A = {
response.flatMap(_.take).unsafeRunSync()
}
}
But for some reason both tryPut and isEmpty(when i'd manually call onResult method) returns true, therefore when i calling getResponse it sleeps forever.
This is the my test:
class HandlersTest extends FunSuite {
test("DefaultHandler.test") {
val handler = new DefaultHandler[Int]
handler.onResult(3)
val response = handler.getResponse
assert(response != 0)
}
}
Can somebody explain why tryPut returns true, but nothing puts. And what is the right way to use Mvar/channels in scala?
IO[X] means that you have the recipe to create some X. So on your example, yuo are putting in one MVar and then asking in another.
Here is how I would do it.
object Handlers {
trait DefaultHandler[A] {
def onResult(obj: Any): IO[Unit]
def getResponse: IO[A]
}
object DefaultHandler {
def apply[A : ClassTag]: IO[DefaultHandler[A]] =
MVar.empty[IO, A].map { response =>
new DefaultHandler[A] {
override def onResult(obj: Any): IO[Unit] = obj match {
case obj: A =>
for {
r1 <- response.tryPut(obj)
_ <- IO(println(r1))
r2 <- response.isEmpty
_ <- IO(println(r2))
} yield ()
case _ =>
IO(logger.error("Wrong expected type"))
}
override def getResponse: IO[A] =
response.take
}
}
}
}
The "unsafe" is sort of a hint, but every time you call unsafeRunSync, you should basically think of it as an entire new universe. Before you make the call, you can only describe instructions for what will happen, you can't actually change anything. During the call is when all the changes occur. Once the call completes, that universe is destroyed, and you can read the result but no longer change anything. What happens in one unsafeRunSync universe doesn't affect another.
You need to call it exactly once in your test code. That means your test code needs to look something like:
val test = for {
handler <- TestHandler.DefaultHandler[Int]
_ <- handler.onResult(3)
response <- handler.getResponse
} yield response
assert test.unsafeRunSync() == 3
Note this doesn't really buy you much over just using the MVar directly. I think you're trying to mix side effects inside IO and outside it, but that doesn't work. All the side effects need to be inside.

Future not returning anything

trying to fetch result from database and returning the future resultset. But the issue is while accessing future result i am not getting any response.
below is the code snippnet:
def getAll(): Future[Iterable[Employee]] = {
Future{
fetchEmployees()
}(ec)
}
def fetchEmployees(): Iterable[Employee]={
var empList = ListBuffer[Employee]()
db.withConnection{ conn =>
val statement = conn.createStatement()
val rs = statement.executeQuery("Select * from Employee")
while (rs.next()){
println(rs.getString("EmpCode")+" "+rs.getString("FirstName")+" "+rs.getString("LastName"),rs.getString("Department"))
val emp = Employee(rs.getString("EmpCode"),rs.getString("FirstName"),rs.getString("LastName"),rs.getString("Department"))
empList.appended(emp)
}
}
empList
}
this is where trying to access return future object
def findAll: Future[Iterable[EmployeeResource]] = {
println("Inside resource handler")
repository.getAll().map(iterableEmp => {
iterableEmp.foreach(emp => println(s"Name is $emp.firstName"))
iterableEmp.map(emp=>createResource(emp))
})(ec)
}
Prints nothing.
Look at the doc for the appended method -- and note the term, it is not "append" (like in a command), but "appended", like what if...
def appended[B >: A](elem: B): ListBuffer[B]
A copy of this sequence with an element appended.
Your code:
empList.appended(emp)
creates a new ListBuffer, but you discard its result and your initial list buffer is never actually modified. (It is always a good idea to switch on the -Ywarn-value-discard scalac option!)
You need to use the += operator (or the addOne method).

cache using functional callbacks/ proxy pattern implementation scala

How to implement cache using functional programming
A few days ago I came across callbacks and proxy pattern implementation using scala.
This code should only apply inner function if the value is not in the map.
But every time map is reinitialized and values are gone (which seems obivous.
How to use same cache again and again between different function calls
class Aggregator{
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
(t:Int) => {
if (!cache.contains(t)) {
println("Evaluating..."+t)
val r = function.apply(t);
cache.put(t,r)
r
}
else
{
cache.get(t).get;
}
}
}
def memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
}
object Aggregator {
def main( args: Array[String] ) {
val agg = new Aggregator()
agg.memoizedDoubler(2)
agg.memoizedDoubler(2)// It should not evaluate again but does
agg.memoizedDoubler(3)
agg.memoizedDoubler(3)// It should not evaluate again but does
}
I see what you're trying to do here, the reason it's not working is that every time you call memoizedDoubler it's first calling memorize. You need to declare memoizedDoubler as a val instead of def if you want it to only call memoize once.
val memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
This answer has a good explanation on the difference between def and val. https://stackoverflow.com/a/12856386/37309
Aren't you declaring a new Map per invocation ?
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
rather than specifying one per instance of Aggregator ?
e.g.
class Aggregator{
private val cache = HashMap[Int, Int]()
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
To answer your question:
How to implement cache using functional programming
In functional programming there is no concept of mutable state. If you want to change something (like cache), you need to return updated cache instance along with the result and use it for the next call.
Here is modification of your code that follows that approach. function to calculate values and cache is incorporated into Aggregator. When memoize is called, it returns tuple, that contains calculation result (possibly taken from cache) and new Aggregator that should be used for the next call.
class Aggregator(function: Function[Int, Int], cache:Map[Int, Int] = Map.empty) {
def memoize:Int => (Int, Aggregator) = {
t:Int =>
cache.get(t).map {
res =>
(res, Aggregator.this)
}.getOrElse {
val res = function(t)
(res, new Aggregator(function, cache + (t -> res)))
}
}
}
object Aggregator {
def memoizedDoubler = new Aggregator((key:Int) => {
println("Evaluating..." + key)
key*2
})
def main(args: Array[String]) {
val (res, doubler1) = memoizedDoubler.memoize(2)
val (res1, doubler2) = doubler1.memoize(2)
val (res2, doubler3) = doubler2.memoize(3)
val (res3, doubler4) = doubler3.memoize(3)
}
}
This prints:
Evaluating...2
Evaluating...3

What Automatic Resource Management alternatives exist for Scala?

I have seen many examples of ARM (automatic resource management) on the web for Scala. It seems to be a rite-of-passage to write one, though most look pretty much like one another. I did see a pretty cool example using continuations, though.
At any rate, a lot of that code has flaws of one type or another, so I figured it would be a good idea to have a reference here on Stack Overflow, where we can vote up the most correct and appropriate versions.
Chris Hansen's blog entry 'ARM Blocks in Scala: Revisited' from 3/26/09 talks about about slide 21 of Martin Odersky's FOSDEM presentation. This next block is taken straight from slide 21 (with permission):
def using[T <: { def close() }]
(resource: T)
(block: T => Unit)
{
try {
block(resource)
} finally {
if (resource != null) resource.close()
}
}
--end quote--
Then we can call like this:
using(new BufferedReader(new FileReader("file"))) { r =>
var count = 0
while (r.readLine != null) count += 1
println(count)
}
What are the drawbacks of this approach? That pattern would seem to address 95% of where I would need automatic resource management...
Edit: added code snippet
Edit2: extending the design pattern - taking inspiration from python with statement and addressing:
statements to run before the block
re-throwing exception depending on the managed resource
handling two resources with one single using statement
resource-specific handling by providing an implicit conversion and a Managed class
This is with Scala 2.8.
trait Managed[T] {
def onEnter(): T
def onExit(t:Throwable = null): Unit
def attempt(block: => Unit): Unit = {
try { block } finally {}
}
}
def using[T <: Any](managed: Managed[T])(block: T => Unit) {
val resource = managed.onEnter()
var exception = false
try { block(resource) } catch {
case t:Throwable => exception = true; managed.onExit(t)
} finally {
if (!exception) managed.onExit()
}
}
def using[T <: Any, U <: Any]
(managed1: Managed[T], managed2: Managed[U])
(block: T => U => Unit) {
using[T](managed1) { r =>
using[U](managed2) { s => block(r)(s) }
}
}
class ManagedOS(out:OutputStream) extends Managed[OutputStream] {
def onEnter(): OutputStream = out
def onExit(t:Throwable = null): Unit = {
attempt(out.close())
if (t != null) throw t
}
}
class ManagedIS(in:InputStream) extends Managed[InputStream] {
def onEnter(): InputStream = in
def onExit(t:Throwable = null): Unit = {
attempt(in.close())
if (t != null) throw t
}
}
implicit def os2managed(out:OutputStream): Managed[OutputStream] = {
return new ManagedOS(out)
}
implicit def is2managed(in:InputStream): Managed[InputStream] = {
return new ManagedIS(in)
}
def main(args:Array[String]): Unit = {
using(new FileInputStream("foo.txt"), new FileOutputStream("bar.txt")) {
in => out =>
Iterator continually { in.read() } takeWhile( _ != -1) foreach {
out.write(_)
}
}
}
Daniel,
I've just recently deployed the scala-arm library for automatic resource management. You can find the documentation here: https://github.com/jsuereth/scala-arm/wiki
This library supports three styles of usage (currently):
1) Imperative/for-expression:
import resource._
for(input <- managed(new FileInputStream("test.txt")) {
// Code that uses the input as a FileInputStream
}
2) Monadic-style
import resource._
import java.io._
val lines = for { input <- managed(new FileInputStream("test.txt"))
val bufferedReader = new BufferedReader(new InputStreamReader(input))
line <- makeBufferedReaderLineIterator(bufferedReader)
} yield line.trim()
lines foreach println
3) Delimited Continuations-style
Here's an "echo" tcp server:
import java.io._
import util.continuations._
import resource._
def each_line_from(r : BufferedReader) : String #suspendable =
shift { k =>
var line = r.readLine
while(line != null) {
k(line)
line = r.readLine
}
}
reset {
val server = managed(new ServerSocket(8007)) !
while(true) {
// This reset is not needed, however the below denotes a "flow" of execution that can be deferred.
// One can envision an asynchronous execuction model that would support the exact same semantics as below.
reset {
val connection = managed(server.accept) !
val output = managed(connection.getOutputStream) !
val input = managed(connection.getInputStream) !
val writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(output)))
val reader = new BufferedReader(new InputStreamReader(input))
writer.println(each_line_from(reader))
writer.flush()
}
}
}
The code makes uses of a Resource type-trait, so it's able to adapt to most resource types. It has a fallback to use structural typing against classes with either a close or dispose method. Please check out the documentation and let me know if you think of any handy features to add.
Here's James Iry solution using continuations:
// standard using block definition
def using[X <: {def close()}, A](resource : X)(f : X => A) = {
try {
f(resource)
} finally {
resource.close()
}
}
// A DC version of 'using'
def resource[X <: {def close()}, B](res : X) = shift(using[X, B](res))
// some sugar for reset
def withResources[A, C](x : => A #cps[A, C]) = reset{x}
Here are the solutions with and without continuations for comparison:
def copyFileCPS = using(new BufferedReader(new FileReader("test.txt"))) {
reader => {
using(new BufferedWriter(new FileWriter("test_copy.txt"))) {
writer => {
var line = reader.readLine
var count = 0
while (line != null) {
count += 1
writer.write(line)
writer.newLine
line = reader.readLine
}
count
}
}
}
}
def copyFileDC = withResources {
val reader = resource[BufferedReader,Int](new BufferedReader(new FileReader("test.txt")))
val writer = resource[BufferedWriter,Int](new BufferedWriter(new FileWriter("test_copy.txt")))
var line = reader.readLine
var count = 0
while(line != null) {
count += 1
writer write line
writer.newLine
line = reader.readLine
}
count
}
And here's Tiark Rompf's suggestion of improvement:
trait ContextType[B]
def forceContextType[B]: ContextType[B] = null
// A DC version of 'using'
def resource[X <: {def close()}, B: ContextType](res : X): X #cps[B,B] = shift(using[X, B](res))
// some sugar for reset
def withResources[A](x : => A #cps[A, A]) = reset{x}
// and now use our new lib
def copyFileDC = withResources {
implicit val _ = forceContextType[Int]
val reader = resource(new BufferedReader(new FileReader("test.txt")))
val writer = resource(new BufferedWriter(new FileWriter("test_copy.txt")))
var line = reader.readLine
var count = 0
while(line != null) {
count += 1
writer write line
writer.newLine
line = reader.readLine
}
count
}
For now Scala 2.13 has finally supported: try with resources by using Using :), Example:
val lines: Try[Seq[String]] =
Using(new BufferedReader(new FileReader("file.txt"))) { reader =>
Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
or using Using.resource avoid Try
val lines: Seq[String] =
Using.resource(new BufferedReader(new FileReader("file.txt"))) { reader =>
Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
You can find more examples from Using doc.
A utility for performing automatic resource management. It can be used to perform an operation using resources, after which it releases the resources in reverse order of their creation.
I see a gradual 4 step evolution for doing ARM in Scala:
No ARM: Dirt
Only closures: Better, but multiple nested blocks
Continuation Monad: Use For to flatten the nesting, but unnatural separation in 2 blocks
Direct style continuations: Nirava, aha! This is also the most type-safe alternative: a resource outside withResource block will be type error.
There is light-weight (10 lines of code) ARM included with better-files. See: https://github.com/pathikrit/better-files#lightweight-arm
import better.files._
for {
in <- inputStream.autoClosed
out <- outputStream.autoClosed
} in.pipeTo(out)
// The input and output streams are auto-closed once out of scope
Here is how it is implemented if you don't want the whole library:
type Closeable = {
def close(): Unit
}
type ManagedResource[A <: Closeable] = Traversable[A]
implicit class CloseableOps[A <: Closeable](resource: A) {
def autoClosed: ManagedResource[A] = new Traversable[A] {
override def foreach[U](f: A => U) = try {
f(resource)
} finally {
resource.close()
}
}
}
How about using Type classes
trait GenericDisposable[-T] {
def dispose(v:T):Unit
}
...
def using[T,U](r:T)(block:T => U)(implicit disp:GenericDisposable[T]):U = try {
block(r)
} finally {
Option(r).foreach { r => disp.dispose(r) }
}
Another alternative is Choppy's Lazy TryClose monad. It's pretty good with database connections:
val ds = new JdbcDataSource()
val output = for {
conn <- TryClose(ds.getConnection())
ps <- TryClose(conn.prepareStatement("select * from MyTable"))
rs <- TryClose.wrap(ps.executeQuery())
} yield wrap(extractResult(rs))
// Note that Nothing will actually be done until 'resolve' is called
output.resolve match {
case Success(result) => // Do something
case Failure(e) => // Handle Stuff
}
And with streams:
val output = for {
outputStream <- TryClose(new ByteArrayOutputStream())
gzipOutputStream <- TryClose(new GZIPOutputStream(outputStream))
_ <- TryClose.wrap(gzipOutputStream.write(content))
} yield wrap({gzipOutputStream.flush(); outputStream.toByteArray})
output.resolve.unwrap match {
case Success(bytes) => // process result
case Failure(e) => // handle exception
}
More info here: https://github.com/choppythelumberjack/tryclose
Here is #chengpohi's answer, modified so it works with Scala 2.8+, instead of just Scala 2.13 (yes, it works with Scala 2.13 also):
def unfold[A, S](start: S)(op: S => Option[(A, S)]): List[A] =
Iterator
.iterate(op(start))(_.flatMap{ case (_, s) => op(s) })
.map(_.map(_._1))
.takeWhile(_.isDefined)
.flatten
.toList
def using[A <: AutoCloseable, B](resource: A)
(block: A => B): B =
try block(resource) finally resource.close()
val lines: Seq[String] =
using(new BufferedReader(new FileReader("file.txt"))) { reader =>
unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
While Using is OK, I prefer the monadic style of resource composition. Twitter Util's Managed is pretty nice, except for its dependencies and its not-very-polished API.
To that end, I've published https://github.com/dvgica/managerial for Scala 2.12, 2.13, and 3.0.0. Largely based on the Twitter Util Managed code, no dependencies, with some API improvements inspired by cats-effect Resource.
The simple example:
import ca.dvgi.managerial._
val fileContents = Managed.from(scala.io.Source.fromFile("file.txt")).use(_.mkString)
But the real strength of the library is composing resources via for comprehensions.
Let me know what you think!