Try to understand Kafka Scala Syntax - scala

I have been going through the source code of the Log class in the core module of the Kafka project, but I am still new to Scala.
I have encountered some syntax that is quite hard to understand.
Here are the code snippets:
snippet 1:
// Now do a second pass and load all the log and index files.
// We might encounter legacy log segments with offset overflow (KAFKA-6264). We need to split such segments. When
// this happens, restart loading segment files from scratch.
retryOnOffsetOverflow {
// In case we encounter a segment with offset overflow, the retry logic will split it after which we need to retry
// loading of segments. In that case, we also need to close all segments that could have been left open in previous
// call to loadSegmentFiles().
logSegments.foreach(_.close())
segments.clear()
loadSegmentFiles()
}
snippet2:
private[log] def retryOnOffsetOverflow[T](fn: => T): T = {
while (true) {
try {
return fn // what is fn here in the context of code snippet 1?
} catch {
case e: LogSegmentOffsetOverflowException =>
info(s"Caught segment overflow error: ${e.getMessage}. Split segment and retry.")
splitOverflowedSegment(e.segment) // returns a List[Segment], but where does this return value go?
}
}
throw new IllegalStateException()
}
What I find hard to understand is how the method retryOnOffsetOverflow is called in snippet 1, and what is passed to it as the argument for its parameter. I know the parameter of retryOnOffsetOverflow is a function, but in this snippet, what is the argument passed to it?
I am also not clear on what retryOnOffsetOverflow returns here. The return type is T, which is some kind of generic? Will the returned value differ depending on whether the exception was caught? If so, what exactly is returned in each case?
Thanks a lot for any explanation, and please tell me if I have left out code needed to answer the question.
Update: I stand corrected: the parameter of retryOnOffsetOverflow is a by-name parameter, which won't be evaluated unless and until it is referenced somewhere in the body of the method.

Update: I slightly changed the last part, as it looks like the split files are loaded right in the next loop iteration.
The argument to retryOnOffsetOverflow here is everything inside the curly braces, i.e. those three lines of code. That block is what the parameter fn: => T accepts.
retryOnOffsetOverflow tries to evaluate the block that was passed and returns its result if the evaluation succeeds. One part that's a bit difficult to understand: when there's an exception, splitOverflowedSegment is called not for its return value but for its side effect, since it mutates state in the replaceSegments function. The new segments are then read when the loop restarts, in loadSegmentFiles.

What I find hard to understand is how the method retryOnOffsetOverflow is called in snippet 1, and what is passed to it as the argument for its parameter. I know the parameter of retryOnOffsetOverflow is a function, but in this snippet, what is the argument passed to it?
Consider following simplified example
def printTillSuccess[T](fn: => T): T = {
while (true) {
val result = fn
println(result)
return result
}
throw new Exception("dead end")
}
printTillSuccess("a")
printTillSuccess({ "a" })
printTillSuccess { "a" }
fn: => T is not a function but a by-name parameter. It is evaluated on each reference, i.e. at the line val result = fn. Functions in Scala have an apply method, and this is not the case here.
You can pass values into a method via (), and this is done in example printTillSuccess("a").
Scala allows wrapping any block of code with {} and the last statement will be used as result of the block. Thus, {"a"} is same as "a".
So, now you can pass {"a"} into methods, thus printTillSuccess({ "a" }) is a valid call.
And finally, Scala allows for substitution of () with block definitions {} in methods and this opens syntax printTillSuccess { "a" }.
Also I am not clear what is the return of retryOnOffsetOverflow here?
The return is T which is kind of a generic? I am not sure what is the
return of retryOnOffsetOverflow here, will it be different according
to fact that it caught the exception or not? If so what will exactly
be the return respectively?
The return type is T, the type of the by-name parameter. T is inferred at the call site from the type of the block that is passed in, and return fn is the only statement that produces the result.
In case of a LogSegmentOffsetOverflowException, the catch block runs and splitOverflowedSegment is executed. This mutates some internal state, and the next iteration of while (true) evaluates the by-name parameter again. So the exception does not change the return type; it only allows the next iteration to happen.
The while loop can only exit when fn evaluates successfully or a different exception is thrown. In any case, the returned value, if there is one, will be of type T.
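To make the retry behaviour concrete, here is a minimal sketch of the same shape. OverflowException, needsSplit, attempts and load are hypothetical stand-ins invented for illustration; they are not part of the Kafka code base.

```scala
object RetrySketch {
  // Hypothetical stand-in for LogSegmentOffsetOverflowException.
  final class OverflowException(msg: String) extends RuntimeException(msg)

  // Hypothetical mutable state standing in for the log's segments.
  var needsSplit: Boolean = true
  var attempts: Int = 0

  // Same shape as retryOnOffsetOverflow: fn is a by-name parameter.
  def retryOnOverflow[T](fn: => T): T = {
    while (true) {
      try {
        return fn // evaluates the caller's block; its value becomes the result
      } catch {
        case _: OverflowException =>
          // Stand-in for the "split": only the side effect matters here.
          needsSplit = false
      }
    }
    throw new IllegalStateException()
  }

  // The block in braces is the argument; it is re-evaluated on every retry.
  def load(): String = retryOnOverflow {
    attempts += 1
    if (needsSplit) throw new OverflowException("offset overflow")
    "loaded"
  }
}
```

In this sketch the first evaluation of the block throws, the catch branch flips the flag, and the second evaluation succeeds, so T is String and the block runs twice.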

def retryOnOffsetOverflow[T](fn: => T): T
retryOnOffsetOverflow() is a method that takes a single "by name" parameter, which means that the parameter won't be evaluated unless and until it is referenced somewhere in the body of the method.
So fn might be a single value and it might be multiple lines of code, and it will remain un-evaluated (un-executed) until here try { return fn }, where it is executed wrapped in a try.
The fn evaluation will result in value of some type. It doesn't really matter what that type is (we'll call it T) but retryOnOffsetOverflow() is required to return the same type.
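A small sketch of that evaluation rule, counting how often the argument expression actually runs. The names evaluations, expensive, never, once and twice are made up for this illustration.

```scala
object ByNameSketch {
  var evaluations: Int = 0

  // Hypothetical expression with a visible side effect.
  def expensive(): Int = { evaluations += 1; 42 }

  // fn is never referenced, so the argument is never evaluated.
  def never[T](fn: => T): String = "fn was not referenced"

  // fn is referenced once, so the argument is evaluated once.
  def once[T](fn: => T): T = fn

  // fn is referenced twice, so the argument is evaluated twice.
  def twice[T](fn: => T): (T, T) = (fn, fn)
}
```

Calling ByNameSketch.never(ByNameSketch.expensive()) leaves the counter untouched, while once and twice bump it by one and two respectively.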

Related

Scala Function Currying and Call-By-Name Functions, Generic Types

I am a bit new to Scala currying and call-by-name functions, and I am having difficulty understanding the syntax. What is the flow of this function, why does it need to return f(results), and what function is applied to it afterwards?
def withScan[R](table: Table, scan: Scan)(f: (Seq[Result]) => R): R = {
var resultScanner: ResultScanner = null
try {
resultScanner = table.getScanner(scan)
val it: util.Iterator[Result] = resultScanner.iterator()
val results: mutable.ArrayBuffer[Result] = ArrayBuffer()
while (it.hasNext) {
results += it.next()
}
f(results)
} finally {
if (resultScanner != null)
resultScanner.close()
}
}
Let's look at just the function signature
def withScan[R](table: Table, scan: Scan)(f: (Seq[Result]) => R): R
Firstly, ignore the fancy currying syntax for now as you can always rewrite a curried function into a normal function by putting all the parameters in one parameter list i.e.
def withScan[R](table: Table, scan: Scan, f: Seq[Result] => R): R
Secondly, notice the last parameter is a function on its own and we don't know what it does yet. withScan will take a function somebody gives it and use that function on something. We might be interested in why someone needs such a function. Since we need to deal with a lot of resources that need to be opened and closed properly, such as File, DatabaseConnection, Socket, ..., we would otherwise repeat the code that closes the resources or, even worse, forget to close them. Hence we want to factor the boring common code out to give you a convenient function: if you use withScan to access the table, we will somehow give you the Result so that you can work on that, and we will also make sure to close the resources properly for you so that you can just focus on the interesting operation. This is called the "loan pattern".
Now let's go back to the currying syntax. Although currying has other interesting use cases, I believe the reason it is written in this style is that in Scala you can use a curly-brace block to pass the parameter to the function, i.e. one can use the function above like this:
withScan(myTable, myScan) { results =>
//do whatever you want with the results
}
This looks just like a built-in control flow construct such as if-else or a for loop!
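For anyone who wants to try the loan pattern without HBase on the classpath, here is a minimal sketch. FakeScanner and withRows are hypothetical names made up for this example; only the overall shape mirrors withScan.

```scala
object LoanSketch {
  // Hypothetical resource standing in for ResultScanner.
  final class FakeScanner(rows: Seq[String]) {
    var closed: Boolean = false
    def readAll(): Seq[String] = rows
    def close(): Unit = { closed = true }
  }

  // Lends the rows to f and always closes the scanner afterwards,
  // even if f throws.
  def withRows[R](scanner: FakeScanner)(f: Seq[String] => R): R =
    try f(scanner.readAll())
    finally scanner.close()
}
```

A call then reads like a control structure, for example LoanSketch.withRows(scanner) { rows => rows.size }, and the caller never has to remember to call close().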
If I understand this properly, this is a function that takes some Table (probably a database table) and scans it using the scan argument. After the data has been collected by the relevant scanner, the method just maps the collected sequence to an object of type R.
The function f is used for that mapping.
You can use this function:
val list: List[Result] = withScan(table, scanner)(results => results.toList)
Or
val data: ObjectWhichKeepAllData = withScan(table, scanner)(results => ObjectWhichKeepAllData(results))
IMHO, this is not very well-written code, and I feel it would be better to do the mapping outside this function. Let the client do the mapping (which, BTW, should be done for every single result) and leave only the scanning to this function.
This is an example of a higher-order function: a function which takes another function as a parameter.
The function appears to do the following:
- opens the passed in table with the passed in scanner
- parses the table with an iterator, populating entries in a local ArrayBuffer
- calls a function, passed in by the caller, on the sequence of entries that have been parsed.
The function parameter allows this function to be used to carry out any operation on the scanned information, depending on the function passed in.
The function prototype could equally have been declared:
def withScan[R](table: Table, scan: Scan, f: (Seq[Result]) => R): R = {
The function has been declared with two argument lists; this is an example of currying. This is a benefit when calling the function, as it allows the method to be called with a clearer syntax.
Consider a function that might be passed into this function:
def getHighestValueEntry(results: Seq[Result]): R = {
Without currying, the function would be called like this:
withScan[R](table, scan, results => getHighestValueEntry(results))
With currying the function can be called in a manner that makes the function parameter stand out more clearly. This is helped by the ability in Scala to use curly braces instead of parentheses to surround the arguments to a function, if you are only passing in one argument:
withScan(table, scan) { results =>
getHighestValueEntry(results) }

Scala - Method Parameters - def function: Row => Message = { row => {

Coming from Java background trying to understand this Scala code:
def function: Row => Message = {
row => {
// code
// code
}
}
As I understand we pass a function that returns type Message? And then we actually implement row? Why first Row is capital and second is not?
Thanks.
Let's break it down.
def function
Declare a method named function that has no inputs.
:Row => Message
The return type is a function that takes a Row as input and returns a Message.
= row => {...}
Define an anonymous function with a single input named row. This is the function that is returned (in Scala the last thing in a block is returned, so you don't need to use the return keyword). Scala is able to figure out what the input and output types of this function should be because they have to match the return type you declared for the method.
As I understand we pass a function that returns type Message?
No. Nothing is being passed. The method function doesn't have any parameters, it doesn't take any arguments, so you can't pass any.
And then we actually implement row?
No. Nothing is being implemented. There are no abstract members or abstract interfaces here to implement.
Why first Row is capital and second is not?
It doesn't have to be. It's just a convention. Types are usually written in PascalCase, parameters, fields, and methods in camelCase. (Actually, it's the exact same thing in Java.)
A rough Java equivalent would look something like this:
java.util.function.Function<Row, Message> function() {
return row -> {
// code
// code
};
}
As you can see, there really isn't much difference between the two.
The return type is a function that takes type Row and returns Message. There should be some higher level implementation of that function.
Check this link, it will help you understand:
https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch09s08.html
Not quite.
def means that you are defining a method. In this case it takes no argument and its return type is Row => Message.
So the function method you are defining returns a function that takes a Row and returns a Message.
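A runnable sketch of the same idea, with made-up Row and Message case classes standing in for the real types from the question (their fields are assumptions for illustration only):

```scala
object FunctionResultSketch {
  // Hypothetical stand-ins for the Row and Message types in the question.
  final case class Row(id: Int)
  final case class Message(text: String)

  // A parameterless method whose result is a function value.
  def function: Row => Message = { row =>
    Message(s"row ${row.id}")
  }

  // Calling the method yields the function; nothing is passed in.
  val convert: Row => Message = function

  // Applying the returned function to a Row yields a Message.
  val message: Message = convert(Row(7))
}
```

Note the two separate steps: function (no arguments) produces the Row => Message value, and only that value is then applied to a Row.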

Scala no argument string function vs typed String parameter

I ran across a function that looks like this:
def doSomethingQuestionable(config: someConfig, value: String)(default: => String) : String
What is interesting is the parameterless function that gets passed in as second argument group. In the code base, the method is only ever called with a config and two strings, the latter being some default value, but as a String, not a function. Within the code body of the method, default is passed on to a method that takes 3 string arguments. So the function "default" only resolves down to a string within the body of this method.
Is there any benefit, apart from a currying usage which does not happen with this method in the code base I am going through, of defining the method this way? Why not just define it with 3 string arguments in a single argument group?
What am I missing? Some compiler advantage here? Keep in mind, I am assuming that no currying will ever be done with this, since it is a large code base, and it is not currently done with this method.
The point is to have a potentially expensive default string that is only created when you need it. You write the code as if you're creating the string to pass in, but because it's a by-name parameter ('=> String') it will actually be turned into a function that will be transparently called whenever default is referenced in the doSomethingQuestionable method.
The reason to keep it separate is in case you do want a big block of code to create that string. If you never do and never will, it may as well be
def doSomethingQuestionable(config: someConfig, value: String, default: => String): String
If you do, however,
doSomethingQuestionable(cfg, v) {
// Oh boy, something went wrong
// First we need to check if we have a database accessible
...
// (Much pain ensues)
result
}
is way better than embedding the code block as one argument in a multi-argument parameter list.
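A minimal sketch of why the by-name default pays off, assuming a hypothetical lookup method and a hypothetical costly fallback (neither comes from the code base in the question):

```scala
object LazyDefaultSketch {
  var defaultComputations: Int = 0

  // Hypothetical costly fallback; the counter makes evaluation visible.
  def expensiveDefault(): String = { defaultComputations += 1; "fallback" }

  // default is by-name, so it is only evaluated on a miss.
  def lookup(settings: Map[String, String], key: String)(default: => String): String =
    settings.get(key) match {
      case Some(value) => value
      case None        => default
    }
}
```

When the key is present, expensiveDefault() is never run even though it appears in the call; it only runs when the lookup misses.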
This is a parameterless function returning a String:
() => String
Which is not what you have. This,
=> <WHATEVER>
is a parameter being passed by-name instead of by-value. For example:
=> String // A string being passed by-name
=> () => String // A parameterless function returning string being passed by-name
The difference between these modes is that, on by-value, the parameter is evaluated and the resulting value is passed, whereas on by-name, the parameter is passed "as is", and evaluated each time it is used.
For example:
var x = 0
def printValue(y: Int) = println(s"I got $y. Repeating: $y.")
def printName(y: => Int) = println(s"I got $y. Repeating: $y.")
printValue { x += 1; x } // I got 1. Repeating: 1.
printName { x += 1; x } // I got 2. Repeating: 3.
Now, as to why the method splits that into a second parameter, it's just a matter of syntactic pleasantness. Take the method foldLeft, for example, which is similarly defined. You can write something like this:
(1 to 10).foldLeft(0) { (acc, x) =>
println(s"Accumulator: $acc\tx: $x\tacc+x: ${acc+x}")
acc+x
}
If foldLeft was defined as a single parameter list, it would look like this:
(1 to 10).foldLeft(0, { (acc, x) =>
println(s"Accumulator: $acc\tx: $x\tacc+x: ${acc+x}")
acc+x
})
Not much different, granted, but worse looking. I mean, you don't write this thing below, do you?
if (x == y, {
println("Same thing")
}, {
println("Different thing")
})

How should I read this piece of Scala (Play) code?

I am new to Scala, and am learning it by going over some Play code. I have had a good read of the major concepts of Scala and am comfortable with functional programming having done some Haskell and ML.
I am really struggling to read this code, at the level of the syntax and the programming paradigms alone. I understand what the code is supposed to do, but not how it does it because I can't figure out the syntax.
// -- Home page
def index(ref: Option[String]): Action[AnyContent] = Prismic.action(ref) { implicit request =>
for {
someDocuments <- ctx.api.forms("everything").ref(ctx.ref).submit()
} yield {
Ok(views.html.index(someDocuments))
}
}
(Prismic is an API separate to Play and is not really that relevant). How would I describe this function (or is it a method??) to another developer over the phone: in other words, using English. For example in this code:
def add(a: Int, b: Int): Int = a + b
I would say "add is a function which takes two integers, adds them together and returns the result as another integer".
In the Play code above I don't even know how to describe it after getting to "index is a function which takes an Option of a String and returns an Action of type AnyContent by ....."
The bit after the '=' and then the curly braces and the '=>' scare me! How do I read them? And is this functional or OO?
Thanks for your assistance
Let's reduce it to this:
def index(ref: Option[String]): Action[AnyContent] = Prismic.action(ref)(function)
That's better, isn't it? index is a function from Option of String to Action of AnyContent (one word), which calls the action method of the object Prismic passing two curried parameters: ref, the parameter that index received, and a function (to be described).
So let's break down the anonymous function:
{ implicit request =>
for {
someDocuments <- ctx.api.forms("everything").ref(ctx.ref).submit()
} yield {
Ok(views.html.index(someDocuments))
}
}
First, it uses {} instead of () because Scala allows one to drop () as parameter delimiter if it's a single parameter (there are two parameter lists, but each has a single parameter), and that parameter is enclosed in {}.
So, what about {}? Well, it's an expression that contains declarations and statements, with semi-colon inference on new lines, whose value is that of the last statement. That is, the value of these two expressions is the same, 3:
{ 1; 2; 3 }
{
1
2
3
}
It's a syntactic convention to use {} when passing a function that extends for more than one line, even if, as in this case, that function could have been passed with just parentheses.
The next confusing thing is the implicit request =>. Let's pick something simpler:
x => x * 2
That's pretty easy, right? It takes one parameter, x, and returns x * 2. In our case, it is the same thing: the function takes one parameter, request, and returns this:
for (someDocuments <- somethingSomething())
yield Ok(views.html.index(someDocuments))
That is, it calls some methods, iterates over the result, and maps those results into a new value. This is a close equivalent to Haskell's do notation. You can rewrite it as below (I'm breaking it into multiple lines for readability):
ctx
.api
.forms("everything")
.ref(ctx.ref)
.submit()
.map(someDocuments => Ok(views.html.index(someDocuments)))
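The same equivalence can be checked on something small. In this sketch Option stands in for whatever submit() returns in the Play code (that return type is not shown in the question, so this is only an assumption for illustration):

```scala
object ForYieldSketch {
  // A for-comprehension with a single generator and a yield...
  val viaFor: Option[String] =
    for {
      n <- Option(21)
    } yield s"Ok(${n * 2})"

  // ...is equivalent to a single map call.
  val viaMap: Option[String] =
    Option(21).map(n => s"Ok(${n * 2})")
}
```

Both values are Some("Ok(42)").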
So, back to our method definition, we have this:
def index(ref: Option[String]): Action[AnyContent] = Prismic.action(ref)(
implicit request =>
ctx
.api
.forms("everything")
.ref(ctx.ref)
.submit()
.map(someDocuments => Ok(views.html.index(someDocuments)))
)
The only remaining question here is what that implicit is about. Basically, it makes that parameter implicitly available throughout the scope of the function. Presumably, at least one of these method calls requires an implicit parameter, which is supplied by request. I could drop the implicit there and pass request explicitly if I knew which of these methods requires it, but since I don't, I'm skipping that.
An alternate way of writing it would be:
def index(ref: Option[String]): Action[AnyContent] = Prismic.action(ref)({
request =>
implicit val req = request
ctx
.api
.forms("everything")
.ref(ctx.ref)
.submit()
.map(someDocuments => Ok(views.html.index(someDocuments)))
})
Here I added {} back because I added a declaration to the body of the function, though I decided not to drop the parentheses, which I could have.
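Here is a self-contained sketch of the two spellings side by side. Request and greeting are hypothetical names invented for this example and have nothing to do with Play's or Prismic's real types.

```scala
object ImplicitParamSketch {
  // Hypothetical request type and a helper that needs one implicitly.
  final case class Request(user: String)
  def greeting()(implicit request: Request): String = s"hello ${request.user}"

  // Marking the lambda parameter implicit puts it in implicit scope for the body.
  val concise: Request => String = { implicit request => greeting() }

  // Equivalent long form with an explicit implicit val.
  val spelledOut: Request => String = { request =>
    implicit val req: Request = request
    greeting()
  }
}
```

Both functions give the same result for the same Request; the first form just saves the extra declaration.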
Something like this:
index is a function which takes an Option of a String and returns an Action of type AnyContent. It calls the method action, which takes as a first argument an Option and as a second argument a function whose parameter request (of type Request) is made implicitly available in its body. This function uses a for-comprehension that calls the submit method, which returns an Option or a Future, and then, in case its execution is successful, yields the result Ok(...) that will be wrapped in the Action returned by the action method of Prismic.
Prismic.action is a method that takes 2 groups of arguments (a.k.a. currying).
The first is ref
The second is { implicit request => ...}, a function defined in a block of a code
more information on Action

Is there some reason to avoid return statements

Sometimes I see chunks of Scala code, with several nested levels of conditionals and matchings, that would be much clearer using an explicit return to exit from the function.
Is there any benefit in avoiding those explicit return statements?
A return may be implemented by throwing an exception, so it may have a certain overhead over the standard way of declaring the result of a method. (Thanks to Kim Stebel for pointing out this is not always, maybe not even often, the case.)
Also, a return on a closure will return from the method in which the closure is defined, and not simply from the closure itself. That makes it both useful for that, and useless for returning a result from closures.
An example of the above:
def find[T](seq: Seq[T], predicate: T => Boolean): Option[T] = {
seq foreach { elem =>
if (predicate(elem)) return Some(elem) // returns from find
}
None
}
If you still don't understand: elem => if (predicate(elem)) return Some(elem) is the apply method of an anonymous object that implements Function1 and is passed to foreach as a parameter. Remove return from it, and it won't work.
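For comparison, a sketch of the return-based version next to an expression-style one. firstMatch and firstMatchNoReturn are names made up here. As far as I know, recent Scala 3 compilers report a deprecation warning for a return inside a closure like this, while Scala 2 accepts it silently, so treat the first method as an illustration of the behaviour rather than a recommendation.

```scala
object ReturnSketch {
  // return inside the closure exits firstMatch itself, not just the closure.
  def firstMatch[T](seq: Seq[T])(predicate: T => Boolean): Option[T] = {
    seq.foreach { elem =>
      if (predicate(elem)) return Some(elem)
    }
    None
  }

  // Expression-style version: no return, the last expression is the result.
  def firstMatchNoReturn[T](seq: Seq[T])(predicate: T => Boolean): Option[T] =
    seq.find(predicate)
}
```

Both give Some(3) for Seq(1, 2, 3, 4) with the predicate _ > 2, and None when nothing matches.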
One drawback is that the return type can't be inferred. Everything else is a matter of style. What seems unclear or confusing to you might be perfectly "natural" to someone else.
An explicit return breaks the control flow. For example if you have a statement like
if(isAuth(user)) {
return getProfile(user)
}
else {
return None
}
the control structure (if) is not finished, which is the reason why I argue it is more confusing. For me this is analogous to a break statement. Additionally, Scala's 'everything is a value' principle reduces the need for explicit returns, which leads to fewer people using a keyword that is only useful in def statements:
// start
def someString:String = return "somestring"
// def without return
def someString = "somestring"
// after refactoring
val someString = "somestring"
You see that the type annotation has to be added and when changing the def to a val it is required to remove the return.
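To round this off, a sketch of the isAuth example written as a single expression. User, isAuth and getProfile here are hypothetical definitions made up so the snippet compiles on its own.

```scala
object IfExpressionSketch {
  // Hypothetical user type and checks standing in for isAuth and getProfile.
  final case class User(name: String, authenticated: Boolean)
  def isAuth(user: User): Boolean = user.authenticated
  def getProfile(user: User): Option[String] = Some(s"profile of ${user.name}")

  // The if/else is itself the value of the method; no return is needed.
  def profileFor(user: User): Option[String] =
    if (isAuth(user)) getProfile(user) else None
}
```

Because if/else is an expression, the result type could also be left to inference here, which is not possible once an explicit return is used.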