Parameterized logging in slf4j - how does it compare to Scala's by-name parameters?

Here are two statements that seem to be generally accepted, but that I can't really get over:
1) Scala's by-name params gracefully replace the ever-so-annoying log4j usage pattern:
if (logger.isDebugEnabled()) {
    logger.debug("expensive string representation: " + godObject.toString());
}
because the by-name parameter (a Scala-specific language feature) isn't evaluated before the method is invoked.
2) However, this problem is also solved by parameterized logging in slf4j:
logger.debug("expensive string representation, eg: {}", godObject); // or godObject.toString()
So, how does this work?
Is there some low-level magic involved in the slf4j library that prevents the evaluation of the parameter before the "debug" method execution? (is that even possible? Can a library impact such a fundamental aspect of the language?)
Or is it just the simple fact that an object is passed to the method, rather than a String? (And maybe the toString() of that object is invoked in the debug() method itself, if applicable.)
But then, isn't that true for log4j as well? (it does have methods with Object params).
And wouldn't this mean that if you pass a string - as in the code above - it would behave identically to log4j?
I'd really love to have some light shed on this matter.
Thanks!

There is no magic in slf4j. The problem with logging used to be that if you wanted to log, say,
logger.debug("expensive string representation: " + godObject)
then no matter whether the debug level was enabled in the logger or not, you always evaluated godObject.toString(), which can be an expensive operation, and then also the string concatenation. This comes simply from the fact that in Java (and most languages) arguments are evaluated before they're passed to a function.
That's why slf4j introduced logger.debug(String msg, Object arg) (and other variants for more arguments). The whole idea is that you pass cheap arguments to the debug function and it calls toString on them and combines them into a message only if the debug level is on.
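The mechanism needs no compiler support, as this minimal Scala sketch suggests. `MiniLogger` and `GodObject` are hypothetical stand-ins, and the simple `{}` substitution is an assumption, not slf4j's real formatting code: the argument expression (here just a reference) is evaluated at the call site as usual, but its `toString` only runs inside `debug`, and only when the level is enabled.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-in for slf4j's debug(String, Object); not slf4j's code
final class MiniLogger(enabled: Boolean) {
  val lines: ListBuffer[String] = ListBuffer.empty[String]

  def debug(format: String, arg: AnyRef): Unit =
    if (enabled) {
      // String.valueOf calls arg.toString here, inside the logger, only when debug is on
      lines += format.replace("{}", String.valueOf(arg))
    }
}

// Hypothetical object that counts how often it is rendered
final class GodObject {
  var renders = 0
  override def toString: String = { renders += 1; "GodObject" }
}

// Kept inside a method so that nothing else (e.g. a REPL echo) renders godObject
def demo(): (Int, Int, List[String]) = {
  val godObject = new GodObject

  // Debug off: only the reference is passed; toString is never called
  new MiniLogger(enabled = false).debug("expensive string representation, eg: {}", godObject)
  val rendersWhenOff = godObject.renders

  // Debug on: the logger renders the argument exactly once
  val logger = new MiniLogger(enabled = true)
  logger.debug("expensive string representation, eg: {}", godObject)
  (rendersWhenOff, godObject.renders, logger.lines.toList)
}

val (rendersWhenOff, rendersWhenOn, logged) = demo()
```

The same sketch also answers the log4j question: any logger taking an Object parameter could defer `toString` this way; the saving comes from passing the object instead of a pre-built String.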
Note that by calling
logger.debug("expensive string representation, eg: {}", godObject.toString());
you drastically reduce this advantage, because this way you convert godObject every time, before you pass it to debug, no matter whether the debug level is on. You should use only
logger.debug("expensive string representation, eg: {}", godObject);
However, this still isn't ideal. It only spares the toString call and the string concatenation. If building the message requires some other expensive processing, it won't help; for example, if you need to call some expensiveMethod to create the message:
logger.debug("expensive method, eg: {}",
godObject.expensiveMethod());
then expensiveMethod is always evaluated before being passed to logger. To make this work efficiently with slf4j, you still have to resort back to
if (logger.isDebugEnabled())
logger.debug("expensive method, eg: {}",
godObject.expensiveMethod());
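A small Scala sketch makes the eager evaluation visible. `GuardedLogger`, `expensiveMethod` and the call counter are hypothetical; the sketch only assumes the slf4j shape of a level check plus an ordinary by-value argument.

```scala
// Hypothetical logger with the slf4j shape: a level check plus a by-value argument
final class GuardedLogger(val isDebugEnabled: Boolean) {
  def debug(format: String, arg: AnyRef): Unit =
    if (isDebugEnabled) println(format.replace("{}", String.valueOf(arg)))
}

// Hypothetical expensive call that counts its invocations
var expensiveCalls = 0
def expensiveMethod(): String = { expensiveCalls += 1; "result" }

val logger = new GuardedLogger(isDebugEnabled = false)

// Unguarded: expensiveMethod() runs before debug is entered, even though nothing is logged
logger.debug("expensive method, eg: {}", expensiveMethod())
val callsUnguarded = expensiveCalls // 1

// Guarded: the if short-circuits, so the argument is never built
if (logger.isDebugEnabled)
  logger.debug("expensive method, eg: {}", expensiveMethod())
val callsGuarded = expensiveCalls // still 1
```

Because only the surrounding `if` prevents the argument from being computed, the guard has to be repeated at every call site.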
Scala's call-by-name helps a lot in this matter, because it allows you to wrap an arbitrary piece of code into a function object and evaluate that code only when needed. This is exactly what we need. Let's have a look at slf4s, for example. This library exposes methods like
def debug(msg: => String): Unit = { ... }
Why no format arguments like in slf4j's Logger? Because we don't need them any more. We can write just
logger.debug("expensive representation, eg: " +
godObject.expensiveMethod())
We don't pass a message and its arguments; we pass a piece of code that evaluates to the message, but only if the logger decides to evaluate it. If the debug level isn't on, nothing within logger.debug(...) is ever evaluated; the whole thing is just skipped. expensiveMethod is not called, and no toString calls or string concatenation happen. So this approach is the most general and most flexible: you can pass any expression that evaluates to a String to debug, no matter how complex it is.
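A minimal sketch of the by-name approach follows. `ByNameLogger` is hypothetical and only mimics the shape of the slf4s signature quoted above; it is not the library's implementation. The message expression, including the call to the hypothetical `expensiveMethod`, is passed unevaluated and only runs if `debug` references `msg`.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical logger shaped like slf4s' debug(msg: => String); not the library's code
final class ByNameLogger(enabled: Boolean) {
  val lines: ListBuffer[String] = ListBuffer.empty[String]

  // msg is not evaluated at the call site; the expression runs only where it is referenced
  def debug(msg: => String): Unit =
    if (enabled) lines += msg
}

// Hypothetical expensive call that counts its invocations
var expensiveCalls = 0
def expensiveMethod(): String = { expensiveCalls += 1; "result" }

// Debug off: neither expensiveMethod nor the concatenation is executed
new ByNameLogger(enabled = false).debug("expensive representation, eg: " + expensiveMethod())
val callsWhenOff = expensiveCalls // 0

// Debug on: the message expression is evaluated exactly once
val logger = new ByNameLogger(enabled = true)
logger.debug("expensive representation, eg: " + expensiveMethod())
val callsWhenOn = expensiveCalls // 1
```

Each reference to a by-name parameter re-evaluates the expression, so an implementation should reference `msg` at most once (or store it in a local `val` first).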

Related

What's the correct way to convert from StringBuilder to String?

From what I've seen online, people seem to suggest using the toString() method; however, the documentation states:
Creates a String representation of this object. The default representation is platform dependent. On the java platform it is the concatenation of the class name, "#", and the object's hashcode in hexadecimal.
So it seems like using this method might cause some problems down the line?
There are also mkString and result(), the latter of which seems to make the most sense. But I'm not sure what the differences between these three methods are, and whether that's how result() is supposed to be used.
The toString implementation currently just redirects to the result method anyway, so those two methods will behave in the same way. However, they express slightly different intent:
toString requests a textual representation of the StringBuilder's current state that is "concise but informative (and) that is easy for a person to read". So, theoretically, the (vague) specification of this method does not forbid abbreviating the result, or enhancing conciseness and readability in any other way.
result requests the actual constructed string. No different readings seem possible here.
Therefore, if you want to obtain the resulting string, use result to express your intent as clearly as possible.
In this way, the reader of your code won't have to wonder whether StringBuilder.toString might shorten something for the sake of "conciseness" when the string gets over 9000 kB long, or something like that.
The mkString is for something else entirely, it's mostly used for interspersing separators, as in "hello".mkString(",") == "h,e,l,l,o".
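The three options can be compared directly in a short Scala sketch (the Scala 2.13 or 3 standard library is assumed; the builder contents are arbitrary). `result()` and `toString` yield the same string today, while `mkString(",")` treats the builder as a sequence of characters and intersperses the separator.

```scala
// StringBuilder here is scala.collection.mutable.StringBuilder (the default alias)
val sb = new StringBuilder
sb.append("hel").append("lo").append('!')

// The built string: the intent-revealing choice
val built: String = sb.result()

// Currently delegates to result(), so the text is identical
val shown: String = sb.toString

// Treats the builder as a sequence of Chars and intersperses the separator
val separated: String = sb.mkString(",")
```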
Some further notes:
The paragraph with "hashcode in hexadecimal" describes the default. It is just documentation inherited from AnyRef, because the creator of StringBuilder didn't bother to provide more detailed documentation.
If you look into code, you'll see that toString is actually just delegating to result.
The documentation of StringBuilder also mentions result() in the introductory overview paragraph.
Just use result().
TL;DR: use result, as stated in the docs.
toString MUST NOT be used for anything other than a quick debug.
mkString is inherited from the collections hierarchy and basically creates another StringBuilder, so it is very inefficient.

How can I allow the caller to call methods of a case class's field?

I am not sure of the keywords for this pattern; sorry if the question is not clear.
If you have:
case class MyFancyWrapper(
somethingElse: Any,
heavyComplexObject: CrazyThing
)
val w = MyFancyWrapper(???, complexThing)
I want to be able to call w.method with the method coming from complexThing. I tried to extend CrazyThing, but it is a trait and I don't want to implement all the methods; that would be very tedious. I also don't want to have to do:
def method1 = heavyComplexObject.method1
...
for all of them.
Any solution?
Thanks.
You can do this with macros, but I agree with Luis that this is overkill. Macros are intended for repetitive boring things, not one-time boring things. Also, this is not as trivial as it sounds, because you probably don't want to pass through all the methods (you probably still want your own hashCode and equals). Finally, macros have poor IDE support, so you would most probably get no auto-completion for all those methods. On the other hand, if you use a good IDE (like IDEA), there is most probably an action like "Delegate methods" that will generate most of the code for you. You will still have to change the return type from Unit to MyFancyWrapper and return this at the end of each method, but this can easily be done with mass replace operations (hint: replace "}" with "this }" and the automatic code re-formatting should do the trick).
Here are some screenshots of the process from JetBrains IDEA:
You can use an implicit conversion to make all the methods of heavyComplexObject directly available on MyFancyWrapper:
import scala.language.implicitConversions
implicit def toHeavy(fancy: MyFancyWrapper): CrazyThing = fancy.heavyComplexObject
This needs to be in scope when the method is called.
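A self-contained sketch of the implicit-conversion approach follows. The `CrazyThing` members `method1` and `method2` are hypothetical placeholders, and wrapping the conversion in an object named `WrapperConversions` is just one assumed way to bring it into scope with an import.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for the asker's CrazyThing trait
trait CrazyThing {
  def method1: String
  def method2(n: Int): Int
}

case class MyFancyWrapper(somethingElse: Any, heavyComplexObject: CrazyThing)

object WrapperConversions {
  // The conversion from the answer, placed in an object so it can be imported
  implicit def toHeavy(fancy: MyFancyWrapper): CrazyThing = fancy.heavyComplexObject
}

import WrapperConversions._ // brings toHeavy into scope for the calls below

val complexThing = new CrazyThing {
  def method1: String = "from CrazyThing"
  def method2(n: Int): Int = n * 2
}

val w = MyFancyWrapper("something else", complexThing)
val viaWrapper1 = w.method1     // compiled as toHeavy(w).method1
val viaWrapper2 = w.method2(21) // compiled as toHeavy(w).method2(21)
```

The compiler only applies the conversion when a member is missing on `MyFancyWrapper`, so the case class keeps its own `equals`, `hashCode` and `toString`.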
In the comments you indicate that you want to return this so that you can chain multiple calls on the same object:
w.method1.method2.method3
Don't do this
While this is a common pattern in non-functional languages, it is bad practice in Scala for two reasons:
This pattern inherently relies on side-effects, which is the antithesis of functional programming.
It is confusing, because in Scala chaining calls in this way is used to implement a data pipeline, where the output of one function is passed as the input to the next.
It is much clearer to write separate statements so that it is obvious that the methods are being called on the same object:
w.method1()
w.method2()
w.method3()
(It is also conventional to use () when calling methods with side effects)
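The contrast between the two uses of call sequences can be sketched in Scala. `Recorder` is a hypothetical class whose `method1` to `method3` only record that they were called.

```scala
import scala.collection.mutable.ListBuffer

// Chaining as a data pipeline: each step returns a new value that feeds the next
val total = List(1, 2, 3, 4).map(_ * 10).filter(_ > 15).sum

// Hypothetical side-effecting object: methods return Unit and are called with ()
final class Recorder {
  private val log = ListBuffer.empty[String]
  def method1(): Unit = log += "method1"
  def method2(): Unit = log += "method2"
  def method3(): Unit = log += "method3"
  def calls: List[String] = log.toList
}

// Separate statements make it obvious that every call targets the same object
val w = new Recorder
w.method1()
w.method2()
w.method3()
```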

Is it a pure function if it reads some data from outside rather than parameters?

The book "Functional Programming in Scala" gives some examples of side effects, such as:
Modifying a variable
Modifying a data structure in place
Setting a field on an object
Throwing an exception or halting with an error
Printing to the console or reading user input
Reading from or writing to a file
Drawing on the screen
My question is: does reading some data from outside, rather than from the parameters, make the function impure?
E.g.
val name = "Scala"
def upcase() = name.toUpperCase
Is the upcase function pure or not?
Edit: as per this answer: https://stackoverflow.com/a/31377452/342235, my "function" is not actually a function, it's a method, so here is a function version of it, with the same question:
val name = "Scala"
val upcase: () => String = () => name.toUpperCase
Reading from immutable data is not impure; the function will still return the same value every time. If name were a var then that function would be impure, since something external could change name, so multiple calls to upcase() might evaluate to different values.
(Of course, it might be possible to, e.g., alter name through reflection. Strictly speaking, we can only talk about purity with respect to some notion of what kind of functions are allowed to call a given function, and what kind of side effects we consider to be equivalent.)
It's worth noting that your function is not pure because toUpperCase is not pure; it depends on the system's default Locale, and may produce different results on different systems (e.g. on a Turkish system, "i".toUpperCase == "İ"). You should always pass an explicit Locale, e.g. def upcase() = name.toUpperCase(Locale.ENGLISH); then the function will be pure.
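The following Scala sketch illustrates both points: reading an immutable `val` keeps the function pure, reading a `var` does not, and `toUpperCase` without a locale depends on the platform default. The names `mutableName` and `upcaseMutable` are hypothetical additions for the comparison.

```scala
import java.util.Locale

val name = "Scala"

// Pure: reads only an immutable val and pins the locale explicitly
val upcase: () => String = () => name.toUpperCase(Locale.ENGLISH)

// Impure: the same expression over a var can change between calls
var mutableName = "Scala"
val upcaseMutable: () => String = () => mutableName.toUpperCase(Locale.ENGLISH)
val firstCall = upcaseMutable()
mutableName = "Java"
val secondCall = upcaseMutable()

// Locale dependence: Turkish maps "i" to a dotted capital I (U+0130)
val turkishUpper = "i".toUpperCase(Locale.forLanguageTag("tr"))
val englishUpper = "i".toUpperCase(Locale.ENGLISH)
```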
Interestingly, the answer is "No", but not for the reason you think. Your upcase is not a pure function: it is pure, but it is a method, not a function.

Parentheses for impure functions

I know that I should use () by convention if a method has side effects:
def method1(a: String): Unit = {
//.....
}
//or
def method2(): Unit = {
//.....
}
Do I have to do the same thing if a method doesn't have side effects but is not pure, has no parameters and, of course, returns different results each time it is called?
def method3() = getRemoteSessionId("login", "password")
Edit: After reviewing Luigi Plinge's comment, I came to think that I should rewrite the answer. This is also not a clear yes/no answer, but some suggestions.
First: The case regarding var is an interesting one. Declaring a var foo gives you a getter foo without parentheses. Obviously it is an impure call, but it does not have a side effect (it does not change anything unobserved by the caller).
Second, regarding your question: I would no longer argue that the problem with getRemoteSessionId is that it is impure, but rather that it actually makes the server maintain a login session for you, so you clearly interfere destructively with the environment. method3() should therefore be written with parentheses because of this side effect.
A third example: Getting the contents of a directory should thus be written file.children and not file.children(), because again it is an impure function but should not have side effects (other than perhaps a read-only access to your file system).
A fourth example: Given the above, you should write System.currentTimeMillis. I do tend to write System.currentTimeMillis() however...
Using this fourth case, my tentative answer would be: parentheses are preferable when the function either has a side effect, or is impure and depends on state not under the control of your program.
With this definition, it would not matter whether getRemoteSessionId has known side effects or not. On the other hand, it implies reverting to writing file.children()...
The Scala style guide recommends:
Methods which act as accessors of any sort (either encapsulating a field or a logical property) should be declared without parentheses except if they have side effects.
It doesn't mention any other use case besides accessors. So the question boils down to whether you regard this method as an accessor, which in turn depends on how the rest of the class is set up and perhaps also on the (intended) call sites.
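A minimal sketch of that rule, using a hypothetical `Counter` class: the accessor is declared without parentheses, while the methods that change state are declared and called with `()`.

```scala
// Hypothetical class illustrating the style-guide rule quoted above
final class Counter {
  private var count = 0

  // Accessor of a logical property: declared and called without parentheses
  def current: Int = count

  // State-changing methods: declared and called with ()
  def increment(): Unit = count += 1
  def reset(): Unit = count = 0
}

val counter = new Counter
counter.increment()
counter.increment()
val afterTwo = counter.current
counter.reset()
val afterReset = counter.current
```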

Scala toString: parenthesize or not?

I'd like this thread to be some kind of summary of pros/cons for overriding and calling toString with or without empty parentheses, because this thing still confuses me sometimes, even though I've been into Scala for quite a while.
So which one is preferable over another? Comments from Scala geeks, officials and OCD paranoids are highly appreciated.
Pros of toString:
seems to be an obvious and natural choice at the first glance;
most cases are trivial and just construct Strings on the fly without ever modifying internal state;
another common case is to delegate method call to the wrapped abstraction:
override def toString = underlying.toString
Pros of toString():
definitely not an "accessor-like" name (that's what the IntelliJ IDEA inspector complains about every once in a while);
might imply some CPU or I/O work (in cases where counting every System.arraycopy call is crucial to performance);
might even imply some mutable state changing (consider an example where the first toString call is expensive, so it is cached internally to yield quicker calls in future).
So what's the best practice? Am I still missing something?
Update: this question is related specifically to toString which is defined on every JVM object, so I was hoping to find the best practice, if it ever exists.
Here's what Programming In Scala (section 10.3) has to say:
The recommended convention is to use a parameterless method whenever there are no parameters and the method accesses mutable state only by reading fields of the containing object (in particular, it does not change mutable state). This convention supports the uniform access principle, which says that client code should not be affected by a decision to implement an attribute as a field or method.
Here's what the (unofficial) Scala Style Guide (page 18) has to say:
Scala allows the omission of parentheses on methods of arity-0 (no arguments):
reply()
// is the same as
reply
However, this syntax should only be used when the method in question has no side-effects (purely-functional). In other words, it would be acceptable to omit parentheses when calling queue.size, but not when calling println(). This convention mirrors the method declaration convention given above.
The latter does not mention the Uniform Access Principle.
If your toString method can be implemented as a val, it implies the field is immutable. If, however, your class is mutable, toString might not always yield the same result (e.g. for StringBuffer). So Programming In Scala implies that we should use toString() in two different situations:
1) When its value is mutable
2) When there are side-effects
Personally, I think it's more common and more consistent to ignore the first of these. In practice, toString will almost never have side effects. So, unless it does, always use toString and ignore the Uniform Access Principle (following the Style Guide): keep parentheses to denote side effects, rather than mutability.
Yes, you are missing something: Semantics.
If you have a method that simply gives back a value, you shouldn't use parens. The reason is that omitting them blurs the line between vals and defs, satisfying the Uniform Access Principle. For example, consider the size method of collections: for fixed-size vectors or arrays it can be just a val, while other collections may need to calculate it.
The usage of empty parens should be limited to methods which perform some kind of side effect, e.g. println(), or a method that increases an internal counter, or a method that resets a connection etc.
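The Uniform Access Principle behind the `size` example can be sketched as follows. `HasSize`, `FixedBox` and `Chain` are hypothetical names: one implementation satisfies the abstract `def size` with a `val`, the other computes it, and callers write `x.size` in both cases.

```scala
// Hypothetical trait: callers only see `size`, not how it is provided
trait HasSize {
  def size: Int
}

// Fixed size: the abstract def is implemented by a val
final class FixedBox(val size: Int) extends HasSize

// Size computed on demand by a parameterless def
final class Chain(items: List[String]) extends HasSize {
  def size: Int = items.length
}

// Client code is identical for both implementations
val sizes: List[Int] = List[HasSize](new FixedBox(3), new Chain(List("a", "b"))).map(_.size)
```

Had `size` been declared as `size()`, the `val` implementation and the parenthesis-free call sites would no longer line up so naturally, which is the point of the convention.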
I would recommend always using toString. Regarding your third "pro" to toString():
Might imply some mutable state changing (consider an example when first toString call is expensive, so it is cached internally to yield quicker calls in future).
First of all, toString generally shouldn't be an expensive operation. But suppose it is expensive, and suppose you do choose to cache the result internally. Even in that case, I'd say use toString, as long as the result of toString is always the same for a given state of the object (disregarding the state of the toString cache).
The only reason I would not recommend using toString without parens is if you have a code profiler/analyzer that makes assumptions based on the presence or absence of parens. In that case, follow the conventions set forth by said profiler. Also, if your toString is that complicated, consider renaming it to something else, like expensiveToString. It is unofficially expected that toString be a straightforward, simple function in most cases.
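The caching scenario discussed above can be sketched with a private `lazy val`. `Matrix` and its `renderCount` counter are hypothetical, and the counter exists only to show that the expensive rendering runs once; the visible result of `toString` depends only on the immutable `rows`, which is the condition given above for keeping `toString` without parentheses.

```scala
// Hypothetical immutable class whose string form is costly to build
final class Matrix(rows: Vector[Vector[Int]]) {
  // Instrumentation only, to show how often the rendering runs
  private var renders = 0
  def renderCount: Int = renders

  // Built at most once, on first use, then reused
  private lazy val rendered: String = {
    renders += 1
    rows.map(_.mkString(" ")).mkString("\n")
  }

  // No parentheses: the result depends only on the immutable rows
  override def toString: String = rendered
}

val m = new Matrix(Vector(Vector(1, 2), Vector(3, 4)))
val first = m.toString
val second = m.toString
```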
Not much argumentation in this answer, but GenTraversableOnce alone declares the following defs without parentheses:
toArray
toBuffer
toIndexedSeq
toIterable
toIterator
toList
toMap
toSeq
toSet
toStream
toTraversable