Recommended style for arity-1 call followed by arity-0 call - scala

Given that the official Scala style guide
http://docs.scala-lang.org/style/method-invocation.html
recommends using infix notation for arity-1 calls and dot notation for arity-0 calls, what's the recommended style for an arity-1 call followed by an arity-0 call?
For example, would this be the recommended way?
(bytes map (_.toChar)).mkString

The summary of the style guide is basically: use point-free style whenever it is simple and clear. Otherwise, don't. In this case, your options are
bytes.map(_.toChar).mkString
(bytes map (_.toChar)).mkString
and the former looks simpler and clearer to me, so it wins.
Really long chains are also not very clear in point-free notation:
foo bar baz qux quux bippy
Say what again?
foo.bar(baz).qux(quux).bippy
Oh, okay.
Be pragmatic. Point-free style can be clean and elegant. Good coding style will often lead you to write code that looks nice point-free. But the point of the syntax is to be clear and avoid errors, so use whichever better accomplishes the goal.
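To make the two options concrete, here is a minimal sketch. The object name InvocationStyles and the sample input are assumptions chosen for illustration. Both spellings produce the same string, so the choice between them is purely about readability.

```scala
object InvocationStyles {
  // Hypothetical sample input, not from the question.
  val bytes: Array[Byte] = "hello".getBytes("US-ASCII")

  // Dot notation for both the arity-1 and the arity-0 call.
  val dotted: String = bytes.map(_.toChar).mkString

  // Infix notation for the arity-1 call, dot notation for the arity-0 call.
  // The parentheses make mkString apply to the result of map.
  val mixed: String = (bytes map (_.toChar)).mkString
}
```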

I don't know who wrote this guide, but the author definitely seems biased, and I would advise against following it. Infix notation has a lot of pitfalls that the author doesn't mention, and its benefits are questionable at the very least. The arguments the author uses are no less questionable.
The author argues that the following code makes it look like a method filter is being called on a function (_.toUpperCase):
names.map (_.toUpperCase).filter (_.length > 5)
But nobody formats code like that. The following are the standard practices, and neither introduces any such ambiguity:
names.map(_.toUpperCase).filter(_.length > 5)
names.map( _.toUpperCase ).filter( _.length > 5 )
Pitfalls of the infix notation
Parameterless methods cannot be used in the middle of a method call chain.
Parameterless methods cannot be used at the end of a method call chain, because they grab the terms from the next line.
It does not allow splitting a chain across multiple lines. You'll either have to introduce some lispy bracketing or awkwardly place the parameters on the next line. Another option is to switch back to the "dot" notation and end up with an inconsistent style.
None of those options can be said to improve readability.
It breeds misunderstandings like that.
Finally, it adds a layer of obfuscation over your intent: a reader has to work out where the compiler will infer the dots and parentheses before actually comprehending the code.
Conclusion
The only argument for this notation that I've ever come across is that it increases readability. That argument is questionable in itself, and it can hardly stand against the drawbacks listed above, because of which the notation often decreases readability instead.
The most consistent and safe standard is to use infix notation only for operators, i.e., methods with names like +, *, >>=.
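As a rough sketch of the chaining pitfalls above (the object name ChainStyles and the sample data are assumptions for illustration), compare a chain that ends in a parameterless call written both ways. The dot form splits across lines as-is. The infix form needs the whole chain wrapped in parentheses before the parameterless call can be applied to its result.

```scala
object ChainStyles {
  // Hypothetical sample data, not from the original post.
  val names: List[String] = List("alexandra", "bob", "christopher")

  // Dot notation: the chain splits across lines without extra brackets,
  // and a parameterless call (size) can sit at the end.
  val dotted: Int =
    names
      .map(_.toUpperCase)
      .filter(_.length > 5)
      .size

  // Infix notation: the infix chain is parenthesized as a whole so that
  // size applies to its result rather than to the last argument.
  val infix: Int = (names map (_.toUpperCase) filter (_.length > 5)).size
}
```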

Related

How to use B::Hooks to manipulate the perl parser

I'm looking to play with perl parser manipulation. It looks like the various B::Hooks modules are what people use. I was wondering:
Best place to start for someone who has no XS experience (yet). Any relevant blog posts?
How much work would be involved in creating a new operator, for example:
$a~>one~>two~>three
~> would work like -> but it would not try to call on undef and would instead simply return undef to LHS.
Although a source filter would work -- I'm more interested in seeing how you can manipulate the parser at a deeper level.
I don't believe you can add infix operators (operators whose operands are before and after the operator), much less symbolic ones (as opposed to named operators), but you could write an op checker that replaces method calls. This means you could cause ->foo to behave differently. By writing your module as a pragma, you could limit the effect of your module to a lexical scope (e.g. { use mypragma; ...}).

Coffeescript whitespace, arguments and function scopes

I've been using CoffeeScript for a while. I find it a good language overall, certainly better than plain JS, but I find I'm still baffled by its indentation rules. Take this example:
Bacon.mergeAll(
    @searchButton.asEventStream('click')
    @searchInput.asEventStream('keyup')
        .filter (e) => e.keyCode is 13
)
.map =>
    @searchInput.val()
.flatMapLatest (query) =>
    Bacon.fromPromise $.ajax
        url: @searchURL + encodeURI query
        dataType: 'jsonp'
This does what it should (the code is based on this tutorial, btw) but it took me a lot of trial and error to get it right.
Why do mergeAll and asEventStream require parentheses around their arguments? Why is indentation not enough to determine where their argument lists begin and end? OTOH, why is indentation enough for map and flatMapLatest? Why is the whitespace before a hanging method, such as .filter (its indentation level) not enough to determine what it binds to? It seems to be completely ignored.
Is there a definitive guide to this language's indentation rules? I never had a problem understanding Python syntax at a glance, even with very complex nesting, so it's not an issue with indentation-based syntax per se.
Indentation in CoffeeScript generally defines blocks, and argument lists aren't (necessarily) blocks. Similarly, a chained function call isn't a block; CoffeeScript simply sees a line starting with . and connects it to the previous line of similar or lower indentation.
Hence, the parentheses are needed for asEventStream, since CoffeeScript would otherwise see:
@searchInput.asEventStream 'keyup'.filter (e) => e.keyCode is 13
Which would call filter on the 'keyup' string, and it'd remain ambiguous whether the function is an argument to filter, or an argument to @searchInput.asEventStream('keyup'.filter)(). That last bit obviously doesn't make much sense, but CoffeeScript isn't a static analyzer, so it doesn't know that.
A function, meanwhile, is a block, hence the function argument to .map() works without parentheses, since it is clearly delimited by its indentation, i.e. the line following the function has less indentation.
Personally, I'd probably write
Bacon.mergeAll(
    @searchButton.asEventStream('click'), # explicit comma
    @searchInput.asEventStream('keyup').filter (e) -> e.keyCode is 13 # no need for =>
)
.map(=> @searchInput.val()) # maybe not as pretty, but clearer
.flatMapLatest (query) =>
    Bacon.fromPromise $.ajax
        url: @searchURL + encodeURI query
        dataType: 'jsonp'
In fact, I might break it up into separate expressions to make it clearer still. Insisting on the syntactic sugar while chaining stuff can indeed get confusing in CoffeeScript, but remember that you're not obliged to use it. Same as you're not obliged to always avoid parentheses; if they make things clearer, by all means use 'em!
If the code's easier to write, less ambiguous to read, and simpler to maintain without complex chaining/syntax (all of which seems true for this example), then I'd say just skip it.
In the end, there just are combinations of indentation syntax in CoffeeScript that can make either you or the compiler trip. Mostly, though, if you look at something, and find it straightforward, the compiler probably thinks so too. If you're in doubt, the compiler might be too, or it'll interpret it in unexpected ways. That's the best I can offer in terms of "definitive guide" (don't know of a written one).
Have you looked at the JavaScript produced by this code? What happens when you omit the ()?
In Try Coffeescript I find that:
@searchButton.asEventStream 'click'
is ok. The second asEventStream compiles to:
this.searchInput.asEventStream('keyup').filter(function(e) {
but omitting the () changes it to:
this.searchInput.asEventStream('keyup'.filter(function(e) {
filter is now an attribute of 'keyup'. Putting a space to separate asEventStream and ('keyup') does the same thing.
@searchInput.asEventStream ('keyup')
As written .mergeAll() produces:
Bacon.mergeAll(...).map(...).flatMapLatest(...);
Omitting the ()
Bacon.mergeAll
    @searchButton.asEventStream('click')
    @searchInput.asEventStream('keyup')
gives an error because the compiler has no way of knowing that mergeAll is a function that takes arguments. It has no reason to expect an indented block.
Coming from Python, my inclination is to continue to use (), [], {} to mark structures like arguments, arrays and objects, unless the code is clearer without them. Often they help me read the code, even if the compiler does not need them. CoffeeScript is also like Python in its use of indentation to denote code blocks (as opposed to the {} used in JavaScript and other C-style languages).

Using regexp to index a file for imenu, performance is unacceptable

I'm producing a function for imenu-create-index-function, to index a source code module, for csharp-mode.el
It works, but delivers completely unacceptable performance. Any tips for fixing this?
The Background
I looked at js.el, which is the rebadged "espresso" now included, since v23.2, into emacs. It indexes Javascript files very nicely, does a good job with anonymous functions and various coding styles and patterns in common use. For example, in javascript one can do:
(function() {
    var x = ... ;
    function foo() {
        if (x == 1) ...
    }
})();
...to define a scope where x is "private" or inaccessible from other code. This gets indexed nicely by js.el, using regexps, and it indexes the inner functions (anonymous or not) within that scope also. It works quickly. A big module can be indexed in less than a second.
I tried following a similar approach in csharp-mode, but it's quite a bit more complicated. In Js, everything that gets indexed is a function. So the starting regex is "function" with some elaboration on either end. Once an occurrence of the function keyword is found, then there are 4 - 8 other regexps that get tried via looking-at - the number depends on settings. One nice thing about js mode is that you can turn on or off regexps for various coding styles, to speed things along I suppose. The default "styles" work for most of the code I tried.
This approach doesn't work well in csharp-mode. It runs, but it performs poorly enough to be barely usable. I think the reasons are:
There is no single marker keyword in C#, the way function behaves in JavaScript. In C# I need to look for namespace, class, struct, interface, enum, and so on.
C# constructs can be defined with a great deal of flexibility. As one example, a class can define base classes as well as implemented interfaces. Another example: the return type for a method isn't a simple word-like string, but can be something messy like Dictionary<String, List<String>>. The index routine needs to handle all those cases and capture the matches. This makes it run slowly.
I use a lot of looking-back. The marker I use in the current approach is the open curly brace. Once I find one of those, I use looking-back to determine whether the curly belongs to a class, interface, enum, method, etc. I read that looking-back can be slow; I'm not clear on how much slower it is than, say, looking-at.
Once I find an open-close pair of curlies, I call narrow-to-region in order to index what's inside. I'm not sure whether this kills performance. I suspect that it is not the main culprit, because the perf problems I see happen in modules with one namespace and 2 or 3 classes, which means narrow gets called 3 or 4 times total.
What's the Question?
My question is: do you have any tips for speeding up imenu-like indexing in a C# buffer?
I'm considering:
avoiding looking-back. I don't know exactly how to do this because when re-search-forward finds, say, the keyword class, the cursor is already in the middle of a class declaration. looking-back seems essential.
instead of using open-curly as the marker, use the keywords like enum, interface, namespace, class
avoid narrow-to-region
any hard advice? Further suggestions?
Something I've tried and I'm not really enthused about re-visiting: building a wisent-based parser for C#, and relying on semantic to do the indexing. I found semantic to be very very very (etc) difficult to use, hard to discover, and problematic. I had semantic working for a while, but then upgraded to v23.2, and it broke, and I never could get it working again. Simple things - like indexing the namespace keyword - took a very long time to solve. I'm very dissatisfied with it and don't want to try again.
I don't really know C# syntax, and without looking at your elisp it's hard to give an answer, but here goes anyway.
looking-back can be deadly slow. It's the first thing I'd experiment with. One thing that helps a lot is using the limit arg to, say, restrict your search to the beginning of the current line. A different approach is when you hit the open curly do backward-char then backward-sexp (or whatever) to get to the front of the previous word, then use looking-at.
Using keywords to search around instead of open curly is probably what I would have done. Maybe something like (re-search-forward "\\(enum\\|interface\\|namespace\\|class\\)[ \t\n]*{" nil t) then using match-string-no-properties on the first capture group to see which of the keywords was found. This might help with the looking-back problem as well.
I don't know how expensive narrow-to-region is, but it could be avoided: when you find an open curly, do a forward-sexp inside save-excursion and keep the resulting point as a limit for the current iteration of your (I assume recursive) searches.

When to use " " ( space ) and when to use . ( dot ) when invoking methods in Scala?

I've seen Scala using both interchangeably, but I don't know when to use one or the other.
Is there a convention?
For instance these are equivalent
"hello" toString
and
"hello".toString()
And they can even be mixed
"hello".toString() length
What's the convention?
The space convention is generally used when the method functions like an operator (+, *, etc.); the dot convention is used when it functions more like, well, a method call.
(I know that explanation is kind of vague, but there's not really a hard and fast rule for the usage.)
To expand on the comment from Yardena, there is an unofficial Scala style guide. It has some suggestions on when to use the dot notation and when to drop the dot and the parentheses, and it usually provides a brief rationale for each recommendation. You may or may not agree with it, but it is at least a starting point.
For instance name toList may behave differently depending on what's on the next line.
Personally, I would write "hello".toString.length with the assumption that all calls are side-effect free (so I drop the parentheses), and then I have to keep the dot for it to compile.
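A small sketch of the equivalent spellings, assuming a compiler that accepts postfix calls once scala.language.postfixOps is imported (the object name DotVsSpace is made up for illustration). The space-separated form is wrapped in parentheses here so that the expression ends unambiguously, whatever follows it.

```scala
import scala.language.postfixOps // enables the space-separated arity-0 calls below

object DotVsSpace {
  // Dot notation, parentheses dropped for side-effect-free calls.
  val dotted: Int = "hello".toString.length

  // Dot notation with explicit empty parentheses.
  val withParens: Int = "hello".toString().length()

  // Space (postfix) notation, parenthesized so each call has a clear end.
  val spaced: Int = (("hello" toString) length)
}
```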

Does Perl perform common subexpression elimination?

I wonder if Perl performs common subexpression elimination?
And what kind of optimisations are done?
No, but I do.
Now, I don't unroll loops by hand, because loops are an easier concept once you're familiar with programming. Because you could be doing anything with a sequence of commands, the loop makes it clear that you're repeating a task.
But CSE is something that makes more efficient code regardless of the implementation of the language. So I do it. It doesn't make the code baroque, and it works in languages where it's not automatically included.
Perl offers compression of syntax, so there are often fewer subexpressions that have to be hand-eliminated.
No, and it's not possible to do it either, except in very simple cases.
In order to eliminate common subexpressions, you must know that they haven't changed their values in between. But since so much can happen between two expressions a few lines apart, it's almost impossible to tell if the subexpressions are still common.
The only things you would be able to eliminate are expressions that are provably pure, like "7 + 5". But proving that something like a function call is safe to eliminate is not going to happen.
To do this, you need powerful and conservative static analysis, which Perl does not have, and is not likely to gain (in C/C++ you need less powerful stuff because the languages are less dynamic, but you still need something).
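The purity argument is language-neutral, so here is a minimal sketch of it in Scala rather than Perl. The names CseSketch, Counter and square are hypothetical. Sharing the result of a pure call leaves the answer unchanged, while sharing the result of a stateful call changes it, which is why an optimizer cannot merge two calls without proving purity first.

```scala
object CseSketch {
  // Hypothetical stateful helper: every call to next() returns a new value.
  final class Counter {
    private var n = 0
    def next(): Int = { n += 1; n }
  }

  // Pure function: the result depends only on the argument.
  def square(x: Int): Int = x * x

  // Pure case: evaluating the subexpression once or twice gives the same sum.
  def pureRepeated(x: Int): Int = square(x) + square(x)
  def pureShared(x: Int): Int = { val s = square(x); s + s }

  // Impure case: the two textually identical calls return different values,
  // so replacing them with one shared result changes the sum.
  def impureRepeated(c: Counter): Int = c.next() + c.next()
  def impureShared(c: Counter): Int = { val v = c.next(); v + v }
}
```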