CoffeeScript whitespace, arguments and function scopes

I've been using CoffeeScript for a while. I find it a good language overall, certainly better than plain JS, but I find I'm still baffled by its indentation rules. Take this example:
Bacon.mergeAll(
  @searchButton.asEventStream('click')
  @searchInput.asEventStream('keyup')
    .filter (e) => e.keyCode is 13
)
.map =>
  @searchInput.val()
.flatMapLatest (query) =>
  Bacon.fromPromise $.ajax
    url: @searchURL + encodeURI query
    dataType: 'jsonp'
This does what it should (the code is based on this tutorial, btw) but it took me a lot of trial and error to get it right.
Why do mergeAll and asEventStream require parentheses around their arguments? Why is indentation not enough to determine where their argument lists begin and end? OTOH, why is indentation enough for map and flatMapLatest? Why is the whitespace before a hanging method, such as .filter (its indentation level) not enough to determine what it binds to? It seems to be completely ignored.
Is there a definitive guide to this language's indentation rules? I never had a problem understanding Python syntax at a glance, even with very complex nesting, so it's not an issue with indentation-based syntax per se.

Indentation in CoffeeScript generally defines blocks, and argument lists aren't (necessarily) blocks. Similarly, a chained function call isn't a block; CoffeeScript simply sees a line starting with . and connects it to the previous line of similar or lower indentation.
Hence, the parentheses are needed for asEventStream, since CoffeeScript would otherwise see:
@searchInput.asEventStream 'keyup'.filter (e) => e.keyCode is 13
Which would call filter on the 'keyup' string, and it'd remain ambiguous whether the function is an argument to filter, or an argument to @searchInput.asEventStream('keyup'.filter)(). That last bit obviously doesn't make much sense, but CoffeeScript isn't a static analyzer, so it doesn't know that.
A function, meanwhile, is a block, hence the function argument to .map() works without parentheses, since it is clearly delimited by its indentation. I.e. the line following the function has less indentation.
Personally, I'd probably write
Bacon.mergeAll(
  @searchButton.asEventStream('click'), # explicit comma
  @searchInput.asEventStream('keyup').filter (e) -> e.keyCode is 13 # no need for =>
)
.map(=> @searchInput.val()) # maybe not as pretty, but clearer
.flatMapLatest (query) =>
  Bacon.fromPromise $.ajax
    url: @searchURL + encodeURI query
    dataType: 'jsonp'
In fact, I might break it up into separate expressions to make it clearer still. Insisting on the syntactic sugar while chaining stuff can indeed get confusing in CoffeeScript, but remember that you're not obliged to use it. Same as you're not obliged to always avoid parentheses; if they make things clearer, by all means use 'em!
If the code's easier to write, less ambiguous to read, and simpler to maintain without complex chaining/syntax (all of which seems true for this example), then I'd say just skip it.
In the end, there just are combinations of indentation syntax in CoffeeScript that can make either you or the compiler trip. Mostly, though, if you look at something, and find it straightforward, the compiler probably thinks so too. If you're in doubt, the compiler might be too, or it'll interpret it in unexpected ways. That's the best I can offer in terms of "definitive guide" (don't know of a written one).

Have you looked at the JavaScript produced by this code? What happens when you omit the ()?
In Try Coffeescript I find that:
@searchButton.asEventStream 'click'
is ok. The second asEventStream compiles to:
this.searchInput.asEventStream('keyup').filter(function(e) {
but omitting the () changes it to:
this.searchInput.asEventStream('keyup'.filter(function(e) {
filter is now an attribute of 'keyup'. Putting a space to separate asEventStream and ('keyup') does the same thing.
@searchInput.asEventStream ('keyup')
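To see what the two compiled shapes above do at runtime, here is a minimal JavaScript sketch. The searchInput object is a hypothetical stand-in for the Bacon-enabled input, not the real library API; only the call shapes are taken from the compiler output quoted above.

```javascript
// Hypothetical stand-in for the Bacon-enabled input; not the real API.
const searchInput = {
  asEventStream(eventName) {
    return {
      filter(predicate) {
        return `filtered stream of ${eventName}`;
      },
    };
  },
};

// Shape compiled from: asEventStream('keyup').filter (e) -> ...
const withParens = searchInput
  .asEventStream('keyup')
  .filter((e) => e.keyCode === 13);

// Shape compiled from: asEventStream 'keyup'.filter (e) -> ...
let failure = null;
try {
  searchInput.asEventStream('keyup'.filter((e) => e.keyCode === 13));
} catch (error) {
  failure = error; // strings have no filter method, so the call throws
}
```

In the second shape the filter lookup happens on the string before asEventStream is ever called, which is why the mistake surfaces as a TypeError rather than as a wrongly filtered stream.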
As written .mergeAll() produces:
Bacon.mergeAll(...).map(...).flatMapLatest(...);
Omitting the ()
Bacon.mergeAll
  @searchButton.asEventStream('click')
  @searchInput.asEventStream('keyup')
gives an error because the compiler has no way of knowing that mergeAll is a function that takes arguments. It has no reason to expect an indented block.
Coming from Python, my inclination is to continue to use (),[],{} to mark structures like arguments, arrays and objects, unless the code is clearer without them. Often they help me read the code, even if the compiler does not need them. Coffeescript is also like Python in the use of indentation to denote code blocks (as opposed to the {} used in Javascript and other C styled languages).

Related

Recommended style for arity-1 call followed by arity-0 call

Given that the official Scala style guide
http://docs.scala-lang.org/style/method-invocation.html
recommends using infix notation for arity-1 calls and dot notation for arity-0 calls, what's the recommended style for an arity-1 call followed by an arity-0 call?
For example, would this be the recommended way?
(bytes map (_.toChar)).mkString
The summary of the style guide is basically: use point-free style whenever it is simple and clear. Otherwise, don't. In this case, your options are
bytes.map(_.toChar).mkString
(bytes map (_.toChar)).mkString
and the former looks simpler and clearer to me, so it wins.
Really long chains are also not very clear in point-free notation
foo bar baz qux quux bippy
Say what again?
foo.bar(baz).qux(quux).bippy
Oh, okay.
Be pragmatic. Point-free style can be clean and elegant. Good coding style will often lead you to write code that looks nice point-free. But the point of the syntax is to be clear and avoid errors, so use whichever better accomplishes the goal.
I don't know who wrote this guide, but he definitely seemed biased, and I would advise against following it. Infix notation has a lot of pitfalls, which the author doesn't mention, and its benefits are questionable at the very least. The arguments the author uses are no less questionable.
The author argues that the following code makes it look like a method filter is being called on a function (_.toUpperCase):
names.map (_.toUpperCase).filter (_.length > 5)
But nobody formats the code like that. The following are the standard practices, and neither introduces any such ambiguity:
names.map(_.toUpperCase).filter(_.length > 5)
names.map( _.toUpperCase ).filter( _.length > 5 )
Pitfalls of the infix notation
Parameterless methods cannot be used inside a method call chain.
Parameterless methods cannot be used at the end of a method call chain, because they grab the terms from the next line.
It does not allow splitting across multiple lines. You'll either have to introduce some lispy bracketing or awkwardly place the parameters on the next line. Another option is to switch back to the "dot" notation and end up with an inconsistent style.
All those options can hardly be referred to as increasing the readability.
It breeds misunderstandings like that.
Finally, it adds a layer of obfuscation of your intent. I.e., a reader has to analyse how the compiler will infer the dots and braces prior to actually comprehending the code.
Conclusion
The only argument for this notation that I've ever met is that it increases readability. While this argument is questionable itself, I find that it can hardly stand against any of the aforementioned drawbacks of this notation, due to which it often even decreases the readability.
The most consistent and safe standard is to use infix notation only for operators, i.e., methods with names like +, *, >>=.

What are the non-expressions in CoffeeScript?

I am watching this great video by Jeremy on CoffeeScript. He explains that one of the ideals of CoffeeScript is to have "everything be an expression".
How close to this ideal has CoffeeScript got? What are the CoffeeScript non-expressions?
There are a few things that are not converted into expressions in coffeescript, as explained in the documentation:
There are a handful of statements in JavaScript that can't be meaningfully converted into expressions, namely break, continue, and return. If you make use of them within a block of code, CoffeeScript won't try to perform the conversion.
Everything else is wrapped in function closures and handled by coffeescript, which means you can do cool stuff like
alert(
  try
    nonexistent / undefined
  catch error
    "And the error is ... #{error}"
)
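As a rough illustration of the "wrapped in function closures" remark, this is a hand-written JavaScript sketch of the same idea: the try/catch statement is placed inside an immediately invoked function so that it yields a value. It is not the compiler's exact output, and the message variable is introduced here only so the result can be inspected instead of passed to alert.

```javascript
// try/catch is a statement in JavaScript, so it is wrapped in a function
// whose return value stands in for the "value of the try expression".
const message = (() => {
  try {
    return nonexistent / undefined; // nonexistent is deliberately undeclared
  } catch (error) {
    return `And the error is ... ${error}`;
  }
})();

console.log(message);
```

Because break, continue, and return change control flow rather than produce a value, there is nothing sensible for such a wrapper to return, which matches the exceptions the documentation lists.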

Using regexp to index a file for imenu, performance is unacceptable

I'm producing a function for imenu-create-index-function, to index a source code module, for csharp-mode.el
It works, but delivers completely unacceptable performance. Any tips for fixing this?
The Background
I looked at js.el, which is the rebadged "espresso" now included, since v23.2, into emacs. It indexes Javascript files very nicely, does a good job with anonymous functions and various coding styles and patterns in common use. For example, in javascript one can do:
(function() {
  var x = ... ;
  function foo() {
    if (x == 1) ...
  }
})();
...to define a scope where x is "private" or inaccessible from other code. This gets indexed nicely by js.el, using regexps, and it indexes the inner functions (anonymous or not) within that scope also. It works quickly. A big module can be indexed in less than a second.
I tried following a similar approach in csharp-mode, but it's quite a bit more complicated. In Js, everything that gets indexed is a function. So the starting regex is "function" with some elaboration on either end. Once an occurrence of the function keyword is found, then there are 4 - 8 other regexps that get tried via looking-at - the number depends on settings. One nice thing about js mode is that you can turn on or off regexps for various coding styles, to speed things along I suppose. The default "styles" work for most of the code I tried.
This approach doesn't carry over well to csharp-mode. It works, but it performs poorly enough to make it not very usable. I think the reason for this is that
there is no single marker keyword in C#, as function behaves in javascript. In C# I need to look for namespace, class, struct, interface, enum, and so on.
there's a great deal of flexibility with which csharp constructs can be defined. As one example, a class can define base classes as well as implemented interfaces. Another example: The return type for a method isn't a simple word-like string, but can be something messy like Dictionary<String, List<String>> . The index routine needs to handle all those cases, and capture the matches. This makes it run sloooooowly.
I use a lot of looking-back. The marker I use in the current approach is the open curly brace. Once I find one of those, I use looking-back to determine if the curly is a class, interface, enum, method, etc. I read that looking-back can be slow; I'm not clear on how much slower it is than, say, looking-at.
once I find an open-close pair of curlies, I call narrow-to-region in order to index what's inside. I'm not sure whether this kills performance or not. I suspect that it is not the main culprit, because the perf problems I see happen in modules with one namespace and 2 or 3 classes, which means narrow gets called 3 or 4 times total.
What's the Question?
My question is: do you have any tips for speeding up imenu-like indexing in a C# buffer?
I'm considering:
avoiding looking-back. I don't know exactly how to do this because when re-search-forward finds, say, the keyword class, the cursor is already in the middle of a class declaration. looking-back seems essential.
instead of using open-curly as the marker, use the keywords like enum, interface, namespace, class
avoid narrow-to-region
any hard advice? Further suggestions?
Something I've tried and I'm not really enthused about re-visiting: building a wisent-based parser for C#, and relying on semantic to do the indexing. I found semantic to be very very very (etc) difficult to use, hard to discover, and problematic. I had semantic working for a while, but then upgraded to v23.2, and it broke, and I never could get it working again. Simple things - like indexing the namespace keyword - took a very long time to solve. I'm very dissatisfied with it and don't want to try again.
I don't really know C# syntax, and without looking at your elisp it's hard to give an answer, but here goes anyway.
looking-back can be deadly slow. It's the first thing I'd experiment with. One thing that helps a lot is using the limit arg to, say, restrict your search to the beginning of the current line. A different approach is when you hit the open curly do backward-char then backward-sexp (or whatever) to get to the front of the previous word, then use looking-at.
Using keywords to search around instead of open curly is probably what I would have done. Maybe something like (re-search-forward "\\(enum\\|interface\\|namespace\\|class\\)[ \t\n]*{" nil t) then using match-string-no-properties on the first capture group to see which of the keywords was found. This might help with the looking-back problem as well.
I don't know how expensive narrow-to-region is, but it could be avoided: when you find an open curly, do a forward-sexp inside save-excursion and keep the resulting point as a limit for the current iteration of your (I assume recursive) searches.
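The keyword-first idea can be tried out quickly outside Emacs. The sketch below is JavaScript rather than elisp, and it is only an approximation: the sample source, the declaration pattern, and the extra name group are assumptions made for illustration (the regexp suggested above has no name group). It shows one forward scan with an alternation and capture groups, with no backward matching.

```javascript
// Tiny made-up C# sample to scan; not taken from any real module.
const source = `
namespace Demo {
  class Foo : Bar, IBaz {
    enum Color { Red, Green }
  }
  interface IBaz { }
}`;

// Keyword, then an identifier, then anything up to the opening curly.
// Group 1 says which keyword matched, group 2 captures the declared name.
const declaration =
  /\b(enum|interface|namespace|class|struct)\s+([A-Za-z_]\w*)[^{;]*\{/g;

const index = [];
for (const match of source.matchAll(declaration)) {
  index.push({ kind: match[1], name: match[2], position: match.index });
}

console.log(index.map((entry) => `${entry.kind} ${entry.name}`));
```

Methods and properties have no leading keyword, so they would still need a separate pattern; this only covers the keyword-introduced declarations mentioned in the question.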

Is this trivial function silly?

I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
    my $string = shift;
    my $regex = shift;
    my $replace = shift;
    $string =~ s/$regex/$replace/gi;
    return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn the meaning of once, and it's not like it's some weird corner feature. With replace_string, everyone has to learn the details, and it's going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// it's hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
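The "takes a string, not a regex" trap is not specific to Perl. Here is a JavaScript analogue as a minimal sketch; replaceString mirrors the Perl function's behaviour (global, case-insensitive, pattern not escaped), and escapeRegExp is a helper introduced here for illustration, not something from the original code.

```javascript
// Analogue of the Perl sub: the second argument is treated as a pattern.
function replaceString(string, pattern, replacement) {
  return string.replace(new RegExp(pattern, 'gi'), replacement);
}

// Illustrative helper: backslash-escape regex metacharacters so the
// text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// A caller who thinks they are replacing the literal text "a.b":
const naive = replaceString('a.b and axb', 'a.b', '-');

// The same call with the search text escaped first:
const literal = replaceString('a.b and axb', escapeRegExp('a.b'), '-');

console.log(naive);   // "- and -"   (the dot matched any character)
console.log(literal); // "- and axb"
```

With unchecked user input in the pattern position, the mismatch goes beyond surprising results, which is the security concern raised in the list above.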
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.

Why do Perl control statements require braces?

This may look like the recent question that asked why Perl doesn't allow one-liners to be "unblocked," but I found the answers to that question unsatisfactory because they either referred to the syntax documentation that says that braces are required, which I think is just begging the question, or ignored the question and simply gave braceless alternatives.
Why does Perl require braces for control statements like if and for? Put another way, why does Perl require blocks rather than statements, like some other popular languages allow?
One reason could be that some styles dictate that you should always use braces with control structures, even for one liners, in order to avoid breaking them later, e.g.:
if (condition)
    myObject.doSomething();
else
    myObject.doSomethingElse();
Then someone adds something more to the first part:
if (condition)
    myObject.doSomething();
    myObject.doSomethingMore(); // Syntax error next line
else
    myObject.doSomethingElse();
Or worse:
if (condition)
    myObject.doSomething();
else
    myObject.doSomethingElse();
    myObject.doSomethingMore(); // Compiles, but not what you wanted.
In Perl, these kinds of mistakes are not possible, because not using braces with control structures is always a syntax error. In effect, a style decision has been enforced at the language syntax level.
Whether that is any part of the real reason, only Larry's moustache knows.
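The "compiles, but not what you wanted" case is easy to reproduce in any language with optional braces. A minimal JavaScript sketch, with handle and the log array invented for illustration so the calls can be observed:

```javascript
// Records which branches ran, standing in for the myObject.* calls.
function handle(condition) {
  const log = [];
  if (condition)
    log.push('doSomething');
  else
    log.push('doSomethingElse');
    log.push('doSomethingMore'); // indented like the else body, but runs always
  return log;
}

console.log(handle(true));  // [ 'doSomething', 'doSomethingMore' ]
console.log(handle(false)); // [ 'doSomethingElse', 'doSomethingMore' ]
```

The indentation suggests the last call belongs to the else branch, but without braces only the first statement does.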
One reason could be that some constructs would be ambiguous without braces :
foreach (@l) do_something unless $condition;
Does unless $condition apply to the whole thing or just the do_something statement?
Of course this could have been worked out with priority rules or something,
but it would have been yet another way to create confusing Perl code :-)
One problem with braceless if-else clauses is they can lead to syntactic ambiguity:
if (foo)
if (bar)
mumble;
else
tumble;
Given the above, under what condition is tumble executed? It could be interpreted as happening when !foo or foo && !bar. Adding braces clears up the ambiguity without dirtying the source too much. You could then go on to say that it's always a good idea to have the braces, so let's make the language require it and solve the endless C bickering over whether they should be used or not. Or, of course, you could address the problem by getting rid of the braces completely and using the indentation to indicate nesting. Both are ways of making clear, unambiguous code a natural thing rather than requiring special effort.
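For comparison, JavaScript keeps the braceless form and resolves the ambiguity by rule: an else pairs with the nearest preceding if that has no else yet. A small sketch (the run function and the calls array are made up for illustration) showing which reading wins:

```javascript
// Returns the names of the statements that ran for a given foo/bar pair.
function run(foo, bar) {
  const calls = [];
  if (foo)
    if (bar)
      calls.push('mumble');
  else
    calls.push('tumble'); // belongs to "if (bar)" despite the indentation
  return calls;
}

console.log(run(true, false));  // [ 'tumble' ]
console.log(run(false, false)); // []
```

So tumble runs when foo && !bar, and nothing runs when foo is false, whatever the layout implies. Requiring braces removes the need to remember that rule at all.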
In Programming Perl (which Larry Wall co-authored), 3rd Edition, page 113, compound statements are defined in terms of expressions and blocks, not statements, and blocks have braces.
Note that unlike in C and Java, [compound statements] are defined in terms of BLOCKS, not statements. This means that the braces are required--no dangling statements allowed.
I don't know if that answers your question but it seems like in this case he chose to favor a simple language structure instead of making exceptions.
Perhaps not directly relevant to your question about (presumably) Perl 5 and earlier, but…
In Perl 6, control structures do not require parentheses:
if $x { say '$x is true' }
for <foo bar baz> -> $s { say "[$s]" }
This would be horrendously ambiguous if the braces were also optional.
Isn't it that Perl allows you to skip the braces, but then you have to write statement before condition? i.e.
#!/usr/bin/perl
my $a = 1;
if ($a == 1) {
    print "one\n";
}
# is equivalent to:
print "one\n" if ($a == 1);
"Okay, so normally, you need braces around blocks, but not if the block is only one statement long, except, of course, if your statement would be ambiguous in a way that would be ruled by precedence rules not like you want if you omitted the braces -- in this case, you could also imagine the use of parentheses, but that would be inconsistent, because it is a block after all -- this is of course dependent on the respective precedence of the involved operators. In any case, you don't need to put semicolons after closing braces -- it is even wrong if you end an if statement that is followed by an else statement -- except that you absolutely must put a semicolon at the end of a header file in C++ (or was it C?)."
Seriously, I am glad for every explicitness and uniformity in code.
Just guessing here, but "unblocked" loops/ifs/etc. tend to be places where subtle bugs are introduced during code maintenance, since a sloppy maintainer might try to add another line "inside the loop" without realizing that it's not really inside.
Of course, this is Perl we're talking about, so probably any argument that relies on maintainability is suspect... :)