pros and cons of various spell checking modes in emacs - emacs

I'm wondering if anyone could weigh in on pros and cons of different spelling modes for Emacs. Emacswiki-CategorySpelling mentions three modes for spell checking:
Flyspell mode (default one)
Speck mode (seems to be designed to be faster than flyspell)
Wcheck mode (designed to be general purpose)
I'm also interested in which of these modes provide a way for the spell checker to skip parts of a buffer depending on their syntax (for instance, to skip the math-mode parts of a LaTeX document, which are highlighted in brown in AUCTeX mode). Flyspell doesn't seem to do this.

You can do partial flyspell-mode in a number of different ways. One is to use a multi-mode approach, where you define multiple modes in a single buffer, one of which is a mode to edit comments (for example), in which flyspell-mode is enabled. I used to do this for some programming language, but I can't find the config for it any more, so I guess I don't use that language anymore. Anyhow, see mmm-mode for more info there.
A second alternative is to use `flyspell-prog-mode' (which see) which sets up flyspell mode for certain parts of the buffer, defined in this case by the font face (there are specific faces for strings and comments for most programming language major modes). It uses a predicate call-back function, which can be defined however you want it; I maintain TNT, which is an AIM-mode for Emacs, and we use it like so:
(defun tnt-im-mode-flyspell-verify ()
  "This function is used for `flyspell-generic-check-word-p' in TNT."
  (not (get-text-property (point) 'read-only)))
(put 'tnt-im-mode 'flyspell-mode-predicate 'tnt-im-mode-flyspell-verify)
(put 'tnt-chat-mode 'flyspell-mode-predicate 'tnt-im-mode-flyspell-verify)
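For the LaTeX case raised in the question, the same predicate mechanism can be pointed at math. This is only a sketch under a few assumptions: AUCTeX's font-latex fontification is active, so math carries `font-latex-math-face`; the major mode symbol is `latex-mode` or `LaTeX-mode`, depending on the AUCTeX version; and the function name `my-flyspell-skip-latex-math` is made up for illustration. Note that it replaces whatever predicate was registered for those modes before.

```elisp
(defun my-flyspell-skip-latex-math ()
  "Return non-nil when the word just before point should be checked.
Hypothetical helper: words carrying `font-latex-math-face' are skipped."
  (let ((face (get-text-property (max (point-min) (1- (point))) 'face)))
    ;; The face property may be a single face or a list of faces.
    (not (memq 'font-latex-math-face
               (if (listp face) face (list face))))))

;; Assumption: one of these is the major mode symbol in your setup.
(put 'latex-mode 'flyspell-mode-predicate #'my-flyspell-skip-latex-math)
(put 'LaTeX-mode 'flyspell-mode-predicate #'my-flyspell-skip-latex-math)
```

Because the check relies on faces, it only works once the text has been fontified; a syntax-based test would be more robust but needs more code.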
Regarding flyspell vs. speck vs. wcheck -- I've only used flyspell mode. speck seems to be very oriented on what is viewable, which can be fine, but generally I want the whole of whatever document I'm working on to be spell-checked, so I wouldn't want that. wcheck seems to be a generic interface to an external program; I'd guess you're going to have to build up its use yourself. flyspell can be used two different ways: as-you-type, which is how I usually use it, and "batch mode", where a whole region or buffer is checked at once. The former is incredibly fast, and I've never found a reason to look for a better tool. The latter can be a tad slow, especially when there are a lot of misspelled words and the document is large, but I really can't remember waiting more than 15 seconds for it to complete. While watching the screen for 15 seconds and doing nothing can seem like a long time, it's not, really. YMMV, of course.
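The two ways of using flyspell described above could be set up roughly like this. It is a sketch, not a recommended configuration; the command name `my-flyspell-check-all` is invented here, while `flyspell-mode`, `flyspell-prog-mode`, `flyspell-region` and `flyspell-buffer` are the standard flyspell entry points.

```elisp
;; As-you-type checking in text buffers.
(add-hook 'text-mode-hook #'flyspell-mode)
;; In code, check only comments and strings.
(add-hook 'prog-mode-hook #'flyspell-prog-mode)

(defun my-flyspell-check-all ()
  "Batch-check the active region, or the whole buffer if there is none.
Hypothetical convenience command."
  (interactive)
  (if (use-region-p)
      (flyspell-region (region-beginning) (region-end))
    (flyspell-buffer)))
```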
Bottom line: I'd stick with flyspell-mode, assuming it meets your needs, of course.

Related

Lisp source code rewriting system

I would like to take Emacs Lisp code that has been macro expanded and unmacro expand it. I have asked this on the Emacs forum with no success. See:
https://emacs.stackexchange.com/questions/35913/program-rewriting-systems-unexpanded-a-defmacro-given-a-list-of-macros-to-undo
However one would think that this kind of thing, S-expression transformation, is right up Lisp's alley. And defmacro is I believe available in Lisp as it is in Emacs Lisp.
So surely there are program transformation systems, or term-rewriting systems that can be adapted here.
Ideally, in certain situations such a tool would be able to work directly off the defmacro to do its pattern find and replace on. However, even if I have to come up with specific search and replace patterns manually to add to the transformation system, having such a framework to work in would still be useful.
Summary of results so far: Although there have been a few answers that explore interesting possibilities, right now there is nothing definitive. So I think best to leave this open. I'll summarize some of the suggestions. (I've upvoted all the answers that were in fact answers instead of commentary on the difficulty.)
First, many people suggested considering the special case of macros that do expansion only, or as Drew puts it:
macro-expansion (i.e., not expansion followed by Lisp evaluation).
Macro-expansion is another way of saying reduction semantics, or
rewriting.
The current front-runner to my mind is in phils post where he uses a pattern-matching facility that seems specific to Emacs: pcase. I will be exploring this and will post results of my findings. If anyone else has thoughts on this please chime in.
Drew wrote a program called FTOC whose purpose was to convert Franz Lisp to Common Lisp; googling turns up a comp.lang.lisp posting
I found a Common Lisp package called optima with fare-quasiquote. Paulo thinks, however, that this might not be powerful enough, since it doesn't handle backtracking out of the box, though backtracking might be programmed in by hand. Although the generality of backtracking might be nice, I'm not convinced I need it for the most-used situations.
Side note: Some seem put off by the specific application causing my initial interest. (But note that in research, it is not uncommon for good solutions to get applied in ways not initially envisioned.)
So in that spirit, here are a couple of suggestions for changing the end application. A good solution for these would probably translate to a solution for Emacs Lisp. (And if it helps you to pretend I'm not interested in Emacs Lisp, that's okay with me.) Instead of a decompiler for Emacs Lisp, suppose I want to write a decompiler for Clojure or some Common Lisp system. Or, as suggested by Sylwester's answer, suppose I would like to automatically refactor my code to take advantage of more concise macros that exist or that have been improved. Recall that at one time Emacs Lisp didn't have "when" or "unless" macros.
30-some years ago I did something similar, using macrolet.
(Actually, I used defmacro because we had only an early implementation of Common Lisp, which did not yet have macrolet. But macrolet is the right thing to use.)
I didn't translate macro-expanded code to what it was expanded from, but the idea is pretty much the same. You will come across some different difficulties, I expect, since your translation is even farther away from one-to-one.
I wrote a translator from (what was then) Franz Lisp to Common Lisp, to help with porting lots of existing code to a Lisp+Prolog-machine project. Franz Lisp back then was only dynamically scoped, while Common Lisp is (in general) lexically scoped.
And yes, obviously there is no general way to automatically translate Lisp code (in particular), especially considering that it can generate and then evaluate other code - but even ignoring that special case. Many functions are quite similar, but there is the lexical/dynamic difference, as well as significant differences in the semantics of some seemingly similar functions.
All of that has to be understood and taken for granted from the outset, by anyone wanting to make use of the results of translation.
Still, much that is useful can be done. And if the resulting code is self-documenting, telling you what it was derived from etc., then when in the resulting context you can decide just what to do with this or that bit that might be tricky (e.g., rewrite it manually, from scratch or just tweak it). In practice, lots of code was easily converted from Franz to Common - it saved much reprogramming effort.
The translator program was written in Common Lisp. It could be used interactively as well as in batch. When used interactively it provided, in effect, a Franz Lisp interpreter on top of Common Lisp.
The program used only macro-expansion (i.e., not expansion followed by Lisp evaluation). Macro-expansion is another way of saying reduction semantics, or rewriting.
Input Franz-Lisp code was macro-expanded via function-definition mapping macros to produce Common-Lisp code. Code that was problematic for translation was flagged (in code) with a description/analysis that described the situation.
The program was called FTOC. I think you can still find it, or at least references to it, by googling (ftoc lisp). (It was the first Lisp program I wrote, and I still have fond memories of the experience. It was a good way to learn both Lisp dialects and to learn Lisp in general.)
Have fun!
In general, I don't think you can do this. The expansion of a Lisp macro is Turing complete, so you would have to be able to predict the output of a program which could have arbitrary input.
There are some simple things that you could do. defmacros with backquoted forms in them appear fairly similar in the output form and might be detected. This sort of heuristic would probably get you a long way.
What I don't understand is your use case. The macro-expanded version of a piece of code is usually only present in the compiled (or in emacs-lisp byte-compiled) form.
Ok so other people have pointed out the fact that this problem is impossible in general. There are two hard parts to this problem: one is that it could be a lot of work to find a preimage of some code fragment through a macro and it is also impossible to determine whether a macro was called or not—there are examples where one may write code which could have come from a macro without using that macro. Imagine for the sake of illustration an sha macro which expands to the SHA hash of the string literal passed to it. Then if you see some sha hash in your expanded code, it would obviously be silly to try to unexpand it. But it may be that the hash was put into the code as a literal, e.g. referencing a specific point in the history of a git repository so it would also be unhelpful to unexpand the macro.
Tractable subproblems
Let me preface this by saying that whilst these may be a little tractable, I still wouldn’t try to solve this problem.
Let’s ignore all the macros that do weird things (like the example above), all the macros that are just as likely not to have been used in the original (e.g. cond vs if), and all the macros which generate complex code that seems like it would be difficult to unravel (e.g. loop, do, and backquote; annoyingly, these difficult cases are some of those which you would perhaps most want to unexpand). The type this leaves us with (that I’d like to focus on) are macros which basically just reduce boilerplate, e.g. save-excursion or with-XXXX. These are macros whose implementation consists of possibly making some fresh symbols (via gensym) and then having a big simple backquoted block of code. I still think it would be too hard to automatically go from defmacro to a function for unexpansion, but I think you could attack some of these on a case-by-case basis. Do this by looking for the forms generated by the macro that delimit (i.e. begin/end) the expanded code. I can’t really offer much beyond that. This is still a hard problem and I don’t think any existing solutions (to other problems) will get you very far on your way.
A further complication I understand is that you do not start at the macroexpanded code but rather at the bytecode. Without knowing anything about the elisp compiler, I worry that more information would be lost in the compilation step and you would have to undo that as well, e.g. perhaps it is hard to determine which code goes inside a let or even when a let begins, or bytecode starts using goto type features even though elisp doesn’t have them.
You suggest that the reason you would like to unexpand macros is so you can decompile bytecode which sometimes comes up in the Emacs debugger and that this would be useful as even though the source code is available in theory, it isn’t always at your fingertips. I put it to you that if you want to make your life debugging elisp easier it would be more worthwhile to figure out how to have the Emacs debugger always take you to the source code for internal functions. This might involve installing extra debugging related packages or downloading the Emacs source code and setting some variable so Emacs knows where to find it or compiling Emacs yourself from source. I don’t really know about that but I bet getting thrown into bytecode instead of source would have been enough of a problem for Emacs developers over the past thirty years that a solution to that problem does exist.
If however what you really want to do is to try to implement a decompiler for elisp then I suppose that’s what you should do. A final observation is that while Lisp provides facilities which make manipulating Lisp code easy, this doesn’t help much with decompiling as all these facilities can be used in compilation so there are infinitely more patterns one might want to detect than in e.g. a C decompiler. Perhaps scheme style macros would be easier to unexpand, although they would still be hard.
If you’re decompiling because you want to give a better idea of which exact subexpression rather than line is being evaluated (normally Lisp debuggers work on expressions not lines anyway) in the debugger then perhaps it would actually be useful to see the code at the expanded level rather than the unexpanded one. Or perhaps it would be best to see both and maybe in between as well. Keeping track of what’s what through forwards macroexpansion is already difficult and fiddly. Doing it in reverse certainly won’t be easier. Good luck!
Edit: seeing as you're not currently using Lisp anyway, I wonder if you might have more success using something like Prolog for your unexpanding. You’d still have to manually write rules, but I think it would be a large amount of work to try to derive rules from macro definitions.
I would like to take Emacs Lisp code that has been macro expanded and unmacro expand it.
Macros generate arbitrary expressions, which may contain macros recursively. You have no general way to revert the transformations, because it's not pattern-based.
Even if macros were pattern-based, they could still be infinite.
Even if macros were not infinite, they can certainly contain bugs in expansions of patterns that never matched. Given arbitrary code to try to unwind, it could match an expansion that looks like the code and try to revert to its pattern. Without bugs, you could still abuse this.
Even if you could revert macro expansion, some macros expand to the same code. An approach could be signalling a warning with a restart when all reversions expand equally minus the operator, such that if the restart doesn't handle the signal, it would choose the first expansion; and otherwise signalling an error with a restart, such that if the restart doesn't handle the signal, it errors. Or you could configure it to choose certain macros under certain conditions, such as in which package the code was found.
In practice, there are very few cases where reverting an expansion makes any sense. It could be a useful development tool that suggests macros, but I wouldn't generally rely on it for whole source transformations.
One way you could achieve what you want is through a controlled pattern matching. You could initially create patterns manually, which would already handle cases you care about directly, such as the ones you mention:
(if (not <cond>) <expr>) and (if (not <cond>) (progn <&expr>)) to (unless <cond> <&expr>)
You'd have to decide whether null would be equivalent to not. I personally don't mix the boolean meaning of nil with that of empty list or something else, e.g. no result, nothing found, null object, a designator, etc. But perhaps Lisp code as old as that in Emacs just uses them interchangeably.
(if <cond> <expr>) and (if <cond> (progn <&expr>)) to (when <cond> <&expr>)
If you feel like improving code overall, include cond with a single condition. And be careful with cond clauses with only the condition.
You should have a few dozen more, to see how the pattern matching behaves with more patterns to match in terms of time (CPU) and space (memory).
From the description of fare-quasiquote, optima doesn't support backtracking, which you probably want.
But you can do backtracking with optima by yourself, using recursion on complex inner patterns, and if nothing matches, return a control value to keep searching for matching patterns from the outer input.
Another approach is to treat a pattern as a description of a state machine, and handle each new token to advance the current state machines until one of them reaches the end, discarding the state machines that couldn't advance. This approach may consume more memory, depending on the amount of patterns, the similarity between patterns (if many have the same starting token, many state machines will be generated on a matching token), the length of the patterns and, last but not least, the length of the input (s-expression).
An advantage of this approach is that you can use it interactively to see which patterns have matched the most tokens, and you can give weights to patterns instead of just taking the first that matches.
A disadvantage is that, most probably, you'll have to spend effort to develop it.
EDIT: I just lousily described a kind of trie or radix tree.
Once you got something working, maybe try to obtain patterns automatically. This is really hard, you must probably limit it to simple backquoting and accept the fact you can't generalize for anything that contains more complex code.
I believe the hardest will be code walking, which is hard enough with source code, but much more with macro-expanded code. Perhaps if you could expand the whole picture a bit further to understand the goal, maybe someone could suggest a better approach other than operating on macro-expanded code.
However one would think that this kind of thing, S-expression transformation, is right up Lisp's alley. And defmacro is I believe available in Lisp as it is in Emacs Lisp.
So surely there are program transformation systems, or term-rewriting systems that can be adapted here.
There's a huge step from expanding code with defmacro and all that generality. Most Lisp developers will know about hygienic macros, at least in terms of symbols as variables.
But there's still hygienic macros in terms of symbols as operators1, code walking, interaction with a containing macro (usually using macrolet), etc. It's way too complex.
1.
Common Lisp evaluates the operator in a compound form in the lexical environment, and probably everyone makes macros that assume that the global macro or function definition of a symbol will be used.
But it might not be so:
(defmacro my-macro-1 ()
  `1)
(defmacro my-macro-2 ()
  `(my-function (my-macro-1)))
(defun my-function (n)
  (* n 100))
(macrolet ((my-macro-1 ()
             `2))
  (flet ((my-function (n)
           (* n 1000)))
    (my-macro-2)))
That last line will expand to (my-function (my-macro-1)), which will be recursively expanded to (my-function 2). When evaluated, it will yield 2000.
For proper operator hygiene, you'd have to do something like this:
(defmacro my-macro-2 ()
  ;; capture global bindings of my-macro-1 and my-function by name
  (flet ((my-macro-1-global (form env)
           (funcall (macro-function 'my-macro-1) form env))
         (my-function-global (&rest args)
           ;; hope the compiler can optimize this
           (apply 'my-function args)))
    ;; store them globally in uninterned symbols
    ;; hopefully, no one will mess with them
    (let ((my-macro-1-symbol (gensym (symbol-name 'my-macro-1)))
          (my-function-symbol (gensym (symbol-name 'my-function))))
      (setf (macro-function my-macro-1-symbol) #'my-macro-1-global)
      (setf (symbol-function my-function-symbol) #'my-function-global)
      `(,my-function-symbol (,my-macro-1-symbol)))))
With this definition, the example will yield 100.
Common Lisp has some restrictions to avoid this, but it only states the consequences are undefined when (re)defining symbols in the common-lisp package, globally or locally. It doesn't require errors or warnings to be signaled.
I don't think it is possible to do this in general, but you can turn each match of a pattern back into a macro use if you supply code for each unmacroing. Code that mixed cond and if will end up being just if after expansion, and your code would then turn every if into cond, so the reverse would not be the same as the starting point. The more macros you have, and the more they expand into each other, the less certain it is that the end result matches the starting point.
You could have rules such that if is not translated into cond unless it uses one of cond's features, like more than one predicate or implicit progn, but you have no idea whether the coder actually used cond everywhere because he liked it consistent regardless. Thus your unmacroing will actually be more of a simplification.
I don't believe there's a general solution to that, and you certainly
can't guarantee that the structure of the output would match that of
the original code, and I'm not going near the idea of auto-generating
patterns and desired transformations from macro definitions; but you
might achieve a simple version of this with Emacs' own pcase pattern
matching facility.
Here's the simplest example I could think of:
With reference to the definition of when:
(defmacro when (cond &rest body)
  (list 'if cond (cons 'progn body)))
We can transform code using a pcase pattern like so:
(let ((form '(if (and foo bar baz) (progn do (all the) things))))
  (pcase form
    (`(if ,cond (progn . ,body))
     `(when ,cond ,@body))
    (_ form)))
=> (when (and foo bar baz) do (all the) things)
Obviously if the macro definitions change, then your patterns will
cease to work (but that's a pretty safe kind of failure).
Caveat: This is the first time I've written a pcase form, and I
don't know what I don't know. It seems to work as intended, though.
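The single-form example above could be stretched to a whole tree along these lines. This is a rough sketch, not a real code walker: the function names `my-unexpand-1` and `my-unexpand` are invented, the `unless` rule uses the hand-written (if (not ...) (progn ...)) shape suggested in another answer rather than what Emacs's own `unless` expands to, and every proper sublist except quoted data is naively treated as code, so binding lists and the like could be rewritten by mistake.

```elisp
(defun my-unexpand-1 (form)
  "Rewrite FORM if its top level matches a known expansion shape.
Hypothetical helper; the rules are hand-written, not derived from macros."
  (pcase form
    ;; More specific rule first: (if (not TEST) (progn BODY...)).
    (`(if (not ,test) (progn . ,body)) `(unless ,test ,@body))
    (`(if ,test (progn . ,body)) `(when ,test ,@body))
    (_ form)))

(defun my-unexpand (form)
  "Apply `my-unexpand-1' bottom-up to FORM, leaving quoted data alone."
  (if (and (consp form)
           (not (eq (car form) 'quote))
           (null (cdr (last form))))   ; only walk proper lists
      (my-unexpand-1 (mapcar #'my-unexpand form))
    form))

(my-unexpand '(defun f (x)
                (if (not x) (progn (a) (b)))
                (if x (progn (c)))))
;; => (defun f (x) (unless x (a) (b)) (when x (c)))
```

Rule order matters here: the `unless` rule has to come before the more general `when` rule, otherwise it would never fire.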

Is it possible to turn off the enforced vertical alignment in clojure-mode?

I'm using Emacs 25.1, with the latest stable release of clojure-mode. When clojure-mode (not in other major modes like js2-mode) is turned on, everything is strictly aligned, no extra whitespace is allowed before any forms. For example,
(def
  last (fn ^:static last [x]
         (if (next x)
           (recur (next x))
           (first x))))
I can't insert a space (automatically removed after insertion) before the first non-whitespace character on each line. This kind of behavior is just not desirable to me. I tried changing the variables in the Clojure group, but nothing seemed to work. How can I turn this behavior off?
It isn't exactly clear what you mean, but I suspect you are referring to the way
Clojure will align forms in things like let bindings so that all the symbols and
values being bound are aligned i.e.
(let [a-val1      (something-1)
      another-val (something-2)
      final-col   12]
  (do-some-stuff))
rather than
(let [a-val1 (something-1)
      another-val (something-2)
      final-col 12]
  (do-some-stuff))
If this is the case, then there are a few things you can try.
Look at the variable clojure-align-forms-automatically
clojure-align-forms-automatically is a variable defined in
‘clojure-mode.el’. Its value is t. Original value was nil.
This variable is safe as a file local variable if its value
satisfies the predicate ‘booleanp’.
Documentation: If non-nil, vertically align some forms automatically.
Automatically means it is done as part of indenting code. This
applies to binding forms (‘clojure-align-binding-forms’), to cond
forms (‘clojure-align-cond-forms’) and to map literals. For instance,
selecting a map a hitting ‘M-x indent-for-tab-command’ will align the
values like this:
{:some-key 10
 :key2     20}
You can customize this variable.
This variable was introduced, or its default value was changed, in
version 5.1 of the clojure-mode package.
The other variable which you might want to look at is clojure-indent-style
clojure-indent-style is a variable defined in ‘clojure-mode.el’. Its
value is ‘:always-align’
This variable is safe as a file local variable if its value
satisfies the predicate ‘keywordp’.
Documentation: Indentation style to use for function forms and macro
forms. There are two cases of interest configured by this variable.
Case (A) is when at least one function argument is on the same line as the function name.
Case (B) is the opposite (no arguments are on the same line as the function name). Note that the body of macros is not affected by
this variable, it is always indented by ‘lisp-body-indent’ (default
2) spaces.
Note that this variable configures the indentation of function forms
(and function-like macros), it does not affect macros that already use
special indentation rules.
The possible values for this variable are keywords indicating how to
indent function forms.
‘:always-align’ - Follow the same rules as ‘lisp-mode’. All
args are vertically aligned with the first arg in case (A),
and vertically aligned with the function name in case (B).
For instance:
(reduce merge
        some-coll)
(reduce
 merge
 some-coll)
‘:always-indent’ - All args are indented like a macro body.
(reduce merge
  some-coll)
(reduce
  merge
  some-coll)
‘:align-arguments’ - Case (A) is indented like ‘lisp’, and
case (B) is indented like a macro body.
(reduce merge
        some-coll)
(reduce
  merge
  some-coll)
You can customize this variable.
This variable was introduced, or its default value was changed, in
version 5.2.0 of the clojure-mode package.
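Put together, setting the two variables from an init file might look like the fragment below. Treat it as a sketch: the chosen values are only examples, and the keyword form `:always-indent` follows the documentation quoted above; other clojure-mode versions may expect a different form, so check C-h v clojure-indent-style in your installation first.

```elisp
(with-eval-after-load 'clojure-mode
  ;; Do not vertically align binding forms, cond forms and map literals.
  (setq clojure-align-forms-automatically nil)
  ;; Example value only; see the three styles described above.
  (setq clojure-indent-style :always-indent))
```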
There are some other alignment/indentation variables which you may also want to
check out - try M-x customize-group clojure-mode and M-x customize-group
cider to browse and see if anything is relevant. You might also find
something relevant on the cider documentation site. In particular, look at the
manual section on indentation
UPDATE EDIT: Based on additional information from the OP communicated in comments, I
decided to edit and extend this answer. I've left the original response as I
feel it may be useful to others who search and find the OP's question. However,
with the additional info in the comments, I don't think the answer addresses the
OP's actual issue, so I have extended the answer below, which I hope will help.
Some Emacs modes are more rigid or enforce code format more strictly than
others. This is particularly the case with very regular languages like Clojure
(and most Lisps generally) where the syntax is minimal and the rules regarding
code indentation are easier to define and tend to have wide consensus.
The situation for the OP is further complicated because they are using a
pre-defined Emacs configuration - in this case Steve Purcell's emacs.d,
which is one of my favourite pre-defined or canned Emacs configurations. The
one drawback with these pre-defined configurations is that they will turn on and
define many optional features of the Emacs editor which may or may not be
in line with the user's personal preferences. Emacs tends to have a very conservative
position when it comes to new features or enhancements. Often, they are disabled
by default to avoid impacting new users. The cost for this conservative approach
is that over time, Emacs can seem primitive or less feature rich compared to
other editors to new users who expect some of this behaviour to be enabled by
default. By using a canned configuration, you get one person's preferred setup
without having to go through the often long and difficult process of doing it
yourself. The downside is that when it does not match with the user's
expectations, the user does not have the knowledge or understanding to make the
changes and it is difficult to get help because others don't know/understand
what their configuration is already.
As an example of how using these pre-defined setups can complicate matters, when I
last looked at the Purcell configuration, it used the ELPA package
aggressive-indent, which enforces more rigid indentation rules and it could
well be this package rather than clojure-mode which is enforcing the rigid
indentation rule.
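If aggressive-indent does turn out to be the cause, something along these lines might switch it off for Clojure buffers only. It is a guess at a workaround rather than a tested fix for that particular configuration: it assumes the package is enabled either per buffer from a hook or through its global mode, and it uses the package's `aggressive-indent-mode` command and `aggressive-indent-excluded-modes` list.

```elisp
;; Case 1: the mode was switched on per buffer by an earlier hook.
;; The final `t' appends this function so it runs after that hook.
(add-hook 'clojure-mode-hook
          (lambda ()
            (when (bound-and-true-p aggressive-indent-mode)
              (aggressive-indent-mode -1)))
          t)

;; Case 2: `global-aggressive-indent-mode' is in use.
(with-eval-after-load 'aggressive-indent
  (add-to-list 'aggressive-indent-excluded-modes 'clojure-mode))
```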
The OP mentions they are concerned about this auto-formatting, as it could
cause problems when contributing to other projects: auto-formatting can make
the code look like there have been more changes than have
actually occurred, because version control picks up the whitespace
adjustments. This issue mainly comes up over differences due to the use of tabs
and spaces. To a large extent, such issues are less frequent these days, as most
version control systems can be configured to ignore whitespace changes.
In this case, my recommendation is to do nothing for now as there isn't a real
issue yet. Continue to use the canned configuration and continue to ask
questions, but also spend some time trying to learn and understand the
configuration. At some point, once you're comfortable with Emacs, you will likely
want to re-configure the system to better meet your own personal taste. By this
time, you will have a better understanding of Emacs, the various options it has
and the way different modes work. When you run into specific real issues which
you cannot solve, then post another question. It is likely at that point, you
will have concrete information and someone will be able to provide specific
help.

How to disable special handling of calling convention examples in emacs-lisp-mode?

As described here, emacs-lisp-mode provides for special handling of s-expressions in docstrings that start in the first column. This requires them to be escaped with a backslash to avoid mucking up font-lock later on in the file.
This may be a feature for elisp, but is unfortunate in other lisp modes that reuse emacs-lisp-mode for convenience that don't have special handling of expressions in docstrings, as described/shown here.
My question is, is there any way for such "descendant" modes to configure emacs-lisp-mode to disregard "calling convention expressions" in docstrings?
The short answer is no.
The longer answer is that those other modes are simply broken. They should adapt to Emacs Lisp in this regard. There is no reason not to, is there? It is simply a bad idea to use workarounds (e.g. indenting all doc-string lines) such as are suggested in the link you provided (and its linked duplicate post).
Emacs doc strings are not trivial strings. They have several special properties, including the handling of \\[...], \\{...}, and \\<...>, as well as the property you mention here.
If some mode cannot adjust to Emacs doc strings then it should use macros that define the things it needs without creating Emacs doc strings for them but by handling a different string argument in the special way desired. IOW, create pseudo doc strings that correspond to what the mode wants instead of what Emacs wants.
Of course, that means that you cannot directly take advantage of the Emacs documentation features. You would need to also define mode-specific doc commands that would, for example, wrap the existing doc functions such as describe-function with code that picks up the mode's pseudo-doc string and DTRT, following the mode's conventions instead of the Emacs doc-string conventions.
But I would think that the easiest approach would be to just adapt the mode to the existing Emacs behavior, so that it DTRT.
Many Emacs programming modes, and the various Lisp modes are no exception, have been implemented as parsers based on regular expressions. This, unfortunately, gives the editor little idea of the structure of the document being edited. Eclipse, for example, has a very different idea of how to edit code, which is more structured, and JetBrains MPS editors are even more rigid and structured in this sense (almost like spreadsheets).
This makes Emacs modes faster and easier to implement, but it also means the code that supports the proper indentation, syntactic validation and highlighting has to re-parse more text every time it is being edited. CEDET, afaik, is trying to address this issue.
Thus, historically, there have been conventions designed to reduce the amount of code to parse on each edit. A parenthesis in the first column is one such convention. However, it has also been known to be an annoyance sometimes, which is why there is an open-paren-in-column-0-is-defun-start variable that one can set to nil to inhibit this behaviour.
But it's hard to say exactly what performance issues you may face when changing this setting. Lisp grammar is very regular, unless you are using many reader macros, so perhaps that won't be a problem.
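For a mode derived from emacs-lisp-mode, the variable could be turned off buffer-locally from the mode's hook. A minimal sketch, where `my-lisp-dialect-mode-hook` is a placeholder for whatever hook the derived mode actually provides:

```elisp
;; `my-lisp-dialect-mode-hook' is a hypothetical hook name.
(add-hook 'my-lisp-dialect-mode-hook
          (lambda ()
            ;; Stop treating "(" in column 0 as the start of a defun,
            ;; in this buffer only.
            (setq-local open-paren-in-column-0-is-defun-start nil)))
```

Setting it buffer-locally keeps the default behaviour, and whatever speed benefit it brings, in ordinary Emacs Lisp buffers.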
If beginning-of-defun-function is set accordingly, i.e. to a function that checks whether point is inside a comment or string, there should be no need for such escaping.

Custom emacs nxml indentation

For the last 3 years, my work required writing and editing configuration files in xml format. The content in the xml tags has evolved so much that now it has become kind of like a programming language. Unfortunately, emacs indents everything inside the tags at the same level. Something like this:
But it'd be wicked if I could get the content indented like the following:
I've read several threads related to custom indentation, but I still don't have any clue how to do it.
I've tried to create a custom major mode, but doing that killed all the syntax colors and indentation rules. Ideally, what I'd like to do is just modify the nxml major mode's indentation rules. Still, I don't know where these rules are.
Also it would be a bonus if I could color some key words, like 'if' or 'set'.
I know that what I'm asking is a big job, so I'm not asking for a definitive answer here. I'm just looking for some help to point me to the right direction.
Probably the best way to do it is to write a new major mode for the language inside the tags and then use a multiple-major-modes package, which allows more than one major mode to be active in the same file, so the contained language can have its own major mode.
You might want to have a look at this example.
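(defun nxml-extra-space-indent ()
  "Indent with `nxml-indent-line', then move unindented lines to column 4."
  (nxml-indent-line)
  ;; Lines that nxml leaves at column 0 get a fixed indentation of 4.
  (when (zerop (current-indentation))
    (indent-line-to 4)))
;; Evaluate this in the nXML buffer whose indentation you want to change.
(setq indent-line-function 'nxml-extra-space-indent)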
I indent according to an existing major mode, and then customize the indentation if the line matches certain criteria.
I think this is the easiest way to customize indentation, as you are starting from something existing, and you don't need to know how the indentation logic of a particular mode is implemented.
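To apply the same idea to every nXML buffer, one option is to install the wrapper from nxml-mode-hook. This is only a sketch: the my- names are invented here, and the "column 0 becomes column 4" test is a stand-in for whatever criteria your format actually needs.

```elisp
(require 'nxml-mode)

;; Hypothetical wrapper: run the stock nXML indentation first, then
;; adjust the result when the line matches a custom rule.
(defun my-nxml-indent-line ()
  "Indent the current line with nXML's rules, then apply a custom tweak."
  (nxml-indent-line)
  (when (zerop (current-indentation))
    (indent-line-to 4)))

;; Hypothetical hook function: make the wrapper the buffer-local
;; indentation function each time nxml-mode starts.
(defun my-nxml-install-indent ()
  "Use `my-nxml-indent-line' for indentation in the current buffer."
  (setq-local indent-line-function #'my-nxml-indent-line))

(add-hook 'nxml-mode-hook #'my-nxml-install-indent)
```

Because the hook runs after nxml-mode has set up its own indentation, the syntax colors and the rest of the mode stay untouched.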

Why is there no code-folding in emacs?

There are several questions on SO about how to get code folding in emacs, without having to add any special characters like "markers" in the comments for example. Someone said that there was "no perfect solution."
It seems that it could be done by parsing the source of the program being written and looking for matching parentheses or brackets, or by doing it based on indentation. You could also use a combination of scripts that use different methods.
So why is it commonly accepted that there is no "perfect" and straightforward way to get code-folding in emacs? Is there something in emacs or its architecture that makes it hard to program? If it were easy, after so many years of smart people using emacs you would think that someone would have written it.
You should play with Hideshow (hs-minor-mode) combined with fold-dwim.el. It does exactly what you suggested -- looks for matching braces/parens, and can be set up to fall back on the indentation.
There's a robust folding solution out there for most common languages, and if there isn't, all the folding packages are highly customizable. In fact, the only downside is the proliferation of folding methods (fold-dwim helps quite a bit with that); I used to think that because nobody could point me to a definitive solution, folding was hard or impossible — in fact, the opposite is true. You just have to experiment a little to see what works best for you.
I have used folding.el (e.g. to group stuff in my .emacs), outline-minor-mode, and now Hideshow. There's some chance that none of them would work exactly the way you want right out of the box (e.g. you might need to set up an outline regex, or define folding marks for folding.el), but it turns out to be easy. The default keybindings can be somewhat baroque, but this is remedied by fold-dwim and/or hideshow-org (highly recommended for Hideshow, cf the Emacswiki hideshow page; you can also mimic hideshow-org's behavior for other folding modes with some quick-and-dirty elisp and fold-dwim). Once you figure out your preferred setup, just turn it on automatically via hooks or buffer-local variables, and watch your code fold away :)
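As a rough illustration of turning it on via hooks, here is a sketch that enables the built-in Hideshow in programming modes and binds its toggle command. The C-c f binding is an arbitrary choice for this example, not a default.

```elisp
;; Enable Hideshow in every buffer whose major mode derives from prog-mode.
(add-hook 'prog-mode-hook #'hs-minor-mode)

;; Bind a single key to fold/unfold the block at point.
;; "C-c f" is an assumption; pick any key you prefer.
(with-eval-after-load 'hideshow
  (define-key hs-minor-mode-map (kbd "C-c f") #'hs-toggle-hiding))
```

With this in place, pressing C-c f inside a function body or a brace block should collapse it, and pressing it again should expand it.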
You should look into CEDET. It does code-folding just fine, and many other fancy features that you're probably looking for if you're switching from an IDE to Emacs.
http://cedet.sourceforge.net/
Specifically, look for `global-semantic-tag-folding-mode'
You don't need anything extra, just enable outline-minor-mode for file types you want to fold.
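For example, a sketch for Emacs Lisp files, which already define a suitable outline-regexp. The C-c o binding is an assumption made for this example; other modes may need their own outline-regexp before folding is useful.

```elisp
;; Turn on outline folding for Emacs Lisp buffers.
(add-hook 'emacs-lisp-mode-hook #'outline-minor-mode)

;; Bind a short key to show or hide the body under the current heading.
;; "C-c o" is an arbitrary choice, not a default binding.
(with-eval-after-load 'outline
  (define-key outline-minor-mode-map (kbd "C-c o") #'outline-toggle-children))
```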
But in fact, there ARE various solutions for Emacs; I have listed some of them (those I have happened to come across) at http://en.wikipedia.org/w/index.php?title=Code_folding&oldid=375300945#cite_note-2.
Though, there are things I'm missing: in some cases, I'd like to combine several mechanisms. For example, for markdown, I'd like to use outline-based folding (for sections) and indentation-based folding (for quotations, code blocks etc.), in order not to bother with implementing a complete parser for markdown.
Here they are:
Token-based folding in Emacs
Token-based folding in Emacs is implemented by the folding minor mode.
Indentation-based folding in Emacs
One can use the set-selective-display function in Emacs to hide lines based on the indentation level, as suggested in the Universal code folding note.
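A small sketch of how this can be wrapped into a toggle command. The command name my-toggle-indent-fold and the C-c s binding are made up for this example; only set-selective-display and the selective-display variable come from Emacs itself.

```elisp
;; Hypothetical helper around `set-selective-display'.
(defun my-toggle-indent-fold (&optional column)
  "Hide lines indented deeper than point's column, or show them again.
With a prefix argument COLUMN, hide lines indented by at least
COLUMN columns instead."
  (interactive "P")
  (set-selective-display
   (cond (column (prefix-numeric-value column)) ; explicit level requested
         (selective-display nil)                ; already folded: unfold
         (t (1+ (current-column))))))           ; fold below point's column

;; "C-c s" is an arbitrary key chosen for this sketch.
(global-set-key (kbd "C-c s") #'my-toggle-indent-fold)
```

Calling it with point at the start of a top-level definition hides every indented line in the buffer; calling it again restores them.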
Syntax-dependent folding in Emacs
Syntax-dependent folding in Emacs is supported by:
the outline and allout modes, for special dedicated "outline" syntaxes;
the hideshow minor mode, for some programming languages;
the semantic-tag-folding minor mode and the senator-fold-tag command, for syntaxes supported by semantic;
doc-mode, for JavaDoc or Doxygen comments;
TeX-fold-mode, the sgml-fold-element command, and the nxml-outln library, in the corresponding language-specific modes;
and possibly other modes for particular syntaxes.
Several folding mechanisms are unified by the fold-dwim interface.
See also http://www.emacswiki.org/emacs/CategoryHideStuff.
Folding of user-selected regions in Emacs
Folding of user-selected regions in Emacs is implemented by the hide-region-hide command.
I have been using folding-mode for quite some time. With auto-insert templates and abbrevs it works quite well for me for some nice bricks of code.
Being able to produce the buffer folded (for printing/emailing) has always been a desire of mine. Some of my folding tags are for secure / password hiding.
I know this is a bit old but for me origami.el works perfectly well out of the box.
Yes, code folding is finally there in Emacs. Try yafolding, available from the melpa.org package archive.