How does one implement the typedef hack in an antlr4 grammar - macros

I don't need typedefs exactly. I need aliases (for a shell language). But the hack of looking up an identifier and returning a different token type is what I need to make the grammar work. I don't necessarily need it to be done in the lexer, although that would seem cleanest to me (or in a phase between the lexer and parser).
Here is (a fragment of) the closest I can seem to come to a solution given what I know of ANTLR4, but it requires a whole level of non-terminals for each keyword token. Note that, per ANTLR4 convention, capitalized words are tokens and lower-case words are non-terminals.
aliasstmt: alias ident ident; // rule that makes aliases
ifstmt: if expression then statement; // sample rule with two keywords
// non-terminals converting aliases into keywords
alias: Alias // normal token for keyword
// hack: LookupAlias is the alias-map lookup I need.
| { LookupAlias(_input.LT(1).getText()).equals("alias") }? Ident
;
if : If
| { LookupAlias(_input.LT(1).getText()).equals("if") }? Ident
;
then : Then
| { LookupAlias(_input.LT(1).getText()).equals("then") }? Ident
;
// Non-terminal going the other way, converting keywords to identifiers when needed
ident : Ident
| Alias
| If
| Then
;
Now, I suppose, I could get rid of the Tokens for the keywords and do it all in the parser for this example. It wouldn't completely work in the language I'm parsing because a significant number of the keywords have "normal" spellings like "Set-Alias" or "-Name" which are not legal identifiers (and "Set - Alias" or "Set -Alias" is not the same as "Set-Alias", uggh).
However, I want the LookupAlias() function to be its own Java class, not something just embedded in the parser. There are other times I need to use it that aren't part of parsing, and those uses need to be coordinated with parsing. How to do that is a separate question I will ask.

(Caveat... maybe aliases can be used in a shell in places I don’t know about, so this is based on my understanding)
In a shell, an alias is essentially an identifier that is expanded when it's encountered. It's only expected where a command could occur, and since you can't know all the commands in the path, your grammar would likely have an IDENTIFIER token (or the like) at that location in the parser rule.
You’d then check it against a list of built-in commands, commands in your PATH, and aliases (I’m not sure of the precedence, TBH).
So, you'd need to keep a symbol table to look up the alias resolution. I think post-resolution is where things will get "tricky". IIRC, aliases don't have to be syntactically complete, so you couldn't really expect to pre-parse them (they possibly won't parse correctly). Also, they are pretty much "injected" into the input stream. In this way they're much more like pre-processor macros. I don't see much way around detecting them, building an expanded input stream and lexing/parsing it.
I suppose that you could write a custom TokenStream, that detected aliases and responded to getNextToken() (and methods to get the token at a particular index, etc.). That would allow aliases anywhere in the token stream, which could get weird, and it would be the devil, probably, to provide useful error messages. (I guess you’d just have to point them at the alias itself). This approach would supply the alias definition tokens in place of the alias as the parser asked for the next token. I don’t see a way that you’ll use actions/predicates to change ANTLRs mind about what token it just saw :).
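For what it's worth, here is a rough Java sketch of that substitution idea for the simpler "typedef hack" case (remapping a single identifier token); ShellLexer, AliasTable, tokenTypeFor and the token constants are placeholders, not real names from any grammar. It subclasses the generated lexer rather than the token stream, but the hook point is the same: swap the token before the parser ever sees it.
import org.antlr.v4.runtime.*;
// Rough sketch only: ShellLexer, AliasTable and the token-type constants
// (Ident, If, Then, ...) stand in for whatever your grammar generates.
public class AliasAwareLexer extends ShellLexer {
    private final AliasTable aliases;   // the standalone LookupAlias class

    public AliasAwareLexer(CharStream input, AliasTable aliases) {
        super(input);
        this.aliases = aliases;
    }

    @Override
    public Token nextToken() {
        Token t = super.nextToken();
        if (t.getType() == Ident) {
            // If the identifier names an alias, hand the parser the token
            // type of the keyword it stands for instead of Ident.
            Integer keywordType = aliases.tokenTypeFor(t.getText());
            if (keywordType != null) {
                CommonToken remapped = new CommonToken(t);
                remapped.setType(keywordType);
                return remapped;
            }
        }
        return t;
    }
}
A macro-style alias that expands to several tokens would need the same hook, but with a small queue that nextToken() drains before pulling more input from the underlying lexer.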
I suspect that playing with existing shells a bit, feeding invalid alias substitutions into the command line, and observing the error messages might give insight into how other shells handle it. My impression is that the shell preprocesses the input, substitutes things like aliases and ENV variables, and then re-parses the result for execution.
I'm pretty sure that trying to modify the token stream while the parser is already processing it is either not doable, or the path to madness.


gcc precompiler directive __attribute__ ((__cleanup__)) vs ((cleanup)) (with vs without underscores?)

I'm learning about gcc's cleanup attribute, and learning how it calls a function to be run when a variable goes out of scope, and I don't understand why you can use the word "cleanup" with or without underscores. Where is the documentation for, or documentation of, the version with underscores?
The gcc documentation above shows it like this:
__attribute__ ((cleanup(cleanup_function)))
However, most code samples I read, show it like this:
__attribute__ ((__cleanup__(cleanup_function)))
Ex:
http://echorand.me/site/notes/articles/c_cleanup/cleanup_attribute_c.html
http://www.nongnu.org/avr-libc/user-manual/atomic_8h_source.html
Note that the first example link states they are identical, and of course coding it proves this, but how did he know this originally? Where did this come from?
Why the difference? Where is __cleanup__ defined or documented, as opposed to cleanup?
My fundamental problem lies in the fact that I don't know what I don't know, therefore I am trying to expose some of my unknown unknowns so they become known unknowns, until I can study them and make them known knowns.
My thinking is that perhaps there is some globally-applied principle to gcc preprocessor directives, where you can arbitrarily add underscores before or after any of them? -- Or perhaps only some of them? -- Or perhaps it modifies the preprocessor directive or attribute somehow and there are cases where one method, with or without the extra underscores, is preferred over the other?
You are allowed to define a macro cleanup, as it is not a name that is reserved to the compiler. You are not allowed to define one named __cleanup__. This guarantees that your code using __cleanup__ is unaffected by other code (provided that other code behaves, of course).
As https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute-Syntax explains:
You may optionally specify attribute names with __ preceding and following the name. This allows you to use them in header files without being concerned about a possible macro of the same name. For example, you may use the attribute name __noreturn__ instead of noreturn.
(But note that attributes are not preprocessor directives.)
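A small self-contained illustration of the point (the function and macro names here are invented for the example): if some other header happens to define a macro named cleanup, the plain spelling breaks while the underscored one keeps working.
#include <stdio.h>
#include <stdlib.h>
/* Imagine a third-party header did this; "cleanup" is not a reserved name,
   so nothing stops someone else's code from defining it. */
#define cleanup some_unrelated_macro
/* Receives a pointer to the variable when it goes out of scope. */
static void free_ptr(void **p) {
    free(*p);
    puts("freed");
}
int main(void) {
    /* __attribute__((cleanup(free_ptr))) would now macro-expand to
       __attribute__((some_unrelated_macro(free_ptr))) and fail to compile;
       the reserved spelling below is immune to the macro. */
    void *buf __attribute__((__cleanup__(free_ptr))) = malloc(16);
    (void)buf;
    return 0;
}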

At which lines in my MATLAB code is a variable accessed?

I am defining a variable at the beginning of my source code in MATLAB. Now I would like to know at which lines this variable affects something. In other words, I would like to see all lines in which that variable is read. This does not only include all accesses in the current function, but also possible accesses in sub-functions that use this variable as an input argument. That way I can quickly see where a change to this variable has any influence.
Is there any possibility to do so in MATLAB? A graphical marking of the corresponding lines would be nice but a command line output might be even more practical.
You may always use "Find Files" to search for a certain keyword or expression. In my R2012a/Windows version it is in Edit > Find Files..., with the keyboard shortcut [CTRL] + [SHIFT] + [F].
The result will be a list of lines where the searched string is found, in all the files found in the specified folder. Please check out the options in the search dialog for more details and flexibility.
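If a command-line output is preferred over the dialog, a plain text search can be scripted too; a quick sketch (myVar and the folder are placeholders, and it needs R2016b or newer for the recursive dir, splitlines and contains):
% Print file name, line number and line content for every line that
% mentions myVar in all .m files below the current folder.  This is only a
% textual search, exactly like Find Files; it knows nothing about scoping.
files = dir('**/*.m');
for k = 1:numel(files)
    txt   = fileread(fullfile(files(k).folder, files(k).name));
    lines = splitlines(txt);
    hits  = find(contains(lines, 'myVar'));
    for h = hits(:)'
        fprintf('%s:%d: %s\n', files(k).name, h, strtrim(lines{h}));
    end
end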
Later edit: thanks to @zinjaai, I noticed that @tc88 asked that this tool should also track the effect of the variable inside the functions/sub-functions it is passed to. I think this is:
1. Very difficult to achieve. The problem of running through all the possible values and branching on every possible conditional expression is... well, it is hard. I think it is halting-problem-hard.
2. In 90% of the cases, the assumption that the output of a function is influenced by the input holds. And since the input and the output are part of the same statement (assigning the result of a function), looking for where the variable is used as an argument should suffice to identify which output variables are affected.
3. There are perverse cases where functions alter arguments that are handle-type (because the argument is not copied, but referenced). This side effect breaks assumption 2, and is one of the main reasons for point 1. Outlining the cases in which these side effects take place is, again, hard, and it is better to assume that all of the arguments are modified.
4. Some other cases are inherently undecidable, because they don't depend on the computer's state, but on the state of the "outside world". Example: suppose one calls uigetfile. The function returns a char type when the user selects a file, and a double type when the user chooses not to select a file. Obviously the two cases will be treated differently. How could you know which variables are created/modified before the user decides?
In conclusion: I think that human intuition, plus the MATLAB Debugger (for run time), Find Files (to quickly see where a variable is used) and depfun (to quickly identify function dependencies), is way cheaper. But I would like to be wrong. :-)

Invalid character stream macros

The following preprocessor macro:
#define _VARIANT_BOOL /##/
is not actually valid C; roughly speaking, the reason is that the preprocessor is defined as working on a stream of tokens, whereas the above assumes that it works on a stream of characters.
On the other hand, unfortunately the above actually occurs in a Microsoft header file, so I have to handle it anyway. (I'm working on a preprocessor implementation.)
What other cases have people encountered in the wild, be it in legacy code however old (as long as that code may still be in use), of preprocessor macros that are not actually valid, but work anyway because they were written under compilers that use a character-oriented preprocessor implementation?
(Rationale: I'm trying to get some idea in advance how many special cases I'm going to have to hack, if I write a proper clean standard-conforming token oriented implementation.)
The relevant part of the standard (§6.10.3.3 The ## operator) says:
If the result is not a valid preprocessing token, the behavior is undefined.
This means that your preprocessor can do anything it likes and still be standard conforming, including emulating the common behaviour.
I think you can still have a "token-based" implementation and support this behaviour, by specifying that when the result of the ## operator is not a valid preprocessing token, the result is the two operand tokens unchanged. You may also want to have your preprocessor emit a warning about the invalid code.
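For illustration, here is a simplified stand-in for the Microsoft usage (not a quote from the actual header); the snippet is deliberately not valid standard C, which is the whole point of the question.
#define _VARIANT_BOOL /##/
/* With a character-oriented preprocessor the paste simply produces "//",
   so the line below expands to "// bool;" and the rest of the line is
   swallowed as a comment.  The fallback suggested above (keep the two '/'
   operand tokens unchanged and adjacent) reproduces the same text when the
   token stream is written back out, provided no space is inserted between
   them. */
_VARIANT_BOOL bool;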

Using regexp to index a file for imenu, performance is unacceptable

I'm producing a function for imenu-create-index-function, to index a source code module, for csharp-mode.el
It works, but delivers completely unacceptable performance. Any tips for fixing this?
The Background
I looked at js.el, which is the rebadged "espresso" now included, since v23.2, in Emacs. It indexes JavaScript files very nicely, doing a good job with anonymous functions and the various coding styles and patterns in common use. For example, in JavaScript one can do:
(function() {
  var x = ... ;
  function foo() {
    if (x == 1) ...
  }
})();
...to define a scope where x is "private" or inaccessible from other code. This gets indexed nicely by js.el, using regexps, and it indexes the inner functions (anonymous or not) within that scope also. It works quickly. A big module can be indexed in less than a second.
I tried following a similar approach in csharp-mode, but it's quite a bit more complicated. In JS, everything that gets indexed is a function. So the starting regex is "function" with some elaboration on either end. Once an occurrence of the function keyword is found, there are 4-8 other regexps that get tried via looking-at - the number depends on settings. One nice thing about js mode is that you can turn regexps for various coding styles on or off, to speed things along I suppose. The default "styles" work for most of the code I tried.
This approach doesn't carry over to csharp-mode. It works, but it performs poorly enough to make it not very usable. I think the reasons for this are:
there is no single marker keyword in C# that plays the role function does in JavaScript. In C# I need to look for namespace, class, struct, interface, enum, and so on.
there's a great deal of flexibility with which csharp constructs can be defined. As one example, a class can define base classes as well as implemented interfaces. Another example: The return type for a method isn't a simple word-like string, but can be something messy like Dictionary<String, List<String>> . The index routine needs to handle all those cases, and capture the matches. This makes it run sloooooowly.
I use a lot of looking-back. The marker I use in the current approach is the open curly brace. Once I find one of those, I use looking-back to determine if the curly is a class, interface, enum, method, etc. I read that looking-back can be slow; I'm not clear on how much slower it is than, say, looking-at.
once I find an open-close pair of curlies, I call narrow-to-region in order to index what's inside. I'm not sure whether this kills performance or not. I suspect that it is not the main culprit, because the perf problems I see happen in modules with one namespace and 2 or 3 classes, which means narrow gets called only 3 or 4 times total.
What's the Question?
My question is: do you have any tips for speeding up imenu-like indexing in a C# buffer?
I'm considering:
avoiding looking-back. I don't know exactly how to do this because when re-search-forward finds, say, the keyword class, the cursor is already in the middle of a class declaration. looking-back seems essential.
instead of using open-curly as the marker, use the keywords like enum, interface, namespace, class
avoid narrow-to-region
any hard advice? Further suggestions?
Something I've tried and I'm not really enthused about re-visiting: building a wisent-based parser for C#, and relying on semantic to do the indexing. I found semantic to be very very very (etc) difficult to use, hard to discover, and problematic. I had semantic working for a while, but then upgraded to v23.2, and it broke, and I never could get it working again. Simple things - like indexing the namespace keyword - took a very long time to solve. I'm very dissatisfied with it and don't want to try again.
I don't really know C# syntax, and without looking at your elisp it's hard to give an answer, but here goes anyway.
looking-back can be deadly slow. It's the first thing I'd experiment with. One thing that helps a lot is using the limit arg to, say, restrict your search to the beginning of the current line. A different approach is when you hit the open curly do backward-char then backward-sexp (or whatever) to get to the front of the previous word, then use looking-at.
Using keywords to search around instead of open curly is probably what I would have done. Maybe something like (re-search-forward "\\(enum\\|interface\\|namespace\\|class\\)[ \t\n]*{" nil t) then using match-string-no-properties on the first capture group to see which of the keywords was found. This might help with the looking-back problem as well.
I don't know how expensive narrow-to-region is, but it could be avoided: when you find an open curly, do save-excursion and forward-sexp, and keep that position as a limit for the current iteration of your (I assume recursive) searches.
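Pulling the second and third suggestions together, a rough sketch of the keyword-first approach (the function name and regexp are placeholders to adapt; it ignores generics, modifiers and nesting) might look like this:
;; Collect ("keyword name" . position) pairs for type-like declarations by
;; searching for the keywords directly, so neither looking-back nor
;; narrow-to-region is needed.
(defun my-csharp--index-types ()
  (let (index)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "\\_<\\(enum\\|interface\\|namespace\\|class\\|struct\\)\\_>[ \t\n]+\\([A-Za-z_][A-Za-z0-9_.]*\\)"
              nil t)
        (push (cons (concat (match-string-no-properties 1) " "
                            (match-string-no-properties 2))
                    (match-beginning 0))
              index)))
    (nreverse index)))
A real version would also need to skip matches inside strings and comments (for example by consulting syntax-ppss), but it avoids both looking-back and narrowing entirely.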

Is there a Perl equivalent for Emacs' ido-completion?

I've built a number of work-specific helper functions that could be useful for other members of my team. But I've written them all in Emacs Lisp, and getting them to convert from Notepad++ is NOT going to happen.
So, I'm thinking convert the functions to Perl. No problem.
Except I use ido-completion all the time to limit responses:
(setq client (ido-completing-read "Select a Client: " '("IniTrade" "HedgeCorp" "GlobalTech" "OCP") nil t))
EDIT: ido-completing-read is similar to completing-read, except that all the options are visible, and can be selected via cycling [arrow-keys, usually] or typing-completion. In the example above, the prompt would look like
Select a Client: {IniTrade | HedgeCorp | GlobalTech | OCP}
Selections can be made on the left-most item by hitting RET, or by partial typing (in this case, the first letters are all unique, so that's all that would be needed, and the matching item would become the left-most).
nil in the example is an unused param, and t requires an exact match -- i.e., the user must pick one of the selections. The function returns a string, such as "IniTrade".
My "helper functions" are for internal needs -- opening a particular error log, restoring a batch to the server, etc. For these operations, the user needs to specify test or production environment, client, stage, etc. In almost all cases, these are string selections that are used for building another shell command. If a numeric item is returned, that could in turn be re-translated to a string -- but since the selections are usually the required string, it would be nice if that step could be skipped. [end EDIT]
Is there a Perl equivalent? I've looked at Term::Prompt, which offers a numbered menu... the closest I've found. That's not as pretty as ido-completion, and I'd still have to convert a numeric result back to a string (not a major issue; just annoying).
While composing this, I noticed I used the term 'menu', so did some more searching and came up with Term::Menus. I haven't tried this one yet.
Term::ReadLine may do what you're looking for, though it's probably more like 'completing-read' than 'ido-completing-read'.
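If Term::ReadLine turns out to be more machinery than needed, a dependency-free sketch of the "narrow by prefix, RET takes the left-most candidate" behaviour is easy enough (the names are made up, and it is nowhere near as nice as real ido):
use strict;
use warnings;
# Prompt until the typed prefix identifies a single candidate; an empty
# line (plain RET) accepts the current left-most candidate.
sub choose_one {
    my ($prompt, @choices) = @_;
    while (1) {
        print "$prompt {", join(" | ", @choices), "} ";
        chomp(my $typed = <STDIN>);
        return $choices[0] if $typed eq '';
        my @matches = grep { index($_, $typed) == 0 } @choices;
        return $matches[0] if @matches == 1;
        @choices = @matches if @matches;   # narrow on a match, else keep the list
    }
}
my $client = choose_one("Select a Client:",
                        qw(IniTrade HedgeCorp GlobalTech OCP));
print "Chosen: $client\n";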