What does the "yy" in lex.yy.c stand for? - lex

What does the "yy" in lex.yy.c stand for?

Lex was meant to be used in concert with Yacc. The history is covered in Stephen C. Johnson's paper Yacc: Yet Another Compiler-Compiler. The Yacc parser uses only names beginning with "yy"; no meaning is discussed beyond simply wanting a distinct namespace. The "yy" in lex.yy.c indicates that the lex output is intended to feed a yacc parser.
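For context, the classic build sequence makes the pairing concrete (file names here are illustrative):
$ lex scanner.l                  # writes the scanner to lex.yy.c
$ yacc -d parser.y               # writes the parser to y.tab.c (and y.tab.h)
$ cc y.tab.c lex.yy.c -o parser  # yyparse() calls yylex() through the shared yy* names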

I believe the "yy" is used to indicate that this is a generated code file.
Usually, the lex utility writes the program it generates to the file lex.yy.c
- Reference

I think it comes from Yacc, one of the older parser generators.

Probably from YACC (Yet Another Compiler Compiler), which was used with Lex to implement quite a few compilers and similar programs. The GNU equivalents are Bison and Flex, which look a lot more widespread now, so the connection may not be obvious.

Code generator for CLI based on CLD file

Although programming using the CLI$ routines is not very hard, it would be nice if there were a code generator for the basic stuff based on the CLD file. Does anyone have something like that, or is there anyone interested in it?
There is a code generator of sorts at http://www.tomwade.eu/software/vmsarg.html
This is designed for when you're porting a C program onto VMS that is set up to use the typical terse and unfriendly qualifiers like
$ mumble -f -l foo.txt
that Unix loves. It generates code that allows the program to accept
$ mumble /fast /log=foo.txt
and translates it into the hieroglyphics that the program expects. It adds CLD-like functionality to the program with minimal C coding.
It sounds like you have used enough of the features of CLDs that it would be a project to write a TECO macro to massage the CLD into the corresponding MUMPS code. (Sorry, wrong language?) Even LIB$TPARSE, or its Alpha replacement, would take some time to wrangle. Sounds like you have a "boring job" ahead of you, or a co-op. (Named for the sound it makes when it hits the wall.) Or find a YACC guru or someone with facility at various other parsing tools and turn them loose.

Does Common Lisp have the fastest PCRE implementation?

A friend claimed that Common Lisp has the fastest Perl-compatible regular expression library of any language, including Perl itself: with an optimizing native-code compiler like SBCL, CL-PPCRE can compile each particular regex down to machine code, whereas other implementations, including Perl's, must generate bytecode and interpret it. In practice, especially in the common case where we match the same regex against many inputs or one long input, the compilation overhead is more than justified.
Unfortunately, I can't find any benchmarks on this, and I don't know enough to run my own, so I turn to the hive mind. Can anyone evaluate this claim?
I have no benchmarks of my own to share, but perhaps your friend was referring to results concerning the portable regular expression library CL-PPCRE. The current web page no longer speaks of benchmarks, but courtesy of the Wayback Machine we can see that it used to show benchmark results where CL-PPCRE outperformed Perl 2-to-1. Benchmarking is a tricky business (especially for moving targets), which might explain why the current page is silent on the matter.
The PCRE library has a JIT compiler module now, and it performs well compared to other regex engines: http://sljit.sourceforge.net/regex_perf.html or http://blog.rburchell.com/2011/12/why-i-avoid-qregexp-in-qt-4-and-so.html
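If you do want to run your own numbers, the Perl half of such a comparison might look like this minimal sketch (the pattern and input are arbitrary choices; the CL-PPCRE half would be the corresponding Lisp loop):
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a long input string with one match buried in the middle.
my $text     = ('abc' x 50_000) . 'needle' . ('xyz' x 50_000);
my $compiled = qr/ne+dle/;    # compile the pattern once, up front

cmpthese( -2, {               # run each variant for at least 2 CPU seconds
    precompiled => sub { $text =~ $compiled or die },
    literal     => sub { $text =~ /ne+dle/  or die },
});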

Where can I find a formal grammar for the Perl programming language?

I understand that the Perl syntax is ambiguous and that its disambiguation is non-trivial (sometimes involving execution of code during the compile phase). Regardless, does Perl have a formal grammar (albeit ambiguous and/or context-sensitive)?
From perlfaq7
Can I get a BNF/yacc/RE for the Perl language?
There is no BNF, but you can paw your way through the yacc grammar in perly.y in the source distribution if you're particularly brave. The grammar relies on very smart tokenizing code, so be prepared to venture into toke.c as well.
In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. The work of parsing perl is distributed between yacc, the lexer, smoke and mirrors."
For a wonderful set of examples of why it's pretty much impossible to parse Perl statically due to context sensitivity, see Randal Schwartz's post On Parsing Perl.
In addition, please see the discussion in "Perl 5 Internals (Chapter 5. The Lexer and the Parser)" by Simon Cozens.
Please note that the answer is different for Perl 6:
There exists a grammar for Perl 6
Rakudo Perl has its own version of the grammar
Other people have posted this link before on similar questions, but I think it is fun and has a great case example: Perl Cannot Be Parsed (A Formal Proof).
From that link:
[Consider] the following devilish snippet of code, concocted by Randal Schwartz, and determine the correct parse for it:
whatever / 25 ; # / ; die "this dies!";
Schwartz's snippet can parse two different ways: if whatever is nullary (that is, takes no arguments), the first statement is a division in void context, and the rest of the line is a comment. If whatever takes an argument, Schwartz's snippet parses as a call to the whatever function with the result of a match operator, then a call to the die() function.
This means that, in order to statically parse Perl, it must be possible to determine from a string of Perl 5 code whether it establishes a nullary prototype for the whatever subroutine.
I just post this part to show that it gets really hard really quickly.
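To see the ambiguity in action, here is a minimal runnable sketch; the nullary prototype is what flips the parse:
use strict;
use warnings;

sub whatever() { 1 }    # nullary prototype, declared before use

# Parsed as: whatever() / 25; (a division in void context -- perl warns
# about it but runs), with the rest of the line a comment.
whatever / 25 ; # / ; die "this dies!";
print "still alive\n";

# Change the declaration to "sub whatever { @_ }" and the very same
# line parses as: whatever(m/ 25 ; # /); die "this dies!";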
Alternatively, many code/text editors can do a decent (though never great) job of syntax highlighting, so you might start with their specs to see what they do. In fact, you have inspired me; I think I will post a related question asking which editor best highlights Perl.
There is no formal grammar in the sense of "this is the specification of Perl 5" (the Perl 6 effort is trying to fix that, though). But there is a formal grammar in the Perl 5 source code, in perly.y. Of course, understanding the code is most likely not a trivial undertaking.
Jeffrey Kegler has written some good articles about the Perl grammar on his blog as well. In particular, see this post and this one. The rest of the blog has some quite interesting thoughts on parsing in general, too.

Is there a Perl Syntax Highlighter (outputting to HTML) like PHP's GeSHi?

Most PHP developers are likely familiar with the syntax highlighter called "GeSHi", which takes code and highlights it using HTML and CSS:
include('geshi.php');
$source = 'echo "hello, world!";';  // the code to highlight
$language = 'php';
$path = 'geshi/';                   // path to GeSHi's language files
$geshi = new GeSHi($source, $language, $path);
echo $geshi->parse_code();
GeSHi supports a wide range of languages.
I wonder, is there a similar Module for Perl?
Perl has a port of the Kate highlighting system, Syntax::Highlight::Engine::Kate, which seems to be fairly close to what you need. It appears to be part of Padre.
You also have the option of client-side HTML highlighters (the logic is obviously JavaScript), such as Google's code prettifier.
Two good lists of syntax highlighting engines are:
The Wikipedia syntax highlighting article - among the ones it lists, Perl ports/APIs seem to exist for Kate and Colorer (Syntax::Highlight::Universal)
This very good review of HTML syntax highlighters, which contains a lot of client-side ones such as SHJS and many others.
Please be aware that NONE of these generic highlighters works "100% correctly", the way the syntax highlighter in a good IDE does, because they use regular expressions for approximate parsing instead of a lexer for the actual language grammar. More details are in the Wikipedia article.
You can also consider this for client side syntax highlighting.
http://alexgorbatchev.com/SyntaxHighlighter/
I have had some very good results with the PPI::HTML package. It uses PPI to parse the Perl before converting the text to HTML.
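For reference, a minimal use might look like this sketch (the file name is illustrative; see the PPI::HTML docs for styling options):
use strict;
use warnings;
use PPI;
use PPI::HTML;

# Parse the Perl source into a PPI document, then render it as HTML.
my $document = PPI::Document->new('script.pl')
    or die "Could not parse script.pl";
print PPI::HTML->new->html($document);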
Pure Perl: Syntax::Highlight::Engine::Kate (there is a Kate plugin for the Padre IDE).
Wrappers for C libraries: Syntax::Highlight::Universal, Syntax::SourceHighlight.
Using external tools: Text::VimColor, Text::EmacsColor.
Also there are many one-language highlighters on CPAN.
You can always write a small PHP script to make GeSHi usable from the command line and then call it from within your Perl script.
I did this for gitweb so I could leave svn (and websvn) behind for good.
My search brought me here because I was looking for a 'Perl syntax highlighter' like the title said, not a general highlighter implemented in Perl.
To highlight only Perl, perltidy --html can be used. It belongs to the Perl::Tidy distribution; the main module can also be imported and used without spawning a process.
https://metacpan.org/dist/Perl-Tidy/view/bin/perltidy#HTML-OPTIONS
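A minimal invocation might look like this (the file name is illustrative):
$ perltidy --html script.pl    # writes the highlighted page to script.pl.html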
So not what the OP actually wanted to know, but hopefully of help for others coming here for the same reasons like me ... :)

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a similar but more specific question, I couldn't find any good references explaining clearly why filters are bad and when they can be safely used. I think now is the time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in Latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
@result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
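For example, in a throw-away script like this sketch, the ### lines are ordinary comments unless the module is loaded:
# test.pl
my $total = 0;
for my $n (1 .. 3) {
    $total += $n;
    ### running total: $total
}
print "$total\n";
Run plainly, it just prints 6; run with -MSmart::Comments, each ### line is expanded into a diagnostic on STDERR.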
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what the code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable because of the filter. You (or, more likely, your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
    ... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice, however, it is quite hard to write a source filter in a way where you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at its regular expressions to see the acrobatics required simply to figure out the boundaries of expression scope. While the filter works, and provides a test bed for experimenting with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
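As a sketch of what that buys you (the module name and keyword here are made up for illustration):
package MyFilter;        # hypothetical module
use Filter::Simple;

# Rewrite the made-up keyword "whenever" to "if" -- but only in real code.
# FILTER_ONLY 'code' skips strings, comments, and POD, which is exactly
# the "gory details of parsing quoted constructs" handled for you.
FILTER_ONLY code => sub { s/\bwhenever\b/if/g };

1;
Any file that says use MyFilter; can then write whenever (...) { ... } and have it compiled as an if.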
In all, it really depends on the filter you are using and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated, then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)
There is a nice example here that shows what trouble you can get into with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters, and because of that they were unable to find the source of an error message for days.