What is scanner.l and how is it different from lex.yy.c? - lex

I have read a few books and sites, but I am still not clear on what a lexer is, what scanner.l is, and what the difference is between a lexer and lex.yy.c, or between a scanner and a lexer.

lex (or flex) takes a scanner description, usually in a file whose extension is .l, and produces a C program that implements the scanner, conventionally named lex.yy.c, although you can (and should) use a more meaningful name. Many people use the word "lexer" to mean the same as "scanner"; you can consider them synonyms.
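For instance, a minimal flex input file, here called scanner.l, might look like this (the token rules are just an illustration, not taken from any particular project):

%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+      { printf("NUMBER: %s\n", yytext); }
[ \t\n]+    ;   /* skip whitespace */
.           { printf("CHAR: %s\n", yytext); }
%%
int main(void) {
    yylex();      /* read stdin and apply the rules above */
    return 0;
}

Running flex scanner.l writes the generated scanner to lex.yy.c, which you then compile like any other C file (cc lex.yy.c -o scanner); flex -o scanner.c scanner.l lets you pick a better name for the generated file. (%option noyywrap is flex-specific; with classic lex you would link against -ll instead.)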

Related

Effect of use Encode qw/encode decode from_to/;?

What is the effect of this at the top of a perl script?
use Encode qw/encode decode from_to/;
I found this on code I have taken over, but I don't know what it does.
Short story: for an experienced Perl coder who knows what modules are:
The Encode module is for converting Perl strings to "some other" format (for which there are many sub-modules that define different formats). Typically, it's used for converting to and from Unicode formats, e.g.:
... to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:
$octets = encode("iso-8859-1", $string);
decode is for going the other way, and from_to converts a string from one format to another in place:
from_to($octets, "iso-8859-1", "cp1250");
Long story: for someone who doesn't know what a module is/does:
This is the classic way one uses code from elsewhere. "Elsewhere" usually means one of two possibilities: either
Code written "in-house", i.e. a part of your private application that a past developer has decided to factor out, presumably because it's applicable in several locations/applications; or
Code written outside the organisation and made available publicly, typically via the Comprehensive Perl Archive Network (CPAN).
Now, it's possible, but unlikely, that someone within your organisation has created in-house code and coincidentally used the same name as a module on CPAN. If you search CPAN for "Encode", you can see that there is a module of that name, and that will almost certainly be what you are using; you can read about it in its documentation there.
The qw/.../ stands for "quote words" and is a simple shorthand for creating a list of strings; in this case it translates to ("encode", "decode", "from_to"), which in turn specifies which parts of the Encode module you (or the original author) want.
You can read about those parts under the heading "Basic methods" on the documentation (or "POD") page referred to above. Don't be put off by the reference to "methods": many modules (and, it appears, this one) are written in such a way that they support both an object-oriented and a functional interface. As a result, you will probably see direct calls to the three functions mentioned earlier as if they were written directly in the program itself.
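Putting the three functions together, here is a minimal sketch (the strings, encodings and variable names are only for illustration):

use strict;
use warnings;
use Encode qw/encode decode from_to/;

my $string = "caf\x{e9}";                    # a Perl string containing U+00E9
my $octets = encode("iso-8859-1", $string);  # Perl string -> Latin-1 bytes
my $back   = decode("iso-8859-1", $octets);  # Latin-1 bytes -> Perl string again
from_to($octets, "iso-8859-1", "cp1250");    # re-encode the bytes in place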

Is It Better Practice To Use package-name:symbol In Code Or :use :package-name In A DEFPACKAGE?

This is, I suspect, a matter of style and/or personal taste, but I thought I'd ask anyway.
I have been in the habit of defining packages thus:
(defpackage :wibble
  (:use :cl :drakma)
  (:export :main))
Once I have executed IN-PACKAGE (:wibble, in this case), I can then use the symbols in DRAKMA unadorned:
(http-request ...
Then I recently read that seasoned Lisp hackers would rather not :use the package but write:
(drakma:http-request ...
Just wondered what the consensus is here, and whether there are any pros or cons (not that type of CONS :) ) either way?
Cheers,
Peter
When you use a package, there are a couple of subtle ways things might go wrong if the used package changes.
First, the package might export more symbols in the future. If, for example, the package exports a new symbol library:rhombus and you're already using your own myapp::rhombus to name something, you are suddenly using the inherited symbol, with all its possible attachments (classes, defuns, macros, etc.), with sometimes strange results. If you use qualified symbol names, you will not get any more or any fewer than the symbols you want.
Second, the package might stop exporting symbols in the future. So if, for example, library:with-rhombus disappears, your call to (with-rhombus (42 42 42) ...) will suddenly give an error about an invalid function call (42 ...) rather than something that points directly to the source of the problem, the "missing" symbol. If you use qualified symbol names, you will get an error along the lines of "Symbol WITH-RHOMBUS is not exported from the LIBRARY package", which is clearer.
Importing symbols (with :import-from or :shadowing-import-from or import) is not without its own trouble. Importing works on any symbol, regardless of whether it's external or not. So it could be the case that the symbol is now library::rhombus, i.e. not intended for public consumption any more, but importing will still work with no errors.
Which option you use depends on your comfort level with the source package. Do you control it, and you will not make any conflicting changes without thorough testing? Go ahead and import or use to your heart's content. Otherwise, be careful about checking for unintended side-effects as library package interfaces change.
This is more a style issue, so it's impossible to categorize it in black and white, but here are the pros and cons:
Using package-qualified symbols.
Avoids symbol conflicts.
Allows you to clearly distinguish foreign symbols.
Allows you to easily search, replace, copy, ... uses of a certain symbol from the external library (for refactoring, extracting the code to some other place, etc.).
Makes code uglier, but only when library names are too long. (For example, I add the nickname re to cl-ppcre, and now the code using it is even better than without qualification: think re:scan.)
Importing the whole package
Basically, the opposite of the previous case. But I tend to use it with utility libraries, because using qualified names often defeats their whole purpose of making code more concise and clear :)
:import-from package symbol
This is one option you've forgotten to mention. I think it may be useful when you use one or two very distinct symbols from a certain package very frequently. It also allows you to import unexported symbols.
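For example, a DEFPACKAGE along these lines (reusing the DRAKMA package from the question) pulls in just the one symbol, as a small sketch:

(defpackage :wibble
  (:use :cl)
  (:import-from :drakma :http-request)
  (:export :main))

(in-package :wibble)
(http-request "http://www.example.com/")  ; usable unqualified
;; any other DRAKMA symbol still needs its drakma: prefix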
Good answers so far.
Another view is that a package and its symbols make up a language. If you think a symbol should be a part of this language, then you should make it available without the need to qualify it with another package - when programming in this language.
For example in the CLIM implementation there is a CLIM-LISP package which sets up the implementation language. It is a variant of the COMMON-LISP package. Then there are packages like CLIM-SYS (resources, processes, locks, ...), CLIM-UTILS (various utilities and extensions of Common Lisp) and CLIM itself. Now in a new package SILICA (an abstract window system) these four packages are used. The implementation of Silica thus is implemented in a language which is built as a union of two languages (the Common Lisp variant CLIM-LISP and the UI commands of CLIM) plus two utility packages which extend CLIM-LISP with some facilities.
In the above example it makes sense to use the packages, since they extend each other to form a new language, and the implementation in the new package makes heavy use of them.
If you had a package that needed conflicting packages, then it would not make sense to use them. For example, a package could use drawing commands tailored to a GUI and to PostScript output. They would have similar names. Using both would lead to conflicts. You also want to make clear in the source code, for the human reader, where these symbols come from. Is it a line-drawing command from a PostScript library or from GTK+? It would be great if you could find that out easily, even when the function names are the same.
As a rule of thumb, I :use packages that extend the general language, but use qualified symbols for packages that have some special application. For example, I'd always :use alexandria, but refer to symbols from Hunchentoot with the package prefix. When in doubt, I use qualified names.

The purpose of Lisp syntax to model AST

Lisp syntax represents the AST as far as I know, but in a high-level format that allows a human to easily read and modify it, while at the same time making it easy for the machine to process the source code as well.
For this reason, it is said that in Lisp code is data and data is code, since code (an s-expression) is, in essence, just an AST. We can plug more ASTs (which are our data, which is just Lisp code) into other ASTs (Lisp code), or use them independently, to extend functionality and manipulate things on the fly (at runtime) without having to recompile the whole OS to integrate new code. In other languages, we have to recompile to turn the human-readable source code into a valid AST before it is compiled into machine code.
Is this the reason Lisp syntax was designed the way it is in the first place (representing an AST while staying human readable, to satisfy both the human and the machine)? To enable stronger (on-the-fly, at runtime) as well as simpler (no recompile, faster) communication between man and machine?
I heard that a Lisp machine has only a single address space which holds all data. In an operating system like Linux, programmers only have a virtual address space, which they can treat as if it were the real physical address space and do whatever they want with. Data and code in Linux live in separate regions, because, effectively, data is data and code is code. In a normal OS written in C (or a C-like language), it would be very messy to operate on a single address space for the whole system, and mixing data with code would be very messy too.
On a Lisp machine, since code is data and data is code, is that the reason it has only a single address space (without the virtual layer)? Since we have GC and no raw pointers, should it be safe to operate on physical memory without breaking it (a single space being a lot less complicated)?
EDIT: I ask this because it is said that one of the advantages of Lisp is a single address space:
A safe language means a reliable environment without the need to separate tasks out into their own separate memory spaces.
The "clearly separated process" model characteristic of Unix has potent merits when dealing with software that might be unreliable to the point of being unsafe, as is the case with code written in C or C++, where an invalid pointer access can "take down the system." MS-DOS and its heirs are very unreliable in that sense, where just about any program bug can take the whole system down; "Blue Screen of Death" and the likes.
If the whole system is constructed and coded in Lisp, the system is as reliable as the Lisp environment. Typically this is quite safe, as once you get to the standards-compliant layers, they are quite reliable, and don't offer direct pointer access that would allow the system to self-destruct.
Third Law of Sane Personal Computing
Volatile storage devices (i.e. RAM) shall serve exclusively as read/write cache for non-volatile storage devices. From the perspective of all software except for the operating system, the machine must present a single address space which can be considered non-volatile. No computer system obeys this law which takes longer to fully recover its state from a disruption of its power source than an electric lamp would.
A single address space, as stated there, holds all the running processes in the same memory space. I am just curious why people insist that a single address space is better. I relate it to the AST-like syntax of Lisp to try to explain how it fits the single-space model.
Your question doesn't reflect reality very accurately, especially in the part about code/data separation in Linux and other OSes. This separation is actually enforced not at the OS level, but by the compiler/program loader. At the OS level there are just memory pages that can have different protection bits set (executable, read-only, etc.), and above this level different executable formats exist (like ELF on Linux) that specify restrictions on different parts of program memory.
Returning to Lisp: as far as I know, historically the S-expression format was used by Lisp's creators because they wanted to concentrate on the semantics of the language, putting syntax aside for a while. There was a plan to eventually create some syntax for Lisp (see M-expressions), and there were some Lisp-based languages with more syntax, like Dylan. But, overall, the Lisp community came to the consensus that the benefits of S-expressions outweigh their cons, so it stuck.
Regarding code as data: this is not strictly bound to S-expressions, as other code can be treated as data as well. This whole approach is called metaprogramming and is supported at different levels and with different mechanisms by many languages. Every language that supports eval (Perl, JavaScript, Python) lets you treat code as data; it's just that the representation is almost always a string, while in Lisp it is a tree, which is much more convenient and facilitates advanced features like macros.
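A tiny Common Lisp sketch of that difference: the same nested list can be picked apart as data or handed to EVAL as code, with no string parsing involved.

(defparameter *form* '(+ 1 (* 2 3)))  ; a plain nested list -- data
(first *form*)                        ; => +   (inspect it like any list)
(second *form*)                       ; => 1
(eval *form*)                         ; => 7   (treat the same list as code)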

How can ported code be detected?

If you port code over from one language to another, how can this be detected?
Say you were porting code from C++ to Java; how could you tell?
What would be the difference between a program designed and implemented in Java, and a near identical program ported over to Java?
If the porting is done properly (by people expert in both languages and ready to translate the source language's idioms into the best similar idioms of the target language), there's no way you can tell that any porting has taken place.
If the porting is done incompetently, you can sometimes recognize goofily transliterated idioms... but that can be hard to distinguish from someone writing a new program in a language they barely know and goofily transliterating the idioms from the language they do know ;-).
Depending on how much effort was put into hiding the porting, it could be anywhere from very easy to impossible to detect.
I would use pattern recognition for this task. Think about the "features" that would indicate code similarity, extract those features from each code base, and compare them.
E.g.:
One feature could be similar symbol names. Extract all symbols using ctags or regular expressions, lower-case them, sort both lists with duplicates removed, and compare them (see the sketch after this list).
Another possible feature:
List of classes + number of members, e.g.:
MyClass1 10
...
List of methods + sequence of control blocks, e.g.:
doSth() if, while, if, ix, case
...
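A rough Perl sketch of the first feature (the identifier regex and file names are only placeholders; real per-language tokenizing would be more careful):

use strict;
use warnings;

# crude "symbol" extraction: every identifier-looking token, lower-cased
sub symbols {
    my ($file) = @_;
    open my $fh, '<', $file or die "$file: $!";
    my %seen;
    while (<$fh>) {
        $seen{ lc $_ }++ for /([A-Za-z_]\w*)/g;
    }
    return \%seen;
}

my $orig = symbols("original.cpp");   # placeholder file names
my $port = symbols("ported.java");
my @shared = grep { exists $port->{$_} } keys %$orig;
printf "%d of %d symbols also appear in the port\n",
    scalar @shared, scalar keys %$orig;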
Another easy way is to represent the code as a picture, e.g. load the code as text in Word and set the font size to 1. Human beings are very good at comparing pictures. For more ideas on code visualization you may check http://www.se-radio.net/2009/03/episode-130-code-visualization-with-michele-lanza/

Syntax changes from the examples in 'The Little Schemer' to the real Scheme

I have recently started following the examples from The Little Schemer and when trying out the examples in DrScheme, I have realised that there are some minor syntax changes from the examples in the book to what I can write in DrScheme.
First of all, as a language in DrScheme, I chose Pretty Big (one of the Legacy Languages).
Is this the correct choice for trying the examples in the book?
As regards the syntax changes, I have noticed that, for example, I need to prefix identifiers with a ' in order for them to work.
For example:
(rember 'jelly '(peanut butter jelly))
Are there any more changes (syntactical or not) that I need to be aware of when trying the examples from the 'The Little Schemer' book ?
IIRC, the book uses a different font for quoted pieces of data, and in real Scheme code that requires using quote. As for your use of PLT Scheme -- the "Pretty Big" language is really there just as a legacy language. You should use the Module language, and have all files start with #lang scheme (which should be there by default).
(The "new" way of using different languages in DrScheme is to always be in the Module "language" and specify the actual language using a #lang line.)
See the "Guidelines for the reader" section in the Preface. (I'm looking at the 4th edition here.)