"Fake" global lexical variables in Common Lisp

"Fake" global lexical variables in Common Lisp - lisp

It is stated in section "Global variables and constants" of the Google Common Lisp Style Guide that:
"Common Lisp does not have global lexical variables, so a naming convention is used to ensure that globals, which are dynamically bound, never have names that overlap with local variables.
It is possible to fake global lexical variables with a differently named global variable and a DEFINE-SYMBOL-MACRO. You should not use this trick, unless you first publish a library that abstracts it away."
Can someone, please, help me to understand the meaning of this last sentence.

The last sentence,
You should not use this trick, unless you first publish a library that abstracts it away.
means that if you do something that simulates global lexical variables, then the implementation of that simulation should not be apparent to the user. For instance, you might simulate a global lexical using some scheme using define-symbol-macro, but if you do, it should be transparent to the user. See Ron Garret's GLOBALS — Global Variables Done Right for an example of “a library that abstracts it away.”

Related

Usefulness of let?

Is the distinction between the 3 different let forms (as in Scheme's let, let*, and letrec) useful in practice?
I am current in the midst of developing a lisp-style language that does current support all 3 forms, yet I have found:
regular "let" is the most inefficient form, effectively having to translate to an immediately called lambda form and the instructions generated are nearly identical. Additionally, I haven't found myself needing this form very often.
let* (sequential binding) seems to be the most practically useful and most often used. This form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
letrec (recursive binding) can be efficiently implemented, given that no initializer expression refers to an unbound variable. Typically the case is that all initializers are lambda expressions and the above is true.
The question is: since letrec can be efficiently implemented and also subsumes the behavior of let*, regular let is not often used and can be converted to a lambda form with no great loss of efficiency, why not make default "let" have the behavior of the current "letrec" and be rid of the original "let"?

This [let*] form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
While what you are saying here is not incorrect, in fact there is no need for such a transformation. A compiling strategy for the simple let can handle the semantics of let* with just simple modifications (possibly supporting both with just a flag passed to common code).
let* just alters the scoping rules, which are settled at compile time; it's mostly a matter of which compile-time environment object is used when compiling a given variable init form.
A compiler can use a single environment object for the sequential bindings of a let*, and destructively update it as it compiles the variable init forms, so that each successive init form sees a more and more extended version of that environment which contains more and more variables. At the end of that, the complete environment is available with all the variables, for doing the code generation for generating the frame and whatnot.
One issue to watch out for is that a flat environment representation for let* means that lexical closures captured during the variable binding phase can capture future variables which are lexically invisible to them:
(let* ((past 42)
(present (lambda () (do-something-with past)))
(future (construct-huge-cumbersome-object)))
...))
If there is a single run-time environment object here containing the compiled versions of the variables past, present and future, then it means that the lambda must capture that environment. Which means that although ostensibly the lambda "sees" only the past variable, because future is not in scope, it has de facto captured future.
Thus, garbage collection will consider the huge-cumbersome-object to be reachable for as long as the lambda remains reachable.
There are ways to address this, like accompanying the environmental reference emanating from the lambda with some kind of frame index which says, "I'm only referencing part of the environment vector up to index 13". Then when the garbage collector traverses this fenced reference, it will only mark the indicated part of the environment vector: cells 0 to 13.
Anyway, about whether to implement both let and let*. I suspect if Lisp were being "green field" designed from scratch today, many designers would like reach for the sequentially binding version to be called let. The parallel construct would be the one available under the special name let*. The situations when you actually need let to be parallel are fewer. For instance, let allows us to re-bind a pair of variable symbols such that their contents appear exchanged; but this is rarely something that comes up in application programming. In some programming language cultures, variable shadowing is frowned up on entirely; GNU C has a -Wshadow warning against it, for instance.
Note how in ANSI Common Lisp, which has let and let*, the optional parameters of a function behave sequentially, like let*, and this is the only binding strategy supported! So that is to say:
(lambda (required &optional opt1 (opt2 opt1)) ...)
Here the value of opt2 is defaulted from whatever the value of opt1 is at the time of the call. The initialization expression of opt2 has the opt1 parameter in scope.
Also, in the same Lisp dialect, the regular setf is sequential; if you want parallel assignment you must use psetf, which is the longer name of the two.
Common Lisp already shows evidence of design decisions more recent than let tend to favor sequential operation, and designate the parallel as the extraordinary variant.

Think of metaprogramming. If your default let will sequentially create nested scopes, you'll have to make sure that none of the initialiser expressions are referring to the names from the wrong scopes. You have such a guarantee with a regular let. Control over name scoping is very important when you're generating code.
Letrec is even worse, it's introducing a very complicated scope rules that cannot be easily reasoned with.

Correct way of variable declaration in Perl

I have a set of 3 or 4 separate Perl scripts that used to be part of a simple pipeline, but I am now trying to combine them in a single script for easier use (for now without subroutine functions). The thing is that several variables with the same name are defined in the different scripts. The workaround I found was to give different names to those variables, but it can start to become messy and probably it is not the correct way of doing so.
I know the concept of global and local variables but I do not quite understand how do they exactly work.
Are there any rules of thumb for dealing with this sort of variables? Do you know any good documentation that can shed some light on variable-scope or have any advise on this?
Thanks.
EDITED: I already use "use warnings; use strict;" and declare variables with "my". The question might actually be more related to the definition of scoping blocks and how to get them to be independent from each other...

You are likely getting into trouble because of your use of global variables (which actually likely exist in package main). You should try to avoid the use of global variables.
And to do so, you should become familiar with the meaning of variable scope. Although somewhat dated, Coping with Scoping offers a good introduction to this topic. Also see this answer and the others to the question How to properly use Global variables in perl. (Short Answer: avoid them to the degree possible.)
The principle of variable scope and limiting use of global variables actually applies to nearly all programming languages. You should get in the habit of declaring variables as close as possible to the point where you are actually using them.
And finally, to save yourself from a lot of headaches, get in the habit of:
including use strict; and use warnings; at the top every Perl source file, and
declaring variables with my within each of your sub's (to limit the scope of those variables to the sub).
(See this PerlMonks article for more on this recommendation.)
I refer to this practice as "Perl programming with your seat belt on." :-)

The rule of thumb is to put your code in subroutines, each of them focused on a simple, well-defined part of the larger process. From this one decision flow many virtuous outcomes, including a natural solution to the variable scoping problem you asked about.
sub foo {
my $x = 99;
...
}
sub bar {
my $x = 1234; # Won't interfere with foo's $x.
...
}
If, for some reason, you really don't want to do that, you can wrap each section of the code in scoping blocks, and make sure you declare all variables with my (you should be doing the latter nearly always as a matter of common practice):
{
my $x = 99;
...
}
{
my $x = 1234; # Won't interfere with the other $x.
...
}

Moving from CGI to mod_perl. Understanding my, our, local

I've been using apache mod_cgi during some years. Now I am moving to mod_perl and I have found some problems, specially with subroutines. Until now I was never using my, our nor local; and the CGI scripts worked without problems. After reading documentation and even some previous questions posted here I understand more or less how my, our and local works. My concern is what information is going to be shared between the next requests (if I understand correctly, that's the main concern I must have while running mod_perl instead of mod_cgi).
Is there any difference between using our in a scalar or just the scalar without declaring anything special such as my? Aren't both global?
If I do not declare the scalar as private is going to be shared in the next request? Even in another request of a different perl script in the same server?
How can I share the value of a scalar inside a subroutine to outside that subroutine but not outside the same file nor the same request?
If I use a my in a scalar inside an if in the same level of the file or in the same subroutine, and after that I create another if where I use the same scalar; is that scalar shared between both if or each if means different blocks? What about while and for, are they different blocks for the previously declared as my scalar or that only works for subroutines and files?

mod_perl works by wrapping each Perl script in a subroutine called handler within a package based on the name and path of the script. Instead of starting a new process to run each script, this handler subroutine is called by one of a number of persistent Perl theads.
Ordinarily this knowledge would help a lot to understand the changes in environment from mod_cgi, but since you have never added use strict to your programs and become familiar with the workings of declared variables you have a lot of catching up to do!
The mod_perl environment has the potential for causing non-obvious security breaches, and you should start now to use strict on every script and declare every variable. use Carp will also help you to understand the error logs.
A variable name declared with our is a lexically-scoped synonym for a package variable of the same name that can be used without fully qualifying the name by including the package name. For instance, ordinarily a variable declared with our $var will provide access to the $main::var scalar (if there has been no preceding package declaration) without specifying main::. However, such variables that began life with a value of undef in mod_cgi will now retain their values from the previous execution of any given mod_perl thread, and for consistency it is safest to always initialise them at the point of declaration. Note also that the default package name is no longer main because of the wrapping that mod_perl does, so you can no longer access package variables using the main:: prefix, and it is unwise to find the actual name of the package and explicitly use that because it will be a very long name and will change if you move or rename your script.
A my variable is one that exists independently of the package symbol table, and normally its lifetime is the run time of the enclosing file (for variables declared at file scope) or subroutine. They are safe in mod_perl if both declared and used at file scope of the script or entirely within one subroutine, but you can be stung if you mix scopes and declare a my $global at file scope and then try to use it in a subroutine. The reason for this isn't simple, but it is caused by mod_perl wrapping your script in a handler subroutine so you have nested subroutine declarations. The inner subroutine will tend to adopt only the first instantiation of $global and ignore any others created by later calls to handler. If you need a global variable you should declare it with our and initialise it in that declaration as described above.
A local variable is very like an our variable in that it forms a synonym to a package variable. However it temporarily saves the current value of that variable and provides a new copy for use until the end of the file or block scope. Because of its automatic creation and deletion within its scope it can be a useful alternative to a my variable in mod_perl scripts, particularly where you are using pointers to data structures like, say, an instance of the CGI class. Declaring our $cgi = CGI->new would correctly create the object but, because of mod_perl's persistence, would leave it in memory until the next execution of the thread deletes it to make room for another one.
As for your questions:
Using a variable without declaring it either causes a compile-time error if use strict is in place as it should be. Otherwise it is a synonym for that variable in the current package namespace.
Variables are either package variables or lexical variables; there is no way to declare a variable as private as such. Lexical variables (declared with my) will be created and destroyed with each execution of the script, unless you have created an invalid closure as described above by writing a subroutine that uses a variable declared at a wider scope, when the variable will be persistent but won't do what you want it to. A variable declared with our will retain its value across calls to the script, while one declared with local will be destroyed when the script terminates. Both our and local variables are package variables and all references to the same variable name refer to the same variable.
To declare a variable that is consistently accessible everywhere within any one call of a script you can either use a local variable or an initialised our variable. At file scope local $global is largely equivalent to our $global = undef for mod_perl scripts. If you use an our variable to point to a data structure then remember to destroy it at the end of the script with undef $global.
my variables are unique to, and visible within, the block in which they are declared, whether that is a block within an if, while or for, or even just a bare { ... } block scope. Always use my variables for temporary work variables that are used only within a block and accessed from nowhere else.
I hope this helps

Edit: this is general information on Perl variable scoping only. Please see Borodin's post for specific mod_perl issues.
Variables declared with my are lexical. In other words, they exist only within the current scope. You should declare all of your variables with my by default; only do something else when you specifically want different functionality.
Using lexically-scoped variables is a basic part of good code design in (almost) any language. Putting use strict; and use warnings; in all of your scripts will require you to follow this good practice.
our is a way of declaring a global variable; the underlying result is very similar to using undeclared globals. However, it has two differences:
You are explicitly stating that you want the variable to be global. This is a good practice to follow, since use of global variables should be an exceptional case. Because of this, you can create a global in this way even if you use strict;.
The variable declared with our will be accessible by the name you declare throughout all packages in the current scope. An undeclared variable, by contrast, is only accessible by simple name within the current package. Outside of that, you could only refer to it as $package::variable.
See the documentation for our for more details.
local does not create a lexical variable; instead, it is a way to give a global variable a temporary value within the current scope. It is mostly used with Perl's special built-in (punctuation) variables:
{
local $/; #make the record separator undefined in this scope only.
my $file = <FILE>; #read in an entire file at once.
}
You can go far simply by using my at all times for your variables and using local only for special cases like that shown above.

Confusing about the Emacs custom system

There are several similar setting functions:
set & setq
set-default
defcustom
custom-set-value
custom-set-variables
customize-set-value
customize-set-variable
so, what's the difference between these functions?
If I want setting my own preferences to an add-on, for these scenario:
If a variable setting by defcustom, which setting-function will be better?
And what about a variable setting by defvar?

The short answer to you question is:
use setq or setq-default for variables defined by defvar.
use setq, setq-default, or the Customize mechanism for variables defined by defcustom
Below is the long answer.
The functions that you are going to use are the following:
set is the main function to set the value of a variable.
setq is another version that automatically quotes its first argument. This is useful since quoting the first argument is what you want to do almost all the time.
Some variables cannot be set globally. Whenever you set the variable it is only set for the current buffer. If you want to simulate setting this variable globally you use set-default or setq-default.
The functions that a package writer uses are:
defvar which allows the package writer to define a variable and to give some documentation. This function is not required but makes the life of users easier.
defcustom builds on defvar. It tells emacs that it is a variable, and it allows the developer to create a custom interface to set the value. The developer can say, things like "this variable can contain only the value 'foo or 'bar".
Setting variables can be done two ways:
if defvar was used, the values can only be set by the user in its .emacs by using the set function (or variants)
if defcustom was used, the values can be set using set (see 1.) OR by using Customize. When using the customize mechanism, emacs will generate some code that it will place in custom-set-variables. The user should not use this function.

They are, largely all paths to the same thing. There are some important differences though. The best way to get to know them is to read the manuals for Emacs and Elisp (see C-h i). Off the top of the head though:
set is a "low-level" variable assignment
(setq foo bar) is shorthand for (set (quote foo) bar)
(set-default foo bar) means "unless there is a more explicitly scoped definition of foo in the current buffer, use the value bar", and applies to all buffers.
defcustom is used to mark a variable as one that the user is expected to be able to safely modify through the customize feature.
custom-set-value and customize-set-value are two names that point to the same function. They're convenience methods for working with the customize system.
custom-set-variables and customize-set-variables are used to make some set of customized-through-customize variables active, IIRC.
In general, it's recommended to use M-x customize to change things around. You're free to set things defined with defcustom using set or setq in your .emacs, the customize system will warn you about it if you later change it via customize though.
defcustom is generally only used by people writing packages meant for distribution, and I don't think I've seen anyone use custom-set-* outside of files internal to customize. setq is very common in people's initialization files for setting things up how they like them, regardless of whether those things are marked for use with customize or not.
I don't have a full understanding of all this, hopefully someone else can shed more light, but I think that that's a fairly good overview :P

set and setq are the lowest level primitives used for assigning any kind of variable.
set-default and setq-default are emacs extensions that go along with buffer-local variables, to allow setting the default values used for new buffers.
3-7. All the "custom" stuff is a later addition that was designed to support a user interface for managing variables that are intended to be used as user preferences.
defcustom is similar to defvar, but allows you to specify a place in the hierarchy of options, as well as data type information so that the UI can display the value as a menu, or automatically convert user input to the appropriate type.
I don't think there is a custom-set-value function.
custom-set-variables is used by the customize UI when saving all the user's options. It lists all the variables that the user has changed from their defaults.
6-7. custom-set-value and custom-set-variable are used by the Customize UI to prompt the user for the current and default values of an option variable, and assign them. You don't normally call this yourself.

Just as addition, the differences between those commands have increased due to the introduction of lexical binding, though those differences will not really be relevant if you just want to customize some variables.
The def... constructs declare global variables. The set... functions set variables, whether global or local. When x is neither a local variable (a formal parameter of the current function or declared by a let form or similiar) nor defined by a def... form and you write (setq x 0) the byte compiler will even show a warning
Warning: assignment to free variable `x'
Variables declared with defvar, defcustom, defconst are dynamically bound, i.e. when you have a construct
(let ((lisp-indent-offset 2))
(pp (some-function)))
the function some-function will see the change of the global variable lisp-indent-offset.
When a variable is not dynamically bound, in something like
(let ((my-local-var 1))
(some-function))
where my-local-var has no global value, then some-function will not see the assigned value, as it is lexically scoped.
On the other hand, dynamically scoped variables will not be captured into lexical closures.
More details can be seen in http://www.gnu.org/software/emacs/manual/html_node/elisp/Lexical-Binding.html

Are there whole-program-transforming macros in Lisp or Scheme?

I have seen one answer of How does Lisp let you redefine the language itself?
Stack Overflow question (answered by Noah Lavine):
Macros aren't quite a complete redefinition of the language, at least as far as I know (I'm actually a Schemer; I could be wrong), because there is a restriction. A macro can only take a single subtree of your code, and generate a single subtree to replace it. Therefore you can't write whole-program-transforming macros, as cool as that would be.
After reading this I am curious about whether there are "whole-program-transforming macros" in Lisp or Scheme (or some other language).
If not then why?
It is not useful and never required?
Same thing could be achieved by some other ways?
It is not possible to implement it even in Lisp?
It is possible, but not tried or implemented ever?
Update
One kind of use case
e.g.
As in stumpwm code
here are some functions all in different lisp source files
uses a dynamic/global defvar variable *screen-list* that is defined in primitives.lisp , but used in screen.lisp, user.lisp, window.lisp.
(Here each files have functions, class, vars related to one aspect or object)
Now I wanted to define these functions under the closure where
*screen-list* variable available by let form, it should not be
dynamic/global variable, But without moving these all functions into
one place (because I do not want these functions to lose place from their
related file)
So that this variable will be accessible to only these functions.
Above e.g. equally apply to label and flet, so that it will further possible
that we could make it like that only required variable, function will be available,
to those who require it.
Note one way might be
implement and use some macro defun_with_context for defun where first argument is
context where let, flet variables definend.
But apart from it could it be achieved by reader-macro as
Vatine and Gareth Rees answered.

You quoted Noah Lavine as saying:
A macro can only take a single subtree of your code, and generate a single subtree to replace it
This is the case for ordinary macros, but reader macros get access to the input stream and can do whatever they like with it.
See the Hyperspec section 2.2 and the set-macro-character function.

In Racket, you can implement whole-program-transforming macros. See the section in the documentation about defining new languages. There are many examples of this in Racket, for example the lazy language and Typed Racket.

Off the top of my head, a few approaches:
First, you can. Norvig points out that:
We can write a compiler as a set of macros.
so you can transform an entire program, if you want to. I've only seen it done rarely, because typically the intersection between "things you want to do to every part of your program" and "things that you need macro/AST-type transformations for" is a pretty small set. One example is Parenscript, which transforms your Lisp code ("an extended subset of CL") into Javascript. I've used it to compile entire files of Lisp code into Javascript which is served directly to web clients. It's not my favorite environment, but it does what it advertises.
Another related feature is "advice", which Yegge describes as:
Great systems also have advice. There's no universally accepted name for this feature. Sometimes it's called hooks, or filters, or aspect-oriented programming. As far as I know, Lisp had it first, and it's called advice in Lisp. Advice is a mini-framework that provides before, around, and after hooks by which you can programmatically modify the behavior of some action or function call in the system.
Another is special variables. Typically macros (and other constructs) apply to lexical scope. By declaring a variable to be special, you're telling it to apply to dynamic scope (I think of it as "temporal scope"). I can't think of any other language that lets you (the programmer) choose between these two. And, apart from the compiler case, these two really span the space that I'm interested in as a programmer.

A typical approach is to write your own module system. If you just want access to all the code, you can have some sort of pre-processor or reader extension wrap source files with your own module annotation. If you then write your own require or import form, you will ultimately be able to see all the code in scope.
To get started, you could write your own module form that lets you define several functions which you then compile in some clever way before emitting optimized code.

There's always the choice of using compiler macros (they can do whole-function transformation based on a lew of criteria, but shouldn't change the value returned, as that would be confusing).
There's reader macros, they transform the input "as it is read" (or "before it is read", if you prefer). I haven't done much large-scale reader-macro hacking, but I have written some code to allow elisp sourec to be (mostly) read in Common Lisp, with quite a few subtle differences in syntactic sugar between the two.

I believe those sorts of macros are called code-walking macros. I haven't implemented a code walker myself, so I am not familiar with the limits.

In Common LISP, at least, you may wrap top-level forms in PROGN and they still retain their status as top-level forms (see CLTL2, section 5.3). Therefore, the limitation of a macro generating a single subtree is not much of a limitation since it could wrap any number of resulting subtrees within PROGN. This makes whole-program macros quite possible.
E.g.
(my-whole-program-macro ...)
= expands to =>
(progn
(load-system ...)
(defvar ...)
(defconstant ...)
(defmacro ...)
(defclass ...)
(defstruct ...)
(defun ...)
(defun ...)
...
)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse