LISP If Statement - Parsing Text File - lisp

I'm in a class reviewing various languages and we're building a text parser with Lisp. I can get my Lisp program to do lots of different functions with numbers but I'm struggling with text. I want to just peek at the first character in a line to see if it contains a < then do something, but I can't seem to figure out how to go about this simple task. Here's my simple little code so far:
;;;Sets up the y.xml file for use
(setq file (open "c:\\temp\\y.xml"))
;;;Just reads one line at a time, (jkk file)
(defun jkk (x)
(read-line x)
)
;;;Reads the entire file printing each line, (loopfile file)
(defun loopfile (x)
(loop for line = (read-line x nil)
while line do (print line))
)
This next part I tried to combine the loop with an if statement to see if it can find "<" and if so just print that line and skip any others which doesn't work. Any help with doing this really easy task would be greatly appreciated. Never used Lisp or any other functional language before, I'm used to using functions like crazy in my VB and Java projects but I don't have any decent reference materials for Lisp.
After this program is done we don't have to mess with Lisp anymore, so I didn't bother to order anything. Trying Google Books... starting to figure stuff out, but this language is an old and tough one!
;;;Reads the entire file printing the line when < is found
(defun loopfile_xml (x)
(loop for line = (read-line x nil)
while line do
(
if(char= line "<")
(print line)
)
)
)
Thanks guys

First, Lisp is not C or Java - it has different indentation conventions:
;;; Sets up the y.xml file for use
(setq file (open "c:\\temp\\y.xml"))

;;; Just reads one line at a time, (jkk file)
(defun jkk (x)
  (read-line x))

;;; Reads the entire file printing each line, (loopfile file)
(defun loopfile (x)
  (loop for line = (read-line x nil)
        while line do (print line)))
and
;;; Reads the entire file printing the line when < is found
(defun loopfile_xml (x)
  (loop for line = (read-line x nil)
        while line
        do (if (char= line "<")
               (print line))))
I would also give the variables meaningful names. x is not meaningful.
The function char= works on characters. But both arguments in your code are strings. Strings are not characters. #\< is a character. Strings are also arrays, so you can get the first element of a string using the function aref.
If you want to check if a line is just <, then you can compare the line using the function string= with the string "<".
The documentation:
character comparison: char= and related.
string comparison: string= and related.
accessing array contents with aref.
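Putting those pieces together, a minimal sketch of the check could look like the following. The function name collect-tag-lines is made up for illustration, and it collects the matching lines rather than printing them so the result is easy to inspect; the length test guards against empty lines, where aref on index 0 would signal an error.

```lisp
;; Hypothetical helper (not from the original post): return the lines
;; read from STREAM whose first character is a less-than sign.
(defun collect-tag-lines (stream)
  (loop for line = (read-line stream nil)
        while line
        ;; Skip empty lines before looking at the first character.
        when (and (plusp (length line))
                  (char= (aref line 0) #\<))
          collect line))

;; Usage sketch with an in-memory stream instead of a file:
(with-input-from-string (in (format nil "<a>~%text~%<b>"))
  (collect-tag-lines in))
;; => ("<a>" "<b>")
```

Replacing collect with do (print line) gives the print-as-you-go behaviour the question asks for.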
Lisp is old, but still used and it has a lot of interesting concepts.
To learn Lisp is actually not very tough. You can learn the basics of Lisp in a day. If you already know Java, you may need two or even three days.

To search for a character in a line of text, you could use position, with the char= function as your equality comparator.
Second, you MAY be better off collecting your file into a single string and searching there.
Third, there's some excellent reference material available on the web, like the Common Lisp HyperSpec and Peter Seibel's Practical Common Lisp.
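As a small sketch of the position suggestion (the helper name first-angle-position is hypothetical): position returns the zero-based index of the first match, or NIL when the character does not occur.

```lisp
;; Hypothetical helper: index of the first #\< in LINE, or NIL if absent.
(defun first-angle-position (line)
  (position #\< line :test #'char=))

(first-angle-position "  <tag>") ; => 2
(first-angle-position "plain")   ; => NIL
```

A result of 0 means the line starts with <, which is the case the question cares about.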

Related

Various forms of looping and iteration in Elisp

I am trying to understand all the looping constructs in Emacs Lisp.
In one example I am trying to iterate over a list of symbols and print them to the *Messages* buffer like so:
(let* ((plist package-activated-list) ;; list of loaded packages
       (sorted-plist (sort plist 'string<)))
  (--map (message (format "%s" it)) sorted-plist))
--map is a function from the package dash.el.
How to do this in pure Elisp?
Now how do I iterate over a list in Elisp, without using other packages.
I have seen some examples using while and dolist macro, for example here:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Iteration.html
But those are destructive, non-functional ways to express a loop.
Coming from Scheme (having worked with it and with SICP some twenty years ago!), I tend to prefer functional, non destructive (does that always lead to recursive?) ways to express ideas.
So what are idiomatic ways to loop over a list of items in Emacs Lisp?
Also: Are there ways to express loops in a functional fashion in Emacs Lisp?
What I have found so far
Loop Macros (from Common Lisp?) prefixed with "cl-*"
https://www.gnu.org/software/emacs/manual/html_node/cl/Loop-Facility.html
Iteration Clauses
https://www.gnu.org/software/emacs/manual/html_node/cl/Iteration-Clauses.html#Iteration-Clauses
Dash.el
https://github.com/magnars/dash.el
Magnar Sveen's excellent package marketed as "A modern list api for Emacs. No 'cl required."
What else is there? Any recommended reading?
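There is a bunch of native map functions that you can explore for iterating through lists, each with some subtleties or sugar.
In this case I chose `mapconcat', which is implemented in C. Since the list holds symbols and message expects a format string, map the symbols to their names and pass the joined string to message:
(message "%s" (mapconcat #'symbol-name sorted-plist "\n"))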
dolist is not destructive, and that is probably the most idiomatic way in Emacs Lisp, or Common Lisp for that matter, to loop over a list when you just want to do something with each member in turn:
(setq *properties* '(prop1 prop2 prop3))
(dolist (p *properties*)
  (print p))
The seq-doseq function does the same thing as dolist, but accepts a sequence argument (e.g., a list, vector, or string):
(seq-doseq (p *properties*)
  (print p))
If a more functional style is desired, the seq-do function applies a function to the elements of a sequence and returns the original sequence. This function is similar to the Scheme procedure for-each, which is also used for its side effects.
(seq-do #'(lambda (p) (print p)) *properties*)
If you're looking for more traditional Lisp map functions, Emacs Lisp has them:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Mapping-Functions.html#Mapping-Functions
In your code snippet, mapc is likely the one you want (throws the values away, just applies the function; mapcar is still there if you want the values):
(mapc #'(lambda (thing) (message (format "%s" thing))) sorted-plist)

indent-[code-]rigidly called from emacs LISP function

I'm trying to write an emacs LISP function to un-indent the region
(rigidly). I can pass prefix arguments to indent-code-rigidly or
indent-rigidly or indent-region and they all work fine, but I don't
want to always have to pass a negative prefix argument to shift things
left.
My current code is as below but it seems to do nothing:
(defun undent ()
"un-indent rigidly."
(interactive)
(list
(setq fline (line-number-at-pos (region-beginning)))
(setq lline (line-number-at-pos (region-end)))
(setq curIndent (current-indentation))
;;(indent-rigidly fline lline (- curIndent 1))
(indent-region fline lline 2)
;;(message "%d %d" curIndent (- curIndent 1))
)
)
I gather that (current-indentation) won't get me the indentation of the first line
of the region, but of the first line following the region (so a second question is
how to get that!). But even when I just use a constant for the column (as shown),
I don't see this function make any change.
Though if I uncomment the (message) call, it displays reasonable numbers.
GNU Emacs 24.3.1, on Ubuntu. And in case it matters, I use
(setq-default indent-tabs-mode nil) and (cua-mode).
I must be missing something obvious... ?
All of what Tim X said is true, but if you just need something that works, or an example to show you what direction to take your own code, I think you're looking for something like this:
(defun unindent-rigidly (start end arg &optional interactive)
  "As `indent-rigidly', but reversed."
  (interactive "r\np\np")
  (indent-rigidly start end (- arg) interactive))
All this does is call indent-rigidly with an appropriately transformed prefix argument. If you call this with a prefix argument n, it will act as if you had called indent-rigidly with the argument -n. If you omit the prefix argument, it will behave as if you called indent-rigidly with the argument -1 (instead of going into indent-rigidly's interactive mode).
There are a number of problems with your function, including some very
fundamental elisp requirements. I highly recommend reading the Emacs Lisp
Reference Manual (bundled with Emacs). If you are new to programming and Lisp,
you may also find An Introduction to Emacs Lisp useful (also bundled with
Emacs).
A few things to read about which will probably help
Read the section on the command loop from the elisp reference. In particular,
look at the node which describes how to define a new command and the use of
'interactive', which you will need if you want to bind your function to a key
or call it with M-x.
Read the section on variables from the lisp reference
and understand variable scope (local v global). Look at using 'let' rather
than 'setq' and what the difference is.
Read the section on 'positions' in the elisp reference. In particular, look at
'save-excursion' and 'save-restriction'. Understanding how to define and use
the region is also important.
It isn't clear if you're writing this function just as a learning exercise or
not. However, just in case you are doing it because it is something you need to
do rather than just something to learn elisp, be sure to go through the Emacs
manual and index. What you appear to need is a common and fairly well supported
requirement. It can get a little complicated if programming modes are involved
(as opposed to plain text). However, with emacs, if what you need seems like
something which would be a common requirement, you can be fairly confident it is
already there - you just need to find it (which can be a challenge at first).
A common convention is for functions/commands to be defined which act 'in
reverse' when supplied with a negative or universal argument. Any command which
has this ability can also be called as a function in elisp code with the
argument necessary to get that behaviour, so understanding the inter-play
between commands, functions and calling conventions is important.

Racket reader where newline is end of statement

I'm trying to create a new language in Racket where statements are on separate lines. A newline defines the end of a statement and the start of a new one.
I read through the Create Languages chapter of the guide which was very useful but the examples were focused on extending s-exp-like languages. The only option I see is manually writing my own parser for read and read-syntax.
I was hoping to use readtables but I don't know if I can. I tried:
(make-readtable #f #f 'non-terminating-macro my-read-line-fn)
but I don't know if this is much help. I guess I could create a sub-readtable that does things like read-word, read-string which I dispatch to based on what character my-read-line-fn gets.
Is that the best strategy or is there a predefined way of reading until the end of line?
I don't think you need to do anything with the readtable. Your lang/reader.rkt can provide your own read-syntax that can read/parse however it wants, and presumably stop when it encounters EOL.
One interesting example is Brainfudge. Its concept of a "statement" is a single character, but IIUC also [ brackets ].
See its lang/reader.rkt and parser.rkt for the low-level bits, and then try to understand how that is ultimately evaluated as Racket expressions.
You do indeed need to write versions of read and read-syntax that parse your language. The readtable is only meant to modify the builtin read, so I suggest that you take a look at Parser Tools (http://docs.racket-lang.org/parser-tools/index.html), which is a tool for writing parsers in the lex/yacc style.
An alternative is to use ragg:
http://www.hashcollision.org/ragg/
Install Ragg using the package manager in DrRacket. Search for ragg in the list of available packages.
Make your own reader.rkt:
#lang s-exp syntax/module-reader
(test test)
#:read-syntax my-read-syntax
#:read my-read
;; override default read (won't be used but is required)
(define (my-read in) (read-line in))

;; override read-syntax by reading in one string at a time and
;; pass it to statement-string->code to get code as data and
;; make it syntax with datum->syntax
(define (my-read-syntax in)
  (datum->syntax #f (statement-string->code (read-line in))))

;; This is actually how you want your code
;; to become s-expressions. I imagine that my
;; module has a primitive called code that
;; interprets the string as is
(define (statement-string->code str)
  (list 'code str))
Racket doesn't have "statements", so the concept of newlines ending "statements" is nonsensical.
If your motivation is to reduce or do away with parentheses, I encourage you to use a "standard alternative" reader like sweet-expressions, rather than making something home-grown.

Why can't CLISP call certain functions with uninterned names?

I've written an ad hoc parser generator that creates code to convert an old and little known 7-bit character set into unicode. The call to the parser generator expands into a bunch of defuns enclosed in a progn, which then get compiled. I only want to expose one of the generated defuns--the top-level one--to the rest of the system; all the others are internal to the parser and only get called from within the dynamic scope of the top-level one. Therefore, the other defuns generated have uninterned names (created with gensym). This strategy works fine with SBCL, but I recently tested it for the first time with CLISP, and I get errors like:
*** - FUNCALL: undefined function #:G16985
It seems that CLISP can't handle functions with uninterned names. (Interestingly enough, the system compiled without a problem.) EDIT: It seems that it can handle functions with uninterned names in most cases. See the answer by Rörd below.
My question is: Is this a problem with CLISP, or is it a limitation of Common Lisp that certain implementations (e.g. SBCL) happen to overcome?
EDIT:
For example, the macro expansion of the top-level generated function (called parse) has an expression like this:
(PRINC (#:G75735 #:G75731 #:G75733 #:G75734) #:G75732)
Evaluating this expression (by calling parse) causes an error like the one above, even though the function is definitely defined within the very same macro expansion:
(DEFUN #:G75735 (#:G75742 #:G75743 #:G75744) (DECLARE (OPTIMIZE (DEBUG 2)))
(DECLARE (LEXER #:G75742) (CONS #:G75743 #:G75744))
(MULTIPLE-VALUE-BIND (#:G75745 #:G75746) (POP-TOKEN #:G75742)
...
The two instances of #:G75735 are definitely the same symbol--not two different symbols with the same name. As I said, this works with SBCL, but not with CLISP.
EDIT:
SO user Joshua Taylor has pointed out that this is due to a long standing CLISP bug.
You don't show one of the lines that give you the error, so I can only guess, but the only thing that could cause this problem as far as I can see is that you are referring to the name of the symbol instead of the symbol itself when trying to call it.
If you were referring to the symbol itself, all your lisp implementation would have to do is lookup that symbol's symbol-function. Whether it's interned or not couldn't possibly matter.
May I ask why you haven't considered another way to hide the functions, i.e. a labels statement or defining the functions within a new package that exports only the one external function?
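To illustrate the labels alternative, here is a minimal sketch. The function parse-digits and its helpers are invented for this example and have nothing to do with the asker's generated parser; the point is only that the helpers are lexically scoped and never get global names at all.

```lisp
;; Hypothetical entry point: only PARSE-DIGITS is globally defined.
;; DIGIT-VALUE and WALK exist only inside its body.
(defun parse-digits (string)
  (labels ((digit-value (ch)
             ;; DIGIT-CHAR-P returns the digit's weight, or NIL for non-digits.
             (digit-char-p ch))
           (walk (index acc)
             (if (= index (length string))
                 (nreverse acc)
                 (walk (1+ index)
                       (cons (digit-value (char string index)) acc)))))
    (walk 0 '())))

(parse-digits "407") ; => (4 0 7)
```

For a generated parser, the macro would emit one defun wrapping a labels form instead of a progn of separate defuns.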
EDIT: The following example is copied literally from an interaction with the CLISP prompt.
As you can see, calling the function named by a gensym is working as expected.
[1]> (defmacro test ()
       (let ((name (gensym)))
         `(progn
            (defun ,name () (format t "Hello!"))
            (,name))))
TEST
[2]> (test)
Hello!
NIL
Maybe your code that's trying to call the function gets evaluated before the defun? If there's any code in the macro expansion besides the various defuns, it may be implementation-dependent what gets evaluated first, and so the behaviour of SBCL and CLISP may differ without any of them violating the standard.
EDIT 2: Some further investigation shows that CLISP's behaviour varies depending upon whether the code is interpreted directly or whether it's first compiled and then interpreted. You can see the difference by either directly loading a Lisp file in CLISP or by first calling compile-file on it and then loading the FASL.
You can see what's going on by looking at the first restart that CLISP offers. It says something like "Input a value to be used instead of (FDEFINITION '#:G3219)." So for compiled code, CLISP quotes the symbol and refers to it by name.
It seems though that this behaviour is standard-conforming. The following definition can be found in the HyperSpec:
function designator n. a designator for a function; that is, an object that denotes a function and that is one of: a symbol (denoting the function named by that symbol in the global environment), or a function (denoting itself). The consequences are undefined if a symbol is used as a function designator but it does not have a global definition as a function, or it has a global definition as a macro or a special form. See also extended function designator.
I think an uninterned symbol matches the "a symbol is used as a function designator but it does not have a global definition as a function" case for undefined consequences.
EDIT 3: (I can agree that I'm not sure whether CLISP's behaviour is a bug or not. Someone more experienced with details of the standard's terminology should judge this. It comes down to whether the function cell of an uninterned symbol - i.e. a symbol that cannot be referred to by name, only by having a direct hold on the symbol object - would be considered a "global definition" or not)
Anyway, here's an example solution that solves the problem in CLISP by interning the symbols in a throwaway package, avoiding the matter of uninterned symbols:
(defmacro test ()
  (let* ((pkg (make-package (gensym)))
         (name (intern (symbol-name (gensym)) pkg)))
    `(progn
       (defun ,name () (format t "Hello!"))
       (,name))))
(test)
EDIT 4: As Joshua Taylor notes in a comment to the question, this seems to be a case of the (10 year old) CLISP bug #180.
I've tested both workarounds suggested in that bug report and found that replacing the progn with locally actually doesn't help, but replacing it with let () does.
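For reference, the let () variant of the earlier gensym example would look roughly like this. The macro name test-with-let is made up here; that this form avoids the problem in compiled CLISP code rests on the test described above and may depend on the CLISP version.

```lisp
;; Sketch of the workaround: wrap the generated forms in (let () ...)
;; instead of PROGN. TEST-WITH-LET is a hypothetical name.
(defmacro test-with-let ()
  (let ((name (gensym)))
    `(let ()
       (defun ,name () (format t "Hello!"))
       (,name))))

(test-with-let) ; prints Hello!
```

One consequence of this design is that the defun is no longer a top-level form, so the whole expansion is compiled as a single unit.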
You can most certainly define functions whose names are uninterned symbols. For instance:
CL-USER> (defun #:foo (x)
(list x))
#:FOO
CL-USER> (defparameter *name-of-function* *)
*NAME-OF-FUNCTION*
CL-USER> *name-of-function*
#:FOO
CL-USER> (funcall *name-of-function* 3)
(3)
However, the sharpsign colon syntax introduces a new symbol each time such a form is read:
#: introduces an uninterned symbol whose name is symbol-name. Every time this syntax is encountered, a distinct uninterned symbol is created. The symbol-name must have the syntax of a symbol with no package prefix.
This means that even though something like
CL-USER> (list '#:foo '#:foo)
;=> (#:FOO #:FOO)
shows the same printed representation, you actually have two different symbols, as the following demonstrates:
CL-USER> (eq '#:foo '#:foo)
NIL
This means that if you try to call such a function by typing #: and then the name of the symbol naming the function, you're going to have trouble:
CL-USER> (#:foo 3)
; undefined function #:foo error
So, while you can call the function using something like the first example I gave, you can't do this last one. This can be kind of confusing, because the printed representation makes it look like this is what's happening. For instance, you could write such a factorial function like this:
(defun #1=#:fact (n &optional (acc 1))
  (if (zerop n) acc
      (#1# (1- n) (* acc n))))
using the special reader notation #1=#:fact and #1# to later refer to the same symbol. However, look what happens when you print that same form:
CL-USER> (pprint '(defun #1=#:fact (n &optional (acc 1))
                    (if (zerop n) acc
                        (#1# (1- n) (* acc n)))))
(DEFUN #:FACT (N &OPTIONAL (ACC 1))
  (IF (ZEROP N)
      ACC
      (#:FACT (1- N) (* ACC N))))
If you take that printed output, and try to copy and paste it as a definition, the reader creates two symbols named "FACT" when it comes to the two occurrences of #:FACT, and the function won't work (and you might even get undefined function warnings):
CL-USER> (DEFUN #:FACT (N &OPTIONAL (ACC 1))
           (IF (ZEROP N)
               ACC
               (#:FACT (1- N) (* ACC N))))
; in: DEFUN #:FACT
; (#:FACT (1- N) (* ACC N))
;
; caught STYLE-WARNING:
; undefined function: #:FACT
;
; compilation unit finished
; Undefined function:
; #:FACT
; caught 1 STYLE-WARNING condition
I hope I get the issue right. For me it works in CLISP.
I tried it like this: using a macro for creating a function with a GENSYM-ed name.
(defmacro test ()
  (let ((name (gensym)))
    `(progn
       (defun ,name (x) (* x x))
       ',name)))
Now I can get the name (setf x (test)) and call it (funcall x 2).
Yes, it is perfectly fine to define functions whose names are uninterned symbols. The problem is that you cannot then call them "by name", since you can't fetch the uninterned symbol by name (that is what "uninterned" means, essentially).
You would need to store the uninterned symbol in some sort of data structure, to then be able to fetch the symbol. Alternatively, store the defined function in some sort of data structure.
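A minimal sketch of that idea, using a hash table as the data structure. The variable *hidden-functions* and the key :square are invented for this example.

```lisp
;; Hypothetical registry mapping keywords to uninterned function names.
(defvar *hidden-functions* (make-hash-table :test #'eq))

(let ((name (gensym "SQUARE")))
  ;; Give the uninterned symbol a global function definition...
  (setf (symbol-function name) (lambda (x) (* x x)))
  ;; ...and keep hold of the symbol object itself so it can be found again.
  (setf (gethash :square *hidden-functions*) name))

;; FUNCALL accepts the symbol as a function designator.
(funcall (gethash :square *hidden-functions*) 4) ; => 16
```

Storing the function object instead of the symbol would work equally well and avoids the symbol lookup at call time.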
Surprisingly, CLISP bug 180 isn't actually an ANSI CL conformance bug. Not only that, but evidently, ANSI Common Lisp is itself so broken in this regard that even the progn based workaround is a courtesy of the implementation.
Common Lisp is a language intended for compilation, and compilation produces issues regarding the identity of objects which are placed into compiled files and later loaded ("externalized" objects). ANSI Common Lisp requires that literal objects reproduced from compiled files are only similar to the original objects. (CLHS 3.2.4 Literal Objects in Compiled Files).
Firstly, according to the definition of similarity (3.2.4.2.2 Definition of Similarity), the rule for uninterned symbols is that similarity is name based. If we compile code with a literal that contains an uninterned symbol, then when we load the compiled file, we get a symbol which is similar and not (necessarily) the same object: a symbol which has the same name.
What if the same uninterned symbol is inserted into two different top-level forms which are then compiled as a file? When the file is loaded, are those two similar to each other at least? No, there is no such requirement.
But it gets worse: there is also no requirement that two occurrences of the same uninterned symbol in the same form will be externalized in such a way that their relative identity is preserved: that the re-loaded version of that object will have the same symbol object in all the places where the original was. In fact, the definition of similarity contains no provision for preserving the circular structure and substructure sharing. If we have a literal like '#1=(a b . #1#), as a literal in a compiled file, there appears to be no requirement that this be reproduced as a circular object with the same graph structure as the original (a graph isomorphism). The similarity rule for conses is given as naive recursion: two conses are similar if their respective cars and cdrs are similar. (The rule can't even be evaluated for circular objects; it doesn't terminate).
That the above works is because of implementations going beyond what is required in the spec; they are providing an extension consistent with (3.2.4.3 Extensions to Similarity Rules).
Thus, purely according to ANSI CL, we cannot expect to use macros with gensyms in compiled files, at least in some ways. The expectation expressed in code like the following runs afoul of the spec:
(defmacro foo (arg)
  (let* ((g (gensym))
         (literal `(blah ,g ,g ,arg)))
    ...))
(defun bar ()
(foo 42))
The bar function contains a literal with two insertions of a gensym, which according to the similarity rules for conses and symbols need not reproduce as a list containing two occurrences of the same object in the second and third positions.
If the above works as expected, it's due to "extensions to the similarity rules".
So the answer to the "Why can't CLISP ..." question is that although CLISP does provide an extension for similarity which preserves the graph structure of literal forms, it doesn't do it across the entire compiled file, only within individual top level items within that file. (It uses *print-circle* to emit the individual items.) The bug is that CLISP doesn't conform to the best possible behavior users can imagine, or at least to a better behavior exhibited by other implementations.

lisp code excerpt

i've been reading some lisp code and came across this section, didn't quite understand what it specifically does, though the whole function is supposed to count how many times the letters from a -z appear in an entered text.
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
can anyone explain step by step what is happening? I know that it's somehow counting the letters but not quite sure how.
This Lisp code is slightly unusual, since it uses read-time evaluation. #.expr means that the expression will be evaluated only once, during read-time.
In this case a clever compiler might have guessed that the character code of a given character is known and could have removed the computation of character codes from the DO loop. The author of that code chose to do that by evaluating the expressions before the compiler sees it.
The source looks like this:
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
...)
When Lisp reads in the s-expression, we get this new code as the result (assuming a usual encoding of characters):
(do ((i 97 (1+ i)))
((> i 122))
...)
So that's a loop which counts the variable i up from 97 to 122.
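To show how such a loop might sit inside a letter-counting function, here is a sketch. The function count-letters is a guess at the overall shape, not the code the question was excerpted from, and it assumes a character encoding where a through z have contiguous codes (true for ASCII and Unicode).

```lisp
;; Hypothetical reconstruction: count how often each letter a-z occurs.
(defun count-letters (text)
  (let ((counts (make-array 26 :initial-element 0)))
    ;; First pass: tally every character whose code lies in a..z.
    (loop for ch across (string-downcase text)
          for code = (char-code ch)
          when (<= #.(char-code #\a) code #.(char-code #\z))
            do (incf (aref counts (- code #.(char-code #\a)))))
    ;; Second pass: the DO loop from the question, walking codes a..z
    ;; and collecting the letters that occurred at least once.
    (do ((i #.(char-code #\a) (1+ i))
         (result '()))
        ((> i #.(char-code #\z)) (nreverse result))
      (let ((n (aref counts (- i #.(char-code #\a)))))
        (when (plusp n)
          (push (cons (code-char i) n) result))))))

(count-letters "Banana") ; => ((#\a . 3) (#\b . 1) (#\n . 2))
```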
Lisp code is written as S-expressions. In typical S-expression syntax, the first element of an S-expression is treated as the operator and the rest as operands. An operand can be either an atom or another S-expression; an atom is a single data object. Keeping this in mind:
char-code
(char-code #\a) - returns the character code of a character, here #\a (97 in ASCII-compatible implementations).
The do syntax looks similar to the below
(do ((var1 init1 step1)
     (var2 init2 step2)
     ...)
    (end-test result)
  statement1
  ...)
So in your example
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
)
The first s-expression operand of do is the loop initialization, the second s-expression operand is the end-test.
So this means you are simply iterating over 'a' through 'z' incrementing i by 1.
In C++ (not sure of your comfort level with other languages) you can write
for(i='a';i<='z';i++);
The trick in the code you show is in poor form. I know this because I do it all
the time. The code makes an assumption that the compiler will know the current fixnum
for each character. #.(char-code #\a) is eq (or maybe eql, if you are so inclined) to a small unsigned integer, the character's code, which is a non-negative fixnum.
The #. is a reader macro (I'm fairly sure you know this :). Using two reader macros is
not a great idea, but it is fast when the compiler knows the datatype.
I have another example. Need to search for ascii in a binary stream:
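(defmacro code-char= (byte1 byte2)
  ;; Convert literal characters to their codes at macro-expansion time.
  (flet ((maybe-char-code (x)
           (if (characterp x) (char-code x) x)))
    ;; = returns a boolean, so that is the type declared for the result.
    `(the boolean (= (the fixnum ,(maybe-char-code byte1))
                     (the fixnum ,(maybe-char-code byte2))))))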
Declaring the return type in SBCL will probably insult the compiler, but I leave it as a sanity check (for me, not you). The code of #\$ is 36 decimal, which is #x24:
(code-char= #\$ #x24)
=>
T
At least I think so. But somehow I think you might know your way around some macros... Hmmmm... I should turn on the machine...
If you're seriously interested, there is some assembler for the 286 (8/16 bit dos assembler) that you can use a jump table. It works fast for the PC , I'd have to look it up...