Is there a way to parse integers to their char equivalents in Common Lisp?
I've been looking all morning, only finding char-int...
* (char-int #\A)
65
Some other sources also claim the existence of int-char
* (int-char 65)
; in: INT-CHAR 65
; (INT-CHAR 65)
;
; caught STYLE-WARNING:
; undefined function: INT-CHAR
;
; compilation unit finished
; Undefined function:
; INT-CHAR
; caught 1 STYLE-WARNING condition
debugger invoked on a UNDEFINED-FUNCTION:
The function COMMON-LISP-USER::INT-CHAR is undefined.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
("undefined function")
What I'm really looking for, however, is a way of converting 1 to #\1
How exactly would I do that?
To convert between characters and their numeric encodings, there are char-code and code-char:
* (char-code #\A)
65
* (code-char 65)
#\A
However, to convert a digit to the corresponding character, there is digit-char:
* (digit-char 1)
#\1
* (digit-char 13 16) ; radix 16
#\D
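As a small sketch of the edge cases: digit-char returns NIL rather than signalling an error when the weight is not a single digit in the given radix, and digit-char-p goes the other way. The wrapper name digit->char below is hypothetical, made up for illustration only.

```lisp
;; digit->char is a hypothetical helper, not a standard function.
;; digit-char itself returns NIL for weights that do not fit the radix.
(defun digit->char (n &optional (radix 10))
  "Return the character for digit N in RADIX, or signal an error."
  (or (digit-char n radix)
      (error "~D is not a single digit in radix ~D" n radix)))

(digit->char 1)         ; => #\1
(digit->char 13 16)     ; => #\D
(digit-char-p #\D 16)   ; => 13, the inverse direction
(digit-char 10)         ; => NIL, 10 is not a decimal digit
(princ-to-string 42)    ; => "42", for integers with several digits
```

For a whole integer rather than a single digit, printing to a string (as in the last line) is usually what is wanted.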
There's already an accepted answer, but it can be just as helpful to learn how to find the answer as getting the specific answer. One way of finding the function you needed would have been to do an apropos search for "CHAR". E.g., in CLISP, you'd get:
> (apropos "CHAR" "CL")
...
CHAR-CODE function
...
CODE-CHAR function
...
Another useful resource is the HyperSpec. There's a permuted index, and searching for "char" on the "C" page will be useful. Alternatively, chapter 13. Characters of the HyperSpec is relevant, and 13.2 The Characters Dictionary would be useful.
Both of these approaches would also find the digit-char function mentioned in the other answer, too.
Related
I need to extract a substring from a string; the substring is enclosed by ":" and ";". E.g.
:substring;
But with Lisp (SBCL), I'm having trouble extracting the substring. When I run:
(subseq "8.I:123;" : ;)
I get:
#<THREAD "main thread" RUNNING {1000510083}>:
illegal terminating character after a colon: #\
Stream: #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDIN* {1000025923}>
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(SB-IMPL::READ-TOKEN #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDIN* {1000025923}> #\:)
I've tried preceding the colon and semicolon with \ but that throws a different error. Can anyone advise? Thanks in advance for the help!
As you can see in docs for subseq, start and end are bounding index designators and they can be either integer or nil.
#\: and #\; are characters, so you can't use them, but you can use the function position to find the first index of each character and use these indices as arguments for subseq. You have to check that both indices exist and the second one is bigger than the first one:
(let* ((string "8.I:123;")
       (pos1 (position #\: string))
       (pos2 (position #\; string)))
  (when (and pos1 pos2 (> pos2 pos1))
    (subseq string
            (1+ pos1)
            pos2)))
=> "123"
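One possible refinement, sketched under the assumption that you want the first ":" and the next ";" after it: passing :start to position means a stray ";" earlier in the string no longer makes the lookup fail. The function name substring-between is hypothetical.

```lisp
;; substring-between is a hypothetical helper name.
(defun substring-between (string open close)
  "Return the text between the first OPEN and the next CLOSE after it, or NIL."
  (let* ((start (position open string))
         ;; Only look for CLOSE to the right of OPEN.
         (end (and start (position close string :start (1+ start)))))
    (when end
      (subseq string (1+ start) end))))

(substring-between "8.I:123;" #\: #\;)        ; => "123"
(substring-between "a;b:cd;" #\: #\;)         ; => "cd"
(substring-between "no delimiters" #\: #\;)   ; => NIL
```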
This is a little bit cumbersome, so I suggest using a regex library. The following example was created with CL-PPCRE:
(load "~/quicklisp/setup.lisp")
(ql:quickload :cl-ppcre)
> (cl-ppcre:all-matches-as-strings "(?<=:)([^;]*)(?=;)" "8.I:123;:aa;")
("123" "aa")
Pretty straightforward, but I can't seem to find an answer. I have a string of 1s and 0s such as "01001010" - how would I parse that into a number?
Use string-to-number, which optionally accepts the base:
(string-to-number "01001010" 2)
;; 74
As explained by @sds in a comment, string-to-number returns 0 if the conversion fails. This is unfortunate, since a return value of 0 could also mean that the parsing succeeded.
I'd rather use the Common Lisp version of this function, cl-parse-integer. The standard function is described in the Hyperspec, whereas the one in Emacs Lisp is slightly different (in particular, there is no secondary return value):
(cl-parse-integer STRING &key START END RADIX JUNK-ALLOWED)
Parse integer from the substring of STRING from START to END. STRING
may be surrounded by whitespace chars (chars with syntax ‘ ’). Other
non-digit chars are considered junk. RADIX is an integer between 2 and
36, the default is 10. Signal an error if the substring between START
and END cannot be parsed as an integer unless JUNK-ALLOWED is non-nil.
(cl-parse-integer "001010" :radix 2)
=> 10
(cl-parse-integer "0" :radix 2)
=> 0
;; exception on parse error
(cl-parse-integer "no" :radix 2)
=> Debugger: (error "Not an integer string: ‘no’")
;; no exception, but nil in case of errors
(cl-parse-integer "no" :radix 2 :junk-allowed t)
=> nil
;; no exception, parse as much as possible
(cl-parse-integer "010no" :radix 2 :junk-allowed t)
=> 2
This thread has an elisp tag. Because it also has a lisp tag, I would like to show standard Common Lisp versions of two solutions. I checked these on LispWorks only. If my solutions are not standard Common Lisp, maybe someone will correct and improve my solutions.
For solutions
(string-to-number "01001010" 2)
and
(cl-parse-integer "001010" :radix 2)
LispWorks does not have string-to-number and does not have cl-parse-integer.
In LispWorks, you can use:
(parse-integer "01001010" :radix 2)
For the solution
(read (concat "#2r" STRING))
LispWorks does not have concat. You can use concatenate instead. read won't work on strings in LispWorks. You have to give read a stream.
In LispWorks, you can do this:
(read (make-string-input-stream (concatenate 'string "#2r" "01001010")))
You can also use format like this:
(read (make-string-input-stream (format nil "#2r~a" "01001010")))
This seems hacky by comparison, but FWIW you could also do this:
(read (concat "#2r" STRING))
i.e. read a single expression from STRING as a binary number.
This method will signal an error if the expression isn't valid.
Emacs 24.3.1, Windows 2003
I found the 'byte-to-position' function is a little strange.
According to the documentation:
-- Function: byte-to-position byte-position
Return the buffer position, in character units, corresponding to
given BYTE-POSITION in the current buffer. If BYTE-POSITION is
out of range, the value is `nil'. **In a multibyte buffer, an
arbitrary value of BYTE-POSITION can be not at character boundary,
but inside a multibyte sequence representing a single character;
in this case, this function returns the buffer position of the
character whose multibyte sequence includes BYTE-POSITION.** In
other words, the value does not change for all byte positions that
belong to the same character.
We can make a simple experiment:
Create a buffer, eval this expression: (insert "a" (- (max-char) 128) "b")
Since a character in Emacs' internal encoding takes at most 5 bytes, the character between 'a' and 'b' is 5 bytes long. (Note that the last 128 characters are used for 8-bit raw bytes, and they take only 2 bytes each.)
Then define and eval this test function:
(defun test ()
  (interactive)
  (let ((max-bytes (1- (position-bytes (point-max)))))
    (message "%s"
             (loop for i from 1 to max-bytes collect (byte-to-position i)))))
What I get is "(1 2 3 2 2 2 3)".
Each number in the list is a character position in the buffer. Because there is a 5-byte character, there should be five '2's between the '1' and the '3', but how do you explain the stray '3' among the '2's?
This was a bug. I no longer see this behavior in 26.x. You can read more about it here (which actually references this SO question).
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20783
In Visual Lisp, you can use (atoi "123") to convert "123" to 123. It seems there is no atoi-like function in CLISP?
Any suggestion is appreciated!
Now I want to convert '(1 2 3 20 30) to "1 2 3 20 30". What's the best way to do it?
parse-integer can convert a string to an integer, but how do I convert an integer to a string? Do I need to use the format function?
(map 'list #'(lambda (x) (format nil "~D" x)) '(1 2 3)) => ("1" "2" "3")
But I don't know how to convert it to "1 2 3" as Haskell does:
concat $ intersperse " " ["1","2","3","4","5"] => "1 2 3 4 5"
Sincerely!
In Common Lisp, you can use the read-from-string function for this purpose:
> (read-from-string "123")
123 ;
3
As you can see, the primary return value is the object read, which in this case happens to be an integer. The second value is the index of the first character in the string that was not consumed, which is where a subsequent call to a reading function on the same input would have to start.
Note that read-from-string is obviously not tailored just for reading integers. For that, you can turn to the parse-integer function. Its interface is similar to read-from-string:
> (parse-integer "123")
123 ;
3
Given that you were asking for an analogue to atoi, the parse-integer function is the more appropriate choice.
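If you want behaviour closer to a C-style atoi (ignore trailing junk, yield 0 when nothing can be parsed), one way to sketch it is with the :junk-allowed keyword of parse-integer. The name atoi-like is hypothetical, and whether Visual Lisp's atoi behaves exactly this way is an assumption.

```lisp
;; atoi-like is a hypothetical helper; it mimics the lenient C-style behaviour.
(defun atoi-like (string)
  "Parse a leading integer from STRING; return 0 if there is none."
  ;; With :junk-allowed t, parse-integer returns NIL instead of signalling
  ;; an error when no integer can be read.
  (or (parse-integer string :junk-allowed t) 0))

(atoi-like "123")     ; => 123
(atoi-like " 42xyz")  ; => 42
(atoi-like "-7")      ; => -7
(atoi-like "abc")     ; => 0
```

Without :junk-allowed, parse-integer signals an error on malformed input, which is often the safer default.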
Addressing the second part of your question, post-editing, you can interleave (or "intersperse") a string with the format function. This example hard-codes a single space character as the separating string, using the format iteration control directives ~{ (start), ~} (end), and ~^ (terminate if remaining input is empty):
> (format nil "Interleaved: ~{~S~^ ~}." '(1 2 3))
"Interleaved: 1 2 3."
Loosely translated, the format string says,
For each item in the input list (~{), print the item by its normal conversion (~S). If no items remain, stop the iteration (~^). Otherwise, print a space, and then repeat the process with the next item (~}).
If you want to avoid hard-coding the single space there, and accept the separator string as a separately-supplied value, there are a few ways to do that. It's not clear whether you require that much flexibility here.
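One of those ways, as a minimal sketch: build the string with a loop and print the separator only between items. The function name join is hypothetical. It uses princ, so string items are printed without the quotes that ~S would add.

```lisp
;; join is a hypothetical helper name, not a standard function.
(defun join (items &optional (separator " "))
  "Return a string of ITEMS printed with PRINC, separated by SEPARATOR."
  (with-output-to-string (out)
    (loop for (item . rest) on items
          do (princ item out)
          ;; REST is NIL for the last item, so no trailing separator.
          when rest do (princ separator out))))

(join '(1 2 3 20 30))            ; => "1 2 3 20 30"
(join '("1" "2" "3") ", ")       ; => "1, 2, 3"
(join '())                       ; => ""
```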
I am trying to build a Lisp grammar. Easy, right? Apparently not.
I present these inputs and receive errors...
( 1 1)
23 23 23
ui ui
This is the grammar...
%%
sexpr: atom                 {printf("matched sexpr\n");}
     | list
     ;
list: '(' members ')'       {printf("matched list\n");}
    | '(' ')'               {printf("matched empty list\n");}
    ;
members: sexpr              {printf("members 1\n");}
       | sexpr members      {printf("members 2\n");}
       ;
atom: ID                    {printf("ID\n");}
    | NUM                   {printf("NUM\n");}
    | STR                   {printf("STR\n");}
    ;
%%
As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.
edit - this was my "top terminal" approach:
program: slist;
slist: slist sexpr | sexpr;
But it allows problems such as:
( 1 1
Edit2: The FLEX code is...
%{
#include <stdio.h>
#include "a.yacc.tab.h"
int linenumber;
extern int yylval;
%}
%%
\n { linenumber++; }
[0-9]+ { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\" { return STR; }
[a-zA-Z][a-zA-Z0-9]* { return ID; }
.
%%
An example of the over-matching...
(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr
What's the error here?
edit: The error was in the lexer.
Lisp's grammar cannot be represented as a context-free grammar, so yacc cannot parse all Lisp code.
This is because of Lisp features such as read-time evaluation and the programmable reader. So, just to read arbitrary Lisp code, you need a full Lisp running. These are not obscure, unused features; they are used in practice, e.g., by CL-INTERPOL and CL-SQL.
If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.
The error is really in the lexer. Your parentheses are matched by the final catch-all "." rule in the lexer, which has no action, so they are discarded and never show up as parentheses in the parser.
Add rules like
\) { return RPAREN; }
\( { return LPAREN; }
to the lexer and change all occurrences of '(' and ')' to LPAREN and RPAREN respectively in the parser. (You also need to declare LPAREN and RPAREN with %token where you define your token list.)
Note: I'm not sure about the syntax, could be the backslashes are wrong.
You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:
program: sexpr*
Indicating 0 or more sexpr.
Update with YACC syntax:
program : /* empty */
| program sexpr
;
Not in YACC, but it might be helpful anyway: here's a full grammar in ANTLR v3 that works for the cases you described (it excludes strings in the lexer because they're not important for this example, and it uses C# console output because that's what I tested it with):
program: (sexpr)*;
sexpr: list
| atom {Console.WriteLine("matched sexpr");}
;
list:
'('')' {Console.WriteLine("matched empty list");}
| '(' members ')' {Console.WriteLine("matched list");}
;
members: (sexpr)+ {Console.WriteLine("members 1");};
atom: Id {Console.WriteLine("ID");}
| Num {Console.WriteLine("NUM");}
;
Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};
This won't work exactly as is in YACC because YACC generates an LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.
Do you necessarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).
However, if you want to be able to read arbitrary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader at run-time (and even switch between different read-tables at run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).
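The read_sexpr / read_list / read_atom structure described in this answer can be sketched as follows. The sketch is in Common Lisp rather than C, the function names are hypothetical, and it assumes a tiny subset: integers, parenthesized lists, and everything else returned as a string. It is not a substitute for the real Lisp reader.

```lisp
;; Hypothetical helpers; a toy reader for a small subset of s-expressions.
(defun read-atom (stream)
  "Collect characters up to whitespace or a parenthesis."
  (let ((text (with-output-to-string (out)
                (loop for c = (peek-char nil stream nil nil)
                      while (and c (not (find c '(#\( #\) #\Space #\Tab
                                                  #\Newline #\Return))))
                      do (write-char (read-char stream) out)))))
    ;; All-digit atoms become integers; anything else stays a string.
    (if (every #'digit-char-p text)
        (parse-integer text)
        text)))

(defun read-list (stream)
  "Read elements after an opening parenthesis up to the matching close."
  (read-char stream)                           ; consume the "("
  (loop for c = (peek-char t stream nil nil)   ; T skips whitespace
        when (null c)
          do (error "Unbalanced input: missing )")
        until (char= c #\))
        collect (read-sexpr stream)
        finally (read-char stream)))           ; consume the ")"

(defun read-sexpr (stream)
  "Read one s-expression, dispatching on the next non-whitespace character."
  (let ((c (peek-char t stream nil nil)))
    (cond ((null c) (error "Unexpected end of input"))
          ((char= c #\() (read-list stream))
          ((char= c #\)) (error "Unexpected )"))
          (t (read-atom stream)))))

(with-input-from-string (in "(1 (ui 23) ())")
  (read-sexpr in))                             ; => (1 ("ui" 23) NIL)
```

With this structure, unbalanced input such as "( 1 1" signals an error when the end of input is reached inside read-list, instead of being silently accepted.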
It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?
I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.
You could try this grammar here.
I just tried it, and my "yacc lisp grammar" works fine:
%start exprs
exprs:
| exprs expr
/// if you prefer right recursion :
/// | expr exprs
;
list:
'(' exprs ')'
;
expr:
atom
| list
;
atom:
IDENTIFIER
| CONSTANT
| NIL
| '+'
| '-'
| '*'
| '^'
| '/'
;