lisp decoding? - lisp

How do I decode a binary stream in Lisp?
I used with-open-file and passed :element-type '(unsigned-byte 8) as an argument, but it returns numbers, not a string.
Please help me with this problem.

;;; Flexi-Streams "bivalent streams" solve the binary vs. character stream problem.
;;; You'll want to install QuickLisp and understand the REPL * and ** variables:
(require 'flexi-streams) ;; or (ql:quickload 'flexi-streams)
(with-open-file (out "foo.text" :direction :output :if-exists :supersede)
  (write-line "Foo" out)) ; "Foo"
(with-open-file (in "foo.text")
  (read-line in)) ; "Foo", NIL
(with-open-file (in "foo.text" :element-type '(unsigned-byte 8))
  (read-line in)) ;; read-line wrong stream type error
(with-open-file (in "foo.text" :element-type '(unsigned-byte 8))
  (let ((s (make-array 3)))
    (read-sequence s in) s)) ; #(70 111 111)
(map 'list #'code-char *) ; (#\F #\o #\o)
(map 'string #'code-char **) ; "Foo"
(with-open-file (raw "foo.text" :element-type 'flexi-streams:octet)
  (with-open-stream (in (flexi-streams:make-flexi-stream raw))
    (read-line in))) ; "Foo", NIL
;; Thanks to Edi Weitz for providing this essential tool.

Your problem is not a problem, I think. When you open a file in binary mode as (unsigned-byte 8), you are asking to read the file 8 bits at a time, with each byte represented as a number from 0 to 255. Depending on how you read it, you might get the data as an ARRAY or a LIST.
A 'text' file is a set of numbers using the ASCII representation of characters. For more sophisticated text, Unicode representation is used, but that is closer to a traditional binary format than a text one.
If you attempt to read a PDF file, you will have to follow the file format to gain meaningful data from it. Wotsit's site has a library of file formats.
From your question, it sounds as if you are just learning programming. I don't recommend working with PDFs when you are just learning.

The question is a bit unclear. I think your problem is that you have created a file to which you have written one or more elements of type (unsigned-byte 8), but when you try to read it you are getting characters, not binary values.
If that is the case, you will need to open the file with :element-type '(unsigned-byte 8).
If I have misunderstood what you want, please edit your question and I shall try to answer your question.

The short answer is that you don't need to specify the :element-type if you want to read in strings.
The type '(unsigned-byte 8) refers to a number, not to chars as in C. In Lisp, character is an actual datatype and you would need to open the file with this element type to get strings. The important thing to realize is that :element-type determines what data type elements in the file will be parsed into and returned as. If you read the hyperspec page on open you see that element-type has to either be a subtype of character, integer, or be unsigned-byte or signed-byte. The default, however, is character, which produces strings in whatever format your lisp uses.
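To make the point about :element-type concrete, here is a minimal sketch that reads a file as octets and then converts them to a string by hand. The function names read-file-octets and octets-to-ascii-string are hypothetical, and the conversion assumes single-byte text (ASCII or Latin-1) on an implementation whose character codes agree with ASCII; for UTF-8 or other encodings, a library such as flexi-streams is the safer route.

```lisp
;; Hypothetical helpers, not part of any library.

(defun read-file-octets (path)
  "Return the contents of PATH as a vector of (unsigned-byte 8)."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    ;; FILE-LENGTH on a binary stream counts elements, here octets.
    (let ((buffer (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      buffer)))

(defun octets-to-ascii-string (octets)
  "Map each octet to the character with that code.
Assumes single-byte text; multi-byte encodings need a real decoder."
  (map 'string #'code-char octets))
```

Calling (octets-to-ascii-string (read-file-octets "foo.text")) on the file written earlier should give back the text, including its trailing newline.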

Related

What are limitations of reader macros in Common Lisp

I have my own Lisp interpreter in JavaScript that I have been working on for some time, and now I want to implement reader macros like those in Common Lisp.
I've created Streams (almost working except for special symbols like ,# , ` ') but it freezes the browser for a few seconds when loading a page with included scripts (Lisp files of about 400 lines). This is because my Streams are based on the substring function. If I split the input into tokens first and then use a TokenStream that iterates over the tokens, it works fine.
So my question is this: are string streams really something that exists in Common Lisp? Can you add reader macros that create a whole new syntax, like Python inside CL? To simplify the question: can I implement a """ macro (I'm not sure whether a reader macro can be three characters long), or some other character, that implements template literals inside Lisp? For instance:
(let ((foo 10) (bar 20))
  {lorem ipsum ${foo} and ${bar}})
or
(let ((foo 10) (bar 20))
  ""lorem ipsum ${foo} and ${bar}"")
or
(let ((foo 10) (bar 20))
  :"lorem ipsum ${foo} and ${bar}")
would yield the string
"lorem ipsum 10 and 20"
Is something like this possible in Common Lisp, and how hard would it be to implement #\{ or #\: as a reader macro?
The only way I can think of to have template literals in Lisp is something like this:
(let ((foo 10) (bar 20))
  (tag "lorem ipsum ${foo} and ${bar}"))
where tag is a macro that returns a string, with the names inside ${} treated as free variables. Can a reader macro also return Lisp code that is then evaluated?
And another question: can you implement reader macros like this:
(list :foo:bar)
(list foo:bar)
where : is a reader macro that, when it appears before a symbol, converts the symbol to
foo.bar
and throws an error when it appears inside one? I'm asking because with token-based macros, :foo:bar and foo:bar will be symbols and will not be processed by my reader macros.
One more question: can a reader macro be defined on one line and used on the next? This is surely only possible with string streams and, from what I've tested, not possible with an interpreter written in JavaScript.
There are some limitations in the sense that it is pretty hard to, for instance, intervene in the interpretation of tokens in any way short of 'implement your own token interpreter from scratch'. But, well, you could if you wanted to do just that: the problem is that your code would need to deal with numbers & things as well as the existing code does and things like floating-point parsing are pretty fiddly to get right.
But the macro functions associated with macro characters get the stream that is being read, and they are free to read as much or as little of the stream as they like, and return any kind of object (or no object, which is how comments are implemented).
I would strongly recommend reading chapters 2 & 23 of the hyperspec, and then playing with an implementation. When you play with the implementation be aware that it is just astonishingly easy to completely wedge things by mucking around with the reader. At the minimum I would suggest code like this:
(defparameter *my-readtable* (copy-readtable nil))
;;; Now muck around with *my-readtable*, *not* the default readtable
;;;
(defun experimentally-read (&key (stream *standard-input*)
                                 (readtable *my-readtable*))
  (let ((*readtable* readtable))
    (read stream)))
This gives you at least some chance to recover from catastrophe: if you can once abort experimentally-read then you are back in a position where *readtable* is something sensible.
Here is a fairly useless example which shows how much you can subvert the syntax with macro characters: a macro character definition which will cause ( ...) to be read as a string. This may not be fully debugged, and as I say I can see no use for it.
(defun mindless-parenthesized-string-reader (stream open-paren)
;; Cause parenthesized groups to be read as strings:
;; - (a b) -> "a b"
;; - (a (b c) d) -> "a (b c) d"
;; - (a \) b) -> "a ) b"
;; This serves no useful purpose that I can see. Escapes (with #\\)
;; and nested parens are dealt with.
;;
;; Real Programmers would write this with LOOP, but that was too
;; hard for me. This may well not be completely right.
(declare (ignore open-paren))
(labels ((collect-it (escaping depth accum)
(let ((char (read-char stream t nil t)))
(if escaping
(collect-it nil depth (cons char accum))
(case char
((#\\)
(collect-it t depth accum))
((#\()
(collect-it nil (1+ depth) (cons char accum)))
((#\))
(if (zerop depth)
(coerce (nreverse accum) 'string)
(collect-it nil (1- depth) (cons char accum))))
(otherwise
(collect-it nil depth (cons char accum))))))))
(collect-it nil 0 '())))
(defvar *my-readtable* (copy-readtable nil))
(set-macro-character #\( #'mindless-parenthesized-string-reader
nil *my-readtable*)
(defun test-my-rt (&optional (stream *standard-input*))
(let ((*readtable* *my-readtable*))
(read stream)))
And now
> (test-my-rt)
12
12
> (test-my-rt)
x
x
> (test-my-rt)
(a string (with some parens) and \) and the end)
"a string (with some parens) and ) and the end"
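On the question of whether a reader macro can return code that is then evaluated: since a macro function may return any object, it can return a list that happens to be a form. The following is a minimal sketch under that assumption, using a separate readtable as recommended above. The names *bracket-readtable*, bracket-reader and read-with-brackets are hypothetical; the bracket syntax is just an illustration, not the template-literal syntax from the question.

```lisp
;; A private readtable, so the default one is never touched.
(defparameter *bracket-readtable* (copy-readtable nil))

(defun bracket-reader (stream char)
  "Read forms up to the closing ] and return the form (LIST ...)."
  (declare (ignore char))
  (cons 'list (read-delimited-list #\] stream t)))

(set-macro-character #\[ #'bracket-reader nil *bracket-readtable*)
;; Give ] the same behaviour as ) so that it terminates tokens.
(set-macro-character #\]
                     (get-macro-character #\) *bracket-readtable*)
                     nil
                     *bracket-readtable*)

(defun read-with-brackets (string)
  "Read one object from STRING using the bracket readtable."
  (let ((*readtable* *bracket-readtable*))
    (values (read-from-string string))))
```

Here (read-with-brackets "[1 (+ 1 1) 3]") returns the list (LIST 1 (+ 1 1) 3), which is ordinary code: passing it to eval, or reading it as part of a file being compiled, evaluates it like any other form.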

S-expressions and keeping track of source location

Lisp s-expressions are a concise and flexible way to represent code as an abstract syntax tree. Relative to the more specialized data structures used by compilers for other languages, however, they have one drawback: it is difficult to keep track of the file and line number corresponding to any particular point in the code. At least some Lisps end up just punting the problem; in the event of an error, they report source location only as far as function name, not file and line number.
Some dialects of Scheme have solved the problem by representing code not with ordinary cons cells, but with syntax objects, which are isomorphic to cons cells but can also carry additional information such as source location.
Has any implementation of Common Lisp solved this problem? If so, how?
The Common Lisp standard says very little about these things. It mentions for example that the function ed may take a function name and then open the editor with respective source code. But there is no mechanism specified and this feature is entirely provided by the development environment, possibly in combination with the Lisp system.
A typical way to deal with that is to compile a file and the compiler will record the source location of the object defined (a function, a variable, a class, ...). The source location could for example be placed on the property list of the symbol (the name of the thing defined), or recorded in some other place. Also the actual source code as a list structure can be associated with a Lisp symbol. See the function FUNCTION-LAMBDA-EXPRESSION.
Some implementations do more sophisticated source location recording. For example LispWorks can locate a specific part of a function which is currently executed. It also notes when the definition comes from an editor or a Listener. See Dspecs: Tools for Handling Definitions. The debugger then can for example locate where the code of a certain stack frame is located in the source.
SBCL also has a feature to locate source code.
Notice also that the actual 'source code' in Common Lisp is not always a text file, but the read s-expression. eval and compile - two standard functions - don't take strings or filenames as arguments. They use the actual expressions:
CL-USER 26 > (compile 'foo (lambda (x) (1+ x)))
FOO
NIL
NIL
CL-USER 27 > (foo 41)
42
S-expressions as code are not bound to any particular textual formatting. They can be reformatted by the pretty printer function pprint and this may take available width into account to generate a layout.
So noting the structure may be useful, while recording source lines would be less useful.
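As a rough illustration of the property-list approach mentioned above, here is a minimal sketch of a defining macro that stores the source form and the file being loaded on the function's name. The macro name defun-recording-source and the property keys :source-form and :source-file are hypothetical; real implementations record this information through their own internal mechanisms.

```lisp
(defmacro defun-recording-source (name lambda-list &body body)
  "Like DEFUN, but also record the defining form and file on NAME's plist."
  `(progn
     ;; *LOAD-TRUENAME* is NIL when the form is evaluated interactively.
     ;; If the file was compiled first, it names the compiled file.
     (setf (get ',name :source-file) (or *load-truename* :interactive))
     ;; Keep the s-expression itself, not its textual layout.
     (setf (get ',name :source-form)
           '(defun ,name ,lambda-list ,@body))
     (defun ,name ,lambda-list ,@body)))

(defun-recording-source add-one (x)
  (1+ x))
```

Afterwards (get 'add-one :source-form) returns the stored list, and redefining add-one with the same macro simply overwrites the old properties.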
My understanding is that whatever data Scheme stores in the AST is data that can be associated to expressions in a CL environment.
Scheme
(defun my-simple-scheme-reader (stream)
(let ((char (read-char stream)))
(or (position char "0123456789")
(and (member char '(#\newline #\space #\tab)) :space)
(case char
(#\) :closing-paren)
(#\( (loop
with beg = (file-position stream)
for x = (my-simple-scheme-reader stream)
until (eq x :closing-paren)
unless (eq x :space)
collect x into items
finally (return (list :beg beg
:end (file-position stream)
:items items))))))))
For example:
(with-input-from-string (in "(0(1 2 3) 4 5 (6 7))")
(my-simple-scheme-reader in))
returns:
(:BEG 1 :END 20 :ITEMS
(0 (:BEG 3 :END 9 :ITEMS (1 2 3)) 4 5 (:BEG 15 :END 19 :ITEMS (6 7))))
The enriched tree represents syntax objects.
Common-Lisp
(defun make-environment ()
(make-hash-table :test #'eq))
(defun my-simple-lisp-reader (stream environment)
(let ((char (read-char stream)))
(or (position char "0123456789")
(and (member char '(#\newline #\space #\tab)) :space)
(case char
(#\) :closing-paren)
(#\( (loop
with beg = (file-position stream)
for x = (my-simple-lisp-reader stream environment)
until (eq x :closing-paren)
unless (eq x :space)
collect x into items
finally
(setf (gethash items environment)
(list :beg beg :end (file-position stream)))
(return items)))))))
Test:
(let ((env (make-environment)))
(with-input-from-string (in "(0(1 2 3) 4 5 (6 7))")
(values
(my-simple-lisp-reader in env)
env)))
Returns two values:
(0 (1 2 3) 4 5 (6 7))
#<HASH-TABLE :TEST EQ :COUNT 3 {1010524CD3}>
Given a cons cell, you can track back its original position. You can add more precise information if you want to. Once you evaluate a defun, for example, the source information can be attached to the function object, or as a symbol property, which means the information is garbage collected on redefinitions.
Remark
Note that in both cases there is no source file to keep track of, unless the system is able to track back to the original string in the source file where the reader is called.

How to get code point of a character in elisp (and other way too)?

I was very surprised not to be able to find this in the elisp manual or SO. I just want the equivalent of many languages' chr() and ord() or similar: convert between actual characters and their (unicode) code point values.
Emacs Lisp: getting ascii value of character explains that to elisp, a char just is its code-point. But what if I need the representation of that char~int as a series of ASCII decimal digits?
For example, if I wanted to generate in a buffer, a readable table showing the equivalences?
Thanks!
As you've already noted, characters are integers.
(eq ?A 65)
For example, if I wanted to generate in a buffer
Either of the following inserts the character A into the buffer:
(insert ?A)
(insert 65)
If you need to deal with strings, characters can be converted to strings:
(char-to-string ?A)
(char-to-string 65)
(format "%c" 65)
"A"
vs
(number-to-string 65)
(format "%d" 65)
"65"

Lazy reads of custom types in Racket

I'm new to Racket, and I am trying to write a function to read the lines of a file, parse each line into a struct, and return a lazy sequence of my data type. Here is a simple example of my input format (a matrix with row and column names). My actual input format also includes a header line, which I am omitting here, and consists of very large files, which is why I need the laziness.
R1 1.0 2.3 1.2
R2 1.2 3.1 3.4
Here is my latest attempt:
(struct row (key data))
(define (read-matrix in)
(for [(line (in-lines in))]
(let ([fields (string-split line "\t")])
(row (first fields) (list->vector (map string->number (rest fields))))
)))
I have also tried numerous other approaches including using call-with-input-file. My problem with the approach above is that if I use #lang racket it isn't lazy, and with #lang lazy string-split isn't defined. I should add that in my use case, the semantics I want is to close the port when the entire sequence has been consumed, because I can guarantee that either the whole sequence will be consumed, or the program will terminate.
So, am I on the right track? What approach should I take to solve this problem? Thanks!
I was composing this answer off-line, and came back to find you'd mostly answered it already. I'll post anyway in case the details are helpful to anyone.
If you really need #lang lazy, and want to use string-split, I think you can simply (require racket/string) to use it?
I'm not sure I understand exactly what you mean by "lazy", here. Using in-lines will not suck the entire file into memory, if that's your concern. It will process things one line at a time.
One thing you could do is define a helper function, that handles reading and parsing the line, checking for eof, and closing the input port automatically:
(struct row (key data)
#:transparent)
;; Example couple lines of input to use below.
(define text "R1 1.0 2.3 1.2\nR2 1.2 3.1 3.4")
;; read-matrix-row : input-port? -> (or/c eof row?)
;;
;; Given an input port, try to read another row.
(define (read-matrix-row in)
(match (read-line in)
[(? eof-object?)
(close-input-port in)
eof]
[line (match (string-split line " ")
[(cons key data)
(row key (list->vector (map string->number data)))])]))
You could use this function in a number of ways. One way is with in-producer:
;; Example use with in-producer:
(let ([in (open-input-string text)])
(for/list ([x (in-producer read-matrix-row eof in)])
x))
;; => (list (row "R1" '#(1.0 2.3 1.2))
;; (row "R2" '#(1.2 3.1 3.4)))
That example uses for/list to make a list. Of course if you have a giant input file, that will yield a giant list. But you could display them one by one, or write them one by one to a file or database:
;; Example use, displaying one by one.
(let ([in (open-input-string text)])
(for ([x (in-producer read-matrix-row eof in)])
(displayln x))) ;or write to some file, for example
If instead you prefer a stream interface, it's easy to create a stream from any sequence, including in-producer:
;; If you prefer a stream interface, we can use sequence->stream to
;; transform the producer sequence into a stream:
(define (matrix-row-stream in)
(sequence->stream (in-producer read-matrix-row eof in)))
;; Example interactive use of the stream
(define stm (matrix-row-stream (open-input-string text)))
(stream-empty? stm) ;#f
(stream-first stm) ;(row "R1" '#(1.0 2.3 1.2))
(stream-empty? (stream-rest stm)) ;#f
(stream-first (stream-rest stm)) ;(row "R2" '#(1.2 3.1 3.4))
(stream-empty? (stream-rest (stream-rest stm))) ;#t
Try using the functions from SRFI-13, which is a string manipulating library also available in #lang lazy:
(require srfi/13)
And then do this:
[fields (string-tokenize line)]
Ultimately I found that the answer was to use Racket's sequence, streams, and generator libraries for this kind of thing. The generators are especially nice, allowing a simple Python-like "yield" function. These features allow lazy sequences without full-on lazy evaluation as provided by #lang lazy.
http://docs.racket-lang.org/reference/streams.html

lisp code excerpt

I've been reading some Lisp code and came across this section. I didn't quite understand what it specifically does, though the whole function is supposed to count how many times the letters a-z appear in an entered text.
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
Can anyone explain step by step what is happening? I know that it's somehow counting the letters, but I'm not quite sure how.
This Lisp code is slightly unusual, since it uses read-time evaluation. #.expr means that the expression will be evaluated only once, during read-time.
In this case a clever compiler might have guessed that the character code of a given character is known and could have removed the computation of character codes from the DO loop. The author of that code chose to do that by evaluating the expressions before the compiler sees it.
The source looks like this:
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
...)
When Lisp reads in the s-expression, we get this new code as the result (assuming a usual encoding of characters):
(do ((i 97 (1+ i)))
((> i 122))
...)
So that's a loop which counts the variable i up from 97 to 122.
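For context, here is a minimal sketch of what a complete letter-counting function built around that loop might look like. The original function is not shown in the question, so count-letters and its return format (an association list of letters and counts) are assumptions made for illustration. It also assumes that the codes for a through z are contiguous, as they are in ASCII and Unicode.

```lisp
(defun count-letters (text)
  "Return an alist of (letter . count) for each of a-z occurring in TEXT.
Upper-case letters are counted together with their lower-case forms."
  (let ((text (string-downcase text))
        (result '()))
    ;; The #. forms are replaced by integers when this code is read.
    (do ((i #.(char-code #\a) (1+ i)))
        ((> i #.(char-code #\z)) (nreverse result))
      (let ((n (count (code-char i) text)))
        (when (plusp n)
          (push (cons (code-char i) n) result))))))
```

For example, (count-letters "Hello") returns ((#\e . 1) (#\h . 1) (#\l . 2) (#\o . 1)).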
Lisp code is written as S-expressions. In typical S-expression syntax, the first element of an S-expression is treated as the operator and the rest as operands. Operands can be either atoms or other S-expressions. Note that an atom is a single data object. Keeping this in mind:
char-code
(char-code #\a) - returns the character code of a character, here 'a' (97 in ASCII).
The do syntax looks similar to the below
(do ((var1 init1 step1)
(var2 init2 step2)
...)
(end-test result)
statement1
...)
So in your example
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
)
The first s-expression operand of do holds the loop variables with their initial values and step forms; the second s-expression operand is the end test.
So this means you are simply iterating over 'a' through 'z', incrementing i by 1.
In C++ (I'm not sure how comfortable you are with other languages) you can write
for(i='a';i<='z';i++);
The trick in the code you show is poor form. I know this because I do it all the time. The code assumes that the compiler will know the current fixnum for each character. #.(char-code #\a) is eq (or maybe eql, if you are so inclined) to a small unsigned integer: char-code returns a non-negative fixnum.
The #. is a reader macro (I'm fairly sure you know this :). Using two reader macros is not a great idea, but it is fast when the compiler knows the datatype.
I have another example. Need to search for ascii in a binary stream:
(defmacro code-char= (byte1 byte2)
  ;; Literal characters are converted to their codes at macroexpansion time.
  (flet ((maybe-char-code (x) (if (characterp x) (char-code x) x)))
    `(the boolean (= (the fixnum ,(maybe-char-code byte1))
                     (the fixnum ,(maybe-char-code byte2))))))
Declaring the return type in SBCL will probably insult the compiler, but I leave it as a sanity check (for me, not you).
(code-char= #\$ #x24)
=>
T
At least I think so. But somehow I think you might know your way around some macros ... Hmmmm... I should turn on the machine...
If you're seriously interested, there is some assembler for the 286 (8/16 bit dos assembler) that you can use a jump table. It works fast for the PC , I'd have to look it up...