Differences between equal strings - racket

I am trying to write a simple parser in Racket, using the parser-tools library. I ran into behaviour that I cannot explain (I am a Racket newbie, so perhaps it is trivial).
Consider the following code:
#lang racket
(require parser-tools/yacc
parser-tools/lex
(prefix-in : parser-tools/lex-sre))
(define-tokens value-tokens ;; tokens which have a value
  (STRING-VALUE))
(define-empty-tokens op-tokens ;; tokens without a value
  (EOF))
(define-lex-abbrevs ;;abbreviation
[STRING (:+ (:or (:/ "a" "z") (:/ "A" "Z") (:/ "0" "9") "." "_" "-"))]
)
(define lex-token
(lexer
[(eof) 'EOF]
;; Recursively call the lexer on the remaining input after a tab, space or newline,
;; returning the result of that operation. This effectively skips all whitespace.
[(:or #\tab #\space #\newline)
(lex-token input-port)]
[(:seq STRING) (token-STRING-VALUE lexeme)]
))
(define test-parser
(parser
(start query)
(end EOF)
(tokens value-tokens op-tokens)
(error (λ(ok? name value) (printf "Couldn't parse: ~a\n" name)))
(grammar
(query [(STRING-VALUE) $1])
)))
(define s (open-input-string
"abcd123"))
(define res
(test-parser (lambda () (lex-token s))))
(define str "abcd123")
After those definitions, res is a string:
> (string? res)
#t
and so is str.
If I try to execute a comparison with the "abcd123" string I get two different results:
> (eq? res "abcd123")
#f
> (eq? str "abcd123")
#t
Why is that? What am I missing here?

You should compare strings with equal?.

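As in many programming languages, there is a difference between being the same object and being two objects that look the same. eq? tests the former; equal? tests the latter. Here res is a string freshly allocated by the lexer, so it is not the same object as the literal "abcd123", even though the contents match. That (eq? str "abcd123") returns #t is most likely because Racket interns string literals when it reads the source, and it is not something to rely on. You should probably have a look at the question about the difference between eq?, eqv?, equal? and =.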
Racket has string=? which compares strings specifically and might be faster than the less specific equal?.
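The identity-versus-equality distinction is not specific to Racket. As a rough cross-language sketch (Python rather than Racket, chosen only for illustration; the variable names are made up), `is` plays roughly the role of eq? and `==` the role of equal?:

```python
# Identity vs. equality, sketched in Python (an analogy, not Racket semantics).
# `built` is assembled at run time, loosely like the lexeme returned by the lexer.
literal = "abcd123"
built = "".join(["abcd", "123"])

same_contents = (built == literal)  # compares characters, like equal? / string=?
same_object = (built is literal)    # compares identity, like eq?

print(same_contents)  # True
print(same_object)    # usually False in CPython; identity of equal strings is not guaranteed
```

The point carries over: compare contents when you care about contents, and treat identity of two equal strings as an implementation detail.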

Related

Function returns list but prints out NIL in LISP

I'm reading a file character by character and constructing a list of words, where each word is a list of its letters. That part works, but when I test it, it prints NIL. Outside the test function, printing the list works fine. What is the problem here? Does the LET keyword have some other meaning?
This is my read function:
(defun read-and-parse (filename)
(with-open-file (s filename)
(let (words)
(let (letter)
(loop for c = (read-char s nil)
while c
do(when (char/= c #\Space)
(if (char/= c #\Newline) (push c letter)))
do(when (or (char= c #\Space) (char= c #\Newline) )
(push (reverse letter) words)
(setf letter '())))
(reverse words)
))))
This is test function:
(defun test_on_test_data ()
(let (doc (read-and-parse "document2.txt"))
(print doc)
))
This is input text:
hello
this is a test
You're not using let properly. The syntax is:
(let ((var1 val1)
(var2 val2)
...)
body)
If the initial value of the variable is NIL, you can abbreviate (varN nil) as just varN.
You wrote:
(let (doc
(read-and-parse "document2.txt"))
(print doc))
Based on the above, this is using the abbreviation, and it's equivalent to:
(let ((doc nil)
(read-and-parse "document2.txt"))
(print doc))
Now you can see that this binds doc to NIL, and binds the variable read-and-parse to "document2.txt". It never calls the function. The correct syntax is:
(let ((doc (read-and-parse "document2.txt")))
(print doc))
Barmar's answer is the right one. For interest, here is a version of read-and-parse which makes possibly-more-idiomatic use of loop, and also abstracts out the 'is the character white' decision since this is something which is really not usefully possible in portable CL as the standard character repertoire is absurdly poor (there's no tab for instance!). I'm sure there is some library available via Quicklisp which deals with this better than the below.
I think this is fairly readable: there's an outer loop which collects words, and an inner loop which collects characters into a word, skipping over whitespace until it finds the next word. Both use loop's collect feature to collect lists forwards. On the other hand, I feel kind of bad every time I use loop (I know there are alternatives).
By default this collects the words as lists of characters: if you tell it to it will collect them as strings.
(defun char-white-p (c)
  ;; Is a character white? The fallback for this is horrid, since
  ;; tab &c are not standard characters. There must be a portability
  ;; library with a function which does this.
  #+LispWorks (lw:whitespace-char-p c)
  #+CCL (ccl:whitespacep c)             ;?
  #-(or LispWorks CCL)
  (member c (load-time-value
             (mapcan (lambda (n)
                       (let ((c (name-char n)))
                         (and c (list c))))
                     '("Space" "Newline" "Page" "Tab" "Return" "Linefeed"
                       ;; and I am not sure about the following, but, well
                       "Backspace" "Rubout")))))
(defun read-and-parse (filename &key (as-strings nil))
"Parse a file into a list of words, splitting on whitespace.
By default the words are returned as lists of characters. If
AS-STRINGS is T then they are coerced to strings"
(with-open-file (s filename)
(loop for maybe-word = (loop with collecting = nil
for c = (read-char s nil)
;; carry on until we hit EOF, or we
;; hit whitespace while collecting a
;; word
until (or (not c) ;EOF
(and collecting (char-white-p c)))
;; if we're not collecting and we see
;; a non-white character, then we're
;; now collecting
when (and (not collecting) (not (char-white-p c)))
do (setf collecting t)
when collecting
collect c)
while (not (null maybe-word))
collect (if as-strings
(coerce maybe-word 'string)
maybe-word))))
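For readers who find loop hard to follow, the same splitting idea can be sketched as a single pass over the input. This is a Python illustration rather than Common Lisp, and it makes some assumptions: the hypothetical `read_words` takes the text directly instead of a filename, and `is_white` stands in for char-white-p.

```python
def is_white(ch):
    # Stand-in for char-white-p; str.isspace covers space, tab, newline, etc.
    return ch.isspace()


def read_words(text, as_strings=False):
    """Split text into words on whitespace; words are lists of characters by default."""
    words = []
    current = []  # characters of the word being collected
    for ch in text:
        if is_white(ch):
            if current:  # whitespace ends the current word, if any
                words.append(current)
                current = []
        else:
            current.append(ch)
    if current:  # flush the last word at end of input
        words.append(current)
    return ["".join(w) for w in words] if as_strings else words


print(read_words("hello\nthis is a test", as_strings=True))
# ['hello', 'this', 'is', 'a', 'test']
```

Runs of whitespace produce no empty words, which matches the behaviour of the loop version above.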

Lisp: remove the contents of one list from another list

I have a list of strings called F, like this:
("hello word i'am walid" "goodbye madame") => this list contains two strings
and another list called S, like this: ("word" "madame") => this contains two words.
Now I want to remove the elements of S from each string in F, so the output should be ("hello i'am walid" "goodbye").
I already found this function:
(defun remove-string (rem-string full-string &key from-end (test #'eql)
test-not (start1 0) end1 (start2 0) end2 key)
"returns full-string with rem-string removed"
(let ((subst-point (search rem-string full-string
:from-end from-end
:test test :test-not test-not
:start1 start1 :end1 end1
:start2 start2 :end2 end2 :key key)))
(if subst-point
(concatenate 'string
(subseq full-string 0 subst-point)
(subseq full-string (+ subst-point (length rem-string))))
full-string)))
Example:
(remove-string "walid" "hello i'am walid") => the output is "hello i'am"
But there is a problem. Example:
(remove-string "wa" "hello i'am walid") => the output is "hello i'am lid"
The output should be "hello i'am walid". In other words, I want to remove only the exact word from the string.
Please help me, and thanks.
You can use the cl-ppcre library for regular expressions. Its regex flavour understands the word boundary \b.
The replacement could work like this:
(cl-ppcre:regex-replace-all "\\bwa\\b" "ba wa walid" "")
=> "ba  walid"
I guess that you want to collapse any whitespace around the removed word into one:
(cl-ppcre:regex-replace-all "\\s*\\bwa\\b\\s*" "ba wa walid" " ")
=> "ba walid"
See the documentation linked above.
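To make the word-boundary approach concrete for the lists in the question, here is a small sketch using Python's `re` module instead of cl-ppcre (the `\b` and `\s` syntax is the same in both). The helper name `remove_words` is made up for this example, and it assumes the words to remove start and end with alphanumeric characters, since `\b` only matches next to a word character.

```python
import re


def remove_words(text, words):
    """Remove each whole word in `words` from `text`, collapsing surrounding whitespace."""
    for word in words:
        # \b on both sides means "wa" will not match inside "walid"
        pattern = r"\s*\b" + re.escape(word) + r"\b\s*"
        text = re.sub(pattern, " ", text)
    return text.strip()  # drop the space left behind at either end


F = ["hello word i'am walid", "goodbye madame"]
S = ["word", "madame"]
print([remove_words(s, S) for s in F])
# ["hello i'am walid", 'goodbye']
```

Replacing with a single space and stripping at the end avoids both doubled spaces in the middle and stray spaces at the edges.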
UPDATE: You extended the question to punctuation. That's actually a tad more complicated, since you now have three kinds of characters: alphanumeric, punctuation, and whitespace.
I can't give a complete solution here, but the outline I envision is to create boundary definitions between all three of these kinds. You need positive/negative lookaheads/lookbehinds for that. Then you look at the replaced string, whether it starts or ends with punctuation and append or prepend the corresponding boundary to the effective expression.
For defining the boundaries in a readable manner, the parse tree syntax of cl-ppcre might prove useful.
The Common Lisp Cookbook provides this function:
(defun replace-all (string part replacement &key (test #'char=))
"Returns a new string in which all the occurences of the part
is replaced with replacement."
(with-output-to-string (out)
(loop with part-length = (length part)
for old-pos = 0 then (+ pos part-length)
for pos = (search part string
:start2 old-pos
:test test)
do (write-string string out
:start old-pos
:end (or pos (length string)))
when pos do (write-string replacement out)
while pos)))
Using that function:
(loop for raw-string in '("hello word i'am walid" "goodbye madame")
collect (reduce (lambda (source-string bad-word)
(replace-all source-string bad-word ""))
'("word" "madame")
:initial-value raw-string))

Is it possible to turn off qualification of symbols when using clojure syntax quote in a macro?

I am generating emacs elisp code from a clojure function. I originally started off using a defmacro, but I realized since I'm going cross-platform and have to manually eval the code into the elisp environment anyway, I can just as easily use a standard clojure function. But basically what I'm doing is very macro-ish.
I am doing this because my goal is to create a DSL from which I will generate code in elisp, clojure/java, clojurescript/javascript, and maybe even haskell.
My "macro" looks like the following:
(defn vt-fun-3 []
(let [hlq "vt"]
(let [
f0 'list
f1 '(quote (defun vt-inc (n) (+ n 1)))
f2 '(quote (ert-deftest vt-inc-test () (should (= (vt-inc 7) 8))))]
`(~f0 ~f1 ~f2)
)))
This generates a list of two function definitions -- the generated elisp defun and a unit test:
(list (quote (defun vt-inc (n) (+ n 1))) (quote (ert-deftest vt-inc-test () (should (= (vt-inc 7) 8)))))
Then from an emacs scratch buffer, I utilize clomacs https://github.com/clojure-emacs/clomacs to import into the elisp environment:
(clomacs-defun vt-fun-3 casc-gen.core/vt-fun-3)
(progn
(eval (nth 0 (eval (read (vt-fun-3)))))
(eval (nth 1 (eval (read (vt-fun-3))))))
From here I can then run the function and the unit test:
(vt-inc 4)
--> 5
(ert "vt-inc-test")
--> t
Note: like all macros, the syntax quoting and escaping is very fragile. It took me a while to figure out the proper way to get it to eval properly in elisp (the whole "(quote (list..)" prefix thing).
Anyway, as suggested by the presence of the "hlq" (high-level qualifier) in the first "let", I want to prefix any generated symbols with this hlq instead of hard-coding it.
Unfortunately, when I use standard quotes and escapes on the "f1" for instance:
f1 '(quote (defun ~hlq -inc (n) (+ n 1)))
This generates:
(list (quote (defun (clojure.core/unquote hlq) -inc (n) (+ n 1)))
(quote (ert-deftest vt-inc-test () (should (= (vt-inc 7) 8)))))
In other words it substitutes 'clojure.core/unquote' for "~" which is not what I want.
The clojure syntax back-quote:
f1 `(quote (defun ~hlq -inc (n) (+ n 1)))
doesn't have this problem:
(list (quote (casc-gen.core/defun vt casc-gen.core/-inc (casc-gen.core/n) (clojure.core/+ casc-gen.core/n 1))) (quote (ert-deftest vt-inc-test () (should (= (vt-inc 7) 8)))))
It properly escapes and inserts "vt" as I want (I still have to work out how to concat it to the stem of the name, but I'm not worried about that).
Problem solved, right? Unfortunately syntax quote fully qualifies all the symbols, which I don't want since the code will be running under elisp.
Is there a way to turn off the qualifying of symbols when using the syntax quote (back tick)?
It also seems to me that the syntax quote is more "capable" than the standard quote. Is this true? Or can you, by trickery, always make the standard quote behave the same as the syntax quote? If you cannot turn off qualification with syntax quote, how could I get this working with the standard quote? Would I gain anything by trying to do this as a defmacro instead?
The worst case scenario is I have to run a regex on the generated elisp and manually remove any qualifications.
There is no way to "turn off" the qualifying of symbols when using syntax quote. You can do this however:
(let [hlq 'vt] `(~'quote (~'defun ~hlq ~'-inc (~'n) (~'+ ~'n 1))))
Which is admittedly pretty tedious. The equivalent without syntax quote is:
(let [hlq 'vt] (list 'quote (list 'defun hlq '-inc '(n) '(+ n 1))))
There is no way to get your desired output when using standard quote prefixing the entire form however.
As to the issue of using defmacro instead, as far as I understand your intentions, I don't think you would gain anything by using a macro.
Based on the input from justncon, here is my final solution. I had to do a little extra formatting to get the string concat on the function name right, but everything was pretty much like he recommended:
(defn vt-gen-4 []
(let [hlq 'vt]
(let [
f1 `(~'quote (~'defun ~(symbol (str hlq "-inc")) (~'n) (~'+ ~'n 1)))
f2 `(~'quote (~'defun ~(symbol (str hlq "-inc-test")) () (~'should (~'= (~(symbol (str hlq "-inc")) 7) 8))))
]
`(~'list ~f1 ~f2))))
What I learned:
1) Syntax quote is the way to go; you just have to know how to control unquoting at the level of individual elements.
2) ~' (tilde quote) is my friend here. Within a syntax-quote expression, if you put ~' before a symbol (whether it names a function or a var), it is passed through to the output exactly as written, without namespace qualification.
Take the expression (+ 1 1)
Here is a synopsis of how this expression will expand within a syntax quote expression based on various levels of escaping:
(defn vt-foo []
  (println "(+ 1 1) -> " `(+ 1 1))      ;; --> (clojure.core/+ 1 1)
  (println "~(+ 1 1) -> " `~(+ 1 1))    ;; --> 2
  (println "~'(+ 1 1) -> " `~'(+ 1 1))) ;; --> (+ 1 1)
The last line was what I wanted. The first line was what I was getting.
3) If you unquote a function call, do not also unquote its arguments. For instance, here we want to call the "str" function at macro-expansion time and to expand the variable "hlq" to its value 'vt:
;; this works
f1 `(quote (defun ~(str hlq "-inc") ~hlq (n) (+ n 1)))
;; doesn't work if you escape the hlq:
f1 `(quote (defun ~(str ~hlq "-inc") ~hlq (n) (+ n 1)))
I guess an unquote extends over the whole form you're unquoting. Typically you unquote atoms (like strings or symbols), but if it's a list then everything in the list is already evaluated as ordinary code, so don't unquote twice.
4) FWIW, I ended up writing a regex solution before I got the final answer. It's definitely not as nice:
(defn vt-gen-3 []
(let [hlq "vt"]
(let
[
f0 'list
f1 `(quote (defun ~(symbol (str hlq "-inc")) (n) (+ n 1)))
f2 '(quote (ert-deftest vt-inc-test () (should (= (vt-inc 7) 8))))
]
`(~f0 ~f1 ~f2)
))
)
;; this strips out any qualifiers like "casc-gen.core/"
(defn vt-gen-3-regex []
(clojure.string/replace (str (vt-gen-3)) #"([\( ])([a-zA-Z0-9-\.]+\/)" "$1" ))
Macro expansion is very delicate and requires lots of practice.
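For comparison, the regex fallback from point 4 can be tried outside Clojure. The sketch below is Python, not the author's code; `strip_qualifiers` is a hypothetical name, and the pattern is a close transliteration of the one in vt-gen-3-regex, run against output like that shown in the question.

```python
import re


def strip_qualifiers(code):
    """Drop a namespace prefix such as casc-gen.core/ that follows "(" or a space."""
    # Group 1 keeps the "(" or space; the qualifier up to and including "/" is removed.
    return re.sub(r"([( ])[A-Za-z0-9.\-]+/", r"\1", code)


qualified = (
    "(quote (casc-gen.core/defun vt casc-gen.core/-inc "
    "(casc-gen.core/n) (clojure.core/+ casc-gen.core/n 1)))"
)
print(strip_qualifiers(qualified))
# (quote (defun vt -inc (n) (+ n 1)))
```

A text-level rewrite like this cannot tell a qualifier from any other token containing a slash (inside a string literal, say), which is one more reason the ~' approach is the nicer one.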

Convert char to number

I'm in the process of reading a flat file - to use the characters read I want to convert them into numbers. I wrote a little function that converts a string to a vector:
(defun string-to-vec (strng)
(setf strng (remove #\Space strng))
(let ((vec (make-array (length strng))))
(dotimes (i (length strng) vec)
(setf (svref vec i) (char strng i)))))
However this returns a vector with character entries. Short of using char-code to convert unit number chars to numbers in a function, is there a simple way to read numbers as numbers from a file?
In addition to Rainer's answer, let me mention read-from-string (note that Rainer's code is more efficient than repeated application of read-from-string because it only creates a stream once) and parse-integer (alas, there is no parse-float).
Note that if you are reading a CSV file, you should probably use an off-the-shelf library instead of writing your own.
The function above can be written more briefly:
? (map 'vector #'identity (remove #\Space "123"))
#(#\1 #\2 #\3)
You can convert a string:
(defun string-to-vector-of-numbers (string)
(coerce
(with-input-from-string (s string)
(loop with end = '#:end
for n = (read s nil end)
until (eql n end)
unless (numberp n) do (error "Input ~a is not a number." n)
collect n))
'vector))
But it would be easier to read the numbers directly from the file. Use READ, which can read numbers.
Note that read-like functions are affected by reader macros.
For example:
* (defvar *foo* 'bar)
*FOO*
* (read-from-string "#.(setq *foo* 'baz)")
BAZ
19
* *foo*
BAZ
As you can see read-from-string can implicitly set a variable. You can disable the #. reader macro by setting *read-eval* to nil but anyway if you have only integers on the input then consider using parse-integer instead.
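The general advice, to parse digits explicitly rather than hand untrusted input to a full reader, can be sketched as follows. This is a Python illustration, not Common Lisp; `digits_to_numbers` is a made-up name, and it assumes the input consists of single decimal digits separated by optional spaces, as in the question.

```python
def digits_to_numbers(text):
    """Convert each non-space character of `text` to an int, rejecting non-digits."""
    numbers = []
    for ch in text:
        if ch == " ":
            continue  # mirrors (remove #\Space strng) in the question
        if ch not in "0123456789":
            raise ValueError(f"Input {ch!r} is not a digit.")
        numbers.append(int(ch))  # plain parsing; nothing is evaluated
    return numbers


print(digits_to_numbers("1 2 3"))  # [1, 2, 3]
```

Because only digits are accepted, there is no equivalent of the #. surprise shown above: anything unexpected raises an error instead of being executed.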

str_replace in Common Lisp?

Is there some function similar to PHP's str_replace in Common Lisp?
http://php.net/manual/en/function.str-replace.php
There is a library called cl-ppcre:
(cl-ppcre:regex-replace-all "qwer" "something to qwer" "replace")
; "something to replace"
Install it via quicklisp.
I think there is no such function in the standard. If you do not want to use a regular expression (cl-ppcre), you could use this:
(defun string-replace (search replace string &optional count)
(loop for start = (search search (or result string)
:start2 (if start (1+ start) 0))
while (and start
(or (null count) (> count 0)))
for result = (concatenate 'string
(subseq (or result string) 0 start)
replace
(subseq (or result string)
(+ start (length search))))
do (when count (decf count))
finally (return-from string-replace (or result string))))
EDIT: Shin Aoyama pointed out that this does not work for replacing, e.g., "\"" with "\\\"" in "str\"ing". Since I now regard the above as rather cumbersome I should propose the implementation given in the Common Lisp Cookbook, which is much better:
(defun replace-all (string part replacement &key (test #'char=))
"Returns a new string in which all the occurences of the part
is replaced with replacement."
(with-output-to-string (out)
(loop with part-length = (length part)
for old-pos = 0 then (+ pos part-length)
for pos = (search part string
:start2 old-pos
:test test)
do (write-string string out
:start old-pos
:end (or pos (length string)))
when pos do (write-string replacement out)
while pos)))
I especially like the use of with-output-to-string, which generally performs better than concatenate.
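The scan-and-copy structure of replace-all may be easier to see outside of loop. Below is a Python sketch of the same algorithm, with `io.StringIO` standing in for with-output-to-string and `str.find` for search. The guard against an empty `part` is an addition of this sketch, not something the Cookbook version has.

```python
import io


def replace_all(string, part, replacement):
    """Return a new string with every occurrence of `part` replaced."""
    if not part:
        raise ValueError("part must be non-empty")  # added guard: avoids looping forever
    out = io.StringIO()  # output buffer, like with-output-to-string
    old_pos = 0
    while True:
        pos = string.find(part, old_pos)  # like (search part string :start2 old-pos)
        end = pos if pos != -1 else len(string)
        out.write(string[old_pos:end])  # copy the unmatched stretch
        if pos == -1:
            break
        out.write(replacement)
        old_pos = pos + len(part)  # resume scanning after the match
    return out.getvalue()


print(replace_all("something to qwer", "qwer", "replace"))  # something to replace
print(replace_all('str"ing', '"', '\\"'))  # str\"ing
```

Because scanning resumes after each match in the original string, the replacement text is never rescanned, which is why the quote-escaping case that broke string-replace works here.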
If the replacement is only one character, which is often the case, you can use substitute:
(substitute #\+ #\Space "a simple example") => "a+simple+example"