Newbie transforming CSV files in Clojure

Newbie transforming CSV files in Clojure - perl

I'm both new and old to programming -- mostly I just write a lot of small Perl scripts at work. Clojure came out just when I wanted to learn Lisp, so I'm trying to learn Clojure without knowing Java either. It's tough, but it's been fun so far.
I've seen several examples of similar problems to mine, but nothing that quite maps to my problem space. Is there a canonical way to extract lists of values for each line of a CSV file in Clojure?
Here's some actual working Perl code; comments included for non-Perlers:
# convert_survey_to_cartography.pl
open INFILE, "< coords.csv"; # Input format "Northing,Easting,Elevation,PointID"
open OUTFILE, "> coords.txt"; # Output format "PointID X Y Z".
while (<INFILE>) { # Read line by line; line bound to $_ as a string.
chomp $_; # Strips out each line's <CR><LF> chars.
#fields = split /,/, $_; # Extract the line's field values into a list.
$y = $fields[0]; # y = Northing
$x = $fields[1]; # x = Easting
$z = $fields[2]; # z = Elevation
$p = $fields[3]; # p = PointID
print OUTFILE "$p $x $y $z\n" # New file, changed field order, different delimiter.
}
I've puzzled out a little bit in Clojure and tried to cobble it together in an imperative style:
; convert-survey-to-cartography.clj
(use 'clojure.contrib.duck-streams)
(let
[infile "coords.csv" outfile "coords.txt"]
(with-open [rdr (reader infile)]
(def coord (line-seq rdr))
( ...then a miracle occurs... )
(write-lines outfile ":x :y :z :p")))
I don't expect the last line to actually work, but it gets the point across. I'm looking for something along the lines of:
(def values (interleave (:p :y :x :z) (re-split #"," coord)))
Thanks, Bill

Please don't use nested def's. It doesn't do, what you think it does. def is always global! For locals use let instead. While the library functions are nice to know, here a version orchestrating some features of functional programming in general and clojure in particular.
(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)
(defn translate-coords
Docstrings can be queried in the REPL via (doc translate-coords). Works eg. for all core functions. So supplying one is a good idea.
"Reads coordinates from infile, translates them with the given
translator and writes the result to outfile."
translator is a (maybe anonymous) function which extracts the translation from the surrounding boilerplate. So we can reuse this functions with different transformation rules. The type hints here avoid reflection for the constructors.
[translator #^String infile #^String outfile]
Open the files. with-open will take care, that the files are closed when its body is left. Be it via normal "drop off the bottom" or be it via a thrown Exception.
(with-open [in (BufferedReader. (FileReader. infile))
out (FileWriter. outfile)]
We bind the *out* stream temporarily to the output file. So any print inside the binding will print to the file.
(binding [*out* out]
The map means: take the seq and apply the given function to every element and return the seq of the results. The #() is a short-hand notation for an anonymous function. It takes one argument, which is filled in at the %. The doseq is basically a loop over the input. Since we do that for the side effects (namely printing to a file), doseq is the right construct. Rule of thumb: map: lazy => for result, doseq: eager => for side effects.
(doseq [coords (map #(.split % ",") (line-seq in))]
println takes care for the \n at the end of the line. interpose takes the seq and adds the first argument (in our case " ") between its elements. (apply str [1 2 3]) is equivalent to (str 1 2 3) and is useful to construct function calls dynamically. The ->> is a relatively new macro in clojure, which helps a bit with readability. It means "take the first argument and add it as last item to the function call". The given ->> is equivalent to: (println (apply str (interpose " " (translator coords)))). (Edit: Another note: since the separator is \space, we could here write just as well (apply println (translator coords)), but the interpose version allows to also parametrize the separator as we did with the translator function, while the short version would hardwire \space.)
(->> (translator coords)
(interpose " ")
(apply str)
println)))))
(defn survey->cartography-format
"Translate coords in survey format to cartography format."
Here we use destructuring (note the double [[]]). It means the argument to the function is something which can be turned into a seq, eg. a vector or a list. Bind the first element to y, the second to x and so on.
[[y x z p]]
[p x y z])
(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")
Here again less choppy:
(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)
(defn translate-coords
"Reads coordinates from infile, translates them with the given
translator and writes the result to outfile."
[translator #^String infile #^String outfile]
(with-open [in (BufferedReader. (FileReader. infile))
out (FileWriter. outfile)]
(binding [*out* out]
(doseq [coords (map #(.split % ",") (line-seq in))]
(->> (translator coords)
(interpose " ")
(apply str)
println)))))
(defn survey->cartography-format
"Translate coords in survey format to cartography format."
[[y x z p]]
[p x y z])
(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")
Hope this helps.
Edit: For CSV reading you probably want something like OpenCSV.

Here's one way:
(use '(clojure.contrib duck-streams str-utils)) ;;'
(with-out-writer "coords.txt"
(doseq [line (read-lines "coords.csv")]
(let [[x y z p] (re-split #"," line)]
(println (str-join \space [p x y z])))))
with-out-writer binds *out* such that everything you print will go to the filename or stream you specify, rather than standard-output.
Using def as you're using it isn't idiomatic. A better way is to use let. I'm using destructuring to assign the 4 fields of each line to 4 let-bound names; then you can do what you want with those.
If you're iterating over something for the purpose of side-effects (e.g. I/O) you should usually go for doseq. If you wanted to collect up each line into a hash-map and do something with them later, you could use for:
(with-out-writer "coords.txt"
(for [line (read-lines "coords.csv")]
(let [fields (re-split #"," line)]
(zipmap [:x :y :z :p] fields))))

Related

Checking user input in Racket

I am getting an input from a user for a tex-field% in racket which would look something like this:
open button a = fwd; button b = xxx; button s = xxx; close
I have verified that it does contain open and close at beginning and end respectively. But now i need to store each of the substrings based on the semicolons to check them for semicolons at the end, among other things. For example, in the example above it should store 3 substrings in a vector/list (whichever is easier). It would be stored as:
button a = fwd;
button b = xxx;
button s = xxx;
;input is the name of the string the user enters
(define vec (apply vector (string-split input)))
(define vecaslist(vector->list vec))
(define removedopen (cdr vecaslist))
(define withoutopenandclose (reverse(cdr(reverse removedopen))))
(define stringwithoutopen (string-replace input "open " ""))
(define stringtoderivate (string-replace stringwithoutopen " close" ""))
(define tempvec (apply vector (string-split stringtoderivate ";" #:trim? #f #:repeat? #t)))
Attempted to split it by semicolons and place in a vector, but it removes the semicolons. When i do print the length of the vector it correctly shows 3 though, but i would like to keep the semicolons for now.

You can use string-split with a regular expression separator, as follows:
(string-split input #rx"(open | close)|(?<=;).")
which will output the list:
'("button a = fwd;" "button b = xxx;" "button s = xxx;")
To break down the regular expression:
(exp) matches any sub-expression "exp". Hence, (open ) matches the sub-expression "open " in input. Similarly with ( close), matching " close".
(?<=exp) does a positive look-behind, matching if "exp" matches preceding.
. matches anything, such as whitespace, characters etc.
| matches either the expression that comes before it, or after it, trying left first.

When do variables output properly in skeletons functions?

I'm trying to write a skeleton-function to output expressions in a loop. Out of a loop I can do,
(define-skeleton test
""
> "a")
When I evaluate this function it outputs "a" into the working buffer as desired. However, I'm having issues when inserting this into a loop. I now have,
(define-skeleton test
"A test skeleton"
(let ((i 1))
(while (< i 5)
>"a"
(setq i (1+ i)))))
I would expect this to output "aaaaa". However, instead nothing is outputted into the working buffer in this case. What is happening when I insert the loop?

The > somestring skeleton dsl does not work inside lisp forms.
You can however concatenate the string inside a loop:
(define-skeleton barbaz
""
""
(let ((s ""))
(dotimes (i 5)
(setq s (concat s "a")))
s)
)
My understanding is that code such as
> "a"
only works at the first nesting level inside a skeleton.
[EDIT] Regarding your question
What is happening when I insert the loop?
The return value of the let form (that is, the return value of the while form)is inserted. I do not know why it does not raise an error when evaluating > "a", but the return value of a while form is nil, so nothing is inserted.

I do agree that there's not much point using define-skeleton if you're going to need an (insert function within the skeleton.
This is also a rather trivial example to be using define-skeleton.
That said, they are often easier to read than a defun and useful when you want to create a function that inserts text (and optionally, takes input).
For example you may wish to have a different character repeated a set no. of times... Below, str refers to the argument supplied with the function (usually a string) and v1, v2 are the default names for local variables in a skeleton. Thus:
(define-skeleton s2 ""
nil ; don't prompt for value of 'str'
'(set 'v1 (make-string 5 (string-to-char str)))
\n v1 \n \n)
Below, calling the function leads to a newline, the string, then leaves the cursor at the position indicated by the square brackets [].
(s2 "a")
aaaaa
[]

no dot between functions in map

I have the following code:
object testLines extends App {
val items = Array("""a-b-c d-e-f""","""a-b-c th-i-t""")
val lines = items.map(_.replaceAll("-", "")split("\t"))
print(lines.map(_.mkString(",")).mkString("\n"))
}
By mistake i did not put a dot between replaceAll and split but it worked.
By contrary when putting a dot between replaceAll and split i got an error
identifier expected but ';' found.
Implicit conversions found: items =>
What is going on?
Why does it work without a dot but is not working with a dot.
Update:
It works also with dot. The error message is a bug in the scala ide. The first part of the question is still valid
Thanks,
David

You have just discovered that Operators are methods. x.split(y) can also be written x split y in cases where the method is operator-like and it looks nicer. However there is nothing stopping you putting either side in parentheses like x split (y), (x) split y, or even (x) split (y) which may be necessary (and is a good idea for readability even if not strictly necessary) if you are passing in a more complex expression than a simple variable or constant and need parentheses to override the precedence.
With the example code you've written, it's not a bad idea to do the whole thing in operator style for clarity, using parentheses only where the syntax requires and/or they make groupings more obvious. I'd probably have written it more like this:
object testLines extends App {
val items = Array("a-b-c d-e-f", "a-b-c th-i-t")
val lines = items map (_ replaceAll ("-", "") split "\t")
print(lines map (_ mkString ",") mkString "\n")
}

Having a table as result of an org-babel code block in Perl

I have a very simple example to illustrate the problem. Consider the following code block in Perl, in an org-mode file:
#+begin_src perl :results table
return qw(1 2 3);
#+end_src
It produces the following result:
#+results:
| 1\n2\n3\n |
which is not totally satisfactory since I was expecting a full org-table.
For instance, in Python the following code:
#+begin_src python :results table
return (1, 2, 3)
#+end_src
produces this result:
#+results:
| 1 | 2 | 3 |
So that's apparently working in Python but not in Perl. Am I doing something wrong? Is this a known bug?

Since I felt a little masochistic this morning I decided to take a shot at hacking a little lisp again. I cooked up a small fix which works for your example but I can't promise it will work more complex ones. So here it comes:
org-babel defines a wrapper for each language. The perl one did not produce something babel detects as a list so I modified it. In order to not make everything formated as a table I had to check if the result was printable as a table:
(setq org-babel-perl-wrapper-method
"
sub main {
%s
}
#r = main;
open(o, \">%s\");
if ($#r > 0) {
print o \"(\",join(\", \",#r), \")\",\"\\n\"
} else {
print o join(\"\\n\", #r), \"\\n\"
}")
You can modify this further to fit your needs if you want to.
The next thing is that the perl-evaluate method in babel does not run the output through further formating so I modified the evaluate method taking the new parts from the python-evaluate code:
(defun org-babel-perl-table-or-string (results)
"Convert RESULTS into an appropriate elisp value.
If the results look like a list or tuple, then convert them into an
Emacs-lisp table, otherwise return the results as a string."
(org-babel-script-escape results))
(defun org-babel-perl-evaluate (session body &optional result-type)
"Pass BODY to the Perl process in SESSION.
If RESULT-TYPE equals 'output then return a list of the outputs
of the statements in BODY, if RESULT-TYPE equals 'value then
return the value of the last statement in BODY, as elisp."
(when session (error "Sessions are not supported for Perl."))
((lambda (raw)
(if (or (member "code" result-params)
(member "pp" result-params)
(and (member "output" result-params)
(not (member "table" result-params))))
raw
(org-babel-perl-table-or-string (org-babel-trim raw))))
(case result-type
(output (org-babel-eval org-babel-perl-command body))
(value (let ((tmp-file (org-babel-temp-file "perl-")))
(org-babel-eval
org-babel-perl-command
(format org-babel-perl-wrapper-method body
(org-babel-process-file-name tmp-file 'noquote)))
(org-babel-eval-read-file tmp-file))))))
The new parts are org-babel-perl-table-or-string and the part in org-babel-perl-evaluate between the empty lines (plus 1 closing parenthesis at the end).
So what this now does is let perl print lists similar to the way python prints them and put the printed results through org-babel's formating procedures.
Now to the result:
A List:
#+begin_src perl :results value
return qw(1 2 3);
#+end_src
#+results:
| 1 | 2 | 3 |
A scalar:
#+begin_src perl :results value
return "Hello test 123";
#+end_src
#+results:
: Hello test 123
Ways you can use this code:
Place it in scratch and M-x eval-buffer for testing
Place it in a elsip src block at the beginning of your org-document
Place it in your .emacs after babel is loaded
Modify ob-perl.el in your lisp/org folder (might need to recompile org-mode afertwards)
I didn't not tested this much further than the output examples I gave you so if it misbehaves for other examples feel free to complain.

What's the corresponding standard function of atoi in clisp?

In visual lisp, you can use (atoi "123") to convert "123" to 123. It seems there is no "atoi" like function in clisp ?
any suggestion is appreciated !
Now i want to convert '(1 2 3 20 30) to "1 2 3 20 30", then what's the best way to do it ?
parse-interger can convert string to integer, and how to convert integer to string ? Do i need to use format function ?
(map 'list #'(lambda (x) (format nil "~D" x)) '(1 2 3)) => ("1" "2" "3")
But i donot know how to cnovert it to "1 2 3" as haskell does:
concat $ intersperse " " ["1","2","3","4","5"] => "1 2 3 4 5"
Sincerely!

In Common Lisp, you can use the read-from-string function for this purpose:
> (read-from-string "123")
123 ;
3
As you can see, the primary return value is the object read, which in this case happens to be an integer. The second value—the position—is harder to explain, but here it indicates the next would-be character in the string that would need to be read next on a subsequent call to a reading function consuming the same input.
Note that read-from-string is obviously not tailored just for reading integers. For that, you can turn to the parse-integer function. Its interface is similar to read-from-string:
> (parse-integer "123")
123 ;
3
Given that you were asking for an analogue to atoi, the parse-integer function is the more appropriate choice.
Addressing the second part of your question, post-editing, you can interleave (or "intersperse") a string with the format function. This example hard-codes a single space character as the separating string, using the format iteration control directives ~{ (start), ~} (end), and ~^ (terminate if remaining input is empty):
> (format nil "Interleaved: ~{~S~^ ~}." '(1 2 3))
"Interleaved: 1 2 3."
Loosely translated, the format string says,
For each item in the input list (~{), print the item by its normal conversion (~S). If no items remain, stop the iteration (~^). Otherwise, print a space, and then repeat the process with the next item (~}).
If you want to avoid hard-coding the single space there, and accept the separator string as a separately-supplied value, there are a few ways to do that. It's not clear whether you require that much flexibility here.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Newbie transforming CSV files in Clojure - perl

Related

Checking user input in Racket

When do variables output properly in skeletons functions?

no dot between functions in map

Having a table as result of an org-babel code block in Perl

What's the corresponding standard function of atoi in clisp?

Categories

Resources