I am trying to find an equivalent of Clojure's assoc-in in Scala. I am trying to convert
(defn- organiseDataByTradeId [data]
  (reduce #(let [a (assoc-in %1
                             [(%2 "internaltradeid") (read-string (%2 "paramseqnum")) "levelcols"]
                             (reduce (fn [m k] (assoc m k (get %2 k)))
                                     {}
                                     (string/split xmlLevelAttributesStr #",")))
                 b (assoc-in a
                             [(%2 "internaltradeid") (read-string (%2 "paramseqnum")) "subLevelCols" (read-string (%2 "cashflowseqnum"))]
                             (reduce (fn [m k] (assoc m k (get %2 k)))
                                     {}
                                     (string/split xmlSubLevelAttributesStr #",")))]
             b)
          {}
          data))
to Scala.
I have tried this:
def organiseDataByTradeId(data: List[Map[String, String]]) = {
  data.map { entry =>
    Map(entry("internaltradeid") ->
      Map(entry("paramseqnum").toInt ->
        Map("levelcols" -> xmlLevelAttributesStr.split(",").map { key => (key, entry(key)) }.toMap,
            "subLevelCols" -> Map(entry("cashflowseqnum").asInstanceOf[String].toInt ->
              xmlSubLevelAttributesStr.split(",").map { key => (key, entry(key)) }.toMap))))
  }
}
I'm not sure how to merge the resulting list of maps without overwriting entries.
Here data (a List[Map[String,String]]) describes a table: each entry is a row, the column names are the map's keys, and the cell contents are its values. xmlLevelAttributesStr and xmlSubLevelAttributesStr are two Strings containing comma-separated column names.
I am fairly new to Scala. I converted each row (Map[String,String]) to a nested Scala Map, and now I'm not sure how to merge them so that earlier data is not overwritten and the result behaves exactly like the Clojure code. Also, I am not allowed to use external libraries such as scalaz.
This Clojure code is not a good pattern to copy: it has a lot of duplication, and little explanation of what it is doing. I would write it more like this:
(defn- organiseDataByTradeId [data]
  (let [level-reader (fn [attr-list]
                       (let [levels (string/split attr-list #",")]
                         (fn [item]
                           (into {} (for [level levels]
                                      [level (get item level)])))))
        attr-levels (level-reader xmlLevelAttributesStr)
        sub-levels (level-reader xmlSubLevelAttributesStr)]
    (reduce (fn [acc item]
              (update-in acc [(item "internaltradeid"),
                              (read-string (item "paramseqnum"))]
                         (fn [trade]
                           (-> trade
                               (assoc "levelcols" (attr-levels item))
                               (assoc-in ["subLevelCols", (read-string (item "cashflowseqnum"))]
                                         (sub-levels item))))))
            {}, data)))
It's more lines of code than your original, but I've taken the opportunity to name a number of useful concepts and extract the repetition into a local function so that it's more self-explanatory.
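To see why the reduce-based version does not overwrite earlier rows, here is a stripped-down sketch of the same idea. It assumes the rows have already been parsed into hypothetical [trade-id param-seq cashflow-seq columns] tuples, and it only builds the subLevelCols branch; add-row is an illustrative name, not part of the code above.

```clojure
;; add-row (hypothetical helper) nests each row's columns under
;; [trade-id param-seq "subLevelCols" cashflow-seq], creating maps as needed.
(defn add-row [acc [trade-id param-seq cashflow-seq cols]]
  (assoc-in acc [trade-id param-seq "subLevelCols" cashflow-seq] cols))

(def organised
  (reduce add-row {}
          [["T1" 1 1 {"amount" "100"}]
           ["T1" 1 2 {"amount" "250"}]
           ["T2" 1 1 {"amount" "75"}]]))

;; organised =>
;; {"T1" {1 {"subLevelCols" {1 {"amount" "100"}, 2 {"amount" "250"}}}},
;;  "T2" {1 {"subLevelCols" {1 {"amount" "75"}}}}}
```

Because assoc-in only replaces the value at the very end of its path, the second T1 row adds cashflow 2 next to cashflow 1 instead of replacing the whole T1 entry.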
It's even easier if you know there will be no duplication of internaltradeid: you can simply generate a number of independent maps and merge them together:
(defn- organiseDataByTradeId [data]
  (let [level-reader (fn [attr-list]
                       (let [levels (string/split attr-list #",")]
                         (fn [item]
                           (into {} (for [level levels]
                                      [level (get item level)])))))
        attr-levels (level-reader xmlLevelAttributesStr)
        sub-levels (level-reader xmlSubLevelAttributesStr)]
    (apply merge (for [item data]
                   {(item "internaltradeid")
                    {(read-string (item "paramseqnum"))
                     {"levelcols" (attr-levels item),
                      "subLevelCols" {(read-string (item "cashflowseqnum")) (sub-levels item)}}}}))))
But really, neither of these approaches will work well in Scala, because Scala has a different data-modeling philosophy than Clojure. Clojure encourages loosely defined heterogeneous maps like this, whereas Scala prefers homogeneous maps. When your data mixes multiple types, Scala suggests you define a class (or perhaps a case class; I'm no Scala expert) and then create instances of that class.
So here you'd want a Map[String, Map[Int, TradeInfo]], where TradeInfo is a class with two fields, levelcols : List[Attribute], and subLevelCols as some sort of pair (or perhaps a single-element map) containing a cashflowseqnum and another List[Attribute].
Once you've modeled your data in the Scala way, you'll be quite far away from using anything that looks like assoc-in because your data won't be a single giant map, so the question won't arise.
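If you do stay with nested maps in Clojure, the reason the merge-based version needs unique internaltradeid values is that merge is shallow: for a duplicated top-level key the later value replaces the earlier one wholesale. The sketch below uses hypothetical data to show that, plus deep-merge, an illustrative helper built on merge-with (not part of clojure.core and not something the answer above uses) that recurses whenever both values are maps.

```clojure
;; merge is shallow: the second "T1" value replaces the first one entirely.
(def shallow (merge {"T1" {1 {"levelcols" {"book" "B1"}}}}
                    {"T1" {2 {"levelcols" {"book" "B2"}}}}))
;; shallow => {"T1" {2 {"levelcols" {"book" "B2"}}}}

;; deep-merge (hypothetical helper): on a key clash, merge recursively
;; if both values are maps, otherwise keep the later value.
(defn deep-merge [& maps]
  (apply merge-with
         (fn [a b] (if (and (map? a) (map? b)) (deep-merge a b) b))
         maps))

(def deep (deep-merge {"T1" {1 {"levelcols" {"book" "B1"}}}}
                      {"T1" {2 {"levelcols" {"book" "B2"}}}}))
;; deep => {"T1" {1 {"levelcols" {"book" "B1"}}, 2 {"levelcols" {"book" "B2"}}}}
```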
I am trying to rewrite the Spark Structured Streaming example in Clojure.
The original example is written in Scala:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
(ns flambo-example.streaming-example
  (:import [org.apache.spark.sql Encoders SparkSession Dataset Row]
           [org.apache.spark.sql.functions]))
(def spark
  (-> (SparkSession/builder)
      (.appName "sample")
      (.master "local[*]")
      .getOrCreate))
(def lines
  (-> spark
      .readStream
      (.format "socket")
      (.option "host" "localhost")
      (.option "port" 9999)
      .load))
(def words
  (-> lines
      (.as (Encoders/STRING))
      (.flatMap #(clojure.string/split % #" "))))
The above code causes the following exception.
;; Caused by java.lang.IllegalArgumentException
;; No matching method found: flatMap for class
;; org.apache.spark.sql.Dataset
How can I avoid this error?
You have to follow the signatures. Seen from Java (and therefore from Clojure), Dataset provides two overloads of flatMap: one that takes a scala.Function1
def flatMap[U](func: (T) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
and a second one that takes Spark's own org.apache.spark.api.java.function.FlatMapFunction
def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]
The former is of little use to you, but you should be able to use the latter. For the RDD API, flambo uses macros to create Spark-friendly adapters, accessible through flambo.api/fn. I am not sure whether these work directly with Datasets, but you should be able to adapt them if needed.
Since you cannot rely on implicit Encoders, you also have to provide an explicit encoder that matches the return type.
Overall you'll need something along these lines:
(def words
(-> lines
(.as (Encoders/STRING))
(.flatMap f e)
))
where f implements FlatMapFunction and e is an Encoder. One example implementation:
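(import '[org.apache.spark.api.java.function FlatMapFunction])
(def words
  (-> lines
      (.as (Encoders/STRING))
      (.flatMap
       (proxy [FlatMapFunction] []
         ;; call must return a java.util.Iterator; Clojure vectors provide .iterator
         (call [s] (.iterator (clojure.string/split s #" "))))
       (Encoders/STRING))))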
but I guess it is possible to find a better one.
In practice I'd avoid the typed Dataset API altogether and focus on DataFrame (Dataset[Row]).
According to the Learning Clojure wikibook, backticks are expanded as follows:
`(x1 x2 x3 ... xn)
is interpreted to mean
(clojure.core/seq (clojure.core/concat |x1| |x2| |x3| ... |xn|))
Why wrap concat with seq? What difference does it make?
Regardless of how it arose
concat returns a sequence, and
seq returns a sequence with the same content as its sequence argument,
... so seq is effectively an identity-op on a concat... except in one circumstance:
When s is an empty sequence, (seq s) is nil.
I doubt that the expansion is correct, since
`()
... evaluates to
()
... with type
clojure.lang.PersistentList$EmptyList
Whereas
(seq (concat))
... evaluates to
nil
This suggests that the wrapping call to seq is not there.
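Both observations can be checked at the REPL by quoting a syntax-quoted form, which shows what the reader produced without evaluating it. As far as I know from the reader's source, it special-cases the empty list and emits (clojure.core/list) instead of a seq/concat call, which would explain why `() is an empty list rather than nil while non-empty forms do get the seq wrapper. A minimal sketch (the var names are just labels for the results):

```clojure
;; concat with no arguments returns an empty (lazy) sequence, not nil ...
(def empty-concat (concat))
;; ... while seq turns an empty sequence into nil.
(def seq-of-empty (seq (concat)))

;; Quoting a syntax-quoted form shows the reader's expansion unevaluated.
(def non-empty-expansion '`(x1 x2))
(def empty-expansion '`())
;; (first non-empty-expansion) is clojure.core/seq
;; empty-expansion is (clojure.core/list)
```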
Strictly speaking, it expands to:
(macroexpand '`(x1 x2 x3))
(clojure.core/seq (clojure.core/concat (clojure.core/list (quote user/x1)) (clojure.core/list (quote user/x2)) (clojure.core/list (quote user/x3))))
(macroexpand `(x1 x2 x3))
(user/x1 user/x2 user/x3)
Why the call to seq? Because sequences are cornerstones of the Clojure philosophy. I recommend you read the Clojure Sequences documentation rather than have me duplicate it here.
I'm trying to solve a problem: I need to create a map from passed-in values, but while the symbol names for the values are consistent, the keys they map to are not. For instance: I might be passed a value that is a user ID. In the code, I can always use the symbol user-id -- but depending on other factors, I might need to make a map {"userId" user-id} or {"user_id" user-id} or {:user-id user-id} or -- well, you get the picture.
I can write a macro that gets me part-way there:
(defmacro user1 [user-id] `{"userId" ~user-id})
(defmacro user2 [user-id] `{"user_id" ~user-id})
But what I'd much rather do is define a set of maps, then combine them with a given set of symbols:
(def user-id-map-1 `{"userId" `user-id})
(defn combiner [m user-id] m) ;; <-- Around here, a miracle occurs.
I can't figure out how to get this evaluation to occur. It seems like I should be able to make a map containing un-evaluated symbols, then look up those symbols in the lexical scope of a function or macro that binds those symbols as locals -- but how?
Instead of standardizing your symbolic names, use maps with standard keyword keys. You don't need to go near macros, and you can turn your maps into records if need be without much trouble.
Given a map like
(def user1 {:id 3124, :surname "Adabolo", :forenames ["Julia" "Frances"]})
... can be transformed by mapping the keys with whatever function you choose:
(defn map-keys [keymap m]
(zipmap (map keymap (keys m)) (vals m)))
For example,
(map-keys name user1)
;{"id" 3124, "surname" "Adabolo", "forenames" ["Julia" "Frances"]}
or
(map-keys {:id :user-id, :surname :family-name} user1)
;{:user-id 3124, :family-name "Adabolo", nil ["Julia" "Frances"]}
If you want to get rid of the nil entry, wrap the expression in (dissoc ... nil):
(defn map-keys [keymap m]
(dissoc
(zipmap (map keymap (keys m)) (vals m))
nil))
Then
(map-keys {:id :user-id, :surname :family-name} user1)
;{:user-id 3124, :family-name "Adabolo"}
I see from Michał Marczyk's answer, which came first, that the above essentially rewrites clojure.set/rename-keys, which, however ...
leaves missing keys untouched:
For example,
(clojure.set/rename-keys user1 {:id :user-id, :surname :family-name})
;{:user-id 3124, :forenames ["Julia" "Frances"], :family-name "Adabolo"}
doesn't work with normal functions:
For example,
(clojure.set/rename-keys user1 name)
;IllegalArgumentException Don't know how to create ISeq from: clojure.core$name ...
If you forgo the use of false and nil as keys, you can leave missing keys untouched and still use normal functions:
(defn map-keys [keymap m]
(zipmap (map #(or (keymap %) %) (keys m)) (vals m)))
Then
(map-keys {:id :user-id, :surname :family-name} user1)
;{:user-id 3124, :family-name "Adabolo", :forenames ["Julia" "Frances"]}
How about putting your passed-in values in a map keyed by keywords forged from the formal parameter names:
(defmacro zipfn [map-name arglist & body]
  `(fn ~arglist
     (let [~map-name (zipmap ~(mapv keyword arglist) ~arglist)]
       ~@body)))
Example of use:
((zipfn argmap [x y z]
argmap)
1 2 3)
;= {:z 3, :y 2, :x 1}
Better yet, don't use macros:
;; could take varargs for ks (though it would then need another name)
(defn curried-zipmap [ks]
#(zipmap ks %))
((curried-zipmap [:x :y :z]) [1 2 3])
;= {:z 3, :y 2, :x 1}
Then you could rekey this map using clojure.set/rename-keys:
(clojure.set/rename-keys {:z 3, :y 2, :x 1} {:z "z" :y "y" :x "x"})
;= {"x" 1, "z" 3, "y" 2}
The second map here is the "translation map" for the keys; you can construct it by merging maps like {:x "x"} describing how the individual keys ought to be renamed.
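A small sketch of building that translation map by merging per-key rules; the individual rule maps and var names are made up for illustration.

```clojure
(require 'clojure.set)

;; Hypothetical per-key renaming rules, merged into one translation map.
(def translation (merge {:x "x"} {:y "y"} {:z "z"}))

(def renamed (clojure.set/rename-keys {:z 3, :y 2, :x 1} translation))
;; renamed => {"x" 1, "y" 2, "z" 3} (printed key order may differ)
```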
For the problem you described I can't find a reason to use macros.
I'd recommend something like
(defn assoc-user-id
[m user-id other-factors]
(assoc m (key-for other-factors) user-id))
Where you implement key-for so that it selects the key based on other-factors.
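One possible shape for key-for, purely as an illustration: it assumes other-factors is a map whose :style entry names the convention the consumer expects. Both the :style key and the convention names are hypothetical.

```clojure
;; key-for (hypothetical): pick the key from an assumed :style entry.
(defn key-for [other-factors]
  (case (:style other-factors)
    :camel "userId"
    :snake "user_id"
    :user-id))   ; default when no recognised style is given

(defn assoc-user-id
  [m user-id other-factors]
  (assoc m (key-for other-factors) user-id))

;; (assoc-user-id {} 42 {:style :camel}) => {"userId" 42}
;; (assoc-user-id {} 42 {})              => {:user-id 42}
```

Keeping the choice in a plain function means new conventions are one more case clause rather than one more macro.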
I'm halfway through figuring out a solution to my question, but I have a feeling that it won't be very efficient. I've got a two-dimensional cell structure of variable-length arrays that is constructed in a very non-functional way in Matlab, and I would like to convert it to Clojure. Here is an example of what I'm trying to do:
pre = cell(N,1);
aux = cell(N,1);
for i=1:Ne
  for j=1:D
    for k=1:length(delays{i,j})
      pre{post(i, delays{i, j}(k))}(end+1) = N*(delays{i, j}(k)-1)+i;
      aux{post(i, delays{i, j}(k))}(end+1) = N*(D-1-j)+i; % takes into account delay
    end;
  end;
end;
My current plan for implementation is to use 3 loops where the first is initialized with a vector of N vectors of an empty vector. Each subloop is initialized by the previous loop. I define a separate function that takes the overall vector and the subindices and value and returns the vector with an updated subvector.
There's got to be a smarter way of doing this than using 3 loop/recurs. Possibly some reduce function that simplifies the syntax by using an accumulator.
I'm not 100% sure I understand what your code is doing (I don't know Matlab) but this might be one approach for building a multi-dimensional vector:
(defn conj-in
  "Based on clojure.core/assoc-in, but with vectors instead of maps."
  [coll [k & ks] v]
  (if ks
    (assoc coll k (conj-in (get coll k []) ks v))
    (assoc coll k v)))
(defn foo []
  (let [w 5, h 4, d 3
        indices (for [i (range w)
                      j (range h)
                      k (range d)]
                  [i j k])]
    (reduce (fn [acc [i j k :as index]]
              (conj-in acc index
                       ;; do real work here
                       (str i j k)))
            [] indices)))
user> (pprint (foo))
[[["000" "001" "002"]
  ["010" "011" "012"]
  ["020" "021" "022"]
  ["030" "031" "032"]]
 [["100" "101" "102"]
  ["110" "111" "112"]
  ["120" "121" "122"]
  ["130" "131" "132"]]
 [["200" "201" "202"]
  ["210" "211" "212"]
  ["220" "221" "222"]
  ["230" "231" "232"]]
 [["300" "301" "302"]
  ["310" "311" "312"]
  ["320" "321" "322"]
  ["330" "331" "332"]]
 [["400" "401" "402"]
  ["410" "411" "412"]
  ["420" "421" "422"]
  ["430" "431" "432"]]]
This only works if the indices arrive in increasing order, because a vector can only grow at one past its last index: conj always appends there, and assoc with any larger index throws an IndexOutOfBoundsException.
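For the Matlab pattern cells{t}(end+1) = v specifically, one way around the ordering restriction is to pre-size the outer vector (like cell(N,1)) and append into the inner vectors with update and conj, so targets can arrive in any order. A minimal sketch: append-to-cell and the [target value] pairs are hypothetical, and indices are 0-based rather than Matlab's 1-based.

```clojure
;; append-to-cell (hypothetical) emulates Matlab's cells{t}(end+1) = v
;; on a pre-sized vector of vectors.
(defn append-to-cell [cells t v]
  (update cells t conj v))

(def cells
  (reduce (fn [acc [t v]] (append-to-cell acc t v))
          (vec (repeat 3 []))        ; like cell(3,1): three empty cells
          [[2 :a] [0 :b] [2 :c]]))   ; [target value] pairs, out of order
;; cells => [[:b] [] [:a :c]]
```

In a real translation you would reduce over the (for [i ... j ... k ...]) index sequence and carry both pre and aux in the accumulator.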
I also think it would be acceptable to use make-array and build your array via aset. This is why Clojure offers access to Java mutable arrays; some algorithms are much more elegant that way, and sometimes you need them for performance. You can always dump the data into Clojure vectors after you're done if you want to avoid leaking side-effects.
(I don't know which of this or the other version performs better.)
(defn bar []
  (let [w 5, h 4, d 3
        arr (make-array String w h d)]
    (doseq [i (range w)
            j (range h)
            k (range d)]
      (aset arr i j k (str i j k)))
    (vec (map #(vec (map vec %)) arr)))) ;yikes?
Have a look at the Incanter project, which provides routines for working with data sets and more.
I want to make a local instance of a Java Scanner class in a clojure program. Why does this not work:
; gives me: count not supported on this type: Symbol
(let s (new Scanner "a b c"))
but it will let me create a global instance like this:
(def s (new Scanner "a b c"))
I was under the impression that the only difference was scope, but apparently not. What is the difference between let and def?
The problem is that your use of let is wrong.
let works like this:
(let [identifier (expr)])
So your example should be something like this:
(let [s (Scanner. "a b c")]
(exprs))
You can only use the lexical bindings made with let within the scope of the let (between its opening and closing parens). let just creates a set of lexical bindings. I use def for making a global binding and let for binding something I only want in the scope of the let, as it keeps things clean. They both have their uses.
NOTE: (Class.) is the same as (new Class); it's just syntactic sugar.
LET is not "make a lexical binding in the current scope", but "make a new lexical scope with the following bindings".
(let [s (foo whatever)]
;; s is bound here
)
;; but not here
(def s (foo whatever))
;; s is bound here
Simplified: def is for global constants, let is for local (immutable) bindings.
Correct syntax:
(let [s (Scanner. "a b c")] ...)
The syntax for them is different, even if the meanings are related.
let takes a list of bindings (name value pairs) followed by expressions to evaluate in the context of those binding.
def just takes one binding, not a list, and adds it to the global context.
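A small sketch of the two forms side by side, using java.util.Scanner as in the question; the var names first-token and scanner are just for illustration.

```clojure
(import 'java.util.Scanner)

;; let: a vector of name/value pairs, then body expressions.
;; s and tok are visible only inside the let form.
(def first-token
  (let [s   (Scanner. "a b c")
        tok (.next s)]
    tok))

;; def: exactly one name and one value, creating a namespace-level var.
(def scanner (Scanner. "a b c"))
```

Here first-token ends up as "a", while s and tok are no longer reachable once the let form returns.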
You could think of let as syntactic sugar for creating a new lexical scope with fn then applying it immediately:
(let [a 3 b 7] (* a b)) ; 21
; vs.
((fn [a b] (* a b)) 3 7) ; 21
So you could implement let with a simple macro and fn:
(defmacro fnlet [bindings & body]
  ((fn [pairs]
     `((fn [~@(map first pairs)] ~@body) ~@(map last pairs)))
   (partition 2 bindings)))
(fnlet [a 3 b 7] (* a b)) ; 21
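One difference worth noting: fnlet binds all names at once, like a single fn call, whereas let binds sequentially, so later bindings can refer to earlier ones. (let [a 3 b (* a 2)] (+ a b)) works, but the fnlet version would fail because a is not in scope when (* a 2) is evaluated. A sketch of a sequential variant that nests one single-argument fn per binding (fnlet* is a hypothetical name):

```clojure
(defmacro fnlet*
  "Sequential let built from nested single-argument fns."
  [bindings & body]
  (if (empty? bindings)
    `(do ~@body)
    (let [[sym val & more] bindings]
      ;; bind sym by calling a one-argument fn, then expand the rest inside it
      `((fn [~sym] (fnlet* [~@more] ~@body)) ~val))))

(fnlet* [a 3 b (* a 2)] (+ a b)) ; 9
```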