Scala map to HashMap - scala

Given a List of Person objects of this class:
class Person(val id : Long, val name : String)
What would be the "scala way" of obtaining a (java) HashMap with id for keys and name for values?
If the best answer does not include using .map, please provide an example with it, even if it's harder to do.
Thank you.
EDIT
This is what I have right now, but it's not too immutable:
val map = new HashMap[Long, String]
personList.foreach { p => map.put(p.getId, p.getName) }
return map

import collection.JavaConverters._
val map = personList.map(p => (p.id, p.name)).toMap.asJava
personList has type List[Person].
After .map operation, you get List[Tuple2[Long, String]] (usually written as, List[(Long, String)]).
After .toMap, you get Map[Long, String].
And .asJava, as the name suggests, converts it to a Java map.
You don't need to define .getName or .getId; .name and .id are already getter methods. The field-access look is intentional and follows the uniform access principle.
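One caveat, since the question asks for a HashMap specifically: .asJava returns a java.util.Map view over the Scala map, not a java.util.HashMap. A minimal sketch of copying that view into a real HashMap could look like this. The sample people are made up, and the import assumes Scala 2.13 or later (on older versions it would be scala.collection.JavaConverters._ instead).

```scala
import java.util.{HashMap => JHashMap}
import scala.jdk.CollectionConverters._

class Person(val id: Long, val name: String)

// Made-up sample data, for illustration only
val personList = List(new Person(1L, "Ann"), new Person(2L, "Bob"))

// .asJava only wraps the Scala map; the HashMap copy constructor
// turns that wrapper into an actual java.util.HashMap
val javaMap: JHashMap[Long, String] =
  new JHashMap[Long, String](personList.map(p => p.id -> p.name).toMap.asJava)
```

If any java.util.Map will do, the copy is unnecessary and the plain .asJava result is enough.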

How about this:
preallocate enough entries in the empty HashMap using personList's size,
run the foreach loop,
if you need immutability, return java.util.Collections.unmodifiableMap(map)?
This approach creates no intermediate objects. Mutable state is OK when it's confined to one local object — no side effects anyway :)
Disclaimer: I know very little Scala, so be cautious upvoting this.
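For what it's worth, the three steps above might be sketched roughly as follows. The helper name toJavaMap is made up for illustration. Note that the constructor argument is an initial capacity, so with the default load factor the map may still resize once while it fills up.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

class Person(val id: Long, val name: String)

// Hypothetical helper: the mutable map never escapes this method
def toJavaMap(people: List[Person]): JMap[Long, String] = {
  // size hint taken from the list (initial capacity, not a hard limit)
  val map = new JHashMap[Long, String](people.size)
  people.foreach(p => map.put(p.id, p.name))
  // read-only view; callers cannot modify the underlying map through it
  Collections.unmodifiableMap(map)
}
```

Calling put on the returned map throws UnsupportedOperationException, which is what makes the local mutation safe to hide.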

Scala copy and reflection

In my project, there are many places where objects are picked out of a collection, copied with some values changed, and pushed back into the collection. I have been trying to create my own 'copy' method, that in addition to making a copy, also gives me a 'Diff' object. In other words, something that contains the arguments you just put into it.
The 'Diff' object should then be sent somewhere to be aggregated, so that someone else can get a report of all the changes since last time, without sending the actual entire object. This is all simple enough if one does it like this:
val user = new User(Some(23), true, "arne", None, None, "position", List(), "email", None, false)
val user0 = user.copy(position = "position2")
list ::= user0
val diff = new Diff[User](Map("position" -> "position2"))
However, there is some duplicate work there, and I would very much like to just have it in one method, like:
val (user, diff) = user.copyAndDiff(position = "position")
I haven't been able to figure out what form the arguments to 'copy' actually takes, but I would be able to work with other forms as well.
I made a method with a Type argument, that should make a copy and a diff. Something like this:
object DiffCopy[Copyable]{
def apply(original:Copyable, changes:Map[String, Any]){
original.copy(??uhm..
original.getAllTheFieldsAndCopyAndOverWriteSomeAccordingToChanges??
My first problem was that there doesn't seem to be any way to guarantee that the original object has a 'copy' method that I can call. The second problem appears when I want to actually assign the changes to their correct fields in the new, copied object. I tried to fiddle about with reflection, looking for a way to set the value of a field whose name is given as a String. In that case I could keep my Diff as a simple map: create the diff map first, then apply it to my objects and also send it to where it needs to go.
However, I ended up deeper and deeper in the rabbit hole, and further and further away from what I actually wanted. I got to a point where I had an array of fields from an arbitrary object, and could get them by name, but I couldn't get it to work for a generic Type. So now I am here to ask if anyone can give me some advice on this situation?
The best answer I could get, would be if someone could tell me a simple way to apply a Map[String, Any] to something equivalent to the 'copy' method. I'm fairly sure this should be possible to implement, but it is simply currently beyond me...
A little overcomplicated, but it solves your original problem.
The best answer I could get, would be if someone could tell me a simple way to apply a Map[String, Any] to something equivalent to the 'copy' method. I'm fairly sure this should be possible to implement, but it is simply currently beyond me...
Take all fields from case class to map.
Update map with new values.
Create case class from new fields map.
Problems:
low performance
I'm pretty sure it can be done simpler...
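import scala.reflect.ClassTag

case class Person(name: String, age: Int)

// Read every declared field of the instance into a name -> value map.
def getCCParams(cc: Any): Map[String, Any] =
  cc.getClass.getDeclaredFields.foldLeft(Map.empty[String, Any]) { (a, f) =>
    f.setAccessible(true)
    a + (f.getName -> f.get(cc))
  }

// Rebuild the instance through its first constructor, with vals overriding the current field values.
// Assumes the declared fields line up one-to-one, in order, with the constructor parameters.
def enrichCaseClass[T](cc: T, vals: Map[String, Any])(implicit ct: ClassTag[T]): T = {
  val ctor = ct.runtimeClass.getConstructors.head
  val params = getCCParams(cc) ++ vals
  val args = ct.runtimeClass.getDeclaredFields.map(f => params(f.getName).asInstanceOf[Object])
  ctor.newInstance(args: _*).asInstanceOf[T]
}

val j = Person("Jack", 15)
enrichCaseClass(j, Map("age" -> 18))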
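For the 'Diff' half of the question, a different and reflection-free route is to keep using the compiler-generated copy and compute the diff afterwards by comparing the two instances field by field. The sketch below is only an illustration of that idea: the User class is a cut-down stand-in for the one in the question, diffOf is a made-up name, and productElementNames requires Scala 2.13 or later.

```scala
// Cut-down, hypothetical stand-in for the question's User class
case class User(name: String, position: String, email: String)

// Compare two instances of the same case class field by field and
// keep only the fields whose value changed, mapped to the new value.
def diffOf[T <: Product](before: T, after: T): Map[String, Any] =
  before.productElementNames
    .zip(before.productIterator.zip(after.productIterator))
    .collect { case (name, (old, updated)) if old != updated => name -> updated }
    .toMap

val user    = User("arne", "position", "email")
val user0   = user.copy(position = "position2")
val changes = diffOf(user, user0)
```

This keeps copy type-checked by the compiler; only the diff is stringly typed.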

Scala practices: lists and case classes

I've just started using Scala/Spark, having come from a Java background, and I'm still trying to wrap my head around immutability and other Scala best practices.
This is a very small segment of code from a larger program:
intersections is RDD(Key, (String, String))
obs is (Key, (String, String))
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!
This might work locally, but only because you are taking locally. It will not work once distributed, as listOfDatas is passed to each worker as a copy. The better way of doing this, IMO, is:
val processedData = intersections map { case (key, (item1, item2)) =>
  ParseInformation(key.key, item1, item2)
}
processedData foreach println
A note for developers new to functional programming: if all you are trying to do is transform the data in an iterable (such as a List), forget foreach. Use map instead, which runs your transformation on each item and returns a new iterable of the results.
What's the type of intersections? It looks like you can replace foreach with map:
val listOfDatas: List[Data] =
intersections take NumOutputs map (obs => {
ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
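To see the difference without a Spark cluster, here is a small sketch of the same idea on a plain List. The Data fields, the sample tuples and the lower-case parseInformation helper are all made up for illustration; they are not the asker's actual definitions.

```scala
// Hypothetical stand-ins for the asker's case class and parser
case class Data(key: String, left: String, right: String)
def parseInformation(key: String, left: String, right: String): Data =
  Data(key, left, right)

val numOutputs = 2
val intersections = List(
  ("k1", ("a1", "b1")),
  ("k2", ("a2", "b2")),
  ("k3", ("a3", "b3"))
)

// No var and no foreach: map builds the new list, and the
// pattern match names the tuple parts instead of _1 / _2
val listOfDatas: List[Data] =
  intersections.take(numOutputs).map { case (key, (left, right)) =>
    parseInformation(key, left, right)
  }
```

Unlike the prepend-in-a-loop version, this also keeps the results in their original order.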

In Scala what is the easiest way to parse json and map to objects?

I'm looking for a super simple way to take a big JSON fragment, that is a long list with a bunch of big objects in it, and parse it, then pick out the same few values from each object and then map into a case class.
I have tried pretty hard to get lift-json (2.5) working for me, but I'm having trouble cleanly dealing with checking if a key is present, and if so, then map the whole object, but if not, then skip it.
I absolutely do not understand this syntax for Lift-JSON one bit:
case class Car(make: String, model: String)
...
val parsed = parse(jsonFragment)
val JArray(cars) = parsed \ "cars"
val carList = new MutableList[Car]
for (car <- cars) {
val JString(model) = car \ "model"
val JString(make) = car \ "make"
// i want to check if they both exist here, and if so
// then add to carList
carList += Car(make, model)
}
What on earth is that construct that makes it look like a case class is being created left of the assignment operator? I'm talking about the "JString" part.
Also how is it supposed to cope with the situation where a key is missing?
Can someone please explain to me what the right way to do this is?
And if I have nested values I'm looking for, I just want to skip the whole object and go on to try to map the next one.
Is there something more straightforward for this than Lift-JSON?
Would using extractOpt help?
I have looked at this a lot:
https://github.com/lift/framework/tree/master/core/json
and it's still not particularly clear to me.
Help is very much appreciated!!!!!
Since you are only looking to extract certain fields, you are on the right track. This modified version of your for-comprehension will loop through your car structure, extract the make and model and only yield your case class if both items exist:
for{
car <- cars
model <- (car \ "model").extractOpt[String]
make <- (car \ "make").extractOpt[String]
} yield Car(make, model)
You would add further required fields the same way. If you also want optional fields, say color, extract them in the yield section, where the for comprehension won't unwrap them (this assumes Car also declares a color: Option[String] field):
for{
car <- cars
model <- (car \ "model").extractOpt[String]
make <- (car \ "make").extractOpt[String]
} yield Car(make, model, (car \ "color").extractOpt[String])
In both cases you will get back a List of Car case classes.
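The skipping behaviour comes from how a for comprehension treats Option, not from anything specific to Lift. The following library-free sketch shows the same shape, with each JSON object modelled as a plain Map (so Map.get plays the role of extractOpt); the sample data is made up.

```scala
case class Car(make: String, model: String)

// Each "JSON object" is modelled as a Map; get returns an Option,
// much like extractOpt does for a missing or mistyped field
val cars: List[Map[String, String]] = List(
  Map("make" -> "Honda", "model" -> "Civic"),
  Map("make" -> "Ford"), // no model: this entry is skipped
  Map("make" -> "Toyota", "model" -> "Corolla")
)

// A None in any generator drops that element from the result
val carList: List[Car] = for {
  car   <- cars
  model <- car.get("model")
  make  <- car.get("make")
} yield Car(make, model)
```

Only the first and third entries produce a Car; the incomplete one disappears without any explicit check.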
The odd-looking assignment is pattern matching used in a val declaration.
When you see
val JArray(cars) = parsed \ "cars"
it extracts the "cars" subtree from the parsed JSON and matches the resulting value against the extractor pattern JArray(cars).
That is, the value is expected to have the form JArray(something), and that something is bound to the name cars. If the value has any other form, for example because the key is missing, the match fails at runtime with a MatchError.
It works much as it does with case classes you are probably already familiar with, such as Option, e.g.
//define a value with a class that can pattern match
val option = Some(1)
//do the matching on val assignment
val Some(number) = option
//use the extracted binding as a variable
println(number)
The following assignments work in exactly the same way
//pattern match on a JSON string whose inner value is assigned to "model"
val JString(model) = car \ "model"
//pattern match on a JSON string whose inner value is assigned to "make"
val JString(make) = car \ "make"
References
The JSON types (e.g. JValue, JString, JDouble) are defined as aliases within the net.liftweb.json object.
The aliases in turn point to the corresponding inner case classes within the net.liftweb.json.JsonAST object.
The case classes have an unapply method for free, which lets you do the pattern-matching as explained in the above answer.
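To make the unapply point concrete without pulling in Lift, here is a sketch with a tiny made-up JSON AST (the Json, JsonString, JsonArray and JsonMissing names are invented and are not Lift's types). It shows a val pattern succeeding, the same form failing on a missing value, and an explicit match as the safe alternative.

```scala
import scala.util.Try

// Hypothetical miniature JSON AST, for illustration only
sealed trait Json
case class JsonString(value: String) extends Json
case class JsonArray(items: List[Json]) extends Json
case object JsonMissing extends Json

val parsed: Json = JsonArray(List(JsonString("Civic")))

// Succeeds: the compiler-generated JsonArray.unapply binds items
val JsonArray(items) = parsed

// A missing value does not have the expected shape, so the same
// form of declaration throws scala.MatchError at runtime
val missing: Json = JsonMissing
val failed = Try { val JsonString(s) = missing; s }

// A total alternative: match explicitly and return an Option
def asString(json: Json): Option[String] = json match {
  case JsonString(s) => Some(s)
  case _             => None
}
```

Newer compilers may warn that such val patterns are refutable, which is a fair hint to prefer the explicit match when a key can be absent.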
I think this should work for you:
case class UserInfo(
name: String,
firstName: Option[String],
lastName: Option[String],
smiles: Boolean
)
val jValue: JValue
val extractedUserInfoClass: Option[UserInfo] = jValue.extractOpt[UserInfo]
val jsonArray: JArray
val listOfUserInfos: List[Option[UserInfo]] = jsonArray.arr.map(_.extractOpt[UserInfo])
I expect jValue to have smiles and name -- otherwise extracting will fail.
I don't expect jValue to necessarily have firstName and lastName -- so I write Option[T] in the case class.

Map inside Map in Scala

I've this code :
val total = ListMap[String,HashMap[Int,_]]()
val hm1 = new HashMap[Int,String]
val hm2 = new HashMap[Int,Int]
...
//insert values in hm1 and in hm2
...
total += "key1" -> hm1
total += "key2" -> hm2
....
val get : HashMap[Int,String] = total.get("key1") match {
case Some(a : HashMap[Int,String]) => a
}
This works, but I would like to know if there is a better (more readable) way to do this.
Thanks to all !
It looks like you're trying to re-implement tuples as maps.
val total : ( Map[Int,String], Map[Int,Int]) = ...
def get : Map[Int,String] = total._1
(edit: oh, sorry, I get it now)
Here's the thing: the code above doesn't work. Type parameters are erased, so the match above will ALWAYS succeed -- try it with key2, for example.
If you want to store multiple types in a Map and retrieve them later, you'll need to use Manifest and specialized get and put methods. But this has already been answered on Stack Overflow, so I won't repeat myself here.
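As a rough idea of what such specialized get and put methods could look like, here is a sketch that uses ClassTag (the newer replacement for Manifest) to record the value type next to each stored map. The MapRegistry name and its API are invented for this example, and it uses immutable Maps for simplicity; it is not the implementation from the answers referred to above.

```scala
import scala.reflect.ClassTag

// Hypothetical registry: remembers the value type of every stored map
final class MapRegistry {
  private var entries = Map.empty[String, (ClassTag[_], Map[Int, Any])]

  // The ClassTag of V is captured at the call site and stored with the map
  def put[V](key: String, m: Map[Int, V])(implicit tag: ClassTag[V]): Unit =
    entries = entries.updated(key, (tag, m))

  // Returns the map only if it was stored with the same value type
  def get[V](key: String)(implicit tag: ClassTag[V]): Option[Map[Int, V]] =
    entries.get(key).collect {
      case (storedTag, m) if storedTag == tag => m.asInstanceOf[Map[Int, V]]
    }
}
```

With this, registry.get[String]("key2") yields None when "key2" was stored as a map of Int values, instead of handing back a wrongly typed map. A ClassTag only sees the erased class, so it cannot tell apart nested generics such as List[Int] and List[String].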
Your total map, containing maps with non uniform value types, would be best avoided. The question is, when you retrieve the map at "key1", and then cast it to a map of strings, why did you choose String?
The most trivial reason might be that key1 and so on are simply constants, that you know all of them when you write your code. In that case, you probably should have a val for each of your maps, and dispense with map of maps entirely.
It might be that the calls made by the client code have this knowledge. Say that the client does stringMap("key1") or intMap("key2"), or that one way or another the call implies that some given type is expected, so that the client is responsible for not mixing types and names. Again, in that case there is no reason for total. You would have a map of string maps and a map of int maps (provided that you have prior knowledge of a limited number of value types).
What is your reason to have total?
First of all: this is a non-answer (as I would not recommend the approach I discuss), but it was too long for a comment.
If you haven't got too many different keys in your ListMap, I would suggest trying Malvolio's answer.
Otherwise, due to type erasure, the other approaches based on pattern matching are practically equivalent to this (which works, but is very unsafe):
val get = total("key1").asInstanceOf[HashMap[Int, String]]
the reasons why this is unsafe (unless you like living dangerously) are:
total("key1") is not returning an Option (unlike total.get("key1")). If "key1" does not exist, it will throw a NoSuchElementException. I wasn't sure how you were planning to manage the "None" case anyway.
asInstanceOf will also happily cast total("key2") - which should be a HashMap[Int, Int], but is at this point a HashMap[Int, Any] - to a HashMap[Int, String]. You will have a problem later on when you try to access the Int value (which Scala now believes is a String)
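A small sketch of that second failure mode, using a standalone map rather than the asker's total (the value names are made up): the cast itself goes through, and the error only surfaces when a value is actually used as a String.

```scala
import scala.collection.mutable.HashMap
import scala.util.Try

val ints: HashMap[Int, Int] = HashMap(1 -> 100)

// Same loss of static type information as with ListMap[String, HashMap[Int, _]]
val stored: HashMap[Int, _] = ints

// Succeeds at runtime: after erasure both types are just HashMap
val asStrings = stored.asInstanceOf[HashMap[Int, String]]

// Fails only here, when the Integer is used where a String is expected
val lookup = Try { val s: String = asStrings(1); s }
```

Here lookup ends up as a Failure wrapping a ClassCastException, possibly far away from the line that did the bad cast.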

scala: map-like structure that doesn't require casting when fetching a value?

I'm writing a data structure that converts the results of a database query. The raw structure is a java ResultSet and it would be converted to a map or class which permits accessing different fields on that data structure by either a named method call or passing a string into apply(). Clearly different values may have different types. In order to reduce burden on the clients of this data structure, my preference is that one not need to cast the values of the data structure but the value fetched still has the correct type.
For example, suppose I'm doing a query that fetches two column values, one an Int, the other a String. The result column names are "a" and "b" respectively. Some ideal syntax might be the following:
val javaResultSet = dbQuery("select a, b from table limit 1")
// with ResultSet, particular values can be accessed like this:
val a = javaResultSet.getInt("a")
val b = javaResultSet.getString("b")
// but this syntax is undesirable.
// since I want to convert this to a single data structure,
// the preferred syntax might look something like this:
val newStructure = toDataStructure[Int, String](javaResultSet)("a", "b")
// that is, I'm willing to state the types during the instantiation
// of such a data structure.
// then,
val a: Int = newStructure("a") // OR
val a: Int = newStructure.a
// in both cases, "val a" does not require asInstanceOf[Int].
I've been trying to determine what sort of data structure might allow this and I could not figure out a way around the casting.
The other requirement is obviously that I would like to define a single data structure used for all db queries. I realize I could easily define a case class or similar per call and that solves the typing issue, but such a solution does not scale well when many db queries are being written. I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Anyone have any suggestions? Thanks!
To do this without casting, one needs more information about the query, and one needs that information at compile time.
I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Your suspicion is right and you will not get around this. If current ORMs or DSLs like squeryl don't suit your fancy, you can create your own one. But I doubt you will be able to use query strings.
The basic problem is that you don't know how many columns there will be in any given query, and so you don't know how many type parameters the data structure should have and it's not possible to abstract over the number of type parameters.
There is, however, a data structure that exists in different variants for different numbers of type parameters: the tuple (e.g. Tuple2, Tuple3 etc.). You could define parameterized mapping functions for different numbers of parameters that return tuples, like this:
def toDataStructure2[T1, T2](rs: ResultSet)(c1: String, c2: String) =
(rs.getObject(c1).asInstanceOf[T1],
rs.getObject(c2).asInstanceOf[T2])
def toDataStructure3[T1, T2, T3](rs: ResultSet)(c1: String, c2: String, c3: String) =
(rs.getObject(c1).asInstanceOf[T1],
rs.getObject(c2).asInstanceOf[T2],
rs.getObject(c3).asInstanceOf[T3])
You would have to define these for as many columns as you expect to have in your tables (max 22).
This of course depends on getObject followed by a cast to the given type being safe.
In your example you could use the resulting tuple as follows:
val (a, b) = toDataStructure2[Int, String](javaResultSet)("a", "b")
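If the asInstanceOf calls are the worry, one possible variation is to let the compiler pick the matching ResultSet getter through an implicit parameter, so that no cast is needed at all. This is only a sketch of that idea: the ColumnReader trait and its two instances are invented names, and you would have to add an instance for every column type you use.

```scala
import java.sql.ResultSet

// Hypothetical type class: one instance per supported column type
trait ColumnReader[T] {
  def read(rs: ResultSet, column: String): T
}

object ColumnReader {
  implicit val intReader: ColumnReader[Int] = new ColumnReader[Int] {
    def read(rs: ResultSet, column: String): Int = rs.getInt(column)
  }
  implicit val stringReader: ColumnReader[String] = new ColumnReader[String] {
    def read(rs: ResultSet, column: String): String = rs.getString(column)
  }
}

// The type parameters select the readers at compile time, so the
// typed getters are called directly and nothing has to be cast
def toDataStructure2[T1, T2](rs: ResultSet)(c1: String, c2: String)(
    implicit r1: ColumnReader[T1], r2: ColumnReader[T2]): (T1, T2) =
  (r1.read(rs, c1), r2.read(rs, c2))
```

The call site stays the same as above, and asking for a type without a ColumnReader instance becomes a compile error rather than a runtime ClassCastException. Whether the column really holds that type is still only checked by the JDBC driver at runtime.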
if you decide to go the route of heterogeneous collections, there are some very interesting posts on heterogeneous typed lists:
one for instance is
http://jnordenberg.blogspot.com/2008/08/hlist-in-scala.html
http://jnordenberg.blogspot.com/2008/09/hlist-in-scala-revisited-or-scala.html
with an implementation at
http://www.assembla.com/wiki/show/metascala
a second great series of posts starts with
http://apocalisp.wordpress.com/2010/07/06/type-level-programming-in-scala-part-6a-heterogeneous-list%C2%A0basics/
the series continues with parts "b,c,d" linked from part a
finally, there is a talk by Daniel Spiewak which touches on HOMaps
http://vimeo.com/13518456
So all this is to say that perhaps you can build your solution from these ideas. Sorry that I don't have a specific example, but I admit I haven't tried these out yet myself!
Joshua Bloch has introduced a heterogeneous collection, which can be written in Java. I once adapted it a little. It now works as a value register. It is basically a wrapper around two maps. Here is the code and this is how you can use it. But this is just FYI, since you are interested in a Scala solution.
In Scala I would start by playing with tuples. Tuples are kind of heterogeneous collections. The results can be, but do not have to be, accessed through fields like _1, _2, _3 and so on. But you don't want that; you want names. This is how you can assign names to them:
scala> val tuple = (1, "word")
tuple: (Int, String) = (1, word)
scala> val (a, b) = tuple
a: Int = 1
b: String = word
So as mentioned before I would try to build a ResultSetWrapper around tuples.
If you want "extract the column value by name" on a plain bean instance, you can probably:
use reflection and casts, which you (and I) don't like.
use a ResultSetToJavaBeanMapper provided by most ORM libraries, which is a little heavy and coupled.
write a scala compiler plugin, which is too complex to control.
so, I guess a lightweight ORM with following features may satisfy you:
support raw SQL
support a lightweight,declarative and adaptive ResultSetToJavaBeanMapper
nothing else.
I made an experimental project on that idea, but note it's still an ORM, and I just think it may be useful to you, or can bring you some hint.
Usage:
declare the model:
//declare DB schema
trait UserDef extends TableDef {
var name = property[String]("name", title = Some("姓名"))
var age1 = property[Int]("age", primary = true)
}
//declare model, and it mixes in properties as {var name = ""}
@BeanInfo class User extends Model with UserDef
//declare a object.
//it mixes in properties as {var name = Property[String]("name") }
//and, object User is a Mapper[User], thus, it can translate ResultSet to a User instance.
object `package`{
@BeanInfo implicit object User extends Table[User]("users") with UserDef
}
then call raw sql, the implicit Mapper[User] works for you:
val users = SQL("select name, age from users").all[User]
users.foreach{user => println(user.name)}
or even build a type safe query:
val users = User.q.where(User.age > 20).where(User.name like "%liu%").all[User]
for more, see unit test:
https://github.com/liusong1111/soupy-orm/blob/master/src/test/scala/mapper/SoupyMapperSpec.scala
project home:
https://github.com/liusong1111/soupy-orm
It uses "abstract Type" and "implicit" heavily to make the magic happen, and you can check source code of TableDef, Table, Model for detail.
Several million years ago I wrote an example showing how to use Scala's type system to push and pull values from a ResultSet. Check it out; it matches up with what you want to do fairly closely.
implicit val conn = connect("jdbc:h2:f2", "sa", "");
implicit val s: Statement = conn << setup;
val insertPerson = conn prepareStatement "insert into person(type, name) values(?, ?)";
for (name <- names)
insertPerson<<rnd.nextInt(10)<<name<<!;
for (person <- query("select * from person", rs => Person(rs,rs,rs)))
println(person.toXML);
for (person <- "select * from person" <<! (rs => Person(rs,rs,rs)))
println(person.toXML);
Primitive types are used to guide the Scala compiler into selecting the right functions on the ResultSet.