In my Scala application I need to use several Maps and Lists that are updated very often. Are there any thread-safe collections in Scala that maintain insertion order?
Yes, there is a trait in the scala.collection.concurrent package: concurrent.Map. It's only a trait, though, so you need an implementation of it; the standard one is scala.collection.concurrent.TrieMap, a lock-free map whose operations are thread-safe and atomic.
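For illustration, a minimal sketch using TrieMap (note that TrieMap does not preserve insertion order):

import scala.collection.concurrent.TrieMap

object ConcurrentMapExample extends App {
  // TrieMap implements the concurrent.Map trait
  val cache: scala.collection.concurrent.Map[String, Int] = TrieMap.empty

  // Atomic operations defined by the concurrent.Map trait
  cache.putIfAbsent("a", 1)  // insert only if the key is absent
  cache.replace("a", 1, 2)   // compare-and-swap style replace
  cache.remove("a", 2)       // remove only if the value still matches
}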
If you need a good concurrent map, try Google's ConcurrentLinkedHashMap and convert it to a Scala Map using the Scala/Java converters; that will give better performance than mixing in SynchronizedMap. For example, the Spray toolkit uses it as the core structure for implementing its caching module, and Spray is among the fastest Scala web toolkits.
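A sketch of that conversion, assuming the com.googlecode.concurrentlinkedhashmap artifact is on the classpath:

import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap
import scala.collection.JavaConverters._

object LruCacheExample extends App {
  // A bounded, thread-safe map with LRU eviction
  val underlying = new ConcurrentLinkedHashMap.Builder[String, Int]()
    .maximumWeightedCapacity(1000) // evict least-recently-used entries past 1000
    .build()

  // Wrap it as a mutable Scala Map; all operations delegate to the Java map
  val cache: scala.collection.mutable.Map[String, Int] = underlying.asScala

  cache.put("answer", 42)
  println(cache.get("answer")) // Some(42)
}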
In my project, I need to ship plenty of existing Java objects to Spark workers, most of which do not extend java.io.Serializable. I also want the ability to control which variables/attributes are included: I only want to serialize the useful attributes, not everything in an object.
The Spark documentation indicates that there are two ways to serialize an object in Spark: java.io.Serializable or Kryo. I think both ways require writing a bunch of wrappers or extra code for every business object. However, my current code base has already implemented protocol-buffers serialization, and I am wondering if there is any way to embed this serialization mechanism into Spark.
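One possibility, sketched below under the assumption that Order is one of your protobuf-generated classes (the name is made up), is to keep protocol buffers as the wire format and plug it into Spark through a custom Kryo serializer, so only the fields declared in the .proto file are shipped:

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical protobuf-generated message; substitute your own type
import com.example.protos.Order

// Delegate Kryo (de)serialization to the existing protobuf code
class OrderSerializer extends Serializer[Order] {
  override def write(kryo: Kryo, output: Output, order: Order) {
    val bytes = order.toByteArray
    output.writeInt(bytes.length, true)
    output.writeBytes(bytes)
  }
  override def read(kryo: Kryo, input: Input, clazz: Class[Order]): Order =
    Order.parseFrom(input.readBytes(input.readInt(true)))
}

class ProtobufRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Order], new OrderSerializer)
  }
}

You would then set spark.serializer to org.apache.spark.serializer.KryoSerializer and spark.kryo.registrator to ProtobufRegistrator in the Spark configuration.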
I am building an application using Scala 2.10, Salat, the Play framework 2.1-RC2 (will upgrade to the 2.1 release soon) and MongoDB.
This is a faceless application where JSON web services are exposed for consumers. Until now, JSON was converted into model objects directly using Play's Json API and implicit converters. I have to refactor some case classes to avoid the 22-field limit, so instead of a flat case class I now have embedded case classes (and an embedded MongoDB collection).
The web service interface should remain the same: clients should still pass JSON data in a flat structure as before, but the application needs to map it into the proper case class structure. What's the best way to handle this situation? I fear writing a lot of conversion code: flat JSON <-> complex case class structure <-> back to flat JSON output. I assume the 22-field case class limit has been faced by many others with this kind of requirement. How would you approach it?
The Play 2.1 JSON library relies heavily on combinators (path1 and path2). These combinators all have the same 22 restriction, which gives you two options:
Don't use combinators and construct your objects the hard way: path(json) will give you the value at that point in the path (see the sketch after this list). Searching for 'Accessing value of JsPath' in ScalaJsonCombinators will give more examples.
First transform the JSON into a structure that does not have more than 22 values in a single object, and then use the normal combinators. More information about transforming can be found here: ScalaJsonTransformers
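For the first option, a minimal sketch (the field names and case classes are made up) of building the nested structure by reading each value from the flat JSON directly:

import play.api.libs.json._

case class Address(street: String, city: String)
case class Customer(name: String, age: Int, address: Address)

object FlatJsonMapping {
  // Read each value out of the flat JSON and build the nested case classes,
  // bypassing the 22-limited combinators entirely
  def customerFromFlatJson(json: JsValue): Customer =
    Customer(
      name = (json \ "name").as[String],
      age = (json \ "age").as[Int],
      address = Address(
        street = (json \ "street").as[String],
        city = (json \ "city").as[String]
      )
    )
}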
I'm working on my first Scala application, where we use an ActiveRecord style to retrieve data from MongoDB.
I have models like User and Category, which all have a companion object that uses the trait:
class MongoModel[T <: IdentifiableModel with CaseClass] extends ModelCompanion[T, ObjectId]
ModelCompanion is a Salat trait which provides common MongoDB CRUD operations.
This permits retrieving data like this:
User.profile(userId)
I have never had any experience with this ActiveRecord query style, but I know Rails people use it, and I think I saw it in the Play documentation (version 1.2?) for dealing with JPA.
For now it works fine, but I want to be able to run integration tests on my MongoDB.
I can run an "embedded" MongoDB with a library. The big problem is that my host/port configuration is effectively hardcoded in the MongoModel class, which is extended by all the model companions.
I want to be able to specify a different host/port when I run integration tests (or any other "profile" I could create in the future).
I understand dependency injection well, having used Spring for many years in Java, and I know the drawbacks of all this static stuff in my application. I saw that there is now a Scala-friendly way to configure a Spring application, but I'm not sure using Spring is appropriate in Scala.
I have read some things about the Cake pattern and it seems to do what I want: a kind of typesafe, compile-time-checked Spring context.
Should I definitely go with the Cake pattern, or is there any other elegant alternative in Scala?
Can I keep using an ActiveRecord style or is it a total anti-pattern for testability?
Thanks
No static references: with the Cake pattern you get two different classes for the two environments, each overriding the "host/port" resource on its own. Create a trait containing your resources, inherit it twice (providing the actual host/port information depending on the environment), and mix it into the appropriate companion objects (one for prod, one for test). Inside MongoModel, add a self type that is your new trait, and refactor all host/port references in MongoModel to use that self type (your new trait).
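A minimal sketch of that idea (all names are illustrative):

// The abstract resource: what MongoModel needs, not where it comes from
trait MongoConfig {
  def mongoHost: String
  def mongoPort: Int
}

// One concrete configuration per environment
trait ProdMongoConfig extends MongoConfig {
  val mongoHost = "db.example.com"
  val mongoPort = 27017
}

trait TestMongoConfig extends MongoConfig {
  val mongoHost = "localhost"
  val mongoPort = 27018 // the embedded MongoDB used in integration tests
}

// The self type declares the dependency; host/port are resolved at
// compile time by whichever config trait the companion mixes in
abstract class MongoModel[T] { self: MongoConfig =>
  lazy val connectionUri = "mongodb://" + mongoHost + ":" + mongoPort
}

object User extends MongoModel[String] with ProdMongoConfig
// In tests: object TestUser extends MongoModel[String] with TestMongoConfig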
I'd definitely go with the Cake Pattern.
You can read the following article, which shows an example of how to use the Cake pattern in a Play 2 application:
http://julien.richard-foy.fr/blog/2011/11/26/dependency-injection-in-scala-with-play-2-it-s-free/
I need to store a Scala class in Morphia. With annotations it works well unless I try to store a collection of _ <: Enumeration values.
Morphia complains that it does not have serializers for that type, and I am wondering how to provide one. For now I changed the type of the collection to Seq[String] and fill it by invoking toString on every item in the collection.
That works well; however, I'm not sure it is the right way.
This problem is common to several of the available abstraction layers on top of MongoDB. It all comes back to one root cause: there is no enum equivalent in JSON/BSON. Salat, for example, has the same problem.
In fact, the MongoDB Java driver does not support enums, as you can read in the ongoing discussion here: https://jira.mongodb.org/browse/JAVA-268 where you can see the issue is still open. Most of the frameworks I have seen for using MongoDB from Java do not implement low-level functionality such as this. I think this choice makes a lot of sense: they leave you the choice of how to deal with data structures the low-level driver does not handle, instead of imposing one on you.
In general, I feel the absence of support comes not from a technical limitation but from a design choice. For enums there are multiple ways to map them, each with pros and cons, while for other data types it is probably simpler. I don't know the MongoDB Java driver in detail, but I guess supporting multiple "modes" would have required some refactoring (maybe that's why they are talking about a new version of serialization?).
These are two strategies I am thinking about:
If you want to index on an enum and minimize space usage, map the enum to an integer (not the ordinal: in Java you can give each enum constant an explicit value, so reordering constants does not corrupt stored data).
If your concern is queryability from the mongo shell, because your data will be accessed by data scientists, you would rather store the enum using its string value.
To conclude, there is nothing wrong with adding an intermediate data structure between your native objects and MongoDB. Salat supports this through custom transformers; with Morphia you may need to do the conversion explicitly. Go for it.
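A minimal sketch of the string-based strategy the question already uses (the Color enumeration is made up), with the conversion made explicit in both directions:

object Color extends Enumeration {
  val Red, Green, Blue = Value
}

object EnumMapping {
  // To MongoDB: store the stable string name, never the ordinal
  def toDb(values: Seq[Color.Value]): Seq[String] = values.map(_.toString)

  // From MongoDB: withName throws NoSuchElementException on unknown names
  def fromDb(names: Seq[String]): Seq[Color.Value] = names.map(Color.withName)
}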
I'm trying to find the 'right' actor implementation. I realized there is a bunch of them and it's a bit confusing to pick one. Personally I'm especially interested in remote actors, but I guess a complete overview would be helpful to many others. This is a pretty general question, so feel free to answer just for the implementation you know about.
I know about the following Scala Actor implementations (SAI). Please add the missing ones.
Scala 2.7 (differences to 2.8?)
Scala 2.8
Akka (http://www.akkasource.org/)
Lift (http://liftweb.net/)
Scalaz (http://code.google.com/p/scalaz/)
What are the target use-cases for these SAIs (lightweight vs. "heavy" enterprise framework)?
Do they support remote actors? What shortcomings do remote actors have in the SAIs?
How is their performance?
How active is their community?
How easy are they to get started? How good is the documentation?
How easy are they to extend?
How stable are they? Which projects are using them?
What are their shortcomings?
What are their design principles?
Are they thread-based or event-based (receive/react), or both?
Nested receives?
Hotswapping the actor's message loop?
This is the most comprehensive comparison I have seen so far:
http://doc.akka.io/docs/misc/Comparison_between_4_actor_frameworks.pdf via http://klangism.tumblr.com/post/2497057136/all-actors-in-scala-compared
As of Scala 2.10, Scala Actors is deprecated and Akka actors are part of the standard distribution.
Scala 2.7.7 vs 2.8, from the Scala 2.8.0 RC3 announcement:
New Reactors provide more lightweight, purely event-based actors with optional, implicit sender identification. Support for actors with daemon-style semantics was added. Actors can be configured to use the efficient JSR166y fork/join pool, resulting in significant performance improvements on 1.6 JVMs. Schedulers are now pluggable and easier to customize.
There's also a design paper by Haller: Scala Actors: Unifying Thread-based and Event-based Programming.
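A minimal sketch of that thread-based/event-based distinction using the old scala.actors API:

import scala.actors.Actor._

object ReceiveVsReact extends App {
  // Thread-based: receive blocks the underlying thread until a message arrives
  val threadBased = actor {
    receive { case msg: String => println("received " + msg) }
  }

  // Event-based: react suspends the actor without holding a thread; it never
  // returns normally, so any continuation goes inside the handler
  val eventBased = actor {
    react { case msg: String => println("reacted to " + msg) }
  }

  threadBased ! "hello"
  eventBased ! "world"
}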
As far as I know, only Scala and Akka support remote actors.
Akka is backed by Scalable Solutions, which offers commercial support and plug-ins for Akka.
Akka seems like a heavyweight solution, which targets integration with existing frameworks (Camel, AMQP, JTA, Comet, Spring, Redis) and additionally offers STM and persistence.
Compared to Scala Actors, Akka doesn't support nested receives, but it supports hotswapping the actor's message loop and has both thread-based and event-based actors, plus so-called "event-based single-threaded" ones.
I realized that Akka enforces exhaustive matches: even though receive technically expects a partial function, the function must not actually be partial. This means you have to handle every message immediately.
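A sketch of both points (exhaustive handling plus hotswapping the message loop) using the later akka.actor API:

import akka.actor.{Actor, ActorSystem, Props}

class Worker extends Actor {
  def receive = {
    case "swap" =>
      // Hotswap the message loop: later messages are handled by 'swapped'
      context.become(swapped)
    case msg: String =>
      println("normal: " + msg)
    case other =>
      println("unexpected: " + other) // catch-all keeps the match exhaustive
  }

  def swapped: Receive = {
    case msg => println("swapped: " + msg)
  }
}

object Demo extends App {
  val system = ActorSystem("demo")
  val worker = system.actorOf(Props[Worker], "worker")
  worker ! "hello" // normal: hello
  worker ! "swap"
  worker ! "hello" // swapped: hello
}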