Why does Breeze use Array to represent a matrix? - scala

The class DenseMatrix has a parameter data of type Array[V]. Why not using some other mutable collection that can grow dynamically, such as Vector?

The comments (from Raphael Roth and Jasper-M) both make good points and they're both part of the reason. Breeze uses netlib-java for handling its interface to native BLAS via JNI, and it uses arrays. (It could have been implemented in terms of Java Buffers, but they didn't.) Dynamically resizing DenseMatrices doesn't make a lot of sense, and the implementations of DM and DV are deliberately similar.
Arrays also have way better performance characteristics than any other built-in collection-like thing in Java and in Scala, and since Breeze cares about being fast, it's the best choice. All generic collections in both scala and java box primitive elements, which is totally unacceptable in performance sensitive environments. (I could have rolled my own specialized ArrayBuffer like thing using Scala's specialized, but I didn't.) Also, java Vector synchronizes all accesses and so it is particularly unacceptable unless you actually need the locking.
You can use VectorBuilder (which has a settable length parameter and can be set to -1 to turn off bounds checking) if you're unsure of the dimensionality of your data set.

Related

Pattern to bridge the gap between Scalas functional immutable style and JavaFX 2 Properties?

currently working on a GUI application using JavaFX 2 as framework. Used in Java allready and know the principles of data binding.
As the functional style programming in scala advocates the use of imutable values (vals), there is a gap.
Is there an other solution than having an mutable fx-property based presentation model for the gui and and immutable model for application logic with an conversion layer?
Greets,
Andreas
Since your question is a bit vague, please forgive if this is largely based on personal opinion: There are, to my knowledge, no other approaches to the mutable property model. I would however argue that you don't want one:
First of, functional programming, at least from a puristic point of view, attempts to avoid side effects. User interfaces, however, are exclusively about causing side effects. There is a slight philosophical mismatch to begin there.
One of the main benefits of immutable data is that you don't have to deal with control structures to avoid concurrent modification. However, JavaFX's event queue implements a very strict single-threaded approach with an implicit control of read and write access. On the other hand, user interface components fit the idea of mutable objects better than most other fields of programming. The node structure is, after all, an inherent hierarchy of stateful components.
Considering this, I think trying to force a functional and immutable paradigm on JavaFX is not going to work out. Instead, I would recommend building a translation layer based on keypath selections - e.g. binding a Label to display an (immutable) Person's name to the Person, not the name property, and having a resolver handle the access to the name attribute. Basically, this would mean having a combination of Bindings#select and a JavaBeanStringProperty.

Scala, Morphia and Enumeration

I need to store Scala class in Morphia. With annotations it works well unless I try to store collection of _ <: Enumeration
Morphia complains that it does not have serializers for that type, and I am wondering, how to provide one. For now I changed type of collection to Seq[String], and fill it with invoking toString on every item in collection.
That works well, however I'm not sure if that is right way.
This problem is common to several available layers of abstraction on the top of MongoDB. It all come back to a base reason: there is no enum equivalent in json/bson. Salat for example has the same problem.
In fact, MongoDB Java driver does not support enums as you can read in the discussion going on here: https://jira.mongodb.org/browse/JAVA-268 where you can see the problem is still open. Most of the frameworks I have seen to use MongoDB with Java do not implement low-level functionalities such as this one. I think this choice makes a lot of sense because they leave you the choice on how to deal with data structures not handled by the low-level driver, instead of imposing you how to do it.
In general I feel that the absence of support comes not from technical limitation but rather from design choice. For enums, there are multiple way to map them with their pros and their cons, while for other data types is probably simpler. I don't know the MongoDB Java driver in detail, but I guess supporting multiple "modes" would have required some refactoring (maybe that's why they are talking about a new version of serialization?)
These are two strategies I am thinking about:
If you want to index on an enum and minimize space occupation, you will map the enum to an integer ( Not using the ordinal , please can set enum start value in java).
If your concern is queryability on the mongoshell, because your data will be accessed by data scientist, you would rather store the enum using its string value
To conclude, there is nothing wrong in adding an intermediate data structure between your native object and MongoDB. Salat support it through CustomTransformers, on Morphia maybe you would need to do the conversion explicitely. Go for it.

how to access complex data structures in Scala while preserving immutability?

Calling expert Scala developers! Let's say you have a large object representing a writable data store. Are you comfortable with this common Java-like approach:
val complexModel = new ComplexModel()
complexModel.modify()
complexModel.access(...)
Or do you prefer:
val newComplexModel = complexModel.withADifference
newComplexModel.access(...)
If you prefer that, and you have a client accessing the model, how is the client going
to know to point to newComplexModel rather than complexModel? From the user's perspective
you have a mutable data store. How do you reconcile that perspective with Scala's emphasis
on immutability?
How about this:
var complexModel = new ComplexModel()
complexModel = complexModel.withADifference
complexModel.access(...)
This seems a bit like the first approach, except that it seems the code inside withADifference is going to have to do more work than the code inside modify(), because it has to create a whole new complex data object rather than modifying the existing one. (Do you run into this problem of having to do more work in trying to preserve
immutability?) Also, you now have a var with a large scope.
How would you decide on the best strategy? Are there exceptions to the strategy you would choose?
I think the functional way is to actually have Stream containing all your different versions of your datastructure and the consumer just trying to pull the next element from that stream.
But I think in Scala it is an absolutely valid approach to a mutable reference in one central place and change that, while your whole datastructure stays immutable.
When the datastructure becomes more complex you might be interested in this question: Cleaner way to update nested structures which asks (and gets answered) how to actually create new change versions of an immutable data structure that is not trivial.
By such name of method as modify only it's easy to identify your ComplexModel as a mutator object, which means that it changes some state. That only implies that this kind of object has nothing to do with functional programming and trying to make it immutable just because someone with questionable knowledge told you that everything in Scala should be immutable will simply be a mistake.
Now you could modify your api so that this ComplexModel operated on immutable data, and I btw think you should, but you definitely must not try to convert this ComplexModel into immutable itself.
The canonical answer to your question is using Zipper, one SO question about it.
The only implementation for Scala I know of is in ScalaZ.
Immutability is merely a useful tool, not dogma. Situations will arise where the cost and inconvenience of immutability outweigh its usefulness.
The size of a ComplexModel may make it so that creating a modified copy is sufficiently expensive in terms of memory and/or CPU that a mutable model is more practical.

How do I implement a collection in Scala 2.8?

In trying to write an API I'm struggling with Scala's collections in 2.8(.0-beta1).
Basically what I need is to write something that:
adds functionality to immutable sets of a certain type
where all methods like filter and map return a collection of the same type without having to override everything (which is why I went for 2.8 in the first place)
where all collections you gain through those methods are constructed with the same parameters the original collection had (similar to how SortedSet hands through an ordering via implicits)
which is still a trait in itself, independent of any set implementations.
Additionally I want to define a default implementation, for example based on a HashSet. The companion object of the trait might use this default implementation. I'm not sure yet if I need the full power of builder factories to map my collection type to other collection types.
I read the paper on the redesign of the collections API but it seems like things have changed a bit since then and I'm missing some details in there. I've also digged through the collections source code but I'm not sure it's very consistent yet.
Ideally what I'd like to see is either a hands-on tutorial that tells me step-by-step just the bits that I need or an extensive description of all the details so I can judge myself which bits I need. I liked the chapter on object equality in "Programming in Scala". :-)
But I appreciate any pointers to documentation or examples that help me understand the new collections design better.
I'd have a look at the implementation of collection.immutable.BitSet. It's a bit spread out, reusing things from collection.BitSetLike and collection.generic.BitSetFactory. But it does exactly what you specified: implement an immutable set of a certain element type that adds new functionality.

NSDictionaries vs. custom objects with properties, what's your take?

I'm writing an App that basically uses 5 business entities, A, B C, D and E
A has some properties and holds a list of B's
B has some other properties and a list of C's and a list of D's
C has some other properties and a list of D's and a list of E's
D has only a few properties
E has only a few properties
There is no inheritance between any of them.
There's no real business logic involved, the objects are created, populated, and then accessed read-only, no further manipulations.
My natural coding style would be to go object oriented and write classes for each of those entities, use NSArrays for the lists, and have the mentioned properties synthesized.
It would make the code readable.
But another approach seems obvious too: only use NSDictionaries and NSArrays, and working with keys/values instead of properties. This seems more efficient, and somehow "closer" to iPhone-style programming to me... but obviously leads to less readable code. Another advantage is there's no additional custom encoding/decoding for serialization required (persisting state to disk, using JSON, ...)
So on the paper, it speaks for the latter approach, on the other hand, it still feels somehow awkward NOT to use custom objects...
Is this really just a matter of taste question? Or are there maybe other arguments in favour/against one of the approaches? Is only using Dictionaries better memory/performance-wise? Is it the preferred "Apple Coding Style"? (I'm coming from Java/C#).
I don't see much difference between Java/C# and Cocoa in this area. Your question is equivalently applicable to those platforms as well (the same also applies to key-value stores and relational stores).
In an object oriented environment, you have to make a trade-off between the flexibility of the key-value approach for storing data and the structured and object oriented style. I'd go with the key-value approach only when I need the flexibility (e.g. the structure is dynamic and might change by user or not known at compile time). Otherwise, taking that route might get you completely off the OOP conventions and benefits (By the way, this is the important point. Does the hassle of sticking to object oriented principles worth it for that specific circumstance? I think your question reduces to this one and to answer it, you should analyze your specific situation)
It largely depends on whether your objects are just collections of data (key/value pairs) or implement their own functionality.
If they're data I'd say go with NSDictionary, it's a lot less code and as you point out you won't have to write serialization routines for each class.
Use a hybrid approach. Store the dictionaries the objects are based on, but expose the most-used values as properties that are either filled when the object is initialized from a dictionary, or have the accessors look into the dictionary for values (less efficient).
Also provide a property to get at the dictionary. This way if you need to propagate a new value quickly to a specific area of the code from the dictionary (presumably a new value added by the server) you have that flexibility. Then if callers are making heavy use of a value you can migrate it to be a true property and get the completion and type checking of a property.