Transient collections for Scala?

Clojure has a very nice concept of transient collections. Is there a library providing those for Scala (or F#)?

This sounds like a really great concept for a language like F#. Thanks for an interesting link!
When programming with arrays, F# programmers use exactly the same pattern. For example, create a mutable array, initialize it imperatively, return it, and then work with it using functions that treat it as immutable, such as Array.map (even though the array can actually be mutated, as there is no transient array).
Using the seq<'a> type: One way to do something similar is to convert the data structure to a generic sequence (seq<'a>), which is an immutable type, so you cannot (directly) modify the original data structure via seq<'a>. For example:
let test () =
    let arr = Array.create 10 0
    for i in 0 .. (arr.Length - 1) do
        arr.[i] <- i * i // some calculation (placeholder value)
    Array.toSeq arr
The good thing is that the conversion is usually O(1) (arrays, lists, etc. implement seq<'a> as an interface, so this is just a cast). However, seq<'a> doesn't preserve the properties of the source collection (such as efficiency), and you can process it only using the generic functions for working with sequences (from the Seq module). Still, I think this is relatively close to the transient collections pattern.
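Since the question asks about Scala, the analogous trick there is to build in a mutable buffer and return it typed as the read-only collection.Seq interface. A minimal sketch (the upcast is O(1), and, as with seq<'a>, the owner can still mutate the underlying buffer):

import scala.collection.mutable.ArrayBuffer

def test(): collection.Seq[Int] = {
  val buf = ArrayBuffer.fill(10)(0)   // mutable while we initialize it
  for (i <- buf.indices)
    buf(i) = i * i                    // some calculation (placeholder)
  buf                                 // upcast to the read-only interface: O(1)
}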
A similar .NET type is ReadOnlyCollection<'a>, which wraps a collection type (somewhat more powerful than seq<'a>) in an immutable wrapper whose collection-modifying operations throw exceptions.
Related types: For more complicated kinds of collections, F#/.NET usually has both a mutable and an immutable implementation (the immutable one coming from the F# libraries). The types are usually quite different, but they sometimes share a common interface. This makes it possible to use one type while you are mutating and convert it to the other once you know you won't need mutation anymore. However, here you need to copy data between different structures, so the conversion definitely isn't O(1); it will be something between O(n) and O(n log n).
Examples of such paired collections are the mutable Dictionary<'Key, 'Value> with the immutable Map<'Key, 'Value>, and the mutable HashSet<'T> or SortedSet<'T> with the immutable Set<'T> (from the F# libraries).
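Scala has the same pairing, e.g. mutable.Map versus the immutable Map, with an O(n) conversion at the boundary. A minimal sketch:

import scala.collection.mutable

def build(): Map[String, Int] = {
  val m = mutable.Map[String, Int]()
  m("one") = 1        // imperative initialization
  m("two") = 2
  m.toMap             // copies into an immutable Map: O(n)
}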

I don't know of any library for this in F# (nothing in the standard library, and I don't recall seeing anyone blog about anything like this, though there are a number of libraries for simple persistent/immutable structures). It would be great for some third party to create such a library. Rich Hickey is the man these days when it comes to these awesome, practical, (mostly) functional data structures; I love reading about that stuff.

Please have a look at the following post by Daniel Spiewak:
http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala
He also ported Rich Hickey's algorithm to Scala. The article also mentions IntMap, which is almost as fast as the Clojure implementation.


Scala - Does pattern matching break the Open-Closed principle? [duplicate]

If I add a new case class, does that mean I need to search through all of the pattern matching code and find out where the new class needs to be handled? I've been learning the language recently, and as I read about some of the arguments for and against pattern matching, I've been confused about where it should be used. See the following:
Pro: Odersky1 and Odersky2
Con: Beust
The comments are pretty good in each case, too. So is pattern matching something to be excited about or something I should avoid using? Actually, I imagine the answer is "it depends on when you use it," but what are some positive use cases for it and what are some negative ones?
Jeff, I think you have the right intuition: it depends.
Object-oriented class hierarchies with virtual method dispatch are good when you have a relatively fixed set of methods that need to be implemented, but many potential subclasses that might inherit from the root of the hierarchy and implement those methods. In such a setup, it's relatively easy to add new subclasses (just implement all the methods), but relatively difficult to add new methods (you have to modify all the subclasses to make sure they properly implement the new method).
Data types with functionality based on pattern matching are good when you have a relatively fixed set of classes that belong to a data type, but many potential functions that operate on that data type. In such a setup, it's relatively easy to add new functionality for a data type (just pattern match on all its classes), but relatively difficult to add new classes that are part of the data type (you have to modify all the functions that match on the data type to make sure they properly support the new class).
The canonical example for the OO approach is GUI programming. GUI elements need to support very little functionality (drawing themselves on the screen is the bare minimum), but new GUI elements are added all the time (buttons, tables, charts, sliders, etc). The canonical example for the pattern matching approach is a compiler. Programming languages usually have a relatively fixed syntax, so the elements of the syntax tree will change rarely (if ever), but new operations on syntax trees are constantly being added (faster optimizations, more thorough type analysis, etc).
Fortunately, Scala lets you combine both approaches. Case classes can both be pattern matched and support virtual method dispatch. Regular classes support virtual method dispatch and can be pattern matched by defining an extractor in the corresponding companion object. It's up to the programmer to decide when each approach is appropriate, but I think both are useful.
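For example, here is a minimal sketch of a type that supports both axes (names are illustrative):

sealed trait Shape { def area: Double }   // virtual dispatch: easy to add subclasses

case class Circle(r: Double) extends Shape { def area = math.Pi * r * r }
case class Rect(w: Double, h: Double) extends Shape { def area = w * h }

// pattern matching: easy to add new operations over the fixed set of cases
def describe(s: Shape): String = s match {
  case Circle(r)  => "circle of radius " + r
  case Rect(w, h) => "rectangle " + w + " by " + h
}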
While I respect Cedric, he's completely wrong on this issue. Scala's pattern matching can be fully encapsulated against class changes when desired. While it is true that a change to a case class would require changing any corresponding pattern-matching sites, that is only the case when such classes are used in a naive fashion.
Scala's pattern matching always delegates to the deconstructor of a class's companion object. With a case class, this deconstructor is automatically generated (along with a factory method in the companion object), though it is still possible to override this auto-generated version. At all times, you can assert complete control over the pattern matching process, insulating any patterns from potential changes in the class itself. Thus, pattern matching is simply another way of accessing class data through the safe filter of encapsulation, just like any other method.
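A sketch of what that looks like with a hand-written extractor (names are illustrative):

class Person(val fullName: String)   // the representation may change freely

object Person {
  // Patterns go through this stable view rather than the fields themselves,
  // so reworking Person's internals never breaks the match sites.
  def unapply(p: Person): Option[(String, String)] = {
    val parts = p.fullName.split(" ", 2)
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

def greet(p: Person): String = p match {
  case Person(first, _) => "Hello, " + first
  case _                => "Hello"
}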
So, Dr. Odersky's opinion would be the one to trust here, particularly given the sheer volume of research he has performed in the area of object-oriented programming and design.
As for where it should be used, that is entirely according to taste. If it makes your code more concise and maintainable, use it! Otherwise, don't. For most object-oriented programs, pattern matching is unnecessary. However, once you begin to integrate more functional idioms (Option, List, etc.), I think you'll find that pattern matching significantly reduces syntactic overhead and improves the safety offered by the type system. In general, any time you want to extract data while simultaneously testing some condition (e.g. extracting a value from Some), pattern matching will likely be of use.
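For instance, a minimal sketch (hypothetical names):

def lookup(prices: Map[String, Int], key: String): String =
  prices.get(key) match {
    case Some(p) => "costs " + p    // test for presence and extract in one step
    case None    => "not available"
  }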
Pattern matching is definitely good if you are doing functional programming. In the case of OO, there are some situations where it is good. In Cedric's example itself, it depends on how you view the print() method conceptually: is it a behavior of each Term object, or something outside it? I would say it is outside, so it makes sense to do pattern matching. On the other hand, if you have an Employee class with various subclasses, it is a poor design choice to pattern match on one of its attributes (say, name) in the base class.
Also, pattern matching offers an elegant way of unpacking the members of a class.
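For example:

case class Point(x: Int, y: Int)

val Point(px, py) = Point(3, 4)   // binds px = 3 and py = 4 in one step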

What is a multi-pass Sequence in Swift?

I am looking at this language from the Swift GeneratorType documentation and I'm having a hard time understanding it:
Any code that uses multiple generators (or for...in loops) over a single sequence should have static knowledge that the specific sequence is multi-pass, either because its concrete type is known or because it is constrained to CollectionType. Also, the generators must be obtained by distinct calls to the sequence's generate() method, rather than by copying.
What does it mean for a sequence to be "multi-pass"? This language seems quite important, but I can't find a good explanation for it. I understand, for example, the concept of a "multi-pass compiler", but I'm unsure if the concepts are similar or related...
Also, I have searched SO for other posts that answer this question. I have found this one, which makes the following statement in the C++ context:
The difference between algorithms that copy their iterators and those that do not is that the former are termed "multipass" algorithms, and require their iterator type to satisfy ForwardIterator, while the latter are single-pass and only require InputIterator.
But the meaning of that isn't entirely clear to me either, and I'm not sure if the concept is the same in Swift.
Any insight from those wiser than me would be much appreciated.
A "multi-pass" sequence is one that can be iterated over multiple times via a for...in loop or by using any number of generators (constructed via generate())
The text explains that you would know a sequence is multi-pass because you either
know its concrete type (perhaps a class you designed), or
know that it is constrained to CollectionType (for example, sets and arrays).
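The concept isn't Swift-specific; the same distinction can be sketched in Scala, where Iterator is single-pass and collections such as List are multi-pass:

val it = Iterator(1, 2, 3)   // single-pass: one shared traversal state
println(it.sum)              // 6
println(it.sum)              // 0: the iterator is already exhausted

val xs = List(1, 2, 3)       // multi-pass: each traversal starts fresh
println(xs.sum)              // 6
println(xs.sum)              // 6, safe to repeat any number of times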

Why is elt not as common as car, cdr, first, rest?

I see this a lot in examples I read in books and articles:
(caddr *something*)
Or the many variants of c***r commands.
It seems a bit ridiculous to me, when you can more clearly just pull things out with elt:
(elt *something* 2)
But I don't see this technique used as much.
Is there a convention I don't understand that prefers the c***r functions?
elt is a generic function that works on both lists and arrays. It is needed when you want to write a generic algorithm that works the same way on both data types. But there aren't many such algorithms, because indexed access is O(n) on lists, so generic index-based code would usually disadvantage them.
The general convention seems to be that:
If you write a general-purpose function that works on lists, you would use the c(a|d)+r functions (these cases are exceedingly rare, because most of the time there is a library function for what you want). This usually happens in code in Stack Overflow questions, code for class assignments, etc. :)
Seasoned Lisp programmers use first, second, etc. where possible. This is also sometimes mentioned in best-practice guides, which usually add that one should create appropriate data structures instead of dealing with non-trivial lists.
nth and elt are indeed rare, because it's hard to think of a good use case for them. I can imagine both being used in macros, where performance isn't important but some kind of generality is desirable, for instance if someone wanted to deal with strings and lists of characters in the same way. Maybe in some prototyping code, where the programmer isn't yet sure which data type they are going to use, but that's about it.
Functions in the c***r family (caddr, caadr, and so on) can extract components from lists nested inside other lists. But elt only operates on the top-level list: it can return a nested list in its entirety, but you would need to nest elt calls to extract inner components, which is what the c***r functions are ultimately doing, so the two are not really comparable.
In some cases you can interchange them, since they return the same results when your list has no lists inside it.

Scala binary serialization library

I'm interested in canvassing opinion about options for Scala data-structure serialization. I'd like to find something developed enough to allow (if possible) efficient binary serialization of the Scala collection types (i.e. not using generic Java reflection; I don't want to be serializing all parts of a collection class, including internal book-keeping data), but which also allows me to extend the functionality for my own purposes/classes: I am more than happy to write serialization code for each of our own classes, but would rather not have to do it for collections from the Scala standard library. In C++ I get a lot of this functionality from the Boost serialization library.
I've used SBinary in the past and it does some of what I want, but it is not getting obvious active maintenance and doesn't seem (AFAIK) to keep track of objects already serialized (e.g. for DAGs or cyclic data structures).
Are there other Scala-specific solutions, or if not, what are your recommendations for efficient binary serialization?
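For concreteness, the kind of API I'm after looks roughly like the following hand-rolled sketch (in the spirit of SBinary/Boost; not any particular library's actual interface):

import java.io.{ByteArrayOutputStream, DataInputStream, DataOutputStream}

trait Format[T] {
  def write(out: DataOutputStream, t: T): Unit
  def read(in: DataInputStream): T
}

object Format {
  implicit object IntFormat extends Format[Int] {
    def write(out: DataOutputStream, t: Int): Unit = out.writeInt(t)
    def read(in: DataInputStream): Int = in.readInt()
  }
  // One generic instance covers every List[T]; collections need no
  // per-class boilerplate, only our own element types do.
  implicit def listFormat[T](implicit f: Format[T]): Format[List[T]] =
    new Format[List[T]] {
      def write(out: DataOutputStream, ts: List[T]): Unit = {
        out.writeInt(ts.length)
        ts.foreach(f.write(out, _))
      }
      def read(in: DataInputStream): List[T] =
        List.fill(in.readInt())(f.read(in))
    }
}

def toBytes[T](t: T)(implicit f: Format[T]): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  f.write(new DataOutputStream(bos), t)
  bos.toByteArray
}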
If you only need to serialize data, and not whole Java objects, the best solutions are probably:
msgpack (implementation),
BSON (implementation),
Protocol Buffers.
I am using msgpack and BSON in several projects and they work pretty well. I really recommend msgpack: it has the most efficient binary representation of the three and is fully JSON-compatible.
A protocol buffers compiler for Scala: https://github.com/SandroGrzicic/ScalaBuff - perhaps this can help?
There's a couple of other links at the bottom of this page: http://doc.akka.io/docs/akka/snapshot/scala/serialization.html

Can you suggest any good intro to Scala philosophy and program design?

In Java and C++, designing a program's object hierarchy is pretty obvious. But starting out in Scala, I find it difficult to decide which classes to define to better employ Scala's syntactic-sugar facilities (and I have even less idea how to design for better performance). Any good readings on this question?
I have read 4 books on Scala, but I have not found what you are asking for. I guess you have read "Programming in Scala" by Odersky (Artima) already. If not, this is a link to the on-line version:
http://www.docstoc.com/docs/8692868/Programming-In-Scala
This book gives many examples how to construct object-oriented models in Scala, but all examples are very small in number of classes. I do not know of any book that will teach you how to structure large scale systems using Scala.
Imperative object-orientation has been around since Smalltalk, so we know a lot about this paradigm. Functional object-orientation, on the other hand, is a rather new concept, so in a few years I expect books describing large-scale FOO systems to appear. Anyway, I think the PiS book gives you a pretty good picture of how you can put together the basic building blocks of a system: the Factory pattern, how to replace the Strategy pattern with function literals, and so on.
One thing that Viktor Klang once told me (and something I really agree with) is that one difference between C++/Java and Scala OO is that you define a lot more (smaller) classes when you use Scala. Why? Because you can! The syntactic sugar for case classes results in a very small penalty for defining a class, both in typing and in readability of the code. And as you know, many small classes usually mean better OO (fewer bugs) but worse performance.
One other thing I have noticed is that I use the factory pattern a lot more when dealing with immutable objects, since every "change" to an instance results in creating a new instance. Thank God for the copy() method on case classes; it makes the factory methods a lot shorter.
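For example (a trivial sketch):

case class User(name: String, age: Int)

val u1 = User("Ann", 30)
val u2 = u1.copy(age = 31)   // a brand-new instance; u1 is untouched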
I do not know if this helped you at all, but I think this subject is very interesting myself, and I too await more literature on this subject.
Cheers!
This is still an evolving matter. For instance, the just-released Scala 2.8.0 brought support for type-constructor inference, which enabled the type-class pattern in Scala. The Scala library itself has only just begun using this pattern. Just yesterday I heard of a new Lift module in which they are going to try to avoid inheritance in favor of type classes.
Scala 2.8.0 also introduced lower-priority implicits, plus default and named parameters, both of which can be used, separately or together, to produce very different designs than were possible before.
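For illustration, the type-class pattern in Scala looks roughly like this (a minimal sketch with made-up names):

trait Show[T] { def show(t: T): String }

object Show {
  implicit object ShowInt extends Show[Int] {
    def show(t: Int): String = t.toString
  }
}

// the implicit parameter selects the right instance by type
def describe[T](t: T)(implicit s: Show[T]): String = s.show(t)

describe(42)   // "42"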
And if we go back in time, we note that other important features are not that old either:
Extractor methods on case-class companion objects were introduced in February 2008 (before that, the only way to do extraction on case classes was through pattern matching).
Lazy values and structural types were introduced in July 2007.
Support for type constructors in abstract types was introduced in May 2007.
Extractors for non-case classes were introduced in January 2007.
It seems that implicit parameters were only introduced in March 2006, when they replaced the way views were implemented.
All that means we are all still learning how to design Scala software. Be sure to rely on tested designs from the functional and object-oriented paradigms, and to look at how new features in Scala are used in other languages: type classes in Haskell, or default (optional) and named parameters in Python.
Some people dislike this aspect of Scala, others love it. But other languages share it. C# is adding features as fast as Scala. Java is slower, but it goes through changes too. It added generics in 2004, and the next version should bring some changes to better support concurrent and parallel programming.
I don't think there are many tutorials for this. I'd suggest staying with the way you do it now, but also looking through "idiomatic" Scala code and paying special attention to the following cases:
use case classes or case objects instead of enums or "value objects"
use objects for singletons
if you need behavior "depending on the context" or dependency-injection-like functionality, use implicits
when designing a type hierarchy or if you can factor things out of a concrete class, use traits when possible
Fine-grained inheritance hierarchies are OK; keep in mind that you have pattern matching
Know the "pimp my library" pattern (see the sketch after this list)
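A quick sketch of two of these idioms (illustrative names):

// case objects instead of enums
sealed trait Color
case object Red extends Color
case object Green extends Color

// "pimp my library": add a method to an existing type via an implicit conversion
class RichIntOps(n: Int) { def squared: Int = n * n }
implicit def toRichIntOps(n: Int): RichIntOps = new RichIntOps(n)

// with the conversion in scope, 3.squared evaluates to 9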
And ask as many questions as you feel you need to understand a certain point. The Scala community is very friendly and helpful. I'd suggest the Scala mailing list, Scala IRC or scala-forum.org
I've just accidentally googled to a file called "ScalaStyleGuide.pdf". Going to read...