What is the recommended replacement for LinkedList in Scala 2.11+ - scala

A LinkedList is needed for a custom LRU implementation. When looking at the source we see:
@deprecated("Low-level linked lists are deprecated due to idiosyncrasies in interface and incomplete features.", "2.11.0")
class LinkedList[A]() extends AbstractSeq[A]
                         with LinearSeq[A]
                         with GenericTraversableTemplate[A, LinkedList]
                         with LinkedListLike[A, LinkedList[A]]
                         with Serializable {
So what is the recommended alternative? (None is mentioned in the deprecation message.) Do we fall back to java.util.LinkedList? I am guessing there is a better option.
Update: The specific characteristic of LinkedList that is needed is the ability to access an individual entry in O(1), in order to insert/remove elements in the list efficiently. This would require that a LinkedListEntry (or Node, or similar) reference be exposed and returned upon creation of a new element in the list. It appears none of the available implementations - including java.util.LinkedList - are suitable.
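For reference, the O(1) entry-access requirement is straightforward to meet with a small hand-rolled list whose append operation returns the node itself. A minimal sketch (all names here are hypothetical, not from any library):

```scala
// Doubly linked list that exposes its nodes, so a caller holding a Node
// reference can remove it in O(1) without searching the list.
final class Node[A](var value: A) {
  var prev: Node[A] = null
  var next: Node[A] = null
}

final class NodeList[A] {
  private var head: Node[A] = null
  private var tail: Node[A] = null

  // append returns the Node so callers can unlink it later in O(1)
  def append(a: A): Node[A] = {
    val n = new Node(a)
    if (tail == null) { head = n; tail = n }
    else { tail.next = n; n.prev = tail; tail = n }
    n
  }

  def remove(n: Node[A]): Unit = {
    if (n.prev != null) n.prev.next = n.next else head = n.next
    if (n.next != null) n.next.prev = n.prev else tail = n.prev
    n.prev = null; n.next = null
  }

  def toList: List[A] = {
    var acc = List.empty[A]
    var cur = tail
    while (cur != null) { acc = cur.value :: acc; cur = cur.prev }
    acc
  }
}
```

This is exactly the structure an LRU needs: the cache map stores the Node alongside the value, so a hit can move the node to the tail in constant time.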

LinkedHashMap probably fits better than the others, but note that there is spray-caching, which is a really good implementation; it uses something called ConcurrentLinkedHashMap, which is a better fit for Scala as it provides high-performance concurrency.
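If you go the LinkedHashMap route, note that java.util.LinkedHashMap supports LRU behaviour directly: constructing it with accessOrder = true reorders entries on each access, and overriding removeEldestEntry gives you eviction. A minimal, non-thread-safe sketch:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Capacity-bounded LRU cache. accessOrder = true makes get/put move the
// touched entry to the tail of the iteration order; removeEldestEntry then
// evicts the head (the least recently used entry) once capacity is exceeded.
class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) {
  override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > capacity
}
```

Unlike ConcurrentLinkedHashMap, this is not safe for concurrent use without external synchronization (e.g. Collections.synchronizedMap).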

Related

Can anyone give examples for why interface vs abstract class in terms of code reuse, loose-coupling & polymorphism?

There have been several discussions of this question, but I am looking for a good, satisfactory answer in terms of the OOP concepts below.
a. code reuse
b. loose coupling
c. polymorphism
If anyone can explain (with some examples that make use of the techniques specified above) when an interface is to be used and when an abstract class is to be used.
Loose coupling means encapsulated components know just enough about each other to get the job done - nothing more, nothing less. It is a good property for a system to have, because loosely coupled components can change their implementation with little to no impact on other dependent components of the system. This increases the maintainability of the system.
Let's use an Iterator as an example. An Iterator is an object that allows one to traverse a container, element by element. There may be many implementations of an Iterator:
List Iterator
Vector Iterator
Stack Iterator
Linked List Iterator
Queue Iterator
All implementations vary with respect to the container that they operate on. But the concept of an iterator remains the same - it's a way to traverse a container, element by element, without having to know the details of the container implementation. You can say that an Iterator interface is an abstraction for traversing a container.
It is loosely coupled because the details of the container implementation are irrelevant to how the elements are traversed. The clients that use the Iterator do not have to know anything about how the Container is implemented, and vice-versa.
Here is an example of an Iterator interface:
public interface Iterator {
    bool HasNext { get; }
    object Next();
}
You could then develop algorithms that are dependent on the Iterator abstraction. For example, you could implement a Reverse algorithm, that traverses a container and reverses the sequence:
public IList<int> Reverse(Iterator iter) {
    var list = new List<int>();
    while (iter.HasNext) {
        list.Insert(0, (int)iter.Next());
    }
    return list;
}
This implementation of the Reverse algorithm has no idea what container the Iterator is based on (and it doesn't care). It could be a StackIterator, a VectorIterator, etc.
You could further refactor the implementation by returning an Iterator instead of an IList:
public Iterator Reverse(Iterator iter) {
    var list = new List<int>();
    while (iter.HasNext) {
        list.Insert(0, (int)iter.Next());
    }
    return new ListIterator(list);
}
Then callers of the algorithm do not have to be dependent on the IList interface - again, looser coupling is the goal here.
Interface vs Abstract Class vs Concrete Class
This is a spectrum from more abstract to less abstract or, in other words, from looser coupling to tighter coupling.
Using the example above for an Iterator abstraction, we would have the following spectrum:
Iterator <--> BaseIterator <--> VectorIterator
Iterator is an interface - it's the most abstract and exposes the fewest implementation details, therefore it is the least coupled to the classes that use it.
BaseIterator is an abstract class - it has a partial implementation, and implementing classes provide the rest of the implementation; classes that use it depend on the abstract class (a partial implementation).
VectorIterator is a concrete class - it has a full implementation. Classes that use it depend on the concrete implementation, therefore the coupling is tightest.
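The same spectrum can be sketched in Scala (illustrative names, not a real library API): a trait with no implementation, an abstract class carrying shared partial behaviour, and a concrete class filling in the rest.

```scala
// Most abstract: an interface (trait) with no implementation at all
trait Iter {
  def hasNext: Boolean
  def next(): Int
}

// Middle of the spectrum: an abstract class with a partial implementation,
// written only against the abstract contract above
abstract class BaseIter extends Iter {
  def drain(): List[Int] = {
    var acc = List.empty[Int]
    while (hasNext) acc = next() :: acc
    acc.reverse
  }
}

// Least abstract: a concrete class with a full implementation
class VectorIter(v: Vector[Int]) extends BaseIter {
  private var i = 0
  def hasNext: Boolean = i < v.length
  def next(): Int = { val x = v(i); i += 1; x }
}
```

Code depending only on Iter is the most loosely coupled; code constructing VectorIter directly is tied to that one container.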
Real World Analogy
Here is an analogy. Suppose you're an Employee about to ask your Employer for a raise. This isn't the first time any Employee has asked a manager for a raise - it's been done many times before by countless others. Suppose there is a general strategy for asking for a raise (without begging) that maximizes your chance of success. If you knew that strategy, you could apply it to any Manager with reasonable success throughout your entire career. This is an example of a loosely coupled design: an Employee (any Employee) interacting with a Manager (any Manager) for the purpose of asking for a raise. You could swap out Manager A for Manager B, and your chances of success would still be above average.
Now consider that, instead of knowing a general strategy, the employee knows a specific strategy that will work on his current boss (Mr. John A. Smith), based on rumors he has heard about what makes Mr. Smith more agreeable. This is an example of a tightly coupled design: an Employee (Mr. Jones) interacting with a specific Manager (Mr. Smith) for the purpose of asking for a raise. You couldn't swap out Manager A for Manager B and expect your chance of success to be the same, because the strategy will likely only work on Mr. Smith.
In terms of re-usability, the first strategy is more re-usable than the second strategy. Any idea why?

Scalaz Bind[Seq] typeclass

I'm currently porting some code from traditional Scala to Scalaz style.
It's fairly common throughout most of my code to use the Seq trait in my exposed API signatures rather than a concrete type (i.e. List, Vector) directly. However, this poses some problems with Scalaz, since it doesn't provide an implementation of a Bind[Seq] typeclass.
For example, this will work correctly:
List(1,2,3,4) >>= bindOperation
But this will not
Seq(1,2,3,4) >>= bindOperation
failing with the error could not find implicit value for parameter F0: scalaz.Bind[Seq].
I assume this is an intentional design decision in Scalaz - however I am unsure about the intended/best practice on how to proceed.
Should I instead write my code directly to List/Vector as appropriate instead of using the more flexible Seq interface? Or should I simply define my own Bind[Seq] typeclass?
The collections library does backflips to accommodate subtyping: when you use map on a specific collection type (list, map, etc.), you'll (usually) get the same type back. It manages this through the use of an extremely complex inheritance hierarchy together with type classes like CanBuildFrom. It gets the job done (at least arguably), but the complexity doesn't feel very principled. It's a mess. Lots of people hate it.
The complexity is generally pretty easy to avoid as a library user, but for a library designer it's a nightmare. If I provide a monad instance for Seq, that means all of my users' types get bumped up the hierarchy to Seq every time they use a monadic operation.
Scalaz folks tend not to like subtyping very much, anyway, so for the most part Scalaz stays around the leaves of the hierarchy—List, Vector, etc. You can see some discussion of this decision on the mailing list, for example.
When I first started using Scalaz I wrote a lot of utility code that tried to provide instances for Seq, etc. and make them usable with CanBuildFrom. Then I stopped, and now I tend to follow Scalaz in only ever using List, Vector, Map, and Set in my own code. If you're committed to "Scalaz style", you should do that as well (or even adopt Scalaz's own IList, ISet, ==>>, etc.). You're not going to find clear agreement on best practices more generally, though, and both approaches can be made to work, so you'll just need to experiment to find which you prefer.
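For completeness, if you did decide to provide your own instance, it is mechanical to write. The sketch below uses a stand-in Bind trait rather than Scalaz's, so it is dependency-free; with Scalaz you would implement scalaz.Bind[Seq] the same way:

```scala
// Stand-in for scalaz.Bind, just to show the shape of the instance
trait Bind[F[_]] {
  def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
}

// The instance Scalaz deliberately does not ship: bind for Seq is flatMap
implicit val seqBind: Bind[Seq] = new Bind[Seq] {
  def bind[A, B](fa: Seq[A])(f: A => Seq[B]): Seq[B] = fa.flatMap(f)
}

// Syntax so that >>= works on any F that has a Bind instance in scope
implicit class BindOps[F[_], A](fa: F[A]) {
  def >>=[B](f: A => F[B])(implicit F: Bind[F]): F[B] = F.bind(fa)(f)
}
```

The downside discussed above still applies: once this instance exists, monadic operations on your users' Lists and Vectors can get widened to Seq.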

Memory overhead of Case classes in scala

What is the memory overhead of a case class in scala ?
I've implemented some code to hold a lexicon with multiple types of interned tokens for NLP processing. I've got a case class for each token type.
For example, the canonical lemma/stem token is as follows:
sealed trait InternedLexAtom extends LexAtom{
def id : Int
}
case class Lemma(id: Int) extends InternedLexAtom
I'm going to be returning document vectors of these interned tokens; the reason I wrap them in case classes is to be able to add methods to the tokens via implicit classes. The reason I use this way of adding behaviour to the lexemes is that I want the lexemes to have different methods based on different contexts.
So I'm hoping the answer will be zero memory overhead due to type erasure. Is this the case ?
I have a suspicion that a single pointer might be packed with the parameters for some of the magic Scala can do :(
justification
To put things in perspective: the JVM uses 1.5-2 GB of memory with my lexicon loaded (the lexicon does not use case classes in its in-memory representation), while C++ does the same in 500-700 MB of memory. If my codebase keeps scaling its memory requirements the way it does now, I'm not going to be able to do this stuff on my laptop (in-memory).
I'll sidestep the problem by structuring my code differently. For example I can just strip away the case classes in vector representations if I need to. Would be nice if I didn't have to.
Question Extension.
Robin and Pedro have addressed the use case, thank you. In this case I was missing value classes. With those there are no more downsides. Additionally: I tried my best not to mention C++'s POD concept, but now I must ask :D A C++ POD is just a struct with primitive values. If I wanted to pack more than one value into a value class, how would I achieve this? I am assuming this would be what I want to do?
class SuperTriple(val underlying: (Int, Int)) extends AnyVal {
  def sup: Int = underlying._1    // "super" is a reserved word, so renamed
  def triple: Int = underlying._2
}
I do actually need the above construct, since a SuperTriple is what I am using as my vector model symbol :D
The original question still remains "what is the overhead of a case class".
In Scala 2.10 you can use value classes. (In older versions of Scala, for something with zero overhead for just one member, you need to use unboxed tagged types.)
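As a concrete illustration of the value-class answer (assuming Scala 2.10+): Lemma can simply extend AnyVal, in which case most uses compile down to the bare underlying Int with no wrapper allocation.

```scala
// A value class: exactly one val parameter, extends AnyVal.
// In most contexts (local vals, method arguments/returns) no Lemma object
// is allocated at runtime -- only the Int survives erasure. Boxing still
// happens when a Lemma is used as a generic/Any value or put in a collection.
case class Lemma(id: Int) extends AnyVal {
  def shifted: Lemma = Lemma(id + 1)  // hypothetical method, still allocation-free
}
```

Note that the SuperTriple construct in the question does not get the same benefit: a value class may wrap only a single field, so wrapping a Tuple2 still allocates the tuple itself.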

Common in scala's Array and List

I'm new to Scala (just started learning it), but have noticed something that seems strange to me: there are the classes Array and List, and they both have methods/functions such as foreach, forall, map, etc. But none of these methods seem to be inherited from some special class (trait). From a Java perspective, if Array and List provide some contract, that contract has to be declared in an interface and partially implemented in abstract classes. Why does each type (Array and List) in Scala declare its own set of methods? Why do they not have some common type?
But none of these methods seem to be inherited from some special class (trait)
That's simply not true.
If you open the scaladoc and look up, say, the .map method of Array and List, the "Definition Classes" entry shows where it is actually defined: for List, map is inherited from the collections hierarchy (TraversableLike in Scala 2.x), and for Array it is provided by ArrayOps, which inherits the same contract.
See also info about Traversable and Iterable both of which define most of the contracts in scala collections (but some collections may re-implement methods defined in Traversable/Iterable, e.g. for efficiency).
You may also want to look at relations between collections (scroll to the two diagrams) in general.
I'll extend om-nom-nom's answer here.
Scala doesn't have its own Array -- that's the Java Array, and the Java Array doesn't implement any interface. In fact, it isn't even a proper class, if I'm not mistaken, and it is certainly implemented through special mechanisms at the bytecode level.
In Scala, however, everything is a class -- an Int (Java's int) is a class, and so is Array. But in these cases, where the actual class comes from Java, Scala is limited by the type hierarchy provided by Java.
Now, going back to foreach, map, etc, they are not methods present in Java. However, Scala allows one to add implicit conversions from one class to another, and, through that mechanism, add methods. When you call arr.foreach(println), what is really done is Predef.refArrayOps(arr).foreach(println), which means foreach belongs to the ArrayOps class -- as you can see in the scaladoc documentation.
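You can see the conversion at work yourself. In Scala 2.x, the equivalent explicit call goes through Predef (intArrayOps for an Array[Int]; the exact conversion name varies by element type):

```scala
val arr = Array(1, 2, 3)

// What you write:
val doubled = arr.map(_ * 2)

// Roughly what the compiler inserts: an implicit conversion to ArrayOps,
// which is where map/foreach/etc. are actually defined for arrays.
val doubledExplicit = Predef.intArrayOps(arr).map(_ * 2)
```

Because the conversion is inserted silently, arrays appear to "have" the full collections API even though the underlying Java array has none of it.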

Why use Collection.empty[T] instead of new Collection[T]()

I was wondering if there is a good reason to use Collection.empty[T] instead of new Collection[T]() (or the reverse), or is it just a personal preference?
Thanks.
Calling new Collection[T]() will create a new instance every time. On the other hand, Collection.empty[T] will most likely always return the same singleton object, usually defined somewhere as
object Empty extends Collection[Nothing] ...
which will be much faster. Edit: This is only possible for immutable collections, mutable collections have to return a new instance every time empty is called.
You should always prefer Collection.empty[Type].
In addition to Collection.empty[T] being clearer about the intent, you should favour it for the same reason that you should favour factory methods in general when instantiating a collection: because those factories abstract away implementation details that you might not (or should not) care about.
For example, when you do Seq.empty[String] you actually get an instance of List[String]. You could directly instantiate a List[String], but if all you care about is having some Seq, you would introduce a needless dependency on List (well, OK, you actually cannot as it stands, because List is abstract, but let's pretend we can for the sake of the argument).
The whole point of factories is precisely to have some amount of separation of concern and not bother with unnecessary instantiation details.
As another, more elaborate example, let's talk about collection.immutable.HashMap. This one is very much a concrete class, so you might think there is no need for a factory here. Except that, for optimization purposes, the factory in the companion object collection.immutable.HashMap will actually create different subclasses depending on the number of elements that you initialize the map with (see this question: Scala: how to make a Hash(Trie)Map from a Map (via Anorm in Play)). Obviously, if you directly instantiate collection.immutable.HashMap you lose this optimization.
Another common optimization for empty is to always return (when it is an immutable collection) the same instance, yet another useful optimization that you would lose by directly instantiating the collection.
So as a rule of thumb, as far as you can you should use the factories that are provided by the various collection companion objects, so as to shield yourself from unneeded dependencies while at the same time benefiting from potential optimizations provided by the collection framework.
empty is just a special case of factory, and so the same logic applies.