MapMaker and ReferenceMap - Google Collections - Guava

I understand ReferenceMap from the alpha version of Google Collections has been replaced by MapMaker.
I used this ReferenceMap constructor with the backing map:
public ReferenceMap(ReferenceType keyReferenceType,
                    ReferenceType valueReferenceType,
                    ConcurrentMap<Object, Object> backingMap) {
  this(keyReferenceType, valueReferenceType, backingMap, true);
}
My backing map is a ConcurrentMap with the ability to collect statistics (hits/misses, etc.).
What can I use in place of the above ReferenceMap constructor?
Thanks, Grace

We were not able to continue offering the ability to pass in your own backing map; MapMaker works using a customized map implementation of its own.
But to gather hit/miss statistics, you can wrap the returned ConcurrentMap in a ForwardingConcurrentMap to count get invocations (using an AtomicLong), and have your Function count misses in a similar way. (Hits being, of course, nearly equal to requests minus misses.)
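For instance, a minimal sketch in Scala (CountingMap and the String type parameters are my own names for illustration, not part of the Guava API):

import java.util.concurrent.ConcurrentMap
import java.util.concurrent.atomic.AtomicLong
import com.google.common.collect.{ForwardingConcurrentMap, MapMaker}

// Hypothetical wrapper: counts get() invocations on the map MapMaker builds.
class CountingMap[K, V](underlying: ConcurrentMap[K, V])
    extends ForwardingConcurrentMap[K, V] {
  val gets = new AtomicLong
  override protected def delegate(): ConcurrentMap[K, V] = underlying
  override def get(key: AnyRef): V = {
    gets.incrementAndGet()
    super.get(key)
  }
}

val map = new CountingMap(new MapMaker().weakKeys().makeMap[String, String]())

Have your computing Function bump a misses counter the same way; hits are then approximately gets minus misses.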

Related

Losing path dependent type when extracting value from Try in scala

I'm working with scalax to generate a graph of my Spark operations. I have a custom library that generates the graph. Let me show a sample:
val DAGWithoutGet = createGraphFromOps(ops)
val DAGWithGet = createGraphFromOps(ops).get
The return type of DAGWithoutGet is
scala.util.Try[scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge]],
and for DAGWithGet it is
scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge].
Here, typeA is a project-related class representing a single Spark operation, not relevant in the context of this question. (For context only: what my custom library does is, essentially, generate a map of dependencies between those operations, creating a big Map object, and calling Graph(myBigMap: _*) to generate the graph.)
As far as I know, calling .get at this point of my code or later should not make any difference, but that is not what I'm seeing.
Calling DAGWithoutGet.get.nodes has a return type of scalax.collection.Graph[typeA,DiEdge]#NodeSetT,
while calling DAGWithGet.nodes returns DAGWithGet.NodeSetT.
When I extract one of those nodes (using the .find method), I receive scalax.collection.Graph[typeA,DiEdge]#NodeT and DAGWithGet.NodeT types, respectively. Much to my dismay, even the methods available in each case are different - I cannot use pathTo (which happens to be what I want) or withSubgraph on the former, only on the latter.
My question, then, after this relatively complex example: what is going on here? Why does extracting the value from the Try construct at different moments lead to different types, one path-dependent and the other not? Or, if that isn't correct, what may I be missing here?
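The root of the difference is that path-dependent types hang off a stable identifier (a val), which DAGWithGet is and DAGWithoutGet.get is not. A minimal sketch, unrelated to scalax, showing the same effect:

import scala.util.Try

class Graph {
  class Node
  def firstNode: Node = new Node
}

val tried: Try[Graph] = Try(new Graph)

// tried.get is not a stable path, so the compiler can only give the
// type projection Graph#Node here:
val n1: Graph#Node = tried.get.firstNode

// Binding the graph to a val first creates a stable prefix, and the
// precise path-dependent type g.Node is recovered:
val g = tried.get
val n2: g.Node = g.firstNode

This is consistent with what you observe: methods like pathTo and withSubgraph need nodes belonging to one specific graph instance, which only the path-dependent view guarantees, so they become usable once the graph is bound to a val.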

Scodec: Using vectorOfN with a vlong field

I am playing around with the Bitcoin blockchain to learn Scala and some useful libraries.
Currently I am trying to decode and encode Blocks with scodec, and my problem is that the vectorOfN function takes its size as an Int. How can I use a long field for the size while still preserving the full value range?
In other words, is there a vectorOfLongN function?
This is my code, which would compile fine if I were using vintL instead of vlongL:
object Block {
  implicit val codec: Codec[Block] = {
    ("header" | Codec[BlockHeader]) ::
      (("numTx" | vlongL) >>:~ { numTx =>
        ("transactions" | vectorOfN(provide(numTx), Codec[Transaction])).hlist
      })
  }.as[Block]
}
You may assume that appropriate Codecs for the BlockHeader and the Transactions are implemented. Actually, vlong is used as a simplification for this question, because Bitcoin uses its own codec for variable-sized ints.
I'm not a scodec specialist, but common sense suggests this is not possible: Scala's Vector, being a subtype of GenSeqLike, is limited to a length of type Int and an apply that accepts an Int index. As far as I understand, this limitation comes from the underlying JVM platform, where you can't have an array with more than Integer.MAX_VALUE elements, i.e. around 2^31 (see also the "Criticism of Java" wiki). And although Vector could theoretically have worked around this limitation, that was not done. So it makes no sense for vectorOfN to support a Long size either.
In other words, if you want something like this, you should probably start by creating your own Vector-like class that supports Long indices, working around the JVM limitations.
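That said, since a count above Int.MaxValue could never be materialized in a Vector anyway, one pragmatic workaround (my own sketch, not a scodec-provided codec) is to narrow the vlong count to an Int via xmap and fail loudly on overflow:

import scodec.Codec
import scodec.codecs._

// Decodes a vlong but exposes it as an Int so it fits vectorOfN's
// Codec[Int] count parameter; impossibly large counts fail at runtime.
val vlongAsIntL: Codec[Int] = vlongL.xmap[Int](
  l => { require(l.isValidInt, s"count $l exceeds Int range"); l.toInt },
  i => i.toLong
)

Using ("numTx" | vlongAsIntL) in the codec above makes numTx an Int, so the existing vectorOfN(provide(numTx), Codec[Transaction]) line compiles unchanged.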
You may want to take a look at scodec-stream, which comes in handy when all of your data is not available immediately or does not fit into memory.
Basically, you would use your usual codecs.X and turn it into a StreamDecoder via scodec.stream.decode.many(normal_codec). This way you can work with the data through scodec without the need to load it into memory entirely.
A StreamDecoder then offers methods like decodeInputStream alongside scodec's usual decode.
(I used it a while ago in a slightly different context – parsing data sent by a client to a server – but it looks like it would apply here as well).
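A rough outline of that pipeline, with method names as described above (exact signatures differ across scodec-stream versions, so treat this as a sketch):

import scodec.Codec
import scodec.stream.{decode, StreamDecoder}

// Turn the plain Transaction codec into a streaming decoder that emits
// transactions one by one instead of materializing a whole Vector:
val txStream: StreamDecoder[Transaction] = decode.many(Codec[Transaction])

// Then feed it the raw block data, e.g.:
// val transactions = txStream.decodeInputStream(blockInputStream)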

Why are SessionVars in Lift implemented using singletons?

One typical way of managing state in Lift is to create a singleton object extending SessionVar, like in this example taken from the documentation:
object MySnippetCompanion {
  object mySessionVar extends SessionVar[String]("hello")
}
The case for using SessionVars is clear and I've been using them in practice as needed. I also roughly understand how they work inside.
Still, I can't help but wonder why the mechanism for "session variables", which are clearly associated with the current session (usually just one of many sessions in the system), was designed to be used via a singleton. This goes so against my intuition that at first glance I was tempted to believe that Lift was somehow able to override Scala's language features and make object mean something different than in regular Scala.
Even though I now understand how it works, I can't grasp the rationale for such a design, which, at least for me, breaks the rule of least astonishment. Can someone point out any advantages or perhaps explain why such a design decision could have been made?
Session variables in Lift use Scala's DynamicVariable. Basically they allow you to statically reference a variable in a code-block and then later on call the code and substitute a value:
import scala.util.DynamicVariable

val x = new DynamicVariable(1)

def printIt(): Unit =
  println(x.value)

printIt()
//> 1
x.withValue(2)(printIt())
//> 2
So each time a request is handled, the scope of these dynamic variables is changed to the current session, completely hiding the state change of the current session from you as a programmer.
The other option would be to pass around a "sessionID" object which you would have to use when you want to access session specific data. Not really handy.
The reason you have to use the object keyword is that object is unique in that it defines both a value and a class. This allows Lift to call getClass to get a name that uniquely identifies this SessionVar vs. any other one, which Lift needs in order to serialize and deserialize every piece of session state in the right place(s). Furthermore if the SessionVar is in a class that has two instances (for instance a snippet rendered in two tabs), they will both refer to the same piece of session state. (The flip side of the coin is that the same SessionVar instance can be referenced by two different sessions and mean the right thing to each.)
Actually, at times this is insufficient: for instance, if you define a SessionVar in a trait and have two different classes that inherit the trait, but you need them to have two different values. The solution in that case is to override the def for the "name salt", which is combined with getClass to identify the SessionVar.
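A toy model (emphatically not Lift's actual code) may make the mechanics concrete: the singleton object merely names a slot via getClass, while the value lives in whichever session the current request's DynamicVariable points at:

import scala.collection.mutable
import scala.util.DynamicVariable

class Session { val vars = mutable.Map.empty[String, Any] }

// Lift rebinds this around each request to the requester's session.
object CurrentSession extends DynamicVariable[Session](new Session)

class SessionVar[T](default: T) {
  private val name = getClass.getName // unique per `object`, as explained above
  def get: T = CurrentSession.value.vars.getOrElseUpdate(name, default).asInstanceOf[T]
  def set(v: T): Unit = CurrentSession.value.vars(name) = v
}

object userName extends SessionVar[String]("anonymous")

val s1, s2 = new Session
CurrentSession.withValue(s1) { userName.set("grace") }
CurrentSession.withValue(s1) { println(userName.get) } //> grace
CurrentSession.withValue(s2) { println(userName.get) } //> anonymous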

Why use Collection.empty[T] instead of new Collection[T]()

I was wondering if there is a good reason to use Collection.empty[T] instead of new Collection[T]() (or the inverse)? Or is it just a personal preference?
Thanks.
Calling new Collection[T]() will create a new instance every time. On the other hand, Collection.empty[T] will most likely always return the same singleton object, usually defined somewhere as
object Empty extends Collection[Nothing] ...
which will be much faster. Edit: This is only possible for immutable collections; mutable collections have to return a new instance every time empty is called.
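You can check the singleton behaviour in the REPL with reference equality (the exact singletons are implementation details, but the pattern holds across the immutable collections):

List.empty[String] eq List.empty[Int]     //> true: both are the Nil singleton
Vector.empty[Int] eq Vector.empty[String] //> true: one shared empty Vector

new scala.collection.mutable.ListBuffer[Int] eq
  new scala.collection.mutable.ListBuffer[Int]    //> false: a fresh instance each time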
You should always prefer Collection.empty[Type].
In addition to Collection.empty[T] being clearer about the intent, you should favour it for the same reason that you should favour factory methods in general when instantiating a collection: because those factories abstract away implementation details that you might not (or should not) care about.
For example, when you do Seq.empty[String] you actually get an instance of List[String]. You could directly instantiate a List[String], but if all you care about is having some Seq you would introduce a needless dependency on List (well, OK, you actually cannot as it stands, because List is already abstract, but let's pretend we can for the sake of the argument).
The whole point of factories is precisely to have some amount of separation of concern and not bother with unnecessary instantiation details.
As another, more elaborate example, let's talk about collection.immutable.HashMap. This one is very much a concrete class, so you might think there is no need for a factory here. Except that for optimization purposes the factory in the companion object collection.immutable.HashMap will actually create different sub-classes depending on the number of elements that you initialize the map with (see this question: Scala: how to make a Hash(Trie)Map from a Map (via Anorm in Play)). Obviously, if you directly instantiate collection.immutable.HashMap you will lose this optimization.
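Both behaviours are observable in the REPL, though the exact classes are implementation details that vary between Scala versions (the size-specialized HashMap subclasses, for example, existed up to Scala 2.12):

Seq.empty[String].getClass.getName
//> scala.collection.immutable.Nil$   (a List, as noted above)

scala.collection.immutable.HashMap("a" -> 1).getClass.getSimpleName
//> HashMap1 on Scala 2.12; plain HashMap on 2.13+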
Another common optimization is for empty to always return the same instance (when the collection is immutable), yet another useful optimization that you would lose by directly instantiating the collection.
So as a rule of thumb, as far as you can you should use the factories that are provided by the various collection companion objects, so as to shield yourself from unneeded dependencies while at the same time benefiting from potential optimizations provided by the collection framework.
empty is just a special case of factory, and so the same logic applies.

Scala immutable map, when to go mutable?

My present use case is pretty trivial: either a mutable or an immutable Map will do the trick.
I have a method that takes an immutable Map, which then calls a 3rd-party API method that takes an immutable Map as well:
def doFoo(foo: String = "default", params: Map[String, Any] = Map()): Unit = {
  val newMap =
    if (someCondition) params + ("foo" -> foo) else params
  api.doSomething(newMap)
}
The Map in question will generally be quite small; at most there might be an embedded List of case class instances, a few thousand entries max. So, again, assume little impact in going immutable in this case (i.e. having essentially two instances of the Map via the newMap val copy).
Still, it nags me a bit, copying the map just to get a new map with a few k->v entries tacked onto it.
I could go mutable, with params.put("bar", bar), etc. for the entries I want to tack on, and then params.toMap to convert to immutable for the API call; that is an option. But then I have to import and pass around mutable maps, which is a bit of a hassle compared to going with Scala's default immutable Map.
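For reference, that mutable detour would look roughly like this (reusing the names from doFoo above):

import scala.collection.mutable

val builder = mutable.Map[String, Any]() ++= params
if (someCondition) builder.put("foo", foo)
api.doSomething(builder.toMap) // back to immutable for the API call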
So, what are the general guidelines for when it is justified/good practice to use mutable Map over immutable Maps?
Thanks
EDIT
So, it appears that an add operation on an immutable map takes near-constant time, confirming @dhg's and @Nicolas's assertion that a full copy is not made, which solves the problem for the concrete case presented.
Depending on the immutable Map implementation, adding a few entries may not actually copy the entire original Map. This is one of the advantages to the immutable data structure approach: Scala will try to get away with copying as little as possible.
This kind of behavior is easiest to see with a List. If I have a val a = List(1,2,3), then that list is stored in memory. However, if I prepend an additional element like val b = 0 :: a, I do get a new 4-element List back, but Scala did not copy the original list a. Instead, we just created one new link, called it b, and gave it a pointer to the existing List a.
You can envision strategies like this for other kinds of collections as well. For example, if I add one element to a Map, the collection could simply wrap the existing map, falling back to it when needed, all while providing an API as if it were a single Map.
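The List case is easy to verify yourself, because the sharing is directly observable:

val a = List(1, 2, 3)
val b = 0 :: a       // one new cons cell, pointing at the existing list

println(b.tail eq a) //> true: b's tail IS a, not a copy of it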
Using a mutable object is not bad in itself; it becomes bad in a functional programming environment, where you try to avoid side effects by keeping functions pure and objects immutable.
However, if you create a mutable object inside a function and modify this object, the function is still pure if you don't release a reference to this object outside the function. It is acceptable to have code like:
def buildVector(x: Double, y: Double, z: Double): Vector[Double] = {
  // Local mutation only: the array never escapes this function,
  // so the function remains pure from the caller's perspective.
  val ary = Array.ofDim[Double](3)
  ary(0) = x
  ary(1) = y
  ary(2) = z
  ary.toVector
}
Now, I think this approach is useful/recommended in two cases: (1) Performance, if creating and modifying an immutable object is a bottleneck of your whole application; (2) Code readability, because sometimes it's easier to modify a complex object in place (rather than resorting to lenses, zippers, etc.)
In addition to dhg's answer, you can take a look at the performance characteristics of the Scala collections. If an add/remove operation doesn't take linear time, it must be doing something other than simply copying the entire structure. (Note that the converse is not true: it's not because an operation takes linear time that you're copying the whole structure.)
I like to use collection.Map as the declared parameter type (for inputs or return values) rather than mutable or immutable Map. collection.Map is a read-only interface that works for both kinds of implementation. A consumer method using a map really doesn't need to know about the map's implementation or how it was constructed. (It's really none of its business anyway.)
If you go with the approach of hiding a map's particular construction (be it mutable or immutable) from the consumers who use it, then you're still getting an essentially immutable map downstream. And by using collection.Map as a read-only interface you completely remove the .toMap inefficiency that you would have with consumers written against immutable.Map typed objects. Having to convert a fully constructed map into another one simply to comply with an interface the first one doesn't support really is absolutely unnecessary overhead when you think about it.
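A small sketch of that style (the names are mine): build mutably, but declare collection.Map in the signature, so no .toMap copy is ever needed:

import scala.collection.mutable

// Consumers see collection.Map, an interface without mutators; the
// mutable map is returned as-is, with no defensive .toMap copy.
def buildIndex(words: Seq[String]): collection.Map[String, Int] = {
  val m = mutable.Map.empty[String, Int]
  for ((w, i) <- words.zipWithIndex) m(w) = i
  m
}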
I suspect that a few years from now we'll look back at the three separate sets of interfaces (mutable Map, immutable Map, and collection.Map) and realize that 99% of the time only two are really needed (mutable and collection.Map), and that using the (unfortunately) default immutable Map interface really adds a lot of unnecessary overhead for the "Scalable Language".