How does the "Implementing FP languages with fast equality, sets and maps..." technique deal with garbage collection? - hash

This paper presents a technique for the implementation of functional languages with fast equality, sets and maps, using hash-consing under the hoods. As far as my understanding goes, it uses the address of a hash-consed value as its key when inserting it on a map. This has the advantage that figuring the hashed key of essentially any value is O(1), as opposed to the O(N) standard. What I don't understand, though, is: what happens with a map after a garbage collection? Since the GC process will cause the address of every value to change, then the configuration of the map will be incorrect. In other words, there is no guarantee that addr(value) will be the same for the lifetime of the program.

Since the GC process will cause the address of every value to change
Only moving garbage collectors do that. When using non-moving algorithms like mark-and-sweep, all that happens is that unused objects are freed during the GC cycle - used objects stay exactly where they are.
Moving garbage collectors are generally seen as preferable to mark-and-sweep, but according to the abstract of the paper "mark-and-sweep becomes fast in a maximal sharing environment", which is further expanded on in section 2.4.4.
The paper also describes a way to make moving garbage collectors work (by assigning each object a unique id and using that instead of its address), but deems that impractical (section 2.4.2).

Related

Why passing by value is sometimes better than passing by reference

It is a common fact that it is better in a certain circumstances to pass a parameter by reference to avoid costly copying. But recently I watched a Handmade Hero series where Casey said that if the object is not too complex sometimes it's better to pass it by value. I'm not too familiar with low-level details, but I assume it's connected with a cache. Could someone give more solid explanation of what's going on?
If you pass by value you are likely passing via registers (assuming not too many arguments and each one is not too large). That means the callee doesn't need to do anything to use the values, they are already in registers. If passing by reference, the address of each value may be in a register, but that requires a dereference which needs to look in the CPU cache (if not main memory), which is slower.
On many popular systems you can pass-by-value roughly 5-10 values which are each as wide as an address.

Is there a possibility to create a memory-efficient sequence of bits in the JVM?

I've got a piece of code that takes into account a given amount of features, where each feature is Boolean. I'm looking for the most efficient way to store a set of such features. My initial thought was to try and store these as a BitSet. But then, I realized that this implementation is meant to be used to store numbers in bit format rather than manipulate each bit, which is something I'd like to do (see the effect of switching any feature on and off). I then thought of using a Boolean array, but apparently the JVM uses much more memory for each Boolean element than the one bit it actually needs.
I'm therefore left with the question: What is the most efficient way to store a set of bits that I'd like to treat as independent bits rather than the building blocks of some number?
Please refer to this question: boolean[] vs. BitSet: Which is more efficient?
According to the answer of Peter Lawrey, boolean[] (not Boolean[]) is your way to go since its values can be manipulated and it takes only one byte of memory per bit to store. Consider that there is no way for a JVM application to store one bit in only one bit of memory and let it be directly (array-like) manipulated because it needs a pointer to find the address of the bit and the smallest addressable unit is a byte.
The site you referenced already states that the mutable BitSet is the same as the java.util.BitSet. There is nothing you can do in Java that you can't do in Scala. But since you are using Scala, you probably want a safe implementation which is probably meant to be even multithreaded. Mutable datatypes are not suitable for that. Therefore, I would simply use an immutable BitSet and accept the memory cost.
However, BitSets have their limits (deriving from the maximum number of int). If you need larger data sizes, you may use LongBitSets, which are basically Map<Long, BitSet>. If you need even more space, you may nest them in another map Map<Long, LongBitSet>, but in that case you need to use two or more identifiers (longs).

Scala immutable collections cannot be shared without synchronization?

From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronzied block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with #volatile, there will be no problem, since the volatile write introduces a memory barrier. So you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference.
If you don't mark x with #volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with. It is completely undefined when you will see the changes from A in B.
You might see them
immediately
after there is a memory barrier somewhere else in your program
not at all
depending on the architecture you`re running on, the phase of the moon, whatever. So as Viktor Klang put it: "Unsafe publication is unsafe..."
Note that if you use a higher level concurrency framework such as akka actors, it is also guaranteed that receivers of messages can not see immutable collections in an inconsistent state.

G-Counters in Riak: Don't the underlying vclocks provide the same data?

I've been reading into CvRDTs and I'm aware that Riak has already added a few to Riak 2.
My question is: why would Riak implement a gcounter when it sounds like the underlying vclock that is associated with every object records the same information? Wouldn't the result be a gcounter stored with a vclock, each containing the same essential information?
My only guess right now would be that Riak may garbage-collect the vclocks, trimming information that would actually be important for the purpose of a gcounter (i.e. the number of increments).
I cannot read Erlang particularly well, so maybe I've wrongly assumed that Riak stores vclocks with these special-case data types. However, the question still applies to the homegrown solutions that are written on top of standard Riak (and hence inherit vclocks with each object persisted).
EDIT:
I have since written the following article to help explain CvRDTs in a more practical manner. This article also touches on the redundancy I have highlighted above:
Conflict-free Replicated Data Types (CRDT) - A digestible explanation with less math.
Riak prunes version vectors, no big deal for causality (false concurrency, more siblings, safe) but a disaster for counters.
Riak's CRDT support is general. We "hide" CRDTs inside the regular riak object.
Riak's CRDT support is in it's first wave, we'll be optimising further as we make further releases.
We have a great mailing list for questions like this, btw. Stack Overflow has it's uses but if you want to talk to the authors of an open source DB why not use their list? Since Riak is open source, you can submit a pull request, we'd love to incorporate your ideas into the code base.
Quick answer: Riak's counters are actually PN-Counters, ie they allow both increments and decrements, so can't be implemented like a vclock, as they require tracking the increments and decrements differently.
Long Answer:
This question suggests you have completely misunderstood the difference between a g-counter and a vector clock (or version vector).
A vector clock (vclock) is a system for tracking the causality of concurrent updates to a piece of data. They are a map of {actor => logical clock}. Actors only increment their logical clocks when the data they're associated with changes, and try to increment it as little as possible (so at most once per update). Two vclocks can either be concurrent, or one can dominate the other.
A g-counter is a CvRDT with what looks like the same structure as a vclock, but with important differences. They are implemented as a map of {actor => counter}. Actors can increment their own counter as much as they want. A g-counter has the concept of a "counter value", and the concept of a "merge", so that when concurrent operations are executed by different actors, they can work out what the actual "counter value" should be.
Importantly, g-counters can't track causality, and vclocks have no idea what their "counter value" is.
To conflate the two in a codebase would not only be confusing, but could also bring in errors.
Add this to the fact that riak actually implements pn-counters. The difference is that a g-counter can only be incremented, but pn-counters can be both incremented and decremented. Pn-counters work by being a map of {actor => (increment count, decrement count)}, which more obviously has a different structure to a vclock. You can only increment both those counts, hence why there are two and not just one.

Objective-C sparse array redux

First off, I've seen this, but it doesn't quite seem to suit my needs.
I've got a situation where I need a sparse array. Some situations where I could have, say 3000 potential entries with only 20 allocated, other situations where I could have most or all of the 3000 allocated. Using an NSMutableDictionary (with NSString representations of the integer index values) would appear to work well for the first case, but would seemingly be inefficient for the second, both in storage and lookup speed. Using an NSMutableArray with NSNull objects for the empty entries would work fairly well for the second case, but it seems a bit wasteful (and it could produce an annoying delay at the UI) to insert most of 3000 NSNull entries for the first case.
The referenced article mentions using an NSMapTable, since it supposedly allows integer keys, but apparently that class is not available on iPhone (and I'm not sure I like having an object that doesn't retain, either).
So, is there another option?
Added 9/22
I've been looking at a custom class that embeds an NSMutableSet, with set entries consisting of a custom class with integer (ie, element#) and element pointer, and written to mimic an NSMutableArray in terms of adds/updates/finds (but not inserts/removals). This seems to be the most reasonable approach.
A NSMutableDictionary probably will not be slow, dictionaries generally use hashing and are rather fast, bench mark.
Another option is a C array of pointers. Allocation a large array only allocates virtual memory until the real memory is accessed (cure calloc, not malloc, memset). The downside is that memory is allocated in 4KB pages which can be wasteful for small numbers of entries, for large numbers of entries many may fall in the same page.
What about CFDictionary (or actually CFMutableDictionary)? In the documentation, it says that you can use any C data type as a key, so perhaps that would be closer to what you need?
I've got the custom class going and it works pretty well so far. It's 322 lines of code in the h+m files, including the inner class stuff, a lot of blank lines, comments, description formatter (currently giving me more trouble than anything else) and some LRU management code unrelated to the basic concept. Performance-wise it seems to be working faster than another scheme I had that only allowed "sparseness" on the tail end, presumably because I was able to eliminate a lot of special-case logic.
One nice thing about the approach was that I could make much of the API identical to NSMutableArray, so I only needed to change maybe 25% of the lines that somehow reference the class.
I also needed a sparse array and have put mine on git hub.
If you need a sparse array feel free to grab https://github.com/LavaSlider/DSSparseArray