This question is related to: Matlab: dynamically storing objects, alternatives to containers.Map class
I'm building a data structure that needs to have key-value functionality, where the key is an int and the value is an object. And also needs to be able to dinamically add elements to this key-value map.
So, Containers.map would a good option, but it is extremely slow (I have measured retrieval of values on a map of ~450 elements to be around 0.1s on my Linux machine). That's really strange, as I thought that they would implement this class as a hashmap or something like that.
I need something a lot faster. I'm thinking on implementing myself a balanced binary search tree or something like that, but I don't know if this kind if dynamic recursive object would be fast on MATLAB (probably not).
Is it possible to bind std::map into my application, or something else, that is faster than Containers.map?
Edit, clarifications and code sample:
I'm running this on a Matlab 2015a on linux. Here's a reproduction of the bad performance. In this program, the performance is not as bad because on my program I have a much more complex class hierarchy which generates a lot of overhead (the simple act of having a for to iterate for each element of the map and simply retrieve it takes almost 1 minute when there was ~450 elements). Here, I created a very simple graph class to illustrate the problem. pastebin.com/TvyzJxgK
Related
My basic understanding of lensing is that, "a lens is a value representing maps between a complex type and one of its constituents. This map works both ways—we can get or "access" the constituent and set or "mutate" it"
I came across this when I was designing a machine learning library (neural nets), which demands keeping a big datastructure of parameters, groups of which need to be updated at different stages of the algorithm. I wanted to create the whole parameter data structure immutable, but changing a single group of parameters requires copying all the parameters, and recreating a new datastructure, which sounds inefficient. Not surprisingly other people have thought it too. Some people suggest using lensing, which in a sense, let you modify immutable datastructures. While some other people suggested just using mutables for these. Unfortunately I couldn't find anything on comparing these two paradigms, speed-wise, space-wise, code-complexity-wise etc.
Now question is, what are the pros/cons of using lensing vs mutable design?
The trade offs between the two are pretty much as you surmised. Lenses are less complex than tracking the changes to a large immutable data structure manually, but still require more complex code than a mutable data structure, and there is some amount of runtime overhead. To know how much, you would have to measure, but it's probably less than you think, because a lot of the updated structure isn't copied but shared.
Mutable data structures are simpler and somewhat faster to modify, but harder to reason about, because now you have to take the order functions are called into account, worry about concurrency, and so forth.
Your third option is to make a bunch of small immutable data structures instead of one big one. Mutability often forces a single large data structure because of the need for a single source of truth, and to ensure that all references to data change at the same time. With immutability, this is a lot easier to control.
For example, you can have two separate Maps with the same type of key and different types of simple values, instead of one Map with a more complex value. Not only does this have performance benefits, it also makes it much easier to modularize your code.
There is a nice flowchart (taken from here) for choosing a particular container in C++:
Is there something similar for the Scala collections? I'm still somewhat overwhelmed with the options.
I am not aware of such flowcharts for Scala, but I guess one would be useful.
I made one for you -- larger picture here.
Note that there is some added complexity, since Scala has more collections and there is both the mutable and the immutable package. Where possible, I added both alternatives to the rectangle.
I tried to follow the C++ STL flow diagram as much as possible, but I thought that the lower left part was complicating things a bit too much, so I changed the flow there slightly.
EDIT: fixed some typos.
EDIT: As Travis, suggested, note that in a majority of situations, you only need to pick between a Map, Set, List, ArrayBuffer or a Vector.
if you need key-value lookup, use a Map
if you need to check for presence of elements, use a Set
if you need to store elements and traverse them, use a List or an ArrayBuffer
if you don't need a persistent collection, but random access is really important, use ArrayBuffer
if you need relatively fast random access and persistent sequences, use Vector
If that does not help and you have a more exotic use-case, use this chart.
I've been experimenting with Scala for some time, and have often encountered the advice to favor immutable data structures.
But when you have a data structure like e.g. a 3D scene graph, a large neural network, or anything with quite a few objects that need frequent updates (animating the objects in the scene, training the neural net, ...), this seems to be
horribly inefficient at runtime since you need to constantly recreate the whole object graph, and
difficult to program since when you have a reference to some objects that need to be updated, you can't just call setters on them but you need copy the object graph and replace the old objects with the updated ones.
How are such things dealt with in idiomatic Scala?
Scala is multi-paradigm: OO and functional, mutable and immutable.
Complex graphs are one example of a data structure that, as you have identified, may be easier to work with in a mutable context. If so, make the data structure mutable.
Idiomatic Scala is to use the right paradigm to solve your problem.
Quick question. I'm currently designing some database queries to extract reasonably large, but not massive datasets into memory, say approximately 10k-100k records.
So far I've been testing loading these resultsets into a scala.collection.immutable.Seq and have discovered it seems to take an incredibly long time to build the collection. Whereas if I change to a Vector or List the write into memory takes fractions of a second.
MY question is therefore why is Seq so slow in this case? If so in what cases would using Seq be more appropriate than Vector?
Thanks
It would help if you'd post the relevant snippet and which operations you call on the sequence -- immutable.Seq is represented using a List (see https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/immutable/Seq.scala#L42). My guess is that you've been using :+ on the immutable.Seq, which under the hood appends to the end of the list by copying it (probably giving you quadratic overall performance), and when you switched to using immutable.List directly, you've been attaching to the beginning using :: (giving you linear performance).
Since Seq is just a List under the hood, you should use it when you attach to the beginning of the sequence -- the cons operator :: only creates a one node and links it to the rest of the list, which is as fast as it can get when it comes to immutable data structures. Otherwise, if you add to the end, and you insist on immutability, you should use a Vector (or the upcoming Conc lists!).
If you would like a validation of these claims, see this link where the performance of the two operations is compared using ScalaMeter -- lists are 8 times faster than vectors when you add to the beginning.
However, the most appropriate data structure should be either an ArrayBuffer or a VectorBuilder. These are mutable data structures that resize dynamically and if you build them using += you will get a reasonable performance. This is assuming that you are not storing primitives.
If I have an immutable Map which I might expect (over a very short period of time - like a few seconds) to be adding/removing hundreds of thousands of items from, is the standard HashMap a bad idea? Let's say I want to pass 1Gb of data through the Map in <10 seconds in such a way that the maximum size of the Map at any once instant is only 256Mb.
I get the impression that the map keeps some kind of "history" but I will always be accessing the last-updated table (i.e. I do not pass the map around) because it is a private member variable of an Actor which is updated/accessed only from within reactions.
Basically I suspect that this data structure may be (partly) at fault for issues I am seeing around JVMs going out of memory when reading in large amounts of data in a short time.
Would I be better off with a different map implementation and, if so, what is it?
Ouch. Why do you have to use an immutable map? Poor garbage collector! Immutable maps generally require (log n) new objects per operation in addition to (log n) time, or they really just wrap mutable hash maps and layer changesets on top (which slows things down and can increase the number of object creations).
Immutability is great, but this does not seem to me like the time to use it. If I were you, I'd stick with scala.collection.mutable.HashMap. If you need concurrent access, wrap the Java util.concurrent one instead.
You also might want to increase the size of the young generation in the JVM: -Xmn1G or more (assuming you're running with -Xmx3G). Also, use the throughput (parallel) garbage collector.
That would be awful. You say you always want to access the last-updated table, that means you only need an ephemeral data structure, there is no need to pay the cost for a persistent data structure - it's like trading time and memory to gain completely arguable "style points". You are not building your karma by using blindly persistent structures when they are not called for.
Also, a hashtable is a particularly difficult structure to make persistent. In other words, "very, very slow" (basically it is usable when reads greatly outnumber writes - and you seem to talk about many writes).
By the way, a ConcurrentHashMap wouldn't make sense in this design, given that the map is accessed from a single actor (that's what I understand from the description).
Scala's so-called(*) immutable Map is broken beyond basic usage up to Scala 2.7. Don't trust me, just look up the number of open tickets for it. And the solution is just "it will be replaced with something else on Scala 2.8" (which it did).
So, if you want an immutable map for Scala 2.7.x, I'd advise looking for it in something other than Scala. Or just use TreeHashMap instead.
(*) Scala's immutable Map isn't really immutable. It is a mutable data structure internally, which requires lot of synchronization.