What should I use for best performance as an Array type on Scala.js?

Scala has some special treatment for Arrays since they are backed by the JVM's native arrays, which adds complexity to their use - in particular, a requirement to pass class manifests when they are used generically. It also has a less performant GenericArray for use when passing those is not feasible. Now, Scala.js does not run on the JVM, so what happens to all this complexity? Do Arrays of primitive types store their values unboxed? Is GenericArray less performant than Array? What should I use for best performance as an Array type on Scala.js?

scala.Array emulates the features of JVM arrays in Scala.js. This means that it has all the same complexity, and that GenericArray is indeed slower. Arrays of primitives store their values unboxed.
For the best performance in generic contexts, use js.Array. It does not need class tags, and doesn't carry the generic-array penalty. However, Chars are boxed, as is the case for any generic type.
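A minimal sketch of the difference in a generic context (Scala.js; the function names are just for illustration):

```scala
import scala.scalajs.js
import scala.reflect.ClassTag

// scala.Array needs a ClassTag in generic code, mirroring the JVM requirement:
def fillScalaArray[A: ClassTag](n: Int, a: A): Array[A] =
  Array.fill(n)(a)

// js.Array needs no ClassTag and compiles to a plain JavaScript array:
def fillJsArray[A](n: Int, a: A): js.Array[A] = {
  val arr = new js.Array[A]
  var i = 0
  while (i < n) { arr.push(a); i += 1 }
  arr
}
```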

Related

Why do functional languages use objects for data types?

I notice a tendency for most functional languages, like Kotlin and Scala, to handle all data types as objects.
Double, Float, Long, Int... they are all actual objects, and they don't offer any primitive alternative?
Why do functional languages favor objects?
Is it just because it is easier to offer operator overloading and polymorphism? Or is there any deeper meaning to this?
This has nothing to do with functional languages. This is in fact true for almost all object-oriented languages as well.
Artificially splitting values into two different kinds of things only creates complications; why would you want that?
It is perfectly possible to generate efficient code for arithmetic operations on number objects. In fact, most high-performance OO implementations generate code dealing with primitive native machine number types even for "object numbers". So, if you can generate the same machine code for both cases, but one of the cases is simpler because it doesn't have this artificial split, then it seems obvious which is the better design, doesn't it?
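As a concrete illustration in Scala, where every number is an object yet the compiler emits primitive arithmetic:

```scala
// Int is an object with methods, but arithmetic on it compiles down to
// primitive JVM bytecode (iadd), not method dispatch:
val x: Int = 3
val sum = x + 2               // sugar for x.+(2); emitted as a primitive add
val s   = x.toString          // still usable as an object when needed
val xs: List[Int] = List(x)   // only in generic positions like this is it boxed
```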
Now, if you want to ask me why the designers of Java made this particular choice, I can't tell you. They certainly should have been aware of the work of the Self team, who after all worked at Sun.

Scala: Lensing vs mutable design

My basic understanding of lensing is that "a lens is a value representing maps between a complex type and one of its constituents. This map works both ways - we can get or "access" the constituent and set or "mutate" it."
I came across this while designing a machine learning library (neural nets), which demands keeping a big data structure of parameters, groups of which need to be updated at different stages of the algorithm. I wanted to make the whole parameter data structure immutable, but changing a single group of parameters then requires copying all the parameters and recreating a new data structure, which sounds inefficient. Not surprisingly, other people have thought about this too. Some suggest lensing, which in a sense lets you modify immutable data structures, while others suggest just using mutable structures for this. Unfortunately I couldn't find anything comparing the two paradigms speed-wise, space-wise, code-complexity-wise, etc.
Now the question is: what are the pros/cons of using lensing vs a mutable design?
The trade-offs between the two are pretty much as you surmised. Lenses are less complex than manually tracking changes to a large immutable data structure, but they still require more complex code than a mutable data structure, and there is some amount of runtime overhead. To know how much, you would have to measure, but it's probably less than you think, because a lot of the updated structure isn't copied but shared.
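As a minimal hand-rolled sketch of the lens approach (the Params/Layer structure here is hypothetical; in Scala, libraries such as Monocle provide this machinery):

```scala
// Hypothetical nested parameter structure for a small network:
final case class Layer(weights: Vector[Double], bias: Double)
final case class Params(layer1: Layer, layer2: Layer)

// A lens packages a getter and a setter for one constituent:
final case class Lens[S, A](get: S => A, set: (S, A) => S) {
  def modify(s: S)(f: A => A): S = set(s, f(get(s)))
  def andThen[B](next: Lens[A, B]): Lens[S, B] =
    Lens(s => next.get(get(s)), (s, b) => set(s, next.set(get(s), b)))
}

val layer1 = Lens[Params, Layer](_.layer1, (p, l) => p.copy(layer1 = l))
val bias   = Lens[Layer, Double](_.bias, (l, b) => l.copy(bias = b))

// Update one deeply nested group; the untouched parts are shared, not copied:
val p  = Params(Layer(Vector(0.1, 0.2), 0.0), Layer(Vector(0.3), 0.0))
val p2 = (layer1 andThen bias).modify(p)(_ + 0.01)
```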
Mutable data structures are simpler and somewhat faster to modify, but harder to reason about, because you now have to take into account the order in which functions are called, worry about concurrency, and so forth.
Your third option is to make a bunch of small immutable data structures instead of one big one. Mutability often forces a single large data structure because of the need for a single source of truth, and to ensure that all references to data change at the same time. With immutability, this is a lot easier to control.
For example, you can have two separate Maps with the same type of key and different types of simple values, instead of one Map with a more complex value. Not only does this have performance benefits, it also makes it much easier to modularize your code.
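A minimal sketch of that last point (the keys and values here are made up):

```scala
// Instead of one map with a composite value...
val combined: Map[String, (String, Int)] = Map("u1" -> ("Alice", 10))

// ...two small immutable maps that can live in different modules
// and be updated independently:
val names:  Map[String, String] = Map("u1" -> "Alice")
val scores: Map[String, Int]    = Map("u1" -> 10)

val scores2 = scores.updated("u1", 11)  // names is untouched
```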

Does parallel programming in Swift eliminate value type optimizations?

As I understand it, value types in Swift can be more performant because they are stored on the stack as opposed to the heap. But if you make many calls to DispatchQueue.sync or DispatchQueue.async, does this not render the benefits of value types moot, because closures are stored on the heap?
As I understand it, value types in Swift can be more performant because they are stored on the stack as opposed to the heap.
Sometimes. Often not. For example, String includes heap-allocated storage. Many value types have hidden heap-allocated storage (this is actually really common). So you may not be getting the performance gain you're expecting for many types, but in many cases you're not losing it via closures either.
Value types are about behavior, not performance (and of course you need to distinguish between value types and value semantics, which are different, and can have impacts on performance). So the nice thing about value types and DispatchQueue is that you know you're not going to accidentally modify a value on multiple queues, because you know you have your own independent copy. By the time you've paid the overhead of dispatching to a queue (which is optimized, but still not cheap), the extra cost of copying the value type probably is not the major issue.
In my experience, it is very difficult to reason about Swift performance, particularly due to copy-on-write optimizations. The fact that apparent "value types" can have hidden internal reference types also makes performance analysis very tricky. You often have to know and rely on internal details that are subject to change. To even begin getting your head around Swift performance, you should definitely watch Understand Swift Performance (possibly a couple of times). If you're bringing any performance intuitions from C++, you have to throw almost all of them away for Swift. It just does so many things differently.
I suspect that your view of performance metrics and optimization doesn't entirely match the Swift model.
First, it does look like you've understood that point correctly, but in general the terms "stack-allocated" and "heap-allocated" are misleading. Value types can be part of reference types and live on the heap. Likewise, things that presumably go to the heap don't actually have to: a reference-counted object that provably doesn't need reference counting could be allocated on the stack without anyone noticing. In other languages like C++, the preferred terminology is "automatic storage" ("stack") and "dynamic storage" ("heap"). Of course, Swift doesn't have these concepts (it only has value types and reference types), but they're useful for describing performance characteristics.
Escaping closures need dynamic storage because their lifetime can't be tied to a stack frame. However, the performance price that you pay to call a function that takes an escaping closure is uniform, regardless of how many variables need to be captured, because a closure will always be allocated and that closure can have storage for any number of values.
In other words, all of your captured value-typed objects are grouped into a single dynamic allocation, and the performance cost of allocating memory does not scale with the amount you're requesting. Therefore, there is a (small) speed cost associated with escaping closures themselves, but that cost does not scale with the number of values the closure captures. Aside from that unavoidable upfront cost, there should be no degradation of performance for value types.
Additionally, as Rob said, every non-trivial value type (strings, arrays, dictionaries, sets, etc.) is actually a wrapper around a reference type, so for these objects, value types had more of a semantic advantage than a performance advantage to begin with.

Storing enums in MongoDB

I am storing enums for things such as ranks (administrator, moderator, user...) and achievements for each user in my Mongo database. As far as I know, Mongo does not have an enum data type, which means I have to store it using another type.
I have thought of storing it using integers, which I would assume uses less space than storing strings for everything that could easily be expressed as an integer. Another upside I see of using integers is that if I wanted to rename an achievement or rank I could easily change it without even having to touch the database. A benefit I see for using strings is that the data requires less processing before it is used and is more human-readable, which could help in tracking down bugs.
Are there any better ways of storing enums in Mongo? Is there a strong reason to use either integers or strings? (Trying to stay away from a which-is-better question.)
TL;DR: Strings are probably the safer choice, and the performance difference should be negligible. Integers make sense for huge collections where the enum must be indexed. YMMV.
I have thought of storing it using integers which I would assume uses less space than storing strings for everything that could easily be expressed as an integer
True.
Another upside I see of using integers is that if I wanted to rename an achievement or rank I could easily change it without even having to touch the database.
This is a key benefit of integers in my opinion. However, it also requires you to make sure the associated values of the enum don't change. If you screw that up, you'll almost certainly wreak havoc, which is a huge disadvantage.
A benefit I see for using strings is that the data requires less processing before it is used
If you're actually using an enum data type, it's probably some kind of integer internally, so the integer should require less processing. Either way, that overhead should be negligible.
Is there a strong reason to use either integers or strings?
I'm repeating a lot of what's been said, but maybe that helps other readers. Summing up:
Mixing up the enum value map wreaks havoc. Imagine your Declined states suddenly being interpreted as Accepted, because Declined had the value 2 and, after you reordered the enum and forgot to assign the values manually, 2 now maps to Accepted... (shudders). Pinning explicit values guards against this; see the sketch after this list.
Strings are more expressive
Integers take less space. Disk space usually doesn't matter, but index space will eat RAM, which is expensive.
Integer updates don't resize the object. Strings, if their lengths vary greatly, might require a reallocation. Padding and the padding factor should alleviate this, though.
Integers can be used as flags (not queryable yet, unfortunately; see SERVER-3518).
Integers can be queried with $gt / $lt, so you can efficiently implement complex $or queries, though that is a rather arcane requirement, and there's nothing wrong with $or queries...
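A minimal Scala sketch of pinning explicit values so reordering can never change what's stored (the rank names come from the question; the ids are illustrative):

```scala
// Each case carries an explicit, never-to-be-renumbered id and a stable name,
// decoupling the stored value from declaration order and display name:
sealed abstract class Rank(val id: Int, val dbName: String)
object Rank {
  case object User          extends Rank(0, "user")
  case object Moderator     extends Rank(1, "moderator")
  case object Administrator extends Rank(2, "administrator")

  val all = List(User, Moderator, Administrator)
  def fromId(id: Int): Option[Rank]          = all.find(_.id == id)
  def fromDbName(name: String): Option[Rank] = all.find(_.dbName == name)
}
```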

What are the real advantages of immutable collections?

Scala provides immutable collections, such as Set, List, and Map. I understand that immutability has advantages in concurrent programs. However, what exactly are the advantages of immutability in regular data processing?
What if I enumerate subsets, permutations and combinations, for example? Do immutable collections have any advantage here?
What exactly are the advantages of immutability in regular data processing?
Generally speaking, immutable objects are easier/simpler to reason about.
They do. Since you're enumerating over a collection, presumably you want to be certain that elements are not inadvertently added or removed while you're enumerating.
Immutability is very much a paradigm in functional programming. Making collections immutable allows one to think of them much like primitive data types (i.e. modifying a collection or any other object results in creating a different object, just as adding 2 to 3 doesn't modify 3, but creates 5).
To expand on Matt's answer: from my personal experience I can say that implementations of algorithms based on search trees (e.g. breadth-first, depth-first, backtracking) using mutable collections regularly end up as a steaming pile of crap: either you forget to copy a collection before a recursive call, or you fail to correctly take back changes when you get the collection back. In that area immutable collections are clearly superior. I ended up writing my own immutable list in Java when I couldn't get a problem right with Java's collections. Lo and behold, the first "immutable" implementation worked immediately.
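A hedged sketch of that pattern (a hypothetical graph search): passing an immutable Set down the recursion means there is nothing to copy beforehand and nothing to take back when backtracking:

```scala
// All simple paths from `from` to `to`. Because `visited` is immutable,
// each recursive call gets its own logical copy for free; no undo needed.
def paths(graph: Map[Int, List[Int]], from: Int, to: Int,
          visited: Set[Int] = Set.empty): List[List[Int]] =
  if (from == to) List(List(to))
  else graph.getOrElse(from, Nil)
    .filterNot(visited)  // a Set[Int] is also an Int => Boolean
    .flatMap(n => paths(graph, n, to, visited + from).map(from :: _))
```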
If your data doesn't change after creation, use immutable data structures. The type you choose will identify the intent of usage. Anything more specific would require knowledge about your particular problem space.
You may really be looking for a subset, permutation, or combination generator, and then the discussion of data structures is moot.
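For what it's worth, Scala's standard library already ships these generators:

```scala
val xs = List(1, 2, 3)
xs.combinations(2).toList  // List(List(1, 2), List(1, 3), List(2, 3))
xs.permutations.size       // 6
xs.toSet.subsets().size    // 8, including the empty set
```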
Also, you mentioned that you understand the concurrent advantages. Presumably, you're throwing some algorithm at permutations and subsets, and there's a good chance that algorithm can be parallelized to some extent. If that's the case, using immutable structures up front ensures your initial implementation of algorithm X will be easily transformed into concurrent algorithm X.
I have a couple of advantages to add to the list:
Immutable collections can't be invalidated out from under you
That is, it's totally fine to have public immutable val members on a Scala class. They are read-only by definition. Compare to Java, where not only do you have to remember to make the member private, but you also have to write a getter that returns a copy of the object so the original is not modified by the calling code.
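A minimal sketch of that contrast on the Scala side:

```scala
class Profile {
  val roles: Set[String] = Set("user")  // immutable and read-only: safe to expose
}

val profile = new Profile
// profile.roles += "admin"  // does not compile: `roles` is a val holding an immutable Set
```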
Immutable data structures are persistent. This means that the immutable collection obtained by calling filter on your TreeSet actually shares some of its nodes with the original. This translates to time and space savings and offsets some of the penalties incurred by using immutability.
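A small illustration of sharing, using :: and + where the sharing is easiest to see:

```scala
import scala.collection.immutable.TreeSet

val xs = List(2, 3, 4)
val ys = 1 :: xs   // ys is a new list, but its tail IS xs: full sharing

val s1 = TreeSet(1, 2, 3, 4, 5)
val s2 = s1 + 6    // only the path to the new leaf is rebuilt;
                   // the remaining nodes are shared between s1 and s2
```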
Some more advantages of immutability:
1. A smaller margin for error (you always know what's in your collections and read-only variables).
2. You can write concurrent programs without worrying about threads stepping on each other when modifying variables and collections.