Is there a way to accurately measure heap allocations in Unity for unit testing? - unity3d

Allocating temporary objects on the heap every frame in Unity is costly, and we all do our best to avoid this by caching heap objects and avoiding garbage generating functions. It's not always obvious when something will generate garbage though. For example:
enum MyEnum {
    Zero,
    One,
    Two
}

List<MyEnum> MyList = new List<MyEnum>();
MyList.Contains(MyEnum.Zero); // Generates garbage
MyList.Contains() generates garbage because the default equality comparer for List<T> works in terms of System.Object, which boxes the enum value type on every comparison.
In order to prevent inadvertent heap allocations like these, I would like to be able to detect them in my unit tests.
I think there are 2 requirements for this:
A function to return the amount of heap allocated memory
A way to prevent garbage collection occurring during the test
I haven't found a clean way to ensure #2. The closest thing I've found for #1 is GC.GetTotalMemory():
[UnityTest]
IEnumerator MyTest()
{
    long before = GC.GetTotalMemory(false);
    const int numObjects = 1;
    for (int i = 0; i < numObjects; ++i)
    {
        System.Version v = new System.Version();
    }
    long after = GC.GetTotalMemory(false);
    Assert.That(before == after);
    yield return null; // a [UnityTest] coroutine must contain a yield to compile
}
The problem is that GC.GetTotalMemory() returns the same value before and after in this test. I suspect that Unity/Mono only requests memory from the system heap in chunks, say 4 KB, so you need to allocate more than the current chunk's remaining space before Unity/Mono will actually request more memory from the system heap, at which point GC.GetTotalMemory() will return a different value. I confirmed that if I change numObjects to 1000, GC.GetTotalMemory() returns different values for before and after.
So, in summary:
1. How can I accurately measure the amount of heap-allocated memory, down to the byte?
2. Can the garbage collector run during the body of my test, and if so, is there any non-hacky way of disabling GC for the duration of the test?
Thanks for your help!

I posted the same question over on Unity answers and got a reply:
https://answers.unity.com/questions/1535588/is-there-a-way-to-accurately-measure-heap-allocati.html
No, basically it's not possible. You could run your unit test a bunch of times in a loop and hope that it generates enough garbage to cause a change in the value returned by GC.GetTotalMemory(), but that's about it.
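The workaround the answer hints at (amortizing over many allocations until the delta becomes visible) is easy to demonstrate on a plain JVM as well. Below is a minimal Java sketch, using Runtime.totalMemory()/freeMemory() as a rough analog of GC.GetTotalMemory(); the class and field names here are mine, not part of any of the APIs discussed above:

```java
public class HeapDelta {
    static byte[] retained; // keep the allocation alive so the GC cannot reclaim it

    // Rough analog of GC.GetTotalMemory(false): bytes currently in use on the JVM heap.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // best effort: reduce noise from earlier allocations
        long base = usedHeap();

        retained = new byte[16]; // a tiny allocation may not move the counter at all
        long afterSmall = usedHeap();

        retained = new byte[32 * 1024 * 1024]; // large enough to force the heap to grow
        long afterLarge = usedHeap();

        System.out.println("tiny delta:  " + (afterSmall - base));
        System.out.println("large delta: " + (afterLarge - base));
    }
}
```

On a typical JVM the tiny delta is frequently zero while the large one is tens of megabytes, mirroring the chunked behaviour described in the question; none of this is accurate to the byte either.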

Related

Scala Array.view memory usage

I'm learning Scala and have been trying some LeetCode problems with it, but I'm having trouble with the memory limit being exceeded. One problem I have tried goes like this:
A swap is defined as taking two distinct positions in an array and swapping the values in them.
A circular array is defined as an array where we consider the first element and the last element to be adjacent.
Given a binary circular array nums, return the minimum number of swaps required to group all 1's present in the array together at any location.
and my attempted solution looks like
object Solution {
  def minSwaps(nums: Array[Int]): Int = {
    val count = nums.count(_ == 1)
    if (count == 0) return 0
    val circular = nums.view ++ nums.view
    circular.sliding(count).map(_.count(_ == 0)).min
  }
}
however, when I submit it, I'm hit with Memory Limit Exceeded for one of the test cases where nums is very large.
My understanding is that, because I'm using .view, I shouldn't be allocating more than O(1) extra memory. Is that understanding incorrect? To be clear, I realise this isn't the most time-efficient way of solving this, but I didn't expect it to be memory-inefficient.
The version used is Scala 2.13.7, in case that makes a difference.
Update
I did some inspection of the types, and it seems circular is only a View unless I replace ++ with concat, which makes it an IndexedSeqView. Why is that? I thought ++ was just an alias for concat.
If I make the above change and replace circular.sliding(count) with (0 to circular.size - count).view.map(i => circular.slice(i, i + count)), it "succeeds" in hitting the time limit instead, so I suspect sliding is not optimised for IndexedSeqView.
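As an aside, the underlying problem can be solved in genuinely O(1) extra memory without views at all, by sliding a window of length count over the circular array and updating the zero count incrementally instead of recounting each window. A sketch of that idea, written here in Java (the minSwaps name mirrors the Scala version above):

```java
public class MinSwaps {
    // Minimum swaps to group all 1s together in a circular binary array:
    // equal to the fewest zeros in any circular window whose length is the total 1-count.
    static int minSwaps(int[] nums) {
        int n = nums.length;
        int count = 0;
        for (int v : nums) if (v == 1) count++;
        if (count == 0) return 0;

        int zeros = 0;
        for (int i = 0; i < count; i++) if (nums[i] == 0) zeros++; // first window
        int best = zeros;
        for (int i = 1; i < n; i++) {                    // window now starts at i
            if (nums[i - 1] == 0) zeros--;               // element leaving the window
            if (nums[(i + count - 1) % n] == 0) zeros++; // element entering (wraps around)
            best = Math.min(best, zeros);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minSwaps(new int[]{0, 1, 0, 1, 1, 0, 0})); // prints 1
    }
}
```

Only a handful of int variables are live at any point, so the memory usage is constant regardless of how large nums is.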

Scala Buffer: Size or Length?

I am using a mutable Buffer and need to find out how many elements it has.
Both size and length methods are defined, inherited from separate traits.
Is there any actual performance difference, or can they be considered exact synonyms?
They are synonyms, largely a consequence of Java's decision to use size for collections and length for Array and String. One will always be defined in terms of the other, and you can easily see which is which by looking at the source code, which is linked from the scaladoc. Just find the defining trait, open the source, and search for def size or def length.
In this case, they can be considered synonyms. You may want to watch out with some other cases such as Array - whilst length and size will always return the same result, in versions prior to Scala 2.10 there may be a boxing overhead for calling size (which is provided by a Scala wrapper around the Array), whereas length is provided by the underlying Java Array.
In Scala 2.10, this overhead has been removed by use of a value class providing the size method, so you should feel free to use whichever method you like.
As of Scala-2.11, these methods may have different performance. For example, consider this code:
val bigArray = Array.fill(1000000)(0)
val beginTime = System.nanoTime()
var i = 0
while (i < 2000000000) {
  i += 1
  bigArray.length
}
val endTime = System.nanoTime()
println(endTime - beginTime)
sys.exit(-1)
Running this on my amd64 machine takes about 2423834 nanos (it varies from run to run).
Now, if I change the length call to size, it takes about 70764719 nanos.
This is more than 20x slower.
Why does this happen? I haven't dug into it, so I don't know. But there are scenarios where length and size perform drastically differently.
They are synonyms, as the scaladoc for Buffer.size states:
The size of this buffer, equivalent to length.
The scaladoc for Buffer.length is explicit too:
The length of the buffer. Note: xs.length and xs.size yield the same result.
Simple advice: refer to the scaladoc before asking a question.
UPDATE: Just saw your edit adding the mention of performance. As Daniel C. Sobral said, one is normally implemented in terms of the other, so they have the same performance.

Optimal HashSet Initialization (Scala | Java)

I'm writing an AI to solve a "Maze of Life" puzzle. Attempting to store states in a HashSet slows everything down; it's faster to run it without a set of explored states. I'm fairly confident my node (state storage) implements equals and hashCode well, as tests show a HashSet doesn't add duplicate states. I may need to rework the hashCode function, but I believe what's slowing it down is the HashSet rehashing and resizing.
I've tried setting the initial capacity to a very large number, but it's still extremely slow:
val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
val frontier = new QuickQueue[Node](initCapacity)
Here is the quick queue code:
class QuickQueue[T](capacity: Int) {
  val hashSet = new HashSet[T](capacity)
  val queue = new Queue[T]
  // methods below
}
For more info, here is the hash function. I store the grid values in bytes in two arrays and access it using tuples:
override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
    for (x <- 0 until grid.width) {
      sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
    }
    sum += Math.pow(sum, y).toInt
  }
  return sum
}
Any suggestions on how to setup a HashSet that wont slow things down? Maybe another suggestion of how to remember explored states?
P.S. using java.util.HashSet, and even with the initial capacity set, it takes 80 seconds vs. < 7 seconds without the set
Okay, for a start, please replace
override def hashCode(): Int =
with
override lazy val hashCode: Int =
so you don't calculate (grid.height*grid.width) floating point powers every time you need to access the hash code. That should speed things up by an enormous amount.
Then, unless you somehow rely upon close cells having close hash codes, don't re-invent the wheel. Use scala.util.hashing.MurmurHash3.seqHash or somesuch to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the lazy val.)
Then you only have overhead from the required set operations. Right now, unless you have a lot of 0x0 grids, you are using up the overwhelming majority of your time waiting for math.pow to give you a result (and risking everything becoming Double.PositiveInfinity or 0.0, depending on how big the values are, which will create hash collisions which will slow things down still further).
Note that the following assumes all your objects are immutable. This is a sane assumption when using hashing.
Also you should profile your code before applying optimization (use e.g. the free jvisualvm, that comes with the JDK).
Memoization for fast hashCode
Computing the hash code is usually a bottleneck. By computing the hash code only once for each object and storing the result you can reduce the cost of hash code computation to a minimum (once at object creation) at the expense of increased space consumption (probably moderate). To achieve this turn the def hashCode into a lazy val or val.
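Java has no lazy val, but the same memoization is conventionally done by caching the hash in a field, the way java.lang.String does. A sketch, assuming an immutable state class holding the grid bytes (the State name and layout are illustrative, not from the question):

```java
import java.util.Arrays;

// Sketch of hash-code memoization for an immutable state object.
final class State {
    private final byte[] cells;
    private int cachedHash; // 0 means "not computed yet" (same trick java.lang.String uses)

    State(byte[] cells) {
        this.cells = cells.clone(); // defensive copy keeps the state immutable
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {                       // computed at most once per object
            h = Arrays.hashCode(cells);     // (recomputed only in the rare case the hash is 0)
            cachedHash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof State && Arrays.equals(cells, ((State) o).cells);
    }
}
```

The trade-off is one extra int field per object in exchange for never recomputing the hash during set lookups.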
Interning for fast equals
Once you have the cost of hashCode eliminated, computing equals becomes a problem. equals is particularly expensive for collection fields and deep structures in general.
You can minimize the cost of equals by interning. This means that you acquire new objects of the class through a factory method, which checks whether the requested new object already exists, and if so, returns a reference to the existing object. If you assert that every object of this type is constructed in this way you know that there is only one instance of each distinct object and equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).
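A minimal sketch of such an interning factory, in Java (the Point class is illustrative, and a real version would also need to consider thread safety and eviction of unused instances):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of interning: the factory hands out one canonical instance per distinct
// value, so equality checks can become reference comparisons (== in Java, eq in Scala).
final class Point {
    final int x, y;
    private static final Map<Point, Point> pool = new HashMap<>();

    private Point(int x, int y) { this.x = x; this.y = y; } // construction only via of()

    static Point of(int x, int y) {
        Point candidate = new Point(x, y);
        // Returns the existing canonical instance if one exists, else registers candidate.
        return pool.computeIfAbsent(candidate, p -> p);
    }

    @Override public int hashCode() { return 31 * x + y; }
    @Override public boolean equals(Object o) {
        return o instanceof Point && x == ((Point) o).x && y == ((Point) o).y;
    }
}
```

Because of() always returns the canonical instance, two structurally equal points are the same object, and a set of explored states can compare them by reference.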

Scala: Hash ignores initial size (fast hash table for billions of entries)

I am trying to find out how well Scala's hash functions scale for big hash tables (with billions of entries, e.g. to store how often a particular bit of DNA appeared).
Interestingly, however, both HashMap and OpenHashMap seem to ignore the parameters which specify the initial size (2.9.2 and 2.10.0, latest build).
I think this is so because adding new elements becomes much slower after the first 800,000 or so.
I have tried increasing the entropy in the strings which are to be inserted (only the chars ACGT in the code below), without effect.
Any advice on this specific issue? I would also be grateful to hear your opinion on whether using Scala's inbuilt types is a good idea for a hash table with billions of entries.
import scala.collection.mutable.{ HashMap, OpenHashMap }
import scala.util.Random

object HelloWorld {
  def main(args: Array[String]) {
    val h = new collection.mutable.HashMap[String, Int] {
      override def initialSize = 8388608
    }
    // val h = new scala.collection.mutable.OpenHashMap[Int,Int](8388608)
    for (i <- 0 until 10000000) {
      val kMer = genkMer()
      if (!h.contains(kMer)) {
        h(kMer) = 0
      }
      h(kMer) = h(kMer) + 1
      if (i % 100000 == 0) {
        println(h.size)
      }
    }
    println("Exit. Hashmap size:\n")
    println(h.size)
  }

  def genkMer(): String = {
    val nucs = "A" :: "C" :: "G" :: "T" :: Nil
    var s: String = ""
    val r = new scala.util.Random
    val nums = for (i <- (1 to 55).toList) yield r.nextInt(4)
    for (i <- 0 until 55) {
      s = s + nucs(nums(i))
    }
    s
  }
}
I wouldn't use Java data structures to manage a map of billions of entries. Reasons:
The max buckets in a Java HashMap is 2^30 (~1B), so
with default load factor you'll fail when the map tries to resize after 750 M entries
you'll need to use a load factor > 1 (5 would theoretically get you 5 billion items, for example)
With a high load factor you're going to get a lot of hash collisions and both read and write performance is going to start to degrade badly
Once you actually exceed Integer.MAX_VALUE entries I have no idea what gotchas exist -- .size() on the map wouldn't be able to return the real count, for example
I would be very worried about running a 256 GB heap in Java -- if you ever hit a full GC it is going lock the world for a long time to check the billions of objects in old gen
If it were me I'd be looking at an off-heap solution: a database of some sort. If you're just storing (hashcode, count) pairs then one of the many key-value stores out there might work. The biggest hurdle is finding one that can support many billions of records (some max out at 2^32).
If you can accept some error, probabilistic methods might be worth looking at. I'm no expert here, but the stuff listed here sounds relevant.
First, you can't override initialSize; I think Scala lets you because it's package private in HashTable:
private[collection] final def initialSize: Int = 16
Second, if you want to set the initial size, you would have to hand it a HashTable of the initial size that you want. So there's really no good way of constructing this map without starting at 16, but it does grow by powers of 2, so each resize should get better.
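By contrast, java.util.HashMap does honour a requested capacity through its constructor, which is one concrete reason to prefer the Java collections here. A sketch of pre-sizing for the expected load (the 8388608 figure is the one from the question; the helper name is mine):

```java
import java.util.HashMap;
import java.util.Map;

public class Presized {
    // Size the table so `expected` entries fit without ever triggering a resize/rehash.
    static Map<String, Integer> presized(int expected, float loadFactor) {
        int capacity = (int) Math.ceil(expected / loadFactor);
        return new HashMap<>(capacity, loadFactor);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = presized(8_388_608, 0.75f); // 0.75 is HashMap's default

        // merge() does the contains/insert/increment dance from the question in one call.
        counts.merge("ACGT", 1, Integer::sum);
        counts.merge("ACGT", 1, Integer::sum);
        System.out.println(counts.get("ACGT")); // prints 2
    }
}
```

HashMap rounds the capacity up to the next power of two internally, so the table allocated here will not need to grow while the first ~8.4M distinct keys are inserted.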
Third, scala collections are relatively slow, I would recommend java/guava/etc collections instead.
Finally, billions of entries is a bit much for most hardware, you'll probably run out of memory. You'll most likely need to use memory mapped files, here's a good example (no hashing though):
https://github.com/peter-lawrey/Java-Chronicle
UPDATE 1
Here's a good drop in replacement for java collections:
https://github.com/boundary/high-scale-lib
UPDATE 2
I ran your code and it did slow down around 800,000 entries, but then I boosted the java heap size and it ran fine. Try using something like this for jvm:
-Xmx2G
Or, if you want to use every last bit of your memory:
-Xmx256G
These are the wrong data structures. You will hit a RAM limit pretty fast (unless you have 100+ GB, and even then you will still hit limits very fast).
I don't know if suitable data structures exist for scala, although someone will have done something with Java probably.

In javascript, what are the trade-offs for defining a function inline versus passing it as a reference?

So, let's say I have a large set of elements to which I want to attach event listeners. E.g. a table where I want each row to turn red when clicked.
So my question is which of these is the fastest, and which uses the least memory. I understand that it's (usually) a tradeoff, so I would like to know my best options for each.
Using the table example, let's say there's a list of all the row elements, "rowList":
Option 1:
for (var r in rowList) {
  rowList[r].onclick = function() { this.style.backgroundColor = "red"; };
}
My gut feeling is that this is the fastest, since there is one less pointer dereference, but the most memory intensive, since each row will have its own copy of the function, which might get serious if the onclick function is large.
Option 2:
function turnRed() {
  this.style.backgroundColor = "red";
}

for (var r in rowList) {
  rowList[r].onclick = turnRed;
}
I'm guessing this is going to be only a teensy bit slower than the one above (oh no, one more pointer dereference!) but a lot less memory intensive, since the browser only needs to keep track of one copy of the function.
Option 3:
var turnRed = function() {
  this.style.backgroundColor = "red";
};

for (var r in rowList) {
  rowList[r].onclick = turnRed;
}
I assume this is the same as option 2, but I just wanted to throw it out there. For those wondering what the difference between this and option 2 is: JavaScript differences defining a function
Bonus Section: Jquery
Same question with:
$('tr').click(function(){this.style.backgroundColor = "red"});
Versus:
function turnRed(){this.style.backgroundColor = "red"};
$('tr').click(turnRed);
And:
var turnRed = function(){this.style.backgroundColor = "red"};
$('tr').click(turnRed);
Here's your answer:
http://jsperf.com/function-assignment
Option 2 is way faster and uses less memory. The reason is that Option 1 creates a new function object for every iteration of the loop.
In terms of memory usage, your Option 1 is creating a distinct function closure for each row in your array. This approach will therefore use more memory than Option 2 and Option 3, which only create a single function and then pass around a reference to it.
For this same reason I would also expect Option 1 to be the slowest of the three. Of course, the difference in terms of real-world performance and memory usage will probably be quite small, but if you want the most efficient one then pick either Option 2 or Option 3 (they are both pretty much the same, the only real difference between the two is the scope at which turnRed is visible).
As for jQuery, all three options will have the same memory usage and performance characteristics. In every case you are creating and passing a single function reference to jQuery, whether you define it inline or not.
And one important note that is not brought up in your question is that using lots of inline functions can quickly turn your code into an unreadable mess and make it more difficult to maintain. It's not a big deal here since you only have a single line of code in your function, but as a general rule if your function contains more than 2-3 lines it is a good idea to avoid defining it inline. Instead define it as in Option 2 or Option 3 and then pass around a reference to it.