Objective-C sparse array redux - iphone

First off, I've seen this, but it doesn't quite seem to suit my needs.
I've got a situation where I need a sparse array. Some situations where I could have, say 3000 potential entries with only 20 allocated, other situations where I could have most or all of the 3000 allocated. Using an NSMutableDictionary (with NSString representations of the integer index values) would appear to work well for the first case, but would seemingly be inefficient for the second, both in storage and lookup speed. Using an NSMutableArray with NSNull objects for the empty entries would work fairly well for the second case, but it seems a bit wasteful (and it could produce an annoying delay at the UI) to insert most of 3000 NSNull entries for the first case.
The referenced article mentions using an NSMapTable, since it supposedly allows integer keys, but apparently that class is not available on iPhone (and I'm not sure I like having an object that doesn't retain, either).
So, is there another option?
Added 9/22
I've been looking at a custom class that embeds an NSMutableSet, with set entries consisting of a custom class with integer (ie, element#) and element pointer, and written to mimic an NSMutableArray in terms of adds/updates/finds (but not inserts/removals). This seems to be the most reasonable approach.

An NSMutableDictionary probably will not be slow: dictionaries generally use hashing and are rather fast. Benchmark it before ruling it out.
Another option is a C array of pointers. Allocating a large array typically only reserves virtual memory until the memory is actually touched (use calloc, not malloc followed by memset). The downside is that memory is committed in 4KB pages, which can be wasteful for a small number of entries; with a large number of entries, many will fall in the same page.
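A minimal sketch of the pointer-array idea, assuming a fixed capacity known up front (3000 in the question). The SparseArray type and the sparse_* function names are made up for illustration, and the table does not retain the objects it points to, so ownership would have to be handled by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity sparse array: a zero-filled table of pointers.
   A NULL slot means "no entry". The table does not own what it points to. */
typedef struct {
    void **slots;
    size_t capacity;
    size_t count;   /* number of non-NULL slots */
} SparseArray;

/* Returns 1 on success, 0 if the allocation failed.
   calloc hands back zero-filled memory; this sketch assumes an all-zero
   pointer is NULL, which holds on the platforms discussed here. */
int sparse_init(SparseArray *a, size_t capacity) {
    a->slots = calloc(capacity, sizeof *a->slots);
    a->capacity = capacity;
    a->count = 0;
    return a->slots != NULL;
}

/* Store value at index; passing NULL clears the slot. */
void sparse_set(SparseArray *a, size_t index, void *value) {
    assert(index < a->capacity);
    if (a->slots[index] == NULL && value != NULL) {
        a->count++;
    } else if (a->slots[index] != NULL && value == NULL) {
        a->count--;
    }
    a->slots[index] = value;
}

/* Returns the stored pointer, or NULL for empty or out-of-range slots. */
void *sparse_get(const SparseArray *a, size_t index) {
    return index < a->capacity ? a->slots[index] : NULL;
}

void sparse_free(SparseArray *a) {
    free(a->slots);
    a->slots = NULL;
    a->capacity = 0;
    a->count = 0;
}
```

With 3000 slots of 8 bytes each the whole table is about 24KB, so both lookup and update are a single indexed load or store whether 20 or 3000 entries are in use.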

What about CFDictionary (or actually CFMutableDictionary)? In the documentation, it says that you can use any C data type as a key, so perhaps that would be closer to what you need?

I've got the custom class going and it works pretty well so far. It's 322 lines of code in the h+m files, including the inner class stuff, a lot of blank lines, comments, description formatter (currently giving me more trouble than anything else) and some LRU management code unrelated to the basic concept. Performance-wise it seems to be working faster than another scheme I had that only allowed "sparseness" on the tail end, presumably because I was able to eliminate a lot of special-case logic.
One nice thing about the approach was that I could make much of the API identical to NSMutableArray, so I only needed to change maybe 25% of the lines that somehow reference the class.

I also needed a sparse array and have put mine on GitHub.
If you need a sparse array, feel free to grab https://github.com/LavaSlider/DSSparseArray

Related

Is there a possibility to create a memory-efficient sequence of bits in the JVM?

I've got a piece of code that takes into account a given amount of features, where each feature is Boolean. I'm looking for the most efficient way to store a set of such features. My initial thought was to try and store these as a BitSet. But then, I realized that this implementation is meant to be used to store numbers in bit format rather than manipulate each bit, which is something I'd like to do (see the effect of switching any feature on and off). I then thought of using a Boolean array, but apparently the JVM uses much more memory for each Boolean element than the one bit it actually needs.
I'm therefore left with the question: What is the most efficient way to store a set of bits that I'd like to treat as independent bits rather than the building blocks of some number?
Please refer to this question: boolean[] vs. BitSet: Which is more efficient?
According to the answer of Peter Lawrey, boolean[] (not Boolean[]) is your way to go since its values can be manipulated and it takes only one byte of memory per bit to store. Consider that there is no way for a JVM application to store one bit in only one bit of memory and let it be directly (array-like) manipulated because it needs a pointer to find the address of the bit and the smallest addressable unit is a byte.
The site you referenced already states that the mutable BitSet is the same as the java.util.BitSet. There is nothing you can do in Java that you can't do in Scala. But since you are using Scala, you probably want a safe implementation which is probably meant to be even multithreaded. Mutable datatypes are not suitable for that. Therefore, I would simply use an immutable BitSet and accept the memory cost.
However, BitSets have their limits (the index is an int, so the size is capped by the maximum int value). If you need larger data sizes, you may use LongBitSets, which are basically Map<Long, BitSet>. If you need even more space, you may nest them in another map, Map<Long, LongBitSet>, but in that case you need to use two or more identifiers (longs).
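To make the "one bit per feature" trade-off concrete, here is a sketch of packing independent flags into 64-bit words, which is the same layout java.util.BitSet uses internally with a long[]. It is written in C rather than Java, and the FeatureBits type and bits_* names are made up for illustration. Each flag costs one bit of storage, at the price of a shift and a mask on every access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed flag set: bit i lives in word i / 64 at position i % 64. */
typedef struct {
    uint64_t *words;
    size_t nbits;
} FeatureBits;

/* Returns 1 on success, 0 if the allocation failed. All flags start off. */
int bits_init(FeatureBits *b, size_t nbits) {
    b->words = calloc((nbits + 63) / 64, sizeof *b->words);
    b->nbits = nbits;
    return b->words != NULL;
}

/* Switch feature i on. */
void bits_set(FeatureBits *b, size_t i) {
    assert(i < b->nbits);
    b->words[i / 64] |= UINT64_C(1) << (i % 64);
}

/* Switch feature i off. */
void bits_clear(FeatureBits *b, size_t i) {
    assert(i < b->nbits);
    b->words[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

/* Toggle feature i. */
void bits_flip(FeatureBits *b, size_t i) {
    assert(i < b->nbits);
    b->words[i / 64] ^= UINT64_C(1) << (i % 64);
}

/* Returns 1 if feature i is on, 0 otherwise. */
int bits_test(const FeatureBits *b, size_t i) {
    assert(i < b->nbits);
    return (int)((b->words[i / 64] >> (i % 64)) & 1);
}

void bits_free(FeatureBits *b) {
    free(b->words);
    b->words = NULL;
    b->nbits = 0;
}
```

A boolean[] avoids the shift and mask but spends a whole byte per flag, so the choice comes down to whether memory or per-access cost matters more for the feature set in question.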

When is my struct too large?

We're encouraged to use struct over class in Swift.
This is because
The compiler can do a lot of optimizations
Instances are created on the stack which is a lot more performant than malloc/free calls
The downside to struct variables is that they are copied each time when returning from or assigned to a function. Obviously, this can become a bottleneck too.
E.g. imagine a 4x4 matrix. 16 Float values would have to be copied on every assign/return, which is 512 bits (or 1,024 bits if the elements are Double, or CGFloat on a 64-bit system).
One way you can avoid this is using inout when passing variables to functions, which is basically Swift's way of creating a pointer. But then we're also discouraged from using inout.
So to my question:
How should I handle large, immutable data structures in Swift?
Do I have to worry creating a large struct with many members?
If yes, when am I crossing the line?
The accepted answer does not entirely answer the question you had: Swift always copies structs. The trick that Array/Dictionary/String/etc. use is that they are just wrappers around classes (which contain the actual stored properties). That way sizeof(Array) is just the size of the pointer to that class (MemoryLayout<Array<String>>.stride == MemoryLayout<UnsafeRawPointer>.stride).
If you have a really big struct, you might want to consider wrapping its stored properties in a class for efficient passing around as arguments, and checking isKnownUniquelyReferenced (called isUniquelyReferenced before Swift 3) before mutating to give COW semantics.
Structs have other efficiency benefits: they don't need reference-counting and can be decomposed by the optimiser.
In Swift, values keep a unique copy of their data. There are several
advantages to using value-types, like ensuring that values have
independent state. When we copy values (the effect of assignment,
initialization, and argument passing) the program will create a new
copy of the value. For some large values these copies could be time
consuming and hurt the performance of the program.
https://github.com/apple/swift/blob/master/docs/OptimizationTips.rst#the-cost-of-large-swift-values
Also the section on container types:
Keep in mind that there is a trade-off between using large value types
and using reference types. In certain cases, the overhead of copying
and moving around large value types will outweigh the cost of removing
the bridging and retain/release overhead.
From the very bottom of this page from the Swift Reference:
NOTE
The description above refers to the “copying” of strings, arrays, and dictionaries. The behavior you see in your code will always be as if a copy took place. However, Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so. Swift manages all value copying to ensure optimal performance, and you should not avoid assignment to try to preempt this optimization.
I hope this answers your question. Also, if you want to be sure that an array doesn't get copied, you can always declare the parameter as inout and pass it with &array into the function.
Also classes add a lot of overhead and should only be used if you really must have a reference to the same object.
Examples for structs:
Timezone
Latitude/Longitude
Size/Weight
Examples for classes:
Person
A View

How does the "Implementing FP languages with fast equality, sets and maps..." technique deal with garbage collection?

This paper presents a technique for the implementation of functional languages with fast equality, sets and maps, using hash-consing under the hood. As far as my understanding goes, it uses the address of a hash-consed value as its key when inserting it into a map. This has the advantage that computing the hashed key of essentially any value is O(1), as opposed to the standard O(N). What I don't understand, though, is: what happens with a map after a garbage collection? Since the GC process will cause the address of every value to change, the configuration of the map will be incorrect. In other words, there is no guarantee that addr(value) will be the same for the lifetime of the program.
Since the GC process will cause the address of every value to change
Only moving garbage collectors do that. When using non-moving algorithms like mark-and-sweep, all that happens is that unused objects are freed during the GC cycle - used objects stay exactly where they are.
Moving garbage collectors are generally seen as preferable to mark-and-sweep, but according to the abstract of the paper "mark-and-sweep becomes fast in a maximal sharing environment", which is further expanded on in section 2.4.4.
The paper also describes a way to make moving garbage collectors work (by assigning each object a unique id and using that instead of its address), but deems that impractical (section 2.4.2).

best way of handling self-changing array of information

This question is about handling arrays of information. There are many ways I could do this, but I would like some input from programmers with more experience. I know what I want to do, just not how to organize the information the best way, and Objective-C is really making me ponder this. I don't want to get 100 hours into the work and then decide, oops, this wasn't the best way to do this. So here goes:
I have a grid where I'm simulating a playing field; each piece of the grid I call a cell. The cells have around 20 different values each, all integers, nothing fancy. A change to a cell will be caused either by player input or by surrounding cells through different algorithms.
The changes to cells will occur once a turn is complete, so it's not real time. Now, I'm not even sure about doing this with mutable arrays, a plain array, or just a plain matrix. Arrays are good at keeping such info for one dimension, but I imagine they would become quite cumbersome if you have to address a batch of 10,000 of these cells. On the other hand, a simple matrix might not be so elegant, but is probably easier to work with.
Any insight would be greatly appreciated.
You have two options here that I see:
1) Use standard containers
Assuming that the playing field is of constant size, then you can create a mutable array of x*y size, and populate it with mutable dictionaries. By giving everything in the second mutable dictionary keys, you can query and set their properties (all objects of course, so wrap ints in NSNumbers etc). For indexing use a macro INDEX_FROM_ROW_COL(row, col) and apply the appropriate code to multiply/add.
2) Create a helper object subclassed from NSObject. It would manage mutable objects as above, but you could load it with functionality specific to your application. You could provide methods that have parameters of "row:" and "col:", and methods that change or set properties of each cell based on some criteria. Personally, I think this is a better idea, as you can encapsulate logic here and make the interface to it more high level. It will make it easier to log what's going on too.
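The row/column indexing from option 1 can be sketched as below. This is an illustration under stated assumptions, not the answerer's code: the grid dimensions are invented, the INDEX_FROM_ROW_COL body is one plausible definition of the macro named above, and plain C structs stand in for the NSMutableArray of NSMutableDictionary objects since every cell value is an integer. The grid_* helpers play the role of the "row:" and "col:" methods from option 2.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed sizes for illustration only. */
enum { GRID_ROWS = 100, GRID_COLS = 100, CELL_VALUES = 20 };

/* Row-major index into a flat array of GRID_ROWS * GRID_COLS cells. */
#define INDEX_FROM_ROW_COL(row, col) ((size_t)(row) * GRID_COLS + (size_t)(col))

typedef struct { int values[CELL_VALUES]; } Cell;
typedef struct { Cell *cells; } Grid;

/* Returns 1 on success, 0 if the allocation failed. All values start at 0. */
int grid_init(Grid *g) {
    g->cells = calloc((size_t)GRID_ROWS * GRID_COLS, sizeof *g->cells);
    return g->cells != NULL;
}

/* Bounds-checked access to one cell. */
Cell *grid_cell(Grid *g, int row, int col) {
    assert(row >= 0 && row < GRID_ROWS && col >= 0 && col < GRID_COLS);
    return &g->cells[INDEX_FROM_ROW_COL(row, col)];
}

/* Sum of value number `which` over the (up to 8) surrounding cells,
   as an example of a rule driven by neighbouring cells. */
int grid_neighbor_sum(const Grid *g, int row, int col, int which) {
    assert(which >= 0 && which < CELL_VALUES);
    int sum = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            int r = row + dr, c = col + dc;
            if ((dr == 0 && dc == 0) || r < 0 || r >= GRID_ROWS ||
                c < 0 || c >= GRID_COLS) {
                continue;   /* skip the cell itself and anything off the board */
            }
            sum += g->cells[INDEX_FROM_ROW_COL(r, c)].values[which];
        }
    }
    return sum;
}

void grid_free(Grid *g) {
    free(g->cells);
    g->cells = NULL;
}
```

Because changes are applied once per turn, a second Grid could hold the next turn's state so that neighbour rules always read the previous turn's values; that is a design suggestion, not something the answer above prescribes.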

Optimizing a Cocoa/Objective-C search

I'm performing a search of a large plist file which contains dictionaries, tens of thousands of them, each with 2 key/string pairs. My search algorithm goes through the dictionaries, and when it finds a text match in either of the strings in a dictionary, the contents of that dictionary are inserted into the results. Here is how it works:
NSDictionary *eachEntry;
NSArray *rawGlossaryArray = [[NSArray alloc] initWithContentsOfFile:thePath]; // this contains the contents of the plist
for (eachEntry in rawGlossaryArray)
{
    GlossaryEntry *anEntry = [[GlossaryEntry alloc] initWithDictionary:eachEntry];
    NSRange titleResultsRange = [anEntry.title rangeOfString:filterString options:NSCaseInsensitiveSearch];
    NSRange defResultsRange = [anEntry.definition rangeOfString:filterString options:NSCaseInsensitiveSearch];
    if (titleResultsRange.length > 0 || defResultsRange.length > 0) {
        // store that item in the glossary dictionary with the name as the key
        [glossaryDictionary setObject:anEntry forKey:anEntry.title];
    }
    [anEntry release];
}
Each time a search is performed, there is a delay of around 3-4 seconds in my iPhone app (on the device at least; everything runs pretty quickly in the simulator). Can anyone advise on how I might optimize this search?
Without looking at the data set I can't be sure, but if you profile it you will probably find that you are spending the vast majority of your time in -rangeOfString:options:. If that is the case, you will not be able to improve performance without fundamentally changing the data structure you are using to store your data.
You might want to construct some sort of trie with strings and substrings pointing to the objects. It is a much more complicated thing to set up, and insertions into it will be more expensive, but lookup would be very fast. Given that you are serializing out the structure anyway, expensive inserts should not be much of an issue.
That just cries out for using a database, that you pre-populate and put into the application.
A few suggestions:
You're doing a lot of allocing and releasing in that loop. Could you create a single GlossaryEntry before the loop, then just reload its contents inside the loop? This would avoid a bunch of alloc/releases.
Rather than loading the file each time, could you lazy load it once and keep it cached in memory (maybe in a singleton type object)? Generally this isn't a good idea on the iPhone, but you could have some code in your "didReceiveMemoryWarning" handler that would free the cache if it became an issue.
You should run your application in Instruments and see what the bottleneck really is. Performance optimizations in the blind are really difficult, and we have tools to make them clear, and the tools are good too!
There's also the possibility that this isn't optimizable. I'm not sure if it's actually hanging the UI in your app or just taking a long time. If it's blocking the UI you need to get out of the main thread to do this work. Same with any significant work to keep an app responsive.
Try the following, and see if you get any improvement:
1) use
- (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask
and as mask, pass the value NSLiteralSearch. This may speed up the search considerably, as described in the Apple documentation (String Programming Guide for Cocoa):
NSLiteralSearch Performs a byte-for-byte comparison. Differing literal sequences (such as composed character sequences) that would otherwise be considered equivalent are considered not to match. Using this option can speed some operations dramatically.
2) From the documentation (String Programming Guide for Cocoa):
If you simply want to determine whether a string contains a given pattern, you can use a predicate:
BOOL match = [myPredicate evaluateWithObject:myString];
For more about predicates, see Predicate Programming Guide.
You're probably getting the best performance you're likely to get, given your current data structures. You need to change how you're accessing the data, in order to get better performance.
Suggestions, in no particular order:
Don't create your GlossaryEntry objects in a loop while you're filtering them. Rather than storing the data in a Property List, just archive your array of GlossaryEntry objects. See the NSCoding documentation.
Rather than searching through tens of thousands of strings at every keystroke, generate an index of common substrings (maybe 2 or 3 letters), and create an NSDictionary that maps from that common substring to the set of results to use as an index. You can create the index at build time, rather than at run-time. If you can slice up your data set into several smaller pieces, the linear search for matching strings will be considerably faster.
Store your data in an SQLite database, and use SQL to query it - probably overkill for just this problem, but allows for more sophisticated searches in the future, if you'll need them.
If creating a simple index doesn't work well enough, you'll need to create a search tree style data structure.
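The substring-index suggestion above can be sketched roughly as follows. This is a hedged illustration in plain C, not the answerer's implementation: the BigramIndex type and index_* names are invented, it indexes 2-character substrings only, and it folds case for ASCII letters only (NSCaseInsensitiveSearch handles far more). The index returns candidates; each candidate still has to be confirmed with the real string search.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Each character is folded to one of 37 symbols: a-z (case-insensitive),
   0-9, and a single catch-all for everything else. */
enum { SYMBOLS = 37, BUCKETS = SYMBOLS * SYMBOLS };

typedef struct { int *ids; size_t count, cap; } Bucket;
typedef struct { Bucket buckets[BUCKETS]; } BigramIndex;  /* zero-initialize before use */

static int symbol(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= '0' && c <= '9') return 26 + (c - '0');
    return 36;
}

static size_t bigram_key(const char *p) {
    return (size_t)symbol((unsigned char)p[0]) * SYMBOLS
         + (size_t)symbol((unsigned char)p[1]);
}

/* Append id unless it is already the last element. Entries are added in
   increasing id order, so this is enough to avoid duplicates. */
static int bucket_add(Bucket *b, int id) {
    if (b->count > 0 && b->ids[b->count - 1] == id) return 1;
    if (b->count == b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 8;
        int *grown = realloc(b->ids, cap * sizeof *grown);
        if (grown == NULL) return 0;
        b->ids = grown;
        b->cap = cap;
    }
    b->ids[b->count++] = id;
    return 1;
}

/* Record every adjacent character pair of text under entry number id.
   Call with ids in increasing order. Returns 0 if memory ran out. */
int index_add(BigramIndex *ix, int id, const char *text) {
    assert(id >= 0);
    for (const char *p = text; p[0] != '\0' && p[1] != '\0'; p++) {
        if (!bucket_add(&ix->buckets[bigram_key(p)], id)) return 0;
    }
    return 1;
}

/* Entries that might contain filter, looked up by its first two characters.
   Filters shorter than two characters yield no candidates here and would
   need the plain linear scan instead. */
const int *index_candidates(const BigramIndex *ix, const char *filter, size_t *count) {
    if (filter[0] == '\0' || filter[1] == '\0') {
        *count = 0;
        return NULL;
    }
    const Bucket *b = &ix->buckets[bigram_key(filter)];
    *count = b->count;
    return b->ids;
}

void index_free(BigramIndex *ix) {
    for (size_t i = 0; i < BUCKETS; i++) free(ix->buckets[i].ids);
    memset(ix, 0, sizeof *ix);
}
```

Since the glossary is fixed at build time, such an index could be generated once and shipped with the app, leaving only the short candidate list to scan on each keystroke.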
You should profile it in Instruments to find where the bottleneck actually is. If I had to guess, I would say the bottleneck would be [[NSArray alloc] initWithContentsOfFile:thePath].
Having said that, you'd probably get the best performance by storing the data in an SQLite database (which you would search with SQL) instead of using a plist.