Memory Efficient and Speedy iPhone/Android Dictionary Storage/Access

I'm having trouble with memory on older generation iPhones (iPod touch 1st gen, 2nd gen, etc.). This is due to the amount of memory allocated when I load and store a 170k-word dictionary.
This is the code (very simple):
string[] words = dictionaryRef.text.Split("\n"[0]);
_words = new List<string>(words);
This allocates around 12 MB at startup; the iPhone has around 43 MB available, I think. So with that plus textures, sounds, and the OS, it tends to break.
Speed-wise, accessing it with a binary search is fine. The problem is storing it in memory more efficiently (and loading it more efficiently).
The text.Split appears to take up a lot of heap memory.
Any advice?

You can't count too much on how much memory these pre-3.0 devices have available on startup. 43 MB is rather optimistic. Is your app just checking to see if the word is in the list or not? You might want to roll your own hash table instead of using a binary search. I'd search some of the literature and stack overflow to look for efficient ways to store a large dictionary with the particular word sizes you have. A google search on hash table might give you a better implementation.
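To make the hand-rolled hash table idea concrete, here is a toy sketch in plain C (the question is Unity/C#, but the structure is the same in any language); the table size and names here are illustrative, not from the question, and a real version would size the table from the word count and guard against filling up:

/* Toy open-addressed hash set over C strings: djb2 hash + linear probing.
   Illustrative only - no resizing, and it assumes the table never fills. */
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 262144                  /* power of two, comfortably > 170k */

typedef struct {
    const char *slots[TABLE_SIZE];
} WordSet;

static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    while (*s) h = ((h << 5) + h) + (unsigned char)*s++;   /* h * 33 + c */
    return h;
}

WordSet *wordset_create(void) {
    return calloc(1, sizeof(WordSet));     /* all slots start out NULL */
}

void wordset_add(WordSet *set, const char *word) {
    unsigned long i = djb2(word) & (TABLE_SIZE - 1);
    while (set->slots[i] && strcmp(set->slots[i], word) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);    /* probe the next slot */
    set->slots[i] = word;                  /* caller keeps the string alive */
}

int wordset_contains(const WordSet *set, const char *word) {
    unsigned long i = djb2(word) & (TABLE_SIZE - 1);
    while (set->slots[i]) {
        if (strcmp(set->slots[i], word) == 0) return 1;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}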

Use SQLite. It will use less memory and be faster. Create an index on your words column and voila, you have binary search, without having the whole dictionary loaded in memory.
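As a rough sketch of what that lookup can look like from native code (a Unity app would reach SQLite through a C# binding, but the query is identical), assuming a words(word TEXT) table with an index on word:

/* Minimal sketch against the sqlite3 C API: check whether a word exists
   without ever holding the 170k-word list in memory. */
#include <sqlite3.h>

int word_exists(sqlite3 *db, const char *word) {
    sqlite3_stmt *stmt = NULL;
    int found = 0;
    if (sqlite3_prepare_v2(db,
            "SELECT 1 FROM words WHERE word = ? LIMIT 1",
            -1, &stmt, NULL) != SQLITE_OK) {
        return 0;
    }
    sqlite3_bind_text(stmt, 1, word, -1, SQLITE_TRANSIENT);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        found = 1;               /* the index makes this a log-time lookup */
    }
    sqlite3_finalize(stmt);
    return found;
}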

First, if dictionaryRef.text is a string (and it looks like it is), then you already have something huge being allocated (2 bytes per character). Check this; it may well account for a large share (nearly half) of the total memory being allocated. You should think about caching this (the database idea is a good one, but a plain file would also do, which you could then read with File.ReadAllLines on future runs).
Next, you can try to do a bit better than Mono's Split method. It builds a List and then turns it into an array (calling ToArray) at the end - which you then wrap in yet another new List. Since your requirement (only '\n') is fairly basic, I suggest you roll your own Split method (or copy/paste/reduce the one from Mono) and avoid the temporary memory allocations.
In any case, take a lot of (memory) measurements, since allocations, especially for strings, often occur where we don't look ;-)

I would have to agree with Morningstar that using a SQLite backend for your word storage sounds like the best solution to what you are trying to do.
However, if you insist on using a word list, here's a suggestion:
It looks to me like dictionaryRef.text is constructed by reading a text file in its entirety (File.ReadAllText() or some such).
Instead of doing that, why not use TextReader.ReadLine() to read one word at a time from the file into a List, thus avoiding String.Split() and the tons of temporary storage it needs?
Ultimately that seems to be what you want anyway... and ReadLine() will "split" on \n for you.

Related

Storing a string that can vary a lot, from very long to very short :: Fragmentation

Ok, so every player in my game has a document in my players collection, and each player has one string that is a serialized hash of their game state. So this string can be way long or way short and vary a lot for every single player.
I had somebody who doesn't have a ton of Mongo experience tell me that I should pad every single string in the collection so that they are all the same length - so, like, add tons of zeros at the end of all the short and medium game state strings.
So A) is this a good idea?
B) I'm not even totally sure how to find out the longest length of a game, so I'm not sure how far to pad them, and what if later on game states exceed my padding length?
My friend said he had a Mongo collection keep blowing up because of fragmentation, and when he implemented padding all of his issues went away.
Oh, I doubt it matters, but my code is in PHP and obviously uses the PHP PECL mongo driver.
Thanks for any thoughts or input!!!!!
-dave
MongoDB allocates space for documents at creation time. If the size of the document increases, the document will need to be moved to a new location to accommodate the larger size. The original space is not released to the operating system. Instead, MongoDB will eventually reuse this space. Until this happens, it may appear the database is over-allocated, or what is sometimes called fragmented.
So, what probably happened to your friend:
documents were inserted
when fields were updated, their sizes sometimes increased, and the documents therefore grew
documents were moved as they grew, and the database became over-allocated (what your friend called fragmented)
And by padding the fields in the documents your friend was able to ensure documents never grew in size and therefore his database never became over-allocated.
The padding approach is valid but it also adds complexity to the application. Typically padding is performed for fields that will eventually be created, rather than fixing the size of the values themselves, but the idea is the same. In your case it doesn't sound like padding is a great option because you cannot predict the field size.
Instead, you might consider using usePowerOf2Sizes: http://docs.mongodb.org/manual/reference/command/collMod/
This configuration will automatically pad the space allocated for documents and will increase the chances that space is reused efficiently by MongoDB, at the cost of a slightly larger database.
So A) is this a good idea?
Depends. If the game documents were updated in such a manner that they moved on disk a lot, then you might find that padding helps. However, considering that the entire works of Shakespeare can fit into a 4 MB document with some room left, I doubt very much that any string you have will cause a heavy amount of fragmentation; in fact I will be quite surprised if it does.
The problem that could, in theory, occur is that you get a lot of document slots in your freelists and deleted buckets that cannot be reused, causing fragmentation.
Not only that, but the IO of documents moving on disk can be a killer if it happens persistently.
B) I'm not even totally sure how to find out the longest length of a game, so I'm not sure how far to pad them, and what if later on game states exceed my padding length?
Then the idea is useless; in fact, the idea is useless 90% of the time anyway, and you would be better off using power-of-2 size allocation on your documents if this were to be a problem: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
Using this option would be a far more optimal approach to solving fragmentation issues.
My friend said he had a Mongo collection keep blowing up because of fragmentation, and when he implemented padding all of his issues went away.
A friend of a friend, of a cousin, of a niece of mine said something similar too...you would be better off testing this for yourself.
I would bet that the bigger problem he had was with indexes and the queries he performed. It is extremely rare for string lengths to cause such a heavy amount of IO from disk movement that you would actually need artificial padding.
From your question I understand those strings are just blobs, i.e. they are not structured in some way for allowing db queries/filtering on their contents. If this is the case, store them in files, and store file names in the mongo document.

lazy evaluation of encrypted text stored in large NSArray

I have to store about 10k text lines in an Array. Each line is stored as a separate encrypted entry. When the app runs I only need to access a small number and decrypt them - depending on user input. I thought of some kind of lazy evaluation but don't know how to do it in this case.
This is how I build up my array: [allElements addObject:@"wdhkasuqqbuqwz"]. The string is encrypted. Accessing is like txt = [[allElements objectAtIndex:n] decrypt]
The problem currently is that this uses lots of memory from the very start - most of the items I don't need anyway, I just don't know which ones ;). Also I am hesitant to store the text externally, e.g. in a text file, since this would make it easier to access.
Is there a way to minimize memory usage in such a case?
ps initialization is very fast, so no issue here
So it's quite a big array, although not really big enough to be triggering any huge memory warnings (unless my maths has gone horribly wrong, I reckon your array of 10,000 40-character strings is about 0.76 MB). Perhaps there are other things going on in your app causing these warnings - are you loading any large images or many assets?
What I'm a little confused about is how you're currently storing these elements before you initialise the array. You say you don't want to store the text externally in a text file, but you must be holding them in some kind of file before initialising your array, unless of course your values are generated on the fly.
If you've encrypted correctly, you shouldn't need to care whether your values are stored in plain-sight or not. Hopefully you're using an established standard and not rolling your own encryption, so really I think worrying about users getting hold of the file is a moot point. After all, the whole point of encryption is being able to hide data in plain sight.
I would recommend, as a couple of your commenters already have, that you just use some form of database storage. Core Data was made for this purpose - handling large amounts of data with minimal memory impact. But again, I'm not sure how that array alone could trigger a memory warning, so I suspect there's other stuff going on in your app that's eating up your memory.
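If you do go the file route, one sketch of lazy access (assuming the encrypted strings sit one per line in a bundled file, and that -decrypt is your existing NSString category) could look like this; only a table of byte ranges stays resident, and each entry is read from disk and decrypted at the moment it is requested:

// A minimal sketch, not a drop-in implementation; the class name and file
// layout are assumptions, and -decrypt is the asker's own category method.
#import <Foundation/Foundation.h>

@interface LazyCipherStore : NSObject {
    NSFileHandle   *handle;
    NSMutableArray *ranges;   // NSValue-wrapped NSRange for each line
}
- (id)initWithFile:(NSString *)path;
- (NSString *)stringAtIndex:(NSUInteger)idx;
@end

@implementation LazyCipherStore

- (id)initWithFile:(NSString *)path {
    if ((self = [super init])) {
        handle = [[NSFileHandle fileHandleForReadingAtPath:path] retain];
        ranges = [[NSMutableArray alloc] init];
        // One pass to index the line boundaries; the data itself is discarded.
        NSData *data = [NSData dataWithContentsOfFile:path];
        const char *bytes = [data bytes];
        NSUInteger start = 0, len = [data length];
        for (NSUInteger i = 0; i < len; i++) {
            if (bytes[i] == '\n') {
                [ranges addObject:[NSValue valueWithRange:NSMakeRange(start, i - start)]];
                start = i + 1;
            }
        }
    }
    return self;
}

- (NSString *)stringAtIndex:(NSUInteger)idx {
    NSRange r = [[ranges objectAtIndex:idx] rangeValue];
    [handle seekToFileOffset:r.location];
    NSData *chunk = [handle readDataOfLength:r.length];
    NSString *encrypted = [[[NSString alloc] initWithData:chunk
                                                 encoding:NSUTF8StringEncoding] autorelease];
    return [encrypted decrypt];   // the asker's own NSString category method
}

- (void)dealloc {
    [handle release];
    [ranges release];
    [super dealloc];
}
@end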

Objective C iPhone performance issue

Ok guys, I am developing an iPhone app, and I have a Model class which follows the singleton design pattern.
Now I have an NSArray in it which is initialized to around some 1000 NSStrings in the init method.
Now I need to use this data in some view controller, so I import Model.h, create an array of NSString objects in the view controller, and set the data to it. But the problem is that I now have 2000 NSStrings allocated at once, which I believe is not a good thing on the iPhone due to memory considerations.
Releasing the model object won't help, because I've overridden the release method to release nothing, according to the pattern, and I cannot change the design now because a lot of code works on the assumption of the model being a singleton.
And in the future the initial NSStrings may grow to 2000 or even more, and then I'll have 4000 NSStrings allocated at one time...
I am a little confused on how to go about this. Any suggestions?
A few thousand strings take barely any memory at all. 4000 strings would take a couple of hundred kB, depending on length. (Rule of thumb here is string length + 20).
Edit: Probably more like string length + 30 or 40, actually; I'm not certain how much overhead NSArray adds.
Re-edit: Given the information from the below question, you could probably get away with loading a few hundred strings at the most, just around the area you are browsing - basically turning your SQLite access into a sparse array that caches a few strings around the search area. Not, of course, that I believe it to be necessary; if the strings are location names they probably have an average size of 20-30 bytes, giving a (very) rough estimate of 300k of memory to keep them all in memory permanently, greatly reducing access time and giving a better user experience. The iPhone doesn't have a lot of RAM, but you can afford at the very least a fair few megabytes; 300k isn't going to break your back.
It's difficult to offer specific suggestions without knowing more about your implementation--where do your strings come from? In general, the best performance optimization for this sort of situation is lazy loading. Here are some examples of ways to have a reduced memory footprint with different technologies if you have a table view:
Core Data: Not usually a problem, since objects are faulted and fetched automatically.
SQLite: Again, not usually a problem--you query the database every time you need a particular value (such as when a table view cell needs to display a string).
Internet: Start a request (usually via a thread) when the table view cell is visible.
XML: Trickier, but use SAX (event)-based parsing to find values instead of DOM parsing (which loads the entire document to memory).
That being said, if you made design decisions that are difficult to reverse, it may not be possible to significantly reduce your memory footprint without major refactoring.
EDIT: As per other answers, it's probably not worth worrying about, but if you were to optimize for memory, you would not load all the SQLite values at application launch, but instead fetch each value from SQLite in cellForRowAtIndexPath. These sorts of problems are made much easier using Core Data - I would highly recommend using Core Data instead of straight SQLite (although it sounds as if you might be too far into development to switch at this point).
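For illustration, a per-row fetch could look roughly like this (the names are mine, not from the question; selectStatement is assumed to be a sqlite3_stmt ivar prepared once, e.g. in viewDidLoad, against "SELECT word FROM words WHERE rowid = ?"):

// Rough sketch of fetching each visible row from SQLite on demand.
// Assumes #import <sqlite3.h> and a prepared sqlite3_stmt *selectStatement ivar.
- (UITableViewCell *)tableView:(UITableView *)tableView
         cellForRowAtIndexPath:(NSIndexPath *)indexPath {
    static NSString *reuseID = @"WordCell";
    UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:reuseID];
    if (cell == nil) {
        cell = [[[UITableViewCell alloc] initWithStyle:UITableViewCellStyleDefault
                                       reuseIdentifier:reuseID] autorelease];
    }
    sqlite3_reset(selectStatement);
    sqlite3_bind_int(selectStatement, 1, (int)indexPath.row + 1);   // rowids start at 1
    if (sqlite3_step(selectStatement) == SQLITE_ROW) {
        const unsigned char *text = sqlite3_column_text(selectStatement, 0);
        cell.textLabel.text = [NSString stringWithUTF8String:(const char *)text];
    }
    return cell;
}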
If your Model object really is a singleton, then all you need to do is get your strings from the model object in the view controller and use them. I don't see why you would get duplicates.
With NSString, as long as the strings are immutable, the copy method should just retain the object to copy and return the same object to you. Also, if your strings are constant strings i.e. defined like so:
NSString* foo = @"bar";
they are actually part of the executable and will take up no extra RAM at run time.

Obj-C circular buffer object, implementing one?

I've been developing for the iPhone for quite some time and I've been wondering if there's any array object that uses a circular buffer in Obj-C - like Java's Stack or List or Queue.
I've been tinkering with NSMutableArray, testing its limits... and it seems that after 50k simple objects inside the array, the application is significantly slowed down.
So, is there any better solution other than NSMutableArray (which becomes very slow with huge amounts of data)? If not, can anyone tell me about a way to create such an object (would that involve using chained (node) objects?).
Bottom line: Populating a UITableView from an SQLite DB directly would be smart? As it won't require memory from an array or anything, but just the queries. And SQLite is fast and not memory grinding.
Thank you very much for you time and attention,
~ Natanavra.
From what I've been thinking, it seems that going for Quinn's class is possibly the best option.
I have another question - would it be faster or smarter to load everything straight from the SQLite DB instead of creating an object and pushing it into an array?
Thank you in advance,
~ Natanavra.
Apologies for tooting my own horn, but I implemented a C-based circular buffer in CHDataStructures. (Specifically, check out CHCircularBufferQueue and CHCircularBufferStack.) The project is open source and has benchmarks which demonstrate that a true circular buffer is quite fast when compared to NSMutableArray in the general case, but results will depend on your data and usage, as well as the fact that you're operating on a memory-constrained device (e.g. iPhone). Hope that helps!
If you're seeing performance issues, measure where your app is spending its time, don't just guess. Apple provides an excellent set of performance measurement tools.
It's trivial to have NSMutableArray act like a stack, list, queue etc. using the various insertObject:atIndex: and removeObjectAtIndex: methods. You can write your own subclasses if you want to hardwire the behavior.
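For example (a tiny illustration, not code from the question):

// Using an NSMutableArray as a stack and as a queue with those methods.
NSMutableArray *stack = [NSMutableArray array];
[stack addObject:@"pushed"];                        // push
NSString *top = [[[stack lastObject] retain] autorelease];
[stack removeLastObject];                           // pop
NSLog(@"popped %@", top);

NSMutableArray *queue = [NSMutableArray array];
[queue addObject:@"enqueued"];                      // enqueue at the tail
NSString *head = [[[queue objectAtIndex:0] retain] autorelease];
[queue removeObjectAtIndex:0];                      // dequeue from the head
NSLog(@"dequeued %@", head);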
I doubt the performance problems you are seeing are being caused by NSMutableArray, especially if your point of reference is the much, much slower Java. The problem is most likely the iPhone itself. As noted previously, 50,000 Objective-C objects is not a trivial amount of data in this context, and the iPhone hardware may struggle to manage that much data.
If you need some kind of high performance array for bytes, you could use one of the core foundation arrays or roll your own in plain C and then wrap them in a custom class.
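A bare-bones fixed-capacity ring buffer in plain C, along those lines, might look like this (a sketch only; a production version would need a growth or overwrite policy, and it could then be wrapped in an Objective-C class):

/* Fixed-capacity circular buffer of void* items. push fails when full. */
#include <stdlib.h>

typedef struct {
    void   **slots;
    size_t   capacity, head, count;
} RingBuffer;

RingBuffer *ring_create(size_t capacity) {
    RingBuffer *rb = malloc(sizeof *rb);
    rb->slots = calloc(capacity, sizeof(void *));
    rb->capacity = capacity;
    rb->head = rb->count = 0;
    return rb;
}

int ring_push(RingBuffer *rb, void *item) {        /* enqueue at the tail */
    if (rb->count == rb->capacity) return 0;
    rb->slots[(rb->head + rb->count) % rb->capacity] = item;
    rb->count++;
    return 1;
}

void *ring_pop(RingBuffer *rb) {                   /* dequeue from the head */
    if (rb->count == 0) return NULL;
    void *item = rb->slots[rb->head];
    rb->head = (rb->head + 1) % rb->capacity;
    rb->count--;
    return item;
}

void ring_free(RingBuffer *rb) { free(rb->slots); free(rb); }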
It sounds to me like you need to switch to Core Data so you don't have to keep all this in memory. Core Data will efficiently fetch what you want only when you need it.
You can use STL classes in "Objective-C++" - which is a fancy name for Objective-C making use of C++ classes. Just name those source files that use C++ code with a ".mm" extension and you'll get the mixed runtime.
Objective-C objects are not really "simple," so 50,000 of them is going to be pretty demanding. Write your own in straight C or C++ if you want to avoid the bottlenecks and resource demands of the Objective-C runtime.
A rather lengthy and non-theoretical discussion of the overhead associated with convenience:
http://www.cocoabuilder.com/archive/cocoa/35145-nsarray-overhead-question.html#35128
And some simple math for simple people:
All it takes to make an object as opposed to a struct is a single pointer at the beginning.
Let's say that's true, and let's say we're running on a 32-bit system with 4 byte pointers.
4 bytes x 50,000 objects = 200,000 bytes
That's roughly 200 KB of extra memory that your data suddenly needs just because you used Objective-C. Now compound that with the fact that whatever NSArray you add those objects to is going to double that by keeping its own set of pointers to those objects, and you've just chewed up 400 KB of RAM just so you could use a couple of convenience wrappers.
Refresh my memory here... Are swap files on hard drives as fast as RAM? How much RAM is there in an iPhone? How many function calls and stack frames does it take to send an object a message? Why isn't IOKit written in Objective-C? How many of Apple's flagship applications that do a lot of DSP use AppKit? Anybody got a copy of otool they can check with? I'm seeing zero here.

What's the fastest way to save data and read it next time in an iPhone App?

In my dictionary iPhone app I need to save an array of strings which actually contains about 125,000 distinct words; this translates into approx. 3.2 MB of data.
The first time I run the app I get this data from an SQLite db. As it takes ages for this query to run, I need to save the data somehow, to read it faster each time the app launches.
Until now I've tried serializing the array and writing it to a file, and afterward I tested writing directly to NSUserDefaults to see if there's any speed gain, but there's none. Either way it takes about 7 seconds on the device to load the data. It seems that it isn't the reading from the file (or NSUserDefaults) that takes all that time, but the deserialization:
objectsForCharacters = [[NSKeyedUnarchiver unarchiveObjectWithData:data] retain];
Do you have any ideas about how I could store this data structure so that I could read it / load it into memory faster?
The UITableView is not really designed to handle tens of thousands of records. It would take a long time for a user to find what they want.
It would be better to load a portion of the table, perhaps a few hundred rows, as the user enters data, so that it appears they have all the records available to them (perhaps providing a label which shows the number of records left in their filtered view).
The SQLite db should be perfect for this job. Add an index to the words table and then select a limited number of rows from it to show the user some progress. Adding an index makes a big difference to the performance of even this simple table.
For example, I created two tables in a sqlite db and populated them with around 80,000 words
-- Create and populate the indexed table
create table words(word);
.import dictionary.txt words
create unique index words_index on words(word DESC);
-- Create and populate the unindexed table
create table unindexed_words(word);
.import dictionary.txt unindexed_words
Then I ran the following query and got the CPU Time taken for each query
.timer ON
select * from words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.031250 sys 0.015625;
select * from unindexed_words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.062500 sys 0.0312
The results vary, but the indexed version was consistently faster than the unindexed one.
With fast access to parts of the dictionary through an indexed table, you can bind the UITableView to the database using NSFetchedResultsController. This class takes care of fetching records as required, caches results to improve performance, and allows predicates to be easily specified.
An example of how to use the NSFetchedResultsController is included in the iPhone Developers Cookbook. See main.m
Just keep the strings in a file on the disk, and do the binary search directly in the file.
So: you say the file is 3.2mb. Suppose the format of the file is like this:
key DELIMITER value PAIRDELIMITER
where key is a string, and value is the value you want to associate. The DELIMITER and PAIRDELIMITER must be chosen as such that they don't occur in the value and key.
Furthermore, the file must be sorted on the key
With this file you can just do the binary search in the file itself.
Suppose one types a letter: you go to the middle of the file, and search (forwards or backwards) to the first PAIRDELIMITER. Then check the key and see if you have to search upwards or downwards. And repeat until you find the key you need.
I'm betting this will be fast enough.
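A minimal sketch of that idea in plain C (assuming the simple case of a sorted, newline-delimited word list rather than key/value pairs, and that no line is longer than the read buffer):

/* Binary search over a sorted, one-word-per-line file, without loading it. */
#include <stdio.h>
#include <string.h>

/* Read the word whose line contains byte offset pos; returns the offset of
   the start of that line, or -1 on error. */
static long read_word_at(FILE *f, long pos, char *buf, size_t bufsize) {
    while (pos > 0) {                        /* back up to the start of the line */
        fseek(f, pos - 1, SEEK_SET);
        if (fgetc(f) == '\n') break;
        pos--;
    }
    fseek(f, pos, SEEK_SET);
    if (!fgets(buf, (int)bufsize, f)) return -1;
    buf[strcspn(buf, "\n")] = '\0';          /* strip the trailing newline */
    return pos;
}

int word_in_file(const char *path, const char *needle) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    fseek(f, 0, SEEK_END);
    long lo = 0, hi = ftell(f);              /* search window: [lo, hi) */
    char buf[256];
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        long line_start = read_word_at(f, mid, buf, sizeof buf);
        if (line_start < 0) break;
        int cmp = strcmp(needle, buf);
        if (cmp == 0) { fclose(f); return 1; }
        if (cmp < 0)
            hi = line_start;                          /* look in earlier lines */
        else
            lo = line_start + (long)strlen(buf) + 1;  /* start of the next line */
    }
    fclose(f);
    return 0;
}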
Store your dictionary in Core Data and use NSFetchedResultsController to manage the display of these dictionary entries in your table view. Loading all 125,000 words into memory at once is a terrible idea, both performance- and memory-wise. Using the -setFetchBatchSize: method on your fetch request for loading the words for your table, you can limit NSFetchedResultsController to only handling the small subset of words that are visible at any given moment, plus a little buffer. As the user scrolls up and down the list of words, new batches of words are fetched in transparently.
A case like yours is exactly why this class (and Core Data) was added to iPhone OS 3.0.
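The setup described above is only a few lines; a minimal sketch (the entity and attribute names "Word" and "text" are placeholders, and context is assumed to be your NSManagedObjectContext):

// Assumes #import <CoreData/CoreData.h> and an existing Core Data stack.
NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
[request setEntity:[NSEntityDescription entityForName:@"Word"
                               inManagedObjectContext:context]];
[request setSortDescriptors:[NSArray arrayWithObject:
    [[[NSSortDescriptor alloc] initWithKey:@"text" ascending:YES] autorelease]]];
[request setFetchBatchSize:50];   // only about a screenful of objects is realized at once

NSFetchedResultsController *frc =
    [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                        managedObjectContext:context
                                          sectionNameKeyPath:nil
                                                   cacheName:@"Words"];
NSError *error = nil;
if (![frc performFetch:&error]) {
    NSLog(@"Fetch failed: %@", error);
}
// tableView:cellForRowAtIndexPath: then asks frc for -objectAtIndexPath:.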
Do you need to store/load all data at once?
Maybe you can just load the chunk of strings you need to display and load all other strings in the background.
Perhaps you can load data into memory in one thread and search from it in another? You may not get search results instantly, but having some searches feel snappier may be better than having none at all while waiting for all the data to load.
Are some words searched more frequently or repeatedly than others? Perhaps you can cache frequently searched terms in a separate database or other store. Load it in a separate thread as a searchable store, while you are loading the main store.
As for a data structure solution, you might look into a suffix trie to search for substrings in linear time. This will probably increase your storage requirements, though, which may affect your ability to implement this with an iPhone's limited memory and disk storage capabilities.
I really don't think you're on the right path trying to load everything at once.
You've already determined that your bottleneck is the deserialization.
Regardless what the UI does, the user only sees a handful (literally) of search results at a time.
SQLite already has a robust indexing mechanism, there is likely no need to re-invent that wheel with your own indexing, etc.
IMHO, you need to rethink how you are using UITableView. It only needs a few screenfuls of data at a time, and you should reuse cell objects as they scroll out of view rather than creating a ton of them to begin with.
So, use SQLite's indexing and grab the "TOP x" rows (LIMIT x in SQLite), where x is the right balance between giving the user some immediately-available rows to scroll through without spending too much time loading them. Set the table's scroll bar scaling using a separate SELECT COUNT(*) query, which only needs to be updated when the user types something different.
You can always go back and cache aggressively after you deserialize enough to get something up on-screen. A slight lag after the first flick or typing a letter is more acceptable than a 7-second delay just starting the app.
I have currently a somewhat similar coding problem with a large amount of searchable strings.
My solution is to store the prepared data in one large memory array, containing both the textual data and offsets as links, meaning I do not allocate objects for each item. This makes the data use less memory and also allows me to load and save it to a file without further processing.
Not sure if this is an option for you, since this is quite an obvious solution once you've realized that the object tree is causing the slowdown.
I use a large NSData memory block, then search through it. Well, there's more to it, it took me about two days to get it well optimized.
In your case I suspect you have a dictionary with a lot of words that have similar beginnings. You could prepare them on another computer in a format that both compacts the data and also facilitates fast lookup. As a first step, the words should be sorted. With that, you can already perform a binary search on them for a fast lookup. If you store it all in one large memory area, you can do the search quite fast, compared to how sqlite would search, I think.
Another way would be to see the words as a kind of tree: you have many thousands that begin with the same letter. So you divide your data accordingly: you have a SQL table for each beginning letter of your set of words. That way, if you look up a word, you'd select one of the now-smaller tables depending on the first letter. This makes the amount that has to be searched already much smaller, and you can do this for the 2nd and 3rd letter as well, so you could already have quite fast access.
Did this give you some ideas?
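For what it's worth, the single-memory-block layout described above can be sketched roughly like this in plain C (names are illustrative; it assumes a sorted, newline-delimited word file):

/* One allocation for the text, one for the index; newlines become string
   terminators in place, and lookup is a plain bsearch over the pointers. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *blob;     /* all words, '\0'-separated          */
    char **words;    /* pointer to the start of each word  */
    size_t count;
} WordList;

WordList load_words(const char *path) {
    WordList wl = {0};
    FILE *f = fopen(path, "rb");
    if (!f) return wl;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    wl.blob = malloc(size + 1);
    fread(wl.blob, 1, size, f);
    fclose(f);
    wl.blob[size] = '\0';

    /* first pass: count lines; second pass: terminate and index them */
    for (long i = 0; i < size; i++) if (wl.blob[i] == '\n') wl.count++;
    wl.words = malloc(wl.count * sizeof(char *));
    size_t w = 0;
    char *p = wl.blob;
    for (long i = 0; i < size; i++) {
        if (wl.blob[i] == '\n') {
            wl.blob[i] = '\0';
            wl.words[w++] = p;
            p = wl.blob + i + 1;
        }
    }
    return wl;
}

static int cmp_words(const void *a, const void *b) {
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* The file must already be sorted (or qsort wl.words once after loading). */
int contains(const WordList *wl, const char *word) {
    return bsearch(&word, wl->words, wl->count, sizeof(char *), cmp_words) != NULL;
}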
Well, actually I figured it out myself in the end, but of course I thank you all for your quick and pertinent answers. To be concise, I will just say that Objective-C, just like any other object-based programming language, is significantly slower than procedural approaches due to introspection and other object-oriented overhead.
The solution was in fact to load all my data into a continuous chunk of memory using malloc (a char **), search it on demand, and transform entries into objects only as needed. This resulted in a 0.5-second loading time (from file to memory) and reasonable (read: fast) operations during execution. Thank you all again, and if you have any questions I'm here for you. Thanks.