Are NSManagedObject objectIDs unique across space and time like a CFUUID? - iphone

The NSManagedObjectID documentation states:
An NSManagedObjectID object is a compact, universal, identifier for a managed object. This forms the basis for uniquing in the Core Data Framework. A managed object ID uniquely identifies the same managed object both between managed object contexts in a single application, and in multiple applications (as in distributed systems).
Translation in my head: "There is probably no way that any two NSManagedObjectIDs are ever the same across the set of all instances of my application."
The CFUUID documentation states:
UUIDs ... are 128-bit values
guaranteed to be unique. A UUID is
made unique over both space and time
by combining a value unique to the
computer on which it was
generated—usually the Ethernet
hardware address—and a value
representing the number of
100-nanosecond intervals since October
15, 1582 at 00:00:00.
Translation in my head: "There is definitely no way that any two CFUUIDs are ever the same across the set of all instances of my application."
The fact that NSManagedObjectIDs are described as a "universal identifier" makes me almost certain that they offer the same uniqueness as a CFUUID, whereas "unique across space and time" leaves absolutely no room for doubt. Can anybody with more Core Data experience than me confirm or deny my thoughts?

Beyond uniqueness, there is one case where the object ID will change, and that's if you query it before persisting the object to disk. After saving, it will have a different ID. Beyond that point, the ID will not change. I just wanted to point this out because it caused me a bit of confusion until I figured out what was happening.
I can't comment on the hashing used to generate the NSManagedObjectID, but it does seem like the odds of it matching another NSManagedObject are vanishingly small, based on looking at the IDs generated.

Related

Using the HiLoIdGenerator in NoRM for MongoDB to create a unique identifier

I have been struggling a little with the HiLoIdGenerator that comes with NoRM (http://normproject.org/); I want to use it to generate a unique identifier that I can use as a SLUG for my blog posts. At present I use the ObjectId to uniquely identify a document within MongoDB, but as this is GUID-like and it doesn't look very good in a URL, I would prefer to have something like www.myblog.com/posts/1243 and so this is why I have decided to use the HiLoIdGenerator.
I would like to generate my HiLo id's on the client-side and I read on stuart harris' blog http://red-badger.com/Blog/post/A-simple-IRepository3cT3e-implementation-for-MongoDB-and-NoRM.aspx that NoRM's new HiLo Id generator also allows this by allocating a range of integers to the client session that can be used with impunity (other clients will be using a different range) but when i opened the HiLoIdGenerator it said that the HiLoIdGenerator Class that generates a new identity value using the HILO algorithm. Only one instance of this class should be used in your project.
I really have three questions:
1) if I had multiple instances of the HiLoIdGenerator in my application (say I had an instance in my service class that called GenerateId for every new document) could I actually guarantee that all of my id's would be unique, given that the code for the HiLoIdGenerator class says that there should only be a single instance of this class in an application?
2) the HiLoIdGenerator constructor takes a capacity argument, and I would like to know what it does, I passed 0 and all of the generated Id's were the same, I then passed in 1 new HiLoIdGenerator(1) the Id's began at 1 and were incremented by 1; I don't really understand what it does but I am presuming that it has something to do with a range of potential values that the generator can generate, but I am not sure, and I would like to be. Could someone please explain this argument?
3) I think I understand the aim of the HiLo algorithm as explained here What's the Hi/Lo algorithm? but what I don't understand is whether I can have two instances of MongoDB with two different applications each looking at a different instance of a MongoDB but both containing the same collection types, whether generated id's are globally unique, i.e., could I use them the way I would a GUID, or are they simply unique within a given instance of MongoDB, therefore precluding a merge of both collections into a single instance of MongoDB at a later date?
thanks
See here for how to produce monotonically increasing ids:
http://www.mongodb.org/display/DOCS/Atomic+Operations#AtomicOperations-%22InsertifNotPresent%22
Yes they would be unique, each client (HiLoGenerator) would request a range of lo's that could be allocated but they would only be unique if they both used the same capacity
Capacity is the number of Id's that the client can assign with impunity, again if you have a different capacity amongst clients you have the potential to create non-unique values, if you are using monotonically increasing Id's you are only ever assigning a single sequential value, you do not need the HiLo algorithm, you just need a single place that contains a value that you can increment and assign to a new entity, see dm's answer for an implementation of this
Yes as long as both clients are both using the same collection that holds the Hi value, and as long as both clients use the same capcity for generating the lo's

Immutability and shared references - how to reconcile?

Consider this simplified application domain:
Criminal Investigative database
Person is anyone involved in an investigation
Report is a bit of info that is part of an investigation
A Report references a primary Person (the subject of an investigation)
A Report has accomplices who are secondarily related (and could certainly be primary in other investigations or reports
These classes have ids that are used to store them in a database, since their info can change over time (e.g. we might find new aliases for a person, or add persons of interest to a report)
Domain http://yuml.me/13fc6da0
If these are stored in some sort of database and I wish to use immutable objects, there seems to be an issue regarding state and referencing.
Supposing that I change some meta-data about a Person. Since my Person objects immutable, I might have some code like:
class Person(
val id:UUID,
val aliases:List[String],
val reports:List[Report]) {
def addAlias(name:String) = new Person(id,name :: aliases,reports)
}
So that my Person with a new alias becomes a new object, also immutable. If a Report refers to that person, but the alias was changed elsewhere in the system, my Report now refers to the "old" person, i.e. the person without the new alias.
Similarly, I might have:
class Report(val id:UUID, val content:String) {
/** Adding more info to our report */
def updateContent(newContent:String) = new Report(id,newContent)
}
Since these objects don't know who refers to them, it's not clear to me how to let all the "referrers" know that there is a new object available representing the most recent state.
This could be done by having all objects "refresh" from a central data store and all operations that create new, updated, objects store to the central data store, but this feels like a cheesy reimplementation of the underlying language's referencing. i.e. it would be more clear to just make these "secondary storable objects" mutable. So, if I add an alias to a Person, all referrers see the new value without doing anything.
How is this dealt with when we want to avoid mutability, or is this a case where immutability is not helpful?
If X refers to Y, both are immutable, and Y changes (i.e. you replace it with an updated copy), then you have no choice but to replace X also (because it has changed, since the new X points to the new Y, not the old one).
This rapidly becomes a headache to maintain in highly interconnected data structures. You have three general approaches.
Forget immutability in general. Make the links mutable. Fix them as needed. Be sure you really do fix them, or you might get a memory leak (X refers to old Y, which refers to old X, which refers to older Y, etc.).
Don't store direct links, but rather ID codes that you can look up (e.g. a key into a hash map). You then need to handle the lookup failure case, but otherwise things are pretty robust. This is a little slower than the direct link, of course.
Change the entire world. If something is changed, everything that links to it must also be changed (and performing this operation simultaneously across a complex data set is tricky, but theoretically possible, or at least the mutable aspects of it can be hidden e.g. with lots of lazy vals).
Which is preferable depends on your rate of lookups and updates, I expect.
I suggest you to read how they people deal with the problem in clojure and Akka. Read about Software transactional memory. And some of my thoughts...
The immutability exists not for the sake of itself. Immutability is abstraction. It does not "exist" in nature. World is mutable, world is permanently changing. So it's quite natural for data structures to be mutable - they describe the state of the real or simulated object at a given moment in time. And it looks like OOP rulez here. At conceptual level the problem with this attitude is that object in RAM != real object - the data can be inaccurate, it comes with delay etc
So in case of most trivial requirements you can go with everything mutable - persons, reports etc Practical problems will arise when:
data structures are modified from concurrent threads
users provide conficting changes for the same objects
a user provide an invalid data and it should be rolled back
With naive mutable model you will quickly end up with inconsistent data and crushing system. Mutability is error prone, immutability is impossible. What you need is transactional view of the world. Within transaction program sees immutable world. And STM manages changes to be applied in consistent and thread-safe way.
I think you are trying to square the circle. Person is immutable, the list of Reports on a Person is part of the Person, and the list of Reports can change.
Would it be possible for an immutable Person have a reference to a mutable PersonRecord that keeps things like Reports and Aliases?

What is the most practical Solution to Data Management using SQLite on the iPhone?

I'm developing an iPhone application and am new to Objective-C as well as SQLite. That being said, I have been struggling w/ designing a practical data management solution that is worthy of existing. Any help would be greatly appreciated.
Here's the deal:
The majority of the data my application interacts with is stored in five tables in the local SQLite database. Each table has a corresponding Class which handles initialization, hydration, dehydration, deletion, etc. for each object/row in the corresponding table. Whenever the application loads, it populates five NSMutableArrays (one for each type of object). In addition to a Primary Key, each object instance always has an ID attribute available, regardless of hydration state. In most cases it is a UUID which I can then easily reference.
Before a few days ago, I would simply access the objects via these arrays by tracking down their UUID. I would then proceed to hydrate/dehydrate them as I needed. However, some of the objects I have also maintain their own arrays which reference other object's UUIDs. In the event that I must track down one of these "child" objects via it's UUID, it becomes a bit more difficult.
In order to avoid having to enumerate through one of the previously mentioned arrays to find a "parent" object's UUID, and then proceed to find the "child's" UUID, I added a DataController w/ a singleton instance to simplify the process.
I had hoped that the DataController could provide a single access point to the local database and make things easier, but I'm not so certain that is the case. Basically, what I did is create multiple NSMutableDicationaries. Whenever the DataController is initialized, it enumerates through each of the previously mentioned NSMutableArrays maintained in the Application Delegate and creates a key/value pair in the corresponding dictionary, using the given object as the value and it's UUID as the key.
The DataController then exposes procedures that allow a client to call in w/ a desired object's UUID to retrieve a reference to the actual object. Whenever their is a request for an object, the DataController automatically hydrates the object in question and then returns it. I did this because I wanted to take control of hydration out of the client's hands to prevent dehydrating an object being referenced multiple times.
I realize that in most cases I could just make a mutable copy of the object and then if necessary replace the original object down the road, but I wanted to avoid that scenario if at all possible. I therefore added an additional dictionary to monitor what objects are hydrated at any given time using the object's UUID as the key and a fluctuating count representing the number of hydrations w/out an offset dehydration. My goal w/ this approach was to have the DataController automatically dehydrate any object once it's "hydration retainment count" hit zero, but this could easily lead to significant memory leaks as it currently relies on the caller to later call a procedure that decreases the hydration retainment count of the object. There are obviously many cases when this is just not obvious or maybe not even easily accomplished, and if only one calling object fails to do so properly I encounter the exact opposite scenario I was trying to prevent in the first place. Ironic, huh?
Anyway, I'm thinking that if I proceed w/ this approach that it will just end badly. I'm tempted to go back to the original plan but doing so makes me want to cringe and I'm sure there is a more elegant solution floating around out there. As I said before, any advice would be greatly appreciated. Thanks in advance.
I'd also be aware (as I'm sure you are) that CoreData is just around the corner, and make sure you make the right choice for the future.
Have you considered implementing this via the NSCoder interface? Not sure that it wouldn't be more trouble than it's worth, but if what you want is to extract all the data out into an in-memory object graph, and save it back later, that might be appropriate. If you're actually using SQL queries to limit the amount of in-memory data, then obviously, this wouldn't be the way to do it.
I decided to go w/ Core Data after all.

Hashes vs Numeric id's

When creating a web application that some how displays the display of a unique identifier for a recurring entity (videos on YouTube, or book section on a site like mine), would it be better to use a uniform length identifier like a hash or the unique key of the item in the database (1, 2, 3, etc).
Besides revealing a little, what I think is immaterial, information about the internals of your app, why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses php/python with ISAM /w MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical id's.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
With hashes you
Are free to merge the database with a similar one (or a backup), if necessary
Are not doing something that could help some guessing attacks even a bit
Are not disclosing more private information about the user than necessary, e.g. if somebody sees a user number 2 in your current database log in, they're getting information that he is an oldie.
(Provided that you use a long hash or a GUID,) greatly helping youself in case you're bought by YouTube and they decide to integrate your databases.
Helping yourself in case there appears a search engine that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
will your users have to remember/use the value? or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid.If you're on the .Net platform, try System.Guid.
However, the most important reason not to use a Guid is for performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefor, if your hash has some kind of well-defined format with for example a checksum at the end, you can check if it's correct without needing to go to the database.

How can I generate a unique, small, random, and user-friendly key?

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. Guid's were out of the question, they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to less than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32bit id from the database. We then inserted it into the center bits of a 64bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9 skipping easily confused characters such as L,l,1,O,0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list and concatenate them and add a random 3 digits number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
the case needn't be camel casing, you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to make sure that my algorithms should never collide. If the collision rate was high, then I'd play with the parameters (amount of nouns used, amount of verbs used, length of random number, total number of words, different kinds of casings etc.)
In .NET you can use the RNGCryptoServiceProvider method GetBytes() which will "fill an array of bytes with a cryptographically strong sequence of random values" (from ms documentation).
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
You can increase the lengh of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user friendly, you mean that a user could type the answer in then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error prone string.
If though you're looking for a way to encode a random code in the URL string which is an issue I've dealt with for awhile then I what I have done is use 64-bit encoded GUIDs.
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.