MongoDB Collection Versioning

Are there any best practices or techniques for versioning a collection, or the objects in a collection, in MongoDB?
We need to version the collection because objects in it may gain new attributes going forward, while the objects added earlier will not have these attributes or values for them. So on retrieval, we need to make sure the code does not break when deserializing the different versions of the same object in the collection.
I can think of explicitly adding a version attribute to the objects, but are there any better built-in alternatives in MongoDB for handling this versioning of objects and/or collections?
Thanks,
Bathiya

I guess the best approach would be to update all objects in a batch process when you start using the new software on the server; otherwise you'll never know when an object will be updated, and you'll have to keep the code for handling the old versions around forever.
Another thing I've been doing so far, and it has worked (so far), is to have a policy of only allowing new properties to be added to an object. In the worst case the DB won't have all the data, but that is fine with all the JSON serializers I know. It does mean you aren't allowed to delete or rename properties, or change their type (from scalar value to object or array; from object to scalar or array; and so on).
Since I usually want to store additional information rather than less, this seems like a good solution with no real limitations for me.
If you do need to change a type because a scalar value is no longer enough, you can still write code that transforms every object holding the old value into the new representation. And if a bulk update on your side can perform the change, I'd still do it that way, but sometimes the migration needs user input.
For instance, suppose you used to save passwords as an MD5 hash, which is a scalar value, and you are now told they should be stored as a SHA-512 hash together with a salt. Now you need an object field for the password; you could call it password_sha512 and store the salt and the hashed password inside it. (This is a case that needs user input: you cannot derive the new hash from the old one, so each password can only be re-hashed the next time the user logs in.)
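To make the batch-update idea concrete, here is a minimal sketch using the official C# driver (the MongoDB.Driver package). MongoDB itself has no built-in object versioning, so the connection string, the database/collection names, and the schemaVersion attribute below are all illustrative assumptions:
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative connection and names; adjust to your deployment.
var client = new MongoClient("mongodb://localhost:27017");
var users = client.GetDatabase("mydb").GetCollection<BsonDocument>("users");

// Documents written before versioning carry no schemaVersion attribute;
// stamp them all as version 1 in one batch update.
var unversioned = Builders<BsonDocument>.Filter.Exists("schemaVersion", false);
var setVersion1 = Builders<BsonDocument>.Update.Set("schemaVersion", 1);
users.UpdateMany(unversioned, setVersion1);

// New documents are then written with the current version, and readers can
// branch on schemaVersion when deserializing older shapes (or simply
// tolerate missing attributes, as suggested above).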


Mapping to legacy MongoDB store

I'm attempting to write a Yesod app as a replacement for a Ruby JSON service that uses MongoDB on the backend, and I'm running into some snags:
1) The sql=foobar syntax in the models file does not seem to affect which collection Persistent.MongoDB uses. How can I change that?
2) Is there a way to easily configure MongoDB (preferably through the YAML file) to be explicitly read-only? I'd take more comfort deploying this knowing that there was no possible way the app could overwrite or damage production data.
3) Is there any way I can get Persistent.MongoDB to ignore fields it doesn't know about? This service only needs a fraction of the fields in the collection in question. To keep the code as simple as possible, I'd really like to map only the fields I care about and have Yesod ignore everything else. Instead it complains that the fields don't match.
4) How does one go about defining instances for models, such as ToJSON? I'd like to customize how that JSON gets rendered, but I get the following error:
Handler/ProductStat.hs:8:10:
Illegal instance declaration for ToJSON Product'
(All instance types must be of the form (T t1 ... tn)
where T is not a synonym.
Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration for ToJSON Product'
1) It seems that sql= is not hooked up to Mongo. Since the SQL backend already does this, it shouldn't be difficult to support for Mongo.
2) You can change the function that runs the queries. In persistent/persistent-mongoDB/Database/Persist there is a runPool function of PersistConfig, which gets used in yesod-defaults. We should probably change the loadConfig function to check a readOnly setting.
3) I am OK with changing the reorder function to allow ignoring unknown fields, although in the future (if MongoDB returns everything in order) that may have performance implications, so ideally you would list the ignored columns.
4) This shouldn't require changes to Persistent. Did you try turning on TypeSynonymInstances?
I have several other Yesod/Persistent priorities to attend to before these changes, so please roll up your sleeves and let me know what help you need making them. I can change 2 and 3 myself fairly soon if you are committed to testing them.

How can I detect when a NSPersistentStore needs to be saved?

I have an iOS application where I use Core Data to store my "documents". They all share a common NSManagedObjectContext, and I frequently save the context.
I would like to keep track of the last modification date for the various "documents" (where each one is a separate NSPersistentStore) and store the date on a particular unique "root" object that each store has.
I could try to keep the modification time stamp up to date while the document is being modified, but it would be cleaner and more robust if I could just find out which persistent stores need saving at the time I am saving the context.
I can't find any way to detect if a persistent store needs saving. I can query the NSManagedObjectContext to see which managed objects need saving, although I can't find an easy way to see which store an object belongs to.
It seems like this is not such a strange thing to do and core data has all of the information that I am looking for, but I am having trouble finding an easy way to get access to that data.
Does anyone know of an easy way?
If I can't find an easier way, I will simply loop over the deleted / modified / inserted objects from the context, and write special code for each entity type to determine the store that the object belongs to.
Thanks in advance for any help!
Ron
[[managedObject objectID] persistentStore] is the persistent store you're looking for (or possibly nil if the object has not been saved yet).
The documentation suggests that it's nil if you've assigned the object to a store but haven't saved; I'm not sure that this is true (and I don't see anywhere else this information might be stored). I'd check its behaviour on 3.x, 4.x, and the 5.0 beta if you have access to them.

CoreData or Individual Files (iOS)

I've got a little conundrum: would it be better to use direct file management, or a Core Data SQLite database?
Here's my scenario:
I have a bunch of 'user' objects, each with a list of 'post' objects. This is easily done in Core Data, and would be great; however, the 'post' objects are downloaded from a web server, and they each have a unique identifier. I don't want multiple 'post' objects with the same ID. I could solve this by caching Core Data responses in an NSDictionary, but that would not fit the application's design pattern well. As far as I am aware, when adding a new 'post' to my Core Data NSManagedObjectContext, I would have to look up the unique ID to check for its existence (fast), then add the object if it does not exist (slow), and update the previous one if it does (fast), effectively replacing it. How would you guys handle this?
I've been trying to think of alternatives for a few days now, but no matter which way I look at it, CoreData is going to be slower than my alternative:
A file architecture inside the Caches/ directory of an iOS application could solve the problem. Something like this:
Users/
    {unique ID}.user
    {unique ID}.user
Posts/
    {unique ID}.post
    {unique ID}.post
Then, when retrieving a post object or user object, I can check the files for the existence of the data, and cache the file contents in an NSDictionary. If the ID exists in the dictionary, retrieve it from there instead. Replacing previous 'user' and 'post' objects is as simple as overwriting the file and updating the cache.
My second alternative would clearly be faster; however, I would not be taking advantage of any efficiencies built into Core Data, and I would have to provide my own memory management scheme to clear the cached dictionaries when a memory warning occurs.
Is there some way of 'uniquing' in Core Data? That would solve my problem. Something similar to using a primary key in an ordinary SQLite database.
I'll start doing tests to verify speeds of both methods, but I thought I'd post this up here before starting in case anyone has any better solutions.
This exact question comes up a lot.
You can check whether a value exists in Core Data without reading in the entire object. Just set the fetch to fetch the specific property you want to test, the ID in this case, and return the fetch results as dictionaries. Provide a predicate that looks for one or more IDs; if the returned dictionaries have values, you know you have existing objects.
It's very rare that you can end up with a custom system which is faster and more robust than Core Data. It's rarely worth even trying.
Remember as well that premature optimization is the root of all evil. All this work is predicated on the premise that the simplest Core Data implementation is too slow. Have you actually tested that it is too slow? If not, do so before you try more elaborate designs.
After testing, I've found that Core Data at least halves the time taken. The test I ran was as follows: I added 1000 posts to an empty Core Data object graph, then retrieved 100 of those objects for updating. Adding the objects took 0.069 s, and retrieving them took 0.181 s, measured on a 3G iPad. Using files, adding the objects took 10 times longer, and retrieving them took 4 times longer.
My recommendation: stick with Core Data!

How to keep track of objects deleted from an ObservableCollection in CRUD scenarios?

In our multi-tier business application we have ObservableCollections of Self-Tracking Entities that are returned from service calls.
The idea is we want to be able to get entities, add, update and remove them from the collection client side, and then send these changes to the server side, where they will be persisted to the database.
Self-Tracking Entities, as their name might suggest, track their state themselves.
When a new STE is created, it has the Added state; when you modify a property, it sets the Modified state. It can also have the Deleted state, but this state is not set when the entity is removed from an ObservableCollection (obviously, since the collection knows nothing about entity states). If you want this behavior, you need to code it yourself.
In my current implementation, when an entity is removed from the ObservableCollection, I keep it in a shadow collection, so that when the ObservableCollection is sent back to the server, I can send the deleted items along, so Entity Framework knows to delete them.
Something along the lines of:
protected IDictionary<int, IList> DeletedCollections = new Dictionary<int, IList>();

protected void SubscribeDeletionHandler<TEntity>(ObservableCollection<TEntity> collection)
{
    // One shadow list per observed collection, keyed by the collection's hash code.
    var deletedEntities = new List<TEntity>();
    DeletedCollections[collection.GetHashCode()] = deletedEntities;

    collection.CollectionChanged += (o, a) =>
    {
        // Removed (and replaced) items show up in OldItems; remember them
        // so they can be sent to the server as deletions later.
        if (a.OldItems != null)
        {
            deletedEntities.AddRange(a.OldItems.Cast<TEntity>());
        }
    };
}
Now if the user decides to save his changes to the server, I can get the list of removed items, and send them along:
ObservableCollection<Customer> customers = MyServiceProxy.GetCustomers();
customers.RemoveAt(0);
MyServiceProxy.UpdateCustomers(customers);
At this point the UpdateCustomers method will check my shadow collection to see whether any items were removed, and send any such items along to the server side.
This approach works fine until you start to think about the life-cycle of these shadow collections. Basically, when the ObservableCollection is garbage collected, there is no way of knowing that we need to remove the shadow collection from our dictionary.
I came up with a somewhat complicated solution that basically does manual memory management in this case: I keep a WeakReference to the ObservableCollection, and every few seconds I check whether the reference is still alive; if not, I remove the shadow collection.
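Roughly, that polling workaround looks like this (a sketch only; the field names, the five-second interval, and the use of System.Threading.Timer are illustrative, and a real version would need locking, since the callback runs on a thread-pool thread):
// Requires System.Linq and System.Threading in addition to the above.
// One WeakReference per source collection, same hash-code keying as before.
protected IDictionary<int, WeakReference> CollectionReferences = new Dictionary<int, WeakReference>();
private Timer _cleanupTimer;

protected void StartShadowCleanup()
{
    // Every few seconds, drop the shadow collection of any source
    // collection that has been garbage collected.
    _cleanupTimer = new Timer(_ =>
    {
        var dead = CollectionReferences.Where(p => !p.Value.IsAlive)
                                       .Select(p => p.Key)
                                       .ToList();
        foreach (var key in dead)
        {
            CollectionReferences.Remove(key);
            DeletedCollections.Remove(key);
        }
    }, null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
}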
But this seems like a terrible solution... I hope the collective genius of StackOverflow can shed light on a better solution.
EDIT:
In the end I decided to go with subclassing the ObservableCollection. The service proxy code is generated so it was a relatively simple task to change it to return my derived type.
Thanks for all the help!
Instead of rolling your own "weak reference + poll Is it Dead, Is it Alive" logic, you could use the HttpRuntime.Cache (available from all project types, not just web projects).
Add each shadow collection to the Cache, either with a generous timeout, or a delegate that can check if the original collection is still alive (or both).
It isn't dreadfully different from your own solution, but it does use tried and trusted .NET components.
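For example (a sketch, assuming the shadow collections are keyed by hash code as in the question; the key prefix and the 30-minute sliding window are arbitrary choices):
// Uses System.Web / System.Web.Caching; HttpRuntime.Cache also works
// outside of web applications.
void TrackShadowCollection(int key, IList deletedEntities)
{
    HttpRuntime.Cache.Insert(
        "shadow:" + key,             // cache key derived from the collection
        deletedEntities,             // the shadow collection itself
        null,                        // no CacheDependency
        Cache.NoAbsoluteExpiration,  // no fixed expiry...
        TimeSpan.FromMinutes(30));   // ...just expire 30 min after last access
}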
Other than that, you're looking at extending ObservableCollection and using that new class instead (which I'd imagine is no small change), or changing/wrapping the UpdateCustomers method to remove the shadow collection from DeletedCollections.
Sorry I can't think of anything else, but hope this helps.
BW
If replacing ObservableCollection is a possibility (e.g. if you are using a common factory for all the collection instances), then you could subclass ObservableCollection and add a finalizer which cleans up the deleted items that belong to that collection.
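Something along these lines (a sketch; ShadowRegistry stands in for wherever the DeletedCollections dictionary lives, and real code would need synchronized access, since finalizers run on the finalizer thread):
public class TrackedObservableCollection<TEntity> : ObservableCollection<TEntity>
{
    private readonly int _key;

    public TrackedObservableCollection()
    {
        // Same hash-code keying scheme as the question's dictionary.
        _key = GetHashCode();
    }

    // Runs when the collection is garbage collected, so the matching
    // shadow collection can be dropped at the same time.
    ~TrackedObservableCollection()
    {
        ShadowRegistry.DeletedCollections.Remove(_key);
    }
}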
Another alternative is to change the way you compute which items are deleted. You could keep the original collection and give the client a shallow copy. When the collection comes back, you can compare the two to see which items are no longer present. If the collections are sorted, the comparison can be done in time linear in the size of the collections. If they're not sorted, the values of the modified collection can be put in a hash table, which is then used to look up each value of the original collection. If the entities have a natural ID, using it as the key is a safe way of determining which items are not present in the returned collection, that is, have been deleted. This also runs in linear time.
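A sketch of the hash-based version, assuming the entities expose a natural Id property, where original is the server-kept copy and returned is the collection the client sent back (the Customer type is borrowed from the earlier example):
// IDs present in the collection returned by the client.
var returnedIds = new HashSet<int>(returned.Select(c => c.Id));

// Anything in the original that the client no longer has was deleted.
// Runs in linear time over the two collections.
var deleted = original.Where(c => !returnedIds.Contains(c.Id)).ToList();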
Otherwise, your original solution doesn't sound that bad. In Java, a WeakReference can be registered with a ReferenceQueue that is notified when the reference is cleared. There is no directly equivalent feature in .NET, but polling is a close approximation. I don't think this approach is so bad, and if it's working, why change it?
As an aside, aren't you concerned about GetHashCode() returning the same value for distinct collections? Using a weak reference to the collection as the key might be more appropriate; then there is no chance of a collision.
I think you're on a good path; I'd consider refactoring in this situation. My experience is that in 99% of cases the garbage collector makes memory management a non-issue, with almost no real work needed.
But in the remaining 1% of cases it takes someone to realize that they've got to up the ante and go "old school" by firming up their caching/memory management in those areas. Hats off to you for realizing you're in that situation and for trying to avoid the IDisposable/WeakReference tricks. I think you'll really help the next person who works on your code.
As for getting to a solution, I think you've got a good grip on the situation:
- be clear about when your objects need to be created
- be clear about when your objects need to be destroyed
- be clear about when your objects need to be pushed to the server
Good luck! Tell us how it goes. :)

What is the most practical solution to data management using SQLite on the iPhone?

I'm developing an iPhone application and am new to Objective-C as well as SQLite. That being said, I have been struggling to design a practical data management solution that is worthy of existing. Any help would be greatly appreciated.
Here's the deal:
The majority of the data my application interacts with is stored in five tables in the local SQLite database. Each table has a corresponding class which handles initialization, hydration, dehydration, deletion, etc. for each object/row in that table. Whenever the application loads, it populates five NSMutableArrays (one for each type of object). In addition to a primary key, each object instance always has an ID attribute available, regardless of hydration state. In most cases it is a UUID, which I can then easily reference.
Until a few days ago, I would simply access the objects via these arrays by tracking down their UUID, and then hydrate/dehydrate them as needed. However, some of the objects also maintain their own arrays which reference other objects' UUIDs. In the event that I must track down one of these "child" objects via its UUID, it becomes a bit more difficult.
In order to avoid having to enumerate through one of the previously mentioned arrays to find a "parent" object's UUID, and then proceed to find the "child's" UUID, I added a DataController with a singleton instance to simplify the process.
I had hoped that the DataController could provide a single access point to the local database and make things easier, but I'm not so certain that is the case. Basically, what I did is create multiple NSMutableDictionaries. Whenever the DataController is initialized, it enumerates through each of the previously mentioned NSMutableArrays maintained in the application delegate and creates a key/value pair in the corresponding dictionary, using the given object as the value and its UUID as the key.
The DataController then exposes procedures that allow a client to call in with a desired object's UUID and retrieve a reference to the actual object. Whenever there is a request for an object, the DataController automatically hydrates it and then returns it. I did this because I wanted to take control of hydration out of the clients' hands, to prevent an object that is referenced in multiple places from being dehydrated while still in use.
I realize that in most cases I could just make a mutable copy of the object and then, if necessary, replace the original object down the road, but I wanted to avoid that scenario if at all possible. I therefore added an additional dictionary to monitor which objects are hydrated at any given time, using the object's UUID as the key and a fluctuating count representing the number of hydrations without an offsetting dehydration. My goal with this approach was to have the DataController automatically dehydrate any object once its "hydration retainment count" hit zero, but this could easily lead to significant memory leaks, as it currently relies on the caller to later call a procedure that decreases the hydration retainment count of the object. There are obviously many cases where this is not obvious or not easily accomplished, and if even one calling object fails to do so properly, I encounter the exact opposite of the scenario I was trying to prevent in the first place. Ironic, huh?
Anyway, I'm thinking that if I proceed with this approach, it will just end badly. I'm tempted to go back to the original plan, but doing so makes me want to cringe, and I'm sure there is a more elegant solution floating around out there. As I said before, any advice would be greatly appreciated. Thanks in advance.
I'd also be aware (as I'm sure you are) that Core Data is just around the corner, and make sure you make the right choice for the future.
Have you considered implementing this via the NSCoder interface? Not sure that it wouldn't be more trouble than it's worth, but if what you want is to extract all the data out into an in-memory object graph, and save it back later, that might be appropriate. If you're actually using SQL queries to limit the amount of in-memory data, then obviously, this wouldn't be the way to do it.
I decided to go with Core Data after all.