How is tracked data in an ORM kept up to date? - entity-framework

How does something such as Entity Framework track changes to its data when those changes could originate from other sources? For example: when a cluster of the same ASP.NET Core app is running and one instance updates a record while that record is being tracked on a different instance, wouldn't the other instance send out-of-date data when it receives a GET request?
Basically, how do ORMs preserve ACIDity if they perform local change tracking?

It helps to think of EF contexts, and their local caching in particular, as short-lived. When you read an entity, that entity's "lifespan" should be thought of as matching the lifespan of the DbContext that originated it. Beyond that lifespan, the object is effectively just another potentially stale copy of the data. Even within that lifespan it does not synchronize with the underlying data source, so the point of truth is when SaveChanges is called. The caching EF provides is more for the scenario of: "I'm going to load some entities, and those entities reference other entities. As the code iterates over the entities and comes across a reference to something else, EF checks whether that something else has already been loaded and serves it before going to the DB." So in that sense a long-lived DbContext is a bad thing, because some of that cached data could be quite old and stale, and as the DbContext loads more data, sifting through the tracked entities gets slower and the context consumes more memory.
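As a rough illustration of that short-lived, unit-of-work usage (the Order/Customer model, AppDbContext and connection string below are hypothetical):

using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Customer { public int Id { get; set; } public string Name { get; set; } }
public class Order    { public int Id { get; set; } public Customer Customer { get; set; } }

public class AppDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
    public DbSet<Customer> Customers => Set<Customer>();
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("placeholder-connection-string"); // substitute your provider
}

public static class OrderReader
{
    public static async Task<Customer> LoadCustomerForOrderAsync(int orderId)
    {
        // The context, and everything it tracks, lives only for this unit of work.
        await using var context = new AppDbContext();

        var order = await context.Orders
            .Include(o => o.Customer)          // loaded and tracked by this context only
            .SingleOrDefaultAsync(o => o.Id == orderId);

        // A second lookup of the same customer within this context is served from
        // the change tracker rather than the database; once the context is disposed,
        // the instance is just a potentially stale snapshot like any other copy.
        return order?.Customer;
    }
}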
In web applications, the DbContext is typically scoped to a single request, or shorter (a unit of work). This means that edits on concurrently handled requests aren't notified of each other's changes, and neither request sees changes made by other sources between the time its context loaded the data and the time it prepares to save. EF can be told what to check for concurrent changes, normally a row version timestamp, and can block an update where this check fails. Beyond that, it is up to the developer to decide what action to take. This usually means catching a concurrency fault and handing off to an appropriate handler to log the details and notify the user. That could be a first-in-wins scenario, where the user is told their changes failed and asked to try again (with the refreshed data provided); a last-in-wins scenario, where the user is warned that there have been changes but can overwrite them (hopefully with the event logged in case there are disputes or questions); or a merge, where the system inspects the changes and presents any conflicts for the user to review and adjust, accept, or cancel their update.
EF can help detect this, but ultimately the developer has to code for what to do about it.
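A minimal sketch of what that row-version check and concurrency fault can look like with EF Core; the Product entity and the first-in-wins handling below are illustrative assumptions, not the only option:

using System;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; }   // concurrency token checked on every UPDATE
}

public static class ProductUpdater
{
    // Returns true if the save won; false if another source changed the row first.
    public static async Task<bool> TryUpdatePriceAsync(DbContext context, int id, decimal newPrice)
    {
        var product = await context.Set<Product>().SingleAsync(p => p.Id == id);
        product.Price = newPrice;

        try
        {
            await context.SaveChangesAsync();            // generated WHERE clause includes RowVersion
            return true;
        }
        catch (DbUpdateConcurrencyException ex)
        {
            // First-in wins: refresh to the current database values and hand the
            // conflict back to the caller so the user can retry, overwrite, or merge.
            foreach (var entry in ex.Entries)
            {
                var databaseValues = await entry.GetDatabaseValuesAsync();
                if (databaseValues != null)
                    entry.OriginalValues.SetValues(databaseValues);
            }
            return false;
        }
    }
}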
In terms of detecting concurrent edits as they happen, that requires deliberate coding: communicating changes between sessions (publish/subscribe), where each session listens for updates to entities it's actively working on and broadcasts changes to entities as it updates them. Detecting changes made by other sources means another process that listens for DB updates (beyond the changes the system already knows it made) and broadcasts those notifications to any active sessions. Certainly a very cool thing to see working in action, but the cost and complexity it introduces has to be justified by more than just handling concurrency issues on save. :)
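One possible way to wire up the publish/subscribe part in an ASP.NET Core app is SignalR; the hub, method and event names below are hypothetical:

using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// A client calls NotifyChanged after a successful SaveChanges; every other
// connected client receives "EntityChanged" and can decide whether the entity
// is one it is actively editing and needs to refresh or warn about.
public class EntityChangesHub : Hub
{
    public Task NotifyChanged(string entityType, string entityId)
        => Clients.Others.SendAsync("EntityChanged", entityType, entityId);
}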

Related

Client/Server state synchronization for desktop application

I am working on a desktop application that requires synchronization between several clients. Basically, a group of people (let's say between 2 and 10) all run the same application. One of them hosts a server and the other clients connect to that server. The client that hosts the server also connects to his own server.
The applications should stay synchronized between all clients, meaning all clients see the same data in the application. Specifically, the data in question can be divided into two forms:
A simple property with a certain value (this value must stay synchronized)
A list of properties (the items in the list and their values must stay synchronized)
Simple examples of (1) could be: which item in a list the client currently has selected, and the current location of the client's mouse pointer within the application window. These properties keep changing continuously, but the number of properties is constant and does not grow (i.e. it is defined at design time).
An example of (2) could be a list of chat messages. These lists will grow during runtime with no way to predict how many items there will be.
Here is some example code in C# for the state, client and chat messages:
public class State
{
    // A single value shared between all clients
    public int SimpleInteger { get; set; }

    // List of connected clients and their individual states
    public List<Client> Clients { get; set; }

    // List of chat messages
    public List<ChatMessage> Messages { get; set; }
}

public class Client
{
    public string ClientId { get; set; }
    public string Username { get; set; }
    public ClientState ClientState { get; set; }
}

public class ClientState
{
    public string ClientId { get; set; }
    public int SelectedIndex { get; set; }
    public int MouseX { get; set; }
    public int MouseY { get; set; }
}

public class ChatMessage
{
    public string ClientId { get; set; }
    public string Message { get; set; }
}
I've been working on this on and off for a long time but whatever kind of state synchronization I came up with, it never worked well.
When I search for solutions, I only ever find solutions for games, but those are not very helpful because my requirements are different:
I cannot deal with "dropped updates"; I cannot predict (interpolate or extrapolate) what the other clients are doing. Every client needs to receive every update to stay in sync.
On the other hand, I don't care about lag (within reason). It is fine if I see the updates of other clients with about a second of delay.
When a new client connects (or reconnects), a large portion of the state must be transferred (for example: the list of chat messages from example 2). Each client is required to know the entire history of the chat, so this must be downloaded when a client connects.
My current solution can be summarized as follows:
The server keeps track of the state, i.e. the source of truth.
The state contains the properties that require synchronizing.
The state also contains a list of connected users (and their usernames etc).
Clients also each keep a local copy of the state, which they can act upon immediately. For example, they update their mouse position in their local state continuously.
Whenever a client updates his local state, this update is sent to the server.
Potential exceptions here are things that change too fast, such as the mouse position; those I will only send at regular intervals.
The server also updates the common "source of truth" state.
Finally, the server updates all other clients with the new updated state.
The last two steps are where I'm struggling. I can think of two methods to synchronize the state: one is easy but probably not efficient, and the other is efficient but prone to errors.
The server simply sends the entire state to all clients.
As soon as the server receives an update from the client, the update is applied to the state and the new state is broadcasted. Every other client replaces their local state.
I feel this will probably work, but the state can grow in size quickly due to the "list" items (for example chat messages). In my previous attempts this quickly became a problem, and sending the state back became much too slow.
The server re-sends the same update (that it received) to all other clients.
Each client then only applies the new update to their state locally to sync back with the server.
This is probably much more efficient and sending the entire state is only necessary when a client connects.
However, in the past I frequently ran into desync issues where clients were no longer in sync. I don't really know what caused it; probably conflicts between messages (for example, the server telling the client to update a value in the state while the client had just updated its local value: which takes precedence?). Once this happened, everything went completely wrong, as the updates were then being applied to two different states and had different outcomes.
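To illustrate what I mean, here is a rough sketch (the class and property names are made up) of applying updates only in a server-assigned order, so a missed or out-of-order message is detected and triggers a full state download instead of silently corrupting the local copy:

using System;

// Hypothetical shape of a server-ordered update.
public record StateUpdate(long Sequence, string TargetPath, string SerializedValue);

public class ClientStateApplier
{
    private long _lastApplied;                 // last sequence number applied locally
    public event Action ResyncRequired;        // raised when a gap is detected

    public void Apply(StateUpdate update)
    {
        if (update.Sequence <= _lastApplied)
            return;                            // duplicate or already-seen update

        if (update.Sequence != _lastApplied + 1)
        {
            // A message was missed: fall back to downloading the full state
            // instead of applying updates to a diverged local copy.
            ResyncRequired?.Invoke();
            return;
        }

        ApplyToLocalState(update);
        _lastApplied = update.Sequence;
    }

    private void ApplyToLocalState(StateUpdate update)
    {
        // Mutate the local State object here; details depend on the model.
    }
}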
I'm looking for some guidance on general concepts for how to achieve this. I'm using several messaging libraries for the actual communication between client and server, and that part is not an issue, I think. I can make sure in these libraries that every message is received, for example (though I'm not sure whether the order is guaranteed). Like I said before, lag is not an issue, but I must guarantee that every state update is received both by the server and by every other client.
Any help would be great! Thanks.
This is a hard problem and there are enough tricky areas that I wouldn't want to build this myself. Authentication, conflicting updates, API management, network outages, single point of failure, and local persistence come to mind.
If you're up for using a cloud-based solution, Google Cloud Firestore takes care of those tricky areas and does what you need:
Clients save data to the database, by creating, updating, or deleting records. Example code.
Whenever a record is created, updated, or deleted, all clients get realtime notifications. Example code.
(After you follow the links above, make sure you click C# above the code boxes to see the C# code).
This is a complicated issue with many moving parts, as you seem to understand. As I've been researching this, I've read a couple of comments on questions like this one across a variety of Q&A sites stating that this kind of thing is a project all on its own.
Disclaimer: I haven't done this myself, so I don't know how well this would work, but maybe you can take my suggestions and work with them, if you haven't already done so. I've worked on projects where this was implemented, but I wasn't part of that implementation directly.
Connection
Since you haven't said which library you are using for the connection, I'm going to assume you are using websockets or something similar. If not, I suggest you move to something like websockets. They allow a (near) constant connection between client and server so that data can be pushed in both directions, which saves the client from having to poll and pull the data. The link below seems to have a decent walk-through of how to do it, so I won't try to. Because links die, here's the first example code they give, which seems pretty simple.
using System.Net.Sockets;
using System.Net;
using System;
class Server {
    public static void Main() {
        TcpListener server = new TcpListener(IPAddress.Parse("127.0.0.1"), 80);

        server.Start();
        Console.WriteLine("Server has started on 127.0.0.1:80.{0}Waiting for a connection...", Environment.NewLine);

        TcpClient client = server.AcceptTcpClient();
        Console.WriteLine("A client connected.");
    }
}
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_server
Client start up
Once you have a stable connection between server and client, you need to make sure the data is in sync. When the user starts the app, you can get the timestamp of the latest change in each table and compare that to the server. If they are exactly the same, you have a somewhat reasonable expectation that the table hasn't changed. I'm assuming each table has a column containing the timestamp for the last edit made to the row.
For the tables that have changed, you can have the server send the new and updated rows to the client based on the client's "last changed timestamp".
Since the internet isn't 100% guaranteed to be connected, you will also need to keep track of the times the client has been connected vs. when they've been on the app (unless the app just won't work without being connected to the server). This information also needs to be sent to the server to compare to data changed during intervals where the client hasn't been connected.
Once timestamp matching has been done, you need to compare the row counts. If they match, you can more reasonably assume the tables are the same. If they aren't, you can see about matching IDs/primary keys. There are a variety of ways to do this, including 1:1 matching (slowest but most reliable), or you can do some math with the IDs (assuming numerical IDs) and try to see what's different in batches of 100 rows, for example. Idea: if summing the sorted, auto-increment integer IDs for the first 100 rows gives the same result on the client and the server, those rows most likely exist on both sides; if the sums don't match, you can fall back to the 1:1 match to see what's missing. Because this can be lengthy for large databases, you may want to track this type of sync in another table so it doesn't need to be done all the time.
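A rough sketch of that batched ID-sum idea (sums can collide, so treat it only as a cheap first pass before the 1:1 comparison); the helper names are made up:

using System;
using System.Collections.Generic;
using System.Linq;

public static class TableComparer
{
    // Sums sorted IDs in fixed-size batches; a batch whose sum differs between
    // client and server is a candidate for the slower row-by-row comparison.
    public static List<long> BatchChecksums(IEnumerable<int> ids, int batchSize = 100)
    {
        return ids.OrderBy(id => id)
                  .Select((id, index) => new { id, batch = index / batchSize })
                  .GroupBy(x => x.batch)
                  .OrderBy(g => g.Key)
                  .Select(g => g.Sum(x => (long)x.id))
                  .ToList();
    }

    // Returns the indexes of the batches whose sums differ (or are missing on one side).
    public static List<int> MismatchedBatches(List<long> clientSums, List<long> serverSums)
    {
        var result = new List<int>();
        var count = Math.Max(clientSums.Count, serverSums.Count);
        for (var i = 0; i < count; i++)
        {
            var client = i < clientSums.Count ? clientSums[i] : -1;
            var server = i < serverSums.Count ? serverSums[i] : -1;
            if (client != server) result.Add(i);
        }
        return result;
    }
}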
Instead, you may want a table to track all the data not sent to a client. This would require a confirmation that the data sent was correctly inserted into the client DB. This could also work on the client side to track what hasn't been sent to the server. Of course, this kind of thing can get cumbersome quickly, even if you're just tracking keys, table names, and timestamps. You can rack up millions of rows quickly, if you don't remove old data periodically. This is why I suggest tracking unsent data, so that anything that becomes "sent" is no longer tracked by this table and removed.
If you don't want to code and manage all that, you can look for a library that does it. There are a variety out there. Even Microsoft has one, but it's on extended support only until 1/1/2021. What happens after that, I doubt even Microsoft knows, but it buys you 1.25 years to come up with a different solution.
Creating Synchronization Providers With The Sync Framework
The Sync Framework can be used to build apps that synchronize data from any data store using any protocol over a network. We'll show you how it works and get you started building a custom sync provider.
https://learn.microsoft.com/en-us/previous-versions/sql/synchronization/mt490616(v=msdn.10)
https://support.microsoft.com/en-us/lifecycle/search?alpha=Microsoft%20Sync%20Framework%202.1
Normal runtime
Once you have your data synced on startup (or in the background after startup), you can simply send data to the server as the user makes changes. Since you'll have a websocket-type connection, any changes the server receives from one client can be pushed to all the other clients.
As far as changing the data in real time in your app, you may have to constantly poll your local/client DB for timestamp changes so the UI can be updated appropriately. There may be something within C# that does this for you, or another library you can find.
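One very simple way to do that polling is sketched below; the names are made up, and getLatestChangeUtc stands in for whatever query returns the newest last-edit timestamp in the local DB:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class ChangePoller
{
    // Polls a "latest change" timestamp and invokes onChanged whenever it moves forward,
    // so the UI refreshes from the local DB only when something actually changed.
    public static async Task PollAsync(Func<Task<DateTime>> getLatestChangeUtc,
                                       Action onChanged,
                                       TimeSpan interval,
                                       CancellationToken token)
    {
        var lastSeen = DateTime.MinValue;
        while (!token.IsCancellationRequested)
        {
            var latest = await getLatestChangeUtc();
            if (latest > lastSeen)
            {
                lastSeen = latest;
                onChanged();
            }

            try { await Task.Delay(interval, token); }
            catch (OperationCanceledException) { break; }
        }
    }
}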
Conclusion
At this point, I'm out of ideas. It seems reasonable to me this would work, even though it's a lot of work. Hopefully you can take what I have and use it as a foundation to your own ideas on how to accomplish your task. It seems there's a lot of work ahead of you, so good luck!
Footnote
As I'm currently the only answer after several days of it being unanswered, I'm going to assume no one else has anything better to suggest. If they do, I'd encourage them to make their own answer instead of complaining about mine. People tweaking this answer is expected, but please remember community standards when making comments.
I'm only answering this because I haven't seen anyone else do it on this or other sites. It's only been bits and disconnected pieces here & there, with people still not being able to make sense of it as a whole.
This and similar questions have been asked before on this site and closed as "too broad". If you feel this same way as a reader, please vote so on the Question not this answer.
There are several solutions to your problem.
You could use a BizTalk server out of the box. This may not be what you have in mind.
If you want something more home-brewed, you could use WCF (Windows Communication Foundation) with MSMQ (Microsoft Message Queuing). This would give you guaranteed message delivery and durable messages (if you want). You would not have to worry about lost connections and other errors occurring during message transmission.
You can go down another level and use direct TCP and UDP protocols to transmit messages. But now, you have to take care of more error cases.
Any SQL DBMS implements one important part of your problem statement: it maintains shared state. Consider what ACID promises:
Consistency. At any one instant, all clients reading from the database are guaranteed to see the same information.
Atomicity. The client updating the database can use as many steps as needed. When the transaction is committed, the data are changed entirely or not at all.
Isolation. The server gives each client the illusion of interacting with it alone. It handles concurrent updates, and updates the database as though the updates arrived serially.
You may not care about durability for this application.
The mediation among the clients is, for my money, the most useful feature of the DBMS for your application. That will save you work, and headaches. Another, non-obvious, benefit is that it can enforce consistency rules for the state information; that can be remarkably useful to prevent an obsolete/corrupt client from munging the shared state.
The second part of your problem statement is notifying 2-10 clients of changed state. There are any number of ways to do that.
Some DBMSs can access OS services from triggers. You could have an update trigger issue a notification. Alternatively, the updating client could do that.
The actual notification mechanism could be quite simple. Clients could connect to a server (that you write) and block on read(2). The server itself listens on a port for update notifications. On receipt of one, it repeats it to all connected clients. When the client's read request returns, it's time to query the database for the updated state, and post a new read.
To prevent a kind of "thundering herd" problem when several updates arrive back-to-back, when a client reads the update message, it could keep reading updates until EWOULDBLOCK, and only then query the DBMS. OTOH, if it's important to see the intermediate states (to see every update, not just the current state), the DBMS is perfectly capable of storing and providing all versions and distinguishing them with a timestamp or serial number.
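Here is a rough C# equivalent of that "keep reading until nothing is pending, then query once" idea, assuming the notifications arrive on a plain TCP stream; the names are illustrative:

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class UpdateListener
{
    // Blocks on the notify socket; when an update arrives, drains any further pending
    // notifications before querying the database once, so a burst of back-to-back
    // updates results in a single re-query rather than one per notification.
    public static async Task ListenAsync(NetworkStream notifyStream, Func<Task> refreshFromDatabase)
    {
        var buffer = new byte[256];
        while (true)
        {
            var read = await notifyStream.ReadAsync(buffer, 0, buffer.Length);
            if (read == 0) break;                      // server closed the connection

            while (notifyStream.DataAvailable)         // drain queued notifications
                await notifyStream.ReadAsync(buffer, 0, buffer.Length);

            await refreshFromDatabase();               // one query covers the whole burst
        }
    }
}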
If you don't want to use TCP sockets directly, you might prefer ZeroMQ.
In this design, each client has three connections: the DBMS, the read-notify socket, and (maybe) the server-notify socket. The server has N+1 connections: one for each of N clients, plus one listening socket. You have no locks to implement, very little tracking of participation, no problems re-synchronizing, and only short windows of inconsistency among clients as each one acts on its notification.

JSF2 + EJB3 + CRUD with a user control to commit or rollback?

I'm developing a Java EE application based on JSF2 and GlassFish 3.1. The application has to provide CRUD operations on a database table. The table data are shown using a PrimeFaces dataTable with in-cell editing behaviour.
I would like to let the user:
modify, add and remove elements in the table
commit only when the user is sure of his changes (pressing a command button)
roll back when the user wants to discard his changes
The table has an entity, and a stateless EJB takes care of interfacing with the EntityManager to access the database.
The problem is that every time I remove a row in the table, the transaction gets committed without any control left to the user.
How can I implement this kind of user control over commits/rollbacks?
The problem is that every time I remove a row in the table, the transaction gets committed without any control left to the user.
This sounds strange. If you're editing the data in the data table and posting back those edits (via AJAX or direct) AND the data originates from a stateless bean, then it can't be anything other than that you're working on detached entities.
Changes to those entities, or to the list that contains them, are not automatically reflected in the database. There is no concept of a transaction that gets automatically committed in that case. I strongly disagree with the answer given by Nayan above. Yes, user transactions let you control the commit, but that doesn't seem to be your problem at all.
Although you should maybe show some code, my guess is that you're simply calling a delete method of some sort on your EJB service after every remove action from the user and then expecting the same transaction and persistence context to still be there. But in a stateless bean those are gone as soon as you exit the method that gave you the entities in the first place.
Your best strategy would be to let the user actions operate only on the data cached by the @ViewScoped backing bean. Then, if and only if the user confirms the update action, you call your EJB service with all the changed items in one go. If there is a parent entity that holds a reference to a list containing, among others, all the items you deleted, you only have to pass this parent entity and make sure cascade remove is set on the relation.
That said, there IS support for the pattern you seem to think you were already getting. It involves using a @Stateful session bean and an extended persistence context. In that case the session bean's persistence context will cache all your changes until you associate it with a transaction again. If you do your delete actions in non-transactional methods of the session bean, implement your cancel method as a non-transactional @Remove method that calls entityManager.clear(), and implement your save method as a transactional @Remove method (it doesn't need to do anything in its body), then you'll get this effect.
Unless you have a firm grasp of EJB and transactions you'd best go with the first strategy though.
The user operations are being reflected in the database on a frequent basis, leading to unnecessary database calls and increased traffic. Instead, you should commit only after the final confirmation from the user.
The changes should be made on a copy or clone of the object, and the user should operate on that. When the user wants to save the changes, you can persist the modified copy. If the changes are to be discarded, the initial state can be restored from the original object.
For your current approach :
The problem is that everytime I remove a row in the table, the
transaction gets committed without any control left to the user.
It's because the default flush mode is FlushModeType.AUTO; you can change it to FlushModeType.COMMIT.
How can I implement this kind of user control over commits/rollbacks?
With bean-managed transactions using the UserTransaction interface, you can have control over transaction begin, commit, rollback, etc.
Edit: According to JSR-317, whether a managed entity gets persisted to the database immediately or deferred until later is implementation specific.
If a transaction is active, a compliant implementation of this specification is permitted to write to the database immediately (i.e., whenever a managed entity is updated, created, and/or removed), however, the configuration of an implementation to require such non-deferred database writes is outside the scope of this specification.

Core Data Sync - Tracking Deleted Objects

I'm setting up a basic sync service for an iPad application I'm developing. The goal is to have data consistent throughout several instances of the iPad app, as well as having a read-only version of the data on the web, hence rolling a custom solution.
The current flow is this:
Each entity has a 'created', 'modified' and 'UUID' field which are automatically updated by Core Data
On sync, each entity with a created or modified date after the last sync date is serialised into JSON and sent to the server
The server persists any changes to a MySQL database using the client-generated UUIDs as PKs (if there's a conflict, it just uses the most recently modified entity as the 'true' version, nothing fancy there) and sends back any updated entities to the client
The client then merges these changes back into its Core Data DB
This all seems to be working fine. My problem is how to track deleted objects using this method. I'm guessing I can add a 'deleted' flag to each entity and set it whenever a client deletes something; I can then push that change to the server with the rest of the sync data. Once the sync is complete, the client can actually delete these entities. My questions are:
Can I override Core Data's delete methods to automatically set this flag?
Will this require keeping all deleted entities indefinitely on the server? We'll have no way of knowing when every client has synced and actually deleted each entity (I'm not currently tracking client instances)
Is there a better way of doing this?
How about keeping a delta history table with UUID and created/updated/deleted fields, maybe with a revision number for each update? So you keep a small checklist of changes since your last successful sync.
That way, if you delete an object you could add an entry to the delta history table with the deleted UUID and mark it deleted. Same with created and updated objects; you only need to check the delta table to see which items the server needs to delete, update, create, etc. You could even store every revision on the server to support rolling back to a previous version in the future if you feel like it. (A small sketch of this idea follows below.)
I think a revision number is better than relying on the client's clock, which could potentially be changed manually.
You could use NSManagedObjectContext's insertedObjects, updatedObjects, deletedObjects methods to create the delta objects before every save procedure :)
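To make the delta-history idea concrete: the pattern itself is platform-neutral, so here is a minimal sketch in C# (matching the earlier examples in this document); all names are hypothetical, and on iOS the same shape would live in Core Data or on the MySQL side:

using System;
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Created, Updated, Deleted }

// One row per change since the last successful sync (hypothetical shape).
public record DeltaEntry(Guid EntityId, long Revision, ChangeKind Kind, DateTime ChangedAtUtc);

public static class DeltaLog
{
    // Everything a client that last synced at `sinceRevision` still needs to apply,
    // including tombstones for deletions it would otherwise never hear about.
    public static IEnumerable<DeltaEntry> ChangesSince(IEnumerable<DeltaEntry> log, long sinceRevision)
        => log.Where(e => e.Revision > sinceRevision)
              .OrderBy(e => e.Revision);
}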
My 2 cents
Whether or not you have to keep deleted objects on the server or not totally depends on your needs. You will need a deleted flag locally to mark as deleted for the sync, maybe also on the server depending on your desire to roll back.
I have taken care of this problem a few ways before. Here is one possibility:
When a client deletes something, just mark it as deleted locally and delete it from the server during the sync (at which point you can purge it from Core Data). When another client requests that data, send back an HTTP 404 because you don't have the object any more. At that point the client can delete the entity locally. If a client requests a list of things and this object has been deleted, it will simply be missing from the list it gets back, so you can detect that and delete it. I do that in a client by creating an array of object IDs when I get a response from the server and deleting any local objects that don't have those IDs.
We have a deleted field on the server, but just to have the ability to roll back in case something is deleted by accident.
Of course, you could return deleted objects to the client so it knows to delete them, but if you don't want to keep a copy on the server, you would have to assume that all clients will update within some time frame. Then you could garbage collect after that time frame has expired.
I don't really like that solution though. If your data is too heavy to ask for all the objects for a complete sync, you could use your current merge strategy for creating and updating, and then run a separate call to check for deleted items. That call could simply ask for all IDs that the client should have on the device. It could delete the ones that don't exist. OR it could send all IDs on the client and get back a list of IDs to delete.
I think you have to provide more details about the nature of the data if you want a more opinionated suggestion.
Regarding your second question: you can design this so that the server doesn't have to keep deleted records around, if you want. Let each app know whether a given piece of data (based on its UUID) is stored on the server (e.g. add an existsOnServer property or similar). This starts out false when a new item is created in the app, but is set to true once it has been synced to the server for the first time. That way, if the app tries to sync later but the UUID is not found, you can differentiate the two cases: if existsOnServer is false, then this item is newly created and should be synced to the server; but if it is true, then it can be taken to mean that it was already on the server before but has now been deleted, so you can delete it in the app too.
I'd probably argue against this approach, since it seems more error prone to me (I imagine a database or connection error incorrectly being interpreted as a deletion) and keeping records around on your server would usually not be a big deal, but it is possible. The "delta-approach" suggested by dzeikei could be used at the same time, so an update to a record that does not exist on the server signifies that it was deleted, while an insert does not.
You may take a look at Cross-Platform Data Synchronization by Dan Grover if you haven't. It's a very well written paper regarding synchronization and iOS.
About your questions:
You can avoid actually deleting a record in Core Data and set a 'deleted' flag instead: just update the record rather than deleting it. You could write your own 'delete' method that really just updates that flag on the record.
Always keep a last_sync and a last_updated value for each record, on the server and on each client. This way you'll always know when someone changed something anywhere and whether that change was synced against the 'truth database' or not.
Keeping track of deleted records is a hard thing to do. I guess the best way is to keep track of the history of syncs for each table, but that is a difficult task. The easiest way, using this 'truth-database' kind of configuration, is to flag the records; so yes, you should keep the data on the server as well as on the client.
During synchronization of data between two tables, some records get deleted when the table rows are the same, and when the rows are different they synchronize correctly; I used the code shown in the linked image.

Core Data with Web Services recommended pattern?

I am writing an app for iOS that uses data provided by a web service. I am using core data for local storage and persistence of the data, so that some core set of the data is available to the user if the web is not reachable.
In building this app, I've been reading lots of posts about core data. While there seems to be lots out there on the mechanics of doing this, I've seen less on the general principles/patterns for this.
I am wondering if there are some good references out there for a recommended interaction model.
For example, the user will be able to create new objects in the app. Let's say the user creates a new employee object; the user will typically create it, update it and then save it. I've seen recommendations that update the server at each of these steps: when the user creates it, and when the user changes any of the fields. And if the user cancels at the end, a delete is sent to the server. A different recommendation for the same operation is to keep everything local, and only send the complete update to the server when the user saves.
This example aside, I am curious if there are some general recommendations/patterns on how to handle CRUD operations and ensure they are sync'd between the webserver and coredata.
Thanks much.
I think the best approach in the case you mention is to store data only locally until the point the user commits the adding of the new record. Sending every field edit to the server is somewhat excessive.
A general idiom of iPhone apps is that there isn't such a thing as "Save". The user generally will expect things to be committed at some sensible point, but it isn't presented to the user as saving per se.
So, for example, imagine you have a UI that lets the user edit some sort of record that will be saved to local core data and also be sent to the server. At the point the user exits the UI for creating a new record, they will perhaps hit a button called "Done" (N.B. not usually called "Save"). At the point they hit "Done", you'll want to kick off a core data write and also start a push to the remote server. The server push won't necessarily hog the UI or make them wait till it completes -- it's nicer to allow them to continue using the app -- but it is happening. If the update push to server failed, you might want to signal it to the user or do something appropriate.
A good question to ask yourself when planning the granularity of writes to core data and/or a remote server is: what would happen if the app crashed out, or the phone ran out of power, at any particular spots in the app? How much loss of data could possibly occur? Good apps lower the risk of data loss and can re-launch in a very similar state to what they were previously in after being exited for whatever reason.
Be prepared to tear your hair out quite a bit. I've been working on this, and the problem is that the Core Data samples are quite simple. The minute you move to a complex model and you try to use the NSFetchedResultsController and its delegate, you bump into all sorts of problems with using multiple contexts.
I use one context to populate data from the web service in a background "block", and a second one for the table view to use; you'll most likely end up using a table view for a master list and a detail view.
Brush up on using blocks in Cocoa if you want to keep your app responsive whilst receiving or sending data to/from a server.
You might want to read about 'transactions' - which is basically the grouping of multiple actions/changes as a single atomic action/change. This helps avoid partial saves that might result in inconsistent data on server.
Ultimately, this is a very big topic - especially if server data is shared across multiple clients. At the simplest, you would want to decide on basic policies. Does last save win? Is there some notion of remotely held locks on objects in server data store? How is conflict resolved, when two clients are, say, editing the same property of the same object?
With respect to how things are done on the iPhone, I would agree with occulus that "Done" provides a natural point for persisting changes to server (in a separate thread).

Core Data questions around typical usage

I have some basic questions about core data (which I am new to) and I would like some points of view on current standards and implementations.
Basically I have an app on the iPhone (supporting iOS 3.0 and above) which gets a lot of data from web calls over HTTP. I'm looking at moving the results into local storage for fast retrieval the next time the user loads the same data (the data doesn't change, which is why I can rely on the cached version being accurate).
I just wanted to know a few things first:
Do people these days treat the managed objects that extend NSManagedObject as domain objects, or do you create separate classes strictly for storage and create helper methods to turn them into domain objects? I sometimes find keeping all persistence logic out of the domain to be a good thing.
What about clean-up? How does one typically delete all the data when the app closes, or perhaps expire data in the local storage? I certainly don't want to hold the data on the user's phone at all times.
Is there any type of atomicity with Core Data? My implementation will first check for data locally before hitting the web services, and I would like to make sure that there is never half a dataset committed to local storage, producing funny results.
I would like to run a fair few background threads to fetch data in the background; are there any things I would need to consider when persisting objects on a background thread?
In relation to the above question, what is the best way to create a "background fetching" loop? In the app delegate? Per view, depending on the view? etc...?
I hope these are not too basic :)
Thanks for any help you can give.
Do people these days treat the managed objects that extend NSManagedObject as domain objects, or do you create separate classes strictly for storage and create helper methods to turn them into domain objects? I sometimes find keeping all persistence logic out of the domain to be a good thing.
If you create totally independent domain objects, you have the cost of keeping them in sync with your Core Data model and keeping the translation between Core Data and these objects working; plus you have duplicate objects in memory, and depending on how many objects you have, this might be a concern.
However the benefit side of using separate domain objects is that you are no longer wedded to a managed object context. One case where something like that can hurt you is if you maintain references to managed objects and then some background operation causes the main managed object context to remove objects - now if you access any property in the deleted managed object, you trigger a fault exception (even if you have explicitly had the object loaded with no faulted data).
One thing I have tried with moderate success is occasional very lightweight separate data objects for specific uses - what I did was to define a protocol to represent the data object accessors, with the same names as the core data accessors. Then I had both the core data object and a custom standalone data object implement this protocol, and had a mechanism to automatically copy properties from one to the other. So I didn't do every object as custom, and could treat objects either coming from the local store or standalone interchangeably.
I don't have a clear preference on this one yet, but I lean toward using the managed objects directly, because of the lack of duplication. You can mitigate bad side effects by listening for changes or using the Core Data controller class.
One thing that helps to keep domain objects and data objects sort of the same, yet not, is using mogenerator to generate data objects. It generates very nice object representations of the objects in your Core Data store, plus front-end objects you are meant to edit by adding custom accessors or complex methods. When the data store changes, mogenerator regenerates the data objects but leaves your custom code alone.
http://rentzsch.github.com/mogenerator/
What about clean-up? How does one typically delete all the data when the app closes, or perhaps expire data in the local storage? I certainly don't want to hold the data on the user's phone at all times.
The data is generally small enough that I just leave it there, with an expiration timestamp for use so that you know when the data is too old to use directly. There is a ton of value to keeping data around since users close and reopen applications so frequently, and with data already there you can present results instantly while still fetching content updates.
Is there any type of atomicity with Core Data? My implementation will first check for data locally before hitting the web services, and I would like to make sure that there is never half a dataset committed to local storage, producing funny results.
The atomicity comes in that you perform operations in a context and then tell the context to save. So true atomicity means avoiding other components issuing a save before you are ready, which generally means doing something in its own context and merging back into a master context.
I would like to run a fair few background threads to fetch data in the background; are there any things I would need to consider when persisting objects on a background thread?
Every background thread needs its own context, you should listen for the save notification and merge into the master context at that time.
You should strive mightily to avoid duplicate requests that might be saving to the same objects nearly at the same time, this can sometimes cause core data errors on merge. Related to that - set a merge policy on your main context as the default policy is to throw an exception.
That also means that in doing modeling, go for as many separate objects as you possibly can rather than one large object that aggregates data from a lot of different sources.
For more information on saving and merging into other contexts, see this question:
CoreData and mergeChangesFromContextDidSaveNotification
In relation to the above question, what is the best way to create a "background fetching" loop? In the app delegate? Per view, depending on the view? etc...?
I like to do this from a separate singleton class (after all, the AppDelegate itself is a singleton...), that I can ask for the main managed object context in addition to a context specific to a thread.
This is also useful in that when starting a new Core Data project, you don't have to use the Core Data template and can just re-use this core data manager.