I am about to work on an app which handles extremely valuable data. Any loss of this data for the user would be very costly, so I'm interested in finding out more about the best architecture design for our needs.
The user will be inputting this data in their iPhone each day. The alternative to using this app is carrying around a piece of paper with this sensitive information on it. So while I know we can be more secure than a piece of paper, I want to make sure we also cover the user stories like "I flushed my phone down the toilet" or "my son deleted the app, where's my data?"
A service like Dropbox comes to mind, but I wouldn't want to require our users to have a Dropbox account; the syncing architecture must be transparent to the user. iCloud is out because web and Android versions may follow.
Can anyone suggest either some good reading on this subject, or some good frameworks to look at? I expect to use a node.js backend, and while we are targeting iPhone first, Android will follow.
The data itself consists of 2 tables, each with a small number of fields, with a many to many relationship. A few new rows will be created by the user each day, but the data will be small and highly compressible.
Turns out this is an extremely difficult issue. In data assurance (this isnt yet a security type situation although could become one because of the assurance aspect) there is ALWAYS a time element. As a simple example what happens if your use has locally updated some piece of data. Just before you have the ability to fully push the data to some cloud service, etc... he / she dumps it in the toilet. Even if good signal was there for transmitting the data there is time in transferring and time necessary for the cloud server to respond saying the data got there properly.
Generally in data assurance, you really have to work to the best you can. You will NEVER be able to solve all issues as there is no data center, nor link to a data center, etc... that is perfect. There is always a chance of data loss. Truly the best you can do, is SYNC as fast as data changes, and if there is loss of connection, as soon as the connection becomes alive again.
Now, for security. Security by itself does not create assurance. If the data itself is something that the customer does not want to lose, and that is his only requirement, then security is un-necessary. If he / she is also worried about other getting their hands on his data, then you have to be worried about data-in-transit (both up and down during syncing), and on the device itself. For the best potential security, encrypt the data locally on the device prior to pushing over the cloud. There are many known attacks that even if using SSL or other services, can get at the data. If you wish, locally encrypt a file, then you could for SOME added security still use SSL (at this point you will have doubly encrypted the data). You also want to sign the data so that there is little chance of it being manipulated in transit, or by the cloud server itself (if a hacker hacked the cloud server). Generally the way to protect the data while on device, you may choose to have the user input a password, and put some fairly strict rules around how passwords are formed, and how many tries you allow before you disallow attempts for 30 minutes or so.
You may also wish to store the data locally in an encrypted form. This way if someone gets the device, they still will need to have the password before they can get the data (unless of course they can crack the algorithm you use to generate the symetric key from the password).
In terms of online data service, you could use iCloud, etc... I am actually NOT a fan of anything cloud. I think it is SO counter enterprise / proprietary data, it isnt even funny. I think it actually almost laughable that so many of these phone / device manufacturers are going SOOOOO cloud based. I think they are abandoning the big companies, as NO big company I know of wants to place their proprietary data on a cloud server that THEY DONT CONTROL. In any case, I would argue that so long as you have a good local encryption scheme prior to sending out the data, then you should be OK. I would from an assurance perspective however look at where the servers are in locale. the reason being that if assurance of data is of prime concern, most larger IT setups like to have replicated data centers on opposing sides of the country / world etc... The reason for this is if an earthquake takes down the data center on one side of the country, it most likely will NOT take down the one on the other side of the country simultaneously. If the data centers for iCloud or whatever you can find are essentially in one locale, then you may consider syncing with one data center on the west coast, and choose a completely differing data center (in this case company) to sync with that is centered on the east coast.
This is all very high level, how you would implement this on an iPhone specifically we could also talk about, byt I hope this at least begins to help pave a path.
I'm planning on developing a multiplayer strategy game for the iOS platform. However, being a strategy game with multiple "units", there will likely be imbalances in gameplay, that will need to be addressed with constant mechanic-tweaking (upgrading / nerfing certain units).
The easiest way to accomplish this would simply be to change the mechanics within the app itself, and constantly submit updates to Apple. However, updates take time to propagate through Apple's review process (so the changes wouldn't be instantaneous), and I would need to do checks to see if all the players in the game are running the same version of the game, force users to update to the latest version of the game, etc.
What I'm thinking of doing instead is something similar to what the game Uniwar does. Every time the game is launched, it appears to check if there are any gameplay tweaks available, and if there are, it downloads the updates (and shows an update message to the user detailing what has changed).
However, as a relatively new programmer, I don't really know what would be the easiest way of accomplishing this. Would I host a text file online containing the unit statistics, and get the game to check that file for changes? Or is there some better, more efficient way? And if I were to do it this way, how would I do it?
First, ensure that your rules are some sort of resource you can easily change (be it binary or text-based). The most convenient way of updating these would be to periodically poll a server, most conveniently using the HTTP protocol, fetching updates as needed. The way I see it, there are two ways of doing this.
The first method uses the excellent caching abilities of the HTTP protocol, and as such requires a server (and client library) that understands these. The basic idea would be to have a copy of the latest version of the game mechanics published on a server (say, to http://example.org/mechanics.gz, and then have the client issue conditional GET requests with the If-modified-since header set to the time the last update check was performed. The HTTP protcol will den effectively do the rest for you, issuing a 304 Not modified if there is no update, and sending the new mechanics if there is one. This method has the disadvantage that the whole mechanics file has to be downloaded on every update (no diffs can be used), and that old versions won't be available, but it has an appealing simplicity.
The second method would consist of having a list of updates (well, their URI, ID and release date), say http://example.org/updates.xml, which the client pulls on every poll. The client then checks if there are any updates it doesn't have, and downloads and applies these in chronological order. Using this method, old updates can be made available (and will have permanent links), and diffs may be used. This is useful if history is important or if game mechanic files are large.
The format of the file doesn't really matter -- use whatever works for you. The key to being able to tune the gameplay, I think, is to turn the rules of the game into objects that you can configure. If the rules are all objects, you can do things like changing the order in which they're applied, the weight given to each one, the conditions under which a rule becomes effective, etc. You might have an object that's responsible for managing and properly applying the rules, basically a "rule model." Once you have that, all you need to do is to implement NSCoding in your rule and rule model classes and you can easily write and read rule configurations.
I'm wondering how you'd implement the following use-case in REST. Is it even possible to do without compromising the conceptual model?
Read or update multiple resources within the scope of a single transaction. For example, transfer $100 from Bob's bank account into John's account.
As far as I can tell, the only way to implement this is by cheating. You could POST to the resource associated with either John or Bob and carry out the entire operation using a single transaction. As far as I'm concerned this breaks the REST architecture because you're essentially tunneling an RPC call through POST instead of really operating on individual resources.
Consider a RESTful shopping basket scenario. The shopping basket is conceptually your transaction wrapper. In the same way that you can add multiple items to a shopping basket and then submit that basket to process the order, you can add Bob's account entry to the transaction wrapper and then Bill's account entry to the wrapper. When all the pieces are in place then you can POST/PUT the transaction wrapper with all the component pieces.
There are a few important cases that aren't answered by this question, which I think is too bad, because it has a high ranking on Google for the search terms :-)
Specifically, a nice propertly would be: If you POST twice (because some cache hiccupped in the intermediate) you should not transfer the amount twice.
To get to this, you create a transaction as an object. This could contain all the data you know already, and put the transaction in a pending state.
POST /transfer/txn
{"source":"john's account", "destination":"bob's account", "amount":10}
{"id":"/transfer/txn/12345", "state":"pending", "source":...}
Once you have this transaction, you can commit it, something like:
PUT /transfer/txn/12345
{"id":"/transfer/txn/12345", "state":"committed", ...}
{"id":"/transfer/txn/12345", "state":"committed", ...}
Note that multiple puts don't matter at this point; even a GET on the txn would return the current state. Specifically, the second PUT would detect that the first was already in the appropriate state, and just return it -- or, if you try to put it into the "rolledback" state after it's already in "committed" state, you would get an error, and the actual committed transaction back.
As long as you talk to a single database, or a database with an integrated transaction monitor, this mechanism will actually work just fine. You might additionally introduce time-outs for transactions, which you could even express using Expires headers if you wanted to.
In REST terms, resources are nouns that can be acted on with CRUD (create/read/update/delete) verbs. Since there is no "transfer money" verb, we need to define a "transaction" resource that can be acted upon with CRUD. Here's an example in HTTP+POX. First step is to CREATE (HTTP POST method) a new empty transaction:
POST /transaction
This returns a transaction ID, e.g. "1234" and according URL "/transaction/1234". Note that firing this POST multiple times will not create the same transaction with multiple IDs and also avoids introduction of a "pending" state. Also, POST can't always be idempotent (a REST requirement), so it's generally good practice to minimize data in POSTs.
You could leave the generation of a transaction ID up to the client. In this case, you would POST /transaction/1234 to create transaction "1234" and the server would return an error if it already existed. In the error response, the server could return a currently unused ID with an appropriate URL. It's not a good idea to query the server for a new ID with a GET method, since GET should never alter server state, and creating/reserving a new ID would alter server state.
Next up, we UPDATE (PUT HTTP method) the transaction with all data, implicitly committing it:
PUT /transaction/1234
<transaction>
<from>/account/john</from>
<to>/account/bob</to>
<amount>100</amount>
</transaction>
If a transaction with ID "1234" has been PUT before, the server gives an error response, otherwise an OK response and a URL to view the completed transaction.
NB: in /account/john , "john" should really be John's unique account number.
Great question, REST is mostly explained with database-like examples, where something is stored, updated, retrieved, deleted. There are few examples like this one, where the server is supposed to process the data in some way. I don't think Roy Fielding included any in his thesis, which was based on http after all.
But he does talk about "representational state transfer" as a state machine, with links moving to the next state. In this way, the documents (the representations) keep track of the client state, instead of the server having to do it. In this way, there is no client state, only state in terms of which link you are on.
I've been thinking about this, and it seems to me reasonable that to get the server to process something for you, when you upload, the server would automatically create related resources, and give you the links to them (in fact, it wouldn't need to automatically create them: it could just tell you the links, and it only create them when and if you follow them - lazy creation). And to also give you links to create new related resources - a related resource has the same URI but is longer (adds a suffix). For example:
You upload (POST) the representation of the concept of a transaction with all the information. This looks just like a RPC call, but it's really creating the "proposed transaction resource". e.g URI: /transaction
Glitches will cause multiple such resources to be created, each with a different URI.
The server's response states the created resource's URI, its representation - this includes the link (URI) to create the related resource of a new "committed transaction resource". Other related resources are the link to delete the proposed transaction. These are states in the state-machine, which the client can follow. Logically, these are part of the resource that has been created on the server, beyond the information the client supplied. e.g URIs: /transaction/1234/proposed, /transaction/1234/committed
You POST to the link to create the "committed transaction resource", which creates that resource, changing the state of the server (the balances of the two accounts)**. By its nature, this resource can only be created once, and can't be updated. Therefore, glitches committing many transactions can't occur.
You can GET those two resources, to see what their state is. Assuming that a POST can change other resources, the proposal would now be flagged as "committed" (or perhaps, not available at all).
This is similar to how webpages operate, with the final webpage saying "are you sure you want to do this?" That final webpage is itself a representation of the state of the transaction, which includes a link to go to the next state. Not just financial transactions; also (eg) preview then commit on wikipedia. I guess the distinction in REST is that each stage in the sequence of states has an explicit name (its URI).
In real-life transactions/sales, there are often different physical documents for different stages of a transaction (proposal, purchase order, receipt etc). Even more for buying a house, with settlement etc.
OTOH This feels like playing with semantics to me; I'm uncomfortable with the nominalization of converting verbs into nouns to make it RESTful, "because it uses nouns (URIs) instead of verbs (RPC calls)". i.e. the noun "committed transaction resource" instead of the verb "commit this transaction". I guess one advantage of nominalization is you can refer to the resource by name, instead of needing to specify it in some other way (such as maintaining session state, so you know what "this" transaction is...)
But the important question is: What are the benefits of this approach? i.e. In what way is this REST-style better than RPC-style? Is a technique that's great for webpages also helpful for processing information, beyond store/retrieve/update/delete? I think that the key benefit of REST is scalability; one aspect of that is not needing to maintain client state explicitly (but making it implicit in the URI of the resource, and the next states as links in its representation). In that sense it helps. Perhaps this helps in layering/pipelining too? OTOH only the one user will look at their specific transaction, so there's no advantage in caching it so others can read it, the big win for http.
I've drifted away from this topic for 10 years. Coming back, I can't believe the religion masquerading as science that you wade into when you google rest+reliable. The confusion is mythic.
I would divide this broad question into three:
Downstream services. Any web service you develop will have downstream services that you use, and whose transaction syntax you have no choice but to follow. You should try and hide all this from users of your service, and make sure all parts of your operation succeed or fail as a group, then return this result to your users.
Your services. Clients want unambiguous outcomes to web-service calls, and the usual REST pattern of making POST, PUT or DELETE requests directly on substantive resources strikes me as a poor, and easily improved, way of providing this certainty. If you care about reliability, you need to identify action requests. This id can be a guid created on the client, or a seed value from a relational DB on the server, it doesn't matter. For server generated ID's, use a 'preflight' request-response to exchange the id of the action. If this request fails or half succeeds, no problem, the client just repeats the request. Unused ids do no harm.This is important because it lets all subsequent requests be fully idempotent, in the sense that if they are repeated n times they return the same result and cause nothing further to happen. The server stores all responses against the action id, and if it sees the same request, it replays the same response. A fuller treatment of the pattern is in this google doc. The doc suggests an implementation that, I believe(!), broadly follows REST principals. Experts will surely tell me how it violates others. This pattern can be usefully employed for any unsafe call to your web-service, whether or not there are downstream transactions involved.
Integration of your service into "transactions" controlled by upstream services. In the context of web-services, full ACID transactions are considered as usually not worth the effort, but you can greatly help consumers of your service by providing cancel and/or confirm links in your confirmation response, and thus achieve transactions by compensation.
Your requirement is a fundamental one. Don't let people tell you your solution is not kosher. Judge their architectures in the light of how well, and how simply, they address your problem.
If you stand back to summarize the discussion here, it's pretty clear that REST is not appropriate for many APIs, particularly when the client-server interaction is inherently stateful, as it is with non-trivial transactions. Why jump through all the hoops suggested, for client and server both, in order to pedantically follow some principle that doesn't fit the problem? A better principle is to give the client the easiest, most natural, productive way to compose with the application.
In summary, if you're really doing a lot of transactions (types, not instances) in your application, you really shouldn't be creating a RESTful API.
You'd have to roll your own "transaction id" type of tx management. So it would be 4 calls:
http://service/transaction (some sort of tx request)
http://service/bankaccount/bob (give tx id)
http://service/bankaccount/john (give tx id)
http://service/transaction (request to commit)
You'd have to handle the storing of the actions in a DB (if load balanced) or in memory or such, then handling commit, rollback, timeout.
Not really a RESTful day in the park.
First of all transferring money is nothing that you can not do in a single resource call. The action you want to do is sending money. So you add a money transfer resource to the account of the sender.
POST: accounts/alice, new Transfer {target:"BOB", abmount:100, currency:"CHF"}.
Done. You do not need to know that this is a transaction that must be atomic etc. You just transfer money aka. send money from A to B.
But for the rare cases here a general solution:
If you want to do something very complex involving many resources in a defined context with a lot of restrictions that actually cross the what vs. why barrier (business vs. implementation knowledge) you need to transfer state. Since REST should be stateless you as a client need to transfer the state around.
If you transfer state you need to hide the information inside from the client. The client should not know internal information only needed by the implementation but does not carry information relevant in terms of business. If those information have no business value the state should be encrypted and a metaphor like token, pass or something need to be used.
This way one can pass internal state around and using encryption and signing the system can be still be secure and sound. Finding the right abstraction for the client why he passes around state information is something that is up to the design and architecture.
The real solution:
Remember REST is talking HTTP and HTTP comes with the concept of using cookies. Those cookies are often forgotten when people talk about REST API and workflows and interactions spanning multiple resources or requests.
Remember what is written in the Wikipedia about HTTP cookies:
Cookies were designed to be a reliable mechanism for websites to remember stateful information (such as items in a shopping cart) or to record the user's browsing activity (including clicking particular buttons, logging in, or recording which pages were visited by the user as far back as months or years ago).
So basically if you need to pass on state, use a cookie. It is designed for exactly the very same reason, it is HTTP and therefore it is compatible to REST by design :).
The better solution:
If you talk about a client performing a workflow involving multiple requests you usually talk about protocol. Every form of protocol comes with a set of preconditions for each potential step like perform step A before you can do B.
This is natural but exposing protocol to clients makes everything more complex. In order to avoid it just think what we do when we have to do complex interactions and things in the real world... . We use an Agent.
Using the Agent metaphor you can provide a resource that can perform all necessary steps for you and store the actual assignment / instructions it is acting upon in its list (so we can use POST on the agent or an 'agency').
A complex example:
Buying a house:
You need to prove your credibility (like providing your police record entries), you need to ensure financial details, you need to buy the actual house using a lawyer and a trusted third party storing the funds, verify that the house now belongs to you and add the buying stuff to your tax records etc. (just as an example, some steps may be wrong or whatever).
These steps might take several days to be completed, some can be done in parallel etc.
In order to do this, you just give the agent the task buy house like:
POST: agency.com/ { task: "buy house", target:"link:toHouse", credibilities:"IamMe"}.
Done. The agency sends you back a reference to you that you can use to see and track the status of this job and the rest is done automatically by the agents of the agency.
Think about a bug tracker for instance. Basically you report the bug and can use the bug id to check whats going on. You can even use a service to listen to changes of this resource. Mission Done.
You must not use server side transactions in REST.
One of the REST contraints:
Stateless
The client–server communication is further constrained by no client context being stored on the server between requests. Each request from any client contains all of the information necessary to service the request, and any session state is held in the client.
The only RESTful way is to create a transaction redo log and put it into the client state. With the requests the client sends the redo log and the server redoes the transaction and
rolls the transaction back but provides a new transaction redo log (one step further)
or finally complete the transaction.
But maybe it's simpler to use a server session based technology which supports server side transactions.
I think that in this case it is totally acceptable to break the pure theory of REST in this situation. In any case, I don't think there is anything actually in REST that says you can't touch dependent objects in business cases that require it.
I really think it's not worth the extra hoops you would jump through to create a custom transaction manager, when you could just leverage the database to do it.
In the simple case (without distributed resources), you could consider the transaction as a resource, where the act of creating it attains the end objective.
So, to transfer between <url-base>/account/a and <url-base>/account/b, you could post the following to <url-base>/transfer.
<transfer>
<from><url-base>/account/a</from>
<to><url-base>/account/b</to>
<amount>50</amount>
</transfer>
This would create a new transfer resource and return the new url of the transfer - for example <url-base>/transfer/256.
At the moment of successful post, then, the 'real' transaction is carried out on the server, and the amount removed from one account and added to another.
This, however, doesn't cover a distributed transaction (if, say 'a' is held at one bank behind one service, and 'b' is held at another bank behind another service) - other than to say "try to phrase all operations in ways that don't require distributed transactions".
I believe that would be the case of using a unique identifier generated on the client to ensure that the connection hiccup not imply in an duplicity saved by the API.
I think using a client generated GUID field along with the transfer object and ensuring that the same GUID was not reinserted again would be a simpler solution to the bank transfer matter.
Do not know about more complex scenarios, such as multiple airline ticket booking or micro architectures.
I found a paper about the subject, relating the experiences of dealing with the transaction atomicity in RESTful services.
I guess you could include the TAN in the URL/resource:
PUT /transaction to get the ID (e.g. "1")
[PUT, GET, POST, whatever] /1/account/bob
[PUT, GET, POST, whatever] /1/account/bill
DELETE /transaction with ID 1
Just an idea.