How do I implement revisions with neo4j? - version-control

I have an object and I need to keep a history of all changes made to it. How would I implement this using neo4j?

As with an RDBMS, it would depend on your domain and data query requirements.
Does your application require regular access to all versions of the object, or usually just to the most recent, with the older versions available via the current one? An example of this could be pages on Wikipedia. As an example, let's say we have a page which is on version 3. We could then model this as follows:
(pages)-[:PAGE]->(V3)-[:PREV]->(V2)-[:PREV]->(V1)
   ^              ^
   |              |
category       current
  node     version of page
Here, only the current version can be seen to form part of the main structure, but you may wish to allow all versions to form part of that structure. In this case, you could use relationship properties to indicate the version and have all page versions link from the category node:
  (V1)
   ^
   |
[:PAGE(v=1)]
   |
(pages)-[:PAGE(v=2)]->(V2)
   |
[:PAGE(v=3)]
   |
   v
  (V3)
Here, you can immediately traverse to a particular version of the page by simply specifying the version in which you are interested.
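As a rough illustration of that traversal, here is a minimal sketch using the official neo4j Python driver. The node labels, the category's name property and the connection details are my own assumptions; only the [:PAGE {v: ...}] pattern comes from the model above.

# A hedged sketch: fetch one specific version of a page via the relationship's
# version property. Labels (Category), 'name' and the credentials are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

def get_page_version(version):
    with driver.session() as session:
        record = session.run(
            "MATCH (c:Category {name: 'pages'})-[:PAGE {v: $v}]->(page) "
            "RETURN page",
            v=version,
        ).single()
        return record["page"] if record else None

print(get_page_version(2))  # jumps straight to V2 without walking a chain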
A third option could be that you wish all older versions to be completely separate from the main structure. For this you could use multiple category nodes, one for (current_pages) and another for (old_pages). As each page is superseded by a new version, it becomes unlinked from the former category and instead linked to the latter. This would form more of an "archive" type of system where the older versions could even be moved into a separate database instance.
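If you went down this route, the re-linking step could look something like the following sketch (again with the neo4j Python driver; the labels and the id property are assumptions, not part of the original answer).

# A hedged sketch of archiving a superseded page: unlink it from the
# (current_pages) category and link it to (old_pages) instead, in one query.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    session.run(
        "MATCH (cur:Category {name: 'current_pages'})-[r:PAGE]->(p:Page {id: $id}) "
        "MATCH (old:Category {name: 'old_pages'}) "
        "DELETE r "
        "CREATE (old)-[:PAGE]->(p)",
        id=42,  # the page being superseded; the value is illustrative
    )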
So you have these three options, plus more that I haven't thought of! Neo4j allows you great flexibility with this sort of design and there's absolutely no "right" answer. If none of these inspire you, however, post a little more information about your domain so that the answer can be more tailored to your needs.
Cheers,
Nige

You could also approach it from the other side:
(pages)-[:VERSION]->(V1)-[:VERSION]->(V2)-[:VERSION]->(V3)
   ^                                                   ^
   |                                                   |
category                                            current
  node                                          version of page
Advantage: when you create a new version, you just add it at the end of the chain; there is no need to "insert" it between (pages) and the current version.
Disadvantage: you can't just throw away old versions unless you reconstruct the chain, but this is probably not a frequent operation.
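For what it's worth, appending a new version in this model could look roughly like the sketch below (neo4j Python driver; the labels, the content property and the connection details are my own assumptions).

# A hedged sketch: find the last version in the chain (the one with no
# outgoing :VERSION relationship) and hang the new version off it.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    session.run(
        "MATCH (c:Category {name: 'pages'})-[:VERSION*]->(last) "
        "WHERE NOT (last)-[:VERSION]->() "
        "CREATE (last)-[:VERSION]->(:Version {content: $content})",
        content="version 4 of the page",
    )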

Related

GraphQL Schema update rollback

We are moving some of our APIs to GraphQL and would like to know how to handle the rollback of a deployed package (schema), and the best practice for doing so.
To be more specific, let's say we have a schema S with 3 fields, and then we add a 4th field "A". Now for some reason we cannot go forward with this package and field "A", so we have to roll back the package so that the schema no longer has field "A".
Now some consumers might ask for this field "A" and get an error. We could of course ask our clients to update, but there is a time gap during which we might have failed requests.
How do we handle this scenario, specifically an urgent rollback within a few hours or a day?
In general, you should avoid removing fields without warning, precisely to prevent the scenario you describe.
As your schema evolves, it's not uncommon to have some fields that are no longer needed. For example, rather than introducing a drastic change to a particular field (moving from a nullable to a non-nullable return type, adding required arguments, etc.), we may opt to add another field and encourage clients to transition to using that one instead. In such scenarios, we want to eventually remove the original field. The safest way to do so is to deprecate the field first. Using SDL, we can do so using a directive:
fieldA: String @deprecated(reason: "Use fieldB instead!")
After a certain amount of time, you can then remove the field entirely. How long you wait to remove the field depends on your team and the expectations you've communicated around handling deprecated fields. For example, you may find it helpful to set a deadline, by which point all clients are expected to have stopped using any deprecated fields. This works well as long as your client teams have the bandwidth to handle such technical debt.
A deprecated field's resolver can be changed to return a null value (if the field itself is nullable) or some minimal mock data. This prevents making unnecessary API or database calls, while still ensuring client requests don't result in an error.
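For what it's worth, here is a minimal sketch of that approach using the graphene Python library (the library choice, the field names and the whole schema are assumptions for illustration; the answer above only shows SDL).

# A hedged sketch: fieldA is marked deprecated and its resolver returns null
# instead of calling an API or database, while fieldB serves the real data.
import graphene

class Query(graphene.ObjectType):
    field_a = graphene.String(deprecation_reason="Use fieldB instead!")
    field_b = graphene.String()

    def resolve_field_a(root, info):
        return None  # no backend call for the deprecated, nullable field

    def resolve_field_b(root, info):
        return "value from the replacement field"

schema = graphene.Schema(query=Query)
print(schema)  # printing the schema should show the @deprecated directive on fieldA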
In the context of your question, this means you should probably avoid rolling back to a previous release and instead follow the process outlined above for the fields you want to remove.
Alternatively, you could consider versioning. GraphQL generally shies away from the concept of versioning. As the official site explains:
Why do most APIs version? When there's limited control over the data that's returned from an API endpoint, any change can be considered a breaking change, and breaking changes require a new version. If adding new features to an API requires a new version, then a tradeoff emerges between releasing often and having many incremental versions versus the understandability and maintainability of the API.
In contrast, GraphQL only returns the data that's explicitly requested, so new capabilities can be added via new types and new fields on those types without creating a breaking change. This has led to a common practice of always avoiding breaking changes and serving a versionless API.
With that in mind, it's also feasible to still implement versioning with GraphQL by serving different schemas from different endpoints. While it's costly and usually unnecessary to go that route, it may be the right solution for you and your team, particularly if you expect to have to do similar rollbacks in the future.
You cannot do much with GraphQL itself here, since the field needs to be present in the GraphQL type system. There may be libraries available that allow you to specify whether a field should be present in the query or not, but there is no way of allowing a non-existent field in the query.
What you can do is opt for a blue-green deployment strategy, in which you have both versions running at the same time.
Let's say green is the version with field "A" and blue is the version without it. As your clients are updated, they start requesting the blue version, and once all of them have been updated, you shut down green (the one with field "A").

Handling multiple document versions in a Mongo Collection

{Hello. I'm not completely satisfied with the title, Mods please help amend it if necessary}
We are trying to come up with ways to implement MongoDB on the back end in our project. We have to address a couple of concerns that were raised, as below. Some input from experts in the field would be really helpful.
Removing / adding entirely new fields in the document, given the early development changes --> How best can this be accommodated?
As an example, suppose my collection contains about 1000 records and there is a field that contains 'Address' data. Owing to operational changes, we need to add to (or replace) the 'Address' field with an array of 'Street', 'POBOX', etc. and populate them with a certain default value. How best can this be accommodated?
A specific scenario wherein not all of the devices we run would be updated to the latest version. This means that the new "fields" added in the DB would essentially be irrelevant to the devices running the older version of the app. --> How best can this scenario be dealt with?
As an example, let us assume that some devices run an earlier version of the app which only looks for 'Address' as a field. Devices that are updated to the latest app will need to refer to the 'Street' and 'POBox' fields instead of the address. How best can this be handled from Mongo's perspective?
In simple words:
As your development progresses, the document shape can be changed as necessary.
Depending on the type of change, the structure update can be done with a single update statement, or sometimes you will need to use the aggregation framework and save the results into a new collection.
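As an illustration of the single-update-statement case for the 'Address' example from the question, something like the following pymongo sketch could work; the collection name, the new field names and the default values are assumptions.

# A hedged sketch: add structured address sub-fields with default values to
# every document that still only has the flat 'Address' field.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["customers"]

result = coll.update_many(
    {"Address": {"$exists": True}, "AddressDetails": {"$exists": False}},
    {"$set": {"AddressDetails": {"Street": "UNKNOWN", "POBox": "UNKNOWN"}}},
)
print(result.modified_count, "documents migrated")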
For backward compatibility with the app versions on devices in use, a document can contain both versions of the fields, so an older version of the application can still be used with the newer schema.
This can lead to another problem: what to do when the document is updated? In that case the app could be set up to read from the newer schema, but not write to it (if possible).
If there is a possibility to use a web API to communicate between the app and Mongo, you can do all of the migration on the fly, as the API will be aware of the changes.
Any comments welcome!

Is it safe to use fundamental type OIDs defined in the catalog header in client code?

This QA entry shows how to get OID codes from the catalog header.
It might be the simplest way to get OID numbers. However, the header file itself is explicitly separated from the client-side headers, which seems to imply it is not meant to be used on the client side.
Is it safe to use these server-side constants on the client side? Predictably, it will cause some legacy issues: an older version of the server may lack a specific OID, so I ask excluding that case. I mean, can I assume that *once an OID is defined for a fundamental type, it stays the same in all future versions*?
Update
I mean only fundamental types, such as TEXTOID, INT8OID or TIMESTAMPOID. No custom, composite, user-defined or any other non-fundamental types.
Currently, this is the best I could find. I would go with hardcoding OIDs.
Cited from Merlin Moncure's reply in the mailing list entry:
built in type oids are defined in pg_type.h:
cat src/include/catalog/pg_type.h | grep OID | grep define
built in type oids don't change. you can pretty much copy/paste the output of the above into an app... just watch out for some types that may not be in older versions.
user defined type oids (tables, views, composite types, enums, and domains) have an oid generated when the object is created. since that oid can change via ddl, you should look it up by name at appropriate times.
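To make the "copy/paste into an app" idea concrete, here is a hedged sketch in Python with psycopg2 (the library choice and the query are my assumptions, and it presumes psycopg2 >= 2.8 where description entries are named tuples; the OID values are the long-standing built-in ones from pg_type.h).

# A hedged sketch: hard-code a few fundamental type OIDs and compare them
# against the type_code that psycopg2 reports for each result column.
import psycopg2

# Fundamental type OIDs copied from src/include/catalog/pg_type.h
BOOLOID = 16
INT8OID = 20
INT4OID = 23
TEXTOID = 25
TIMESTAMPOID = 1114

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
cur.execute("SELECT 1::int4, 'hello'::text, now()::timestamp")

for col in cur.description:
    if col.type_code == TEXTOID:
        print(col.name, "is a text column")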
Is it safe to use these server-side constants on the client side? (...) I mean, can I assume that once an OID is defined, it stays the same in all future versions?
I honestly doubt it's safe. The oids will likely be different depending on the Postgres version that was initially installed when new types etc. were introduced. Older installs that get upgraded may or may not have the same oids as fresh installs.
For illustration purposes, picture yourself creating an admin user with ID 1 in an app when it gets installed, and hard-coding everything in C by defining ADMIN_USER as that ID. Your customers then add new users, etc. In a subsequent release, you add a quasi-admin user with ID 2 and proceed to hard-code everything around that too. Customers upgrade... and, well, it blows up in your face because on their end, the quasi-admin user can have any ID...
When you use hard-coded oids in Postgres, the same kind of thing may happen. In one version, the built-in types are created in a certain order; in the next, they may be created in another because e.g. Postgres adds a shiny new enum or int4range type. And this doesn't begin to touch the topic of what may potentially occur during upgrades. (Admittedly, dumping and reloading data should yield sane things here, but I wouldn't take the chance myself.)

Forcing web API consumers to accept new fields in responses

I'm creating v2 of an existing RESTful web api.
The responses are JSON lists of objects, roughly in the form:
[
  {
    "name1": "value1",
    "name2": "value2"
  },
  {
    "name1": "value3",
    "name2": "value4"
  }
]
One problem we've observed with v1 is that some clients will access fields by integer position instead of by name. This means that if we decide to add fields to the response (which we had originally considered a compatibility-preserving change), then some of our clients' code breaks, unless we add the fields at the end. Even then, other clients' code breaks anyway, because they will fail in some way when they encounter an unexpected attribute name.
To counter this in v2, we are considering randomly reordering the fields in every response. This will force clients to index fields by name instead of by position.
Additionally, we are considering adding a randomly-named field to every response. This will force clients to ignore fields they don't recognize.
While this sounds somewhat barmy, it does have the advantage that we will be able to add new fields, safe in the knowledge that this isn't breaking any clients. This means we can issue compatible updates to v2.1, v2.3, etc at the same URL, and that means we will only have to maintain & support a smaller number of API versions.
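To make the idea concrete, here is a hedged Python sketch of what such a response "hardening" step might look like; the function name, field names and decoy value are invented for illustration.

# A hedged sketch of the two ideas above: shuffle key order on every response
# and add a randomly named decoy field, so clients cannot rely on position or
# on a fixed field set.
import json
import random
import string

def harden_response(records):
    hardened = []
    for record in records:
        items = list(record.items())
        random.shuffle(items)  # force lookup by name, not by position
        # Add a decoy field with a random name so clients must ignore unknowns.
        decoy_name = "x_" + "".join(random.choices(string.ascii_lowercase, k=8))
        items.append((decoy_name, "ignore-me"))
        hardened.append(dict(items))
    return hardened

payload = harden_response([{"name1": "value1", "name2": "value2"}])
print(json.dumps(payload))  # dicts keep the shuffled insertion order (Python 3.7+)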
The alternative is to issue compatibility-breaking v3, v4, at new URLs, which means that we will have to maintain & support many incompatible API versions, which will stretch us that little bit thinner.
Is this a bad idea, and if so, why? Are there any other similar ideas I should think about?
Update: The first couple of responses point out that if I document the issue (i.e. indicate in the docs that fields may be added or reordered) then I am no longer to blame if client code breaks when I subsequently add or reorder fields. Sadly, I don't think this is an appropriate option for us: many dozens of organisations rely on the functionality of our APIs for real-world transactions with substantial financial impact. These organisations are not technically oriented, and the resulting implementations at the client end cover the whole spectrum of technical proficiency.
We already did document that fields may get added or reordered in the docs for v1, and that clearly didn't work, because now we're having to issue v2: many clients, due to lack of time or experience or ability, still wrote code that breaks when we add new fields. If I were now to add fields to the interface, it would break a dozen different companies' interfaces to us, which means they (and we) are bleeding money every minute. If I were to refuse to revert the change or fix it, saying "They should have read the docs!", then I would soon be out of a job, and rightly so. We may attempt to educate the 'failing' partners, but this is doomed to fail as the problem gets larger every month as we continue to grow.
My question is: can I systemically head the whole issue off at the pass, preventing this situation from ever arising, no matter what clients try to do? If the techniques I suggest would work, why shouldn't I use them? Why isn't everyone else using them?
If you want your media types to be "evolvable", make that point very clear in the documentation. Similarly, if the placement order of fields is not guaranteed, make that explicitly clear too. If you supply sample code for your API, make sure it does not rely on field ordering.
However, even assuming that you have to maintain different versions of your media types, you don't have to version the URI. REST gives you the ability to maintain the same version-agnostic URI but use HTTP content negotiation (via the Accept and Content-Type headers) to offer different payloads at the same URI.
Therefore any client that doesn't explicitly wish to accept your new v2/v3/etc encoding won't get it. By default, you can return the old v1 encoding with the original field ordering and all of those brittle client apps will work fine. However, new client developers will know (thanks to your documentation) to indicate via Accept that they are willing and able to see the new fields and they don't care about their order. Best of all, you can (and should) use the same URI throughout. Remember - different payloads like this are just different representations of the same underlying resource, so the URI should be the same.
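For illustration, here is a hedged sketch of this kind of content negotiation using Flask; the framework choice, the vendor media type and the field names are all assumptions, not something prescribed by the answer.

# A hedged sketch: serve different representations of the same resource at one
# URI, selected via the Accept header. Clients that don't ask for v2 get v1.
from flask import Flask, jsonify, request

app = Flask(__name__)

V2_MEDIA_TYPE = "application/vnd.example.v2+json"

@app.route("/orders/<int:order_id>")
def get_order(order_id):
    record = {"name1": "value1", "name2": "value2"}
    accept = request.headers.get("Accept", "")
    if V2_MEDIA_TYPE in accept:
        # Newer representation: extra fields, ordering not guaranteed.
        record["name3"] = "value5"
        response = jsonify(record)
        response.headers["Content-Type"] = V2_MEDIA_TYPE
        return response
    # Default: the original v1 representation for existing, brittle clients.
    return jsonify(record)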
I've decided to run with the described techniques, to the max. I haven't heard any objections to them that hold any water for me. Brian's answer, about re-using the same URI for different API versions, is a solid and much-appreciated complementary idea (with upvote), but I can't award it 'best answer' because it doesn't get to the core of my original question.

Are Operational Transformation Frameworks only meant for text?

Looking at all the examples of Operational Transformation frameworks out there, they all seem to revolve around the transformation of changes to plain text documents. How would an OT framework be used for more complex objects?
I want to develop a real-time sticky-notes style app, where people can co-create sticky notes and change their position and text value. Would I be right in assuming that the position values wouldn't be transformed? (I mean, how would they be? You can't merge them, right?) However, I would want to use an OT framework to resolve conflicts with the post-its' value, correct?
I do not see any problem with using Operational Transformation to work with complex objects; what you need is to define what operations your OT system supports and how concurrency is resolved for them.
For instance, if you receive two sticky-note "move coordinates" operations from two different users based on the same client state, you need to make both states converge, probably by cancelling out the second operation.
This is exactly the same behaviour as with text: when two users generate two updates that delete text ranges which overlap completely (or partially), the second update processed must be transformed against the previous one, and the resulting operation will only effectively delete a portion of the original range (or be completely cancelled with a 'no-op').
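A hedged sketch of such a transform function for concurrent sticky-note "move" operations might look like this in Python; all names and the cancel-the-second-operation policy follow the idea above, and everything else is invented for illustration.

# Transform an incoming move against a concurrent, already-applied move so
# that all replicas converge on the same position.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Move:
    note_id: str
    x: int
    y: int

def transform(incoming: Move, applied: Move) -> Optional[Move]:
    """Transform `incoming` against a concurrent `applied` operation."""
    if incoming.note_id == applied.note_id:
        # Both users moved the same note from the same client state:
        # cancel the later one (a no-op) so both replicas keep `applied`.
        return None
    # Moves on different notes don't interact; pass through unchanged.
    return incoming

# Example: user B's move arrives after user A's concurrent move was applied.
op_a = Move("note-1", x=100, y=40)
op_b = Move("note-1", x=250, y=90)
print(transform(op_b, op_a))  # -> None (dropped as a no-op)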
You can take a look at this nice explanation of how Google Wave's Operational Transformation works and figure out from there how your own implementation should work.
See the following paper for an approach to using OT with trees if you want to go down that route:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74
However, in your particular case, I would use a separate plain-text OT document for each sticky note and use an existing library, e.g. Etherpad, to do the heavy lifting. The positions of the notes could then be broadcast on a last-committer-wins basis.
Operational Transformation is a general technique; it works for any data type. The point is that you need to define your transformation functions. Also, there are some atomic attributes that you cannot merge automatically (like position and background color); those will mostly be "last update wins", or the user resolves them manually when there is a conflict.
There are some nice libs and frameworks that provide OT for complex data already out there:
ShareJS: library for Node which provides operations on JSON objects
DerbyJS: framework for Node.js; it uses ShareJS for the OT stuff.
Open Coweb framework: Dojo Foundation project for cooperative web applications using OT