Handling multiple document version in Mongo Collection - mongodb

{Hello. I'm not completely satisfied with the title, Mods please help amend it if necessary}
We are trying to come up with ways to implement mongoDB on the back end in our project. We have to address a couple of concerns that were raised as below. Some input by experts in the field would be really helpful.
Remove / Add entirely new fields into the document given the early development changes --> How best can this be accommodated?
As an example to this, suppose my collection contains about 1000 records, and there is an that contains 'Address' data. owing to operational changes, we need to add to (or replace) the 'Address' field with an array of 'Street', 'POBOX' etc. and populate them with a certain default value, how best can this be accommodated?
Specific scenario wherein not all devices that we run would need to be updated to the latest version. This means that the new "fields" added in the DB would essentially be irrelevant to the devices running the older version of the app. --> How best can this scenario be dealt with?
As an example to this, let us assume that some devices run an earlier version of the app which only looks for 'Address' as a field. Certain devices that are updated to the latest app need will need to refer to the 'Street' and 'POBox' fields instead of the address. How best can this be handled from Mongo's perspective?

in simple words:
As you development will progress, document shape can be changed as necessary.
Depend of change type, structure update can be conducted by one update statement
or sometimes there will be a need to use aggregation framework and save results in new collection.
For backward compatibility with app version on device in use - document can contain both versions of fields, so older version of application can be used with newer schema.
This could leads to other problem, what to do if document will be updated? That means app could be setup to read from newer schema, but not write (if possible).
If there will be a possibility to use webApi to communicate with app and then with mongo - you can do all migration on the fly, as api will be aware of changes.....
Any comments welcome!

Related

Flutter firestore partial updates with type safety

I'd love to be able to make type-safe partial updates to firestore documents in flutter, but can't find a solution anywhere to do it.
We use freezed for our models, and then use withConverter in the firestore SDK to easily translate between firestore and our models... however this requires updating every field in the document for each update, causing us to run into problems with concurrent updates to documents. If two users update different fields of the same document concurrently, last-in wins... and offline mode exacerbates the issue. So I'd prefer to update only the fields that have changed.
Option 1
One solution is to forget about type safety and just use the update method.
Option 2
Another solution is to wrap these updates in transactions, but then we can't take advantage of offline mode.
Option 3
A third solution might be to write a library that apparently doesn't exist, which creates a firestore update map by comparing two objects.
Does anyone know of something better?

Graphql Schema update rollback

We are moving some of our API's to graphql and would like to know to handle the rollback of the deployed package (Schema)and the best practice to the same.
To be more specific let's say we have a Schema S with 3 fields and then we added 4th field "A" . Now for some reason we cannot go forward with this package and field "A". So we have to perform roll back of the package so that now the Schema doesn't have field "A".
Now some consumer might ask for this field "A" and he might get an error. We could of course ask our clients to update but there is a time gap during which we might have failed request.
How do we handle this scenario,specifically an urgent rollback with in few hours or a day?
In general, you should avoid removing fields without warning to avoid the exact scenario you describe.
As your schema evolves, it's not uncommon to have some fields that are no longer needed. For example, rather than introducing a drastic change to a particular field (moving from a nullable to a non-nullable return type, adding required arguments, etc.), we may opt to add another field and encourage clients to transition to using that one instead. In such scenarios, we want to eventually remove the original field. The safest way to do so is to deprecate the field first. Using SDL, we can do so using a directive:
fieldA: String #deprecated(reason: "Use fieldB instead!")
After a certain amount of time, you can then remove the field entirely. How long you wait to remove the field depends on your team and the expectations you've communicated around handling deprecated fields. For example, you may find it helpful to set a deadline, by which point all clients are expected to have stopped using any deprecated fields. This works well as long as your client teams have the bandwidth to handle such technical debt.
A deprecated field's resolver can be changed to return a null value (if the field itself is nullable) or some minimal mock data. This prevents making unnecessary API or database calls, while still ensuring client requests don't result in an error.
In the context of your question, this means you should probably avoid rolling back to a previous release and instead follow the process outlined above for the fields you want to remove.
Alternatively, you could consider versioning. GraphQL generally shies away from the concept of versioning. As the official site explains:
Why do most APIs version? When there's limited control over the data that's returned from an API endpoint, any change can be considered a breaking change, and breaking changes require a new version. If adding new features to an API requires a new version, then a tradeoff emerges between releasing often and having many incremental versions versus the understandability and maintainability of the API.
In contrast, GraphQL only returns the data that's explicitly requested, so new capabilities can be added via new types and new fields on those types without creating a breaking change. This has led to a common practice of always avoiding breaking changes and serving a versionless API.
With that in mind, it's also feasible to still implement versioning with GraphQL by serving different schemas from different endpoints. While it's costly and usually unnecessary to go that route, it may be the right solution for you and your team, particularly if you expect to have to do similar rollbacks in the future.
You cannot do anything w.r.t GraphQL. Since you need to have the field present in the GraphQL type system. There may be libraries available, which will allow you to specify whether the field should be present in the query or not. But, there's no way of allowing non-existent field in the Query.
But what you can do is opt for a Blue-Green deployment strategy. In this strategy, you have both the versions running at the same time.
Let's say: Green is the version with Field A and Blue is the version without Field A. So when your clients are updated they start requesting the Blue version. And once all your clients are updated, shut-down the Green (with Field A).

How to handle application death and other mid-operation faults with Mongo DB

Since Mongo doesn't have transactions that can be used to ensure that nothing is committed to the database unless its consistent (non corrupt) data, if my application dies between making a write to one document, and making a related write to another document, what techniques can I use to remove the corrupt data and/or recover in some way?
The greater idea behind NoSQL was to use a carefully modeled data structure for a specific problem, instead of hitting every problem with a hammer. That is also true for transactions, which should be referred to as 'short-lived transactions', because the typical RDBMS transaction hardly helps with 'real', long-lived transactions.
The kind of transaction supported by RDBMSs is often required only because the limited data model forces you to store the data across several tables, instead of using embedded arrays (think of the typical invoice / invoice items examples).
In MongoDB, try to use write-heavy, de-normalized data structures and keep data in a single document which improves read speed, data locality and ensures consistency. Such a data model is also easier to scale, because a single read only hits a single server, instead of having to collect data from multiple sources.
However, there are cases where the data must be read in a variety of contexts and de-normalization becomes unfeasible. In that case, you might want to take a look at Two-Phase Commits or choose a completely different concurrency approach, such as MVCC (in a sentence, that's what the likes of svn, git, etc. do). The latter, however, is hardly a drop-in replacement for RDBMs, but exposes a completely different kind of concurrency to a higher level of the application, if not the user.
Thinking about this myself, I want to identify some categories of affects:
Your operation has only one database save (saving data into one document)
Your operation has two database saves (updates, inserts, or deletions), A and B
They are independent
B is required for A to be valid
They are interdependent (A is required for B to be valid, and B is required for A to be valid)
Your operation has more than two database saves
I think this is a full list of the general possibilities. In case 1, you have no problem - one database save is atomic. In case 2.1, same thing, if they're independent, they might as well be two separate operations.
For case 2.2, if you do A first then B, at worst you will have some extra data (B data) that will take up space in your system, but otherwise be harmless. In case 2.3, you'll likely have some corrupt data in the event of a catastrophic failure. And case 3 is just a composition of case 2s.
Some examples for the different cases:
1.0. You change a car document's color to 'blue'
2.1. You change the car document's color to 'red' and the driver's hair color to 'red'
2.2. You create a new engine document and add its ID to the car document
2.3.a. You change your car's 'gasType' to 'diesel', which requires changing your engine to a 'diesel' type engine.
2.3.b. Another example of 2.3: You hitch car document A to another car document B, A getting the "towedBy" property set to B's ID, and B getting the "towing" property set to A's ID
3.0. I'll leave examples of this to your imagination
In many cases, its possible to turn a 2.3 scenario into a 2.2 scenario. In the 2.3.a example, the car document and engine are separate documents. Lets ignore the possibility of putting the engine inside the car document for this example. Its both invalid to have a diesel engine and non-diesel gas and to have a non-diesel engine and diesel gas. So they both have to change. But it may be valid to have no engine at all and have diesel gas. So you could add a step that makes the whole thing valid at all points. First, remove the engine, then replace the gas, then change the type of the engine, and lastly add the engine back onto the car.
If you will get corrupt data from a 2.3 scenario, you'll want a way to detect the corruption. In example 2.3.b, things might break if one document has the "towing" property, but the other document doesn't have a corresponding "towedBy" property. So this might be something to check after a catastrophic failure. Find all documents that have "towing" but the document with the id in that property doesn't have its "towedBy" set to the right ID. The choices there would be to delete the "towing" property or set the appropriate "towedBy" property. They both seem equally valid, but it might depend on your application.
In some situations, you might be able to find corrupt data like this, but you won't know what the data was before those things were set. In those cases, setting a default is probably better than nothing. Some types of corruption are better than others (particularly the kind that will cause errors in your application rather than simply incorrect display data).
If the above kind of code analysis or corruption repair becomes unfeasible, or if you want to avoid any data corruption at all, your last resort would be to take mnemosyn's suggestion and implement Two-Phase Commits, MVCC, or something similar that allows you to identify and roll back changes in an indeterminate state.

Mapping to legacy MongoDB store

I'm attempting to write up a Yesod app as a replacement for a Ruby JSON service that uses MongoDB on the backend and I'm running into some snags.
the sql=foobar syntax in the models file does not seem too affect which collection Persistent.MongoDB uses. How can I change that?
is there a way to easily configure mongodb (preferably through the yaml file) to be explicitly read only? I'd take more comfort deploying this knowing that there was no possible way the app could overwrite or damage production data.
Is there any way I can get Persistent.MongoDB to ignore fields it doesn't know about? This service only needs a fraction of the fields in the collection in question. In order to keep the code as simple as possible, I'd really like to just map to the fields I care about and have Yesod ignore everything else. Instead it complains that the fields don't match.
How does one go about defining instances for models, such as ToJSON. I'd like to customize how that JSON gets rendered but I get the following error:
Handler/ProductStat.hs:8:10:
Illegal instance declaration for ToJSON Product'
(All instance types must be of the form (T t1 ... tn)
where T is not a synonym.
Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration forToJSON Product'
1) seems that sql= is not hooked up to mongo. Since sql is already doing this it shouldn't be difficult for Mongo.
2) you can change the function that runs the queries
in persistent/persistent-mongoDB/Database/Persist there is a runPool function of PersistConfig. That gets used in yesod-defaults. We should probably change the loadConfig function to check a readOnly setting
3) I am ok with changing the reorder function to allow for ignoring, although in the future (if MongoDB returns everything in ordeR) that may have performance implications, so ideally you would list the ignored columns.
4) This shouldn't require changes to Persistent. Did you try turning on TypeSynonymInstances ?
I have several other Yesod/Persistent priorities to attend to before these changes- please roll up your sleeves and let me know what help you need making them. I can change 2 & 3 myself fairly soon if you are committed to testing them.

In salesforce.com can you have multivalued attributes?

I am developing a Novell Identity Manager driver for Salesforce.com, and am trying to understand the Salesforce.com platform better.
I have had really good success to date. I can read pretty much arbitrary object classes out of SFDC, and create eDirectory objects for them, and what not. This is all done and working nicely. (Publisher Channel). Once I got Query events mapped out, most everything started working in the Publisher Channel.
I am now working on sending events back to SFDC (Subscriber channel) when changes occur in eDirectory.
I am using the upsert() function in the SOAP API, and with Novell Identity Manager, you basically build the SOAP doc, and can see the results as you build it. (You can do it in XSLT or you can use the various allowed tokens to build the document in DirXML Script. I am using DirXML Script which has been working well so far.).
The upshot of that comment is that I can build the SOAP document, see it, to be sure I get it right. Which is usually different than the Java/C++ approach that the sample code usually provides. Much more visual this way.
There are several things about upsert() that I do not entirely understand. I know how to blank a value, should I get that sort of event. Inside the <urn:sObjects> node, add a node like (assuming you get your namespaces declared already):
<urn1:fieldsToNull>FieldName</urn1:fieldsToNull>
I know how to add a value (AttrValue) to the attribute (FieldName), add a node like:
<FieldName>AttrValue</FieldName>
All this works and is pretty straight forward.
The question I have is, can a value in SFDC be multi-valued? In eDirectory, a multi valued attribute being changed, can happen two ways:
All values can be removed, and the new set re-added.
The single value removed can be sent as that sort of event (remove-value) or many values can be removed in one operation.
Looking at SFDC, I only ever see Multi-picklist attributes that seem to be stored in a single entry : or ; delimited. Is there another kind of multi valued attribute managed differently in SFDC? And if so, how would one manipulate it via the SOAP API?
I still have to decide if I want to map those multi-picklists to a single string, or a multi valued attribute of strings. First way is easier, second way is more useful... Hmmm... Choices...
Some references:
I have been using the page Sample SOAP messages to understand what the docs should look like.
Apex Explorer is a kicking tool for browsing the database and testing queries. Much like DBVisualizer does for JDBC connected databases. This would have been so much harder without it!
SoapUi is also required, and a lovely tool!
As far as I know there's no multi-value field other than multi-select picklists (and they map to semicolon-separated string). Generally platform encourages you to create a proper relationship with another (possibly new, custom) table if you're in need of having multiple values associated to your data.
Only other "unusual" thing I can think of is how the OwnerId field on certain objects (Case, Lead, maybe something else) can be used to point to User or Queue record. Looks weird when you are used to foreign key relationships from traditional databases. But this is not identical with what you're asking as there will be only one value at a time.
Of course you might be surpised sometimes with values you'll see in the database depending on the viewing user's locale (stuff like System Administrator profile becoming Systeembeheerder in Dutch). But this will be still a single value, translated on the fly just before the query results are sent back to you.
When I had to perform SOAP integration with SFDC, I've always used WSDL files and most of the time was fine with Java code generated out of them with Apache Axis. Hand-crafting the SOAP message yourself seems... wow, hardcore a bit. Are you sure you prefer visualisation of XML over the creation of classes, exceptions and all this stuff ready for use with one of several out-of-the-box integration methods? If they'll ever change the WSDL I need just to regenerate the classes from it; whereas changes to your SOAP message creation library might be painful...