Mongoose versioning: when is it safe to disable it? - mongodb

From the docs:
The versionKey is a property set on each document when first created
by Mongoose. This keys value contains the internal revision of the
document. The name of this document property is configurable. The
default is __v. If this conflicts with your application you can
configure as such:
[...]
Document versioning can also be disabled by setting the versionKey to
false. DO NOT disable versioning unless you know what you are doing.
But I'm curious, in which cases it should be safe to disable this feature?

The version key purpose is optimistic locking.
When enabled, the version value is atomically incremented whenever a document is updated.
This allow your application code to test if changes have been made between a fetch (bringing in version key 42 for example) and a consequent update (ensuring version value still is 42).
If version key has a different value (eg. 43 because an update has been made to the document), your application code can handle the concurrent modification.
The very same concept is often used in relational databases instead of pessimistic locking that can bring horrible performance. All decent ORMs provide such a feature. For example it's nicely described in ObjectDB documentation. It's an object database implemented in Java but the same concept applies.
The blog post linked in Behlül's comment demonstrate the optimistic locking usefulness with a concrete example, but only for arrays changes, see below.
On the opposite, here is a simple case where its useless: a user profile that can be edited by its owner ownly. Here you can get rid of optimistic locking and assume that the last edit always wins.
So, only you know if your application need optimistic locking or not. Use case by use case.
The Mongoose situation is somewhat special.
Optimistic locking is enabled only for arrays because the internal storage format uses positional index. This is the issue described by the blog post linked in the question's comment. I found the explanation given in the mongoose-orm mailing list pretty clear: if you need optimistic locking for other fields, you need to handle it yourself.
Here is a gist showing how to implement a retry strategy for an add operation. Again, how you want to handle it depends on your use cases but it should be enough to get you started.
I hope this clears things up.
Cheers

Related

Check for object ownership with Prisma

I'm new to working with Prisma. One aspect that is unclear to me is the right way to check if a user has permission on an object. Let's assume we have Book and Author models. Every book has an author (one-to-many). Only authors have permission to delete books.
An easy way to enforce this would be this:
prismaClient.book.deleteMany({
id: bookId, <-- id is known
author: {
id: userId <-- id is known
}
})
But this way it's very hard to show an UnauthorizedError to the user. Instead, the response will be a 500 status code since we can't know the exact reason why the query failed.
The other approach would be to query the book first and check the author of the book instance, which would result in one more query.
Is there a best practice for this in Prisma?
Assuming you are using PostgreSQL, the best approach would be to use row-level-security(RLS) - but unfortunately, it is not yet officially supported by Prisma.
There is a discussion about this subject here
https://github.com/prisma/prisma/issues/5128
As for the current situation, to my opinion, it is better to use an additional query and provide the users with informative feedback rather than using the other method you suggested without knowing why it was not deleted.
Eventually, it is up to you to decide based on your use case - whether or not it is important for you to know the reason for failure.
So this question is more generic than prisma - it is also true when running updates/deletes in raw SQL.
When you have extra where clauses to check for ownership, it's difficult to infer which of the clause(s) caused that if the update does not happen, without further queries.
You can achieve this with row level security in postgres, but even that does not come out the box and involves custom configuration to throw specific exceptions when rows are not found due to row level security rules. See this answer for more detail.
I tend to think that doing customised stuff like this is rarely worth the tradeoff, unless you need specialised UX for an uncommon circumstance.
What I would suggest instead in this case is to keep it simple and just use extra queries to check for ownership, but optimise the UX optimistically for the case where the user does own the entity and keep that common and legitimate usecase to a single query.
That is, catch the exception from primsa (or the fact that the update returns 0 rows, or whatever it is in different cases), and only then run a specific select for ownership, to check if that was the reason the update failed.
This is a nice tradeoff because it keeps things simple, and only runs extra queries in the (usually) far less common failure case.
Even having said all that, the broader caveat as always is that probably the extra queries simply won't matter regardless! It's, in 99% of cases, probably best to just always run the extra ownership query upfront as a pattern to keep things as simple as possible, which is what you really want to optimise for over performance until you're running at significant scale.

Graphql Schema update rollback

We are moving some of our API's to graphql and would like to know to handle the rollback of the deployed package (Schema)and the best practice to the same.
To be more specific let's say we have a Schema S with 3 fields and then we added 4th field "A" . Now for some reason we cannot go forward with this package and field "A". So we have to perform roll back of the package so that now the Schema doesn't have field "A".
Now some consumer might ask for this field "A" and he might get an error. We could of course ask our clients to update but there is a time gap during which we might have failed request.
How do we handle this scenario,specifically an urgent rollback with in few hours or a day?
In general, you should avoid removing fields without warning to avoid the exact scenario you describe.
As your schema evolves, it's not uncommon to have some fields that are no longer needed. For example, rather than introducing a drastic change to a particular field (moving from a nullable to a non-nullable return type, adding required arguments, etc.), we may opt to add another field and encourage clients to transition to using that one instead. In such scenarios, we want to eventually remove the original field. The safest way to do so is to deprecate the field first. Using SDL, we can do so using a directive:
fieldA: String #deprecated(reason: "Use fieldB instead!")
After a certain amount of time, you can then remove the field entirely. How long you wait to remove the field depends on your team and the expectations you've communicated around handling deprecated fields. For example, you may find it helpful to set a deadline, by which point all clients are expected to have stopped using any deprecated fields. This works well as long as your client teams have the bandwidth to handle such technical debt.
A deprecated field's resolver can be changed to return a null value (if the field itself is nullable) or some minimal mock data. This prevents making unnecessary API or database calls, while still ensuring client requests don't result in an error.
In the context of your question, this means you should probably avoid rolling back to a previous release and instead follow the process outlined above for the fields you want to remove.
Alternatively, you could consider versioning. GraphQL generally shies away from the concept of versioning. As the official site explains:
Why do most APIs version? When there's limited control over the data that's returned from an API endpoint, any change can be considered a breaking change, and breaking changes require a new version. If adding new features to an API requires a new version, then a tradeoff emerges between releasing often and having many incremental versions versus the understandability and maintainability of the API.
In contrast, GraphQL only returns the data that's explicitly requested, so new capabilities can be added via new types and new fields on those types without creating a breaking change. This has led to a common practice of always avoiding breaking changes and serving a versionless API.
With that in mind, it's also feasible to still implement versioning with GraphQL by serving different schemas from different endpoints. While it's costly and usually unnecessary to go that route, it may be the right solution for you and your team, particularly if you expect to have to do similar rollbacks in the future.
You cannot do anything w.r.t GraphQL. Since you need to have the field present in the GraphQL type system. There may be libraries available, which will allow you to specify whether the field should be present in the query or not. But, there's no way of allowing non-existent field in the Query.
But what you can do is opt for a Blue-Green deployment strategy. In this strategy, you have both the versions running at the same time.
Let's say: Green is the version with Field A and Blue is the version without Field A. So when your clients are updated they start requesting the Blue version. And once all your clients are updated, shut-down the Green (with Field A).

EventStore basics - what's the difference between Event Meta Data/MetaData and Event Data?

I'm very much at the beginning of using / understanding EventStore or get-event-store as it may be known here.
I've consumed the documentation regarding clients, projections and subscriptions and feel ready to start using on some internal projects.
One thing I can't quite get past - is there a guide / set of recommendations to describe the difference between event metadata and data ? I'm aware of the notional differences; Event data is 'Core' to the domain, Meta data for describing, but it is becoming quite philisophical.
I wonder if there are hard rules regarding implementation (querying etc).
Any guidance at all gratefully received!
Shamelessly copying (and paraphrasing) parts from Szymon Kulec's blog post "Enriching your events with important metadata" (emphases mine):
But what information can be useful to store in the metadata, which info is worth to store despite the fact that it was not captured in
the creation of the model?
1. Audit data
who? – simply store the user id of the action invoker
when? – the timestamp of the action and the event(s)
why? – the serialized intent/action of the actor
2. Event versioning
The event sourcing deals with the effect of the actions. An action
executed on a state results in an action according to the current
implementation. Wait. The current implementation? Yes, the
implementation of your aggregate can change and it will either because
of bug fixing or introducing new features. Wouldn’t it be nice if
the version, like a commit id (SHA1 for gitters) or a semantic version
could be stored with the event as well? Imagine that you published a
broken version and your business sold 100 tickets before fixing a bug.
It’d be nice to be able which events were created on the basis of the
broken implementation. Having this knowledge you can easily compensate
transactions performed by the broken implementation.
3. Document implementation details
It’s quite common to introduce canary releases, feature toggling and
A/B tests for users. With automated deployment and small code
enhancement all of the mentioned approaches are feasible to have on a
project board. If you consider the toggles or different implementation
coexisting in the very same moment, storing the version only may be
not enough. How about adding information which features were applied
for the action? Just create a simple set of features enabled, or map
feature-status and add it to the event as well. Having this and the
command, it’s easy to repeat the process. Additionally, it’s easy to
result in your A/B experiments. Just run the scan for events with A
enabled and another for the B ones.
4. Optimized combination of 2. and 3.
If you think that this is too much, create a lookup for sets of
versions x features. It’s not that big and is repeatable across many
users, hence you can easily optimize storing the set elsewhere, under
a reference key. You can serialize this map and calculate SHA1, put
the values in a map (a table will do as well) and use identifiers to
put them in the event. There’s plenty of options to shift the load
either to the query (lookups) or to the storage (store everything as
named metadata).
Summing up
If you create an event sourced architecture, consider adding the
temporal dimension (version) and a bit of configuration to the
metadata. Once you have it, it’s much easier to reason about the
sources of your events and introduce tooling like compensation.
There’s no such thing like too much data, is there?
I will share my experiences with you which may help. I have been playing with akka-persistence, akka-persistence-eventstore and eventstore. akka-persistence stores it's event wrapper, a PersistentRepr, in binary format. I wanted this data in JSON so that I could:
use projections
make these events easily available to any other technologies
You can implement your own serialization for akka-persistence-eventstore to do this, but it still ended up just storing the wrapper which had my event embedded in a payload attribute. The other attributes were all akka-persistence specific. The author of akka-persistence-eventstore gave me some good advice, get the serializer to store the payload as the Data, and the rest as MetaData. That way my event is now just the business data, and the metadata aids the technology that put it there in the first place. My projections now don't need to parse out the metadata to get at the payload.

Meteor method vs. deny/allow rules

In Meteor, when should I prefer a method over a deny rule?
It seems to me that allow/deny rules should be favoured, as their goal is more explicit, and one knows where to look for them.
However, in the Discover Meteor book, preventing duplicate insertions (“duplicate” being defined as adding a document whose url property is already defined in some other document of the same collection) is said to have to be defined through a method (and left as an exercise to the reader, chapter 8.3).
I think I am able to implement this check in a way that I find much clearer:
Posts.deny({
update: function(userId, post, fieldNames, modifier) {
return Posts.findOne({ url: modifier.$set.url, _id: { $ne: post._id } });
}
});
(N.B. if you know the example, yes, I voluntarily left out the “only a subset of the attributes is modified” check from the question to be more specific.)
I understand that there are other update operators than $set in Mongo, but they look typed and I don't feel like leaving a security hole open.
So: are there any flaws in my deny rule? Independently, should I favour a method? What would I gain from it? What would I lose?
Normally I try to avoid subjective answers, but this is a really important debate. First I'd recommend reading Meteor Methods vs Client-Side Operations from the Discover Meteor blog. Note that at Edthena we exclusively use methods for reasons which should become evident.
Methods
pro
Methods can correctly enforce schema and validation rules of arbitrary complexity without the need of an outside library. Side note - check is an excellent tool for validating the structure of your inputs.
Each method is a single source of truth in your application. If you create a 'posts.insert' method, you can easily ensure it is the only way in your app to insert posts.
con
Methods require an imperative style, and they tend to be verbose in relation to the number of validations required for an operation.
Client-side Operations
pro
allow/deny has a simple declarative style.
con
Validating schema and permissions on an update operation is infinitely hard. If you need to enforce a schema you'll need to use an outside library like collection2. This reason alone should give you pause.
Modifications can be spread all over your application. Therefore, it may be tricky to identify why a particular database operation happened.
Summary
In my opinion, allow/deny is more aesthetically pleasing, however it's fundamental weakness is in enforcing permissions (particularly on updates). I would recommend client-side operations in cases where:
Your codebase is relatively small - so it's easy to grep for all instances where a particular modifier occurs.
You don't have many developers - so you don't need to all agree that there is one and only one way to insert into X collection.
You have simple permission rules - e.g. only the owner of a document can modify any aspect of it.
In my opinion, using client-side operations is a reasonable choice when building an MVP, but I'd switch to methods for all other situations.
update 2/22/15
Sashko Stubailo created a proposal to replace allow/deny with insert/update/remove methods.
update 6/1/16
The meteor guide takes the position that allow/deny should always be avoided.

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access said item (without consistent read) & finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is, once I can get an item & am not changing it anywhere else can I assume it has an ad-hoc fixed schema and access all my expected attributes without checking they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.
Although Amazon makes no such guarantees in the documentation the current implementation of their eventual consistency guarantees that you'll see all the properties stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the
current implementation treats each Put request as a bundle. It won't
split the request up and apply the operations piecemeal. You'll get
either step-1 responses or step-2 responses until eventual consistency
shakes out and leaves you with step-2 responses.
While this is undocumented behavior I suspect quite a few SimpleDB users are relying on it now and as such Amazon won't be likely to change it anytime soon, but that's ju my guess.