Query Couchbase for missing document where no index exists - nosql

I need to query a Couchbase collection for all documents that are missing a particular field.
e.g. SELECT * FROM Bucket01 WHERE Field IS MISSING;
However, I have some annoying limitations:
There is no index on the field.
There is no shared field between affected documents that does have an index.
I am not allowed to create any new indexes.
I do not know any DocKeys.
There is no primary index on the collection.
Can this be accomplished? If so, how?

You could do a one-off via Eventing:
function OnUpdate(doc, meta) {
  if (!("somefield" in doc)) {
    // The field is missing: log it, or write
    // the meta.id to another collection in KV.
    log("Missing somefield id", meta.id);
  }
}
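If you want to collect the matching IDs instead of just logging them, the same handler can write to a destination collection through an Eventing bucket binding. A minimal sketch, assuming a read-write binding with the alias dst_col has been set up for the function (the alias and the value written are illustrative):
function OnUpdate(doc, meta) {
  if (!("somefield" in doc)) {
    // Record the key of the offending document in the collection
    // exposed through the (assumed) dst_col binding alias.
    dst_col[meta.id] = { reason: "somefield missing" };
  }
}
Deploying the function with the "Everything" feed boundary (rather than "From now") makes it process documents that already exist, which is what you want for a one-off scan.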

Given the restrictions you've supplied, this sounds very nearly impossible.
The eventing option in Jon's answer is probably your best bet, but given the restrictions, I'm going to guess they won't allow that either.
Another option (again, probably going to be restricted) is to query the Analytics service. The Analytics service provides workload isolation, so your queries would not affect the normal operation of your production workload.
This would require adding at least one more node to the cluster with the Analytics service, setting up a data link, and then running your SQL++ against that Analytics dataset.
Of course, if this is a temporary, one-off need, then adding (and later removing) an Analytics node for a single query might be more trouble than it's worth. But since so many other paths have been closed off to you, it is an option.
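For illustration, on Couchbase 7.x the Analytics side could look roughly like this (pre-7 Analytics used CREATE DATASET instead; Bucket01 and Field come from the question, while the default scope/collection path is an assumption):
-- Map the operational collection into the Analytics service.
ALTER COLLECTION Bucket01._default._default ENABLE ANALYTICS;

-- Then run the same predicate, isolated from the query/index nodes.
SELECT META(d).id
FROM Bucket01._default._default AS d
WHERE d.Field IS MISSING;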

Another option (again, probably going to be restricted) is to use a Couchbase (map/reduce) View. These are deprecated in Couchbase 7, but still technically available.
The syntax would look similar to the eventing function:
function (doc, meta) {
  if (!("someField" in doc)) {
    emit(meta.id, [doc.foo, doc.bar, doc.baz]);
  }
}
This would emit the IDs of the documents that are missing the field, along with the foo, bar, and baz fields (if you need them).
I must stress that creating Views is generally not a good option when compared to SQL++, Eventing, K/V, etc. But if this is a one-off, temporary situation, then it might be okay.
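Once the view's design document is published, you'd read the results over the Views REST API or an SDK. A rough sketch with the Node.js SDK, assuming a design document named missing_field containing a view named missing (the names and credentials here are placeholders):
const couchbase = require("couchbase");

async function run() {
  const cluster = await couchbase.connect("couchbase://localhost", {
    username: "Administrator",
    password: "password",
  });
  const bucket = cluster.bucket("Bucket01");

  // Each row's id is the key of a document that is missing the field.
  const result = await bucket.viewQuery("missing_field", "missing");
  for (const row of result.rows) {
    console.log(row.id, row.value);
  }
}

run();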

Related

Check for object ownership with Prisma

I'm new to working with Prisma. One aspect that is unclear to me is the right way to check if a user has permission on an object. Let's assume we have Book and Author models. Every book has an author (one-to-many). Only authors have permission to delete books.
An easy way to enforce this would be:
prismaClient.book.deleteMany({
  where: {
    id: bookId,        // id is known
    author: {
      id: userId       // id is known
    }
  }
})
But this way it's very hard to show an UnauthorizedError to the user. Instead, the response will be a 500 status code since we can't know the exact reason why the query failed.
The other approach would be to query the book first and check the author of the book instance, which would result in one more query.
Is there a best practice for this in Prisma?
Assuming you are using PostgreSQL, the best approach would be to use row-level security (RLS) - but unfortunately, it is not yet officially supported by Prisma.
There is a discussion about this subject here
https://github.com/prisma/prisma/issues/5128
As for the current situation, in my opinion it is better to use an additional query and provide the users with informative feedback, rather than using the other method you suggested without knowing why the delete failed.
Eventually, it is up to you to decide based on your use case - whether or not it is important for you to know the reason for failure.
This question is more generic than Prisma - it also applies when running updates/deletes in raw SQL.
When you have extra where clauses to check for ownership and the update does not happen, it's difficult to infer which of the clauses caused the failure without further queries.
You can achieve this with row level security in postgres, but even that does not come out the box and involves custom configuration to throw specific exceptions when rows are not found due to row level security rules. See this answer for more detail.
I tend to think that doing customised stuff like this is rarely worth the tradeoff, unless you need specialised UX for an uncommon circumstance.
What I would suggest instead in this case is to keep it simple and just use extra queries to check for ownership, but optimise the UX optimistically for the case where the user does own the entity, and keep that common and legitimate use case to a single query.
That is, catch the exception from Prisma (or the fact that the update returns 0 rows, or whatever it is in different cases), and only then run a specific select for ownership, to check if that was the reason the update failed.
This is a nice tradeoff because it keeps things simple, and only runs extra queries in the (usually) far less common failure case.
Even having said all that, the broader caveat, as always, is that the extra queries probably won't matter regardless. In 99% of cases it's probably best to just always run the extra ownership query upfront as a pattern, to keep things as simple as possible - simplicity is what you really want to optimise for over performance until you're running at significant scale.
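A minimal sketch of that catch-then-check pattern, using the prismaClient, bookId and userId names from the question; the error classes are illustrative and would map to 404/403 in your HTTP layer:
// Illustrative error types for the HTTP layer to translate.
class NotFoundError extends Error {}
class ForbiddenError extends Error {}

async function deleteBook(bookId, userId) {
  // Happy path: one query that deletes the book only if the caller owns it.
  const { count } = await prismaClient.book.deleteMany({
    where: { id: bookId, author: { id: userId } },
  });
  if (count > 0) return;

  // Nothing was deleted - only now run the extra query to find out why,
  // so the client gets a meaningful status code instead of a 500.
  const book = await prismaClient.book.findUnique({ where: { id: bookId } });
  if (!book) throw new NotFoundError("book does not exist");
  throw new ForbiddenError("user is not the book's author");
}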

How should I design a query that retrieves the latest data from a resource I want to filter, in a RESTful way?

What should the query look like when I want to retrieve the last measurements from installations that aren't removed?
Something like that?
/my-web-service/installations/measurements/last?removed=false
The thing is, I don't want to retrieve last measurements that weren't removed from installations. I want to retrieve last measurements from installations that weren't removed.
I see a couple possibilities here:
If you need to read the data from the endpoint transactionally, the way you designed it is the way to go. What I'd change is the name of the param from removed to installationRemoved, since it's more descriptive, and I'd shorten the endpoint to /my-web-service/measurements/ - with installations in the path it's unclear in which scope the client operates. Also, don't you need a since param to filter the last measurements?
If there's a chance to split this into two endpoints, I'd add:
/my-web-service/installations/?removed=false
/my-web-service/measurements/?since=timestamp&installations=<array>
It does not make the API better or worse as such, but it is easier and more predictable for the users.
In general, try to add more general endpoints with filtering options rather than highly dedicated ones that each do one particular thing; the dedicated approach leads to a loose, hard-to-use API.
And a final note: your API is good if your clients use it not because they have to, but because they like it ;)
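As a sketch of how a client would compose the split endpoints suggested above (the comma-separated installations parameter and the id field are assumptions, not a fixed contract):
async function loadLatestMeasurements(lastSyncTimestamp) {
  // 1. Find the installations that are still in place.
  const installations = await fetch(
    "/my-web-service/installations/?removed=false"
  ).then((res) => res.json());

  // 2. Ask for the latest measurements for exactly those installations.
  const ids = installations.map((i) => i.id).join(",");
  return fetch(
    `/my-web-service/measurements/?since=${lastSyncTimestamp}&installations=${ids}`
  ).then((res) => res.json());
}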
According to this best practices article, you could use "aliases for common queries":
To make the API experience more pleasant for the average consumer, consider packaging up sets of conditions into easily accessible RESTful paths. For example, the recently closed tickets query above could be packaged up as GET /tickets/recently_closed
So, in your case, it could be:
/my-web-service/installations/non_removed/measurements/last
where non_removed would be an alias for querying installations that weren't removed.
Hope it helps!

Is it RESTful to DELETE collections?

Some say it's "often not desirable" for a REST server to allow the deletion of an entire collection of entities.
DELETE http://www.example.com/customers
Is this a real rule for achieving RESTful nirvana?
And what about sub-collections, defined by query parameters?
DELETE http://www.example.com/customers?gender=m
The answer to this depends more on the requirements and risks of your application than on the inherent RESTfulness of either construct.
It's "not often desirable" to delete an entire collection if you imagine the collection as something with enduring importance like a customer list. It doesn't break with some essential REST wisdom.
If the collection contains information that a user should be able to delete, and potentially a lot of such information, DELETE of the entire collection can be the nicest REST-ish way to go, rather than run a lot of individual DELETEs.
Deleting based on criteria (e.g. the query parameter) is so essential to some applications that if the REST police declared it Officially UnRESTful I would continue to do it without shame.
(They actually say "not often desirable," which one might interpret slightly differently than "often not desirable.")
Yes, it's RESTful. If you have a valid use case, it's fine to do it. Your second scenario (deleting with a query) is frequently useful, and can be an easy way to reduce the number of HTTP requests the client has to make.
Edit: as #peeskillet says, do consider if you actually want to delete something, versus change some flag on the record (e.g. "active").
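To make the second scenario concrete, here is a rough Express sketch of a filtered collection DELETE; the route, the in-memory store, and the refuse-if-unfiltered policy are all assumptions for illustration:
const express = require("express");
const app = express();

// Toy in-memory collection standing in for a real data store.
let customers = [
  { id: 1, name: "Ann", gender: "f" },
  { id: 2, name: "Bob", gender: "m" },
];

// DELETE /customers?gender=m removes the matching sub-collection.
// An unfiltered DELETE /customers would wipe everything, so this sketch
// rejects it; whether that is right depends on your application's risks.
app.delete("/customers", (req, res) => {
  const filter = Object.entries(req.query);
  if (filter.length === 0) {
    return res.status(400).json({ error: "refusing to delete the whole collection" });
  }
  const before = customers.length;
  customers = customers.filter(
    (c) => !filter.every(([key, value]) => String(c[key]) === value)
  );
  res.json({ deleted: before - customers.length });
});

app.listen(3000);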

Meteor method vs. deny/allow rules

In Meteor, when should I prefer a method over a deny rule?
It seems to me that allow/deny rules should be favoured, as their goal is more explicit, and one knows where to look for them.
However, in the Discover Meteor book, preventing duplicate insertions (“duplicate” being defined as adding a document whose url property is already defined in some other document of the same collection) is said to have to be defined through a method (and left as an exercise to the reader, chapter 8.3).
I think I am able to implement this check in a way that I find much clearer:
Posts.deny({
  update: function (userId, post, fieldNames, modifier) {
    return Posts.findOne({ url: modifier.$set.url, _id: { $ne: post._id } });
  }
});
(N.B. if you know the example, yes, I voluntarily left out the “only a subset of the attributes is modified” check from the question to be more specific.)
I understand that there are other update operators than $set in Mongo, but they look typed and I don't feel like leaving a security hole open.
So: are there any flaws in my deny rule? Independently, should I favour a method? What would I gain from it? What would I lose?
Normally I try to avoid subjective answers, but this is a really important debate. First I'd recommend reading Meteor Methods vs Client-Side Operations from the Discover Meteor blog. Note that at Edthena we exclusively use methods for reasons which should become evident.
Methods
pro
Methods can correctly enforce schema and validation rules of arbitrary complexity without the need of an outside library. Side note - check is an excellent tool for validating the structure of your inputs.
Each method is a single source of truth in your application. If you create a 'posts.insert' method, you can easily ensure it is the only way in your app to insert posts (see the sketch below).
con
Methods require an imperative style, and they tend to be verbose in relation to the number of validations required for an operation.
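As an illustration of that single-source-of-truth point - and of the duplicate-URL check from the question done method-side - here is a rough sketch; the Posts collection and the field names follow the book's example, everything else is illustrative:
// Server-side: the only sanctioned way to insert a post.
Meteor.methods({
  'posts.insert': function (post) {
    // Validate the structure of the input with check.
    check(post, { url: String, title: String });

    if (!this.userId) {
      throw new Meteor.Error('not-authorized');
    }
    // The duplicate-URL rule from the question, enforced in one place.
    if (Posts.findOne({ url: post.url })) {
      throw new Meteor.Error('duplicate-url', 'A post with this URL already exists');
    }
    return Posts.insert({
      url: post.url,
      title: post.title,
      userId: this.userId,
      submitted: new Date()
    });
  }
});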
Client-side Operations
pro
allow/deny has a simple declarative style.
con
Validating schema and permissions on an update operation is infinitely hard. If you need to enforce a schema you'll need to use an outside library like collection2. This reason alone should give you pause.
Modifications can be spread all over your application. Therefore, it may be tricky to identify why a particular database operation happened.
Summary
In my opinion, allow/deny is more aesthetically pleasing; however, its fundamental weakness is in enforcing permissions (particularly on updates). I would recommend client-side operations in cases where:
Your codebase is relatively small - so it's easy to grep for all instances where a particular modifier occurs.
You don't have many developers - so you don't need to all agree that there is one and only one way to insert into X collection.
You have simple permission rules - e.g. only the owner of a document can modify any aspect of it.
In my opinion, using client-side operations is a reasonable choice when building an MVP, but I'd switch to methods for all other situations.
update 2/22/15
Sashko Stubailo created a proposal to replace allow/deny with insert/update/remove methods.
update 6/1/16
The Meteor guide takes the position that allow/deny should always be avoided.

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access that item (without consistent read) and finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is: once I can get an item, and am not changing it anywhere else, can I assume it has an ad hoc fixed schema and access all my expected attributes without checking that they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.
Although Amazon makes no such guarantee in the documentation, the current implementation of their eventual consistency guarantees that you'll see either all the attributes stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the current implementation treats each Put request as a bundle. It won't split the request up and apply the operations piecemeal. You'll get either step-1 responses or step-2 responses until eventual consistency shakes out and leaves you with step-2 responses.
While this is undocumented behavior, I suspect quite a few SimpleDB users are relying on it by now, and as such Amazon isn't likely to change it anytime soon - but that's just my guess.
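One practical implication of that "bundle" behaviour: it only covers a single PutAttributes request, so if Process 1 spreads the attributes over several calls, Process 2 can still observe a partial item. A small sketch with the AWS SDK for JavaScript (v2); the domain, item, and attribute names are made up:
const AWS = require("aws-sdk");
const sdb = new AWS.SimpleDB({ region: "us-east-1" });

async function demo() {
  // Process 1: write both attributes in ONE PutAttributes request, so the
  // bundle behaviour described above covers them together.
  await sdb.putAttributes({
    DomainName: "orders",
    ItemName: "order-123",
    Attributes: [
      { Name: "status", Value: "shipped", Replace: true },
      { Name: "shippedAt", Value: "2013-05-01T12:00:00Z", Replace: true },
    ],
  }).promise();

  // Process 2: an eventually consistent read sees either both new attributes
  // or neither of them, never just one, per the forum reply quoted above.
  const { Attributes } = await sdb.getAttributes({
    DomainName: "orders",
    ItemName: "order-123",
    ConsistentRead: false,
  }).promise();
  console.log(Attributes);
}

demo();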