What is a secure way to prevent a user from updating specific fields in a MongoDB document?

I am trying to prevent users from updating certain fields in a MongoDB document while allowing them to edit every other field. For instance, the user should be able to edit/add/remove all fields except the field "permissions". My current approach is to test each key the user is trying to "$set" and see if it starts with the substring "permissions" (to cover dot notation). An example in Python:
def sanitize_set(son):
    return {"$set": {k: v for k, v in son.get("$set", {}).items()
                     if not k.startswith("permissions")}}
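For illustration, a quick check of what the function above does with a dot-notation key (the sample values are made up):
son = {"$set": {"name": "Joshua", "permissions.admin": True}}
print(sanitize_set(son))  # {'$set': {'name': 'Joshua'}}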
This approach is beautifully simple and seems to work. I wanted to reach out to the community to see if anyone else has tackled this issue before or sees obvious flaws in my approach. Thank you,
Joshua

Without seeing some example data and an explanation of what should/shouldn't be updatable it's hard to say for sure, but the way I would prevent this is to not allow the user to directly supply the fields they will be updating. For example, say you had a function called update_employee which updates information in an employee document. If you implement it like this:
function update_employee(employee) {
    db.employees.update({_id: session.user_id}, {$set: employee});
}
Whatever gets passed in as the employee object is what will be updated. Instead, you could build the update object from the values passed in, like so:
function update_employee(employee) {
    var updatedEmployee = {
        email: employee.email,
        address: employee.address,
        phone: employee.phone
    };
    db.employees.update({_id: session.user_id}, {$set: updatedEmployee});
}
This way you have complete control over what is being updated in your database, so if an extra field (such as salary) is passed in, it will be ignored.
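The same whitelisting idea in Python, as a minimal sketch (PyMongo is assumed; the field names are illustrative):
# Only keys on the whitelist survive into the $set document.
ALLOWED_FIELDS = {"email", "address", "phone"}

def update_employee(db, user_id, updates):
    safe = {k: v for k, v in updates.items() if k in ALLOWED_FIELDS}
    if safe:
        db.employees.update_one({"_id": user_id}, {"$set": safe})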

Since MongoDB (as far as I know) does not have a field lock, what you can do in this case is create a routine to pick up the specific document, present it to the user in any way you wish, but only show the fields they are allowed to edit.
You can present the entire JSON representation to the user (editor) and have a routine which simply does not allow changes to the locked fields. In other words, if you don't want the field {"name": "Sam"} to be edited, then even if the editor changes this value to {"name": "Joe"}, just kick it out before updating and only update fields which are allowed to be edited. Since it is all done in memory before an actual update (upsert), you have total control over what is being edited and what is not.
If you follow a scheme which uses a prefix, say e_address, where you have decided any field starting with e_ allows editing, the job is that much easier programmatically.
Even in user-defined roles I have not seen any possibility of locking specific fields in a collection. (I could be wrong here.)
The programming constructs here are simple though.
A. Read the document into memory.
B. The editor does the editing.
C. Only update fields which are allowed to be edited; ignore any other changes.
(I kept this answer generic as I do not use Python, though the construct should apply to any language.)
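For concreteness, here is how steps A-C might look in Python (a rough sketch assuming PyMongo, an items collection, and the e_ prefix convention from above):
# A: the document has been read and shown to the editor.
# B: `edited` holds the key/value pairs the editor sent back.
# C: write back only the keys that are allowed to change.
def apply_edits(db, doc_id, edited):
    allowed = {k: v for k, v in edited.items() if k.startswith("e_")}
    if allowed:
        db.items.update_one({"_id": doc_id}, {"$set": allowed})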

Related

Is it possible to prevent the reading and/or setting of a field value with DBIx Class?

I'm working in a project that uses Catalyst and DBIx::Class.
I have a requirement where, under a certain condition, users should not be able to read or set a specific field in a table (e.g. the last_name field in a list of users that will be presented and may be edited by the user).
Instead of applying the conditional logic to each part of the project where that table field is read or set, risking old or new cases where the logic is missed, is it possible to implement the logic directly in the DBIx::Class based module, to never return or change the value of that field when the condition is met?
I've been trying to find the answer, and I'm still reading, but I'm somewhat new to DBIx::Class and its documentation. Any help would be highly appreciated. Thank you!
I'd use an around Moose method modifier on the column accessor generated by DBIC.
This won't be a real security solution, as you can still access the data without the Result class, for example when using HashRefInflator.
The same goes for calling get_column.
Real security would be at the database level, with column-level security and not allowing the database user used by the application to fetch that field.
Another solution I can think of is an additional Result class for that table that doesn't include the column, maybe even defaulting to it and only using the one including the column when the user has a special role.

foo__icontains QuerySet empty when querying for old Google Cloud Datastore records

I have a model Foo which has a field called bar.
class Foo(models.Model):
    bar = models.CharField(max_length=70)
Given an existing instance of Foo whose bar field is set to 'qux', the following query returns an empty QuerySet:
Foo.objects.filter(bar__icontains="qux")
However, if I reference/save the previous instance or I create/save a new Foo, I am able to find it using a similar query.
So, how can I find old, existing records using icontains?
Djangae's documentation makes specific reference to using contains and icontains, but I see no mention of this particular behavior or how to address it. (I do see the index being added to djangaeidx.yaml.) I also see nothing in the Migration documentation which makes me think I need to explicitly add an index or similar.
The answer can be found in the 0.9.10 migration guide.
In this situation, you'd need to run something like:
defer_iteration(Foo.objects.all(), Foo.save, _target="your-new-app-version")
in order to add the necessary indexes to existing records.
While this worked, it definitely feels heavy handed to me. I'd be happy to hear from anyone else who might have an alternative solution.

Why are there two refs when declaring a one-to-many association in Mongoose?

I'm very new to MongoDB; see this one-to-many example.
As per my understanding, this example says that a person can write many stories, or a story belongs_to a person. I think storing the person._id in the stories collection would have been enough, so why does the person collection have the field stories?
Cases for fetching data:
Case 1: Fetch all stories of a person whose id is, let us say, x.
Solution: For this, just fire a query on the story collection where author = x.
Case 2: Fetch the author name of a particular story.
Solution: For this, we have the author field in the story collection.
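In code, those two lookups might look like this (a PyMongo sketch; the collection and field names are assumptions based on the example):
# Case 1: all stories written by person x
stories = db.stories.find({"author": x})
# Case 2: the author of one particular story
story = db.stories.find_one({"_id": story_id})
person = db.people.find_one({"_id": story["author"]})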
TL;DR
Put simply: Because there is no such notion as explicit relations in MongoDB.
Mongoose cannot know how you want to resolve the relationship. Will the search start from a given story object, with the author to be found? Or will it start from an author object, to find all of his or her stories? So it makes sure it can resolve the relation in either direction.
Note that there is a problem with that approach, and a big one. Say we are not talking of a one-to-few relation as in this example, but a "One-To-A-Shitload"™ relation. Since BSON documents have a size limit of 16MB, there is a limit to the number of relations you can manage this way. Quite a lot, but it is an artificial limit.
How to solve this: instead of relying on the ODM, do proper modelling yourself, since you know your use cases. I will give you an example below.
Detailed
Let us first elaborate on your cases a bit:
1. For a given user (aka "we already have all the data of that user document"), what are his or her stories?
2. List all stories together with the user name on an overview page.
3. For a selected ("given") story, what are the author's details?
4. And just for demonstration purposes: a given user wants to change the name under which a story is displayed, be it his user name, natural name (it happens!) or even a pseudonym.
OK, now let us put Mongoose aside and think about how we could implement this ourselves, keeping in mind that
data modelling in MongoDB means deriving your model from the questions that arise from your use cases, so that the most common use cases are covered in the most efficient way.
This is as opposed to RDBMS modelling, where you identify your entities, their properties and relations, and then jump through some hoops to get your questions answered somehow.
So, looking at our use cases, I guess we can agree that 2 is the most common, 3 and 1 come next, and 4 is rather rare compared to the others.
So now we can start
Modelling
We model the data involved in our most common use cases first.
So, we want to make the query for stories the most efficient one. And we want to sort the stories in descending order of submission. Simple enough:
{
    _id: new ObjectId(),
    user: "Name to Display",
    story: "long story cut short"
}
Now let's say you want to display your stories, 10 of them:
db.stories.find({}).sort({_id:-1}).limit(10)
No relation, all the data we need, a single query, using the default index on _id for sorting. Since a timestamp is the most significant part of an ObjectId, we can use it to sort the stories by time. The question "Hey, but what if somebody changes his or her user name?" usually comes now. Simple:
db.stories.update({"user":oldname},{$set:{"user":newname}},{multi:true})
Since this is a rare use case, it only has to be doable and does not have to be extremely efficient. However, later we will see that we have to put an index on user anyway.
Talking of authors: Here it really depends on how you want to do it. But I will show you how I tend to model something like that:
{
    _id: "username",
    info1: "foo",
    info2: "bar",
    active: true,
    ...
}
We make use of some properties of _id here: it is always indexed and carries a unique constraint. Just what we want for usernames.
However, it comes with a caveat: _id is immutable. So if somebody wants to change his or her username, we need to copy the original user to a document whose _id is the new user name and change the user property of our stories accordingly. The advantage of doing it this way is that even if the update for changing usernames (see above) fails mid-run, each and every story can still be related to a user. If the update succeeds, I tend to log the user out and have him log in with the new username again.
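Here is a hedged PyMongo sketch of that username change (collection names follow the examples in this answer; error handling is minimal):
from pymongo import MongoClient

db = MongoClient().blog  # illustrative database name

def rename_user(old_name, new_name):
    user = db.authors.find_one({"_id": old_name})
    if user is None:
        raise ValueError("unknown user")
    # _id is immutable, so insert a copy under the new name first.
    user["_id"] = new_name
    db.authors.insert_one(user)
    # Re-point all stories, then drop the old document.
    db.stories.update_many({"user": old_name}, {"$set": {"user": new_name}})
    db.authors.delete_one({"_id": old_name})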
In case you want to have a distinction between username and displayed name, it all becomes even easier:
{
    _id: "username",
    displayNames: ["Foo B. Baz", "P.S. Eudonym"],
    ...
}
Then you use the display name in your stories, of course.
Now let us see how we can get the user details of a given story. We know the author's name so it is as easy as:
db.authors.find({"_id":authorNameOfStory})
or
db.authors.find({"displayNames": authorNameOfStory})
Finding all stories for a given user is quite simple, too. It is either:
db.stories.find({"name":idFieldOfUser})
or
db.stories.find({"name":{$in:displayNamesOfUser}})
Now we have all our use cases covered, and we can make them even more efficient with
Indexing
An obvious index is on the story model's user field, so we create it:
db.stories.ensureIndex({"user":1})
If you are good with the "username as _id" way only, you are done with indexing. Using display names, you obviously need to index them. Since you most likely want display names and pseudonyms to be unique, it is a bit more complicated:
db.authors.ensureIndex({"displayNames":1},{sparse:true, unique:true})
Note: We need to make this a sparse index in order to prevent unnecessary errors when somebody has not decided on a display name or pseudonym yet. Make sure you explicitly add this field to an author document only when a user decides on a display name. Otherwise it would evaluate to null server-side, which is a valid value, and you would get a constraint violation error, namely "E11000 duplicate key".
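One hedged way to honor that in PyMongo: never write displayNames until a name is actually chosen, and let $addToSet create the array on first use:
# Illustrative: documents without a chosen name never get the field,
# so they stay out of the sparse unique index.
db.authors.update_one({"_id": username},
                      {"$addToSet": {"displayNames": chosen_name}})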
Conclusion
We have covered all your use cases with relations handled by the application, thereby simplifying our data model a great deal, and we have the most efficient queries for our most common use cases. Every use case is covered by a single query, taking into account the information we already have at the time we make the query.
Note that there is no artificial limit on how many stories a user can publish since we use implicit relations to our advantage.
As for more complicated queries ("How many stories does each user submit per month?"), use the aggregation framework. That is what it is there for.
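To make that concrete, here is a hedged sketch of the monthly count with PyMongo, deriving the month from the timestamp embedded in _id ($toDate on an ObjectId requires MongoDB 4.0+):
# Stories per user per month, grouped via the aggregation framework.
pipeline = [
    {"$group": {
        "_id": {
            "user": "$user",
            "year": {"$year": {"$toDate": "$_id"}},
            "month": {"$month": {"$toDate": "$_id"}},
        },
        "stories": {"$sum": 1},
    }},
    {"$sort": {"_id.year": 1, "_id.month": 1}},
]
for row in db.stories.aggregate(pipeline):
    print(row)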

MongoDB default items with user overwrite

So the problem seems very simple, but every way I approach the solution seems to be a poor implementation, with either duplicated content or messy data.
The Problem
I want to provide an option for “overwrites” per user on “default” items. Basically, I have a MongoDB database with a collection containing items with the following information:
ID
Name
Icon
Description
There is a set of 20-30 items in this collection, which each user using the app views.
Most users will be happy to see the defaults, but if a user wishes to, say, change the icon or the name of an item, how do I handle this “overwrite” of the “default” item for just that single user?
Possible solutions
My thoughts are to implement one of the following options, but they all seem a little wrong (I have provided my thoughts on each):
for each “overwritten” item, add a duplicated item to the collection with the changes and a user_id field linking the user - this seems like a bit of duplicated content, as the user might only change the icon and not the name/description. Also, if the name of the default item changes in the future, how do you handle that, and how do you know that this item must replace one of the “defaults” for just that user? I worry it will also be a bit of a performance issue when looking up the items and replacing the changed ones
having all the items duplicated per user in the same collection - very much duplication of content, but it might be the best-performing option; it could cause issues in the future if new “default” items need to be added or default options need changing
collection per user - same as the previous. This option seems all kinds of wrong, but maybe I’m just new to this and it is actually the best option.
collection containing overwrites - this seems like a good idea but equally a bad one, due to the looking up and comparing (see the sketch below). If everything is changed, then why not just have all-new items rather than, effectively, a find-and-replace?
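To make the comparison concrete, here is a rough PyMongo sketch of that last option; the collection names (items, item_overrides) and the override shape are assumptions:
# Merge a user's per-field overrides onto the defaults at read time.
def items_for_user(db, user_id):
    items = {doc["_id"]: doc for doc in db.items.find()}
    for ov in db.item_overrides.find({"user_id": user_id}):
        changes = {k: v for k, v in ov.items()
                   if k not in ("_id", "user_id", "item_id")}
        items[ov["item_id"]].update(changes)
    return list(items.values())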
Reason for wanting to get this right
Maybe I’m overthinking this, but it seems like an issue I will face a lot, and I think I need to get it right to avoid future problems with performance, data management and updates to default items.

MongoDB Schema Design With or Without Null Placeholders

I'm still getting used to using a schema-less, document-oriented database, and I am wondering what the generally accepted practice is regarding schema design within an application model.
Specifically, I'm wondering whether it is good practice to enforce a schema within the application model when saving to MongoDB, like this:
{
    _id: "foobar",
    name: "John",
    billing: {
        address: "8237 Landeau Lane",
        city: "Eden Prairie",
        state: "MN",
        postal: null
    },
    balance: null,
    last_activity: null
}
versus only storing the fields that are used like this:
{
    _id: "foobar",
    name: "John",
    billing: {
        address: "8237 Landeau Lane",
        city: "Eden Prairie",
        state: "MN"
    }
}
The former is self-descriptive which I like, while the latter makes no assumptions on the mutability of the model schema.
I like the first option because it makes it easy to see at a glance what fields are used by the model yet currently unspecified, but it seems like it would be a hassle to update every document to reflect a new schema design if I wanted to add an extra field, like favorite_color.
How do most veteran mongodb users handle this?
I would suggest the second approach.
You can always see the intended structure by looking at your entity class in the source code. Or do you use a dynamic language and don't create an entity?
You save a lot of space per record because you don't have to store the null column names. This may not matter on small collections, but on large ones, with millions of records, I would even go as far as shortening the field names.
As you already mentioned, by specifying optional column names you create a pattern which, if you want to follow it, forces you to update all existing records whenever you add a new field. This is, again, a bad idea for a big DB.
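For scale: that backfill is a single statement, but it still rewrites every document (a hedged PyMongo sketch; the collection name is illustrative, and favorite_color is the example field from the question):
# Add the new field as null to all documents that lack it.
db.users.update_many({"favorite_color": {"$exists": False}},
                     {"$set": {"favorite_color": None}})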
In any case, it all comes down to your DB size. If you are not targeting many GBs or TBs of data, then both approaches are fine. But if you predict that your DB may grow really large, I would do anything to cut the size. Spending 30-40% of storage on column names is a bad idea.
I prefer the first option; it is easier to code within the application and requires far fewer state holders and functions to understand how things should work.
As for adding a new field over time: you don't need to update all your records to support the new field like you would in SQL. All you need to do is add the new field to your application-side model and support it being null if it is not returned from MongoDB.
A good example is in PHP. Say I have a User class, at first with only one field, name:
class User {
    public $name;
}
Six months down the line and 60,000 users later, I want to add, say, an address. All I do is add that variable to my application model:
class User {
    public $name;
    public $address = array();
}
This now works exactly like adding a new nullable field in SQL, without having to actually add it to every row on demand.
It is a very reactive design: don't update what you don't need to. If a row gets used, it will get updated; if not, then who cares.
So eventually your rows actually become a mix and match between options 1 and 2, but it is really a reactive option 1.
Edit
On the storage side, you have also got to think about pre-allocation and movement of documents.
Say the fields actually set in a record initially amount to only a third of the document, but then suddenly the user updates the document with all of its fields: the document has to grow, and you now have extra fragmentation from the movement of your docs.
Normally when you define a schema like this, you are defining one that will eventually grow and, in most cases, apply fully to that user (much like an SQL schema does).
This is something to take into consideration: even though storage might be lower in the short term, growing documents can cause fragmentation and slow querying, and you could easily find yourself having to run compact or repairDatabase because of the problems you now face.
I should mention that both of those commands are not designed to be run regularly and cause significant performance problems while they run on a production environment.
So really, with the structure above you don't need to add a new field across all documents, and you will most likely get less movement and fewer problems in the long run.
You can fix the performance problems of consistently growing documents by using power-of-2 size allocation, but that setting is collection-wide, which means that even your fully filled documents will use up at least double their previous space, and your small documents will probably use as much space as your full documents would have with a padding factor of 1.
In other words, you lose space rather than gain it.
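For reference, that padding was enabled per collection via collMod in the MMAPv1 era; a hedged PyMongo sketch (the collection name is illustrative):
# usePowerOf2Sizes is a MongoDB 2.x-era collMod flag; it became the
# default allocation strategy in 2.6 and is a no-op on modern engines.
db.command("collMod", "users", usePowerOf2Sizes=True)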