Write-only collections in MongoDB

I'm currently using MongoDB to record application logs, and while I'm quite happy with both the performance and with being able to dump arbitrary structured data into log records, I'm troubled by the mutability of log records once stored.
In a traditional database, I would structure the grants for my log tables such that the application user had INSERT and SELECT privileges, but not UPDATE or DELETE. Similarly, in CouchDB, I could write an update validator function that rejected all attempts to modify an existing document.
However, I've been unable to find a way to restrict operations on a MongoDB database or collection beyond the three access levels (no access, read-only, "god mode") documented in the security topic on the MongoDB wiki.
Has anyone else deployed MongoDB as a document store in a setting where immutability (or at least change tracking) for documents was a requirement? What tricks or techniques did you use to ensure that poorly written or malicious application code could not modify or destroy existing log records? Do I need to wrap my MongoDB logging in a service layer that enforces the write-only policy, or can I use some combination of configuration, query hacking, and replication to ensure a consistent, auditable record is maintained?

I would say the best bet is to wrap access to MongoDB in a service layer that enforces your specific contracts. We don't do much in the way of fine-grained access control because there are so many different cases that solving all of them correctly is tricky, so for the most part it's up to the application layer to implement those kinds of controls.
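For example, a minimal sketch of such a service layer in Node.js (the driver calls are real; the service shape and names like createLogService are my own assumptions, not a prescribed design):

// append-only log service: exposes insert and read, never update/delete
const { MongoClient } = require('mongodb');

async function createLogService(uri) {
  const client = await MongoClient.connect(uri);
  const logs = client.db('app').collection('logs');
  return {
    // the only write operation the rest of the app can reach
    write(record) {
      return logs.insertOne({ ...record, createdAt: new Date() });
    },
    // reads are fine; update/delete methods are simply never exposed
    find(query) {
      return logs.find(query).toArray();
    }
  };
}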

To add a write-only (collection-level) user to MongoDB, do the following.
Say you want to have a user that can only write (insert only) to a certain collection on a certain database.
Create a file createuser.js with the following contents:
function createCollectionWriter(database, username, password, rolename, collection) {
  db.getSiblingDB(database).createUser({
    user: username,
    pwd: password,
    roles: []
  });
  db.getSiblingDB(database).createRole({
    role: rolename,
    privileges: [
      {
        resource: { db: database, collection: collection },
        actions: [ "insert" ]
      }
    ],
    roles: []
  });
  db.getSiblingDB(database).grantRolesToUser(username, [ rolename ]);
}
Then execute it from the command line:
$ mongo --eval="load('createuser.js'); createCollectionWriter('yourdb', 'user1', 'pass1', 'rolename', 'col1')"
This creates a user user1 with password pass1 who has write-only (insert-only) access to collection col1 in database yourdb.
A side effect is that the role rolename is created. If you have an existing user who should have write-only access to the same collection, grant role rolename to that user instead.
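To sanity-check the grant, you can connect as the new user and verify that inserts succeed while reads are rejected (a quick sketch using the names from the example above):

$ mongo yourdb -u user1 -p pass1
> db.col1.insert({ msg: "hello" })   // succeeds
> db.col1.findOne()                  // fails with a "not authorized" error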
Feel free to use and modify the code provided above :).

In MongoDB 1.3.2+ you can add a restriction when creating a user:
db.addUser("guest", "passwordForGuest", true)
The third argument marks the user as read-only; that's the only restriction that exists right now, nothing finer. Maybe you can file a feature request.
See the MongoDB documentation: http://www.mongodb.org/display/DOCS/Security+and+Authentication

Related

Restrict user to read only from SECONDARY MongoDB replica set

Is there a way, from the database backend, to force a user to read only from SECONDARY members?
I would like to restrict some users so that they cannot impact performance on the PRIMARY replica set member in my on-premises deployment (not Atlas).
The issue is easy to solve if the customer agrees to add this to the URI:
readPreference=secondary
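e.g. a full connection string such as (host names are placeholders):
mongodb://dbuser:password@host1:27017,host2:27017,host3:27017/reporting?replicaSet=rs0&readPreference=secondary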
But I am checking whether there is an option to force this from the database side without asking the customer ...
the only option I have found is to restrict by server IP address:
use admin
db.createUser(
  {
    user: "dbuser",
    pwd: "password",
    roles: [ { role: "readWrite", db: "reporting" } ],
    authenticationRestrictions: [ {
      clientSource: ["192.0.2.0"],
      serverAddress: ["198.51.100.1", "198.51.100.2"]
    } ]
  }
)
There are currently no supported ways to enforce this from within MongoDB itself, apart from the authenticationRestrictions configuration for defining users, which is noted in the question itself.
Regarding the comments - the ANALYTICS tag in Atlas is an (automatic) replica set tag. Replica set tags themselves can be used in on-premises deployments, but tags are used in conjunction with read preference, which is set by the client application (at least in the connection string). So that approach doesn't provide any enforcement beyond what read preference alone gives you for the purposes of this question. Additional information about tags can be found here and here.
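For reference, this is roughly what the tag-based approach looks like in the shell; note that it still depends entirely on the client opting in (the tag name is illustrative):

// tag a secondary
cfg = rs.conf()
cfg.members[2].tags = { workload: "reporting" }
rs.reconfig(cfg)
// the client must still choose to use it, e.g. in the connection string:
// ...?readPreference=secondary&readPreferenceTags=workload:reporting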
In an 'unsupported'/hacky fashion, you could create the user(s) directly and only on the SECONDARY members that you want the client to read from. This would be accomplished by taking the member out of the replica set, starting it up as a standalone, creating the user, and then joining it back to the replica set. While it would probably work, there are a number of implications that make this a poor approach. For example, an election (held for high availability purposes) would change which member is PRIMARY, and therefore where the client can read from, among other things.
Other approaches would involve redirecting or restricting traffic at the network layer. Again, not a great approach.

Is there a way to embed metadata into an Apostrophe CMS version 3 user type

I am looking into Apostrophe CMS for a way to embed data into the MongoDB user documents so that I can keep more data about a given user beyond just their username, password, and role (admin, guest, contributor, etc).
Looking through the Apostrophe CMS version 3 documentation (https://v3.docs.apostrophecms.org/guide/users.html), there seems to be no information about how to interact with the database so that more data can be added to a user. As it stands, there don't seem to be any methods available to interact with the database in this fashion.
An example might be:
user: {
  role: 'guest',
  group: 'eastUS',
  groupID: '1jfe25226',
  isActive: true,
  hasBeenContacted: true
}
If anyone has attempted to do this or successfully achieved this please let me know your approach.
Apostrophe users are just Apostrophe pieces. That means you can extend them with custom fields in a project-level modules/@apostrophecms/user/index.js file, just like you would add fields to any piece type. This gives you a way to add additional editable fields.
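For example, a minimal sketch of that file using the field names from your question (the field types are standard Apostrophe schema types; adjust to taste):

// modules/@apostrophecms/user/index.js (project level)
module.exports = {
  fields: {
    add: {
      group: { type: 'string', label: 'Group' },
      groupID: { type: 'string', label: 'Group ID' },
      isActive: { type: 'boolean', label: 'Active' },
      hasBeenContacted: { type: 'boolean', label: 'Has Been Contacted' }
    },
    group: {
      basics: {
        fields: [ 'group', 'groupID', 'isActive', 'hasBeenContacted' ]
      }
    }
  }
};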
The documentation on queries also applies to users. However, bear in mind that for security reasons only admins can query for users or update them.
To update a single custom property of the current user you might write the following inside an Apostrophe apiRoute, promise event handler, etc.:
await self.apos.doc.db.update({ _id: req.user._id }, { $set: { group: 'xyz' } });
This goes directly to MongoDB to $set one property and bypasses permissions checks, so use with care.
Apostrophe's standard REST APIs also work for users, but bear in mind that for security reasons only an admin user can fetch and edit users.
I can revise and add more clarification if you can be more specific about what you are trying to do and in what situation.

CQRS and Event Sourcing coupled with Relational Database Design

Let me start by saying, I do not have real world experience with CQRS and that is the basis for this question.
Background:
I am building a system where a new key requirement is allowing admins to "playback" user actions (admins want to be able to step through every action that has happened in the system up to any particular point). The caveats are that the company already has reports generated off their current SQL db which they will not change (at least not in parallel with this new requirement), so the system of record will remain SQL. I do not have access to SQL's Change Data Capture, and the alternative, a bunch of history tables maintained by triggers, would be incredibly difficult to maintain, so I'd like to avoid that if at all possible. Lastly, there are potentially (though not currently) a lot of data entry points that go through a versioning lifecycle that will result in changes to the SQL db (adding/removing fields), so if I tried to implement change tracking in SQL, I'd have to maintain the tables that handled the older versions of the data (a nightmare).
Potential Solution
I am thinking about using NoSQL (Azure DocumentDB) to handle data storage (writes) and then have command handlers handle updating the current SQL (Azure SQL) with the relevant data to be queried (reads). That way the audit trail is created and that idea of "playing back" can be handled while also not disturbing the current back end functionality that is provided.
This approach would handle the requirement and satisfy the caveats. I wouldn't use CQRS for the entire app, just for the pieces where I need this "playback" functionality. I know that I would have to mitigate failure points along the Client -> Write to DocumentDB -> Respond to user with success/fail -> Write to SQL on successful DocumentDB write path, but my novice CQRS eyes can't see a reason why this isn't a great way to handle it.
Any advice would be greatly appreciated.
This article explains the CQRS pattern and provides an example of a CQRS implementation; please refer to it.
I am thinking about using NoSQL (Azure DocumentDB) to handle data storage (writes) and then have command handlers handle updating the current SQL (Azure SQL) with the relevant data to be queried (reads).
Here is my suggestion: when a user performs a write operation to update a record, we could always insert rather than update, so that admins can audit the user's operations afterwards. For example, if a user wants to update a record, we could insert a new version of the entity with a property indicating whether the current operation has been audited by admins, instead of updating the record directly.
Original data in the document:
{
  "version1_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 28
    },
    "isaudit": true
  }
}
To update the age field, we would insert an entity with the updated information instead of updating the original data directly:
{
  "version1_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 28
    },
    "isaudit": true
  },
  "version2_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 29
    },
    "isaudit": false
  }
}
The admin could then check the current document to audit the user's operations and determine whether the update should be written to the SQL database.
One potential way to think about this is creating a transaction object that has a unique id and represents the work that needs to be done. The transaction in this case would be "write an object to the document db" or "write an object to the SQL db". It could contain the in-memory object to be written and the destination db (doc db, SQL, etc.) connection parameters.
Once you define your transaction, you would need to adjust your workflow for proper CQRS. Instead of the client writing to the doc db directly and waiting on the result of that call, let the client create a transaction with a unique id - which could be something like DateTime tick counts or an incremental transaction id - and write this transaction to a message queue like Azure Queue or Service Bus. Once the transaction is written to the queue, return success to the user. Create worker roles that read the transaction messages from this queue and process them, writing the objects to the doc db. That is, not overwriting the same entity in the doc db, but writing each transaction, with its unique incremental id, to the doc db for that particular entity. You could also use Azure Table Storage for that, afaik.
After successfully writing the transaction to the doc db, the same worker role could write it to a different message queue, which would be processed by its own set of worker roles that update the entity in the SQL db. If anything goes wrong in the interim, record failures in an error table that you can query and retry from later on.
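A sketch of what one of those transaction messages might look like (all field names here are hypothetical):

{
  "transactionId": 636234567890123456,
  "entityId": "order-1042",
  "operation": "update",
  "payload": { "status": "shipped" },
  "destination": "documentdb"
}

The worker that writes this to the doc db would, on success, forward the same message to the second queue for the SQL writer.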

meteor: use different database for each user

I currently assign a MongoDB database to my Meteor app using the environment variable
"MONGO_URL": "mongodb://localhost:27017/dbName" when I start the Meteor instance.
So all data gets written to the mongo database with the name "dbName".
I am looking for a way to individually set the dbName for each customer upon login, in order to separate their data into different databases.
This is generally unsupported, as the database is defined at startup. However, this thread offers a possible solution:
https://forums.meteor.com/t/switch-database-while-meteor-is-running/4361/6
var database = new MongoInternals.RemoteCollectionDriver("<mongo url>");
MyCollection = new Mongo.Collection("collection_name", { _driver: database });
This would allow you to define the database name in the mongo url, but it would require a fair bit of extra work to redefine your collections on a customer-by-customer basis.
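For instance, a minimal sketch of caching per-customer drivers and collections (the URL and function name are my own assumptions):

// one driver per customer database, one collection object per (db, name) pair
const drivers = {};
const collections = {};

function getCustomerCollection(dbName, collectionName) {
  const key = dbName + '/' + collectionName;
  if (!collections[key]) {
    if (!drivers[dbName]) {
      drivers[dbName] = new MongoInternals.RemoteCollectionDriver(
        'mongodb://localhost:27017/' + dbName
      );
    }
    collections[key] = new Mongo.Collection(collectionName, {
      _driver: drivers[dbName]
    });
  }
  return collections[key];
}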
Here's another approach that will make your life eternally easier:
1. Create a generic site with no accounts at mysite.com
2. When a customer logs in at mysite.com, figure out which site they actually belong to and redirect them to customerName.mysite.com, logging them in there
3. Run a separate instance of Meteor, configured for a different mongo database, at each site
nginx might help you with the above.
"It is generally good practice to run separate DBs when offering a B2B solution."
That's a matter of opinion that depends heavily on the platform. Many SaaS providers would argue that point.

How do document databases deal with changing relationships between objects (or do they at all)?

Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.
Since I'm using a document database (such as MongoDB), my structure might look something like this:
+ Customers[]
  +-- Customer
      +-- Employees[]
          +-- Employee
          +-- Employee
  +-- Customer
      +-- Employees[]
          +-- Employee
What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?
How does one manage this kind of change in a document database?
Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?
In the example above, I'd have to run modify scripts to create a new 'Employees' collection, and move every employee into that collection, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).
If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.
In that case, am I still using the document database as I should be?
Isn't it becoming more like a relational database?
Speaking about MongoDB specifically...because the database doesn't enforce any relationships like a relational database, you're on the hook for maintaining any sort of data integrity such as this. It's wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.
Having said all of that, the key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL... using Mongo you'd absolutely get in trouble if you structure your data like it's a relational database.
If you have Employees who can work at one or more Companies, I would structure it as:
// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }
// employees
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }
You'd add an index on employees.companies, which would make it very fast to get all of the employees who work for a given company... regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees per company. Getting all of the data for a company and all of its employees takes two (fast) queries.
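In the shell, the index and those two queries would look something like this (collection names assumed from the example above):

db.employees.createIndex({ companies: 1 })
// 1) fetch the company, 2) fetch everyone who works there
var company = db.companies.findOne({ _id: 12345 })
var employees = db.employees.find({ companies: company._id }).toArray()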
"Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?"
The simplicity can bite you, but it's very easy to update and change at a later time. You can script changes in JavaScript and run them via the mongo shell.
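For example, a sketch of the migration described in the question, hoisting embedded employees into their own collection while keeping a relationship key (collection and field names assumed):

// run in the mongo shell, ideally against a copy of the data first
db.customers.find().forEach(function (customer) {
  (customer.employees || []).forEach(function (employee) {
    employee.companyId = customer._id;   // keep the relationship key
    db.employees.insert(employee);
  });
  db.customers.update({ _id: customer._id }, { $unset: { employees: 1 } });
});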
My recent answer to this question covers the same ground in the RavenDB context:
How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?