Limit Document Insertion for Embedded Mongo Document

I am curious to find out whether I can create or enforce limits on a MongoDB document. I want to limit an embedded document to a certain number of records (10). I am creating a password check system that will query Mongo and check whether the user's new password either a) is like their current password, or b) matches one of their 10 previous passwords. If there is no match, the DB will be updated with the newest password, and the old passwords document will be updated with the previous current password. However, I want to limit this to 10 records and overwrite the oldest record, so that there are only ever 10 passwords in the oldPasswords document.
Does this make sense? And is it possible to enforce such a limit? The mock object would look like the following:
{
  _id: "",
  username: "User",
  currentPassword: "pass",
  oldPasswords: {
    password1: "pass1",
    password2: "pass2",
    password3: "pass3",
    password4: "pass4",
    password5: "pass5",
    password6: "pass6",
    password7: "pass7",
    password8: "pass8",
    password9: "pass9"
  }
}
As a sidebar: is this the best way to handle the passwords in Mongo? I have read their data modeling documents, and it appears that a one-to-many relationship like this is best modeled as an embedded document, unless the embedded document continues to grow. At that point, it seems the old passwords would be better kept in their own document and referenced.
Any help would be greatly appreciated!

If you can switch the old passwords to an array instead of an object, you can use $slice together with $push.
db.passwords.update(
  { _id: 1 },
  {
    $push: {
      oldPasswords: {
        $each: ["passabc"],  // value(s) to append
        $slice: -10          // keep only the last 10 elements
      }
    }
  }
)
That should keep only the last 10 passwords in your array.
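For illustration, the effect of $push with $each and $slice: -10 can be imitated in plain JavaScript. This is only a sketch of the semantics, not how the server implements it. The helper names buildPushUpdate and applyPushSlice are made up for this example, and the field name oldPasswords is taken from the question.

```javascript
// Hypothetical helper: builds the update document passed to update().
function buildPushUpdate(newPassword, limit) {
  return {
    $push: {
      oldPasswords: {
        $each: [newPassword], // values to append
        $slice: -limit        // negative: keep only the last `limit` elements
      }
    }
  };
}

// In-memory imitation of what the server does with that update.
function applyPushSlice(array, update) {
  const spec = update.$push.oldPasswords;
  const appended = array.concat(spec.$each);
  // Array.prototype.slice with a negative start keeps the trailing elements.
  return appended.slice(spec.$slice);
}

// Push 12 passwords through a 10-entry window.
let stored = [];
for (let i = 1; i <= 12; i++) {
  stored = applyPushSlice(stored, buildPushUpdate("pass" + i, 10));
}
console.log(stored.length, stored[0], stored[stored.length - 1]);
```

After twelve pushes the two oldest entries have been dropped, which is the "overwrite the oldest record" behaviour the question asks for.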

Related

Querying MongoDB collection with heterogeneous schema efficiently

I'm developing a web application with NodeJS, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that users can execute the big data processes in a separate cluster, and the results are stored in a MongoDB collection named Results. This collection may store more than 1 million documents per user.
The document schema of this collection can be completely different between users. For instance, say we have user1 and user2. Examples of documents in the Results collection for user1 and user2:
{
  user: ObjectId(user1), // reference to user1 in the Users collection
  inputFields: { variable1: 3, ... },
  outputFields: { result1: 504.75, ... }
}
{
  user: ObjectId(user2),
  inputFields: { country: "US", ... },
  outputFields: { cost: 14354.45, ... }
}
I'm implementing a search engine in the web application so that each user can filter on the fields of their own document schema (for example, user1 must be able to filter by inputFields.variable1, and user2 by outputFields.cost). Of course I know that I must use indexes, otherwise the queries are very slow.
My first attempt was to create an index for each different field in the Results collection, but that is quite inefficient: the database server becomes unstable because of the size of the indexes. So my second attempt was to reduce the number of indexes by using partial indexes, creating each index with the user id specified in the partialFilterExpression option.
The problem is that if another user has the same schema in the Results collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:
Index with pattern: { inputFields.country: 1 } already exists with different options
It happens because the partial indexes cannot index the same fields even though the partialFilterExpression is different.
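As a rough sketch of the kind of per-user partial index described above: the helper name partialIndexArgs and the placeholder user id are made up for this example, and the call shown in the comment is an assumption about how the index would be created in the shell, not the asker's actual code.

```javascript
// Hypothetical helper: returns the two arguments for createIndex().
function partialIndexArgs(userId, fieldPath) {
  const keys = { [fieldPath]: 1 };                               // index key pattern
  const options = { partialFilterExpression: { user: userId } }; // only this user's documents
  return [keys, options];
}

const [keys, options] = partialIndexArgs("user2-id-placeholder", "inputFields.country");
// In the shell this would presumably be: db.Results.createIndex(keys, options)
console.log(JSON.stringify(keys), JSON.stringify(options));
```

Two users with the same schema produce the same key pattern and differ only in partialFilterExpression, which is the situation that triggers the error quoted above.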
So my questions are: How could I allow the users to query their results efficiently in this environment? Is MongoDB really suitable for this use case?
Thanks

Mongodb find where exists in other collection

I have a user collection and a document collection, where entries have an owner field that is a reference to the user's ObjectId (_id).
I'm amazed I can't find all users that have at least one document...
I tried:
db.getCollection('user').find({_id: {
$in: db.getCollection('document').find({}).map(function(f) {return f.owner}).distinct()
}});
but it doesn't work, and in any case it doesn't feel like the correct way to do this, since all documents need to be loaded into memory.
I tried to use http://www.querymongo.com but it really did not help.
Thanks
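One possible approach, sketched here for the mongo shell, is to ask the server for the distinct owner values directly instead of mapping over every document. The function name findUsersWithDocuments is made up for this example, and the sketch assumes the shell behaviour where distinct() returns a plain array synchronously.

```javascript
// Hypothetical helper: `db` is the shell's database object.
function findUsersWithDocuments(db) {
  // The server computes the distinct owner ids, so only the ids
  // (not the full documents) are transferred to the client.
  const ownerIds = db.getCollection("document").distinct("owner");
  return db.getCollection("user").find({ _id: { $in: ownerIds } });
}
```

The list of owner ids still passes through the client, so this is likely to suit a moderate number of owners better than a very large one.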

mongodb upsert with conditional field update

I have a script that populates a mongo db from daily server log files. Log files come from a number of servers so the chronological order of the data is not guaranteed. To make this simple, let's say that the document schema is this:
{
_id: <username>,
first_seen: <date>,
last_seen: <date>,
most_recent_ip: <string>
}
That is, documents are indexed by the name of the user who accessed the server. For each user, we keep track of the first time the user was seen and the IP from the last visit.
Right now I handle this very inefficiently: first I try an insert. If it fails, I retrieve the record by _id, then calculate the updated values (e.g. first_seen and most_recent_ip), and finally update the record. This is 3 db calls per log entry, which makes the script's running time prohibitively long given the very high volume of data.
I'm wondering if I can replace this with an upsert instead. I can see how to handle first/last_seen: probably something like {$min: {'first_seen': <log_entry_date>}} (I hope this works correctly when inserting a new doc). But how do I set most_recent_ip to the new value only when <log_entry_date> > $last_seen?
Is there generally a preferred pattern for my use case?

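You can just use $set to set the most_recent_ip, combined with $min and $max for the dates in a single upsert (note that $set overwrites the IP unconditionally, whatever the entry's date), e.g.
db.logs.update(
  { _id: "user1" },
  {
    $set: { most_recent_ip: "2.2.2.2" },
    $min: { first_seen: new Date() },  // use the log entry's date here
    $max: { last_seen: new Date() }    // use the log entry's date here
  },
  { upsert: true }
)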
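To update the IP only when the log entry is newer than the stored last_seen, one option is an update expressed as an aggregation pipeline. This is a sketch that assumes MongoDB 4.2 or later (where updates accept a pipeline) and has not been run against a live server; the helper name buildLogUpdate is made up for this example.

```javascript
// Hypothetical helper: builds a pipeline-style update for one log entry.
function buildLogUpdate(entryDate, ip) {
  return [
    {
      $set: {
        // Keep the stored IP unless this entry is newer than last_seen.
        // All expressions in this stage read the document as it was
        // before the update, so "$last_seen" is the old value.
        most_recent_ip: {
          $cond: [
            { $gt: [entryDate, "$last_seen"] },
            { $literal: ip },
            "$most_recent_ip"
          ]
        },
        first_seen: { $min: ["$first_seen", entryDate] },
        last_seen: { $max: ["$last_seen", entryDate] }
      }
    }
  ];
}

const pipeline = buildLogUpdate(new Date("2020-01-02T00:00:00Z"), "2.2.2.2");
// Assumed shell usage:
// db.logs.updateOne({ _id: "user1" }, pipeline, { upsert: true })
console.log(JSON.stringify(pipeline, null, 2));
```

This keeps the work to one call per log entry, at the cost of requiring a server version that supports pipeline updates.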

query too large issue with mongodb

Let's say we have a collection of users, and each user is followed by other users. If I want to find the users that are NOT following me, I need to do something like:
db.users.find({_id: { $nin : followers_ids } } ) ;
If the number of followers_ids is huge, say 100k users, MongoDB will start saying the query is too large. Besides, sending that much data over the network just to make the query is not good either. What are the best practices for running this query without sending all these ids over the network?

I recommend that you limit the number of query results to reduce network demand. According to the docs:
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you can reduce the demand on network resources by issuing the limit() method.
This is typically used in conjunction with sort operations. For example, if you need only 50 results from your query to the users collection, you would issue the following command:
db.users.find({ _id: { $nin: followers_ids } }).sort({ timestamp: -1 }).limit(50)
You can then use the cursor to retrieve more user documents as needed.
Recommendation to Restructure Followers Schema
I would recommend that you restructure your user documents if the number of followers will grow large. Currently your user schema may look like this:
{
_id: ObjectId("123"),
username: "jobs",
email: "stevej#apple.com",
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
The good thing about this schema is that whenever this user does anything, all of the users you need to notify are right here inside the document. The downside is that if you need to find everyone a user is following, you will have to query the entire users collection. Also, your user document will become larger and more volatile as the followers grow.
You may want to further normalize your followers. You can keep a collection that matches followee to followers with documents that look like this:
{
_id: ObjectId("123"),//Followee's "_id"
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
This will keep your user documents slender, but it will take an extra query to get the followers. As the "followers" array changes in size, you can enable the usePowerOf2Sizes allocation strategy to reduce fragmentation and document moves.
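The restructuring can be pictured as splitting each existing user document in two. This is a minimal sketch in plain JavaScript; the helper name splitFollowers and the sample values are made up for this example.

```javascript
// Hypothetical helper: separates the followers array from the rest of the user.
function splitFollowers(userDoc) {
  const { followers = [], ...user } = userDoc;
  return {
    user,                                          // slim user document
    followersDoc: { _id: userDoc._id, followers }  // keyed by the followee's _id
  };
}

const result = splitFollowers({
  _id: "123",
  username: "jobs",
  followers: ["12345", "12375", "12395"]
});
console.log(JSON.stringify(result));
```

Both resulting documents share the same _id, so the followers of a user can be fetched with a single lookup by key in the second collection.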

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question: is there a way to query an Event and get all Participants that have, for example, Age > 18?

When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside of the event document called "Over18" or something similar. Insert them into that subdocument (and possibly the other one too, if you want), and then when you query the collection, you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and you have to work out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.) then this will not work.
Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computation to your application could actually benefit your database, because it can now spend more time querying and less time filtering. That leads to better scalability in the long run.
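The second option, filtering in application code, could look roughly like this. The helper name participantsOlderThan and the sample event are made up for this example; the lowercase field names participants and age are assumptions about the document layout.

```javascript
// Hypothetical helper: filters an already-fetched event document on the client.
function participantsOlderThan(event, minAge) {
  // Tolerate events that have no participants array at all.
  return (event.participants || []).filter((p) => p.age > minAge);
}

const sampleEvent = {
  name: "Launch party",
  participants: [
    { firstName: "Ann", age: 17 },
    { firstName: "Bob", age: 18 },
    { firstName: "Cid", age: 30 }
  ]
};
const adults = participantsOlderThan(sampleEvent, 18);
console.log(adults.map((p) => p.firstName));
```

Because the threshold is just a parameter, the same function covers the arbitrary-age case (21, 25 and so on) that the first option cannot handle.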

Short answer: no. I tried to do the same a couple of months back, but MongoDB does not support it (at least in versions <= 1.8). The same question has surely been asked in their Google Group. You can either store the participants as a separate collection or fetch the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.

For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
  { $unwind: '$participants' },
  // after $unwind, each document holds a single participant
  { $match: { 'participants.age': { $gt: 18 } } },
  { $project: { participants: 1 } }
])
This will return a list of n documents, where n is the number of participants older than 18. Each entry looks like this (note that the "participants" field now holds a single entry instead of an array):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.
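On later server versions there is also a way to keep one result per event and filter the array in place, using the $filter operator inside $project. This is a sketch that assumes MongoDB 3.2 or later and the same lowercase participants and age field names as the pipeline above; the helper name olderThanPipeline is made up for this example.

```javascript
// Hypothetical helper: builds a pipeline that trims each event's participants array.
function olderThanPipeline(minAge) {
  return [
    {
      $project: {
        participants: {
          $filter: {
            input: "$participants",            // the embedded array
            as: "p",                           // variable name for each element
            cond: { $gt: ["$$p.age", minAge] } // keep elements older than minAge
          }
        }
      }
    }
  ];
}

const filterPipeline = olderThanPipeline(18);
// Assumed shell usage: db.events.aggregate(filterPipeline)
console.log(JSON.stringify(filterPipeline));
```

Unlike the $unwind version, this returns one document per event, with participants still an array.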
