Mongo get results in order of maximum matches of value - mongodb

I've a Document like this:
{
{
name: 'The best book'
},
{
name: 'The book is the best on Sachin'
},
{
name: 'Best book on Sachin Tendulkar'
}
}
I have this regex search query:
db.getCollection('books').find({ name: { $in: [/sachin/i, /tendulkar/i, /best/i, /book/i] } })
It's giving results, but as per my requirement it should give results in sorted order of maximum matches:
{
name: 'Best book on Sachin Tendulkar' (4 matches)
},
{
name: 'The book is the best on Sachin' (3 matches)
},
{
name: 'The best book' (2 matches)
}
I'm new to mongo. Please help me in writing the mongo query for getting the results.

Your best bet may be to use the aggregation framework (https://docs.mongodb.com/v3.2/reference/operator/aggregation/) in this case.
I'd do it like this:
1. Split the text into an array of words.
2. Intersect the array of tags you want to match with the array produced in step 1.
3. Project the size of the intersection into a field.
4. Sort by the field projected in step 3.
Something along these lines:
db.books.aggregate([
{$match: {}},
{$project: {
name: {$toLower: "$name"},
... any other amount of fields ...
}},
{$project: {
name: true,
... any other amount of fields ...
wordArray: {$split: ["$name", " "]}
}},
{$project: {
name: true,
... any other amount of fields ...
wordArray: true,
numberOfMatches: {
$size: {
$setIntersection: ["$wordArray", ["best", "book"]]
}
}
}},
{$sort: {
numberOfMatches: -1
}}
]);
Keep in mind that you can put a condition where $match: {} is to filter the initial set of books you're ranking.
I'm not sure whether this works with regular expressions, though, so I added the first $project stage to ensure you always compare lowercase to lowercase.
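As a rough sketch of the four steps above, the pipeline can be generated for any list of search terms, and the same ranking can be reproduced in plain JavaScript to check the expected order without a database. The helper names buildMatchCountPipeline and rankByMatches are made up for this example, and $split is assumed to be available (it needs MongoDB 3.4 or newer). Unlike the pipeline above, this version keeps the original casing of name in the output.

```javascript
// Hypothetical helper: builds the aggregation pipeline for a list of terms.
function buildMatchCountPipeline(terms) {
  const lowered = terms.map(t => t.toLowerCase());
  return [
    // Steps 1: lowercase the name and split it into words.
    { $project: { name: 1, wordArray: { $split: [{ $toLower: "$name" }, " "] } } },
    // Steps 2 and 3: intersect with the search terms and store the size.
    { $project: { name: 1, numberOfMatches: { $size: { $setIntersection: ["$wordArray", lowered] } } } },
    // Step 4: most matches first.
    { $sort: { numberOfMatches: -1 } }
  ];
}

// Plain-JavaScript equivalent of the same steps, for a local sanity check.
function rankByMatches(docs, terms) {
  const wanted = new Set(terms.map(t => t.toLowerCase()));
  return docs
    .map(doc => {
      const words = new Set(doc.name.toLowerCase().split(" "));
      // Set semantics, like $setIntersection: each term counts at most once.
      const numberOfMatches = [...wanted].filter(w => words.has(w)).length;
      return { name: doc.name, numberOfMatches };
    })
    .sort((a, b) => b.numberOfMatches - a.numberOfMatches);
}

const books = [
  { name: "The best book" },
  { name: "The book is the best on Sachin" },
  { name: "Best book on Sachin Tendulkar" }
];

console.log(rankByMatches(books, ["sachin", "tendulkar", "best", "book"]));
```

Note that this counts whole words only, so a regex such as /sachin/i that also matches inside longer words would behave differently.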

Related

Merge Names From Data For Message Application

Hello, I'm writing a messaging application with Node.js and Mongoose. I store the data in MongoDB like this:
I want to list the users that the logged-in user has exchanged messages with, so I need to filter my 'Messages' collection, but I can't get exactly what I want. If he sent a message to someone, I need that person's name; if he received a message from someone, I need that person's name as well. In the first case the name is in reciever, in the second it is in sender. I made a table to explain this more easily: I have the left table and need the 3 names shown in the second table (the duplicate John has to be eliminated).
Sorry if this has been asked before; I don't know how to search for this problem.
I tried this, but it also returns the logged-in user's name and duplicates some names:
Message.find({$or: [{sender: req.user.username}, {reciever: req.user.username}]})
One option is to use an aggregation pipeline to create two sets and simply union them:
db.collection.aggregate([
{$match: {$or: [{sender: req.user.username}, {reciever: req.user.username}]}},
{$group: {
_id: 0,
recievers: {$addToSet: "$reciever"},
senders: {$addToSet: "$sender"}
}},
{$project: {
_id: req.user.username,
previousChats: {"$setDifference":
[
{$setUnion: ["$recievers", "$senders"]},
[req.user.username]
]
}
}}
])
See how it works on the playground example
This is a tricky one, but can be solved with a fairly simple aggregation pipeline.
Explanation
In the first stage of the pipeline we want to get all the messages sent or received by the user (David in our example). For that we use a $match stage:
{
$match: {
$or: [
{sender: 'David'},
{receiver: 'David'}
]
}
}
Once we have found all the messages from or to David, we can start collecting the people he talks to. For that we use a $group stage with two operators:
$addToSet - This adds all the names to a set. A set contains only one instance of each value and ignores any further attempt to add the same value.
$cond - This is used to add either the receiver or the sender, depending on which one of them is David.
The stage will look like this:
{
$group: {
_id: null,
chats: {$addToSet: {$cond: {
if: {$eq: ['$sender', 'David']},
then: '$receiver',
else: '$sender'
}}}
}
}
Combining these 2 stages together will give us the expected result, one document looking like this:
{
"_id": null, // We don't care about this
"chats": [
"John",
"James",
"Daniel"
]
}
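To see what the two stages do, here is a minimal plain-JavaScript sketch of the same logic. The sample messages are invented for illustration (only the names come from the example above), chatPartners is a hypothetical helper, and the field is assumed to be spelled receiver as in this answer.

```javascript
// Hypothetical sample data; the messages are made up for illustration.
const messages = [
  { sender: "David", receiver: "John" },
  { sender: "John", receiver: "David" },
  { sender: "James", receiver: "David" },
  { sender: "David", receiver: "Daniel" },
  { sender: "James", receiver: "Daniel" } // does not involve David
];

function chatPartners(allMessages, user) {
  const chats = new Set();
  for (const m of allMessages) {
    // $match: keep only messages sent or received by the user.
    if (m.sender !== user && m.receiver !== user) continue;
    // $cond picks the other party; the Set plays the role of $addToSet.
    chats.add(m.sender === user ? m.receiver : m.sender);
  }
  return [...chats];
}

console.log(chatPartners(messages, "David")); // [ 'John', 'James', 'Daniel' ]
```

The JavaScript Set keeps insertion order, whereas the order of the array produced by $addToSet is not guaranteed.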
Final Solution
Message.aggregate([{
$match: {
$or: [
{
sender: req.user.username
},
{
receiver: req.user.username
}
]
}
}, {
$group: {
_id: null,
chats: {
$addToSet: {
$cond: {
'if': {
$eq: [
'$sender',
req.user.username
]
},
then: '$receiver',
'else': '$sender'
}
}
}
}
}])
Sources
Aggregation
$match aggregation stage
$group aggregation stage
$addToSet operation
$cond operation

Find a value in multiple nested fields with the same name in MongoDB

In some documents I have a property with a complex structure like:
{
content: {
foo: {
id: 1,
name: 'First',
active: true
},
bar: {
id: 2,
name: 'Second',
active: false
},
baz: {
id: 3,
name: 'Third',
active: true
}
}
}
I'm trying to make a query that can find all documents with a given value in the field name across the different second level objects foo, bar, baz
I guess that a solution could be:
db.getCollection('mycollection').find({ $or: [
{'content.foo.name': 'First'},
{'content.bar.name': 'First'},
{'content.baz.name': 'First'}
]})
But I want to do it dynamically, without having to specify the key names of the nested fields or repeat the value to find on every line.
If some kind of regexp on field names were available, a solution could be:
db.getCollection('mycollection').find({'content.*.name': 'First'}) // Match
db.getCollection('mycollection').find({'content.*.name': 'Third'}) // Match
db.getCollection('mycollection').find({'content.*.name': 'Fourth'}) // Doesn't match
Is there any way to do it?
I would say this is a bad schema if you don't know your keys in advance. Personally, I'd recommend changing this to an array structure.
Regardless, what you can do is use the aggregation $objectToArray operator and then query the newly created array. Mind you, this approach requires a collection scan each time you execute the query.
db.collection.aggregate([
{
$addFields: {
arr: {
"$objectToArray": "$content"
}
}
},
{
$match: {
"arr.v.name": "First"
}
},
{
$project: {
arr: 0
}
}
])
Mongo Playground
Another hacky approach is to create a wildcard text index and then run a $text query to search for the name. This comes with the usual text index and $text query limitations and might not be right for your use case.
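For intuition, the $objectToArray pipeline above behaves roughly like the following plain-JavaScript check on a single document. This is only a sketch; hasNestedName is a hypothetical helper and the sample document is the one from the question.

```javascript
// Hypothetical helper mirroring the $addFields + $match stages.
function hasNestedName(doc, value) {
  // Like $objectToArray: { foo: {...} } becomes [{ k: "foo", v: {...} }, ...].
  const arr = Object.entries(doc.content || {}).map(([k, v]) => ({ k, v }));
  // Like { $match: { "arr.v.name": value } }: any element may match.
  return arr.some(entry => entry.v && entry.v.name === value);
}

const sampleDoc = {
  content: {
    foo: { id: 1, name: "First", active: true },
    bar: { id: 2, name: "Second", active: false },
    baz: { id: 3, name: "Third", active: true }
  }
};

console.log(hasNestedName(sampleDoc, "First"));  // true
console.log(hasNestedName(sampleDoc, "Third"));  // true
console.log(hasNestedName(sampleDoc, "Fourth")); // false
```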

Mongo filter documents by array of objects

I have to filter candidate documents by an array of objects.
In the documents I have the following fields:
skills = [
{ _id: 'blablabla', skill: 'Angular', level: 3 },
{ _id: 'blablabla', skill: 'React', level: 2 },
{ _id: 'blablabla', skill: 'Vue', level: 4 },
];
When I make the request I get other array of skills, for example:
skills = [
{ skill: 'React', level: 2 },
];
So I need to build a query to get the documents that contains this skill and a greater or equal level.
I tried the following:
const conditions = {
$elemMatch: {
skill: { $in: skills.map(item => item.skill) },
level: { $gte: { $in: skills.map(item => item.level) } }
}
};
Candidate.find(conditions)...
The first condition seems to work, but the second one doesn't.
Any idea?
Thank you in advance!
There are several problems with this query.
First of all, item.tech should have been item.skill.
Next, $gte ... $in makes very little sense. $gte means >=, greater than or equal to something. If you compare numbers, the "something" must be a number: 3 >= 5 resolves to false, and 3 >= 1 resolves to true. 3 >= [1,2,3,4,5] makes no sense, since it resolves to true for the first 3 elements and to false for the last 2.
Finally, $elemMatch doesn't work this way. It tests each element of the array for all conditions to match. What you were trying to write was something like: find a document where the skills array has a subdocument with skill matching at least one of [array of skills] and level greater than ... something. Even if the $gte condition were correct, the combination of $elemMatch with $in inside doesn't do any better than a regular $in:
{
skill: { $in: skills.map(item => item.skill) },
level: { $gte: ??? }
}
If you want to find candidates with tech skills of a particular level or higher, it should be an $or condition with one entry for each skill-level pair:
const conditions = {$or:
skills.map(s=>(
// the array field in the documents is named skills
{skills: { $elemMatch: {
skill:s.skill,
level:{ $gte:s.level }
} } }
))
};
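Here is a small sketch of how such a filter could be built from the requested skills and sanity-checked without a database. It assumes the array field in the candidate documents is named skills, as in the question; buildSkillConditions and matchesAnySkill are hypothetical helper names.

```javascript
// Hypothetical helper: one $elemMatch per requested skill-level pair.
function buildSkillConditions(requested) {
  return {
    $or: requested.map(s => ({
      skills: { $elemMatch: { skill: s.skill, level: { $gte: s.level } } }
    }))
  };
}

// Plain-JavaScript version of the same test for one candidate document.
function matchesAnySkill(candidate, requested) {
  return requested.some(r =>
    // $elemMatch: a single array element must satisfy both conditions.
    candidate.skills.some(s => s.skill === r.skill && s.level >= r.level)
  );
}

const candidate = {
  skills: [
    { skill: "Angular", level: 3 },
    { skill: "React", level: 2 },
    { skill: "Vue", level: 4 }
  ]
};

console.log(JSON.stringify(buildSkillConditions([{ skill: "React", level: 2 }])));
console.log(matchesAnySkill(candidate, [{ skill: "React", level: 2 }])); // true
console.log(matchesAnySkill(candidate, [{ skill: "React", level: 3 }])); // false
```

The resulting object could then be passed to Candidate.find(...). If a candidate has to satisfy every requested skill rather than at least one, $and would take the place of $or.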

MongoDB query to find property of first element of array

I have the following data in MongoDB (simplified for what is necessary to my question).
{
_id: 0,
actions: [
{
type: "insert",
data: "abc, quite possibly very very large"
}
]
}
{
_id: 1,
actions: [
{
type: "update",
data: "def"
},{
type: "delete",
data: "ghi"
}
]
}
What I would like is to find the first action type for each document, e.g.
{_id:0, first_action_type:"insert"}
{_id:1, first_action_type:"update"}
(It's fine if the data structured differently, but I need those values present, somehow.)
EDIT: I've tried db.collection.find({}, {'actions.action_type':1}), but obviously that returns all elements of the actions array.
NoSQL is quite new to me. Before, I would have stored all this in two tables in a relational database and done something like SELECT id, (SELECT type FROM action WHERE document_id = d.id ORDER BY seq LIMIT 1) action_type FROM document d.
You can use the $slice operator in the projection. (Keep in mind, though, that I am not sure the order of the array stays the same when you update it.)
db.collection.find({},{'actions':{$slice:1},'actions.type':1})
You can also use the Aggregation Pipeline introduced in version 2.2:
db.collection.aggregate([
{ $unwind: '$actions' },
{ $group: { _id: "$_id", first_action_type: { $first: "$actions.type" } } }
])
Using the $arrayElemAt operator is actually the most elegant way, although the syntax may be unintuitive:
db.collection.aggregate([
{ $project: { first_action_type: { $arrayElemAt: ["$actions.type", 0] } } }
])
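The unintuitive part is that "$actions.type" first yields an array with the type of every action, and $arrayElemAt then picks index 0 from it. A plain-JavaScript sketch of that behavior follows; firstActionTypes is a hypothetical helper and the data is abbreviated from the question.

```javascript
// Sample documents from the question (data values shortened).
const actionDocs = [
  { _id: 0, actions: [{ type: "insert", data: "abc" }] },
  { _id: 1, actions: [{ type: "update", data: "def" }, { type: "delete", data: "ghi" }] }
];

function firstActionTypes(docs) {
  return docs.map(d => {
    // "$actions.type" resolves to the array of all action types.
    const types = (d.actions || []).map(a => a.type);
    const out = { _id: d._id };
    // $arrayElemAt returns no value when the index is out of bounds,
    // so the field is simply left out for empty arrays.
    if (types.length > 0) out.first_action_type = types[0];
    return out;
  });
}

console.log(firstActionTypes(actionDocs));
```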

Matching for latest documents for a unique set of fields before aggregating

Assuming I have the following document structures:
> db.logs.find()
{
'id': ObjectId("50ad8d451d41c8fc58000003"),
'name': 'Sample Log 1',
'uploaded_at': ISODate("2013-03-14T01:00:00+01:00"),
'case_id': '50ad8d451d41c8fc58000099',
'tag_doc': {
'group_x': ['TAG-1','TAG-2'],
'group_y': ['XYZ']
}
},
{
'id': ObjectId("50ad8d451d41c8fc58000004"),
'name': 'Sample Log 2',
'uploaded_at': ISODate("2013-03-15T01:00:00+01:00"),
'case_id': '50ad8d451d41c8fc58000099',
'tag_doc': {
'group_x': ['TAG-1'],
'group_y': ['XYZ']
}
}
> db.cases.findOne()
{
'id': ObjectId("50ad8d451d41c8fc58000099"),
'name': 'Sample Case 1'
}
Is there a way to perform a $match in the aggregation framework that retrieves only the latest log for each unique combination of case_id and group_x? I am sure this can be done with multiple $group stages, but as far as possible I want to limit the number of documents that pass through the pipeline right away via the $match operator. I am thinking of something like the $max operator, except used in $match.
Any help is very much appreciated.
Edit:
So far, I can come up with the following:
db.logs.aggregate(
{$match: {...}}, // some match filters here
{$project: {tag:'$tag_doc.group_x', case:'$case_id', latest:{uploaded_at:1}}},
{$unwind: '$tag'},
{$group: {_id:{tag:'$tag', case:'$case'}, latest: {$max:'$latest'}}},
{$group: {_id:'$_id.tag', total:{$sum:1}}}
)
As I mentioned, what I want can be done with multiple $group stages, but this proves costly when handling a large number of documents. That is why I want to limit the documents as early as possible.
Edit:
I still haven't come up with a good solution, so I am wondering whether the document structure itself is not optimized for my use case. Do I have to change the fields to support what I want to achieve? Suggestions are very much appreciated.
Edit:
I am actually looking for an implementation in MongoDB similar to the one in "How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?", except that it involves two distinct field values. Also, the $match operation is crucial because it makes the resulting set dynamic, with filters ranging from matching tags to a range of dates.
Edit:
Due to the complexity of my use case I tried to use a simple analogy, but it proved confusing. The question above now shows a simplified form of the actual use case. Sorry for the confusion I created.
I have done something similar. It's not possible with $match alone, but it can be done with a single $group stage. The trick is to use a compound index with the correct sort order. Given documents like these:
{ user_id: 1, address: "xyz", date_sent: ISODate("2013-03-14T01:00:00+01:00"), message: "test" }
{ user_id: 1, address: "xyz2", date_sent: ISODate("2013-03-14T01:00:00+01:00"), message: "test" }
if I want to group on user_id and address and get the message with the latest date, I need to create an index like this:
{ user_id:1, address:1, date_sent:-1 }
Then you can run the aggregation without a $sort stage, which is much faster and works on shards with replicas. If you don't have an index with the correct sort order you can add a $sort stage, but then you can't use it with shards, because all the data is transferred to mongos and the grouping is done there (you will also run into memory limit problems).
db.user_messages.aggregate(
{ $match: { user_id:1 } },
{ $group: {
_id: "$address",
count: { $sum : 1 },
date_sent: { $max : "$date_sent" },
message: { $first : "$message" },
} }
);
It's not documented that it should work like this - but it does. We use it on production system.
I'd use another collection to 'create' the search results on the fly - as new posts are posted - by upserting a document in this new collection every time a new blog post is posted.
Every new combination of author/tags is added as a new document in this collection, whereas a new post with an existing combination just updates an existing document with the content (or object ID reference) of the new blog post.
Example:
db.searchResult.update(
... {'author_id':'50ad8d451d41c8fc58000099', 'tag_doc.tags': ["TAG-1", "TAG-2" ]},
... { $set: { 'Referenceid':ObjectId("5152bc79e8bf3bc79a5a1dd8")}}, // or embed your blog post here
... {upsert:true}
)
Hmmm, there is no good way of doing this optimally such that you only pick out the latest post of each author. Instead, you will need to pick out all documents, sorted, and then group on author:
db.posts.aggregate([
{$sort: {created_at:-1}},
{$group: {_id: '$author_id', tags: {$first: '$tag_doc.tags'}}},
{$unwind: '$tags'},
{$group: {_id: {author: '$_id', tag: '$tags'}}}
]);
As you said this is not optimal however, it is all I have come up with.
If I am honest, if you need to perform this query often it might actually be better to pre-aggregate another collection that already contains the information you need in the form of:
{
_id: {},
author: {},
tag: 'something',
created_at: ISODate(),
post_id: {}
}
Each time you create a new post, you look up all documents in this unique collection that fulfill an $in query for what you need, and then update/upsert created_at and post_id in that collection. This would be more optimal.
Here you go:
db.logs.aggregate(
{"$sort" : { "uploaded_at" : -1 } },
{"$match" : { ... } },
{"$unwind" : "$tag_doc.group_x" },
{"$group" : { "_id" : { "case" :'$case_id', tag:'$tag_doc.group_x'},
"latest" : { "$first" : "$uploaded_at"},
"Name" : { "$first" : "$name" },
"tag_doc" : { "$first" : "$tag_doc"}
}
}
);
You want to avoid $max when you can $sort and take $first, especially if you have an index on uploaded_at, which would allow you to avoid any in-memory sorts and reduce the pipeline processing costs significantly. Obviously, if you have other "data" fields, you would add them along with (or instead of) "Name" and "tag_doc".
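To illustrate the sort-then-$first idea, the following plain-JavaScript sketch applies the same three steps ($sort, $unwind, $group with $first) to the two sample logs from the question. latestPerCaseAndTag is a hypothetical helper, and the optional $match stage is left out.

```javascript
// The two sample logs from the question, with ISODate replaced by Date.
const logs = [
  {
    name: "Sample Log 1",
    uploaded_at: new Date("2013-03-14T01:00:00+01:00"),
    case_id: "50ad8d451d41c8fc58000099",
    tag_doc: { group_x: ["TAG-1", "TAG-2"], group_y: ["XYZ"] }
  },
  {
    name: "Sample Log 2",
    uploaded_at: new Date("2013-03-15T01:00:00+01:00"),
    case_id: "50ad8d451d41c8fc58000099",
    tag_doc: { group_x: ["TAG-1"], group_y: ["XYZ"] }
  }
];

function latestPerCaseAndTag(allLogs) {
  // $sort: newest first (copy the array so the input is not mutated).
  const sorted = [...allLogs].sort((a, b) => b.uploaded_at - a.uploaded_at);
  const groups = new Map();
  for (const log of sorted) {
    // $unwind: one entry per tag in group_x.
    for (const tag of log.tag_doc.group_x) {
      const key = JSON.stringify([log.case_id, tag]);
      // $first: only the first (newest) log seen for each key is kept.
      if (!groups.has(key)) {
        groups.set(key, {
          _id: { case: log.case_id, tag },
          latest: log.uploaded_at,
          name: log.name
        });
      }
    }
  }
  return [...groups.values()];
}

console.log(latestPerCaseAndTag(logs));
```

For TAG-1 this keeps Sample Log 2 (the newer upload), and for TAG-2 it keeps Sample Log 1, the only log carrying that tag.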