MongoDB match the most array elements

I have a use case that I'm not sure MongoDB can solve in any reasonably efficient way.
The DB contains consultants; each consultant has a set of available weeks (an array of week numbers).
I now want to find the consultants whose weeks best overlap a given set of weeks.
e.g. consultants:
{
_id: ....
name: "James",
weeks: [1,2,3,4,8,9,13]
}
{
_id: ....
name: "Anna",
weeks: [2,3,4,20,23]
}
Search data: [1,2,4]
The more the overlap, the higher I want to rank the consultant in the search result.
James matches all three entries (1, 2, 4); Anna matches two (2, 4).
Is this even possible using Mongo?

You can calculate a weight for each consultant as the size of the $setIntersection between your search array and the weeks array:
db.consultants.aggregate([
{
$addFields: {
weight: {
$size: { $setIntersection: [ "$weeks", [1,2,4] ] }
}
}
},
{ $sort: { weight: -1 } }
])
The larger the intersection, the more weeks matched, so you can $sort by this weight field in descending order.
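As a hedged sketch of the same idea, the pipeline can be built from any list of search weeks and extended with a $match that drops consultants with no overlap at all. The helper names overlapPipeline and rankByOverlap are made up for illustration, and rankByOverlap is only a plain-JavaScript emulation for checking the expected order without a database:

```javascript
// Build the aggregation pipeline for an arbitrary list of search weeks.
function overlapPipeline(searchWeeks) {
  return [
    { $addFields: { weight: { $size: { $setIntersection: ["$weeks", searchWeeks] } } } },
    { $match: { weight: { $gt: 0 } } }, // optional: drop consultants with no overlap
    { $sort: { weight: -1 } },
  ];
}

// Plain-JavaScript emulation of the pipeline, for checking results locally.
function rankByOverlap(consultants, searchWeeks) {
  const wanted = new Set(searchWeeks);
  return consultants
    .map((c) => ({ ...c, weight: new Set(c.weeks.filter((w) => wanted.has(w))).size }))
    .filter((c) => c.weight > 0)
    .sort((a, b) => b.weight - a.weight);
}

const consultants = [
  { name: "Anna", weeks: [2, 3, 4, 20, 23] },
  { name: "James", weeks: [1, 2, 3, 4, 8, 9, 13] },
];
const ranked = rankByOverlap(consultants, [1, 2, 4]);
console.log(ranked.map((c) => `${c.name}: ${c.weight}`));
// With a live connection: db.consultants.aggregate(overlapPipeline([1, 2, 4]))
```

The $match stage is a design choice, not part of the original answer; remove it if consultants with zero matching weeks should still appear at the bottom of the list.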

Related

NOSQL MONGODB How to select data (countries) that only has code with 3 characters?

SQL How to select data (countries) that only has code with 3 characters?
My issue is the same as in the link above, except that I have to do it in NoSQL.
I have to extract the list of all countries. However, the country column (2nd column) of the data table also includes continents, which I don't want.
I realised that rows whose first column (iso_code) has exactly 3 characters are countries, while those with more than 3 characters are continents or other non-country entries. How do I go about extracting only the countries?
My code for extracting everything:
db.owid_energy_data.distinct("country")
Use $strLenCP to compute the length of iso_code. $match with 3 to get the documents needed. $group to find distinct countries.
db.collection.aggregate([
{
"$match": {
$expr: {
$eq: [
3,
{
"$strLenCP": "$iso_code"
}
]
}
}
},
{
$group: {
_id: "$country"
}
}
])
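One caveat worth checking against your data: as far as I know, $strLenCP raises an error when iso_code is null or missing, which can happen in this kind of dataset. A minimal sketch of an alternative, assuming country codes are exactly three characters long, is to pass a regex filter to distinct. The in-memory rows below are invented sample data, used only to show what the filter keeps:

```javascript
// Filter that keeps only documents whose iso_code is exactly three characters.
// A regex filter does not match null or missing values, so nothing throws.
const threeCharCode = { iso_code: /^.{3}$/ };

// With a live connection (collection name taken from the question):
//   db.owid_energy_data.distinct("country", threeCharCode)

// Plain-JavaScript emulation on made-up sample rows.
const rows = [
  { iso_code: "AFG", country: "Afghanistan" },
  { iso_code: "AFG", country: "Afghanistan" },
  { iso_code: "OWID_AFR", country: "Africa" },
  { iso_code: null, country: "Asia Pacific" },
  { iso_code: "ALB", country: "Albania" },
];
const countries = [
  ...new Set(
    rows
      .filter((r) => typeof r.iso_code === "string" && threeCharCode.iso_code.test(r.iso_code))
      .map((r) => r.country)
  ),
];
console.log(countries);
```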

Mongo filter documents by array of objects

I have to filter candidate documents by an array of objects.
In the documents I have the following fields:
skills = [
{ _id: 'blablabla', skill: 'Angular', level: 3 },
{ _id: 'blablabla', skill: 'React', level: 2 },
{ _id: 'blablabla', skill: 'Vue', level: 4 },
];
When I make the request I get another array of skills, for example:
skills = [
{ skill: 'React', level: 2 },
];
So I need to build a query that returns the documents containing this skill at a greater or equal level.
I tried the following:
const conditions = {
$elemMatch: {
skill: { $in: skills.map(item => item.skill) },
level: { $gte: { $in: skills.map(item => item.level) } }
}
};
Candidate.find(conditions)...
The first condition seems to work, but the second one doesn't.
Any idea?
Thank you in advance!
There are several problems with this query.
First of all, item.tech had to be item.skill.
Next, $gte ... $in makes very little sense. $gte means >=, greater than or equal to something. If you compare numbers, the "something" must be a number: 3 >= 5 resolves to false, and 3 >= 1 resolves to true. 3 >= [1,2,3,4,5] makes no sense, since it would be true for the first 3 elements and false for the last 2.
Finally, $elemMatch doesn't work this way. It tests each element of the array and requires all conditions to match on the same element. What you were trying to write reads as: find a document where the skills array has a subdocument whose skill matches at least one of [array of skills] and whose level is greater than ... something. Even if the $gte condition were correct, the combination of $elemMatch with $in inside doesn't do any better than a regular $in:
{
skill: { $in: skills.map(item => item.skill) },
level: { $gte: ??? }
}
If you want to find candidates with particular skills at a given level or higher, it should be an $or condition with one $elemMatch per skill-level pair:
const conditions = {
$or: skills.map(s => ({
// the array field in the documents is named "skills"
skills: {
$elemMatch: {
skill: s.skill,
level: { $gte: s.level }
}
}
}))
};
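A hedged sketch of how such a filter could be built and checked. It assumes the array field is named skills, as in the question; buildSkillFilter and matchesFilter are illustrative helper names, and matchesFilter only mimics the $or / $elemMatch semantics in plain JavaScript so the logic can be tried without a database:

```javascript
// Build one $elemMatch clause per requested skill/level pair.
function buildSkillFilter(requested, operator = "$or") {
  return {
    [operator]: requested.map((s) => ({
      skills: { $elemMatch: { skill: s.skill, level: { $gte: s.level } } },
    })),
  };
}

// Plain-JavaScript emulation of the filter above.
function matchesFilter(candidate, requested, operator = "$or") {
  const hasSkill = (s) =>
    candidate.skills.some((own) => own.skill === s.skill && own.level >= s.level);
  return operator === "$and" ? requested.every(hasSkill) : requested.some(hasSkill);
}

const candidate = {
  skills: [
    { skill: "Angular", level: 3 },
    { skill: "React", level: 2 },
    { skill: "Vue", level: 4 },
  ],
};
const requested = [{ skill: "React", level: 2 }];

console.log(JSON.stringify(buildSkillFilter(requested)));
console.log(matchesFilter(candidate, requested));
// With Mongoose: Candidate.find(buildSkillFilter(requested))
```

Passing "$and" as the operator would require every requested skill instead of any one of them, which may be closer to what a multi-skill search needs.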

Mongo get results in order of maximum matches of value

I have documents like this:
{
{
name: 'The best book'
},
{
name: 'The book is the best on Sachin'
},
{
name: 'Best book on Sachin Tendulkar'
}
}
My regex search query:
db.getCollection('books').find({ name: { $in: [/sachin/i, /tendulkar/i, /best/i, /book/i] } })
It returns results, but my requirement is that the results come back sorted by the number of matches, highest first:
{
name: 'Best book on Sachin Tendulkar' (4 matches)
},
{
name: 'The book is the best on Sachin' (3 matches)
},
{
name: 'The best book' (2 matches)
}
I'm new to mongo. Please help me in writing the mongo query for getting the results.
Your best bet may be to use the aggregation framework (https://docs.mongodb.com/v3.2/reference/operator/aggregation/) in this case.
I'd do it like this:
1. Split the text into an array of words.
2. Intersect the array of tags you want to match with the array produced in step 1.
3. Project the size of the intersection into a field.
4. Sort by the field projected in step 3.
Something along these lines:
db.books.aggregate([
{$match: {}},
{$project: {
name: {$toLower: "$name"},
// ... any other fields you need ...
}},
{$project: {
name: true,
// ... any other fields you need ...
wordArray: {$split: ["$name", " "]}
}},
{$project: {
name: true,
// ... any other fields you need ...
wordArray: true,
numberOfMatches: {
$size: {
$setIntersection: ["$wordArray", ["best", "book"]]
}
}
}},
{$sort: {
numberOfMatches: -1
}}
]);
Keep in mind that you can put a condition where $match: {} is, to filter the initial set of books you're classifying.
Note that $setIntersection compares whole words for exact equality, so it does not work with regular expressions. That is why I added the first $project stage: it ensures you're always comparing lowercase to lowercase, provided the search terms are lowercase too.
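If substring or regex matching is needed rather than whole-word matching, one possible variation is to count the matching terms with $regexMatch, which to my knowledge requires MongoDB 4.2 or later. This is a sketch, not a tested query: matchCountPipeline and countMatches are hypothetical helpers, and countMatches is a plain-JavaScript stand-in that shows the ordering the question asks for:

```javascript
// One $cond per term: 1 if the term matches name (case-insensitive), else 0.
// Terms are used as regex patterns, so escape special characters if needed.
function matchCountPipeline(terms) {
  return [
    {
      $addFields: {
        numberOfMatches: {
          $add: terms.map((t) => ({
            $cond: [{ $regexMatch: { input: "$name", regex: t, options: "i" } }, 1, 0],
          })),
        },
      },
    },
    { $match: { numberOfMatches: { $gt: 0 } } },
    { $sort: { numberOfMatches: -1 } },
  ];
}

// Plain-JavaScript emulation of the match count.
function countMatches(name, terms) {
  return terms.filter((t) => new RegExp(t, "i").test(name)).length;
}

const terms = ["sachin", "tendulkar", "best", "book"];
const books = [
  { name: "The best book" },
  { name: "The book is the best on Sachin" },
  { name: "Best book on Sachin Tendulkar" },
];
const sorted = books
  .map((b) => ({ ...b, numberOfMatches: countMatches(b.name, terms) }))
  .sort((a, b) => b.numberOfMatches - a.numberOfMatches);
console.log(sorted);
// With a live connection: db.books.aggregate(matchCountPipeline(terms))
```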

Mongoose: Score query then sort by score - non text fields

In my db, I have a collection of books.
Each has:
a count of upvotes
a count of downvotes
a count of views
I would like to sort my db by scoring as follows:
upvote: 8 points
downvote: -4 points
view: 1/2 point
So the score will be:
(NumberOfViews*(1/2)) + (NumberOfDownvotes*-4)+ (NumberOfUpvotes*8)
So if I have:
book1 = {name:'book1', views:3000,upvotes:340, downvotes:120}
book2 = {name:'book2', views:9000,upvotes:210, downvotes:620}
book3 = {name:'book3', views:7000,upvotes:6010, downvotes:2}
The score should be:
book1Score = 3740
book2Score = 3700
book3Score = 51572
And the query should output
book3,book1,book2
How can I achieve such a thing in mongoose?
Bonus: What if I want records that are more recent to rank higher than older records on that same query?
Thanks
Well I ended up doing it all inside mongoose.
I run this query every 24 hours to re-score my collection.
Book.aggregate(
[
//I match my query
{$match:query},
{
$project: {
//take the id for reference
_id: 1,
//calculate the score of the views
viewScore: {
$multiply: [ "$views", 0.5 ]
},
//calculate the score of the upvotes
upvoteScore: {
$multiply: [ {$size: '$upvotes'}, 8 ]
},
//calculate the score of the downvotes
downvoteScore: {
$multiply: [ {$size: '$downvotes'}, -4 ]
}
}
},
{
//project a second time
$project: {
//take my id for reference
_id: 1,
//get my total score
score: {
$add:['$viewScore','$upvoteScore','$downvoteScore']
},
}
},
//sort by the score.
{$sort : {'score' : -1}},
]
)
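If upvotes and downvotes are stored as plain counts, as described in the question, rather than as arrays, the $size calls are not needed and the score can be computed in a single stage. A minimal sketch under that assumption; the weights come from the question, and scoreOf is a hypothetical plain-JavaScript mirror of the expression, used here to check the arithmetic:

```javascript
// Single-stage score: views * 0.5 + upvotes * 8 + downvotes * -4.
const scorePipeline = [
  {
    $addFields: {
      score: {
        $add: [
          { $multiply: ["$views", 0.5] },
          { $multiply: ["$upvotes", 8] },
          { $multiply: ["$downvotes", -4] },
        ],
      },
    },
  },
  { $sort: { score: -1 } },
];

// Plain-JavaScript mirror of the score expression.
const scoreOf = (b) => b.views * 0.5 + b.upvotes * 8 + b.downvotes * -4;

const books = [
  { name: "book1", views: 3000, upvotes: 340, downvotes: 120 },
  { name: "book2", views: 9000, upvotes: 210, downvotes: 620 },
  { name: "book3", views: 7000, upvotes: 6010, downvotes: 2 },
];
const ranked = [...books].sort((a, b) => scoreOf(b) - scoreOf(a));
console.log(ranked.map((b) => `${b.name}: ${scoreOf(b)}`));
// With Mongoose: Book.aggregate(scorePipeline)
```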
I think the best way would be to query Mongoose for the list of books and then do the sorting yourself.
Something like:
// Get query results from mongoose then ...
books.sort((a,b) => {
return ((a.views*(1/2))+(a.downvotes*-4)+(a.upvotes*8))-((b.views*(1/2))+(b.downvotes*-4)+(b.upvotes*8));
});
This would sort the books in ascending order of score.
EDIT: The above is for sorting after you've received the query results. (I also just realized you want descending order, so just swap the operands to compute b - a.)
If you want to receive the results already sorted, you could instead calculate the score when you insert or update the book and store it as a field. Then use Mongoose's Query#sort, which would look something like:
query.sort({ score: 'desc'});
More info on Query#sort: http://mongoosejs.com/docs/api.html#query_Query-sort

Filter large dataset based on aggregation result

I need to build a sort of "Advanced Search" feature with MongoDB. It's a sports system, where player statistics are collected for each season like this:
{
player: {
id: int,
name: string
},
goals: int,
season: int
}
Users can search data across seasons, for example: I want to search for players who scored > 30 goals from season 2012 - 2016.
I could use mongodb aggregation:
db.stats.aggregate( [
{ $match: { season: { $gte: 2014, $lte: 2016 } } },
{ $group: { _id: "$player", totalGoals: { $sum: "$goals" } } },
{ $match: { totalGoals: { $gte: 30 } } },
{ $limit: 10 },
{ $skip: 0 }
] )
That works fine, and the speed is acceptable for a collection with more than 3 million records.
However, if the user wants to search a larger range of seasons, say a player's lifetime statistics, the aggregation turns out to be very slow. I understand that MongoDB has to go through all the docs and calculate totalGoals.
I just wonder if there is a better approach that could solve this performance problem?
You can keep pre-calculated data for past seasons and make a two-step query:
a) get past data
b) get current data
You could try to optimise the indexes for that query.
Hardware: use SSD.
Hardware: more memory.
Introduce sharding to split the load.
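The pre-calculation idea could look roughly like the sketch below. Everything named here is an assumption for illustration: a player_totals collection holding one pre-aggregated document per player for finished seasons, the precomputePipeline that would fill it via $merge (available in MongoDB 4.2 or later, as far as I know), and the combineTotals helper that adds the current season on top in application code:

```javascript
// Run after a season finishes; writes lifetime totals into player_totals.
function precomputePipeline(lastFinishedSeason) {
  return [
    { $match: { season: { $lte: lastFinishedSeason } } },
    { $group: { _id: "$player.id", name: { $first: "$player.name" }, totalGoals: { $sum: "$goals" } } },
    { $merge: { into: "player_totals", whenMatched: "replace", whenNotMatched: "insert" } },
  ];
}

// Step a) read pre-computed totals, step b) add the current season's stats.
function combineTotals(pastTotals, currentStats, minGoals) {
  const totals = new Map(pastTotals.map((p) => [p._id, p.totalGoals]));
  for (const s of currentStats) {
    totals.set(s.player.id, (totals.get(s.player.id) ?? 0) + s.goals);
  }
  return [...totals]
    .filter(([, goals]) => goals >= minGoals)
    .map(([id, totalGoals]) => ({ id, totalGoals }));
}

// Made-up sample data standing in for the two query results.
const past = [
  { _id: 1, totalGoals: 25 },
  { _id: 2, totalGoals: 10 },
];
const current = [
  { player: { id: 1, name: "A" }, goals: 7, season: 2016 },
  { player: { id: 3, name: "C" }, goals: 31, season: 2016 },
];
const result = combineTotals(past, current, 30);
console.log(result);
// An index such as db.stats.createIndex({ season: 1 }) may also help the $match.
```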