MongoDB - Match multiple values in array

I want to be able to find multiple documents that have three or more matching values in an array. Let's say we have the following documents:
[{
name: 'John',
cars: [1, 2, 3, 4]
},
{
name: 'Jane',
cars: [1, 2, 3, 8]
},
{
name: 'Smith',
cars: [1, 8, 10]
}]
And we want to find documents that have at least three of the values (in cars) in the following array:
[1, 2, 3, 4, 5, 6, 7]
The results would then be:
[{
name: 'John',
cars: [1, 2, 3, 4]
},
{
name: 'Jane',
cars: [1, 2, 3, 8]
}]
Anyone know how to achieve this?

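You can issue an $in query and then filter in code for the records that have 3 or more entries in the desired array. (Here is some sample Python code.)
def dennisQuestion():
    # db is assumed to be an already-connected pymongo database handle
    permissibleCars = [1, 2, 3, 4, 5, 6, 7]
    # $in narrows the result to documents sharing at least one value
    cursor = db.collection.find({"cars": {"$in": permissibleCars}})
    for record in cursor:
        # keep only documents with three or more values in common
        if len(set(permissibleCars) & set(record["cars"])) >= 3:
            yield record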

This is a good question, and I don't think there's a simple way to do it with the usual operators that MongoDB gives you. However, I can think of the following methods to achieve this:
1. New Field
Calculate this in app code and maintain the result in a new field on the document.
2. Brute Force
db.Collection.find( { $or: [
    { cars: { $all: [ 1, 2, 3 ] } },
    { cars: { $all: [ 2, 3, 4 ] } },
    ... list out all 35 combinations
] } )
3. Use $where
db.Collection.find( { cars: { $in: [1,2,3,4,5,6,7] }, $where: function() {
var numMatches = 0;
for (var i = 1; i <= 7; i++)
if (this.cars.indexOf(i) > -1) numMatches++;
return numMatches >= 3;
} } );
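The 35 clauses in the brute-force option do not have to be typed by hand. Here is a minimal sketch in plain JavaScript that generates them; the helper names combinations and buildAtLeastQuery are made up for this illustration and are not part of MongoDB or any driver.

```javascript
// Build every k-element combination of `values` (order does not matter).
function combinations(values, k) {
  if (k === 0) return [[]];
  if (values.length < k) return [];
  const [head, ...tail] = values;
  // Combinations that include `head`, followed by those that do not.
  const withHead = combinations(tail, k - 1).map((c) => [head, ...c]);
  return withHead.concat(combinations(tail, k));
}

// Hypothetical helper: one { <field>: { $all: [...] } } clause per combination.
function buildAtLeastQuery(field, values, k) {
  return {
    $or: combinations(values, k).map((combo) => ({ [field]: { $all: combo } })),
  };
}

const query = buildAtLeastQuery("cars", [1, 2, 3, 4, 5, 6, 7], 3);
console.log(query.$or.length); // 35
console.log(JSON.stringify(query.$or[0])); // {"cars":{"$all":[1,2,3]}}
```

The resulting object could then be passed to find(). The number of clauses grows quickly with the size of the list, so this probably only makes sense for small lists.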

I had to slightly modify @Zaid Masud's option 3 when the values were strings in Mongo 4.0.3:
db.Collection.find( { cars: { $in: ["s1", "s2", "s3" , "s4", "s5" , "s6" , "s7"] },
$where: function() {
var options = ["s1", "s2", "s3" , "s4", "s5" , "s6" , "s7"];
var numMatches = 0;
for (var i = 0; i < 7; i++)
if (this.cars.indexOf(options[i]) > -1)
numMatches++;
return numMatches >= 3;
}
} );
(This seemed a bit large for a comment)

For Mongo v4.4.1 this aggregation pipeline works:
[
{
$project: {
name: 1,
cars: 1,
show: {
$let: {
vars: {
"b": {
$gte: [{$size: {$setIntersection: [ [1,2,3,4,5,6,7],"$cars"]}},3]
}
},
in: "$$b"
}
}
}
},
{
$match: {
show: true,
}
},
{
$project: {
show: 0
}
}
]
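To see what the $setIntersection / $size / $gte expression computes, here is a rough in-memory equivalent in plain JavaScript, run against the sample documents from the question. The helper name intersectionSize is made up for this sketch; it only mirrors the idea and is not how the server evaluates the pipeline.

```javascript
// Sample documents copied from the question.
const docs = [
  { name: "John", cars: [1, 2, 3, 4] },
  { name: "Jane", cars: [1, 2, 3, 8] },
  { name: "Smith", cars: [1, 8, 10] },
];

// Hypothetical helper mirroring $size applied to $setIntersection.
function intersectionSize(a, b) {
  const setB = new Set(b);
  // De-duplicate `a` first, since set operators ignore repeated values.
  return [...new Set(a)].filter((x) => setB.has(x)).length;
}

const wanted = [1, 2, 3, 4, 5, 6, 7];
const matches = docs.filter((d) => intersectionSize(wanted, d.cars) >= 3);
console.log(matches.map((d) => d.name)); // [ 'John', 'Jane' ]
```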

Related

How to remove duplicated documents in MongoDB 4.0

Let's say I have the following documents in the collection sample:
{_id: 1, comp_index1: "one", comp_index2: "AAA", field: "lots of text" }
{_id: 2, comp_index1: "two", comp_index2: "BBB", field: "mucho texto" }
{_id: 3, comp_index1: "one", comp_index2: "CCC", field: "more text" }
{_id: 4, comp_index1: "two", comp_index2: "AAA", field: "más texto" }
{_id: 5, comp_index1: "one", comp_index2: "AAA", field: "lots of text" }
I want to make comp_index1 and comp_index2 an actual unique compound index.
If I run db.sample.createIndex( { comp_index1: 1, comp_index2: 1}, { unique: true } ) it throws an E11000 duplicate key error, so I decided to remove the duplicates first (because the dropDups option has been removed).
Right now I have this brute force algorithm that does the job:
db.sample.aggregate([
{
$group: {
_id: {
comp_index1: "$comp_index1",
comp_index2: "$comp_index2"
},
count: { $sum: 1 }
}
},
{
$match: { count: { $gt: 1 } }
}
], { allowDiskUse: true }).forEach(function (doc) {
for (var i = 1; i < doc.count; i++) {
db.sample.remove({
comp_index1: doc._id.comp_index1,
comp_index2: doc._id.comp_index2
},
{
justOne: true
});
}
print("Removed " + (i-1) + " dups of <" + doc._id.comp_index1 + " " + doc._id.comp_index2 + ">")
})
The problem is that I have over 1.4 M documents and almost 200 000 of them are dups, so this takes forever to finish. Is there a faster, better approach?
After several hours I finally managed to come up with a solution that is about 1000 times faster.
var ids = [];
db.sample.aggregate([
{
$group: {
_id: {
comp_index1: "$comp_index1",
comp_index2: "$comp_index2"
},
unique_ids: { $addToSet: "$_id" },
count: { $sum: 1 }
}
},
{
$match: { count: { $gt: 1 } }
}
], { allowDiskUse: true }).forEach(function (doc) {
var i = 0;
doc.unique_ids.forEach(function (id) {
if (i++ > 0) ids.push(id);
})
})
db.sample.remove({"_id": {$in: ids}});
Despite being the same approach overall, collecting all the ids to remove in RAM and then performing a single remove with the $in operator is much faster. This one took only a few seconds to execute.
If you come up with another solution that does not require holding the ids in RAM, please share.
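The idea of keeping one document per (comp_index1, comp_index2) pair and collecting every other _id can be tried out without a database. Below is a small in-memory sketch on the sample documents from the question (the field values are omitted); duplicateIds is a made-up helper name. Unlike $addToSet, which makes no promise about ordering, this sketch always keeps the first document it sees.

```javascript
const sample = [
  { _id: 1, comp_index1: "one", comp_index2: "AAA" },
  { _id: 2, comp_index1: "two", comp_index2: "BBB" },
  { _id: 3, comp_index1: "one", comp_index2: "CCC" },
  { _id: 4, comp_index1: "two", comp_index2: "AAA" },
  { _id: 5, comp_index1: "one", comp_index2: "AAA" },
];

// Hypothetical helper: returns the _ids to delete, keeping the first
// document seen for each (comp_index1, comp_index2) pair.
function duplicateIds(docs) {
  const seen = new Set();
  const ids = [];
  for (const doc of docs) {
    // JSON.stringify of the pair gives an unambiguous composite key.
    const key = JSON.stringify([doc.comp_index1, doc.comp_index2]);
    if (seen.has(key)) ids.push(doc._id);
    else seen.add(key);
  }
  return ids;
}

const idsToRemove = duplicateIds(sample);
console.log(idsToRemove); // [ 5 ]
```

The resulting list plays the same role as the ids array passed to $in above.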
I recently wrote some code to delete duplicated documents from MongoDB; this should work:
const query = [
{
$group: {
_id: {
comp_index1: "$comp_index1",
comp_index2: "$comp_index2"
},
dups: {
$addToSet: "$_id",
},
count: {
$sum: 1,
},
},
},
{
$match: {
count: {
$gt: 1,
},
},
},
];
// collection is assumed to be a Mongoose model
const cursor = collection.aggregate(query).cursor({ batchSize: 10 }).exec();
cursor.eachAsync(async (doc) => {
    doc.dups.shift(); // Keep the first _id; delete the rest
    // Wait for every delete in this group before moving on to the next one
    await Promise.all(doc.dups.map((dupId) => collection.findByIdAndDelete(dupId)));
});

Update fields of array subdocuments with one request

I have the following document in a collection:
{
_id: 'test',
values: [
{ foo: 1, bar: [<very big array>] },
{ foo: 4, bar: [<very big array>] },
{ foo: 3, bar: [<very big array>] }
]
}
I want to update all values[].foo values at once with a pre-calculated array. For performance reasons, I don't want to read the values[].bar arrays, and since values can contain many elements, I'm looking for a way to do it with only one request (if possible).
For example, I want to write something like this:
db.collection.updateOne({ _id: 'test' }, { $set: { 'values[].foo': [2, 3, 4] }});
And the result would be the following:
{
_id: 'test',
values: [
{ foo: 2, bar: [<very big array>] },
{ foo: 3, bar: [<very big array>] },
{ foo: 4, bar: [<very big array>] }
]
}
But I don't know how I should write my update request.
I'm using MongoDB 4.0 and I don't have access to 4.2 features.
Starting from v4.2 you can use updates with an aggregation pipeline.
This lets you calculate the new array using $zip and $map:
db.collection.updateOne(
{_id: "test"},
[{$set: {
values: {
$map: {
input: {
$zip: {
inputs: [
"$values.bar",
[2,3,4]
]
}
},
as: "item",
in: {
foo: {
$arrayElemAt: [
"$$item",
1
]
},
bar: {
$arrayElemAt: [
"$$item",
0
]
}
}
}
}
}}]
)
Make sure the size of values is the same as the size of the update array; by default $zip truncates its output to the length of the shortest input.
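For intuition, the $zip plus $map combination behaves roughly like the plain JavaScript below. The bar contents are placeholders and zip is a hand-written helper for this sketch, not a MongoDB or driver function.

```javascript
// Stand-in for the stored array; the `bar` contents are placeholders.
const values = [
  { foo: 1, bar: ["a"] },
  { foo: 4, bar: ["b"] },
  { foo: 3, bar: ["c"] },
];
const newFoos = [2, 3, 4];

// Hypothetical helper: pairs element i of each input and truncates to the
// shorter input, which is what $zip does by default.
function zip(a, b) {
  const n = Math.min(a.length, b.length);
  return Array.from({ length: n }, (_, i) => [a[i], b[i]]);
}

// Pair each existing `bar` with its new `foo`, then rebuild the subdocuments.
const updated = zip(values.map((v) => v.bar), newFoos).map(([bar, foo]) => ({ foo, bar }));
console.log(updated.map((v) => v.foo)); // [ 2, 3, 4 ]
console.log(zip(values, [9]).length); // 1
```

The last line shows why the sizes matter: with a shorter update array, the trailing elements of values would be dropped from the result.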

Sum reversed Arrays on Mongodb

Supposing I have the following situation on mongodb:
{
"_id" : 654321,
"first_name" : "John",
"demand" : [1, 20, 4, 10 ],
"group" : [1, 2]
}
{
"_id" : 654321,
"first_name" : "Bert",
"demand" : [4, 10 ],
"group" : [1, 3]
}
1 - Is it possible to groupby based on the first index of "group" array ([1]) ?
2- Is it possible to reverse the index order, and sum those demand arrays vertically ?
Desired output:
1 - Select only group.0 : 1
2 - reverse the array order $reverseArray
[1, 20, 4, 10 ] -> [10, 4, 20, 1] (reversed)
[4, 10] -> [10, 4] (reversed)
3 - Sum (vertical axis)
[20, 8, 20, 1]
Finally, return the normal order:
[1, 20, 8, 20]
1 - Is it possible to groupby based on the first index of "group" array ([1]) ?
To get the first index position (i.e., 0; array indexes start from 0) use the $arrayElemAt aggregation operator:
db.collection.aggregate([ { $group: { _id: { $arrayElemAt: [ "$group", 0 ] } } }, ] )
2- Is it possible to reverse the index order, and sum those demand arrays vertically ?
You can reverse an array using the $reverseArray aggregation array operator.
To get the sum of the values at each element position, (i) get the index of each value with $unwind (using includeArrayIndex), and then (ii) group by the index and sum the values.
db.collection.aggregate( [
{
$addFields: {
demand: { $reverseArray: "$demand" }
}
},
{
$unwind: { path: "$demand", includeArrayIndex: "ix" }
},
{
$group: {
_id: "$ix",
sum: { $sum: "$demand" }
}
},
{
$sort: { _id: 1 } // this is optional; sorts by index position
}
] )
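To check the arithmetic of the desired output, the whole sequence (select on group.0, reverse, sum per position, reverse back) can be sketched in plain JavaScript on the two sample documents. sumRightAligned is a made-up helper name; this is an in-memory illustration, not a pipeline.

```javascript
const people = [
  { first_name: "John", demand: [1, 20, 4, 10], group: [1, 2] },
  { first_name: "Bert", demand: [4, 10], group: [1, 3] },
];

// Hypothetical helper: sums arrays position by position after aligning
// them on their last element.
function sumRightAligned(arrays) {
  const totals = [];
  for (const arr of arrays) {
    // Reversing makes index 0 the last element, so short arrays line up at the end.
    [...arr].reverse().forEach((value, ix) => {
      totals[ix] = (totals[ix] || 0) + value;
    });
  }
  return totals.reverse(); // restore the normal order
}

// Step 1: keep only documents whose first group entry is 1.
const inGroup1 = people.filter((p) => p.group[0] === 1);
const result = sumRightAligned(inGroup1.map((p) => p.demand));
console.log(result); // [ 1, 20, 8, 20 ]
```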

How to insert at a certain position in an array in a subdocument using Mongoose and MongoDB 2.6

A great explanation of how to use the new $position operator of MongoDB 2.6 with Mongoose was given in the answer to my question. The suggested solution works perfectly for simple arrays. If the array is in a subdocument, or if each element of the array is itself an array, the suggested solution doesn't work. I mean something like this:
List.collection.update(
{/*....*/},
{ "$push": {
"subdocument.list": {
"$each": [ 1, 2, 3 ],
"$position": 0 }
}
},function(err,NumAffected) {
console.log("done");
});
List.collection.update(
{/*....*/},
{ "$push": {
"list1.$.list2": {
"$each": [ 1, 2, 3 ],
"$position": 0 }
}
},function(err,NumAffected) {
console.log("done");
});
In the current Mongoose version (4.4.2) you can use position like this:
Foo.update(query.id,
{
$push: {
arrayData: {
$each: [{
date: new Date
}], $position: 0
}
}
});
Not sure what the issue is here:
db.list.insert({ "list": { "sub": [4] } })
db.list.update(
{},
{ "$push": { "list.sub": { "$each": [1,2,3], "$position": 0 } } }
)
{ "list" : { "sub" : [ 1, 2, 3, 4 ] } }
So that works as expected.
And for the other example:
db.list.insert({
outer: [
{
key: "a",
inner: [4]
},
{
key: "b",
inner: [4]
}
]
})
db.list.update(
{ "outer.key": "b" },
{ "$push": {
"outer.$.inner": {
"$each": [1,2,3], "$position": 0
}
}}
)
Again, the result is as expected:
{
"outer" : [
{
"key" : "a",
"inner" : [
4
]
},
{
"key" : "b",
"inner" : [
1,
2,
3,
4
]
}
]
}
The interaction with specific drivers was already explained, so there must be something different in the data; but if so, then those statements are not valid.
And so exactly the same using Mongoose:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
mongoose.connect('mongodb://localhost/nodetest');
var listSchema = new Schema({
});
var List = mongoose.model( "List", listSchema );
var Other = mongoose.model( "Other", listSchema );
List.collection.update(
{},
{ "$push": {
"list.sub": {
"$each": [ 1, 2, 3 ],
"$position": 0 }
}
},function(err,NumAffected) {
console.log("done");
}
);
Other.collection.update(
{ "outer.key": "b" },
{ "$push": {
"outer.$.inner": {
"$each": [ 1, 2, 3 ],
"$position": 0
}
}},function(err,NumAffected) {
console.log("done2")
}
);
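What $push with $each and $position: 0 does to the targeted array in the shell examples above can be mimicked on an in-memory object. This is only a sketch: pushEachAt is a made-up helper, and it handles non-negative positions only.

```javascript
// Hypothetical helper mimicking { $push: { <field>: { $each, $position } } }
// on an in-memory array. Only non-negative positions are handled.
function pushEachAt(array, each, position) {
  const copy = [...array];
  // splice inserts without removing anything when the delete count is 0.
  copy.splice(position, 0, ...each);
  return copy;
}

const doc = { outer: [{ key: "a", inner: [4] }, { key: "b", inner: [4] }] };

// The positional $ operator targets the first element matched by the query.
const target = doc.outer.find((el) => el.key === "b");
target.inner = pushEachAt(target.inner, [1, 2, 3], 0);

console.log(JSON.stringify(doc.outer));
// [{"key":"a","inner":[4]},{"key":"b","inner":[1,2,3,4]}]
```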
I have found the code in the MongooseJS Model.update that omits support for the $position modifier.
In version 3.8.8, lines 1928-1941 of query.js:
if ('$each' in val) {
obj[key] = {
$each: this._castUpdateVal(schema, val.$each, op)
}
if (val.$slice) {
obj[key].$slice = val.$slice | 0;
}
if (val.$sort) {
obj[key].$sort = val.$sort;
}
}
Here, obj[key] will be set to val.$each. You can see explicit support for setting the $slice & $sort modifiers, but the $position modifier will never be copied into obj[key].
So, although you may be able to by-pass the Model.update function of MongooseJS to directly access MongoDB's update function, it is clear that MongooseJS's Model.update function does not support the $position modifier.

Retrieving a subset of data from MongoDB

If I have a collection similar to:
[
{ "test": [ { "a": 1, "b": 2 }, { "a": 10, "b": 1 } ] },
{ "test": [ { "a": 5, "b": 1 }, { "a": 14, "b": 2 } ] },
...
]
How do I obtain only a subset of data consisting of the a values when b is 2? In SQL, this would be something similar to:
SELECT test.a FROM collection WHERE test.b = 2
I do understand that I can limit what data I get with something like:
collection.find({ }, { "test.a": 1 })
But that returns all the a values. How can I limit it so that it returns only the values in which b is 2 (the WHERE test.b = 2 part of the SQL equivalent)?
You can do this by adding a selector object as the first parameter of your find call and using the $elemMatch projection operator:
collection.find({ 'test.b': 2 }, { test: { $elemMatch: { b: 2 } }, 'test.a': 1 })
But this will only return the first test array element per-doc where b is 2. You would need to use the aggregation framework if there can be multiple b:2 elements in your test arrays.
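The difference between "first matching element" and "all matching elements" is easy to see on an in-memory copy of the data. In the sketch below, the third element of the second document is added purely for illustration so that one document has two b: 2 entries; the variable names are made up.

```javascript
const collection = [
  { test: [{ a: 1, b: 2 }, { a: 10, b: 1 }] },
  // The { a: 7, b: 2 } element is an addition for this illustration.
  { test: [{ a: 5, b: 1 }, { a: 14, b: 2 }, { a: 7, b: 2 }] },
];

// Keep only documents that contain at least one element with b === 2.
const selected = collection.filter((doc) => doc.test.some((el) => el.b === 2));

// First matching element per document, like the $elemMatch projection.
const firstOnly = selected.map((doc) => doc.test.find((el) => el.b === 2).a);

// Every matching element per document, which is the case that needs aggregation.
const allMatches = selected.map((doc) =>
  doc.test.filter((el) => el.b === 2).map((el) => el.a)
);

console.log(firstOnly); // [ 1, 14 ]
console.log(allMatches); // [ [ 1 ], [ 14, 7 ] ]
```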