Finding which elements in an array do not exist for a field value in MongoDB - mongodb

I have a collection users as follows:
{ "_id" : ObjectId("51780f796ec4051a536015cf"), "userId" : "John" }
{ "_id" : ObjectId("51780f796ec4051a536015d0"), "userId" : "Sam" }
{ "_id" : ObjectId("51780f796ec4051a536015d1"), "userId" : "John1" }
{ "_id" : ObjectId("51780f796ec4051a536015d2"), "userId" : "john2" }
Now I am trying to write a code which can provides suggestions of a userId to user in case id provided by user already exists in DB. In same routine I just append values from 1 to 5 to the for example in case user have selected userId to be John, suggested user name array that needs to be checked for Id in database will look like this
[John,John1,John2,John3,John4,John5].
Now I just want to execute it against Db and to find out which of the suggested values do not exist in DB. So instead of selecting any document, I want to select values within suggested array which do not exist for users collection.
Any pointers are highly appreciated.

Your general approach here is you want to find the "distinct" values for the "userId's" that already exist in your collection. You then compare these to your input selection to see the difference.
var test = ["John","John1","John2","John3","John4","John5"];
var res = db.collection.aggregate([
{ "$match": { "userId": { "$in": test } }},
{ "$group": { "_id": "$userId" }}
]).toArray();
res.map(function(x){ return x._id });
test.filter(function(x){ return res.indexOf(x) == -1 })
The end result of this is the userId's that do not match in your initial input:
[ "John2", "John3", "John4", "John5" ]
The main operator there is $in which takes an array as the argument and compares those values against the specified field. The .aggregate() method is the best approach, there is a shorthand mapReduce wrapper in the .distinct() method which directly produces just the values in the array, removing the call to a function like .map() to strip out the values for a given key:
var res = db.collection.distinct("userId",{ "userId": { "$in": test } })
It should run notably slower though, especially on large collections.
Also do not forget to index your "userId" and likely you want this to be "unique" anyway so you really just want the $match or .find() result:
var res = db.collection.find({ "$in": test }).toArray()

Related

Converting older Mongo database references to DBRefs

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 then go from there.
Running the db.upgradeCheckAllDBs(); gives us the DollarPrefixedFieldName: $id is not valid for storage. errors and indeed we have some older records with legacy $id, $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our groovy console when you pull one of the old ones back and one of the new ones getClass() returns DBRef for both.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer the order matters which made all the difference. The script we went with that corrected our data:
var cursor = db.someCollection.find(
{
$where: "function() { return this.someRef != null &&
Object.keys(this.someRef)[0] == '$id'; }"
}
);
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have a collection with a larger number of records that needed to be corrected where the connection timed out. We just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.
DBRef is a client side thing. http://docs.mongodb.org/manual/reference/database-references/#dbrefs says it pretty clear:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
The drivers benefit from the fact that order of fields in BSON is consistent to recognise DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
will return objects where order of the fields doesn't match driver's expectation.

query for multiple value using single query [duplicate]

How to achieve below SQL in MongoShell?
Select TableA.* from TableA where TableA.FieldB in (select TableB.FieldValue from TableB)
Mongo doc gives some example of
db.inventory.find( { qty: { $in: [ 5, 15 ] } } )
I want that array be dynamically from another query. Is it possible?
Extending my question
I have a collection of bot names
bots collection
{
"_id" : ObjectId("53266697c294991f57c36e42"),
"name" : "teoma"
}
I have a collection of user traffic, in that traffic collection, I have a field useragent
userTraffic Collection
{
"_id" : ObjectId("5325ee6efb91c0161cbe7b2c"),
"hosttype" : "http",
"useragent" : "Mediapartners-Google",
"is_crawler" : false,
"City" : "Mountain View",
"State" : "CA",
"Country" : "United States"
}
I want to select all user traffic documents where its useragent contains any name of bot collection
This is what I have come up with
var botArray = db.bots.find({},{name:1, _id:0}).toArray()
db.Sessions.find({
useragent: {$in: botArray}
},{
ipaddress:1
})
Here i believe it is doing equals to comparison, but I want it to do like %% comparison
Once I get the result, I want to do an update to that result set as is_crawler= true
Tried something like this, isn't helpful
db.bots.find().forEach( function(myBot) {
db.Sessions.find({
useragent: /myBot.name/
},{
ipaddress:1
})
});
Another way of looping through the records, but no match found.
var bots = db.bots.find( {
$query: {},
$orderby:{
name:1}
});
while( bots.hasNext()) {
var bot = bots.next();
//print(bot.name);
var botName = bot.name.toLowerCase();
print(botName);
db.Sessions.find({
useragent: /botName/,
is_crawler:false
},{
start_date:1,
ipaddress:1,
useragent:1,
City:1,
State:1,
Country:1,
is_crawler:1,
_id:0
})
}
Not in a single query it isn't.
There is nothing wrong with getting the results from a query and feeding that in as your in condition.
var list = db.collectionA.find({},{ "_id": 0, "field": 1 }).toArray();
results = db.collectionB.find({ "newfield": { "$in": list } });
But your actual purpose is not clear, as using SQL queries alone as the only example of what you want to achieve are generally not a good guide to answer the question. The main cause of this is that you probably should be modelling differently than as you do in relational. Otherwise, why use MongoDB at all?
I would suggest reading the documentation section on Data Modelling which shows several examples of how to approach common modelling cases.
Considering that information, then perhaps you can reconsider what you are modelling, and if you then have specific questions to other problems there, then feel free to ask your questions here.
Finally this is how I could accomplish it.
// Get a array with values for name field
var botArray = db.bots.find({},{name:1}).toArray();
// loop through another collection
db.Sessions.find().forEach(function(sess){
if(sess.is_crawler == false){ // check a condition
// loop in the above array
botArray.forEach(function(b){
//check if exists in the array
if(String(sess.useragent).toUpperCase().indexOf(b.name.toUpperCase()) > -1){
db.Sessions.update({ _id : sess._id} // find by _id
,{
is_crawler : true // set a update value
},
{
upsert:false // do update only
})
}
});
}
});

Rename a sub-document field within an Array

Considering the document below how can I rename 'techId1' to 'techId'. I've tried different ways and can't get it to work.
{
"_id" : ObjectId("55840f49e0b"),
"__v" : 0,
"accessCard" : "123456789",
"checkouts" : [
{
"user" : ObjectId("5571e7619f"),
"_id" : ObjectId("55840f49e0bf"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"techId1" : ObjectId("553d9cbcaf")
},
{
"user" : ObjectId("5571e7619f15"),
"_id" : ObjectId("55880e8ee0bf"),
"date" : ISODate("2015-06-22T13:01:51.672Z"),
"techId1" : ObjectId("55b7db39989")
}
],
"created" : ISODate("2015-06-19T12:47:05.422Z"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"location" : ObjectId("55743c8ddbda"),
"model" : "model1",
"order" : ObjectId("55840f49e0bf"),
"rid" : "987654321",
"serialNumber" : "AHSJSHSKSK",
"user" : ObjectId("5571e7619f1"),
"techId" : ObjectId("55b7db399")
}
In mongo console I tried which gives me ok but nothing is actually updated.
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this which gives me an error. "cannot use the part (checkouts of checkouts.techId1) to traverse the element"
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator, instead you need to $set the new name and $unset the old one. But there is another restriction here as they will both belong to "checkouts" as a parent path in that you cannot do both at the same time.
The other core line in your question is "traverse the element" and that is the one thing you cannot do in updating "all" of the array elements at once. Well, not safely and without possibly overwriting new data coming in anyway.
What you need to do is "iterate" each document and similarly iterate each array member in order to "safely" update. You cannot really iterate just the document and "save" the whole array back with alterations. Certainly not in the case where anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" ( hopefully ) thing and this saves the overhead of writing other API code. Also we're using the Bulk Operations API here to make this as efficient as possible. With mongoose it takes a bit more digging to implement, but still can be done. But here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
doc.checkouts.forEach(function(checkout) {
if ( checkout.hasOwnProperty("techId1") ) {
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$set": { "checkouts.$.techId": checkout.techId1 }
});
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$unset": { "checkouts.$.techId1": 1 }
});
count += 2;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
}
});
});
if ( count % 500 !== 0 )
bulk.execute();
Since the $set and $unset operations are happening in pairs, we are keeping the total batch size to 1000 operations per execution just to keep memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
If you insist on coding with mongoose. Be aware that a .collection acessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.inititializeOrderedBulkOp();
And the only thing that sends to the server is the .execute() method, so this is your only execution callback:
bulk.exectute(function(err,response) {
// code body and async iterator callback here
});
And use async flow control instead of .forEach() such as async.each.
Also, if you do that, then be aware that as a raw driver method not governed by mongoose, you do not get the same database connection awareness as you do with mongoose methods. Unless you know for sure the database connection is already established, it is safter to put this code within an event callback for the server connection:
mongoose.connection.on("connect",function(err) {
// body of code
});
But otherwise those are the only real ( apart from call syntax ) alterations you really need.
This worked for me, I created this query to perform this procedure and I share it, (although I know it is not the most optimized way):
First, make an aggregate that (1) $match the documents that have the checkouts array field with techId1 as one of the keys of each sub-document. (2) $unwind the checkouts field (that deconstructs the array field from the input documents to output a document for each element), (3) adds the techId field (with $addFields), (4) $unset the old techId1 field, (5) $group the documents by _id to have again the checkout sub-documents grouped by its _id, and (6) write the result of these aggregation in a temporal collection (with $out).
const collection = 'yourCollection'
db[collection].aggregate([
{
$match: {
'checkouts.techId1': { '$exists': true }
}
},
{
$unwind: {
path: '$checkouts'
}
},
{
$addFields: {
'checkouts.techId': '$checkouts.techId1'
}
},
{
$project: {
'checkouts.techId1': 0
}
},
{
$group: {
'_id': '$_id',
'checkouts': { $push: { 'techId': '$checkouts.techId' } }
}
},
{
$out: 'temporal'
}
])
Then, you can make another aggregate from this temporal collection to $merge the documents with the modified checkouts field to your original collection.
db.temporal.aggregate([
{
$merge: {
into: collection,
on: "_id",
whenMatched:"merge",
whenNotMatched: "insert"
}
}
])

Inline/combine other collection into one collection

I want to combine two mongodb collections.
Basically I have a collection containing documents that reference one document from another collection. Now I want to have this as a inline / nested field instead of a separate document.
So just to provide an example:
Collection A:
[{
"_id":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"someValue": "foo"
},
{
"_id":"5F0BB248-E628-4B8F-A2F6-FECD79B78354",
"someValue": "bar"
}]
Collection B:
[{
"_id":"169099A4-5EB9-4D55-8118-53D30B8A2E1A",
"collectionAID":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"some":"foo",
"andOther":"stuff"
},
{
"_id":"83B14A8B-86A8-49FF-8394-0A7F9E709C13",
"collectionAID":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"some":"bar",
"andOther":"random"
}]
This should result in Collection A looking like this:
[{
"_id":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"someValue": "foo",
"collectionB":[{
"some":"foo",
"andOther":"stuff"
},{
"some":"bar",
"andOther":"random"
}]
},
{
"_id":"5F0BB248-E628-4B8F-A2F6-FECD79B78354",
"someValue": "bar"
}]
I'd suggest something simple like this from the console:
db.collB.find().forEach(function(doc) {
var aid = doc.collectionAID;
if (typeof aid === 'undefined') { return; } // nothing
delete doc["_id"]; // remove property
delete doc["collectionAID"]; // remove property
db.collA.update({_id: aid}, /* match the ID from B */
{ $push : { collectionB : doc }});
});
It loops through each document in collectionB and if there is a field collectionAID defined, it removes the unnecessary properties (_id and collectionAID). Finally, it updates a matching document in collectionA by using the $push operator to add the document from B to the field collectionB. If the field doesn't exist, it is automatically created as an array with the newly inserted document. If it does exist as an array, it will be appended. (If it exists, but isn't an array, it will fail). Because the update call isn't using upsert, if the _id in the collectionB document doesn't exist, nothing will happen.
You can extend it to delete other fields as necessary or possibly add more robust error handling if for example a document from B doesn't match anything in A.
Running the code above on your data produces this:
{ "_id" : "5F0BB248-E628-4B8F-A2F6-FECD79B78354", "someValue" : "bar" }
{ "_id" : "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"collectionB" : [
{
"some" : "foo",
"andOther" : "stuff"
},
{
"some" : "bar",
"andOther" : "random"
}
],
"someValue" : "foo"
}
Sadly mapreduce can't produce full documents.
https://jira.mongodb.org/browse/SERVER-2517
No idea why despite all the attention, whining and upvotes they haven't changed it. So you'll have to do this manually in the language of your choice.
Hopefully you've indexed 'collectionAID' which should improve the speed of your queries. Just write something that goes through your A collection one document at a time, loading the _id and then adding the array from Collection B.
There is a much faster way than https://stackoverflow.com/a/22676205/1578508
You can do it the other way round and run through the collection you want to insert your documents in. (Far less executions!)
db.collA.find().forEach(function (x) {
var collBs = db.collB.find({"collectionAID":x._id},{"_id":0,"collectionA":0});
x.collectionB = collBs.toArray();
db.collA.save(x);
})

MongoDB select all where field value in a query list

How to achieve below SQL in MongoShell?
Select TableA.* from TableA where TableA.FieldB in (select TableB.FieldValue from TableB)
Mongo doc gives some example of
db.inventory.find( { qty: { $in: [ 5, 15 ] } } )
I want that array be dynamically from another query. Is it possible?
Extending my question
I have a collection of bot names
bots collection
{
"_id" : ObjectId("53266697c294991f57c36e42"),
"name" : "teoma"
}
I have a collection of user traffic, in that traffic collection, I have a field useragent
userTraffic Collection
{
"_id" : ObjectId("5325ee6efb91c0161cbe7b2c"),
"hosttype" : "http",
"useragent" : "Mediapartners-Google",
"is_crawler" : false,
"City" : "Mountain View",
"State" : "CA",
"Country" : "United States"
}
I want to select all user traffic documents where its useragent contains any name of bot collection
This is what I have come up with
var botArray = db.bots.find({},{name:1, _id:0}).toArray()
db.Sessions.find({
useragent: {$in: botArray}
},{
ipaddress:1
})
Here i believe it is doing equals to comparison, but I want it to do like %% comparison
Once I get the result, I want to do an update to that result set as is_crawler= true
Tried something like this, isn't helpful
db.bots.find().forEach( function(myBot) {
db.Sessions.find({
useragent: /myBot.name/
},{
ipaddress:1
})
});
Another way of looping through the records, but no match found.
var bots = db.bots.find( {
$query: {},
$orderby:{
name:1}
});
while( bots.hasNext()) {
var bot = bots.next();
//print(bot.name);
var botName = bot.name.toLowerCase();
print(botName);
db.Sessions.find({
useragent: /botName/,
is_crawler:false
},{
start_date:1,
ipaddress:1,
useragent:1,
City:1,
State:1,
Country:1,
is_crawler:1,
_id:0
})
}
Not in a single query it isn't.
There is nothing wrong with getting the results from a query and feeding that in as your in condition.
var list = db.collectionA.find({},{ "_id": 0, "field": 1 }).toArray();
results = db.collectionB.find({ "newfield": { "$in": list } });
But your actual purpose is not clear, as using SQL queries alone as the only example of what you want to achieve are generally not a good guide to answer the question. The main cause of this is that you probably should be modelling differently than as you do in relational. Otherwise, why use MongoDB at all?
I would suggest reading the documentation section on Data Modelling which shows several examples of how to approach common modelling cases.
Considering that information, then perhaps you can reconsider what you are modelling, and if you then have specific questions to other problems there, then feel free to ask your questions here.
Finally this is how I could accomplish it.
// Get a array with values for name field
var botArray = db.bots.find({},{name:1}).toArray();
// loop through another collection
db.Sessions.find().forEach(function(sess){
if(sess.is_crawler == false){ // check a condition
// loop in the above array
botArray.forEach(function(b){
//check if exists in the array
if(String(sess.useragent).toUpperCase().indexOf(b.name.toUpperCase()) > -1){
db.Sessions.update({ _id : sess._id} // find by _id
,{
is_crawler : true // set a update value
},
{
upsert:false // do update only
})
}
});
}
});