Converting older Mongo database references to DBRefs - mongodb

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 then go from there.
Running db.upgradeCheckAllDBs() gives us "DollarPrefixedFieldName: $id is not valid for storage." errors, and indeed we have some older records with legacy $id, $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our Groovy console, getClass() returns DBRef for both an old-format record and a new-format one.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer the order matters which made all the difference. The script we went with that corrected our data:
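var cursor = db.someCollection.find({
    // Match only sub-documents whose first stored key is "$id" (the legacy order).
    // The function is kept on one line because a quoted string cannot span lines.
    $where: "function() { return this.someRef != null && Object.keys(this.someRef)[0] == '$id'; }"
});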
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have a collection with a larger number of records that needed to be corrected where the connection timed out. We just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.

DBRef is a client-side convention. http://docs.mongodb.org/manual/reference/database-references/#dbrefs states it clearly:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
Drivers rely on the fact that the order of fields in BSON is preserved to recognise a DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
This returns the documents where the order of the fields doesn't match the driver's expectation.
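The same key-order test can be sketched in plain JavaScript, which is roughly what a $where clause evaluates on servers that predate $expr. This is only an illustration: isLegacyRef is a hypothetical helper name, the sample sub-documents are modelled on the ones in the question, and it assumes the object you inspect still has its keys in stored order.

```javascript
// Hypothetical helper: true when the sub-document stores "$id" before "$ref",
// i.e. the legacy layout that drivers do not recognise as a DBRef.
function isLegacyRef(ref) {
  if (ref === null || typeof ref !== "object") {
    return false;
  }
  var keys = Object.keys(ref);
  return keys.length > 0 && keys[0] === "$id";
}

// Sample sub-documents modelled on the question's data (assumed values).
var legacy = { "$id": "42", "$ref": "someRef" };
var proper = { "$ref": "someRef", "$id": "42" };

// One flag per input: the legacy layout, the DBRef layout, and a missing field.
var flags = [legacy, proper, null].map(isLegacyRef);
```

Only the first sample is flagged, so a filter built on this check leaves documents that already hold a proper DBRef untouched.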

Related

query for multiple value using single query [duplicate]

How to achieve below SQL in MongoShell?
Select TableA.* from TableA where TableA.FieldB in (select TableB.FieldValue from TableB)
Mongo doc gives some example of
db.inventory.find( { qty: { $in: [ 5, 15 ] } } )
I want that array be dynamically from another query. Is it possible?
Extending my question
I have a collection of bot names
bots collection
{
"_id" : ObjectId("53266697c294991f57c36e42"),
"name" : "teoma"
}
I have a collection of user traffic, in that traffic collection, I have a field useragent
userTraffic Collection
{
"_id" : ObjectId("5325ee6efb91c0161cbe7b2c"),
"hosttype" : "http",
"useragent" : "Mediapartners-Google",
"is_crawler" : false,
"City" : "Mountain View",
"State" : "CA",
"Country" : "United States"
}
I want to select all user traffic documents where its useragent contains any name of bot collection
This is what I have come up with
var botArray = db.bots.find({},{name:1, _id:0}).toArray()
db.Sessions.find({
useragent: {$in: botArray}
},{
ipaddress:1
})
Here I believe it is doing an equality comparison, but I want it to behave like an SQL LIKE '%...%' comparison
Once I get the result, I want to do an update to that result set as is_crawler= true
Tried something like this, isn't helpful
db.bots.find().forEach( function(myBot) {
db.Sessions.find({
useragent: /myBot.name/
},{
ipaddress:1
})
});
Another way of looping through the records, but no match found.
var bots = db.bots.find( {
$query: {},
$orderby:{
name:1}
});
while( bots.hasNext()) {
var bot = bots.next();
//print(bot.name);
var botName = bot.name.toLowerCase();
print(botName);
db.Sessions.find({
useragent: /botName/,
is_crawler:false
},{
start_date:1,
ipaddress:1,
useragent:1,
City:1,
State:1,
Country:1,
is_crawler:1,
_id:0
})
}
Not in a single query it isn't.
There is nothing wrong with getting the results from a query and feeding that in as your in condition.
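// Pull the plain values out of the returned documents before passing them to $in
var list = db.collectionA.find({},{ "_id": 0, "field": 1 }).toArray().map(function(doc) { return doc.field; });
var results = db.collectionB.find({ "newfield": { "$in": list } });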
But your actual purpose is not clear: an SQL query on its own is generally not a good guide to what you want to achieve. The main reason is that you probably should be modelling your data differently than you would in a relational database. Otherwise, why use MongoDB at all?
I would suggest reading the documentation section on Data Modelling which shows several examples of how to approach common modelling cases.
Considering that information, then perhaps you can reconsider what you are modelling, and if you then have specific questions to other problems there, then feel free to ask your questions here.
Finally this is how I could accomplish it.
// Get an array of documents holding the name field
var botArray = db.bots.find({},{name:1}).toArray();
// Loop through the other collection
db.Sessions.find().forEach(function(sess){
    if(sess.is_crawler == false){ // only sessions not yet flagged
        // Loop over the bot names
        botArray.forEach(function(b){
            // Case-insensitive "contains" check
            if(String(sess.useragent).toUpperCase().indexOf(b.name.toUpperCase()) > -1){
                db.Sessions.update({ _id : sess._id }, // find by _id
                {
                    $set: { is_crawler : true } // $set changes only this field; without it the whole document is replaced
                },
                {
                    upsert:false // update only, never insert
                })
            }
        });
    }
});
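For the "contains" comparison the question asks about, one option is to build a single case-insensitive regular expression from the bot names instead of writing /botName/, which matches the literal text "botName" rather than the variable's value. A minimal sketch, assuming the names have already been pulled into an array of strings; escapeRegex and buildBotRegex are hypothetical helpers, not shell built-ins.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive pattern that matches any of the given names.
// An empty list yields a pattern that never matches, so nothing gets flagged.
function buildBotRegex(names) {
  if (names.length === 0) {
    return /(?!)/;
  }
  return new RegExp(names.map(escapeRegex).join("|"), "i");
}

// Sample values (assumed); in the shell the names would come from db.bots.
var botRegex = buildBotRegex(["teoma", "Mediapartners-Google"]);
var isBot = botRegex.test("Mozilla/5.0 (compatible; Teoma)");
var isHuman = botRegex.test("Mozilla/5.0 (Windows NT 10.0)");

// In the shell the pattern could then drive one multi-document update, e.g.:
// db.Sessions.update({ useragent: botRegex, is_crawler: false },
//                    { $set: { is_crawler: true } }, { multi: true });
```

With a pattern like this the matching happens on the server, so the client no longer has to read every session document to compare strings.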

Rename a sub-document field within an Array

Considering the document below how can I rename 'techId1' to 'techId'. I've tried different ways and can't get it to work.
{
"_id" : ObjectId("55840f49e0b"),
"__v" : 0,
"accessCard" : "123456789",
"checkouts" : [
{
"user" : ObjectId("5571e7619f"),
"_id" : ObjectId("55840f49e0bf"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"techId1" : ObjectId("553d9cbcaf")
},
{
"user" : ObjectId("5571e7619f15"),
"_id" : ObjectId("55880e8ee0bf"),
"date" : ISODate("2015-06-22T13:01:51.672Z"),
"techId1" : ObjectId("55b7db39989")
}
],
"created" : ISODate("2015-06-19T12:47:05.422Z"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"location" : ObjectId("55743c8ddbda"),
"model" : "model1",
"order" : ObjectId("55840f49e0bf"),
"rid" : "987654321",
"serialNumber" : "AHSJSHSKSK",
"user" : ObjectId("5571e7619f1"),
"techId" : ObjectId("55b7db399")
}
In mongo console I tried which gives me ok but nothing is actually updated.
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this which gives me an error. "cannot use the part (checkouts of checkouts.techId1) to traverse the element"
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator, instead you need to $set the new name and $unset the old one. But there is another restriction here as they will both belong to "checkouts" as a parent path in that you cannot do both at the same time.
The other core line in your question is "traverse the element" and that is the one thing you cannot do in updating "all" of the array elements at once. Well, not safely and without possibly overwriting new data coming in anyway.
What you need to do is "iterate" each document and similarly iterate each array member in order to "safely" update. You cannot really iterate just the document and "save" the whole array back with alterations. Certainly not in the case where anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" ( hopefully ) thing and this saves the overhead of writing other API code. Also we're using the Bulk Operations API here to make this as efficient as possible. With mongoose it takes a bit more digging to implement, but still can be done. But here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
doc.checkouts.forEach(function(checkout) {
if ( checkout.hasOwnProperty("techId1") ) {
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$set": { "checkouts.$.techId": checkout.techId1 }
});
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$unset": { "checkouts.$.techId1": 1 }
});
count += 2;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
}
});
});
if ( count % 500 !== 0 )
bulk.execute();
Since the $set and $unset operations happen in pairs, count grows by two per array element, so each batch is sent after 500 operations (250 pairs), which keeps memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
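To make the per-element pairing concrete, the operations queued for a single document can be sketched in plain JavaScript, without a database. buildRenameOps is a hypothetical helper, and the sample document uses plain strings as stand-ins for ObjectId values.

```javascript
// Hypothetical helper: for one document, list the $set/$unset pairs that the
// loop would queue, one pair per array element that still has "techId1".
function buildRenameOps(doc) {
  var ops = [];
  (doc.checkouts || []).forEach(function (checkout) {
    if (Object.prototype.hasOwnProperty.call(checkout, "techId1")) {
      var filter = { "_id": doc._id, "checkouts._id": checkout._id };
      // First copy the value to the new name...
      ops.push({ filter: filter, update: { "$set": { "checkouts.$.techId": checkout.techId1 } } });
      // ...then drop the old name from the same array element.
      ops.push({ filter: filter, update: { "$unset": { "checkouts.$.techId1": 1 } } });
    }
  });
  return ops;
}

// Sample document (assumed values); only the first element needs renaming.
var sample = {
  _id: "doc1",
  checkouts: [
    { _id: "c1", techId1: "t1" },
    { _id: "c2", techId: "t2" }
  ]
};
var ops = buildRenameOps(sample);
```

Each filter carries both the document _id and the array element's _id, which is what lets the positional $ operator resolve to the right element.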
If you insist on coding with mongoose, be aware that a .collection accessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.initializeOrderedBulkOp();
And the only thing that sends to the server is the .execute() method, so this is your only execution callback:
bulk.execute(function(err,response) {
// code body and async iterator callback here
});
And use async flow control instead of .forEach() such as async.each.
Also, if you do that, then be aware that as a raw driver method not governed by mongoose, you do not get the same database connection awareness as you do with mongoose methods. Unless you know for sure the database connection is already established, it is safer to put this code within an event callback for the server connection:
mongoose.connection.on("open",function() {
// body of code
});
But otherwise those are the only real ( apart from call syntax ) alterations you really need.
This worked for me, I created this query to perform this procedure and I share it, (although I know it is not the most optimized way):
First, run an aggregation that (1) uses $match to select the documents whose checkouts array has techId1 as a key in at least one sub-document, (2) uses $unwind on the checkouts field (which deconstructs the array and outputs one document per element), (3) adds the techId field (with $addFields), (4) removes the old techId1 field (with $project), (5) uses $group on _id to collect the checkout sub-documents back into an array per document, and (6) writes the result of the aggregation to a temporary collection (with $out).
const collection = 'yourCollection'
db[collection].aggregate([
{
$match: {
'checkouts.techId1': { '$exists': true }
}
},
{
$unwind: {
path: '$checkouts'
}
},
{
$addFields: {
'checkouts.techId': '$checkouts.techId1'
}
},
{
$project: {
'checkouts.techId1': 0
}
},
{
$group: {
'_id': '$_id',
'checkouts': { $push: '$checkouts' } // push the whole sub-document so user, _id and date are kept
}
},
{
$out: 'temporal'
}
])
Then, you can make another aggregate from this temporal collection to $merge the documents with the modified checkouts field to your original collection.
db.temporal.aggregate([
{
$merge: {
into: collection,
on: "_id",
whenMatched:"merge",
whenNotMatched: "insert"
}
}
])

How to access embedded documents in MongoTemplate when the key is an empty string?

{
"_id" : ObjectId("550add7ee0b4b54a3e7ad53c"),
"day" : "14-03-2015",
"node" : "2G",
"nodeName" : "BLR_SGSN",
"" : {
"A" : 905.84,
"B" : 261.34,
"C" : 2103.94,
"D" : 39.67
}
}
I have this as my data in mongo.
How do I get values of A,B,C,D. ??
You cannot query on this as the sub-document fields cannot be selected.
This can only be a result of a programming error doing something like this ( and probably trying to compute a key name in the process ):
db.collection.insert({
"": {
"A": 1,
"B": 2,
"C": 3
}
})
So you cannot get to sub-elements by standard query ways like:
db.collection.find({ ".A": 905.84 })
You can fix this by updating the affected documents in the collection to give them a proper key name, though this is of course an iterative process. I'm not sure how to fix this other than with JavaScript from the shell, due to the naming problem, but:
db.collection.find({ "": { "$exists": true } }).forEach(function(doc) {
if ( doc.hasOwnProperty("") ) {
doc.newprop = doc[""];
delete doc[""];
db.collection.update({ "_id": doc._id }, doc );
}
})
Then at least you can access things by the new "newprop" key ( or whatever you call it ):
db.collection.find({ "newprop.A": 905.84 })
And the same sort of thing will work in other drivers.
My advice here is "go and fix this" and find out the code that caused this key name to be blank in the first place.
A bug report should be submitted to the MongoDB core project, as none of the drivers handle this well. I thought I could even use $rename here, but you can't.
So blank "" keys are a problem that needs to be fixed.
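The key move itself can be tried on a plain object before touching the collection. A small sketch, assuming the document has been read into an ordinary JavaScript object; moveEmptyKey is a hypothetical helper and "newprop" is just the example name used above.

```javascript
// Hypothetical helper: return a copy of the document in which the value stored
// under the empty-string key is moved to a named key. The input is not modified.
function moveEmptyKey(doc, newKey) {
  var fixed = {};
  Object.keys(doc).forEach(function (key) {
    fixed[key === "" ? newKey : key] = doc[key];
  });
  return fixed;
}

// Sample document modelled on the question's data (trimmed, assumed values).
var broken = { node: "2G", "": { A: 905.84, B: 261.34 } };
var repaired = moveEmptyKey(broken, "newprop");
```

Building a copy rather than editing in place keeps the original object available for comparison until the update has been verified.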

Return actual type of a field in MongoDB

In MongoDB, using $type, it is possible to filter a search based on if the field matches a BSON data type (see DOCS).
For eg.
db.posts.find({date2: {$type: 9}}, {date2: 1})
which returns:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"date2" : "Fri Jul 09 2010 08:25:26 GMT"
}
I need a query that will tell me what the actual type of the field is, for every field in a collection. Is this possible with MongoDB?
Starting from MongoDB 3.4, you can use the $type aggregation operator to return a field's type.
db.posts.aggregate(
[
{ "$project": { "fieldType": { "$type": "$date2" } } }
]
)
which yields:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"fieldType" : "string"
}
Type the query below in the mongo shell:
typeof db.employee.findOne().first_name
Syntax
typeof db.collection_name.findOne().field_name
OK, here are some related questions that may help:
Get all field names in a collection using map-reduce.
Here's a recursive version that lists all possible fields.
Hopefully that can get you started. However, I suspect that you're going to run into some issues with this request. There are two problems here:
I can't find a "gettype" function for JSON. You can query by $type, but it doesn't look like you can actually run a gettype function on a field and have it map back to the BSON type.
A field can contain data of multiple types, so you'll need a plan to handle this. Even if it's not apparent Mongo could store some numbers as ints and others floats without you really knowing. In fact, with the PHP driver, this is quite possible.
So if you assume that you can solve problem #1, then you should be able to solve problem #2 using a slight variation on "Get all field Names".
It would probably look something like this:
"map" : function() { for (var key in this) { emit(key, [ typeof this[key] ]); } }
"reduce" : function(key, stuff) { return add_to_set(stuff); } // add_to_set is a placeholder for merging the emitted arrays into unique values
So basically you would emit the key and the type of key value (as an array) in the map function. Then from the reduce function you would add unique entries for each type.
At the end of the run you would have data like this
{"_id":[255], "name" : [1,5,8], ... }
Of course, this is all a lot of work, depending on your actual problem, you may just want to ensure (from your code) that you're always putting in the right type of data. Finding the type of data after the data is in the DB is definitely a pain.
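The idea of collecting a set of types per field can be tried outside the database on an ordinary array of documents. A rough sketch in plain JavaScript; collectFieldTypes is a hypothetical helper, and note that typeof reports JavaScript types ("number", "string", "object"), not BSON type codes.

```javascript
// Hypothetical helper: for every top-level field, collect the distinct
// JavaScript types seen across all documents.
function collectFieldTypes(docs) {
  var types = {};
  docs.forEach(function (doc) {
    Object.keys(doc).forEach(function (key) {
      var t = typeof doc[key];
      if (!types[key]) {
        types[key] = [];
      }
      // Keep each type once per field, like a set.
      if (types[key].indexOf(t) === -1) {
        types[key].push(t);
      }
    });
  });
  return types;
}

// Sample documents (assumed values) where "name" holds mixed types.
var summary = collectFieldTypes([
  { _id: 1, name: "teoma" },
  { _id: 2, name: 5 },
  { _id: 3 }
]);
```

A field whose list ends up with more than one entry is one where the collection holds mixed types and may need cleaning up.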
Taking advantage of the styvane query, I added a $group listing to make it easier to read when we have different data types.
db.posts.aggregate(
[
{ "$project": { _id:0, "fieldType": { "$type": "$date2" } } },
{"$group": { _id: {"fieldType": "$fieldType"},count: {$sum: 1}}}
])
And have this result:
{ "_id" : { "fieldType" : "missing" }, "count" : 50 }
{ "_id" : { "fieldType" : "date" }, "count" : 70 }
{ "_id" : { "fieldType" : "string" }, "count" : 10 }
Noting that a=5;a.constructor.toString() prints function Number() { [native code] }, one can do something similar to:
db.collection.mapReduce(
function() {
emit(this._id.constructor.toString()
.replace(/^function (\S+).+$/, "$1"), 1);
},
function(k, v) {
return Array.sum(v);
},
{
out: { inline: 1 }
});