Return actual type of a field in MongoDB - mongodb

In MongoDB, using $type, it is possible to filter a search based on if the field matches a BSON data type (see DOCS).
For eg.
db.posts.find({date2: {$type: 9}}, {date2: 1})
which returns:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"date2" : "Fri Jul 09 2010 08:25:26 GMT"
}
I need a query that will tell me what the actual type of the field is, for every field in a collection. Is this possible with MongoDB?

Starting from MongoDB 3.4, you can use the $type aggregation operator to return a field's type.
db.posts.aggregate(
[
{ "$project": { "fieldType": { "$type": "$date2" } } }
]
)
which yields:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"fieldType" : "string"
}

type the below query in mongo shell
typeof db.employee.findOne().first_name
Syntax
typeof db.collection_name.findOne().field_name

OK, here are some related questions that may help:
Get all field names in a collection using map-reduce.
Here's a recursive version that lists all possible fields.
Hopefully that can get you started. However, I suspect that you're going to run into some issues with this request. There are two problems here:
I can't find a "gettype" function for JSON. You can query by $type, but it doesn't look like you can actually run a gettype function on a field and have that maps back to the BSON type.
A field can contain data of multiple types, so you'll need a plan to handle this. Even if it's not apparent Mongo could store some numbers as ints and others floats without you really knowing. In fact, with the PHP driver, this is quite possible.
So if you assume that you can solve problem #1, then you should be able to solve problem #2 using a slight variation on "Get all field Names".
It would probably look something like this:
"map" : function() { for (var key in this) { emit(key, [ typeof value[key] ]); } }
"reduce" : function(key, stuff) { return (key, add_to_set(stuff) ); }
So basically you would emit the key and the type of key value (as an array) in the map function. Then from the reduce function you would add unique entries for each type.
At the end of the run you would have data like this
{"_id":[255], "name" : [1,5,8], ... }
Of course, this is all a lot of work, depending on your actual problem, you may just want to ensure (from your code) that you're always putting in the right type of data. Finding the type of data after the data is in the DB is definitely a pain.

Taking advantage of the styvane query, I added a $group listing to make it easier to read when we have different data types.
db.posts.aggregate(
[
{ "$project": { _id:0, "fieldType": { "$type": "$date2" } } },
{"$group": { _id: {"fieldType": "$fieldType"},count: {$sum: 1}}}
])
And have this result:
{ "_id" : { "fieldType" : "missing" }, "count" : 50 }
{ "_id" : { "fieldType" : "date" }, "count" : 70 }
{ "_id" : { "fieldType" : "string" }, "count" : 10 }

Noting that a=5;a.constructor.toString() prints function Number() { [native code] }, one can do something similar to:
db.collection.mapReduce(
function() {
emit(this._id.constructor.toString()
.replace(/^function (\S+).+$/, "$1"), 1);
},
function(k, v) {
return Array.sum(v);
},
{
out: { inline: 1 }
});

Related

Converting older Mongo database references to DBRefs

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 then go from there.
Running the db.upgradeCheckAllDBs(); gives us the DollarPrefixedFieldName: $id is not valid for storage. errors and indeed we have some older records with legacy $id, $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our groovy console when you pull one of the old ones back and one of the new ones getClass() returns DBRef for both.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer the order matters which made all the difference. The script we went with that corrected our data:
var cursor = db.someCollection.find(
{
$where: "function() { return this.someRef != null &&
Object.keys(this.someRef)[0] == '$id'; }"
}
);
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have a collection with a larger number of records that needed to be corrected where the connection timed out. We just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.
DBRef is a client side thing. http://docs.mongodb.org/manual/reference/database-references/#dbrefs says it pretty clear:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
The drivers benefit from the fact that order of fields in BSON is consistent to recognise DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
will return objects where order of the fields doesn't match driver's expectation.

Counting entries of subdocument in MongoDB documents

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions need to be exclusively greater then 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part, am supposed to use $gt.
Since I am new to MongoDB, I was first looking at $size, but _revisions is not an array, it is a subdocument, right?.
Was also looking at $unwind and then counting the results, but that does not make sense either, since my result need to be the documents that match the above two conditions.
Any pointers highly appreciated.
Using the $where operator.
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
The problem with this as mentioned in the documentation is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider to change the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
And use the $exists operator.
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate( { $match : { sold_at : { $gte : start, $lt : end } } },
{ $project : { source : 1, total_price : 1 } },
{ $group : {
_id : { "source.$ref" : "$source.$ref" },
count : { $sum : $total_price }
} } );
The trick is you get a path error trying to use a variable starting with $ either by trying to group by it or by trying to transform using expressions via project.
Any way to do this? Actually trying to push this data via aggregation to a subcollection to operate on it there. Trying to avoid a large cursor operation over millions of records to transform the data so I can aggregate it.
Mongo 4. Solved this issue in the following way:
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use "owner.$id" field. But because of "$" in the name of field, I was unable to use aggregation.
I transformed "owner.$id" -> "owner" using following snippet:
db.activities.find({}).aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
You cannot use DBRef values with the aggregation framework. Instead you need to use JavasScript processing of mapReduce in order to access the property naming that they use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. The usage is basically deprecated now and if you feel you need some external referencing then you should be "manually referencing" this with your own code or implemented by some other library, with which you can do so in a much more supported way.

How can I write a Mongoose find query that uses another field as it's conditional?

Consider the following:
I have a Mongoose model called 'Person'. In the schema for the Person mode, each Person has two fields: 'children' and 'maximum_children'. Both fields are of type Number.
I would like to write a find query that returns Persons when that Persons 'children' value is less that it's 'maximum_children' value.
I have tried:
person_model.find({
children: {
$lt: maximum_children
}
}, function (error, persons) {
// DO SOMETHING ELSE
});
and
person_model.find({
children: {
$lt: 'maximum_children'
}
}, function (error, persons) {
// DO SOMETHING ELSE
});
I'm doing something wrong in trying to specify the field name that I want to compare 'children' against.
OK.
I found a solution, just after I posted this question.
The answer seems to be:
person_model.find({
$where: "children < maximum_children"}, function (error, persons)
}, {
// DO SOMETHING ELSE
});
Seems to work OK, although it seems messy.
$where must execute its JavaScript conditional against every doc so its performance can be quite poor. Instead, you can use aggregate to include a new field in a $project stage the indicates whether the doc matches or not and then filter on that:
person_model.aggregate([
{$project: {
isMatch: {$lt: ['$children', '$maximum_children']},
doc: '$$ROOT'
}},
{$match: {isMatch: true}},
{$project: {_id: 0, doc: 1}}
], function(err, results) {...});
This uses $$ROOT to include the original doc as the doc field of the projection, with a final $project used to remove the isMatch field that was added.
results looks like:
{
"doc" : {
"_id" : ObjectId("54d04591257efd80c6965ada"),
"children" : 5,
"maximum_children" : 10
}
},
{
"doc" : {
"_id" : ObjectId("54d04591257efd80c6965add"),
"children" : 5,
"maximum_children" : 6
}
}
If you want to remove the added doc level of the objects you can use Array#map on results like so:
results = results.map(function(item) { return item.doc; });
Which reshapes results to put them back into their original form:
{
"_id" : ObjectId("54d04591257efd80c6965ada"),
"children" : 5,
"maximum_children" : 10
},
{
"_id" : ObjectId("54d04591257efd80c6965add"),
"children" : 5,
"maximum_children" : 6
}

In mongoDb, how do you remove an array element by its index?

In the following example, assume the document is in the db.people collection.
How to remove the 3rd element of the interests array by it's index?
{
"_id" : ObjectId("4d1cb5de451600000000497a"),
"name" : "dannie",
"interests" : [
"guitar",
"programming",
"gadgets",
"reading"
]
}
This is my current solution:
var interests = db.people.findOne({"name":"dannie"}).interests;
interests.splice(2,1)
db.people.update({"name":"dannie"}, {"$set" : {"interests" : interests}});
Is there a more direct way?
There is no straight way of pulling/removing by array index. In fact, this is an open issue http://jira.mongodb.org/browse/SERVER-1014 , you may vote for it.
The workaround is using $unset and then $pull:
db.lists.update({}, {$unset : {"interests.3" : 1 }})
db.lists.update({}, {$pull : {"interests" : null}})
Update: as mentioned in some of the comments this approach is not atomic and can cause some race conditions if other clients read and/or write between the two operations. If we need the operation to be atomic, we could:
Read the document from the database
Update the document and remove the item in the array
Replace the document in the database. To ensure the document has not changed since we read it, we can use the update if current pattern described in the mongo docs
You can use $pull modifier of update operation for removing a particular element in an array. In case you provided a query will look like this:
db.people.update({"name":"dannie"}, {'$pull': {"interests": "guitar"}})
Also, you may consider using $pullAll for removing all occurrences. More about this on the official documentation page - http://www.mongodb.org/display/DOCS/Updating#Updating-%24pull
This doesn't use index as a criteria for removing an element, but still might help in cases similar to yours. IMO, using indexes for addressing elements inside an array is not very reliable since mongodb isn't consistent on an elements order as fas as I know.
in Mongodb 4.2 you can do this:
db.example.update({}, [
{$set: {field: {
$concatArrays: [
{$slice: ["$field", P]},
{$slice: ["$field", {$add: [1, P]}, {$size: "$field"}]}
]
}}}
]);
P is the index of element you want to remove from array.
If you want to remove from P till end:
db.example.update({}, [
{ $set: { field: { $slice: ["$field", 1] } } },
]);
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to update an array by removing an element at a given index:
// { "name": "dannie", "interests": ["guitar", "programming", "gadgets", "reading"] }
db.collection.update(
{ "name": "dannie" },
[{ $set:
{ "interests":
{ $function: {
body: function(interests) { interests.splice(2, 1); return interests; },
args: ["$interests"],
lang: "js"
}}
}
}]
)
// { "name": "dannie", "interests": ["guitar", "programming", "reading"] }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify. The function here simply consists in using splice to remove 1 element at index 2.
args, which contains the fields from the record that the body function takes as parameter. In our case "$interests".
lang, which is the language in which the body function is written. Only js is currently available.
Rather than using the unset (as in the accepted answer), I solve this by setting the field to a unique value (i.e. not NULL) and then immediately pulling that value. A little safer from an asynch perspective. Here is the code:
var update = {};
var key = "ToBePulled_"+ new Date().toString();
update['feedback.'+index] = key;
Venues.update(venueId, {$set: update});
return Venues.update(venueId, {$pull: {feedback: key}});
Hopefully mongo will address this, perhaps by extending the $position modifier to support $pull as well as $push.
I would recommend using a GUID (I tend to use ObjectID) field, or an auto-incrementing field for each sub-document in the array.
With this GUID it is easy to issue a $pull and be sure that the correct one will be pulled. Same goes for other array operations.
For people who are searching an answer using mongoose with nodejs. This is how I do it.
exports.deletePregunta = function (req, res) {
let codTest = req.params.tCodigo;
let indexPregunta = req.body.pregunta; // the index that come from frontend
let inPregunta = `tPreguntas.0.pregunta.${indexPregunta}`; // my field in my db
let inOpciones = `tPreguntas.0.opciones.${indexPregunta}`; // my other field in my db
let inTipo = `tPreguntas.0.tipo.${indexPregunta}`; // my other field in my db
Test.findOneAndUpdate({ tCodigo: codTest },
{
'$unset': {
[inPregunta]: 1, // put the field with []
[inOpciones]: 1,
[inTipo]: 1
}
}).then(()=>{
Test.findOneAndUpdate({ tCodigo: codTest }, {
'$pull': {
'tPreguntas.0.pregunta': null,
'tPreguntas.0.opciones': null,
'tPreguntas.0.tipo': null
}
}).then(testModificado => {
if (!testModificado) {
res.status(404).send({ accion: 'deletePregunta', message: 'No se ha podido borrar esa pregunta ' });
} else {
res.status(200).send({ accion: 'deletePregunta', message: 'Pregunta borrada correctamente' });
}
})}).catch(err => { res.status(500).send({ accion: 'deletePregunta', message: 'error en la base de datos ' + err }); });
}
I can rewrite this answer if it dont understand very well, but I think is okay.
Hope this help you, I lost a lot of time facing this issue.
It is little bit late but some may find it useful who are using robo3t-
db.getCollection('people').update(
{"name":"dannie"},
{ $pull:
{
interests: "guitar" // you can change value to
}
},
{ multi: true }
);
If you have values something like -
property: [
{
"key" : "key1",
"value" : "value 1"
},
{
"key" : "key2",
"value" : "value 2"
},
{
"key" : "key3",
"value" : "value 3"
}
]
and you want to delete a record where the key is key3 then you can use something -
db.getCollection('people').update(
{"name":"dannie"},
{ $pull:
{
property: { key: "key3"} // you can change value to
}
},
{ multi: true }
);
The same goes for the nested property.
this can be done using $pop operator,
db.getCollection('collection_name').updateOne( {}, {$pop: {"path_to_array_object":1}})