MongoDB / Spring Data: embedded array count

How can I find the number of elements in an embedded array (an array of embedded documents)?
Award
{
"brand" : [
{
"name" : "multi",
"descr" : "Multpple"
},
{
"name" : "multi",
"descr" : "two"
},
{
"name" : "multi",
"descr" : "three"
}
],
"name" : "Test",
"narname" : "Nar"
}
For example, how can I find the number of elements in the embedded array brand of the document above using Spring Data?
Any pointers would be greatly appreciated !

I don't think there is a method that returns this directly.
You can use the aggregation framework instead. For example, to count the elements of the brand array in one specific document, the following should work (run it in the mongo shell):
db.Award.aggregate({$match:{_id:id}}, {$unwind:"$brand"}, {$group:{_id:"$_id", count:{$sum:1}}});
The count field of the result is the value you want.
Then implement that using spring-data-mongodb syntax.
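To see why that pipeline yields the array length, here is a minimal in-memory sketch in plain JavaScript. It only emulates the $unwind and $group stages; it is not driver code, the helper names unwind and countById are made up for illustration, and the _id value is hypothetical because the sample document does not show one.

```javascript
// In-memory emulation of {$unwind: "$brand"} followed by
// {$group: {_id: "$_id", count: {$sum: 1}}}. Not driver code.
const award = {
  _id: 1, // hypothetical id; the sample document does not show one
  brand: [
    { name: "multi", descr: "Multpple" },
    { name: "multi", descr: "two" },
    { name: "multi", descr: "three" },
  ],
  name: "Test",
  narname: "Nar",
};

// $unwind: emit one copy of the document per array element.
function unwind(doc, field) {
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

// $group with {$sum: 1}: count the documents that share each _id.
function countById(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc._id, (counts.get(doc._id) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const brandCount = countById(unwind(award, "brand"));
console.log(brandCount); // one entry per _id; count equals the array length
```

In this sketch an empty brand array produces no output at all, which is why a caller needs a fallback value when the result is empty.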
-------------- APPENDED ---------------------
// MongoTemplate.java contains aggregation methods that can handle this requirement.
// For example:
// public <O> AggregationResults<O> aggregate(Aggregation aggregation, Class<?> inputType, Class<O> outputType)
// This is available in spring-data-mongodb 1.5.0 or thereabouts and later.
// Below I call the mongo-java-driver directly because the spring-data-mongodb aggregation API takes some time to learn. :)
protected int getArraySize(Object id, String collName) {
// Make sure id already has the correct data type; the statement below does not convert it automatically.
// The code below issues the equivalent of this shell command:
// db.Award.aggregate({$match:{_id:id}}, {$unwind:"$brand"}, {$group:{_id:"$_id", count:{$sum:1}}});
DBObject match = BasicDBObjectBuilder.start().push("$match").append("_id", id).get();
DBObject unwind = new BasicDBObject("$unwind", "$brand");
DBObject group = BasicDBObjectBuilder.start().push("$group").append("_id", "$_id").push("count").append("$sum", 1).get();
List<DBObject> pipeline = Arrays.asList(match, unwind, group);
// This aggregate method is only supported in newer versions of mongo-java-driver; I use 2.12.3 here.
AggregationOutput aggr = this.mongoTemplate.getCollection(collName).aggregate(pipeline);
for (DBObject dbo : aggr.results()) {
Object count = dbo.get("count");
if (count instanceof Number) {
return ((Number)count).intValue();
}
}
return 0;
}

Related

Getting distinct values from object array MongoDB

{
"_id" : NUUID("f5050a5d-b3be-4de6-a135-a119436fb511"),
"CoursesData" : [
{
"Name" : "Naturgræs",
"Value" : 1
}
],
"FacilityType" : {
"_id" : NUUID("a1b4844b-518b-40e2-8aa5-8ee399ac2d4e")
}
}
I want to retrieve a list of the distinct values of the field Name inside my CoursesData object array, filtered by FacilityType._id. I tried both $facet and the distinct operator, but neither seems to like object arrays.
My result should look like this (or similar):
FacilityType (a1b4844b-518b-40e2-8aa5-8ee399ac2d4e),
CourseData: [Name1, Name2, Name3]
Update
From the answer given below, this is how you do it with the C# driver, if anyone needs to do the same.
FieldDefinition<FacilityDocument, string> field = "CoursesData.Name";
var result = FacilityCollection.Distinct(field, Builders<FacilityDocument>.Filter.Eq(x => x.FacilityType.ID, new Guid("a1b4844b-518b-40e2-8aa5-8ee399ac2d4e"))).ToList();
You can use distinct(). It returns the distinct values of a given field across the documents that match a query.
For example, to get the distinct values of the Name field for facility "a1b4844b-518b-40e2-8aa5-8ee399ac2d4e", run this query:
db.collection.distinct("CoursesData.Name", {"FacilityType._id": "a1b4844b-518b-40e2-8aa5-8ee399ac2d4e"})
It will return:
[ "Naturgræs", ... ]
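As a rough illustration of what distinct() does with a dotted path into an object array, here is an in-memory sketch in plain JavaScript. It is not driver code: the helper distinctNames and every sample value other than "Naturgræs" and the first facility id are invented for the example, and ids are plain strings here rather than UUIDs.

```javascript
// Hypothetical sample data; only "Naturgræs" and the first id come from the question.
const facilities = [
  {
    FacilityType: { _id: "a1b4844b-518b-40e2-8aa5-8ee399ac2d4e" },
    CoursesData: [{ Name: "Naturgræs", Value: 1 }, { Name: "Kunstgræs", Value: 2 }],
  },
  {
    FacilityType: { _id: "a1b4844b-518b-40e2-8aa5-8ee399ac2d4e" },
    CoursesData: [{ Name: "Naturgræs", Value: 3 }],
  },
  {
    FacilityType: { _id: "00000000-0000-0000-0000-000000000000" },
    CoursesData: [{ Name: "Grus", Value: 1 }],
  },
];

// Emulates distinct("CoursesData.Name", {"FacilityType._id": id}):
// filter the documents, walk into each array, and keep each Name once.
function distinctNames(docs, facilityTypeId) {
  const names = new Set();
  for (const doc of docs) {
    if (doc.FacilityType._id !== facilityTypeId) continue;
    for (const course of doc.CoursesData) {
      names.add(course.Name);
    }
  }
  return [...names];
}

const names = distinctNames(facilities, "a1b4844b-518b-40e2-8aa5-8ee399ac2d4e");
console.log(names);
```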

Mongo DB Update to a sub array document

I have a structure
{
"_id" : ObjectId("562dfb4c595028c9r74fda67"),
"office_id" : "123456",
"employee" : [
{
"status" : "declined",
"personId" : "123456",
"updated" : NumberLong("1428407042401")
}
]
}
An office can have multiple persons. Is there a way to update the status of every employee under a specific office_id to, say, "approved"? I am trying to do this with the plain MongoDB Java driver. What I am doing now is fetching the office documents with a query builder, then iterating over the list and saving each document. I am not satisfied with this fetch-iterate-save approach. Please suggest an alternative if there is one.
You can update using the $ positional operator:
db.collection.update(
{
"office_id" : "123456",
"employee.status": "declined"
},
{
"$set": { "employee.$.status": "approved" }
}
);
The positional operator stores the index (0 in the case above) of the first array element that matched the query. This means that if you knew the position of the element beforehand (which is nearly impossible in a real-life case), you could just change the update statement to: {"$set": {"employee.0.status": "approved"}}.
Please note that the $ positional operator (for now) updates ONLY the first matching array element; there is a JIRA ticket for this.
EDIT:
Using the Java driver, the above update may be done like so (untested):
// Match the office and at least one declined employee.
BasicDBObject query = new BasicDBObject();
query.put("office_id", "123456");
query.put("employee.status", "declined");
// "$" stands for the index of the first array element that matched the query.
BasicDBObject update = new BasicDBObject();
update.put("employee.$.status", "approved");
BasicDBObject set = new BasicDBObject("$set", update);
collection.update(query, set);
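The first-match behaviour of the positional operator can be emulated in memory, as in the plain JavaScript sketch below. This is not driver code; setOnFirstMatch is a made-up helper and the second employee is hypothetical. It also shows one possible way to reach every element under that limitation: repeat the update until nothing matches any more.

```javascript
// In-memory emulation of {"$set": {"employee.$.status": ...}}. Not driver code.
function setOnFirstMatch(doc, arrayField, matches, changes) {
  const index = doc[arrayField].findIndex(matches);
  if (index === -1) return false; // nothing matched, nothing updated
  Object.assign(doc[arrayField][index], changes);
  return true;
}

const office = {
  office_id: "123456",
  employee: [
    { status: "declined", personId: "123456" },
    { status: "declined", personId: "654321" }, // hypothetical second employee
  ],
};
const isDeclined = (e) => e.status === "declined";

// A single update touches only the first declined employee.
setOnFirstMatch(office, "employee", isDeclined, { status: "approved" });
const afterOneUpdate = office.employee.map((e) => e.status);

// Repeating the update until nothing matches reaches the remaining ones.
while (setOnFirstMatch(office, "employee", isDeclined, { status: "approved" })) {}
const afterRepeating = office.employee.map((e) => e.status);

console.log(afterOneUpdate, afterRepeating);
```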

MongoDB: $in with an ObjectId array

Just a quick question about something I've just experienced and I'm still thinking about why:
mongos> db.tickets.count({ "idReferenceList" : { "$in" : [ { "$oid" : "53f1f09f2cdcc8f339e5efa2"} , { "$oid" : "5409ae2e2cdc31c5aa0ce0a5"}]}});
0
mongos> db.tickets.count({ "idReferenceList" : { "$in" : [ ObjectId("53f1f09f2cdcc8f339e5efa2") , ObjectId("5409ae2e2cdc31c5aa0ce0a5")]}});
2
I thought that the $oid and ObjectId spellings were exactly the same for MongoDB. Does anyone know why the first query returns 0 results while the second returns 2 (the right answer)?
Furthermore, I'm using the Morphia framework, which uses the MongoDB Java driver to interact with MongoDB. I've noticed a problem when searching with the $in operator over ObjectId arrays in fields other than _id, by executing these lines of code:
List< ObjectId > fParams = new ArrayList< ObjectId >();
fParams.add(...);
Query<Ticket> query = genericDAO.createQuery();
query.field("idReferenceList").in(fParams);
result = genericDAO.find(query).asList();
Thank you very much in advance.
Regards,
Luis Cappa
According to the documentation, both formats are valid representations of an ObjectId in MongoDB:
http://docs.mongodb.org/manual/reference/mongodb-extended-json/
They are written differently in the two modes:
Strict Mode            mongo Shell Mode
-----------            ----------------
{ "$oid": "<id>" }     ObjectId( "<id>" )
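So, to query fields that contain ObjectIds from the mongo shell, you need to use the shell-mode syntax, ObjectId("<id>").
The shell does not appear to parse the strict-mode form inside a query, so { "$oid" : "<id>" } is most likely treated as an ordinary embedded document rather than an ObjectId, which would explain why the first query matches nothing.
Hence the query: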
db.tickets.count({ "idReferenceList" : { "$in" : [ ObjectId("53f1f09f2cdcc8f339e5efa2") , ObjectId("5409ae2e2cdc31c5aa0ce0a5")]}});
returns the expected count.
To do the same via the Java API, build the $in list from ObjectId instances as below:
String[] ids = {"53f1f09f2cdcc8f339e5efa2","5409ae2e2cdc31c5aa0ce0a5"};
ObjectId[] objarray = new ObjectId[ids.length];
for(int i=0;i<ids.length;i++)
{
objarray[i] = new ObjectId(ids[i]);
}
BasicDBObject inQuery = new BasicDBObject("$in", objarray);
BasicDBObject query = new BasicDBObject("idReferenceList", inQuery);
// Assumes db is a com.mongodb.DB instance and the collection is named "tickets".
DBCursor cursor = db.getCollection("tickets").find(query);
while(cursor.hasNext())
{
DBObject doc = cursor.next();
// process the doc.
}
I faced the same issue and resolved it this way:
db.collection('post').find({ 'postIds': { $elemMatch: { $in:
deletedPosts.map(_post => ObjectId(_post._id)) } } })
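One pitfall worth checking in a map callback like this: an arrow function whose body is wrapped in braces returns undefined unless it has an explicit return, so the $in list would contain no ids at all. The plain JavaScript sketch below shows the difference; toId is a hypothetical stand-in for the driver's ObjectId constructor so that the example runs without MongoDB.

```javascript
// toId is a hypothetical stand-in for the driver's ObjectId constructor.
const toId = (hex) => ({ oid: hex });

const deletedPosts = [
  { _id: "53f1f09f2cdcc8f339e5efa2" },
  { _id: "5409ae2e2cdc31c5aa0ce0a5" },
];

// Braces make a block body: without "return", every element maps to undefined.
const withBraces = deletedPosts.map((post) => { toId(post._id); });

// An expression body returns the converted id.
const withExpression = deletedPosts.map((post) => toId(post._id));

console.log(withBraces, withExpression);
```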

Use MongoDB aggregation to find set intersection of two sets within the same document

I'm trying to use the Mongo aggregation framework to find where there are records that have different unique sets within the same document. An example will best explain this:
Here is a document that is not my real data, but conceptually the same:
db.house.insert(
{
houseId : 123,
rooms: [{ name : 'bedroom',
owns : [
{name : 'bed'},
{name : 'cabinet'}
]},
{ name : 'kitchen',
owns : [
{name : 'sink'},
{name : 'cabinet'}
]}],
uses : [{name : 'sink'},
{name : 'cabinet'},
{name : 'bed'},
{name : 'sofa'}]
}
)
Notice that there are two hierarchies with similar items. It is also possible to use items that are not owned. I want to find documents like this one: where there is a house that uses something that it doesn't own.
So far I've built up the structure using the aggregate framework like below. This gets me to 2 sets of distinct items. However I haven't been able to find anything that could give me the result of a set intersection. Note that a simple count of set size will not work due to something like this: ['couch', 'cabinet'] compare to ['sofa', 'cabinet'].
{'$unwind':'$uses'}
{'$unwind':'$rooms'}
{'$unwind':'$rooms.owns'}
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}}
produces:
{ _id : 123,
use : ['sink', 'cabinet', 'bed', 'sofa'],
own : ['bed', 'cabinet', 'sink']
}
How do I then find the set intersection of use and own in the next stage of the pipeline?
You were not very far from the full solution with aggregation framework - you needed one more thing before the $group step and that is something that would allow you to see if all the things that are being used match up with something that is owned.
Here is the full pipeline
> db.house.aggregate(
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{$project: { _id:0,
houseId:1,
uses:"$uses.name",
isOkay:{$cond:[{$eq:["$uses.name","$rooms.owns.name"]}, 1, 0]}
}
},
{$group: { _id:{house:"$houseId",item:"$uses"},
hasWhatHeUses:{$sum:"$isOkay"}
}
},
{$match:{hasWhatHeUses:0}})
and its output on your document
{
"result" : [
{
"_id" : {
"house" : 123,
"item" : "sofa"
},
"hasWhatHeUses" : 0
}
],
"ok" : 1
}
Explanation - once you unwrap both arrays you now want to flag the elements where used item is equal to owned item and give them a non-0 "score". Now when you regroup things back by houseId you can check if any used items didn't get a match. Using 1 and 0 for score allows you to do a sum and now a match for item which has sum 0 means it was used but didn't match anything in "owned". Hope you enjoyed this!
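The scoring idea can be traced in memory with the plain JavaScript sketch below. It is only an emulation of the unwind, flag, and sum steps on the example document, not a pipeline; the function name usedButNotOwned is made up for illustration.

```javascript
const house = {
  houseId: 123,
  rooms: [
    { name: "bedroom", owns: [{ name: "bed" }, { name: "cabinet" }] },
    { name: "kitchen", owns: [{ name: "sink" }, { name: "cabinet" }] },
  ],
  uses: [{ name: "sink" }, { name: "cabinet" }, { name: "bed" }, { name: "sofa" }],
};

// Pair every used item with every owned item (the three $unwind stages),
// score 1 for a name match and 0 otherwise ($cond), then sum per used item ($group).
function usedButNotOwned(doc) {
  const scores = new Map();
  for (const used of doc.uses) {
    for (const room of doc.rooms) {
      for (const owned of room.owns) {
        const isOkay = used.name === owned.name ? 1 : 0;
        scores.set(used.name, (scores.get(used.name) || 0) + isOkay);
      }
    }
  }
  // A sum of 0 means the item was used but never matched anything owned ($match).
  return [...scores].filter(([, sum]) => sum === 0).map(([item]) => item);
}

const missing = usedButNotOwned(house);
console.log(missing);
```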
So here is a solution not using the aggregation framework. This uses the $where operator and javascript. This feels much more clunky to me, but it seems to work so I wanted to put it out there if anyone else comes across this question.
db.house.find({'$where':
function() {
var ownSet = {};
var useSet = {};
for (var i=0;i<obj.uses.length;i++){
useSet[obj.uses[i].name] = true;
}
for (var i=0;i<obj.rooms.length;i++){
var room = obj.rooms[i];
for (var j=0;j<room.owns.length;j++){
ownSet[room.owns[j].name] = true;
}
}
for (var prop in ownSet) {
if (ownSet.hasOwnProperty(prop)) {
if (!useSet[prop]){
return true;
}
}
}
for (var prop in useSet) {
if (useSet.hasOwnProperty(prop)) {
if (!ownSet[prop]){
return true;
}
}
}
return false
}
})
For MongoDB 2.6+ Only
As of MongoDB 2.6, set operations are available in the $project pipeline stage. With the new operations, the problem can be answered like this:
db.house.aggregate([
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}},
{'$project': {int:{$setIntersection:["$use","$own"]}}}
]);
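Note that the original goal, finding things that are used but not owned, corresponds to a set difference rather than an intersection; MongoDB 2.6 also provides $setDifference for that. The plain JavaScript sketch below emulates both operations on the use and own sets from the question. The helper names mirror the operators but are ordinary functions, not MongoDB calls.

```javascript
const use = ["sink", "cabinet", "bed", "sofa"];
const own = ["bed", "cabinet", "sink"];

// Elements of a that also appear in b, each listed once.
function setIntersection(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => inB.has(x));
}

// Elements of a that do not appear in b, each listed once.
function setDifference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => !inB.has(x));
}

const usedAndOwned = setIntersection(use, own);
const usedNotOwned = setDifference(use, own);
console.log(usedAndOwned, usedNotOwned);
```

A house qualifies for the original question when the difference is non-empty.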

In MongoDB mapreduce, how can I flatten the values object?

I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my models look like:
db.receipts.findOne()
{
"_id" : ObjectId("4e57908c7a044a30dc03a888"),
"path" : "/videos/1/show_invisibles.m4v",
"issued_at" : ISODate("2011-04-08T00:00:00Z"),
"status" : "200"
}
I've written a MapReduce function that groups all data by the issued_at date field. It summarizes the total number of requests, and provides a breakdown of the number of requests for each unique path. Here's an example of what the output looks like:
db.daily_hits_by_path.findOne()
{
"_id" : ISODate("2011-04-08T00:00:00Z"),
"value" : {
"count" : 6,
"paths" : {
"/videos/1/show_invisibles.m4v" : {
"count" : 2
},
"/videos/1/show_invisibles.ogv" : {
"count" : 3
},
"/videos/6/buffers_listed_and_hidden.ogv" : {
"count" : 1
}
}
}
}
How can I make the output look like this instead:
{
"_id" : ISODate("2011-04-08T00:00:00Z"),
"count" : 6,
"paths" : {
"/videos/1/show_invisibles.m4v" : {
"count" : 2
},
"/videos/1/show_invisibles.ogv" : {
"count" : 3
},
"/videos/6/buffers_listed_and_hidden.ogv" : {
"count" : 1
}
}
}
It's not currently possible, but I would suggest voting for this case: https://jira.mongodb.org/browse/SERVER-2517.
Taking the best from previous answers and comments:
db.items.find().hint({_id: 1}).forEach(function(item) {
db.items.update({_id: item._id}, item.value);
});
From http://docs.mongodb.org/manual/core/update/#replace-existing-document-with-new-document
"If the update argument contains only field and value pairs, the update() method replaces the existing document with the document in the update argument, except for the _id field."
So you need neither to $unset value, nor to list each field.
From https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/#cursor-snapshot
"MongoDB cursors can return the same document more than once in some situations. ... use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index."
AFAIK, by design Mongo's map reduce will spit results out in "value tuples" and I haven't seen anything that will configure that "output format". Maybe the finalize() method can be used.
You could try running a post-process that will reshape the data using
results.find({}).forEach( function(result) {
results.update({_id: result._id}, {count: result.value.count, paths: result.value.paths})
});
Yep, that looks ugly. I know.
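The reshaping step itself is small. The plain JavaScript sketch below shows it in memory on the example output, without any database calls; flatten is a made-up helper name, and a plain string stands in for the ISODate.

```javascript
const dailyHits = {
  _id: "2011-04-08T00:00:00Z", // plain string standing in for the ISODate
  value: {
    count: 6,
    paths: {
      "/videos/1/show_invisibles.m4v": { count: 2 },
      "/videos/1/show_invisibles.ogv": { count: 3 },
      "/videos/6/buffers_listed_and_hidden.ogv": { count: 1 },
    },
  },
};

// Lift every field of "value" to the top level and keep the original _id.
// Spreading first and assigning _id last means a stray value._id cannot override it.
function flatten(result) {
  return { ...result.value, _id: result._id };
}

const flat = flatten(dailyHits);
console.log(flat);
```

Because the spread copies whatever keys value contains, the field names do not need to be listed by hand.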
You can do Dan's code with a collection reference:
function clean(collection) {
collection.find().forEach( function(result) {
var value = result.value;
delete value._id;
collection.update({_id: result._id}, value);
collection.update({_id: result._id}, {$unset: {value: 1}});
});
}
A similar approach to that of @ljonas, but with no need to hardcode document fields:
db.results.find().forEach( function(result) {
var value = result.value;
delete value._id;
db.results.update({_id: result._id}, value);
db.results.update({_id: result._id}, {$unset: {value: 1}});
} );
All the proposed solutions are far from optimal. The fastest you can do so far is something like:
var flattenMRCollection=function(dbName,collectionName) {
var collection=db.getSiblingDB(dbName)[collectionName];
var i=0;
var bulk=collection.initializeUnorderedBulkOp();
collection.find({ value: { $exists: true } }).addOption(16).forEach(function(result) {
print((++i));
//collection.update({_id: result._id},result.value);
bulk.find({_id: result._id}).replaceOne(result.value);
if(i%1000==0)
{
print("Executing bulk...");
bulk.execute();
bulk=collection.initializeUnorderedBulkOp();
}
});
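// Flush the remainder; skip this when nothing is queued, because executing an empty bulk throws.
if(i%1000!=0) bulk.execute();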
};
Then call it:
flattenMRCollection("MyDB","MyMRCollection")
This is WAY faster than doing sequential updates.
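The batching logic can be looked at on its own, separate from MongoDB. The plain JavaScript sketch below queues operations, flushes every batchSize items, and flushes the remainder only when something is actually pending. writeInBatches and executeBatch are hypothetical names; executeBatch stands in for whatever performs the bulk write.

```javascript
// Queue operations and hand them to executeBatch in groups of batchSize.
function writeInBatches(operations, batchSize, executeBatch) {
  let pending = [];
  for (const op of operations) {
    pending.push(op);
    if (pending.length === batchSize) {
      executeBatch(pending);
      pending = [];
    }
  }
  // Flush the remainder, but never execute an empty batch.
  if (pending.length > 0) {
    executeBatch(pending);
  }
}

// Record the size of each batch instead of writing to a database.
function batchSizesFor(total, batchSize) {
  const sizes = [];
  const operations = Array.from({ length: total }, (_, i) => i);
  writeInBatches(operations, batchSize, (batch) => sizes.push(batch.length));
  return sizes;
}

console.log(batchSizesFor(2500, 1000), batchSizesFor(2000, 1000), batchSizesFor(0, 1000));
```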
While experimenting with Vincent's answer, I found a couple of problems. Basically, if you perform updates within a forEach loop, an updated document can be moved to the end of the collection, and the cursor will then reach that document again. This can be circumvented by using $snapshot. Hence, I am providing a Java example below.
final List<WriteModel<Document>> bulkUpdate = new ArrayList<>();
// You should enable $snapshot if performing updates within foreach
collection.find(new Document().append("$query", new Document()).append("$snapshot", true)).forEach(new Block<Document>() {
@Override
public void apply(final Document document) {
// Note that I used incrementing long values for '_id'. Change to String if
// you used string '_id's
long docId = document.getLong("_id");
Document subDoc = (Document)document.get("value");
WriteModel<Document> m = new ReplaceOneModel<>(new Document().append("_id", docId), subDoc);
bulkUpdate.add(m);
// If you used non-incrementing '_id's, then you need to use a final object with a counter.
if(docId % 1000 == 0 && !bulkUpdate.isEmpty()) {
collection.bulkWrite(bulkUpdate);
bulkUpdate.clear();
}
}
});
// Fixing bug related to Vincent's answer.
if(!bulkUpdate.isEmpty()) {
collection.bulkWrite(bulkUpdate);
bulkUpdate.clear();
}
Note : This snippet takes an average of 7.4 seconds to execute on my machine with 100k records and 14 attributes (IMDB dataset). Without batching, it takes an average of 25.2 seconds.