Mongo: Import results of an aggregate query?

Sorry for the basic question; I'm new to Mongo and still learning my way around.
I have run an aggregate query in Mongo:
> var result = db.urls.aggregate({$group : {_id : "$pagePath"} });
> result
{ "_id" : "/section1/page1" }
{ "_id" : "/section1/page2" }
...
Type "it" for more
I would now like to save the results of this aggregate query into a new collection. This is what I've tried:
> db.agg1.insert(result);
WriteResult({ "nInserted" : 1 })
But this seems to have inserted all the rows as just one row:
> db.agg1.count()
1
> db.agg1.findOne();
{ "_id" : "/section1/page1" }
{ "_id" : "/section1/page2" }
...
Type "it" for more
How can I insert these as separate rows?
I've tried inserting the _id directly, without success:
> db.agg1.insert(result._id);
2014-12-17T15:23:26.679+0000 no object passed to insert! at src/mongo/shell/collection.js:196

Use the $out pipeline stage for that:
db.urls.aggregate([
    {$group : {_id : "$pagePath"} },
    {$out: "agg1"}
]);
Note that $out was added in MongoDB 2.6.
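To show how this differs from inserting the whole cursor, here is a minimal in-memory sketch in Python (no server involved) of what the pipeline above computes. It produces one output document per distinct pagePath and writes them to agg1 in place of whatever the collection held before, because $out replaces an existing target collection. The helper names group_by_field and out_stage and the sample URLs are hypothetical. The server does not guarantee the order of $group output, so the ordering here is only this sketch's choice.

```python
# In-memory model of [{$group: {_id: "$pagePath"}}, {$out: "agg1"}].
# Plain Python lists stand in for collections; this models the result only.

def group_by_field(docs, field):
    """Return one {"_id": value} document per distinct value of `field`."""
    seen = []
    for doc in docs:
        key = doc.get(field)  # a missing field groups under None (null)
        if key not in seen:
            seen.append(key)
    return [{"_id": key} for key in seen]


def out_stage(database, name, docs):
    """Mimic $out: replace the target collection with the pipeline output."""
    database[name] = list(docs)  # previous contents are discarded
    return database[name]


database = {
    "urls": [
        {"pagePath": "/section1/page1"},
        {"pagePath": "/section1/page2"},
        {"pagePath": "/section1/page1"},
    ],
    "agg1": [{"_id": "stale"}],  # replaced, not appended to
}

out_stage(database, "agg1", group_by_field(database["urls"], "pagePath"))
print(len(database["agg1"]))  # 2: one document per distinct pagePath
```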

Related

Wrong reference in Spring MongoDB data aggregation

I have MongoDB documents of the following kind:
{
    "_id" : {
        "refId" : ObjectId("55e44dd70a975a6fec7ae66e"),
        "someString" : "foo"
    },
    "someValue" : 1,
    "date" : NumberLong("1441025536869"),
    "subdoc" : {
        "someRef" : ObjectId("xf2h55e44dd70a975a6fec7a"),
        "count" : 99
    }
}
I want to get a list of documents with a unique "subdoc.someRef", keeping the one with the earliest "date" when several documents share the same "subdoc.someRef", sorted by an arbitrary field. The problem is sorting by the field "subdoc.count".
I use the following Spring Data aggregation:
Aggregation agg = Aggregation.newAggregation(
    Aggregation.match(match),
    Aggregation.project("date", "someValue")
        .and("subdoc.someRef").as("someRef")
        .and("subdoc.count").as("count")
        .and("_id.someString").as("someString")
        .and("_id.refId").as("refId"),
    Aggregation.sort(Sort.Direction.ASC, "date"),
    Aggregation.group("someRef")
        .min("date").as("date")
        .first("refId").as("refId")
        .first("someValue").as("someValue")
        .first("count").as("count"),
    Aggregation.project("someValue", "date", "refId", "count")
        .and("_id").as("someRef"),
    Aggregation.sort(pageable.getSort()),
    Aggregation.skip(pageable.getOffset()),
    Aggregation.limit(pageable.getPageSize())
);
Everything works except how Spring Data converts .first("count").as("count"):
I always get "count" : null in the aggregation result.
The DBObject created by Spring is not what I expected. This line in the log concerns me:
"$group" : { "_id" : "$someRef" , "date" : { "$min" : "$date"} , "refId" : { "$first" : "$_id.refId"} , "someValue" : { "$first" : "$someValue"} , "count" : { "$first" : "$subdoc.count"}}}
I cannot understand why it always uses "$subdoc.count" instead of "$count", the field produced by the previous stage of the aggregation pipeline. As a result, "count" is always null, because "$subdoc.count" is always null at the $group stage.
How can I make Spring use the value I want instead of substituting that reference?
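The null follows from how field paths are resolved. The $project stage in the question builds top-level fields (someRef, count, someString, refId) and drops subdoc, so a later reference to "$subdoc.count" finds nothing. "$_id.refId" in the same logged $group still resolves because $project keeps _id unless it is explicitly excluded. Here is a small Python sketch of that lookup, with a hypothetical resolve_path helper and plain strings standing in for the ObjectIds:

```python
def resolve_path(doc, path):
    """Follow a dotted field path such as "subdoc.count"; None if it is absent."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


original = {
    "_id": {"refId": "ref-1", "someString": "foo"},
    "someValue": 1,
    "date": 1441025536869,
    "subdoc": {"someRef": "some-ref", "count": 99},
}

# Shape of a document after the question's first $project stage.
projected = {
    "_id": original["_id"],  # $project keeps _id unless it is excluded
    "date": original["date"],
    "someValue": original["someValue"],
    "someRef": resolve_path(original, "subdoc.someRef"),
    "count": resolve_path(original, "subdoc.count"),
    "someString": resolve_path(original, "_id.someString"),
    "refId": resolve_path(original, "_id.refId"),
}

print(resolve_path(projected, "subdoc.count"))  # None: subdoc was projected away
print(resolve_path(projected, "count"))         # 99
```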
Thanks to a comment, the solution is to eliminate the projections and update the group stage. Grouping works fine with subdocument references, which I didn't expect:
Aggregation agg = Aggregation.newAggregation(
    Aggregation.match(match),
    Aggregation.sort(Sort.Direction.ASC, "date"),
    Aggregation.group("subdoc.someRef")
        .first("subdoc.count").as("count")
        .first("_id.refId").as("refId")
        .first("_id.someString").as("someString")
        .first("date").as("date")
        .first("someValue").as("someValue"),
    Aggregation.sort(pageable.getSort()),
    Aggregation.skip(pageable.getOffset()),
    Aggregation.limit(pageable.getPageSize())
);
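The corrected pipeline relies on two things, which this in-memory sketch shows. Sorting by date first means $first picks each group's earliest document. The dotted paths ("subdoc.someRef", "subdoc.count") are read straight from the original documents, so no intermediate $project is needed. This is a simplified Python model, not Spring Data or MongoDB code. sort_then_group_first, resolve_path and the sample documents are hypothetical, and only the fields relevant to the example are kept.

```python
def resolve_path(doc, path):
    """Follow a dotted field path; None if any part is missing."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def sort_then_group_first(docs, key_path, fields):
    """Model $sort {date: 1} followed by $group with $first accumulators.

    `fields` maps output names to dotted input paths, like
    .first("subdoc.count").as("count") in the Spring Data pipeline.
    """
    ordered = sorted(docs, key=lambda d: d["date"])  # earliest date first
    groups = {}
    for doc in ordered:
        key = resolve_path(doc, key_path)
        if key not in groups:  # $first keeps the first document seen per group
            row = {"_id": key}
            for name, path in fields.items():
                row[name] = resolve_path(doc, path)
            groups[key] = row
    return list(groups.values())


docs = [
    {"date": 20, "subdoc": {"someRef": "A", "count": 5}},
    {"date": 10, "subdoc": {"someRef": "A", "count": 7}},
    {"date": 15, "subdoc": {"someRef": "B", "count": 3}},
]

grouped = sort_then_group_first(
    docs, "subdoc.someRef", {"count": "subdoc.count", "date": "date"}
)
# Sorting on the grouped "count" field now works, as with pageable.getSort().
page = sorted(grouped, key=lambda d: d["count"], reverse=True)
print(page)  # [{'_id': 'A', 'count': 7, 'date': 10}, {'_id': 'B', 'count': 3, 'date': 15}]
```

The final sorted call stands in for Aggregation.sort(pageable.getSort()). Skip and limit would then slice that list.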

pymongo query for distinct values and case insensitive

How can I get case-insensitive distinct values from a MongoDB collection? With the example below I can find distinct values, but only case-sensitively.
Collection location, schema:
{ "_id" : ObjectId("542bc237e75e4a30c2e13b7e"),"place" : ["Hyderabad"]}
{ "_id" : ObjectId("542bc238e75e4a30c2e13b7f"),"place" : ["hyderabad"]}
Example:
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client.india
collection = db.location
doc = collection.distinct("place")
print(doc)  # [[u'Hyderabad'], [u'hyderabad']]
But I need to get only one value, hyderabad, since it is the same value in both documents apart from case.
Why is place an array? I guess it sometimes holds more than one value in your real documents? Start by unwinding it, then use the $toLower string operator in an otherwise standard compute-distinct-values aggregation pipeline:
db.location.aggregate([
    { "$unwind" : "$place" },
    { "$group" : { "_id" : { "$toLower" : "$place" } } }
])
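The pipeline can be modeled in a few lines of Python to show why the two spellings collapse into one key. $unwind emits one value per array element, $toLower normalizes the case, and $group keeps one document per normalized value. The sketch assumes place is always an array of strings. The distinct_lowercase name and the third sample document are hypothetical. Python's str.lower() stands in for $toLower, which MongoDB documents as well-defined only for ASCII strings.

```python
def distinct_lowercase(docs, field):
    """Model [{$unwind: "$field"}, {$group: {_id: {$toLower: "$field"}}}]."""
    unwound = [value for doc in docs for value in doc.get(field, [])]  # $unwind
    keys = []
    for value in unwound:
        key = value.lower()  # $toLower
        if key not in keys:  # $group on the lowered value
            keys.append(key)
    return [{"_id": key} for key in keys]


location = [
    {"_id": 1, "place": ["Hyderabad"]},
    {"_id": 2, "place": ["hyderabad"]},
    {"_id": 3, "place": ["Delhi", "HYDERABAD"]},  # hypothetical multi-value doc
]
print(distinct_lowercase(location, "place"))  # [{'_id': 'hyderabad'}, {'_id': 'delhi'}]
```

With pymongo, the same stages can be passed as a list of dicts to collection.aggregate(...).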

$or query in mongodb not working

I already have this document in the database:
> db.test.find()
{ "_id" : ObjectId("4fd349242b153bfbd95a15a8"), "nombre" : "Javier", "apellido" : "Roger" }
Now I execute this query:
db.test.find({"nombre": "Javier"})
{ "_id" : ObjectId("4fd349242b153bfbd95a15a8"), "nombre" : "Javier", "apellido" : "Roger" }
It works as expected.
But when I execute this query, MongoDB does not return any results:
db.test.find({$or:[{"nombre": "Javier"}, {"apellido": "Javier"}]})
When I insert that document, your syntax works for me.
$or was new in MongoDB v1.6. Is it possible you're running a really old version?
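On a server that supports $or, the question's document should be returned, because $or matches when at least one of its clauses matches (here the nombre clause). A toy Python evaluator for just equality and $or, with a hypothetical matches function, shows the expected outcome:

```python
def matches(doc, query):
    """Evaluate a tiny subset of the query language: field equality and $or."""
    for key, condition in query.items():
        if key == "$or":
            # $or succeeds if any clause matches the document
            if not any(matches(doc, clause) for clause in condition):
                return False
        elif doc.get(key) != condition:
            return False
    return True


doc = {"nombre": "Javier", "apellido": "Roger"}
query = {"$or": [{"nombre": "Javier"}, {"apellido": "Javier"}]}
print(matches(doc, query))  # True
```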

self referencing update using MongoDB

I wonder if there is a way to do a self-referencing update in MongoDB, so that a document's own fields can be used in a $set update. Here is an example:
> db.labels.save({"name":"label1", "test":"hello"})
> db.labels.save({"name":"label2", "test":"hello"})
> db.labels.save({"name":"label3", "test":"hello"})
> db.labels.find()
{ "_id" : ObjectId("4f1200e2f8509434f1d28496"), "name" : "label1", "test" : "hello" }
{ "_id" : ObjectId("4f1200e6f8509434f1d28497"), "name" : "label2", "test" : "hello" }
{ "_id" : ObjectId("4f1200eaf8509434f1d28498"), "name" : "label3", "test" : "hello" }
I saw that you can use this syntax in $where queries: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-JavascriptExpressionsand%7B%7B%24where%7D%7D
> db.myCollection.find( { a : { $gt: 3 } } );
> db.myCollection.find( { $where: "this.a > 3" } );
> db.myCollection.find("this.a > 3");
> f = function() { return this.a > 3; }
> db.myCollection.find(f);
So, I tried with:
db.labels.update({"test":"hello"}, {$set : {"test": this.name}})
but it didn't work.
The expected result is:
{ "_id" : ObjectId("4f1200e2f8509434f1d28496"), "name" : "label1", "test" : "label1" }
{ "_id" : ObjectId("4f1200e6f8509434f1d28497"), "name" : "label2", "test" : "label2" }
{ "_id" : ObjectId("4f1200eaf8509434f1d28498"), "name" : "label3", "test" : "label3" }
Any thoughts? Thanks in advance
Update:
Since MongoDB 4.2, this can be done with:
db.labels.updateMany(
    {"test":"hello"},
    [{ $set: { test: "$name" }}]
)
Old Answer
At present there is no direct way to do that, but you can work around it by iterating over the matching documents:
db.labels.find({"test":"hello"}).forEach(function (doc) {
    doc.test = doc.name;
    db.labels.save(doc);
});
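The workaround above is a client-side read-modify-write loop. A minimal Python model of it, with a hypothetical copy_field_client_side helper and a list standing in for the labels collection, makes the cost visible. Every matching document is fetched and written back separately, so the number of writes grows with the number of matches.

```python
# Hypothetical in-memory stand-in for the labels collection.
labels = [
    {"_id": 1, "name": "label1", "test": "hello"},
    {"_id": 2, "name": "label2", "test": "hello"},
    {"_id": 3, "name": "label3", "test": "hello"},
]


def copy_field_client_side(collection, query_field, query_value, source, target):
    """Mimic find(...).forEach(...): read, modify and save one document at a time.

    Returns the number of writes, which is one per matching document.
    """
    writes = 0
    for index, stored in enumerate(collection):
        if stored.get(query_field) != query_value:
            continue
        doc = dict(stored)         # the copy the client receives from find()
        doc[target] = doc[source]  # doc.test = doc.name
        collection[index] = doc    # db.labels.save(doc) replaces the stored document
        writes += 1
    return writes


writes = copy_field_client_side(labels, "test", "hello", "name", "test")
print(writes)  # 3
```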
New in MongoDB 4.2
[FYI] The approach below avoids row-by-row operations (which can cause performance issues) and shifts the processing load onto the database itself.
"Starting in MongoDB 4.2, the db.collection.update() method can
accept an aggregation pipeline that specifies the modifications to
perform." docs
The pipeline has access to each document's fields, which allows self-referential updates.
Please see the documentation on this, which includes an example of this sort of update.
Following the example from the OP's question, the update would be:
db.labels.update(
    {"test":"hello"},
    [{ $set: { test: "$name" }}],
    { multi: true }
);
Please note that the $set used in the pipeline refers to the aggregation stage $set, and not the update operator $set.
For those familiar with the aggregate pipeline in earlier MongoDB versions: the $set stage is an alias for $addFields.
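The key detail is that inside a pipeline, a string such as "$name" is a field path evaluated against each document, while the classic $set update operator stores that same string literally. Below is a simplified Python model of the two behaviors. The pipeline_set and operator_set helpers are hypothetical and cover only top-level fields that exist. Nested paths, variables and expression objects are not modeled.

```python
def pipeline_set(doc, spec):
    """Model the aggregation $set stage for simple top-level field paths."""
    updated = dict(doc)
    for field, expr in spec.items():
        if isinstance(expr, str) and expr.startswith("$"):
            updated[field] = doc[expr[1:]]  # "$name" reads the document's own field
        else:
            updated[field] = expr           # plain values are treated as constants here
    return updated


def operator_set(doc, spec):
    """Model the classic $set update operator: values are stored as given."""
    updated = dict(doc)
    updated.update(spec)
    return updated


label = {"_id": 1, "name": "label1", "test": "hello"}
print(pipeline_set(label, {"test": "$name"})["test"])  # label1
print(operator_set(label, {"test": "$name"})["test"])  # $name
```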

Get "data from collection b not in collection a" in a MongoDB shell query

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.
Example:
> db.Test.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 })
> db.Test.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 })
> db.Test.insert({ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 })
> db.Test.insert({ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 })
> db.Test.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 }
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
> db.Test2.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 });
> db.Test2.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 });
> db.Test2.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 }
Now I want a query (or queries) that returns the two documents in Test whose _ids do not match any document in Test2:
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
I've tried various combinations of $not, $ne, $or, $in but just can't get the right combination and syntax. Also, I don't mind if db.Test2.find({}, {"_id": 1}) is executed first, saved to some variable, which is then used in a second query (though I can't get that to work either).
Update: Zachary's answer pointing to the $nin answered the key part of the question. For example, this works:
> db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}})
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
But (acknowledging that this is not scalable, but trying it anyway because that's not an issue in this situation) I still can't combine the two queries in the shell. This is the closest I can get, which is obviously less than ideal:
vals = db.Test2.find({}, {"_id": 1}).toArray()
db.Test.find({"_id": {"$nin": [ObjectId(vals[0]._id), ObjectId(vals[1]._id)]}})
Is there a way to return just the values in the find command so that vals can be used directly as the array input to $nin?
In MongoDB 3.2, the following code seems to work:
db.collectionb.aggregate([
    {
        $lookup: {
            from: "collectiona",
            localField: "collectionb_fk",
            foreignField: "collectiona_fk",
            as: "matched_docs"
        }
    },
    {
        $match: {
            "matched_docs": { $eq: [] }
        }
    }
]);
based on this https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array example
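The $lookup approach is an anti-join. Each local document gets an array of foreign documents whose key equals its own, and $match then keeps the ones whose array is empty. Here is a rough Python model using the question's Test and Test2 data, with short strings standing in for the ObjectIds. The lookup and anti_join helpers are hypothetical and only cover the plain equality form of $lookup.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Equality-form $lookup: attach every foreign doc whose key equals the local key."""
    joined = []
    for doc in local_docs:
        key = doc.get(local_field)
        matched = [f for f in foreign_docs if f.get(foreign_field) == key]
        joined.append({**doc, as_field: matched})
    return joined


def anti_join(local_docs, foreign_docs, local_field, foreign_field):
    """$lookup followed by $match: {"matched_docs": {$eq: []}}."""
    joined = lookup(local_docs, foreign_docs, local_field, foreign_field, "matched_docs")
    return [doc for doc in joined if doc["matched_docs"] == []]


# Short strings stand in for the ObjectIds in the question.
test = [{"_id": "2e", "foo": 1}, {"_id": "2f", "foo": 2},
        {"_id": "30", "foo": 3}, {"_id": "31", "foo": 4}]
test2 = [{"_id": "2e", "bar": 1}, {"_id": "2f", "bar": 2}]

missing = anti_join(test, test2, "_id", "_id")
print([doc["foo"] for doc in missing])  # [3, 4]
```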
Answering your follow-up: I'd use map().
Given this:
> b1 = {i: 1}
> db.b.save(b1)
> db.b.save({i: 2})
> db.a.save({_id: b1._id})
All you need is:
> vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
> db.b.find({_id: {$nin: vals}})
which returns
{ "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
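The two shell steps amount to collecting the _id values from one collection and then filtering the other collection by exclusion. The sketch below models them in Python with the a/b example above, using hypothetical string ids in place of ObjectIds:

```python
# The a/b example above, with short strings in place of ObjectIds.
b = [{"_id": "id1", "i": 1}, {"_id": "id2", "i": 2}]
a = [{"_id": "id1"}]

# vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
vals = [doc["_id"] for doc in a]

# db.b.find({_id: {$nin: vals}}); a set makes each membership test O(1)
excluded = set(vals)
result = [doc for doc in b if doc["_id"] not in excluded]
print(result)  # [{'_id': 'id2', 'i': 2}]
```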
You will have to save the _ids from collection A so that you don't pull them again from collection B, but you can do it using $nin. See Advanced Queries for all of the MongoDB operators.
Your end query, using the example you gave would look something like:
db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"),
    ObjectId("4f08a766306b428fb9d8bb2f")]}})
Note that this approach won't scale. If you need a solution that scales, you should set a flag in collections A and B indicating whether the _id is in the other collection, and then query on that flag instead.
Updated for second part:
The second part is impossible: at the time of this answer (before $lookup was added in MongoDB 3.2), MongoDB did not support joins or any other cross-collection querying in a single query. Querying one collection, saving the results and then querying the second is your only choice, unless you embed the data in the documents themselves as I mentioned earlier.
I wrote a script that marks every document in the second collection that appears in the first collection, and then processed the second collection's documents.
var first = db.firstCollection.aggregate([ {'$unwind':'$secondCollectionField'} ])
while (first.hasNext()) {
    var doc = first.next();
    db.secondCollection.update(
        {_id: doc.secondCollectionField},
        {$set: {firstCollectionField: doc._id}}
    );
}
Then process the documents in the second collection that have no mark:
db.secondCollection.find({"firstCollectionField":{$exists:false}})
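The marking script can be modeled in Python to show the data flow. It unwinds each first-collection document's array of references, stamps firstCollectionField on every referenced second-collection document, and then keeps those without the stamp (the $exists: false query). The collection layouts, the mark_referenced and unmarked helpers, and the sample ids are hypothetical. The sketch assumes secondCollectionField holds an array of _id values.

```python
# Hypothetical collections: each first-collection document lists the
# _ids of second-collection documents it refers to.
first_collection = [
    {"_id": "f1", "secondCollectionField": ["s1", "s2"]},
    {"_id": "f2", "secondCollectionField": ["s3", "s9"]},  # s9 does not exist
]
second_collection = [{"_id": s} for s in ("s1", "s2", "s3", "s4", "s5")]


def mark_referenced(first, second):
    """Stamp firstCollectionField on every second-collection doc that is referenced."""
    by_id = {doc["_id"]: doc for doc in second}
    for doc in first:
        for ref in doc.get("secondCollectionField", []):  # one row per element, like $unwind
            target = by_id.get(ref)
            if target is not None:  # update({_id: ref}, ...) matches nothing for unknown ids
                target["firstCollectionField"] = doc["_id"]  # the $set


def unmarked(second):
    """Equivalent of find({"firstCollectionField": {$exists: false}})."""
    return [doc for doc in second if "firstCollectionField" not in doc]


mark_referenced(first_collection, second_collection)
print([doc["_id"] for doc in unmarked(second_collection)])  # ['s4', 's5']
```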