MongoDB aggregation $out step: store data in another database

I'm building a new big data project and we are using MongoDB as the operational database. We use Pentaho Kettle to run the ETL that processes and cleans the data.
In one step of this ETL we run a MongoDB aggregation pipeline with the following stages, in this order:
{ $unwind: '$metrics' },
{ $match: {
    'metrics.event.name': 'Hover',
    'observationTime': { $gte: { $date: "${DATA}" } }
} },
{ $project: {
    '_id': 0,
    'remoteLocation': 1,
    'remoteIP': 1,
    'language': 1,
    'observationTime': 1,
    'device': 1,
    'session_ID': 1,
    'user_ID': 1,
    'metrics': 1
} },
{ $out: "Hover_tmp" }
My problem is that this pipeline runs against a read-only replica, so the final $out stage fails: it cannot write there.
A possible solution would be to specify a collection in another, non-replica MongoDB database.
Is that possible?
Is there any other solution?
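For reference, $merge (MongoDB 4.2+) can target a collection in another database of the same cluster, which the plain string form of $out used above cannot; neither stage can write to a completely separate deployment, though. A rough sketch, where events stands in for the source collection and etl is a hypothetical target database:
db.events.aggregate([
    { $unwind: '$metrics' },
    { $match: { 'metrics.event.name': 'Hover' } },
    // $merge accepts a { db, coll } target, so results can land in a
    // different database of the same cluster. The write still has to be
    // accepted by a node that allows writes, so a read-only replica
    // remains a problem.
    { $merge: {
        into: { db: 'etl', coll: 'Hover_tmp' },
        whenMatched: 'replace',
        whenNotMatched: 'insert'
    } }
])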

Related

Mongo aggregate is not updating the actual document

As can be seen from the example below, when I run the aggregation it prints the required result, but the actual document is not getting replaced.
Could someone tell me how to persist the aggregate output?
> db.demo95.find();
{ "_id" : ObjectId("5eed924ae3fc5c755e1198a2"), "Id" : "5ab9cbe531c2ab715d42129a" }
> db.demo95.aggregate([ { "$addFields": { "Id" : { "$toObjectId": "$Id" } }} ])
{ "_id" : ObjectId("5eed924ae3fc5c755e1198a2"), "Id" : ObjectId("5ab9cbe531c2ab715d42129a") }
> db.demo95.find();
{ "_id" : ObjectId("5eed924ae3fc5c755e1198a2"), "Id" : "5ab9cbe531c2ab715d42129a" }
aggregate is supposed to read data from a collection. You can write the output to another collection by using a $out or $merge stage.
Only from v4.4 (not yet generally available as of June 20th, 2020) can you use a $merge stage to output to the same collection the aggregation reads from.
However, starting from version 4.2, you can use "updates with aggregation pipeline". The syntax for the pipeline is the same, but only selected stages can be used.
Your query can be translated to:
db.demo95.updateMany({}, [ { "$addFields": { "Id" : { "$toObjectId": "$Id" } }} ])
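After the update, running the same find() should show the converted value persisted this time:
> db.demo95.find();
{ "_id" : ObjectId("5eed924ae3fc5c755e1198a2"), "Id" : ObjectId("5ab9cbe531c2ab715d42129a") }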
Refer to updateMany with aggregation pipeline for more information.
If you have an issue with updateMany, you can refer to another answer by @whoami on a different question:
As of now, the aggregation pipeline in .updateMany() is not supported by
many clients, and even by some mongo shell versions. Back then, my ticket
was resolved by using .update(); if that doesn't work, try
update + { multi: true }.
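A sketch of that fallback on the same example: update() with an aggregation pipeline, plus multi: true so every matching document is modified:
db.demo95.update(
    {},                                                        // match all documents
    [ { "$addFields": { "Id": { "$toObjectId": "$Id" } } } ],  // pipeline-style update (4.2+)
    { multi: true }                                            // apply to every match, not just the first
)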

How to design a MongoDB data model and query across different collections?

I have three collections. How can I create a model such that I can find the automobile_reference_no of a user with adjustor id "ABA123"?
//Company collection
{
"company_id" : "NUV123",
"company_name" : "ABC",
}
//Adjustor collection
{
"admin" : true,
"claim_adjustor_id" : "ABA123",
"company_id" : "NUV123",
"adjustor_username" : "test",
"adjustor_password" : "test"
},
{
"admin" : true,
"claim_adjustor_id" : "XYQ324",
"company_id" : "NUV123",
"adjustor_username" : "test1",
"adjustor_password" : "test22"
}
//Image collection
{
"claim_adjustor_id" : "ABA123",
"automobile_reference_no" : "1LNHM83W13Y609413",
"date_last_predicted" : "03/12/2019"
}
MongoDB is not designed to do efficient joins across collections. You should look at merging your data (based on access patterns) into a single collection.
If your collections are unsharded, you can, however, use the aggregation framework: a combination of $match and $lookup stages lets you query a foreign collection and join its data with the local one. As of version 4.0, $lookup doesn't support sharded collections, so it might not work for you if your dataset is huge.
You can read more in the $lookup documentation.
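A minimal sketch of that approach against the documents above, assuming the collections are named adjustors and images (the actual collection names aren't given in the question):
db.adjustors.aggregate([
    // Start from the adjustor we care about.
    { $match: { claim_adjustor_id: "ABA123" } },
    // Join the image documents sharing the same adjustor id.
    { $lookup: {
        from: "images",
        localField: "claim_adjustor_id",
        foreignField: "claim_adjustor_id",
        as: "images"
    } },
    // Keep only the reference numbers.
    { $project: { _id: 0, "images.automobile_reference_no": 1 } }
])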

MongoDB 4.0 aggregation addFields not saving documents after using toDate

I have the following documents,
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "0"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180330"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180402"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180323"
}
I tried to convert date to an ISODate using $toDate in an aggregation:
db.documents.aggregate( [ { "$addFields": { "received_date": { "$cond": [ {"$ne": ["$date", "0"] }, {"$toDate": "$date"}, new Date("1970-01-01") ] } } } ] )
The query executed fine, but when I run
db.documents.find({})
to examine all the documents, nothing has changed. I am wondering how to fix this. I am using MongoDB 4.0.6 on Linux Mint 19.1 x64.
As mentioned in the comments, aggregate doesn't update documents in the database directly (it only produces output from them).
If you'd like to permanently add a new field to documents via aggregation (i.e., update the documents in the database), use the following .forEach()/.updateOne() approach:
Your example:
db.documents
    .aggregate([{ "$addFields": { "received_date": { "$cond": [{ "$ne": ["$date", "0"] }, { "$toDate": "$date" }, new Date("1970-01-01")] } } }])
    .forEach(function (x) {
        db.documents.updateOne({ _id: x._id }, { $set: { "received_date": x.received_date } })
    })
Since _id's value is an ObjectId(), there may be a slight modification you need to make to {_id: x._id}. If there is, let me know and I'll update it!
Another example:
db.users.find().pretty()
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3 }
db.users
    .aggregate([{ $addFields: { totalAge: { $sum: "$age" } } }])
    .forEach(function (x) {
        db.users.updateOne({ name: x.name }, { $set: { totalAge: x.totalAge } })
    })
Being able to update collections via the aggregation pipeline seems quite valuable because of what aggregation gives you the power to do (e.g., what you did in your question, calculations based on other fields within the document, etc.). I'm newer to MongoDB, so maybe updating collections via the aggregation pipeline is "bad practice", but it works and it's been quite valuable for me. I wonder why it isn't more straightforward to do?
Note: I came up with this method after discovering Nazo's now-deprecated .save() method. Shoutout to Nazo!
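One caveat with the .forEach() approach: it issues one updateOne() round trip per document. A sketch of the same idea batched with bulkWrite() (available since MongoDB 3.2), so the updates are sent in bulk:
var ops = [];
db.documents
    .aggregate([{ "$addFields": { "received_date": { "$cond": [{ "$ne": ["$date", "0"] }, { "$toDate": "$date" }, new Date("1970-01-01")] } } }])
    .forEach(function (x) {
        // Queue one update per aggregated document instead of executing it immediately.
        ops.push({ updateOne: { filter: { _id: x._id }, update: { $set: { received_date: x.received_date } } } });
    });
// A handful of bulk round trips instead of one per document.
if (ops.length > 0) db.documents.bulkWrite(ops);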

How can we achieve "Select For Update" in Mongodb?

Below is my sample document. I want to load this document, read the total_comments value, perform certain logic on it, and then update the document with the new total_comments value. How do I ensure that total_comments is not updated by some other request before I complete the steps above?
{ "_id" : ObjectId("56782a933d5c6ca02100002b"), "total_comments" : 12, "time_updated" : 1450715963 }
In MySQL, we can do this with "select for update". How can we achieve the same in MongoDB?
Here is my MongoDB version:
> db.version()
3.0.7
Here is my storage engine details:
> db.serverStatus().storageEngine
{ "name" : "mmapv1" }
Use findAndModify() to read and update a document atomically.
For example, you can increment total_comments by 1 and get the updated document back in a single atomic operation:
db.collection.findAndModify({
    query: { _id: ObjectId("56782a933d5c6ca02100002b") },
    update: { $inc: { "total_comments": 1 } },
    new: true    // return the document as it looks after the update
});
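If the new value has to be computed in application code rather than by an operator like $inc, one common pattern (a sketch, not the only option) is an optimistic compare-and-set: include the previously read value in the query, so the update only applies if nobody changed it in the meantime:
// Suppose we read total_comments = 12 earlier and computed 15 from it.
var result = db.collection.findAndModify({
    query: { _id: ObjectId("56782a933d5c6ca02100002b"), total_comments: 12 },
    update: { $set: { "total_comments": 15 } },
    new: true
});
// result is null if another request modified total_comments first;
// in that case, re-read the document and retry.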

Save Subset of MongoDB Collection to Another Collection

I have a set like so
{date: 20120101}
{date: 20120103}
{date: 20120104}
{date: 20120005}
{date: 20120105}
How do I save a subset of those documents with the date '20120105' to another collection?
i.e. db.subset.save(db.full_set.find({date: "20120105"}));
I would advise using the aggregation framework:
db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ])
It works about 100 times faster than forEach, at least in my case. This is because the entire aggregation pipeline runs inside the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back again. This has a performance penalty even if the server and client are on the same machine.
Here's the shell version:
db.full_set.find({date:"20120105"}).forEach(function(doc){
db.subset.insert(doc);
});
Note: As of MongoDB 2.6, the aggregation framework makes it possible to do this faster; see melan's answer for details.
Actually, there is an equivalent of SQL's insert into ... select from in MongoDB. First, you convert multiple documents into an array of documents; then you insert the array into the target collection:
db.subset.insert(db.full_set.find({date:"20120105"}).toArray())
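In newer shells the same one-liner can use insertMany() (available since MongoDB 3.2), the idiomatic bulk form. Note that toArray() loads the whole subset into client memory, so the aggregation approach above scales better:
db.subset.insertMany(db.full_set.find({ date: "20120105" }).toArray())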
The most general solution is this:
Make use of the aggregation framework (answer given by @melan):
db.full_set.aggregate({$match:{your query here...}},{$out:"sample"})
db.sample.copyTo("subset")
This works even when there are documents in "subset" before the operation and you want to preserve those "old" documents and just insert the new subset into it.
Care must be taken, because copyTo() replaces documents with the same _id (and copyTo() has been deprecated since MongoDB 3.0).
There's no direct equivalent of SQL's insert into ... select from ....
You have to take care of it yourself. Fetch documents of interest and save them to another collection.
You can do it in the shell, but I'd use a small external script in Ruby. Something like this:
require 'mongo'

# Uses the legacy (pre-2.0) Ruby driver API.
db = Mongo::Connection.new.db('mydb')
source = db.collection('source_collection')
target = db.collection('target_collection')

source.find(date: "20120105").each do |doc|
  target.insert doc
end
MongoDB's aggregate, together with the $out operator, allows saving the subset into a new collection. The details:
$out takes the documents returned by the aggregation pipeline and writes them to a specified collection.
The $out operation creates a new collection in the current database if one does not already exist.
The collection is not visible until the aggregation completes.
If the aggregation fails, MongoDB does not create the collection.
Syntax:
{ $out: "<output-collection>" }
Example
A collection books contains the following documents:
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
The following aggregation operation pivots the data in the books collection to have titles grouped by authors and then writes the results to the authors collection.
db.books.aggregate( [
{ $group : { _id : "$author", books: { $push: "$title" } } },
{ $out : "authors" }
] )
After the operation, the authors collection contains the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
For the question asked, use the following query and you will get a new collection named 'col_20120105' in your database:
db.products.aggregate([
{ $match : { date : "20120105" } },
{ $out : "col_20120105" }
]);
You can also use the $merge aggregation pipeline stage (available since MongoDB 4.2), which can even write to a collection in a different database:
db.full_set.aggregate([
{$match: {...}},
{ $merge: {
into: { db: 'your_db', coll: 'your_another_collection' },
on: '_id',
whenMatched: 'keepExisting',
whenNotMatched: 'insert'
}}
])