Create a new aggregate collection from two existing collections? - mongodb

I have a unique field, "ip", in two collections. One collection contains machine data and the other contains geographic data.
Is there a way to aggregate data from two collections and create a third collection with this data?
For example:
geo:
"ip" : "1.1.1.1", "lat" : 1.29, "lon" : 103.86
"ip" : "2.2.2.2", "lat" : 1.29, "lon" : 103.86
machines:
"ip" : "1.1.1.1", "load" : 5
"ip" : "2.2.2.2", "load" : 7
## should become a new collection containing:
"lat" : 1.29, "lon" : 103.86, "load" : 12
I am using the Python driver for MongoDB.

Yes, there is a way to do it with the $out stage.
Let's say you have two collections, "pc" and "geo", and you want to create the collection "allData". The mongo shell query would look like this:
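db.pc.aggregate([
  {
    $lookup: {
      from: "geo",
      localField: "ip",     // field in pc
      foreignField: "ip",   // field in geo
      as: "geoData"
    }
  },
  // $lookup returns an array; $unwind turns it into one embedded document
  { $unwind: "$geoData" },
  // transform the document as needed, e.g. copy lat/lon up from geoData
  { $project: { _id: 0, ip: 1, load: 1, lat: "$geoData.lat", lon: "$geoData.lon" } },
  // write the results to a new collection
  { $out: "allData" }
])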
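Use $project to shape the documents in allData, adding or removing fields as desired. Note that $lookup requires MongoDB 3.2 or later, and $out replaces the target collection if it already exists.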
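The expected output in the question ("load" : 12, i.e. 5 + 7) suggests that loads should be summed per location, which a join plus $project alone does not do. Below is a minimal sketch for the Python driver, assuming the source collections are named machines and geo as in the question. The pipeline is built as plain Python data that PyMongo's aggregate() accepts; merge_in_memory is a hypothetical plain-Python equivalent, included only to check the expected result.

```python
# Sketch: the aggregation pipeline as plain Python data for PyMongo.
# Assumes collections named "machines" and "geo", as in the question.

def build_geo_load_pipeline(out_collection="allData"):
    """Join machines to geo on "ip", sum load per (lat, lon), write the result."""
    return [
        # Attach the matching geo document(s) as an array field
        {"$lookup": {
            "from": "geo",
            "localField": "ip",
            "foreignField": "ip",
            "as": "geoData",
        }},
        # One document per match; machines without a geo match are dropped
        {"$unwind": "$geoData"},
        # Machines at the same location are combined and their loads summed
        {"$group": {
            "_id": {"lat": "$geoData.lat", "lon": "$geoData.lon"},
            "load": {"$sum": "$load"},
        }},
        # Flatten the group key back into top-level lat/lon fields
        {"$project": {"_id": 0, "lat": "$_id.lat", "lon": "$_id.lon", "load": 1}},
        # Replace (or create) the output collection with the results
        {"$out": out_collection},
    ]


def merge_in_memory(geo, machines):
    """Hypothetical plain-Python equivalent of the pipeline, for checking."""
    location_of = {g["ip"]: (g["lat"], g["lon"]) for g in geo}
    totals = {}
    for m in machines:
        loc = location_of.get(m["ip"])
        if loc is None:
            continue  # mirrors $unwind dropping unmatched machines
        totals[loc] = totals.get(loc, 0) + m["load"]
    return [{"lat": lat, "lon": lon, "load": load} for (lat, lon), load in totals.items()]


geo = [
    {"ip": "1.1.1.1", "lat": 1.29, "lon": 103.86},
    {"ip": "2.2.2.2", "lat": 1.29, "lon": 103.86},
]
machines = [
    {"ip": "1.1.1.1", "load": 5},
    {"ip": "2.2.2.2", "load": 7},
]
print(merge_in_memory(geo, machines))  # [{'lat': 1.29, 'lon': 103.86, 'load': 12}]
```

Against a server you would call something like `MongoClient()["mydb"].machines.aggregate(build_geo_load_pipeline())`, where `mydb` is a placeholder database name. Grouping on the exact lat/lon values assumes that machines at the same place share identical coordinates.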


Does MongoDB count the DBRefs inside a document as full-size documents toward the 16MB limit?

For example, I have a document called Country that contains multiple DBRefs to documents called Cities. If each City document is 8MB, can I only store two DBRefs in the Country document, or is a DBRef just a reference that does not count the full size of the referenced document?
Compare the two different ways to store relational data:
Embedded data
Referenced data
Embedded Data:
Countries collection
(city information is embedded as an array of subdocuments in the country document)
{
"_id" : ObjectId("5e336c2bc0a47e0a958a2b9c"),
"name" : "France",
"cities" : [
{
"name" : "Paris",
"population" : 2190327
},
{
"name" : "Marseille",
"population" : 862211
}
]
}
{
"_id" : ObjectId("5e336c85c0a47e0a958a2b9d"),
"name" : "Germany",
"cities" : [
{
"name" : "Berlin",
"population" : 3520031
},
{
"name" : "Hamburg",
"population" : 1787408
},
{
"name" : "Munich",
"population" : 1450381
}
]
}
Referenced Data
(cities are in a separate collection)
Cities collection
{
"_id" : ObjectId("5e336cfdc0a47e0a958a2b9e"),
"name" : "Paris",
"population" : 2190327
}
{
"_id" : ObjectId("5e336cfdc0a47e0a958a2b9f"),
"name" : "Marseille",
"population" : 862211
}
{
"_id" : ObjectId("5e336d11c0a47e0a958a2ba0"),
"name" : "Berlin",
"population" : 3520031
}
{
"_id" : ObjectId("5e336d11c0a47e0a958a2ba1"),
"name" : "Hamburg",
"population" : 1787408
}
{
"_id" : ObjectId("5e336d11c0a47e0a958a2ba2"),
"name" : "Munich",
"population" : 1450381
}
Countries collection
{
"_id" : ObjectId("5e336c2bc0a47e0a958a2b9c"),
"name" : "France",
"cities" : [
DBRef("cities", ObjectId("5e336cfdc0a47e0a958a2b9e"), "mydatabase"),
DBRef("cities", ObjectId("5e336cfdc0a47e0a958a2b9f"), "mydatabase")
]
}
{
"_id" : ObjectId("5e336c85c0a47e0a958a2b9d"),
"name" : "Germany",
"cities" : [
DBRef("cities", ObjectId("5e336d11c0a47e0a958a2ba0"), "mydatabase"),
DBRef("cities", ObjectId("5e336d11c0a47e0a958a2ba1"), "mydatabase"),
DBRef("cities", ObjectId("5e336d11c0a47e0a958a2ba2"), "mydatabase")
]
}
Evaluation
Comparing the two, the document sizes differ. The expectation is that references keep the country document smaller. This example is academic at best, but consider what happens if the subdocuments are large.
Size of documents (in bytes)
Embedded documents:
Object.bsonsize(db.countries.findOne({name: "France"}))
144
Object.bsonsize(db.countries.findOne({name: "Germany"}))
189
Referenced documents:
Object.bsonsize(db.countries.findOne({name: "France"}))
166
Object.bsonsize(db.countries.findOne({name: "Germany"}))
224
Conclusions
Well, the whole point of this exercise was to show that embedded documents are heavier than referenced ones, but the subdocuments in this example are so small that the experiment shows the opposite! The embedded documents are smaller because each DBRef (collection name, ObjectId and database name) is relatively heavy. If the embedded subdocuments were large, the result would swing the other way. You can experiment with different data sets to compare sizes and help choose a schema.
To answer the question directly: a DBRef is stored as a small embedded document ({ $ref, $id, $db }), so only those bytes count toward the referencing document's 16MB limit, not the size of the referenced document.
Size limits are only one aspect of schema design. Resolving DBRefs is far slower than reading embedded documents, because each referenced document essentially requires its own query. Also consider storing the referenced ObjectId as a plain value instead of a full DBRef; this can be smaller and faster. Some articles describe this approach as a poor man's DBRef and advise against DBRef.
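To see why a DBRef does not carry the size of the document it points to, here is a minimal sketch that counts BSON bytes by hand for the few types used above (strings, doubles, ObjectIds, embedded documents and arrays), following the layout in the BSON specification. `ObjectId` here is a hypothetical stand-in class, and numbers are modelled as 8-byte doubles, which is how the mongo shell stores plain numeric literals.

```python
# Sketch: count BSON bytes for a small subset of types, per the BSON spec layout.

class ObjectId:
    """Hypothetical placeholder for a BSON ObjectId (12 bytes on the wire)."""


def bson_size(doc):
    # int32 total length + each element + trailing 0x00 byte
    return 4 + sum(_element_size(k, v) for k, v in doc.items()) + 1


def _element_size(key, value):
    # 1 type byte + key as a NUL-terminated C string + the encoded value
    return 1 + len(key.encode("utf-8")) + 1 + _value_size(value)


def _value_size(value):
    if isinstance(value, ObjectId):
        return 12
    if isinstance(value, str):
        # int32 length prefix + UTF-8 bytes + trailing NUL
        return 4 + len(value.encode("utf-8")) + 1
    if isinstance(value, float):
        return 8  # 64-bit double
    if isinstance(value, dict):
        return bson_size(value)
    if isinstance(value, list):
        # a BSON array is a document with keys "0", "1", "2", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def dbref(collection, oid, database):
    # A DBRef is just an embedded document with these three fields;
    # the referenced document itself is not copied into it.
    return {"$ref": collection, "$id": oid, "$db": database}


france_embedded = {
    "_id": ObjectId(),
    "name": "France",
    "cities": [
        {"name": "Paris", "population": 2190327.0},
        {"name": "Marseille", "population": 862211.0},
    ],
}
print(bson_size(france_embedded))                            # 144
print(bson_size(dbref("cities", ObjectId(), "mydatabase")))  # 59
```

The embedded France document comes out at the 144 bytes measured above. A DBRef's cost depends only on the collection and database names it stores (59 bytes as a standalone document for "cities"/"mydatabase" under these assumptions), no matter how large the referenced city document is.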
Please add comments or questions and I will be happy to discuss!

Zipping two collections in MongoDB

Not a question about joins in MongoDB
I have two collections in mongoDB, which do not have a common field and which I would like to apply a zip function to (like in Python, Haskell). Both collections have the same number of documents.
For example:
Let's say one collection (Users) is for users, and the other (Codes) is of unique randomly generated codes.
Collection Users:
{ "_id" : ObjectId(""), "userId" : "123"}
{ "_id" : ObjectId(""), "userId" : "456"}
Collection Codes:
{ "_id" : ObjectId(""), "code" : "randomCode1"}
{ "_id" : ObjectId(""), "code" : "randomCode2"}
The desired output would be to assign each user a unique code, as follows:
Output
{ "_id" : ObjectId(""), "code" : "randomCode1", "userId" : "123"}
{ "_id" : ObjectId(""), "code" : "randomCode2", "userId" : "456"}
Is there any way of doing this with the aggregation pipeline?
Or perhaps with map-reduce? I don't think so, because it only works on one collection.
I've considered inserting another random id into both collections for each document pair and then using $lookup on this new id, but this seems like overkill. The alternative would be to export the data and use Python, since there aren't that many documents, but again I feel there should be a better way.
I would do something like this to get the records from collections 1 and 2 and merge the required fields into a single object.
You have already confirmed that the number of records in collections 1 and 2 is the same.
The code below loops through both cursors and maps the required fields into one object. Finally, you can print the object to the console or insert it into another new collection (the insert is commented out).
// Sort both cursors the same way: without a sort, the order of find() is not guaranteed
var usersCursor = db.users.find({}).sort({ _id: 1 });
var codesCursor = db.codes.find({}).sort({ _id: 1 });
while (usersCursor.hasNext() && codesCursor.hasNext()) {
  var user = usersCursor.next();
  var code = codesCursor.next();
  // combine the i-th user with the i-th code
  var outputObj = {
    _id: new ObjectId(),
    userId: user.userId,
    code: code.code
  };
  printjson(outputObj);
  // db.collectionName.insertOne(outputObj);
}
Output:
{
"_id" : ObjectId("58348512ba41f1f22e600c74"),
"userId" : "123",
"code" : "randomCode1"
}
{
"_id" : ObjectId("58348512ba41f1f22e600c75"),
"userId" : "456",
"code" : "randomCode2"
}
Unlike in a relational database, in MongoDB you do this kind of JOIN at the application level (which makes it easier to scale the database horizontally).
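If you take the application-level route from Python (the asker mentions exporting to Python as an option), the pairing is just a zip over two consistently ordered result sets. A minimal sketch; `zip_documents` is a hypothetical helper name, and it raises an error instead of silently truncating when the counts differ.

```python
# Sketch: zip two result sets in the application, like Python's built-in zip().
# Works with plain lists or PyMongo cursors; both inputs must already be in a
# stable, matching order (for example, both sorted by _id).
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input


def zip_documents(users, codes):
    """Pair the i-th user with the i-th code; fail if the counts differ."""
    paired = []
    for user, code in zip_longest(users, codes, fillvalue=_MISSING):
        if user is _MISSING or code is _MISSING:
            raise ValueError("collections have different document counts")
        # No _id here: MongoDB generates one when the document is inserted
        paired.append({"userId": user["userId"], "code": code["code"]})
    return paired


users = [{"userId": "123"}, {"userId": "456"}]
codes = [{"code": "randomCode1"}, {"code": "randomCode2"}]
print(zip_documents(users, codes))
# [{'userId': '123', 'code': 'randomCode1'}, {'userId': '456', 'code': 'randomCode2'}]
```

With PyMongo the inputs could be `db.users.find({}, {"userId": 1}).sort("_id", 1)` and `db.codes.find({}, {"code": 1}).sort("_id", 1)`, and the result could be written with `db.user_codes.insert_many(...)`, where `user_codes` is a hypothetical collection name.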

Model tree structure with aggregation in MongoDB

I want to aggregate a tree structure directly in the MongoDB database with the aggregation framework.
Is it possible to do hierarchical aggregations like this? Currently, I do it in a program.
I want to use a collection like:
{
"Name" : "john",
"Parents" : ["sandy", "bryan"]
}
{
"Name" : "sandy",
"Parents" : ["bill", "daisy"]
}
{
"Name" : "bryan",
"Parents" : ["dora", "david"]
}
{
"Name" : "dora",
"Parents" : ["cliff", "darla"]
}
And generate a new collection like:
{
"Name" : "sandy",
"Parents" : ["bill", "daisy"],
"Ancestors" : ["bill", "daisy"]
}
{
"Name" : "dora",
"Parents" : ["cliff", "darla"],
"Ancestors" : ["cliff", "darla"]
}
{
"Name" : "bryan",
"Parents" : ["dora", "david"],
"Ancestors" : ["dora", "david", "cliff", "darla"]
}
{
"Name" : "john",
"Parents" : ["sandy", "bryan"],
"Ancestors" : ["sandy", "bryan", "bill", "daisy", "dora", "david", "cliff", "darla"]
}
I don't think a tree-structure aggregation is possible with MapReduce since MongoDB 2.4, because we can't call "db.mycollection.find(...)" inside map functions, so we can't retrieve hierarchical documents there:
In MongoDB 2.4, map-reduce operations, the group command, and $where operator expressions cannot access certain global functions or properties, such as db, that are available in the mongo shell.
When upgrading to MongoDB 2.4, you will need to refactor your code if your map-reduce operations, group commands, or $where operator expressions include any global shell functions or properties that are no longer available, such as db.
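Newer MongoDB versions address this directly: since 3.4 the aggregation pipeline has a $graphLookup stage that follows links recursively within a collection. Below is a sketch of one way to build the Ancestors field with it, expressed as Python data for PyMongo. The collection names people and people_with_ancestors are hypothetical, and ancestors_in_memory is an illustrative plain-Python version of the same walk (roughly what a client program would do) for checking results.

```python
# Sketch: recursive ancestor lookup with $graphLookup (MongoDB 3.4+).
# Collection names are hypothetical.

def build_ancestors_pipeline(source="people", output="people_with_ancestors"):
    return [
        {"$graphLookup": {
            "from": source,               # recursive lookup on the same collection
            "startWith": "$Parents",      # start from the direct parents
            "connectFromField": "Parents",
            "connectToField": "Name",
            "as": "AncestorDocs",         # every ancestor that has its own document
        }},
        {"$project": {
            "Name": 1,
            "Parents": 1,
            # Leaf ancestors (bill, daisy, ...) have no document of their own, so
            # union the direct parents with the Parents of every ancestor found.
            "Ancestors": {"$reduce": {
                "input": "$AncestorDocs.Parents",
                "initialValue": "$Parents",
                "in": {"$setUnion": ["$$value", "$$this"]},
            }},
        }},
        {"$out": output},
    ]


def ancestors_in_memory(people):
    """Plain-Python version of the same walk: name -> set of ancestors."""
    parents_of = {p["Name"]: p["Parents"] for p in people}
    result = {}
    for name, parents in parents_of.items():
        seen, stack = set(), list(parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                # names without a document of their own have no known parents
                stack.extend(parents_of.get(parent, []))
        result[name] = seen
    return result


people = [
    {"Name": "john", "Parents": ["sandy", "bryan"]},
    {"Name": "sandy", "Parents": ["bill", "daisy"]},
    {"Name": "bryan", "Parents": ["dora", "david"]},
    {"Name": "dora", "Parents": ["cliff", "darla"]},
]
print(sorted(ancestors_in_memory(people)["bryan"]))  # ['cliff', 'darla', 'david', 'dora']
```

$setUnion does not guarantee element order, so the Ancestors arrays may come back in a different order than shown in the question.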

MongoDB Group querying for Embedded Document

I have MongoDB documents with a structure like:
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query I want to run on this structure:
find the documents for a particular userId
unwind the embedded array
group the documents by _id, computing:
the sum of items.value over the embedded array
the minimum of items.dateTime over the embedded array
Note: I want the sum and the min as an object, i.e. { value : sum, dateTime : min of items.dateTime }, inside an items array.
Can this be achieved in a single aggregation call, using $push or some other technique?
When you group by a particular _id and apply accumulators such as $min and $sum, there is only one output document per group (_id), holding the sum and the minimum date for that group. So there is no way to obtain several different sums and minimum dates for the same _id, which would not make sense logically anyway.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
But when several documents reach the $group stage (for example, when you do not filter on a particular userId), you get multiple groups, each with its own sum and minimum date. Then it makes sense to accumulate all these results in one array using the $push operator.
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
You could use the following aggregation; filtering with $match before $unwind lets the query use an index on userId:
db.collectionName.aggregate([
  {"$match": {"userId": "THIS_IS_A_DHP_USER_ID"}},
  {"$unwind": "$items"},
  {"$group": {"_id": "$_id", "sum": {"$sum": "$items.value"},
              "minDate": {"$min": "$items.dateTime"}}}
])
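Since MongoDB 3.2, $sum and $min can also be applied to an array field inside $project, so each document can be summarised without $unwind and $group, producing the { value, dateTime } object inside an items array that the question asks for. A minimal sketch, written as Python data for PyMongo; summarize_in_memory is a hypothetical plain-Python equivalent for one document, included only for checking.

```python
# Sketch (MongoDB 3.2+): summarise each document's items array in $project.
from datetime import datetime


def build_summary_pipeline(user_id):
    return [
        {"$match": {"userId": user_id}},
        {"$project": {
            # a new array field holding one summary object
            "items": [{
                "value": {"$sum": "$items.value"},        # sum over the array
                "dateTime": {"$min": "$items.dateTime"},  # earliest timestamp
            }],
        }},
    ]


def summarize_in_memory(doc):
    """Plain-Python equivalent for a single document."""
    values = [item["value"] for item in doc["items"]]
    times = [item["dateTime"] for item in doc["items"]]
    return {"_id": doc["_id"], "items": [{"value": sum(values), "dateTime": min(times)}]}


doc = {
    "_id": "THIS_IS_A_DHP_USER_ID+2014-11-26",
    "items": [
        {"dateTime": datetime(2014, 11, 26, 8, 8, 38, 716000), "value": 98.5},
        {"dateTime": datetime(2014, 11, 26, 8, 18, 38, 716000), "value": 95.5},
        {"dateTime": datetime(2014, 11, 26, 8, 28, 38, 663000), "value": 90.5},
    ],
}
print(summarize_in_memory(doc)["items"][0]["value"])  # 284.5
```

Unlike the $unwind/$group pipelines above, this keeps one output document per input document and never multiplies documents by the array length.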

MongoDB select different object types in collection

I have a MongoDB collection called Users.
I do not know object types in advance.
This collection has at least 3 differently structured types of objects. For example:
Type 1:
{
"_id" : "9e1736d4-f3a1-47ed-bb51-3318129664f0",
"userid" : 6711,
"registerDate" : "2014-10-28T14:42:06",
"lastLoginDate" : "2014-10-28T14:42:06",
}
Type 2:
{
"_id" : "9e1736d4-f3a1-47ed-bb51-3318129664f1",
"userid" : 6712,
"email" : "johndoe#example.com",
"username" : "john doe",
}
Type 3:
{
"_id" : "9e1736d4-f3a1-47ed-bb51-3318129664f2",
"userid" : 63713,
"city" : "orange",
"state" : "new york",
"country" : "US",
}
How can I get the distinct types (or the first object of each type) from my collection?
So if I have 1 million users with the 3 different structures above, I would like to get 3 results.
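Add a "type" field identifying the type of each document.
Then group the results as you wish. Older versions offered db.collection.group for this, but the group command was removed in MongoDB 4.2, so use $group in the aggregation pipeline instead.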
Issue one query per type, for example:
db.c.findOne({registerDate: {$exists: true}});
db.c.findOne({email: {$exists: true}});
db.c.findOne({city: {$exists: true}});
This should be very fast and simple. :)
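If adding a type field is not an option and the types really are unknown in advance, the distinct shapes can be found by grouping on each document's list of field names. A sketch, assuming MongoDB 3.4.4 or later (for $objectToArray) and that documents of the same type store their fields in the same order, since the group key is the ordered list of names. Unlike the three findOne queries, this scans the whole collection. shapes_in_memory is a hypothetical plain-Python equivalent for checking.

```python
# Sketch: one example document per distinct set of field names.

def build_shapes_pipeline():
    return [
        {"$project": {
            # the document's field names in order, e.g. ["_id", "userid", "email", "username"]
            "keys": {"$map": {"input": {"$objectToArray": "$$ROOT"}, "in": "$$this.k"}},
            "doc": "$$ROOT",
        }},
        # one group per field-name list, keeping the first document seen
        {"$group": {"_id": "$keys", "example": {"$first": "$doc"}}},
    ]


def shapes_in_memory(docs):
    """Plain-Python equivalent: first document for each ordered tuple of field names."""
    examples = {}
    for doc in docs:
        examples.setdefault(tuple(doc), doc)  # tuple(doc) yields the keys in order
    return list(examples.values())
```

With PyMongo this would run as `db.Users.aggregate(build_shapes_pipeline())`; each result's `_id` is the field list and `example` is one document of that shape.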