MongoDB - combine data from two collections

I know this has been covered quite a lot on here; however, I'm very new to MongoDB and am struggling to apply the answers I've found to my situation.
In short, I have two collections: 'total_by_country_and_isrc', which is the output of a MapReduce function, and 'asset_report', which contains an asset_id that is not present in either 'total_by_country_and_isrc' or the original raw data collection it was MapReduced from.
An example of the data in 'total_by_country_and_isrc' is:
{ "_id" : { "custom_id" : 4748532, "isrc" : "GBCEJ0100080", "country" : "AE" }, "value" : 0 }
And an example of the data in the 'asset_report' is:
{ "_id" : ObjectId("51824ef016f3edbb14ef5eae"), "Asset ID" : "A836656134476364", "Asset Type" : "Web", "Metadata Origination" : "Unknown", "Custom ID" : "4748532", "ISRC" : "" }
I'd like to end up with the following ('total_by_country_and_isrc_with_asset_id'):
{ "_id" : { "Asset ID" : "A836656134476364", "custom_id" : 4748532, "isrc" : "GBCEJ0100080", "country" : "AE" }, "value" : 0 }
I know how I would approach this in a relational database, but I really want to get it working in Mongo, as I'm dealing with some pretty large collections and feel Mongo is the right tool for the job.
Can anyone offer some guidance here?

I think you want to use the "reduce" output action (see "Output to a Collection with an Action" in the mapReduce documentation). You'll need to regenerate total_by_country_and_isrc, because it doesn't look like asset_report has the fields needed to generate the keys you already have in total_by_country_and_isrc, so "joining" the data as it stands is impossible.
First, write a map method that is capable of generating the same keys from the original collection (used to generate total_by_country_and_isrc) and also from the asset_report collection. Think of these keys as the "join" fields.
Next, map and reduce your original collection to create total_by_country_and_isrc with the correct keys.
Finally, map asset_report with the same method you used to generate total_by_country_and_isrc, but use a reduce function that can be used to reduce the intersection (by key) of this mapped data from asset_report and the data in total_by_country_and_isrc.
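The three steps above can be sketched in plain JavaScript without a server. This is only an in-memory imitation of map, reduce, and the "reduce" output action, not MongoDB's own mapReduce API. The function names (mapTotals, mapAssets, reduceMerge, mergeInto) and the choice of custom_id alone as the join key are assumptions made for illustration, since that is the only field both sample documents share.

```javascript
// Sample documents copied from the question.
const totals = [
  { _id: { custom_id: 4748532, isrc: "GBCEJ0100080", country: "AE" }, value: 0 },
];
const assetReport = [
  { "Asset ID": "A836656134476364", "Asset Type": "Web", "Custom ID": "4748532", "ISRC": "" },
];

// Both map functions must emit the same key shape. Note the type mismatch in
// the sample data: custom_id is a number in one collection and a string in the other.
function mapTotals(doc) {
  const { custom_id, isrc, country } = doc._id;
  return [String(custom_id), { asset_id: null, rows: [{ isrc, country, value: doc.value }] }];
}
function mapAssets(doc) {
  return [String(doc["Custom ID"]), { asset_id: doc["Asset ID"], rows: [] }];
}

// Reduce returns the same shape it receives, so it can be re-applied safely.
function reduceMerge(key, values) {
  const out = { asset_id: null, rows: [] };
  for (const v of values) {
    if (v.asset_id !== null) out.asset_id = v.asset_id;
    out.rows.push(...v.rows);
  }
  return out;
}

// Imitates the "reduce" output action: when a key already exists in the
// target, the old and new values are reduced together instead of replaced.
function mergeInto(target, docs, mapFn) {
  for (const doc of docs) {
    const [key, value] = mapFn(doc);
    target.set(key, target.has(key) ? reduceMerge(key, [target.get(key), value]) : value);
  }
  return target;
}

const joined = new Map();
mergeInto(joined, totals, mapTotals);
mergeInto(joined, assetReport, mapAssets);
console.log(JSON.stringify(joined.get("4748532")));
```

Each merged value carries the asset id next to the per-country rows for that custom_id. Reshaping it into the exact _id layout from the question would be a further step.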

Related

why is mongodb not indexing my collection

I have created a collection and added just a name field and tried to apply the following index.
db.names.createIndex({"name":1})
Even after applying the index I see the below result.
db.names.find()
{ "_id" : ObjectId("57d14139eceab001a19f7e82"), "name" : "kkkk" }
{ "_id" : ObjectId("57d1413feceab001a19f7e83"), "name" : "aaaa" }
{ "_id" : ObjectId("57d14144eceab001a19f7e84"), "name" : "zzzz" }
{ "_id" : ObjectId("57d14148eceab001a19f7e85"), "name" : "dddd" }
{ "_id" : ObjectId("57d1414ceceab001a19f7e86"), "name" : "rrrrr" }
What am I missing here?
Khans...
The way you built your index is correct; however, building an ascending index on name won't make find() return the results in ascending order.
If you need the results ordered by name, you have to use
db.names.find().sort({name:1})
What an index does is let the Mongo process search the data in an ordered fashion behind the scenes, which makes lookups faster.
Please note: if you just want to see the output in sorted order, you don't even need an index.
You won't be able to tell whether an index has been created by running a find() command (unless there is a considerable difference in speed).
Instead, use db.names.getIndexes() to see whether the index has been created (if you're building the index in the background, it may take some time to appear in the index list).
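To illustrate why creating the index does not change what find() prints, here is a small in-memory sketch in plain JavaScript. It is not how MongoDB stores data internally. buildIndex, scan and sortedViaIndex are hypothetical names, and the index is modelled simply as a sorted list of (key, position) pairs kept beside the documents.

```javascript
// Documents in insertion order, as in the question.
const names = [
  { name: "kkkk" }, { name: "aaaa" }, { name: "zzzz" }, { name: "dddd" }, { name: "rrrrr" },
];

// A simplified ascending index: sorted keys pointing back at document
// positions. The documents themselves are not moved.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Like find() without sort(): documents come back in stored order.
function scan(docs) {
  return docs.map((d) => d.name);
}

// Like find().sort({name: 1}): walk the index and fetch each document.
function sortedViaIndex(docs, index) {
  return index.map((entry) => docs[entry.pos].name);
}

const nameIndex = buildIndex(names, "name");
console.log(scan(names));                      // stored order
console.log(sortedViaIndex(names, nameIndex)); // ascending by name
```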

Complex Grid Implementation in meteor(blaze)

First, let me explain the schema of my collections.
I have 3 collections:
company, deal, price
I want to use information from all three collections to make a single reactive, responsive table.
The schema for the price collection looks like this:
{
"_id" : "kSqH7QydFnPFHQmQH",
"timestamp" : ISODate("2015-10-11T11:49:50.241Z"),
"dealId" : "X5zTJ2y675PjmaLMx",
"deal" : "Games",
"price" : [{
"type" : "worth",
"value" : "Bat"
}, {
"type" : "Persons",
"value" : 4
}, {
"type" : "Cost",
"value" : 5
}],
"company" : "Company1"
}
Schema for company collection is
{
"_id" : "da2da",
"name" : "Company1"
}
Schema for deal collection is
{
"_id" : "X5zTJ2y675PjmaLMx",
"name" : "Games"
}
For each company, three columns are added to the table (worth, persons, cost).
For each deal, there is a new row in the table.
Since the information comes from 3 collections into a single table, my first question is: is it wise to build a table from 3 different collections? If yes, how could I do that in Blaze?
If not, I will have to build the table from the price collection only. What would be the best schema for that collection?
P.S. In both cases I want the table to be reactive.
Firstly, I recommend reywood:publish-composite for publishing related collections.
Secondly, there is no intrinsic problem in setting up a table like this: you first figure out which collection to loop over with {{#each}} in Spacebars, and then you define helpers that return the values from the related collections to your templates.
As far as your schema design goes, the choice between nesting within a collection and using an entirely separate collection is typically driven by size. If the related object is "small" overall, nesting can work well: you automatically get the nested object when you publish and query that collection. If, on the other hand, it's going to be "large" and/or you want to avoid having to update every document when something in the related object changes, a separate collection can be better.
If you do separate your collections, you'll want to refer to objects from the other collection by _id and not by name, since names can easily change. For example, in your price collection you'd want to use companyId: "da2da" instead of company: "Company1".
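As a rough sketch of what such a helper might compute, the plain JavaScript below builds one row per deal and one cell per company from three arrays. buildGridRows is a hypothetical name, and the sample arrays stand in for whatever the three collections return on the client. The lookup still matches on company name because that is what the sample price document stores; with a companyId field the comparison would use _id instead.

```javascript
// Sample documents shaped like the ones in the question.
const companies = [{ _id: "da2da", name: "Company1" }];
const deals = [{ _id: "X5zTJ2y675PjmaLMx", name: "Games" }];
const prices = [{
  dealId: "X5zTJ2y675PjmaLMx",
  company: "Company1",
  price: [
    { type: "worth", value: "Bat" },
    { type: "Persons", value: 4 },
    { type: "Cost", value: 5 },
  ],
}];

// Hypothetical helper: one row per deal, one cell per company, and three
// values (worth, persons, cost) per cell. Missing prices stay null.
function buildGridRows(deals, companies, prices) {
  return deals.map((deal) => ({
    deal: deal.name,
    cells: companies.map((company) => {
      const match = prices.find(
        (p) => p.dealId === deal._id && p.company === company.name
      );
      const cell = { company: company.name, worth: null, persons: null, cost: null };
      for (const { type, value } of match ? match.price : []) {
        // The sample data mixes "worth" with "Persons"/"Cost", so compare in lower case.
        cell[type.toLowerCase()] = value;
      }
      return cell;
    }),
  }));
}

const rows = buildGridRows(deals, companies, prices);
console.log(JSON.stringify(rows, null, 2));
```

A template could then loop over the rows with one {{#each}} and over each row's cells with a nested one.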

get all embedded documents in collection

I have contact documents which contain an embedded document "lead". So the data would look like this:
{
"_id" : ObjectId("54f8fa496d6163ad64010000"),
"name" : "teretyrrtuyytiuyi",
"email" : "rertytruyy#fdgjioj.com",
"fax" : "",
"birth_date" : null,
"phone" : "dfgdfhfgjg",
"phone_2" : "hgjhkhjkljlj",
"lead" : { "_id" : ObjectId("54f8fa496d6163ad64020000"), "appointment status" : "dfhgfgjghjk" }
}
When there are many contacts, there will be many leads. I want to retrieve all the leads in the collection, without retrieving the owning contact. I tried the following but none seem to work:
db.lead.find()
db.contacts.find({ 'lead.$' : 1})
Any way to do this?
If that query makes sense for you, you should probably have used a different data structure. If your embedded document has an id, it is almost certainly supposed to be a first-class citizen with its own collection instead.
You can work around this using the aggregation framework, but I'd consider that a hack that probably works around some more profound problem with your data model.
It's also not very elegant:
db.contacts.aggregate({ $project : {
    "appointment_status" : "$lead.appointment status",
    "lead_id" : "$lead._id", ... } });
That way, it'll look as if leads was a collection of its own, but it's not and this is just a bad hack around it.
Note that there's no wildcard operator, so if you want to have all fields projected to the root level, you'll have to do it manually. It'd be much easier to simply read the regular documents - if that's not what you need, correct your schema design.
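For comparison, reading the regular documents and pulling the leads out on the client side could look roughly like the plain JavaScript below. extractLeads is a hypothetical helper, the ids are written as plain strings instead of ObjectId values, and the second contact is invented to show the case where no lead is present.

```javascript
// Contacts as they might come back from a normal query.
const contacts = [
  {
    _id: "54f8fa496d6163ad64010000",
    name: "teretyrrtuyytiuyi",
    lead: { _id: "54f8fa496d6163ad64020000", "appointment status": "dfhgfgjghjk" },
  },
  { _id: "c2", name: "no lead here" }, // hypothetical contact without a lead
];

// Hypothetical helper: lift every embedded lead to the top level and drop
// the owning contact's fields. Contacts without a lead are skipped.
function extractLeads(contacts) {
  return contacts
    .filter((c) => c.lead != null)
    .map((c) => ({ ...c.lead }));
}

const leads = extractLeads(contacts);
console.log(leads);
```

Unlike the $project version, this copies every field of the lead without listing them one by one.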

MongoDB - How to find equals in collection and in embedded document

Gurus - I'm stuck: I can't figure out how to query the following collection, whose documents have an embedded document "spouse", and check whether the spouse's "surname" equals the "surname" of the document itself:
{
"_id" : ObjectId("50bd2bb4fcfc6066b7ef090d"),
"name" : "Gwendolyn",
"surname" : "Davis",
"birthyear" : 1978,
"spouse" : {
"name" : "Dennis",
"surname" : "Evans",
"birthyear" : 1969
},
I need to query:
Output data for all spouses with the same surname (if the surname of
one of the spouses is not specified, assume that it coincides with the
surname of the other)
I tried something like this:
db.task.find( {"surname" : { "spouse.surname" : 1 }} )
but it failed.
Please guide me on how I can achieve this. Any example or sample based on this would be really helpful :-)
Thanks a lot!
You have three options:
1. Use the $where operator:
db.task.find({$where: 'this.spouse.surname === this.surname'})
2. Update all your documents and add a special flag. After that, you will be able to query documents by this flag. It's faster than $where, but it requires altering your data.
3. Use MapReduce. It's quite complicated, but it allows you to do nearly anything.
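The comparison in the first option does not cover the rule from the question that a missing surname should count as a match. A minimal sketch of that rule as a plain JavaScript predicate follows. sameSurname is a hypothetical name, every sample person except Gwendolyn is invented, and treating a document without any spouse as a non-match is an assumption. Since $where also accepts a JavaScript function, the same logic could be adapted to read from this instead of a parameter.

```javascript
// Hypothetical predicate: a missing surname on either side is assumed to
// equal the other one; a document with no spouse at all never matches.
function sameSurname(doc) {
  if (!doc.spouse) return false; // no spouse, nothing to compare
  const own = doc.surname;
  const other = doc.spouse.surname;
  if (own == null || other == null) return true;
  return own === other;
}

const people = [
  { name: "Gwendolyn", surname: "Davis", spouse: { name: "Dennis", surname: "Evans" } },
  { name: "Anna", surname: "Smith", spouse: { name: "Bob", surname: "Smith" } }, // invented
  { name: "Carol", surname: "Jones", spouse: { name: "Dan" } },                  // invented
  { name: "Eve", surname: "Stone" },                                             // invented
];

const matches = people.filter(sameSurname).map((p) => p.name);
console.log(matches);
```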

Mongo Map Dot notation

I would like to construct a query that will return just the names of the classes for the datastructure below.
So far, the closest I've come is using dot notation
db.mycoll.find({name:"game1"},{"classes.1.name":true})
But the problem with this approach is that it will only return the name of the first class. Please help me get the names of all three classes.
I wish I could use a wild card as below, but I'm not sure if that exists.
db.mycoll.find({name:"game1"},{"classes.$*.name":true})
Datastructure:
{
"name" : "game1",
"classes" : {
"1" : {
"name" : "warlock",
"version" : "1.0"
},
"2" : {
"name" : "shaman",
"version" : "2.0"
},
"3" : {
"name" : "mage",
"version" : "1.0"
}
}
}
There is no simple query that will achieve the results you seek. MongoDB has limited support for querying against sub-objects or arrays of objects. The basic premise with MongoDB is that you are querying for the top-level document.
That said, things are changing and you still have some options:
1. Use the new MongoDB Aggregation Framework. This has a $project operation that should do what you're looking for.
2. Return just the classes field and then merge the names together on the client. This should be trivial in most languages.
Note that it's not clear what you're doing with classes. Is this an array or an object? If it's an object, what does classes.1 actually represent? Is it different from classes.warlock?
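Returning just the classes field and merging the names on the client can be sketched in plain JavaScript as follows. classNames is a hypothetical helper, and the game object stands in for the document a query with a projection on classes would return.

```javascript
// The document from the question, as it would arrive on the client.
const game = {
  name: "game1",
  classes: {
    "1": { name: "warlock", version: "1.0" },
    "2": { name: "shaman", version: "2.0" },
    "3": { name: "mage", version: "1.0" },
  },
};

// Hypothetical helper: works whether `classes` is an object keyed by "1",
// "2", ... or a real array, since Object.values handles both.
function classNames(doc) {
  return Object.values(doc.classes || {}).map((c) => c.name);
}

console.log(classNames(game));
```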