Suppose that I have a database with student information:
{'student_name' : 'Alen', 'subjects' : {'cse101' : 4, 'cse102' : 3, 'cse201' : 4}}
Suppose I need to store the aggregate information of the student as well. I can add the field 'aggregate' : 3.67 to the record. But the aggregate changes when another subject is added to the subjects list. Is there a way I can write a "dynamic field" which could calculate the aggregate whenever requested? Something like student['aggregate'] which is not persistent but available when needed?
P.S: Aggregate is just a simple example. I am dealing with something more complex involving various other fields of the element.
There are no dynamic or calculated fields in MongoDB at the moment (although there are some tickets for this in the Jira).
But you can always implement this functionality in the app code.
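For example, a minimal sketch in the mongo shell that computes the aggregate on read instead of storing it. The collection name students is an assumption, and the calculation assumes the aggregate is the plain average of the subjects values, as in the example ((4 + 3 + 4) / 3 ≈ 3.67):

// Minimal sketch: compute the aggregate in app code when requested.
// Assumes 'aggregate' is the plain average of the subjects values.
function aggregateOf(student) {
    var keys = Object.keys(student.subjects);
    var sum = 0;
    keys.forEach(function(k) { sum += student.subjects[k]; });
    return keys.length ? sum / keys.length : null;
}

var student = db.students.findOne({ student_name : "Alen" }); // collection name assumed
print(aggregateOf(student)); // 3.67 (rounded)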
I have a collection named Company which has the following structure:
{
    "_id" : ObjectId("57336ea1a7454c0100d889e4"),
    "currentMonth" : 62,
    "variables1" : { ... },
    ...
    "variables61" : { ... },
    "variables62" : {
        "name" : "Test",
        "email" : "email@test.com",
        ...
    },
    "country" : "US"
}
I need to be able to search for companies by name with up-to-date data. I don't have permission to change this data structure because many applications still use it. So far I haven't found a way to index these variables with this structure, which makes the search slow.
Today each of these documents can be several megabytes in size and there are over 20,000 of them in this collection.
The system I want to implement uses a search engine to index the names of companies, but for that it needs to be able to detect changes in the collection.
A MongoDB change stream seems like a viable option, but I'm not sure how to make it scalable and efficient.
Do you have any suggestions that would help me solve this problem? Any suggestion on the steps needed to set up the above system?
Usually with MongoDB you can add new fields to documents and existing applications would simply ignore the extra fields (though they naturally would not be populated by old code). Therefore:
1. Create a task that is regularly executed, goes through all documents in your collection, figures out the name for each document from its fields, and writes the name into a top-level field.
2. Add an index on that field.
3. In your search code, look up documents by the values of that field.
4. Compare the calculated name to the source-of-truth name. If they differ, discard the document.
If names don't change once set, step 1 only needs to go through documents that are missing the top-level name and step 4 is not needed.
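A minimal shell sketch of steps 1 and 2, assuming the collection is called Company, the new top-level field is called companyName, and the current name lives at variables<currentMonth>.name as in the document above (all of these names are assumptions):

// Step 1 (hypothetical backfill task): write the current name into a
// top-level field; here restricted to documents that don't have it yet.
db.Company.find({ companyName : { $exists : false } }).forEach(function(doc) {
    var current = doc["variables" + doc.currentMonth];
    if (current && current.name) {
        db.Company.updateOne(
            { _id : doc._id },
            { $set : { companyName : current.name } }
        );
    }
});

// Step 2: index the new top-level field so name lookups are fast.
db.Company.createIndex({ companyName : 1 });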
Using the change-detection pattern with monstache, I was able to synchronise MongoDB with Elasticsearch in real time, filtering on the current month and then mapping the variables to be indexed 🎊
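For anyone reproducing this: monstache supports per-namespace JavaScript middleware, so a filter/map step along these lines is possible. This is a rough sketch under that assumption; the field names come from the question above, everything else is hypothetical:

// Hypothetical monstache middleware: skip documents without a
// current-month name, and map only the fields the search engine needs.
module.exports = function(doc) {
    var current = doc["variables" + doc.currentMonth];
    if (!current || !current.name) {
        return false; // returning false filters the document out of the index
    }
    return {
        companyName : current.name,
        country : doc.country
    };
};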
I've created a document model where I'm using a field named id in addition to MongoDB's auto-generated _id field.
Will this cause any problems for me down the line?
I can imagine a circumstance where something in Mongo might assume my "id" property refers to "_id" when it doesn't (like some API feature with the good intention of saving you from typing that underscore, where in my case such a well-meaning feature would be a disaster).
Will this be ok?
I have a use case in which I want to compare each record of two collections in MongoDB, and after comparing each pair of records I need to find the mismatched fields for every record.
Let us take an example: in collection1 I have a record {id : 1, name : "bks"},
and in collection2 I have a record {id : 1, name : "abc"}.
When I compare these two records, which share the same key, the field name is a mismatch because its values differ.
I am thinking of achieving this with mapReduce in MongoDB, but I am facing problems accessing the other collection inside the map function. When I tried to compare documents in the map function, I got this error: "errmsg" : "exception: ReferenceError: db is not defined near '
Can anyone give me some thoughts on how to compare records using mapreduce?
It might have helped to read the documentation:
When upgrading to MongoDB 2.4, you will need to refactor your code if your map-reduce operations, group commands, or $where operator expressions include any global shell functions or properties that are no longer available, such as db.
So from your error fragment, you appear to be referencing db in order to access another collection. You cannot do that.
If indeed you are intending to "compare" items in one collection to those in another, then there is no approach other than looping code:
db.collection.find().forEach(function(doc) {
    var another = db.anothercollection.findOne({ "_id": doc._id });
    // Code to compare
});
There is simply no concept of "joins" as such available to MongoDB, and operations such as mapReduce or aggregate or others strictly work with one collection only.
The exception is db.eval(), but as per all the strict warnings in the documentation, this is almost always a very bad idea.
Live with your comparison in looping code.
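A minimal sketch of that looping comparison, matching records on the shared id field from the example above (the collection names collection1 and collection2 are assumptions):

// Compare each record of collection1 against its counterpart in
// collection2 and print the mismatched fields.
db.collection1.find().forEach(function(doc) {
    var other = db.collection2.findOne({ id : doc.id });
    if (!other) {
        print("id " + doc.id + ": no matching record in collection2");
        return;
    }
    Object.keys(doc).forEach(function(key) {
        if (key === "_id") return; // auto-generated _ids will always differ
        if (String(doc[key]) !== String(other[key])) {
            print("id " + doc.id + ": mismatch in field '" + key + "': " +
                  doc[key] + " vs " + other[key]);
        }
    });
});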
I use mongo_insert() three times to insert my data into three different collections. The problem is that the "_id" field must be exactly the same in each of the collections, but I do not know how to (ideally) recover and reuse the "_id" generated by my first mongo_insert...
Please advise me how to do it.
Normally you would have a different field, like CustomId, for your private needs, and leave _id for Mongo to generate.
But if you still need it to be exactly the same, there are two variants:
1) Set a custom generated _id on each doc.
2) Save the first doc, then read it back, take its _id and set it on the other docs.
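A minimal sketch of variant 1 in the mongo shell (the question uses mongo_insert() from a driver, but the idea carries over; the collection names are made up):

// Generate one ObjectId up front and reuse it, so _id is identical
// in all three collections.
var sharedId = new ObjectId();
db.first.insertOne({ _id : sharedId, kind : "a" });
db.second.insertOne({ _id : sharedId, kind : "b" });
db.third.insertOne({ _id : sharedId, kind : "c" });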
We are scraping a huge products website.
So we will fetch and persist a great many products, and almost every product has a different set of features/details.
Naturally, we are considering a NoSQL database (MongoDB) for this job. We will create a "products" collection, with a document for each product in which each key/value pair maps to a detail_name/detail_description of the product.
Since products are quite different, we have almost no idea what are the product details/features. In other words, we have no knowledge of the available keys.
According to this link, MongoDB case insensitive key search, having no idea of the available keys is a "gap" for MongoDB.
Is this true? If yes, what are the alternatives?
Your key problem isn't that much of an issue for MongoDB, provided you can live with a slightly different schema and big indexes.
Normally you would do something like:
{
    productId : ..,
    details : {
        detailName1 : detailValue1,
        detailName2 : detailValue2
    }
}
But if you instead structure it like this, you can index the details field:
{
    productId : ..,
    details : [
        { field : detailName1, value : detailValue1 },
        { field : detailName2, value : detailValue2 }
    ]
}
Do note that this will result in a very big index. Not necessarily a problem, but something to be aware of. The index would then be { "details.field" : 1, "details.value" : 1 } (or just { details : 1 } if you're not adding additional fields per detail).
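A small shell sketch of this pattern (the collection name and detail values are made up); note the $elemMatch, which keeps field and value paired within the same array element when querying:

// Multikey index over the key/value pairs in the details array.
db.products.createIndex({ "details.field" : 1, "details.value" : 1 });

db.products.insertOne({
    productId : 1,
    details : [
        { field : "color", value : "red" },
        { field : "weight", value : "2kg" }
    ]
});

// $elemMatch matches field and value inside one array element.
db.products.find({
    details : { $elemMatch : { field : "color", value : "red" } }
});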
Once you've scraped all of the data you could examine it to determine if there is a field/set of fields in the documents that you could add an index to in order to improve performance.