I'm trying to remove portions of data from documents with the fairly simple structure below, which will get much deeper and heavier as the project goes on:
{
    id: "...",
    name: "...",
    phone: "...",
    data: {
        key1: "val1",
        ...
    }
    ...
}
I'm aware that there is no way to update/remove sections of the nested parts other than replacing the whole tree with an updated one.
For example, if I want to delete key1 from the document's data, I need to update the data section with a copy of it that does not contain key1:
document.update({data: new dict without key1})
Is there any easier way of deleting a portion from the root of a document (like the name field) without updating the whole document with a copy of itself that does not contain the name key and value? Do I have to deep-copy and filter the document every time I need to remove some portion of the data?
Below is a query that removes a key from the root of the document:
r.table('foo').get(document_id).replace(r.row.without('key'))
You can also do it for multiple documents as follows:
r.table('foo').filter(condition).replace(r.row.without('key'))
As of the upcoming 1.8 release, you will also be able to do it for nested keys as follows:
r.table('foo').get(document_id).replace(r.row.without({data: { key1: true}}))
Currently, the commands above essentially replace the document, on the server, with a copy of itself without the relevant keys. In the next few releases this will be heavily optimized to minimize document copying in memory (so while it looks like you're replacing the document with a copy of itself without a key, under the hood the operation will be performed destructively, without any copying). Future releases might also update the underlying storage format so that the full document won't have to be written to disk.
If you use the without command, you won't have to do anything to take advantage of these optimizations (other than upgrading the server).
Hope this helps.
I'd like to use the new Atlas Search index feature to search through my models.
It seems to me that the data model I used can't be combined with this MongoDB feature.
It seems to work really well on embedded models, but for consistency reasons I can't nest objects; they are referenced by their id.
Example
Collection Product
{
    name: "Foo product",
    quantity: 3,
    tags: [
        "id_123"
    ]
}
Collection Vendor
{
    name: "Bar vendor",
    address: ...,
    tags: [
        "id_123"
    ]
}
Collection Tags
{
    id: "id_123",
    name: "food"
}
What I want
I want to type food in my search bar, and find the products associated to the tag food.
The problem in detail
I have multiple business objects that are labelled by the same tag. I'd like to build a search index to search through my products, but I would want to $lookup first to denormalize my ids, so that I can find all the products that have the tag "food".
According to the documentation, the $search operator must be the first stage of the aggregation pipeline, which prevents me from doing the lookup before searching. I had the idea of building a view first, to unpack the id with the correct tag and prepare the field, but it is impossible to build a search index on a view.
Is it completely impossible to make this work? Do I need to give up on consistency for my tags by flattening and embedding each of them directly in every model that needs them, in order to use this feature? That would mean that if I want to update a tag, I need to find every business object that carries the tag and perform the update.
I got in touch with MongoDB support, and the Atlas Search team proposed three ways to resolve this problem. I want to share the solutions in case anybody else steps on the same problem I had to go through due to this model design.
Recommended: Transform the model in the Embedded way
The ideal MongoDB way of doing this would be to denormalize the model and not use references to other models. It has some drawbacks, like painful updates: each tag would be embedded directly in the Product and Vendor models, so no $lookup operations are needed anymore. For my part, it is a no-go: the tags are planned to be updatable and will be shared by almost every business object I plan to model.
Collection Product
{
    name: "Foo product",
    quantity: 3,
    tags: [
        "food"
    ]
}
Collection Vendor
{
    name: "Bar vendor",
    address: ...,
    tags: [
        "food"
    ]
}
Not recommended but possible: Break the request into multiple parts
This would mean keeping the existing model, querying the collections individually, and resolving the sequential requests application-side.
We could put an Atlas Search index on the Tags collection and use the search feature to find the id of the tag we want. Then we could use this id to query the Product/Vendor collections directly and find the documents corresponding to the "food" tag. By tinkering with the search application-side, we could obtain satisfying results.
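A hedged sketch of what that two-step flow could look like application-side. Plain arrays stand in here for the real Atlas Search query and find() call, and the collection and field names are my illustrative assumptions:

```javascript
// Simulated collections; in the real setup, step 1 would be an Atlas Search
// query against Tags and step 2 a plain find() on Products.
const tags = [{ id: "id_123", name: "food" }];
const products = [
  { name: "Foo product", quantity: 3, tags: ["id_123"] },
  { name: "Baz product", quantity: 1, tags: ["id_999"] },
];

// Step 1: resolve the search term to tag ids (stand-in for $search on Tags).
function findTagIds(term) {
  return tags.filter(t => t.name === term).map(t => t.id);
}

// Step 2: fetch the products referencing those tag ids (stand-in for find()).
function findProductsByTag(term) {
  const ids = new Set(findTagIds(term));
  return products.filter(p => p.tags.some(id => ids.has(id)));
}

const result = findProductsByTag("food");
```

The price of this approach is the extra round trip and the need to merge/rank results yourself, which is why it is not the recommended way.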
It is not the recommended way of doing it.
Theoretically my preferred way: Use the Materialized View feature
This is an intermediate solution, and the one I will try out. It is not perfect, but from what I can see it tries to reconcile the capabilities of the referenced model and the embedded model.
Atlas Search indexes are not usable on regular views. The workaround that makes this possible is a materialized view (which is, in the end, more a collection than a view). This is done with the $merge operator, which saves the results of an aggregation pipeline into a collection; by re-running the pipeline, we can update the materialized view. The trick is to perform all the $lookup operations required to denormalize the referenced model, and then use $merge as the final stage to create the collection, which can carry an Atlas Search index like any other collection.
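As a sketch, the pipeline driving such a materialized view could look roughly like this (built as a plain object below; the collection names, the `tags`/`id` fields, and the output collection name are my assumptions, not something from the original post). It would be run as `db.products.aggregate(pipeline)` on a schedule:

```javascript
// Denormalize the tag references, then upsert the results into a collection
// that can carry an Atlas Search index.
const pipeline = [
  // Replace each tag id in `tags` with the full tag document.
  { $lookup: {
      from: "tags",
      localField: "tags",
      foreignField: "id",
      as: "resolvedTags",
  } },
  // Keep only the tag names so the search index has plain strings to match.
  { $addFields: { tagNames: "$resolvedTags.name" } },
  { $project: { resolvedTags: 0 } },
  // Write the results into the materialized view collection.
  { $merge: { into: "products_searchable", whenMatched: "replace" } },
];
```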
The only concern is choosing the interval for refreshing the materialized view, which can be performance-greedy. But on paper it is a really good solution for people like me who cannot (won't?) pay the price of a painful update strategy on embedded models.
I was reading the manual references part of the MongoDB Database References documentation, but I don't really understand the part about the "second query to resolve the referenced fields". Could you give me an example of this query, so I can get a better idea of what they are talking about?
"Manual references refers to the practice of including one document’s _id field in another document. The application can then issue a second query to resolve the referenced fields as needed."
The documentation is pretty clear in the manual section you are referring to, which is the section on Database References. The most important part in comprehending this is the opening statement on the page:
"MongoDB does not support joins. In MongoDB some data is denormalized, or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases."
The further information covers the topic of how you might choose to deal with accessing data that you store in another collection.
There is the DBRef specification which, without going into too much detail, may be implemented in some drivers so that when these references are found in your documents, the driver automatically retrieves (expands) the referenced document into the current document. This is implemented "behind the scenes" with another query to that collection for the document with that _id.
In the case of manual references, this basically means that there is merely a field in your document whose content is the ObjectId of another document. It differs from a DBRef in that it will never be processed by a base driver implementation; it leaves any further retrieval of the other document solely up to you.
In the case of:
> db.collection.findOne()
{
    _id: <ObjectId>,
    name: "This",
    something: "Else",
    ref: <AnotherObjectId>
}
The ref field in the document is nothing more than a plain ObjectId and does nothing special. What it allows you to do is submit your own query to get the details of the document it refers to:
> db.othercollection.findOne({ _id: <AnotherObjectId> })
{
    _id: <ObjectId>,
    name: "That",
    something: "I am a sub-document to This!"
}
Keep in mind that all of this is processed on the client side via the driver API; none of this fetching of other documents happens on the server.
I am confused by the term 'link' for connecting documents.
The OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ states that OrientDB uses links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB http://docs.mongodb.org/manual/core/data-modeling-introduction/ documents can be referenced as well, I can't see the difference between linking documents and referencing them.
The goal of document-oriented databases is to reduce "impedance mismatch", the degree to which data must be split up to match some database schema, compared with the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time vary from one database implementation to another.
By contrast, an embedded document is simply the act of storing an object type that somehow relates to a parent type inside the parent. For example, say I have a class as follows:
class User
{
    string Name
    List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
    "name": "John Q Taxpayer",
    "achievements": [
        {
            "name": "High Score",
            "point": 10000
        },
        {
            "name": "Low Score",
            "point": -10000
        }
    ]
}
Whereas a linked document might look something like this:
{
    "name": "John Q Taxpayer",
    "achievements": [
        "somelink1", "somelink2"
    ]
}
Inside an Achievements Collection
{
    "somelink1": {
        "name": "High Score",
        "point": 10000
    },
    "somelink2": {
        "name": "Low Score",
        "point": -10000
    }
}
Keep in mind these are just approximate representations.
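To make the joining step concrete, here is a minimal sketch of resolving the linked achievements application-side (the data and the lookup-by-link mechanics are my illustrative stand-ins; a real engine or driver would perform this with extra queries or disk reads per link):

```javascript
// Linked form: the user stores link keys, achievements live elsewhere.
const achievements = {
  somelink1: { name: "High Score", point: 10000 },
  somelink2: { name: "Low Score", point: -10000 },
};
const user = { name: "John Q Taxpayer", achievements: ["somelink1", "somelink2"] };

// Each link costs an extra lookup; the embedded form would already contain
// the full sub-documents and need none of this.
function resolveUser(u) {
  return { ...u, achievements: u.achievements.map(link => achievements[link]) };
}

const resolved = resolveUser(user);
```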
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of stored data. However, it adds a layer of complexity, requiring the database engine to make multiple disk I/O calls to form the final document returned to user code. An embedded document more closely matches the object in memory; this reduces impedance mismatch and (in theory) the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add that choosing the right database for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each vendor and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and its best uses at MongoDB University. OrientDB does offer training, but it is not free. It might be best to contact them directly and ask about pre-sales training (if you are looking to license the db); usually they will put you in touch with a pre-sales consultant to help you evaluate their product.
MongoDB works like an RDBMS in that the object id is like a foreign key. This means a "JOIN" that is expensive at run time. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.
I want to save changes made to my document. The easiest way to do this is to store the actual changes made to a document. What I mean:
var changes = {
    $set: {
        text: 'Some text.'
    }
}

db.posts.update({
    _id: _id
}, changes)

db.changes.insert({
    postid: _id,
    changes: changes
})
However I'm getting the error (with good reason):
Error: key $set must not start with '$'
What's the easiest way to store changes?
Or perhaps I'm approaching the problem wrong and you have a better solution. I want users to be able to see a log of changes people make to any post or, in fact, anything. I'm not going to write a function for every type of change. Editing the text is just one of many ways to make changes to a document.
Another option, with many reservations, is to store your change log as a JSON string. The content would not be as easily searchable, of course, but you retain the simplicity of storing the original data as a string and decoding the JSON on retrieval. If you are simply storing a change log, this approach might work.
db.changes.insert({
    postid: _id,
    changes: JSON.stringify(changes)
})
This is a limitation of MongoDB. Certain characters are reserved, one of them being $, because of how querying works: when using operators, there would be ambiguity between the document in the collection and the document used for updating.
I would recommend stripping out the $ symbols and using words like these in place of the operators you are trying to use:
CREATE
SET
DELETE
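One way to do that stripping, sketched here under the assumption that a simple prefix swap is acceptable (the `_op_` prefix is my invention; pick any token that cannot clash with your real field names):

```javascript
// Recursively rewrite keys so that reserved MongoDB operator names like
// "$set" become safe to store, e.g. "$set" -> "_op_set". Reverse the
// mapping when you read the change log back.
function escapeOperators(value) {
  if (Array.isArray(value)) return value.map(escapeOperators);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, val] of Object.entries(value)) {
    const safeKey = key.startsWith("$") ? "_op_" + key.slice(1) : key;
    out[safeKey] = escapeOperators(val);
  }
  return out;
}

const changes = { $set: { text: "Some text." } };
const storable = escapeOperators(changes);
// storable is { _op_set: { text: "Some text." } } and can be inserted safely.
```

Unlike the JSON.stringify approach, the escaped form keeps the change log queryable field by field.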
I'm using MongoDB to store user profiles, and now I want to use GridFS to store a picture for each profile.
The two ways I'm considering for linking the two documents are:
A) Store a reference to the file ID in the user's image field:
User:
{
    "_id": ObjectId('[user_id here]'),
    "username": 'myusername',
    "image": ObjectId('[file_id here]')
}
B) Store a reference to the user in the file's metadata:
File metadata:
{
    "user_id": ObjectId('[user_id here]')
}
I know in a lot of ways it's up to me and dependent on the particulars of the app (it'll be mobile, if that helps), but I'm just wondering if there's any universal benefit to doing it one way or the other?
The answer here really depends on your application's usage pattern. My assumption (feel free to correct me) is that the most likely pattern is something like this:
Look Up User --> Find User --> Display Profile (Fetch Picture)
With this generalized use case, under method A you find the user document (to display the profile), which contains the image object id, and you subsequently fetch the file using that id: two basic operations and you are done.
Note: I am treating the actual fetching of the file from GridFS as a single logical operation; in reality there are multiple operations involved, but most drivers/APIs obscure this anyway.
With method B, you have to find the user document, then run another query to find the relevant user_id in the file metadata collection, and then fetch the file. By my count that is three operations (an extra find you do not have with method A).
Does that make sense?
Of course, if my assumption is incorrect and your application is (for example) image driven, then your query pattern may come up with a different answer.
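The two access patterns above can be sketched with in-memory maps standing in for the collections (all names are illustrative assumptions; each lookup below stands for one round trip to the database):

```javascript
// Stand-ins for the collections involved.
const users = { u1: { _id: "u1", username: "myusername", image: "f1" } };
const filesById = { f1: { _id: "f1", data: "<picture bytes>" } };
const fileMetaByUser = { u1: { user_id: "u1", file_id: "f1" } };

// Method A: the user document carries the file id -> 2 operations.
function fetchPictureA(userId) {
  const user = users[userId];            // op 1: find user
  return filesById[user.image];          // op 2: fetch file
}

// Method B: the file metadata carries the user id -> 3 operations.
function fetchPictureB(userId) {
  const user = users[userId];            // op 1: find user
  const meta = fileMetaByUser[user._id]; // op 2: find metadata by user_id
  return filesById[meta.file_id];        // op 3: fetch file
}
```

Both return the same file; the difference is purely the number of round trips on the profile-display path.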