I want to add a new object to an existing MongoDB document that I don't control, and I don't want to break the vendor's application. I've tried this in test code and it works fine, but I wanted some confirmation.
I'm using a REST API to drive a commercial product, and under the hood the application uses MongoDB to persist its data. I can add new, arbitrary fields/objects to the JSON messages and they're persisted into Mongo as expected. Am I right that as long as my naming is different from existing/new vendor fields, the vendor's application should just keep working, ignoring my new data?
Bonus points if there's an article covering this that I can reference.
MongoDB does not enforce a fixed schema, so each document in a collection can have its own structure. With the newer WiredTiger storage engine there is even document-level locking. So adding new fields or documents to an existing collection should not matter in most cases. However, if you are going to query on that new data and it is not indexed, then read times will be high.
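For example, documents of different shapes can sit side by side in the same collection, and a query that only touches the vendor's own fields never sees your extra one (collection and field names below are made up for illustration):

```js
// Two documents with different shapes in the same collection:
db.orders.insertOne({ sku: "A-100", qty: 2 });                             // vendor-shaped
db.orders.insertOne({ sku: "A-101", qty: 1, myExtension: { note: "x" } }); // with an extra field

// A query that only uses the vendor's fields simply ignores the extra one:
db.orders.find({ sku: "A-101" });
```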
I'm currently experimenting with a test collection on a LAN-accessible MongoDB server, and with displaying its data in a Meteor (v1.6) application. My view layer of choice is React, and right now I'm using createContainer to bind the subscriptions to props.
The data that gets put in the MongoDB storage is updated on a daily basis and consists of a big set of data from several SQL databases, netting up to about 60000 lines of JSON per day. The data has been ever-so-slightly reshaped to be turned into a usable format whilst remaining as RAW as I'd like it to be.
The working solution right now is fetching all this data and doing further manipulations client-side to prepare the data for visualization. The issue should seem obvious: each client is fetching a set of documents that grows every day and repeats a lot of work on earlier entries before being ready to display. I want to do this manipulation on the server, through MongoDB's Aggregation Framework.
My initial idea is to do the aggregations on the server and to create new Collections containing smaller, more specific datasets without compromising the RAWness of the original Collection. That would mean the "reduced" Collections can still be reactive, as I've been able to confirm through testing in a Remote Desktop, subscribing to an aggregated Collection which I can update through Robo3T.
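To make the idea concrete, this is roughly the kind of server-side job I have in mind; the collection, field names and the dailyTotals output name are all placeholders, and the pipeline itself is just an example:

```js
// Server-side sketch (Meteor): run the aggregation through the raw Node driver
// and write the reduced result into its own collection with $out.
import { Meteor } from 'meteor/meteor';
import { RawEntries } from '/imports/api/rawEntries';  // hypothetical collection

Meteor.methods({
  async 'aggregates.rebuildDailyTotals'() {
    await RawEntries.rawCollection().aggregate([
      { $group: { _id: '$day', total: { $sum: '$value' } } },
      { $out: 'dailyTotals' }   // replaces the reduced collection on each run
    ]).toArray();               // the cursor has to be consumed for $out to execute
  }
});
```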
I don't know if this would be ideal. As far as storage goes, there's plenty of room for the extra Collections. But I have no idea how to set up an automated aggregation script on said server. And regarding Meteor, I've tried using meteorhacks:aggregate and jcbernack:reactive-aggregate, but couldn't figure out how to deal with either of them. If anyone is dealing, or has dealt, with something similar, I'd love to hear ideas/suggestions.
I am using MongoDB 2.6 standard. I have successfully created multiple collections, inserted data, queried data, etc. Works fine inside of the Mongo shell, as well as from NodeJS apps using MongoSkin. So far, so good. Now I need to use a third-party tool (D2RQ) to access the data. It appears that D2RQ uses the _schema collection to obtain collection names, column names, data types, and so on. D2RQ works for three of the collections because the collections are in _schema in MongoDB. A fourth collection is not in _schema and seems to be invisible. However, the fourth collection is present in MongoDB. The collection has data. I can query the collection in the Mongo shell, and from NodeJS using Mongoskin. Any idea why the collection is not appearing in _schema? Is this a MongoDB bug?
This is not a MongoDB bug. The root cause of the problem is that D2RQ uses the UnityJDBC driver to access MongoDB. There is a parameter on the JDBC connection string indicating whether or not to rebuild _schema. D2RQ does not properly pass the parameter when making the JDBC connection to MongoDB, resulting in the _schema collection being out of date on all calls after the first call. The solution has two parts:
Part one was to write a tiny NodeJS application that does nothing but force the _schema rebuild on connection. That solved my immediate issue.
Part two was to extend that tiny NodeJS application into a full-featured export process that generates an RDF file from MongoDB. This allowed me to remove both D2RQ and the UnityJDBC driver from the solution stack.
"The most reliable component in the architecture is the one that is not present"
Generally speaking, I want to know the best practices for querying (and therefore indexing) schemaless data structures, i.e. documents.
Let's say I use MongoDB to store and query deterministic data structures in a collection. At this point all documents have the same structure, so I can easily create indexes for any query in my app, since I know each document has the required field(s) for the index.
What happens after I change the structure and try to save new documents to the DB? Let's say I merged the two fields FirstName and LastName into FullName. As a result the collection contains nondeterministic data. I see two problems here:
Old indexes cannot cover the new data, so new indexes are needed that handle both the old and the new fields
The app has to deal with two representations of the documents
This can become a big problem when there are many changes to the DB, resulting in many versions of the document structure.
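To make the problem concrete (collection and field names are made up):

```js
// Old and new document shapes side by side:
db.people.insertOne({ FirstName: "Ada", LastName: "Lovelace" });  // old shape
db.people.insertOne({ FullName: "Alan Turing" });                 // new shape

// The old index says nothing about FullName, so a name lookup now has to
// cover both shapes and needs a second index to stay fully indexed:
db.people.createIndex({ FirstName: 1, LastName: 1 });
db.people.createIndex({ FullName: 1 });
db.people.find({
  $or: [
    { FullName: "Ada Lovelace" },
    { FirstName: "Ada", LastName: "Lovelace" }
  ]
});
```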
I see two main approaches:
Lazy migration. Each document is migrated on demand (i.e. only when it is loaded from the collection) to the final structure and then stored back to the collection. This approach doesn't actually solve the problem, because it concedes nondeterminism at any point in time.
Forced migration. This is the same approach as for RDBMS migrations. The migration is performed for all documents at one point in time while the app is not running. The main con is app downtime.
So the question: Is there any good way of solving the problem, especially without app downtime?
If you can't have downtime then the only choice is to do the migrations "on the fly":
Change the application so that the new field is created when documents are saved, while still reading from the old fields.
Update your collection with a script/queries to add the new field to the existing documents (a rough sketch of this is shown after the list).
Create new indexes on that field.
Change the application so that it reads from the new fields.
Drop the unnecessary indexes and remove the old fields from the documents.
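For the FirstName/LastName to FullName example, steps 2 and 3 could look roughly like this in the mongo shell (exact commands depend on your shell/driver version; names are made up):

```js
// Step 2: backfill the new field on documents that don't have it yet.
db.people.find({ FullName: { $exists: false } }).forEach(function (doc) {
  db.people.updateOne(
    { _id: doc._id },
    { $set: { FullName: doc.FirstName + " " + doc.LastName } }
  );
});

// Step 3: create the index for the new field.
db.people.createIndex({ FullName: 1 });
```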
Changing the schema on a live database is never an easy process, no matter what database you use. It always requires some forward thinking and careful planning.
is indexing a pain?
Indexing is not a pain, but premature optimization is. You should always test and check that you actually need indexes before adding them and when you have them, check that they are being properly used.
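For example, explain() shows whether a query actually uses an index (collection and field names are made up):

```js
db.people.find({ FullName: "Ada Lovelace" }).explain("executionStats");
// Check the winning plan (IXSCAN vs. COLLSCAN) and compare the number of
// documents examined with the number returned.
```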
If you're worried about performance issues on a live system when creating indexes, then you should consider having replica sets and doing rolling maintenance (in short: taking secondaries down from replication, creating indexes on them, bringing them back into replication and then repeating the process for all the subsequent replica set members).
Edit
What I was describing is basically a process of migrating your schema to a new one while temporarily supporting both versions of the documents.
In step 1, you're basically adding support for multiple versions of documents. You're updating existing documents, i.e. creating new fields, while still reading data from the previous version's fields. Step 2 is optional, because you can gradually update your documents as they are being saved.
In step 4 you're removing the support for the previous versions from your application code and migrating to a new version. Finally, in step 5 you're removing the previous version fields from your actual MongoDB documents.
I have some data in MongoDB GridFS. I am using the Spring Data GridFsOperations class to do my GridFS read/writes.
I have a requirement to replace the content of an existing GridFS file i.e. the _id and filename should stay the same, but the file content should be updated.
The Spring Data GridFsOperations API primarily allows find, which returns a Mongo GridFSDBFile, and store. The GridFSDBFile API does not allow updating content. The store method could in theory be used if the file was deleted first and then stored with the same _id as the previous file; however, store does not allow specifying the _id field.
The only solution I have found so far is to use the Mongo API directly to delete the existing file, and store a new one with the same _id. Answers to this effect are not useful: the question is specific to Spring Data MongoDB.
The reason there's no API exposed yet is that there's no support for that in MongoDB GridFS itself. You essentially work around this issue by implementing a pattern like the one described here. But as this boils down to a non-atomic operation, we decided not to expose it as an operation in the first place.
In case you think there's a reliable implementation of this pattern (plus the appropriate handling of error cases), feel free to open a ticket in our JIRA to discuss options.
In the project I've been working on, I've added a new field: an embedded document. Will this addition affect data from before the change? From what I've read it shouldn't, and this is actually one of the benefits of using MongoDB; it's just that the previous data won't have that field.
Thanks!
No, it won't affect your previous documents. Each document in your collection can have its own unique fields if you want; it is up to you to handle this at the application level.
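For example (collection and field names are made up):

```js
// Only fetch documents that already have the new embedded field:
db.users.find({ address: { $exists: true } });

// Or tolerate its absence in application code:
db.users.find().forEach(function (u) {
  var address = u.address || {};   // older documents fall back to a default
  // ... use address ...
});
```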