MongoDB schema design: add a collection or extend an existing one?

There's a collection with 100,000 documents. Only 10 of them need an additional property that is unnecessary for the other documents (e.g. a list of departments where only the top ones have a 'Location' property).
As far as I understand, both approaches should work just fine, but which one is preferable, given that this is a NoSQL database:
1) Add one more collection whose documents have two properties: DepartmentId and Location.
2) Add the property 'Location' only to the selected documents, so the others won't have it.
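For illustration, a minimal sketch of the second option with the MongoDB C# driver (the Department class and its fields are assumptions, not from the question):

    using MongoDB.Bson.Serialization.Attributes;

    public class Department
    {
        public string Id { get; set; }
        public string Name { get; set; }

        // Only the handful of top departments set this; [BsonIgnoreIfNull]
        // keeps the field out of the stored document for everyone else.
        [BsonIgnoreIfNull]
        public string Location { get; set; }
    }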

The problem you are facing is well known; you have the same issue with source code, for example.
When you update a piece of code, do you save it as User.js, User2.js, User3.js ... ?
Or do you use a versioning system like git and keep a single User.js?
Translating the git analogy to your issue: you should update the current data.
In MongoDB you actually have two choices for performing the update:
1) Update the model in your code, and update every entry in the database to match the new model (a sketch of this follows below).
2) Create a new model that applies to new entries, and keep the old model around to handle old-format data.
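A rough sketch of the first choice with the MongoDB C# driver (the connection string, database, collection, and field names are all assumptions):

    using MongoDB.Bson;
    using MongoDB.Driver;

    var client = new MongoClient("mongodb://localhost:27017");
    var departments = client.GetDatabase("hr")
                            .GetCollection<BsonDocument>("departments");

    // Choice 1: backfill every existing document so it matches the new
    // model; here the new field simply gets an explicit null value.
    var filter = Builders<BsonDocument>.Filter.Exists("Location", false);
    var update = Builders<BsonDocument>.Update
                     .Set("Location", (BsonValue)BsonNull.Value);
    departments.UpdateMany(filter, update);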
See also: Use more than one schema per collection on MongoDB

Related

Preventing duplicates in MongoDB using Spring Data (Spring Roo)

I have been trying to get my head wrapped around MongoDB, as it's used by Spring, so I decided to start a little project in Spring Roo.
In my project, I am storing my User Login data in MongoDB. The trouble is that the registration process, which creates a new User object and stores it in MongoDB, has a tendency to create duplicates despite the fact that I have @Unique on the loginId field.
Now, I know part of the problem is that I am thinking about things from a JPA/RDBMS perspective, and MongoDB is not a relational DB and thus has a different set of parameters to operate with, but I'm having trouble finding guidance in anything more than VERY simple sample code.
First, what Spring/other annotations are available, and more importantly, commonly used when dealing with MongoDB from the Spring world? Second, when dealing with documents that need to be "uniqued", how does one typically do this? Do you first search on the unique field to ensure it's not already there, then do the insert? Third, in JPA-land, I could use the annotations @PrePersist and @PreUpdate to do last-minute data manipulation, like MD5-hashing passwords that have been updated or adding/updating a "Last Modified" date just prior to storing. I know these are JPA-isms, but can I still use them, and if not, is there an alternative for use with Spring Data/MongoDB?
I ended up using the @Id annotation on my entities, which indicates which field is used as the id field. As long as the field is unique, writing subsequent updates will properly replace the existing entity instead of adding a new one.
I ended up creating an additional method that checks whether a record with the same value as the one being entered already exists.
If it does, I return a failure mentioning that a duplicate value exists; otherwise the newly entered value is saved.
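The question is Spring/Java, but the fix is at the database level either way: with a unique index, MongoDB itself rejects duplicates instead of relying on a check-then-insert in application code, which leaves a race window. A sketch of the same idea using the MongoDB C# driver that appears elsewhere on this page; all names here are assumptions:

    using MongoDB.Driver;

    // A unique index makes the server itself reject a second document
    // with the same LoginId, closing the check-then-insert race window.
    var client = new MongoClient("mongodb://localhost:27017");
    var users = client.GetDatabase("auth").GetCollection<User>("users");
    var keys = Builders<User>.IndexKeys.Ascending(u => u.LoginId);
    users.Indexes.CreateOne(new CreateIndexModel<User>(
        keys, new CreateIndexOptions { Unique = true }));

    public class User
    {
        public string Id { get; set; }
        public string LoginId { get; set; }
    }

In Spring Data MongoDB the same index is typically declared with @Indexed(unique = true) on the field; either way, a duplicate insert then fails at the server rather than silently creating a second document.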

How do I tell EF to ignore columns from my database?

I'm building my EF (v4.0.30319) data model from my SQL Server database. Each table has Created and Updated fields populated via database triggers.
This database is the backend of an ASP.NET Web API application, and I recently discovered a problem. The problem is that since the Created and Updated fields are not populated in the model passed into the API endpoint, they are written to the database as NULL. This overwrites any values already in the database for those properties.
I discovered I can edit the EF data model and just delete those two columns from the entity. It works and the datetimes are not overwritten with NULL. But this leads to another, less serious but more annoying, problem... my data model has a bunch of tables that contain these properties and all the tables need to be updated by removing these two columns.
Is there a way to tell EF to ignore certain columns in entities across the entire data model without manually deleting them?
As far as I know, no. Generating the model from the database is going to always create all of the fields from the database table. Well, as long as it has a primary key it will.
It is possible to only update the fields that you want i.e. don't include the "Created" and "Updated" fields in your create and update methods. I'd have to say though, that I'd think it'd be better if those fields didn't even exist on the model at that point. You may at one point see those fields on the model and not remember that they won't get persisted to the DB.
You may want to look into just inserting the datetimes into those fields when you call your create() and update() methods. Then you could just ditch the triggers. You'd obviously want to use a class library for all of your database operations so this functionality would be in one place. To keep it nice and neat, you know?
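A minimal sketch of that suggestion, assuming a generated EF4 ObjectContext named ShopEntities and a Product entity with nullable Created/Updated columns (both names are hypothetical):

    using System;
    using System.Data;

    public class ProductRepository
    {
        // Set the audit fields in code rather than via database triggers,
        // so a full-entity update from the API can't overwrite them with NULL.
        public void Update(Product product)
        {
            product.Updated = DateTime.UtcNow;
            if (product.Created == null)
                product.Created = DateTime.UtcNow;

            using (var ctx = new ShopEntities())
            {
                ctx.Products.Attach(product);
                ctx.ObjectStateManager.ChangeObjectState(product, EntityState.Modified);
                ctx.SaveChanges();
            }
        }
    }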

Managing changes in class structure to be consistent with mongodb collection

We are using MongoDB with C#. We are trying to figure out a way to keep our collections consistent seamlessly. Right now, if a developer makes any change to the class structure (adds a field, changes a data type, or changes a property within a nested class), he/she has to change the Mongo collection manually.
It's a pain, as our project is growing and the number of developers working on the project keeps increasing. I was wondering whether someone has already figured out a way to manage this issue.
Research
I found a similar question; however, I couldn't find a solution there.
I found a way to enumerate all properties (Finding the properties); however, data types and nested documents remain an issue.
If you want to migrate gradually as records are accessed, you need to follow a few simple rules (see the sketch after this list):
1) If you add a field, it had better be nullable or have a default value specified.
2) Never rename fields, never change field types.
- Instead, always add new fields and add migration code; remove the old fields only when all documents have been migrated over.
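A sketch of those rules with the MongoDB C# driver; the Employee class and the int-to-decimal salary change are invented for illustration:

    using MongoDB.Bson.Serialization.Attributes;

    public class Employee
    {
        public string Id { get; set; }

        // Rule 1: a newly added field must tolerate documents that don't
        // have it yet, so it is nullable and omitted when unset.
        [BsonIgnoreIfNull]
        public string Department { get; set; }

        // Rule 2: never change a field's type in place. Keep the old int
        // field and introduce a new, differently named decimal field.
        [BsonElement("salary")]
        public int? SalaryOld { get; set; }

        [BsonIgnoreIfNull]
        [BsonElement("salaryDecimal")]
        public decimal? Salary { get; set; }
    }

    public static class EmployeeMigration
    {
        // Migration code: prefer the new field and fall back to the old
        // one; a background job can rewrite old documents over time, and
        // the old field is dropped only once every document has migrated.
        public static decimal? EffectiveSalary(Employee e) =>
            e.Salary ?? e.SalaryOld;
    }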
For prototyping with MongoDB and C# I built a dynamic wrapper ... that lets you specify your objects using only interfaces (no classes needed), and it lets you dynamically add new interfaces to an existing object. It's not ready for production use, but for prototyping it saves a lot of effort and makes migration really easy.

DDD: Persisting aggregates

Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order Aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire Aggregate via the Repository, add a new item to the Order object, and persist the entire Aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? Just to add a new OrderItem, not only do we have to read 10 OrderItems, but we also have to re-insert all 10 of them again. (This is the approach that Jimmy Nilsson takes in his DDD book: every time he wants to persist an Aggregate, he clears all the child objects and then re-inserts them. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest applying the Unit of Work pattern at the Aggregate Root, so it keeps track of what has been changed and only commits those changes. But this violates the Persistence Ignorance (PI) principle, because persistence logic leaks into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
e.g.
You could load the order entity and add items to the Details collection w/o actually materializing all of the other entities in that list.
I think NHibernate supports this.
If you are writing your own entity persistence code w/o any ORM, then you are pretty much out of luck; you would have to re-implement the same dirty-tracking machinery that OR mappers give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of their aggregates. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100,000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their values summed up. Order can maintain a pre-calculated sum of the existing items, which is updated when new ones are added. This way, checking the business rule requires only the Order data to be loaded when adding new items.
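A sketch of that idea in C#; the $100,000 rule comes from the paragraph above, everything else is an assumption:

    using System;

    public class Order
    {
        private const decimal MaxOrderValue = 100000m;

        // Pre-calculated sum persisted with the Order itself, so the rule
        // can be checked without loading the existing OrderItems.
        public decimal TotalValue { get; private set; }

        public void AddItem(decimal unitPrice, int quantity)
        {
            var newTotal = TotalValue + unitPrice * quantity;
            if (newTotal > MaxOrderValue)
                throw new InvalidOperationException(
                    "Order may not exceed $100,000 for this customer.");

            TotalValue = newTotal;
            // ...create and track the new OrderItem here...
        }
    }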
I'm not 100% sure about this approach, but I think applying the Unit of Work pattern could be the answer. Keeping in mind that any transaction should be done in application or domain services, you could populate the unit of work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard in some cases).
Here is a description of the Unit of Work pattern:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
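For illustration, the minimal shape such a Unit of Work tends to take (a sketch, not a full implementation):

    public interface IUnitOfWork
    {
        // Track everything touched during the business transaction...
        void RegisterNew(object entity);
        void RegisterDirty(object entity);
        void RegisterDeleted(object entity);

        // ...then work out, in one place, what the database needs.
        void Commit();
    }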

Row insertion order entity framework

I'm using a transaction to insert multiple rows into multiple tables, and I would like those rows to be inserted in order. Upon calling SaveChanges, all the rows are inserted out of order.
When I don't use a transaction and save changes after each insertion, the order is kept, but I really need a transaction for all entries.
The order in which inserts, updates, and deletes are made by the Entity Framework depends on many things inside the Entity Framework.
For example if you insert a new Product in a new Category we have to add the Category before the Product.
This means that if you have a large set of changes there are local ordering constraints that we must satisfy first, and indeed this is what we do.
The order that you do things on the context can be in conflict with these rules. For example if you do this:
ctx.AddToProducts(
    new Product {
        Name = "Bovril",
        Category = new Category { Name = "Food" }
    }
);
the effect is that the Product is added (to the context) first and then when we walk the graph we add the Category too.
i.e. the insert order into the context is:
Product
Category
but because of referential integrity constraints we must re-order like this before attempting to insert into the database:
Category
Product
So this kind of local re-ordering is kind of non-negotiable.
However, if there are no local dependencies like this, you could in theory preserve ordering. Unfortunately we don't currently track 'when' something was added to the context, and for efficiency reasons we don't track entities in order-preserving structures like lists. As a result we can't currently preserve the order of unrelated inserts.
However we were debating this just recently, so I am keen to see just how vital this is to you?
Hope this helps
Alex
Program Manager Entity Framework Team
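Not part of Alex's answer, but a common workaround given this behaviour: call SaveChanges after each insert inside an ambient TransactionScope, so each row's INSERT position is pinned while the whole batch still commits atomically. A sketch, assuming a generated EF4 context named ShopEntities and a pre-ordered rowsInDesiredOrder sequence (both hypothetical):

    using System.Data.Objects;      // SaveOptions (EF4)
    using System.Transactions;

    using (var scope = new TransactionScope())
    using (var ctx = new ShopEntities())
    {
        foreach (var row in rowsInDesiredOrder)
        {
            ctx.AddToProducts(row);
            // Flushing after each Add pins this row's INSERT position,
            // while the ambient transaction keeps the batch atomic.
            ctx.SaveChanges(SaveOptions.DetectChangesBeforeSave);
        }

        scope.Complete();           // commit everything at once
        ctx.AcceptAllChanges();     // only now mark entities as unchanged
    }

As the answer above notes, related entities will still be reordered to satisfy referential integrity; this only pins the order of unrelated inserts.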
I'm in the process of crossing this bridge. I'm replacing NHibernate with EF, and the issue that I'm running across is how lists are inserted into the DB. If I add items to a list like so (in pseudo code):

    list.Add(testObject);
    list.Add(testObject1);

I'm not currently guaranteed to get the same order when 'SaveChanges' is run. That's a shame, because my list object (i.e. a linked list) knows the order it was created in. Objects that are chained together using references MUST be saved to the DB in the same order. Not sure why you mentioned that you're "debating" this. =)