JavaScript function to save new fields from existing fields (migration) in a MongoDB collection

We have a design change in our application to accommodate a few new requirements. The change forces us to migrate one of our MongoDB collections: instead of having only individual fields, we have to create a derived JSON field from the existing fields.
The migration will be triggered by an end user performing an action in the UI (such as saving a change), but that single action might update a few thousand documents. So we would like to write JavaScript code that is executed on the server side, so that we can avoid loading that many records into the application.
The issue we are running into is that we cannot call the JavaScript function using eval, because the collection is sharded. The other option we cannot consider is to un-shard the collection, as the migration has to happen on a live system.
Please help us if you know of any alternative approach.
Example migration: the ExampleDoc collection has fields a1, a2, b1 and b2. The migration will create a new field called fieldJSON: { a: "", b: "" }, where a and b are derived from the existing fields a1, a2, b1 and b2.
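For concreteness, a document before and after the migration might look like the following (the actual derivation of a and b is not spelled out in the question, so plain concatenation is used here purely as a placeholder):

// Before migration (illustrative values only)
{ _id: 1, a1: "foo", a2: "bar", b1: "baz", b2: "qux" }

// After migration, with the derived field added (placeholder derivation rule)
{ _id: 1, a1: "foo", a2: "bar", b1: "baz", b2: "qux", fieldJSON: { a: "foobar", b: "bazqux" } }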

OK, now I understand that:
you want to create a new field in the same collection, which is sharded;
the content of this new field is generated from existing fields;
you don't want to fetch those existing fields from the database into the application, presumably because of the large volume;
you can't invoke the eval database command because the collection is sharded;
you can't un-shard the current collection.
Then, is it possible to fulfill the intent through mapReduce?
Query exactly the documents you want to update;
map, reduce, and then overwrite the original documents of this collection by specifying output parameters such as {out: {merge: <collectionName>, sharded: true}} (a rough sketch follows below).
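A minimal sketch of that idea, to be run from the mongo shell on a server version that supports sharded mapReduce output. It assumes the ExampleDoc collection and the a1/a2/b1/b2 fields from the question; the derivation of a and b (simple concatenation here) is only a placeholder. One caveat: mapReduce always writes its results as {_id, value} documents, so merging back into the same collection reshapes the migrated documents; writing to a staging collection and merging from there may be the safer variant.

// Map: build the derived field and re-emit the whole document so no data is lost.
var mapFn = function () {
    var doc = this;
    doc.fieldJSON = {
        a: (this.a1 || "") + (this.a2 || ""),   // placeholder derivation rule
        b: (this.b1 || "") + (this.b2 || "")    // placeholder derivation rule
    };
    emit(this._id, doc);
};

// Reduce: _id is unique, so reduce is effectively never called; return the single value defensively.
var reduceFn = function (key, values) {
    return values[0];
};

db.ExampleDoc.mapReduce(mapFn, reduceFn, {
    query: { fieldJSON: { $exists: false } },     // only documents not yet migrated
    out: { merge: "ExampleDoc", sharded: true }   // write back into the sharded collection
});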

Related

Inserting into MongoDB using Spring Data inserts into two Collections

I'm creating a Spring Boot application that is inserting objects into a MongoDB database. The problem is that when I'm doing the insert it is creating records in two collections. Basically I have a POJO A and POJO B that extends A. When the insert happens it creates documents in both collection A and collection B for the same ObjectId. My expectation was that it would only create documents in collection B for POJO B as that is what is being passed into the repository.insert method. What am I doing wrong? I can provide configurations and version numbers if needed. Also, I am using Groovy if that makes a difference.
The answer turned out to be specific to my application, so the issue was one that I introduced. I'm using Spring Batch with the MongoReader and MongoWriter. I'm reading from collection "a" and expected the writer to write back to collection "a", but inside the processor I convert the object to POJO B, and because I'm not specifying a collection, the MongoWriter determines the collection from the class and creates collection "b". Adding the same @Document annotation to both classes resolved the issue.

MongoDB schema design - add a collection or extend an existing one

There's a collection with 100,000 documents. Only 10 of them need an additional property that is not necessary for the other documents (e.g. a list of departments where only the top ones have a 'Location' property).
As far as I understand, both approaches should work just fine, but which one is preferable, given that this is a NoSQL database:
add one more collection whose documents have two properties, DepartmentId and Location;
add the 'Location' property to only the selected documents, so the others won't have it.
The problem you are facing is well known; you have the same issue with source code, for example.
When you are updating a piece of code, do you save it as User.js, User2.js, User3.js ... ?
Or do you use a versioning system like git and keep a single User.js?
Translating the git analogy to your issue, you should update the current data.
In MongoDB you actually have two choices for performing the update:
update the model in your code, and update every entry in the database to match the new model;
create a new model that applies to new entries, and keep the old model to handle the old-format data (both options are sketched below).
See also: use more than one schema per collection on MongoDB
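A rough mongo shell sketch of the two options; the 'departments' collection name is assumed, since the question doesn't give one.

// Option 1: migrate every document to the new model, e.g. by giving the
// documents that lack 'Location' an explicit default value.
db.departments.updateMany(
    { Location: { $exists: false } },
    { $set: { Location: null } }
);

// Option 2: leave old documents untouched and let the application handle
// both shapes; the few documents that do have a 'Location' can still be
// queried selectively.
db.departments.find({ Location: { $exists: true } });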

Data Mapper pattern implementation with Zend

I am implementing the Data Mapper pattern in my Zend Framework 1.12 project and it is working as expected. To enhance it further, I want to optimize it in the following way.
When fetching data, what if I want to fetch only 3 of the 10 fields in my model table? The current issue is that if I fetch only the required values, the other values in the domain object class remain blank, and when saving that data I save the whole model object, not just the individual field values.
Can anyone suggest an efficient way of doing this, so that I can fetch/update only the required values and don't need to fetch all the fields just to update the record?
If a property is NULL, ignore it when crafting the update? If NULLs are valid values, then I think you would need to track loaded/dirty state per property.
How do you go about white-listing the fields to retrieve when making the call to the mapper? If you can persist that information, I think it would make sense to leverage it when crafting the update (a rough sketch of the dirty-tracking idea follows below).
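The question is about Zend Framework (PHP), but the dirty-tracking idea itself is language-agnostic; here is a rough sketch in JavaScript (to match the rest of this page), with entirely made-up names.

// Hypothetical sketch of per-property dirty tracking; all names are invented.
class PartialEntity {
    constructor(loadedFields) {
        this._values = Object.assign({}, loadedFields); // only the fields actually fetched
        this._dirty = new Set();                        // fields changed since load
    }
    set(field, value) {
        this._values[field] = value;
        this._dirty.add(field);
    }
    // Build an update containing only the modified fields, so untouched
    // (or never-loaded) fields are left alone.
    toUpdate() {
        var update = {};
        this._dirty.forEach(f => { update[f] = this._values[f]; });
        return update;
    }
}

// Usage: fetch three of the ten fields, change one, persist only that one.
var user = new PartialEntity({ id: 7, name: "Ann", email: "ann@example.com" });
user.set("email", "ann@new.example.com");
user.toUpdate();   // => { email: "ann@new.example.com" }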
I don't typically go down this path. I will lazy-load certain fields on a model when it makes sense, but I don't allow loading parts of the object like this; instead, I create an alternate object for rendering a list when loading the full object is too resource-intensive: a generic dummy list object that I just use with tabular data, populated from SQL or stored-procedure result sets, usually with my generic table mapper.

How do I tell EF to ignore columns from my database?

I'm building my EF (v4.0.30319) data model from my SQL Server database. Each table has Created and Updated fields populated via database triggers.
This database is the backend of an ASP.NET Web API application, and I recently discovered a problem: since the Created and Updated fields are not populated in the model passed into the API endpoint, they are written to the database as NULL, which overwrites any values already in the database for those properties.
I discovered I can edit the EF data model and just delete those two columns from the entity. It works and the datetimes are not overwritten with NULL. But this leads to another, less serious but more annoying, problem... my data model has a bunch of tables that contain these properties and all the tables need to be updated by removing these two columns.
Is there a way to tell EF to ignore certain columns in entities across the entire data model without manually deleting them?
As far as I know, no. Generating the model from the database is going to always create all of the fields from the database table. Well, as long as it has a primary key it will.
It is possible to only update the fields that you want i.e. don't include the "Created" and "Updated" fields in your create and update methods. I'd have to say though, that I'd think it'd be better if those fields didn't even exist on the model at that point. You may at one point see those fields on the model and not remember that they won't get persisted to the DB.
You may want to look into just inserting the datetimes into those fields when you call your create() and update() methods. Then you could just ditch the triggers. You'd obviously want to use a class library for all of your database operations so this functionality would be in one place. To keep it nice and neat, you know?

Row insertion order in Entity Framework

I'm using a transaction to insert multiple rows into multiple tables, and I would like those rows to be added in order. Upon calling SaveChanges, all the rows are inserted out of order.
Not using a transaction and saving changes after each insertion does keep the order, but I really need a transaction for all the entries.
The order in which inserts, updates and deletes are made by the Entity Framework depends on many things inside the Entity Framework.
For example, if you insert a new Product in a new Category, we have to add the Category before the Product.
This means that if you have a large set of changes there are local ordering constraints that we must satisfy first, and indeed this is what we do.
The order that you do things on the context can be in conflict with these rules. For example if you do this:
// Add a new Product whose Category is also new; both end up tracked by the context.
ctx.AddToProducts(
    new Product
    {
        Name = "Bovril",
        Category = new Category { Name = "Food" }
    }
);
the effect is that the Product is added (to the context) first and then when we walk the graph we add the Category too.
i.e. the insert order into the context is:
Product
Category
but because of referential integrity constraints we must re-order like this before attempting to insert into the database:
Category
Product
So this kind of local re-ordering is essentially non-negotiable.
However, if there are no local dependencies like this, you could in theory preserve ordering. Unfortunately we don't currently track 'when' something was added to the context, and for efficiency reasons we don't track entities in order-preserving structures like lists. As a result we can't currently preserve the order of unrelated inserts.
However, we were debating this just recently, so I am keen to see just how vital this is to you.
Hope this helps
Alex
Program Manager Entity Framework Team
I'm in the process of crossing this bridge. I'm replacing NHibernate with EF, and the issue I'm running across is how lists are inserted into the DB. If I add items to a list like so (in pseudo-code):
list.Add(testObject);
list.Add(testObject1);
I'm not currently guaranteed to get the same order when 'SaveChanges' is run. That's a shame, because my list object (i.e. a linked list) knows the order it was created in. Objects that are chained together using references MUST be saved to the DB in the same order. Not sure why you mentioned that you're "debating" this. =)