Inserting and updating disconnected entities in EF code first - entity-framework

I am retrieving data about books from lots of different sources such as XML and web services, which I then store in the database using EF Code First 6 via a generic repository and, obviously, a DbContext.
The problem is that performance is very bad.
I have the following (fictional but analogous) POCO in my Model
public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public bool Deleted { get; set; } //soft-delete flag used below
}
also
public class BookDataSource
{
    public int Id { get; set; }
    public virtual List<Book> Books { get; set; }
}
So I retrieve the book data from some source and construct the above book object.
I then need to check whether the book already exists in the DB and update it if it does or insert it if it does not. I also then need to soft delete any books that no longer exist on the data source.
//The following method takes the data source (type: IBookDataSource) to update from as the parameter
public string UpdateBooks(IBookDataSource dataSource)
{
    string successMessage = "";
    //Disconnected entities
    List<Book> retrievedBooks = dataSource.RetrieveBooks();
    foreach (Book retrievedBook in retrievedBooks)
    {
        //Check if the dataSource already contains a book (based on title)
        Book localBook =
            dataSource.Books.SingleOrDefault(
                b => b.Title == retrievedBook.Title);
        if (localBook == null)
        {
            //Insert a new one
            _unitOfWork.BookRepository.Insert(retrievedBook);
        }
        else
        {
            //Update existing (copy across updated fields; Title shown for illustration)
            localBook.Title = retrievedBook.Title;
            _unitOfWork.BookRepository.Update(localBook);
        }
    }
    //Soft delete any existing ones that no longer exist in the retrieved data
    foreach (Book existingBook in dataSource.Books)
    {
        if (!retrievedBooks.Exists(
            b => b.Title == existingBook.Title))
        {
            existingBook.Deleted = true;
            _unitOfWork.BookRepository.Update(existingBook);
        }
    }
    return successMessage;
}
However the performance is very bad. Sometimes there are 25,000 books retrieved from the data source, and I am having to do two loops: for each retrieved book, check whether one exists in the DB and insert/update accordingly; then loop over all existing books, check whether each one no longer exists on the data source, and soft delete it.
Is there a better way to attach the entities and monitor their state? In the above example I think I am querying the context each time and not the DB, so why such bad performance? Should I revert to T-SQL?

For the proper algorithm for inserting, updating, and deleting disconnected entities, you can check the "Setting the State of Entities in a Graph" section of Chapter 4, "Working with Disconnected Entities Including N-Tier Applications", of the book Programming Entity Framework: DbContext by Julia Lerman and Rowan Miller.
Also, this SO answer explains some ways to increase the performance of EF. The answer is about bulk inserting, but it may work for your scenario as well.
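The core of that pattern, as a minimal sketch (using a plain EF6 DbContext rather than the repository wrapper; the BooksContext name and the "Id == 0 means new" check are assumptions):
using (var context = new BooksContext())
{
    foreach (var book in retrievedBooks)
    {
        //Setting the State on the entry attaches the disconnected entity
        //and tells EF what to do with it, without querying the DB first.
        context.Entry(book).State = book.Id == 0
            ? EntityState.Added
            : EntityState.Modified;
    }
    context.SaveChanges();
}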

The fastest way would be to use the bulk insert extension.
Here's maxlego's description:
It uses SqlBulkCopy and a custom data reader to get max performance. As a result it is over 20 times faster than using regular inserts or AddRange (see the EntityFramework.BulkInsert vs EF AddRange comparison).
context.BulkInsert(hugeAmountOfEntities);
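If you stay with plain EF for the mixed insert/update logic, the biggest win from the linked answer is usually switching off automatic change detection (and validation) while adding many entities. A sketch for EF6 (context.Books is assumed from your model):
context.Configuration.AutoDetectChangesEnabled = false; //skip DetectChanges on every Add
context.Configuration.ValidateOnSaveEnabled = false;    //skip per-entity validation on save
try
{
    foreach (var book in retrievedBooks)
    {
        context.Books.Add(book);
    }
    context.SaveChanges();
}
finally
{
    context.Configuration.AutoDetectChangesEnabled = true;
    context.Configuration.ValidateOnSaveEnabled = true;
}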

Related

Do I need a foreign key in the EF core code first?

I'm new to code first in Entity Framework, and reading up on relationships I see everyone does it differently. It might be because of earlier versions, the approaches might be equivalent, or it might be a matter of performance.
Let's say I have two tables Company and User.
I would set the company-to-user relationship like this:
public List<User> Users { get; set; } = new List<User>();
Then if I from the user-to-company perspective needed to find the company, I would have this in the User:
public Company Company { get; set; }
And do this query:
return await _clientContext.Users.Where(x => x.Company.Id == companyId).ToListAsync();
Or I could have this:
public int CompanyId { get; set; }
[ForeignKey("CompanyId")]
public Company Company { get; set; }
And have this query:
return await _clientContext.Users.Where(x => x.CompanyId == companyId).ToListAsync();
Also some define Company with the keyword virtual like this:
public virtual Company Company { get; set; }
I'm not sure if every scenario is the same, and whether doing x.CompanyId instead of x.Company.Id would actually behave the same. What is used normally?
I generally recommend the first option, using Shadow Properties for FKs, over the second when using navigation properties. The main reason is that with the second approach there are two sources of truth. For instance, with a Team referencing a Coach, some code may use team.CoachId while other code uses team.Coach.CoachId. These two values are not guaranteed to always be in sync (depending on when you happen to check them as one or the other is updated).
Updating references between entities via a FK property can behave differently depending on whether the referenced entity is loaded or not.
For example, suppose I want to update a team's coach:
var teamA = context.Teams.Single(x => x.TeamId == teamId);
If Team has a Coach navigation property and a CoachId FK reference I could do...
teamA.CoachId = newCoachId;
If TeamA's old coach ID was 1, and the newCoachId = 2, what do you think happens if I have code that lazy loads the coach before SaveChanges?
var coachName = teamA.Coach.Name;
You might expect that since the Coach hadn't been loaded yet it would load in Coach #2's name, but it loads Coach #1 because the change hasn't been committed even though teamA.CoachId == 2. If you check the Coach reference after SaveChanges you get Coach #2.
Depending on whether lazy loading is enabled or not, you can get somewhat strange behaviour when setting a FK property, because navigation properties get nulled. Even when eager loading, changing a FK property can trigger a new lazy load if the new entity isn't already tracked:
var teamA = context.Teams.Include(x => x.Coach).Single(x => x.TeamId == teamId);
teamA.CoachId = newCoachId;
var coachName = teamA.Coach.Name; // Still points to Coach #1's name as expected.
context.SaveChanges();
coachName = teamA.Coach.Name; // Triggers a lazy load and returns the new coach's name.
Saving a FK against an entity that has eager loaded the reference does not automatically re-populate referenced entities. So, for instance, with lazy loading disabled, the same code as above:
context.SaveChanges();
coachName = teamA.Coach.Name; // Potential NullReferenceException on teamA.Coach.
This will potentially throw a NullReferenceException unless the new coach happens to be tracked by the DbContext prior to SaveChanges being called. If the DbContext is tracking the entity, the new reference will be swapped in on SaveChanges; otherwise it is nulled. (With lazy loading, this is covered by the new lazy load call after the reference was nulled.)
When working with navigation properties, my default recommendation is to hide FK properties as Shadow Properties (for EF6 this means using .Map(x => x.MapKey())). For relationships where I only care about the ID, I will expose the FK with no navigation property, such as lookups or bounded contexts where I want raw speed. So, one or the other.
I will deviate sparingly from this for exposing FKs for relationships I may inspect by ID frequently, and treat it as read-only, but still have infrequent need of the navigation property. An example of this would be CreatedBy / CreatedByUserId. Many queries may inspect the CreatedByUserId for data filtering, while some projections may want the CreatedBy.Name etc. A record's CreatedBy doesn't change so I avoid potential pitfalls of the data getting out of sync.
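For the EF Core case in the question, a shadow-property FK can be configured along these lines (a minimal sketch against the question's Company/User model; EF.Property is the standard way to query a shadow property):
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    //No CompanyId property on User; EF Core keeps the FK as a shadow property.
    modelBuilder.Entity<User>()
        .HasOne(u => u.Company)
        .WithMany(c => c.Users)
        .HasForeignKey("CompanyId");
}
The query then references the shadow property by name:
return await _clientContext.Users
    .Where(u => EF.Property<int>(u, "CompanyId") == companyId)
    .ToListAsync();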
Your second scenario is used normally.
i.e.
public int CompanyId { get; set; }
[ForeignKey("CompanyId")]
public Company Company { get; set; }
And have this query:
return await _clientContext.Users.Where(x => x.CompanyId == companyId).ToListAsync();

Simple contract for use with FromSql()

With its recent improvements, I'm looking to move from Dapper back to EF (Core).
The majority of our code currently uses the standard patterns of mapping entities to tables; however, we'd also like to be able to make simple ad-hoc queries that map to a simple POCO.
For example, say I have a SQL statement which returns a result set of strings. I created a class as follows...
public class SimpleStringDTO
{
    public string Result { get; set; }
}
...and called it as follows:
public DbSet<SimpleStringDTO> SingleStringResults { get; set; }

public IQueryable<SimpleStringDTO> Names()
{
    var sql = $"select name [result] from names";
    var result = this.SingleStringResults.FromSql(sql);
    return result;
}
My thought is that I could use the same DbSet and POCO for other simple queries to other tables.
When I execute it, EF throws an error: "The entity type 'SimpleStringDTO' requires a primary key to be defined."
Do I really need to define another field as a PK? There'll be cases where there isn't a PK defined. I just want something simple and flexible. Ideally, I'd rather not define a DbSet or POCO at all, just return the results straight into an IEnumerable<string>.
Can someone please point me towards best practices here?
While I wait for EF Core 2.1 I've ended up adding a fake key to my model
[Key]
public Guid Id { get; set; }
and then returning a fake Guid from SQL, aliased so that it maps to the Id column:
var sql = $"select newid() [Id], name [result] from names";

Paging and sorting Entity Framework on a field from Partial Class

I have a GridView which needs to page and sort data which comes from a collection of Customer objects.
Unfortunately my customer information is stored in two places: the Customer ID is stored in my database, and the Customer Name in a separate DLL.
I retrieve the ID from the database using Entity Framework, and the name from the external DLL through a partial class.
I am getting the ID from my database as follows:
public class DAL
{
    public IEnumerable<Customer> GetCustomers()
    {
        Entities entities = new Entities();
        var customers = (from c in entities.Customers
                         select c);
        //CustomerID is a field in the Customer table
        return customers;
    }
}
I have then created a partial class, which retrieves the data from the DLL:
public partial class Customer
{
    private string name;
    public string Name
    {
        get
        {
            if (name == null)
            {
                DLLManager manager = new DLLManager();
                name = manager.GetName(CustomerID);
            }
            return name;
        }
    }
}
In my business layer I can then call something like:
public class BLL
{
    public List<Customer> GetCustomers()
    {
        DAL customersDAL = new DAL();
        var customers = customersDAL.GetCustomers();
        return customers.ToList();
    }
}
...and this gives me a collection of Customers with ID and Name.
My problem is that I wish to page and sort by Customer Name, which, as we have seen, is populated from a DLL. This means I cannot page and sort in the database, which would be my preferred solution. I am therefore assuming I am going to have to pull all of the database records into memory and perform paging and sorting at that level.
My question is: what is the best way to page and sort an in-memory collection? Can I do this with my List in the BLL above? I assume the List would then need to be stored in Session.
I am interested in people's thoughts on the best way to page and sort a field that does not come from the database in an Entity Framework scenario.
Very grateful for any help!
Mart
p.s. This question is a development of this post here:
GridView sorting and paging Entity Framework with calculated field
The only difference here is that I am now using a partial class, and hopefully this post is a little clearer.
Yes, you can page and sort within your list in the BLL. As long as it's fast enough, I wouldn't care too much about caching something in the session. Another way would be to extend your database with the data from your DLL.
I posted this question slightly differently on a different forum, and got the following solution.
Basically I return the data as an IQueryable from the DAL which has already been forced to execute using ToList(). This means that I am running my sorting and paging against an object which consists of data from the DB and DLL. This also allows Scott's dynamic sorting to take place.
The BLL then performs OrderBy(), Skip() and Take() on the returned IQueryable and then returns this as a List to my GridView.
It works fine, but I am slightly bemused that we are performing IQueryable to List to IQueryable to List again.
1) Get the results from the database as an IQueryable:
public class DAL
{
    public IQueryable<Customer> GetCustomers()
    {
        Entities entities = new Entities();
        var customers = (from c in entities.Customers
                         select c);
        //CustomerID is a field in the Customer table
        return customers.ToList().AsQueryable();
    }
}
2) Pull the results into my business layer:
public class BLL
{
    public List<Customer> GetCustomers(int startRowIndex, int maximumRows, string sortParameter)
    {
        DAL customersDAL = new DAL();
        return customersDAL.GetCustomers().OrderBy(sortParameter).Skip(startRowIndex).Take(maximumRows).ToList();
    }
}
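Note that OrderBy(sortParameter) with a string argument is not standard LINQ; it relies on the Dynamic LINQ extensions referred to as Scott's dynamic sorting above. Without that dependency, a plain-LINQ alternative might look like this (a sketch; the sort keys shown are assumptions):
public List<Customer> GetCustomersSorted(int startRowIndex, int maximumRows, string sortParameter)
{
    IEnumerable<Customer> customers = new DAL().GetCustomers();

    //Map the sort parameter to a key selector by hand instead of using Dynamic LINQ.
    switch (sortParameter)
    {
        case "Name":
            customers = customers.OrderBy(c => c.Name); //Name comes from the DLL partial class
            break;
        case "CustomerID":
            customers = customers.OrderBy(c => c.CustomerID);
            break;
    }

    return customers.Skip(startRowIndex).Take(maximumRows).ToList();
}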
Here is the link to the other thread.
http://forums.asp.net/p/1976270/5655727.aspx?Paging+and+sorting+Entity+Framework+on+a+field+from+Partial+Class
Hope this helps others!

Why would this EF scenario NOT lazy load a child entity?

I have a simple controller method (called via ajax) that inserts a new entity into the db and then queries the database for all of the same type of records and returns a list of those records (json).
What I can't figure out is that when I insert the first record and query for my list of records, any child entities are not lazy loaded. However, when I add a second or any number of subsequent records, my list includes all child entities lazy loaded as expected. Here is some code:
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int StateId { get; set; }
    public virtual State State { get; set; }
}
public class PersonController : Controller
{
    public ActionResult CreatePerson(PersonModel model)
    {
        var person = model.ToPersonEntity();
        _personService.InsertPerson(person);
        var people = _personService.GetAllPeople(); //on first person inserted this list does NOT have State loaded
        var personAddressList = people.Select(x => x.ToPersonAddressFormat());
        return Json(personAddressList, JsonRequestBehavior.AllowGet);
    }
}
There isn't really any more relevant code, as this is a pretty simple operation. I fixed the problem by using .Include(x => x.State) in my LINQ query, but I've never had to do this as long as my properties were marked virtual.
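For reference, the working query looks roughly like this (a sketch; the service internals and the context/set names aren't shown in the question, so they are assumptions):
public List<Person> GetAllPeople()
{
    return _context.People
        .Include(x => x.State) //eager-load State so it is populated even for the just-inserted row
        .ToList();
}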
The only thing I can think of is that EF still has my original person as a tracked entity and that when I pull up the list of persons and the only person is the one I just inserted, it uses the cached entity which would not have any child properties attached to it yet. If this is true then when I load a list of more than one person, some funky black magic in EF says "I see 2 items in the list I won't use the cached person I just inserted".
Any ideas?

Dealing with complex properties with Entity Framework's ChangeTracker

I'll try and be as descriptive as I can in this post. I've read a dozen or more SO questions that were peripherally related to my issue, but so far none have matched up with what's going on.
So, for performing audit-logging on our database transactions (create, update, delete), our design uses an IAuditable interface, like so:
public interface IAuditable
{
    Guid AuditTargetId { get; set; }
    int? ContextId1 { get; }
    int? ContextId2 { get; }
    int? ContextId3 { get; }
}
The three contextual IDs are related to how the domain model is laid out, and as noted, some or all of them may be null, depending on the entity being audited (they're used for filtering purposes for when admins only want to see the audit logs for a specific scope of the application). Any model that needs to be audited upon a CUD action just needs to implement this interface.
The way that the audit tables themselves are being populated is through an AuditableContext that sits between the base DbContext and our domain's context. It contains the audit table DbSets, and it overrides the SaveChanges method using the EF ChangeTracker, like so:
foreach (var entry in ChangeTracker.Entries<IAuditable>())
{
    if (entry.State != EntityState.Modified &&
        entry.State != EntityState.Added &&
        entry.State != EntityState.Deleted)
    {
        continue;
    }
    // Otherwise, make audit records!
}
return base.SaveChanges();
The "make audit records" process is a slightly-complex bit of code using reflection and other fun things to extract out fields that need to be audited (there are ways for auditable models to have some of their fields "opt out" of auditing) and all that.
So that logic is all well and good. The issue comes when I have an auditable model like this:
public class Foo : Model, IAuditable
{
    public int FooId { get; set; }
    // other fields, blah blah blah...
    public virtual Bar Bar { get; set; }

    #region IAuditable members
    // most of the auditable members are just pulling from the right fields
    public int? ContextId3
    {
        get { return Bar.BarId; }
    }
    #endregion
}
As is pointed out, for the most part, those contextual audit fields are just standard properties from the models. But there are some cases, like here, where the context id needs to be pulled from a virtual complex property.
This ends up resulting in a NullReferenceException when trying to read that property from within the SaveChanges() method - the virtual Bar property comes back null. I've read a bit about how the ChangeTracker is built to allow lazy loading of complex properties, but I can't find the syntax to get it right. The fields don't exist in the "original values" list, and the object state manager doesn't have those fields, I guess because they come from the interface and not from the entities directly being audited.
So does anyone know how to get around this weird issue? Can I just force eager-loading of the entire object, virtual properties included, instead of the lazy loading that is apparently being stubborn?
Sorry for the long-ish post, I feel like this is a really specific problem and the detail is probably needed.
TIA! :)
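One approach that may work here is to explicitly load the unloaded reference from the change-tracker entry before reading the interface property, instead of relying on lazy loading inside SaveChanges. A sketch using the EF6 explicit-loading API (the cast to Foo is just to illustrate one concrete type):
foreach (var entry in ChangeTracker.Entries<IAuditable>())
{
    var foo = entry.Entity as Foo;
    if (foo != null)
    {
        var reference = Entry(foo).Reference(f => f.Bar);
        if (!reference.IsLoaded)
        {
            reference.Load(); //explicit load instead of a lazy load
        }
    }
    // ... make audit records as before ...
}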