I'm having some trouble with a particular EF query. I've simplified everything down as much as I can. I'm only querying for two columns. Here is my model:
[Table("TAXROLL", Schema = "CLAND")]
public class TaxRoll
{
[Key]
[Column("TAXROLL_ID")]
public string Id { get; set; }
[Column("APN")]
public string APN { get; set; }
}
When I execute my query in my controller using FirstOrDefault, the results take as long as 15-18 seconds to return. If I use a Where query instead, the results are almost instantaneous (less than 1 second). My numbers for "15-18 seconds" and "almost instantaneous" come from the commented-out timing statements below.
[ResponseType(typeof(TaxRoll))]
public async Task<IHttpActionResult> Get(string id)
{
//var start = DateTime.Now;
//Debug.WriteLine("Starting Query");
var apnRecord = await ctx.TaxRoll.FirstOrDefaultAsync(x => x.APN == id);
//Debug.WriteLine("Returning APN after " + DateTime.Now.Subtract(start).TotalSeconds);
return Ok(apnRecord);
}
When I query for the primary key (Id), results return consistently fast every single time regardless of how I run the query. This is only a problem when I'm querying for APN. Yes, APN is indexed. It's also unique. I could use it as PK, and in fact I tried that. No dice. I know that executing a query that searches based on APN consistently returns fast when I do it directly against the database.
Any help or direction is greatly appreciated -- I am thoroughly confused.
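Your APN column is nullable, which makes EF add an OR condition for the null case, roughly (APN = @p) OR (APN IS NULL AND @p IS NULL). Very likely this makes SQL Server scan the column instead of seeking on the index. Make the APN column NOT NULL, and mark the property [Required] so that EF also treats it as non-nullable.
In addition to skalinkin's answer, you can set the DbContextConfiguration.UseDatabaseNullSemantics property to true. EF then uses plain SQL comparison semantics and drops the extra null check: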
public class YourDbContext : DbContext
{
public YourDbContext()
{
Configuration.UseDatabaseNullSemantics = true;
// ...
}
}
The query that takes 15-18 s:
var apnRecord = await ctx.TaxRoll.FirstOrDefaultAsync(x => x.APN == id);
is the same as:
var apnRecord = await ctx.TaxRoll.Where(x => x.APN == id).FirstOrDefaultAsync();
If you use just Where(), nothing is materialized from the database. The query is deferred and does not execute until something enumerates it, so timing a bare Where() measures almost nothing.
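Here is a minimal LINQ-to-Objects illustration of deferred execution. It is plain C# rather than EF, and DeferredExecutionDemo and the sample APN values are made up. Calling Where() only builds the query. The filter runs when FirstOrDefault() enumerates it, and it stops at the first match. An EF query defers in the same way; the difference is that enumeration is the moment the SQL is sent to the database.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static (int afterWhere, int afterFirst, string found) Run()
    {
        var apns = new List<string> { "100-200", "300-400", "500-600" };
        int predicateCalls = 0;

        // Building the query does not evaluate the predicate yet
        IEnumerable<string> query = apns.Where(a => { predicateCalls++; return a == "300-400"; });
        int afterWhere = predicateCalls;

        // FirstOrDefault enumerates the query and stops at the first match
        string found = query.FirstOrDefault();
        int afterFirst = predicateCalls;

        return (afterWhere, afterFirst, found);
    }
}
```

Here afterWhere is 0, and afterFirst is 2 because only the first two items were tested.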
Also consider using Stopwatch instead of calculating timestamps:
using System.Diagnostics;
var sw = Stopwatch.StartNew();
// do something
sw.Stop();
Debug.WriteLine(sw.Elapsed);
Using ASP.NET Core 2.2 with EF Core, I have followed various guides in trying to implement the automatic creation of date/time values when creating either a new record or editing/updating an existing one.
Currently, when I create a new record, the CreatedDate and UpdatedDate columns are populated with the current date/time.
However, the first time I edit this same record, the UpdatedDate column is given a new date/time value (as expected), BUT for some reason EF Core wipes out the original CreatedDate value, which results in SQL assigning a default value.
The result I need is as follows:
Step 1: A new row is created, and both the CreatedDate and UpdatedDate columns are given a date/time value (this already works).
Step 2: When editing and saving an existing row, I want EF Core to update only the UpdatedDate column with the current date/time, BUT leave the CreatedDate column unmodified with the original creation date.
I'm using EF Core code first, and I do not want to go down the fluent API route.
One of the guides I was partially following is https://www.entityframeworktutorial.net/faq/set-created-and-modified-date-in-efcore.aspx, but neither this nor the other solutions I've tried give the result I'm after.
Base class:
public class BaseEntity
{
public DateTime? CreatedDate { get; set; }
public DateTime? UpdatedDate { get; set; }
}
DbContext Class:
public override Task<int> SaveChangesAsync(bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default(CancellationToken))
{
var entries = ChangeTracker.Entries().Where(E => E.State == EntityState.Added || E.State == EntityState.Modified).ToList();
foreach (var entityEntry in entries)
{
if (entityEntry.State == EntityState.Modified)
{
entityEntry.Property("UpdatedDate").CurrentValue = DateTime.Now;
}
else if (entityEntry.State == EntityState.Added)
{
entityEntry.Property("CreatedDate").CurrentValue = DateTime.Now;
entityEntry.Property("UpdatedDate").CurrentValue = DateTime.Now;
}
}
return base.SaveChangesAsync(acceptAllChangesOnSuccess, cancellationToken);
}
UPDATE FOLLOWING ADVICE FROM STEVE IN COMMENTS BELOW
I spent a bit more time debugging today. It turns out the methods I posted above appear to be functioning as expected: when I edit an existing row and save it, only the entityEntry.State == EntityState.Modified IF statement is hit. What I'm finding is that after saving the entity, the CreatedDate column is overwritten with a null value. I can see this by watching the SQL explorer after a refresh. I believe the issue is along the lines of what Steve mentions below: "If it is #null then this might also explain the behavior in that it is not being loaded with the entity for whatever reason."
But I'm a little lost tracing where this CreatedDate value is being dropped in the edit/save process.
The image below shows the result at the point just before the entity is saved following an update. I'm not quite sure where to find the CreatedDate entry in the debugger to see what value it holds at this step. It appears to be missing from the debugger list, so I'm wondering whether somehow it doesn't know about the existence of this field when saving.
Below is the method I have in my form 'Edit' Razor page model class:
public class EditModel : PageModel
{
private readonly MyProject.Data.ApplicationDbContext _context;
public EditModel(MyProject.Data.ApplicationDbContext context)
{
_context = context;
}
[BindProperty]
public RuleParameters RuleParameters { get; set; }
public async Task<IActionResult> OnGetAsync(int? id)
{
if (id == null)
{
return NotFound();
}
RuleParameters = await _context.RuleParameters
.Include(r => r.SystemMapping).FirstOrDefaultAsync(m => m.ID == id);
if (RuleParameters == null)
{
return NotFound();
}
ViewData["SystemMappingID"] = new SelectList(_context.SystemMapping, "ID", "MappingName");
return Page();
}
public async Task<IActionResult> OnPostAsync()
{
if (!ModelState.IsValid)
{
return Page();
}
_context.Attach(RuleParameters).State = EntityState.Modified;
try
{
await _context.SaveChangesAsync();
}
catch (DbUpdateConcurrencyException)
{
if (!RuleParametersExists(RuleParameters.ID))
{
return NotFound();
}
else
{
throw;
}
}
return RedirectToPage("./Index");
}
private bool RuleParametersExists(int id)
{
return _context.RuleParameters.Any(e => e.ID == id);
}
}
Possibly one of the reasons for this issue is that I have not included the CreatedDate field in my Edit Razor Page form. When I update the entity, the OnPostAsync method runs server side, but no value is stored for the CreatedDate field. So there is nothing in the bag by the time the SaveChangesAsync method is called in my DbContext class. But I also didn't think this was necessary? Otherwise I'd struggle to see the value of using an inherited BaseEntity class at all, since its point is that I don't have to manually add the CreatedDate and UpdatedDate properties to every model class where I want to use them...
It may be easier to just give your BaseEntity a constructor:
public BaseEntity()
{
UpdatedDate = DateTime.Now;
// No setter has run yet, so CreatedDate is always null here and this copies UpdatedDate
CreatedDate = CreatedDate ?? UpdatedDate;
}
Then you can have your DbContext override SaveChangesAsync like:
public override Task<int> SaveChangesAsync(
bool acceptAllChangesOnSuccess,
CancellationToken token = default)
{
foreach (var entity in ChangeTracker
.Entries()
.Where(x => x.Entity is BaseEntity && x.State == EntityState.Modified)
.Select(x => x.Entity)
.Cast<BaseEntity>())
{
entity.UpdatedDate = DateTime.Now;
}
return base.SaveChangesAsync(acceptAllChangesOnSuccess, token);
}
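The order of events the constructor relies on can be checked with plain C#. There is no EF here; ConstructorOrderDemo is just a hypothetical test harness. The constructor body runs before any property is assigned, so CreatedDate ?? UpdatedDate always falls through to UpdatedDate. Values assigned afterwards, here by an object initializer, replace it.

As far as I know, EF and the ASP.NET Core model binder also set properties after construction. A loaded entity therefore keeps its stored CreatedDate. However, an object created by model binding (such as the [BindProperty] in the question's EditModel) would start with the current time unless the original CreatedDate is posted back.

```csharp
using System;

public class BaseEntity
{
    public DateTime? CreatedDate { get; set; }
    public DateTime? UpdatedDate { get; set; }

    public BaseEntity()
    {
        UpdatedDate = DateTime.Now;
        // CreatedDate is still null at this point, so this assigns UpdatedDate
        CreatedDate = CreatedDate ?? UpdatedDate;
    }
}

public static class ConstructorOrderDemo
{
    public static (BaseEntity fresh, BaseEntity initialized) Run(DateTime original)
    {
        var fresh = new BaseEntity();
        // The initializer assigns CreatedDate after the constructor has finished
        var initialized = new BaseEntity { CreatedDate = original };
        return (fresh, initialized);
    }
}
```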
Possibly one of the reasons for this issue is the fact that I have not included the CreatedDate field in my Edit Razor Page form, so when I update the entity which in turn will run the PostAsync method server side, there is no value stored for the CreatedDate field and therefore nothing in the bag by the time the SaveChangesAsync method is called in my DbContext Class.
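That's true. Your posted data does not contain the original CreatedDate, so it is null when you save to the database. EF has no way of knowing the original value unless you assign it before saving, so this step is necessary.
You could just add the following hidden field to your Razor form. The field name must include the bound property name, because the page model is EditModel:
<input type="hidden" asp-for="RuleParameters.CreatedDate" />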
Update:
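To handle it on the server side instead, you could copy the original value across manually:
public async Task<IActionResult> OnPostAsync()
{
// AsNoTracking(): otherwise Attach() below fails because an instance with the same key is already tracked
var originalData = await _context.RuleParameters.AsNoTracking()
.FirstOrDefaultAsync(m => m.ID == RuleParameters.ID);
if (originalData == null) return NotFound();
// Carry the original creation date over to the posted entity
RuleParameters.CreatedDate = originalData.CreatedDate;
_context.Attach(RuleParameters).State = EntityState.Modified;
await _context.SaveChangesAsync();
return RedirectToPage("./Index");
}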
I don't suspect EF is doing this, but rather your database, or you're inadvertently inserting records instead of updating them.
A simple test: put breakpoints in your SaveChangesAsync method, within both the Modified and Added handlers. Then run a unit test that loads an entity, edits it, and saves it. Which breakpoint is hit? If the behavior seems normal with a simple unit test, repeat the test with your own code.
If the Modified breakpoint is hit, and only the Modified handler is hit then check the state of the CreatedDate value in the entity modified. Does it still reflect the original CreatedDate? If yes, then it would appear that something in your schema will be overwriting it on save. If no then you have a bug in your code that has caused it to update. If it is #null then this might also explain the behaviour in that it is not being loaded with the entity for whatever reason. Check that the property has not been configured as something like a Computed property.
If the Added breakpoint is hit at all, then this would point at a scenario where you're dealing with a detached entity, such as an entity that was read from a different DB Context and being associated to another entity in the current DB Context and saved as a byproduct. When a DbContext encounters an entity that was loaded and disassociated with a different DbContext, it will treat that entity as a completely new entity and insert a new record. The biggest single culprit for this is invariably MVC code where people pass entities to/from views. Entity references are loaded in one request, serialized to the view, and then passed back on another request. Devs assume they are receiving an entity that they can just associate to a new entity and save, but the Context of this request doesn't know about that entity, and that "entity" isn't actually an entity, it is now a POCO shell of data that the serializer created. It's no different to you newing up a new class and populating fields. EF won't know the difference. The result of this is you will trip the Added condition for your entity, and after completion you will have a duplicate record. (with different PK if EF is configured to treat PKs as Identity)
So an example is an Order screen: When presenting a screen to create a new order I may have loaded the Customer and passed that to the view to display customer information and will want to associate to the new order:
var customer = context.Customers.Single(x => x.CustomerId == 14);
var newOrder = new Order { Customer = customer };
return View(newOrder);
This looks innocent enough. When we go to save the new order after setting their details:
public ActionResult Save(Order newOrder)
{
context.Orders.Add(newOrder);
newOrder.Customer.Orders.Add(newOrder);
context.SaveChanges();
// ...
}
newOrder had a reference to Customer #14, so all looks good. We're even associating the new order to the customer's order collection. We might even want to have updated fields on the customer record to reflect a change to the Modified date. However, newOrder in this case, and all associated data including .Customer are plain 'ol C# objects at this point. We've added the new order to the Context, but as far as the context is concerned, the Customer referenced is also a new record. It will ignore the Customer ID if that is set as an Identity column and it will save a brand new Customer record (ID #15 for example) with all of the same details as Customer ID 14 and associate that to the new order. It can be subtle and easy to miss until you start querying Customers and spotting duplicate looking rows.
If you are passing entities to/from views, I'd be very wary of this gotcha. Attaching and setting modified state is one option, but that involves trusting that the data has not been tampered with. As a general rule, calls to update entities should never pass entities & attach them, but rather re-load those entities, validate row version, validate the data coming in, and only copy across fields you expect should ever be modified before saving the entity associated to the DbContext.
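Below is a minimal sketch of that last rule in plain C#. It assumes hypothetical Customer and CustomerEditModel types and a made-up validation rule. In real code, loaded would come from the DbContext (for example via Single on the key), and you would call SaveChanges afterwards. The sketch shows two things:
- Only whitelisted fields are copied onto the tracked entity.
- A stale RowVersion is rejected before anything is changed.

```csharp
using System;
using System.Linq;

// Hypothetical entity, as loaded from the DbContext
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string Notes { get; set; }
    public byte[] RowVersion { get; set; }
    public DateTime? ModifiedDate { get; set; }
}

// Hypothetical view model posted back from the edit form
public class CustomerEditModel
{
    public int CustomerId { get; set; }
    public string Notes { get; set; }
    public byte[] RowVersion { get; set; }
}

public static class CustomerUpdater
{
    public static void Apply(Customer loaded, CustomerEditModel posted, DateTime now)
    {
        if (loaded.CustomerId != posted.CustomerId)
            throw new ArgumentException("Posted ID does not match the loaded entity.");
        // Someone else saved the row since this form was rendered
        if (!loaded.RowVersion.SequenceEqual(posted.RowVersion))
            throw new InvalidOperationException("The row was modified by someone else.");
        // Made-up validation rule, standing in for your real checks
        if (posted.Notes != null && posted.Notes.Length > 500)
            throw new ArgumentException("Notes are too long.");

        loaded.Notes = posted.Notes;   // the only field this screen may change
        loaded.ModifiedDate = now;     // Name, CustomerId and RowVersion are left alone
    }
}
```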
Hopefully that gives you a few ideas on things to check to get to the bottom of the issue.
I have a model which has an auto-incrementing ID field by default as is normal. However, I wish to seed the database with initial data and because there are foreign keys I wish to explicitly set the IDs of the seeded data.
My model
public class EntAttribute
{
public int ID { get; set; }
public string Title { get; set; }
}
My seeding code:
public class Seeder
{
private class AllAttributes
{
public List<EntAttribute> Attributes { get; set; }
}
public bool SeedData()
{
AllAttributes seedAttributes;
string strSource;
JsonSerializer JsonSer = new JsonSerializer();
strSource = System.IO.File.ReadAllText(@"Data/SeedData/Attributes.json");
seedAttributes = JsonConvert.DeserializeObject<AllAttributes>(strSource);
_context.AddRange(seedAttributes.Attributes);
_context.SaveChanges();
return true;
}
}
Please note, I'm very new to both EFCore and C#. The above is what I've managed to cobble together and it seems to work right up until I save the changes. At this point I get:
SqlException: Cannot insert explicit value for identity column in table 'Attribute' when IDENTITY_INSERT is set to OFF.
Now I'm smart enough to know that this is because I can't explicitly set the ID field in the EntAttribute table because it wants to assign its own via auto-increment. But I'm not smart enough to know what to do about it.
Any help appreciated.
EDIT: Adding the solution based on the accepted answer below because the actual code might help others...
So I added to my Context class the following:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.HasSequence<int>("EntAttributeNumbering")
.StartsAt(10);
modelBuilder.Entity<EntAttribute>()
.Property(i => i.ID)
.HasDefaultValueSql("NEXT VALUE FOR EntAttributeNumbering");
}
This first ensures that a sequence is created (the name is arbitrary). It then sets that sequence to be used for the relevant table instead of auto-increment. Once this was done I was able to seed my data. There are fewer than 10 records, so I only needed to set the start value of the sequence to 10. A larger value would normally make sense, but I know there will never be more.
I also had to blitz my migrations because they'd somehow got in a mess but that's probably unrelated.
With EF Core you can create and use a Sequence object to assign the IDs, and you can reserve a range of IDs for manual assignment by picking where the sequence starts. With a Sequence you can assign the IDs yourself, or let the database do it for you.
FYI for people using EF Core 3 with PostgreSQL (HasIdentityOptions comes from the Npgsql provider): if you use int for your key, you can set the identity's start value in case you have seeded data. I found this a much cleaner way to solve the problem in my use case, which had just a single seeded record.
e.g.
modelBuilder.Entity<TableA>()
.Property(p => p.TableAId)
.HasIdentityOptions(startValue: 2);
modelBuilder.Entity<TableA>()
.HasData(
new TableA
{
TableAId = 1,
Data = "something"
});
https://github.com/npgsql/efcore.pg/issues/367#issuecomment-602111259
I currently have an object like this (simplified):
public class Image {
public int Id { get; set; }
public int ExternalId { get; set; }
}
Now let's say I have this method (mostly pseudo-code):
public Image GetImage(int externalId) {
var existingImage = db.Images.FirstOrDefault(i => i.ExternalId == externalId);
if (existingImage != null) {
return existingImage;
}
var newImage = new Image() { ExternalId = externalId };
// Add (not Attach) marks the entity as Added, so SaveChanges inserts it
db.Images.Add(newImage);
db.SaveChanges();
return newImage;
}
}
Because ExternalId isn't a key, the change tracker won't care if I have "duplicate" images in the tracker.
So now, let's say this method gets called twice, at the same time via AJAX and Web API (my current scenario). It's async, so there are two threads calling this method now.
If the time between calls is short enough (in my case it is), two rows will be added to the database with the same external ID because neither existing check will return a row. I've greatly simplified this example, since in my real one, there's a timing issue as I fetch the "image" from a service.
How can I prevent this? I need the image to be returned regardless of whether it's new or updated. I've added a unique constraint in the database, so I get an exception. But then the call fails on the client, whereas it should use the existing image instead of throwing an exception.
If I understand EF correctly, I could handle this by making ExternalId a primary key and then use concurrency to handle this, right? Is there any way to avoid changing my current model or is this the only option?
If you already have a property that defines the uniqueness of your entity (ExternalId), you should use it as the key. Don't create another dummy key that does not express the entity's real uniqueness. If you don't use ExternalId as the key, you must put a unique constraint on that column in the database. Then handle the resulting exception in your code by loading the existing Image from the database.
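Here is a minimal sketch of the second option (a unique constraint plus exception handling). It is written without EF so that it stands alone, and GetOrCreateHelper and its delegate parameters are hypothetical names. With EF, the pieces would map as follows:
- insert would Add the Image and call SaveChanges.
- isUniqueViolation would look for a SqlException with Number 2627 or 2601 (SQL Server's unique constraint and unique index errors) in the DbUpdateException's InnerException chain.

```csharp
using System;

// Hypothetical helper, not an EF API: the delegates stand in for your own data-access code
public static class GetOrCreateHelper
{
    public static T GetOrCreate<T>(Func<T> find, Func<T> insert, Func<Exception, bool> isUniqueViolation)
        where T : class
    {
        // Fast path: the row already exists
        var existing = find();
        if (existing != null)
        {
            return existing;
        }

        try
        {
            return insert();
        }
        catch (Exception ex) when (isUniqueViolation(ex))
        {
            // A concurrent request inserted the same key first and the unique
            // constraint rejected this insert, so return the row that won the race
            var winner = find();
            if (winner == null)
            {
                throw;
            }
            return winner;
        }
    }
}
```

One thing to watch with EF: after the failed SaveChanges, the new Image is still tracked as Added. Detach it (or use a fresh context) before saving anything else, otherwise the insert will be retried.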
How can I update a single property of a record without retrieving it first?
I'm asking in the context of EF Code First 4.1
Say I have a class User, mapped to the Users table in the database:
class User
{
public int Id {get;set;}
[Required]
public string Name {get;set;}
public DateTime LastActivity {get;set;}
...
}
Now I want to update LastActivity of a user. I have user id. I can easily do so by querying the user record, set new value to LastActivity, then call SaveChanges(). But this would result in a redundant query.
I work around this by using the Attach method. But EF throws a validation exception on Name if it's null, so I set Name to a random string (it will not be written back to the DB). This doesn't seem like an elegant solution:
using (var entities = new MyEntities())
{
User u = new User {Id = id, Name="this wont be updated" };
entities.Users.Attach(u);
u.LastActivity = DateTime.Now;
entities.SaveChanges();
}
I would really appreciate it if someone could provide a better solution. And forgive me for any mistakes, as this is the first time I've asked a question on SO.
This is a limitation of how validation is implemented: it can only validate a whole entity, not just the modified properties. Because of that, validation should be turned off in scenarios where you want to use incomplete dummy objects:
using (var entities = new MyEntities())
{
entities.Configuration.ValidateOnSaveEnabled = false;
User u = new User {Id = id, LastActivity = DateTime.Now };
entities.Users.Attach(u);
entities.Entry(u).Property(x => x.LastActivity).IsModified = true;
entities.SaveChanges();
}
This is obviously a problem if you want to use the same context for updating dummy objects and for updating whole entities that should be validated. Validation takes place in SaveChanges, so you can't say which objects should be validated and which shouldn't.
I'm actually dealing with this right now. What I decided to do was override the ValidateEntity method in the DB context.
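protected override DbEntityValidationResult ValidateEntity(DbEntityEntry entityEntry, IDictionary<object, object> items)
{
var result = base.ValidateEntity(entityEntry, items);
// Only relax validation for updates; added entities are still validated in full
if (entityEntry.State != EntityState.Modified)
{
return result;
}
var errors = new List<DbValidationError>();
foreach (var error in result.ValidationErrors)
{
// Keep entity-level errors (no property name) and errors on modified properties
if (string.IsNullOrEmpty(error.PropertyName) || entityEntry.Property(error.PropertyName).IsModified)
{
errors.Add(error);
}
}
return new DbEntityValidationResult(entityEntry, errors);
}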
I'm sure there's some holes that can be poked in it, but it seemed better than the alternatives.
You can try a sort of hack:
context.Database.ExecuteSqlCommand("update [dbo].[Users] set [LastActivity] = @p1 where [Id] = @p2",
new System.Data.SqlClient.SqlParameter("p1", DateTime.Now),
new System.Data.SqlClient.SqlParameter("p2", id));
I have a tree structure in the DB, stored in a TreeNodes table with nodeId, parentId and parameterId columns. In EF, the structure is TreeNode.Children, where each child is a TreeNode...
I also have a Tree table which contains id, name and rootNodeId.
At the end of the day I would like to load the tree into a TreeView, but I can't figure out how to load it all at once.
I tried:
var trees = from t in context.TreeSet.Include("Root").Include("Root.Children").Include("Root.Children.Parameter")
.Include("Root.Children.Children")
where t.ID == id
select t;
This will get me the first 2 generations but not more.
How do I load the entire tree with all generations and the additional data?
I had this problem recently and stumbled across this question after I figured a simple way to achieve results. I provided an edit to Craig's answer providing a 4th method, but the powers-that-be decided it should be another answer. That's fine with me :)
My original question / answer can be found here.
This works as long as the items in your table all know which tree they belong to (in your case it looks like they do: t.ID). That said, it's not clear what entities you really have in play. Even if you've got more than one, you must have a FK in the Children entity if that's not a TreeSet.
Basically, just don't use Include():
var query = from t in context.TreeSet
where t.ID == id
select t;
// if TreeSet.Children is a different entity:
var query = from c in context.TreeSetChildren
// guessing the FK property TreeSetID
where c.TreeSetID == id
select c;
This will bring back ALL the items for the tree and put them all in the root of the collection. At this point, your result set will look like this:
-- Item1
   -- Item2
      -- Item3
-- Item4
   -- Item5
-- Item2
-- Item3
-- Item5
Since you probably want your entities coming out of EF only hierarchically, this isn't what you want, right?
So then, exclude the descendants present at the root level:
Fortunately, because you have navigation properties in your model, the child entity collections will still be populated as you can see by the illustration of the result set above. By manually iterating over the result set with a foreach() loop, and adding those root items to a new List<TreeSet>(), you will now have a list with root elements and all descendants properly nested.
If your trees get large and performance is a concern, you can sort your return set ASCENDING by ParentID (it's Nullable, right?) so that all the root items are first. Iterate and add as before, but break from the loop once you get to one that is not null.
var subset = query
// execute the query against the DB
.ToList()
// filter out non-root-items
.Where(x => !x.ParentId.HasValue);
And now subset will look like this:
-- Item1
   -- Item2
      -- Item3
-- Item4
   -- Item5
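EF's relationship fix-up is what fills the Children collections once all rows are tracked. The plain C# sketch below performs the same two steps by hand, without EF; Node is a hypothetical stand-in for the entity. First it links each row to its parent's Children, then it keeps only the rows whose ParentId is null. It also uses the sort-then-stop idea above: with LINQ's default comparer, null sorts before any value, so the roots come first and TakeWhile can stop at the first non-null ParentId.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, standing in for the entity loaded from the table
public class Node
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public List<Node> Children { get; } = new List<Node>();
}

public static class TreeBuilder
{
    public static List<Node> BuildRoots(IEnumerable<Node> flatRows)
    {
        var rows = flatRows.ToList();
        var byId = rows.ToDictionary(n => n.Id);

        // Manual equivalent of EF's fix-up: attach each row to its parent's Children
        foreach (var row in rows)
        {
            if (row.ParentId.HasValue && byId.TryGetValue(row.ParentId.Value, out var parent))
                parent.Children.Add(row);
        }

        // Nulls sort first, so the roots form a prefix and TakeWhile can stop early
        return rows.OrderBy(n => n.ParentId)
                   .TakeWhile(n => !n.ParentId.HasValue)
                   .ToList();
    }
}
```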
About Craig's solutions:
You really don't want to use lazy loading for this! A design built around the necessity for n+1 querying will be a major performance drain. (Well, to be fair, if you're going to let a user selectively drill down the tree, then it could be appropriate. Just don't use lazy loading to get them all up front!)
I've never tried the nested set stuff, and I wouldn't suggest hacking EF configuration to make this work either, given there is a far easier solution.
Another reasonable suggestion is creating a database view that provides the self-linking, then mapping that view to an intermediary join/link/m2m table. Personally, I found this solution to be more complicated than necessary, but it probably has its uses.
When you use Include(), you are asking the Entity Framework to translate your query into SQL. So think: How would you write an SQL statement which returns a tree of an arbitrary depth?
Answer: Unless you are using specific hierarchy features of your database server (which are not SQL standard, but supported by some servers, such as SQL Server 2008, though not by its Entity Framework provider), you wouldn't. The usual way to handle trees of arbitrary depth in SQL is to use the nested sets model rather than the parent ID model.
Therefore, there are three ways which you can use to solve this problem:
Use the nested sets model. This requires changing your metadata.
Use SQL Server's hierarchy features, and hack the Entity Framework into understanding them (tricky, but this technique might work). Again, you'll need to change your metadata.
Use explicit loading or EF 4's lazy loading instead of eager loading. This will result in many database queries instead of one.
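For comparison with option 1, here is a minimal in-memory sketch of the nested sets model in plain C#; TreeItem, Left and Right are hypothetical names. A depth-first walk numbers each node on the way down and again on the way back up. Afterwards, every descendant of a node has a Left value strictly between that node's Left and Right. That is why a single range query (WHERE Left > @l AND Right < @r) can fetch a whole subtree of any depth.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TreeItem
{
    public string Name { get; set; }
    public List<TreeItem> Children { get; } = new List<TreeItem>();
    public int Left { get; set; }
    public int Right { get; set; }
}

public static class NestedSets
{
    // Assigns Left on the way down and Right on the way back up; returns the next free number
    public static int Number(TreeItem node, int counter = 1)
    {
        node.Left = counter++;
        foreach (var child in node.Children)
            counter = Number(child, counter);
        node.Right = counter++;
        return counter;
    }

    // Descendants are exactly the nodes whose interval lies inside (node.Left, node.Right)
    public static IEnumerable<TreeItem> Descendants(TreeItem node, IEnumerable<TreeItem> all) =>
        all.Where(n => n.Left > node.Left && n.Right < node.Right);
}
```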
I wanted to post up my answer since the others didn't help me.
My database is a little different, basically my table has an ID and a ParentID. The table is recursive. The following code gets all children and nests them into a final list.
public IEnumerable<Models.MCMessageCenterThread> GetAllMessageCenterThreads(int msgCtrId)
{
var z = Db.MCMessageThreads.Where(t => t.ID == msgCtrId)
.Select(t => new MCMessageCenterThread
{
Id = t.ID,
ParentId = t.ParentID ?? 0,
Title = t.Title,
Body = t.Body
}).ToList();
foreach (var t in z)
{
t.Children = GetChildrenByParentId(t.Id);
}
return z;
}
private IEnumerable<MCMessageCenterThread> GetChildrenByParentId(int parentId)
{
var children = new List<MCMessageCenterThread>();
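// ToList() finishes this query before recursing; otherwise the nested queries would need a second open DataReader (MARS)
var threads = Db.MCMessageThreads.Where(x => x.ParentID == parentId).ToList();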
foreach (var t in threads)
{
var thread = new MCMessageCenterThread
{
Id = t.ID,
ParentId = t.ParentID ?? 0,
Title = t.Title,
Body = t.Body,
Children = GetChildrenByParentId(t.ID)
};
children.Add(thread);
}
return children;
}
For completeness, here's my model:
public class MCMessageCenterThread
{
public int Id { get; set; }
public int ParentId { get; set; }
public string Title { get; set; }
public string Body { get; set; }
public IEnumerable<MCMessageCenterThread> Children { get; set; }
}
I wrote something recently that does N+1 selects to load the whole tree, where N is the number of levels of your deepest path in the source object.
This is what I did, given the following self-referencing class
public class SomeEntity
{
public int Id { get; set; }
public int? ParentId { get; set; }
public string Name { get; set; }
}
I wrote the following DbSet helper
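using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
namespace Microsoft.EntityFrameworkCore
{
public static class DbSetExtensions
{
public static async Task<TEntity[]> FindRecursiveAsync<TEntity, TKey>(
this DbSet<TEntity> source,
Expression<Func<TEntity, bool>> rootSelector,
Func<TEntity, TKey> getEntityKey,
// An expression (not a Func) so that EF can translate it into SQL
Expression<Func<TEntity, TKey>> getChildKeyToParent)
where TEntity: class
{
// Keeps track of keys already processed, so that a cycle in the data
// cannot cause an infinite loop
var alreadyProcessed = new HashSet<TKey>();
var result = new List<TEntity>(await source.Where(rootSelector).ToArrayAsync());
TEntity[] currentRoots = result.ToArray();
while (currentRoots.Length > 0)
{
TKey[] currentParentKeys = currentRoots.Select(getEntityKey).Except(alreadyProcessed).ToArray();
if (currentParentKeys.Length == 0)
{
break;
}
alreadyProcessed.UnionWith(currentParentKeys);
// Builds x => currentParentKeys.Contains(getChildKeyToParent(x)),
// which EF can translate into one WHERE ... IN (...) query per level
var containsCall = Expression.Call(
typeof(Enumerable), nameof(Enumerable.Contains), new[] { typeof(TKey) },
Expression.Constant(currentParentKeys), getChildKeyToParent.Body);
var childPredicate = Expression.Lambda<Func<TEntity, bool>>(containsCall, getChildKeyToParent.Parameters);
currentRoots = await source.Where(childPredicate).ToArrayAsync();
// Collect each level of descendants into the result
result.AddRange(currentRoots);
}
return result.ToArray();
}
}
}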
Whenever you need to load a whole tree, you simply call this method, passing in three things:
The selection criteria for your root objects
How to get the property for the primary key of the object (SomeEntity.Id)
How to get the child's property that refers to its parent (SomeEntity.ParentId)
For example
SomeEntity[] myEntities = await DataContext.SomeEntity.FindRecursiveAsync(
rootSelector: x => x.Id == 42,
getEntityKey: x => x.Id,
getChildKeyToParent: x => x.ParentId);
Alternatively, if you can add a RootId column to the table then for each non-root entry you can set this column to the ID of the root of the tree. Then you can fetch everything with a single select
DataContext.SomeEntity.Where(x => x.Id == rootId || x.RootId == rootId)
For an example of loading in child objects, I'll give the example of a Comment object that holds a comment. Each comment has a possible child comment.
private static void LoadComments(<yourObject> q, Context yourContext)
{
if(null == q || null == yourContext)
{
return;
}
yourContext.Entry(q).Reference(x=> x.Comment).Load();
Comment curComment = q.Comment;
while(null != curComment)
{
curComment = LoadChildComment(curComment, yourContext);
}
}
private static Comment LoadChildComment(Comment c, Context yourContext)
{
if(null == c || null == yourContext)
{
return null;
}
yourContext.Entry(c).Reference(x=>x.ChildComment).Load();
return c.ChildComment;
}
Now if you were having something that has collections of itself you would need to use Collection instead of Reference and do the same sort of diving down. At least that's the approach I took in this scenario as we were dealing with Entity and SQLite.
This is an old question, but the other answers either had n+1 database hits or their models were conducive to bottom-up (trunk to leaves) approaches. In this scenario, a tag list is loaded as a tree, and a tag can have multiple parents. The approach I use only has two database hits: the first to get the tags for the selected articles, then another that eager loads a join table. Thus, this uses a top-down (leaves to trunk) approach; if your join table is large or if the result cannot really be cached for reuse, then eager loading the whole thing starts to show the tradeoffs with this approach.
To begin, I initialize two HashSets: one to hold the root nodes (the resultset), and another to keep a reference to each node that has been "hit."
var roots = new HashSet<AncestralTagDto>(); //no parents
var allTags = new HashSet<AncestralTagDto>();
Next, I grab all of the leaves that the client requested, placing them into an object that holds a collection of children (but that collection will remain empty after this step).
var startingTags = (await _dataContext.ArticlesTags
.Include(p => p.Tag.Parents)
.Where(t => t.Article.CategoryId == categoryId)
.ToListAsync())
// group in memory so that each tag appears only once
.GroupBy(t => t.Tag)
.Select(grouping => new AncestralTagDto(
grouping.Key.Id,
grouping.Key.Name))
.ToList();
Now, let's grab the tag self-join table, and load it all into memory:
var tagRelations = await _dataContext.TagsTags.Include(p => p.ParentTag).ToListAsync();
Now, for each tag in startingTags, add that tag to the allTags collection, then travel down the tree to get the ancestors recursively:
foreach (var tag in startingTags)
{
allTags.Add(tag);
GetParents(tag);
}
return roots;
Lastly, here's the nested recursive method that builds the tree:
void GetParents(AncestralTagDto tag)
{
var parents = tagRelations.Where(c => c.ChildTagId == tag.Id).Select(p => p.ParentTag);
if (parents.Any()) //then it's not a root tag; keep climbing down
{
foreach (var parent in parents)
{
//have we already seen this parent tag before? If not, instantiate the dto.
var parentDto = allTags.SingleOrDefault(i => i.Id == parent.Id);
if (parentDto is null)
{
parentDto = new AncestralTagDto(parent.Id, parent.Name);
allTags.Add(parentDto);
}
parentDto.Children.Add(tag);
GetParents(parentDto);
}
}
else //the tag is a root tag, and should be in the root collection. If it's not in there, add it.
{
//this block could be simplified to just roots.Add(tag), but it's left this way for other logic.
var existingRoot = roots.SingleOrDefault(i => i.Equals(tag));
if (existingRoot is null)
roots.Add(tag);
}
}
Under the covers, I am relying on the properties of a HashSet to prevent duplicates. To that end, the intermediate object you use must override the Equals and GetHashCode methods as appropriate for your use case. (I used AncestralTagDto here, and its Children collection is also a HashSet.)
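Here is a plain C# guess at what such a DTO could look like. The answer does not show AncestralTagDto, so this shape is an assumption based on how it is used above (an (id, name) constructor, Id, Name, and a HashSet of Children). Equality is based on the tag Id, so adding a second DTO for the same tag is a no-op. Id is read-only because changing a key while the object sits in a HashSet would break lookups.

```csharp
using System.Collections.Generic;

// Assumed shape of the answer's AncestralTagDto: identity is the tag Id
public class AncestralTagDto
{
    public AncestralTagDto(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public int Id { get; }
    public string Name { get; }
    public HashSet<AncestralTagDto> Children { get; } = new HashSet<AncestralTagDto>();

    // Two DTOs for the same tag are equal, so HashSet.Add ignores the second one
    public override bool Equals(object obj) => obj is AncestralTagDto other && other.Id == Id;

    // Must agree with Equals: equal objects must produce equal hash codes
    public override int GetHashCode() => Id.GetHashCode();
}
```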