Why does EntityFramework return entire objects when I only ask for one column? - entity-framework

I am investigating slow network performance in my application, which uses Entity Framework version 5. One option I tried was to retrieve only the Id field instead of the entire object. However, examining the traffic in Wireshark, I see that the entire objects are transferred anyway. In other words, the following two code blocks cause exactly the same network activity. Would anyone know how I can get the database to return only the Ids in the first query?
List<long> test = dbContext.UserActivities
.Where(ua => ua.Activity.Id == activityId)
.Select(ua => ua.User)
.ToList()
.Select(u => u.Id)
.ToList();
List<long> test = dbContext.UserActivities
.Where(ua => ua.Activity.Id == activityId)
.Select(ua => ua.User)
.ToList();

.ToList() materializes the query, i.e. executes it against the database. Anything after the first .ToList() is LINQ to Objects, running in memory on data that has already been fetched.
Try projecting to the Id before materializing:
List<long> test = dbContext.UserActivities
.Where(ua => ua.Activity.Id == activityId)
.Select(ua => ua.User.Id).ToList();

Related

ASP.NET Core 3 EF Framework - transient failure when running lots of queries

I am using EF Core for an ASP.NET website. In one method I have about 7 complex queries similar to the following (simplified example):
var query1 = context.Playarea
.Include(x => x.Cats)
.Where(x => x.Cats.Any())
.SelectMany(x => x.Cats.Select(y => new MyClass(x.Id, y.Name, y.Age))).ToList();
var query2 = context.Playarea
.Include(x => x.Dogs)
.Where(x => x.Dogs.Any())
.SelectMany(x => x.Dogs, (x, y) => new MyClass(x.Id, y.Name, y.Age)).ToList();
var query3 = context.Playarea
.Include(x => x.Dogs)
.Include(x => x.Leads)
.Where(x => x.Dogs.Any())
.SelectMany(x => x.Dogs, (x, y) => new MyClass(x.Id, y.Leads.Name, y.Age)).ToList();
var query4 = context.Playarea
.Include(x => x.Birds)
.Where(x => x.Birds.Any())
.SelectMany(x => x.Birds, (x, y) => new MyClass(x.Id, y.Name, y.Age)).ToList();
return query1.Concat(query2).Concat(query3).Concat(query4).ToList();
This usually works, but occasionally the page crashes with:
An unhandled exception was thrown by the application. System.InvalidOperationException: An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure()' to the 'UseSqlServer' call.
---> Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
---> System.ComponentModel.Win32Exception (121): The semaphore timeout period has expired.
I know I could add 'EnableRetryOnFailure()' but I am worried that the true cause is that it is running multiple queries at the same time. Is there a way to make this safer and more efficient?
I've tried looking up guides, but none of them cover what to do if you are trying lots of queries at once.
Honestly, I think this probably has less to do with your code and more to do with some sort of network or database issue. While there are ways this code could be improved, there's really no reason it shouldn't consistently work as-is.
That said, you're blocking on all these queries, which is never a good idea. Everything EF Core does is async; the sync methods merely block on the async ones and exist only for scenarios where async cannot be used, such as event handlers in desktop apps. In short, you should always use the async methods unless you run into a specific situation where you can't, and in an ASP.NET Core application there is no such situation. Long and short: use ToListAsync instead of ToList and await each line.
Next, there's no point in your Where clauses. Whether you SelectMany over only the items that have dogs, for example, or over all items regardless, you get back the same results. All of this runs at the database either way, so there isn't even a performance benefit to one approach or the other. You also don't need Include when you're selecting from the relationships; EF is smart enough to issue the joins based on the data it needs to return.
var query1 = await context.Playarea
.SelectMany(x => x.Cats.Select(y => new MyClass(x.Id, y.Name, y.Age))).ToListAsync();
There's also the issue of your two dog queries. The same rows will be returned for both; the only difference is that one set will have y.Name and the other will have y.Leads.Name as the Name value. The act of including Leads doesn't somehow filter out results that don't have Leads, and of course the first dog query does no filtering at all either way. I'd imagine you'd want something like the following instead:
var query3 = await context.Playarea
.SelectMany(x => x.Dogs, (x, y) => new MyClass(x.Id, y.Leads == null ? y.Name : y.Leads.Name, y.Age)).ToListAsync();
In other words, Leads.Name will be used if the relationship exists, and it will fall back to Name otherwise.
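Putting those points together, here is a minimal sketch of the whole method: async throughout, no Includes, no redundant Where clauses, and the null-safe Leads fallback. It assumes the model and MyClass constructor shown in the question.

```csharp
// Sketch only: assumes context.Playarea with Cats/Dogs/Birds collections
// and a Leads navigation on Dog, exactly as in the question's example.
var query1 = await context.Playarea
    .SelectMany(x => x.Cats, (x, y) => new MyClass(x.Id, y.Name, y.Age))
    .ToListAsync();
var query2 = await context.Playarea
    .SelectMany(x => x.Dogs, (x, y) => new MyClass(x.Id, y.Name, y.Age))
    .ToListAsync();
var query3 = await context.Playarea
    .SelectMany(x => x.Dogs, (x, y) => new MyClass(x.Id, y.Leads == null ? y.Name : y.Leads.Name, y.Age))
    .ToListAsync();
var query4 = await context.Playarea
    .SelectMany(x => x.Birds, (x, y) => new MyClass(x.Id, y.Name, y.Age))
    .ToListAsync();
// Each query is awaited in turn, so only one command runs on the context at a time.
return query1.Concat(query2).Concat(query3).Concat(query4).ToList();
```

Awaiting each query before starting the next also guarantees the DbContext is never used concurrently, which EF Core does not support.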

Scala, paging without max

I have to query an API with pagination, but because the API is weird I have to keep querying until the response is empty (contains no items).
Currently it is done in Java with a do-while loop:
do {
objects = doQuery(from);
from += 200;
} while (!objects.isEmpty());
But I would like to convert it to Scala. My first idea is to use a stream with a step and takeWhile:
Stream.iterate(0)(_ + 200)
.map(from => doQuery(from))
.takeWhile(objects => objects.nonEmpty)
But another change makes doQuery return a Future, so I cannot do the test in takeWhile, and I'm not sure how best to do it (maybe a recursive call).
This new code will eventually live inside an Akka actor that will tell another actor about each object (no need to return anything).
This gives you a Stream[Future[(Boolean, List[Int])]], where the Boolean turns false as soon as a query comes back empty. Not sure how to do the takeWhile without blocking.
Stream.iterate(0)(_ + 200)
.scanLeft(Future((true, List[Int]())))({ case (prev, from) =>
prev.flatMap({ case (cont, _) =>
if (cont) {
doQuery(from).map(res => (res.nonEmpty, res.toList))
} else {
Future((false, List()))
}
})
})

Confused about side effects / ContinueAfter

I have a scenario in which I download parent entities from an api and save them to a database. I then want, once all of the parents have been saved, to download and save their children.
I've seen (or misunderstood) some comments that this is a side effect, since I will not be passing the result of the parent save operation to the save-children operation. I simply want to begin it once the parents are saved.
Could someone explain to me the best way of doing this?
Perhaps try something like this:
Observable
.Create<int>(o =>
{
var parentIds = new int?[] { null };
return
Observable
.While(
() => parentIds.Any(),
parentIds
.ToObservable()
.Select(parentId => Save(parentId)))
.Finally(() => { /* update `parentIds` here with next level */ })
.Subscribe(o);
})
.Subscribe(x => { });
This is effectively doing a breadth-first traversal of all of the entities, saving them as it goes, but outputting a single observable that you can subscribe to.
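If only two fixed levels are involved (parents, then children), a simpler shape may suffice. A hedged sketch, assuming hypothetical SaveParentsAsync/SaveChildrenAsync methods (not from the question) where the parent save returns the saved ids:

```csharp
// Sketch: start the child saves only after the parent save completes,
// passing the parent ids through rather than treating it as a side effect.
// SaveParentsAsync and SaveChildrenAsync are hypothetical stand-ins.
var pipeline = Observable
    .FromAsync(() => SaveParentsAsync())            // emits int[] of parent ids on completion
    .SelectMany(parentIds =>
        Observable.FromAsync(() => SaveChildrenAsync(parentIds)));

pipeline.Subscribe(
    _ => { },
    ex => Console.WriteLine($"Save failed: {ex.Message}"));
```

SelectMany here sequences the two operations, so the child download/save begins only once the parent observable has produced its result.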

Get entities that have been only locally removed (not saved)?

I'm doing a "Preview" action so that you can preview your changes on edits before doing them. Effectively what I'm doing is calling the update function (not action) without saving.
Update function code-bits:
// Updates existing schedules to new values; Working
UpdateSchedules(...);
// Removes existing schedules no longer in date-range; Not working
RemoveSchedules(...);
{
...
foreach (var schedule in schedulesToRemove) { db.Entry(schedule).State = EntityState.Deleted; }
db.Schedules.RemoveRange(...);
...
}
// Adds schedules new to date-range; Working
AddSchedules(...);
{
...
db.Schedules.Add(...);
...
}
Retrieval Code:
// The results have the modified and added entities, but they also have the removed entities.
viewModel.Results = db.PaymentRecurringSchedules.Where(s => s.PaymentSetupHeaderID == headerID).OrderBy(s => s.PaymentDate).ToList();
My intention with this code is to not call db.SaveChanges() since it's just a preview, but I still want only the schedules that weren't removed (saved or unsaved).
What I've tried
I tried changing the state of the removed items as you can see above.
I tried doing a .Where(s => db.Entry(s).State != EntityState.Deleted), but EF cannot translate db.Entry() into SQL, so it failed with execution errors.
How can I get the results where the items I removed but are unsaved will still be filtered out of the results list?
Version: 6.0 (In case it matters)
I don't have VS at hand, but I think this should work as you requested:
List<Object> deletedEntities = context.ChangeTracker.Entries()
.Where(x => x.State == System.Data.Entity.EntityState.Deleted)
.Select(x => x.Entity)
.ToList();
Note that you may also want to filter to a specific entity type or types.
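For the preview scenario in the question, those locally-deleted entries can then be used to filter the retrieved list in memory, since the rows still exist in the database. A sketch, assuming a PaymentRecurringSchedule entity with a ScheduleID key (the names are guesses based on the question's code):

```csharp
// Collect the keys of schedules marked Deleted but not yet saved.
var deletedIds = new HashSet<int>(
    db.ChangeTracker.Entries<PaymentRecurringSchedule>()
        .Where(e => e.State == EntityState.Deleted)
        .Select(e => e.Entity.ScheduleID));

// Query as before, then drop the locally-removed rows in memory
// (the database query alone cannot know about unsaved deletions).
viewModel.Results = db.PaymentRecurringSchedules
    .Where(s => s.PaymentSetupHeaderID == headerID)
    .AsEnumerable()
    .Where(s => !deletedIds.Contains(s.ScheduleID))
    .OrderBy(s => s.PaymentDate)
    .ToList();
```

The AsEnumerable() switch is what makes the Contains check run as LINQ to Objects instead of being translated to SQL, which avoids the execution errors from filtering on db.Entry() inside the query.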

NHibernate equivalent of Entity Framework "Future Queries"

There is an extension to Entity Framework called Future Queries that allows queries to be batched up and processed at the same time.
For example, from their docs:
// build up queries
var q1 = db.Users
.Where(t => t.EmailAddress == "one@test.com")
.Future();
var q2 = db.Tasks
.Where(t => t.Summary == "Test")
.Future();
// this triggers the loading of all the future queries
var users = q1.ToList();
Is there any equivalent in NHibernate, or an extension that might give this type of functionality?
Yes, there is such a feature. Best if you read:
Ayende's NHibernate Futures
A short quote:
One of the nicest new features in NHibernate 2.1 is the Future<T>() and FutureValue<T>() functions. They essentially function as a way to defer query execution to a later date, at which point NHibernate will have more information about what the application is supposed to do, and optimize for it accordingly. This build on an existing feature of NHibernate, Multi Queries, but does so in a way that is easy to use and almost seamless.
Some example from that article:
using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
var blogs = s.CreateCriteria<Blog>()
.SetMaxResults(30)
.Future<Blog>();
var countOfBlogs = s.CreateCriteria<Blog>()
.SetProjection(Projections.Count(Projections.Id()))
.FutureValue<int>();
Console.WriteLine("Number of blogs: {0}", countOfBlogs.Value);
foreach (var blog in blogs)
{
Console.WriteLine(blog.Title);
}
tx.Commit();
}
To expand on @Radim's answer, here's what the equivalent code would look like, using LINQ to NHibernate:
IEnumerable<Users> users = session.Query<Users>()
.Where(p => p.EmailAddress == "one@test.com")
.ToFuture();
IEnumerable<Tasks> tasks = session.Query<Tasks>()
.Where(p => p.Summary == "Test")
.ToFuture();
As you'd expect, both queries will be batched and executed when either needs to be processed.