I'm having issues with slow LINQ query compilation in EF6. I know EF caches the compiled query plans for LINQ queries, but that there are some gotchas (e. g. Enumerable.Contains prevents caching). I'd like to view the cache for debugging purposes to validate whether or not I'm getting proper caching for my queries. How can I do this?
Note: since this is purely for debugging, I'd be happy with an answer using reflection or other means that wouldn't be used in production.
You can reflect down to the cache using something like this (in EF 6.1.3):
var method = context.Database.GetType().GetMethod("CreateStoreItemCollection", BindingFlags.Instance | BindingFlags.NonPublic);
var storeItemsCollection = method.Invoke(context.Database, null);
var queryCacheManagerField = storeItemsCollection.GetType().GetField("_queryCacheManager", BindingFlags.Instance | BindingFlags.NonPublic);
var queryCacheManager = queryCacheManagerField.GetValue(storeItemsCollection);
var cacheField = queryCacheManager.GetType().GetField("_cacheData", BindingFlags.Instance | BindingFlags.NonPublic);
var cacheData = cacheField.GetValue(queryCacheManager) as ICollection;
foreach (var item in cacheData)
{
Console.WriteLine(item.ToString());
}
Unfortunately, all the items in the cache are of internal types (in the System.Data.Entity.Core.Common.QueryCache namespace), so getting useful information out of them will require a bunch more reflection and poking around. Luckily, CompiledQueryCacheKey overrides ToString so it gives up a little (cryptic) information about itself. After running a single query (Table.Count()), the code above spits out these two entries:
[System.Data.Entity.Core.Common.QueryCache.ShaperFactoryQueryCacheKey`1[System.Int32], System.Data.Entity.Core.Common.QueryCache.QueryCacheEntry]
[FUNC<Edm.Count(In Transient.collection[Edm.Int32(Nullable=True,DefaultValue=)](Nullable=True,DefaultValue=))>:ARGS(([Project](BV'LQ1'=([Scan](DashboardAutoContext.Organizations:Transient.collection[DashboardAuto.Organization(Nullable=True,DefaultValue=)]))(1:Edm.Int32(Nullable=True,DefaultValue=)))))|||AppendOnly|True, System.Data.Entity.Core.Common.QueryCache.QueryCacheEntry]
Good luck figuring out what that means, or correlating things when you have a bunch of entries.
A different tactic I've had some success with is to create a class with a method I can assign to DbContext.Database.Log during testing. The class starts a timer when it sees an "Opened connection..." message, then stops the timer when it sees an "-- Executing at...". That time roughly corresponds to the time it takes EF to compile your LINQ query down to SQL. If your query is being cached, that time will shrink to almost nothing after the first time the query is executed; if it's not cached, the time will stay consistently high.
(It should go without saying that you'd only want to do this in a testing context.)
You can inject an IDbCommandTreeInterceptor implementation for logging any query tree creation. I used it successfully in a combination with a call stack keyed dictionary for spotting multiple compilations of "bad" queries.
Related
I have the following exceprt of my query. I am using ASP.NET Core 3.1 project with EF Core.
I read that the server vs client has changed, so how I used to perform the WHERE part in Core 2.1 (using variables from elsewhere in my code) doesn't seem to work anymore.
So as below I have changed (as per something I read) to use ToList() in each part, but now is it not hitting the database more (in my Core 2.1 I would of only had the ToList on the final part as per the code comment below).
So now for Core 3.1 I need to have a dynamic where in the initial "// Load data" part - how do I do a dynamic Where in the initial part, or is there a way, now the server vs client changes are in EF Core to work around that (note it is the final "// Search" part that fails in EF under Core 3.1 (prior to adding the ToList 's)
public List<KBEntryListVM> lstKBEntry;
// Load data
var q = await (from _k in _context.KBEntry
join _kc in _context.KBCategory on _k.CategoryId equals _kc.Id
into _kc2
from _kc3 in _kc2.DefaultIfEmpty()
select new KBEntryListVM()
{
Id = _k.Id,
DateCreated = DateTime.Parse(_k.DateCreated.ToString()),
CategoryId = _k.CategoryId,
CategoryTitle = _kc3.Title.ToString().Trim(),
Text = _k.Text.ToString().Trim(),
Title = _k.Title.ToString().Trim()
}).ToListAsync();
// KBCategory
if (!string.IsNullOrEmpty(c) && Guid.TryParse(c.ToString().Trim(), out var newGuid))
{
q = q.Where(w => w.CategoryId == Guid.Parse($"{c.ToString()}")).ToList();
}
// Search
if (!string.IsNullOrEmpty(s))
{
q = q.Where(w => w.Title.ToLower().Contains($"{s.ToLower()}") || w.CategoryTitle.ToLower().Contains($"{s.ToLower()}") || w.Text.ToLower().Contains($"{s.ToLower()}")).ToList();
}
lstKBEntry = q; //.ToList(); this would of been the only place in Core 2.1 I would of had ToList()
Arthur
So as below I have changed (as per something I read) to use ToList() in each part
EF Core 3.x+ client evaluation exception message suggests to either (1)
rewrite the query in a form that can be translated
or (2)
switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync()
So you are taking the option (2) which is easier, but you should try utilizing the option (1) which is harder, but better from performance perspective and the main reason implicit client evaluation have been removed by EFC 3.0. Option (2) should be your last resort only in case there is no way to apply option (1).
The exception message also contains the failed expression. Unfortunately it is not the exact part, but the whole expression (the whole Where predicate for instance), so you need to analyze it, find the failing part(s) and try to replace them with translatable constructs.
One of the general rules for simple data expressions is to avoid explicit conversions (ToString(), Parse). Store dates and numbers in database as such rather than strings, or utilize value conversions when using old existing database and aren't allowed to change it.
In this particular query, the unsupported (non translatable) construct most likely are ToString() calls of string type properties (e.g. Title, Text). EF Core still supports implicit client evaluation of the final Select, so you won't notice it if there is no Where (or other) clause after that referencing such expressions. But as being said at the beginning, you should avoid them regardless - query the raw data and let the usages (UI) do the desired formatting.
Anyway, I can't tell exactly because you haven't show your model, but removing ToString() should make the query translatable, hence no need of intermediate ToList() or similar client materialization:
CategoryTitle = _kc3.Title.Trim(),
Text = _k.Text.Trim(),
Title = _k.Title.Trim()
You should probably also replace
DateCreated = DateTime.Parse(_k.DateCreated.ToString())
with just
DateCreated = _k.DateCreated
because it seems that DateCreated is already DateTime, so the double conversion through string doesn't make sense and would cause similar troubles. And even if the database type is string, still remove Parse / ToString and setup value converter which does that.
I use the GetTableSql() function in BIML a lot, but I often need to remove some columns from this function before it executes. Is this possible?
You'd have to write your own Extension method to do so. Looking through the code, you might be better off just scrubbing the column(s) from results of the method call - it just depends on what you're looking to do.
The current GetTableSql method is an extension method that chains a call to EmitTableScript which in turn calls a number of methods to build out the returning SQL. At least in the BimlStudio product, EmitTableScript is in Varigence.Biml.CoreLowerer.Capabilities.TableToPackageLowerer class in BimlExtensions.dll
After a bit more thinking, what might be an even better, less support headache would be to create a clone of the table node and then remove the columns you don't want.
Code approximately
var table0 = this.RootNode.Tables[0];
var tablePrime = table0;
// I don't have a biml project handy so this section is a guess
tablePrime.Columns.Clear();
// Might be AddRange if this method exists
// Remove all the columns that start with ignore, as an example of filtering columns
tablePrime.Columns.Add(table0.Columns.Where(x => !x.name.StartsWith("ignore"));
// end guess block
var sql = tablePrime.GetTableSql();
I'm constructing and executing my queries in a way that's independent of EF-Core, so I'm relying on IQueryable<T> to obtain the required level of abstraction. I'm replacing awaited SingleAsync() calls with awaited ToAsyncEnumerable().Single() calls. I'm also replacing ToListAsync() calls with ToAsyncEnumerable().ToList() calls. But I just happened upon the ToAsyncEnumerable() method so I'm unsure I'm using it correctly or not.
To clarify which extension methods I'm referring to, they're defined as follows:
SingleAsync and ToListAsync are defined on the EntityFrameworkQueryableExtensions class in the Microsoft.EntityFrameworkCore namespace and assembly.
ToAsyncEnumerable is defined on the AsyncEnumerable class in the System.Linq namespace in the System.Interactive.Async assembly.
When the query runs against EF-Core, are the calls ToAsyncEnumerable().Single()/ToList() versus SingleAsync()/ToListAsync() equivalent in function and performance? If not then how do they differ?
For methods returning sequence (like ToListAsync, ToArrayAsync) I don't expect a difference.
However for single value returning methods (the async versions of First, FirstOrDefault, Single, Min, Max, Sum etc.) definitely there will be a difference. It's the same as the difference by executing those methods on IQueryable<T> vs IEnumerable<T>. In the former case they are processed by database query returning a single value to the client while in the later the whole result set will be returned to the client and processed in memory.
So, while in general the idea of abstracting EF Core is good, it will cause performance issues with IQueryable<T> because the async processing of queryables is not standartized, and converting to IEnumerable<T> changes the execution context, hence the implementation of single value returning LINQ methods.
P.S. By standardization I mean the following. The synchronous processing of IQueryable is provided by IQueryProvider (standard interface from System.Linq namespace in System.Core.dll assembly) Execute methods. Asynchronous processing would require introducing another standard interface similar to EF Core custom IAsyncQueryProvider (inside Microsoft.EntityFrameworkCore.Query.Internal namespace in Microsoft.EntityFrameworkCore.dll assembly). Which I guess requires cooperation/approval from the BCL team and takes time, that's why they decided to take a custom path for now.
When the original source is a DbSet, ToAsyncEnumerable().Single() is not as performant as SingleAsync() in the exceptional case where the database contains more than one matching row. But in in the more likely scenario, where you both expect and receive only one row, it's the same. Compare the generated SQL:
SingleAsync():
SELECT TOP(2) [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable().Single():
SELECT [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable() breaks the IQueryable call chain and enters LINQ-to-Objects land. Any downstream filtering occurs in memory. You can mitigate this problem by doing your filtering upstream. So instead of:
ToAsyncEnumerable().Single( l => l.Something == "foo" ):
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
you can do:
Where( l => l.Something == "foo" ).ToAsyncEnumerable().Single():
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
WHERE [l].[Something] = N'foo'
If that approach still leaves you squirmish then, as an alternative, consider defining extension methods like this one:
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query.Internal;
static class Extensions
{
public static Task<T> SingleAsync<T>( this IQueryable<T> source ) =>
source.Provider is IAsyncQueryProvider
? EntityFrameworkQueryableExtensions.SingleAsync( source )
: Task.FromResult( source.Single() );
}
According to the official Microsoft documentation for EF Core (all versions, including the current 2.1 one):
This API supports the Entity Framework Core infrastructure and is not intended to be used directly from your code. This API may change or be removed in future releases.
Source: https://learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.query.internal.asynclinqoperatorprovider.toasyncenumerable?view=efcore-2.1
p.s. I personally found it problematic in combination with the AutoMapper tool (at least, until ver. 6.2.2) - it just doesn't map collection of type IAsyncEnumerable (unlike IEnumerable, with which the AutoMapper works seamlessly).
I took a peek at the source code of Single (Line 90).
It cleary illustrates that the enumerator is only advanced once (for a successful operation).
using (var e = source.GetEnumerator())
{
if (!await e.MoveNext(cancellationToken)
.ConfigureAwait(false))
{
throw new InvalidOperationException(Strings.NO_ELEMENTS);
}
var result = e.Current;
if (await e.MoveNext(cancellationToken)
.ConfigureAwait(false))
{
throw new InvalidOperationException(Strings.MORE_THAN_ONE_ELEMENT);
}
return result;
}
Since this kind of implementation is as good as it gets (nowadays), one can say with certainty that using the Ix Single Operator would not harm performance.
As for SingleAsync, you can be sure that it is implemented in a similar manner, and even if it is not (which is doubtful), it could not outperform the Ix Single operator.
I have a design problem, but don't know how to fix it. I have a Policy object, with a boolean property like so:
public bool IsCancelled
{
get
{
return (CancellationDate != null && Convert.ToDateTime(CancellationDate) < DateTime.Today);
}
}
The problem with this approach is that if I want to get...
context.Policies.where(q => q.IsCancelled)
...LinqToEntities can't execute this against the database; I must load every policy object into memory, like this statement below, which kills performance and is completely unnecessary:
context.policies.ToList().where(q => q.IsCancelled)
A colleague tells me I should be able to use a Func or Expression to do this, but I'm at a loss as to what phrase to even Google for this. Can someone recommend a link or two that explains how to do this?
Keep in mind, I want this to be available to queries like the one above, and to an instance of a Policy object in memory, without having to code the logic twice (DRY and all that).
Thanks.
The problem is your Convert method. I assume CancelationDate is a string. The real problem here is that SQL doesn't do date comparisons as strings, they need to be in date format. This can't be translated to SQL, and thus won't work in the database.
You really should be storing dates as the date type, not as strings. Then it would be trivial. If you can change this, then do it, then no conversion is necessary.
Your other option is to futz with the EntityFunctions, SqlFunctions, DbFunctions to try to make it work.
See:
Comparing date with string Entity Framework
I'm trying to use a SqlDataReader (I'm quite aware of the beauty of Linq, etc, but the application I'm building is partly a Sql Generator, so Linq doesn't fit my needs). Unfortunately, I'm not sure what the best practices are when using SqlDataReader. I use code like the following in several places in my code:
using (SqlDataReader reader = ...)
{
int ID = reader.GetInt32(0);
int tableID = reader.GetInt32(1);
string fieldName = reader[2] as string;
...//More, similar code
}
But it feels very unstable. If the database changes (which is actually extremely unlikely in this case) the code breaks. Is there an equivalent to SqlDataReader's GetInt32, GetString, GetDecimal, that takes a column name instead of an index? What's considered best practice in this case? What's fastest? These parts of my code are the most time intensive portions of my code (I've profiled it a few times) and so speed is important.
[EDIT]
I'm aware of using the indexer with a string, I misworded the above. I'm running into slow runtime. My code works fine, but I am looking for any way I can steal back a few seconds inside these loops. Would accessing by string slow me down? I know that the db-access is the primary time intensive operation, there's nothing I can do about that, so I want to cut back the processing time for each element accessed.
[EDIT]
I've decided to just run with GetOrdinal unless someone has more concrete examples. I'll run efficiency test later. I'll try to remember to post them when I actually run the tests.
The indexer property takes a string key, so you can do the following.
reader["text_column"] as string;
Convert.ToInt32(reader["numeric_column"]);
Additional suggestion
If you're concerned about the string lookup being slow, and assuming numeric lookup is quicker, you could try using GetOrdinal to find the column indices before looping through a large result set.
int textColumnIndex = reader.GetOrdinal("text_column");
int numericColumnIndex = reader.GetOrdinal("numeric_column");
while (reader.Read())
{
string text = reader[textColumnIndex] as string;
int number = Convert.ToInt32(reader[numericColumnIndex]);
}