Working with Entity Framework client vs server evaluation changes

I have the following excerpt from my query. I am using an ASP.NET Core 3.1 project with EF Core.
I read that server vs client evaluation has changed, so the way I used to build the WHERE part in Core 2.1 (using variables from elsewhere in my code) doesn't seem to work anymore.
So, as below, I have changed it (as per something I read) to use ToList() in each part, but isn't it now hitting the database more? In Core 2.1 I would have had the ToList() only on the final part, as per the code comment below.
So now, for Core 3.1, I need a dynamic Where in the initial "// Load data" part. How do I do a dynamic Where there, or is there a way to work around the server vs client evaluation changes in EF Core? Note that it is the final "// Search" part that fails under Core 3.1 (prior to adding the ToList()s).
public List<KBEntryListVM> lstKBEntry;

// Load data
var q = await (from _k in _context.KBEntry
               join _kc in _context.KBCategory on _k.CategoryId equals _kc.Id into _kc2
               from _kc3 in _kc2.DefaultIfEmpty()
               select new KBEntryListVM()
               {
                   Id = _k.Id,
                   DateCreated = DateTime.Parse(_k.DateCreated.ToString()),
                   CategoryId = _k.CategoryId,
                   CategoryTitle = _kc3.Title.ToString().Trim(),
                   Text = _k.Text.ToString().Trim(),
                   Title = _k.Title.ToString().Trim()
               }).ToListAsync();

// KBCategory
if (!string.IsNullOrEmpty(c) && Guid.TryParse(c.ToString().Trim(), out var newGuid))
{
    q = q.Where(w => w.CategoryId == Guid.Parse($"{c.ToString()}")).ToList();
}

// Search
if (!string.IsNullOrEmpty(s))
{
    q = q.Where(w => w.Title.ToLower().Contains($"{s.ToLower()}") || w.CategoryTitle.ToLower().Contains($"{s.ToLower()}") || w.Text.ToLower().Contains($"{s.ToLower()}")).ToList();
}

lstKBEntry = q; // .ToList(); this would have been the only place I'd have had ToList() in Core 2.1
Arthur

"So as below I have changed (as per something I read) to use ToList() in each part"
The EF Core 3.x+ client evaluation exception message suggests to either (1) rewrite the query in a form that can be translated, or (2) switch to client evaluation explicitly by inserting a call to AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync().
So you are taking option (2), which is easier, but you should try option (1), which is harder but better from a performance perspective and the main reason implicit client evaluation was removed in EF Core 3.0. Option (2) should be your last resort, used only when there is no way to apply option (1).
The exception message also contains the failed expression. Unfortunately it is not the exact part, but the whole expression (the whole Where predicate for instance), so you need to analyze it, find the failing part(s) and try to replace them with translatable constructs.
One of the general rules for simple data expressions is to avoid explicit conversions (ToString(), Parse). Store dates and numbers in the database as such rather than as strings, or use value conversions when you are working with an old existing database and aren't allowed to change it.
In this particular query, the unsupported (non-translatable) constructs are most likely the ToString() calls on string-typed properties (e.g. Title, Text). EF Core still supports implicit client evaluation of the final Select, so you won't notice them if there is no Where (or other) clause after it referencing such expressions. But as said at the beginning, you should avoid them regardless: query the raw data and let the consumers (UI) do the desired formatting.
Anyway, I can't tell exactly because you haven't shown your model, but removing the ToString() calls should make the query translatable, hence no need for intermediate ToList() or similar client materialization:
CategoryTitle = _kc3.Title.Trim(),
Text = _k.Text.Trim(),
Title = _k.Title.Trim()
You should probably also replace
DateCreated = DateTime.Parse(_k.DateCreated.ToString())
with just
DateCreated = _k.DateCreated
because it seems that DateCreated is already a DateTime, so the double conversion through string doesn't make sense and would cause similar trouble. And even if the database type is a string, still remove the Parse / ToString and set up a value converter that does the conversion.
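Putting it together: once the conversions are gone, the whole query stays an IQueryable<KBEntryListVM> until the end, so the dynamic Where parts from the question can simply be composed onto it before a single ToListAsync() call, and everything runs on the server in one round trip. A sketch of how the block could then look (same names as in the question, untested, and assuming DateCreated is mapped as a DateTime):
// Load data - builds the query only, nothing is executed yet
var query = from _k in _context.KBEntry
            join _kc in _context.KBCategory on _k.CategoryId equals _kc.Id into _kc2
            from _kc3 in _kc2.DefaultIfEmpty()
            select new KBEntryListVM()
            {
                Id = _k.Id,
                DateCreated = _k.DateCreated,
                CategoryId = _k.CategoryId,
                CategoryTitle = _kc3.Title.Trim(),
                Text = _k.Text.Trim(),
                Title = _k.Title.Trim()
            };

// KBCategory - composes a translatable WHERE clause
if (!string.IsNullOrEmpty(c) && Guid.TryParse(c.Trim(), out var categoryId))
{
    query = query.Where(w => w.CategoryId == categoryId);
}

// Search - also translated (ToLower / Contains map to LOWER / LIKE)
if (!string.IsNullOrEmpty(s))
{
    var search = s.ToLower();
    query = query.Where(w => w.Title.ToLower().Contains(search)
                          || w.CategoryTitle.ToLower().Contains(search)
                          || w.Text.ToLower().Contains(search));
}

// Single database hit
lstKBEntry = await query.ToListAsync();
And if DateCreated (or any other property) really is stored as a string you cannot change, a value converter configured once in OnModelCreating keeps the conversion out of the query itself; roughly like this (the "o" round-trip format is only an assumption about how the strings are stored):
modelBuilder.Entity<KBEntry>()
    .Property(e => e.DateCreated)
    .HasConversion(
        d => d.ToString("o"),    // .NET DateTime -> database string
        s => DateTime.Parse(s)); // database string -> .NET DateTime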

Related

Replacement for deprecated PostgresDataType.JSON?

I'm using JOOQ with PostgreSQL, and trying to implement a query like this:
INSERT INTO dest_table (id,name,custom_data)
SELECT key as id,
nameproperty as name,
CONCAT('{"propertyA": "',property_a,'", "propertyB": "',property_b,'","propertyC": "',property_c,'"}')::json as custom_data
FROM source_table
The concatenation/JSON bit is what I'm here to ask about. I actually have managed to get it working, but only by using this (Kotlin):
val concatBits = mutableListOf<Field<Any>>()
... build up various bits of the concatenation ...
val concatField = concat(*(concatBits.toTypedArray())).cast(PostgresDataType.JSON)
It concerns me that PostgresDataType is deprecated. The documentation says I should use SQLDataType instead, but it has no JSON value.
What's the recommended way to do this?
EDIT: a bit more information ...
I'm building the query like this:
val innerSelectFields = listOf(
    field("key").`as`(DEST_TABLE.ID),
    field("nameproperty").`as`(DEST_TABLE.NAME),
    concatField.`as`(DEST_TABLE.CUSTOM_DATA)
)
val innerSelect = dslContext
    .select(innerSelectFields)
    .from(table("source_table"))
val insertInto = dslContext
    .insertInto(DEST_TABLE)
    .select(innerSelect)
The initial query I posted is slightly misleading, as the resulting SQL from this code doesn't have the
(id,name,custom_data) part.
Also, in case it matters, "source_table" is a temporary table, created during runtime, so there are no autogenerated classes for it.
jOOQ currently doesn't support the JSON data type out of the box. The main reason is that it is unclear which Java type to bind a JSON data structure to, as the JDK doesn't have such a standard type, and jOOQ will not prefer one third-party library over another.
The currently recommended approach is to create your own custom data type binding for your preferred third party JSON library:
https://www.jooq.org/doc/latest/manual/code-generation/custom-data-type-bindings
In that case, you will no longer need to explicitly cast your bind variable to some JSON type, because your binding will take care of that transparently.

Zend\db\sql - prepareStatementForSqlObject - still need to bind or worry about sql injection?

I'm using ZF 2.4, and for this example, Zend\Db\Sql. Do I need to worry about SQL injection, or do I still need to quote() or escape anything if I already use prepareStatementForSqlObject()? Will the example below already use bound parameters?
https://framework.zend.com/manual/2.4/en/modules/zend.db.sql.html
use Zend\Db\Sql\Sql;
$sql = new Sql($adapter);
$select = $sql->select();
$select->from('foo');
$select->where(array('id' => $id));
$statement = $sql->prepareStatementForSqlObject($select);
$results = $statement->execute();
The Select class will cleverly check your predicate(s) and add them to the query in a safe manner to prevent SQL injection. I'd recommend you take a look at the source for yourself, so I'll point you to the process and the classes that are responsible for this in the latest ZF version.
Predicate Processing
Take a look at the class PredicateSet. The method \Zend\Db\Sql\Predicate::addPredicates determines the best way to handle your predicate based on its type. In your case you are using an associative array. Every item in that array will be checked and processed based on type:
If an abstraction replacement character (question mark) is found, it will be turned into an Expression.
If the value is NULL, an IS NULL check will be performed on the column found in the key: WHERE key IS NULL.
If the value is an array, an IN check will be performed on the column found in the key: WHERE key IN (arrayVal1, arrayVal2, ...).
Otherwise, the predicate will be a new Operator of the type 'equals': WHERE key = value.
In each case the final predicate to be added to the Select will implement PredicateInterface.
Preparing the statement
The method \Zend\Db\Sql\Sql::prepareStatementForSqlObject instructs its adapter (i.e. PDO) to create a statement that will be prepared. From here it gets a little bit more complicated.
\Zend\Db\Sql is where the real magic happens: in the method \Zend\Db\Sql::createSqlFromSpecificationAndParameters, the function vsprintf is used to build the query strings.
Note: please consider using the new docs.framework.zend.com website from now on. That website is the leading resource for documentation of the latest version.

ToAsyncEnumerable().Single() vs SingleAsync()

I'm constructing and executing my queries in a way that's independent of EF Core, so I'm relying on IQueryable<T> to obtain the required level of abstraction. I'm replacing awaited SingleAsync() calls with awaited ToAsyncEnumerable().Single() calls. I'm also replacing ToListAsync() calls with ToAsyncEnumerable().ToList() calls. But I just happened upon the ToAsyncEnumerable() method, so I'm unsure whether I'm using it correctly.
To clarify which extension methods I'm referring to, they're defined as follows:
SingleAsync and ToListAsync are defined on the EntityFrameworkQueryableExtensions class in the Microsoft.EntityFrameworkCore namespace and assembly.
ToAsyncEnumerable is defined on the AsyncEnumerable class in the System.Linq namespace in the System.Interactive.Async assembly.
When the query runs against EF-Core, are the calls ToAsyncEnumerable().Single()/ToList() versus SingleAsync()/ToListAsync() equivalent in function and performance? If not then how do they differ?
For methods returning a sequence (like ToListAsync, ToArrayAsync) I don't expect a difference.
However, for single-value-returning methods (the async versions of First, FirstOrDefault, Single, Min, Max, Sum etc.) there will definitely be a difference. It's the same as the difference between executing those methods on IQueryable<T> vs IEnumerable<T>. In the former case they are processed by a database query returning a single value to the client, while in the latter the whole result set is returned to the client and processed in memory.
So, while in general the idea of abstracting EF Core is good, it will cause performance issues with IQueryable<T>, because the async processing of queryables is not standardized, and converting to IEnumerable<T> changes the execution context, and hence the implementation of the single-value-returning LINQ methods.
P.S. By standardization I mean the following. The synchronous processing of IQueryable<T> is provided by the Execute methods of IQueryProvider (a standard interface in the System.Linq namespace in the System.Core.dll assembly). Asynchronous processing would require introducing another standard interface, similar to EF Core's custom IAsyncQueryProvider (in the Microsoft.EntityFrameworkCore.Query.Internal namespace in the Microsoft.EntityFrameworkCore.dll assembly). That, I guess, requires cooperation/approval from the BCL team and takes time, which is why they decided to take a custom path for now.
When the original source is a DbSet, ToAsyncEnumerable().Single() is not as performant as SingleAsync() in the exceptional case where the database contains more than one matching row. But in the more likely scenario, where you both expect and receive only one row, it's the same. Compare the generated SQL:
SingleAsync():
SELECT TOP(2) [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable().Single():
SELECT [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable() breaks the IQueryable call chain and enters LINQ-to-Objects land. Any downstream filtering occurs in memory. You can mitigate this problem by doing your filtering upstream. So instead of:
ToAsyncEnumerable().Single( l => l.Something == "foo" ):
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
you can do:
Where( l => l.Something == "foo" ).ToAsyncEnumerable().Single():
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
WHERE [l].[Something] = N'foo'
If that approach still leaves you squeamish then, as an alternative, consider defining extension methods like this one:
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query.Internal;
static class Extensions
{
    public static Task<T> SingleAsync<T>( this IQueryable<T> source ) =>
        source.Provider is IAsyncQueryProvider
            ? EntityFrameworkQueryableExtensions.SingleAsync( source )
            : Task.FromResult( source.Single() );
}
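In the EF-agnostic calling code you would then have no using directive for Microsoft.EntityFrameworkCore at all, so this extension is the SingleAsync that resolves, and it behaves the same against a DbSet-backed query or an in-memory test queryable. A hypothetical usage (Customer, repository and id are placeholder names, not from the question):
// Against EF Core this delegates to EntityFrameworkQueryableExtensions.SingleAsync
// (server-side TOP(2)); against a plain in-memory IQueryable it falls back to a
// synchronous Single wrapped in an already-completed task.
IQueryable<Customer> customers = repository.Customers.Where(c => c.Id == id);
Customer customer = await customers.SingleAsync();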
According to the official Microsoft documentation for EF Core (all versions, including the current 2.1 one):
This API supports the Entity Framework Core infrastructure and is not intended to be used directly from your code. This API may change or be removed in future releases.
Source: https://learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.query.internal.asynclinqoperatorprovider.toasyncenumerable?view=efcore-2.1
P.S. I personally found it problematic in combination with the AutoMapper tool (at least up to ver. 6.2.2): it just doesn't map collections of type IAsyncEnumerable (unlike IEnumerable, with which AutoMapper works seamlessly).
I took a peek at the source code of Single (line 90).
It clearly illustrates that, for a successful operation, the enumerator is advanced only once to get the element, plus one more MoveNext to verify there is no second element.
using (var e = source.GetEnumerator())
{
    if (!await e.MoveNext(cancellationToken)
                .ConfigureAwait(false))
    {
        throw new InvalidOperationException(Strings.NO_ELEMENTS);
    }

    var result = e.Current;

    if (await e.MoveNext(cancellationToken)
               .ConfigureAwait(false))
    {
        throw new InvalidOperationException(Strings.MORE_THAN_ONE_ELEMENT);
    }

    return result;
}
Since this kind of implementation is as good as it gets (nowadays), one can say with certainty that using the Ix Single Operator would not harm performance.
As for SingleAsync, you can be sure that it is implemented in a similar manner, and even if it is not (which is doubtful), it could not outperform the Ix Single operator.

Spring Data Neo4j - ORDER BY {order} fails

I have a query where the result should be ordered depending on the passed parameter:
#Query("""MATCH (u:User {userId:{uid}})-[:KNOWS]-(:User)-[h:HAS_STUFF]->(s:Stuff)
WITH s, count(h) as count ORDER BY count {order}
RETURN o, count SKIP {skip} LIMIT {limit}""")
fun findFromOthersByUserIdAndSortByAmountOfStuff(
#Param("uid") userId: String,
#Param("skip") skip: Int,
#Param("limit") limit: Int,
#Param("order) order: String): List<StuffWithCountResult>
For the order parameter I use the following enum and its sole method:
enum class SortOrder {
    ASC,
    DESC;

    fun toNeo4JSortOrder(): String =
        when (this) {
            ASC -> ""
            DESC -> "DESC"
        }
}
It seems that SDN does not handle the {order} parameter properly. On execution, I get an exception saying:
Caused by: org.neo4j.kernel.impl.query.QueryExecutionKernelException: Invalid input 'R': expected whitespace, comment or a relationship pattern (line 3, column 5 (offset: 244))
" RETURN o, count SKIP {skip} LIMIT {limit}"
^
If I remove the parameter from the Cypher statement or replace it with a hardcoded DESC the method succeeds. I believe it's not because of the enum since I use (other) enums in other repository methods and all these methods succeed. I already tried a different parameter naming like sortOrder, but this did not help.
What am I missing here?
This is the wrong model for changing sorting and paging information. You can skip to the answer below for using those options, or continue reading for an explanation of what is wrong in your code as it stands.
You cannot bind where things aren't allowed to be bound:
You cannot bind a parameter into a syntax element of the query that is not set up for parameter binding. Parameter binding doesn't do simple string substitution (because that would leave you open to injection attacks) but rather uses binding APIs to bind parameters. You are treating the query annotation as if it performed string substitution, and that is not what is happening.
The parameter binding docs for Neo4j and the Java manual for query parameters show exactly where you can bind; the only places allowed are:
in place of String Literals
in place of Regular Expressions
String Pattern Matching
Create node with properties, as the properties
Create multiple nodes with properties, as the properties
Setting all properties of a node
numeric values for SKIP and LIMIT
as the Node ID
as multiple Node IDs
Index Value
Index Query
There is nothing that says that what you are trying, binding in the ORDER BY clause, is allowed.
That isn't to say that the authors of Spring Data couldn't work around this and allow binding in other places, but it doesn't appear they have done more than what the Neo4j Java API allows.
You can instead use the Sort class:
(the fix to allow this is marked for version 4.2.0.M1 which is a pre-release as of Sept 8, 2016, see below for using milestone builds)
Spring Data has a Sort class; if your @Query-annotated method has a parameter of this type, it should apply the sorting and allow it to dynamically modify the query.
I assume the code would look something like (untested):
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
List<Actor> getActorsThatActInMovieFromTitle(String movieTitle, Sort sort);
Or you can use the PageRequest class / Pageable interface:
(the fix to allow this is marked for version 4.2.0.M1 which is a pre-release as of Sept 8, 2016, see below for using milestone builds)
In current Spring Data + Neo4j docs you see examples using paging:
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
Page<Actor> getActorsThatActInMovieFromTitle(String movieTitle, PageRequest page);
(sample from Cypher Examples in the Spring Data + Neo4j docs)
And this PageRequest class also allows sorting parameterization. Anything that implements Pageable will do the same. Using Pageable instead is probably more proper:
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
Page<Actor> getActorsThatActInMovieFromTitle(String movieTitle, Pageable page);
You might be able to use SpEL in earlier versions:
As an alternative, you can look at using SpEL expressions to do substitutions in other areas of the query. I am not familiar with it but it says:
Since this mechanism exposes special parameter types like Sort or Pageable as well, we’re now able to use pagination in native queries.
But the official docs seem to say it is more limited.
And you should know this other information:
Here is someone reporting your exact same problem in a GitHub issue, which leads to the DATAGRAPH-653 issue, marked as fixed in version 4.2.0.M1. It references other SO questions, such as "Paging and sorting in Spring Data Neo4j 4", which are outdated and no longer correct, so you should ignore those.
Finding Spring Data Neo4j Milestone Builds:
You can view the dependency information for any release on the project page. For the 4.2.0.M1 build, the information for Gradle (you can infer Maven) is:
dependencies {
    compile 'org.springframework.data:spring-data-neo4j:4.2.0.M1'
}
repositories {
    maven {
        url 'https://repo.spring.io/libs-milestone'
    }
}
Any newer final release should be used instead.

How can I view the Entity Framework LINQ query plan cache?

I'm having issues with slow LINQ query compilation in EF6. I know EF caches the compiled query plans for LINQ queries, but there are some gotchas (e.g. Enumerable.Contains prevents caching). I'd like to view the cache for debugging purposes to validate whether or not I'm getting proper caching for my queries. How can I do this?
Note: since this is purely for debugging, I'd be happy with an answer using reflection or other means that wouldn't be used in production.
You can reflect down to the cache using something like this (in EF 6.1.3):
var method = context.Database.GetType().GetMethod("CreateStoreItemCollection", BindingFlags.Instance | BindingFlags.NonPublic);
var storeItemsCollection = method.Invoke(context.Database, null);

var queryCacheManagerField = storeItemsCollection.GetType().GetField("_queryCacheManager", BindingFlags.Instance | BindingFlags.NonPublic);
var queryCacheManager = queryCacheManagerField.GetValue(storeItemsCollection);

var cacheField = queryCacheManager.GetType().GetField("_cacheData", BindingFlags.Instance | BindingFlags.NonPublic);
var cacheData = cacheField.GetValue(queryCacheManager) as ICollection;

foreach (var item in cacheData)
{
    Console.WriteLine(item.ToString());
}
Unfortunately, all the items in the cache are of internal types (in the System.Data.Entity.Core.Common.QueryCache namespace), so getting useful information out of them will require a bunch more reflection and poking around. Luckily, CompiledQueryCacheKey overrides ToString so it gives up a little (cryptic) information about itself. After running a single query (Table.Count()), the code above spits out these two entries:
[System.Data.Entity.Core.Common.QueryCache.ShaperFactoryQueryCacheKey`1[System.Int32], System.Data.Entity.Core.Common.QueryCache.QueryCacheEntry]
[FUNC<Edm.Count(In Transient.collection[Edm.Int32(Nullable=True,DefaultValue=)](Nullable=True,DefaultValue=))>:ARGS(([Project](BV'LQ1'=([Scan](DashboardAutoContext.Organizations:Transient.collection[DashboardAuto.Organization(Nullable=True,DefaultValue=)]))(1:Edm.Int32(Nullable=True,DefaultValue=)))))|||AppendOnly|True, System.Data.Entity.Core.Common.QueryCache.QueryCacheEntry]
Good luck figuring out what that means, or correlating things when you have a bunch of entries.
A different tactic I've had some success with is to create a class with a method I can assign to DbContext.Database.Log during testing. The class starts a timer when it sees an "Opened connection..." message, then stops the timer when it sees an "-- Executing at...". That time roughly corresponds to the time it takes EF to compile your LINQ query down to SQL. If your query is being cached, that time will shrink to almost nothing after the first time the query is executed; if it's not cached, the time will stay consistently high.
(It should go without saying that you'd only want to do this in a testing context.)
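A rough sketch of such a logger, based on the approach described above (EF6 only, meant purely for test code; the message prefixes are the ones the default log formatter writes):
using System;
using System.Diagnostics;

class QueryCompileTimer
{
    private readonly Stopwatch _stopwatch = new Stopwatch();

    // Assign this method to context.Database.Log (an Action<string>).
    public void Log(string message)
    {
        if (message.StartsWith("Opened connection", StringComparison.Ordinal))
        {
            // Connection opened: start timing.
            _stopwatch.Restart();
        }
        else if (message.StartsWith("-- Executing at", StringComparison.Ordinal))
        {
            // Command about to execute: the elapsed time roughly covers the
            // LINQ-to-SQL translation. A cached plan drops to near zero after
            // the first execution.
            _stopwatch.Stop();
            Console.WriteLine($"~query compile time: {_stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
// Usage in a test: context.Database.Log = new QueryCompileTimer().Log;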
You can inject an IDbCommandTreeInterceptor implementation to log every query tree creation. I used it successfully in combination with a call-stack-keyed dictionary to spot repeated compilations of "bad" queries.
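For reference, a minimal sketch of what that could look like (EF6's interception API; the call-stack-keyed dictionary mirrors the idea described above, and the class name is made up for illustration):
using System;
using System.Collections.Concurrent;
using System.Data.Entity.Infrastructure.Interception;

class CommandTreeLoggingInterceptor : IDbCommandTreeInterceptor
{
    // Counts how many times a command tree was built from the same call site;
    // a high count for the same stack suggests a query that is not being cached.
    private static readonly ConcurrentDictionary<string, int> _compilations =
        new ConcurrentDictionary<string, int>();

    public void TreeCreated(DbCommandTreeInterceptionContext interceptionContext)
    {
        var callStack = Environment.StackTrace;
        var count = _compilations.AddOrUpdate(callStack, 1, (_, c) => c + 1);

        Console.WriteLine(
            $"{interceptionContext.Result.CommandTreeKind} tree created " +
            $"({count}x from this call site)");
    }
}
// Register once at startup or in test setup:
// DbInterception.Add(new CommandTreeLoggingInterceptor());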