What are the ways to optimize Entity Framework queries with Contains()?

We load a large object graph from the database.
The query has many Includes, and its Where() uses Contains() to filter the final result.
Contains() is called on a collection of about a thousand entries.
The profiler shows monstrous, unreadable SQL.
The query cannot be precompiled because of Contains().
Are there any ways to optimize such queries?
Update:
public List<Vulner> GetVulnersBySecurityObjectIds(int[] softwareIds, int[] productIds)
{
    var sw = new Stopwatch();
    var query = from vulner in _businessModel.DataModel.VulnerSet
                join vt in _businessModel.DataModel.ObjectVulnerTieSet.Where(ovt => softwareIds.Contains(ovt.SecurityObjectId))
                    on vulner.Id equals vt.VulnerId
                select vulner;
    var result = ((ObjectQuery<Vulner>)query.OrderBy(v => v.Id).Distinct())
        .Include("Descriptions")
        .Include("Data")
        .Include("VulnerStatuses")
        .Include("GlobalIdentifiers")
        .Include("ObjectVulnerTies")
        .Include("Object.ProductObjectTies.Product")
        .Include("VulnerComment");
    // If specific products were passed in, add filtering by product
    if (productIds.HasValues())
        result = (ObjectQuery<Vulner>)result.Where(v => v.Object.ProductObjectTies.Any(p => productIds.Contains(p.ProductId)));
    // Time how long it takes to build the SQL
    sw.Start();
    var str = result.ToTraceString();
    sw.Stop();
    Debug.WriteLine("Building the query took {0} seconds.", sw.Elapsed.TotalSeconds);
    // Time how long it takes to execute the query and materialize the results
    sw.Restart();
    var list = result.ToList();
    sw.Stop();
    Debug.WriteLine("Fetching the vulnerabilities took {0} seconds.", sw.Elapsed.TotalSeconds);
    return list;
}

It's almost certain that splitting the query into pieces will perform better, despite the extra database round trips. It is always advisable to limit the number of Includes, because they not only blow up the size and complexity of the query (as you noticed) but also blow up the result set in both length and width. Moreover, they often get translated into outer joins.
Apart from that, using Contains the way you do is OK.
Sorry, it is hard to be more specific without knowing your data model and the size of the tables involved.
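As a rough illustration of what "splitting the query into pieces" can look like, here is a minimal sketch. It is not the asker's data model: VulnerRow, DescriptionRow and SplitLoading are hypothetical stand-ins, and the two IQueryable parameters stand in for the EF sets so that the example runs against in-memory lists. One query fetches the parents, a second fetches a single child collection by parent ID, and the results are stitched together in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the real entities.
public class VulnerRow
{
    public int Id;
    public List<DescriptionRow> Descriptions = new List<DescriptionRow>();
}

public class DescriptionRow
{
    public int VulnerId;
    public string Text;
}

public static class SplitLoading
{
    // Query 1 returns the parents, query 2 returns one child collection
    // filtered by the parent IDs; no Include is needed for that collection.
    public static List<VulnerRow> Load(IQueryable<VulnerRow> vulners,
                                       IQueryable<DescriptionRow> descriptions,
                                       List<int> vulnerIds)
    {
        // First round trip: parents only.
        var parents = vulners.Where(v => vulnerIds.Contains(v.Id)).ToList();
        var parentIds = parents.Select(v => v.Id).ToList();

        // Second round trip: children of exactly those parents.
        var children = descriptions
            .Where(d => parentIds.Contains(d.VulnerId))
            .ToLookup(d => d.VulnerId);

        // Stitch in memory; a lookup yields an empty sequence for missing keys.
        foreach (var parent in parents)
            parent.Descriptions = children[parent.Id].ToList();

        return parents;
    }
}
```

Each further collection would get its own small query in the same way, so every result set stays narrow.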

Related

Validate data exists using EF Core best practices

List<Guid> list = new List<Guid>();
I have a list of Guid. What is the best way to check whether all the GUIDs exist in an EF Core table?
I am currently using the code below, but the performance is very bad. Assume the User table has 1 million records.
For example:
public async Task<bool> IsIdListValid(IEnumerable<int> idList)
{
    var validIds = await _context.User.Select(x => x.Id).ToListAsync();
    return idList.All(x => validIds.Contains(x));
}

The performance is bad because you are reading every row of the table into memory and then iterating through it (ToList materializes the query). Try to let the database do the work instead, for example with Any(): bool exists = _context.User.Any(u => idList.Contains(u.Id));. This should translate to an SQL IN clause. Note that Any() only tells you whether at least one of the IDs exists, not whether all of them do.

Provided you ensure that the number of IDs being sent in is kept reasonable, you could do the following:
var idCount = _context.User.Where(x => idList.Contains(x.Id)).Count();
return idCount == idList.Count();
This assumes that you are comparing on a column with a unique constraint, such as the PK, and that idList contains no duplicates. We get a count of how many rows have a matching ID from the list, then compare that to the number of IDs sent.
If you're passing a large number of IDs, you would need to break the list up into reasonably sized sets, as there are limits to what you can do with an IN clause and potential performance costs as well.
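The count comparison and the batching can be combined roughly as follows. This is only a sketch under stated assumptions: IdValidation and AllIdsExist are hypothetical names, existingIds stands in for something like _context.User.Select(u => u.Id), the batch size of 1000 is an arbitrary choice, and Enumerable.Chunk requires .NET 6 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IdValidation
{
    // existingIds is a stand-in for a query over the table's unique key column.
    public static bool AllIdsExist(IQueryable<int> existingIds,
                                   IEnumerable<int> idList,
                                   int batchSize = 1000)
    {
        // Remove duplicates first, otherwise the two counts cannot match.
        var wanted = idList.Distinct().ToList();
        var found = 0;

        // One Contains/IN query per batch keeps each IN list bounded.
        foreach (var batch in wanted.Chunk(batchSize))
        {
            var ids = batch.ToList();
            found += existingIds.Count(id => ids.Contains(id));
        }

        // Valid only because the key column is unique: every match is one distinct ID.
        return found == wanted.Count;
    }
}
```

Against a real context the call would look something like IdValidation.AllIdsExist(_context.User.Select(u => u.Id), idList), and only the counts travel back from the database.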

How to combine LINQ contains query with count on all records and retrieve records with offset

I have a page with pagination and search on movies; the movies table has 388262 records. Without search, the code below works fast. But when search is used (vm.Search is filled), retrieving the records becomes slow.
So basically it runs the Contains/LIKE query twice: once to return the results with the offset, and once for the count without the offset. This is correct, but is there a way to combine the two for improved performance?
The weird thing is that when I take the queries from the log and execute them directly on the database using SSMS (SQL Server Management Studio), the two queries run rather fast, in around 2 seconds, while they take around 10 seconds when executed from the code/LINQ.
The code where I do the LINQ:
public IActionResult Index(MovieIndexViewModel vm) {
    IQueryable<Movie> query = _movieRepository.GetQueryable().AsNoTracking()
        .Select(m => new Movie {
            movie_id = m.movie_id,
            title = m.title,
            description = m.description
        });
    if (!string.IsNullOrWhiteSpace(vm.Search)) {
        query = query.Where(m => m.title.Contains(vm.Search));
    }
    vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize).ToList();
    vm.PageSize = _pageSize;
    vm.TotalItemCount = query.Count();
    return View(vm);
}
SQL log from the line with Skip and Take:
SELECT [m].[movie_id], [m].[title], [m].[description]
FROM [Movie] AS [m]
WHERE [m].[title] LIKE ('%' + @__vm_Search_0) + '%'
ORDER BY @@ROWCOUNT
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
SQL log from the line with Count:
SELECT COUNT(*)
FROM [Movie] AS [m]
WHERE [m].[title] LIKE ('%' + @__vm_Search_0) + '%'

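The code below starts both queries without waiting for the first one to finish. However, this only saves roughly one round trip of latency to the SQL server, which should be on the order of 10 ms. Be aware that EF Core does not allow two operations to run at the same time on one DbContext instance, so the two queries must come from separate contexts for this to work.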
public async Task<IActionResult> Index(MovieIndexViewModel vm) {
    IQueryable<Movie> query = _movieRepository.GetQueryable().AsNoTracking()
        .Select(m => new Movie {
            movie_id = m.movie_id,
            title = m.title,
            description = m.description
        });
    if (!string.IsNullOrWhiteSpace(vm.Search)) {
        query = query.Where(m => m.title.Contains(vm.Search));
    }
    var moviesTask = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize).ToListAsync();
    var countTask = query.CountAsync();
    vm.Movies = await moviesTask;
    vm.PageSize = _pageSize;
    vm.TotalItemCount = await countTask;
    return View(vm);
}
I suspect you aren't looking for a 10 ms performance tweak. Instead, focus on the query plan.
Things you should try:
Add .OrderBy(m => m.title) to the query
Add more indexes
Clear the query cache
Things to check:
Are you benchmarking accurately? Is that 8 seconds of JIT compilation followed by 2 seconds of actual run time?
Are any unusual HTTP components, such as proxies, in the way?
DNS?
Are you running in release or debug mode?
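One way to check the benchmarking question is to time the same call twice and compare the cold and warm runs. The helper below is a minimal sketch; WarmUpTiming and Measure are hypothetical names and not part of any library.

```csharp
using System;
using System.Diagnostics;

public static class WarmUpTiming
{
    // Runs the action twice and times each run separately. The first run
    // pays one-off costs such as JIT compilation; the second run is closer
    // to the steady-state cost of the call.
    public static (TimeSpan Cold, TimeSpan Warm) Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        var cold = sw.Elapsed;

        sw.Restart();
        action();
        var warm = sw.Elapsed;

        return (cold, warm);
    }
}
```

Calling it as var (cold, warm) = WarmUpTiming.Measure(() => query.ToList()); and finding the warm run close to the SSMS timing would suggest the extra seconds are start-up cost rather than query cost.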

Count in jpa without getting result [duplicate]

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
@NamedQuery(name = "getAccounts", query = "SELECT a FROM Account a")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows total. I could write a second @NamedQuery:
@NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account a")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both @NamedQuery definitions to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").

So the solution I ended up using was to create two @NamedQuery definitions, one for the result set and one for the count, while capturing the base query in a static string to stay DRY and ensure that both queries remain consistent. So for the above, I'd have something like:
@NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
@NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account a";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example, the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure that the two queries stay in sync. You also get the advantage that the queries are precompiled and, at least with EclipseLink, you get validation at startup time instead of when you call the query.
By naming the two queries consistently, it is possible to wrap the body of the code so that both are run just by passing the base name of the query.

Calling setFirstResult/setMaxResults does not return a subset of a result set. The query hasn't even been run when you call these methods; they affect the generated SELECT query that will be executed when you call getResultList. If you want the total record count, you'll have to SELECT COUNT your entities in a separate query (typically before paginating).
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.

You can also use reflection to read the named query annotations, for example:
String getNamedQueryCode(Class<?> clazz, String namedQueryKey) {
    String code = null;
    // getAnnotation returns null when the class has no @NamedQueries annotation
    NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
    if (namedQueriesAnnotation != null) {
        for (NamedQuery namedQuery : namedQueriesAnnotation.value()) {
            if (namedQuery.name().equals(namedQueryKey)) {
                code = namedQuery.query();
                break;
            }
        }
    }
    // Not found on this class: look on a @MappedSuperclass parent, if there is one
    Class<?> parent = clazz.getSuperclass();
    if (code == null && parent != null && parent.getAnnotation(MappedSuperclass.class) != null) {
        code = getNamedQueryCode(parent, namedQueryKey);
    }
    // null if the query was not found
    return code;
}

Entity Framework Timeout

I have been trying to figure out how to optimize the following query for the past few days and just not having much luck. Right now my test db is returning about 300 records with very little nested data, but it's taking 4-5 seconds to run and the SQL being generated by LINQ is awfully long (too long to include here). Any suggestions would be very much appreciated.
To sum up this query, I'm trying to return a somewhat flattened "snapshot" of a client list with current status. A Party contains one or more Clients who have Roles (ASPNET Role Provider), Journal is returning the last 1 journal entry of all the clients in a Party, same goes for Task, and LastLoginDate, hence the OrderBy and FirstOrDefault functions.
Guid userID = 'some user ID'
var parties = Parties.Where(p => p.BrokerID == userID).Select(p => new
{
    ID = p.ID,
    Title = p.Title,
    Goal = p.Goal,
    Groups = p.Groups,
    IsBuyer = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
    IsSeller = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "seller")),
    Journal = p.Clients.SelectMany(c => c.Journals).OrderByDescending(j => j.OccuredOn).Select(j => new
    {
        ID = j.ID,
        Title = j.Title,
        OccurredOn = j.OccuredOn,
        SubCatTitle = j.JournalSubcategory.Title
    }).FirstOrDefault(),
    LastLoginDate = p.Clients.OrderByDescending(c => c.LastLoginDate).Select(c => c.LastLoginDate).FirstOrDefault(),
    MarketingPlanCount = p.Clients.SelectMany(c => c.MarketingPlans).Count(),
    Task = p.Tasks.Where(t => t.DueDate != null && t.DueDate > DateTime.Now).OrderBy(t => t.DueDate).Select(t => new
    {
        ID = t.TaskID,
        DueDate = t.DueDate,
        Title = t.Title
    }).FirstOrDefault(),
    Clients = p.Clients.Select(c => new
    {
        ID = c.ID,
        FirstName = c.FirstName,
        MiddleName = c.MiddleName,
        LastName = c.LastName,
        Email = c.Email,
        LastLogin = c.LastLoginDate
    })
}).OrderBy(p => p.Title).ToList()

I think posting the SQL could give us some clues, since small things, such as whether OrderBy comes before or after the projection, can make a big difference.
Regardless, try extracting the Clients in a separate query; this will probably simplify your query. Then include the other tables, such as Journal and Tasks, before projecting, and see how this affects your query:
// I am not sure what the exact query would be; materialize it with ToList()
var clients = GetClientsForParty();
var parties = Parties.Include("Journal").Include("Tasks")
    .Where(p => p.BrokerID == userID).Select(p => {
        ....
        // then use the in-memory clients
        IsBuyer = clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
        ...
    }
    )
In all cases, install EF Profiler and look at how your query is affected. EF can be quite surprising. Things like putting OrderBy before the projection, and likewise the placement of all these FirstOrDefault or SingleOrDefault calls, can have a big effect.
And go back to basics: if you are searching on LoweredRoleName, make sure it is indexed so that the query is fast (even though that could be useless, since EF could end up not using the covering index because it is querying so many other columns).
Also, since this query only views data (you will not alter it), don't forget to turn off entity tracking; that will give you some performance boost as well.
And last, don't forget that you can always write your SQL query directly and project to a ViewModel rather than an anonymous type (which I see as good practice anyhow). Create a class called PartyViewModel that contains the flattened view you are after, and use it with your hand-crafted SQL:
// use the optimized SQL query that you write, or even call a stored procedure
db.Database.SqlQuery<PartyViewModel>("select * from .... join .... on");
I am writing a blog post about these issues around EF. The post is still not finished, but all in all, just be patient, use some of these tricks, observe and measure their effect, and you will reach what you want.

Merge two collections in mongodb using c#

I have two collections in MongoDB. Retrieving data from each collection independently works fine. But when I implement paging with the Skip and Take methods, I get data from both collections, like this:
paging = new Pagination() { CurrentPage = pageNumber, ItemsPerPage = 16 };
var results = dataTable.FindAs<TradeInfo>(queryAll).Skip(paging.Skip).Take(paging.Take).ToList<TradeInfo>();
paging.TotalCount = Convert.ToInt32(dataTable.Find(query).Count());
var results2 = new List<TradeInfo>();
if (dataTable2 != null)
{
    results2 = dataTable2.FindAs<TradeInfo>(queryAll).Skip(paging.Skip).Take(paging.Take).ToList<TradeInfo>();
    int count = Convert.ToInt32(dataTable2.Find(query).Count());
    paging.TotalCount = paging.TotalCount + count;
    results.AddRange(results2);
}
I bind results as the ItemsSource of a DataGrid, and I get 32 items per page in total.
Is there any concept of joins in MongoDB? The two collections have the same columns.
How can I do this?
Thanks,
jan

I believe what you are looking for here is more of a Union than a Join.
Unfortunately there is no such concept in MongoDB. If your paging is dependent on a query, which in this case it seems it might be, your only real option is to create and maintain a single merged collection which gets updated every time a document is added or saved to either of these two collections. Then you can skip and take on the single collection after applying the query to it.
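If the page only needs the two result sets shown one after the other (everything from the first collection, then everything from the second, with no shared sort order), the skip/take arithmetic can be split across the two collections instead. This is a sketch under that assumption, not a replacement for a merged collection; UnionPaging and Split are hypothetical names, and count1 is the number of matching documents in the first collection.

```csharp
using System;

public static class UnionPaging
{
    // Translates one global (skip, take) window over "collection 1 followed
    // by collection 2" into a separate window for each collection.
    public static (int Skip1, int Take1, int Skip2, int Take2) Split(int skip, int take, int count1)
    {
        // Part of the window that falls inside the first collection.
        var skip1 = Math.Min(skip, count1);
        var take1 = Math.Min(take, count1 - skip1);

        // Whatever is left of the window is served by the second collection.
        var skip2 = skip - skip1;
        var take2 = take - take1;

        return (skip1, take1, skip2, take2);
    }
}
```

With 20 matches in the first collection and 16 items per page, page 2 (skip 16) takes the last 4 items from the first collection and the first 12 from the second, so the grid still shows 16 rows.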
