I have a single DbContext. First I do:
var all = context.MySet.Where(c => c.X == 1).ToList();
later (with the same context instance)
var special = context.MySet.Where(c => (c.X == 1) && (c.Y == 1)).ToList();
The database is hit AGAIN! Since the first query is guaranteed
to return all of the elements that will exist in the second, why is the DB being hit again?
If you wish to avoid hitting the database again, you could try this:
var special = all.Where(c => (c.X == 1) && (c.Y == 1)).ToList();
Since the list of all objects already contains everything you want, you can just query that list and the database won't get hit again.
Your LINQ expression is just a query; it only retrieves data when you enumerate it (for example, by calling .ToList()). You can keep changing the query and hold off actually fetching the data until you need it. Entity Framework will convert your query into a SQL query in the background and then fetch the data.
Avoid writing .ToList() at the end of every query, as this forces EF to hit the database.
If you only ever want to hit the database once, get the data you need by calling .ToList(), .ToArray(), etc., and then work with that collection (in your case the all collection), since this is the object holding all the data.
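To illustrate the difference, here is a minimal sketch of the two approaches (MySet, X, and Y stand in for your actual members):
// Enumerating with .ToList() executes the SQL and materializes the results in memory.
var all = context.MySet.Where(c => c.X == 1).ToList();
// Building a new query on the DbSet composes fresh SQL: a second database round trip.
var specialFromDb = context.MySet.Where(c => c.X == 1 && c.Y == 1).ToList();
// Filtering the already-loaded list runs LINQ to Objects: no database call at all.
var specialFromMemory = all.Where(c => c.X == 1 && c.Y == 1).ToList();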
Related
I'm running this LINQ query, which is a little big.
var events = _context.Event.OrderByDescending(e => e.StartDate).Where(e => e.IsPresentation == true).Where(e => e.IsCanceled == false).Where(e => e.StartDate > new DateTime());
And the page outputting the data from this query is taking too long to load. Maybe because I'm using too many Wheres.
I had the same issue using Include and ThenInclude in a different query, but I split that query up to improve the performance. I'm trying to figure out how to do the same thing in this situation, because here I'm not using any Include.
Overall, the performance of the query will largely depend on the table size and the availability of suitable indices.
A couple of things I can note from that query:
This statement doesn't make a lot of sense: .Where(e => e.StartDate > new DateTime()). new DateTime() initializes a DateTime of 01/01/0001, and any dates stored in an SQL Server DateTime column, for example, will be from 01/01/1753 onward, so the condition is rather moot. If the database/entity DateTime value is nullable, then .Where(e => e.StartDate.HasValue) would be more applicable. If the DateTime value is not nullable, this condition can be left off entirely.
So if the field is nullable, the LINQ expression would look more like:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.OrderByDescending(e => e.StartDate)
.ToList();
If it's not nullable:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled)
.OrderByDescending(e => e.StartDate)
.ToList();
The next culprit to eliminate: lazy-load proxy hits. Does the Event entity have navigation properties to any other entities? If this is something like a web application and you are serializing entities to send back to the client, navigation property EF proxies can absolutely kill performance. Any code after this call that touches a navigation property will result in extra individual DB calls to lazy load those navigation properties. For methods that return lists of entities this can be critical: if an Event has a reference to something like a User and you load 1000 events referencing roughly 1000 users, then when a serializer goes to serialize those 1000 events, it will "touch" each of the User references. This leads to ~1000 extra SQL statements, effectively SELECT * FROM tblUser WHERE UserId = 1, SELECT * FROM tblUser WHERE UserId = 2... etc. for each User ID in each Event.
If you need these related entities you can eager load them with Include(e => e.User), which will be faster than loading them individually, but this does mean loading a lot of extra data into memory that your client/code may not need.
You can avoid the lazy load hits by turning off lazy loading & proxies, but this will leave these entities with null references, which means any code expecting an Event entity with related details may get one of these partially loaded entities. (Not good; an entity should always be in a complete or complete-able state.)
The final option is to use Select to populate a view model for your results. This can speed up queries considerably, because you can have EF compose a query that pulls back just the data you need from whatever entities, rather than pulling back everything or triggering lazy loads.
For example if you just need an EventId, EventNumber, Name, StartDate, and a UserName to display for the events:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.Select(e => new EventViewModel
{
EventId = e.EventId,
EventNumber = e.EventNumber,
Name = e.Name,
StartDate = e.StartDate,
UserName = e.User.Name
})
.OrderByDescending(e => e.StartDate)
.ToList();
This avoids any lazy load hits and reduces the query to just the columns needed, which can speed up queries significantly.
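For reference, turning off lazy loading and proxy creation as described above looks like this in EF6 (a sketch, with the caveat above that navigation properties are then left null unless eager loaded):
// EF6: disable lazy loading and dynamic proxies for this context instance.
// Navigation properties stay null unless explicitly loaded with Include().
context.Configuration.LazyLoadingEnabled = false;
context.Configuration.ProxyCreationEnabled = false;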
Next would be to look at the queries EF is running and their relevant execution plans. This can highlight missing/poor indexes and any unexpected lazy load hits. The method for doing this depends on your database, but it involves running a profiler against the DB to capture the SQL statements being run while you debug. From there you can take the SQL statements that EF generates and run them manually against your database with any DB-side analysis tools (such as SSMS with SQL Server, to get an execution plan which can identify missing indexes). Serializer lazy load hits in a web application are detectable as a lot of extra SQL statements executing after your method appears to have completed, but before the data gets back to the client: this is the serializer "touching" proxies, resulting in lots of additional queries that the server has to wait on before the data is returned to the client.
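If you can't run a profiler against the DB, EF can also log the SQL it generates. A minimal sketch, assuming EF6 (EF Core uses LogTo on the options builder instead):
// EF6: write every SQL statement this context executes to the debug output.
context.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);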
Lastly would be data volume. Any system that is expected to grow over time should consider the amount of data that can eventually be returned. Any call that returns a list of data should incorporate server-side pagination, where the client sends a page size and page number that the server translates into a .Skip(pageNumber * pageSize).Take(pageSize) operation (with page numbers starting at 0). Most data grids and similar components support server-side pagination and will send these arguments to their data load methods. These controls will need to know the total row count to set up pagination, so you would need a method to return that count:
var rowCount = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.Count();
Same conditions, no OrderBy, and a .Count() with no .ToList() etc. will compose an efficient COUNT query.
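And a sketch of the paged query itself, where pageNumber and pageSize are hypothetical arguments supplied by the client:
var events = _context.Event
    .Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
    .OrderByDescending(e => e.StartDate) // a deterministic order is needed before Skip/Take
    .Skip(pageNumber * pageSize)
    .Take(pageSize)
    .ToList();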
This should give you a few things to check and tweak to eliminate your performance pitfalls.
You should store the value in a variable once, e.g. var now = new DateTime();
Combine all of the conditions into a single Where clause;
Apply OrderByDescending after the Where clause, so that only the filtered rows are sorted rather than the whole table.
var now = new DateTime();
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate > now)
.OrderByDescending(e => e.StartDate);
Tips
You should arrange your conditions based on the actual data. For example:
.Where(e => 1 == 2 && 2 == 2 && 3 == 3)
As you can see, because && short-circuits, the rest of the conditions (&& 2 == 2 && 3 == 3) never need to be evaluated once the first condition is false.
One more thing: sorting at the end can help, because there are fewer items left and therefore less time is spent sorting.
BUT it really depends on your data distribution. If MOST of your data has e.IsPresentation == true, then the first Where does not reduce the data size for you, so you are still checking e.IsCanceled == false on, say, 95% of your data. Now assume only 10% of your whole data has e.IsCanceled == false: if you apply e.IsPresentation == true to that 10% second, it takes much less time than before. This is why, on big databases, the query planner usually chooses different query plans; the final result is the same, but the processing time is not. Hope it helps you.
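A sketch of that idea against the query above (the selectivity percentages are assumptions for illustration only):
// Assume ~10% of rows have IsCanceled == false, while ~95% have IsPresentation == true.
// Putting the most selective condition first means the later checks run on fewer rows.
var events = _context.Event
    .Where(e => !e.IsCanceled       // most selective: eliminates ~90% of rows
        && e.IsPresentation         // barely selective on its own
        && e.StartDate > now)
    .OrderByDescending(e => e.StartDate);
Note that a SQL engine's optimizer may reorder these predicates itself; the ordering matters most when the filter runs in memory (LINQ to Objects).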
This is a very weird problem.
In short
var q = (some query).Count();
Gives me a number, and
var q = (some query).ToList().Count();
Gives me an entirely different number...
Worth mentioning that (some query) has two Includes (joins).
Is there a sane explanation for that?
EDIT: here is my query
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).Count();
This gives me the wrong number,
and this:
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).ToList().Count();
Gives me the accurate number.
EDIT 2
When I wrote my query as a LINQ query, it worked perfectly...
var q1 = (from d in db.membership_renewals
          where d.isDeleted == false
          join m in db.members on d.mr_memberId equals m.m_id
          join s in db.sports on d.mr_sportId equals s.s_id
          select d.mr_id).Count();
I think the problem is that Entity Framework doesn't execute the joins in the original query, but is forced to execute them by ToList()...
I finally figured out what's going on...
The database tables are not linked together in the database (there are no relationships or constraints defined in the database itself), so the query doesn't execute the inner join part.
My classes, on the other hand, are well written, so when I perform ToList() it automatically ignores the unbound rows...
And when I wrote the LINQ query defining the relationship keys (primary and foreign) it worked alright, because now the database understands my relations between the tables...
Thanks everyone you've been great....
My guess is that IQueryable gives a smaller number because not all the objects are loaded, kind of like a stream in Java, but IQueryable.ToList().Count() forces the IQueryable to load all the data: it is traversed by the list constructor and stored in the list, so IQueryable.ToList().Count() is the accurate answer. This is based on 5 minutes of searching MSDN.
The idea is that the underlying datastore of the IQueryable is a database iterator, so it executes differently every time, because it runs the query against the database again. If you call it twice against the same table and the data has changed, you get different results. This is called deferred execution. But when you call IQueryable.ToList(), you force the iterator to do the whole iteration once and dump the results into a list, which is constant.
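A minimal sketch of deferred execution (db.Orders is a hypothetical DbSet):
// The query variable is only a description of the query; nothing runs yet.
var query = db.Orders.Where(o => !o.IsDeleted);
var countNow = query.Count();    // executes SELECT COUNT(*) ... right now
// ...if rows change in the database here, the next enumeration will see it...
var snapshot = query.ToList();   // executes the SELECT again, copies rows into memory
var countLater = snapshot.Count; // counts the in-memory snapshot, no database call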
My cache is empty, so SQL queries return null.
Read-through means that if the cache is missed, Ignite will automatically go down to the underlying db (or persistent store) to load the corresponding data.
If new data is inserted into the underlying db table, do I have to take the cache server down so it loads the newly inserted data from the db table, or will it sync automatically?
Does it work the same as Spring's @Cacheable, or does it work differently?
It looks to me like the answer is no: cache SQL queries don't work because there is no data in the cache. But when I tried cache.get, I got the following results:
case 1:
System.out.println("data == " + cache.get(new PersonKey("Manish", "Singh")).getPhones());
result ==> data == 1235
case 2:
PersonKey per = new PersonKey();
per.setFirstname("Manish");
System.out.println("data == " + cache.get(per).getPhones());
throws the following error:
(error screenshots omitted)
Read-through semantics can be applied when there is a known set of keys to read. This is not the case with SQL, so in case your data is in an arbitrary 3rd party store (RDBMS, Cassandra, HBase, ...), you have to preload the data into memory prior to running queries.
However, Ignite provides native persistence storage [1], which eliminates this limitation. It allows you to use any Ignite API without having anything in memory, and this includes SQL queries as well. Data will be fetched into memory on demand as you use it.
[1] https://apacheignite.readme.io/docs/distributed-persistent-store
When you insert something into the database and it is not in the cache yet, get operations will retrieve the missing values from the DB, provided readThrough is enabled and a CacheStore is configured.
But currently it doesn't work this way for SQL queries executed on the cache. You should call loadCache first; then the values will appear in the cache and be available to SQL.
When you perform your second get, the exact combination of firstname and lastname is sought in the DB. It is converted into a CQL query containing a lastname = null condition, and it fails because lastname cannot be null.
UPD:
To get all records that have the firstname column equal to 'Manish', you can first do loadCache with an appropriate predicate and then run an SQL query on the cache.
cache.loadCache((k, v) -> v.firstname.equals("Manish")); // preload matching entries from the store into the cache
SqlFieldsQuery qry = new SqlFieldsQuery("select firstname, lastname from Person where firstname='Manish'");
try (FieldsQueryCursor<List<?>> cursor = cache.query(qry)) {
    for (List<?> row : cursor)
        System.out.println("firstname:" + row.get(0) + ", lastname:" + row.get(1));
}
Note that loadCache is a heavy operation that requires running over all records in the DB, so it shouldn't be called too often. You can provide null as the predicate; then all records will be loaded from the database.
Also, to make SQL run fast on the cache, you should mark the firstname field as indexed in the QueryEntity configuration.
In your case 2, have you tried specifying lastname as well? From your stack trace it's evident that Cassandra expects it to be non-null.
I have 2 SQL Server databases, hosted on two different servers. I need to extract data from the first database, which is going to be a list of integers. Then I need to compare this list against data in multiple tables in the second database. Depending on some conditions, I need to update or insert some records in the second database.
My solution:
(WCF Service/Entity Framework using LINQ to Entities)
Get the list of integers from the 1st db; this takes less than a second and returns 20,942 records.
I use the list of integers to compare against a table in the second db using the following query:
List<int> pastDueAccts; //Assuming this is the list from Step#1
var matchedAccts = from acct in context.AmAccounts
where pastDueAccts.Contains(acct.ARNumber)
select acct;
The above query is taking so long that it gives a timeout error, even though the AmAccount table only has ~400 records.
After I get these matchedAccts, I need to update or insert records in a separate table in the second db.
Can someone help me do step #2 more efficiently? I think the Contains function makes it slow. I tried brute force too, by putting a foreach loop in which I extract one record at a time and do the comparison; it still takes too long and gives a timeout error. The database server shows only 30% of the memory being used.
Profile the SQL query being sent to the database by using SQL Profiler. Capture the SQL statement sent to the database and run it in SSMS; you should be able to see the overhead imposed by Entity Framework at this point. Can you paste the SQL statement emitted in step #2 into your question?
The query itself is going to have all 20,942 integers in it.
If your AmAccount table will always have a low number of records like that, you could just return the entire list of ARNumbers, compare them to the list, then be specific about which records to return:
List<int> pastDueAccts; //Assuming this is the list from Step#1
List<int> amAcctNumbers = (from acct in context.AmAccounts
                           select acct.ARNumber).ToList();
//Get a list of integers that are in both lists
var pastDueAmAcctNumbers = pastDueAccts.Intersect(amAcctNumbers);
var pastDueAmAccts = from acct in context.AmAccounts
where pastDueAmAcctNumbers.Contains(acct.ARNumber)
select acct;
You'll still have to worry about how many ids you are supplying to that query, and you might end up needing to retrieve them in batches.
UPDATE
Hopefully somebody has a better answer than this, but with so many records and doing this purely in EF, you could try batching it like I stated earlier:
//Suggest disabling auto detect changes
//Otherwise you will probably have some serious memory issues
//With 2MM+ records
context.Configuration.AutoDetectChangesEnabled = false;
List<int> pastDueAccts; //Assuming this is the list from Step#1
const int batchSize = 100;
for (int i = 0; i < pastDueAccts.Count; i += batchSize)
{
    // GetRange throws if the final batch would run past the end of the list, so clamp it.
    var batch = pastDueAccts.GetRange(i, Math.Min(batchSize, pastDueAccts.Count - i));
    // Materialize this batch's results before moving to the next iteration.
    var pastDueAmAccts = (from acct in context.AmAccounts
                          where batch.Contains(acct.ARNumber)
                          select acct).ToList();
    // ...update/insert against pastDueAmAccts here...
}
I'm using ORMLite & SQLite as my database, and I'm working on an Android app.
1) I'm searching for a particular object in the ForeignCollection object as follows.
Collection<PantryCheckLine> pantryCheckLinesCollection = pantryCheck.getPantryCheckLines();
Iterator<PantryCheckLine> iterator = pantryCheckLinesCollection.iterator();
while (iterator.hasNext()) {
    pantryCheckLine = iterator.next();
    // I'm searching for a particular object match
}
2) Alternatively, I can directly query the relevant table and identify the item that way.
What I'm asking is: out of these two methods, which one will be faster?
It depends a bit on the particulars of your situation.
If your ForeignCollection is eager fetched, then your loop will not have to do any database transactions and will, for small collections, probably be faster than doing the query.
However, if your collection is lazy loaded, then iterating through it will go back to the database anyway, so you might as well do the precise query.