I would like to know what happens when I use IQueryable with and without AsQueryable(). Here is an example:
public partial class Book
{
.......
public Nullable<System.DateTime> CheckoutDate{get; set;}
}
I need to filter the data from SQL server before it is returned to an application server. I need to return books checked out more recently than entered date. Which one should I use?
A.
IQueryable<Book> books = db.Books;
books = books.Where(b => b.CheckoutDate >= date);
B.
IQueryable<Book> books = db.Books.ToList().AsQueryable();
books = books.Where(b => b.CheckoutDate >= date);
Basically I would like to know what is the difference between the above two options. Do they work on the similar grounds? Do they return same values?
With B option, you're basically retrieving every book from database and filtering data in memory.
A option is more performance, as it filters data at the database and return only the rows that match your query.
Related
I have an issue where we create a complex IQueryable that we need to make it more efficient.
There are 2 tables that should only be included if columns from them are being filtered.
My exact situation is complex to explain so I thought I could illustrate it with an example for cars.
If I have a CarFilter class like this:
public class CarFilter
{
public string BrandName { get;set; }
public decimal SalePrice {get; set; }
}
Let's say that we have a query for car sales:
var info = from car in cars
from carSale in carSales on carSale.BrandId == car.BrandId && car.ModelId == carSale.ModelId
from brand in carBrands on car.BrandId == brand.BrandId
select car
var cars = info.ToList();
Let's say that this is a huge query that returns 100'000 rows as we are looking at cars and sales and the associated brands.
The user only wants to see the details from car, the other 2 tables are for filtering purposes.
So if the user only wants to see Ford cars, our logic above is not efficient. We are joining in the huge car sale table for no reason as well as CarBrand as the user doesn't care about anything in there.
My question is how can I only include tables in my IQueryable if they are actually needed?
So if there is a BrandName in my filter I would include CarBrand table, if not, it's not included.
Using this example, the only time I would ever want both tables is if the user specified both a BrandName and SalePrice.
The semantics are not important here, i.e the number of records returned being impacted by the joins etc, I am looking for help on the approach
I am using EF Core
Paul
It is common for complex filtering. Just join when it is needed.
var query = cars;
if (filter.SalePrice > 0)
{
query =
from car in query
join carSale in carSales on new { car.BrandId, car.ModelId } equals new { carSale.BrandId, carSale.ModelId }
where carSale.Price >= filter.SalePrice
select car;
}
if (!filter.BrandName.IsNullOrEempty())
{
query =
from car in query
join brand in carBrands on car.BrandId equals brand.BrandId
where brand.Name == filter.BrandName
select car;
}
var result = query.ToList();
I'm trying to optimize my EF queries. I have an entity called Employee. Each employee has a list of tools. Ultimately, I'm trying to get a list of employees with their tools that are NOT broken. When running my query, I can see that TWO calls are made to the server: one for the employee entities and one for the tool list. Again, I'm trying to optimize the query, so the server is hit for a query only once. How can I do this?
I've been exploring with LINQ's join and how to create a LEFT JOIN, but the query is still not optimized.
In my first code block here, the result is what I want, but -- again -- there are two hits to the server.
public class Employee
{
public int EmployeeId { get; set; }
public List<Tool> Tools { get; set; } = new List<Tool>();
...
}
public class Tool
{
public int ToolId { get; set; }
public bool IsBroken { get; set; } = false;
public Employee Employee { get; set; }
public int EmployeeId { get; set; }
...
}
var x = (from e in db.Employees.Include(e => e.Tools)
select new Employee()
{
EmployeeId = e.EmployeeId,
Tools = e.Tools.Where(t => !t.IsBroken).ToList()
}).ToList();
This second code block pseudoly mimics what I'm trying to accomplish. However, the GroupBy(...) is being evaluated locally on the client machine.
(from e in db.Employees
join t in db.Tools.GroupBy(tool => tool.EmployeeId) on e.EmployeeId equals t.Key into empTool
from et in empTool.DefaultIfEmpty()
select new Employee()
{
EmployeeId = e.EmployeeId,
Tools = et != null ? et.Where(t => !t.IsBroken).ToList() : null
}).ToList();
Is there anyway that I can make ONE call to the server as well as not having my GroupBy() evaluate locally and have it return a list of employees with a filtered tool list with tools that are not broken? Thank you.
Shortly, it's not possible (and I don't think it ever will be).
If you really want to control the exact server calls, EF Core is simply not for you. While EF Core still has issues with some LINQ query translation which leads to N+1 query or client evaluation, one thing is by design: unlike EF6 which uses single huge union SQL query for producing the result, EF Core uses one SQL query for the main result set plus one SQL query per each correlated result set.
This is sort of explained in the How Queries Work EF Core documentation section:
The LINQ query is processed by Entity Framework Core to build a representation that is ready to be processed by the database provider
The result is cached so that this processing does not need to be done every time the query is executed
The result is passed to the database provider
The database provider identifies which parts of the query can be evaluated in the database
These parts of the query are translated to database specific query language (for example, SQL for a relational database)
One or more queries are sent to the database and the result set returned (results are values from the database, not entity instances)
Note the word more in the last bullet.
In your case, you have 1 main result set (Employee) + 1 correlated result set (Tool), hence the expected server queries are TWO (except if the first query returns empty set).
You can use this:
var x = from e in _context.Employees
select new
{
e,
Tools = from tool in e.Tools where !tool.IsBroken select tool
};
var result = x.AsEnumerable().Select(y => y.e);
Which will be finally translated to a SQL query like below depending on your provider:
SELECT
`Project1`.`EmployeeId`,
`Project1`.`Name`,
`Project1`.`C1`,
`Project1`.`ToolId`,
`Project1`.`IsBroken`,
`Project1`.`EmployeeId1`
FROM (SELECT
`Extent1`.`EmployeeId`,
`Extent1`.`Name`,
`Extent2`.`ToolId`,
`Extent2`.`IsBroken`,
`Extent2`.`EmployeeId` AS `EmployeeId1`,
CASE WHEN (`Extent2`.`ToolId` IS NOT NULL) THEN (1) ELSE (NULL) END AS `C1`
FROM `Employees` AS `Extent1` LEFT OUTER JOIN `Tools` AS `Extent2` ON (`Extent1`.`EmployeeId` = `Extent2`.`EmployeeId`) AND (`Extent2`.`IsBroken` != 1)) AS `Project1`
ORDER BY
`Project1`.`EmployeeId` ASC,
`Project1`.`C1` ASC
I change my previous answer which was wrong, thanks to comments.
Following is the code which is blowing up if the list which is being passed in to "IN" clause has several values. In my case the count is 1400 values. Also the customer table has several thousands (arround 100,000) of records in it. The query is executing against DERBY database.
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
TypedQuery<Customer> query = em.createQuery("from Customer where type=:custType and customerId not in (:customersIDs)", Customer.class);
query.setParameter("custType", custType);
query.setParameter("customersIDs", customersIDs);
List<Customer> customerList = query.getResultList();
return customerList;
}
The above mentioned method perfectly executes if the list has less values ( probably less than 1000 ), if the list customersIDs has more values since the in clause executes based on it, it throws an error saying "Statement too complex"
Since i am new to JPA can any one please tell me how to write the above mention function in the way described below.. * PLEASE READ COMMENTS IN CODE *
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
// CREATE A IN-MEMORY TEMP TABLE HERE...
// INSERT ALL VALUES FROM customerIDs collection into temp table
// Change following query to get all customers EXCEPT THOSE IN TEMP TABLE
TypedQuery<Customer> query = em.createQuery("from Customer where type=:custType and customerId not in (:customersIDs)", Customer.class);
query.setParameter("custType", custType);
query.setParameter("customersIDs", customersIDs);
List<Customer> customerList = query.getResultList();
// REMOVE THE TEMP TABLE FROM MEMORY
return customerList;
}
The Derby IN clause support does have a limit on the number of values that can be supplied in the IN clause.
The limit is related to an underlying limitation in the size of a single function in the Java bytecode format; Derby currently implements IN clause execution by generating Java bytecode to evaluate the IN clause, and if the generated bytecode would exceed the JVM's basic limitations, Derby throws the "statement too complex" error.
There have been discussions about ways to fix this, for example see:
DERBY-6784
DERBY-6301, or
DERBY-216
But for now, your best approach is probably to find a way to express your query without generating such a large and complex IN clause.
Ok here is my solution that worked for me. I could not change the part generating the customerList since it is not possible for me, so the solution has to be from within this method. Bryan your explination was the best one, i am still confuse how "in" clause worked perfectly with table. Please see below solution.
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
// INSERT customerIds INTO TEMP TABLE
storeCustomerIdsIntoTempTable(customersIDs)
// I AM NOT SURE HOW BUT, "not in" CLAUSE WORKED INCASE OF TABLE BUT DID'T WORK WHILE PASSING LIST VALUES.
TypedQuery<Customer> query = em.createQuery("select c from Customer c where c.customerType=:custType and c.customerId not in (select customerId from TempCustomer)");
query.setParameter("custType", custType);
List<Customer> customerList = query.getResultList();
// REMOVE THE DATA FROM TEMP TABLE
deleteCustomerIdsFromTempTable()
return customerList;
}
private void storeCustomerIdsIntoTempTable(List<Int> customersIDs){
// I ENDED UP CREATING TEMP PHYSICAL TABLE, INSTEAD OF JUST IN MEMORY TABLE
TempCustomer tempCustomer = null;
try{
tempCustomerDao.deleteAll();
for (Int customerId : customersIDs) {
tempCustomer = new TempCustomer();
tempCustomer.customerId=customerId;
tempCustomerDao.save(tempCustomer);
}
}catch(Exception e){
// Do logging here
}
}
private void deleteCustomerIdsFromTempTable(){
try{
// Delete all data from TempCustomer table to start fresh
int deletedCount= tempCustomerDao.deleteAll();
LOGGER.debug("{} customers deleted from temp table", deletedCount);
}catch(Exception e){
// Do logging here
}
}
JPA and the underlying Hibernate simply translate it into a normal JDBC-understood query. You wouldn't write a query with 1400 elements in the IN clause manually, would you? There is no magic. Whatever doesn't work in normal SQL, wouldn't in JPQL.
I am not sure how you get that list (most likely from another query) before you call that method. Your best option would be joining those tables on the criteria used to get those IDs. Generally you want to execute correlated filters like that in one query/transaction which means one method instead of passing long lists around.
I also noticed your customerId is double - a poor choice for a PK. Typically people use long (autoincremented/sequenced, etc.) And I don't get the "temp table" logic.
I'm developing a client app that uses breezejs and Entity Framework 6 on the back end. I've got a statement like this:
var country = 'Mexico';
var customers = EntityQuery.from('customers')
.where('country', '==', country)
.expand('order')
I want to use There may be hundreds of orders that each customer has made. For the purposes of performance, I only want to retrieve the latest order for each customer. This will be based on the created date for the order. In SQL, I could write something like this:
SELECT c.customerId, companyName, ContactName, City, Country, max(o.OrderDate) as LatestOrder FROM Customers c
inner join Orders o on c.CustomerID = o.CustomerID
group by c.customerId, companyName, ContactName, City, Country
If this was run against the northwind database, only the most recent order row is returned for each customer.
How can I write a similar query in breeze, so that it runs on the server side and therefore returns less data to the client. I know I could handle this all on the client but writing some javascript in a querysucceeded method that could be run by the client - but that's not the goal here.
thanks
For a case like this, you should create a special endpoint method that will perform your query.
Then you can use an Entity Framework query to do what you want, using the LINQ syntax.
Here are two Web API examples:
[HttpGet]
public IQueryable<Object> CustomersLatestOrderEntities()
{
// IQueryable<Object> containing Customer and Order entity
var entities = ContextProvider.Context.Customers.Select(c => new { Customer = c, LatestOrder = c.Orders.OrderByDescending(o => o.OrderDate).FirstOrDefault() });
return entities;
}
[HttpGet]
public IQueryable<Object> CustomersLatestOrderProjections()
{
// IQueryable<Object> containing Customer and Order entity
var entities = ContextProvider.Context.Customers.Select(c => new { Customer = c, LatestOrder = c.Orders.OrderByDescending(o => o.OrderDate).FirstOrDefault() });
// IQueryable<Object> containing just data fields, no entities
var projections = entities.Select(e => new { e.Customer.CustomerID, e.Customer.ContactName, e.LatestOrder.OrderDate });
return projections;
}
Note that you have a choice here. You can return actual entities, or you can return just some data fields. Which is right for you depends upon how you are going to use them on the client. If they are just for display in a
non-editable list, you can just return the plain data (CustomersLatestOrderProjections above). If they can potentially
be edited, then return the object containing the entities (CustomersLatestOrderEntities). Breeze will merge the entities
into its cache, even though they are contained inside this anonymous object.
Either way, because it returns IQueryable, you can use the Breeze filtering syntax from the client to further qualify the query.
var projectionQuery = breeze.EntityQuery.from("CustomersLatestOrderProjections")
.skip(20)
.take(10);
var entityQuery = breeze.EntityQuery.from("CustomersLatestOrderEntities")
.where('customer.countryName', 'startsWith', 'C');
.take(10);
How do I query using Ling to Entities.
I want to get back the total number of rows for a specific user.
I created a table with an ID(PK), Username, and PhoneNumber.
In my controller, I am able to do this and see my entities:
public ActionResult UserInfo()
{
using (UserInfoEntities db = new UserInfoEntities())
{
}
return View();
}
How do I query to return the total number of rows and declare it to a variable (totalrows) so that I can do thge following:
ViewData["TotalRows"] = totalrows
If there is a better way I am open to suggestions...
Thanks!!
Probably something like:
int count = db.SomeTable.Where(r => r.Username.Equals("username")).Count();
Yep Matthew is right. The count method does exactly what you need! Although, I would recommend looking at ViewModels to pass your data to your view (rather than ViewData) and some kind of DAL pattern, such as the repository pattern, for querying your entities.