Fairly complex LINQ to Entities query - entity-framework

I have two entities, assume they are called Container and Record. They have a master-child relationship: a 'container' can hold many records.
The Records table in the database has the following columns:
Id
Date
Container_Id
RecordType_Id
The Record entity does not have any navigation properties that back reference the Container.
I am writing a LINQ query for my repository that will retrieve ONLY the records for a container that have the most recent date for each RecordType_Id. All older records should be ignored.
So if a container has say 5 records, one for each RecordType_Id, with the date 24/May/2011. But also has another 5 records for each RecordType_Id but with the date 20/May/2011. Then only the first 5 with the 24/May date will be retrieved and added to the collection in the container.
I came up with an SQL query that does what I need (but maybe there is some more efficient way?):
select t.*
from Records t
inner join (
select Container_Id, RecordType_Id, max(Date) AS MaxDate
from Records
group by Container_Id, RecordType_Id ) g
on t.Date = g.MaxDate
and t.Container_Id = g.Container_Id
and t.RecordType_Id = g.RecordType_Id
order by t.Container_Id
, t.RecordType_Id
, t.Date
However I am struggling to translate this into a proper LINQ query. EF is already generating a fairly large query all by itself just to load the entities, which makes me unsure of how much of this SQL query is actually relevant to the LINQ query.

Off the top of my head:
var q = from c in Container
from r in c.Records
group r by r.RecordType.RecordType_Id into g
select new
{
Container = c,
RecordType_Id = g.Key,
Records = from gr in g
let maxDate = g.Max(d => d.Date)
where gr.Date == maxDate
select gr
};

Try using LinqPad, it helps you test linq queries easily. Even against an existing EF model (which is in your project). Visit http://www.linqpad.net/

Related

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
id = db.Column(db.Integer, primary_key = True )
course_name =db.Column(db.String(120))
course_description = db.Column(db.Text)
course_reviews = db.relationship('Review', backref ='course', lazy ='dynamic')
class Review(db.Model):
__table_args__ = ( db.UniqueConstraint('course_id', 'user_id'), { } )
id = db.Column(db.Integer, primary_key = True )
review_date = db.Column(db.DateTime)#default=db.func.now()
review_comment = db.Column(db.Text)
rating = db.Column(db.SmallInteger)
course_id = db.Column(db.Integer, db.ForeignKey('course.id') )
user_id = db.Column(db.Integer, db.ForeignKey('user.id') )
I want to select the courses that are most reviewed starting with at least two reviews. The following SQLAlchemy query worked fine with SQlite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \ .order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \.order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue. Thanks a lot
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like Postgresql you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts, that's in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.

Query produced for IN filter on 1-1 relation joins to parent table twice

I have this problem and reproduced it with AdventureWorks2008R2 to make it more easy. Basically, I want to filter a parent table for a list of IN values and I thought it would generate this type of query
but it doesn't.
SELECT * FROM SalesOrderDetail where EXISTS( select * from SalesOrderHeader where d.id=h.id and rowguid IN ('asdf', 'fff', 'weee' )
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))
Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now but if you first get a list of BusinessEntityId values based on the provided guids and then get the persons like this
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityId)).ToList();
there should not be a reason for additional unnecessary joins anymore.
If that works, you can try to combine the to steps into one LINQ expression to see if the separation stays intact.

Group by query and filtered by Max date

I am trying to get the entities of my table grouped by an ID and filtered by MAX date, but I am not capable to do it with Entity Framework.
SELECT * FROM My_table UI
INNER JOIN
(
SELECT idgroup, MAX(filter_date) AS DateFilter FROM My_table
WHERE license = 1
GROUP BY idgroup
) T
ON T.idgroup= UI.idgroup AND T.DateFilter = UI.filter_date
WHERE license = 1
My solution for that right now is a StoredProcedure, but I think that this could be possible with EF, but I do not know how to do it.
Anyone know how to do that with EF statement?
My_table is already mapped as a EF entity in my project.

How to ORDER BY non-column field?

I am trying to create an Entity SQL that is a union of two sub-queries.
(SELECT VALUE DISTINCT ROW(e.ColumnA, e.ColumnB, 1 AS Rank) FROM Context.Entity AS E WHERE ...)
UNION ALL
(SELECT VALUE DISTINCT ROW(e.ColumnA, e.ColumnB, 2 AS Rank) FROM Context.Entity AS E WHERE ...)
ORDER BY *??* LIMIT 50
I have tried:
ORDER BY Rank
and
ORDER BY e.Rank
but I keep getting:
System.Data.EntitySqlException: The query syntax is not valid. Near keyword 'ORDER'
Edit:
This is Entity Framework. In C#, the query is executed using:
var esql = "...";
ObjectParameter parameter0 = new ObjectParameter("p0", value1);
ObjectParameter parameter1 = new ObjectParameter("p1", value2);
ObjectQuery<DbDataRecord> query = context.CreateQuery<DbDataRecord>(esql, parameter0, parameter1);
var queryResults = query.Execute(MergeOption.NoTracking);
There is only a small portion of my application where I have to resort to using Entity SQL. Generally, the main use case is when I need to do: "WHERE Column LIKE 'A % value % with % multiple % wildcards'".
I do not think it is a problem with the Rank column. I do think it is how I am trying to apply an order by to two different esql statements joined by union all. Could someone suggest:
How to apply a ORDER BY to this kind of UNION/UNION ALL statment
How to order by the non-entity column expression.
Thanks.

Convert SQL to LINQ, nested select, top, "distinct" using group by and multiple order bys

I have the following SQL query, which I'm struggling to convert to LINQ.
Purpose: Get the top 10 coupons from the table, ordered by the date they expire (i.e. list the ones that are about to expire first) and then randomly choosing one of those for publication.
Notes: Because of the way the database is structured, there maybe duplicate Codes in the Coupon table. Therefore, I am using a GROUP BY to enforce distinction, because I can't use DISTINCT in the sub select query (which I think is correct). The SQL query works.
SELECT TOP 1
c1.*
FROM
Coupon c1
WHERE
Code IN (
SELECT TOP 10
c2.Code
FROM
Coupon c2
WHERE
c2.Published = 0
GROUP BY
c2.Code,
c2.Expires
ORDER BY
c2.Expires
)
ORDER BY NEWID()
Update:
This is as close as I have got, but in two queries:
var result1 = (from c in Coupons
where c.Published == false
orderby c.Expires
group c by new { c.Code, c.Expires } into coupon
select coupon.FirstOrDefault()).Take(10);
var result2 = (from c in result1
orderby Guid.NewGuid()
select c).Take(1);
Here's one possible way:
from c in Coupons
from cs in
((from c in coupons
where c.published == false
select c).Distinct()
).Take(10)
where cs.ID == c.ID
select c
Keep in mind that LINQ creates a strongly-typed data set, so an IN statement has no general equivalent. I understand trying to keep the SQL tight, but LINQ may not be the best answer for this. If you are using MS SQL Server (not SQL Server Compact) you might want to consider doing this as a Stored Procedure.
Using MercurioJ's slightly buggy response, in combination with another SO suggested random row solution my solution was:
var result3 = (from c in _dataContext.Coupons
from cs in
((from c1 in _dataContext.Coupons
where
c1.IsPublished == false
select c1).Distinct()
).Take(10)
where cs.CouponId == c.CouponId
orderby _dataContext.NewId()
select c).Take(1);