T-SQL with GROUP BY, HAVING and COUNT in LINQ syntax - entity-framework

I'm struggling to convert a T-SQL query into LINQ syntax.
In T-SQL the query looks like this:
WITH FailedFiles AS
(
SELECT 1 AS FileExists
FROM [FileHistory] f
INNER JOIN [Users] u ON f.UserId = u.UserId
GROUP BY f.FileName
HAVING SUM(CASE f.FileState WHEN 1 /*Success*/ THEN 1 ELSE 0 END) <= 0
AND SUM(CASE f.FileState WHEN 2 /*Failed*/ THEN 1 ELSE 0 END) >= 1
)
SELECT COUNT(1) from FailedFiles
I'm having serious trouble converting the T-SQL above into LINQ (it doesn't matter whether it is query syntax or method syntax). Can someone give me a hint about what the correct order and nesting of the LINQ query should look like?
What the query actually does:
I have a file history table with multiple entries for an individual file. I need to know how many failed files there are in total, meaning files that were never processed successfully (state 1) and were processed with an error at least once (state 2). Individual files are grouped by file name. The query result is a single number.

The query below should work in theory, but it may fail depending on the version of Entity Framework and the provider.
var count = (from f in context.FileHistory
join u in context.User on f.UserId equals u.UserId
select f)
.GroupBy(f=> f.FileName)
.Where(g=>
g.Count(i=> i.FileState == 1) <=0 &&
g.Count(i=> i.FileState == 2) >=1 ).Count();
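Whichever LINQ form ends up working with your provider, it helps to check it against a small data set with a known answer. The following is only a sketch: it runs the original query from the question in an in-memory SQLite database through Python (SQLite rather than SQL Server, so this is an assumption about portability of the query, not a test of Entity Framework). The sample rows and the helper name count_failed_files are made up for illustration.

```python
import sqlite3

def count_failed_files(rows):
    """rows: iterable of (file_name, user_id, file_state) tuples (hypothetical sample data)."""
    rows = list(rows)
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Users (UserId INTEGER PRIMARY KEY);
        CREATE TABLE FileHistory (FileName TEXT, UserId INTEGER, FileState INTEGER);
    """)
    # Make sure every referenced user exists so the INNER JOIN keeps all rows.
    con.executemany("INSERT OR IGNORE INTO Users (UserId) VALUES (?)",
                    [(r[1],) for r in rows])
    con.executemany("INSERT INTO FileHistory VALUES (?, ?, ?)", rows)
    (count,) = con.execute("""
        WITH FailedFiles AS (
            SELECT 1 AS FileExists
            FROM FileHistory f
            INNER JOIN Users u ON f.UserId = u.UserId
            GROUP BY f.FileName
            HAVING SUM(CASE f.FileState WHEN 1 THEN 1 ELSE 0 END) <= 0
               AND SUM(CASE f.FileState WHEN 2 THEN 1 ELSE 0 END) >= 1
        )
        SELECT COUNT(1) FROM FailedFiles
    """).fetchone()
    con.close()
    return count

sample = [
    ("a.csv", 1, 2), ("a.csv", 1, 2),  # failed twice, never succeeded: counted
    ("b.csv", 1, 2), ("b.csv", 2, 1),  # failed, then succeeded: not counted
    ("c.csv", 2, 1),                   # only succeeded: not counted
    ("d.csv", 2, 2),                   # failed once: counted
]
print(count_failed_files(sample))  # prints 2
```

Running the LINQ query over the same four files should also give 2; if it does not, the grouping or the two Count predicates were translated differently.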

Related

Can I write sub query in KSQL?

I am new to KSQL and have been using MySQL for a long time.
I would like to know whether there is any way to use subqueries in KSQL.
This query works without any problem :
SELECT a.executedate, count(a.pno), sum(a.amount) FROM
tb3_withdraw_record_summary a WHERE a.status='3' GROUP BY
a.executedate;
Whereas this query returns an error message:
SELECT a.executedate, count(a.pno), sum(a.amount), (SELECT COUNT(b.pno)
FROM tb3_withdraw_record_summary b WHERE b.status='5' AND
b.executedate = a.executedate) FROM tb3_withdraw_record_summary a
WHERE a.status='3' GROUP BY a.executedate
'Failed to prepare statement: 'B' is not a valid stream/table name or alias.
Caused by: 'B' is not a valid stream/table name or alias.'
Is there any way for me to make this work? Thanks!
Nested queries are not currently supported by KSQL, but you can do it in the following way:
1) CREATE STREAM B AS SELECT COUNT(b.pno)
FROM tb3_withdraw_record_summary b WHERE b.status='5';
2) SELECT a.executedate, count(a.pno), sum(a.amount) FROM tb3_withdraw_record_summary a JOIN B within 5 hours ON b.executedate = a.executedate WHERE a.status='3' GROUP BY a.executedate
Keep in mind that a join here means something very different from a join in the relational database world: the data is partitioned by key into multiple buckets, so it is conceptually a "co-located" join. See the KSQL documentation for more details about the time window.
Hope this helps.
SubQuery functionality is not implemented for KSQL.
https://github.com/confluentinc/ksql/issues/745
I am not familiar with KSQL, but perhaps this does what you want:
SELECT wrs.executedate,
SUM(CASE WHEN wrs.status IN ('3') THEN 1 ELSE 0 END),
SUM(CASE WHEN wrs.status IN ('3') THEN amount ELSE 0 END),
SUM(CASE WHEN wrs.status IN ('5') THEN 1 ELSE 0 END)
FROM tb3_withdraw_record_summary wrs
WHERE wrs.status IN ('3', '5')
GROUP BY wrs.executedate;
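To see that the conditional aggregation above gives the same numbers as the correlated subquery from the question, here is a small sketch that runs both in an in-memory SQLite database from Python. SQLite stands in for an engine that supports subqueries, which KSQL does not, and the sample rows are invented. One caveat: a date that has only status '5' rows would appear in the conditional-aggregation result (with zeros for the status '3' columns) but not in the original query, so the two are equivalent only when every date has at least one status '3' row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tb3_withdraw_record_summary
               (executedate TEXT, pno TEXT, amount REAL, status TEXT)""")
# Hypothetical sample rows; every date has at least one status '3' row.
con.executemany(
    "INSERT INTO tb3_withdraw_record_summary VALUES (?, ?, ?, ?)",
    [
        ("2018-01-01", "p1", 10.0, "3"),
        ("2018-01-01", "p2", 20.0, "3"),
        ("2018-01-01", "p3", 5.0, "5"),
        ("2018-01-02", "p4", 7.0, "3"),
    ],
)

# The query from the question, with a correlated subquery.
subquery_rows = con.execute("""
    SELECT a.executedate, COUNT(a.pno), SUM(a.amount),
           (SELECT COUNT(b.pno)
            FROM tb3_withdraw_record_summary b
            WHERE b.status = '5' AND b.executedate = a.executedate)
    FROM tb3_withdraw_record_summary a
    WHERE a.status = '3'
    GROUP BY a.executedate
    ORDER BY a.executedate
""").fetchall()

# The rewrite from the answer: one pass, conditional aggregation.
conditional_rows = con.execute("""
    SELECT wrs.executedate,
           SUM(CASE WHEN wrs.status IN ('3') THEN 1 ELSE 0 END),
           SUM(CASE WHEN wrs.status IN ('3') THEN amount ELSE 0 END),
           SUM(CASE WHEN wrs.status IN ('5') THEN 1 ELSE 0 END)
    FROM tb3_withdraw_record_summary wrs
    WHERE wrs.status IN ('3', '5')
    GROUP BY wrs.executedate
    ORDER BY wrs.executedate
""").fetchall()

print(subquery_rows == conditional_rows)  # prints True for this sample
```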

EXISTS in filter returning too many values

I need to write a query that uses EXISTS, rather than IN, so that it will run fast. The filter is being fed so many parameter values that EXISTS seems like the only option. The difference is between a 20+ minute query and a 5 second query.
This is the query I have:
SELECT DISTINCT d.GROUP_NAME
FROM [EMPLOYEE] e JOIN [DATA_FACT] d ON (e.KEY = d.KEY)
WHERE d.DATE BETWEEN @Start and @End
AND EXISTS
(
select '1234567' -- @ID
)
AND e.Location IN (@Location)
ORDER BY d.GROUP_NAME ASC
The problem is that it is returning too many records. Based on the values I'm passing to filter on, I should get 1 row back but instead I am getting 28.
If I remove the EXISTS and add the following then I get the 1 record I need:
AND e.ID IN ('1234567')
Is there a way to fix the query to work with EXISTS so that I get the correct results?
This is essentially what you want if you are going to try to use exists to filter your data_fact table by parameters in your employee table. Not sure how much it's going to improve your performance though when you throw a massive number of employee IDs at it.
SELECT
d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN @Start and @End
AND EXISTS
(
select 1
from EMPLOYEE AS e
WHERE d.[KEY] = e.[KEY]
AND e.[Location] IN (@Location)
AND e.ID IN ('1234567')
)
ORDER BY d.GROUP_NAME ASC
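The reason the original query returned too many rows is that its EXISTS subquery is not correlated with the outer query: "select '1234567'" always produces one row, so EXISTS is true for every outer row and filters nothing. The sketch below shows the difference using an in-memory SQLite database from Python. The table contents are invented, and the parameter names :start_date and :end_date are stand-ins for the report parameters.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMPLOYEE ([KEY] INTEGER, ID TEXT, Location TEXT);
    CREATE TABLE DATA_FACT ([KEY] INTEGER, GROUP_NAME TEXT, [DATE] TEXT);
    INSERT INTO EMPLOYEE VALUES (1, '1234567', 'NY'), (2, '7654321', 'NY');
    INSERT INTO DATA_FACT VALUES
        (1, 'Alpha', '2020-01-10'),
        (2, 'Beta',  '2020-01-11'),
        (2, 'Gamma', '2020-01-12');
""")
params = {"start_date": "2020-01-01", "end_date": "2020-01-31"}

# Uncorrelated EXISTS: the subquery never looks at e or d, so it is always true.
uncorrelated = [r[0] for r in con.execute("""
    SELECT DISTINCT d.GROUP_NAME
    FROM EMPLOYEE e JOIN DATA_FACT d ON e.[KEY] = d.[KEY]
    WHERE d.[DATE] BETWEEN :start_date AND :end_date
      AND EXISTS (SELECT '1234567')
      AND e.Location IN ('NY')
    ORDER BY d.GROUP_NAME
""", params)]

# Correlated EXISTS: the subquery references d.[KEY], so it filters per row.
correlated = [r[0] for r in con.execute("""
    SELECT d.GROUP_NAME
    FROM DATA_FACT d
    WHERE d.[DATE] BETWEEN :start_date AND :end_date
      AND EXISTS (SELECT 1
                  FROM EMPLOYEE e
                  WHERE d.[KEY] = e.[KEY]
                    AND e.Location IN ('NY')
                    AND e.ID IN ('1234567'))
    ORDER BY d.GROUP_NAME
""", params)]

print(uncorrelated)  # ['Alpha', 'Beta', 'Gamma']
print(correlated)    # ['Alpha']
```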

sp_executesql vs user defined scalar function

In the table below I am storing some conditions like this:
Then, generally, in second table, I am having the following records:
and what I need is to compare these values using the right condition and store the result (let's say '0' for false and '1' for true) in an additional column.
I am going to do this in a stored procedure, and I will be comparing anywhere from several to hundreds of records.
One possible solution is to use sp_executesql for each row, building dynamic statements; the other is to create my own scalar function and call it for each row using cross apply.
Could anyone tell me which is the more efficient way?
Note: I know that the best way to answer this is to build both solutions and test them, but I am hoping it can be answered based on other considerations such as caching and SQL Server's internal optimizations, which would save me a lot of time because this is only part of a bigger problem.
I don't see the need to use sp_executesql in this case. You can obtain the result for all records at once in a single statement:
select Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
and update the additional column with something like:
;with cte as (
select t.AdditionalColumn, Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
)
update cte
set AdditionalColumn = Result
If the above logic is supposed to be applied in many places, not just over one table, then yes, you may want to think about a function. I would rather use an inline table-valued function (not a scalar one), though, because user-defined scalar functions impose overhead on every call and return, and the more rows there are to process, the more time is wasted.
create function ftComparison
(
@v1 float,
@v2 float,
@cType int
)
returns table
as return
select
Result = case
when ct.Abbreviation='=' and @v1=@v2 then 1
when ct.Abbreviation='>' and @v1>@v2 then 1
when ct.Abbreviation='>=' and @v1>=@v2 then 1
when ct.Abbreviation='<=' and @v1<=@v2 then 1
when ct.Abbreviation='<>' and @v1<>@v2 then 1
when ct.Abbreviation='<' and @v1<@v2 then 1
else 0
end
from ConditionType ct
where ct.ID = @cType
which can then be applied as:
select f.Result
from YourTable t
cross apply ftComparison(ValueOne, ValueTwo, t.ConditionTypeID) f
or
select f.Result
from YourAnotherTable t
cross apply ftComparison(SomeValueColumn, SomeOtherValueColumn, @someConditionType) f
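As a quick way to convince yourself that the single set-based CASE statement covers every operator, here is a small sketch that runs it in an in-memory SQLite database from Python. It is not SQL Server (so no cross apply and no inline function here), the "Result = ..." alias is written as "... AS Result", and the Id column and all sample rows are assumptions added for the demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ConditionType (ID INTEGER PRIMARY KEY, Abbreviation TEXT);
    INSERT INTO ConditionType VALUES
        (1, '='), (2, '>'), (3, '>='), (4, '<='), (5, '<>'), (6, '<');
    CREATE TABLE YourTable (
        Id INTEGER PRIMARY KEY,          -- hypothetical key, used only for ordering
        ValueOne REAL, ValueTwo REAL, ConditionTypeID INTEGER);
    INSERT INTO YourTable (ValueOne, ValueTwo, ConditionTypeID) VALUES
        (1, 1, 1),   -- 1 =  1 : true
        (1, 2, 2),   -- 1 >  2 : false
        (3, 2, 3),   -- 3 >= 2 : true
        (3, 2, 4),   -- 3 <= 2 : false
        (1, 2, 5),   -- 1 <> 2 : true
        (1, 2, 6);   -- 1 <  2 : true
""")

# One statement evaluates every row against its own condition type.
results = [r[0] for r in con.execute("""
    SELECT CASE
             WHEN ct.Abbreviation = '='  AND t.ValueOne =  t.ValueTwo THEN 1
             WHEN ct.Abbreviation = '>'  AND t.ValueOne >  t.ValueTwo THEN 1
             WHEN ct.Abbreviation = '>=' AND t.ValueOne >= t.ValueTwo THEN 1
             WHEN ct.Abbreviation = '<=' AND t.ValueOne <= t.ValueTwo THEN 1
             WHEN ct.Abbreviation = '<>' AND t.ValueOne <> t.ValueTwo THEN 1
             WHEN ct.Abbreviation = '<'  AND t.ValueOne <  t.ValueTwo THEN 1
             ELSE 0
           END AS Result
    FROM YourTable t
    JOIN ConditionType ct ON ct.ID = t.ConditionTypeID
    ORDER BY t.Id
""")]

print(results)  # [1, 0, 1, 0, 1, 1]
```

No dynamic SQL is needed because the operator is data (a row in ConditionType), and the CASE expression simply selects the matching comparison.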

Convert SQL to LINQ, nested select, top, "distinct" using group by and multiple order bys

I have the following SQL query, which I'm struggling to convert to LINQ.
Purpose: Get the top 10 coupons from the table, ordered by the date they expire (i.e. list the ones that are about to expire first) and then randomly choosing one of those for publication.
Notes: Because of the way the database is structured, there may be duplicate Codes in the Coupon table. Therefore, I am using a GROUP BY to enforce distinctness, because I can't use DISTINCT in the sub-select query (which I think is correct). The SQL query works.
SELECT TOP 1
c1.*
FROM
Coupon c1
WHERE
Code IN (
SELECT TOP 10
c2.Code
FROM
Coupon c2
WHERE
c2.Published = 0
GROUP BY
c2.Code,
c2.Expires
ORDER BY
c2.Expires
)
ORDER BY NEWID()
Update:
This is as close as I have got, but in two queries:
var result1 = (from c in Coupons
where c.Published == false
orderby c.Expires
group c by new { c.Code, c.Expires } into coupon
select coupon.FirstOrDefault()).Take(10);
var result2 = (from c in result1
orderby Guid.NewGuid()
select c).Take(1);
Here's one possible way:
from c in Coupons
from cs in
((from c in coupons
where c.published == false
select c).Distinct()
).Take(10)
where cs.ID == c.ID
select c
Keep in mind that LINQ creates a strongly-typed data set, so an IN statement has no general equivalent. I understand trying to keep the SQL tight, but LINQ may not be the best answer for this. If you are using MS SQL Server (not SQL Server Compact) you might want to consider doing this as a Stored Procedure.
Using MercurioJ's slightly buggy response in combination with another random-row solution suggested on SO, my solution was:
var result3 = (from c in _dataContext.Coupons
from cs in
((from c1 in _dataContext.Coupons
where
c1.IsPublished == false
select c1).Distinct()
).Take(10)
where cs.CouponId == c.CouponId
orderby _dataContext.NewId()
select c).Take(1);
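When comparing a LINQ version against the original SQL, it may help to have a reference result for what the SQL is meant to return. The sketch below runs the query from the question in an in-memory SQLite database from Python, with LIMIT and RANDOM() standing in for T-SQL's TOP and NEWID(). The coupon data and the function name pick_coupon are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Coupon (CouponId INTEGER PRIMARY KEY, Code TEXT, "
            "Expires TEXT, Published INTEGER)")
# Twelve unpublished codes C00..C11, each expiring one day after the previous.
rows = [(f"C{i:02d}", f"2024-01-{i + 1:02d}", 0) for i in range(12)]
rows.append(("C00", "2024-01-01", 0))   # duplicate code, as described in the question
rows.append(("PUB", "2023-12-01", 1))   # already published, must never be chosen
con.executemany("INSERT INTO Coupon (Code, Expires, Published) VALUES (?, ?, ?)", rows)

def pick_coupon(con):
    """Return the Code of one random coupon among the 10 soonest-expiring unpublished codes."""
    row = con.execute("""
        SELECT c1.Code
        FROM Coupon c1
        WHERE c1.Code IN (
            SELECT c2.Code
            FROM Coupon c2
            WHERE c2.Published = 0
            GROUP BY c2.Code, c2.Expires   -- collapses duplicate codes
            ORDER BY c2.Expires
            LIMIT 10                       -- TOP 10 in T-SQL
        )
        ORDER BY RANDOM()                  -- NEWID() in T-SQL
        LIMIT 1                            -- TOP 1 in T-SQL
    """).fetchone()
    return row[0]

# The ten codes that expire first; C10, C11 and PUB must never be picked.
candidates = {f"C{i:02d}" for i in range(10)}
print(pick_coupon(con) in candidates)  # prints True
```

Note that this reference query orders the inner selection by Expires, which the two LINQ attempts above that use Distinct() and Take(10) do not, so their candidate sets can differ from the SQL version.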

Postgresql and comparing to an empty field

It seems that in PostgreSQL, empty_field != 1 (or some other value) is FALSE. If this is true, can somebody tell me how to compare with empty fields?
I have the following query, which translates to "select all posts in the user's group for which one hasn't voted yet":
SELECT p.id, p.body, p.author_id, p.created_at
FROM posts p
LEFT OUTER JOIN votes v ON v.post_id = p.id
WHERE p.group_id = 1
AND v.user_id != 1
and it outputs nothing, even though votes table is empty. Maybe there is something wrong with my query and not with the logic above?
Edit: it seems that changing v.user_id != 1, to v.user_id IS DISTINCT FROM 1, did the job.
From PostgreSQL docs:
"For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, when both inputs are null it will return false, and when just one input is null it will return true."
If you want to return rows where v.user_id is NULL then you need to handle that specially. One way you can fix it is to write:
AND COALESCE(v.user_id, 0) != 1
Another option is:
AND (v.user_id != 1 OR v.user_id IS NULL)
Edit: spacemonkey is correct that in PostgreSQL you should use IS DISTINCT FROM here.
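NULL is an unknown value, so a comparison with it is never true: both NULL = 1 and NULL != 1 evaluate to NULL, and WHERE keeps only rows whose condition is true. Look into using the COALESCE function.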
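The behaviour described in these answers can be reproduced with a small sketch. It uses an in-memory SQLite database from Python rather than PostgreSQL, so the null-safe comparison is spelled "IS NOT" (SQLite's counterpart of IS DISTINCT FROM); the COALESCE and OR ... IS NULL variants are written exactly as in the answers. The two sample posts are invented, and the votes table is left empty as in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, author_id INTEGER,
                        created_at TEXT, group_id INTEGER);
    CREATE TABLE votes (post_id INTEGER, user_id INTEGER);
    INSERT INTO posts VALUES (1, 'first',  7, '2010-01-01', 1),
                             (2, 'second', 8, '2010-01-02', 1);
    -- votes stays empty, so the LEFT JOIN yields v.user_id = NULL for every post
""")

def post_ids(predicate):
    """Run the question's query with the given filter on v.user_id."""
    sql = f"""
        SELECT p.id
        FROM posts p
        LEFT OUTER JOIN votes v ON v.post_id = p.id
        WHERE p.group_id = 1
          AND {predicate}
        ORDER BY p.id
    """
    return [r[0] for r in con.execute(sql)]

# NULL != 1 evaluates to NULL, which WHERE treats as "not true": no rows.
print(post_ids("v.user_id != 1"))                           # []
# The three fixes all keep the rows where v.user_id is NULL.
print(post_ids("COALESCE(v.user_id, 0) != 1"))              # [1, 2]
print(post_ids("(v.user_id != 1 OR v.user_id IS NULL)"))    # [1, 2]
print(post_ids("v.user_id IS NOT 1"))                       # [1, 2]
```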