How to convert CASE ... ELSE to when ... otherwise in Spark DataFrames (Scala)

I would like to rewrite Teradata code as Spark DataFrame code in Scala, but I am getting the error "when() cannot be applied once otherwise() is applied". Any help is appreciated.
Teradata
CASE WHEN
CASE WHEN ID <> cid OR Group <> sg OR ead IS NULL THEN 1
ELSE
CASE WHEN HFlag <> hg AND ad + TBreak < ActivityDate THEN 1
ELSE
CASE WHEN HFlag = hg AND HFlag = 1 THEN
CASE WHEN ead + 1 < ActivityDate THEN
CASE WHEN ead + TBreak < ActivityDate THEN 1
ELSE 0 END
ELSE 0 END
WHEN HFlag = hg AND HFlag = 0 THEN
CASE WHEN ead + TBreak < ActivityDate THEN 1
ELSE 0 END
ELSE 0 END
END
END = 1 THEN Row_Number() Over(ORDER BY ID, Group, ActivityDate, HFlag)
ELSE 0 END AS arctic
I tried it the following way:
val windowRank = Window.orderBy('ID, 'Group, 'ActivityDate, 'HFlag)
df.withColumn("arctic",
when(when(col("ID") =!= col("cid") || col("Group") =!= col("sg") || col("ad").isNull, 1)
.when(col("HFlag") =!= col("hg") && (col("ad") + col("TBreak")) < col("ActivityDate"), 1)
.when(col("HFlag") === col("hg") && col("HFlag") === 1,
when((col("ad") + 1) < col("ActivityDate"),
when((col("ad") + col("TBreak")) < col("ActivityDate"), 1)
.otherwise(0))
.otherwise(0)
.when(col("HFlag") === col("hg") && col("HFlag") === 0, when((col("ad") + col("TBreak")) < col("ActivityDate"), 1))
.otherwise(0))
.otherwise(0) === 1, row_number() over windowRank)
.otherwise(0))

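You are missing a closing bracket after the nested when for HFlag = 1, so the next .when ends up chained onto .otherwise(0), which Spark rejects. Close each nested when(...).otherwise(...) before continuing the outer chain, and give the outer chain exactly one .otherwise(0):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

val windowRank = Window.orderBy(col("ID"), col("Group"), col("ActivityDate"), col("HFlag"))
df.withColumn("arctic",
  when(
    when(col("ID") =!= col("cid") || col("Group") =!= col("sg") || col("ad").isNull, 1)
      .when(col("HFlag") =!= col("hg") && (col("ad") + col("TBreak")) < col("ActivityDate"), 1)
      .when(col("HFlag") === col("hg") && col("HFlag") === 1,
        when((col("ad") + 1) < col("ActivityDate"),
          when((col("ad") + col("TBreak")) < col("ActivityDate"), 1)
            .otherwise(0))
          .otherwise(0)) // <-- the missing closing bracket
      .when(col("HFlag") === col("hg") && col("HFlag") === 0,
        when((col("ad") + col("TBreak")) < col("ActivityDate"), 1).otherwise(0))
      .otherwise(0) === 1, // the flag chain ends with a single otherwise
    row_number().over(windowRank))
    .otherwise(0))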
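The general rule behind this fix can be shown on a much smaller case. The sketch below is only an illustration: the columns "flag" and "gap" and the object name are made up, and it assumes Spark is on the classpath. A nested CASE becomes a complete when(...).otherwise(...) expression passed as the value of the outer when, further branches are added with .when, and .otherwise comes last.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object NestedWhenSketch {
  // Hypothetical columns "flag" and "gap"; they only illustrate the shape of the chain.
  def withHit(df: DataFrame): DataFrame =
    df.withColumn("hit",
      when(col("flag") === 1,
        when(col("gap") > 10, 1).otherwise(0)) // inner CASE is complete before the outer chain continues
        .when(col("flag") === 0, 1)            // further WHEN branches go before otherwise
        .otherwise(0))                         // ELSE is the last call in the chain

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-when-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq((1, 5), (1, 20), (0, 20), (2, 20)).toDF("flag", "gap")
    withHit(df).show()
    spark.stop()
  }
}
```

Calling .when on the result of .otherwise is what triggers the error from the question, so the position of each closing bracket decides which chain a .when belongs to.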

Related

Return in just one line

I have a PostgreSQL query in which I need to do a SUM. However, it returns 4 rows because the ids of my routes are different. I need to add up the "totalChecklists" and return the total in a single row.
select "users"."id",
count((cycles.adherence >= 100 and cycles.justified = false) or null) as "routesDone",
count((cycles.adherence < 100 and cycles.opened = false) or null) as "routesNotDone",
count((cycles.opened = true and cycles.adherence < 100) or null) as "routesOpened",
count(cycles.justified or null) as "routesJustified",
routes."totalSpots" AS "totalCheck",
count(cycles.id) as "routesOnPeriod",
(
count((cycles.adherence >= 100 and cycles.justified = false) or null) +
count((cycles.adherence < 100 and cycles.opened = false) or null) +
count((cycles.opened = true and cycles.adherence < 100) or null) +
count(cycles.justified or null)
) as "routesTotal",
0 as "routesDoneLate",
0 as "routesLate",
SUM(case when cycles.adherence >= 100 then routes."totalAssets" else 0 end) AS "coveredAssets",
SUM(case when cycles.adherence < 100 then routes."totalAssets" else 0 end) AS "uncoveredAssets"
from "routes"
inner join "cycles" on "routes"."identifier" = "cycles"."routeIdentifier"
inner join "workspaces" on "cycles"."workspaceId" = "workspaces"."id"
inner join "users" on "users"."id" = "cycles"."inspectorId"
where "routes"."deleted_at" is null
and "cycles"."deletedAt" is null
and "workspaces"."deleted" = false
and "cycleStartAt" <= '2022-07-11T02:59:59Z'
and "cycleEndAt" >= '2022-04-12T03:00:00Z'
and "users"."id" = 'b67830a7-39fc-4ad5-bf26-07a43dcd3676'
and "routes"."contextPath" like '/malicious-interaction-murder-32%'
and "routes"."deleted_at" is null
group by
users.id,
routes.id
My return:
Sorry for my bad English, I hope I managed to be clear.
Thank you very much in advance

Convert an iteration to recursion

I have iterative logic that builds a query like this:
val baseQuery = "select agg_id from quality.QUALITY_AGGREGATOR where job_id = 123 and level ="
val reprocessDate = "2018-10-24"
val level = 2 // highest level to include
var prevLevelSubQuery = ""
var finalQuery = ""
for (i <- 0 to level) {
  val currLevelSubQuery =
    if (i == 0) baseQuery + s"$i and agg_value >= '${reprocessDate}'"
    else baseQuery + s"$i and parent_agg_id in ( $prevLevelSubQuery )"
  prevLevelSubQuery = currLevelSubQuery
  finalQuery = finalQuery + currLevelSubQuery + (if (i < level) " union " else "")
}
It returns a query of this nature for level = 2.
SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 0
AND agg_value >= '2018-10-24'
UNION
SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 1
AND parent_agg_id IN (SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 0
AND agg_value >= '2018-10-24')
UNION
SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 2
AND parent_agg_id IN (SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 1
AND parent_agg_id IN
(SELECT agg_id
FROM quality.quality_aggregator
WHERE job_id = 123
AND level = 0
AND agg_value >=
'2018-10-24'))
I am trying to convert it into a recursive logic like this
val baseQuery = s"select agg_id from quality.QUALITY_AGGREGATOR where job_id = 123 and level ="
val finalQuery = getAggIdSQLGenerator(2,"2018-10-24")
def getAggIdSQLGenerator(level : Int, reprocessDate:String):String={
if (level == 0)
( baseQuery + s"$level and agg_value >= '${reprocessDate}'")
else
{
val subQuery=getAggIdSQLGenerator(level-1 ,reprocessDate)
baseQuery + s"$level and parent_agg_id in (" +subQuery +") union "+ subQuery
}
}
But this is not yielding correct results. What am I missing?
This isn't recursive but I think it's a smaller, cleaner, implementation of what you're after.
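val baseQuery = ".... level="
// Extracts the first (outermost) level number from a generated query.
val levelRE = "level=(\\d+)".r.unanchored
val reprocessDate = "2018-10-24"
val av = s" and agg_value >= '${reprocessDate}'"
val pai = " and parent_agg_id in "
val itrs = 3 // number of levels to generate (0, 1 and 2)
// Start from the level-0 query; each step wraps the previous query as the sub-query of the next level.
val query = List.iterate(s"${baseQuery}0$av", itrs) { prevStr =>
  val level = prevStr match {
    case levelRE(n) => n.toInt + 1
    case _ => 0
  }
  s"$baseQuery$level$pai($prevStr)"
}.mkString(" union ")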
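If parsing the level back out of the previous string with a regex feels fragile, the same idea can be sketched with scanLeft, which carries the previous query along while the level comes from the range itself. This is a variation on the snippet above and not the original author's code; the names LevelQuerySketch and unionQuery are made up for the example.

```scala
object LevelQuerySketch {
  val baseQuery = "select agg_id from quality.QUALITY_AGGREGATOR where job_id = 123 and level = "

  // Builds one query per level (0..maxLevel) and joins them with " union ".
  def unionQuery(maxLevel: Int, reprocessDate: String): String = {
    val level0 = s"${baseQuery}0 and agg_value >= '$reprocessDate'"
    (1 to maxLevel)
      // prev is the query of the level below; level is the level being built
      .scanLeft(level0) { (prev, level) => s"$baseQuery$level and parent_agg_id in ($prev)" }
      .mkString(" union ")
  }
}
```

For maxLevel = 0 the range is empty, so scanLeft yields only the level-0 query and no union is added.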
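The issue is that you append union at every level of the recursion and concatenate the sub-query twice.
The function below builds the nested query for a single level without those duplicates; the queries for the lower levels still have to be joined to it with union to get the full expected output.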
val baseQuery = s"select agg_id from quality.QUALITY_AGGREGATOR where job_id = 123 and level ="
val finalQuery = getAggIdSQLGenerator(2, "2018-10-24")
def getAggIdSQLGenerator(level: Int, reprocessDate: String): String = {
if (level == 0) {
baseQuery + s" $level and agg_value >= '${reprocessDate}'"
} else {
val subQuery = getAggIdSQLGenerator(level - 1, reprocessDate)
baseQuery + s" $level and parent_agg_id in (" + subQuery + ")"
}
}
println("UNION " + finalQuery)
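To get the complete union from a recursive definition, one option is to split the work into two recursive functions: one that nests the sub-queries for a single level, and one that unions the levels. The following is a minimal sketch under that assumption; levelQuery and the enclosing object are hypothetical names, and the SQL text is taken from the question.

```scala
object RecursiveQuerySketch {
  val baseQuery = "select agg_id from quality.QUALITY_AGGREGATOR where job_id = 123 and level ="

  // Nested query for exactly one level (no union involved).
  def levelQuery(level: Int, reprocessDate: String): String =
    if (level == 0) s"$baseQuery $level and agg_value >= '$reprocessDate'"
    else s"$baseQuery $level and parent_agg_id in (${levelQuery(level - 1, reprocessDate)})"

  // Union of the queries for levels 0..level; " union " is added once per recursion step.
  def getAggIdSQLGenerator(level: Int, reprocessDate: String): String =
    if (level == 0) levelQuery(0, reprocessDate)
    else getAggIdSQLGenerator(level - 1, reprocessDate) + " union " + levelQuery(level, reprocessDate)
}
```

Calling RecursiveQuerySketch.getAggIdSQLGenerator(2, "2018-10-24") yields three queries joined by union. The nested query is rebuilt for every level, which is fine for a handful of levels but grows quadratically with depth.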

Select SQL case when for Entity Framework

I have this select which works in T-SQL, but need to convert to Entity Framework. How can I use case in Entity Framework?
select
*
from
Contrato c
where
c.QtdParcelasFalta > 0
and (case
when c.DiaEspecificoMarcar = 1
then c.DiaEspecifico
when c.DiaEspecificoMarcar = 0
then (select convert(int, substring(DataCobranca, 2, 2))
from Contrato
where Contrato.Id = c.Id) -
(select e.DataProcessamentoNota
from Contrato t
inner join PedidoVenda p on p.id = t.PedidoVendaId
inner join Empresas e on p.EmpresaID = e.Id
where p.id = t.PedidoVendaId
and t.id = c.Id and p.EmpresaID = 1)
end) = (SELECT DAY(GETDATE()))
I tried it like this, but it does not return the correct values, because I don't know how to express the CASE in Entity Framework:
var contrato = db.Contrato
.Include(a => a.PedidoVenda)
.Include(a => a.Cliente)
.Where(a => a.PedidoVenda.EmpresaID == model.EmpresaID
&& a.Cancelado == false
&& a.DiaEspecificoMarcar == true
&& a.DiaEspecifico == int.Parse(DateTime.Now.ToString("dd")) ||
(int.Parse(a.DataCobranca.Substring(1, 2)) - a.PedidoVenda.Empresa.DataProcessamentoNota)
== int.Parse(DateTime.Now.ToString("dd"))
).ToList();
You are on the right track. Each branch of the CASE becomes one alternative of an ||, guarded by its own DiaEspecificoMarcar condition, and the two alternatives need their own parentheses so that the EmpresaID and Cancelado filters apply to both of them:
var today = DateTime.Now.Day; // read the date once so both branches compare against the same value
var contrato = db.Contrato
    .Include(a => a.PedidoVenda)
    .Include(a => a.Cliente)
    .Where(a => a.PedidoVenda.EmpresaID == model.EmpresaID
        && a.Cancelado == false
        && ((a.DiaEspecificoMarcar == true && a.DiaEspecifico == today)
            || (a.DiaEspecificoMarcar == false
                && int.Parse(a.DataCobranca.Substring(1, 2)) - a.PedidoVenda.Empresa.DataProcessamentoNota == today)))
    .ToList();
I also suggest reading DateTime.Now once into a variable, as done with today here, so that both comparisons get the same input.

user variables in a calculation

I am producing a league table for speedway teams. I have a query that calculates the required data from the teams' results. The part I am struggling with is the points difference (the difference between points scored and points scored against).
SELECT tbl_clubs.club, tbl_clubs.club_id,
SUM(if(tbl_clubs.club_id = tbl_fixtures.away AND tbl_fixtures.awayscore is not null ,1,0)) +
SUM(if(tbl_clubs.club_id = tbl_fixtures.home AND tbl_fixtures.homescore is not null,1,0 )) as `M`,
SUM( if( tbl_clubs.club_id = tbl_fixtures.home
AND tbl_fixtures.homescore > tbl_fixtures.awayscore, 1, 0 ) ) AS `W`,
SUM( IF( tbl_clubs.club_id = tbl_fixtures.home AND tbl_fixtures.awayscore = tbl_fixtures.homescore, 1, 0 ) ) AS `HD`,
SUM( if( tbl_clubs.club_id = tbl_fixtures.home
AND tbl_fixtures.homescore < tbl_fixtures.awayscore, 1, 0 ) ) AS `HL`,
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore > tbl_fixtures.homescore
AND tbl_fixtures.awayscore - tbl_fixtures.homescore >=7,1,0)) AS `4W`,
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore > tbl_fixtures.homescore
AND tbl_fixtures.awayscore - tbl_fixtures.homescore <=6,1,0)) AS `3W`,
SUM( IF( tbl_clubs.club_id = tbl_fixtures.away AND tbl_fixtures.awayscore = tbl_fixtures.homescore, 1, 0 ) ) AS `AD`,
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore < tbl_fixtures.homescore
AND tbl_fixtures.homescore - tbl_fixtures.awayscore <=6,1,0)) AS `1L`,
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore < tbl_fixtures.homescore
AND tbl_fixtures.homescore - tbl_fixtures.awayscore >=7,1,0)) AS `L`,
@FOR:=SUM(IF(tbl_clubs.club_id = tbl_fixtures.away,tbl_fixtures.awayscore,0)) +
SUM(IF(tbl_clubs.club_id = tbl_fixtures.home,tbl_fixtures.homescore,0)) as `F`,
@Against:=SUM(IF(tbl_clubs.club_id = tbl_fixtures.home,tbl_fixtures.awayscore,0)) +
SUM(IF(tbl_clubs.club_id = tbl_fixtures.away,tbl_fixtures.homescore,0)) as `A`,
SUM(@For - @Against) as `PtsDiff`,
SUM( if( tbl_clubs.club_id = tbl_fixtures.home
AND tbl_fixtures.homescore > tbl_fixtures.awayscore, 3, 0 ) ) +
SUM( IF( tbl_clubs.club_id = tbl_fixtures.home AND tbl_fixtures.awayscore = tbl_fixtures.homescore, 1, 0 ) ) +
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore > tbl_fixtures.homescore
AND tbl_fixtures.awayscore - tbl_fixtures.homescore >=7,4,0)) +
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore > tbl_fixtures.homescore
AND tbl_fixtures.awayscore - tbl_fixtures.homescore <=6,3,0)) +
SUM( IF( tbl_clubs.club_id = tbl_fixtures.away AND tbl_fixtures.awayscore = tbl_fixtures.homescore, 2, 0 ) ) +
SUM(if(tbl_clubs.club_id = tbl_fixtures.away
AND tbl_fixtures.awayscore < tbl_fixtures.homescore
AND tbl_fixtures.homescore - tbl_fixtures.awayscore <=6,1,0)) as `Pts`
FROM tbl_clubs
INNER JOIN tbl_fixtures ON tbl_clubs.club_id = tbl_fixtures.home
OR tbl_clubs.club_id = tbl_fixtures.away
where tbl_clubs.league_id = 3
GROUP BY tbl_clubs.club_id
order by Pts desc, PtsDiff desc, club asc
All of the query is working except for
@FOR:=SUM(IF(tbl_clubs.club_id = tbl_fixtures.away,tbl_fixtures.awayscore,0)) +
SUM(IF(tbl_clubs.club_id = tbl_fixtures.home,tbl_fixtures.homescore,0)) as `F`,
@Against:=SUM(IF(tbl_clubs.club_id = tbl_fixtures.home,tbl_fixtures.awayscore,0)) +
SUM(IF(tbl_clubs.club_id = tbl_fixtures.away,tbl_fixtures.homescore,0)) as `A`,
SUM(@For - @Against) as `PtsDiff`,
I am getting NULL as the result for PtsDiff. I am guessing it's something simple; any help would be great.
Firstly, make sure that null values always default to 0. Ideally this is handled before the query is executed, but your script suggests that it is not, so you may want to design your database to cater for it.
Secondly, variable assignment inside queries is unfortunately not as straightforward as you'd like it to be. You may be better off using sub-queries for your initialisations:
@FOR:=(select IFNULL(SUM(tbl_fixtures.awayscore),0) from tbl_fixtures where tbl_clubs.club_id = tbl_fixtures.away) +
(select IFNULL(SUM(tbl_fixtures.homescore),0) from tbl_fixtures where tbl_clubs.club_id = tbl_fixtures.home)
as `F`,
@Against:=(select IFNULL(SUM(tbl_fixtures.awayscore),0) from tbl_fixtures where tbl_clubs.club_id = tbl_fixtures.home) +
(select IFNULL(SUM(tbl_fixtures.homescore),0) from tbl_fixtures where tbl_clubs.club_id = tbl_fixtures.away)
as `A`,
SUM(@FOR - @Against) as `PtsDiff`,
Not very elegant, but this should work. Notice the use of IFNULL to ensure that null is converted to 0.

Where condition in Slick

How can I achieve the equivalent of this query in Slick?
select * from table1 where col1 = 1 AND (col2 = 2 or col3 = 3)
This doesn't work:
val action = table.filter(_.col1 === 1 && (_.col2 === 2 || _.col3 === 3)).result
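You cannot use the placeholder shorthand in this case, because every _ stands for a separate parameter. Name the parameter instead, and keep Slick's === for the comparisons:
val action = table.filter(x => x.col1 === 1 && (x.col2 === 2 || x.col3 === 3)).result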
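The reason is a plain Scala rule and not something specific to Slick: each underscore in a placeholder expression introduces a new parameter. The small sketch below shows this with ordinary integers; the object and value names are made up for the illustration.

```scala
object PlaceholderSketch {
  // Two underscores mean two parameters: this is (a, b) => a > 1 && b < 3.
  val twoArgs: (Int, Int) => Boolean = _ > 1 && _ < 3

  // A named parameter can be used as often as needed.
  val oneArg: Int => Boolean = x => x > 1 && x < 3

  // Keeps only the values that satisfy both conditions at once.
  def between(values: List[Int]): List[Int] = values.filter(oneArg)
}
```

filter expects a function of one argument, so only the named-parameter form fits when the same row has to appear in several conditions.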