I am aware of how to implement a simple CASE-WHEN-THEN clause in SPARK SQL using Scala. I am using Version 1.6.2. But, I need to specify AND condition on multiple columns inside the CASE-WHEN clause. How to achieve this in SPARK using Scala ?
Thanks in advance for your time and help!
Here's the SQL query that I have:
select sd.standardizationId,
case when sd.numberOfShares = 0 and
isnull(sd.derivatives,0) = 0 and
sd.holdingTypeId not in (3,10)
then
8
else
holdingTypeId
end
as holdingTypeId
from sd;
First read table as dataframe
val table = sqlContext.table("sd")
Then select with expression. There align syntaxt according to your database.
val result = table.selectExpr("standardizationId","case when numberOfShares = 0 and isnull(derivatives,0) = 0 and holdingTypeId not in (3,10) then 8 else holdingTypeId end as holdingTypeId")
And show result
result.show
An alternative option, if it's wanted to avoid using the full string expression, is the following:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
val sd = sqlContext.table("sd")
val conditionedColumn: Column = when(
(sd("numberOfShares") === 0) and
(coalesce(sd("derivatives"), lit(0)) === 0) and
(!sd("holdingTypeId").isin(Seq(3,10): _*)), 8
).otherwise(sd("holdingTypeId")).as("holdingTypeId")
val result = sd.select(sd("standardizationId"), conditionedColumn)
Related
Python doesn't like the ampersand below.
I get the error: & is not a supported operation for types str and str. Please review your code.
Any idea how to get this right? I've never tried to join more than 1 column for aliased tables. Thx!!
df_initial_sample = df_crm.alias('crm').join(df_cngpt.alias('cng'), on= (("crm.id=cng.id") & ("crm.cpid = cng.cpid")), how = "inner")
Try using as below -
df_initial_sample = df_crm.alias('crm').join(df_cngpt.alias('cng'), on= (["id"] and ["cpid"]), how = "inner")
Your join condition is overcomplicated. It can be as simple as this
df_initial_sample = df_crm.join(df_cngpt, on=['id', 'cpid'], how = 'inner')
My probleme is i have a code that gives filter column and values in a list as parameters
val vars = "age IN ('0')"
val ListPar = "entered_user,2014-05-05,2016-10-10;"
//val ListPar2 = "entered_user,2014-05-05,2016-10-10;revenue,0,5;"
val ListParser : List[String] = ListPar.split(";").map(_.trim).toList
val myInnerList : List[String] = ListParser(0).split(",").map(_.trim).toList
if (myInnerList(0) == "entered_user" || myInnerList(0) == "date" || myInnerList(0) == "dt_action"){
responses.filter(vars +" AND " + responses(myInnerList(0)).between(myInnerList(1), myInnerList(2)))
}else{
responses.filter(vars +" AND " + responses(myInnerList(0)).between(myInnerList(1).toInt, myInnerList(2).toInt))
}
well for all the fields except the one that contains date the functions works flawless but for fields that have date it throws an error
Note : I'm working with parquet files
here is the error
when i try to write it manually i get the same
here is how the query it sent to the sparkSQL
the first one where there is revenue it works but the second one doesn't work
and when i try to just filter with dates without the value of "vars" which contains other columns, it works
Well my issue is that i was mixing between sql and spark and when i tried to concatenate sql query which is my variable "vars" whith df.filter() and especially when i used between operator it was giving an output format unrocognised by sparksql which is
age IN ('0') AND ((entered_user >= 2015-01-01) AND (entered_user <= 2015-05-01))
it might seems correct but after looking in sql documentation it was missing parenthesese(in vars) it needed to be
(age IN ('0')) AND ((entered_user >= 2015-01-01) AND (entered_user <= 2015-05-01))
well the solution is i needed to concatenate those correctly so to do that i must to add " expr " to the variable vars which will result the desire syntaxe
responses.filter(expr(vars) && responses(myInnerList(0)).between(myInnerList(1), myInnerList(2)))
I want to replace an column in an dataframe. need to get the scala
syntax code for this
Controlling_Area = CC2
Hierarchy_Name = CC2HIDNE
Need to write as : HIDENE
ie: remove the Controlling_Area present in Hierarchy_Name .
val dfPC = ReadLatest("/Full", "parquet")
.select(
LRTIM( REPLACE(col("Hierarchy_Name"),col("Controlling_Area"),"") ),
Col(ColumnN),
Col(ColumnO)
)
notebook:3: error: not found: value REPLACE
REPLACE(col("Hierarchy_Name"),col("Controlling_Area"),"")
^
Expecting to get the LTRIM and replace code in scala
You can use withColumnRenamed to achieve that:
import org.apache.spark.sql.functions
val dfPC = ReadLatest("/Full", "parquet")
.withColumnRenamed("Hierarchy_Name","Controlling_Area")
.withColumn("Controlling_Area",ltrim(col("Controlling_Area")))
I have a code similar to this:
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType
def regex_filter(x):
regexs = ['.*123.*']
if x and x.strip():
for r in regexs:
if re.match(r, x, re.IGNORECASE):
return True
return False
filter_udf = udf(regex_filter, BooleanType())
df_filtered = df.filter(filter_udf(df.fieldXX))
I want to use "regexs" var to verify if any digit "123" is in "fieldXX"
i don't know what i did wrong!
Could anyone help me with this?
Regexp is incorrect.
I think it should be something like:
regexs = '.*[123].*'
You can use SQL function to attain this
df.createOrReplaceTempView("df_temp")
df_1 = spark.sql("select *, case when col1 like '%123%' then 'TRUE' else 'FALSE' end col2 from df_temp")
Disadvantage in using UDF is you cannot save the data frame back or do any manipulations in that data frame further.
Can someone please tell me how do we compare a table column against NULL using slick's lifted embedding.
What i want to achieve in mysql:
select * from Users where email = '<some_email_addr>' and ( removed = NULL OR removed <= 1 )
It gives me an errors at x.removed === null when i try:
val q = for {
x <- Users
if x.email === email && ( x.removed === null || x.removed <= 1 )
} yield x.removed
Thanks
Try:
x.removed.isNull
I think that is what you are looking for
As daaatz said and per http://slick.typesafe.com/doc/2.1.0/upgrade.html#isnull-and-isnotnull you should use isEmpty, but it only works on Option columns. A workaround is x.removed.?.isEmpty.
Try
x.removed.isEmpty()
There are no nulls in scala ;-)