Error in Databricks - Column is not iterable - PySpark

dim_processo.withColumn(
    col("Cliente"),
    when(
        dim_processo.Cliente.endswith('"parque",lda.'),
        regexp_replace(
            dim_processo.Cliente, 'Restaurante "parque",lda.', "Restaurante parque,lda."
        ),
    ),
)
TypeError: Column is not iterable
Help???

The first parameter of withColumn should be the column name (a string), not a Column object. Here is the fix:

dim_processo = dim_processo.withColumn(
    "Cliente",
    when(
        dim_processo.Cliente.endswith('"parque",lda.'),
        regexp_replace(dim_processo.Cliente, 'Restaurante "parque",lda.', "Restaurante parque,lda."),
    ).otherwise(dim_processo.Cliente),
)

Note the .otherwise(): without it, rows that do not match the condition are set to null. Also, withColumn returns a new DataFrame, so assign the result.

Related

ADF CopyData: is it possible to have dynamic Additional Columns that can be nullable?

I have a configuration table with file names, destination table names (and other settings) used to copy data into a SQL table. Sometimes I want the file name in a new column, but not for every file.
Is it possible to have a default value that generates no additional column for some files?
I tried
#json(
if(
equals(item().AdditionalColumns, null),
'{}',
item().AdditionalColumns
)
)
But I get this error: The value of property 'additionalColumns' is in unexpected type 'IList`1'.
And
#json(
if(
equals(item().AdditionalColumns, null),
'{[]}',
item().AdditionalColumns
)
)
But I get this error: The function 'json' parameter is not valid. The provided value '{[]}' cannot be parsed: 'Invalid property identifier character: [. Path '', line 1, position 1
Thank you
I figured it out:
#json(
if(
equals(item()?.AdditionalColumns, null),
'[]',
item()?.AdditionalColumns
)
)
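For context, a non-empty AdditionalColumns value in that configuration table would be a JSON array of name/value pairs (ADF's $$FILEPATH system variable captures the source file path); an empty array '[]' parses to an empty list, which is why it satisfies the IList-typed additionalColumns property while '{}' (an object) does not. A hypothetical example value:

```json
[
  { "name": "SourceFileName", "value": "$$FILEPATH" }
]
```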

How to convert the Int column into a string in Pyspark?

Since I am a beginner in PySpark, can anyone help with converting an integer column into a string?
Here is my code in AWS Athena; I need to convert it into a PySpark DataFrame expression.
case when A.[HHs Reach] = 0 or A.[HHs Reach] is null then '0'
when A.[HHs Reach] = 1000000000 then '*'
else cast(A.[HHs Reach] as varchar) end as [HHs Reach]
Assuming df is your DataFrame, something like this:

from pyspark.sql import functions as F

df = df.withColumn(
    "HHs Reach",
    F.when(F.col("HHs Reach").isNull(), "0")
    .when(F.col("HHs Reach") == 1000000000, "*")
    .otherwise(F.col("HHs Reach").cast("string")),
)

(The = 0 branch of the SQL is already covered by the cast, since casting 0 to string yields "0".)

agg condition : keyword can't be an expression with Pyspark

I am using PySpark to create a DataFrame which calculates the sum of "montant" when the value of the column "isFraud" == 1.
But I get this error:
File "", line 5
when(col("isFraud") =1, sum("montant"))
^ SyntaxError: keyword can't be an expression
Here is the code:
CNP_df_fraude = (tx_wd_df
    #.filter("isFraude =='1'").filter("POS_Card_Presence =='CardNotPresent'")
    .groupBy("POS_Cardholder_Presence")
    .agg(
        when(col("isFraud") =1, sum("montant"))
    )
)
Any idea please?
Thanks
Just put when() inside sum() (and note the comparison operator is ==, not =):

CNP_df_fraude = (tx_wd_df
    .groupBy("POS_Cardholder_Presence")
    .agg(
        sum(when(col("isFraud") == 1, col("montant")).otherwise(0))
    )
)
A bare when() is not an aggregate expression, so it cannot be used directly inside .agg(). You could however filter first:

CNP_df_fraude = (tx_wd_df
    .filter(F.col("isFraud") == 1)
    .groupBy("POS_Cardholder_Presence")
    .sum("montant")
)

replace column and get ltrim of the column value

I want to replace a column in a DataFrame and need the Scala syntax code for this.
Controlling_Area = CC2
Hierarchy_Name = CC2HIDNE
Needs to be written as: HIDENE
i.e. remove the Controlling_Area value present in Hierarchy_Name.
val dfPC = ReadLatest("/Full", "parquet")
.select(
LRTIM( REPLACE(col("Hierarchy_Name"),col("Controlling_Area"),"") ),
Col(ColumnN),
Col(ColumnO)
)
notebook:3: error: not found: value REPLACE
REPLACE(col("Hierarchy_Name"),col("Controlling_Area"),"")
^
I am expecting the ltrim and replace code in Scala.
REPLACE and LTRIM are not Spark functions; the DataFrame API equivalents are the lowercase regexp_replace and ltrim from org.apache.spark.sql.functions. Since the pattern to remove comes from another column, use the Column-based overload of regexp_replace:

import org.apache.spark.sql.functions._

val dfPC = ReadLatest("/Full", "parquet")
  .withColumn(
    "Hierarchy_Name",
    ltrim(regexp_replace(col("Hierarchy_Name"), col("Controlling_Area"), lit("")))
  )

(withColumnRenamed only renames a column; it does not change its values.)

Identifying isNull and isNotNull in a dataframe

I'm trying to identify the columns which are null and which are not null and, depending on that, insert a string.
ab_final = join_df.withColumn("linked_A_B",
    when(
        col("a3_inbound").isNull() & ("a3_item").isNull(), 'No Value'
    ).when(
        col("it_item").isNull() & ("it_export").isNull(), 'NoShipment'
    ).when(
        col("a3_inbound").isNotNull() & ("it_export").isNotNull(), 'Export'
    )
)
I'm getting the below error:
'str' object has no attribute 'isNull'
Please help.