Write List of Map data into csv - scala

val rdd = df.rdd.map(line => Row.fromSeq((
scala.xml.XML.loadString("<?xml version='1.0' encoding='utf-8'?>" + line(1)).child
.filter(elem =>
elem.label == "name1"
|| elem.label == "name2"
|| elem.label == "name3"
|| elem.label == "name4"
).map(elem => (elem.label -> elem.text)).toList)
)
I do rdd.take(10).foreach(println), My is RDD[Row] then produced the output something like this:
[(name1, value1), (name2, value2),(name3, value3)]
[(name1, value11), (name2, value22),(name3, value33)]
[(name1, value111), (name2, value222),(name4, value44)]
I want save this into csv with (name1..name4 are header of csv), anyone please help how can I implement this with apache spark 2.4.0
name1 | name2 | name3 | name4
value1 | value2 |value3 | null
value11 | value22 |value33 | null
value111 | value222 |null | value444

I adjusted your example and added some intermediate Values to help get each step:
// define the labels you want:
val labels = Seq("name1", "name2", "name3", "name4")
val result: RDD[Row] = rdd.map { line =>
// your raw data
val tuples: immutable.Seq[(String, String)] =
scala.xml.XML.loadString("<?xml version='1.0' encoding='utf-8'?>" + line(1)).child
.filter(elem => labels.contains(elem.label)) // you can use the label list to filter
.map(elem => (elem.label -> elem.text)).toList // no change here
val values: Seq[String] =
labels.map(l =>
// take the values you have a label
tuples.find{case (k, v) => k == l}.map(_._2)
// or just add an empty String
.getOrElse(""))
// create a Row
Row.fromSeq(values)
}
Now I am not sure - but in essence you have to insert the title Row as the first row:
[name1, name2, name3]

Related

Spark - Drop null values from map column

I'm using Spark to read a CSV file and then gather all the fields to create a map. Some of the fields are empty and I'd like to remove them from the map.
So for a CSV that looks like this:
"animal", "colour", "age"
"cat" , "black" ,
"dog" , , "3"
I'd like to get a dataset with the following maps:
Map("animal" -> "cat", "colour" -> "black")
Map("animal" -> "dog", "age" -> "3")
This is what I have so far:
val csv_cols_n_vals: Array[Column] = csv.columns.flatMap { c => Array(lit(c), col(c)) }
sparkSession.read
.option("header", "true")
.csv(csvLocation)
.withColumn("allFieldsMap", map(csv_cols_n_vals: _*))
I've tried a few variations, but I can't seem to find the correct solution.
There is most certainly a better and more efficient way using the Dataframe API, but here is a map/flatmap solution:
val df = Seq(("cat", "black", null), ("dog", null, "3")).toDF("animal", "colour", "age")
val cols = df.columns
df.map(r => {
cols.flatMap( c => {
val v = r.getAs[String](c)
if (v != null) {
Some(Map(c -> v))
} else {
None
}
}).reduce(_ ++ _)
}).toDF("map").show(false)
Which produces:
+--------------------------------+
|map |
+--------------------------------+
|[animal -> cat, colour -> black]|
|[animal -> dog, age -> 3] |
+--------------------------------+
scala> df.show(false)
+------+------+----+
|animal|colour|age |
+------+------+----+
|cat |black |null|
|dog |null |3 |
+------+------+----+
Building Expressions
val colExpr = df
.columns // getting list of columns from dataframe.
.map{ columnName =>
when(
col(columnName).isNotNull, // checking if column is not null
map(
lit(columnName),
col(columnName)
) // Adding column name and its value inside map
)
.otherwise(map())
}
.reduce(map_concat(_,_))
// finally using map_concat function to concat map values.
Above code will create below expressions.
map_concat(
map_concat(
CASE WHEN (animal IS NOT NULL) THEN map(animal, animal) ELSE map() END,
CASE WHEN (colour IS NOT NULL) THEN map(colour, colour) ELSE map() END
),
CASE WHEN (age IS NOT NULL) THEN map(age, age) ELSE map() END
)
Applying colExpr on DataFrame.
scala>
df
.withColumn("allFieldsMap",colExpr)
.show(false)
+------+------+----+--------------------------------+
|animal|colour|age |allFieldsMap |
+------+------+----+--------------------------------+
|cat |black |null|[animal -> cat, colour -> black]|
|dog |null |3 |[animal -> dog, age -> 3] |
+------+------+----+--------------------------------+
Spark-sql solution:
val df = Seq(("cat", "black", null), ("dog", null, "3")).toDF("animal", "colour", "age")
df.show(false)
+------+------+----+
|animal|colour|age |
+------+------+----+
|cat |black |null|
|dog |null |3 |
+------+------+----+
df.createOrReplaceTempView("a_vw")
val cols_str = df.columns.flatMap( x => Array("\"".concat(x).concat("\""),x)).mkString(",")
spark.sql(s"""
select collect_list(m2) res from (
select id, key, value, map(key,value) m2 from (
select id, explode(m) as (key,value) from
( select monotonically_increasing_id() id, map(${cols_str}) m from a_vw )
)
where value is not null
) group by id
""")
.show(false)
+------------------------------------+
|res |
+------------------------------------+
|[[animal -> cat], [colour -> black]]|
|[[animal -> dog], [age -> 3]] |
+------------------------------------+
Or much shorter
spark.sql(s"""
select collect_list(case when value is not null then map(key,value) end ) res from (
select id, explode(m) as (key,value) from
( select monotonically_increasing_id() id, map(${cols_str}) m from a_vw )
) group by id
""")
.show(false)
+------------------------------------+
|res |
+------------------------------------+
|[[animal -> cat], [colour -> black]]|
|[[animal -> dog], [age -> 3]] |
+------------------------------------+

How can i check for empty values on spark Dataframe using User defined functions

guys, I have this user-defined function to check if the text rows are empty:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._
{{{
val df = Seq(
(0, "","Mongo"),
(1, "World","sql"),
(2, "","")
).toDF("id", "text", "Source")
// Define a "regular" Scala function
val checkEmpty: String => Boolean = x => {
var test = false
if(x.isEmpty){
test = true
}
test
}
val upper = udf(checkEmpty)
df.withColumn("isEmpty", upper('text)).show
}}}
I'm actually getting this dataframe:
+---+-----+------+-------+
| id| text|Source|isEmpty|
+---+-----+------+-------+
| 0| | Mongo| true|
| 1|World| sql| false|
| 2| | | true|
+---+-----+------+-------+
How could I check for all the rows for empty values and return a message like:
id 0 has the text column with empty values
id 2 has the text,source column with empty values
UDF which get nullable columns as Row can be used, for get empty column names. Then rows with non-empty columns can be filtered:
val emptyColumnList = (r: Row) => r
.toSeq
.zipWithIndex
.filter(_._1.toString().isEmpty)
.map(pair => r.schema.fields(pair._2).name)
val emptyColumnListUDF = udf(emptyColumnList)
val columnsToCheck = Seq($"text", $"Source")
val result = df
.withColumn("EmptyColumns", emptyColumnListUDF(struct(columnsToCheck: _*)))
.where(size($"EmptyColumns") > 0)
.select(format_string("id %s has the %s columns with empty values", $"id", $"EmptyColumns").alias("description"))
Result:
+----------------------------------------------------+
|description |
+----------------------------------------------------+
|id 0 has the [text] columns with empty values |
|id 2 has the [text,Source] columns with empty values|
+----------------------------------------------------+
You could do something like this:
case class IsEmptyRow(id: Int, description: String) //case class for column names
val isEmptyDf = df.map {
row => row.getInt(row.fieldIndex("id")) -> row //we take id of row as first column
.toSeq //then to get secod we change row values to seq
.zip(df.columns) //zip it with column names
.collect { //if value is string and empty we append column name
case (value: String, column) if value.isEmpty => column
}
}.map { //then we create description string and pack results to case class
case (id, Nil) => IsEmptyRow(id, s"id $id has no columns with empty values")
case (id, List(column)) => IsEmptyRow(id, s"id $id has the $column column with empty values")
case (id, columns) => IsEmptyRow(id, s"id $id has the ${columns.mkString(", ")} columns with empty values")
}
Then running isEmptyDf.show(truncate = false) will show:
+---+---------------------------------------------------+
|id |description |
+---+---------------------------------------------------+
|0 |id 0 has the text columns with empty values |
|1 |id 1 has no columns with empty values |
|2 |id 2 has the text, Source columns with empty values|
+---+---------------------------------------------------+
You can also join back with original dataset:
df.join(isEmptyDf, "id").show(truncate = false)

Split text and find the common words in a Spark Dataframe

I am working on Scala with Spark and I have a dataframe including two columns with text.
Those columns are with the format of "term1, term2, term3,..." and I want to create a third column with the common terms of the two of them.
For example
Col1
orange, apple, melon
party, clouds, beach
Col2
apple, apricot, watermelon
black, yellow, white
The result would be
Col3
1
0
What I have done until now is to create a udf that splits the text and get the intersection of the two columns.
val common_terms = udf((a: String, b: String) => if (a.isEmpty || b.isEmpty) {
0
} else {
split(a, ",").intersect(split(b, ",")).length
})
And then on my dataframe
val results = termsDF.withColumn("col3", common_terms(col("col1"), col("col2"))
But I have the following error
Error:(96, 13) type mismatch;
found : String
required: org.apache.spark.sql.Column
split(a, ",").intersect(split(b, ",")).length
I would appreciate any help since I am new in Scala and just trying to learn from online tutorials.
EDIT:
val common_authors = udf((a: String, b: String) => if (a != null || b != null) {
0
} else {
val tempA = a.split( ",")
val tempB = b.split(",")
if ( tempA.isEmpty || tempB.isEmpty ) {
0
} else {
tempA.intersect(tempB).length
}
})
After the edit, if I try termsDF.show() it runs. But if I do something like that termsDF.orderBy(desc("col3")) then I get a java.lang.NullPointerException
Try
val common_terms = udf((a: String, b: String) => if (a.isEmpty || b.isEmpty) {
0
} else {
var tmp1 = a.split(",")
var tmp2 = b.split(",")
tmp1.intersect(tmp2).length
})
val results = termsDF.withColumn("col3", common_terms($"a", $"b")).show
split(a, ",") its a spark column functions.
You are using an udf so you need to use string.split() wich is a scala function
After edit: change null verification to == not !=
In Spark 2.4 sql, you can get the same results without UDF. Check this out:
scala> val df = Seq(("orange,apple,melon","apple,apricot,watermelon"),("party,clouds,beach","black,yellow,white"), ("orange,apple,melon","apple,orange,watermelon")).toDF("col1","col2")
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
scala>
scala> df.createOrReplaceTempView("tasos")
scala> spark.sql(""" select col1,col2, filter(split(col1,','), x -> array_contains(split(col2,','),x) ) a1 from tasos """).show(false)
+------------------+------------------------+---------------+
|col1 |col2 |a1 |
+------------------+------------------------+---------------+
|orange,apple,melon|apple,apricot,watermelon|[apple] |
|party,clouds,beach|black,yellow,white |[] |
|orange,apple,melon|apple,orange,watermelon |[orange, apple]|
+------------------+------------------------+---------------+
If you want the size, then
scala> spark.sql(""" select col1,col2, filter(split(col1,','), x -> array_contains(split(col2,','),x) ) a1 from tasos """).withColumn("a1_size",size('a1)).show(false)
+------------------+------------------------+---------------+-------+
|col1 |col2 |a1 |a1_size|
+------------------+------------------------+---------------+-------+
|orange,apple,melon|apple,apricot,watermelon|[apple] |1 |
|party,clouds,beach|black,yellow,white |[] |0 |
|orange,apple,melon|apple,orange,watermelon |[orange, apple]|2 |
+------------------+------------------------+---------------+-------+
scala>

array[array["string"]] with explode option dropping null rows in spark/scala [duplicate]

I have a Dataframe that I am trying to flatten. As part of the process, I want to explode it, so if I have a column of arrays, each value of the array will be used to create a separate row. For instance,
id | name | likes
_______________________________
1 | Luke | [baseball, soccer]
should become
id | name | likes
_______________________________
1 | Luke | baseball
1 | Luke | soccer
This is my code
private DataFrame explodeDataFrame(DataFrame df) {
DataFrame resultDf = df;
for (StructField field : df.schema().fields()) {
if (field.dataType() instanceof ArrayType) {
resultDf = resultDf.withColumn(field.name(), org.apache.spark.sql.functions.explode(resultDf.col(field.name())));
resultDf.show();
}
}
return resultDf;
}
The problem is that in my data, some of the array columns have nulls. In that case, the entire row is deleted. So this dataframe:
id | name | likes
_______________________________
1 | Luke | [baseball, soccer]
2 | Lucy | null
becomes
id | name | likes
_______________________________
1 | Luke | baseball
1 | Luke | soccer
instead of
id | name | likes
_______________________________
1 | Luke | baseball
1 | Luke | soccer
2 | Lucy | null
How can I explode my arrays so that I don't lose the null rows?
I am using Spark 1.5.2 and Java 8
Spark 2.2+
You can use explode_outer function:
import org.apache.spark.sql.functions.explode_outer
df.withColumn("likes", explode_outer($"likes")).show
// +---+----+--------+
// | id|name| likes|
// +---+----+--------+
// | 1|Luke|baseball|
// | 1|Luke| soccer|
// | 2|Lucy| null|
// +---+----+--------+
Spark <= 2.1
In Scala but Java equivalent should be almost identical (to import individual functions use import static).
import org.apache.spark.sql.functions.{array, col, explode, lit, when}
val df = Seq(
(1, "Luke", Some(Array("baseball", "soccer"))),
(2, "Lucy", None)
).toDF("id", "name", "likes")
df.withColumn("likes", explode(
when(col("likes").isNotNull, col("likes"))
// If null explode an array<string> with a single null
.otherwise(array(lit(null).cast("string")))))
The idea here is basically to replace NULL with an array(NULL) of a desired type. For complex type (a.k.a structs) you have to provide full schema:
val dfStruct = Seq((1L, Some(Array((1, "a")))), (2L, None)).toDF("x", "y")
val st = StructType(Seq(
StructField("_1", IntegerType, false), StructField("_2", StringType, true)
))
dfStruct.withColumn("y", explode(
when(col("y").isNotNull, col("y"))
.otherwise(array(lit(null).cast(st)))))
or
dfStruct.withColumn("y", explode(
when(col("y").isNotNull, col("y"))
.otherwise(array(lit(null).cast("struct<_1:int,_2:string>")))))
Note:
If array Column has been created with containsNull set to false you should change this first (tested with Spark 2.1):
df.withColumn("array_column", $"array_column".cast(ArrayType(SomeType, true)))
You can use explode_outer() function.
Following up on the accepted answer, when the array elements are a complex type it can be difficult to define it by hand (e.g with large structs).
To do it automatically I wrote the following helper method:
def explodeOuter(df: Dataset[Row], columnsToExplode: List[String]) = {
val arrayFields = df.schema.fields
.map(field => field.name -> field.dataType)
.collect { case (name: String, type: ArrayType) => (name, type.asInstanceOf[ArrayType])}
.toMap
columnsToExplode.foldLeft(df) { (dataFrame, arrayCol) =>
dataFrame.withColumn(arrayCol, explode(when(size(col(arrayCol)) =!= 0, col(arrayCol))
.otherwise(array(lit(null).cast(arrayFields(arrayCol).elementType)))))
}
Edit: it seems that spark 2.2 and newer have this built in.
To handle empty map type column: for Spark <= 2.1
List((1, Array(2, 3, 4), Map(1 -> "a")),
(2, Array(5, 6, 7), Map(2 -> "b")),
(3, Array[Int](), Map[Int, String]())).toDF("col1", "col2", "col3").show()
df.select('col1, explode(when(size(map_keys('col3)) === 0, map(lit("null"), lit("null"))).
otherwise('col3))).show()
from pyspark.sql.functions import *
def flatten_df(nested_df):
flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == 'struct']
flat_df = nested_df.select(flat_cols +
[col(nc + '.' + c).alias(nc + '_' + c)
for nc in nested_cols
for c in nested_df.select(nc + '.*').columns])
print("flatten_df_count :", flat_df.count())
return flat_df
def explode_df(nested_df):
flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct' and c[1][:5] != 'array']
array_cols = [c[0] for c in nested_df.dtypes if c[1][:5] == 'array']
for array_col in array_cols:
schema = new_df.select(array_col).dtypes[0][1]
nested_df = nested_df.withColumn(array_col, when(col(array_col).isNotNull(), col(array_col)).otherwise(array(lit(None)).cast(schema)))
nested_df = nested_df.withColumn("tmp", arrays_zip(*array_cols)).withColumn("tmp", explode("tmp")).select([col("tmp."+c).alias(c) for c in array_cols] + flat_cols)
print("explode_dfs_count :", nested_df.count())
return nested_df
new_df = flatten_df(myDf)
while True:
array_cols = [c[0] for c in new_df.dtypes if c[1][:5] == 'array']
if len(array_cols):
new_df = flatten_df(explode_df(new_df))
else:
break
new_df.printSchema()
Used arrays_zip and explode to do it faster and address the null issue.

How to filter a map<String, Int> in a data frame : Spark / Scala

I am trying to get the count individual column to publish metrics. I have a I have a df [customerId : string, totalRent : bigint, totalPurchase: bigint, itemTypeCounts: map<string, int> ]
Right now I am doing :
val totalCustomers = df.count
val totalPurchaseCount = df.filter("totalPurchase > 0").count
val totalRentCount = df.filter("totalRent > 0").count
publishMetrics("Total Customer", totalCustomers )
publishMetrics("Total Purchase", totalPurchaseCount )
publishMetrics("Total Rent", totalRentCount )
publishMetrics("Percentage of Rent", percentage(totalRentCount, totalCustomers) )
publishMetrics("Percentage of Purchase", percentage(totalPurchaseCount, totalCustomers) )
private def percentageCalc(num: Long, denom: Long): Double = {
val numD: Long = num
val denomD: Long = denom
return if (denomD == 0.0) 0.0
else (numD / denomD) * 100
}
But I am not sure how do I do this for itemTypeCounts which is a map. I want count and percentage based on each key entry. The issue is the key value is dynamic , I mean there is no way I know the key value before hand. Can some one tell me how do get count for each key values. I am new to scala/spark, any other efficient approaches to get the counts of each columns are much appreciated.
Sample data :
customerId : 1
totalPurchase : 17
totalRent : 0
itemTypeCounts : {"TV" : 4, "Blender" : 2}
customerId : 2
totalPurchase : 1
totalRent : 1
itemTypeCounts : {"Cloths" : 4}
customerId : 3
totalPurchase : 0
totalRent : 10
itemTypeCounts : {"TV" : 4}
So the output is :
totalCustomer : 3
totalPurchaseCount : 2 (2 customers with totalPurchase > 0)
totalRent : 2 (2 customers with totalRent > 0)
itemTypeCounts_TV : 2
itemTypeCounts_Cloths : 1
itemTypeCounts_Blender : 1
You can accomplish this in Spark SQL, I show two examples of this below (one where the keys are known and can be enumerated in code, one where the keys are unknown). Note that by using Spark SQL, you take advantage of the catalyst optimizer, and this will run very efficiently:
val data = List((1,17,0,Map("TV" -> 4, "Blender" -> 2)),(2,1,1,Map("Cloths" -> 4)),(3,0,10,Map("TV" -> 4)))
val df = data.toDF("customerId","totalPurchase","totalRent","itemTypeCounts")
//Only good if you can enumerate the keys
def countMapKey(name:String) = {
count(when($"itemTypeCounts".getItem(name).isNotNull,lit(1))).as(s"itemTypeCounts_$name")
}
val keysToCount = List("TV","Blender","Cloths").map(key => countMapKey(key))
df.select(keysToCount :_*).show
+-----------------+----------------------+---------------------+
|itemTypeCounts_TV|itemTypeCounts_Blender|itemTypeCounts_Cloths|
+-----------------+----------------------+---------------------+
| 2| 1| 1|
+-----------------+----------------------+---------------------+
//More generic
val pivotData = df.select(explode(col("itemTypeCounts"))).groupBy(lit(1).as("tmp")).pivot("key").count.drop("tmp")
val renameStatement = pivotData.columns.map(name => col(name).as(s"itemTypeCounts_$name"))
pivotData.select(renameStatement :_*).show
+----------------------+---------------------+-----------------+
|itemTypeCounts_Blender|itemTypeCounts_Cloths|itemTypeCounts_TV|
+----------------------+---------------------+-----------------+
| 1| 1| 2|
+----------------------+---------------------+-----------------+
I'm a spark newbie myself, so there is probably a better way to do this. But one thing you could try is transforming the itemTypeCounts into a data structure in scala that you could work with. I converted each row to a List of (Name, Count) pairs e.g. List((Blender,2), (TV,4)).
With this you can have a List of such list of pairs, one list of pairs for each row. In your example, this will be a List of 3 elements:
List(
List((Blender,2), (TV,4)),
List((Cloths,4)),
List((TV,4))
)
Once you have this structure, transforming it to a desired output is standard scala.
Worked example is below:
val itemTypeCounts = df.select("itemTypeCounts")
//Build List of List of Pairs as suggested above
val itemsList = itemTypeCounts.collect().map {
row =>
val values = row.getStruct(0).mkString("",",","").split(",")
val fields = row.schema.head.dataType.asInstanceOf[StructType].map(s => s.name).toList
fields.zip(values).filter(p => p._2 != "null")
}.toList
// Build a summary map for the list constructed above
def itemTypeCountsSummary(frames: List[List[(String, String)]], summary: Map[String, Int]) : Map[String, Int] = frames match {
case Nil => summary
case _ => itemTypeCountsSummary(frames.tail, merge(frames.head, summary))
}
//helper method for the summary map.
def merge(head: List[(String, String)], summary: Map[String, Int]): Map[String, Int] = {
val headMap = head.toMap.map(e => ("itemTypeCounts_" + e._1, 1))
val updatedSummary = summary.map{e => if(headMap.contains(e._1)) (e._1, e._2 + 1) else e}
updatedSummary ++ headMap.filter(e => !updatedSummary.contains(e._1))
}
val summaryMap = itemTypeCountsSummary(itemsList, Map())
summaryMap.foreach(e => println(e._1 + ": " + e._2 ))
Output:
itemTypeCounts_Blender: 1
itemTypeCounts_TV: 2
itemTypeCounts_Cloths: 1
Borrowing the input from Nick and using spark-sql pivot solution:
val data = List((1,17,0,Map("TV" -> 4, "Blender" -> 2)),(2,1,1,Map("Cloths" -> 4)),(3,0,10,Map("TV" -> 4)))
val df = data.toDF("customerId","totalPurchase","totalRent","itemTypeCounts")
df.show(false)
df.createOrReplaceTempView("df")
+----------+-------------+---------+-----------------------+
|customerId|totalPurchase|totalRent|itemTypeCounts |
+----------+-------------+---------+-----------------------+
|1 |17 |0 |[TV -> 4, Blender -> 2]|
|2 |1 |1 |[Cloths -> 4] |
|3 |0 |10 |[TV -> 4] |
+----------+-------------+---------+-----------------------+
Assuming that we know the distinct itemType beforehand, we can use
val dfr = spark.sql("""
select * from (
select explode(itemTypeCounts) itemTypeCounts from (
select flatten(collect_list(map_keys(itemTypeCounts))) itemTypeCounts from df
) ) t
pivot ( count(itemTypeCounts) as c3
for itemTypeCounts in ('TV' ,'Blender' ,'Cloths') )
""")
dfr.show(false)
+---+-------+------+
|TV |Blender|Cloths|
+---+-------+------+
|2 |1 |1 |
+---+-------+------+
For renaming columns,
dfr.select(dfr.columns.map( x => col(x).alias("itemTypeCounts_" + x )):_* ).show(false)
+-----------------+----------------------+---------------------+
|itemTypeCounts_TV|itemTypeCounts_Blender|itemTypeCounts_Cloths|
+-----------------+----------------------+---------------------+
|2 |1 |1 |
+-----------------+----------------------+---------------------+
To get the distinct itemType dynamically and pass it to pivot
val item_count_arr = spark.sql(""" select array_distinct(flatten(collect_list(map_keys(itemTypeCounts)))) itemTypeCounts from df """).as[Array[String]].first
item_count_arr: Array[String] = Array(TV, Blender, Cloths)
spark.sql(s"""
select * from (
select explode(itemTypeCounts) itemTypeCounts from (
select flatten(collect_list(map_keys(itemTypeCounts))) itemTypeCounts from df
) ) t
pivot ( count(itemTypeCounts) as c3
for itemTypeCounts in (${item_count_arr.map(c => "'"+c+"'").mkString(",")}) )
""").show(false)
+---+-------+------+
|TV |Blender|Cloths|
+---+-------+------+
|2 |1 |1 |
+---+-------+------+