splitting spark column into new columns based on fields in array<string> - scala

I have a Spark 1.6 dataframe with a column of type array<string>. The column holds key/value pairs, and I would like to flatten it and use the keys to make new columns holding their values.
Here is what some of the rows in my dataframe look like:
[{"sequence":192,"id":8697413670252052,"type":["AimLowEvent","DiscreteEvent"],"time":527638582195}]
[{"sequence":194,"id":8702167944035041,"sessionId":8697340571921940,"type":["SessionCanceled","SessionEnded"],"time":527780267698,"duration":143863999}, {"sequence":1,"id":8697340571921940,"source":"iOS","schema":{"name":"netflixApp","version":"1.8.0"},"type":["Log","Session"],"time":527636403699}, 1]
I can use concat_ws to flatten the array, but how would I create new columns based on the data?
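To sketch the per-row logic: each array element is a JSON array of objects, so once you have the string you can parse it and fan the keys out into columns. A minimal Python sketch of that logic on local data (outside Spark; in Spark 1.6 the same idea could be applied with `get_json_object` or a UDF):

```python
import json

# Sample rows from the question: each cell is a JSON array of objects
# (the second one even ends with a stray scalar 1, which we skip).
rows = [
    '[{"sequence":192,"id":8697413670252052,"type":["AimLowEvent","DiscreteEvent"],"time":527638582195}]',
    '[{"sequence":194,"id":8702167944035041,"sessionId":8697340571921940,"type":["SessionCanceled","SessionEnded"],"time":527780267698,"duration":143863999}, {"sequence":1,"id":8697340571921940,"source":"iOS","schema":{"name":"netflixApp","version":"1.8.0"},"type":["Log","Session"],"time":527636403699}, 1]',
]

def flatten(raw_rows):
    # Parse every JSON object, collect the union of keys across all objects,
    # and emit one flat dict (column -> value, None when missing) per object.
    records = [obj for raw in raw_rows
                   for obj in json.loads(raw) if isinstance(obj, dict)]
    columns = sorted({k for r in records for k in r})
    return [{c: r.get(c) for c in columns} for r in records]

flat = flatten(rows)
print(flat[0]["sequence"])  # 192
```

Objects that lack a given key simply get None (null) for that column, which mirrors how Spark fills missing fields when schemas are merged.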

Related

Extract info from some columns, store into dataframe after flattening up

I have one dataframe; my use case is to extract info from some columns, flatten it based on the column type, and store the result into a dataframe. What is an efficient way to do this, and how?
Ex: we have a dataframe with, say, 5 columns: a(string), b(string), c(string), d(json value), e(string). In the transformation I want to extract (i.e. flatten) some records from column d (from the JSON value), and in the resultant dataframe each value from the JSON will become one row.
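The core transformation (one output row per value inside the JSON of column d, with the other columns repeated) can be sketched on local data like this; in Spark itself, `from_json` plus `explode` would play the same role. The column names below are just the a..e placeholders from the question:

```python
import json

# One input row: columns a..e, with d holding a JSON array as a string.
row = {"a": "x", "b": "y", "c": "z", "d": "[10, 20, 30]", "e": "w"}

def explode_json_column(r, col="d"):
    # Each value inside the JSON array of `col` becomes its own output
    # row, with all the other columns duplicated.
    out = []
    for value in json.loads(r[col]):
        new_row = {k: v for k, v in r.items() if k != col}
        new_row[col] = value
        out.append(new_row)
    return out

exploded = explode_json_column(row)
print(len(exploded))  # 3
```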

Creating Separate Spark dataframe from existing arraytype column

I have a spark dataframe as
with schema
StructType(StructField("a",IntegerType,False),StructField("b",IntegerType,False),StructField("c",ArrayType(StructType(StructField("d",IntegerType,False),StructField("e",IntegerType,False)))))
I want to create a separate dataframe from column "c" which is of array type.
Desired output format is
Try this-
df.selectExpr("a", "b", "inline_outer(c)").show()
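For reference, `inline_outer` explodes an array of structs into one row per element, with one column per struct field, and (unlike plain `inline`) keeps rows whose array is null or empty by emitting nulls. The equivalent logic in plain Python, just to illustrate the shape of the output:

```python
# Each input row: (a, b, c) where c is a list of (d, e) pairs or None.
rows = [
    (1, 2, [(10, 11), (20, 21)]),
    (3, 4, None),
]

def inline_outer(rows):
    out = []
    for a, b, c in rows:
        if not c:                  # outer: keep the row, null struct fields
            out.append((a, b, None, None))
        else:
            for d, e in c:         # one output row per array element
                out.append((a, b, d, e))
    return out

result = inline_outer(rows)
print(result[0])  # (1, 2, 10, 11)
```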

How to find if one column is contained in another array column in pyspark

I have 2 columns with the following schema in a pyspark dataframe
('pattern', 'array<struct<pattern:string,positions:array<int>>>')
('distinct_patterns', 'array<array<struct<pattern:string,positions:array<int>>>>')
I want to find the rows where pattern is contained in distinct_patterns.
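One way to frame the check: distinct_patterns is a list of lists of structs, and a row matches when pattern equals one of those inner lists. Sketched in plain Python over already-collected values (in PySpark a UDF doing the same equality test row by row would be one option):

```python
# Structs represented locally as (pattern, positions) tuples.
pattern = [("ab", [0, 2])]
distinct_patterns = [
    [("ab", [0, 2])],
    [("cd", [1])],
]

def contains_pattern(pattern, distinct_patterns):
    # Row matches when `pattern` equals one of the inner lists.
    return any(inner == pattern for inner in distinct_patterns)

print(contains_pattern(pattern, distinct_patterns))  # True
```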

Convert two separate columns of type org.apache.spark.sql.Column into a dataframe of two columns in Scala

I have two columns of type org.apache.spark.sql.Column
I need to create a dataframe from these two columns such that the dataframe looks like [col1, col2]. Both columns contain data of double type.
Any suggestions on how to create the dataframe?

Add list as column to Dataframe in pyspark

I have a list of integers and a sqlcontext dataframe with the number of rows equal to the length of the list. I want to add the list as a column to this dataframe maintaining the order. I feel like this should be really simple but I can't find an elegant solution.
You cannot simply add a list as a dataframe column, since a list is a local object and a dataframe is distributed. You can try one of the following approaches:
convert the dataframe to local with collect() or toLocalIterator() and, for each row, add the corresponding value from the list, OR
convert the list to a dataframe, adding an extra column (with keys from the dataframe), and then join them both
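The second approach, sketched with local data (in PySpark you would typically build the index key with `zipWithIndex` on the underlying RDD, or `row_number` over a window, then join on it):

```python
# Existing "dataframe" rows and the list to attach, same length.
df_rows = [("a", 1), ("b", 2), ("c", 3)]
values = [10, 20, 30]

# Give both sides an index key, then join on it to preserve order.
indexed_rows = {i: r for i, r in enumerate(df_rows)}
indexed_vals = {i: v for i, v in enumerate(values)}

joined = [indexed_rows[i] + (indexed_vals[i],) for i in sorted(indexed_rows)]
print(joined[0])  # ('a', 1, 10)
```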