I have passed a string (datestr) to a function that does ETL on a DataFrame in Spark using the Scala API. However, at some point I need to filter the DataFrame by a certain date,
something like:
df.filter(col("dt_adpublished_simple") === date_add(datestr, -8))
where datestr is the parameter that I passed to the function.
Unfortunately, the function date_add requires a Column type as its first parameter.
Can anyone help me with how to convert the parameter into a Column, or with a similar solution that would solve the issue?
You probably only need to use lit to create a String Column from your input String, and then use to_date to create a Date Column from it.
df.filter(col("dt_adpublished_simple") === date_add(to_date(lit(datestr), format), -8))
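A minimal runnable sketch of that answer, assuming the input string arrives as "yyyy-MM-dd" (the DataFrame contents, sample values, and format are hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_add, lit, to_date}

val spark = SparkSession.builder().appName("date-filter-example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical: datestr is the parameter passed into the ETL function
val datestr = "2019-03-20"

val df = Seq("2019-03-12", "2019-03-11").toDF("dt_adpublished_simple")

// lit turns the String into a Column, to_date parses it, date_add shifts it back 8 days
val filtered = df.filter(col("dt_adpublished_simple") === date_add(to_date(lit(datestr), "yyyy-MM-dd"), -8))
filtered.show()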
I read from a CSV where the column time contains a timestamp with milliseconds, '1414250523582'.
When I use TimestampType in the schema it returns NULL.
The only way it reads my data is to use StringType.
Now I need this value to be a datetime for further processing.
First I got rid of the too-long timestamp with this:
df2 = df.withColumn("date", col("time")[0:10].cast(IntegerType()))
A schema check says it's an integer now.
Now I try to make it a datetime with
df3 = df2.withColumn("date", datetime.fromtimestamp(col("time")))
it returns
TypeError: an integer is required (got type Column)
When I google, people always just use col("x") to read and transform data, so what am I doing wrong here?
The schema checks are a bit tricky; the data in that column may be pyspark.sql.types.IntegerType, but that is not equivalent to Python's int type. The col function returns a pyspark.sql.column.Column object, which does not play nicely with vanilla Python functions like datetime.fromtimestamp. This explains the TypeError. Even though the "date" data in the actual rows is an integer, col doesn't let you access it as a plain integer to feed into a Python function quite so simply.
To apply arbitrary Python code to that integer value you could write a udf, but in this case pyspark.sql.functions already has a solution for your unix timestamp. Try this, using the truncated seconds column you already built in df2:
from pyspark.sql.functions import col, from_unixtime
df3 = df2.withColumn("date", from_unixtime(col("date")))
and you should see a nice date in 2014 for your example.
Small note: This "date" column will be of StringType.
Say I have a dataframe with two columns, both of which need to be converted to datetime format. However, the current formatting of the columns varies from row to row, and when I apply the to_date method, I get all nulls returned.
Here's a screenshot of the format....
The code I tried is:
date_subset.select(col("InsertDate"),to_date(col("InsertDate")).as("to_date")).show()
which returned all nulls.
Your datetime is not in the default format, so you should specify the format explicitly.
to_date(col("InsertDate"), "MM/dd/yyyy HH:mm")
I don't know which part is the month and which is the day, but you can handle it this way.
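A hedged Scala sketch of the same fix; the sample rows and the "MM/dd/yyyy HH:mm" pattern are assumptions, so adjust the pattern to whatever the screenshot actually shows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

val spark = SparkSession.builder().appName("to-date-example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical rows in the assumed "MM/dd/yyyy HH:mm" layout
val date_subset = Seq("05/01/2017 10:30", "12/24/2018 23:59").toDF("InsertDate")

// Without the pattern, to_date returns null for non-default formats; with it, parsing succeeds
date_subset
  .select(col("InsertDate"), to_date(col("InsertDate"), "MM/dd/yyyy HH:mm").as("to_date"))
  .show()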
One column in my CSV file is a date that is read as a string, and it follows this pattern: 2018-09-19 10:27:28.409Z. I am struggling to convert the column from string to date.
The conversion options in Spotfire didn't allow me to change the column type. However, I found the solution: at the moment of importing the data set (file), you need to specify the type (DateTime), and Spotfire magically manages the conversion.
I am processing the data in Spark shell and have a dataframe with a date column. The format of the column is like "2017-05-01 00:00:00.0", but I want to change all the values to "2017-05-01" without the "00:00:00.0".
Thanks!
Just use String.split():
"2017-05-01 00:00:00.0".split(" ")(0)
I'm using the Neo4j database. Neo4j does not have a date data type; it only has a timestamp data type.
I need to compare the current date with an existing date using a Cypher query.
My existing date format is "8/4/2011", which is a string.
So how can I compare them? Is there any way to use the stored procedure [date] at CSV bulk data import time?
I used the APOC stored procedures, but I don't know how to compare with them.
CALL apoc.date.format(timestamp(),"ms","dd.MM.yyyy")
07.07.2016
CALL apoc.date.parse("13.01.1975 19:00","s","dd.MM.yyyy HH:mm")
158871600
I expect something like this:
MATCH(dst:Distributor) WHERE dst.DIST_ID = "111137401" WITH dst CALL apoc.date.parse(dst.ENTRY_DATE,'s', 'dd/MM/yyyy') YIELD d SET dst.ENTRY_DATE = d RETURN dst;
Are there any possibilities? Please help me...
Neo4j's built-in temporal functions can give you epoch milliseconds directly:
RETURN datetime("2018-06-04T10:58:30.007Z").epochMillis
1528109910007
The right query is:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///DST.csv" AS row
CALL apoc.date.parse(toString(row.ENTRY_DATE), "ms", "dd-MMM-yy") YIELD value AS date
CREATE (DST:Distributor {ENTRY_DATE: date})