Scala: How to split the column values? - scala

I am processing the data in Spark shell and have a dataframe with a date column. The format of the column is like "2017-05-01 00:00:00.0", but I want to change all the values to "2017-05-01" without the "00:00:00.0".
Thanks!

Just use String.split():
"2017-05-01 00:00:00.0".split(" ")(0)
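For what it's worth, here is a minimal plain-Scala sketch of that idea, using the sample value from the question. It only covers a single String; on a DataFrame column you would need Spark's column functions instead (for example split or to_date from org.apache.spark.sql.functions), which this sketch does not exercise. The names raw, viaSplit and viaParse are made up for illustration.

```scala
import java.sql.Timestamp

// Sample value taken from the question.
val raw = "2017-05-01 00:00:00.0"

// Option 1: keep everything before the first space.
val viaSplit: String = raw.split(" ")(0)

// Option 2: parse the value as a timestamp first, so malformed input throws
// instead of silently producing a wrong prefix.
val viaParse: String = Timestamp.valueOf(raw).toLocalDateTime.toLocalDate.toString
```

Both values come out as the date part only. The second option is a bit stricter, which may or may not be what you want.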

Related

How would I convert spark scala dataframe column to datetime?

Say I have a dataframe with two columns, both of which need to be converted to datetime format. However, the current formatting of the columns varies from row to row, and when I apply the to_date method, I get all nulls returned.
Here's a screenshot of the format....
the code I tried is...
date_subset.select(col("InsertDate"),to_date(col("InsertDate")).as("to_date")).show()
which returned
Your datetime is not in the default format, so you should give the format.
to_date(col("InsertDate"), "MM/dd/yyyy HH:mm")
I can't tell from your data which part is the month and which is the day, so swap MM and dd in the pattern if needed.
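If it helps, a pattern can be checked against a sample value in plain Scala before handing it to to_date. This is only a sketch: the sample string is made up (the real InsertDate values are not shown), parseWith is a hypothetical helper, and it assumes Spark's pattern letters behave like Java's DateTimeFormatter for this simple case.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper: returns None when the text does not match the pattern,
// which is roughly the situation in which to_date yields null.
def parseWith(pattern: String, text: String): Option[LocalDateTime] =
  Try(LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern))).toOption

// Made-up sample value; the real InsertDate values are not shown in the question.
val sample = "05/01/2017 13:45"

val matching = parseWith("MM/dd/yyyy HH:mm", sample)       // Some(2017-05-01T13:45)
val mismatched = parseWith("yyyy-MM-dd HH:mm:ss", sample)  // None
```

Since the formatting reportedly varies from row to row, one pattern may not cover every value; trying a few candidate patterns on a handful of rows this way would show which ones are needed.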

How to convert string into a date format in spark

I have passed a string (datestr) to a function (which does ETL on a dataframe in Spark using the Scala API). However, at some point I need to filter the dataframe by a certain date,
something like:
df.filter(col("dt_adpublished_simple") === date_add(datestr, -8))
where datestr is the parameter that I passed to the function.
Unfortunately, the function date_add requires a column type as a first param.
Can anyone help me with how to convert the param into a column or a similar solution that will solve the issue?
You probably only need to use lit to create a String Column from your input String, and then to_date to create a Date Column from that one. Here format stands for the date pattern that datestr follows:
df.filter(col("dt_adpublished_simple") === date_add(to_date(lit(datestr), format), -8))
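Another option, since datestr is an ordinary String parameter rather than a column, might be to do the date arithmetic in plain Scala and only pass the result to Spark. A rough sketch, assuming datestr is in ISO yyyy-MM-dd form (the value below is made up, as are the names cutoff and cutoffText):

```scala
import java.time.LocalDate

// Hypothetical input; ISO yyyy-MM-dd is assumed so LocalDate.parse needs no pattern.
val datestr = "2017-05-09"

// Eight days before the given date, computed on the driver.
val cutoff: LocalDate = LocalDate.parse(datestr).minusDays(8)

// The result could then be passed to Spark as a literal, for example (not run here):
//   df.filter(col("dt_adpublished_simple") === lit(java.sql.Date.valueOf(cutoff)))
val cutoffText: String = cutoff.toString
```

This keeps the filter expression simple, at the cost of parsing the date outside Spark.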

pyspark converting unix time to date

I am using the following code to convert a column of unix time values into dates in pyspark:
transactions3=transactions2.withColumn('date', transactions2['time'].cast('date'))
The column transactions2['time'] contains the unix time values. However the column date which I create here has no values in it (date = None for all rows). Any idea why this would be?
Use from_unixtime, which takes the epoch value in seconds, for example: expr("from_unixtime(timeval)")
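One thing worth checking is whether the values are in seconds or milliseconds, because from_unixtime treats its input as seconds. The plain-Scala sketch below (Scala rather than Python, with a made-up sample value) shows the difference; it illustrates the arithmetic only and is not a claim about what caused the None values in the question.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Hypothetical sample value: seconds since 1970-01-01T00:00:00Z.
val epochSeconds = 1493596800L

// Interpreted as seconds, in UTC.
val fromSeconds: LocalDate =
  Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).toLocalDate

// The same instant stored in milliseconds has to be read as milliseconds...
val epochMillis = epochSeconds * 1000L
val fromMillis: LocalDate =
  Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate

// ...otherwise the year lands tens of thousands of years in the future.
val misread: LocalDate =
  Instant.ofEpochSecond(epochMillis).atZone(ZoneOffset.UTC).toLocalDate
```

If the column turns out to hold milliseconds, dividing by 1000 before calling from_unixtime would be the usual adjustment.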

Create a Spark Dataframe on Time

Quick question that I could not find an answer to on Google.
What is the best way to create a Spark Dataframe with timestamps?
Let's say I have a start point, an end point and a 15-minute interval. What would be the best way to do this in Spark?
Try a Spark SQL function, like this: DF.withColumn("regTime", current_timestamp). This can also replace an existing column.
You can also convert a String to a date or timestamp using DF.withColumn("regTime", to_date(spark_temp_user_day("dateTime")))
Here regTime is a column of DF.
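For the start point / end point / 15-minute interval part of the question, one possible approach is to generate the timestamps in plain Scala first. A minimal sketch, with made-up start and end values since the question gives none:

```scala
import java.time.LocalDateTime

// Hypothetical start, end, and step; the question gives no concrete values.
val start = LocalDateTime.of(2017, 5, 1, 0, 0)
val end = LocalDateTime.of(2017, 5, 1, 1, 0)
val stepMinutes = 15L

// All timestamps from start to end (inclusive) in 15-minute steps.
val ticks: Vector[LocalDateTime] =
  Iterator.iterate(start)(_.plusMinutes(stepMinutes))
    .takeWhile(t => !t.isAfter(end))
    .toVector
```

In a Spark shell the resulting sequence could presumably be turned into a dataframe with something like ticks.map(java.sql.Timestamp.valueOf).toDF("regTime"), assuming spark.implicits._ is in scope; that step is not run here. For very long ranges, building the list on the driver may not be ideal.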

Date Format Conversion in Hive

I'm very new to sql/hive. At first, I loaded a txt file into hive using:
drop table if exists Tran_data;
create table Tran_data(tran_time string,
resort string, settled double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Load data local inpath 'C:\Users\me\Documents\transaction_data.txt' into table Tran_Data;
The variable tran_time in the txt file looks like this: 10-APR-2014 15:01. After loading this Tran_data table, I tried to convert tran_time to a "standard" format so that I can join this table to another table using tran_time as the join key. The date format desired is 'yyyymmdd'. I searched online resources, and found this: unix_timestamp(substr(tran_time,1,11),'dd-MMM-yyyy')
So essentially, I'm doing this: unix_timestamp('10-APR-2014','dd-MMM-yyyy'). However, the output is "NULL".
So my question is: how to convert the date format to a "standard" format, and then further convert it to 'yyyymmdd' format?
from_unixtime(unix_timestamp('20150101' ,'yyyyMMdd'), 'yyyy-MM-dd')
My current Hive Version: Hive 0.12.0-cdh5.1.5
I converted datetime in first column to date in second column using the below hive date functions. Hope this helps!
select inp_dt, from_unixtime(unix_timestamp(substr(inp_dt,0,11),'dd-MMM-yyyy')) as todateformat from table;
inp_dt                  todateformat
12-Mar-2015 07:24:55    2015-03-12 00:00:00
The unix_timestamp function converts a date string in a given format to a unix timestamp in seconds, but it does not handle a format like dd-mm-yyyy.
You need to write your own custom UDF to convert a given date string to the format you need, as Hive does not currently have a predefined function for this. There is a to_date function to convert a timestamp to a date, but the unix_timestamp functions won't help with your problem.
select from_unixtime(unix_timestamp('01032018' ,'MMddyyyy'), 'yyyyMMdd');
input format: mmddyyyy
01032018
output after query: yyyymmdd
20180103
To help someone in the future:
The following function should work, as it worked in my case:
to_date(from_unixtime(UNIX_TIMESTAMP('10-APR-2014','dd-MMM-yyyy')))
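To sanity-check the dd-MMM-yyyy pattern and the desired join-key format outside Hive, the same conversion can be sketched in plain Scala (Scala rather than HiveQL here). This assumes Hive's pattern letters follow the Java date-format conventions, which may differ by version; inputFormat, outputFormat and joinKey are names made up for the example.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale

// Input pattern; month names are matched case-insensitively so "APR" parses.
val inputFormat: DateTimeFormatter =
  new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern("dd-MMM-yyyy")
    .toFormatter(Locale.ENGLISH)

// Desired output: four-digit year, two-digit month, two-digit day.
val outputFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyyMMdd")

// Same 11-character prefix the question takes with substr(tran_time, 1, 11).
val tranTime = "10-APR-2014 15:01"
val joinKey: String =
  LocalDate.parse(tranTime.take(11), inputFormat).format(outputFormat)
```

Note the explicit English locale and the case-insensitive parsing: java.time is strict about both, so an upper-case month abbreviation would otherwise be rejected.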
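Without an explicit pattern, Hive expects date strings in the form yyyy-MM-dd or yyyy-MM-dd HH:mm:ss, so a value like '2014-05-01' is understood as a date.
Whereas you are trying '01-MAY-2014', which Hive won't understand as a date string unless you pass a matching pattern. Note that pattern letters are case-sensitive: MM means month and mm means minutes, so a pattern such as 'dd-mmm-yyyy' does not describe a month at all.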
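Since the case of the pattern letters comes up several times in this thread ('yyyymmdd' versus 'yyyyMMdd'), here is a small plain-Scala illustration using Java's DateTimeFormatter, whose letters follow the same convention of upper-case M for month and lower-case m for minute. It is only meant to show the difference, using the timestamp from the question.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// The timestamp from the question: 10 April 2014, 15:01.
val ts = LocalDateTime.of(2014, 4, 10, 15, 1)

// Upper-case MM is month-of-year.
val withMonth: String = ts.format(DateTimeFormatter.ofPattern("yyyyMMdd"))   // "20140410"

// Lower-case mm is minute-of-hour, so the "01" here comes from 15:01, not the month.
val withMinute: String = ts.format(DateTimeFormatter.ofPattern("yyyymmdd"))  // "20140110"
```

So when the desired format is written as 'yyyymmdd' in prose, the pattern actually passed to the function most likely needs to be 'yyyyMMdd'.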