PySpark: Getting current_timestamp in a dynamic Hive query

I am preparing a Spark with Python (PySpark) program that inserts data from two tables based on joins. The last column of the target table is a timestamp field that should hold the record's creation timestamp.
I tried current_timestamp and from_unixtime(unix_timestamp()); neither seems to work. I also tried now().
e.g., HiveContext(sc).sql("SELECT " + from_unixtime(unix_timestamp()) + " ")
This statement errors in PySpark with "NameError: name 'from_unixtime' is not defined", even though I have imported pyspark.sql.functions.
Is there a way to insert the timestamp value into the target table? My query is an INSERT with a SELECT from two tables, which I am running in HiveContext.
Thanks in advance!

The function call has to go inside the SQL string (within the double quotes) rather than being concatenated as a Python expression. Used that way, it worked:
HiveContext(sc).sql("SELECT from_unixtime(unix_timestamp())")
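As a fuller illustration, here is a minimal sketch of the insert-with-select pattern described in the question. The table and column names (target_table, table1, table2, id, value) are hypothetical, and sc is assumed to be an existing SparkContext:

from pyspark.sql import HiveContext

hc = HiveContext(sc)
# from_unixtime(unix_timestamp()) is evaluated by Hive inside the SQL
# string, so no Python-side function or import is involved
hc.sql(
    "INSERT INTO TABLE target_table "
    "SELECT t1.id, t2.value, from_unixtime(unix_timestamp()) "
    "FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id"
)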

Related

How to perform datediff using derived column expression in Azure DataFactory

I have a query in SQL that gives the difference between ActivityStartTime and ActivityEndTime in seconds.
Below is the SQL query:
DATEDIFF_BIG(second, ActivityStartTime, ActivityEndTime) as [DiffInTime]
I have to write the same using a derived column expression.
Create a new column in the Derived Column's Settings and put the expression below in the Expression field for this column. The new column will hold the difference in seconds:
toInteger(ActiveEndTime-ActiveStartTime)/1000
Note: make sure that ActiveEndTime and ActiveStartTime are in timestamp format. Subtracting two timestamps yields milliseconds, which is why the result is divided by 1000.
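For example, if ActiveStartTime is 12:00:00 and ActiveEndTime is 12:00:30 on the same day, the subtraction yields 30000 milliseconds and the expression returns 30.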

Snowflake : Unsupported subquery type cannot be evaluated

I am using Snowflake as a data warehouse. I have a CSV file in AWS S3 and am writing a merge SQL to merge the data received in the CSV into a table in Snowflake. My time dimension table has a time_id column with data type NUMBER(38,0). This table holds all times of day; one example row has
time_id = 232 and time = 12:00
In the CSV I receive a column labeled time with values such as 12:00.
In the merge SQL I fetch this value and try to look up its time_id:
update table_name set start_time_dim_id = (select time_id from time_dim t where t.time_name = csv_data.start_time_dim_id)
On this statement I am getting the error "SQL compilation error: Unsupported subquery type cannot be evaluated".
I am struggling to solve it. While searching I found one reference:
https://github.com/snowflakedb/snowflake-connector-python/issues/251
Has anyone else encountered this issue? If so, I would appreciate any pointers.
It seems like a conversion issue. I suggest you check the data in the CSV file; maybe there is a wrong or missing value. Please check your data and make sure it contains numeric values. For example:
create table simpleone ( id number );
insert into simpleone values ( True );
The last statement fails with:
SQL compilation error: Expression type does not match column data type, expecting NUMBER(38,0) but got BOOLEAN for column ID
If you provide sample data and the SQL that produces this error, maybe we can provide a solution.
Unfortunately, correlated and nested subqueries in Snowflake are a bit limited at this stage.
I would try running something like this instead (note that time_dim needs the alias t, that csv_data must also appear in the FROM clause, and that you will need a condition tying table_name rows to csv_data rows):
update table_name
set start_time_dim_id = t.time_id
from time_dim t, csv_data
where t.time_name = csv_data.start_time_dim_id
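Since the question mentions writing a merge, a hypothetical alternative is to resolve time_id with a join inside the USING clause rather than a correlated subquery. This is only a sketch: csv_data is assumed to be the staged CSV data, and pk_col stands in for whatever key links the two tables:

merge into table_name tgt
using (
    select c.pk_col, t.time_id
    from csv_data c
    join time_dim t
      on t.time_name = c.start_time_dim_id
) src
on tgt.pk_col = src.pk_col
when matched then update
    set tgt.start_time_dim_id = src.time_id;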

Setting date format parameter on a sqoop-import job

I am having trouble casting a date column to a string using sqoop-import from an Oracle database to an HDFS Parquet file. I am using the following:
sqoop-import -Doraoop.oracle.session.initialization.statements="alter session set nls_date_format='YYYYMMDD'"
My understanding is that this should execute the above statement before it begins transferring data. I have also tried
-Duser.nls_date_format="YYYYMMDD"
But this doesn't work either; the resulting Parquet file still contains the original date format as listed in the table. If it matters, I am running these in a bash script and also casting the same date columns to string using --map-column-java "MY_DATE_COL_NAME=String". What am I doing wrong?
Thanks very much.
Source: SqoopUserGuide
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
You can try casting the date to a string within the import query itself.
For example (note that when the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it):
sqoop import --query "SELECT col1, col2, ..., TO_CHAR(MY_DATE_COL_NAME, 'YYYY-MM-DD') FROM TableName WHERE \$CONDITIONS"
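For completeness, here is a fuller sketch of the whole command; the connection string, credentials, and target directory are placeholders, not values from the question:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --query "SELECT col1, col2, TO_CHAR(MY_DATE_COL_NAME, 'YYYY-MM-DD') AS my_date_str FROM TableName WHERE \$CONDITIONS" \
  --split-by col1 \
  --target-dir /user/hive/my_import \
  --as-parquetfile

Note that --split-by (or -m 1) is required when importing the results of a free-form query in parallel.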

hive insert current date into a table using date function errors

I have to insert the current date (timestamp) into a table via a Hive query. The query is failing for some reason. Can someone please help me out?
CREATE EXTERNAL TABLE IF NOT EXISTS dataFlagTest(
date string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bckt1/hive_test/dateFlag/';
To insert into it, I run the following query:
INSERT OVERWRITE TABLE dataFlagTest
SELECT from_unixtime(unix_timestamp()) ;
It failed with the following error:
FAILED: NullPointerException null
The solution is to select from a table: this version of Hive cannot run a SELECT without a FROM clause.
So create a sample table with one row, or use an existing table, like below:
INSERT OVERWRITE TABLE dataflagtest SELECT from_unixtime(unix_timestamp()) AS date FROM EXISTING_TABLE TABLESAMPLE(1 ROWS);
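If no existing table is handy, a common workaround is a one-row helper table. The sketch below assumes a local file /tmp/one_row.txt containing a single line:

CREATE TABLE IF NOT EXISTS dual (dummy STRING);
LOAD DATA LOCAL INPATH '/tmp/one_row.txt' OVERWRITE INTO TABLE dual;
INSERT OVERWRITE TABLE dataflagtest SELECT from_unixtime(unix_timestamp()) FROM dual;

Newer Hive versions (0.13 and later) also accept a SELECT without a FROM clause, so this error suggests an older version.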

Insert date array in PostgreSQL table

I have a problem inserting a dynamic array of dates into a table. I'm working with Python 3.3 and using the psycopg2 package to communicate with a PostgreSQL 9.3 database.
I create the table with following statement:
CREATE TABLE Test( id serial PRIMARY KEY, listdate DATE[] )
An easy example is a list of two dates. Let us assume the list is dateList = ['2014-07-07','2014-07-08'].
Now I want to insert the complete list into the table. If I try the static version:
INSERT INTO Test(listdate[1],listdate[2]) VALUES(date '2014-07-07',date '2014-07-08')
the insert is no problem. But in reality my list has a dynamic number of dates (at least 100), so the static version is not useful.
I tried different approaches like:
INSERT INTO Test VALUES(array" + str(dateList) +"::date[])
INSERT INTO Test VALUES(array date '" + str(dateList) +"')
INSERT INTO Test VALUES(date array" + str(dateList) +")
but none of them are successful. Maybe the problem is the combination of the two prefixes date and array.
Any ideas for a simple SQL statement without an SQL function?
A somewhat simpler option is to use the string representation of a date array:
INSERT INTO test(listdate)
VALUES ('{2014-07-07,2014-07-08}')
You can add an explicit cast, but that's not required:
INSERT INTO test(listdate)
VALUES ('{2014-07-07,2014-07-08}'::date[])
Your variable would then be: dateList = '{2014-07-07,2014-07-08}'
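Since the question uses psycopg2, it is worth noting that psycopg2 adapts a Python list to a PostgreSQL array, so the list can be bound as a single query parameter instead of built by string concatenation. A minimal sketch (the connection string is a placeholder):

import psycopg2

dateList = ['2014-07-07', '2014-07-08']

conn = psycopg2.connect("dbname=testdb user=postgres")  # placeholder DSN
cur = conn.cursor()
# The Python list is adapted to an array; the ::date[] cast converts
# the text elements into dates
cur.execute("INSERT INTO test(listdate) VALUES (%s::date[])", (dateList,))
conn.commit()
cur.close()
conn.close()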