Setting date format parameter on a sqoop-import job - date

I am having trouble casting a date column to a string using sqoop-import from an oracle database to an HDFS parquet file. I am using the following:
sqoop-import -Doraoop.oracle.session.initialization.statements="alter session set nls_date_format='YYYYMMDD'"
My understanding is that this should execute the above statement before it begins transferring data. I have also tried
-Duser.nls_date_format="YYYYMMDD"
But this doesn't work either; the resulting parquet file still contains the original date format as it appears in the table. If it matters, I am running these in a bash script, and I am also casting the same date columns to string using --map-column-java "MY_DATE_COL_NAME=String". What am I doing wrong?
Thanks very much.

Source: Sqoop User Guide
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
You can try casting the date to a string within the import query.
For example:
sqoop import --query "SELECT col1, col2, ..., TO_CHAR(MY_DATE_COL_NAME, 'YYYY-MM-DD') AS MY_DATE_COL_NAME FROM TableName WHERE \$CONDITIONS"
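Put together with the rest of the import options, a minimal sketch might look like the following (the connection string, credentials, column names, and paths are placeholders; with a free-form --query you must supply --target-dir, and $CONDITIONS needs the backslash escape when the query is wrapped in double quotes inside a bash script):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser \
  --password-file /user/me/oracle.password \
  --query "SELECT col1, col2, TO_CHAR(MY_DATE_COL_NAME, 'YYYY-MM-DD') AS MY_DATE_COL_NAME FROM TableName WHERE \$CONDITIONS" \
  --target-dir /user/me/tablename \
  --as-parquetfile \
  -m 1
Because TO_CHAR already returns a VARCHAR2, the column should arrive as a string, and the separate --map-column-java mapping should no longer be necessary.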

Related

Insert data to postgres table

I'm trying to insert data from a CSV file that was exported from an Oracle DB. When I try to import it in pgAdmin, it fails with the error below:
ERROR: invalid input syntax for type timestamp: "29-APR-18 12.04.07.000000000 AM"
CONTEXT: COPY consolidated_dtls_job_log, line 1, column start_time: "29-APR-18 12.04.07.000000000 AM"
Note: the Start_time column is created with the timestamp datatype.
Use a different NLS_TIMESTAMP_TZ_FORMAT when exporting the data from Oracle; something that is closer to the ISO format.
Here is an SQL statement provided by Belayer:
ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF TZH:TZM';
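Note that the value in the error message looks like Oracle's default NLS_TIMESTAMP_FORMAT ('DD-MON-RR HH.MI.SSXFF AM'), so if the source column is a plain TIMESTAMP rather than TIMESTAMP WITH TIME ZONE, NLS_TIMESTAMP_FORMAT is the session parameter that actually applies. Setting both before exporting covers either case, for example:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF';
ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF TZH:TZM';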

Casting String type to Unix Date Amazon Athena

I'm looking to get a result in Amazon Athena where I can count the number of users created per day (or maybe per month).
But before that I have to convert the Unix timestamp to another date format, and this is where I fail.
My end goal is to convert this kind of timestamp:
1531888605109
into something like:
2018-07-18
According to Epoch Converter
But when I try to apply the solution I saw in this question: Casting unix time to date in Presto
I got the error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 1:13: Unexpected parameters (varchar) for function from_unixtime. Expected: from_unixtime(double) , from_unixtime(double, bigint, bigint) , from_unixtime(double, varchar(x)) [SQL State=HY000, DB Errorcode=100071]
This is my query:
select cast(from_unixtime(created)as date) as date_creation,
count(created)
from datalake.test
group by date_creation
Maybe I have to cast the string first, because the data type of the field is not a date.
My table description: Link to the table description
line 1:13: Unexpected parameters (varchar) for function from_unixtime. Expected: from_unixtime(double)
This means that your timestamps, even though they appear numeric, are varchars.
You need an extra CAST inside cast(from_unixtime(created) as date) to convert created to a number first, like:
CAST(from_unixtime(CAST(created AS bigint)) AS date)
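Also note that the sample value 1531888605109 has 13 digits, which suggests milliseconds since the epoch rather than seconds; from_unixtime expects seconds, so you would likely also need to divide by 1000. A sketch combining both fixes (table and column names taken from the question):
SELECT CAST(from_unixtime(CAST(created AS bigint) / 1000) AS date) AS date_creation,
       count(*)
FROM datalake.test
GROUP BY 1;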
Note: when dealing with time-related data, keep in mind that https://github.com/prestosql/presto/issues/37 is not yet resolved in Presto.

PostgreSQL: Insert Date and Time of day in the same field/box?

After performing the following:
INSERT INTO times_table (start_time, end_time) VALUES (to_date('2/3/2016 12:05',
'MM/DD/YYYY HH24:MI'), to_date('2/3/2016 15:05', 'MM/DD/YYYY HH24:MI'));
PostgreSQL only displays the date.
If possible, would I have to run a separate select statement to extract the times (i.e. 12:05 and 15:05) stored in that field? Or are the times completely discarded when the query gets executed?
I don't want to use timestamp, since I'd like to execute this in Oracle SQL as well.
to_date returns... a date. Surprise! So yeah, it's not going to give you the time.
You should be using the timestamp data type to store times and functions which return timestamps. So use to_timestamp.
Oracle also has a timestamp data type and to_timestamp function.
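For example, the original insert can stay almost the same, assuming start_time and end_time are declared as timestamp columns; this form should work in both PostgreSQL and Oracle:
INSERT INTO times_table (start_time, end_time)
VALUES (to_timestamp('2/3/2016 12:05', 'MM/DD/YYYY HH24:MI'),
        to_timestamp('2/3/2016 15:05', 'MM/DD/YYYY HH24:MI'));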
In general, trying to write one set of SQL that works with multiple databases results in either having to write very simple SQL that doesn't take advantage of any of the database's features, or madness.
Instead, use a SQL query builder to write your SQL for you; it can take care of compatibility issues and let you add clauses to existing statements. For example, JavaScript has Knex.js and Perl has SQL::Abstract.

Postgres: Error when using COPY from a CSV with timestamptz type

I am using Postgres 9.5.3 (on Ubuntu 16.04) and I have a table with some timestamptz fields:
...
datetime_received timestamptz NULL,
datetime_manufactured timestamptz NULL,
...
I used the following SQL command to generate CSV file:
COPY (select * from tmp_table limit 100000) TO '/tmp/aa.csv' DELIMITER ';' CSV HEADER;
and used:
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV ENCODING 'UTF-8';
to import into the table.
The example of rows in the CSV file:
CM0030;;INV_AVAILABLE;2016-07-30 14:50:42.141+07;;2016-08-06 00:00:000+07;FAHCM00001;;123;;;;;1.000000;1.000000;;;;;;;;80000.000000;;;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;
But I encounter the following error when running the second command:
ERROR: invalid input syntax for type timestamp with time zone: "datetime_received"
CONTEXT: COPY inventory_item, line 1, column datetime_received: "datetime_received"
My database's timezone is:
show timezone;
TimeZone
-----------
localtime(GMT+7)
(1 row)
Is there any missing step or wrong configuration?
Any suggestions are appreciated!
The error you're seeing means that Postgres is trying (and failing) to convert the string 'datetime_received' to a timestamp value.
This is happening because COPY is trying to insert the header row into your table. You need to include a HEADER clause on the COPY FROM command, just like you did for the COPY TO.
More generally, when using COPY to move data around, you should make sure that the TO and FROM commands are using exactly the same options. Specifying ENCODING for one command and not the other can lead to errors, or silently corrupt data, if your client encoding is not UTF8.
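For example, using the paths and table name from the question, with the HEADER and ENCODING options now appearing on both sides:
COPY (SELECT * FROM tmp_table LIMIT 100000) TO '/tmp/aa.csv' DELIMITER ';' CSV HEADER ENCODING 'UTF-8';
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV HEADER ENCODING 'UTF-8';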

Pyspark: Getting current_timestamp in dynamic hive query

I am writing a Spark program in Python which inserts data from 2 tables based on joins. The last column of the target table is a timestamp field which should hold the creation timestamp.
I tried current_timestamp and from_unixtime(unix_timestamp()); neither function seems to work. I also tried now().
e.g., HiveContext(sc).sql("SELECT " + from_unixtime(unix_timestamp()) + " ")
This statement errors out in PySpark with "NameError: name 'from_unixtime' is not defined", even though I have imported pyspark.sql.functions.
Is there a way to insert a timestamp value into the target table? My query contains an INSERT with a SELECT from 2 tables, which I am running in a HiveContext.
Thanks in advance!
from_unixtime and unix_timestamp are Hive SQL functions here, so they have to stay inside the query string rather than being concatenated in as Python expressions. Used within the double quotes as below, it worked:
HiveContext(sc).sql("SELECT from_unixtime(unix_timestamp())")
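A fuller sketch of the insert-select, with hypothetical table and column names (the timestamp expression simply stays inside the SQL string handed to the HiveContext):
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hc = HiveContext(sc)

# from_unixtime(unix_timestamp()) is evaluated by Hive, not by Python,
# so it is written inside the query string itself
hc.sql("""
    INSERT INTO TABLE target_table
    SELECT a.key, b.value, from_unixtime(unix_timestamp()) AS load_ts
    FROM table_a a
    JOIN table_b b ON a.key = b.key
""")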