Hive date shows NULL in Elasticsearch

I have a Hive table, details, with the following schema:
name STRING,
address STRING,
dob DATE
My dob is stored in yyyy-mm-dd format, like 1988-01-27.
I am trying to load this table into Elasticsearch, so I ran the following statements in HUE:
CREATE EXTERNAL TABLE sampletable (name STRING, address STRING, dob DATE)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'test4/test4','es.nodes' = 'x.x.x.x:9200');
INSERT OVERWRITE TABLE sampletable SELECT * FROM details;
select * from sampletable;
But the dob field shows NULL for every row, whereas I can verify that my original Hive table has data in the date field.
After some research I found that Elasticsearch expects the date field in a yyyy-mm-ddThh:mm:zz style format; since my data doesn't match that, it throws an error. I also read that I can change the format to "strict_date", which would work with my Hive date format. But I am not sure where in the Hive query I need to mention this.
Can someone help me with this?

Mapping the Elasticsearch date type to the Hive date type has some problems.
You can map a Hive string column to the Elasticsearch date type instead, but you must set the table property es.mapping.date.rich to false, i.e. 'es.mapping.date.rich' = 'false', in the CREATE TABLE statement:
CREATE EXTERNAL TABLE temp.data_index_es(
id bigint,
userId int,
createTime string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = 'xxxx:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'abc/{_type}',
'es.mapping.date.rich' = 'false',
'es.read.metadata' = 'true',
'es.mapping.id' = 'id',
'es.mapping.names' = 'id:id, userId:userId, createTime:createTime');
Reference: the "Mapping and Types" page of the elasticsearch-hadoop documentation.
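Regarding the strict_date format mentioned in the question: that format is set in the Elasticsearch index mapping, not in the Hive query. Below is a minimal sketch of such a mapping body, assuming the question's index and type (test4/test4) and its dob field. The helper name build_date_mapping is hypothetical, and the typed mapping layout is an assumption that fits older Elasticsearch versions that still use mapping types. The index would have to be created with this mapping before the Hive INSERT runs.

```python
import json


def build_date_mapping(type_name: str, date_field: str) -> dict:
    """Build an index-creation body that maps date_field as a yyyy-MM-dd date."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # strict_date is the built-in Elasticsearch format for yyyy-MM-dd
                    date_field: {"type": "date", "format": "strict_date"}
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_date_mapping("test4", "dob")
    # Body to send when creating the index, e.g. PUT http://x.x.x.x:9200/test4
    print(json.dumps(body, indent=2))
```

The script only prints the JSON; sending it to the cluster (with curl or any HTTP client) is left out on purpose.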

Related

Select clause not showing '\u0001 char present inside string field in Hive CLI output

I am facing a problem while fetching data from a Hive table.
Input String : "\u0001d1\u0002d2\u0003"
Here \u0001 is the ^A character; similarly, \u0002 is the ^B character, and so on.
I inserted the above string into a Hive table successfully. The Hive DDL query is:
CREATE TABLE test_lt_snap (f1 string)
PARTITIONED BY (date string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES ('serialization.encoding'='utf-8')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION '<file path>'
TBLPROPERTIES ('store.charset'='utf-8', 'retrieve.charset'='utf-8');
After selecting field f1 through the Hive CLI, I am not able to see the '\u0001' char. For example:
hive (test_db) > select f1 from test_lt_snap;
output: d1d2
hive (test_db) > select f1 from test_lt_snap where f1 like '\u0001d1%';
output: d1d2
The problem with the above select statements is that the \u0001 chars are not visible.
Is there any way to display these chars as well?
Thanks
Amiya

Not able to create Hive table with TIMESTAMP datatype in Azure Databricks

org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.UnsupportedOperationException: Parquet does not support
timestamp. See HIVE-6384;
I get the above error while executing the following code in Azure Databricks.
spark_session.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time TIMESTAMP
)
PARTITIONED BY (
Date DATE)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION "/mnt/data_analysis/pre-processed/"
""")
As per the HIVE-6384 Jira, starting from Hive 1.2 you can use the timestamp and date types in Parquet tables.
Workarounds for Hive versions before 1.2:
1. Using String type:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time STRING
)
PARTITIONED BY (
Date STRING)
Stored as parquet
Location '/mnt/data_analysis/pre-processed/';
Then, while processing, you can cast arrival_time and Date to the timestamp and date types.
Alternatively, create a view that casts the columns, but views are slow.
2. Using ORC format:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time Timestamp
)
PARTITIONED BY (
Date date)
Stored as orc
Location '/mnt/data_analysis/pre-processed/';
ORC supports both the timestamp and date types.
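The string-column workaround (option 1) leaves the casting to query time. Here is a minimal sketch of such a casting view, written in the same spark_session.sql style as the question. The view name dev_db.processing_view is hypothetical, and the sketch assumes the stored strings are in formats that CAST accepts (yyyy-MM-dd HH:mm:ss for the timestamp, yyyy-MM-dd for the date).

```python
# Hypothetical view name; the base table is the STRING-typed table from option 1.
CAST_VIEW_SQL = """
CREATE VIEW IF NOT EXISTS dev_db.processing_view AS
SELECT
    campaign,
    status,
    file_name,
    CAST(arrival_time AS TIMESTAMP) AS arrival_time,
    CAST(`Date` AS DATE) AS `Date`
FROM dev_db.processing_table
"""

if __name__ == "__main__":
    # With a live SparkSession (as in the question) this would be:
    #     spark_session.sql(CAST_VIEW_SQL)
    print(CAST_VIEW_SQL.strip())
```

Values that do not parse come back as NULL from CAST, so it is worth checking a few rows after creating the view.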

Postgres 9.6 update jsonb column to add new attribute with a value from query

I have a table xyz, with a metadata jsonb column in postgres.
Table : xyz
column : metadata, type = jsonb
metadata = {"exceptions": {"first_exception": "first_value"} }
I want to add a new sub_attribute
desired metadata = {"exceptions": {"first_exception": "123"},{"second_exception": "234"} }
I can hard-code the value like this:
update xyz
SET metadata = jsonb_set(metadata->'exceptions', '{second_exception}', '"234"', true);
But I want to get the value 234 from a select query. I am not able to figure out how to combine the select query with the update to do this.
You can combine the UPDATE with a FROM clause:
UPDATE xyz
SET metadata = jsonb_set(metadata, '{exceptions, second_exception}', other.value::jsonb)
FROM other
WHERE other.column = xyz.column
Note that {"exceptions": {"first_exception": "123"},{"second_exception": "234"}} is not valid JSON; the update will give you the following result instead: {"exceptions": {"first_exception": "123", "second_exception": "234"}}
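To make the result shape concrete, here is a plain-Python emulation of what jsonb_set does for this path. It is only an illustration with a hypothetical helper (jsonb_set_like), not PostgreSQL itself. It also shows one thing to watch with other.value::jsonb: the text 234 cast to jsonb is parsed as the JSON number 234, whereas the question's examples use the string "234". Assuming value is a text column, to_jsonb(other.value) would give the string form in PostgreSQL.

```python
import copy
import json


def jsonb_set_like(target: dict, path: list, new_value, create_missing: bool = True) -> dict:
    """Return a copy of target with new_value set at path (rough jsonb_set analogue)."""
    result = copy.deepcopy(target)
    node = result
    for key in path[:-1]:
        if not isinstance(node, dict) or key not in node:
            return result  # jsonb_set leaves the target unchanged in this case
        node = node[key]
    if isinstance(node, dict) and (create_missing or path[-1] in node):
        node[path[-1]] = new_value
    return result


metadata = {"exceptions": {"first_exception": "first_value"}}
path = ["exceptions", "second_exception"]

# Analogue of '234'::jsonb: the text is parsed as JSON, giving a number.
as_number = jsonb_set_like(metadata, path, json.loads("234"))

# Analogue of to_jsonb('234'::text): the text is kept as a JSON string.
as_string = jsonb_set_like(metadata, path, "234")

if __name__ == "__main__":
    print(json.dumps(as_number))
    print(json.dumps(as_string))
```

In both cases the new key lands inside the existing "exceptions" object, next to first_exception, which is the valid-JSON shape described above.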

cakePHP find() returns field name instead of just the column values

I'm trying to retrieve data from my database using the find() method.
But when I use find('all') it returns the column value and column name together, like this: { "date_1": "2015-25-12"}, of course, I want only the values from the column.
If I use find('list'), it comes back empty.
This is how I'm trying to retrieve my data:
$date1 = $this->Stocks->find('all', ['fields' => ['Stocks.date_1'], 'conditions' => ['Stocks.families_id' => $id]]);
This is my table:
CREATE TABLE stocks
(
families_id integer,
date_1 date NOT NULL,
date_2 date,
created timestamp without time zone,
modified timestamp without time zone,
id serial NOT NULL,
CONSTRAINT stocks_pkey PRIMARY KEY (id)
)
Everything else is working fine except this one part.
I'm using PostgreSQL.
Your find('list') problem, though, is related to displayField.
To select a single column or a set of columns, you can do this:
$date1 = $this->Stocks->find()
->select(['date_1'])
->where(['families_id' => $id]);

Hive casting of MongoDb Date

I'm linking Hive to a MongoDb collection that has a date. The MongoDB collection's structure looks like this:
{
"name" : "Using Hive",
"validFrom" : ISODate("2014-11-04T00:00:00.000Z"),
"validTo" : ISODate("2016-01-30T00:00:00.000Z"),
"_id" : ObjectId("54da1c02ead8571c292901d3")
}
I'm adding it to Hive as follows:
CREATE TABLE certificate
(
name STRING,
validFrom TIMESTAMP,
validTo TIMESTAMP,
id STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.certificate');
When I do a select, the dates are null:
hive> select * from certificate;
OK
Using Hive NULL NULL 54da1c02ead8571c292901d3
MongoDb NULL NULL 54da1c02ead8571c292901d4
Hadoop NULL NULL 54da1c02ead8571c292901d5
I know Hive supports date casting; is that something I can do in the CREATE statement to ensure the dates are correctly cast? I'll be using queries such as "where the valid-from date is less than today and the valid-to date is more than today", so having those columns as dates and not strings is vital.
Thanks =D
Specify the mappings for the columns validFrom and validTo. By default, Hive converts column names to lowercase. Please check whether the following works:
CREATE TABLE certificate
(
name STRING,
validfrom TIMESTAMP,
validto TIMESTAMP,
id STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","validfrom":"validFrom","validto":"validTo"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.certificate');
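Since every camelCase MongoDB field needs an entry once Hive lowercases the column name, the mapping string can be generated rather than typed by hand. A minimal sketch follows; the helper name build_columns_mapping is hypothetical, and it only handles top-level fields.

```python
import json


def build_columns_mapping(mongo_fields, overrides=None):
    """Map lowercased Hive column names to the original MongoDB field names."""
    mapping = dict(overrides or {})
    for field in mongo_fields:
        hive_name = field.lower()
        if hive_name != field:  # only mixed-case fields need an explicit entry
            mapping[hive_name] = field
    # Compact separators keep the value easy to paste into SERDEPROPERTIES.
    return json.dumps(mapping, separators=(",", ":"))


if __name__ == "__main__":
    print(build_columns_mapping(["name", "validFrom", "validTo"], {"id": "_id"}))
```

The overrides argument carries renames that are not just a case change, such as id to _id.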