Hortonworks 2.5 ACID Insert in hive hangs - hiveql

I am using HDP 2.5. I want to perform a delete operation from a Hive table. I have created a table with the following command:
hive> create table test (x int, y string) clustered by (x) into 2 buckets stored as ORC tblproperties ("transactional" = "true");
OK
Time taken: 0.148 seconds
Further, I have set the following Hive properties:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
SET hive.optimize.sort.dynamic.partition=false;
Then I am doing a test insert into the Hive table using the following command:
INSERT INTO TEST VALUES (1, 'a');
Unfortunately my Hive CLI shell hangs and I have to press Ctrl + C to get out of the shell. May I know why this is happening, please? Is it because my Hive schema / database contains a mixture of ACID and non-ACID tables? Any suggestion to resolve this problem will be very helpful.
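While the INSERT is hung, the following can be run from a second Hive session to check whether the statement is blocked on a lock or an open transaction (purely a diagnostic sketch; these commands rely on the DbTxnManager settings shown above):
SHOW LOCKS;          -- current locks taken or waited on under the DbTxnManager
SHOW COMPACTIONS;    -- compaction requests queued or running for ACID tables
SHOW TRANSACTIONS;   -- open/aborted transactions (if your Hive version supports it)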
Thanks in advance !

Related

Flink Table JDBC lookup.cache and related properties do not work in a streaming environment

When a SQL query joining streaming data with a JDBC table is triggered in a streaming environment, the JDBC-table task finishes immediately after reading all table records. When I add the properties lookup.cache, lookup.partial-cache.max-rows and lookup.partial-cache.expire-after-write to the JDBC table, they have no effect on the task lifecycle, i.e. the lookup.cache mechanism is not working as expected.
I have created the table as:
CREATE TABLE U_ZRB_C_RISKLI_MCC_0 (RISKLIMCC STRING ,PRIMARY KEY (RISKLIMCC) NOT ENFORCED)
WITH ('connector' = 'jdbc'
,'url' = 'jdbc:oracle:thin:@localhost:1521/orcl'
,'table-name' = 'U_ZRB_C_RISKLI_MCC_0'
,'driver' = 'oracle.jdbc.OracleDriver'
,'username' = 'username'
,'password' = 'password'
,'lookup.cache'='PARTIAL'
,'lookup.partial-cache.expire-after-write'='10s')
The related job stops after reading the table records.
Thanks to @Xianxun Ye: when I added the "FOR SYSTEM_TIME AS OF" clause after the table name, the lookup functionality runs as expected, as described at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
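For reference, a lookup join along these lines (the streaming table transactions, its processing-time attribute proc_time and the join key mcc are illustrative placeholders; the essential part is the FOR SYSTEM_TIME AS OF clause):
SELECT t.txn_id, r.RISKLIMCC
FROM transactions AS t
JOIN U_ZRB_C_RISKLI_MCC_0 FOR SYSTEM_TIME AS OF t.proc_time AS r
  ON t.mcc = r.RISKLIMCC;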
I think the Flink documentation page
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/#lookup-cache
should also refer to the lookup-join page.

How to execute an update query on Spark SQL temp tables

I am trying the code below but it is throwing an error that I am unable to understand:
df.registerTempTable("Temp_table")
spark.sql("Update Temp_table set column_a='1'")
Currently Spark SQL does not support UPDATE statements. The workaround is to create a Delta Lake / Iceberg table from your Spark DataFrame and execute your SQL query directly on that table.
For an Iceberg implementation, refer to:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html
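A minimal sketch of the Delta Lake variant, assuming the Delta Lake library is configured for the Spark session (table names are illustrative):
-- materialize the temp view as a Delta table, then update it in place
CREATE TABLE temp_table_delta USING DELTA AS SELECT * FROM Temp_table;
UPDATE temp_table_delta SET column_a = '1';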

How to create a temporary table in snowflake based on pyspark dataframe

I can read the Snowflake table into a PySpark DataFrame using sqlContext:
sql = f"""select * from table1""";
df = (sqlContext.read
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(**snowflake_options)
    .option("query", sql)
    .load())
How do I create a temporary table in snowflake (using pyspark code) and insert values from this pyspark dataframe (df)?
Just save as usual, with the Snowflake format:
snowflake_options = {
    ...
    'sfDatabase': 'dbabc',
    'dbtable': 'tablexyz',
    ...
}
(df
    .write
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(**snowflake_options)
    .save()
)
I don't believe this can be done. At least not the way you want.
You can, technically, create a temporary table; but persisting it is something that I have had a great deal of difficulty finding out how to do (i.e. I haven't). If you run the following:
spark.sparkContext._jvm.net.snowflake.spark.snowflake.Utils.runQuery(snowflake_options, 'create temporary table tmp_table (id int, value text)')
you'll notice that it returns a Java object indicating the temp table was created successfully; but once you try to run any further statements on it, you'll get nasty errors meaning it no longer exists. Somehow we mere mortals would need to find a way to access and persist the Snowflake session through the JVM API. That being said, I also think that would run contrary to the Spark paradigm.
If you really need the special-case performance boost of running transformations in Snowflake instead of bringing it all into Spark, just keep everything in Snowflake to begin with by either
Using CTEs in the query, or
Using the runQuery API described above to create "temporary" permanent/transient tables, designing Snowflake queries that insert directly to those, and then cleaning them up (DROP them) when you are done, as sketched below.
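A minimal sketch of the second option in Snowflake SQL (table and column names are illustrative); a transient table persists across sessions, unlike a true temporary table, so it can be populated, used and explicitly dropped when the job is done:
CREATE TRANSIENT TABLE IF NOT EXISTS tmp_stage_xyz (id INT, value STRING);
INSERT INTO tmp_stage_xyz SELECT id, value FROM some_source_table;  -- stage the intermediate results
-- ... run the queries that need tmp_stage_xyz ...
DROP TABLE IF EXISTS tmp_stage_xyz;  -- clean up when done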

Simple SELECT query is running for more than 5 hours - db2

We have a select query as below. The query to fetch the data runs for more than 5 hours.
select ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE
from Table
where CodeN <> 'Z'
Is there any way we can collect stats, or any other way to improve performance?
And in DB2, is there any table where we can check whether stats have been collected on the table below?
The RUNSTATS command collects table and index statistics. Note that this is a Db2 command and not an SQL statement, so you may either run it with the Db2 Command Line Processor (CLP) or through the relational interface with a special stored procedure that is able to run such commands:
RUNSTATS command using the ADMIN_CMD procedure.
Statistics are stored in the SYSSTAT schema views. Refer to "Road map to the catalog views - Table 2. Road map to the updatable catalog views".
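A minimal sketch of both steps through the relational interface (schema and table names are placeholders):
-- collect table and index statistics via the ADMIN_CMD procedure
CALL SYSPROC.ADMIN_CMD('RUNSTATS ON TABLE MYSCHEMA.MYTABLE WITH DISTRIBUTION AND DETAILED INDEXES ALL');
-- check when statistics were last collected and the estimated row count
SELECT STATS_TIME, CARD FROM SYSCAT.TABLES WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTABLE';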
How many rows exist in the table?
Also, the not-equal operator '<>' is not an indexable predicate.

Postgres table partition range by expression

I'm new to table partition in Postgres and I was trying something which didn't seem to work. Here's a simplified version of the problem:
CREATE TABLE IF NOT EXISTS Location1
PARTITION OF "Location"
FOR VALUES FROM
(extract(epoch from now()::timestamp))
TO
(extract(epoch from now()::timestamp)) ;
insert into "Location" ("BuyerId", "Location")
values (416177285, point(-100, 100));
It fails with:
ERROR: syntax error at or near "extract"
LINE 2: FOR VALUES FROM (extract(epoch from now()::timestamp))
If I try to create the partition using literal values, it works fine.
I'm using Postgres 10 on linux.
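For reference, the literal-bound form that does work looks roughly like this (the epoch values are just placeholders for the intended range):
CREATE TABLE IF NOT EXISTS Location1
PARTITION OF "Location"
FOR VALUES FROM (1514764800) TO (1546300800);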
After a lot of trial and error, I realized that Postgres 10 doesn't really have a full partitioning system out of the box. For a usable partitioned database, the following Postgres extensions need to be installed:
pg_partman https://github.com/pgpartman/pg_partman (literally a gift from God)
pg_jobmon https://github.com/omniti-labs/pg_jobmon (to monitor what pg_partman is doing)
The above advice is for those new to the scene, to help you avoid a lot of headaches.
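For completeness, a rough sketch of what a pg_partman setup can look like, assuming pg_partman 4.x with native partitioning and an illustrative parent table and timestamp control column (check the documentation of your version for the exact arguments):
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;
-- register the parent table and let pg_partman create and maintain daily partitions
SELECT partman.create_parent(
    p_parent_table := 'public.location',
    p_control      := 'created_at',
    p_type         := 'native',
    p_interval     := 'daily'
);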
I assumed that automatic creation of partitions done by postgres-core was a no-brainer, but evidently the postgres devs know something I don't?
If any relational database from the 80's is asked to insert a row, it succeeds after passing the constraint checks.
Ask the latest version of Postgres (as of 2018) to insert a row and it says: "Sorry, I don't know where to put it."
It's a good first effort, but hopefully Postgres 11 will have a proper partitioning system OOTB...