How to merge existing hourly partitions into daily partitions in Hive

My requirement is to merge existing hourly partitions to daily partition for all days.
My partition column is like:
2019_06_22_00, 2019_06_22_01, 2019_06_22_02, 2019_06_22_03..., 2019_06_22_23 => 2019_06_22
2019_06_23_00, 2019_06_23_01, 2019_06_23_02, 2019_06_23_03..., 2019_06_23_23 => 2019_06_23

The easiest way is to extract the date from the current partition column and load the data into a new table.
Create the new table:
create table new (
...
)
partitioned by (partition_date date);
Then insert overwrite from old table:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table new partition(partition_date )
select
col1,
col2,
...
coln,
--extract the hour if you need this column
substr(old_partition_col,12,2) hour,
--partition column is the last one
date(concat_ws('-',substr(old_partition_col,1,4),substr(old_partition_col,6,2),substr(old_partition_col,9,2))) as partition_date
from old_table;
Alternatively you can extract date using unix_timestamp and from_unixtime functions:
from_unixtime(unix_timestamp(old_partition_col,'yyyy_MM_dd_HH'),'yyyy-MM-dd') as partition_date
Then drop the old table and rename the new one.
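As a quick sanity check of the key mapping before running the insert, here is a minimal Python sketch (not Hive code) that parses partition values in the yyyy_MM_dd_HH layout from the question and groups them by day. The function names split_hourly_key and group_by_day are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def split_hourly_key(key):
    """Split a 'yyyy_MM_dd_HH' partition value into ('yyyy-MM-dd', 'HH')."""
    parsed = datetime.strptime(key, "%Y_%m_%d_%H")
    return parsed.strftime("%Y-%m-%d"), parsed.strftime("%H")


def group_by_day(keys):
    """Map each daily partition value to the list of hours merged into it."""
    days = defaultdict(list)
    for key in keys:
        day, hour = split_hourly_key(key)
        days[day].append(hour)
    return dict(days)


# 24 hourly partitions for one day plus the first hour of the next day.
hourly = [f"2019_06_22_{h:02d}" for h in range(24)] + ["2019_06_23_00"]
daily = group_by_day(hourly)
for day in sorted(daily):
    print(day, len(daily[day]))
```

Each daily key should collect exactly the hourly partitions that exist for that day, which is a cheap way to spot malformed partition values before the overwrite.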

Related

Insert into partition table postgresql

Hi everyone!
I'm trying to insert data from a non-partitioned table t1 into a partitioned table t2 with
insert into t2 (select * from t1);
But I get an error: Partition key of the failing row contains (column_name) = (value)
What can be wrong?
t2 is partitioned by month on column date_name, not column_name.
P.S. When I try to insert data from one partitioned table into another in the same way, I get the same error.
How should I insert data into a partitioned table?
Version: Postgresql 11
There must be at least one row in t1 for which there is no matching partition in t2. You have to create all partitions for the table before you insert data.
To figure out which row gives you trouble, look at the value from the error message.
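If many monthly partitions are missing, the CREATE TABLE ... PARTITION OF statements can be generated rather than typed by hand. Below is a minimal Python sketch, assuming t2 is range-partitioned by month on date_name and that the smallest and largest date_name in t1 are already known (for example from select min(date_name), max(date_name) from t1). The partition naming scheme t2_YYYY_MM and the helper names are assumptions, not something taken from the question.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(parent, first, last):
    """Build one CREATE TABLE ... PARTITION OF statement per month in [first, last]."""
    statements = []
    start = date(first.year, first.month, 1)
    while start <= last:
        end = next_month(start)
        name = f"{parent}_{start:%Y_%m}"  # hypothetical naming scheme
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start}') TO ('{end}');"
        )
        start = end
    return statements


# Example range; replace with the real min/max of date_name in t1.
ddl = monthly_partition_ddl("t2", date(2018, 11, 15), date(2019, 2, 3))
for statement in ddl:
    print(statement)
```

The upper bound of a range partition is exclusive, so each partition ends exactly where the next one starts. Partitions that already exist have to be skipped before running the generated statements.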

How to Truncate a postgreSQL table with conditions

I'm trying to truncate a PostgreSQL table with some conditions:
remove all the data in the table and keep only the data of the last 6 months.
For that I have written this query:
select distinct datecalcul
from Table
where datecalcul > now() - INTERVAL '6 months'
order by datecalcul asc
How could I add the truncate clause?
TRUNCATE does not support a WHERE condition. You will have to use a DELETE statement.
delete from the_table
where ...
If you want to get rid of old ("expired") rows efficiently based on a timestamp, you can think about partitioning. Then you can just drop the old partitions.
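To make the DELETE variant concrete, here is a small runnable sketch that uses Python's sqlite3 module as a stand-in for PostgreSQL. The table layout, the sample data, the helper name delete_older_than and the approximation of 6 months as 183 days are all assumptions for illustration. In PostgreSQL the cutoff could be written directly in the statement with now() - interval '6 months'.

```python
import sqlite3
from datetime import date, timedelta


def delete_older_than(conn, cutoff):
    """Delete every row whose datecalcul is before cutoff; return the row count."""
    cur = conn.execute(
        "DELETE FROM the_table WHERE datecalcul < ?", (cutoff.isoformat(),)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, datecalcul TEXT)")

# Twelve sample rows, spaced 30 days apart, going back from a fixed "today".
today = date(2019, 10, 1)
sample = [((today - timedelta(days=30 * n)).isoformat(),) for n in range(12)]
conn.executemany("INSERT INTO the_table (datecalcul) VALUES (?)", sample)

cutoff = today - timedelta(days=183)  # rough stand-in for "6 months ago"
deleted = delete_older_than(conn, cutoff)
remaining = conn.execute("SELECT count(*) FROM the_table").fetchone()[0]
print(deleted, remaining)
```

Note that the condition is the inverse of the SELECT in the question: the SELECT lists the rows to keep, while the DELETE has to match the rows to remove.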

Issue with exchange and split partition

I am trying to exchange non-partitioned data with partitioned data. I have done the following steps.
Created a new table TEMP_TABLE with a partition TEMP_TABLE_1 whose range bound is date('1-09-2019').
Then I used
ALTER TABLE TEMP_TABLE
EXCHANGE PARTITION TEMP_TABLE_1
WITH TABLE ORG_TABLE
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES;
With this, my table data is exchanged with the partition, and in the new table I can see the partition with data.
But now the problem is that the data contains rows with dates later than 1-09-2019. When I try
select count(*) from TEMP_TABLE where date > '1-09-2019';
it returns 0, although there is data with dates up to today.
If I try to split this partition
ALTER TABLE TEMP_TABLE SPLIT PARTITION TEMP_TABLE_1 INTO (
PARTITION TEMP_TABLE_2 VALUES LESS THAN (TO_DATE('01-OCT-2019 00:00:00', 'DD-MON-YYYY HH24:MI:SS')),
PARTITION TEMP_TABLE_1)
UPDATE GLOBAL INDEXES
PARALLEL 4;
it throws "partition cannot be split along the specified high bound".
How can I get the data that lies beyond the range date I have provided?
Because you are exchanging data WITHOUT VALIDATION (probably to improve performance), Oracle does not check whether the partition key values of the exchanged rows fall within the range of the partition they are placed in.
--partitioned table
create table mytabp(n date)
partition by range(n)
interval(numtodsinterval(1, 'DAY'))
(partition p0 values less than (to_date('20190901','yyyymmdd')));
--nonpartitioned table to hold the data outside partition range
create table temp_mytab(n date);
insert into temp_mytab values(to_date('20191001','yyyymmdd'));
--exchanging without validation
alter table mytabp exchange partition p0 with table temp_mytab without validation;
--Data exists
select count(1) from mytabp;--1
Because of partition pruning, the query below looks for the record only in the partitions that could hold it by definition. As the record sits in the wrong partition, it is not returned.
select count(1) from mytabp where n > to_date('20190901','yyyymmdd');--0
Applying TRUNC to the partitioning column gives Oracle the option to scan all partitions, so the SQL below returns the record. For me, on Oracle 12cR1 on Exadata, subsequent executions of this SQL with TRUNC scanned only the partition where the record was sitting and did not scan all partitions. I checked this with the PARTITION_START and PARTITION_STOP columns of the explain plan.
select count(1) from mytabp where trunc(n) > to_date('20190901','yyyymmdd');--1
By design it is bad to place data on incorrect partitions. Please validate or filter for the correct data before executing exchange without validation.
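The effect can be reproduced with a toy model in Python. This is only a sketch of the reasoning above, not how Oracle implements pruning: a partition is a dictionary with an exclusive high bound, and all function names are invented for the illustration.

```python
from datetime import date


def exchange_without_validation(partition, rows):
    """Swap rows into the partition without checking them against its bound."""
    partition["rows"] = list(rows)


def count_greater_than(partitions, threshold, prune=True):
    """Count rows with n > threshold, optionally skipping pruned partitions."""
    total = 0
    for part in partitions:
        # With pruning, a partition whose high bound is <= threshold is
        # assumed to contain no matching row and is never read.
        if prune and part["high"] <= threshold:
            continue
        total += sum(1 for n in part["rows"] if n > threshold)
    return total


def misplaced_rows(partition):
    """Return the rows that violate the partition's exclusive high bound."""
    return [n for n in partition["rows"] if n >= partition["high"]]


p0 = {"name": "p0", "high": date(2019, 9, 1), "rows": []}
exchange_without_validation(p0, [date(2019, 10, 1)])

threshold = date(2019, 9, 1)
pruned_count = count_greater_than([p0], threshold)             # p0 is skipped
full_scan_count = count_greater_than([p0], threshold, prune=False)
print(pruned_count, full_scan_count, misplaced_rows(p0))
```

A check like misplaced_rows, run against the staging table before the exchange, is the kind of validation the last paragraph asks for.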

Delete from a table on basis of indexed columns is taking for ever

We have a table having three indexed columns say
column1 of type bigint
column2 of type timestamp without time zone
column3 of type timestamp without time zone
The table has more than 12 crore records, and we are trying to delete all records older than 45 days (current date - 45 days) using the query below:
delete from tableA
where column2 <= '2019-04-15 00:00:00.00'
OR column3 <= '2019-04-15 00:00:00.00';
This runs forever and never completes.
Is there any way we can improve the performance of this query?
I tried dropping the indexes, deleting the data and recreating the indexes, but this does not work: I am not able to delete the data even after dropping the indexes.
delete
from tableA
where column2 <= '2019-04-15 00:00:00.00'
OR column3 <= '2019-04-15 00:00:00.00'
I do not want to change the query; I want Postgres to be configured through some property so that it is able to delete the records.
For a good discussion of the issue, see also: Best way to delete millions of rows by ID
12 crores == 120 million rows?
Deleting from a large indexed table is slow because the index is rebuilt many times during the process. If you can select the rows you want to keep and use them to create a new table, then drop the old one, the process is much faster. If you do this regularly, use table partitioning and detach a partition when required; the detached partition can then be dropped.
1) Check the logs, you are probably suffering from deadlocks.
2) Try creating a new table selecting the data you need, then drop and rename. Use all the columns in your index in the query. DROP TABLE is much faster than DELETE .. FROM
CREATE TABLE new_table AS (
SELECT * FROM old_table WHERE
column1 >= 1 AND column2 >= current_date - 45 AND column3 >= current_date - 45);
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
CREATE INDEX ...
3) Create a new table using partitions based on date, with a table for say 15, 30 or 45 days (if you regularly remove data that is 45 days old). See https://www.postgresql.org/docs/10/ddl-partitioning.html for details.
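The keep-and-swap approach from option 2 can be tried end to end on a small scale. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL; the sample rows, the index names and the helper keep_and_swap are assumptions for illustration. The rows to keep are the complement of the DELETE condition in the question: both column2 and column3 are later than the cutoff.

```python
import sqlite3

CUTOFF = "2019-04-15 00:00:00"


def keep_and_swap(conn, cutoff):
    """Copy the rows to keep into a new table, then replace the old table."""
    conn.execute(
        "CREATE TABLE new_table (column1 INTEGER, column2 TEXT, column3 TEXT)"
    )
    conn.execute(
        "INSERT INTO new_table "
        "SELECT column1, column2, column3 FROM tableA "
        "WHERE column2 > ? AND column3 > ?",
        (cutoff, cutoff),
    )
    conn.execute("DROP TABLE tableA")
    conn.execute("ALTER TABLE new_table RENAME TO tableA")
    # Recreate the indexes only after the data is in place (names are made up).
    conn.execute("CREATE INDEX tableA_column2_idx ON tableA (column2)")
    conn.execute("CREATE INDEX tableA_column3_idx ON tableA (column3)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 INTEGER, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        (1, "2019-03-01 00:00:00", "2019-05-01 00:00:00"),  # column2 too old
        (2, "2019-05-01 00:00:00", "2019-03-01 00:00:00"),  # column3 too old
        (3, "2019-05-01 00:00:00", "2019-06-01 00:00:00"),  # kept
        (4, "2019-04-15 00:00:00", "2019-06-01 00:00:00"),  # exactly at cutoff
    ],
)
keep_and_swap(conn, CUTOFF)
kept_ids = [row[0] for row in conn.execute("SELECT column1 FROM tableA ORDER BY column1")]
print(kept_ids)
```

On a real system the swap would also have to carry over constraints, grants and anything else attached to the old table, and writes to the table need to be blocked while the copy runs.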

How to update a KDB date partition in the same segment

I have the tables below in the standard splayed format, partitioned by date, with each column stored as a separate file under the table name.
../archive/2010.01.03/TradingHistory_EQU_ASI_DISCRETIONARY/col1, col2, col3,....
../archive/2010.01.03/TradingHistory_EQU_ASI_MULTIQUANT/col1, col2, col3,....
../archive/2010.01.03/TradingHistory_EXCEPTION_MULTIQUANT/col1, col2, col3,....
What is the correct method to rename/update the date partition to the next day (2010.01.04),
assuming the same tables defined here in 2010.01.03 exist and are populated for 2010.01.04?
Essentially, I want to merge the data for these tables for 2010.01.03 and 2010.01.04 while leaving the merged data in the 2010.01.04 date partition.
You can merge (insert or upsert) the 2010.01.03 data to the 2010.01.04 table using the following command:
.Q.par[`:archive;2010.01.04;`TradingHistory_EQU_ASI_DISCRETIONARY] upsert get .Q.par[`:archive;2010.01.03;`TradingHistory_EQU_ASI_DISCRETIONARY]
where the first argument of .Q.par is the path of the database, the second is the date partition and the third is the table name.