Hive partitioning external table based on range

Hive partitioning external table based on range - range

I want to partition an external table in hive based on range of numbers. Say numbers with 1 to 100 go to one partition. Is it possible to do this in hive?

I am assuming here that you have a table with some records from which you want to load data to an external table which is partitioned by some field say RANGEOFNUMS.
Now, suppose we have a table called testtable with columns name and value. The contents are like
India,1
India,2
India,3
India,3
India,4
India,10
India,11
India,12
India,13
India,14
Now, suppose we have a external table called testext with some columns along with a partition column say, RANGEOFNUMS.
Now you can do one thing,
insert into table testext partition(rangeofnums="your value")
select * from testtable where value>=1 and value<=5;
This way all records from the testtable having value 1 to 5 will come into one partition of the external table.
The scenario is my assumption only. Please comment if this is not the scenario you have.
Achyut

Related

How to copy a Specific Partitioned Table

I would like to shift data from a specific paritioned table of parent to separate table.Can someone suggest what's the better way.
If I create a table
CREATE TABLE parent columns(a,b,c)partition by c
c Type is DATE.
CREATE TABLE dec_partition PARTITION OF parent FOR VALUES FROM '2021-02-12' to 2021-03-12;
Now I want to copy table dec_partition to separate table in single command with least time.
NOTE: The table has around 4million rows with 20 columns (1 of the column is in jsonb)

How to backup whole table into a single field item?

I have few very small tables (a total of ~1000 rows) that I want to backup regularly into the same DB, to a single table. I know it sounds weird but hear me out.
Let's say that the tables I want to backup are named linux_commands, and windows_commands. These two tables have roughly: id (pkey), name, definition, config (jsonb), commands.
I want to back these up everyday into a table called commands_backup and I want this new table to have a date field, a field for windows_commands, and another one for linux_commands, so three columns in total. Each day, a script would run and write current date to date field, and then fetch whole linux_commands table and write it to related field in a single row, then do the same for windows_commands.
How would you setup something like this? Also, what is the best data type for storing whole data set in a single item?

In the target table, windows_commands and linux_commands should be type jsonb.
Then you can use:
INSERT INTO commands_backup VALUES (
current_date,
(SELECT jsonb_agg(to_jsonb(linux_commands)) FROM linux_commands),
(SELECT jsonb_agg(to_jsonb(windows_commands)) FROM windows_commands)
);

POSTGRES: Aggregating full table data in case of table partitions

I'm new to table partitions in Postgres and have one doubt.
Let us assume I have a table:
product_visitors
I can create multiple partitions like:
product_visitors_year_2017
product_visitors_year_2018
etc.
I can create a trigger which can redirect insertion on product_visitors to appropriate table.
My question is, what if I want to aggregate on full data of product_visitors? For example, products and their visit count
As I understand, at the moment, data resides in year wise tables instead of main table

In Postgres 10 inserts will automatically be routed to the correct partition.
If you select from the "base table" product_visitors without any condition limiting the rows to one (or more) specific partitions, Postgres will automatically read the data from all partitions.
So
select count(*)
from product_visitors;
will count the rows in all partitions.

Postgres table partitioning with star schema

I have a schema with one table with the majority of data, customer, and three other tables with foreign key references to customer.entry_id which is a BIGSERIAL field. The three other tables are called location, devices and urls where we store various data related to a specific entry in the customer table.
I want to partition the customer table into monthly child tables, and have that part worked out; customer will stay as-is, each month will have a table customer_YYYY_MM that inherits from the master table with the right CHECK constraint and indexes will be created on each individual child table. Data will be moved to the correct child tables while the master table stays empty.
My question is about the other three tables, as I want to partition them as well. However, they have no date information (at all), only the reference to the primary key from the master table. How can I setup the constraints on these tables? Is it even meaningful or possible without date information?
My application logic knows where to insert all the data (it's fairly trivial), but I expect to be able to do simple SELECT queries without specifying which child tables to get it from. So this should work as you would expect from non-partitioned tables:
SELECT l.*
FROM customer c
JOIN location l USING entry_id
WHERE c.date_field > '2015-01-01'

I would partition them by the reference key. The foreign key is used in join conditions and is not usually subject to change so it fulfills the following important points:
Partition by the information that is used mostly in the WHERE clauses of the queries or other parts where partitioning can be used to filter out tables that don't need to be scanned. As one guide puts it:
The objective when defining partitions should be to allow as many queries as possible to fetch data from as few partitions as possible - ideally one.
Partition by information that is not going to be changed so that rows don't constantly need to be thrown from one subtable to another
This all depends of the size of the tables too of course. If the sizes stay small then there is no need to partition.

Read more about partitioning here.
Use views:
create view customer as
select * from customer_jan_15 union all
select * from customer_feb_15 union all
select * from customer_mar_15;
create view location as
select * from location_jan_15 union all
select * from location_feb_15 union all
select * from location_mar_15;

DB2 Partitioning

I know how partitioning in DB2 works but I am unaware about where this partition values exactly get stored. After writing a create partition query, for example:
CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
(
STARTING '1/1/2006' ENDING '12/31/2006'
EVERY 3 MONTHS
)
after running the above query we know that partitions are created on order for every 3 month but when we run a select query the query engine refers this partitions. I am curious to know where this actually get stored, whether in the same table or DB2 has a different table where partition value for every table get stored.
Thanks,

table partitions in DB2 are stored in tablespaces.
For regular tables (if table partitioning is not used) table data is stored in a single tablespace (not considering LOBs).
For partitioned tables multiple tablespaces can used for its partitions.
This is achieved by the "" clause of the CREATE TABLE statement.
CREATE TABLE parttab
...
in TBSP1, TBSP2, TBSP3
In this example the first partition will be stored in TBSP1, the second in TBSP2, The third in TBSP3, the fourth in TBSP1 and so on.
Table partitions are named in DB2 - by default PART1 ..PARTn - and all these details can be looked up in the system catalog view SYSCAT.DATAPARTITIONS including the specified partition ranges.
See also http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?cp=SSEPGG_10.5.0%2F2-12-8-27&lang=en
The column used as partitioning key can be looked up in syscat.datapartitionexpression.
There is also a long syntax for creating partitioned tables where partition names can be explizitly specified as well as the tablespace where the partitions will get stored.
For applications partitioned tables look like a single normal table.
Partitions can be detached from a partitioned table. In this case a partition is "disconnected" from the partitioned table and converted to a table without moving the data (or vice versa).
best regards
Michael

After a bit of research I finally figure it out and want to share this information with others, I hope it may come useful to others.
How to see this key values ? => For LUW (Linux/Unix/Windows) you can see the keys in the Table Object Editor or the Object Viewer Script tab. For z/OS there is an Object Viewer tab called "Limit Keys". I've opened issue TDB-885 to create an Object Viewer tab for LUW tables.
A simple query to check this values:
SELECT * FROM SYSCAT.DATAPARTITIONS
WHERE TABSCHEMA = ? AND TABNAME = ?
ORDER BY SEQNO
reference: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?lang=en

DB2 will create separate Physical Locations for each partition. So each partition will have its own Table-space. When you SELECT on this partitioned Table your SQL may directly go to a single partition or it may span across many depending on how your SQL is. Also, this may allow your SQL to run in parallel i.e. many TS can be accessed concurrently to speed up the SELECT.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Hive partitioning external table based on range - range

I want to partition an external table in hive based on range of numbers. Say numbers with 1 to 100 go to one partition. Is it possible to do this in hive?

Related

How to copy a Specific Partitioned Table

How to backup whole table into a single field item?

POSTGRES: Aggregating full table data in case of table partitions

Postgres table partitioning with star schema

DB2 Partitioning

Categories

Resources