Postgres 10 Partitioning - Parent Table Not Visible - postgresql

I've upgraded to Postgres 10 and I'm planning to deploy some partitioned tabled using the new declarative partitioning syntax:
-- Create a parent table
create table test(
name text,
created timestamp with time zone default now()
) partition by range(created);
-- Create a partition
create table test_2017 partition of test
for values from('1 JAN 2017') to ('1 JAN 2018');
I'd like to be able to reference the parent table in my functions and queries, but only the partitioned tables seem to get resolved by my tools. In both PhpStorm 2017.3 RC and BigSQL's PGAdmin III schema browsers, the table listings both show the partition, but not the parent table. In PhpStorm's editor, I get an "unresolved table" warning whenever I reference "test", however insert/select queries still work.
Have I missed something, or are the tools not yet fully compatible with partitioning?

Related

Attach Partition takes more time even after adding check constraint

So basically we have a very large table in Postgres 11 DB which has hundreds of millions of data since the table was added. Now we are trying to convert it into a range based partition table based on created_at column (timestamp - not nullable).
As suggested in the Postgres Partitioning Documentation, I tried adding a check constraint on the same table before actually running the attach partition. So after that if I run attach partition, ideally it should have taken very less time as it should skip the validation due to presence of respective constraint on the table, but I see it is still taking lot more time. My partition range and the constraint looks something like this:
alter table xyz_2020 add constraint temp_check check (created_at >= '2020-01-01 00:00:00' and created_at < '2021-01-01 00:00:00');
ALTER TABLE xyz ATTACH PARTITION xyz_2020 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2021-01-01 00:00:00');
Here xyz_2020 is my existing big table which got renamed from xyz. And xyz is the new master table created like the old table. So I want to understand what could be the possible reasons that attach partition might still be taking lot more time.
Edit: We are creating a new partitioned table and trying to attach the old table as one of its partition.

How to get the describe tables from the Redshift and ALTER it

I have create a redshift cluster and created a db inside.
My schema is new_schema
I have created 2 tables inside two tables inside table1, table2
My Question.
I want to list the datatypes of table1
I need to change the datatype of description which is inside the table1 which is of VARCHAR to TEXT
I have tried to list the datatypes of table1 with below query but nothing listing
SELECT * FROM PG_TABLE_DEF WHERE schemaname = 'new_schema';
A few possibilities as to why you are not seeing the expected results. Most likely is that new_schema isn't in your search_path. Pg_table_info only return info for tables in your search_path - see: https://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html
Another possibility is that the tables have no data rows (no blocks assigned) and this can lead to incomplete info from some system tables.
Another possibility is that the tables were not committed by the creating session and being checked by a different session. Since you say that you are creating a new db this comes to mind.
Are the tables visible in svv_table_info?
Also the premise of changing varchar to text is a bit off. From https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-text-and-bpchar-types
You can create an Amazon Redshift table with a TEXT column, but it is
converted to a VARCHAR(256) column that accepts variable-length values
with a maximum of 256 characters.
So it seems like the objective you are trying to achieve is a bit off.

PostgreSQL shows wrong sequence details in the table's information schema

Recently I saw a strange scenario with my PostgreSQL DB. The information schema of my database is showing a different sequence name than the one actually allocated for the column of my table.
The issue is:
I have a table tab_1
id name
1 emp1
2 emp2
3 emp3
Previously the id column (integer) of the table was an auto generated field where the sequence number was generated at run time via JPA. (Sequence name: tab_1_seq)
We made a change and updated the table's column id to bigserial and the sequence is maintained in the column level (allocated new sequence: tab_1_temp_seq) not handled by the JPA anymore.
After this change everything was working fine for few months and after that we faced an error - "the sequence "tab_1_temp_seq" is not yet defined in this session"
On analyzing the issue I found out that there is a mismatch between the sequences allocated for the table.
In the table structure, we where shown the sequence as tab_1_temp_seq and in the information_schema the table was allocated with the old sequence - tab_1_seq.
I am not sure what has really triggered this to happen, as we are not managing our database system. If you have faced any issues like this, kindly let me know its root cause.
Queries:
SELECT table_name, column_name, column_default from information_schema.columns where table_name = ‘tab_1’;
result :
table_name column_name column_default
tab_1 id nextval('tab_1_seq::regclass')
Below are the details found in the table structure/properties:
id nextval('tab_1_temp_seq::regclass')
name varChar
Perhaps you are suffering from data corruption, but it is most likely that you are suffering from bad tools to visualize your database objects. Whatever program shows you the “table structure/properties” might be confused.
To find out the truth (which DEFAULT value PostgreSQL uses), run:
SELECT pg_get_expr(adbin, adrelid)
FROM pg_attrdef
WHERE adrelid = 'tab1'::regclass;
This is also what information_schema.columns will show, but I added the naked query for clarity.
This DEFAULT value will be used whenever the INSERT statement either doesn't specify the id column or fills it with the special value DEFAULT.
Perhaps the confusion is also caused by different programs that may set default values in their way. The way I showed above is PostgreSQL's way, but nothing can keep a third-party tool from using its own sequence to filling the id.

DB2 Partitioning

I know how partitioning in DB2 works but I am unaware about where this partition values exactly get stored. After writing a create partition query, for example:
CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
(
STARTING '1/1/2006' ENDING '12/31/2006'
EVERY 3 MONTHS
)
after running the above query we know that partitions are created on order for every 3 month but when we run a select query the query engine refers this partitions. I am curious to know where this actually get stored, whether in the same table or DB2 has a different table where partition value for every table get stored.
Thanks,
table partitions in DB2 are stored in tablespaces.
For regular tables (if table partitioning is not used) table data is stored in a single tablespace (not considering LOBs).
For partitioned tables multiple tablespaces can used for its partitions.
This is achieved by the "" clause of the CREATE TABLE statement.
CREATE TABLE parttab
...
in TBSP1, TBSP2, TBSP3
In this example the first partition will be stored in TBSP1, the second in TBSP2, The third in TBSP3, the fourth in TBSP1 and so on.
Table partitions are named in DB2 - by default PART1 ..PARTn - and all these details can be looked up in the system catalog view SYSCAT.DATAPARTITIONS including the specified partition ranges.
See also http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?cp=SSEPGG_10.5.0%2F2-12-8-27&lang=en
The column used as partitioning key can be looked up in syscat.datapartitionexpression.
There is also a long syntax for creating partitioned tables where partition names can be explizitly specified as well as the tablespace where the partitions will get stored.
For applications partitioned tables look like a single normal table.
Partitions can be detached from a partitioned table. In this case a partition is "disconnected" from the partitioned table and converted to a table without moving the data (or vice versa).
best regards
Michael
After a bit of research I finally figure it out and want to share this information with others, I hope it may come useful to others.
How to see this key values ? => For LUW (Linux/Unix/Windows) you can see the keys in the Table Object Editor or the Object Viewer Script tab. For z/OS there is an Object Viewer tab called "Limit Keys". I've opened issue TDB-885 to create an Object Viewer tab for LUW tables.
A simple query to check this values:
SELECT * FROM SYSCAT.DATAPARTITIONS
WHERE TABSCHEMA = ? AND TABNAME = ?
ORDER BY SEQNO
reference: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?lang=en
DB2 will create separate Physical Locations for each partition. So each partition will have its own Table-space. When you SELECT on this partitioned Table your SQL may directly go to a single partition or it may span across many depending on how your SQL is. Also, this may allow your SQL to run in parallel i.e. many TS can be accessed concurrently to speed up the SELECT.

Hive: dynamic partition adding to external table

I am running hive 071, processing existing data which is has the following directory layout:
-TableName
- d= (e.g. 2011-08-01)
- d=2011-08-02
- d=2011-08-03
... etc
under each date I have the date files.
now to load the data I'm using
CREATE EXTERNAL TABLE table_name (i int)
PARTITIONED BY (date String)
LOCATION '${hiveconf:basepath}/TableName';**
I would like my hive script to be able to load the relevant partitions according to some input date, and number of days. so if I pass date='2011-08-03' and days='7'
The script should load the following partitions
- d=2011-08-03
- d=2011-08-04
- d=2011-08-05
- d=2011-08-06
- d=2011-08-07
- d=2011-08-08
- d=2011-08-09
I havn't found any discent way to do it except explicitlly running:
ALTER TABLE table_name ADD PARTITION (d='2011-08-03');
ALTER TABLE table_name ADD PARTITION (d='2011-08-04');
ALTER TABLE table_name ADD PARTITION (d='2011-08-05');
ALTER TABLE table_name ADD PARTITION (d='2011-08-06');
ALTER TABLE table_name ADD PARTITION (d='2011-08-07');
ALTER TABLE table_name ADD PARTITION (d='2011-08-08');
ALTER TABLE table_name ADD PARTITION (d='2011-08-09');
and then running my query
select count(1) from table_name;
however this is offcourse not automated according to the date and days input
Is there any way I can define to the external table to load partitions according to date range, or date arithmetics?
I have a very similar issue where, after a migration, I have to recreate a table for which I have the data, but not the metadata. The solution seems to be, after recreating the table:
MSCK REPAIR TABLE table_name;
Explained here
This also mentions the "alter table X recover partitions" that OP commented on his own post. MSCK REPAIR TABLE table_name; works on non-Amazon-EMR implementations (Cloudera in my case).
The partitions are a physical segmenting of the data - where the partition is maintained by the directory system, and the queries use the metadata to determine where the partition is located. so if you can make the directory structure match the query, it should find the data you want. for example:
select count(*) from table_name where (d >= '2011-08-03) and (d <= '2011-08-09');
but I do not know of any date-range operations otherwise, you'll have to do the math to create the query pattern first.
you can also create external tables, and add partitions to them that define the location.
This allows you to shred the data as you like, and still use the partition scheme to optimize the queries.
I do not believe there is any built-in functionality for this in Hive. You may be able to write a plugin. Creating custom UDFs
Probably do not need to mention this, but have you considered a simple bash script that would take your parameters and pipe the commands to hive?
Oozie workflows would be another option, however that might be overkill. Oozie Hive Extension - After some thinking I dont think Oozie would work for this.
I have explained the similar scenario in my blog post:
1) You need to set properties:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
2)Create a external staging table to load the input files data in to this table.
3) Create a main production external table "production_order" with date field as one of the partitioned columns.
4) Load the production table from the staging table so that data is distributed in partitions automatically.
Explained the similar concept in the below blog post. If you want to see the code.
http://exploredatascience.blogspot.in/2014/06/dynamic-partitioning-with-hive.html