How to convert a partitioned table to a non-partitioned table in Oracle 12c

I plan to range partition a table (currently about 4 GB in size and growing at roughly 2 GB per year) in Oracle 12c in production, expecting performance gains. However, in case it fails to meet performance expectations and I later need to convert it back to a non-partitioned state, what are the steps to follow? Note that this table has foreign key references both from and to a few other tables, and it also has several indexes. Assume the steps will be carried out by a DBA.
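For reference, one common offline approach is to rebuild the table with CTAS and swap the names (an online alternative is DBMS_REDEFINITION). The sketch below is only an outline under assumed, illustrative object names (orders, orders_np, order_items, fk_order_items_orders):

-- Rough CTAS-and-swap outline; run during a maintenance window
CREATE TABLE orders_np AS SELECT * FROM orders;   -- copies rows only, not indexes/constraints
-- Recreate the primary key, indexes and check constraints on ORDERS_NP here
ALTER TABLE order_items DROP CONSTRAINT fk_order_items_orders;  -- drop FKs pointing at the old table
DROP TABLE orders;                                -- or rename it aside as a fallback
ALTER TABLE orders_np RENAME TO orders;
-- Re-create the foreign keys (in both directions), grants and triggers, then gather statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDERS');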

Related

Is there a recommended table size for partitioning in postgresql?

I am using Amazon RDS Aurora PostgreSQL on AWS.
The data I manage is very large (about 7 billion rows and 4 TB), so I am considering table partitioning.
(I also considered the Citus extension for PostgreSQL, but unfortunately it is not available on AWS RDS.)
Queries against that table are very slow.
I applied table partitioning (10 partitions) and query performance improved, but it is still slow.
The site below recommends that "tables bigger than 2 GB should be considered", but at that threshold I would end up with far too many partitions, which seems difficult to manage.
https://hevodata.com/learn/postgresql-partitions/#t10
What would be an appropriate table size for partitioning?
And is the pg_partman extension required in this case?
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_Partitions.html
Is there any other way to improve query performance, other than partitioning, when a table holds this much data?
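For context, here is a minimal sketch of native declarative range partitioning (PostgreSQL 10+); the table, column and partition names are illustrative. pg_partman is not strictly required, it mainly automates creating partitions like these ahead of time:

CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; pg_partman can pre-create these automatically
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- An index defined on the parent is created on each partition (PostgreSQL 11+)
CREATE INDEX ON events (created_at);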

Retention management for time-series table in Redshift

I have a table which I migrate from Aurora to Redshift using DMS. This table is insert-only, with a lot of data keyed by timestamp.
I would like to have a trimmed version of that table in Redshift.
The idea was to use partitions on it and a retention script to keep just the last 2 months. However, Redshift has no partitions; what I found instead is the time-series table pattern, which sounds similar. If I understand it correctly, my table should look like:
create table public."bigtable"(
"id" integer NOT NULL DISTKEY,
"date" timestamp,
"name" varchar(256)
)
SORTKEY(date);
However, I can't find good documentation on how the retention is managed. I would appreciate any corrections and advice :)
There are a couple of ways this is typically done in Redshift.
For small to medium tables the data can simply be DELETEd and the table VACUUMed (usually a DELETE ONLY vacuum). Redshift is very good at handling large amounts of data, and for tables that are really large this works fine. There is some overhead for the delete and vacuum, but if these are scheduled during off hours it works just fine and is simple.
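A minimal sketch of that approach, assuming the table and "date" column from the DDL above and the 2-month window from the question:

DELETE FROM public."bigtable"
WHERE "date" < DATEADD(month, -2, GETDATE());   -- keep only the last 2 months

VACUUM DELETE ONLY public."bigtable";           -- reclaim the deleted blocks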
When the table in question gets really big, or there are no low-workload windows in which to perform the delete and vacuum, people set up "month" tables for their data and use a view that UNION ALLs these tables together. Then "removing" a month is just redefining the view and dropping the unneeded table. This is very low effort for Redshift to perform, but it is a bit more complex to set up. Your incoming data needs to be routed to the correct table based on month, so it is no longer just a plain copy from Aurora. This process also simplifies UNLOADing the old tables to S3 for history-capture purposes.
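A rough sketch of that pattern, reusing the columns from the DDL above (the month suffixes and the view name are illustrative):

CREATE TABLE public.bigtable_2024_01 (
    "id"   integer NOT NULL DISTKEY,
    "date" timestamp,
    "name" varchar(256)
) SORTKEY("date");

CREATE TABLE public.bigtable_2024_02 (LIKE public.bigtable_2024_01);

-- Late-binding view, so member tables can be dropped or added without rebuilding dependencies
CREATE OR REPLACE VIEW public.bigtable_current AS
    SELECT "id", "date", "name" FROM public.bigtable_2024_01
    UNION ALL
    SELECT "id", "date", "name" FROM public.bigtable_2024_02
WITH NO SCHEMA BINDING;

-- Retention: redefine the view without the oldest month, then UNLOAD (optional) and DROP that table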

Pentaho Table Input giving very poor performance on Postgres tables, even for two columns in a table

A simple source read from a Postgres table (getting 3 columns out of 20) is taking a huge amount of time. The rows are fed into a Stream Lookup step, where I fetch one column of information.
Here is the log:
2020/05/15 07:56:03 - load_identifications - Step **Srclkp_Individuals.0** ended successfully, processed 4869591 lines. ( 7632 lines/s)
2020/05/15 07:56:03 - load_identifications - Step LookupIndiv.0 ended successfully, processed 9754378 lines. ( 15288 lines/s)
The table input query is:
SELECT
id as INDIVIDUAL_ID,
org_ext_loc
FROM
individuals
This table is in Postgres with barely 20 columns and about 4.8 million rows.
This is Pentaho Data Integration 7.1; server details are below.
**Our server information**:
OS : Oracle Linux 7.3
RAM : 65707 MB
HDD Capacity : 2 Terabytes
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 16
CPU MHz: 2294.614
I am connecting to Postgres using JDBC.
I don't know what else I can do to get about 15K rows/sec throughput.
Check the transformation properties under Miscellaneous:
Nr of rows in rowset
Feedback size
Also check whether your table has a proper index.
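As an illustration of that last point, if the lookup is joined on id, an index on that column can help; this is a hypothetical example (the index name is assumed, and id may of course already be the primary key):

-- Hypothetical index on the lookup column; adjust to the actual join/filter column
CREATE INDEX idx_individuals_id ON individuals (id);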
When you use a Table Input & Stream Lookup, the way Pentaho runs the stream lookup is slower than when you use a Database Lookup. As #nsousa suggested, I checked this with a Dummy step and learned that Pentaho handles each type of step differently.
Even though Database Lookup & Stream Lookup fall in the same category, the performance of Database Lookup is better in this situation.
The Pentaho help gives some ideas / suggestions regarding the same.

Should I migrate to Redshift?

I'm currently struggling to query a big chunk of data that is stored in a partitioned table (one partition per date).
The data looks like this:
date, product_id, orders
2019-11-01, 1, 100
2019-11-01, 2, 200
2019-11-02, 1, 300
I have hundreds of date-partitions and millions of rows per date.
Now, if I want to query, for instance, total orders for product IDs 1 and 2 over a period of 2 weeks, grouped by date (to show in a graph per date), the DB has to go to 2 weeks of partitions and fetch the data from them.
That process can take a long time when the number of products is big or the time frame required is long.
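For concreteness, the query described would look roughly like this (the table name daily_orders is hypothetical; the columns are taken from the sample above):

SELECT "date", SUM(orders) AS total_orders
FROM daily_orders
WHERE product_id IN (1, 2)
  AND "date" BETWEEN '2019-11-01' AND '2019-11-14'
GROUP BY "date"
ORDER BY "date";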
I have read that AWS Redshift is suitable for this kind of task. I'm considering shifting my partitioned tables (aggregated analytics per date) to that technology, but I wonder whether that's really what I should do to make those queries run much faster.
Thanks!
Given your use case, Redshift really is a good choice for you.
To get the best performance out of Redshift, it is very important to set a proper distribution key and sort key. In your case the "date" column should be the distribution key and "product_id" should be the sort key. Another important note: do not encode the "date" and "product_id" columns.
You should get better performance.
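As a sketch of that advice in DDL form (the table name and column types are assumptions):

CREATE TABLE daily_orders (
    "date"     date    NOT NULL ENCODE RAW,   -- RAW = no compression, as advised above
    product_id integer NOT NULL ENCODE RAW,
    orders     bigint
)
DISTKEY("date")
SORTKEY(product_id);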
If you are struggling with traditional SQL databases, then Amazon Redshift is certainly an option. It can handle tables with billions of rows.
This would involve loading the data from Amazon S3 into Redshift. This will allow Redshift to optimize the way that the data is stored, making it much faster for querying.
Alternatively, you could consider using Amazon Athena, which can query the data directly from Amazon S3. It understands data that is partitioned into separate directories (e.g. based on date).
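For example, a hedged sketch of an Athena external table over date-partitioned S3 data (the bucket, prefix and column types are assumptions):

CREATE EXTERNAL TABLE daily_orders_s3 (
    product_id int,
    orders     bigint
)
PARTITIONED BY (dt date)             -- one S3 "directory" per date, e.g. .../dt=2019-11-01/
STORED AS PARQUET
LOCATION 's3://my-analytics-bucket/daily_orders/';

-- After adding new date prefixes in S3, load the partition metadata
MSCK REPAIR TABLE daily_orders_s3;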
Which version of PostgreSQL are you using?
Are you using native partitioning or trigger-based inheritance partitioning?
The latest versions of PostgreSQL have improved partitioning management.
Considering your case, Amazon Redshift can be a good choice, and so can Amazon Athena. But it is also important to consider your application framework. Are you opting to move to Amazon only for the database, or do you have other Amazon services on the list too?
Also, before making the decision, please check the cost of Redshift.

App performance with a million tables in a database in PostgreSQL 9.4

We have a requirement for our application for a single database with 1 million tables (spread across multiple schemas), serving an average of 100 requests per second, and we are planning to use version 9.4.
My questions:
Can PG deal with 1 million tables? Does a huge number of tables slow down performance in any way?
The query planner will query system tables like pg_catalog for every query. Will that cause any performance issues?
Will there be any significant performance degradation when looking up the data files (a million of them), since they are located in a single directory?
Senthil