I have migrated an Oracle database to PostgreSQL using the Ora2Pg tool.
The database size in Oracle before migration was around 2 TB,
but after migration the same database in PostgreSQL is only about 600 GB.
NOTE: The records were migrated correctly, with equal row counts.
I would also like to know how PostgreSQL handles the bytea data type after migration from BLOB in Oracle.
You might want to check if all migrated objects are present.
However, the size difference itself is not surprising; several things can contribute to it (a quick way to check the sizes yourself follows the list):
You counted the size of the tablespaces in Oracle, but they were partly empty.
Your table and index blocks were fragmented, while they are not in the newly imported PostgreSQL database.
Depending on which options you installed in Oracle, the data dictionary can be quite large (though that alone cannot explain the observed difference).
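On the bytea question: Ora2Pg converts Oracle BLOB columns to bytea, and PostgreSQL stores large bytea values out of line in TOAST tables, compressed by default, which can also make the PostgreSQL copy considerably smaller. If you want to see where the space actually went, a quick check along these lines should help (the database name is a placeholder):
psql -d migrated_db -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
psql -d migrated_db -c "
  SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
  WHERE relkind = 'r'   -- ordinary tables; the size includes their indexes and TOAST data
  ORDER BY pg_total_relation_size(oid) DESC
  LIMIT 20;"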
I am trying to migrate our Postgres database to Aurora Postgres.
First I created a normal task; it migrates all tables but not their constraints.
My attempts to clone our database:
I downloaded AWS SCT (Schema Conversion Tool), set up my configuration, and generated a migration report. Here is the report:
We completed the analysis of your PostgreSQL source database and
estimate that 100% of the database storage objects and 99.1% of
database code objects can be converted automatically or with minimal
changes if you select Amazon Aurora (PostgreSQL compatible) as your
migration target. Database storage objects include schemas, tables,
table constraints, indexes, types, sequences and foreign tables.
Database code objects include triggers, views, materialized views,
functions, domains, rules, operators, collations, fts configurations,
fts dictionaries and aggregates. Based on the source code syntax
analysis, we estimate 99.9% (based on # lines of code) of your code
can be converted to Amazon Aurora (PostgreSQL compatible)
automatically. To complete the migration, we recommend 133 conversion
action(s) ranging from simple tasks to medium-complexity actions to
complex conversion actions.
My questions:
1- Is there a way to automate including everything in my source database?
2- The report mentions "we recommend 133 conversion action(s)"; where can I find these conversion actions?
3- Is ongoing migration safe? In my case we need to run the migration every day.
Sequences, indexes, and constraints are not migrated by DMS; this is mentioned in the official AWS docs.
You can use this source.
It will help you migrate sequences, indexes, and constraints at once.
P.S.: this does not include views and routines.
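For reference, one plain-tools way to carry the missing pieces over, assuming you can reach both sides with pg_dump/psql (the endpoint and database names below are placeholders): pg_dump's post-data section contains the index, constraint, and trigger definitions, while sequence definitions live in the pre-data section and may need separate handling.
pg_dump --schema-only --section=post-data -d source_db -f post_data.sql
psql -h aurora-endpoint.example.com -U master_user -d target_db -f post_data.sql
Apply the post-data script only after the DMS full load finishes, so index builds do not slow the load down.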
As far as I know, there is no way in AWS to automate everything; if there were, it would already have been added to SCT. However, if similar errors occur across code/DDL/functions, such as certain datatype conversions, you can create a script that takes a schema dump and converts all those data types to the desired ones (a hypothetical sketch is below).
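A sketch of such a script (the substitution shown is purely illustrative; the real ones would come from whatever SCT flags, and the endpoint and database names are placeholders):
pg_dump --schema-only -d source_db -f schema.sql
sed -i 's/timestamp(0) without time zone/timestamp without time zone/g' schema.sql
psql -h aurora-endpoint.example.com -d target_db -f schema.sql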
Choose the SQL Conversion Actions tab in the SCT tool.
The SQL Conversion Actions tab contains a list of SQL code items that can't be converted automatically. There are also recommendations for how to manually convert the SQL code. You can look into the errors and make changes accordingly.
If you are migrating to the same PostgreSQL version in Aurora, you can take a schema-only dump, restore it into the target Aurora cluster, and later set up a full-load/ongoing-replication task with DMS; in that case you don't have to involve SCT at all (this has worked for me most of the time). Just make sure you adhere to the Aurora limitations specific to your PG version.
We have been using ongoing migration in our project and it's working great. There are some best practices we have developed, though they will differ from project to project:
DDL changes must be made on the target first; stop replication while making them and resume once done (see the CLI sketch after this list)
Put tables with a high transaction rate into a separate DMS task; it helps with troubleshooting, and the rest of your tables can keep replicating
Always keep in mind that DMS replicates data, not views/functions/procedures
Actively monitor tasks and replication instances
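For the first point, the stop/resume around DDL changes can be scripted with the AWS CLI, roughly like this (the task ARN variable is a placeholder):
aws dms stop-replication-task --replication-task-arn "$TASK_ARN"
# apply the DDL change on the target here, then resume from where the task left off
aws dms start-replication-task --replication-task-arn "$TASK_ARN" --start-replication-task-type resume-processing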
I would also suggest that, if you are performing a homogeneous migration (PG -> PG), you consider pg_dump & pg_restore; it is simple and reliable between matching versions, and AWS Aurora supports it.
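A minimal sketch of that dump/restore path, assuming matching major versions (the endpoint, user, and database names are placeholders); the custom format lets pg_restore run parallel jobs:
pg_dump -Fc -d source_db -f source_db.dump
pg_restore -h aurora-endpoint.example.com -U master_user -d target_db -j 4 source_db.dump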
My understanding of an in-memory table is a table that will be created in memory and would resort to disk as little as possible, if at all. I am assuming that I have enough RAM to fit the table there, or at least most of it. I do not want to use an explicit function to load tables (like pg_prewarm) in memory, I just want the table to be there by default as soon as I issue a CREATE TABLE or CREATE TABLE AS select statement, unless memory is full or unless I indicate otherwise. I do not particularly care about logging to disk.
7 years ago, a similar question was asked here PostgreSQL equivalent of MySQL memory tables?. It has received 2 answers and one of them was a bit late (4 years later).
One answer says to create a RAM disk and add a tablespace for it, or to use an UNLOGGED table, or to wait for global temporary tables. However, I do not have special hardware, I only have regular RAM, so I am not sure how to go about that. I can use the UNLOGGED feature, but as I understand it, there is still quite a bit of disk interaction involved (this is what I am trying to reduce) and I am not sure whether tables will be loaded into memory by default. Furthermore, I do not see how global temporary tables are related; my understanding is that they are just tables in spaces that can be shared.
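For reference, the RAM-disk-plus-tablespace idea from that old answer would look roughly like this on Linux (every path and name here is made up, and a tablespace on tmpfs vanishes on reboot, so it is only suitable for disposable data):
sudo mkdir /mnt/pg_ram
sudo mount -t tmpfs -o size=8G tmpfs /mnt/pg_ram
sudo chown postgres:postgres /mnt/pg_ram
psql -c "CREATE TABLESPACE ramspace LOCATION '/mnt/pg_ram';"
psql -c "CREATE UNLOGGED TABLE hot_data (id int, payload text) TABLESPACE ramspace;"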
Another answer recommends an in-memory column store engine. And to then use a function to load everything in memory. The issue I have with this approach is that the engine being referred to looks old and unmaintained and I cannot find any other. Also, I was hoping I wouldn't have to explicitly resort to using a 'load into memory' function, but instead that everything will happen by default.
I was just wondering how to get in-memory tables now in Postgres 12, 7 years later.
Postgres does not have in-memory tables, and I do not have any information about any serious work on this topic now. If you need this capability then you can use one of the special in-memory databases like REDIS, MEMCACHED or MonetDB. There are FDW drivers for these databases. So you can create in-memory tables in a specialized database and you can work with these tables from Postgres via foreign tables.
MySQL in-memory tables were necessary when there was only the MyISAM engine, because that engine had very primitive I/O capabilities and MySQL did not have its own buffers. Now MySQL has the InnoDB engine (with modern join implementations like other databases), and many of the arguments for MySQL in-memory tables are obsolete. Unlike old MySQL, Postgres has its own buffers and does not bypass the file system cache, so all of your RAM is available for your data and you have to do nothing. Ten years ago we had to use the MySQL in-memory engine to get good enough performance, but after migrating to Postgres we have had better performance without in-memory tables.
If you have a lot of memory, Postgres can use it by default via the file system cache.
Since this question is specific to Postgres: there is no in-memory table, but there are materialized views, which can be refreshed and, like any other table, are cached in memory (shared buffers and the OS cache) when read. See if that fits your requirements.
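A minimal example of that suggestion (the table and view names are made up):
psql -c "CREATE MATERIALIZED VIEW hot_summary AS SELECT id, payload FROM big_table WHERE active;"
psql -c "REFRESH MATERIALIZED VIEW hot_summary;"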
I have a PostgreSQL database containing a table with several 'timestamp with timezone' fields.
I have a tool (DBSync) that I want to use to transfer the contents of this table to another server/database.
When I transfer the data to a MSSQL server all datetime values are replaced with '1753-01-01'. When I transfer the data to a PostgreSQL database all datetime values are replaced with '0001-01-01'.
The smallest possible date for those systems.
Now I recreated the source table (including contents) in a different database on the same PostgreSQL server. The only difference: the source table is in a different database. Same server, same routing; only the ports are different.
The user is different, but I have the same rights in each database.
How can it be that the database is responsible for an apparent difference in interpretation of the data? Do PostgreSQL databases have database-specific settings that could cause such behaviour? Which database settings can/should I check?
To be clear, I am not looking for another way to transfer the data; I have several available. What I am trying to understand is: how can it be that an application reading datetime info from table A in database Y on server X gives me the wrong date, while reading the same table from database Z on server X gives me the data as it should be?
It turns out that the cause is probably the difference in server versions: one is Postgres 9 (works OK), the other is Postgres 10 (does not work OK).
They are different instances on the same machine. Somehow I missed that (blush).
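For completeness, the per-database settings the question asks about (set with ALTER DATABASE ... SET) and the server version can be compared like this; the port and database name are placeholders, and the same commands should be run against the other instance:
psql -p 5432 -d db_y -c "SELECT name, setting FROM pg_settings
                         WHERE lower(name) IN ('server_version', 'datestyle', 'timezone');"
psql -p 5432 -d db_y -c "SELECT d.datname, s.setconfig
                         FROM pg_db_role_setting s
                         JOIN pg_database d ON d.oid = s.setdatabase;"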
By transferring I mean that I am reading records from a source database (PostgreSQL) and inserting them into a target database (MSSQL 2017).
This is done through the application; I am not sure what drivers it uses.
I will work with the people who made the application.
For those wondering: it is this application: https://dbconvert.com/mssql/postgresql/
When a solution is found I will update this answer with the found solution.
There are a few questions and answers already on PostgreSQL import (as well as the specific SQLite->PostgreSQL situation). This question is about a specific corner-case.
Background
I have an existing, in-production web-app written in python (pyramid) and using alembic for easy schema migration. Due to the database creaking with unexpectedly high write-load (probably due to the convoluted nature of my own code), I've decided to migrate to PostgreSQL.
Data migration
There are a few recommendations on data migration. The simplest one involved using
sqlite3 my.db .dump > sqlitedumpfile.sql
and then importing it with
psql -d newpostgresdb < sqlitedumpfile.sql
This required a bit of editing of sqlitedumpfile. In particular, removing some incompatible operations, changing values (sqlite represents booleans as 0/1) etc. It ended up being too complicated to do programmatically for my data, and too much work to handle manually (some tables had 20k rows or so).
A good tool for data migration which I eventually settled on was pgloader, which 'worked' immediately. However, as is typical for data migration of this sort, this exposed various data inconsistencies in my database which I had to solve at source before doing the migration (in particular, removing foreign keys to non-unique columns which seemed a good idea at the time for convenient joins and removing orphan rows which relied on rows in other tables which had been deleted). After these were solved, I could just do
pgloader my.db postgresql:///newpostgresdb
And get all my data appropriately.
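A check along these lines against the SQLite source can find such orphan rows (child, parent, and the column names are placeholders):
sqlite3 my.db "SELECT count(*) FROM child
               WHERE parent_id IS NOT NULL
                 AND parent_id NOT IN (SELECT id FROM parent);"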
The problem?
pgloader worked really well for data but not so well for the table structure itself. This resulted in three problems:-
I had to create a new alembic revision with a ton of changes (mostly datatype related, but also some related to problem 2).
Constraint/index names were unreliable (unique numeric names were generated; there is actually an option to disable this). This was a problem because I needed a reliable upgrade path that was replicable in production without me having to manually tweak the alembic code.
Sequences/autoincrement just failed for most primary keys. This broke my webapp, as I was not able to add new rows for some (not all) tables.
In contrast, re-creating a blank database using alembic to maintain the schema works well without changing any of my webapp's code. However, pgloader defaults to overriding existing tables, so this would leave me nowhere, as the data is what really needs migrating.
How do I get proper data migration using a schema I've already defined (and which works)?
What eventually worked was, in summary:-
Create the appropriate database structure in postgresql:///newpostgresdb (I just used alembic upgrade head for this)
Use pgloader to move data over from sqlite to a different database in postgresql. As mentioned in the question, some data inconsistencies need to be solved before this step, but that's not relevant to this question itself.
createdb tempdb
pgloader my.db postgresql:///tempdb
Dump the data in tempdb using pg_dump
pg_dump -a -d tempdb > dumped_postgres_database
Edit the resulting dump to accomplish the following (a rough sketch of these edits appears after the list):-
SET session_replication_role = replica because some of my rows are circular in reference to other rows in the same table
Delete the alembic_version table, as we're restarting a new branch for alembic.
Regenerate any sequences, with the equivalent of SELECT pg_catalog.setval('"table_colname_seq"', (select max(colname) from table));
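A rough sketch of those three edits (the sequence, table, and column names are hypothetical):
sed -i '1i SET session_replication_role = replica;' dumped_postgres_database
# delete the COPY block that loads alembic_version (it is a data-only dump),
# then append one setval per serial column, for example:
cat >> dumped_postgres_database <<'SQL'
SELECT pg_catalog.setval('"users_id_seq"', (SELECT max(id) FROM users));
SQL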
Finally, psql can be used to load the data to your actual database
psql -d newpostgresdb < dumped_postgres_database
We have a daily process that pulls all data out of a number of tables in an Oracle database and imports them into a Postgres (EnterpriseDB) database - version 8.4.
We are currently using a Java application to select * from each table, change the keywords (date, timestamp, etc.) and then import the data into the Postgres database.
Are there any tools available in Postgres that would provide a more efficient manner of doing this? I should note that there are CLOBs that are being transported over.
There is Ora2Pg, which is intended as a one-time-migration tool, but it might work in your case as well. I think of it as an Oracle-to-PostgreSQL pg_dump.
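A hypothetical sketch of a data-only export with it (the config values and database names are placeholders); Ora2Pg maps CLOB to text and BLOB to bytea by default, so the CLOBs mentioned in the question are handled:
# assumes an ora2pg.conf with ORACLE_DSN, ORACLE_USER and ORACLE_PWD filled in
ora2pg -c ora2pg.conf -t COPY -o oracle_data.sql
psql -d target_db -f oracle_data.sql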