Caveats:
Let me first clarify that this is not a question about whether to use surrogate primary keys or not.
Also, this is NOT related to identity columns (SQL Server) / sequences (Oracle) and their pros / cons. I did get a fair bit of an idea about that thanks to this, this and this.
Question:
I come from a SQL Server background and have always used identity columns as surrogate primary keys for most tables.
Based on my knowledge of Oracle, I find that the nearest equivalent in Oracle is the SEQUENCE, which can be used to simulate something similar to Identity in SQL Server.
As I am new to Oracle and my database has 100+ tables, the main things I am concerned about are:
Considering I have to create a sequence for (almost) each table in Oracle, would this be the standard accepted implementation for simulating Identity, or is there a better / easier way to achieve this kind of implementation in Oracle?
Are there any specific gotchas related to having so many sequences in Oracle?
The system supports both Oracle 10g and 11g.
Considering I have to create a sequence for (almost) each table in Oracle, would this be the standard accepted implementation for simulating Identity, or is there a better / easier way to achieve this kind of implementation in Oracle?
Yes, it is very typical in Oracle to create a sequence for each table. It is possible to share one sequence across several tables, but a single sequence serving many/all tables risks becoming a bottleneck: see this AskTom Q&A.
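For completeness, here is a minimal sketch of the usual 10g/11g pattern (table, column, sequence and trigger names below are just placeholders): one sequence per table plus a BEFORE INSERT trigger that fills in the key when none is supplied, which is as close as those releases get to an identity column.

    CREATE SEQUENCE employees_seq START WITH 1 INCREMENT BY 1 CACHE 20;

    CREATE TABLE employees (
      employee_id NUMBER PRIMARY KEY,
      name        VARCHAR2(100)
    );

    -- Assign the next sequence value whenever no key is supplied
    CREATE OR REPLACE TRIGGER employees_bir
    BEFORE INSERT ON employees
    FOR EACH ROW
    WHEN (NEW.employee_id IS NULL)
    BEGIN
      SELECT employees_seq.NEXTVAL INTO :NEW.employee_id FROM dual;
    END;
    /

With this in place, an application can simply run INSERT INTO employees (name) VALUES ('Smith') and the key is generated automatically, much like an identity column.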
Are there any specific gotchas related to having so many sequences in Oracle?
None that I can think of.
100+ tables is not very many. I routinely work on databases with several hundred sequences, one for each table. The more the merrier.
It's even conceivable to have more sequences than tables - unlike identity columns in other DBMSs, sequences can be used for more than just creating surrogate key values.
An alternative is to use GUIDs - in Oracle you can call SYS_GUID to generate unique values.
A good article, followed by comments with pros and cons for both approaches: http://rwijk.blogspot.com/2009/12/sysguid.html
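To illustrate the GUID approach (table and column names are made up for the example), SYS_GUID() can simply be used as a column default, so no sequence or trigger is needed:

    -- RAW(16) key defaulted to SYS_GUID(); no sequence object required
    CREATE TABLE documents (
      doc_id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
      title  VARCHAR2(200)
    );

    INSERT INTO documents (title) VALUES ('first document');
    SELECT doc_id, title FROM documents;

The trade-offs between sequence-generated numbers and GUIDs (size, readability, index behaviour) are discussed in the article linked above.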
For a project I need two types of tables.
a hypertable (a special type of table provided by TimescaleDB, an extension of PostgreSQL) for some time-series records
my ordinary tables, which are not time series
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it? Is every table a hypertable (time series) in PostgreSQL TimescaleDB? If not, is there any overhead to storing my ordinary tables in PostgreSQL TimescaleDB?
If I can, is there any benefit to storing my ordinary tables in a separate, ordinary PostgreSQL database?
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it?
Absolutely... TimescaleDB is delivered as an extension to PostgreSQL and one of the biggest benefits is that you can use regular PostgreSQL tables alongside the specialist time-series tables. That includes using regular tables in SQL queries with hypertables. Standard SQL works, plus there are some additional functions that Timescale created using PostgreSQL's extensibility features.
Is every table a hypertable (time series) in PostgreSQL TimescaleDB?
No, you have to explicitly create a table as a hypertable for it to implement TimescaleDB features. It would be worth checking out the how-to guides in the Timescale docs for full (and up to date) details.
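A minimal sketch of what that looks like in practice (table and column names are invented for illustration): both kinds of tables live in the same database, only the table passed to create_hypertable becomes a hypertable, and the two can be joined with plain SQL.

    -- Ordinary PostgreSQL table: nothing TimescaleDB-specific about it
    CREATE TABLE devices (
      device_id   integer PRIMARY KEY,
      device_name text
    );

    -- Regular table that is then explicitly converted into a hypertable
    CREATE TABLE conditions (
      time        timestamptz NOT NULL,
      device_id   integer REFERENCES devices (device_id),
      temperature double precision
    );
    SELECT create_hypertable('conditions', 'time');

    -- Regular tables and hypertables can be joined in ordinary SQL
    SELECT d.device_name, avg(c.temperature)
    FROM conditions c
    JOIN devices d USING (device_id)
    GROUP BY d.device_name;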
If not, is there any overhead to storing my ordinary tables in PostgreSQL TimescaleDB?
I don't think there's a storage overhead. You might see some performance gains, e.g. in data ingest and query speed. This article may help clarify that: https://docs.timescale.com/timescaledb/latest/overview/how-does-it-compare/timescaledb-vs-postgres/
Overall you can think of TimescaleDB as providing additional functionality on top of 'vanilla' PostgreSQL, so unless there is an application-design reason to keep non-time-series data in a separate database, you aren't obliged to do that.
One other point, shared by a very experienced member of our Slack community [thank you Chris]:
To have time-series data and “normal” data (normalized) in one or separate databases for us came down to something like “can we asynchronously replicate the time-series information”?
In our case we use two different pg systems, one replicating asynchronously (for TimescaleDB) and one with synchronous replication (for all other data).
Transparency: I work for Timescale
I'm working with OpenStreetMap data and import it with tools into a Postgres database. One key term in OpenStreetMap is natural.
When this data is imported, one of the columns in the Postgres table is named natural.
The issue is that when reading the table in some clients, the attribute natural is represented as "natural", which leads to problems.
Is there a way to store "natural" as natural OR help the client to read it properly?
natural is a reserved keyword in Postgres:
https://www.postgresql.org/docs/current/sql-keywords-appendix.html
Reserved keywords have to be quoted if they are used as identifiers. If possible, choose a different name.
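A small sketch of the two options (the table name assumes an osm2pgsql-style import; substitute whatever your tooling produced):

    -- Reading the column requires quoting the identifier
    SELECT osm_id, "natural"
    FROM planet_osm_polygon
    WHERE "natural" = 'water';

    -- Or rename the column once so clients no longer need to quote it
    ALTER TABLE planet_osm_polygon RENAME COLUMN "natural" TO natural_feature;

Renaming is usually the cleaner fix if you control the schema; otherwise every client has to quote the identifier correctly.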
I have a use case to distribute data across many databases on many servers, all in postgres tables.
From any given server/db, I may need to query another server/db.
The queries are quite basic, standard selects with where clauses on standard fields.
I have currently implemented postgres_fdw (I'm using Postgres 9.5), but I think the queries are not using indexes on the remote DB.
For this use case (a random node may query N other nodes), which is likely my best performance choice based on how each underlying engine actually executes?
The Postgres foreign data wrapper (postgres_fdw) is newer to PostgreSQL, so it tends to be the recommended method. While the functionality in the dblink extension is similar to that in the foreign data wrapper, the Postgres foreign data wrapper is more SQL standard compliant and can provide improved performance over dblink connections.
Read this article for more detailed info: Cross Database querying
My solution was simple: I upgraded to Postgres 10, and it appears to push where clauses down to the remote server.
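For reference, a minimal postgres_fdw setup sketch (the server name, host, credentials and the orders table are placeholders). The use_remote_estimate option and EXPLAIN VERBOSE are useful when checking whether WHERE clauses are actually pushed down to the remote side, where they can use the remote indexes:

    -- On the local node: define the remote server and map a user
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER node_b
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'node-b.example.com', dbname 'appdb', port '5432',
               use_remote_estimate 'true');  -- ask the remote planner for row estimates

    CREATE USER MAPPING FOR CURRENT_USER
      SERVER node_b
      OPTIONS (user 'app_user', password 'secret');

    -- Pull in the remote table definitions
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
      FROM SERVER node_b INTO public;

    -- The Remote SQL line in the plan shows whether the filter was pushed down
    EXPLAIN VERBOSE SELECT * FROM orders WHERE customer_id = 42;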
We have a large table in our Postgres production database which we want to start "sharding" using foreign tables and inheritance.
The desired architecture will be to have 1 (empty) table that defines the schema and several foreign tables inheriting from the empty "parent" table. (possible with Postgres 9.5)
I found this well written article https://www.depesz.com/2015/04/02/waiting-for-9-5-allow-foreign-tables-to-participate-in-inheritance/ that explains everything on how to do it from scratch.
My question is how to reduce the needed migration of data to a minimum.
We have this 100+ GB table now that should become our first "shard", and in the future we will regularly add new "shards". At some point, the older shards will be moved to another tablespace (on cheaper hardware, since they become less important).
My question now:
Is there a way to "ALTER" an existing table to be a foreign table instead?
There is no way to use ALTER TABLE to do this.
You basically have to do it manually. This is really no different from doing table partitioning: you create your partitions, you load the data, and you direct reads and writes to the partitions.
Now in your case, in terms of doing sharding, there are a number of tools I would look at to make this less painful. First, if you make sure your tables are split the way you like them, you can use a logical replication solution like Bucardo to replicate the writes while you are moving everything over.
There are some other approaches (parallelized readers and writers) that may save you some time at the expense of db load, but those are niche tools.
There is no native solution for shard management of standard PostgreSQL (and I don't know enough about Postgres-XL in this regard to know how well it can manage changing shard criteria). However pretty much anything is possible with a little work and knowledge.
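A sketch of the target layout under the 9.5 feature described in the linked article (server, host, credential and table names are invented): an empty parent table that defines the schema, plus a foreign table on a remote node that inherits from it.

    -- Empty parent table that only defines the schema
    CREATE TABLE measurements (
      id          bigint,
      recorded_at timestamptz,
      payload     jsonb
    );

    -- Remote shard attached as a foreign table that inherits from the parent
    CREATE SERVER shard_2015 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'shard2015.internal', dbname 'metrics');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard_2015
      OPTIONS (user 'app', password 'secret');

    CREATE FOREIGN TABLE measurements_2015 ()
      INHERITS (measurements)
      SERVER shard_2015
      OPTIONS (table_name 'measurements_2015');

    -- Queries against the parent now also scan the foreign "shard"
    SELECT count(*) FROM measurements WHERE recorded_at >= '2015-01-01';

The existing 100+ GB table still has to be loaded into the remote node by some bulk method (dump/restore, COPY, or a replication tool as mentioned above); the inheritance setup only wires the shard into the parent.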
Due to the added advantage of high performance and reduced turnaround time, I am trying to migrate all the data from IBM DB2 to Netezza in my organization.
But what I realized is that there is no concept of a primary key in Netezza. Is that true? If so, I can try to take care of this issue by using a duplicate-removal stage in DataStage.
Also, could you please help me understand whether there are any more constraints I should consider, or challenges I could face, in a DB2 to Netezza migration?
Netezza does allow you to specify Primary Key and Foreign Key constraints, but they are not enforced. Which is to say that they are purely informational (for both the user and the optimizer). A well-formed upsert process in ETL is a good way to manage for this.
On the topic of other issues you may face, here are a few thoughts:
Surrogate Keys
Be sure that you generate your surrogate keys either with Netezza's SEQUENCE object, or with a surrogate key generator in your ETL tool. Avoid using ROW_NUMBER for this process as it will most often prevent you from exploiting the parallel nature of the system when used in this way.
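As a rough illustration of the set-based pattern (the table and sequence names are hypothetical, and the exact SEQUENCE options may vary by Netezza release), the sequence is applied in a single INSERT ... SELECT rather than row by row:

    -- Hypothetical example; check the CREATE SEQUENCE options for your release
    CREATE SEQUENCE sq_customer_key AS BIGINT START WITH 1 INCREMENT BY 1;

    INSERT INTO dim_customer (customer_key, customer_name)
    SELECT NEXT VALUE FOR sq_customer_key, src.customer_name
    FROM stage_customer src;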
Stored Procedures
Stored procedures should avoid row-by-row/cursor-based processing when possible, as this is another case where you may prevent yourself from exploiting the parallel nature of the system.
SQL Extension Functions
If you find that you rely on functions that exist in DB2 but are not natively available in Netezza, be sure to check what is available in the SQL Extensions Toolkit, which is included with Netezza but not automatically installed/configured.
MERGE
If you rely on MERGE in your current environment, be aware that you must be on v7.2.1 to use MERGE in Netezza. Otherwise you will have to break it out into an INSERT/UPDATE operation.
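If you do end up on a release without MERGE, the usual workaround looks roughly like this (table and column names are placeholders): update the matching rows first, then insert the rest.

    -- Pre-7.2.1 alternative to MERGE: update matches, then insert non-matches
    UPDATE target
    SET    amount = s.amount
    FROM   staging s
    WHERE  target.business_key = s.business_key;

    INSERT INTO target (business_key, amount)
    SELECT s.business_key, s.amount
    FROM   staging s
    WHERE  NOT EXISTS (
      SELECT 1 FROM target t WHERE t.business_key = s.business_key
    );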
Once you load the data in Netezza, one method we have utilized is to create a View to access the data and only expose the view. The view would have the logic inside to remove the duplicates.
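A sketch of that view-based approach (table and column names are invented): a window function picks one row per business key, so consumers of the view never see the duplicates.

    -- Expose only the latest row per business key, hiding duplicates
    CREATE VIEW v_customer AS
    SELECT customer_key, customer_name, load_ts
    FROM (
      SELECT c.*,
             ROW_NUMBER() OVER (PARTITION BY customer_key
                                ORDER BY load_ts DESC) AS rn
      FROM customer c
    ) d
    WHERE rn = 1;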
Good luck!
Delan