I am reading a legacy data warehouse in PostgreSQL and found a list of tables named like
command:
\dt
result:
abc_1
abc_2
abc_3
...
abc_10000
What do these sequentially named tables mean in the context of a PostgreSQL data warehouse? Why don't we just merge them into one table, abc?
It is extremely likely that these will be partitions of a parent table abc. Check with \d+ abc_1. Does it mention any inheritance or parent table?
Even if they aren't part of an inheritance scheme, it's likely to be partitioning, just handled at the application level.
Partitioning is a useful workaround for limitations in the database engine. In an ideal world it wouldn't be necessary, but in reality it can help some workloads and query patterns.
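For illustration, here is a minimal sketch of what the declarative form of this could look like (PostgreSQL 10+; the columns and the range key are hypothetical, since the question doesn't show the table definition). Older warehouses typically use inheritance instead, i.e. CREATE TABLE abc_1 () INHERITS (abc).

CREATE TABLE abc (
    id   bigint      NOT NULL,
    ts   timestamptz NOT NULL,
    data text
) PARTITION BY RANGE (id);

-- each abc_N holds one slice of the key space
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES FROM (1)      TO (100001);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES FROM (100001) TO (200001);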
Related
I've been reading about logical replication in PostgreSQL, which seems to be a very good solution for sharing a small number of tables among several databases. My case is even simpler, as my subscribers will only use source tables in a read-only fashion.
I know that I can add extra columns to a subscribed table in the subscribing node, but what if I only want to import a subset of the whole set of columns of a source table? Is it possible or will it throw an error?
For example, my source table, product, has a lot of columns, many of them irrelevant to my subscriber databases. Would it be feasible to create replicas of product with only the really needed columns at each subscriber?
The built-in publication/subscription method does not support this. But the logical replication framework also supports any other decoding plugin you can write (or get someone else to write) and install, so you could make this happen that way. It looks like pglogical already supports this ("Selective replication of table columns at publisher side"), but I have never tried to use this feature myself.
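If you try the pglogical route, column filtering is specified when the table is added to a replication set. A sketch, assuming pglogical 2 on the provider and hypothetical table/column names (again, untested by me):

SELECT pglogical.replication_set_add_table(
    set_name := 'default',
    relation := 'public.product'::regclass,
    columns  := ARRAY['product_id', 'name', 'price']
);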
As of v15, PostgreSQL supports publishing a table partially, by specifying which of its columns are to be replicated.
A case like this can be done now:
CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname);
See https://www.postgresql.org/docs/15/sql-createpublication.html
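On the subscribing side nothing special is required; the subscriber's table just has to contain (at least) the published columns. A sketch with hypothetical connection details:

CREATE TABLE users (user_id int PRIMARY KEY, firstname text);

CREATE SUBSCRIPTION users_sub
    CONNECTION 'host=source.example.com dbname=app user=repl'
    PUBLICATION users_filtered;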
I am trying to find the best way to build a database relation. I need to create a table that will contain data split across other tables in different databases. All the tables have exactly the same structure (same number of columns, names, and types).
In a single database, I would create a parent table with partitions. However, the volume of data is too big for a single database, which is why I am trying to split it. From the Postgres documentation, what I think I am trying to do is "Multiple-Server Parallel Query Execution".
At the moment the only solution I can think of is to build an API that knows each database's address and uses it to pull data across the network into the main parent database when needed. I also found a Postgres extension called Citus that might do the job, but I don't know how to implement a unique key across multiple databases (or shards, as Citus calls them).
Is there any better way to do it?
Citus would most likely solve your problem. It lets you use unique keys across shards if the key is the distribution column, or if it is a composite key that contains the distribution column.
You can also use a distributed partitioned table in Citus: a table partitioned on some column (a timestamp?) and hash-distributed on some other column (like the one you use in your existing approach). Query parallelization and data collection are then handled by Citus for you.
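A sketch of that combination, with hypothetical table and column names (create_distributed_table is the actual Citus function; note that the primary key includes the distribution column, per the restriction above):

CREATE TABLE events (
    device_id  bigint      NOT NULL,
    event_time timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (device_id, event_time)
) PARTITION BY RANGE (event_time);

-- hash-distribute the table across worker nodes on device_id
SELECT create_distributed_table('events', 'device_id');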
Is it possible to get the table structure, as db2look produces it, from SQL?
Or is the command line the only way? I could wrap a call to db2look in an external stored procedure written in C, but that is not what I am looking for.
Clarification added later:
I want to know, from SQL, which tables have the non-logged option.
It is possible to reconstruct the table structure from regular SQL and the public DB2 catalog; however, it is complex and requires some deeper skills.
The metadata is available in the DB2 catalog views in the SYSCAT schema. For a regular table you would start by looking at SYSCAT.TABLES and SYSCAT.COLUMNS. From there you would branch off to other views, depending on which table and column options you are after: time-travel tables, special partitioning rules, and many other options each involve additional views.
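As a starting point, a sketch of the kind of catalog query involved (MYSCHEMA is a placeholder):

SELECT c.TABNAME, c.COLNAME, c.TYPENAME, c.LENGTH, c.SCALE, c.NULLS
FROM SYSCAT.COLUMNS c
JOIN SYSCAT.TABLES t
  ON t.TABSCHEMA = c.TABSCHEMA AND t.TABNAME = c.TABNAME
WHERE t.TYPE = 'T'              -- regular tables only
  AND t.TABSCHEMA = 'MYSCHEMA'
ORDER BY c.TABNAME, c.COLNO;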
Serge Rielau published an article on developerWorks called "Backup and restore SQL schemas for DB2 Universal Database" that provides a set of stored procedures that do exactly what you're looking for.
The article is quite old (2006), so you may need to put in some time to update the procedures to handle features added to DB2 since publication, but they may work for you as-is and are a nice jumping-off point.
We have a large table in our Postgres production database which we want to start "sharding" using foreign tables and inheritance.
The desired architecture is to have one (empty) table that defines the schema and several foreign tables inheriting from the empty "parent" table (possible since Postgres 9.5).
I found this well-written article, https://www.depesz.com/2015/04/02/waiting-for-9-5-allow-foreign-tables-to-participate-in-inheritance/, that explains how to do it from scratch.
My question is how to reduce the needed migration of data to a minimum.
We have this 100+ GB table now, which should become our first "shard", and in the future we will regularly add new "shards". At some point, the older shards will be moved to another tablespace (on cheaper hardware, since they become less important).
My question now:
Is there a way to "ALTER" an existing table to be a foreign table instead?
There is no way to use ALTER TABLE to do this.
You really have to do it manually. This is no different (really) from doing table partitioning: you create your partitions, you load the data, and you direct reads and writes to the partitions.
Now, in your case, there are a number of tools I would look at to make this less painful. First, if you make sure your tables are split the way you want them, you can use a logical replication solution like Bucardo to replicate the writes while you are moving everything over.
There are some other approaches (parallelized readers and writers) that may save you some time at the expense of db load, but those are niche tools.
There is no native solution for shard management in standard PostgreSQL (and I don't know enough about Postgres-XL in this regard to say how well it can manage changing shard criteria). However, pretty much anything is possible with a little work and knowledge.
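For the mechanics from the linked depesz article, a minimal sketch (server name, host, credentials, and columns are all hypothetical):

-- empty parent that defines the schema
CREATE TABLE measurements (
    id         bigint NOT NULL,
    created_at timestamptz NOT NULL,
    value      double precision
);

-- a remote shard, attached via inheritance (PostgreSQL 9.5+)
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard_2015 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'warehouse');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard_2015
    OPTIONS (user 'app', password 'secret');

CREATE FOREIGN TABLE measurements_2015 ()
    INHERITS (measurements)
    SERVER shard_2015
    OPTIONS (table_name 'measurements_2015');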
Caveats:
Let me first clarify that this is not a question about whether to use surrogate primary keys or not.
Also, this is NOT related to identities (SQL Server) / sequences (Oracle) and their pros / cons. I did get a fair bit of an idea about that thanks to this, this and this
Question:
I come from a SQL Server background and have always used identity columns as surrogate primary keys for most tables.
Based on my knowledge of Oracle, I find that the nearest equivalent in Oracle is the SEQUENCE, which can be used to simulate something similar to identity in SQL Server.
As I am new to Oracle and my database has 100+ tables, the main things I am concerned about are:
Considering I have to create a sequence for (almost) each table in Oracle, would this be the standard accepted implementation for simulating identity, or is there a better / easier way to achieve this kind of implementation in Oracle?
Are there any specific GOTCHAs related to having so many sequences in Oracle?
The system supports both Oracle 10g and 11g.
Considering I have to create a sequence for (almost) each table in Oracle, would this be the standard accepted implementation for simulating identity, or is there a better / easier way to achieve this kind of implementation in Oracle?
Yes, it is very typical in Oracle to create a sequence for each table. It is possible to use the same sequence for several tables, but you run the risk of making it a bottleneck by using a single sequence for many/all tables: see this AskTom Q&A.
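A sketch of the usual pre-12c pattern, with hypothetical table and sequence names (the SELECT ... INTO form works on both 10g and 11g; direct assignment of NEXTVAL in PL/SQL only works from 11g on):

CREATE TABLE orders (
    order_id NUMBER PRIMARY KEY,
    ordered  DATE   NOT NULL
);

CREATE SEQUENCE orders_seq;

CREATE OR REPLACE TRIGGER orders_bi
BEFORE INSERT ON orders
FOR EACH ROW
WHEN (NEW.order_id IS NULL)
BEGIN
    SELECT orders_seq.NEXTVAL INTO :NEW.order_id FROM dual;
END;
/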
Are there any specific GOTCHAs related to having so many sequences in Oracle?
None that I can think of.
100+ tables is not very many. I routinely work on databases with several hundred sequences, one for each table. The more the merrier.
It's even conceivable to have more sequences than tables - unlike identity columns in other DBMSs, sequences can be used for more than just creating surrogate key values.
An alternative is to use GUIDs - in Oracle you can call SYS_GUID to generate unique values.
A good article, followed by comments with pros and cons for both approaches: http://rwijk.blogspot.com/2009/12/sysguid.html
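For completeness, a quick sketch of the GUID alternative (hypothetical table; SYS_GUID is built in and can be used as a column default):

CREATE TABLE documents (
    doc_id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    title  VARCHAR2(200)
);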