How to create a PostgreSQL partitioned sequence? - postgresql

Is there a simple (ie. non-hacky) and race-condition free way to create a partitioned sequence in PostgreSQL. Example:
Using a normal sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
Using a partitioned sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |

I do not believe there is a simple way that is as easy as regular sequences, because:
A sequence stores only one number stream (next value, etc.). You want one for each partition.
Sequences have special handling that bypasses the current transaction (to avoid the race condition). It is hard to replicate this at the SQL or PL/pgSQL level without using tricks like dblink.
The DEFAULT column property can use a simple expression or a function call like nextval('myseq'); but it cannot refer to other columns to inform the function which stream the value should come from.
You can make something that works, but you probably won't think it simple. Addressing the above problems in turn:
Use a table to store the next value for all partitions, with a schema like multiseq (partition_id, next_val).
Write a multinextval(seq_table, partition_id) function that does something like the following:
Create a new transaction independent on the current transaction (one way of doing this is through dblink; I believe some other server languages can do it more easily).
Lock the table mentioned in seq_table.
Update the row where the partition id is partition_id, with an incremented value. (Or insert a new row with value 2 if there is no existing one.)
Commit that transaction and return the previous stored id (or 1).
Create an insert trigger on your projects table that uses a call to multinextval('projects_table', NEW.Project_ID) for insertions.
I have not used this entire plan myself, but I have tried something similar to each step individually. Examples of the multinextval function and the trigger can be provided if you want to attempt this...

Related

PostgreSQL: on database restart, why is starting sequence number unpredictable?

OS: macOS 11.4 (Big Sur)
PostgreSQL: 13.4
I would expect the default behavior of sequence numbers (that is, auto-generated sequences used typically for PK generation on record-inserts) to be straightforward on server re-starts: namely, that sequence numbers always "start where they left off". If the last record inserted had an auto-sequenced ID of 5, then the next record-insert should have ID of 6. And so on.
But recently, more than once, I have observed less than desirable default behavior for sequence numbers. Here are two different observations, both presumably resulting from the same suspect behavior after database server re-starts:
Let's suppose the record in your table with ID of 1 was deleted, but that records with ID 2-5 exist. Then on server re-start, the sequence number started at 1. The first record insert works (that is, a record with ID of 1 was successfully inserted). Then the next few inserts result in PK-duplicate exceptions! Once the sequence number reaches 6, inserts start working again.
Again, let's suppose records in your table exist for IDs 2-5. Then after server re-start, the sequence number starts at some larger number, like 35! In this case, a large swath of IDs between (exclusively) 5-35 are unused (making it seem as if there were records that were deleted with those IDs).
This certainly seems awkward behavior. Is there some way to set up sequence numbers to avoid this behavior?
Sample sequence number from my database:
mydb=# \dS+ birthday_id_seq
Sequence "public.birthday_id_seq"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1 | 1 | 9223372036854775807 | 1 | no | 1
mydb=# \dS+ birthdays
Table "public.birthdays"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------------+-----------+----------+--------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('birthday_id_seq'::regclass) | plain | |
birthdate | date | | | | plain | |
Indexes:
"birthdays_pkey" PRIMARY KEY, btree (id)
Access method: heap
mydb=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+---------------------+----------+-------------+-------------+------------+-------------
public | birthday_id_seq | sequence | kodecharlie | permanent | 8192 bytes |
(1 rows)
That is normal behavior:
Any sequence values that were already fetched by nextval, but never used in an INSERT that got committed, will be lost. That could happen if you perform a fast (or an immediate) shutdown while the INSERT was taking place.
Moreover, the first time you run nextval, PostgreSQL logs a WAL entry that consumes the next 32 values, so that it doesn't have to log each individual nextval. These values are lost after a restart.
As for the sequence going backwards after a restart:
Sequences, like all other objects, are WAL logged. WAL is guaranteed to be flushed during commit. Now if you start a transaction, fetch a sequence value and perform an insert, but don't commit the transaction yet, the changes to the sequence may still be in WAL buffers and not flushed to disk.
A crash that interrupts the transaction will cause the sequence to be reset to the last committed value, so you may get the same sequence number again. That is fine, because any sequence values fetched from the sequence since have not been committed either.
Which of the two behaviors you see depends on concurrent transactions: Typically, you will see missing values after a restart. But if you start a transaction, call nextval and crash the database without committing, you may see the same sequence value again after a restart.
You may want to read my article for more details.

Versioning in the database

I want to store full versioning of the row every time a update is made for amount sensitive table.
So far, I have decided to use the following approach.
Do not allow updates.
Every time a update is made create a new
entry in the table.
However, I am undecided on what is the best database structure design for this change.
Current Structure
Primary Key: id
id(int) | amount(decimal) | other_columns
First Approach
Composite Primary Key: id, version
id(int) | version(int) | amount(decimal) | change_reason
1 | 1 | 100 |
1 | 2 | 20 | correction
Second Approach
Primary Key: id
Uniqueness Index on [origin_id, version]
id(int) | origin_id(int) | version(int) | amount(decimal) | change_reason
1 | NULL | 1 | 100 | NULL
2 | 1 | 2 | 20 | correction
I would suggest a new table which store unique id for item. This serves as lookup table for all available items.
item Table:
id(int)
1000
For the table which stores all changes for item, let's call it item_changes table. item_id is a FOREIGN KEY to item table's id. The relationship between item table to item_changes table, is one-to-many relationship.
item_changes Table:
id(int) | item_id(int) | version(int) | amount(decimal) | change_reason
1 | 1000 | 1 | 100 | NULL
2 | 1000 | 2 | 20 | correction
With this, item_id will never be NULL as it is a valid FOREIGN KEY to item table.
The best method is to use Version Normal Form (vnf). Here is an answer I gave for a neat way to track all changes to specific fields of specific tables.
The static table contains the static data, such as PK and other attributes which do not change over the life of the entity or such changes need not be tracked.
The version table contains all dynamic attributes that need to be tracked. The best design uses a view which joins the static table with the current version from the version table, as the current version is probably what your apps need most often. Triggers on the view maintain the static/versioned design without the app needing to know anything about it.
The link above also contains a link to a document which goes into much more detail including queries to get the current version or to "look back" at any version you need.
Why you are not going for SCD-2 (Slowly Changing Dimension), which is a rule/methodology to describe the best solution for your problem. Here is the SCD-2 advantage and example for using, and it makes standard design pattern for the database.
Type 2 - Creating a new additional record. In this methodology, all history of dimension changes is kept in the database. You capture attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain as attributes the natural key(or other durable identifiers). Also 'effective date' and 'current indicator' columns are used in this method. There could be only one record with the current indicator set to 'Y'. For 'effective date' columns, i.e. start_date, and end_date, the end_date for current record usually is set to value 9999-12-31. Introducing changes to the dimensional model in type 2 could be very expensive database operation so it is not recommended to use it in dimensions where a new attribute could be added in the future.
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 02-Apr-2018 N
2 80 04-Apr-2018 NULL Y
Detail Explanation::::
Here, all you need to add the 3 extra column, START_DATE, END_DATE, CURRENT_FLAG to track your record properly. When the first time record inserted # source, this table will be store the value as:
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 NULL Y
And, when the same record will be updated then you have to update the "END_DATE" of the previous record as current_system_date and "CURRENT_FLAG" as "N", and insert the second record as below. So you can track everything about your records. as below...
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 02-Apr-2018 N
2 80 04-Apr-2018 NULL Y

What is the column limit for Spark Data Frames?

Our team is having a lot of issues with the Spark API particularly with large schema tables. We currently have a program written in Scala that utilizes the Apache spark API to create two Hive tables from raw files. We have one particularly very large raw data file that is giving us issues that contains around ~4700 columns and ~200,000 rows.
Every week we get a new file that shows the updates, inserts and deletes that happened in the last week. Our program will create two tables – a master table and a history table. The master table will be the most up to date version of this table while the history table shows all changes inserts and updates that happened to this table and showing what changed. For example, if we have the following schema where A and B are the primary keys:
Week 1 Week 2
|-----|-----|-----| |-----|-----|-----|
| A | B | C | | A | B | C |
|-----|-----|-----| |-----|-----|-----|
| 1 | 2 | 3 | | 1 | 2 | 4 |
|-----|-----|-----| |-----|-----|-----|
Then the master table will now be
|-----|-----|-----|
| A | B | C |
|-----|-----|-----|
| 1 | 2 | 4 |
|-----|-----|-----|
And The history table will be
|-----|-----|-------------------|----------------|-------------|-------------|
| A | B | changed_column | change_type | old_value | new_value |
|-----|-----|-------------------|----------------|-------------|-------------|
| 1 | 2 | C | Update | 3 | 4 |
|-----|-----|-------------------|----------------|-------------|-------------|
This process is working flawlessly for shorter schema tables. We have a table that has 300 columns but over 100,000,000 rows and this code still runs as expected. The process above for the larger schema table runs for around 15 hours, and then crashes with the following error:
Exception in thread "main" java.lang.StackOverflowError
at scala.collection.generic.Growable$class.loop$1(Growable.scala:52)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
Here is a code example that takes around 4 hours to run for this larger table, but runs in 20 seconds for other tables:
var dataframe_result = dataframe1.join(broadcast(dataframe2), Seq(listOfUniqueIds:_*)).repartition(100).cache()
We have tried all of the following with no success:
Using hash broad-cast joins (dataframe2 is smaller, dataframe1 is huge)
Repartioining on different numbers, as well as not repartitioning at all
Caching the result of the dataframe (we originally did not do this).
What is causing this error and how can we fix it? The only difference between this problem table is that it has so many columns. Is there an upper limit to how many columns Spark can handle?
Note: We are running this code on a very large MAPR cluster and we tried giving the code 500GB of RAM and its still failing.

How to properly index strings for lookup and excepts, the PostgreSQL way

Due to infrastructure costs, I've been studying the possibility to migrate a few databases to PostgreSQL. So far I am loving it. But there are a few topics I am quite lost. I need some guidance on one of them.
I have an ETL process that queries "deltas" in my database and imports the new data. To do so, I use lookup tables that store hashbytes of some strings to facilitate the lookup. This works in SQL Server, but apparently things work quite differently in PostgreSQL. In SQL Server, using hashbytes + except is suggested when working with millions of rows.
Let's suppose the following table
+----+-------+------------------------------------------+
| Id | Name | hash_Name |
+----+-------+------------------------------------------+
| 1 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
| 2 | Pablo | ce7169ba6c7dea1ca07fdbff5bd508d4bb3e5832 |
| 3 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+----+-------+------------------------------------------+
And my lookup table
+------------------------------------------+
| hash_Name |
+------------------------------------------+
| 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+------------------------------------------+
When querying new data (Pablo's hash), I can advance from the simplified query bellow:
SELECT hash_name
FROM mytable
EXCEPT
SELECT hash_name
FROM mylookup
Thinking the PostgreSQL way, how could I achieve this? Should I index and use EXCEPT? Or is there a better way of doing so?
From my research, I couldn't find much regarding storing hashbytes. Apparently, it is a matter of creating indexes and choosing the right index for the job. More precisely: BTREE for single field indexes and GIN for multiple field indexes.

DB2 table partitioning and delete old records based on condition

I have a table with few million records.
___________________________________________________________
| col1 | col2 | col3 | some_indicator | last_updated_date |
-----------------------------------------------------------
| | | | yes | 2009-06-09.12.2345|
-----------------------------------------------------------
| | | | yes | 2009-07-09.11.6145|
-----------------------------------------------------------
| | | | no | 2009-06-09.12.2345|
-----------------------------------------------------------
I have to delete records which are older than month with some_indicator=no.
Again I have to delete records older than year with some_indicator=yes.This job will run everyday.
Can I use db2 partitioning feature for above requirement?.
How can I partition table using last_updated_date column and above two some_indicator values?
one partition should contain records falling under monthly delete criterion whereas other should contain yearly delete criterion records.
Are there any performance issues associated with table partitioning if this table is being frequently read,upserted?
Any other best practices for above requirement will surely help.
I haven't done much with partitioning (I've mostly worked with DB2 on the iSeries), but from what I understand, you don't generally want to be shuffling things between partitions (ie - making the partition '1 month ago'). I'm not even sure if it's even possible. If it was, you'd have to scan some (potentially large) portion of your table every day, just to move it (select, insert, delete, in a transaction).
Besides which, partitioning is a DB Admin problem, and it sounds like you just have a DB User problem - namely, deleting 'old' records. I'd just do this in a couple of statements:
DELETE FROM myTable
WHERE some_indicator = 'no'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
and
DELETE FROM myTable
WHERE some_indicator = 'yes'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 YEAR, TIME('00:00:00'))
.... and you can pretty much ignore using a transaction, as you want the rows gone.
(as a side note, using 'yes' and 'no' for indicators is terrible. If you're not on a version that has a logical (boolean) type, store character '0' (false) and '1' (true))