We are using PostgreSQL 9.5 (AWS RDS) and have about 100 tables, but only one table's sequence is behaving weirdly. The current maximum value in the table where the sequence is used is 670, yet nextval('seq1') returns 20000, and when I run nextval again it returns something like 20090, then 20122, and so on.
So there is no fixed increment (1, 2, or 100); the sequence advances by seemingly arbitrary amounts. The above is what happens when I call nextval myself, but the sequence value also moves forward without nextval being called explicitly. I have checked all the system logs and our code as well, but I cannot find how the sequence gets updated without any insert running against the table. The sequence is not used anywhere else; there is only one table that uses it.
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1 | 1 | 9223372036854775807 | 1 | no | 1
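If it helps to verify which table and column actually own this sequence, the dependency catalog can be queried directly; a minimal sketch, using the sequence name from the question:

-- Lists the table/column the sequence is attached to (deptype 'a' is the
-- "owned by" dependency created by serial columns or ALTER SEQUENCE ... OWNED BY).
SELECT d.refobjid::regclass AS table_name,
       a.attname            AS column_name
FROM   pg_depend d
JOIN   pg_attribute a ON a.attrelid = d.refobjid
                     AND a.attnum   = d.refobjsubid
WHERE  d.objid   = 'seq1'::regclass
  AND  d.deptype = 'a';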
Index bloat is reaching 57%, while table bloat is only 9%, and autovacuum_vacuum_scale_factor is only 10%.
What is more surprising is that even the primary key has 57% bloat. My understanding is that, since my primary key is auto-incrementing and a single-column key, once 10% of the table's tuples are dead the primary key index should also have about 10% dead tuples.
When autovacuum runs at the 10% dead-tuple threshold, it cleans up the dead tuples. The reclaimed space then counts as bloat and should be reused by new updates and inserts. But this isn't happening in my database; the bloat size keeps growing.
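To check whether autovacuum is actually keeping up on this table, the statistics view can be queried directly; a small sketch, using the table name from the output below:

-- Dead vs. live tuples and the most recent (auto)vacuum runs for the table.
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, autovacuum_count
FROM   pg_stat_user_tables
WHERE  relname = 'data_entity';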
FYI:
Index Bloat:
 current_database | schemaname | tblname     | idxname          | real_size  | extra_size | extra_ratio      | fillfactor | bloat_size | bloat_ratio      | is_na
------------------+------------+-------------+------------------+------------+------------+------------------+------------+------------+------------------+-------
 stackdb          | public     | data_entity | data_entity_pkey | 2766848000 | 1704222720 | 61.5943745373797 |         90 | 1585192960 | 57.2923760177646 |
Table Bloat:
 current_database | schemaname | tblname     | real_size   | extra_size | extra_ratio      | fillfactor | bloat_size | bloat_ratio      | is_na
------------------+------------+-------------+-------------+------------+------------------+------------+------------+------------------+-------
 stackdb          | public     | data_entity | 10106732544 | 1007288320 | 9.96650812332014 |        100 | 1007288320 | 9.96650812332014 | f
Autovacuum Settings:
stackdb=> show autovacuum_vacuum_scale_factor;
autovacuum_vacuum_scale_factor
--------------------------------
0.1
(1 row)
stackdb=> show autovacuum_vacuum_threshold;
autovacuum_vacuum_threshold
-----------------------------
50
(1 row)
Note:
autovacuum is on
autovacuum is running successfully at defined intervals.
PostgreSQL is running version 10.6. The same issue has been found with version 12.x.
First: an index bloat of 57% is totally healthy. Don't worry.
Indexes become more bloated than tables, because the empty space cannot be reused as freely as it can be in a table. The table, also known as the “heap”, has no predetermined ordering: if a new row is written as the result of an INSERT or UPDATE, it ends up in the first page that has enough free space, so it is easy to keep bloat low if VACUUM does its job.
B-tree indexes are different: their entries have a certain ordering, so the database is not free to choose where to put the new row. So you may have to put it into a page that is already full, causing a page split, while elsewhere in the index there are pages that are almost empty.
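If you want to measure this directly rather than estimate it, the pgstattuple extension (assuming you can install it) reports the actual leaf density of an index; roughly:

-- Requires the pgstattuple extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- A leaf density well below the 90% fillfactor is expected for a btree
-- that keeps receiving updates and deletes.
SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('data_entity_pkey');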
OS: macOS 11.4 (Big Sur)
PostgreSQL: 13.4
I would expect the default behavior of sequence numbers (that is, auto-generated sequences used typically for PK generation on record-inserts) to be straightforward on server re-starts: namely, that sequence numbers always "start where they left off". If the last record inserted had an auto-sequenced ID of 5, then the next record-insert should have ID of 6. And so on.
But recently, more than once, I have observed less than desirable default behavior for sequence numbers. Here are two different observations, both presumably resulting from the same suspect behavior after database server re-starts:
Let's suppose the record in your table with ID of 1 was deleted, but that records with ID 2-5 exist. Then on server re-start, the sequence number started at 1. The first record insert works (that is, a record with ID of 1 was successfully inserted). Then the next few inserts result in PK-duplicate exceptions! Once the sequence number reaches 6, inserts start working again.
Again, let's suppose records in your table exist for IDs 2-5. Then after server re-start, the sequence number starts at some larger number, like 35! In this case, a large swath of IDs between (exclusively) 5-35 are unused (making it seem as if there were records that were deleted with those IDs).
This certainly seems like awkward behavior. Is there some way to set up sequence numbers to avoid this behavior?
Sample sequence number from my database:
mydb=# \dS+ birthday_id_seq
Sequence "public.birthday_id_seq"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1 | 1 | 9223372036854775807 | 1 | no | 1
mydb=# \dS+ birthdays
Table "public.birthdays"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------------+-----------+----------+--------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('birthday_id_seq'::regclass) | plain | |
birthdate | date | | | | plain | |
Indexes:
"birthdays_pkey" PRIMARY KEY, btree (id)
Access method: heap
mydb=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+---------------------+----------+-------------+-------------+------------+-------------
public | birthday_id_seq | sequence | kodecharlie | permanent | 8192 bytes |
(1 row)
That is normal behavior:
Any sequence values that were already fetched by nextval, but never used in an INSERT that got committed, will be lost. That could happen if you perform a fast (or an immediate) shutdown while the INSERT was taking place.
Moreover, the first time you run nextval, PostgreSQL logs a WAL entry that consumes the next 32 values, so that it doesn't have to log each individual nextval. These values are lost after a restart.
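You can watch this bookkeeping by selecting from the sequence itself (in PostgreSQL 10 and later this shows last_value, log_cnt and is_called), and you can also see values disappear when a transaction that called nextval rolls back:

-- log_cnt = how many more nextval calls can be served before another
-- WAL record has to be written for this sequence.
SELECT last_value, log_cnt, is_called FROM birthday_id_seq;

-- nextval is never rolled back, so values fetched in an aborted
-- transaction are simply lost and show up as gaps in the id column.
BEGIN;
SELECT nextval('birthday_id_seq');   -- say this returns 6
ROLLBACK;
SELECT nextval('birthday_id_seq');   -- returns 7, not 6 again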
As for the sequence going backwards after a restart:
Sequences, like all other objects, are WAL logged. WAL is guaranteed to be flushed during commit. Now if you start a transaction, fetch a sequence value and perform an insert, but don't commit the transaction yet, the changes to the sequence may still be in WAL buffers and not flushed to disk.
A crash that interrupts the transaction will cause the sequence to be reset to the last committed value, so you may get the same sequence number again. That is fine, because any sequence values fetched from the sequence since have not been committed either.
Which of the two behaviors you see depends on concurrent transactions: Typically, you will see missing values after a restart. But if you start a transaction, call nextval and crash the database without committing, you may see the same sequence value again after a restart.
You may want to read my article for more details.
Our team is having a lot of issues with the Spark API, particularly with large-schema tables. We currently have a program written in Scala that uses the Apache Spark API to create two Hive tables from raw files. One particularly large raw data file is giving us trouble: it contains around ~4700 columns and ~200,000 rows.
Every week we get a new file that shows the updates, inserts and deletes that happened in the last week. Our program creates two tables: a master table and a history table. The master table is the most up-to-date version of the data, while the history table records all inserts and updates that happened to the table and shows what changed. For example, if we have the following schema where A and B are the primary keys:
Week 1 Week 2
|-----|-----|-----| |-----|-----|-----|
| A | B | C | | A | B | C |
|-----|-----|-----| |-----|-----|-----|
| 1 | 2 | 3 | | 1 | 2 | 4 |
|-----|-----|-----| |-----|-----|-----|
Then the master table will now be
|-----|-----|-----|
| A | B | C |
|-----|-----|-----|
| 1 | 2 | 4 |
|-----|-----|-----|
And The history table will be
|-----|-----|-------------------|----------------|-------------|-------------|
| A | B | changed_column | change_type | old_value | new_value |
|-----|-----|-------------------|----------------|-------------|-------------|
| 1 | 2 | C | Update | 3 | 4 |
|-----|-----|-------------------|----------------|-------------|-------------|
This process is working flawlessly for shorter schema tables. We have a table that has 300 columns but over 100,000,000 rows and this code still runs as expected. The process above for the larger schema table runs for around 15 hours, and then crashes with the following error:
Exception in thread "main" java.lang.StackOverflowError
at scala.collection.generic.Growable$class.loop$1(Growable.scala:52)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
Here is a code example that takes around 4 hours to run for this larger table, but runs in 20 seconds for other tables:
var dataframe_result = dataframe1.join(broadcast(dataframe2), Seq(listOfUniqueIds:_*)).repartition(100).cache()
We have tried all of the following with no success:
Using broadcast hash joins (dataframe2 is smaller, dataframe1 is huge)
Repartitioning with different numbers of partitions, as well as not repartitioning at all
Caching the result of the dataframe (we originally did not do this).
What is causing this error and how can we fix it? The only difference with this problem table is that it has so many columns. Is there an upper limit to how many columns Spark can handle?
Note: We are running this code on a very large MapR cluster and we tried giving the job 500 GB of RAM, and it is still failing.
Is there a simple (i.e. non-hacky) and race-condition-free way to create a partitioned sequence in PostgreSQL? Example:
Using a normal sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
Using a partitioned sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
I do not believe there is a simple way that is as easy as regular sequences, because:
A sequence stores only one number stream (next value, etc.). You want one for each partition.
Sequences have special handling that bypasses the current transaction (to avoid the race condition). It is hard to replicate this at the SQL or PL/pgSQL level without using tricks like dblink.
The DEFAULT column property can use a simple expression or a function call like nextval('myseq'); but it cannot refer to other columns to inform the function which stream the value should come from.
You can make something that works, but you probably won't think it simple. Addressing the above problems in turn:
Use a table to store the next value for all partitions, with a schema like multiseq (partition_id, next_val).
Write a multinextval(seq_table, partition_id) function that does something like the following:
Create a new transaction independent of the current transaction (one way of doing this is through dblink; I believe some other server languages can do it more easily).
Lock the table mentioned in seq_table.
Update the row where the partition id is partition_id, with an incremented value. (Or insert a new row with value 2 if there is no existing one.)
Commit that transaction and return the previous stored id (or 1).
Create an insert trigger on your projects table that uses a call to multinextval('projects_table', NEW.Project_ID) for insertions.
I have not used this entire plan myself, but I have tried something similar to each step individually. Examples of the multinextval function and the trigger can be provided if you want to attempt this...
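For instance, a rough sketch of the idea (simplified from the plan above: the counter table name is hard-coded rather than passed in, the table and column names follow the question's Issue(Project_ID, Issue) example, and the update runs inside the caller's transaction instead of via dblink, so concurrent inserts into the same project will serialize on the row lock; ON CONFLICT needs PostgreSQL 9.5 or later):

CREATE TABLE multiseq (
    partition_id integer PRIMARY KEY,
    next_val     bigint NOT NULL
);

CREATE FUNCTION multinextval(p_partition integer) RETURNS bigint AS $$
DECLARE
    v bigint;
BEGIN
    -- Create the partition's counter on first use, otherwise bump it;
    -- the inserted/updated row stays locked until the caller commits.
    INSERT INTO multiseq (partition_id, next_val)
         VALUES (p_partition, 2)
    ON CONFLICT (partition_id)
         DO UPDATE SET next_val = multiseq.next_val + 1
    RETURNING next_val - 1 INTO v;
    RETURN v;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION issue_number_trigger() RETURNS trigger AS $$
BEGIN
    NEW.issue := multinextval(NEW.project_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_issue_number
    BEFORE INSERT ON issue
    FOR EACH ROW EXECUTE PROCEDURE issue_number_trigger();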
There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL Server 2008 shows a Clustered Index Scan in the same situation. What is the reason?
select * with no where clause -- means read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, read the file.
What happens when you do a full table scan? You go to the first row in the table, then read on through the table to the end.
Which one of these is faster, given the table you have above? The full table scan. Why? Because it skips having to go to the index, retrieve values, and then go back to where the table lives and fetch the rows.
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this, the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan", or what Oracle calls "TABLE ACCESS FULL".
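To make the equivalence concrete, here is the same table in SQL Server terms (same DDL as above; in SQL Server a PRIMARY KEY is clustered by default when no other clustered index exists), so reading every row can only ever appear in the plan as a scan of that clustered index:

CREATE TABLE temp
(
    IDR   decimal(9)  NOT NULL,
    IDS   decimal(9)  NOT NULL,
    DT    date        NOT NULL,
    VAL   decimal(10) NOT NULL,
    AFFID decimal(9),
    CONSTRAINT PKtemp PRIMARY KEY (IDR, IDS, DT)  -- clustered by default
);

-- The rows ARE the leaf level of PKtemp, so the plan for this query is a
-- Clustered Index Scan: the same work Oracle reports as TABLE ACCESS FULL.
SELECT * FROM temp;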