sequence in DB2 generates duplicate values - db2

My application uses a DB2 database. I had created a sequence for my table to generate the primary key, and it was working fine until today, but now it seems to be generating existing values and I am getting a DuplicateKeyException while inserting rows. After a bit of googling I found that the sequence can be reset.
Could someone please help me with the best possible option, as I have not worked with sequences much, and with the things to consider while going with that approach?
If I have to reset the sequence, what is the best way to do it, and what points should I consider before doing so? It would also be great to know what could be the reason behind the issue I am facing, so that I can take care of it in future.
Just for information: the max value assigned while creating the sequence has not yet been reached.
Thanks a lot in advance.

ALTER SEQUENCE SCHEMA.SEQ_NAME RESTART WITH NUMERIC_VALUE;
This was what was required in my case, i.e. restarting the sequence with a value higher than the current max value of the ID column for which the sequence was being used.
NUMERIC_VALUE denotes a value higher than the current max value in my sequence-generated column.
Hope it will be helpful for others.
The cause of this issue was probably manual insertion of records into the database.
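For anyone in the same situation, a minimal sketch of the fix (the table, column and value are made up; check your own MAX first):
-- Find the current maximum of the key column
SELECT MAX(ID) FROM SCHEMA.MY_TABLE;
-- Suppose it returned 5000: restart the sequence just above it
ALTER SEQUENCE SCHEMA.SEQ_NAME RESTART WITH 5001;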

Related

Updating null value will split row in file Postgresql

I have a colleague who tells me that the reason we add default values instead of null values to our table is that PostgreSQL allocates a number of bytes in a file when a new row is stored, and if that column gets updated later on, the row might end up being split into two rows in the file, so multiple I/O operations will have to occur when reading and writing.
I'm not a PostgreSQL expert at all, and I have a hard time finding any documentation suggesting this.
Can someone clarify this for me?
Is this a good reason for not having null values in a column and using some default instead? Will there be any huge performance issues in such cases?
I'm not sure I'd say the documentation is hard to find:
https://www.postgresql.org/docs/10/storage-file-layout.html
https://www.postgresql.org/docs/current/storage-page-layout.html
It's fair to say there is a lot to absorb though.
So, the reason you SHOULD have defaults rather than NULLs is that you don't want an "unknown" in your column. Start with the requirements before worrying about efficiency tweaks.
Whether a particular value is null is stored in a bitmap. This bitmap is optional, so if there are no nulls in a row the bitmap is not created. That suggests nulls make a row bigger. But wait: if a bit is set to show null, you don't need the overhead of the value structure, and (IIRC - you'll need to check the docs) that can end up saving you space. There is a good chance that general per-row overheads and type alignment issues are far more important to you, though.
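If you want to measure this yourself rather than reason about it, a quick sketch (the table is made up; pg_column_size applied to the whole row reports its on-disk size):
-- Compare row sizes with a NULL vs. an empty-string default
CREATE TABLE t_demo (a int, b text);
INSERT INTO t_demo VALUES (1, NULL), (2, '');
SELECT a, pg_column_size(t_demo.*) AS row_bytes FROM t_demo;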
However, all of this is ignoring the elephant* in the room, which is that if you update a row then PostgreSQL marks the current version of the row as expired and creates a whole new row version. So the description of how updates work in your first paragraph is simply confused.
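You can watch that happen via the hidden ctid column (the row's physical location), reusing the made-up t_demo table from above:
-- An UPDATE writes a new row version at a new physical location
SELECT ctid, a FROM t_demo WHERE a = 1;   -- e.g. (0,1)
UPDATE t_demo SET b = 'x' WHERE a = 1;
SELECT ctid, a FROM t_demo WHERE a = 1;   -- a different ctid, e.g. (0,3)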
So - don't worry about the efficiency of nulls in 99.9% of cases. Worry about using them properly and about the general structure of your database, its indexes and queries.
* no I'm not apologising for that pun.

Partitioning of related tables in PostgreSQL

I've checked the documentation, watched some presentations and read blogs, but I can't find examples of partitioning more than a single table in PostgreSQL - and that's what we need. Our tables are an insert-only audit trail with a master-detail structure, and we aim to solve our slow data removal problem, currently handled with DELETE.
The simplified structure and some queries are shown in the following fiddle: https://www.db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/46
The issue I'm investigating right now is how to query the detail table effectively, be it in a JOIN or directly. Because the timestamp field is part of the partition key, I understand that using it in the query is essential. What I don't understand is why the JOIN is not able to figure this out when timestamp equality is used in the ON clause (a couple of EXPLAIN examples are in the fiddle).
Then there are broader questions:
What is the generally recommended strategy for a case like this? We expect the timestamp to be essential in our queries, so it feels natural to use it as the partitioning key.
I've made a short experiment (so no real experience from it yet) and based the partitioning solely on an id range. This seems to have one advantage: predictable partition table sizes (more or less, depending on the size of variable-length columns, of course). It is possible to add CHECK (timestamp ...) conditions on any full partition (and an open-interval check on the active one too!), which helps with partition pruning. This has the nice benefit that the detail table needs a single-column FK referencing only master.id (and perhaps even prunes better during JOINs). Any ideas or experiences with something similar?
We would rather have time-based partitioning, which seems more natural, but it's not a hard requirement. The need to drag the timestamp into the other table and into its FK, etc., makes it less compelling.
Obviously, we want both tables (all of them, to be precise - we will have more detail table types) partitioned along the same ranges, be it by id or by timestamp. I guess not doing so defeats the whole purpose of partitioning, as we would not be able to remove the detail data related to the master partitions.
I welcome any pointers or ideas on how to do it properly. In the end we will decide for ourselves, but there is not much material to help with the decision right now. Thanks.
Your strategy is good. Partition related tables by the common timestamp and make sure that the partition boundaries are the same.
You probably didn't get the efficient partitionwise join because you didn't set enable_partitionwise_join to on. That parameter is turned off by default because it can consume substantial query planning time that you don't want to expend unless you know you can benefit.
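A hedged sketch of what that can look like (all names are illustrative; a foreign key that references a partitioned table requires PostgreSQL 12 or later, and the referenced key must include the partition column):
-- Master and detail partitioned by the same timestamp ranges
CREATE TABLE master (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE detail (
    id         bigint      NOT NULL,
    master_id  bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    PRIMARY KEY (id, created_at),
    FOREIGN KEY (master_id, created_at) REFERENCES master (id, created_at)
) PARTITION BY RANGE (created_at);

-- Keep the partition boundaries identical on both tables
CREATE TABLE master_2024_01 PARTITION OF master
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE detail_2024_01 PARTITION OF detail
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Off by default; needed for the partitionwise join mentioned above
SET enable_partitionwise_join = on;
Old data removal then becomes dropping the two matching partitions (detail first), instead of a slow DELETE.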

Skip some ranges in a PostgreSQL sequence?

I want to skip some ranges in a sequence:
CREATE SEQUENCE id_seq;
Consider I have a sequence id_seq that starts at 100. When it reaches 199 it should continue from 1000, and when it reaches 1999 it should continue from 10000.
SELECT setval('id_seq', 100);
Does Postgres have any built-in configuration to do this?
Multiple processes will use this sequence, so assigning values manually in each process using setval() leads to some difficulties.
No there is nothing built in to do this. I've never heard of anyone wanting to do this before.
If you really care about the numbers you get then a sequence isn't the right thing for you anyway. You can get gaps in it quite easily. It's designed to generate differing numbers without impacting concurrency.
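For example, a rolled-back transaction still consumes a value and leaves a gap (sequence name taken from the question):
CREATE SEQUENCE id_seq START 100;
BEGIN;
SELECT nextval('id_seq');  -- returns 100
ROLLBACK;                  -- nextval() is never rolled back
SELECT nextval('id_seq');  -- returns 101; 100 is now a permanent gap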

PostgreSQL - Clustering never completes - long key?

I am having problems clustering a table whose key consists of one char(23) field and two timestamp fields. The char(23) field contains alphanumeric values. The clustering operation never finishes; I have let it run for 24 hours and it still did not finish.
Has anyone run into this kind of problem before? Does my theory that the long key field is the reason make sense? We have dealt with much larger tables that do not have long keys and have always been able to perform DB operations on them without any problem. That makes me think it might have to do with the size of the key in this case.
CLUSTER rewrites the table, so it must wait on locks, and it is possible that it is never getting the lock it needs. Why are you setting varchar(64000)? Why not just unrestricted varchar? And how big is this index?
If size is the problem, it has to be about the index size, not the key size. I don't know what effect TOASTed key attributes have on CLUSTER, because those are moved into extended storage. TOAST might complicate CLUSTER, and I have never heard of anyone clustering on a TOASTed attribute; it wouldn't make much sense to do so. TOASTing is necessary for any attribute more than 4k in size.
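To get a sense of the sizes involved before blaming the key, something like this helps (the relation names are made up):
-- Compare the table size with the size of the index used for clustering
SELECT pg_size_pretty(pg_relation_size('my_table'))      AS table_size,
       pg_size_pretty(pg_relation_size('my_table_pkey')) AS index_size;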
A better option is to create an index on the columns without the possibly TOASTed value, and then cluster on that. That should give you something very close to what you'd get otherwise.
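A hedged sketch of that approach (table and column names are invented):
-- Build an index that leaves out the long char(23) column, then cluster on it
CREATE INDEX my_table_ts_idx ON my_table (start_ts, end_ts);
CLUSTER my_table USING my_table_ts_idx;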

How do I reset the primary key count/max in Core Data?

I've managed to delete all entities stored using Core Data (following this answer).
The problem is, I've noticed the primary key is still counting upwards. Is there a way (without manually writing a SQL query) to reset the Z_MAX value for the entity? Screenshot below to clarify what I mean.
The value itself isn't an issue, but I'm just concerned that at some point in the future the maximum integer may be reached, and I don't want that to happen. My application syncs data with a web service and caches it using Core Data, so the primary key may potentially increase by hundreds or thousands at a time. Deleting the entire SQLite DB isn't an option, as I need to retain some of the information for other entities.
I've seen the reset method, but surely that will reset the entire SQLite DB? How can I reset the primary key for just this one set of entities? There are no relationships to other entities using the primary key I want to reset.
I'm just concerned that at some point in the future the maximum integer may be reached and I don't want this to happen.
Really? What type is your primary key? Because if it's anything other than an Int16 you really don't need to care about that. A signed 32-bit integer gives you 2,147,483,647 values. A 64-bit signed integer gives you 9,223,372,036,854,775,807 values.
If you think you're going to use all those up, you probably have more important things to worry about than having an integer overflow.
More importantly, if you're using Core Data you shouldn't need to care about or really use primary keys. Core Data isn't a database - when using Core Data you are meant to use relationships and not really care about primary keys. Core Data has no real need for them.
Core Data uses 64-bit integer primary keys. Unless I/O systems get many orders of magnitude faster (which, unlike CPUs, they have not in recent years), you could save as fast as possible for millions of years.
Please file a bug with bugreport.apple.com when you run out.
Ben
From the SQLite FAQ:
If the largest possible integer key, 9223372036854775807, is already in use, then an unused key value is chosen at random.
9,223,372,036,854,775,807 / (1024^4) = 8,388,608 tera-rows. I suspect you will run into other limits first. :) http://www.sqlite.org/limits.html reviews the more practical limits you'll run into.
Asking sqlite3 about a handy Core Data store yields:
sqlite> .schema zbookmark
CREATE TABLE ZBOOKMARK ( Z_PK INTEGER PRIMARY KEY, ...
Note the lack of AUTOINCREMENT, which in SQLite means keys are never reused. So Core Data does allow old keys to be reused, and you're pretty safe even if you manage to add (and remove most of) that many rows over time.
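You can see the reuse behaviour directly in sqlite3 (a throwaway table, not an actual Core Data store):
CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO t (val) VALUES ('a');   -- id = 1
INSERT INTO t (val) VALUES ('b');   -- id = 2
DELETE FROM t WHERE id = 2;
INSERT INTO t (val) VALUES ('c');   -- id = 2 again; with AUTOINCREMENT it would be 3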
If you really do want to reset it, poking around in Apple's Z_ tables is really the only way. [This is not to say that this is a thing you should in fact do. It is not (at least in any code you want to ship), even if it seems to work.]
Besides the fact that directly/manually editing a Core Data store is a horrendously stupid idea, the correct answer is:
Delete the database and re-create it.
Of course, you're going to lose all your data doing that, but if you're that concerned about this little number, then that's ok, right?
Oh, and Core Data will make sure you don't have primary key collisions.