Spring Data JPA bulk identifier generation - PostgreSQL

I have an entity for which, apart from the primary key, an extra unique identifier should be generated:
@Entity
class MyEntity(
    val otherId: String // <- this id is unique as well
) {
    @Id
    @Generated
    var id: UUID? = null // PK
}
The otherId value is derived from a PostgreSQL sequence by calling SELECT nextval(...) and adding a prefix string. For bulk inserts I have to resort to a custom query, defined in the entity's JPA repository, that retrieves multiple sequence values at once, but I'd like to make this process automatic.
I tried implementing the IdentifierGenerator interface, but the best I could achieve was one SELECT nextval query per inserted entity, which is unacceptable in my case because batches can contain hundreds of entities. Digging into the Hibernate internals didn't show me how to do it either.
Is there a way to generate a number of ids for multiple entities at once via some callback/hook? Or do I still have to do everything by hand?

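There are hooks to implement this; see this article for an example: https://thorben-janssen.com/custom-sequence-based-idgenerator/
To improve performance, you will have to configure the increment size (50 is the default allocationSize of JPA's @SequenceGenerator). With an increment size of 50, a single nextval call advances the sequence by 50, and the generator keeps that block of values in an in-memory pool from which identifiers are served, so the database is queried only once per 50 identifiers. The database sequence itself must be created with a matching INCREMENT BY.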
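The pooling idea can be illustrated in plain Java. This is a simplified sketch, similar in spirit to Hibernate's pooled-lo optimizer but not its actual code: `PrefixedIdPool`, the `LongSupplier` standing in for `SELECT nextval(...)`, and the `ORD-` prefix are hypothetical, and it assumes the database sequence's INCREMENT BY equals `incrementSize`, so each nextval call reserves a whole block of values.

```java
import java.util.function.LongSupplier;

// Simplified pooled-lo style allocator: one nextval call reserves incrementSize
// consecutive values, which are then handed out from memory with a prefix.
class PrefixedIdPool {
    private final LongSupplier nextval; // hypothetical stand-in for SELECT nextval(...)
    private final int incrementSize;    // assumed equal to the sequence's INCREMENT BY
    private final String prefix;        // hypothetical prefix for otherId
    private long next = 0;              // next value to hand out
    private long limit = 0;             // exclusive end of the current block (empty at start)

    PrefixedIdPool(LongSupplier nextval, int incrementSize, String prefix) {
        if (incrementSize < 1) {
            throw new IllegalArgumentException("incrementSize must be >= 1");
        }
        this.nextval = nextval;
        this.incrementSize = incrementSize;
        this.prefix = prefix;
    }

    // Thread-safe: only one caller refills the pool at a time.
    synchronized String nextId() {
        if (next >= limit) {
            // The only database round trip: the returned value starts a fresh block.
            next = nextval.getAsLong();
            limit = next + incrementSize;
        }
        return prefix + next++;
    }
}
```

With a sequence starting at 1 and an increment of 50, generating 120 prefixed ids this way needs three nextval calls (returning 1, 51 and 101) instead of 120. In a custom IdentifierGenerator, such a pool would be held by the generator instance so that it survives across inserts.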


JPA: generate non pk unique and random alphanumeric value

I want to uniquely identify an entity without using the primary key, so I thought about generating a unique random value. Moreover, the value must be easy to read and copy by hand, and it should be 6 or 7 characters long.
Design
My entity A:
public class A {
    // ...
    @Column(name="value", unique=true, nullable=false, insertable=false, updatable=false)
    private String value;
    // ...
    public String getValue() {
        return value;
    }
    protected void setValue(String value) {
        this.value = value;
    }
}
which is represented in the database by the table:
CREATE TABLE IF NOT EXISTS schema.mytable (
    -- ...
    value TEXT NOT NULL DEFAULT generate_unique_value_for_mytable(),
    -- ...
    CONSTRAINT "un_value" UNIQUE (value),
    -- ...
);
My idea was to let the database handle this and then fetch the value.
Problem
With the current design, the value is correctly generated in the database, but when JPA fetches A entities, the value field is empty.
I cannot remove insertable=false; otherwise the insert violates the NOT NULL constraint.
If I remove insertable=false and set some dummy data, that data overrides the value generated by generate_unique_value_for_mytable().
If I remove everything in the @Column annotation, I can save the A entity, but value is still empty.
Ugly solution
I couldn't find confirmation, but it looks like having the database generate a value is a bad idea. I have the same problem with a non-primary-key field generated by a sequence: I cannot fetch the value from the database.
So my ugly solution is to decorate the create() method of the EJB responsible for A entities:
public class Aejb {
    public void create(A entity) {
        // the generator method roughly ensures randomness
        String value = MyUtil.generateRandomValue();
        A isThereAnyoneHere = findByValue(value);
        while (isThereAnyoneHere != null) {
            value = MyUtil.generateRandomValue(); // reassign; redeclaring value would not compile
            isThereAnyoneHere = findByValue(value);
        }
        // uniqueness is ensured
        entity.setValue(value);
        em.persist(entity);
    }
}
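A self-contained sketch of the same generate-and-check loop. What `MyUtil.generateRandomValue()` actually does is not shown, so the generator here is an assumption: it draws characters from a hypothetical 31-symbol alphabet without look-alikes (0/O, 1/I/L) to keep codes easy to copy by hand, and the `exists` predicate stands in for `findByValue`.

```java
import java.security.SecureRandom;
import java.util.function.Predicate;

class ReadableCodes {
    // Hypothetical alphabet: digits and upper-case letters minus 0, 1, I, L, O (31 symbols).
    static final String ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Random code of the given length, each character drawn uniformly from ALPHABET.
    static String generateRandomValue(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Retry until the existence check (e.g. a findByValue lookup) reports no clash.
    static String generateUnique(int length, Predicate<String> exists) {
        String value = generateRandomValue(length);
        while (exists.test(value)) {
            value = generateRandomValue(length);
        }
        return value;
    }
}
```

The check-then-insert pattern can still race between two concurrent transactions, so the database UNIQUE constraint remains the real guarantee.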
Questions
Can I fetch a non-primary-key value generated by the database into a JPA entity? The value can be generated by a function or a sequence.
Is there a more elegant solution than my ugly workaround to provide a unique and random value?
Yes. You haven't mentioned your database, but it is possible for Oracle to return the value inserted via triggers and have EclipseLink obtain this value in your model; see https://www.eclipse.org/eclipselink/documentation/2.5/jpa/extensions/a_returninsert.htm
Set the value in a @PrePersist method, which is executed before the entity is inserted. However, if that method relies on one or more database queries, you will run into performance issues, because inserting a new A becomes expensive. You might instead just insert the random value, deal with the occasional conflict, and pick a random value with less chance of overlap, such as a UUID.
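To put a number on the "occasional conflict", the birthday bound gives a quick estimate. The sketch below assumes codes are drawn uniformly at random: with an alphabet of k symbols and length len there are N = k^len codes, and after n codes the probability of at least one duplicate is approximately 1 - exp(-n(n-1)/(2N)). The class and method names are hypothetical.

```java
class CollisionEstimate {
    // Number of distinct codes for the given alphabet size and code length.
    static double codeSpace(int alphabetSize, int length) {
        return Math.pow(alphabetSize, length);
    }

    // Birthday-bound approximation of P(at least one duplicate among n random codes).
    static double collisionProbability(long n, double codeSpace) {
        double exponent = -((double) n * (n - 1)) / (2.0 * codeSpace);
        return -Math.expm1(exponent); // equals 1 - e^exponent, accurate for tiny values
    }
}
```

For example, 6-character codes over a 31-symbol alphabet give N = 887,503,681, and the estimate for 10,000 codes is roughly 5.5%, so short human-readable codes still need a uniqueness check or a retry on constraint violation.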
If I understand correctly, the @Generated annotation should do the trick. It populates the field with the value produced by the column's DEFAULT in the database.
Example:
@Generated(GenerationTime.INSERT)
@Column(name="value", unique=true, nullable=false, insertable=false, updatable=false)
private String value;
However, there is a drawback: if you set the field's value in Java, Hibernate will overwrite it with the result of the DEFAULT in your database.
Self-answer to mark question as closed
Final solution
We finally went for a combination of
Stored procedures: the database will generate the value. The procedure also ensures that the value is unique across the table
Named queries: to fetch the value generated by the procedure. I did not use NamedStoredProcedures because we are using PostgreSQL, and the PostgreSQL JDBC driver did not support named parameters, which raised some problems.
With this configuration, the EJB is sure to have at most one database call to fetch the requested value.
Response to other answers
Here is a summary of my feedback on the other answers, for my own reference and for future readers:
Oracle trigger: we're using PostgreSQL :(
UUID: We had the constraint of keeping our unique random code human-readable. An end user is expected to be able to retype it manually. Consequently, we could not use a long String such as a UUID.
PrePersist: Other business actions take place after the code generation in the same transaction, which means those actions would need to be redone in case of a collision. I'm not very confident about handling JPA exceptions (transaction scope and so on), so I preferred not to go that way.
@Generated: This is a Hibernate-specific feature, and we're using EclipseLink.
Database trigger: If the code were generated purely at the database level, I ran into the same problem of not fetching the value: it is properly generated at the database level, but the entity's value stays null.

JPA and PostgreSQL with GenerationType.IDENTITY

I have a question about Postgres and GenerationType.IDENTITY vs. SEQUENCE.
In this example...
@Id
@SequenceGenerator(name="mytable_id_seq",
                   sequenceName="mytable_id_seq",
                   allocationSize=1)
@GeneratedValue(strategy = GenerationType.SEQUENCE,
                generator="mytable_id_seq")
I understand that I am specifying a Postgres sequence to use via annotations.
However, my id column is defined with the 'serial' type, and I have read that I can simply use GenerationType.IDENTITY, which will automatically generate a db sequence and use it to auto-increment.
If that's the case, I don't see an advantage to using the SEQUENCE annotations unless you are using an integer for an id or have some specific reason to use another sequence you have created. IDENTITY is a lot less code and potentially makes the mapping portable across databases.
Is there something I'm missing?
Thanks in advance for the feedback.
If you have a column of type SERIAL, it will be sufficient to annotate your id field with:
@Id @GeneratedValue(strategy=GenerationType.IDENTITY)
This is telling Hibernate that the database will be looking after the generation of the id column. How the database implements the auto-generation is vendor specific and can be considered "transparent" to Hibernate. Hibernate just needs to know that after the row is inserted, there will be an id value for that row that it can retrieve somehow.
If using GenerationType.SEQUENCE, you are telling Hibernate that the database is not automatically populating the id column. Instead, it is Hibernate's responsibility to get the next sequence value from the specified sequence and use that as the id value when inserting the row. So Hibernate is generating and inserting the id.
In the case of Postgres, a SERIAL column happens to be implemented by creating a sequence and using it as the column's default value. But it is the database that populates the id field, so using GenerationType.IDENTITY tells Hibernate that the database handles id generation.
These references may help:
http://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/Hibernate_User_Guide.html#identifiers-generators
https://www.postgresql.org/docs/8.1/static/datatype.html#DATATYPE-SERIAL
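The practical effect of "the database assigns the id" can be seen in a toy model. This is not Hibernate code: `ToySession`, `Strategy`, and the statement counters are hypothetical. It only models the rule described above: with IDENTITY the key exists only after the INSERT, so persisting must execute the INSERT immediately, while with SEQUENCE the key is fetched first and the INSERT can wait until flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

enum Strategy { IDENTITY, SEQUENCE }

// Toy persistence context that only counts INSERT statements.
class ToySession {
    private final Strategy strategy;
    private final LongSupplier sequence;           // stand-in for nextval on a sequence
    private long identityCounter = 0;              // stand-in for a SERIAL column's default
    private final List<Long> pendingInserts = new ArrayList<>();
    int insertsDuringPersist = 0;                  // INSERTs executed by persist()
    int insertsDuringFlush = 0;                    // INSERTs executed by flush()

    ToySession(Strategy strategy, LongSupplier sequence) {
        this.strategy = strategy;
        this.sequence = sequence;
    }

    // Returns the id the new entity ends up with.
    long persist() {
        if (strategy == Strategy.IDENTITY) {
            // Only the database knows the key, so the INSERT has to run now.
            insertsDuringPersist++;
            return ++identityCounter;
        }
        // The key is known before any INSERT, so the INSERT is queued for flush().
        long id = sequence.getAsLong();
        pendingInserts.add(id);
        return id;
    }

    void flush() {
        // Queued INSERTs could be sent here as a single JDBC batch.
        insertsDuringFlush += pendingInserts.size();
        pendingInserts.clear();
    }
}
```

This is also why Hibernate disables JDBC insert batching for entities whose identifiers use IDENTITY, while SEQUENCE (especially with an allocationSize greater than 1) lets inserts be batched.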
From "Pro JPA2" book:
"Another difference, hinted at earlier, between using IDENTITY and other id generation strategies is that the identifier will not be accessible until after the insert has occurred. Although no guarantee is made about the accessibility of the identifier before the transaction has completed, it is at least possible for other types of generation to eagerly allocate the identifier. But when using identity, it is the action of inserting that causes the identifier to be generated. It would be impossible for the identifier to be available before the entity is inserted into the database, and because insertion of entities is most often deferred until commit time, the identifier would not be available until after the transaction has been committed."
I think SEQUENCE can be helpful if you use the same sequence for more than one table (for example, when you want a unique identifier across many types of bills), or if you want to keep track of the sequence separately from the auto-generated key.
Here you can find a solution that updates the PostgreSQL table creation accordingly, so that it works with the GenerationType.IDENTITY option.

Entity Framework: map duplicate tables to single entity at runtime?

I have a legacy database with a particular table (I will call it ItemTable) that can have billions of rows of data. To overcome database restrictions, we have decided to split the table into "silos" whenever the number of rows reaches 100,000,000. So ItemTable will exist, and a procedure will run in the middle of the night to check the number of rows. If numberOfRows > 100,000,000, then silo1_ItemTable will be created. Any items added to the database from then on will go to silo1_ItemTable (until it grows too big, at which point silo2_ItemTable will be created, and so on).
ItemTable and silo1_ItemTable can be mapped to the same Item entity because the table structures are identical, but I am not sure how to set up this mapping at runtime, or how to specify the table name for my queries. All inserts should go to the latest siloX_ItemTable, and all reads should come from a specified siloX_ItemTable.
I have a separate siloTracker table that gives me the table name to insert/read the data from, but I am not sure how I can use it with Entity Framework...
Thoughts?
You could try to use entity inheritance for this. You have a base class with all the fields, mapped to ItemTable, and descendant classes that inherit from the ItemTable entity and are mapped to the silo tables in the db. Every time you create a new silo, you create a new entity mapped to that silo table.
[Table("ItemTable")]
public class Item
{
    // All the fields in the table go here
}
[Table("silo1_ItemTable")]
public class Silo1Item : Item
{
}
[Table("silo2_ItemTable")]
public class Silo2Item : Item
{
}
You can find more information on this here
Another option is to create a view that is the union of all those tables and map your entity to that view.
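A sketch of building such a view's query from the list of silo tables, so it can be regenerated whenever a new silo is created. It assumes the table names come from the siloTracker table; the class name and the name pattern are hypothetical. Table names cannot be bound as query parameters, so each one is checked against a strict pattern before being concatenated.

```java
import java.util.List;
import java.util.regex.Pattern;

class SiloViewSql {
    // Hypothetical naming rule: ItemTable, silo1_ItemTable, silo2_ItemTable, ...
    private static final Pattern SILO_NAME = Pattern.compile("(silo[0-9]+_)?ItemTable");

    // Builds the SELECT ... UNION ALL ... body of a view over all silo tables.
    static String unionAllSelect(List<String> tables) {
        if (tables.isEmpty()) {
            throw new IllegalArgumentException("at least one table is required");
        }
        StringBuilder sql = new StringBuilder();
        for (int i = 0; i < tables.size(); i++) {
            String table = tables.get(i);
            // Identifiers cannot be bound as parameters, so reject anything unexpected.
            if (!SILO_NAME.matcher(table).matches()) {
                throw new IllegalArgumentException("unexpected table name: " + table);
            }
            if (i > 0) {
                sql.append("UNION ALL\n");
            }
            sql.append("SELECT * FROM ").append(table).append('\n');
        }
        return sql.toString();
    }
}
```

UNION ALL is used rather than UNION because the silos hold disjoint rows, so there is no reason to pay for duplicate elimination.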
As mentioned in my comment, to solve this problem I am using the SqlQuery method exposed by DbSet. Since all my item tables have exactly the same schema, I can use SqlQuery to define my own query and build the table name into it. I tested this on my system and it works well.
See this link for an explanation of running raw queries with Entity Framework:
EF raw query documentation
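The raw-SQL approach comes down to choosing a table name at runtime and building it into the query text, while ordinary values stay as bound parameters. A hedged sketch: the list stands in for the siloTracker table (assumed ordered oldest to newest), `Id` is a hypothetical key column, and the `?` placeholder is JDBC style; Entity Framework uses its own placeholder syntax.

```java
import java.util.List;
import java.util.regex.Pattern;

class SiloQueries {
    // Hypothetical naming rule for silo tables.
    private static final Pattern SILO_NAME = Pattern.compile("(silo[0-9]+_)?ItemTable");

    // Table names cannot be bound as parameters, so validate before concatenating.
    static String checked(String table) {
        if (!SILO_NAME.matcher(table).matches()) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return table;
    }

    // Inserts go to the newest silo: the last entry of the tracker list.
    static String insertTarget(List<String> trackerTables) {
        return checked(trackerTables.get(trackerTables.size() - 1));
    }

    // Reads go to a caller-chosen silo; the item key stays a bound parameter.
    static String selectById(String table) {
        return "SELECT * FROM " + checked(table) + " WHERE Id = ?";
    }
}
```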
If anyone has a better way to solve my question, please leave a comment.
[UPDATE]
I agree that stored procedures are also a great option, but for some reason my management is very resistant to making any changes to our database. It is easier for me (and our customers) to put the SQL in code and acknowledge the fact that there is raw SQL. At least I can hide it from the other layers rather easily.
[/UPDATE]
A possible solution to this problem is to initialize the context with a DbCompiledModel parameter:
var builder = new DbModelBuilder(DbModelBuilderVersion.V6_0);
builder.Configurations.Add(new EntityTypeConfiguration<EntityName>());
builder.Entity<EntityName>().ToTable("TableNameDefinedInRuntime");
var dynamicContext = new MyDbContext(builder.Build(context.Database.Connection).Compile());
For some reason, in EF6 this fails on the second table request, even though the mapping inside the context looks correct at the moment of execution.

EF Table-per-hierarchy mapping

In trying to normalize a database schema and mapping it in Entity Framework, I've found that there might end up being a bunch of lookup tables. They would end up only containing key and value pairs. I'd like to consolidate them into one table that basically has two columns "Key" and "Value". For example, I'd like to be able to get Addresses.AddressType and Person.Gender to both point to the same table, but ensure that the navigation properties only return the rows applicable to the appropriate entity.
EDIT: Oops. I just realized that I left this paragraph out:
It seems like a TPH type of problem, but all of the reading I've done indicates that you start with fields in the parent entity and migrate fields over to the inherited children. I don't have any fields to move here because there would generally only be two.
There are a lot of domain-specific key-value pairs that need to be represented. Some of them will change from time to time; others will not. Rather than pick and choose, I want to make everything editable. Because of the number of these kinds of properties, I'd rather not maintain a list of enums that requires a recompile, or end up with lots of lookup tables. So I thought this might be a solution.
Is there a way to represent this kind of structure in EF4? Or, am I barking up the wrong tree?
EDIT: I guess another option would be to build the table structure I want at the database level, write views on top of it, and surface those as EF entities. It just means any maintenance needs to be done at multiple levels. Does that sound more or less desirable than a pure EF solution?
Table per hierarchy requires one parent entity that is used as the base class for the child entities. All entities are mapped to the same table, and a special discriminator column distinguishes the type of entity stored in each database record. You can generally use it even if your child entities do not define any new properties. You will also have to define a primary key for your table; otherwise EF will treat it as a read-only entity. So your table can look like:
CREATE TABLE KeyValuePairs
(
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    [Key] VARCHAR(50) NOT NULL,
    Value NVARCHAR(255) NOT NULL,
    Discriminator VARCHAR(10) NOT NULL,
    Timestamp Timestamp NOT NULL
)
You will define your top level KeyValuePair entity with properties Id, Key, Value and Timestamp (set as concurrency mode fixed). Discriminator column will be used for inheritance mapping.
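Conceptually, querying a derived type under TPH amounts to filtering the shared table on its discriminator column. A plain-Java model of that idea, not EF code; the classes, the discriminator values `Gender` and `AddressType`, and any sample rows are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

// One row of the shared KeyValuePairs table.
final class KeyValuePairRow {
    final int id;
    final String key;
    final String value;
    final String discriminator; // e.g. "Gender" or "AddressType"

    KeyValuePairRow(int id, String key, String value, String discriminator) {
        this.id = id;
        this.key = key;
        this.value = value;
        this.discriminator = discriminator;
    }
}

class KeyValueLookups {
    // What a query for one derived type amounts to: a filter on the discriminator.
    static List<KeyValuePairRow> ofType(List<KeyValuePairRow> table, String discriminator) {
        return table.stream()
                .filter(row -> row.discriminator.equals(discriminator))
                .collect(Collectors.toList());
    }
}
```

Adding rows for a new discriminator value is just data here, but in EF each value still needs its own mapped class.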
Be aware that EF mapping is static. If you define AddressType and Gender entities you will be able to use them but you will not be able to dynamically define new type like PhoneType. This will always require modifying your EF model, recompiling and redeploying your application.
From an OOP perspective, it would be nicer not to model this as an object hierarchy and instead use conditional mapping of multiple unrelated entities to the same table. Unfortunately, even though EF supports conditional mapping, I have never managed to map two entities to the same table.

GUID or int entity key with SQL Compact/EF4?

This is a follow-up to an earlier question I posted on EF4 entity keys with SQL Compact. SQL Compact doesn't allow server-generated identity keys, so I am left with creating my own keys as objects are added to the ObjectContext. My first choice would be an integer key, and the previous answer linked to a blog post that shows an extension method that uses the Max operator with a selector expression to find the next available key:
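// Returns the next key by taking MAX over the selector; only int keys are incremented,
// any other key type is returned unchanged.
public static TResult NextId<TSource, TResult>(this ObjectSet<TSource> table, Expression<Func<TSource, TResult>> selector)
    where TSource : class
{
    // Largest existing key, or default(TResult) (0 for int) when the set is empty
    TResult lastId = table.Any() ? table.Max(selector) : default(TResult);
    if (lastId is int)
    {
        // Box/unbox through object to add 1 to a generic value known to be an int
        lastId = (TResult)(object)(((int)(object)lastId) + 1);
    }
    return lastId;
}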
Here's my take on the extension method: It will work fine if the ObjectContext that I am working with has an unfiltered entity set. In that case, the ObjectContext will contain all rows from the data table, and I will get an accurate result. But if the entity set is the result of a query filter, the method will return the last entity key in the filtered entity set, which will not necessarily be the last key in the data table. So I think the extension method won't really work.
At this point, the obvious solution seems to be to simply use a GUID as the entity key. That way, I only need to call Guid.NewGuid() method to set the ID property before I add a new entity to my ObjectContext.
Here is my question: Is there a simple way of getting the last primary key in the data store from EF4 (without having to create a second ObjectContext for that purpose)? Any other reason not to take the easy way out and simply use a GUID? Thanks for your help.
I ended up going with a GUID.
The size/performance issues aren't critical (or even noticeable) with SQL Compact, since it is a local, single-user system. It's not like the app will be managing an airline reservation system.
And at least at this point, there seems to be no way around the "no server-generated keys" limitation of the SQL Compact/EF4 stack. If someone has a clever hack, I'm still open to it.
That doesn't mean I would take the same approach in SQL Server or SQL Express. I still have a definite preference for integer keys, and SQL Compact's bigger siblings allow them in conjunction with EF4.
Use a GUID. AutoIncrement is not supported on Compact Framework with Entity Framework.
Also, if you ever want to create an application that uses multiple data sources, int PKs are going to fall apart on you very, very quickly.
With GUIDs, you can just call Guid.NewGuid() to get a new key.
With ints, you have to hit the database to get a valid key.
If you store data in multiple databases, int PKs will cause conflicts.
What I've done for SQL CE before, assuming a single application accesses the database, is to calculate the MAX value on startup and put it in a static variable. You can then hand out sequential values easily, and the code that generates them is easy to make thread-safe.
One reason to avoid GUIDs is size: they consume more memory and storage space.
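The startup-MAX approach needs little more than an atomic counter. A sketch assuming, as the answer does, that a single application process writes to the database; the class name is hypothetical, and the constructor argument would be the result of a SELECT MAX(...) query run once at startup, which is not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;

class SequentialKeys {
    private final AtomicLong last;

    // maxIdAtStartup: result of SELECT MAX(...) when the app starts (0 for an empty table).
    SequentialKeys(long maxIdAtStartup) {
        this.last = new AtomicLong(maxIdAtStartup);
    }

    // Thread-safe: every caller gets a distinct, increasing key without a database call.
    long next() {
        return last.incrementAndGet();
    }
}
```

If any other process inserts rows, this scheme can hand out keys that already exist.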
You could also query SQL Compact metadata like so:
SELECT AUTOINC_NEXT FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Categories' AND AUTOINC_NEXT IS NOT NULL