JPA bulk insert if not exists


I have 3 entities: Player, Hand and PlayerHandStats. The first two are regular tables with ID as the PK. PlayerHandStats, on the other hand, has a composite PK (player_id, hand_id). Together they form a kind of "package", which is why I am trying to persist them in one block. ParsedHand is just a class that bundles the entities: it contains 1 Hand, 2..10 Players and 2..10 PlayerHandStats. Below is my current naive approach, which doesn't really work.
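public static void persist(ParsedHand parsedHand) {
    EntityManager em = emf.createEntityManager();
    try {
        em.getTransaction().begin();
        em.persist(parsedHand.getHand());
        Collection<Player> players = parsedHand.getPlayers().values();
        for (Player player : players) {
            em.persist(player);
        }
        Collection<PlayerHandStats> stats = parsedHand.getStats().values();
        for (PlayerHandStats phs : stats) {
            em.persist(phs); // persist each element, not the whole collection
        }
        em.getTransaction().commit();
    } catch (RuntimeException e) {
        // roll back so a failed insert does not leave the transaction open
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
        throw e;
    } finally {
        em.close();
    }
}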
The problem is that a specific Player entity may already exist in the DB; in that case the whole process terminates. I would like to keep going without performing any merge or update on the entity, but retrieve its ID (since at application level it has no ID assigned).
Quick example (NOTE: columns NAME and POKER_SITE form a UNIQUE constraint):
ID | NAME        | POKER_SITE
---+-------------+-----------
0  | neverlimp92 | PokerStars
1  | player01    | PokerStars
Now, let's say I have a Player entity at application level with fields (null, 'neverlimp92', 'PokerStars'). Persisting it will throw a java.sql.SQLIntegrityConstraintViolationException, and the whole process terminates. How can I avoid this? Should I override the hashCode() and equals() methods, and do the following:
for (Player player : players) {
    if (!em.contains(player)) {
        em.persist(player);
    }
}
I am not sure this is a smart thing to do, considering there may be tens or even hundreds of thousands of rows.
Also, if the entity does exist, what is the proper way to retrieve its ID and assign it to the existing instance at the application level?
I have very little experience with JPA, and any kind of help, or pointing me in the right direction is much appreciated. Thank you.

If some of the players may already exist, then for each player you need to query the database for a player with that name and site: if one exists, use it; if not, persist it.
Which approach is better depends on how likely it is that such a player already exists. If it is very unlikely, you could just persist it, and if the commit fails, retry using the lookup approach described above.
In general it would be better if your player had its id, then you would know if it is new or existing.
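The lookup-then-persist approach from this answer can be sketched as follows. This is a minimal, framework-free sketch: `PlayerStore`, `PlayerResolver` and `InMemoryPlayerStore` are hypothetical names, and the in-memory store only stands in for the database so the logic can be exercised. In a JPA application, `findByNameAndSite` could be backed by a JPQL query like the one in the comment (run through `EntityManager.createQuery(..., Player.class)`), and `persist` by `EntityManager.persist`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the Player entity; id stays null until the row exists.
class Player {
    Long id;
    final String name;
    final String pokerSite;

    Player(Long id, String name, String pokerSite) {
        this.id = id;
        this.name = name;
        this.pokerSite = pokerSite;
    }
}

// Hypothetical persistence abstraction. With JPA, findByNameAndSite could run
//   SELECT p FROM Player p WHERE p.name = :name AND p.pokerSite = :site
// and persist could delegate to EntityManager.persist.
interface PlayerStore {
    Optional<Player> findByNameAndSite(String name, String pokerSite);

    void persist(Player player); // expected to assign player.id
}

class PlayerResolver {
    private final PlayerStore store;

    PlayerResolver(PlayerStore store) {
        this.store = store;
    }

    // Returns the stored Player for (name, pokerSite) if one exists; otherwise persists
    // the candidate and returns it. Either way the result carries a database ID.
    Player findOrPersist(Player candidate) {
        Optional<Player> existing = store.findByNameAndSite(candidate.name, candidate.pokerSite);
        if (existing.isPresent()) {
            return existing.get();
        }
        store.persist(candidate);
        return candidate;
    }
}

// In-memory store used only to exercise the sketch; the (name, site) key mirrors
// the UNIQUE constraint on NAME and POKER_SITE.
class InMemoryPlayerStore implements PlayerStore {
    private final Map<List<String>, Player> rows = new HashMap<>();
    private long nextId = 0;

    @Override
    public Optional<Player> findByNameAndSite(String name, String pokerSite) {
        return Optional.ofNullable(rows.get(List.of(name, pokerSite)));
    }

    @Override
    public void persist(Player player) {
        player.id = nextId++;
        rows.put(List.of(player.name, player.pokerSite), player);
    }
}
```

Whatever `findOrPersist` returns is the instance the rest of the "package" (for example each PlayerHandStats) should reference, since an existing player's row already carries its ID. Note that this issues one lookup per player.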

Related

Is it possible to improve EF6 WarmUp time?

I have an application in which I observe the following behavior: the first requests after a long period of inactivity take a long time and sometimes time out.
Is it possible to control how Entity Framework disposes of objects? Is it possible to mark some entities so that they are never disposed,
in order to avoid or shorten the warm-up time?
Regards,

There are several reasons why similar queries get faster response times.
Most database management systems cache parts of the fetched data, so that similar queries in the near future will be faster. If you query Teachers with their Students, the Teachers table is joined with the Students table. The data read for this join, and the execution plan, often stay cached for a while, so the next query for Teachers with their Students can reuse them and become faster.
DbContext caches queried objects. If you select a single Teacher, or Find one, it is kept in local memory so that the context can detect which items have changed when you call SaveChanges. If you Find the same Teacher again, this lookup will be faster. I'm not sure whether the same happens if you query 1000 Teachers.
The first time a DbContext of a given type is used, its database initializer runs and checks whether the model has changed.
So it might seem wise not to Dispose() a created DbContext, yet you see that most people keep the DbContext alive for a fairly short time:
using (var dbContext = new MyDbContext(...))
{
    var fetchedTeacher = dbContext.Teachers
        .Where(teacher => teacher.Id == ...)
        .Select(teacher => new
        {
            Id = teacher.Id,
            Name = teacher.Name,
            Students = teacher.Students.ToList(),
        })
        .FirstOrDefault();
    return fetchedTeacher;
}
// DbContext is Disposed()
At first glance it would seem better to keep the DbContext alive: if someone asks for the same Teacher again, the DbContext wouldn't have to query the database, it could return the local Teacher.
However, keeping a DbContext alive can give you stale data. If someone else changes the Teacher between your first and second query for it, you would still get the old Teacher data.
Hence it is wise to keep the lifetime of a DbContext as short as possible.
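To illustrate the trade-off just described (repeated lookups served from memory versus stale data), here is a toy identity map in Java. It is not EF's implementation: `IdentityMapContext` is a hypothetical class, and its loader function stands in for a database query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy identity map (not EF's implementation): the first find() for a key calls the
// loader, which stands in for a database query; later calls return the cached instance.
class IdentityMapContext<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private int databaseHits = 0;

    IdentityMapContext(Function<K, V> loader) {
        this.loader = loader;
    }

    V find(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // served from memory, even if the database has changed since
        }
        databaseHits++;
        V loaded = loader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    int databaseHits() {
        return databaseHits;
    }
}
```

The second `find` for the same key never reaches the loader, but once the underlying value changes, a long-lived context keeps returning the old value while a newly created one sees the update. That is the same reasoning behind keeping a DbContext short-lived.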
Is there nothing I can do to improve the speed of the first query?
Yes you can!
One of the first things you can do is set the database initializer to null, so that EF doesn't check the existence and model of the database. Of course, you can only do this when you are fairly sure that your database exists and hasn't changed.
// constructor; disables initializer
public SchoolDBContext() : base(...)
{
//Disable initializer
Database.SetInitializer<SchoolDBContext>(null);
}
Another option: if you have already fetched the object you want to update, and you are sure that no one else has changed it, you can Attach it instead of fetching it again, as is shown in this question
Normal usage:
// update the name of the teacher with teacherId
void ChangeTeacherName(int teacherId, string name)
{
    using (var dbContext = new SchoolContext(...))
    {
        // fetch the teacher, change the name and save
        Teacher fetchedTeacher = dbContext.Teachers.Find(teacherId);
        fetchedTeacher.Name = name;
        dbContext.SaveChanges();
    }
}
Using Attach to update an earlier fetched Teacher:
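void ChangeTeacherName(Teacher teacher, string name)
{
    using (var dbContext = new SchoolContext(...))
    {
        // attach the earlier fetched teacher; it starts out as Unchanged
        dbContext.Teachers.Attach(teacher);
        teacher.Name = name;
        // mark only the Name property as modified, so only Name is updated
        dbContext.Entry(teacher).Property(t => t.Name).IsModified = true;
        dbContext.SaveChanges();
    }
}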
This approach doesn't require fetching the Teacher again. During SaveChanges, EF checks which properties of the attached entities are marked as modified and updates only those.
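As a rough illustration of why only Name is written, here is a toy change tracker in Java. It is a simplified model, not how EF generates SQL: the hypothetical `TrackedEntity` starts out unchanged after being attached, and `buildUpdate` includes only the properties explicitly flagged as modified, similar in spirit to setting `IsModified = true` on a single property.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Toy change tracker (illustration only, not EF's SQL generation). An attached entity
// starts out unchanged; only properties explicitly flagged as modified are written.
class TrackedEntity {
    private final String table;
    private final int id;
    private final Map<String, Object> values = new LinkedHashMap<>();
    private final Set<String> modified = new LinkedHashSet<>();

    TrackedEntity(String table, int id) {
        this.table = table;
        this.id = id;
    }

    void set(String property, Object value) {
        values.put(property, value);
    }

    // Rough counterpart of Entry(entity).Property(...).IsModified = true.
    void markModified(String property) {
        modified.add(property);
    }

    // Returns null when nothing is flagged: no UPDATE is needed for an unchanged entity.
    String buildUpdate() {
        if (modified.isEmpty()) {
            return null;
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String property : modified) {
            assignments.add(property + " = ?");
        }
        return "UPDATE " + table + " SET " + assignments + " WHERE Id = ?";
    }

    // Parameter values in the same order as the placeholders in buildUpdate().
    List<Object> parameters() {
        List<Object> params = new ArrayList<>();
        for (String property : modified) {
            params.add(values.get(property));
        }
        params.add(id);
        return params;
    }
}
```

Properties that were set but never flagged (such as an email column) do not appear in the statement at all.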

DDD Repo - repo.findByChildId(id) AND repo.transferChild(to, from, child)

I've recently started looking into DDD and have been refactoring an old personal project to follow this pattern. I'm about halfway through Evans' blue book, and can't find the answer to this there or anywhere online.
Basically, my application is an inventory tracker. An Inventory contains a collection of items, and items are entities that can be transferred between inventories. Inventory would have methods like transferIn() and transferOut() containing some validation logic, i.e. checking that the inventory is not already full or that the item is in a transferable state. These constraints lead me to believe that Inventory is the aggregate root and that Item is an entity.
1) At some point, if a user requests a specific item entity for their inventory, I would like to have an inventoryRepo.findByItemId(id) that returns the inventory currently holding that item, so that I can:
2) Through a service, do something like:
boolean requestItemTransfer(destInvId, itemId) {
    Inv from = invRepo.findByItemId(itemId);
    Inv to = invRepo.findById(destInvId);
    Item item = from.getItem(itemId); // fetch the item before it leaves the source inventory
    from.transferOut(itemId);
    to.transferIn(item);
    return invRepo.transferChild(to, item); //Edited
}
Basically, I write my validation logic in the Inventory class (rich domain model), and if there are no exceptions I use the repo.transfer() method to persist the changes.
Would I be violating DDD? Are there better alternatives?
From what I've read and understood this seems valid, if unconventional. Every example I've found shows entities that can only exist within one root instance. There are also the bank account transfer examples, but those deal with amounts, which are value objects, and they have a transfer repository because transfers must be recorded in that particular scenario; that isn't needed in mine.
EDIT:
The use cases are as follow:
1) User requests a list of their inventories and their items.
2) User selects 1 or more items from 1 inventory and requests that they be sent to another inventory. This is where my TransferService would come in: it would coordinate the txIn and txOut on the specified inventories and persist those changes through the repo. Maybe that should be an infrastructure service? That's one thing I'm not clear on.
3) User predefines a set of items he would like to be able to transfer to an inventory regardless of which inventory those items are currently in. TransferService would find where those items currently are and coordinate the rest as in use case 2.
EDIT2: About repo.transfer
This is actually a constraint (or an optimization?) on the data side. From what I've been told, all it does is look up the item and change the inventory id it points to. This is because items cannot be in 2 inventories at once. So instead of repo.update(fromInvInNewState) and repo.update(toInvInNewState), there is repo.moveChild(toInv, child), because we don't want to rewrite the entire state of the inventory (all the items that haven't moved, plus the rest of its state, which is derived from the items it holds at any point); we just want to move some items around.
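A minimal sketch of the storage behaviour described in EDIT2, assuming each item row stores only the id of the inventory that holds it. `ItemLocationStore` and its methods are hypothetical names mirroring `findByItemId` and `moveChild`; an in-memory map replaces the database table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model: each item "row" stores only the id of its inventory.
class ItemLocationStore {
    private final Map<String, String> inventoryOfItem = new HashMap<>(); // itemId -> inventoryId

    void put(String itemId, String inventoryId) {
        inventoryOfItem.put(itemId, inventoryId);
    }

    // Counterpart of invRepo.findByItemId: which inventory holds the item right now?
    String findInventoryOfItem(String itemId) {
        return inventoryOfItem.get(itemId);
    }

    // Counterpart of repo.moveChild(toInv, child): a single-column change on the item row.
    // Because the row holds exactly one inventory id, an item can never be in two
    // inventories at once, and items that did not move are not rewritten.
    void moveChild(String toInventoryId, String itemId) {
        if (!inventoryOfItem.containsKey(itemId)) {
            throw new IllegalArgumentException("Unknown item " + itemId);
        }
        inventoryOfItem.put(itemId, toInventoryId);
    }

    // Derived view: the items an inventory currently holds.
    Set<String> itemsIn(String inventoryId) {
        Set<String> items = new TreeSet<>();
        for (Map.Entry<String, String> entry : inventoryOfItem.entrySet()) {
            if (entry.getValue().equals(inventoryId)) {
                items.add(entry.getKey());
            }
        }
        return items;
    }
}
```

In this model an inventory's contents are derived from the item rows, which is why moving one item touches one row only.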

You are missing at least one aggregate and are trying to replace it with your persistence. Talk to your domain expert and find out who or what performs this transfer. I bet you will not hear that it is done by "a repository" or "a database". That something will be your aggregate, and it will probably have this Transfer method. This call would also encapsulate the logic from transferIn and transferOut, since this seems to be a transactional process and you are currently doing it in three different places. Remember that your aggregate is your transaction boundary, not your repository.

First of all I would like to recap the domain model defined by your scenario.
You said that you are building an Inventory Tracker with the following spec:
A User has Inventories.
An Inventory consists of Items.
A User can transfer Items from one Inventory to another. I guess that both inventories belong to the User, as you said:
"User requests a list of their inventories and their items. User selects 1 or more items from 1 inventory and requests for them to be sent to another inventory..."
On the other hand, the invariant you pointed out is:
An Item can be transferred from the Inventory where it currently is (InventoryA) to another Inventory (InventoryB) only if InventoryB is not already full. I guess that if the Item cannot be transferred, it should stay in InventoryA.
If I understood correctly, a User transfers his Items between his Inventories.
Something like:
class TransferItemService {
public function execute(TransferItemRequest request)
{
user = userRepository.findOfId(request.userId());
user.transferItem(request.itemId(), request.fromInventoryId(), request.toInventoryId()); //Checks invariant -> the given Item is in one of his Inventories, the destination Inventory is owned by him, the destination Inventory is not full and finally transfers the Item
userRepository.save(user);
}
}
Now, in order to define the Aggregate Root/s I would need to know if my business can deal with eventual consistency. That is, if moving an Item must be done atomically (just one request) or it can take some time (more than one request).
No Eventual Consistency
If the business says that eventual consistency is not allowed here, and you want to ensure that your domain remains consistent with the invariant, the User would be the only Aggregate Root, since he is the nexus between his Inventories. In this case, you may face performance problems from loading all the Inventories along with their Items.
Eventual Consistency
If you can go with eventual consistency, you can have the following Aggregate Roots: User, Inventory, Item. So, using the previous code to model the use case of transferring an item:
class TransferItemService {
public function execute(TransferItemRequest request)
{
user = userRepository.findOfId(request.userId());
user.transferItem(request.itemId(), request.fromInventoryId(), request.toInventoryId()); //Checks invariant -> the given Item is in one of his Inventories, the destination Inventory is owned by him, the destination Inventory is not full and finally transfers the Item
userRepository.save(user);
}
}
In this case, the transferItem method would look like:
class User {
    private String id;
    private Map<String, UserInventory> inventories; // keyed by inventory id
    public function transferItem(itemId, fromInventoryId, toInventoryId)
    {
        fromUserInventory = this.inventories.get(fromInventoryId);
        if(!fromUserInventory) throw new InventoryNotBelongToUser(fromInventoryId, this.id);
        toUserInventory = this.inventories.get(toInventoryId);
        if(!toUserInventory) throw new InventoryNotBelongToUser(toInventoryId, this.id);
        toUserInventory.addItem(itemId); // throws if the destination is full
        fromUserInventory.deleteItem(itemId);
    }
}
class UserInventory {
    private String identifier;
    private int capacity; // number of items the inventory currently holds
    public function deleteItem(itemId)
    {
        this.capacity--;
        DomainEventPublisher.publish(new ItemWasRemoved(this.identifier, itemId));
    }
    public function addItem(itemId)
    {
        if(this.capacity >= MAX_CAPACITY) {
            throw new InventoryCapacityAlreadyFull(this.identifier);
        }
        this.capacity++;
        DomainEventPublisher.publish(new ItemWasAdded(this.identifier, itemId));
    }
}
Notice that UserInventory is not the Inventory Aggregate Root, it is just a VO with an identifier reference and the current capacity of the actual Inventory.
Now, you can have Listeners that asynchronously update each Inventory:
class ItemWasRemovedListener
{
    public function handleEvent(event)
    {
        removeItemFromInventoryService.execute(event.inventoryId(), event.itemId());
    }
}
class ItemWasAddedListener
{
    public function handleEvent(event)
    {
        addItemToInventoryService.execute(event.inventoryId(), event.itemId());
    }
}
Unless I have made a mistake, I think we have satisfied all our invariants, we modify just one Aggregate Root per request, and we don't need to load all our Items to perform an operation on an Inventory.
If you see something wrong please let me know :D.
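Below is a runnable Java rendering of the User/UserInventory pseudocode from this answer, under a few assumptions: `DomainEventPublisher` is a simple synchronous publisher (a real system would deliver events asynchronously, as the answer suggests), `MAX_CAPACITY` is an arbitrary value, and, unlike the pseudocode, the events are published by `User` once both counters have been updated. A listener subscribed to the publisher can play the role of `ItemWasAddedListener`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

record ItemWasAdded(String inventoryId, String itemId) {}

record ItemWasRemoved(String inventoryId, String itemId) {}

// Simple synchronous publisher standing in for DomainEventPublisher.
class DomainEventPublisher {
    private final List<Consumer<Object>> listeners = new ArrayList<>();

    void subscribe(Consumer<Object> listener) {
        listeners.add(listener);
    }

    void publish(Object event) {
        for (Consumer<Object> listener : listeners) {
            listener.accept(event);
        }
    }
}

// Held by User: the inventory's id and how many items it currently holds.
class UserInventory {
    static final int MAX_CAPACITY = 10; // arbitrary value for the sketch

    private final String identifier;
    private int itemCount;

    UserInventory(String identifier, int itemCount) {
        this.identifier = identifier;
        this.itemCount = itemCount;
    }

    void addItem() {
        if (itemCount >= MAX_CAPACITY) {
            throw new IllegalStateException("Inventory " + identifier + " is already full");
        }
        itemCount++;
    }

    void removeItem() {
        itemCount--;
    }

    int itemCount() {
        return itemCount;
    }
}

class User {
    private final String id;
    private final Map<String, UserInventory> inventories = new HashMap<>();
    private final DomainEventPublisher publisher;

    User(String id, DomainEventPublisher publisher) {
        this.id = id;
        this.publisher = publisher;
    }

    void addInventory(String inventoryId, int itemCount) {
        inventories.put(inventoryId, new UserInventory(inventoryId, itemCount));
    }

    void transferItem(String itemId, String fromInventoryId, String toInventoryId) {
        UserInventory from = owned(fromInventoryId);
        UserInventory to = owned(toInventoryId);
        to.addItem(); // throws before anything changes if the destination is full
        from.removeItem();
        publisher.publish(new ItemWasAdded(toInventoryId, itemId));
        publisher.publish(new ItemWasRemoved(fromInventoryId, itemId));
    }

    int itemCount(String inventoryId) {
        return owned(inventoryId).itemCount();
    }

    // Rejects inventories that do not belong to this user.
    private UserInventory owned(String inventoryId) {
        UserInventory inventory = inventories.get(inventoryId);
        if (inventory == null) {
            throw new IllegalArgumentException(
                "Inventory " + inventoryId + " does not belong to user " + id);
        }
        return inventory;
    }
}
```

Only the User aggregate changes inside `transferItem`; updating the actual Inventory aggregates is left to whatever listens for the events.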

How to set up relationships between new and existing entities in EF

My application allows the user to create a hierarchy of new entities via a UI - let's say it's a "Customer" plus one or more child "Order" entities. The user also assigns each Order entity to an existing "OrderDiscount" entity (think of these as "reference"/"lookup" items retrieved from the database). Some time later, the user will choose to save the whole hierarchy to the database, accomplished like this:-
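using (var context = new MyContext())
{
    // Add() marks the whole graph as Added, including the existing OrderDiscounts
    context.Customers.Add(customer);
    var objectStateManager = ((IObjectContextAdapter)context).ObjectContext.ObjectStateManager;
    foreach (var entity in context.OrderDiscounts.Local)
    {
        objectStateManager.ChangeObjectState(entity, EntityState.Unchanged);
    }
    context.SaveChanges();
}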
The foreach loop changes the state of the OrderDiscount entities to Unchanged, and prevents EF from attempting to insert them into the database, resulting in duplicates.
Great so far, but I've now hit another issue. For reasons I won't go into, the OrderDiscount entities can come from different BLL calls, resulting in a situation where two Orders in the graph may appear to reference the same OrderDiscount (i.e. both have the same PK ID, and other properties), but the entities are different object references.
When I save, the above foreach loop fails with the message "AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges". I can see the two OrderDiscount objects in the context.OrderDiscounts.Local collection, both with the same PK ID.
I'm not sure how I can avoid this situation. Any suggestions?
This article (http://msdn.microsoft.com/en-us/magazine/dn166926.aspx) describes the scenario and provides one possible solution, which is to set just the FK ID (order.OrderDiscountId), and leave the order.OrderDiscount relationship null. Unfortunately it's not feasible in my case, as further down the line I rely on being able to traverse such relationships, e.g. ApplyDiscount(order.OrderDiscount);.

Delete a child from an aggregate root

I have a common Repository with Add, Update, Delete.
We'll name it CustomerRepository.
I have an entity (POCO) named Customer, which is an aggregate root, with Addresses.
public class Customer
{
    public ICollection<Address> Addresses { get; set; }
}
I am in a detached Entity Framework 5 scenario.
Now, let's say that after getting the customer, I choose to delete one of the customer's addresses.
I submit the Customer aggregate root to the repository through the Update method.
How can I save the modifications made to the addresses?
If an address id is 0, I can assume that the address is new.
For the remaining addresses, I can choose to attach all of them and mark them as updated no matter what.
For deleted addresses I can see no workaround...
We could say this solution is incomplete and inefficient.
So how should updates to an aggregate root's children be done?
Do I have to extend the CustomerRepository with methods like AddAddress, UpdateAddress, DeleteAddress?
It seems like that would kind of break the pattern though...
Should I put a persistence state on each POCO:
public enum PersistenceState
{
    Unchanged,
    New,
    Updated,
    Deleted
}
And then have only one method, Save, in my CustomerRepository?
In that case it seems that I am reinventing the Entity "Non-POCO" objects, and adding data-access-related attributes to a business object...

First, you should keep your repository with Add, Update, and Delete methods, although I personally prefer Add, an indexer setter, and Remove, so that the repository looks like an in-memory collection to the application code.
Secondly, the repository should be responsible for tracking persistence states. I don't even clutter up my domain objects with
object ID { get; }
like some people do. Instead, my repositories look like this:
public class ConcreteRepository : List<AggregateRootDataModel>, IAggregateRootRepository
The AggregateRootDataModel class is what I use to track the IDs of my in-memory objects as well as track any persistence information. In your case, I would put a property of
List<AddressDataModel> Addresses { get; }
on my CustomerDataModel class which would also hold the Customer domain object as well as the database ID for the customer. Then, when a customer is updated, I would have code like:
public class ConcreteRepository : List<AggregateRootDataModel>, IAggregateRootRepository
{
public Customer this[int index]
{
set
{
//Lookup the data model
AggregateRootDataModel model = (from AggregateRootDataModel dm in this
where dm.Customer == value
select dm).SingleOrDefault();
//Inside the setter for this property, run your comparison
//and mark addresses as needing to be added, updated, or deleted.
model.Customer = value;
SaveModel(model); //Run your EF code to save the model back to the database.
}
}
}
The main caveat with this approach is that your Domain Model must be a reference type and you shouldn't be overriding GetHashCode(). The main reason for this is that when you perform the lookup for the matching data model, the hash code can't be dependent upon the values of any changeable properties because it needs to remain the same even if the application code has modified the values of properties on the instance of the domain model. Using this approach, the application code becomes:
IAggregateRootRepository rep = new ConcreteRepository([arguments that load the repository from the db]);
Customer customer = rep[0]; //or however you choose to select your Customer.
customer.Addresses = newAddresses; //change the addresses
rep[0] = customer;
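The hash-code caveat above can be demonstrated in Java. In this sketch (hypothetical `Customer` and `CustomerTracking` classes), `Customer` deliberately bases `equals`/`hashCode` on a mutable name. A `HashMap` keyed on such objects loses track of an entry once the name changes, whereas `java.util.IdentityHashMap` compares keys by reference and keeps working, which is the reference-based lookup this answer relies on.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Domain object whose equals/hashCode depend on a mutable property,
// which is exactly the situation the caveat warns about.
class Customer {
    String name;

    Customer(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Customer c && Objects.equals(name, c.name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

// Hypothetical tracking table mapping in-memory Customers to database ids.
// IdentityHashMap compares keys with == and System.identityHashCode, so the
// lookup still works after the application changes a Customer's properties.
class CustomerTracking {
    private final Map<Customer, Integer> databaseIds = new IdentityHashMap<>();

    void track(Customer customer, int databaseId) {
        databaseIds.put(customer, databaseId);
    }

    Integer databaseIdOf(Customer customer) {
        return databaseIds.get(customer);
    }

    // For contrast: a value-based HashMap misses the entry once the key's hash changes.
    static Integer lookupAfterRenameInHashMap(Customer customer, String newName, int databaseId) {
        Map<Customer, Integer> byValue = new HashMap<>();
        byValue.put(customer, databaseId);
        customer.name = newName;
        return byValue.get(customer);
    }
}
```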

The easy way is to use Self-Tracking Entities (see "What is the purpose of self tracking entities?"), though I don't like them, because tracking is a different responsibility.
The hard way: you take the original collection and compare it with the new one :-/ (see "Update relationships when saving changes of EF4 POCO objects").
Another option might be event tracking?
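The "hard way" of comparing the original collection with the submitted one can be sketched as below. `Address`, `AddressChanges` and `AddressDiff` are hypothetical; as in the question, an id of 0 marks a new address, and submitted ids that were not in the original collection are simply ignored here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical child entity: id 0 means "not saved yet", as in the question.
record Address(int id, String street) {}

// What the repository would need to add, update and delete.
record AddressChanges(List<Address> added, List<Address> updated, List<Address> deleted) {}

class AddressDiff {
    // Compares the addresses loaded originally with those submitted in the aggregate.
    static AddressChanges compare(List<Address> original, List<Address> submitted) {
        Map<Integer, Address> originalById = new HashMap<>();
        for (Address address : original) {
            originalById.put(address.id(), address);
        }
        List<Address> added = new ArrayList<>();
        List<Address> updated = new ArrayList<>();
        Set<Integer> keptIds = new HashSet<>();
        for (Address address : submitted) {
            if (address.id() == 0) {
                added.add(address); // id 0: not saved yet
                continue;
            }
            keptIds.add(address.id());
            Address before = originalById.get(address.id());
            // ids unknown to the original collection are ignored (before == null)
            if (before != null && !before.equals(address)) {
                updated.add(address); // same id, different values
            }
        }
        List<Address> deleted = new ArrayList<>();
        for (Address address : original) {
            if (!keptIds.contains(address.id())) {
                deleted.add(address); // loaded originally but no longer submitted
            }
        }
        return new AddressChanges(added, updated, deleted);
    }
}
```

The deleted list is exactly what the question found missing: it can only be computed because the original collection is still available for comparison.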

possible to return only one column using JPA

I have an OpenJPA entity and it successfully maps a many-to-many relationship. Right now I successfully get the entire table, but I really only want the IDs from that table. I plan on calling the database later to reconstruct the entities that I need (according to the flow of my program).
I need only the IDs (or one column from that table).
1) Should I try to restrict this in my entity beans, or in the stateless session beans that I will be using to call the entity beans?
2) If I try to do this using JPA, how can I specify that I only get back the IDs from the table, instead of the whole table? So far, looking online, I don't see a way to do this, so I am guessing there is no way.
3) If I simply manipulate the return values, should I create a separate class to return to the user that holds only the required ID list?
I could be completely wrong here, but from the looks of it, I don't think there is a simple way to do this using JPA, and I will have to return a custom object instead of the entity bean to the user (this custom object would hold only the IDs, as opposed to the whole table as it currently does).
Any thoughts? I don't think this is really relevant, but people always ask for code, so here you go...
@ManyToMany(fetch=FetchType.EAGER)
@JoinTable(name="QUICK_LAUNCH_DISTLIST",
    joinColumns=@JoinColumn(name="QUICK_LAUNCH_ID"),
    inverseJoinColumns=@JoinColumn(name="LIST_ID"))
private List<DistributionList> distributionlistList;
This is how I currently get the entire collection of records (remember, I only want the IDs):
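try
{
    // createNamedQuery is an EntityManager method, so emf must refer to an EntityManager here
    qlList = emf.createNamedQuery("getQuickLaunch").getResultList();
}
catch (PersistenceException e)
{
    // error handling was not shown in the original snippet
}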
This is how I call the entity beans. I think this is where I will have to programmatically go through the results and create a custom object similar to the entity bean (but holding just the IDs rather than the whole table), and attempt to put the IDs in there somewhere.
What are your thoughts?
Thanks

I believe I just figured out the best solution to this problem.
This link would be the answer:
my other stack overflow answer post
But for the sake of those too lazy to click on the link, I essentially used the @ElementCollection annotation...
@ElementCollection(fetch=FetchType.EAGER)
@CollectionTable(name="QUICK_LAUNCH_DISTLIST", joinColumns=@JoinColumn(name="QUICK_LAUNCH_ID"))
@Column(name="LIST_ID")
private List<Long> distListIDs;
That did it.

Sounds like you want something like this in your quickLaunch class:
@Transient
public List<Integer> getDistributionListIds() {
    List<Integer> distributionListIds = new LinkedList<Integer>();
    List<DistributionList> distributionlistList = getDistributionlistList();
    if (distributionlistList != null) {
        for (DistributionList distributionList : distributionlistList) {
            distributionListIds.add(distributionList.getId());
        }
    }
    return distributionListIds;
}
I had to guess a little at the names of your getters/setters and the type of DistributionList's ID. But basically, JPA is already nicely handling all of the relationships for you, so just take the values you want from the related objects.
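For reference, the same id-extraction idea as the `@Transient` getter above can be written with Java streams. `DistributionList` here is a minimal stand-in with only an id, and `DistributionListIds.idsOf` is a hypothetical helper; inside the entity you would pass it `getDistributionlistList()`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in for the DistributionList entity; only the id matters here.
class DistributionList {
    private final Integer id;

    DistributionList(Integer id) {
        this.id = id;
    }

    Integer getId() {
        return id;
    }
}

class DistributionListIds {
    // Maps each related entity to its id, treating a null collection as empty,
    // just like the null check in the getter above.
    static List<Integer> idsOf(List<DistributionList> lists) {
        if (lists == null) {
            return List.of();
        }
        return lists.stream()
                .map(DistributionList::getId)
                .collect(Collectors.toList());
    }
}
```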
