JPA: Issue on batch insert using persist()

I am using the code below to insert records into the database:
EntityManagerFactory emf = getEmf();
EntityManager em = emf.createEntityManager();
// Get the transaction
EntityTransaction trx = em.getTransaction();
trx.begin();
for (int i = 0; i < 10000; i++) {
    // Create a new object and persist it
    Customer customer = new Customer(.....);
    em.persist(customer);
    // Once the batch reaches 1000 entities, flush and commit
    // (i + 1, so the first batch doesn't commit immediately at i == 0)
    if ((i + 1) % 1000 == 0) {
        em.flush();
        em.clear();
        trx.commit();
        trx.begin();
    }
}
trx.commit(); // commit any remaining entities
em.close();
The above code works fine when the database does not already contain a duplicate of the new Customer object. If any duplicate Customer value exists in the database, I get an SQLIntegrityConstraintViolationException.
I am using persist() here to insert the records for performance reasons. If I use merge() it works fine, but it needs much more time to complete the task.
In the persist() case, is there any option to identify which record is the duplicate (i.e. get the duplicate record's information)?
If more than one duplicate record is present in the DB, is there any option to identify all of the duplicate records?

I believe avoiding duplicates is meant to be the application code's responsibility.
You could keep a list of the customers you have passed to persist(), then, prior to committing, query the database using a JPQL query like:
SELECT COUNT(c)
FROM Customer c
WHERE c in (:customers)
and bind the list to the :customers parameter. Check the result: if the count is > 0, you know a customer has been duplicated and will cause an SQLIntegrityConstraintViolationException on commit.
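A minimal sketch of that check (my addition, not from the original answer: the nextBatchOfCustomers() helper is hypothetical, and it assumes application-assigned identifiers and a JPA provider that supports collection-valued parameters in IN, as Hibernate and EclipseLink do). Run it before persisting each batch, otherwise the auto-flush triggered by the query would already attempt the conflicting INSERTs:
List<Customer> batch = nextBatchOfCustomers(); // hypothetical source of the next 1000 entities
Long duplicates = (Long) em.createQuery(
        "SELECT COUNT(c) FROM Customer c WHERE c IN (:customers)")
    .setParameter("customers", batch)
    .getSingleResult();
if (duplicates == 0) {
    // safe to persist the whole batch
    for (Customer c : batch) {
        em.persist(c);
    }
}
Replacing COUNT(c) with a plain SELECT c would return the offending rows themselves, which addresses the question about identifying all the duplicate records.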
Alternately, call em.find(Customer.class, customer.getId()) prior to adding the entity to the persistence context with the persist call. If it already exists, don't add it.
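A sketch of that variant (getId() is an assumed accessor for Customer's primary key):
Customer existing = em.find(Customer.class, customer.getId());
if (existing == null) {
    em.persist(customer); // no duplicate, safe to insert
} else {
    // 'existing' carries the duplicate record's information
}
Bear in mind that em.find() can cost a database round trip per entity, so for a 10,000-row batch the single COUNT query above is usually cheaper.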

Related

ExecuteStoreCommand "Delete" returning different record count to "DeleteObject" post delete, why?

Got a strange one here. I am using EF 6 over SQL Server 2012 and C#.
If I delete a record, using DeleteObject, I get:
// order.OrderItems count = 11
db.OrderItems.DeleteObject(orderitem);
db.SaveChanges();
order = db.Orders.First(r => r.Id == order.Id);
// order.OrderItems count = 10, CORRECT
If I delete an OrderItem using ExecuteStoreCommand inline DML, I get:
// order.OrderItems count = 11
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID = {0}", orderitem.Id);
order = db.Orders.First(r => r.Id == order.Id);
// order.OrderItems count = 11, INCORRECT, should be 10
So the ExecuteStoreCommand version reports 11, even though the OrderItem is definitely deleted from the DB, so it should report 10. Also, I would have thought First() performs an eager query, thus repopulating the order.OrderItems collection.
Any ideas why this is happening? Thanks.
EDIT: I am using ObjectContext
EDIT2: This is the closest working solution I have, using "detach". Interestingly, the detach actually takes about 2 secs! Not sure what it is doing, but it works.
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID = {0}", orderitem.Id);
db.Detach(orderitem);
It would be quicker to requery and repopulate the dataset. How can I force a requery? I thought the following would do it:
order = db.Orders.First(r => r.Id == order.Id);
EDIT3: This seems to work to force a refresh post delete, but it still takes about 2 secs:
db.Refresh(RefreshMode.StoreWins, order.OrderItems);
I still don't really understand why one cannot just requery, as an Order.First(r => r.Id == id) type query often takes much less than 2 secs.
This is likely because the Order and its order items are already known to the context when you perform the ExecuteStoreCommand. EF doesn't know that the command relates to any cached copy of Order, so the command is sent to the database but no loaded entity state is updated. Whereas the first approach looks for any loaded OrderItem and, when told to remove it from the set, updates any loaded entities that reference that order item.
If you don't want to ensure the entities are loaded prior to deleting, then you will need to check whether any are loaded and refresh or detach their associated references.
If orderitem represents a tracked entity, you should just be able to use:
db.OrderItems.Remove(orderitem);
If the order is loaded, the order item should be removed automatically. If the order isn't loaded, no loss, it will be loaded from the database when requested later on and load the set of order items from the DB.
However, if you want to use the SQL execute approach, detaching any local instance should remove it from the local cache.
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID ={0}", orderitem.Id);
var existingOrderItem = db.OrderItems.Local.SingleOrDefault(x => x.Id == orderItem.Id);
if(existingOrderItem != null)
db.Entity(existingOrderItem).State = EntityState.Detached;
I don't believe you will need to check the orderItem's Order to refresh anything beyond this, but I'm not 100% sure on that. Generally, though, when it comes to modifying data state, I opt to load the applicable top-level entity and remove its child.
So if I had a command to remove an order item from an order:
public void RemoveOrderItem(int orderId, int orderItemId)
{
    using (var context = new MyDbContext())
    {
        // TODO: Validate that the current user session has access to this order ID
        var order = context.Orders.Include(x => x.OrderItems).Single(x => x.OrderId == orderId);
        var orderItem = order.OrderItems.SingleOrDefault(x => x.OrderItemId == orderItemId);
        if (orderItem != null)
            order.OrderItems.Remove(orderItem);
        context.SaveChanges();
    }
}
The key points to this approach:
While it does mean loading the data state again for the operation, the load is by ID, so it is fast.
We can/should validate that the requested data is applicable to the user. Any command for an order they should not access should be logged and the session ended.
We know we are dealing with the current data state, not basing decisions on values/data from the point in time the data was first read.

DbUpdateConcurrencyException trying to insert detail row after master row

Couldn't find an answer for what is happening to my code. I'm reading an XML from a web service and adding rows to my database. I'm using EF, and after adding the header (master) row, I get the exception below when calling SaveChanges() on the detail row.
// master row insert
var result = 0;
using (var context = new XEntities())
{
    var masterRow = new TableName
    {
        x = someValue,
        ...
    };
    context.TableName.Add(masterRow);
    context.SaveChanges();
    result = masterRow.id;
}
return result;
So this insert has no issues. After returning the inserted row's id, I call a function to insert the detail row with some parameters, including the parent id.
That's when I get that exception:
DbUpdateConcurrencyException: Store update, insert, or delete
statement affected an unexpected number of rows (0). Entities may have
been modified or deleted since entities were loaded. Refresh
ObjectStateManager entries.
The code for the child row is exactly the same, of course with its own data. Any ideas how I can debug this exception? My best guess is the auto-generated id on the master row; however, I'm logging the correct id (not 0) in the return statement after the first SaveChanges().
Edit:
I tried just now to insert my child row using context.Database.ExecuteSqlCommand(query, parameters) with the same data I used for Entity Framework, and it works. But then why does using context.Add(row) and then SaveChanges() not work? Typing queries is so from the 90's hahahaha
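For debugging, one general EF6 technique (my addition, not from the thread; detailRow and its entity set name are hypothetical stand-ins) is to catch the exception and inspect which entries failed and in what state EF tried to save them:
try
{
    context.DetailTableName.Add(detailRow); // hypothetical detail entity set and row
    context.SaveChanges();
}
catch (System.Data.Entity.Infrastructure.DbUpdateConcurrencyException ex)
{
    foreach (var entry in ex.Entries)
    {
        // If State is Modified rather than Added, EF issued an UPDATE for a row
        // it believes already exists (for example because a key or concurrency
        // column was pre-set), which affects 0 rows and produces exactly this message.
        Console.WriteLine(entry.Entity.GetType().Name + ": " + entry.State);
    }
}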

CreateOrUpdate Operation Over Many Records Using Entity Framework 6

I am writing a web crawler for fun.
I have a remote SQL database that I want to save information about each page I visit and I am using Entity Framework 6 to persist data. For the sake of illustration, let's assume that the only data I want to save about each page is the last time I visited it.
Updating this database is very slow. Here is the operation that I want to make fast:
For the current page, check if it exists in the database already.
If it exists already, update the "lastSeen" timestamp field on the record and save.
If it doesn't exist, create it.
Currently I can only do 300 updates per minute. My SQL server instance shows almost no activity, so I assume I am client-bound.
My code is naive:
public static void AddOrUpdatePage(long id, DataContext db)
{
    Page p = db.Pages.SingleOrDefault(f => f.id == id);
    if (p == null)
    {
        // create
        p = new Page();
        p.id = id;
        db.Pages.Add(p);
    }
    p.lastSeen = DateTime.Now;
    db.SaveChanges();
}
I crawl a bunch of pages (thousands) and then call AddOrUpdatePage in a loop for each page.
It seems like the way to get more speed is batching. What is the best way to get 1000 records from my database at a time, given a set of page ids? In SQL I would use a table variable for this and join against it, or use a lengthy IN clause.
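For reference, LINQ to Entities translates List<long>.Contains into a SQL IN clause, so a batched version could look roughly like this (my sketch, reusing the question's Page/DataContext names; the batched method itself is not from the original post):
public static void AddOrUpdatePages(List<long> ids, DataContext db)
{
    // Translates to WHERE id IN (...): one round trip for the whole batch
    var existing = db.Pages.Where(p => ids.Contains(p.id))
                           .ToDictionary(p => p.id);
    var now = DateTime.Now;
    foreach (var id in ids)
    {
        Page p;
        if (!existing.TryGetValue(id, out p))
        {
            // not seen before: create it
            p = new Page { id = id };
            db.Pages.Add(p);
        }
        p.lastSeen = now;
    }
    // one SaveChanges for the whole batch instead of one per page
    db.SaveChanges();
}
SQL Server allows only about 2,100 parameters per query, so batches of 1,000 ids are a safe size. If DataContext is a DbContext, temporarily setting db.Configuration.AutoDetectChangesEnabled = false around the Add loop can speed things up further.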

EJB 3.0 Update then Select is not working properly on JBoss 5.1.0.GA

I am getting a very strange problem. First, I am selecting an entity from the database using EJB 3.0 on JBoss 5.1.0.GA:
Subscriber s = (Subscriber)manager.createQuery("SELECT s FROM Subscriber s " +
"WHERE s.subscriber_id = ?1").setParameter(1,123).getSingleResult();
Then I am updating the entity with a query like this:
int a = manager.createQuery("UPDATE Subscriber s SET s.balance = s.balance + " + 10 + " WHERE s.subscriber_id = ?1").setParameter(1, 123).executeUpdate();
Then I am selecting the entity again like this:
s = (Subscriber)manager.createQuery("SELECT s FROM Subscriber s " +
"WHERE s.subscriber_id = ?1").setParameter(1,123).getSingleResult();
BUT I am not getting the updated value of the "balance" field. As soon as I comment out the first SELECT statement, I do get the updated value. But I need the first SELECT statement in any case, because I want to use its result.
Can anyone tell me why this is happening and what the solution is?
I would bet this is a caching problem.
I will assume that your manager is an EntityManager instance.
You're executing the first query, so the persistence context fetches the Subscriber entity from the database and puts it into its cache. Then you're executing a bulk UPDATE query, which directly hits the database, bypassing the cache structures, so it doesn't affect the persistence context.
At the end, you execute the SELECT query once again. The persistence context checks whether it has the Subscriber entity cached somewhere. It does, so it doesn't hit the database but returns the value stored in its cache.
I don't quite get why you're executing a bulk UPDATE query instead of just updating your object's state and letting JPA commit the changes when appropriate.
So instead of:
int a = manager.createQuery("UPDATE Subscriber s SET s.balance = s.balance + " + 10 + " WHERE s.subscriber_id = ?1")
    .setParameter(1, 123).executeUpdate();
you could just do:
// 's' is the Subscriber entity previously fetched from the database
s.setBalance(s.getBalance() + 10);
Although if you still really need to use a bulk UPDATE, then you could try calling
manager.refresh(s);
after the bulk UPDATE query. This forces JPA to re-read the entity from the database instead of serving the cached version.
If you comment out the first SELECT statement, the example works because the persistence context never cached your entity: it fetches it from the database for the first time just after the bulk UPDATE query.
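Putting the pieces together, a minimal sketch of the working flow (assuming manager is the EntityManager, and that Subscriber exposes a getBalance() accessor, which is an assumption on my part):
Subscriber s = (Subscriber) manager.createQuery(
        "SELECT s FROM Subscriber s WHERE s.subscriber_id = ?1")
    .setParameter(1, 123).getSingleResult();

manager.createQuery(
        "UPDATE Subscriber s SET s.balance = s.balance + 10 WHERE s.subscriber_id = ?1")
    .setParameter(1, 123).executeUpdate();

// The bulk UPDATE went straight to the database, bypassing the persistence
// context, so re-read the row before using the entity again
manager.refresh(s);
// s.getBalance() now reflects the updated database value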

Clear an EntityCollection and then Add to it within the same transaction doesn't seem possible

I have 2 entities, a Department and an Employee. One Department can have many Employees. I would like to clear all the Employees from an existing Department, add a new Employee to that same Department, and then save the changes. It must happen within a single transaction.
However, when I try to execute the code below, I get a key violation error on the database. It seems that the Clear() is not deleting the rows in the DepartmentEmployee table before the new Employee is inserted.
Employee newEmployee = GetNewEmployee();
department.Employees.Clear();
department.Employees.Add(newEmployee);
EntityContext.ApplyPropertyChanges("SetName", department);
EntityContext.SaveChanges();
Any help with this would be greatly appreciated. Thanks.
I don't think you can do this in one call to SaveChanges. The Entity Framework does not guarantee any specific order of operations, so I don't think there is any way to force the DELETE to come before the INSERT without an additional call to SaveChanges.
On the other hand, you probably can do it in one database transaction. Just do it inside a new TransactionScope.
Try:
using (var t = new TransactionScope())
{
    Employee newEmployee = GetNewEmployee();
    department.Employees.Clear();
    EntityContext.SaveChanges();
    department.Employees.Add(newEmployee);
    EntityContext.ApplyPropertyChanges("SetName", department);
    EntityContext.SaveChanges();
    t.Complete();
}