PostgreSQL - multiple insert within stored procedure

I have a client-side entity which contains a list of secondary entities, and this is represented on the DB as a foreign key constraint.
The client-side creation form allows the user to create both the main entity and its sub-entities at once, so on the DB side I may have to save both a new main entity and some new sub-entities in one transaction. For this reason, I would like to use a stored procedure DB-side, and the most immediate approach would seem to be passing the main entity's fields as arguments, with the sub-entities as a collection argument of some sort (likely a json string, to account for the heterogeneity of the sub-entities' data).
Within my stored procedure, I would first save the main entity and then save the sub-entities, possibly making use of the newly generated main entity ID (as I understand it, the procedure itself runs as a transaction, and a failure after writing the main entity would roll back that insertion/update).
My problem lies in saving the sub-entities (from here on just "entities"). While I understand that, for existing entities, I need to run each update on its own, I would like to use the multi-row insert syntax for new entities. For this reason, I want to build a prepared statement where I can grow the list of inserted tuples, since the length of the entity list may vary, as in INSERT INTO ... VALUES <tuple1>, <tuple2>, ..., and here comes the pain...
It would of course be simplest to just concatenate raw values from each entity in order to build the query string, but this invites SQL injection.
On the other hand, while I could build the insert statement using the $ placeholder notation, I would then have a problem wiring the parameters when executing it, that is:
-- prepare insert statement stmt
EXECUTE stmt USING <???> -- variable number of parameters to wire,
-- I don't know how to work around it.
Is there a way to use a single multi-row insert, or should I just fall back to inserting values one-by-one? I assume that a multi-row insert doesn't actually just repeatedly perform single-row inserts, and that there would be some benefit in using it over the one-by-one approach. Of course, I would rather avoid twisted workarounds that would slow the operation down beyond the benefit given by the multi-row insert.

You could package the tuples in a json[] and deparse them in the INSERT statement, but I don't really see the point.
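For completeness, a sketch of that json route, written as a function for brevity (main_entity, sub_entity and their columns are made-up names, and a single jsonb array is used here instead of a json[]):

CREATE OR REPLACE FUNCTION save_main_with_subs(p_name text, p_subs jsonb)
RETURNS void
LANGUAGE plpgsql AS
$$
DECLARE
    v_main_id bigint;
BEGIN
    -- insert the main entity and capture its generated id
    INSERT INTO main_entity (name)
    VALUES (p_name)
    RETURNING id INTO v_main_id;

    -- one multi-row INSERT driven by the json payload: no dynamic SQL, no injection risk
    INSERT INTO sub_entity (main_id, label, amount)
    SELECT v_main_id, s.label, s.amount
    FROM jsonb_to_recordset(p_subs) AS s(label text, amount integer);
END;
$$;

-- usage: SELECT save_main_with_subs('foo', '[{"label": "a", "amount": 1}, {"label": "b", "amount": 2}]');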
I would just run single-row INSERT statements for each entity.
I guess you are worried about performance, but all that parsing and deparsing might well outweigh the advantage of inserting everything in a single statement.
Apart from that, I'd expect the code to be cleaner and easier to maintain with simple INSERTs in a loop.
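For comparison, the loop variant over the same hypothetical payload could look like this; the parameters are bound, so there is no injection risk here either:

CREATE OR REPLACE FUNCTION save_subs_in_loop(p_main_id bigint, p_subs jsonb)
RETURNS void
LANGUAGE plpgsql AS
$$
DECLARE
    s jsonb;
BEGIN
    -- one plain INSERT per element of the json array
    FOR s IN SELECT * FROM jsonb_array_elements(p_subs) LOOP
        INSERT INTO sub_entity (main_id, label, amount)
        VALUES (p_main_id, s->>'label', (s->>'amount')::integer);
    END LOOP;
END;
$$;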

Related

How to rehash and clean existing entities in the database?

We have an entity and a corresponding table in the database with one additional column containing a hash digest of the entity's fields, calculated programmatically in the application each time. The entity has associations with two additional tables/entities whose fields also take part in the hashing.
Now a decision was made to get rid of one of the fields on the main entity (a boolean flag) and exclude it from hashing, since it makes two otherwise identical entities get different hashes when one entity has the flag set to true while the other has it set to false. Because the hashes differ, both entities get stored in the database, which is not what we want.
Removing the field is simple, but we also need to re-calculate the hashes of entities already stored in the database. Since there might be duplicates, we also need to get rid of one of the two duplicated entries. This whole operation must be done once, after the migration.
The stack we use is Quarkus, Flyway, Hibernate with Panache, and PostgreSQL. I have tried to use Flyway callbacks with Event.AFTER_MIGRATE to get all existing entities from the db, but I can't use Panache since it's not initialised yet by the time the callback fires. Using plain java.sql.* Connection and Statement is pretty cumbersome, because I need to fetch data from 3 tables, create the entity from all of the fields, re-calculate the hash and put it back, while taking care of possible conflicts. Another option would be to create a new REST API endpoint specifically for the job, which the client would have to call after the app has booted, but somehow I don't feel that that is the best solution.
How do you tackle this kind of situation?

Does Entity Framework make one database call per operation?

I am trying to profile EF to understand more of its inner workings. I added two entities using the Add method and the AddRange method, then of course committed with the SaveChanges method, and here is what I got on the profiler in both cases.
Does this mean that EF actually makes two trips to the database, one per insert? That would mean that inserting 100 entities, for example, results in 100 trips to the database, which would greatly impact performance. Or am I missing something here?
Yes, that is correct: it will issue one database call per item being added, as it is using the standard SQL INSERT command in this case.
The alternative would be a bulk insert, such as using a stored procedure that takes in an object such as a DataTable.
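For example, on a SQL Server backend you could push the whole batch through a single stored-procedure call with a table-valued parameter. This is only a sketch: dbo.InsertItems, dbo.ItemTableType, items and connectionString are made-up names, and it needs System.Data plus System.Data.SqlClient.

// build a DataTable whose columns match the hypothetical table type dbo.ItemTableType
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(decimal));
foreach (var item in items)
    table.Rows.Add(item.Name, item.Price);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.InsertItems", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@items", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.ItemTableType";

    connection.Open();
    command.ExecuteNonQuery();   // one round trip for the whole batch
}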

What Methodology To Implement For Insert Statements In Entity Framework

When inserting a lot of interrelated nested objects, is it best practice to first create and persist the inner entities so as to respect the foreign key relationships and move up in the hierarchy, or is it better to create all the objects with their inner objects and persist only the outer object? So far, I have found that when doing the latter, Entity Framework seems to intelligently figure out what to insert first with regard to the relationships. But is there a caveat that one should be aware of? The first method seems to me like classical SQL logic, whereas the latter seems to suit the idea of Entity Framework more.
It's all about managing the risk of data corruption and ensuring that the data in the database stays valid.
When you persist the inner records first, you open yourself up to the possibility of bad data if the outer save fails. Depending on your application needs, orphaned inner records could be anywhere from a major problem to a minor annoyance.
When you save everything together, if a single record fails the entire save fails.
Of course, if the inner records are meaningful without the outer records (again, determined by your business needs), then you might be preventing the application from progressing: perfectly valid inner records get rejected along with the failed outer one.
In short: if the inner record is dependent on the outer, save together. If the inner is meaningful on its own, make the decision based on performance / readability.
Examples:
A House has meaning without a Homeowner. It would be acceptable to create a House on its own.
An HOA (Homeowners' Association) does not make sense without Homeowners. These should be created together.
Obviously, these examples are contrived and become trivial when using existing data. For the purpose of this question, we're assuming that both are being created at the same time.
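As a minimal sketch of the second approach, using hypothetical HomeownersAssociation/Homeowner classes and a hypothetical DbContext, attaching the children to the parent and saving once lets EF order the INSERTs and fill in the foreign keys:

var hoa = new HomeownersAssociation { Name = "Elm Street HOA" };
hoa.Homeowners.Add(new Homeowner { Name = "Jane" });   // child reached via the navigation property

using (var context = new MyDbContext())
{
    context.HomeownersAssociations.Add(hoa);   // adding the root also marks the Homeowner as Added
    context.SaveChanges();                     // one transaction: the HOA is inserted first, then the Homeowner with its FK set
}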

Entity framework 4.5 and 5 and serial list processing possible without stored procedure?

My C# application uses EF and calls min() on an int column to get the 'next' number in a sequence of numbers from a database table. The database table already has the next X numbers ready to go and my EF code just needs to get the 'next' one; after getting this number, the code deletes that entry so the next request gets the following one, etc. With one instance of the application all is fine, but with multiple users this leads to concurrency issues. Is there a design pattern for getting this next min() value in a serial fashion for all users, without resorting to a stored procedure? I'm using a mix of EF4.5 and EF5.
Thanks, Pete
Firstly, you can add a timestamp-type column to your table and, in the Entity Framework property window, set its concurrency mode to Fixed.
Doing that enables an optimistic concurrency check on the table. If another data context tries to interrupt your update, it will generate an exception.
Check this link: http://blogs.msdn.com/b/alexj/archive/2009/05/20/tip-19-how-to-use-optimistic-concurrency-in-the-entity-framework.aspx?Redirected=true
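With the concurrency mode set to Fixed, the conflict surfaces as an exception you can catch and retry. A rough sketch, where MyDbContext and NumberPool are made-up names for your context and table:

using (var context = new MyDbContext())
{
    var next = context.NumberPool.OrderBy(n => n.Value).First();   // candidate 'next' number
    context.NumberPool.Remove(next);
    try
    {
        context.SaveChanges();             // fails if another user already took this row
        // safe to hand out next.Value
    }
    catch (DbUpdateConcurrencyException)   // System.Data.Entity.Infrastructure
    {
        // someone else got there first: retry with the next candidate
    }
}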
Alternatively, you can use a TransactionScope object around your select/update logic. You simply wrap your code in a TransactionScope and everything within the scope is covered by the transaction.
Check this link for more information:
TransactionScope vs Transaction in LINQ to SQL
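And a sketch of the TransactionScope route around the same hypothetical select/delete logic (requires a reference to System.Transactions):

using (var scope = new TransactionScope(TransactionScopeOption.Required,
           new TransactionOptions { IsolationLevel = IsolationLevel.Serializable }))
using (var context = new MyDbContext())
{
    // read the smallest available number and delete it in the same transaction
    var next = context.NumberPool.OrderBy(n => n.Value).First();
    context.NumberPool.Remove(next);
    context.SaveChanges();

    scope.Complete();   // the transaction commits when the scope is disposed
}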

Entity Framework only with stored procedures

I have a question about the reasonableness of using Entity Framework only with stored procedures in our scenario.
We plan to have an N-tier architecture, with UI, BusinessLayer (BLL), DataAccessLayer (DAL) and a BusinessObjectDefinitions (BOD) layer. The BOD layer is known by all other layers, and the results of queries executed in the DAL should be transformed into objects (defined in the BOD) before being passed to the BLL.
We will only use stored procedures for all CRUD methods.
So in the case of a select stored procedure, we would add a function import, create a complex type and, when we execute the function, transform the values of the complex type into a class from the BOD and pass that to the BLL.
So basically, we have no Entities in the Model, just Complex Types that are transformed into Business Objects.
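In code the idea would be roughly this (GetCustomers, the generated complex type and CustomerBO are placeholder names):

using (var entities = new MyEntities())
{
    List<CustomerBO> customers = entities
        .GetCustomers()                     // function import wrapping the select stored procedure
        .Select(r => new CustomerBO         // r is the generated complex type
        {
            Id = r.Id,
            Name = r.Name
        })
        .ToList();
    // hand the BOD objects up to the BLL
}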
I'm not sure if that all makes sense, since, in my opinion, we lose a lot of the benefit EF offers.
Or am I totally wrong?
I would not use EF if all I was using was stored procs.
Personally, I'd look at something like PetaPoco, Massive, or even just straight ADO.NET.
EDIT
Here's an example of PetaPoco consuming SPs and outputting custom types
http://weblogs.asp.net/jalpeshpvadgama/archive/2011/06/20/petapoco-with-stored-procedures.aspx
I disagree with both of the existing answers here. PetaPoco is great, but I think the EF still offers a number of advantages.
PetaPoco works great (maybe even better than the EF) for executing simple stored procedures that read a single entity or a list of entities. However, once you've read the data and need to begin modifying it, I feel this is where the EF is the clear winner.
To insert/update data with PetaPoco you'll need to manually call the insert/update stored procedure using:
db.Execute("EXEC spName @param1 = 1, @param2 = 2");
Manually constructing the stored procedure call and declaring all the parameters gets old very fast when the insert/update stored procedures insert rows with more than just a couple of columns. This gets even worse when calling update stored procedures that implement optimistic concurrency (i.e. passing in the original values as parameters).
You also run the risk of making a typo in your in-lined stored procedure call, which very likely will not be caught until runtime.
Now compare this to the entity framework: In the EF I would simply map my stored procedure to my entity in the edmx. There's less risk of a typo, since the entity framework tools will automatically generate the mapping by analyzing my stored procedure.
The entity framework will also handle optimistic concurrency without any problems. Finally, when it comes time to save my changes, the only step is to call:
entities.SaveChanges()
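So the day-to-day code stays plain EF, something like this (MyEntities and Customer are hypothetical generated types, with the insert/update procedures assumed to be mapped in the edmx):

using (var entities = new MyEntities())
{
    var existing = entities.Customers.First(c => c.Id == 42);
    existing.Name = "New name";                                    // tracked change -> mapped update procedure
    entities.Customers.Add(new Customer { Name = "Brand new" });   // new entity -> mapped insert procedure

    entities.SaveChanges();   // EF calls the mapped stored procedures behind the scenes
}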
I agree, if you rely on stored procedures for all CRUD methods, then there is no need to use EF.
I use EF to map stored procedure calls as our DAL. It saves time writing the DAL by mapping the functions. We are not using LINQ to SQL much, as our DBA does not want direct data table access.