I'm working on a .NET program that executes arbitrary scripts against a database.
When a colleague started writing the database access code, he simply exposed one command object to the rest of the application, which is re-used for each statement (setting CommandText/CommandType, calling ExecuteNonQuery(), etc.).
I imagine this is a big performance hit for repeated, identical statements, because they are parsed anew each time.
What I'm wondering about, though, is: will this also degrade execution speed if each statement is different from the previous one (not only different parameters, but an entirely different statement)? I couldn't easily find an answer on that in the documentation.
Btw, the RDBMS used is Oracle, but I guess this question is not really database specific.
P.S. I know exposing the same Command object is not thread safe, but that's not an issue here.
There is some overhead involved in creating new command objects, so in certain circumstances it can make sense to re-use the same command. But enforcing that as the general pattern for an entire application seems more than a little odd.
The performance hit usually comes from establishing a connection to the database, but ADO.NET maintains a connection pool to help there.
If you wish to avoid re-parsing statements each time, you can put them into stored procedures.
I imagine your colleague is just using an old-style approach he inherited from working on other platforms, where reusing a command object did make a difference.
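If the concern is the cost of creating commands, a cheap pattern is one short-lived command per statement, with bind variables so that identical statement text can reuse the server's parsed plan. A rough sketch against the generic ADO.NET interfaces (the table, column and parameter names are made up, and connection is assumed to be an open IDbConnection):

// Sketch only: table/columns are hypothetical; the interfaces come from System.Data.
using (IDbCommand cmd = connection.CreateCommand())
{
    cmd.CommandText = "UPDATE employees SET salary = :salary WHERE id = :id";
    cmd.CommandType = CommandType.Text;

    IDbDataParameter salary = cmd.CreateParameter();
    salary.ParameterName = ":salary";
    salary.Value = 52000m;
    cmd.Parameters.Add(salary);

    IDbDataParameter id = cmd.CreateParameter();
    id.ParameterName = ":id";
    id.Value = 42;
    cmd.Parameters.Add(id);

    cmd.ExecuteNonQuery();   // creating the command is cheap; the parse cost depends on the statement text
}

Because the statement text stays identical and only the bind values change, Oracle can satisfy repeated executions from its shared cursor cache whether or not the command object itself is reused.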
I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html) -
Prepared statements only last for the duration of the current database
session. When the session ends, the prepared statement is forgotten,
so it must be recreated before being used again. This also means that
a single prepared statement cannot be used by multiple simultaneous
database clients; however, each client can create their own prepared
statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these, that would mean every application that expects a lot of database transactions needs to develop some sort of prepared-statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.
Yes, your understanding is correct. Typically, if you had a set of prepared queries that are that critical, you'd have the application call a custom function to set them up on connection.
There are three key reasons for this, AFAIK:
There's a long to-do list, and items get done when a developer is interested in tackling them (or is paid to). Presumably no-one has thought it worth funding yet, or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without a high-transaction performance requirement, or for that matter a DBA to notice whether it's needed or not.
Planned queries don't always provide a win. There's been considerable work done on delaying planning/invalidating caches to provide as good a fit as possible to the actual data and query parameters.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
HTH
I'm setting up a new application using Entity Framework Code First and I'm looking at ways to reduce the number of round trips to the SQL Server as much as possible.
When I first read about the .Local property here I got excited about the possibility of bringing down entire object graphs early in my processing pipeline and then using .Local later without ever having to worry about incurring the cost of extra round trips.
Now that I'm playing around with it, I'm wondering if there is any way to take down all the data I need for a single request in one round trip. If, for example, I have a web page with a few lists on it (news, events and discussions), is there a way to pull the records from those three unrelated source tables into the DbContext in a single round trip? Do you all out there on the interweb think it's perfectly fine for a single page to make 20 round trips to the db server? I suppose with a proper caching mechanism in place this issue could be mitigated.
I did run across a couple of cracks at returning multiple results from EF queries in one round trip but I'm not sure the complexity and maturity of these kinds of solutions is worth the payoff.
In general in terms of composing datasets to be passed to MVC controllers do you think that it's best to simply make a separate query for each set of records you need and then worry about much of the performance later in the caching layer using either the EF Caching Provider or asp.net caching?
It is completely OK to make several DB calls if you need them. If you are afraid of multiple round trips, you can either write a stored procedure that returns multiple result sets (which doesn't work with default EF features) or execute your queries asynchronously (run multiple disjoint queries at the same time). Loading unrelated data with a single LINQ query is not possible.
Just one more note: if you decide to use the asynchronous approach, make sure that you use a separate context instance in each asynchronous execution. Asynchronous execution uses a separate thread, and the context is not thread safe.
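As a rough sketch of the asynchronous route (MyContext, News, Events and Discussions are hypothetical Code First types; note that each task creates its own context):

// Requires System, System.Linq, System.Threading.Tasks and System.Data.Entity.
var newsTask = Task.Factory.StartNew(() =>
{
    using (var ctx = new MyContext())
        return ctx.News.OrderByDescending(n => n.PublishedOn).Take(10).ToList();
});
var eventsTask = Task.Factory.StartNew(() =>
{
    using (var ctx = new MyContext())
        return ctx.Events.Where(e => e.StartsOn >= DateTime.Today).ToList();
});
var discussionsTask = Task.Factory.StartNew(() =>
{
    using (var ctx = new MyContext())
        return ctx.Discussions.OrderByDescending(d => d.LastPostOn).Take(10).ToList();
});

Task.WaitAll(newsTask, eventsTask, discussionsTask);
var news = newsTask.Result;
var events = eventsTask.Result;
var discussions = discussionsTask.Result;

Each query is still its own round trip; they just overlap in time instead of running back to back.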
I think you are doing a lot of work for little gain if you don't already have a performance problem. Yes, pay attention to what you are doing and don't make unnecessary calls. The actual connection and across the wire overhead for each query is usually really low so don't worry about it.
Remember "Premature optimization is the root of all evil".
My rule of thumb is that executing a call for each collection of objects you want to retrieve is ok. Executing a call for each row you want to retrieve is bad. If your web page requires 20 collections then 20 calls is ok.
That being said, reducing this to one call would not be difficult if you use the Translate method. Code something like this would work:

var reader = GetADataReader(sql);   // hypothetical helper that returns a DbDataReader

// Translate lives on ObjectContext; with a DbContext use ((IObjectContextAdapter)context).ObjectContext
var firstCollection = context.Translate<whatever1>(reader);

reader.NextResult();                // move to the next result set
var secondCollection = context.Translate<whatever2>(reader);

// ...and so on for each result set

The big downside to doing this is that if you place your SQL into a stored proc, then your stored procs become very specific to your web pages instead of being more general purpose. This isn't the end of the world as long as you have good access to your database. Otherwise you could just define your SQL in code.
Whenever I watch a demo regarding the Entity Framework, the demonstrator simply sets up some tables and performs Inserts, Updates and Deletes using automatically created code stubs, but never shows any use of stored procedures. It seems to me that this is executing SQL from the client.
In my experience this is not particularly good practice, so I am presuming that my understanding of the Entity Framework is wrong.
Similarly, the WCF RIA Services demos use EF, and the demos are always the same. Can anyone shed any light on how you would use EF in a typical Business Layer / Data Access Layer / Stored Procedures setup?
I think I am confused and shouldn't be!!?
There's nothing wrong with executing SQL from the client. Most (if not all) of the problems that it might cause are in fact not there when using something like EF. For instance:
Client generated SQL might cause runtime syntax errors. This is unlikely, since the description of your query is mostly checked at compile time (and assuming that the generator itself doesn't produce invalid SQL, which is also unlikely).
Client generated SQL might be inefficient. This is not really true with modern database software, which has query caches. EF works in a way that's compatible with query caches, i.e. it generates the same SQL consistently (as long as you use the same code consistently) and uses parameters for varying data.
Client generated SQL might be insecure (SQL injections and whatnot). This is all handled by the generator, which uses parameters for your values and does not interpolate user input into the query itself.
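To make the last two points concrete, here is roughly what happens with a LINQ-to-Entities query (MyContext and Customer are hypothetical; the exact parameter name EF emits will vary):

string name = Console.ReadLine();                    // untrusted user input
using (var context = new MyContext())
{
    var matches = context.Customers
                         .Where(c => c.Name == name) // the query expression is checked at compile time
                         .ToList();
    // EF sends something like: SELECT ... FROM Customers WHERE Name = @p0
    // The value travels as a parameter, so the SQL text stays stable (friendly to the
    // server's query/plan cache) and user input is never spliced into the statement.
}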
Back in the old Client / Server days, it used to be considered good practice to do all db updates using stored procedures.
But now, it's perfectly acceptable to have an O/RM generate SQL and run directly against DB.
Well, part of the reason why executing SQL in stored procedures is a good idea is that it gives you a level of abstraction: when db changes inevitably occur, you make the change in a single place (the proc) rather than a dozen places (all the places where you were calling the client SQL). Entity Framework provides this layer of abstraction through the data model, and you have the same advantage.
There are some other reasons why you might want to look at procs, like security granularity (only allowing certain users the right to execute), and some minor performance differences. Ultimately, you have to decide for yourself what the right trade-off is. EF is an attempt to dramatically reduce the developer time spent creating a data layer, with the trade-offs listed above.
never shows any use of stored procedures
Take a look at this video: Using Your Own Stored Procedures to Insert, Update and Delete Entities in Entity Framework.
Note that there are a lot of other videos on that topic there that are certainly worth watching!
Legend has it that Scott Hanselman once said "It's not a real demo unless someone drags a datagrid" (p. 478, Silverlight 4 in Action, Pete Brown).
You have to remember that demos are all about selling software, and not at all about communicating best practice. So your observations about the demos are absolutely correct: they cover the basics and leave it to the observer to fill in the blanks.
As to your comment about Stored Procedures, and the various answers to your question about the generator: the generator is good, and getting better. However, there are certain circumstances when it will generate completely unusable queries (see my SO question here, also discussed on the ADO.NET team blog).
Therefore there are occasions when hand-crafted queries are your only recourse (whether by way of stored procs, table-valued functions, views, etc.).
We are running a Java 6 / Hibernate / c3p0 / PostgreSQL stack.
Our JDBC Driver is 8.4-701.jdbc3
I have a few questions about prepared statements. I have read this
excellent document about Prepared Statements
but I still have a question about how to configure c3p0 with PostgreSQL.
At the moment we have
c3p0.maxStatements = 0
c3p0.maxStatementsPerConnection = 0
In my understanding, prepared statements and statement pooling are two different things:

Our Hibernate stack uses prepared statements. PostgreSQL caches the execution plan; the next time the same statement is used, PostgreSQL reuses that execution plan. This saves the time spent planning statements inside the DB.

Additionally, c3p0 can cache Java instances of java.sql.PreparedStatement, which means it is caching the Java object. So when using c3p0.maxStatementsPerConnection = 100, it caches at most 100 different objects. This saves time on creating objects, but it has nothing to do with the PostgreSQL database and its prepared statements.
Right?
As we use about 100 different statements, I would set
c3p0.maxStatementsPerConnection = 100
But the c3p0 docs say, under "c3p0 known shortcomings":
The overhead of Statement pooling is too high. For drivers that do not perform significant preprocessing of PreparedStatements, the pooling overhead outweighs any savings. Statement pooling is thus turned off by default. If your driver does preprocess PreparedStatements, especially if it does so via IPC with the RDBMS, you will probably see a significant performance gain by turning Statement pooling on. (Do this by setting the configuration property maxStatements or maxStatementsPerConnection to a value greater than zero.)
So: is it reasonable to activate maxStatementsPerConnection with c3p0 and PostgreSQL?
Is there a real benefit activating it?
kind regards
Janning
I don't remember offhand if Hibernate actually stores PreparedStatement instances itself, or relies on the connection provider to reuse them. (A quick scan of BatcherImpl suggests it reuses the last PreparedStatement if executing the same SQL multiple times in a row)
I think the point that the c3p0 documentation is trying to make is that for many JDBC drivers, a PreparedStatement isn't useful: some drivers will end up simply splicing the parameters in client-side and then passing the built SQL statement to the database anyway. For these drivers, PreparedStatements are no advantage at all, and any effort to reuse them is wasted. (The PostgreSQL JDBC FAQ says this was the case for PostgreSQL before server protocol version 3, and there is more detailed information in the documentation.)
For drivers that do handle PreparedStatements usefully, it's still likely necessary to actually reuse PreparedStatement instances to get any benefit. For example if the driver implements:
Connection.prepareStatement(sql) - create a server-side statement
PreparedStatement.execute(..) etc - execute that server-side statement
PreparedStatement.close() - deallocate the server-side statement
Given this, if the application always opens a prepared statement, executes it once and then closes it again, there's still no benefit; in fact, it might be worse since there are now potentially more round-trips. So the application needs to hang on to PreparedStatement instances. Of course, this leads to another problem: if the application hangs on to too many, and each server-side statement consumes some resources, then this can lead to server-side issues. In the case where someone is using JDBC directly, this might be managed by hand: some statements are known to be reusable and hence are prepared; some aren't and just use transient Statement instances instead. (This is skipping over the other benefit of prepared statements: handling argument escaping.)
So this is why c3p0 and other connection pools also have prepared statement caches: it allows application code to avoid dealing with all this. The statements are usually kept in some limited LRU pool, so common statements reuse a PreparedStatement instance.
The final pieces of the puzzle are that JDBC drivers may themselves decide to be clever and do this; and servers may themselves also decide to be clever and detect a client submitting a statement that is structurally similar to a previous one.
Given that Hibernate doesn't itself keep a cache of PreparedStatement instances, you need to have c3p0 do that in order to get the benefit of them (which should be reduced overhead for common statements, thanks to reusing cached plans). If c3p0 doesn't cache prepared statements, then the driver will just see the application preparing a statement, executing it, and then closing it again. It looks like the JDBC driver has a "threshold" setting for avoiding the prepare/execute server overhead in the case where the application always does this. So, yes, you need to have c3p0 do statement caching.
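For what it's worth, the relevant configuration would look something along these lines (the values, host and database name are purely illustrative; prepareThreshold is the PostgreSQL JDBC driver's setting for when it switches a statement to a server-side prepared statement, default 5):

# c3p0.properties - turn on c3p0's statement cache
c3p0.maxStatementsPerConnection = 50

# Hibernate connection URL - pgjdbc starts using server-side prepared statements
# once a statement has been executed prepareThreshold times
hibernate.connection.url = jdbc:postgresql://dbhost:5432/mydb?prepareThreshold=3

With c3p0 keeping PreparedStatement instances alive across Hibernate sessions, the driver's threshold can actually be reached, which is what lets the server-side plan reuse kick in.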
Hope that helps, sorry it's a bit long winded. The answer is yes.
Remember that statements have to be cached per connection, which means you're going to consume quite a chunk of memory, and it will take a long time before you see any benefit. So if you set it to cache 100 statements, that's actually 100 * the number of connections (or, if the limit is global, 100 divided across the connections), and it will still take quite some time before the cache has any meaningful effect.
I have an ADO.NET/TSQL performance question. We have two options in our application:
1) One big database call with multiple result sets, then in code step through each result set and populate my objects. This results in one round trip to the database.
2) Multiple small database calls.
There is much more code reuse with Option 2, which is an advantage of that option. But I would like to get some input on what the performance cost is. Are two small round trips twice as slow as one big round trip to the database, or is it just a small (say 10%) performance loss? We are using C# 3.5 and SQL Server 2008 with stored procedures and ADO.NET.
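For reference, option 1 boils down to something like this in ADO.NET (the proc name, connection string and the Load helpers are placeholders; the point is the single ExecuteReader call plus NextResult):

// Requires System.Data and System.Data.SqlClient.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.GetPageData", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    connection.Open();

    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
            LoadFirstObjectSet(reader);     // first result set

        reader.NextResult();                // next result set, same round trip
        while (reader.Read())
            LoadSecondObjectSet(reader);
    }
}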
I would think it in part would depend on when you need the data. For instance if you return ten datasets in one large process, and see all ten on the screen at once, then go for it. But if you return ten datasets and the user may only click through the pages to see three of them then sending the others was a waste of server and network resources. If you return ten datasets but the user really needs to see sets seven and eight only after making changes to sets 5 and 6, then the user would see the wrong info if you returned it too soon.
If you use separate stored procs for each data set called in one master stored proc, there is no reason at all why you can't reuse the code elsewhere, so code reuse is not really an issue in my mind.
It sounds a wee bit obvious, but only send what you need in one call.
For example, we have a "getStuff" stored proc for presentation. The "updateStuff" proc calls "getStuff" proc and the client wrapper method for "updateStuff" expects type "Thing". So one round trip.
Chatty servers are one thing you prevent up front with minimal effort. Then, you can tune the DB or client code as needed... but it's hard to factor out the roundtrips later no matter how fast your code runs. In the extreme, what if your web server is in a different country to your DB server...?
Edit: it's interesting to note the SQL guys (HLGEM, astander, me) saying "one trip" and the client guys saying "multiple, code reuse"...
I am struggling with this problem myself. And I don't have an answer yet, but I do have some thoughts.
Having reviewed the answers given by others to this point, there is still a third option.
In my application, around ten or twelve calls are made to the server to get the data I need. Some of the data fields are varchar(max) and varbinary(max) fields (pictures, large documents, videos and sound files). All of my calls are synchronous, i.e., while the data is being requested, the user (and the client-side program) has no choice but to wait. He may only want to read or view the data, which only makes sense when it is ALL there, not just partially there. The process, I believe, is slower this way, and I am in the process of developing an alternative approach based on asynchronous calls to the server from a DLL library which raises events to the client to announce progress. The client is programmed to handle the DLL events and set a variable on the client side indicating which calls have been completed. The client program can then do what it must to prepare the data received in call #1 while the DLL proceeds asynchronously to get the data for call #2. When the client is ready to process the data of call #2, it checks the status and waits to proceed if necessary (I am hoping this will be a short wait, or none at all). In this manner, both the server and client-side software get the job done more efficiently.
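A rough sketch of the event-raising library described above, in C# (all names are hypothetical and error handling is omitted):

// Requires System and System.Threading.
public class DataLoadedEventArgs : EventArgs
{
    public int CallNumber { get; set; }
    public object Data { get; set; }
}

public class DataLibrary
{
    public event EventHandler<DataLoadedEventArgs> DataLoaded;

    public void BeginLoad(int callNumber)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            object data = FetchFromServer(callNumber);  // placeholder for the actual synchronous DB call
            var handler = DataLoaded;
            if (handler != null)
                handler(this, new DataLoadedEventArgs { CallNumber = callNumber, Data = data });
        });
    }

    private object FetchFromServer(int callNumber)
    {
        // ...call the server here...
        return null;
    }
}

Note that the event fires on a worker thread, so a UI client has to marshal back to its own thread (and track which call numbers have completed) before touching the data.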
If you're that concerned with performance, try a test of both and see which performs better.
Personally, I prefer the second method. It makes life easier for the developers, makes code more re-usable, and modularizes things so changes down the road are easier.
I personally like option two for the reason you stated: code reuse
But consider this: for small requests, the round-trip latency might be longer than the work the request actually does. You have to find the right balance.
As the ADO.NET developer, your job is to make the code as correct, clear, and maintainable as possible. This means that you must separate your concerns.
It's the job of the SQL Server connection technology to make it fast.
If you implement a correct, clear, maintainable application that solves the business problems, and it turns out that the database access is the major bottleneck that prevents the system from operating within acceptable limits, then, and only then, should you start pursuing ways to fix the problem. This may or may not include consolidating database queries.
Don't optimize for performance until a need arises to do so. This means that you should analyze your anticipated use patterns and determine what the typical frequency of use for this process will be, and what user interface latency will result from the present design. If the user will receive feedback from the app in less than a few (2-3) seconds, and the application load from this process is not an inordinate load on server capacity, then don't worry about it. If, on the other hand, the user is waiting an unacceptable amount of time for a response (subjective, but definitely measurable) or the server is being overloaded, then it's time to begin optimization. And then, which optimization techniques will make the most sense, or be the most cost effective, depends on what your analysis of the issue tells you.
So, in the meantime, focus on maintainability. That means, in your case, code reuse.
Personally I would go with one larger round trip.
This will definitely be influenced by the exact reusability of the calling code, and how it might be refactored.
But as mentioned, this will depend on your exact situation, where maintainability vs performance could be a factor.