Using SQL Server 2005 and have a stored procedure (not written by me) that I now have to maintain. It's an important query and I want to make some changes to make it easier to maintain without totally rewriting the whole thing. Just exploring some options, so...
How many variables can I have in a stored procedure (is there a limit)?
DECLARE @MyVariable int
According to this article, there is a limit of 2100 parameters that can be passed to a stored proc, so I would assume you could do at least that many variables.
Edit:
Per article:
The maximum number of local variables in a stored procedure is limited only by available memory.
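For illustration, a single DECLARE can introduce several local variables at once, which can help keep a large procedure readable; the variable names below are just placeholders:
-- A minimal T-SQL sketch; the names are hypothetical.
DECLARE @OrderCount   int,
        @CustomerName varchar(100),
        @ReportDate   datetime;
SET @OrderCount = 0;
SET @ReportDate = GETDATE();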
We are using OpenEdge 10.2A, and generating summary reports using progress procedures. We want to decrease the production time of the reports.
Since the Accumulate and Accum functions are not really faster than defining variables to hold the summarized values, and their readability is much worse, we don't really use them.
We have tested our data using SQL commands using ODBC connection and results are much faster than using procedures.
Let me give you an example. We run the below procedure:
DEFINE VARIABLE i AS INTEGER NO-UNDO.
ETIME(TRUE).
FOR EACH orderline FIELDS(ordernum) NO-LOCK:
ASSIGN i = i + 1.
END.
MESSAGE "Count = " (i - 1) SKIP "Time = " ETIME VIEW-AS ALERT-BOX.
The result is:
Count= 330805
Time= 1891
When we run equivalent SQL query:
SELECT count(ordernum) from pub.orderline
The execution time is 141.
In short, when we compare the two results, the SQL time is more than 13 times faster than the procedure time.
This is just an example. We can do the same test with other aggregate functions and the time ratio does not change much.
And my question has two parts:
1) Is it possible to get aggregate values using procedures as fast as using SQL queries?
2) Is there any other method to get summarized values faster, other than using real-time SQL queries?
The 4gl and SQL engines use very different approaches to sending the data to the client. By default SQL is much faster. To get similar performance from the 4gl you need to adjust several parameters. I suggest:
-Mm 32600 # message size, default 1024, max 32600
-prefetchDelay # don't send the first record immediately, instead bundle it
-prefetchFactor 100 # try to fill message 100%
-prefetchNumRecs 10000 # if possible pack up to 10,000 records per message, default 16
Prior to 11.6 changing -Mm requires BOTH the client and the server to be changed. Starting with 11.6 only the server needs to be changed.
You need at least OpenEdge 10.2b06 for the -prefetch* parameters.
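For example, these settings could be collected in a client startup parameter (.pf) file alongside the connection parameters; the database name, host, and service port below are just placeholders:
# client.pf -- connection plus network tuning (example values)
-db sports -H dbserver -S 20000
-Mm 32600
-prefetchDelay
-prefetchFactor 100
-prefetchNumRecs 10000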
Although there are caveats (among other things, joins will not benefit), these parameters can potentially greatly improve the performance of "NO-LOCK queries". A simple:
FOR EACH table NO-LOCK:
/* ... */
END.
can be greatly improved by the use of the parameters above.
Use of a FIELDS list can also help a lot because it reduces the amount of data and thus the number of messages that need to be sent. So if you only need some of the fields rather than the whole record you can code something like:
FOR EACH customer FIELDS ( name balance ) NO-LOCK:
or:
FOR EACH customer EXCEPT ( photo ) NO-LOCK:
You are already using FIELDS and your sample query is a simple NO-LOCK so it should benefit substantially from the suggested parameter settings.
The issue at hand seems to be to "decrease the production time of the reports".
This raises some questions:
How slow are the reports now and how fast do you want them?
Has the running time increased compared to, for instance, last year?
Has the amount of data also increased?
Has something changed? Servers, storage, clients, etc.?
It will be impossible to answer your question without more information. Data access from ABL will most likely be fast enough if:
You have correct indexes (indices) set up in your database.
You have "good" queries.
You have enough system resources (memory, CPU, disk space, disk speed).
You have a database running with a decent setup (-spin, -B parameters etc).
The time it takes for a simple command like FOR EACH <table> NO-LOCK: or SELECT COUNT(something) FROM <somewhere> might not indicate how fast or slow your real super complicated query might run.
Some additional suggestions:
It is possible to write your example as
DEFINE VARIABLE i AS INTEGER NO-UNDO.
ETIME(TRUE).
select count(*) into i from orderline.
MESSAGE "Count = " (i - 1) SKIP "Time = " ETIME VIEW-AS ALERT-BOX.
which should yield a moderate performance increase. (This is not using an ODBC connection. You can use a subset of SQL in plain 4GL procedures. It is debatable if this can be considered good style.)
There should be a significant performance increase by accessing the database through shared memory instead of TCP/IP, if you are running the code on the server (which you do) and you are not already doing so (which you didn't specify).
Another option is a preselect query, which counts its result set when it is opened:
OPEN QUERY q PRESELECT EACH orderline NO-LOCK.
MESSAGE NUM-RESULTS("q") VIEW-AS ALERT-BOX.
I'm rewriting some SQL code that generates a "bucket" column based on the current month (that happens to be stored as VARCHAR, ugh).
I noticed that the previous author decided to store a certain number of dates for comparison as variables, and then use a case statement to calculate the bucket.
See SQL Fiddle Here that demonstrates a slice of this (it happens about five times in three different ways similar to this).
Is there any reason I cannot simply get rid of most of the redundant variables and simplify it to a few lines of code (see this Fiddle)? Is the overhead of in-line function calls great enough to justify doing this, or does the query compiler cache the results of that portion anyway?
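Since the Fiddles themselves aren't reproduced here, a rough sketch of the two styles (the table, column, and bucket labels are invented) might look like this:
-- Original style: pre-compute the boundary dates into variables, then bucket.
DECLARE @Today  date = GETDATE();
DECLARE @Plus30 date = DATEADD(DAY, 30, @Today);
DECLARE @Plus60 date = DATEADD(DAY, 60, @Today);
SELECT DueDate,
       CASE WHEN DueDate <= @Today  THEN 'Current'
            WHEN DueDate <= @Plus30 THEN '30 days'
            WHEN DueDate <= @Plus60 THEN '60 days'
            ELSE '90+ days'
       END AS Bucket
FROM Invoices;
-- Simplified style: call the date functions in-line inside the CASE.
SELECT DueDate,
       CASE WHEN DueDate <= GETDATE()                   THEN 'Current'
            WHEN DueDate <= DATEADD(DAY, 30, GETDATE()) THEN '30 days'
            WHEN DueDate <= DATEADD(DAY, 60, GETDATE()) THEN '60 days'
            ELSE '90+ days'
       END AS Bucket
FROM Invoices;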
I'm having a scaling issue with an application that uses a PostgreSQL 9 backend. I have one table whose size is about 40 million records and growing, and conditional queries against it have slowed down dramatically.
To help figure out what's going wrong, I've taken a development snapshot of the database and dumped the queries, along with their execution times, to the log.
Now for the confusing part, and the gist of the question ....
The run times for my queries in the log are vastly different (an order of magnitude or more) from what I get when I run the 'exact' same query in DbVisualizer to get the explain plan.
I say 'exact', but really the difference is that the application uses a prepared statement to which I bind values at runtime, while the queries I run in DbVisualizer have those values in place already. The values themselves are exactly as I pulled them from the log.
Could the use of prepared statements make that big of a difference?
The answer is YES. Prepared statements cut both ways.
On the one hand, the query does not have to be re-planned for every execution, saving some overhead. This can make a difference or be hardly noticeable, depending on the complexity of the query.
On the other hand, with uneven data distribution, a one-size-fits-all query plan may be a bad choice. Called with particular values another query plan could be (much) better suited.
Running the query with parameter values in place can lead to a different query plan. More planning overhead, possibly a (much) better query plan.
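One way to see the difference for yourself, sketched here with an invented table and column, is to compare the plan chosen for the prepared statement with the plan for the same query with the literal value in place:
-- Hypothetical names; run in psql against your own table.
PREPARE events_by_type (int) AS
    SELECT * FROM events WHERE type_id = $1;
-- Plan used when the value arrives as a bound parameter:
EXPLAIN ANALYZE EXECUTE events_by_type(42);
-- Plan used when the planner can see the literal value:
EXPLAIN ANALYZE SELECT * FROM events WHERE type_id = 42;
DEALLOCATE events_by_type;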
Also consider unnamed prepared statements like @peufeu provided. Those re-plan the query considering the parameters every time, and you still get safe parameter handling.
Similar considerations apply to queries inside PL/pgSQL functions, where queries can be treated as prepared statements internally - unless executed dynamically with EXECUTE. I quote the manual on Executing Dynamic Commands:
The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL may otherwise create a generic plan and cache it for re-use. In situations where the best plan depends strongly on the parameter values, it can be helpful to use EXECUTE to positively ensure that a generic plan is not selected.
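As a rough PL/pgSQL sketch (the function, table, and column names are invented), the dynamic form looks like this:
CREATE OR REPLACE FUNCTION count_events(p_type int)
RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    -- A static query here could be planned once, generically, and cached.
    -- EXECUTE re-plans on every call, using the actual parameter value.
    EXECUTE 'SELECT count(*) FROM events WHERE type_id = $1'
        INTO result
        USING p_type;
    RETURN result;
END;
$$ LANGUAGE plpgsql;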
Apart from that, general guidelines for performance optimization apply.
Erwin nails it, but let me add that the extended query protocol allows you to use more flavors of prepared statements. Besides avoiding re-parsing and re-planning, one big advantage of prepared statements is that parameter values are sent separately, which avoids escaping and parsing overhead, not to mention the SQL injections and bugs you risk if you don't use an API that handles parameters in a way that makes forgetting to escape them impossible.
http://www.postgresql.org/docs/9.1/static/protocol-flow.html
Query planning for named prepared-statement objects occurs when the Parse message is processed. If a query will be repeatedly executed with different parameters, it might be beneficial to send a single Parse message containing a parameterized query, followed by multiple Bind and Execute messages. This will avoid replanning the query on each execution.
The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.
So, if your DB interface supports it, you can use unnamed prepared statements. It's a bit of a middle ground between a query and a usual prepared statement.
If you use PHP with PDO, please note that PDO's prepared statement implementation is rather useless for Postgres: it uses named prepared statements, but re-prepares every time you call prepare(), so no plan caching takes place. You get the worst of both worlds: many round trips and a plan built without knowledge of the parameters. I've seen it be 1000x slower than pg_query() and pg_query_params() on specific queries where the Postgres optimizer really needs to know the parameters to produce the optimal plan. pg_query uses raw queries, pg_query_params uses unnamed prepared statements; which of the two is faster usually depends on the size of the parameter data.
What is the maximum number of placeholders allowed in a single statement? I.e., what is the upper limit of the attribute NUM_OF_PARAMS?
I'm experiencing an odd issue while trying to tune the maximum number of rows per multi-row insert: setting the number to 20,000 gives me an error because $sth->{NUM_OF_PARAMS} becomes negative.
Reducing the maximum to 5,000 rows per insert works fine.
Thanks.
As far as I am aware, the only limitation in DBI is that the value is placed into a Perl scalar, so the limit is whatever a scalar can hold. For DBDs, however, it is totally different; I doubt many, if any, databases support 20,000 parameters. BTW, NUM_OF_PARAMS is read-only, so I've no idea what you mean by "set the number to 20,000"; I presume you mean you create an SQL statement with 20,000 parameters, then read NUM_OF_PARAMS and it gives you a negative value. If so, I suggest you report that (with an example) on rt.cpan.org, as it does not sound right at all.
I cannot imagine that creating an SQL statement with 20,000 parameters is going to be very efficient in any database. Far better to try to reduce that to a range or something like it if you can. In ODBC, 20,000 parameters would mean 20,000 IPDs and APDs, and those are quite big structures. Since the DB2 CLI library is very like ODBC, I would imagine you are going to eat up loads of memory.
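As a rough illustration of cutting the parameter count down (the table is made up), the same data can be sent in several smaller batches, each well inside any 16-bit driver limit:
-- One statement with 20,000 rows x 2 columns = 40,000 placeholders: too many.
-- INSERT INTO readings (sensor_id, reading) VALUES (?, ?), (?, ?), ... ;
-- Smaller batches, e.g. 1,000 rows per statement = 2,000 placeholders each:
INSERT INTO readings (sensor_id, reading)
VALUES (?, ?), (?, ?), (?, ?);   -- ...repeat the row group up to the batch size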
Given that 20,000 rows cause negative values and 5,000 don't, there is probably a signed 16-bit integer somewhere in the system, which would cap the total number of placeholders at 32,767; the number of rows per insert is correspondingly lower, depending on how many placeholders each row uses.
However, the limit depends on the underlying DBMS and the API used by the DBD module for the DBMS (and possibly the DBD code itself); it is not affected by DBI.
Are you sure that's the best way to deal with your problem?
I'm trying to figure out how I can parallelize some procedural code to create records in a table.
Here's the situation (sorry I can't provide much in the way of actual code):
I have to predict when a vehicle service will be needed, based upon the previous service date, the current mileage, the planned daily mileage and the difference in mileage between each service.
All in all it's very procedural: for each vehicle I need to take into account its history, its current servicing state, the daily mileage (which can change based on ranges defined in the mileage plan), and the sequence of servicing.
Currently I'm calculating all of this in PHP, and it takes about 20 seconds for 100 vehicles. Since this may in the future be expanded to several thousand, 20 seconds is far too long.
So I decided to try and do it in a CLR stored procedure. At first I thought I'd try multithreading it; however, I quickly found out that's not easy to do in the TSQL host. I was recommended to let TSQL work out the parallelization itself. Yet I have no idea how. If it weren't for the fact that the code needs to create records, I could define it as a function and do:
SELECT dbo.PredictServices([FleetID]) FROM Vehicles
And TSQL should figure out it can parallelize that, but I know of no alternative for procedures.
Is there anything I can do to parallelize this?
The recommendation you received is correct: you simply don't have the .NET Framework facilities for parallelism available in a CLR stored procedure. Also keep in mind that the niche for CLR stored procedures is rather narrow and that they adversely impact SQL Server's performance and scalability.
If I understand the task correctly, you need to compute a function PredictServices for some records and store the results back in the database. In that case CLR stored procedures could be an option, provided PredictServices is just data access and straightforward transformation of data. The best practice is to create a WWF (Windows Workflow Foundation) service to perform the computations and call it from PHP. In a workflow service you can implement any solution, including one involving parallelism.