SQL Server 2008. I am running a stored procedure from SQL Server Management Studio, and when I try to get the Actual Execution Plan it blows up tempdb.
I narrowed the problem down to a call to a scalar function that is applied to a 700K-row table.
I deduced that SQL Server is trying to create 700K execution plans for that function and writes all of that data to tempdb, which has 3 GB of free space.
I don't really need to see the plan for that function.
Can I explicitly exclude a statement from execution plan generation?
You can't exclude it from an execution plan, other than by removing the call from the query.
It does, however, sound like a prime candidate for switching from a scalar UDF to an inline table-valued UDF. Scalar UDFs can be a big cause of poor performance because they are run once per row in a query.
Have a read through this article, which contains an example that demonstrates the difference.
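To illustrate the shape of that change (a sketch only - dbo.GetDiscount, dbo.Offers and the column names are invented here, not taken from your code):

    -- Scalar UDF: invoked once per row wherever the query references it.
    CREATE FUNCTION dbo.GetDiscount (@CustomerId int)
    RETURNS decimal(5,2)
    AS
    BEGIN
        RETURN (SELECT MAX(Discount) FROM dbo.Offers WHERE CustomerId = @CustomerId);
    END;
    GO

    -- Inline table-valued equivalent: the optimizer expands it into the
    -- calling query like a view, so it is planned and costed once.
    CREATE FUNCTION dbo.GetDiscountInline (@CustomerId int)
    RETURNS TABLE
    AS
    RETURN (SELECT MAX(Discount) AS Discount
            FROM dbo.Offers
            WHERE CustomerId = @CustomerId);
    GO

The call site then changes from SELECT dbo.GetDiscount(o.CustomerId) to CROSS APPLY dbo.GetDiscountInline(o.CustomerId) d.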
Related
I have a procedure in Oracle PL/SQL that fetches transactional data based on certain conditions and then performs some logical calculations. I used a cursor to store the SQL, then used FETCH (cursor) BULK COLLECT INTO (table type variable) LIMIT 10000, iterated over this table variable to perform the calculation, and ultimately stored the value in a DB table. Once 10000 rows have been processed, the query is executed to fetch the next set of records.
This helped me limit the number of times the SQL was executed via the cursor and limit the number of records loaded into memory.
I am trying to migrate this code to PL/pgSQL. How can I achieve this functionality in PL/pgSQL?
You cannot achieve this functionality in PostgreSQL.
I wrote an extension, https://github.com/okbob/dbms_sql . It can be used to reduce the work needed when migrating from Oracle to Postgres.
But you don't need this feature in Postgres. Although PL/pgSQL is similar to PL/SQL, the architecture is very different, and bulk collect operations are not necessary: a plain FOR loop over a query already fetches rows from an implicit cursor in batches, so memory use stays bounded.
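A minimal PL/pgSQL sketch of the same job (src_txn, calc_result and the calculation are invented names for illustration):

    CREATE OR REPLACE FUNCTION process_transactions() RETURNS void AS $$
    DECLARE
        r record;
    BEGIN
        -- The FOR loop streams rows from an implicit cursor in batches;
        -- the full result set is never materialized in memory at once.
        FOR r IN SELECT id, amount FROM src_txn WHERE status = 'OPEN' LOOP
            INSERT INTO calc_result (txn_id, calc_value)
            VALUES (r.id, r.amount * 1.1);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

In many cases the loop can be replaced entirely by a single INSERT ... SELECT statement, which is faster still.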
Looking here, it is clear that Oracle supports execution of DDL commands in parallel, with the supported scenarios clearly listed. I was wondering whether Postgres offers such functionality as well. I can find a lot of material on "parallel queries" for PostgreSQL, but not so much where DDL is involved.
For example, can I execute multiple CREATE TABLE ... AS SELECT statements in parallel? And if not, how can I achieve such functionality? What happens if I have a temporary table (CREATE TEMP TABLE)? Do I need to configure something for locks?
From here:
Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:
The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated.
(emphasis mine).
Which seems to suggest that Postgres will not "parallelize" any query that modifies the database structure, under any circumstances.
Running multiple queries simultaneously in Postgres requires one connection per running query.
Those aren't generic DDL statements, though - the scenarios Oracle lists are index operations and partition operations, and those can be parallelized.
If you check the Notes section of the CREATE INDEX documentation, you'll see that parallel index building is supported:
PostgreSQL can build indexes while leveraging multiple CPUs in order to process the table rows faster. This feature is known as parallel index build. For index methods that support building indexes in parallel (currently, only B-tree), maintenance_work_mem specifies the maximum amount of memory that can be used by each index build operation as a whole, regardless of how many worker processes were started. Generally, a cost model automatically determines how many worker processes should be requested, if any.
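For example, the resources available to a parallel index build can be influenced per session (the values below are illustrative, not recommendations):

    SET maintenance_work_mem = '1GB';
    SET max_parallel_maintenance_workers = 4;  -- PostgreSQL 11+
    CREATE INDEX idx_big_amount ON big_table (amount);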
Update
I suspect the real question is about CREATE TABLE ... AS, though.
This is essentially a CREATE TABLE followed by an INSERT ... SELECT. The CREATE TABLE part can't be parallelized and doesn't need to be - it's essentially a metadata operation. The SELECT, on the other hand, can be parallelized easily. INSERT is a bit harder, but that's a matter of implementation.
As a_horse_with_no_name explains in a comment to this question, parallelization for CREATE TABLE ... AS was added in PostgreSQL 11:
Improvements to parallelism, including:
CREATE INDEX can now use parallel processing while building a B-tree index
Parallelization is now possible in CREATE TABLE ... AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
Parallelized hash joins and parallelized sequential scans now perform better
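On PostgreSQL 11 or later you can verify this directly, since EXPLAIN accepts CREATE TABLE ... AS (the table names here are illustrative):

    SET max_parallel_workers_per_gather = 4;
    EXPLAIN (COSTS OFF)
    CREATE TABLE big_copy AS
    SELECT * FROM big_table WHERE amount > 100;
    -- A Gather node with "Workers Planned: N" in the plan output shows
    -- that the underlying SELECT will run in parallel.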
If I change a user-defined function (UDF) that is used inside another UDF, will PostgreSQL rebuild the execution plan for both of them or only for the changed one?
If a function is unchanged, the execution plans for all its queries should remain cached.
So only the execution plans for the function you changed will be recalculated.
You can use the SQL statement DISCARD PLANS to discard all cached plans.
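For example, right after redefining a function interactively:

    -- Drops every cached plan in the current session; each statement is
    -- re-planned the next time it runs.
    DISCARD PLANS;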
I am just starting to use tSQLt inside Redgate's SQL Test. I have to deal with quite large tables (a large number of columns) in a legacy database. What is the best practice for inserting fake data into such tables? The 'script as' INSERT statements are quite large, so they would make the 'arrange' part of my unit tests literally unreadable. Can I factor out such code? Also, is there a way to not only script the INSERT statement but also fill in some values automagically? Thanks.
I would agree with your comment that you don't need to fill out all of the columns in your INSERT statement.
tSQLt.FakeTable removes all NOT NULL constraints from the columns, as well as computed columns and identity columns (although these last two can be reinstated using particular parameters to FakeTable).
Therefore, you only need to populate the columns that are relevant to your code under test, which is usually only a small subset of the table's columns.
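In practice that looks something like this (the table and column names are made up for illustration):

    EXEC tSQLt.FakeTable @TableName = N'dbo.Customer';

    -- Only the columns the code under test actually reads need values;
    -- every other column in the faked table is simply left NULL.
    INSERT INTO dbo.Customer (CustomerId, CountryCode)
    VALUES (1, 'GB'),
           (2, 'US');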
I wrote about this in a bit more detail in this article, which also contains a few other 'gotchas' you may want to know about.
Additionally, if you have a number of tests which all need the same table faked and the same data inserted, I'd suggest using a SetUp routine - a stored procedure in the test class (schema) named SetUp, which tSQLt calls before each test in that schema. These don't show up in Redgate's SQL Test window yet (I've suggested that as an improvement), but they still work. This can make the setup harder to see, but it modularises the code and so reduces identical, repeated code.
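A minimal sketch of that pattern (the test class and table names are invented):

    -- tSQLt runs [CustomerTests].[SetUp] automatically before each test
    -- in the CustomerTests schema (test class).
    CREATE PROCEDURE CustomerTests.SetUp
    AS
    BEGIN
        EXEC tSQLt.FakeTable @TableName = N'dbo.Customer';
        INSERT INTO dbo.Customer (CustomerId, CountryCode)
        VALUES (1, 'GB');
    END;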
I need to make sure my ADO.NET commands don't return more than 1000-5000 rows. Is there an ADO.NET way to do this, or is it just T-SQL?
I'm calling a stored procedure whose source code I don't control, hence I was hoping there was an ADO.NET way to do it.
Before LINQ, this was typically done with a TOP N clause in your inline query or stored procedure. With LINQ there are some handy methods called Take and Skip which provide a construct for fetching and/or skipping N rows. Under the hood, LINQ figures out how to construct the inline query that yields exactly the number of rows you want off the top.
[Edit]
Since you're calling a stored procedure, I'd advise just using a TOP N clause on the SELECT. This is the path of least resistance and, IMHO, the simplest to maintain going forward since you already have the stored procedure.
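If you can't touch the procedure itself, one T-SQL workaround (a sketch - dbo.GetCustomers and its column list are invented, and you need to know the procedure's result shape up front) is to capture its output and apply TOP afterwards:

    -- Capture the procedure's result set, then cap what the caller sees.
    CREATE TABLE #Results (Id int, Name nvarchar(100));

    INSERT INTO #Results (Id, Name)
    EXEC dbo.GetCustomers;

    SELECT TOP (1000) Id, Name FROM #Results;

    DROP TABLE #Results;

Note this still pulls the full result set from the procedure; it only limits what is returned to the client.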