Table Valued Function [XML Reader] Very Slow - Alternatives? - tsql

I have the following query that really kills performance and want to know what alternatives there are to an XML reader subquery. The purpose of this query is to export data with some HTML code.
An example of the table data is as follows.
p_s_id | p_c_id | notes
-----------------------
1 | 1 | this note is really long.
2 | 1 | This is fun.
3 | null | long note here
4 | 2 | this is not fun
5 | 2 | this is not fun
6 | 3 | long note here
I want to take all distinct notes that have the same p_c_id and join them together as shown below.
Any additional information can be provided so feel free to comment.
select distinct
p_c_id
,'<br/><br/>'+(select distinct '• ' +cast(note as nvarchar(max)) + ' <br/> '
from dbo.spec_notes_join m2
where m.p_c_id = m2.p_c_id
and isnull(note,'') <> ''
for xml path(''), type).value('.[1]', 'nvarchar(max)') as notes_spec
from dbo.spec_notes_join m
so the export would look as follows:
p_c_id | notes
--------------
1 | <br/><br/> • this note is really long. <br/> • This is fun. <br/>
2 | <br/><br/> • This is not fun. <br/>
3 | <br/><br/> • long note here. <br/>

I think you will get slightly better performance if you skip the DISTINCT in the outer query and do a GROUP BY p_c_id instead.
select p_c_id,
'<br/><br/>'+(select distinct '• ' +cast(note as nvarchar(max)) + ' <br/> '
from dbo.spec_notes_join m2
where m.p_c_id = m2.p_c_id and
isnull(note,'') <> ''
for xml path(''), type).value('.', 'nvarchar(max)') as notes_spec
from dbo.spec_notes_join m
group by p_c_id
You could also try concatenating with a CLR User-Defined Aggregate Function.
Other alternatives can be found here Concatenating Row Values in Transact-SQL.
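If you are on SQL Server 2017 or later (an assumption, since the question doesn't state a version), STRING_AGG avoids the FOR XML construct entirely. A minimal sketch against the same table:

```sql
-- Sketch, assumes SQL Server 2017+ (STRING_AGG) and the same
-- dbo.spec_notes_join table from the question
SELECT p_c_id,
       '<br/><br/>' + STRING_AGG('• ' + CAST(note AS nvarchar(max)) + ' <br/> ', '')
           WITHIN GROUP (ORDER BY note) AS notes_spec
FROM (SELECT DISTINCT p_c_id, note
      FROM dbo.spec_notes_join
      WHERE ISNULL(note, '') <> '') x
GROUP BY p_c_id;
```

The CAST to nvarchar(max) inside the aggregate keeps long concatenations from being truncated at 8000 characters.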

While this alternative skips the XML, I don't know if it improves performance. If you could test and post results as a comment, I'd appreciate it. (It worked on my quick mock-up; you may need to do some minor debugging on your own structures.)
Start with this function:
CREATE FUNCTION dbo.Testing
(
@p_c_id int
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @ReturnString varchar(max)
SELECT @ReturnString = isnull(@ReturnString + ' <br/> , <br/><br/>• ', '<br/><br/>• ') + note
from (select distinct note
from spec_notes_join
where p_c_id = @p_c_id
and isnull(note, '') <> '') xx
SET @ReturnString = @ReturnString + ' <br/> '
RETURN @ReturnString
END
GO
GO
and then embed it in your query:
SELECT p_c_id, dbo.Testing(p_c_id)
from (select distinct p_c_id
from dbo.spec_notes_join) xx
This may perform poorly because of the function call required for each row. A possibly quicker variant would be to write the function as a table-valued function and reference it with CROSS APPLY.
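That variant could look something like the sketch below. The name dbo.TestingTVF is my own invention, not from the original post, and this mirrors the scalar function above as a multi-statement table-valued function:

```sql
-- Sketch: the scalar function rewritten as a table-valued function;
-- the name dbo.TestingTVF is hypothetical
CREATE FUNCTION dbo.TestingTVF (@p_c_id int)
RETURNS @out TABLE (notes_spec varchar(max))
AS
BEGIN
    DECLARE @s varchar(max);
    SELECT @s = ISNULL(@s + ' <br/> , <br/><br/>• ', '<br/><br/>• ') + note
    FROM (SELECT DISTINCT note
          FROM spec_notes_join
          WHERE p_c_id = @p_c_id
            AND ISNULL(note, '') <> '') xx;
    INSERT @out VALUES (@s + ' <br/> ');
    RETURN;
END
GO

SELECT x.p_c_id, f.notes_spec
FROM (SELECT DISTINCT p_c_id FROM dbo.spec_notes_join) x
CROSS APPLY dbo.TestingTVF(x.p_c_id) f;
```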

Related

Using TSQL to perform calculations

I've got a table called NewCodes with the following records:
| NewCode | Mapping |
| -------- | -------------- |
| pp1 | [US1] + [US5] |
| qq1 | [US8] - [US9] |
| ww1 | [RE5] + [RE6] + [RE7] |
| zx1 | [KJ1] - [XC4] |
I've got another table called SourceCodes which contains a list of values assigned to all the codes in the Mapping column.
| Code | Value |
| ---- | ----- |
| US1 | 35 |
| US5 | 10 |
| US8 | 20 |
| US9 | 5 |
| RE5 | 7 |
| RE6 | 8 |
| RE7 | 6 |
I am trying to figure out a way of assigning a value to the codes in the NewCode column using the calculations defined in the Mapping column. I currently use SSMS. So, for example, pp1 would get the value 45 ([US1] + [US5] = 35 + 10).
I have no idea how to attempt this and I was wondering if anyone could help.
As long as the expressions don't get too much more complicated than the example then this can be done. Specifically:
Only addition and subtraction can be performed, or at least there is no concern for order of operations.
The expressions are all well and consistently formed.
All variables exist in SourceCodes. (This could be overcome using a LEFT JOIN and providing a default value like 0).
The SQL Server version supports string_split (2016 or later). (Though I used to split with XML back in the day, so this can be overcome.)
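For reference, that pre-2016 XML split can be sketched like this (it assumes the mapping text contains nothing that needs XML escaping, which holds for the `[A1] + [B2]` style shown here):

```sql
-- Sketch: splitting on spaces without string_split (pre-SQL Server 2016),
-- assuming the string needs no XML escaping (true for [US1] + [US5] style)
declare @s varchar(100) = '[US1] + [US5]';
select w.v.value('.', 'varchar(50)') as Symbol
from (select cast('<w>' + replace(@s, ' ', '</w><w>') + '</w>' as xml) as x) t
cross apply t.x.nodes('/w') w(v);
```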
The following query will do the following.
Split each Mapping into a table of symbols.
Determine the proper order of symbols since string_split is non-deterministic.
Normalize the symbols so the codes in brackets will match what is found in SourceCodes.
Accumulate the result for each new code.
Return the accumulated result in the last row for each new code partition.
The secret sauce in this solution is the use of recursive CTEs to act like for loops. The first instance is used when determining the order of symbols: in order to determine the start index for successive occurrences of the same symbol, the recursive part of the CTE gets the char index from the previous occurrence. The second instance works in a similar fashion to accumulate values, except it relies on the convention that an operator appears on every even row and a code on every odd one.
WITH Symbols AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY NewCode, Symbol ORDER BY Symbol) [SymbolSeqNum]
FROM NewCodes
CROSS APPLY (
SELECT value [Symbol]
FROM string_split( Mapping, ' ')
) x
)
, UnnormalizedOrderedSymbols AS (
-- since string_split is nondeterministic we need a way to restore the order.
SELECT NewCode, Symbol, SymbolSeqNum, CHARINDEX(Symbol, Mapping, 1) SymbolOrderIndex
FROM Symbols
WHERE SymbolSeqNum = 1
UNION ALL
SELECT s.NewCode, s.Symbol, s.SymbolSeqNum, CHARINDEX(s.Symbol, s.Mapping, os.SymbolOrderIndex + 1) SymbolOrderIndex
FROM UnnormalizedOrderedSymbols os
INNER JOIN Symbols s ON s.NewCode = os.NewCode AND s.Symbol = os.Symbol AND s.SymbolSeqNum = os.SymbolSeqNum + 1
)
, NormalizedOrderedSymbols AS (
SELECT NewCode
, CASE SymbolType WHEN 'Code' THEN SUBSTRING(Symbol, 2, LEN(Symbol) - 2) ELSE Symbol END [Symbol]
, SymbolType
, ROW_NUMBER() OVER (PARTITION BY NewCode ORDER BY SymbolOrderIndex) [SymbolOrderIndex]
FROM UnnormalizedOrderedSymbols
CROSS APPLY (
SELECT CASE WHEN Symbol LIKE '[[]%]' THEN 'Code' ELSE 'Operator' END [SymbolType]
) x
)
, RunningTotal AS (
SELECT NewCode, c.Value, SymbolOrderIndex
FROM NormalizedOrderedSymbols o
INNER JOIN SourceCodes c ON c.Code = o.Symbol
WHERE o.SymbolOrderIndex = 1
UNION ALL
SELECT rt.NewCode
, CASE op.Symbol
WHEN '+' THEN rt.Value + c.Value
WHEN '-' THEN rt.Value - c.Value
END
, num.SymbolOrderIndex
FROM RunningTotal rt
INNER JOIN NormalizedOrderedSymbols op ON op.NewCode = rt.NewCode AND op.SymbolOrderIndex = rt.SymbolOrderIndex + 1
INNER JOIN NormalizedOrderedSymbols num ON num.NewCode = rt.NewCode AND num.SymbolOrderIndex = rt.SymbolOrderIndex + 2
INNER JOIN SourceCodes c ON c.Code = num.Symbol
)
SELECT x.NewCode, x.Value
FROM (
SELECT rt.NewCode, rt.Value, ROW_NUMBER() OVER (PARTITION BY rt.NewCode ORDER BY SymbolOrderIndex DESC) rn
FROM RunningTotal rt
) x
WHERE x.rn = 1
ORDER BY NewCode
This is obviously not a very good use of SQL Server, and you're probably better off writing a script to perform whatever you're trying to accomplish.

REGEXP_COUNT in postgres

We are migrating from Oracle to Postgres.
Here is the SQL I used to extract data from the employee_name column for reporting,
but now I am not sure how to convert the REGEXP_COUNT part.
Oracle SQL
with A4 as
(
select 'govinda j/INDIA_MH/9975215025' as employee_name from dual
)
select employee_name ,
TRIM(SUBSTR(upper(A4.employee_name),1,INSTR(A4.employee_name,'/',1,1)-1)) AS employee_name1,
TRIM(SUBSTR(upper(A4.employee_name),INSTR(A4.employee_name,'/',1,1)+1,INSTR(A4.employee_name,'_',1,1)-INSTR(A4.employee_name,'/',1,1)-1)) AS Country,
TRIM(SUBSTR(upper(A4.employee_name),INSTR(A4.employee_name,'_',1,1)+1,INSTR(A4.employee_name,'/',1,2)-INSTR(A4.employee_name,'_',1,1)-1)) AS STATE,
CASE WHEN REGEXP_COUNT(A4.employee_name,'_')>1 THEN 'WRONG_NAME>1_'
WHEN REGEXP_COUNT(A4.employee_name,'/')>2 THEN 'WRONG_NAME>2/'
WHEN TRIM(SUBSTR(upper(A4.employee_name),INSTR(A4.employee_name,'/',1,1)+1,INSTR(A4.employee_name,'_',1,1)-INSTR(A4.employee_name,'/',1,1)-1))NOT IN
('INDIA','NEPAL') THEN 'WRONG_COUNTRY'
ELSE 'CORRECT' END AS VALIDATION
from A4
With some help I was able to convert it to the Postgres version below.
with A4 as
(
select 'govinda j/INDIA_MH/9975215025'::text as employee_name
)
select employee_name,
split_part(employee_name, '/', 1) as employee_name1,
split_part(split_part(employee_name, '/', 2), '_', 1) as country,
split_part(split_part(employee_name, '/', 2), '_', 2) as state
from A4
But I am not able to convert the validation part. Any help is highly appreciated as we are very new to Postgres.
You can create a custom function:
create or replace function number_of_chars(text, text)
returns integer language sql immutable as $$
select length($1) - length(replace($1, $2, ''))
$$;
Use:
with example(str) as (
values
('a_b_c'),
('a___b'),
('abc')
)
select str, number_of_chars(str, '_') as count
from example
str | count
-------+-------
a_b_c | 2
a___b | 3
abc | 0
(3 rows)
Note that the above function just counts occurrences of a character in a string and does not use regular expressions, which in general are more expensive.
A Postgres equivalent of regexp_count() may look like this:
create or replace function regexp_count(text, text)
returns integer language sql as $$
select count(m)::int
from regexp_matches($1, $2, 'g') m
$$;
with example(str) as (
values
('a_b_c'),
('a___b'),
('abc')
)
select str, regexp_count(str, '_') as single, regexp_count(str, '__') as double
from example
str | single | double
-------+--------+--------
a_b_c | 2 | 0
a___b | 3 | 1
abc | 0 | 0
(3 rows)
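Putting the pieces together, the validation CASE from the Oracle query could be translated along these lines. This sketch uses the length/replace counting trick inline, so it needs no custom function and works on any Postgres version:

```sql
-- Sketch of the full validation in Postgres, counting occurrences
-- with length/replace instead of REGEXP_COUNT
with A4 as (
    select 'govinda j/INDIA_MH/9975215025'::text as employee_name
)
select employee_name,
       split_part(employee_name, '/', 1) as employee_name1,
       split_part(split_part(employee_name, '/', 2), '_', 1) as country,
       split_part(split_part(employee_name, '/', 2), '_', 2) as state,
       case
         when length(employee_name) - length(replace(employee_name, '_', '')) > 1 then 'WRONG_NAME>1_'
         when length(employee_name) - length(replace(employee_name, '/', '')) > 2 then 'WRONG_NAME>2/'
         when upper(split_part(split_part(employee_name, '/', 2), '_', 1))
              not in ('INDIA', 'NEPAL') then 'WRONG_COUNTRY'
         else 'CORRECT'
       end as validation
from A4;
```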
For anyone who (like me) is visiting this question in the present day, regexp_count is apparently going to be included in Postgres 15 as per: https://pgpedia.info/r/regexp_count.html
It has the following syntax:
regexp_count ( string text, pattern text [, start integer [, flags text ] ] ) → integer
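Assuming Postgres 15 or later, usage then matches the Oracle original directly:

```sql
-- Assumes Postgres 15+ (built-in regexp_count)
select regexp_count('govinda j/INDIA_MH/9975215025', '_') as underscores,  -- 1
       regexp_count('govinda j/INDIA_MH/9975215025', '/') as slashes;      -- 2
```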

Extract words before and after a specific word

I need to extract the words before and after a word like '%don%' in an ntext column.
table A, column name: Text
Example:
TEXT
where it was done it will retrieve the...
at the end of the trip clare done everything to improve
it is the only one done in these times
I would like the following results:
was done it
clare done everything
one done in
I am using T-SQL; the LEFT and RIGHT functions did not work with the ntext data type of the column containing the text.
As others have said, you can use a string splitting function to split out each word and then return those you require. Using the previously linked DelimitedSplit8K:
CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
go
declare @t table (t ntext);
insert into @t values('where it was done it will retrieve the...'),('at the end of the trip clare done everything to improve'),('we don''t take donut donations here'),('ending in don');
with t as (select cast(t as nvarchar(max)) as t from @t)
,d as (select t.t
,case when patindex('%don%',s.Item) > 0 then 1 else 0 end as d
,s.ItemNumber as i
,lag(s.Item,1,'') over (partition by t.t order by s.ItemNumber) + ' '
+ s.Item + ' '
+ lead(s.Item,1,'') over (partition by t.t order by s.ItemNumber) as r
from t
cross apply dbo.DelimitedSplit8K(t.t, ' ') as s
)
select t
,r
from d
where d = 1
order by t
,i;
Output:
+---------------------------------------------------------+-----------------------+
| t | r |
+---------------------------------------------------------+-----------------------+
| at the end of the trip clare done everything to improve | clare done everything |
| ending in don | in don |
| we don't take donut donations here | we don't take |
| we don't take donut donations here | take donut donations |
| we don't take donut donations here | donut donations here |
| where it was done it will retrieve the... | was done it |
+---------------------------------------------------------+-----------------------+
And a working example:
http://rextester.com/RND43071
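On SQL Server 2022 and later (an assumption, since earlier versions lack the ordinal argument), the same lag/lead idea works with the built-in STRING_SPLIT instead of DelimitedSplit8K:

```sql
-- Sketch, assumes SQL Server 2022+ (STRING_SPLIT with enable_ordinal = 1)
declare @t table (txt nvarchar(max));
insert into @t values (N'where it was done it will retrieve the...');

select w.txt,
       lag(w.word, 1, '')  over (partition by w.txt order by w.ordinal) + ' '
     + w.word + ' '
     + lead(w.word, 1, '') over (partition by w.txt order by w.ordinal) as r
from (
    select t.txt, s.value as word, s.ordinal
    from @t t
    cross apply string_split(t.txt, ' ', 1) s
) w
where patindex('%don%', w.word) > 0;
```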

TSQL: FOR XML PATH('') Failing To Group

I'm trying to group column values by a specific column using FOR XML PATH('') in TSQL. This is the result in both cases (note that the code without XML, i.e. SELECT * FROM @xml, returns the same thing as the code with XML):
Class | Animals
=================================
Asteroidea | Starfish
Mammalia | Dog
Mammalia | Cat
Mammalia | Coyote
Reptilia | Crocodile
Reptilia | Lizard
According to this article and this article (note that the second article leaves out the GROUP BY, which I'm unsure how the author managed to pull this off without it - I've tried and it only generates all the values), the syntax should be as shown below this:
DECLARE @xml TABLE(
Animal VARCHAR(50),
Class VARCHAR(50)
)
INSERT INTO @xml
VALUES ('Dog','Mammalia')
, ('Cat','Mammalia')
, ('Coyote','Mammalia')
, ('Starfish','Asteroidea')
, ('Crocodile','Reptilia')
, ('Lizard','Reptilia')
SELECT x1.Class
, STUFF((SELECT ',' + x2.Animal AS [text()]
FROM @xml x2
WHERE x1.Animal = x2.Animal
ORDER BY x2.Animal
FOR XML PATH('')),1,1,'' ) AS "Animals"
FROM @xml x1
GROUP BY Class
After a few hours, between these examples and the above code, I fail to see where I'm wrong on the syntax, but I'm receiving the error "Column '@xml.Animal' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." Note that if I leave off the GROUP BY clause, it still doesn't produce the values in the appropriate manner. Another set of eyes would be useful.
I think you have your WHERE clause using the wrong column, you want to use Class not Animal:
SELECT x1.Class
, STUFF((SELECT ',' + x2.Animal AS [text()]
FROM @xml x2
WHERE x1.Class = x2.Class
ORDER BY x2.Animal
FOR XML PATH('')),1,1,'' ) AS "Animals"
FROM @xml x1
GROUP BY Class
See SQL Fiddle with Demo. The result is:
| CLASS | ANIMALS |
---------------------------------
| Asteroidea | Starfish |
| Mammalia | Cat,Coyote,Dog |
| Reptilia | Crocodile,Lizard |

Converting Access Pivot Table to SQL Server

I'm having trouble converting an MS Access pivot table over to SQL Server. Was hoping someone might help.
TRANSFORM First(contacts.value) AS FirstOfvalue
SELECT contacts.contactid
FROM contacts RIGHT JOIN contactrecord ON contacts.[detailid] = contactrecord.[detailid]
GROUP BY contacts.contactid
PIVOT contactrecord.wellknownname
;
Edit: Responding to some of the comments
Contacts table has three fields
contactid | detailid | value
1 | 1 | Scott
contactrecord has something like
detailid | wellknownname
1 | FirstName
2 | Address1
3 | foobar
contactrecord is dynamic in that the user can at any time create a field to be added to contacts.
The Access query pulls out
contactid | FirstName | Address1 | foobar
1 | Scott | null | null
which is the pivot on the wellknownname. The key here is that the number of columns is dynamic since the user can, at any time, create another field for the contact. Being new to pivot tables altogether, I'm wondering how I can recreate this Access query in SQL Server.
As for TRANSFORM... that's a built-in Access function. More information is found about it here. First() will just take the first result on that matching row.
I hope this helps and appreciate all the help.
A quick search for dynamic pivot tables comes up with this article.
After renaming things in his last query on the page I came up with this:
DECLARE @PivotColumnHeaders VARCHAR(max);
SELECT @PivotColumnHeaders = COALESCE(@PivotColumnHeaders + ',['+ CAST(wellknownname as varchar) + ']','['+ CAST(wellknownname as varchar) + ']')
FROM contactrecord;
DECLARE @PivotTableSQL NVARCHAR(max);
SET @PivotTableSQL = N'
SELECT *
FROM (
SELECT
c.contactid,
cr.wellknownname,
c.value
FROM contacts c
RIGHT JOIN contactrecord cr
on c.detailid = cr.detailid
) as pivotData
pivot(
min(value)
for wellknownname in (' + @PivotColumnHeaders +')
) as pivotTable
'
;
execute(@PivotTableSQL);
which, despite its ugliness, does the job.
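One caveat worth noting: the column-list query above will emit a duplicate column for repeated wellknownname values, and names containing brackets can break the dynamic SQL. A sketch of hardening just that first step with DISTINCT and QUOTENAME:

```sql
-- Sketch: build the column list with DISTINCT and QUOTENAME so repeated
-- or bracket-containing wellknownname values cannot break the dynamic SQL
DECLARE @PivotColumnHeaders NVARCHAR(MAX);
SELECT @PivotColumnHeaders =
    STUFF((SELECT DISTINCT ',' + QUOTENAME(wellknownname)
           FROM contactrecord
           FOR XML PATH('')), 1, 1, '');
```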