DB2/400 Alternative to Opening a Cursor

I'm running DB2 for i, V7R2 TR3.
I was told opening a cursor provides a lot of overhead and should be avoided whenever possible. From what I've read, using EXECUTE INTO var1 USING var2 is an alternative, but I can't get it working. I am getting a SQL0104 error.
This is my Stored Procedure:
BEGIN
DECLARE STMT1 VARCHAR ( 500 ) ;
SET STMT1 = 'SELECT SUBSTR (''' || TRIM(ITEM) || ''' , ( LENGTH ( TRIM ( PREFIX ) ) + 1 ) , ( 20 - LENGTH ( TRIM ( PREFIX ) ) ) ) ' ||
'FROM MYLIB.MYTABLE ' ||
'WHERE PREFIX = SUBSTR(''' || TRIM(ITEM) || ''', 0,LENGTH ( TRIM ( PREFIX ) ) + 1 ) ' ||
'AND SEQ1 = ' || TYPE || ' ' ||
'ORDER BY LENGTH ( TRIM ( PREFIX) ) DESC , TRIM (PREFIX) DESC ' ||
'FETCH FIRST 1 ROWS ONLY ';
PREPARE S1 FROM STMT1;
EXECUTE S1 INTO BASEITEM;
--OPEN C1 ;
-- FETCH C1 INTO BASEITEM ;
--CLOSE C1 ;
IF(TRIM(BASEITEM) = '') THEN
SET BASEITEM = ITEM;
END IF;
END ;
It has three variables defined as such:
IN ITEM CHAR(20) CCSID 37 DEFAULT '' ,
IN TYPE INT DEFAULT 1 ,
INOUT BASEITEM CHAR(20) DEFAULT ''
When I use EXECUTE ... INTO, this will not compile; if I use EXECUTE ... USING, it compiles, but BASEITEM ends up blank and the IF statement resolves as true.
I've tried following the EXECUTE documentation, and apparently INTO only works for CALL or VALUES INTO statements. So I then tried to follow the VALUES INTO documentation, but I can't figure out how to use a query with it.
Note*: I'd like to eventually change the concatenated variables to use parameter markers, but I was going to worry about that next.
I figured I should post the table being used:
MYLIB.MYTABLE
Column |Prefix|SEQ1|SEQ2|
Row 1 |aaa | 1| 3|
Row 2 |aab | 1| 3|
Row 3 |aabd | 2| 4|
I'm essentially passing in a string and then removing the longest prefix from the string. Then I return the new string. SEQ1 is just the type of prefix (either 1 or 2), and SEQ2 is the length of the prefix.

You're getting an error because EXECUTE stmt INTO doesn't work the way you think it does.
The real problem is that you're trying to use dynamic SQL when you don't need to be. Use static instead.
BEGIN
SELECT SUBSTR (TRIM(ITEM), ( LENGTH ( TRIM ( PREFIX ) ) + 1 ) , ( 20 - LENGTH ( TRIM ( PREFIX ) ) ) )
INTO BASEITEM
FROM MYLIB.MYTABLE
WHERE PREFIX = SUBSTR(TRIM(ITEM), 0,LENGTH ( TRIM ( PREFIX ) ) + 1 )
AND SEQ1 = TYPE
ORDER BY LENGTH ( TRIM ( PREFIX) ) DESC , TRIM (PREFIX) DESC
FETCH FIRST 1 ROWS ONLY;
IF(TRIM(BASEITEM) = '') THEN
SET BASEITEM = ITEM;
END IF;
END ;

EXECUTE can only be used for DML statements, not for queries, so there is really no alternative to declaring a dynamic cursor. Having said that, there is no "overhead in opening a cursor", no more than in executing a query in any other manner, because for any query a cursor will still be created and opened implicitly, if not explicitly.
Cursors are less efficient than set-based operations, e.g. when, instead of doing something like INSERT INTO table1 SELECT something FROM table2 WHERE..., you open a cursor over table2 and insert rows into table1 one by one in a loop. The inefficiency in such a case arises from performing single-row inserts (or updates) rather than multi-row operations, not from the cursor itself, and that is completely different from stating that "opening a cursor provides a lot of overhead".
Since in your case you only fetch one record anyway, this loop vs. set argument does not apply.
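If you do later need dynamic SQL, for example once the concatenated values become parameter markers as mentioned in the question, the usual pattern on DB2 for i is to prepare the statement and open a cursor over it with USING. Below is a rough, untested sketch of that pattern only; SOME_EXPRESSION is a placeholder standing in for the question's SUBSTR expression.
BEGIN
DECLARE STMT1 VARCHAR ( 500 ) ;
DECLARE C1 CURSOR FOR S1 ;
-- SOME_EXPRESSION is a placeholder for the question's SUBSTR expression
SET STMT1 = 'SELECT SOME_EXPRESSION FROM MYLIB.MYTABLE WHERE SEQ1 = ? ORDER BY LENGTH ( TRIM ( PREFIX ) ) DESC FETCH FIRST 1 ROWS ONLY' ;
PREPARE S1 FROM STMT1 ;
OPEN C1 USING TYPE ;
FETCH C1 INTO BASEITEM ;
CLOSE C1 ;
END ;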

Related

How to use queried table name in subquery

I'm trying to query field names as well as their maximum length in their corresponding table with a single query - is it at all possible? I've read about correlated subqueries, but I couldn't get the desired result.
Here is the query I have so far:
select T1.RDB$FIELD_NAME, T2.RDB$FIELD_NAME, T2.RDB$RELATION_NAME as tabName, T1.RDB$CHARACTER_SET_ID, T1.RDB$FIELD_LENGTH,
(select max(char_length(T2.RDB$FIELD_NAME))
FROM tabName as MaxLength)
from RDB$FIELDS T1, RDB$RELATION_FIELDS T2
The above doesn't work because, of course, here the subquery tries to find "tabName" table. My guess is that I should use some kind of joins, but my SQL skills are very limited in this matter.
The origin of the request is that I want to apply this script in order to transform all my non-utf8 fields to UTF8, but I run into "string truncation" issues, as I have a few VARCHAR(8192) fields that lead to string truncation errors with the script. Usually, none of the fields would actually use these 8192 chars, but I'd rather make sure before truncating.
What you're trying to do cannot be done this way. It looks like you want to obtain the actual maximum length of fields in tables, but you cannot dynamically reference table and column names like this; being able to do that would be a SQL injection heaven. In addition, your use of a SQL-89 cross join instead of an inner join (preferably in SQL-92 style) causes other problems, as you will combine fields incorrectly (as a Cartesian product).
Instead you need to write PSQL to dynamically build and execute the statement to obtain the lengths (using EXECUTE BLOCK (or a stored procedure) and EXECUTE STATEMENT).
For example, something like this:
execute block
returns (
table_name varchar(63) character set unicode_fss,
column_name varchar(63) character set unicode_fss,
type varchar(10),
length smallint,
charset_name varchar(63) character set unicode_fss,
collation_name varchar(63) character set unicode_fss,
max_length smallint)
as
begin
for select
trim(rrf.RDB$RELATION_NAME) as table_name,
trim(rrf.RDB$FIELD_NAME) as column_name,
case rf.RDB$FIELD_TYPE when 14 then 'CHAR' when 37 then 'VARCHAR' end as type,
coalesce(rf.RDB$CHARACTER_LENGTH, rf.RDB$FIELD_LENGTH / rcs.RDB$BYTES_PER_CHARACTER) as length,
trim(rcs.RDB$CHARACTER_SET_NAME) as charset_name,
trim(rc.RDB$COLLATION_NAME) as collation_name
from RDB$RELATIONS rr
inner join RDB$RELATION_FIELDS rrf
on rrf.RDB$RELATION_NAME = rr.RDB$RELATION_NAME
inner join RDB$FIELDS rf
on rf.RDB$FIELD_NAME = rrf.RDB$FIELD_SOURCE
inner join RDB$CHARACTER_SETS rcs
on rcs.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
left join RDB$COLLATIONS rc
on rc.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
and rc.RDB$COLLATION_ID = rf.RDB$COLLATION_ID
and rc.RDB$COLLATION_NAME <> rcs.RDB$DEFAULT_COLLATE_NAME
where coalesce(rr.RDB$RELATION_TYPE, 0) = 0 and coalesce(rr.RDB$SYSTEM_FLAG, 0) = 0
and rf.RDB$FIELD_TYPE in (14 /* char */, 37 /* varchar */)
into table_name, column_name, type, length, charset_name, collation_name
do
begin
execute statement 'select max(character_length("' || replace(column_name, '"', '""') || '")) from "' || replace(table_name, '"', '""') || '"'
into max_length;
suspend;
end
end
As an aside, the maximum length of a VARCHAR of character set UTF8 is 8191, not 8192.

Select query to remove non-numeric characters

I've got dirty data in a column with variable alpha length. I just want to strip out anything that is not 0-9.
I do not want to run a function or proc. I have a similar script that just grabs the numeric value after text; it looks like this:
Update TableName
set ColumntoUpdate=cast(replace(Columnofdirtydata,'Alpha #','') as int)
where Columnofdirtydata like 'Alpha #%'
And ColumntoUpdate is Null
I thought it would work pretty well until I found that some of the data fields I thought would just be in the format Alpha # 12345789 are not.
Examples of data that needs to be stripped
AB ABCDE # 123
ABCDE# 123
AB: ABC# 123
I just want the 123. It is true that all data fields do have the # prior to the number.
I tried substring and PatIndex, but I'm not quite getting the syntax correct or something. Anyone have any advice on the best way to address this?
See this blog post on extracting numbers from strings in SQL Server. Below is a sample using a string in your example:
DECLARE @textval NVARCHAR(30)
SET @textval = 'AB ABCDE # 123'
SELECT LEFT(SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000) + 'X') -1)
Here is an elegant solution if your server supports the TRANSLATE function (on SQL Server it is available in SQL Server 2017+ and also Azure SQL).
First, it replaces any non-numeric characters with a # character.
Then, it removes all # characters.
You may need to add, to the second parameter of the TRANSLATE call, any additional characters that you know may be present.
select REPLACE(TRANSLATE([Col], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', '')
You can use stuff and patindex.
stuff(Col, 1, patindex('%[0-9]%', Col)-1, '')
SQL Fiddle
This works well for me:
CREATE FUNCTION [dbo].[StripNonNumerics]
(
@Temp varchar(255)
)
RETURNS varchar(255)
AS
Begin
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^0-9]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
Then call the function like so to see the original something next to the sanitized something:
SELECT Something, dbo.StripNonNumerics(Something) FROM TableA
In case there can be characters between the digits (e.g. thousands separators), you may try the following:
declare @table table (DirtyCol varchar(100))
insert into @table values
('AB ABCDE # 123')
,('ABCDE# 123')
,('AB: ABC# 123')
,('AB#')
,('AB # 1 000 000')
,('AB # 1`234`567')
,('AB # (9)(876)(543)')
;with tally as (select top (100) N=row_number() over (order by @@spid) from sys.all_columns),
data as (
select DirtyCol, Col
from @table
cross apply (
select (select C + ''
from (select N, substring(DirtyCol, N, 1) C from tally where N<=datalength(DirtyCol)) [1]
where C between '0' and '9'
order by N
for xml path(''))
) p (Col)
where p.Col is not NULL
)
select DirtyCol, cast(Col as int) IntCol
from data
Output is:
DirtyCol IntCol
--------------------- -------
AB ABCDE # 123 123
ABCDE# 123 123
AB: ABC# 123 123
AB # 1 000 000 1000000
AB # 1`234`567 1234567
AB # (9)(876)(543) 9876543
For update, add ColToUpdate to select list of the data cte:
;with num as (...),
data as (
select ColToUpdate, /*DirtyCol, */Col
from ...
)
update data
set ColToUpdate = cast(Col as int)
CREATE FUNCTION FN_RemoveNonNumeric (@Input NVARCHAR(512))
RETURNS NVARCHAR(512)
AS
BEGIN
DECLARE @Trimmed NVARCHAR(512)
SELECT @Trimmed = @Input
WHILE PATINDEX('%[^0-9]%', @Trimmed) > 0
SELECT @Trimmed = REPLACE(@Trimmed, SUBSTRING(@Trimmed, PATINDEX('%[^0-9]%', @Trimmed), 1), '')
RETURN @Trimmed
END
GO
SELECT dbo.FN_RemoveNonNumeric('ABCDE# 123')
Pretty late to the party, but I found the following, which I thought worked brilliantly, in case anyone is still looking.
SELECT
(SELECT CAST(CAST((
SELECT SUBSTRING(FieldToStrip, Number, 1)
FROM master..spt_values
WHERE Type='p' AND Number <= LEN(FieldToStrip) AND
SUBSTRING(FieldToStrip, Number, 1) LIKE '[0-9]' FOR XML Path(''))
AS xml) AS varchar(MAX)))
FROM
SourceTable
Here's a version which pulls all digits from a string; i.e. given I'm 35 years old; I was born in 1982. The average family has 2.4 children. this would return 35198224. In other words, it's good where you've got numeric data which may have been formatted as a code (e.g. #123,456,789 / 123-00005), but isn't appropriate if you're looking to pull out specific numbers (i.e. as opposed to digits / just the numeric characters) from the text. Also it only handles digits, so it won't return negative signs (-) or periods (.).
declare @table table (id bigint not null identity (1,1), data nvarchar(max))
insert @table (data)
values ('hello 123 its 45613 then') --outputs: 12345613
,('1 some other string 98 example 4') --outputs: 1984
,('AB ABCDE # 123') --outputs: 123
,('ABCDE# 123') --outputs: 123
,('AB: ABC# 123') --outputs: 123
; with NonNumerics as (
select id
, data original
--the below line replaces all digits with blanks
, replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(data,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','') nonNumeric
from @table
)
--each iteration of the below CTE removes another non-numeric character from the original string, putting the result into the numerics column
, Numerics as (
select id
, replace(original, substring(nonNumeric,1,1), '') numerics
, replace(nonNumeric, substring(nonNumeric,1,1), '') charsToreplace
, len(replace(nonNumeric, substring(nonNumeric,1,1), '')) charsRemaining
from NonNumerics
union all
select id
, replace(numerics, substring(charsToreplace,1,1), '') numerics
, replace(charsToreplace, substring(charsToreplace,1,1), '') charsToreplace
, len(replace(charsToreplace, substring(charsToreplace,1,1), '')) charsRemaining
from Numerics
where charsRemaining > 0
)
--we select only those strings with `charsRemaining=0`; i.e. the rows for which all non-numeric characters have been removed; there should be 1 row returned for every 1 row in the original data set.
select * from Numerics where charsRemaining = 0
This code works by removing all the digits (i.e. the characters we want) from the given strings by replacing them with blanks. Then it goes through the original string (which includes the digits), removing all of the characters that were left (i.e. the non-numeric characters), thus leaving only the digits.
The reason we do this in two steps, rather than just removing all non-numeric characters in the first place, is that there are only 10 digits, whilst there are a huge number of possible characters; replacing that small list is relatively fast, and it gives us the list of those non-numeric characters which actually exist in the string, so we can then replace that small set.
The method makes use of recursive SQL, using common table expressions (CTEs).
To add on to Ken's answer, this handles commas and spaces and parentheses
--Handles parentheses, commas, spaces, hyphens..
declare @table table (c varchar(256))
insert into @table
values
('This is a test 111-222-3344'),
('Some Sample Text (111)-222-3344'),
('Hello there 111222 3344 / How are you?'),
('Hello there 111 222 3344 ? How are you?'),
('Hello there 111 222 3344. How are you?')
select
replace(LEFT(SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000) + 'X') -1),'.','')
from @table
Create function fn_GetNumbersOnly(@pn varchar(100))
Returns varchar(max)
AS
BEGIN
Declare @r varchar(max) ='', @len int ,@c char(1), @x int = 0
Select @len = len(@pn)
while @x <= @len
begin
Select @c = SUBSTRING(@pn,@x,1)
if ISNUMERIC(@c) = 1 and @c <> '-'
Select @r = @r + @c
Select @x = @x +1
end
return @r
End
In your case it seems like the number will always come after the # symbol, so using CHARINDEX() with LTRIM() and RTRIM() would probably perform the best (a quick sketch of that approach follows the example below). But here is an interesting method of getting rid of ANY non-digit. It utilizes a tally table and a table of digits to limit which characters are accepted, then the XML technique to concatenate back to a single string without the non-numeric characters. The neat thing about this technique is that it could be expanded to include ANY allowed characters and strip out anything that is not allowed.
DECLARE @ExampleData AS TABLE (Col VARCHAR(100))
INSERT INTO @ExampleData (Col) VALUES ('AB ABCDE # 123'),('ABCDE# 123'),('AB: ABC# 123')
DECLARE @Digits AS TABLE (D CHAR(1))
INSERT INTO @Digits (D) VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')
;WITH cteTally AS (
SELECT
I = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
@Digits d10
CROSS APPLY @Digits d100
--add more cross applies to cover longer fields this handles 100
)
SELECT *
FROM
@ExampleData e
OUTER APPLY (
SELECT CleansedPhone = CAST((
SELECT TOP 100
SUBSTRING(e.Col,t.I,1)
FROM
cteTally t
INNER JOIN @Digits d
ON SUBSTRING(e.Col,t.I,1) = d.D
WHERE
I <= LEN(e.Col)
ORDER BY
t.I
FOR XML PATH('')) AS VARCHAR(100))) o
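And as mentioned at the top of this answer, since every value has a # before the number, a plain CHARINDEX() sketch (reusing the @ExampleData rows above, and assuming there is exactly one # per value with only the wanted digits after it) could be as simple as:
SELECT
e.Col,
CAST(LTRIM(RTRIM(SUBSTRING(e.Col, CHARINDEX('#', e.Col) + 1, LEN(e.Col)))) AS INT) AS NumberOnly
FROM @ExampleData e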
Declare @MainTable table(id int identity(1,1),TextField varchar(100))
INSERT INTO @MainTable (TextField)
VALUES
('6B32E')
declare @i int=1
Declare @originalWord varchar(100)=''
WHile @i<=(Select count(*) from @MainTable)
BEGIN
Select @originalWord=TextField from @MainTable where id=@i
Declare @r varchar(max) ='', @len int ,@c char(1), @x int = 0
Select @len = len(@originalWord)
declare @pn varchar(100)=@originalWord
while @x <= @len
begin
Select @c = SUBSTRING(@pn,@x,1)
if(@c!='')
BEGIN
if ISNUMERIC(@c) = 0 and @c <> '-'
BEGIN
Select @r = cast(@r as varchar) + cast(replace((SELECT ASCII(@c)-64),'-','') as varchar)
end
ELSE
BEGIN
Select @r = @r + @c
END
END
Select @x = @x +1
END
Select @r
Set @i=@i+1
END
I have created a function for this
Create FUNCTION RemoveCharacters (@text varchar(30))
RETURNS VARCHAR(30)
AS
BEGIN
declare @index as int
declare @newtexval as varchar(30)
set @index = (select PATINDEX('%[A-Z.-/?]%', @text))
if (@index =0)
begin
return @text
end
else
begin
set @newtexval = (select STUFF ( @text , @index , 1 , '' ))
return dbo.RemoveCharacters(@newtexval)
end
return 0
END
GO
Here is the answer:
DECLARE @t TABLE (tVal VARCHAR(100))
INSERT INTO @t VALUES('123')
INSERT INTO @t VALUES('123S')
INSERT INTO @t VALUES('A123,123')
INSERT INTO @t VALUES('a123..A123')
;WITH cte (original, tVal, n)
AS
(
SELECT t.tVal AS original,
LOWER(t.tVal) AS tVal,
65 AS n
FROM @t AS t
UNION ALL
SELECT tVal AS original,
CAST(REPLACE(LOWER(tVal), LOWER(CHAR(n)), '') AS VARCHAR(100)),
n + 1
FROM cte
WHERE n <= 90
)
SELECT t1.tVal AS OldVal,
t.tval AS NewVal
FROM (
SELECT original,
tVal,
ROW_NUMBER() OVER(PARTITION BY tVal + original ORDER BY original) AS Sl
FROM cte
WHERE PATINDEX('%[a-z]%', tVal) = 0
) t
INNER JOIN @t t1
ON t.original = t1.tVal
WHERE t.sl = 1
You can create a SQL CLR scalar function in order to be able to use regular expression replace patterns.
Here you can find an example of how to create such a function.
Having such a function will solve the issue with just the following lines:
SELECT [dbo].[fn_Utils_RegexReplace] ('AB ABCDE # 123', '[^0-9]', '');
SELECT [dbo].[fn_Utils_RegexReplace] ('ABCDE# 123', '[^0-9]', '');
SELECT [dbo].[fn_Utils_RegexReplace] ('AB: ABC# 123', '[^0-9]', '');
More important, you will be able to solve more complex issues as the regular expressions will bring a whole new world of options directly in your T-SQL statements.
Use this:
REPLACE(TRANSLATE(SomeString, REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', ''), REPLICATE('#', LEN(REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', '') + 'x') - 1)), '#', '')
Demo:
DROP TABLE IF EXISTS #MyTempTable;
CREATE TABLE #MyTempTable (SomeString VARCHAR(255));
INSERT INTO #MyTempTable
VALUES ('ssss123ssg99d362sdg')
, ('hey 62q&*^(n43')
, (NULL)
, ('')
, ('hi')
, ('123');
SELECT SomeString
, REPLACE(TRANSLATE(SomeString, REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', ''), REPLICATE('#', LEN(REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', '') + 'x') - 1)), '#', '')
FROM #MyTempTable;
DROP TABLE IF EXISTS #MyTempTable;
Results:
SomeString          | (No column name)
--------------------|-----------------
ssss123ssg99d362sdg | 12399362
hey 62q&*^(n43      | 6243
NULL                | NULL
hi                  |
123                 | 123
While the OP wanted to "strip out anything that is not 0-9", the post is also tagged with "substring" and "patindex", and the OP mentioned the concern of "not quite getting the syntax correct or something". To address that, and given the requirement note that "all data fields do have the # prior to the number", here is an answer that works through the substring/patindex challenges:
/* A sample select */
;WITH SampleValues AS
( SELECT 'AB ABCDE # 123' [Columnofdirtydata]
UNION ALL SELECT 'AB2: ABC# 123')
SELECT
s.Columnofdirtydata,
f1.pos1,
'['+ f2.substr +']' [InspectOutput]
FROM
SampleValues s
CROSS APPLY (SELECT PATINDEX('%# %',s.Columnofdirtydata) [pos1]) f1
CROSS APPLY (SELECT SUBSTRING(s.Columnofdirtydata, f1.pos1 + LEN('#-'),LEN(s.Columnofdirtydata)) [substr]) f2
/* Using update scenario from OP */
UPDATE t1
SET t1.Columntoupdate = CAST(f2.substr AS INT)
FROM
TableName t1
CROSS APPLY (SELECT PATINDEX('%# %',t1.Columnofdirtydata) [pos1]) f1
CROSS APPLY (SELECT SUBSTRING(t1.Columnofdirtydata, f1.pos1 + LEN('#-'),LEN(t1.Columnofdirtydata)) [substr]) f2
Note that my syntax advice for patindex/substring, is to:
consider using APPLY as a way to temporarily alias results from one function for use as parameters in the next. It's not uncommon to (in ETL, for example) need to parse out parameter/position-based substrings in an updatable column of a staging table. If you need to "debug" and potentially fix some parsing logic, this style will help.
consider using LEN('PatternSample') in your substring logic, to account for reusing this pattern or adjusting it when your source data changes (instead of "+ 1").
SUBSTRING() requires a length parameter, but it can be greater than the length of the string. Therefore, if you are getting "the rest of the string" after the pattern, you can just use "The source length"
DECLARE @STR VARCHAR(400)
DECLARE @specialchars VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),!^?:]%'
SET @STR = '1, 45 4,3 68.00-'
WHILE PATINDEX( @specialchars, @STR ) > 0
---Remove special characters using Replace function
SET @STR = Replace(Replace(REPLACE( @STR, SUBSTRING( @STR, PATINDEX( @specialchars, @STR ), 1 ),''),'-',''), ' ','')
SELECT @STR
SELECT REGEXP_REPLACE( col, '[^[:digit:]]', '' ) AS new_col FROM my_table

Postgresql - how to group rows and convert some rows to columns [duplicate]

I have an interesting conundrum which I believe can be solved in purely SQL. I have tables similar to the following:
responses:
user_id | question_id | body
----------------------------
1 | 1 | Yes
2 | 1 | Yes
1 | 2 | Yes
2 | 2 | No
1 | 3 | No
2 | 3 | No
questions:
id | body
-------------------------
1 | Do you like apples?
2 | Do you like oranges?
3 | Do you like carrots?
and I would like to get the following output
user_id | Do you like apples? | Do you like oranges? | Do you like carrots?
---------------------------------------------------------------------------
1 | Yes | Yes | No
2 | Yes | No | No
I don't know how many questions there will be, and they will be dynamic, so I can't just code for every question. I am using PostgreSQL and I believe this is called transposition, but I can't seem to find anything that says the standard way of doing this in SQL. I remember doing this in my database class back in college, but it was in MySQL and I honestly don't remember how we did it.
I'm assuming it will be a combination of joins and a GROUP BY statement, but I can't even figure out how to start.
Anybody know how to do this? Thanks very much!
Edit 1: I found some information about using a crosstab which seems to be what I want, but I'm having trouble making sense of it. Links to better articles would be greatly appreciated!
Use:
SELECT r.user_id,
MAX(CASE WHEN r.question_id = 1 THEN r.body ELSE NULL END) AS "Do you like apples?",
MAX(CASE WHEN r.question_id = 2 THEN r.body ELSE NULL END) AS "Do you like oranges?",
MAX(CASE WHEN r.question_id = 3 THEN r.body ELSE NULL END) AS "Do you like carrots?"
FROM RESPONSES r
JOIN QUESTIONS q ON q.id = r.question_id
GROUP BY r.user_id
This is a standard pivot query, because you are "pivoting" the data from rows to columnar data.
I implemented a truly dynamic function to handle this problem without having to hard code any specific class of answers or use external modules/extensions. It also gives full control over column ordering and supports multiple key and class/attribute columns.
You can find it here: https://github.com/jumpstarter-io/colpivot
Example that solves this particular problem:
begin;
create temporary table responses (
user_id integer,
question_id integer,
body text
) on commit drop;
create temporary table questions (
id integer,
body text
) on commit drop;
insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');
select colpivot('_output', $$
select r.user_id, q.body q, r.body a from responses r
join questions q on q.id = r.question_id
$$, array['user_id'], array['q'], '#.a', null);
select * from _output;
rollback;
This outputs:
user_id | 'Do you like apples?' | 'Do you like carrots?' | 'Do you like oranges?'
---------+-----------------------+------------------------+------------------------
1 | Yes | No | Yes
2 | Yes | No | No
You can solve this example with the crosstab function in this way
drop table if exists responses;
create table responses (
user_id integer,
question_id integer,
body text
);
drop table if exists questions;
create table questions (
id integer,
body text
);
insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');
select * from crosstab('select responses.user_id, questions.body, responses.body from responses, questions where questions.id = responses.question_id order by user_id') as ct(userid integer, "Do you like apples?" text, "Do you like oranges?" text, "Do you like carrots?" text);
First, you must install the tablefunc extension. Since version 9.1 you can do it using CREATE EXTENSION:
CREATE EXTENSION tablefunc;
I wrote a function to generate the dynamic query.
It generates the sql for the crosstab and creates a view (drops it first if it exists).
You can then select from the view to get your results.
Here is the function:
CREATE OR REPLACE FUNCTION public.c_crosstab (
eavsql_inarg varchar,
resview varchar,
rowid varchar,
colid varchar,
val varchar,
agr varchar
)
RETURNS void AS
$body$
DECLARE
casesql varchar;
dynsql varchar;
r record;
BEGIN
dynsql='';
for r in
select * from pg_views where lower(viewname) = lower(resview)
loop
execute 'DROP VIEW ' || resview;
end loop;
casesql='SELECT DISTINCT ' || colid || ' AS v from (' || eavsql_inarg || ') eav ORDER BY ' || colid;
FOR r IN EXECUTE casesql Loop
dynsql = dynsql || ', ' || agr || '(CASE WHEN ' || colid || '=''' || r.v || ''' THEN ' || val || ' ELSE NULL END) AS ' || agr || '_' || r.v;
END LOOP;
dynsql = 'CREATE VIEW ' || resview || ' AS SELECT ' || rowid || dynsql || ' from (' || eavsql_inarg || ') eav GROUP BY ' || rowid;
RAISE NOTICE 'dynsql %1', dynsql;
EXECUTE dynsql;
END
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;
And here is how I use it:
SELECT c_crosstab('query_txt', 'view_name', 'entity_column_name', 'attribute_column_name', 'value_column_name', 'first');
Example:
First you run:
SELECT c_crosstab('Select * from table', 'ct_view', 'usr_id', 'question_id', 'response_value', 'first');
Then:
Select * from ct_view;
There is an example of this in contrib/tablefunc/.

A CLR function equivalent to string.Format in C#

I was looking for a CLR function that will do the same thing as String.Format in C#. As an example, I would like to do the following through a CLR function:
String.Format("My name is {0}, and I live in {1}",myName, cityName)
I would like to be able to pass a variable number of parameters after the first parameter, as many as there are placeholders specified in the first parameter to this CLR function. Does anyone have any idea what the C# code for such a CLR function would be?
I'm looking for the same thing. I came across these two sites: http://www.fotia.co.uk/fotia/Blog/2009/05/indispensable-sql-clr-functions.html & http://stringformat-in-sql.blogspot.com/.
The first uses a CLR function, the other uses a stored procedure.
In the first post, the writer says that UDF parameters aren't optional, so you can't have a variable number of params (at least the way he is doing it). He just creates a different function with the number of args he will need.
[SqlFunction(DataAccess = DataAccessKind.None)]
public static SqlString FormatString1(SqlString format, object arg0, object arg1)
{
return format.IsNull ? SqlString.Null :
string.Format(format.Value, SqlTypeToNetType(arg0, arg1));
}
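For reference, once that C# is compiled into an assembly, the T-SQL side of the binding would look roughly like this; the assembly name, class name, and file path below are placeholders rather than anything taken from the linked posts:
-- Placeholder assembly, class, and path names; adjust to your build output.
CREATE ASSEMBLY SqlFormatUtils FROM 'C:\clr\SqlFormatUtils.dll' WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.FormatString1 (@format NVARCHAR(4000), @arg0 SQL_VARIANT, @arg1 SQL_VARIANT)
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME SqlFormatUtils.UserDefinedFunctions.FormatString1;
GO
SELECT dbo.FormatString1(N'My name is {0}, and I live in {1}', N'myName', N'cityName');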
EDIT : This is how I solved this problem for me.
Using the stored procedure from the second article, I created this function to format the text for me using a pivot table. Our database is set up with a table dbo.[Rules] that has a column [Description] containing a string that needs to be formatted, such as: "NET INCOME MARGINAL (${0} TO BELOW ${1})". We then have a 1-many table dbo.[RuleParameters] with a row for each value that needs to be substituted. The column [SortOrder] specifies which order the arguments go in. This function will work for AT MOST 4 arguments, which is the most that we have.
CREATE FUNCTION [dbo].[GetRuleDescriptionWithParameters]
(
@Code VARCHAR(3)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Result NVARCHAR(400)
SELECT
@Result =
dbo.FormatString([Description],
ISNULL([1], '') + ',' +
ISNULL([2], '') + ',' +
ISNULL([3], '') + ',' +
ISNULL([4], '')
)
FROM
(
SELECT
r.[Description]
,rp.Value
,rp.SortOrder
FROM
[Rules] r
LEFT OUTER JOIN [RuleParameters] rp
ON r.Id = rp.RuleId
WHERE r.Code = @Code
) AS SourceTable
PIVOT
(
MAX(Value)
FOR SortOrder IN ([1], [2], [3], [4])
) AS PivotTable;
RETURN @Result
END
EDIT #2: Adding more examples
Format string function:
CREATE FUNCTION [dbo].[FormatString]
(
@Format NVARCHAR(4000) ,
@Parameters NVARCHAR(4000)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Message NVARCHAR(400),
@Delimiter CHAR(1)
DECLARE @ParamTable TABLE ( ID INT IDENTITY(0,1), Paramter VARCHAR(1000) )
SELECT @Message = @Format, @Delimiter = ','
;WITH CTE (StartPos, EndPos) AS
(
SELECT 1, CHARINDEX(@Delimiter, @Parameters)
UNION ALL
SELECT EndPos + (LEN(@Delimiter)), CHARINDEX(@Delimiter,@Parameters, EndPos + (LEN(@Delimiter)))
FROM CTE
WHERE EndPos > 0
)
INSERT INTO @ParamTable ( Paramter )
SELECT
[ID] = SUBSTRING ( @Parameters, StartPos, CASE WHEN EndPos > 0 THEN EndPos - StartPos ELSE 4000 END )
FROM CTE
UPDATE @ParamTable SET @Message = REPLACE ( @Message, '{'+CONVERT(VARCHAR,ID) + '}', Paramter )
RETURN @Message
END
GO
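As a quick sanity check against the original question's example, the helper can also be called on its own (assuming the FormatString function above; note that the arguments travel as a single comma-delimited string, so the values themselves cannot contain commas):
SELECT dbo.FormatString(N'My name is {0}, and I live in {1}', N'myName,cityName')
-- returns: My name is myName, and I live in cityName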
Here is how I call the function:
SELECT r.[Id]
,[Code]
,[Description]
,dbo.GetRuleDescriptionWithParameters([Code])
,rp.[Value]
,rp.SortOrder
FROM [Rules] r
INNER JOIN [RuleParameters] rp
ON r.Id = rp.RuleId
Now I can just call the function like this to get everything formatted nicely for me! Here is sample usage with the results:
Id Code Description (No column name) Value SortOrder
1 1 NOT THE MINIMUM AGE NOT THE MINIMUM AGE 18 1
3 8 NET INCOME (BELOW ${0}) NET INCOME (BELOW $400) 400 1
4 9 NET (${0} TO BELOW ${1}) NET ($400 TO BELOW $600) 400 1
4 9 NET (${0} TO BELOW ${1}) NET ($400 TO BELOW $600) 600 2
Normally when calling this, I wouldn’t join on the [RuleParameters] table, but I did in this case so that you can see the data before and after in one data set.

T-SQL: Pivot but for semicolon-separated values instead of columns

I've got semicolon-separated values in a column Values in my table:
Values
1;2;3;4;5
I would like to transform it in a procedure to have these values as rows:
Values
1
2
3
4
5
How could I do it in T-SQL?
Solution 1 (using XML):
declare @str varchar(20)
declare @xml as xml
set @str= '1;2;3;4;5'
SET @xml = cast(('<x>'+replace(@str,';' ,'</x><x>')+'</x>') as xml)
SELECT col.value('.', 'varchar(10)') as value FROM @xml.nodes('x') as tbl(col)
Solution 2 (using a recursive CTE):
declare @str as varchar(100)
declare @delimiter as char(1)
set @delimiter = ';'
set @str = '1;2;3;4;5' -- original data
set @str = @delimiter + @str + @delimiter
;with num_cte as
(
select 1 as rn
union all
select rn +1 as rn
from num_cte
where rn <= len(@str)
)
, get_delimiter_pos_cte as
(
select
ROW_NUMBER() OVER (ORDER BY rn) as rowid,
rn as delimiterpos
from num_cte
cross apply( select substring(@str,rn,1) AS chars) splittedchars
where chars = #delimiter
)
select substring(@str,a.delimiterpos+1 ,c2.delimiterpos - a.delimiterpos - 1) as Countries
from get_delimiter_pos_cte a
inner join get_delimiter_pos_cte c2 on c2.rowid = a.rowid+1
option(maxrecursion 0)
The thing that struck me as possibly leaving room for an additional answer, or additional improvement, was that most of the answers/links given show how to split values like this for a single scalar value, as opposed to how to apply that kind of splitting logic to a column of values in a table.
I include both a numbers table solution and an XML solution. The XML solution was inspired by priyanka.sarkar's earlier post. I think that a numbers table solution, using an actual numbers table instead of the CTE in the solution below, is probably the fastest, but the XML approach deserves to be developed upon because it's really nice looking.
So, here goes my attempt.
CREATE PROCEDURE PARSE_DELIMITED_VALUES
AS
WITH FIRST_NUMBERS (N) AS (
SELECT 1 UNION ALL SELECT 1
), SECOND_NUMBERS (N) AS (
SELECT E1.N
FROM FIRST_NUMBERS E1
CROSS JOIN FIRST_NUMBERS E2
), THIRD_NUMBERS (N) AS (
SELECT E1.N
FROM SECOND_NUMBERS E1
CROSS JOIN SECOND_NUMBERS E2
), FOURTH_NUMBERS (N) AS (
SELECT E1.N
FROM THIRD_NUMBERS E1
CROSS JOIN THIRD_NUMBERS E2
), FIFTH_NUMBERS (N) AS (
SELECT E1.N
FROM FOURTH_NUMBERS E1
CROSS JOIN FOURTH_NUMBERS E2
), NUMBERS (N) AS (
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM FIFTH_NUMBERS /*adjust the TOP value as needed to come up with a max number equal to the max character length allowed in the Values column*/
/*or better yet, if you can, just remove this first...numbers... header stuff so long as you create a temp or permanent table that contains the same numbers to work with*/
)
SELECT SUBSTRING(
MYTABLE.[Values],
CASE
WHEN NUMBERS.N = 1 THEN 1
ELSE NUMBERS.N + 1
END,
CASE CHARINDEX(';', MYTABLE.[Values], NUMBERS.N + 1)
WHEN 0 THEN LEN('^' + MYTABLE.[Values] + '^') - 2 + 1
ELSE CHARINDEX(';', MYTABLE.[Values], NUMBERS.N + 1)
END
- CASE
WHEN NUMBERS.N = 1 THEN 1
ELSE NUMBERS.N + 1
END
) AS PARSED_VALUE
FROM MYTABLE
INNER JOIN NUMBERS
ON NUMBERS.N <= LEN('^' + MYTABLE.[Values] + '^') - 2
AND (
NUMBERS.N = 1
OR SUBSTRING(MYTABLE.[Values], NUMBERS.N, 1) = ';'
)
GO
-- if your values column can contain NULL values I would change the join at the end as follows:
--from INNER JOIN NUMBERS
--to LEFT OUTER JOIN NUMBERS
The above would probably be most performant if the WITH ... NUMBERS CTEs were replaced by a temporary or permanent table containing the same numeric values; a sketch of such a table follows below.
On the other hand, the CTE does the job and keeps everything in one place.
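For example, a one-off permanent numbers table could be built like this (a sketch; the table name and row count are only illustrative), after which the INNER JOIN NUMBERS in the procedure would simply target dbo.Numbers instead of the CTE:
-- Illustrative helper table; 8000 matches the maximum length assumed above.
CREATE TABLE dbo.Numbers (N INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (N)
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
Here is the XML-based version of the same procedure: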
CREATE PROCEDURE PARSE_DELIMITED_VALUES
AS
SELECT E.x.value('.', 'VARCHAR(MAX)') AS PARSED_VALUE
FROM (
SELECT CAST('<x>' + REPLACE([Values], ';', '</x><x>') + '</x>' AS XML) my_x
FROM MYTABLE
) TT
CROSS APPLY my_x.nodes('/x') AS E(x)
GO
-- if your values column can contain NULL values I would change the join at the end as follows:
from `CROSS APPLY`
to `OUTER APPLY`
It's not the most elegant approach, but this might be worth a try. It creates a Sql Command as a string, and at the end executes it.
DECLARE @Values VARCHAR(8000)
-- Flatten all values lists into one string
SET @Values = REPLACE(REPLACE((SELECT [Value] FROM [dbo.MyTable] FOR XML PATH('')), '<Value>', ''), '</Value>', ';')
SET @Values = SUBSTRING(@Values, 0, LEN(@Values))
DECLARE @SeparatorIndex INT
SET @SeparatorIndex = (SELECT TOP 1 PATINDEX('%[;]%', @Values))
DECLARE @InsertClause VARCHAR(50)
SET @InsertClause = 'INSERT INTO [dbo.MyTable] VALUES ('
DECLARE @SQL VARCHAR(500)
SET @SQL = @InsertClause + SUBSTRING(@Values, 0, @SeparatorIndex) + '); '
SET @Values = RIGHT(@Values, LEN(@Values) - (@SeparatorIndex - 1))
SET @SQL = REPLACE(@SQL + (SELECT (REPLACE(@Values, ';', '); ' + @InsertClause))) + ')', '; )', '')
EXEC (@SQL)
The command ends up (in Sql Server 2005) as:
INSERT INTO [dbo.MyTable] VALUES (1); INSERT INTO [dbo.MyTable] VALUES (2); INSERT INTO [dbo.MyTable] VALUES (3); INSERT INTO [dbo.MyTable] VALUES (4); INSERT INTO [dbo.MyTable] VALUES (5) ...'
Do you actually mean, "rows," as in, "tuples," (so you can insert the data into another table, one element per row) or do you mean you want the data displayed vertically?
I'd think a string Replace (look up T-SQL's String Functions) would do the trick, no? Depending on the output target, you'd replace ; with CRLF or . You could even use Replace to create dynamic SQL Insert statements that could be executed by the SP to do row inserts (if that was your intent).
For presentation purposes, this is bad practice.
If it is purely for presentation and you are permitted, I'd output everything as XML then XSLT it any way you want. Honestly, I don't remember the last time I operated directly on a recordset. I always output to XML.
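If the goal really is just vertical display, a plain REPLACE is enough (a sketch; the table and column names are illustrative, and whether the line break is visible depends on the client, e.g. results-to-text rather than the grid):
SELECT REPLACE([Values], ';', CHAR(13) + CHAR(10)) AS ValuesVertical
FROM dbo.MyTable;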