Related
Good afternoon, I am using POSTGRESql version 9.2 and I'm trying to use a crosstab function to transpose two columns on a table so that i can later join it to a different SELECT query.
I have installed the tablefunc extension.
However i keep getting this "Return and SQL tuple descriptions are incompatible" error which seems to be because of typecasts.
I don't need them to be a specific type.
My original SELECT query is this
SELECT inventoryid, ttype, tamount
FROM inventorytesting
Which gives me the following result:
inventoryid ttype tamount
2451530088940460 7 0.2
2451530088940460 2 0.5
2451530088940460 8 0.1
2451530088940460 1 15.7
8751530077940461 7 0.7
8751530077940461 2 0.2
8751530077940461 8 1.1
8751530077940461 1 19.2
and my goal is to get it like:
inventoryid 7 2 8 1
8751530077940461 0.7 0.2 1.1 19.2
2451530088940460 0.2 0.5 0.1 15.7
The 'ttype' field has 49 different values such as "7","2","8","1" which are fixed.
The 'tamount' field varies its values depending on the 'inventoryid' field but there will always be 49 of them, even if its value is zero. It will never be "null".
I have tried a few variations that i could find in the internet which sum up to this:
SELECT *
FROM crosstab (
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"ttype" smallint,"tamount" numeric)
The fieldtypes on the inventorytesting table are
select column_name, data_type from information_schema.columns
where table_name = 'inventorytesting'
Results:
column_name data_type
id bigint
ttype smallint
tamount numeric
tunit text
tlessthan smallint
plantid text
sessiontime bigint
deleted smallint
inventoryid text
docdata text
docname text
labid bigint
Any pointers would be great.
demo:db<>fiddle
The resulting table definition has to contain the table structure you are expecting - the pivoted one - and not the structure of the given one:
SELECT *
FROM crosstab(
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"type1" numeric,"type2" numeric,"type7" numeric,"type8" numeric)
Addionally there is no need to use the crosstab function. You can achieve a pivot by simply using the standard CASE function:
SELECT
inventoryid,
SUM(CASE WHEN ttype = 1 THEN tamount END) AS type1,
SUM(CASE WHEN ttype = 2 THEN tamount END) AS type2,
SUM(CASE WHEN ttype = 7 THEN tamount END) AS type7,
SUM(CASE WHEN ttype = 8 THEN tamount END) AS type8
FROM
inventorytesting
GROUP BY 1
If you were on 9.4 or higher you could use the Postgres specific FILTER clause:
SELECT
inventoryid,
SUM(tamount) FILTER (WHERE ttype = 1) AS type1,
SUM(tamount) FILTER (WHERE ttype = 2) AS type2,
SUM(tamount) FILTER (WHERE ttype = 7) AS type7,
SUM(tamount) FILTER (WHERE ttype = 8) AS type8
FROM
inventorytesting
GROUP BY 1
demo:db<>fiddle
With the crosstab, you define the actual result table (basically the result of the pivot). The input query defines three columns which are then processed as:
grouping column result in the actual rows
the pivot columns
value for the pivot column
In your case, the crosstab therefore needs to be defined as:
ct(
"inventoryid" text,
"tamount_1" numeric,
"tamount_2" numeric,
"tamount_3" numeric,
...
)
The column header will then correlate to a certain value of column ttype in the order as defined by the inner query's ORDER BY.
The thing with crosstab is that missing values for ttype (e.g. some value returned for 4 but not for 3), the resulting columns would be 1, 2, 4, ... with 3 being missing. Here, you'd have to make sure (if you need consistent output) that your inner query returns at least a NULL row (e.g. via LEFT JOIN).
I have a table with unstructured data I am trying to analyze to try to build a relational lookup. I do not have use of word cloud software.
I really have no idea how to solve this problem. Searching for solutions has lead me to tools that might do this for me that cost money, not coded solutions.
Basically my data looks like this:
CK1 CK2 Comment
--------------------------------------------------------------
1 A This is a comment.
2 A Another comment here.
And this is what I need to create:
CK1 CK2 Words
--------------------------------------------------------------
1 A This
1 A is
1 A a
1 A comment.
2 A Another
2 A comment
2 A here.
What you are trying to do is tokenize a string using a space as a Delimiter. In the SQL world people often refer to functions that do this as a "Splitter". The potential pitfall of using a splitter for this type of thing is how words can be separated by multiple spaces, tabs, CHAR(10)'s, CHAR(13)'s, CHAR()'s, etc. Poor grammar, such as not adding a space after a period results in this:
" End of sentence.Next sentence"
sentence.Next is returned as a word.
The way I like to tokenize human text is to:
Replace any text that isn't a character with a space
Replace duplicate spaces
Trim the string
Split the newly transformed string using a space as the delimiter.
Below is my solution followed by the DDL to create the functions used.
-- Sample Data
DECLARE #yourtable TABLE (CK1 INT, CK2 CHAR(1), Comment VARCHAR(8000));
INSERT #yourtable (CK1, CK2, Comment)
VALUES
(1,'A','This is a typical comment...Follewed by another...'),
(2,'A','This comment has double spaces and tabs and even carriage
returns!');
-- Solution
SELECT t.CK1, t.CK2, split.itemNumber, split.itemIndex, split.itemLength, split.item
FROM #yourtable AS t
CROSS APPLY samd.patReplace(t.Comment,'[^a-zA-Z ]',' ') AS c1
CROSS APPLY samd.removeDupChar8K(c1.newString,' ') AS c2
CROSS APPLY samd.delimitedSplitAB8K(LTRIM(RTRIM(c2.NewString)),' ') AS split;
Results (truncated for brevity):
CK1 CK2 itemNumber itemIndex itemLength item
----------- ---- -------------------- ----------- ----------- --------------
1 A 1 1 4 This
1 A 2 6 2 is
1 A 3 9 1 a
1 A 4 11 7 typical
1 A 5 19 7 comment
...
2 A 1 1 4 This
2 A 2 6 7 comment
2 A 3 14 3 has
2 A 4 18 6 double
...
Note that the splitter I'm using is based of Jeff Moden's Delimited Split8K with a couple tweeks.
Functions used:
CREATE FUNCTION dbo.rangeAB
(
#low bigint,
#high bigint,
#gap bigint,
#row1 bit
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT 1
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0)) T(N) -- 90 values
),
L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT r.RN, r.OP, r.N1, r.N2
FROM
(
SELECT
RN = 0,
OP = (#high-#low)/#gap,
N1 = #low,
N2 = #gap+#low
WHERE #row1 = 0
UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
SELECT TOP (ABS((COALESCE(#high,0)-COALESCE(#low,0))/COALESCE(#gap,0)+COALESCE(#row1,1)))
RN = i.rn,
OP = (#high-#low)/#gap+(2*#row1)-i.rn,
N1 = (i.rn-#row1)*#gap+#low,
N2 = (i.rn-(#row1-1))*#gap+#low
FROM iTally AS i
ORDER BY i.rn
) AS r
WHERE #high&#low&#gap&#row1 IS NOT NULL AND #high >= #low AND #gap > 0;
GO
CREATE FUNCTION samd.NGrams8k
(
#string VARCHAR(8000), -- Input string
#N INT -- requested token size
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
position = r.RN,
token = SUBSTRING(#string, CHECKSUM(r.RN), #N)
FROM dbo.rangeAB(1, LEN(#string)+1-#N,1,1) AS r
WHERE #N > 0 AND #N <= LEN(#string);
GO
CREATE FUNCTION samd.patReplace8K
(
#string VARCHAR(8000),
#pattern VARCHAR(50),
#replace VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newString =
(
SELECT CASE WHEN #string = CAST('' AS VARCHAR(8000)) THEN CAST('' AS VARCHAR(8000))
WHEN #pattern+#replace+#string IS NOT NULL THEN
CASE WHEN PATINDEX(#pattern,token COLLATE Latin1_General_BIN)=0
THEN ng.token ELSE #replace END END
FROM samd.NGrams8K(#string, 1) AS ng
ORDER BY ng.position
FOR XML PATH(''),TYPE
).value('text()[1]', 'VARCHAR(8000)');
GO
CREATE FUNCTION samd.delimitedSplitAB8K
(
#string VARCHAR(8000), -- input string
#delimiter CHAR(1) -- delimiter
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
itemNumber = ROW_NUMBER() OVER (ORDER BY d.p),
itemIndex = CHECKSUM(ISNULL(NULLIF(d.p+1, 0),1)),
itemLength = CHECKSUM(item.ln),
item = SUBSTRING(#string, d.p+1, item.ln)
FROM (VALUES (DATALENGTH(#string))) AS l(s) -- length of the string
CROSS APPLY
(
SELECT 0 UNION ALL -- for handling leading delimiters
SELECT ng.position
FROM samd.NGrams8K(#string, 1) AS ng
WHERE token = #delimiter
) AS d(p) -- delimiter.position
CROSS APPLY (VALUES( --LEAD(d.p, 1, l.s+l.d) OVER (ORDER BY d.p) - (d.p+l.d)
ISNULL(NULLIF(CHARINDEX(#delimiter,#string,d.p+1),0)-(d.p+1), l.s-d.p))) AS item(ln);
GO
CREATE FUNCTION dbo.RemoveDupChar8K(#string varchar(8000), #char char(1))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT NewString =
replace(replace(replace(replace(replace(replace(replace(
#string COLLATE LATIN1_GENERAL_BIN,
replicate(#char,33), #char), --33
replicate(#char,17), #char), --17
replicate(#char,9 ), #char), -- 9
replicate(#char,5 ), #char), -- 5
replicate(#char,3 ), #char), -- 3
replicate(#char,2 ), #char), -- 2
replicate(#char,2 ), #char); -- 2
GO
1) If we are using SQL Server 2016 and above then we should probably
use the built-in function STRING_SPLIT
-- SQL 2016and above
DECLARE #txt NVARCHAR(100) = N'This is a comment.'
select [value] from STRING_SPLIT(#txt, ' ')
2) Only if 1 does not fit, then if the number of separation (the space in our case) is less then 3 which fit your sample data, then we should probably use PARSENAME
-- BEFORE SQL 2016 if we have less than 4 parts
DECLARE #txt NVARCHAR(100) = N'This is a comment.'
DECLARE #Temp NVARCHAR(200) = REPLACE (#txt,'.','#')
SELECT t FROM (VALUES(1),(2),(3),(4))T1(n)
CROSS APPLY (SELECT REPLACE(PARSENAME(REPLACE(#Temp,' ','.'),T1.n), '#','.'))T2(t)
3) Only if the 1 and 2 does not fit, then we should use SQLCLR function
http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/
4) Only if we cannot use 1,2 and we cannot use SQLCLR (which implies a real problematic administration and has nothing with security since you can have all the SQLCLR function in a read-only database for the use of all users, as I explain in my lectures), then you can use T-SQL and create UDF.
https://sqlperformance.com/2012/07/t-sql-queries/split-strings
Function px_explode will be provided with two parameters:
separator
string
Final result will look like this:
SELECT * FROM dbo.px_explode('xxy', 'alfaxxybetaxxygama')
and will return
But...
Query won't finish execution, so I assume that I ran into an infinite loop here, now assuming this, my question might be.
How can I avoid the infinite loop I ran into and what am I missing?
Code:
CREATE FUNCTION dbo.px_explode
(#separator VARCHAR(10), #string VARCHAR(2000))
RETURNS #expl_tbl TABLE
(val VARCHAR(100))
AS
BEGIN
IF (CHARINDEX(#separator, #string) = 0) and (LTRIM(RTRIM(#string)) <> '')
INSERT INTO #expl_tbl VALUES(LTRIM(RTRIM(#string)))
ELSE
BEGIN
WHILE CHARINDEX(#separator, #string) > 0
BEGIN
IF (LTRIM(RTRIM(LEFT(#string, CHARINDEX(#separator, #string) - 1)))
<> '')
INSERT INTO #expl_tbl VALUES(LTRIM(RTRIM(LEFT(#string,
CHARINDEX(#separator, #string) - 1))))
END
IF LTRIM(RTRIM(#string)) <> ''
INSERT INTO #expl_tbl VALUES(LTRIM(RTRIM(#string)))
END
RETURN
END
Loops are bad and so are mutli-statement table valued functions (e.g. where you define the table). If performance is important then you want a tally table and and inline table valued function (iTVF).
For a high-performing way to resolve this I would first grab a copy of Ngrams8k. The solution you're looking for will look like this:
DECLARE #string varchar(8000) = 'alfaxxybetaxxygama',
#delimiter varchar(20) = 'xxy'; -- use
SELECT
itemNumber = row_number() over (ORDER BY d.p),
itemIndex = isnull(nullif(d.p+l.d, 0),1),
item = SUBSTRING
(
#string,
d.p+l.d, -- delimiter position + delimiter length
isnull(nullif(charindex(#delimiter, #string, d.p+l.d),0) - (d.p+l.d), 8000)
)
FROM (values (len(#string), len(#delimiter))) l(s,d) -- 1 is fine for l.d but keeping uniform
CROSS APPLY
(
SELECT -(l.d) union all
SELECT ng.position
FROM dbo.NGrams8K(#string, l.d) as ng
WHERE token = #delimiter
) as d(p); -- delimiter.position
Which returns
itemNumber itemIndex item
-------------------- -------------------- ---------
1 1 alfa
2 8 beta
3 15 gama
Against a table it would look like this:
DECLARE #table table (string varchar(8000));
INSERT #table VALUES ('abcxxyXYZxxy123'), ('alfaxxybetaxxygama');
DECLARE #delimiter varchar(100) = 'xxy';
SELECT *
FROM #table t
CROSS APPLY
(
SELECT
itemNumber = row_number() over (ORDER BY d.p),
itemIndex = isnull(nullif(d.p+l.d, 0),1),
item = SUBSTRING
(
t.string,
d.p+l.d, -- delimiter position + delimiter length
isnull(nullif(charindex(#delimiter, t.string, d.p+l.d),0) - (d.p+l.d), 8000)
)
FROM (values (len(t.string), len(#delimiter))) l(s,d) -- 1 is fine for l.d but keeping uniform
CROSS APPLY
(
SELECT -(l.d) union all
SELECT ng.position
FROM dbo.NGrams8K(t.string, l.d) as ng
WHERE token = #delimiter
) as d(p) -- delimiter.position
) split;
Results:
string itemNumber itemIndex item
------------------------- -------------------- -------------------- ------------------
abcxxyXYZxxy123 1 1 abc
abcxxyXYZxxy123 2 7 XYZ
abcxxyXYZxxy123 3 13 123
alfaxxybetaxxygama 1 1 alfa
alfaxxybetaxxygama 2 8 beta
alfaxxybetaxxygama 3 15 gama
My favourite is the XML splitter. This needs no function and is fully inlineable. If you can introduce a function to your database, the suggested links in Gareth's comment give you some very good ideas.
This is simple and quite straight forward:
DECLARE #YourString VARCHAR(100)='alfaxxybetaxxygama';
SELECT nd.value('text()[1]','nvarchar(max)')
FROM (SELECT CAST('<x>' + REPLACE((SELECT #YourString AS [*] FOR XML PATH('')),'xxy','</x><x>') + '</x>' AS XML)) AS A(Casted)
CROSS APPLY A.Casted.nodes('/x') AS B(nd);
This will first transform your string to an XML like this
<x>alfa</x>
<x>beta</x>
<x>gama</x>
... simply by replacing the delimiters xxy with XML tags. The rest is easy reading from XML .nodes()
This question already has answers here:
Turning a Comma Separated string into individual rows
(16 answers)
Closed 5 years ago.
Suppose I have a table like this with an undetermined number of comma-delimited values in one column:
thingID personID
1 123,234,345
2 456,567
and I want to get it into a form like this:
thingID personID
1 123
1 234
1 345
2 456
2 567
What is my best option for doing this?
Oh I should mention the data is in a SQL 2008 R2 database so I may not be able to use the very latest functionality.
Use CROSS APPLY with a string splitting function.
To find the string splitting function that works best for you, read Aaron Bertrand's Split strings the right way – or the next best way.
For this demonstration I've chosen to use the SplitStrings_XML function, simply because it's the first pure t-sql function in the article:
CREATE FUNCTION dbo.SplitStrings_XML
(
#List NVARCHAR(MAX),
#Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(#List, #Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
Now that we have a string splitting function, create and populate the sample table (Please save us this step in your future questions):
DECLARE #T AS TABLE
(
thingID int,
personID varchar(max)
)
INSERT INTO #T VALUES
(1, '123,234,345'),
(2, '456,567')
The query:
SELECT thingId, Item
FROM #T
CROSS APPLY dbo.SplitStrings_XML(personID, ',')
Results:
thingId Item
1 123
1 234
1 345
2 456
2 567
You can see a live demo on rextester.
There are several ways to do that. Here are two methods for SQL Server 2008:
XML-Method: requires the string to allow for the xml-trick (no invalid XML chars)
SELECT a.thingID, Split.a.value('.', 'VARCHAR(100)') AS Data
FROM (SELECT OtherID,
CAST('<M>' + REPLACE(personID, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM table1) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Recursive method:
;WITH tmp(thingID, DataItem, Data) AS (
SELECT thingID, LEFT(personID, CHARINDEX(',', personID + ',') - 1),
STUFF(personID, 1, CHARINDEX(',', personID + ','), '')
FROM table1
UNION ALL
SELECT thingID, LEFT(personID, CHARINDEX(',', personID + ',') - 1),
STUFF(personID, 1, CHARINDEX(',', personID + ','), '')
FROM tmp
WHERE Data > ''
)
SELECT thingID, DataItem AS personID
FROM tmp
My requirement is as follows:
Am using Postgresql and ireport 4.0.1 for generating this report.
I've four tables like g_employee,g_year,g_period,g_salary, by joining these four tables and passing parameter are fromDate and toDate these parameter values like '01/02/14' between '01/05/14'.Based this parameters the displaying months will be vary in the headings as i shown in the below example:
EmpName
01/02/14 01/03/14 01/04/14 01/05/14
abc
2000 3000 3000 2000
Can anyone help me in this getting output?
What you're describing sounds like the number of columns would grow or shrink based on the number of months between the 2 parameters, which just doesn't work.
I don't know any way to add additional columns based on an interval between 2 parameters without a procedural code generated sql statement.
What is possible is:
emp_id1 period1 salary
emp_id1 period2 salary
emp_id1 period3 salary
epd_id1 period4 salary
emp_id2 period1 salary
emp_id2 period2 salary
emp_id2 period3 salary
epd_id2 period4 salary
generated with something like:
select g_employee_id,
g_period_start,
g_salary_amt
from g_employee, g_year, g_period, g_salary
where <join everything>
and g_period_start between date_param_1 and date_param_2
group by g_employee_id, g_period_start;
Hard to get more specific with out the table structure.
As the range between date_param_1 and date_param_2 grew, the number of rows would grow for each employee with pay in that "g_period"
EDIT - Other option:
The less dynamic option which requires more parameters would be:
select g_employee_id,
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salard_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_1> ) as "DATE_PARAM_1_desc",
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salard_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_2> ) as "DATE_PARAM_2_desc",
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salard_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_3> ) as "DATE_PARAM_3_desc"
,..... -- dynamic not possible
from employee;
i create one table #g_employee and insert dummy data
create table #g_employee(empid int,yearid int,periodid int,salary int)
insert into #g_employee(empid,yearid,periodid,salary)
select 1,2014,02,2000
union
select 2,2014,02,2000
union
select 3,2014,02,2000
union
select 3,2014,03,2000
union
select 1,2014,03,3000
union
select 1,2014,04,4000
output query as per your requirement :
Solution 1 :
select empid, max(Case when periodid=2 and yearid=2014 then salary end) as '01/02/2014'
, max(Case when periodid=3 and yearid=2014 then salary end) as '01/03/2014'
, max(Case when periodid=4 and yearid=2014 then salary end) as '01/04/2014'
from #g_employee
group by empid
you can do with dynamic sql :
Solution 2 :
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX)
select #cols = STUFF((SELECT ',' + QUOTENAME(periodid)
from #g_employee
group by periodid
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set #query = 'SELECT empid,' + #cols + ' from
(
select salary, periodid,empid
from #g_employee
) x
pivot
(
max(salary)
for periodid in (' + #cols + ')
) p '
execute(#query)
hope this will help