TSQL - Parsing substring out of larger string - tsql

I have a bunch of rows with values that look like below. It's json extract that I unfortunately have to parse out and load. Anyway, my json parsing tool for some reason doesn't want to parse this full column out so i need to do it in TSQL. I only need the unique_id field:
[{"unique_id":"12345","system_type":"Test System."}]
I tried the below SQL but it's only returning the first 5 characters of the whole column. I know what the issue is which is I need to know how to tell the substring to continue until the 4th set of quotes which comes after the value. I'm not sure how to code the substring like that.
select substring([jsonfield],CHARINDEX('[{"unique_id":"',[jsonfield]),
CHARINDEX('"',[jsonfield]) - CHARINDEX('[{"unique_id":"',[jsonfield]) +
LEN('"')) from etl.my_test_table
Can anyone help me with this?
Thank you, I appreciate it!

Since you tagged 2016, why not use OPENJSON()
Here's an example:
DECLARE #TestData TABLE
(
[SampleData] NVARCHAR(MAX)
);
INSERT INTO #TestData (
[SampleData]
)
VALUES ( N'[{"unique_id":"12345","system_type":"Test System."}]' )
,( N'[{"unique_id":"1234567","system_type":"Test System."},{"unique_id":"1234567_2","system_type":"Test System."}]' )
SELECT b.[unique_id]
FROM #TestData [a]
CROSS APPLY
OPENJSON([a].[SampleData], '$')
WITH (
[unique_id] NVARCHAR(100) '$.unique_id'
) AS [b];
Giving you:
unique_id
---------------
12345
1234567
1234567_2
You can get all the fields as well, just add them to the WITH clause:
SELECT [b].[unique_id]
, [b].[system_type]
FROM #TestData [a]
CROSS APPLY
OPENJSON([a].[SampleData], '$')
WITH (
[unique_id] NVARCHAR(100) '$.unique_id'
, [system_type] NVARCHAR(100) '$.system_type'
) AS [b];

Take it step by step
First get everything to the left of system_type
SELECT LEFT(jsonfield, CHARINDEX('","system_type":"',jsonfield) as s
FROM -- etc
Then take everything to the right of "unique_id":"
SELECT RIGHT(S, LEN(S) - (CHARINDEX('"unique_id":"',S) + 12)) as Result
FROM (
SELECT LEFT(jsonfield, CHARINDEX('","system_type":"',jsonfield) as s
FROM -- etc
) X
Note, I did not test this so it could be off by one or have a syntax error, but you get the idea.

If your larger string ist just a simple JSON as posted, the solution is very easy:
SELECT
JSON_VALUE(N'[{"unique_id":"12345","system_type":"Test System."}]','$[0].unique_id');
JSON_VALUE() needs SQL-Server 2016 and will extract one single value from a specified path.

Related

How do i extract texts from string and save it as two columns and add character at the end for third column

I m using TSQL, I want to extract text from the string and save it as two column and third one as
The following code is not complete but just getting rid of PRD1T_ the Finapp and not sure how to cater for rest of the text
Select substring(Table_name,
charindex('_',Table_name)+1,
Len(Table_name) - charindex('.',Table_name)) as Landing_Schema_Name
FROM [e].[Load_History_test]
When asking questions like this it is best to provide some sample data and expected results, as Ronen suggested. For SQL questions a really good way of doing this is with a temp table and sample data, like this:
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
So that provides something people can run and starts towards the criteria for a minimal, reproducible example, which is the secret to getting a good answer on StackOverflow. I have used SELECT with UNION ALL as Azure Synapse does not currently support the VALUES clause for multiple records.
For expected results, it's often good to display them in a table, something like this:
col1
col2
col3
PRD1T
FINAPP
HOLCONTRACT
PRD1T
FINAPP
TOCCASE
This way it's clear to people what you expect. It is not clear why you have two TOCCASE examples in your screenprint.
For your problem, there is more than one approach. You are along the right lines with CHARINDEX, SUBSTRING and LEFT but things can start to look complicated. Therefore I tend to wrap up some complexity in a Common Table Expression (CTE), see below for an example. There is also a kind of 'trick' approach with a built-in SQL function called PARSENAME. This is designed to extract from four-part object names common in SQL Server eg <server-name>.<database-name>.<schema-name>.<object-name>. As long as your object names will never have more than four parts, this approach will work for you. See the main help for PARSENAME here. See below for a complete demo that runs end to end with a temp table to demonstrate the different principles:
IF OBJECT_ID('#load_history_test') IS NOT NULL
DROP TABLE #load_history_test;
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
;WITH cte AS (
SELECT
table_name AS original_table_name,
CHARINDEX( '_', table_name ) underscorePos,
CHARINDEX( '.', table_name ) stopPos,
REPLACE( table_name, '_', '.' ) AS clean_table_name
FROM #load_history_test
)
SELECT
*,
PARSENAME( clean_table_name, 3 ) a,
PARSENAME( clean_table_name, 2 ) b,
PARSENAME( clean_table_name, 1 ) c,
LEFT( original_table_name, underscorePos - 1 ) getItemBeforeUnderscore,
SUBSTRING( original_table_name, underscorePos + 1, ( ( stopPos - 1 ) - underscorePos ) ) AS getItemAfterUnderscore,
SUBSTRING( original_table_name, stopPos + 1, 99 ) getItemAfterStop
FROM cte;

Trying to manipulate string such as if '26169;#c785643', then the result should be like 'c785643'

I am trying to manipulate string data in a column such as if the given string is '20591;#e123456;#17507;#c567890;#15518;#e135791' or '26169;#c785643', then the
result should be like 'e123456;c567890;e135791' or 'c785643'. The number of digits in between can be of any length.
Some of the things I have tried so far are:
select replace('20591;#e123456;#17507;#c567890;#15518;#e135791','#','');
This leaves me with '20591;e123456;17507;c567890;15518;e135791', which still includes the digits without 'e' or 'c' prefixed to them. i want to get rid of 20591, 17507 and 15518.
Create function that will keep a pattern of '%[#][ec][0-9][;]%' and will get rid of the rest.
The most important advise is: Do not store any data in a delimited string. This is violating the most basic principle of relational database concepts (1.NF).
The second hint is SO-related: Please always add / tag your questions with the appropriate tool. The tag [tsql] points to SQL-Server, but this might be wrong (which would invalidate both answers). Please tag the full product with its version (e.g. [sql-server-2012]). Especially with string splitting there are very important product related changes from version to version.
Now to your question.
Working with (almost) any version of SQL-Server
My suggestion uses a trick with XML:
(credits to Alan Burstein for the mockup)
DECLARE #table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT #table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643')
--the query
SELECT t.someid,t.somestring,A.CastedToXml
,STUFF(REPLACE(A.CastedToXml.query('/x[contains(text()[1],"#") and empty(substring(text()[1],2,100) cast as xs:int?)]')
.value('.','nvarchar(max)'),'#',';'),1,1,'') TheNewList
FROM #table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';','</x><x>') + '</x>' AS XML)) A(CastedToXml);
The idea in short:
By replacing the ; with XML tags </x><x> we can transform your delimited list to XML. I included the intermediate XML into the result set. Just click it to see how this works.
In the next query I use a XQuery predicate first to find entries, which contain a # and second, which do NOT cast to an integer without the #.
The thrid step is specific to XML again. The XPath . in .value() will return all content as one string.
Finally we have to replace the # with ; and cut away the leading ; using STUFF().
UPDATE The same idea, but a bit shorter:
You can try this as well
SELECT t.someid,t.somestring,A.CastedToXml
,REPLACE(A.CastedToXml.query('data(/x[empty(. cast as xs:int?)])')
.value('.','nvarchar(max)'),' ',';') TheNewList
FROM #table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';#','</x><x>') + '</x>' AS XML)) A(CastedToXml);
Here I use ;# to split your string and data() to implicitly concatenate your values (blank-separated).
UPDATE 2 for v2017
If you have v2017+ I'd suggest a combination of a JSON splitter and STRING_AGG():
SELECT t.someid,STRING_AGG(A.[value],';') AS TheNewList
FROM #table t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.somestring,';#','","'),'"]')) A
WHERE TRY_CAST(A.[value] AS INT) IS NULL
GROUP BY t.someid;
You did not include the version of SQL Server you are on. If you are using 2016+ you can use SPLIT_STRING, otherwise a good T-SQL splitter will do.
Against a single variable:
DECLARE #somestring VARCHAR(1000) = '20591;#e123456;#17507;#c567890;#15518;#e135791';
SELECT NewString = STUFF((
SELECT ','+split.item
FROM STRING_SPLIT(#somestring,';') AS s
CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
WHERE split.item LIKE '[a-z][0-9]%'
FOR XML PATH('')),1,1,'');
Against a table:
NewString
----------------------
e123456,c567890,e135791
-- Against a table
DECLARE #table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT #table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643')
SELECT t.*, fn.NewString
FROM #table AS t
CROSS APPLY
(
SELECT NewString = STUFF((
SELECT ','+split.item
FROM STRING_SPLIT(t.somestring,';') AS s
CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
WHERE split.item LIKE '[a-z][0-9]%'
FOR XML PATH('')),1,1,'')
) AS fn;
Returns:
someid somestring NewString
----------- -------------------------------------------------- -----------------------------
1 20591;#e123456;#17507;#c567890;#15518;#e135791 e123456,c567890,e135791
2 26169;#c785643 c785643

SQL NOT LIKE comparison against dynamic list

Working on a new TSQL Stored Procedure, I am wanting to get all rows where values in a specific column don't start with any of a specific set of 2 character substrings.
The general idea is:
SELECT * FROM table WHERE value NOT LIKE 's1%' AND value NOT LIKE 's2%' AND value NOT LIKE 's3%'.
The catch is that I am trying to make it dynamic so that the specific substrings can be pulled from another table in the database, which can have more values added to it.
While I have never used the IN operator before, I think something along these lines should do what I am looking for, however, I don't think it is possible to use wildcards with IN, so I might not be able to compare just the substrings.
SELECT * FROM table WHERE value NOT IN (SELECT substrings FROM subTable)
To get around that limitation, I am trying to do something like this:
SELECT * FROM table WHERE SUBSTRING(value, 1, 2) NOT IN (SELECT Prefix FROM subTable WHERE Prefix IS NOT NULL)
but I'm not sure this is right, or if it is the most efficient way to do this. My preference is to do this in a Stored Procedure, but if that isn't feasible or efficient I'm also open to building the query dynamically in C#.
Here's an option. Load values you want to filter to a table, left outer join and use PATINDEX().
DECLARE #FilterValues TABLE
(
[FilterValue] NVARCHAR(10)
);
--Table with values we want filter on.
INSERT INTO #FilterValues (
[FilterValue]
)
VALUES ( N's1' )
, ( N's2' )
, ( N's3' );
DECLARE #TestData TABLE
(
[TestValues] NVARCHAR(100)
);
--Load some test data
INSERT INTO #TestData (
[TestValues]
)
VALUES ( N's1 Test Data' )
, ( N's2 Test Data' )
, ( N's3 Test Data' )
, ( N'test data not filtered out' )
, ( N'test data not filtered out 1' );
SELECT a.*
FROM #TestData [a]
LEFT OUTER JOIN #FilterValues [b]
ON PATINDEX([b].[FilterValue] + '%', [a].[TestValues]) > 0
WHERE [b].[FilterValue] IS NULL;

Select into a table with a CTE [duplicate]

I have a very complex CTE and I would like to insert the result into a physical table.
Is the following valid?
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos
(
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
WITH tab (
-- some query
)
SELECT * FROM tab
I am thinking of using a function to create this CTE which will allow me to reuse. Any thoughts?
You need to put the CTE first and then combine the INSERT INTO with your select statement. Also, the "AS" keyword following the CTE's name is not optional:
WITH tab AS (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos (
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
SELECT * FROM tab
Please note that the code assumes that the CTE will return exactly four fields and that those fields are matching in order and type with those specified in the INSERT statement.
If that is not the case, just replace the "SELECT *" with a specific select of the fields that you require.
As for your question on using a function, I would say "it depends". If you are putting the data in a table just because of performance reasons, and the speed is acceptable when using it through a function, then I'd consider function to be an option.
On the other hand, if you need to use the result of the CTE in several different queries, and speed is already an issue, I'd go for a table (either regular, or temp).
WITH common_table_expression (Transact-SQL)
The WITH clause for Common Table Expressions go at the top.
Wrapping every insert in a CTE has the benefit of visually segregating the query logic from the column mapping.
Spot the mistake:
WITH _INSERT_ AS (
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
)
INSERT Table2
([BatchID], [SourceRowID], [APartyNo])
SELECT [BatchID], [APartyNo], [SourceRowID]
FROM _INSERT_
Same mistake:
INSERT Table2 (
[BatchID]
,[SourceRowID]
,[APartyNo]
)
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
A few lines of boilerplate make it extremely easy to verify the code inserts the right number of columns in the right order, even with a very large number of columns. Your future self will thank you later.
Yep:
WITH tab (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos ( BatchID, AccountNo,
APartyNo,
SourceRowID)
SELECT * FROM tab
Note that this is for SQL Server, which supports multiple CTEs:
WITH x AS (), y AS () INSERT INTO z (a, b, c) SELECT a, b, c FROM y
Teradata allows only one CTE and the syntax is as your example.
Late to the party here, but for my purposes I wanted to be able to run the code the user inputted and store in a temp table. Using oracle no such issues.. the insert is at the start of the statement before the with clause.
For this to work in sql server, the following worked:
INSERT into #stagetable execute (#InputSql)
(so the select statement #inputsql can start as a with clause).

Combining INSERT INTO and WITH/CTE

I have a very complex CTE and I would like to insert the result into a physical table.
Is the following valid?
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos
(
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
WITH tab (
-- some query
)
SELECT * FROM tab
I am thinking of using a function to create this CTE which will allow me to reuse. Any thoughts?
You need to put the CTE first and then combine the INSERT INTO with your select statement. Also, the "AS" keyword following the CTE's name is not optional:
WITH tab AS (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos (
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
SELECT * FROM tab
Please note that the code assumes that the CTE will return exactly four fields and that those fields are matching in order and type with those specified in the INSERT statement.
If that is not the case, just replace the "SELECT *" with a specific select of the fields that you require.
As for your question on using a function, I would say "it depends". If you are putting the data in a table just because of performance reasons, and the speed is acceptable when using it through a function, then I'd consider function to be an option.
On the other hand, if you need to use the result of the CTE in several different queries, and speed is already an issue, I'd go for a table (either regular, or temp).
WITH common_table_expression (Transact-SQL)
The WITH clause for Common Table Expressions go at the top.
Wrapping every insert in a CTE has the benefit of visually segregating the query logic from the column mapping.
Spot the mistake:
WITH _INSERT_ AS (
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
)
INSERT Table2
([BatchID], [SourceRowID], [APartyNo])
SELECT [BatchID], [APartyNo], [SourceRowID]
FROM _INSERT_
Same mistake:
INSERT Table2 (
[BatchID]
,[SourceRowID]
,[APartyNo]
)
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
A few lines of boilerplate make it extremely easy to verify the code inserts the right number of columns in the right order, even with a very large number of columns. Your future self will thank you later.
Yep:
WITH tab (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos ( BatchID, AccountNo,
APartyNo,
SourceRowID)
SELECT * FROM tab
Note that this is for SQL Server, which supports multiple CTEs:
WITH x AS (), y AS () INSERT INTO z (a, b, c) SELECT a, b, c FROM y
Teradata allows only one CTE and the syntax is as your example.
Late to the party here, but for my purposes I wanted to be able to run the code the user inputted and store in a temp table. Using oracle no such issues.. the insert is at the start of the statement before the with clause.
For this to work in sql server, the following worked:
INSERT into #stagetable execute (#InputSql)
(so the select statement #inputsql can start as a with clause).