Trying to manipulate string such as if '26169;#c785643', then the result should be like 'c785643' - tsql

I am trying to manipulate string data in a column: if the given string is '20591;#e123456;#17507;#c567890;#15518;#e135791' or '26169;#c785643', then the
result should be 'e123456;c567890;e135791' or 'c785643'. The numbers in between can be of any length.
Some of the things I have tried so far are:
select replace('20591;#e123456;#17507;#c567890;#15518;#e135791','#','');
This leaves me with '20591;e123456;17507;c567890;15518;e135791', which still includes the numbers that have no 'e' or 'c' prefix. I want to get rid of 20591, 17507 and 15518.
Creating a function that keeps the pattern '%[#][ec][0-9][;]%' and gets rid of the rest.

The most important advice is: do not store any data in a delimited string. This violates the most basic principle of relational database design (1NF).
The second hint is SO-related: please always tag your questions with the appropriate product. The tag [tsql] points to SQL-Server, but this might be wrong (which would invalidate both answers). Please tag the full product with its version (e.g. [sql-server-2012]). Especially with string splitting, there are very important product-related changes from version to version.
Now to your question.
Working with (almost) any version of SQL-Server
My suggestion uses a trick with XML:
(credits to Alan Burstein for the mockup)
DECLARE @table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT @table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643');
--the query
SELECT t.someid,t.somestring,A.CastedToXml
      ,STUFF(REPLACE(A.CastedToXml.query('/x[contains(text()[1],"#") and empty(substring(text()[1],2,100) cast as xs:int?)]')
                     .value('.','nvarchar(max)'),'#',';'),1,1,'') TheNewList
FROM @table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';','</x><x>') + '</x>' AS XML)) A(CastedToXml);
The idea in short:
By replacing the ; with XML tags </x><x> we can transform your delimited list to XML. I included the intermediate XML into the result set. Just click it to see how this works.
In the next query I use an XQuery predicate to find the entries that, first, contain a # and, second, do NOT cast to an integer once the # is removed.
The third step is specific to XML again. The XPath . in .value() returns all remaining content as one string.
Finally we have to replace the # with ; and cut away the leading ; using STUFF().
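To check the filtering rule outside SQL Server, here is a minimal Python sketch of the same steps: split on `;`, keep entries that contain `#` and are not integers once the `#` is dropped, then join with `;`. The helper name `filter_ids` is hypothetical, and Python's `int()` only stands in for the `cast as xs:int?` test; it is not an exact reproduction of XQuery casting rules.

```python
def filter_ids(somestring):
    """Keep the '#'-prefixed entries that are not plain integers."""
    kept = []
    # Same split as REPLACE(t.somestring, ';', '</x><x>')
    for entry in somestring.split(";"):
        if "#" not in entry:
            continue  # contains(text()[1], "#") is false
        value = entry[1:]  # substring(text()[1], 2, 100): drop the leading '#'
        try:
            int(value)  # stands in for: cast as xs:int?
        except ValueError:
            kept.append(value)  # not a number, so keep it (e.g. e123456)
    return ";".join(kept)


print(filter_ids("20591;#e123456;#17507;#c567890;#15518;#e135791"))  # e123456;c567890;e135791
print(filter_ids("26169;#c785643"))  # c785643
```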
UPDATE The same idea, but a bit shorter:
You can try this as well
SELECT t.someid,t.somestring,A.CastedToXml
      ,REPLACE(A.CastedToXml.query('data(/x[empty(. cast as xs:int?)])')
               .value('.','nvarchar(max)'),' ',';') TheNewList
FROM @table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';#','</x><x>') + '</x>' AS XML)) A(CastedToXml);
Here I use ;# to split your string and data() to implicitly concatenate your values (blank-separated).
UPDATE 2 for v2017
If you have v2017+ I'd suggest a combination of a JSON splitter and STRING_AGG():
SELECT t.someid,STRING_AGG(A.[value],';') AS TheNewList
FROM @table t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.somestring,';#','","'),'"]')) A
WHERE TRY_CAST(A.[value] AS INT) IS NULL
GROUP BY t.someid;
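The JSON trick works because the string manipulation turns the list into a valid JSON array literal. The Python sketch below (hypothetical helper names `to_json_array` and `new_list`) shows the text that `CONCAT('["', REPLACE(..., ';#', '","'), '"]')` produces and then applies the "keep what does not cast to INT" rule. It assumes the values never contain `"` or `\`, which would break the JSON. Python keeps the input order; `STRING_AGG` without `WITHIN GROUP (ORDER BY ...)` does not formally guarantee one.

```python
import json


def to_json_array(somestring):
    # Same text as CONCAT('["', REPLACE(somestring, ';#', '","'), '"]')
    return '["' + somestring.replace(";#", '","') + '"]'


def new_list(somestring):
    kept = []
    for value in json.loads(to_json_array(somestring)):  # like OPENJSON(...) -> [value]
        try:
            int(value)  # stands in for TRY_CAST([value] AS INT) IS NOT NULL
        except ValueError:
            kept.append(value)  # TRY_CAST would return NULL, so the row is kept
    return ";".join(kept)  # like STRING_AGG([value], ';')


print(to_json_array("26169;#c785643"))  # ["26169","c785643"]
print(new_list("20591;#e123456;#17507;#c567890;#15518;#e135791"))  # e123456;c567890;e135791
```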

You did not include the version of SQL Server you are on. If you are using 2016+ you can use STRING_SPLIT; otherwise a good T-SQL splitter will do.
Against a single variable:
DECLARE @somestring VARCHAR(1000) = '20591;#e123456;#17507;#c567890;#15518;#e135791';
SELECT NewString = STUFF((
  SELECT ','+split.item
  FROM STRING_SPLIT(@somestring,';') AS s
  CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
  WHERE split.item LIKE '[a-z][0-9]%'
  FOR XML PATH('')),1,1,'');
Returns:
NewString
----------------------
e123456,c567890,e135791
-- Against a table
DECLARE @table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT @table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643');
SELECT t.*, fn.NewString
FROM @table AS t
CROSS APPLY
(
  SELECT NewString = STUFF((
    SELECT ','+split.item
    FROM STRING_SPLIT(t.somestring,';') AS s
    CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
    WHERE split.item LIKE '[a-z][0-9]%'
    FOR XML PATH('')),1,1,'')
) AS fn;
Returns:
someid somestring NewString
----------- -------------------------------------------------- -----------------------------
1 20591;#e123456;#17507;#c567890;#15518;#e135791 e123456,c567890,e135791
2 26169;#c785643 c785643
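The same split, strip, filter and concatenate steps can be sketched in Python to see which items survive the `LIKE '[a-z][0-9]%'` test (one letter, one digit, then anything). `new_string` is a hypothetical name; the regex is case-sensitive and ignores collation-dependent behaviour of `[a-z]` in SQL Server. Passing `sep=";"` gives the `;`-separated list the question asks for. In SQL Server, `STRING_SPLIT` does not guarantee output order, whereas this sketch keeps the input order.

```python
import re

# LIKE '[a-z][0-9]%': a letter, then a digit, then anything (re.match anchors at the start)
LIKE_PATTERN = re.compile(r"[a-z][0-9]")


def new_string(somestring, sep=","):
    # STRING_SPLIT(..., ';') followed by REPLACE(s.[value], '#', '')
    items = (part.replace("#", "") for part in somestring.split(";"))
    # WHERE split.item LIKE ... then concatenate (the FOR XML PATH('') / STUFF trick)
    return sep.join(item for item in items if LIKE_PATTERN.match(item))


print(new_string("20591;#e123456;#17507;#c567890;#15518;#e135791"))  # e123456,c567890,e135791
print(new_string("26169;#c785643", sep=";"))  # c785643
```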

Related

With PostgREST, convert a column to and from an external encoding in the API

We are using PostgREST to automatically generate a REST API for a Postgres database. Our primary keys have an external representation that's different from how we store them internally. For simplicity's sake, let's pretend the ids are stored as integers but we represent them as hexadecimal strings outwardly.
It's simple enough to get PostgREST to convert to the external representation for read operations:
CREATE DOMAIN hexid AS bigint;
CREATE TABLE fruits (
fruit_id hexid PRIMARY KEY,
name text
);
CREATE OR REPLACE VIEW api_fruits AS
SELECT to_hex(fruit_id) as fruit_id, name FROM fruits;
INSERT INTO fruits(fruit_id, name) VALUES('51955', 'avocado');
PostgREST generates the expected representation when we GET api_fruits:
[
{
"fruit_id": "caf3",
"name": "avocado"
}
]
But that's about as far as we get with this solution. It's a one-way transformation, so we won't be able to POST/PATCH records this way. The way PostgREST works is to transform such requests into equivalent INSERT and UPDATE statements. But this view with its custom formatting is not updatable. This is what would happen if we tried:
ERROR: cannot insert into column "fruit_id" of view "api_fruits"
DETAIL: View columns that are not columns of their base relation are not updatable.
STATEMENT: WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) INSERT INTO "api_x"."api_fruits"("fruit_id", "name") SELECT "fruit_id", "name" FROM json_populate_recordset (null::"api_x"."api_fruits", (SELECT val FROM pgrst_body)) _ RETURNING "api_x"."api_fruits".*) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, CASE WHEN pg_catalog.count(_postgrest_t) = 1 THEN coalesce((
WITH data AS (SELECT row_to_json(_) AS row FROM pgrst_source AS _ LIMIT 1)
SELECT array_agg(json_data.key || '=eq.' || json_data.value)
FROM data CROSS JOIN json_each_text(data.row) AS json_data
WHERE json_data.key IN ('')
), array[]::text[]) ELSE array[]::text[] END AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
We can't INSERT into "View columns that are not columns of their base relation".
The obvious workaround is to serve fruit_id as a straight column, just an integer. With some pre- and post-processing at the nginx level we can hex-encode it there (and hex-decode incoming ids). I'm wondering if we can do better than that, though. For large API operations, re-encoding the JSON will use a lot of memory and CPU time, and it seems so unnecessary.
It would have been great to be able to use a custom CREATE CAST to take the incoming hexadecimal strings and turn them back into integers, something like this:
CREATE CAST (json AS hexid) WITH FUNCTION json_to_hexid AS ASSIGNMENT;
But alas custom casts are ignored on CREATE DOMAIN types. And we can't make a true custom column type because our cloud Postgres host (Google Cloud SQL) doesn't allow custom extensions.
It feels like some combination of INSTEAD OF triggers or rules could work. But when using query parameters to filter results (e.g. to select a fruit by id), I don't think there's an appropriate trigger to use. INSTEAD OF doesn't work for a straight SELECT, does it?
For example I've tested doing something like this to take care of INSERT and allow POST with PostgREST. It works:
CREATE OR REPLACE FUNCTION api_fruits_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO fruits(fruit_id, name) VALUES (('x' || lpad(NEW.fruit_id, 16, '0'))::bit(64)::bigint::hexid, NEW.name);
RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER api_fruits_insert
INSTEAD OF INSERT
ON api_fruits
FOR EACH ROW
EXECUTE PROCEDURE api_fruits_insert();
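To make the two conversions concrete, here is a rough Python analogue of the encoding used by the view (`to_hex(bigint)`) and the decoding used in the trigger (`('x' || lpad(..., 16, '0'))::bit(64)::bigint`). The names `to_hex` and `from_hex` are hypothetical Python helpers; the sketch assumes at most 16 hex digits (PostgreSQL's `lpad` would truncate longer input) and models `bigint` as 64-bit two's complement. It only illustrates the arithmetic and does not answer how to push the decoding into PostgREST's WHERE clause.

```python
MASK64 = (1 << 64) - 1  # a bigint is 64 bits wide


def to_hex(fruit_id):
    """Rough analogue of to_hex(bigint): lowercase, no leading zeros,
    negative values shown as their 64-bit two's-complement bit pattern."""
    return format(fruit_id & MASK64, "x")


def from_hex(hex_id):
    """Rough analogue of ('x' || lpad(hex_id, 16, '0'))::bit(64)::bigint."""
    value = int(hex_id.rjust(16, "0"), 16)
    # bit(64)::bigint treats the top bit as the sign bit
    return value - (1 << 64) if value >= (1 << 63) else value


print(to_hex(51955))                                  # caf3
print([from_hex(h) for h in "7b,caf3".split(",")])    # [123, 51955]
```

Decoding the filter values this way is what lets a comparison run against the stored integers rather than against `to_hex(fruit_id)` computed for every row.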
The trouble is in the WHERE clause. Let's PATCH api_fruits?fruit_id=in.(7b,caf3) with {"name": "pear"}. This works out of the box since the name column is updatable but look at the query:
WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) UPDATE "api_x"."api_fruits" SET "name" = _."name" FROM (SELECT * FROM json_populate_recordset (null::"api_x"."api_fruits" , (SELECT val FROM pgrst_body) )) _ WHERE "api_x"."api_fruits"."fruit_id" = ANY ($2) RETURNING 1) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, array[]::text[] AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
DETAIL: parameters: $1 = '{
"name": "pear"
}', $2 = '{7b,caf3}'
So we have essentially UPDATE api_fruits SET name='pear' WHERE fruit_id IN ('7b', 'caf3');. Surprisingly this works, but it's a full table scan because Postgres has to evaluate to_hex(fruit_id) for each row looking for matches. The same happens if we try to GET a record by fruit_id. How would we rewrite the WHERE clauses?
It really feels like some combination of just the right Postgres and PostgREST features should be able to get us to a point where it's all happening in Postgres without nginx's help and without excessive complexity. Any ideas?

TSQL - Parsing substring out of larger string

I have a bunch of rows with values that look like the one below. It's a JSON extract that I unfortunately have to parse out and load. Anyway, my JSON parsing tool for some reason doesn't want to parse this full column, so I need to do it in T-SQL. I only need the unique_id field:
[{"unique_id":"12345","system_type":"Test System."}]
I tried the SQL below, but it only returns the first 5 characters of the whole column. I know what the issue is: I need to tell SUBSTRING to continue until the 4th double quote, which comes right after the value. I'm not sure how to code the substring like that.
select substring([jsonfield],CHARINDEX('[{"unique_id":"',[jsonfield]),
CHARINDEX('"',[jsonfield]) - CHARINDEX('[{"unique_id":"',[jsonfield]) +
LEN('"')) from etl.my_test_table
Can anyone help me with this?
Thank you, I appreciate it!
Since you tagged 2016, why not use OPENJSON()?
Here's an example:
DECLARE @TestData TABLE
    (
        [SampleData] NVARCHAR(MAX)
    );
INSERT INTO @TestData (
                          [SampleData]
                      )
VALUES ( N'[{"unique_id":"12345","system_type":"Test System."}]' )
     , ( N'[{"unique_id":"1234567","system_type":"Test System."},{"unique_id":"1234567_2","system_type":"Test System."}]' );
SELECT b.[unique_id]
FROM   @TestData [a]
CROSS APPLY
       OPENJSON([a].[SampleData], '$')
       WITH (
                [unique_id] NVARCHAR(100) '$.unique_id'
            ) AS [b];
Giving you:
unique_id
---------------
12345
1234567
1234567_2
You can get all the fields as well, just add them to the WITH clause:
SELECT [b].[unique_id]
, [b].[system_type]
FROM #TestData [a]
CROSS APPLY
OPENJSON([a].[SampleData], '$')
WITH (
[unique_id] NVARCHAR(100) '$.unique_id'
, [system_type] NVARCHAR(100) '$.system_type'
) AS [b];
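For comparison, this Python sketch mimics what the CROSS APPLY with OPENJSON does: each row's array is expanded to one row per element, and the `WITH` clause picks `$.unique_id` from each element. `unique_ids` and `sample_data` are hypothetical names; `dict.get` returning `None` approximates OPENJSON's lax-mode NULL for a missing property.

```python
import json

sample_data = [
    '[{"unique_id":"12345","system_type":"Test System."}]',
    '[{"unique_id":"1234567","system_type":"Test System."},'
    '{"unique_id":"1234567_2","system_type":"Test System."}]',
]


def unique_ids(rows):
    ids = []
    for text in rows:                        # FROM @TestData [a]
        for element in json.loads(text):     # OPENJSON(..., '$'): one row per array element
            ids.append(element.get("unique_id"))  # [unique_id] ... '$.unique_id'
    return ids


print(unique_ids(sample_data))  # ['12345', '1234567', '1234567_2']
```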
Take it step by step
First get everything to the left of system_type
SELECT LEFT(jsonfield, CHARINDEX('","system_type":"',jsonfield) - 1) as s
FROM -- etc
Then take everything to the right of "unique_id":"
SELECT RIGHT(S, LEN(S) - (CHARINDEX('"unique_id":"',S) + 12)) as Result
FROM (
    SELECT LEFT(jsonfield, CHARINDEX('","system_type":"',jsonfield) - 1) as s
    FROM -- etc
) X
Note, I did not test this so it could be off by one or have a syntax error, but you get the idea.
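The position arithmetic is easy to get off by one, so here is a Python sketch that replays it with 1-based helpers standing in for CHARINDEX, LEFT and RIGHT (the helper names are hypothetical, and the sketch ignores that T-SQL LEN() drops trailing spaces). It uses LEFT with the closing parenthesis added and the position reduced by one, so the closing quote of the value is excluded; 12 is the length of `"unique_id":"` (13 characters) minus one.

```python
def charindex(needle, haystack):
    """1-based position like T-SQL CHARINDEX; 0 when not found."""
    return haystack.find(needle) + 1


def left(s, n):
    return s[:max(n, 0)]


def right(s, n):
    return s[-n:] if n > 0 else ""


jsonfield = '[{"unique_id":"12345","system_type":"Test System."}]'
s = left(jsonfield, charindex('","system_type":"', jsonfield) - 1)
result = right(s, len(s) - (charindex('"unique_id":"', s) + 12))
print(s)       # [{"unique_id":"12345
print(result)  # 12345
```

Note that this only ever finds the first unique_id in a row, unlike the OPENJSON approach.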
If your larger string is just simple JSON as posted, the solution is very easy:
SELECT
JSON_VALUE(N'[{"unique_id":"12345","system_type":"Test System."}]','$[0].unique_id');
JSON_VALUE() needs SQL-Server 2016 and will extract one single value from a specified path.
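The path `$[0].unique_id` means "first array element, then its unique_id property", so JSON_VALUE returns only one value per row even when the array holds several objects. A small Python sketch of that path (hypothetical `first_unique_id` helper; it returns `None` where lax-mode JSON_VALUE would return NULL, and for simplicity treats non-string values as missing):

```python
import json


def first_unique_id(text):
    data = json.loads(text)
    # '$[0]': the first element of the top-level array
    if isinstance(data, list) and data and isinstance(data[0], dict):
        value = data[0].get("unique_id")  # '.unique_id'
        if isinstance(value, str):
            return value
    return None


two_items = ('[{"unique_id":"1234567","system_type":"Test System."},'
             '{"unique_id":"1234567_2","system_type":"Test System."}]')
print(first_unique_id(two_items))  # 1234567
```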

How to update sql column based on matching special character

I'm trying to get rid of some bad characters in our database. The rows I'm working on at the moment start with a bullet and a space. My where clause isn't matching the rows in question, however. How can I get a match? This is what I'm currently trying.
update Skill
set Name = substring(Name, 3, 50)
where Name like char(150) + ' %'
Perhaps you aren't capturing the correct ASCII value. Here's a similar method that uses UNICODE():
declare @table table ([Name] nvarchar(64))
insert into @table
values
('- some data')
select
UNICODE(left([Name],1)) --this will tell you what VALUE to use in the where clause
,NCHAR(UNICODE(left([Name],1)))
from @table
update @table
set [Name] = substring([Name], 3, 50)
where UNICODE(left([Name],1)) = 45 --use the appropriate UNICODE values here
select * from @table
As stated in the comments, you can use ASCII() to get the code for the character you are looking for and then use that code in your update.
UPDATE Skill
set Name = substring(Name, 3, 50)
WHERE LEFT(name,1) = CHAR(149)
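A quick way to see why `char(150)` did not match a bullet is to decode both byte values. The Python sketch below assumes the column's collation uses Windows code page 1252 (common for Latin1_General collations, but check your own). There, 149 is the bullet (U+2022, i.e. NCHAR(8226) for an nvarchar column) and 150 is an en dash. `describe` is a hypothetical helper.

```python
import unicodedata


def describe(code):
    """Decode one single-byte CHAR() value, assuming code page 1252."""
    ch = bytes([code]).decode("cp1252")
    return f"CHAR({code}) -> NCHAR({ord(ch)}) U+{ord(ch):04X} {unicodedata.name(ch)}"


for code in (45, 149, 150):
    print(describe(code))
# CHAR(45) -> NCHAR(45) U+002D HYPHEN-MINUS
# CHAR(149) -> NCHAR(8226) U+2022 BULLET
# CHAR(150) -> NCHAR(8211) U+2013 EN DASH
```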

TSQL split comma delimited string

I am trying to create a stored procedure that will split the input from 3 text boxes on a webpage; each box holds user input as a comma-delimited string. We have a field called 'combined_name' in our table that we have to search for first and last name and any known errors or nicknames etc., such as @p1: 'grei,grie', @p2: 'joh,jon,j..', @p3: empty.
The reason for the third box is that, after I get the basics set up, we will add 'does not contain', 'starts with', 'ends with' and 'IS' options to narrow our results further.
So I am looking to get all records that CONTAINS any combination of those. I originally wrote this in LINQ but it didn't work as you cannot query a list and a dataset. The dataset is too large (1.3 million records) to be put into a list so I have to use a stored procedure which is likely better anyway.
Will I have to use 2 SPs, one to split each field and one for the select query, or can this be done with one? What function do I use for CONTAINS in T-SQL? I tried using IN in a query but cannot figure out how it works with multiple parameters.
Please note that this will be an internal site that has limited access so worrying about sql injection is not a priority.
I did attempt dynamic SQL but am not getting the correct results back:
CREATE PROCEDURE uspJudgments @fullName nvarchar(100) AS
EXEC('SELECT *
FROM new_judgment_system.dbo.defendants_ALL
WHERE combined_name IN (' + @fullName + ')')
GO
EXEC uspJudgments @fullName = '''grein'', ''grien'''
Even if this did retrieve the correct results how would this be done with 3 parameters?
You could use this to split each string into a table of strings. Then, to get all the combinations, CROSS JOIN these tables, and do your select against the result.
Here is the Table valued function I set up:
ALTER FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000)) -- use CREATE FUNCTION the first time
RETURNS table
AS
RETURN (
    WITH splitter_cte AS (
      SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
      UNION ALL
      SELECT CHARINDEX(@sep, @s, pos + 1), pos
      FROM splitter_cte
      WHERE pos > 0
    )
    -- recursive CTE: long lists (roughly 100+ items) need OPTION (MAXRECURSION 0) in the calling query
    SELECT SUBSTRING(@s, lastPos + 1,
                     case when pos = 0 then 80000
                     else pos - lastPos -1 end) as OutputValues
    FROM splitter_cte
)
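The recursive CTE walks from one separator to the next, carrying the current position (`pos`) and the previous one (`lastPos`). This Python sketch replays that walk with the same 1-based arithmetic, then uses `itertools.product` to play the role of the CROSS JOIN that yields every first/last-name combination. `split` is a hypothetical Python name mirroring the T-SQL function, not a call into SQL Server.

```python
from itertools import product


def split(sep, s):
    """Replay splitter_cte: rows of (pos, lastPos), 1-based like CHARINDEX."""
    out = []
    pos, last_pos = s.find(sep) + 1, 0  # anchor: CHARINDEX(@sep, @s), 0
    while True:
        length = len(s) if pos == 0 else pos - last_pos - 1
        out.append(s[last_pos:last_pos + length])  # SUBSTRING(@s, lastPos + 1, length)
        if pos == 0:  # WHERE pos > 0 stops the recursion
            break
        # CHARINDEX(@sep, @s, pos + 1) starts at 0-based index pos
        pos, last_pos = s.find(sep, pos) + 1, pos
    return out


print(split(",", "grei,grie"))  # ['grei', 'grie']
print(list(product(split(",", "grei,grie"), split(",", "joh,jon"))))
# [('grei', 'joh'), ('grei', 'jon'), ('grie', 'joh'), ('grie', 'jon')]
```

An empty box such as @p3 still produces one empty-string row, so it should be filtered out before joining.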

help with TSQL IN statement with int

I am trying to create the following select statement in a stored proc
@dealerids nvarchar(256)
SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
I.DealerID is an INT in the table, and the parameter for @dealerids would be formatted like
(8820, 8891, 8834)
When I run this with those parameters I get no rows back. I know these dealer IDs should return rows, because if I query them individually I get back what I expect.
I think I am doing
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
incorrectly. Can anyone point out what I am doing wrong here?
Use a table-valued parameter (new in SQL Server 2008). Set it up by creating the actual table type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
    @Ids IntTableType READONLY
AS
SELECT *
    FROM ATable a
    WHERE a.Id IN (SELECT ID FROM @Ids)
RETURN 0
GO
If you can't use table-valued parameters, see "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog. There are many ways to split a string in SQL Server, and this article covers the pros and cons of just about every method. In general, you need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
     @SplitOn  char(1)       --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000) --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
        FROM (SELECT
                  LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
                  FROM (
                           SELECT @SplitOn + @List + @SplitOn AS List2
                       ) AS dt
                      INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
                  WHERE SUBSTRING(List2, number, 1) = @SplitOn
             ) dt2
        WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
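The Numbers-table split works by wrapping the list in separators and treating every position that holds a separator as the start of a value. This Python sketch (hypothetical `list_to_table` helper) replays the same positions with a loop standing in for the join to Numbers. `strip(" ")` mirrors LTRIM(RTRIM()), which trims spaces only.

```python
def list_to_table(split_on, lst, max_number=10000):
    """Replay FN_ListToTable with a loop in place of the Numbers join."""
    list2 = split_on + lst + split_on  # SELECT @SplitOn + @List + @SplitOn
    values = []
    # INNER JOIN Numbers n ON n.Number < LEN(dt.List2)  (Numbers holds 1..10000)
    for number in range(1, min(max_number, len(list2) - 1) + 1):
        if list2[number - 1] != split_on:  # SUBSTRING(List2, number, 1) = @SplitOn
            continue
        end = list2.find(split_on, number) + 1  # CHARINDEX(@SplitOn, List2, number+1)
        value = list2[number:end - 1].strip(" ")  # LTRIM(RTRIM(SUBSTRING(...)))
        if value:  # ListValue IS NOT NULL AND ListValue != ''
            values.append(value)
    return values


print(list_to_table(",", "8820, 8891, 8834"))  # ['8820', '8891', '8834']
```

Pass the list without the surrounding parentheses shown in the question: with them, the first and last items come back as `(8820` and `8834)`, which will not match an INT column.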
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
    @Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',',@Ids))
You can't use @dealerids like that; you need to use dynamic SQL, like this:
@dealerids nvarchar(256)
EXEC('SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (' + @dealerids + ')')
The downside is that you open yourself up to SQL injection attacks unless you specifically control the data going into @dealerids.
There are better ways to handle this depending on your version of SQL Server, which are documented in this great article.
Split @dealerids into a table, then JOIN:
SELECT *
FROM INVOICES as I
JOIN
ufnSplit(@dealerids) S ON I.DealerID = S.ParsedIntDealerID
Assorted split functions here (I'd probably use a numbers table in this case for a small string)