SQL basic full-text search - tsql

I have not worked much with TSQL or the full-text search feature of SQL Server so bear with me.
I have a table nvarchar column (Col) like this:
Col ... more columns
Row 1: '1'
Row 2: '1|2'
Row 3: '2|40'
I want to do a search to match similar users. So if I have a user that has a Col value of '1' I would expect the search to return the first two rows. If I had a user with a Col value of '1|2' I would expect to get Row 2 returned first and then Row 1. If I try to match users with a Col value of '4' I wouldn't get any results. I thought of doing a 'contains' by splitting the value I am using to query but it wouldn't work since '2|40' contains 4...
I looked up the documentation on using the 'FREETEXT' keyword but I don't think that would work for me since I essentially need to break up the Col values into words using the '|' as a break.
Thanks,
John

You should not store values like '1|2' in a field to store 2 values. If you have a maximum of 2 values, you should use 2 fields to store them. If you can have 0-many values, you should store them in a new table with a foreign key pointing to the primary key of your table..
If you only have max 2 values in your table. You can find your data like this:
DECLARE #s VARCHAR(3) = '1'
SELECT *
FROM <table>
WHERE #s IN(
PARSENAME(REPLACE(col, '|', '.'), 1),
PARSENAME(REPLACE(col, '|', '.'), 2)
--,PARSENAME(REPLACE(col, '|', '.'), 3) -- if col can contain 3
--,PARSENAME(REPLACE(col, '|', '.'), 4) -- or 4 values this can be used
)
Parsename can handle max 4 values. If 'col' can contain more than 4 values use this
DECLARE #s VARCHAR(3) = '1'
SELECT *
FROM <table>
WHERE '|' + col + '|' like '%|' + #s + '|%'

Need to mix this in with a case for when there is no | but this returns the left and right hand sides
select left('2|10', CHARINDEX('|', '2|10') - 1)
select right('2|10', CHARINDEX('|', '2|10'))

Related

Redshift how to split a stringified array into separate parts

Say I have a varchar column let's say religions that looks like this: ["Christianity", "Buddhism", "Judaism"] (yes it has a bracket in the string) and I want the string (not array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the function JSON_PARSE to convert the varchar string into an array. Then you can use the strategy described in Convert varchar array to rows in redshift - Stack Overflow to convert the array to separate rows.
You can do the following.
Create a temporary table with sequence of numbers
Using the sequence and split_part function available in redshift, you can split the values based on the numbers generated in the temporary table by doing a cross join.
To replace the double quote and square brackets, you can use the regexp_replace function in Redshift.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["christianity","Buddhism","Judaism"]' as val) -- You can select the actual column from the table here.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;

Query table by a value in the second dimension of a two dimensional array column

WHAT I HAVE
I have a table with the following definition:
CREATE TABLE "Highlights"
(
id uuid,
chunks numeric[][]
)
WHAT I NEED TO DO
I need to query the data in the table using the following predicate:
... WHERE id = 'some uuid' and chunks[????????][1] > 10 chunks[????????][3] < 20
What should I put instead of [????????] in order to scan all items in the first dimension of the array?
Notes
I'm not entirely sure that chunks[][1] even close to something I need.
All I need is to test a row, whether its chunks column contains a two dimensional array, that has in any of its tuples some specific values.
May be there's better alternative, but this might do - you just go over first dimension of each array and testing your condition:
select *
from highlights as h
where
exists (
select
from generate_series(1, array_length(h.chunks, 1)) as tt(i)
where
-- your condition goes here
h.chunks[tt.i][1] > 10 and h.chunks[tt.i][3] < 20
)
db<>fiddle demo
update as #arie-r pointed out, it'd be better to use generate_subscripts function:
select *
from highlights as h
where
exists (
select *
from generate_subscripts(h.chunks, 1) as tt(i)
where
h.chunks[tt.i][3] = 6
)
db<>fiddle demo

to find new lines character in postgres

Having couple of entries in database table that have multiple line "names" data.
I try to find single newline character from it.
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems
WHERE
strpos ( NAME, E'\n' ) > 0;
But it fails for the data that have more than 1 new line character (\n).
ANy way to find "n" number of "\n" in names data.
regexp_matches will emit a row for each match. doc
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems p
WHERE
(select count(*) from regexp_matches(p.name,E'\n','g') ) = ?;
This one gives you a list of all indexes with \n in your string. I am not sure if you were expecting this result:
demo:db<>fiddle
SELECT
name,
array_remove( -- 5
(array_agg(sum))::int[], -- 4
length(name) + 1
)
FROM (
-- 3
SELECT
name,
SUM(length(lines) + 1) OVER (PARTITION BY name ORDER BY row_number)
FROM (
-- 2
SELECT
*,
row_number() OVER ()
FROM (
-- 1
SELECT
name,
regexp_split_to_table(name, '\n') as lines
FROM problems
)s
)s
) s
GROUP BY name
Splitting the string at the \n chars. Every split part is now one row in a temporary table.
Adding a row_count to assure the right order of the split parts
This counts the length of all single split parts. The (length + 1) gives the position of the \n. The SUM window function sums up all values within a group (your original text). That's why the order is relevant. For example: The first two parts of "abc\nde\nfgh" have the lengths of 3 and 2. So the breaks are at 4 (abc = 3, + 1) and 3 (de = 2, + 1). But the 3 of the second part is no real index, but if you sum up these values you get the right indexes: 4 and 7.
Aggregating these results
If (as in my example) the last char is always a \n and you are only interested in the \n chars the string you could remove the last entry of the aggregated array.
Changed problem in comments below:
Would like to replace \n with spaces. So I am thinking how above query
will look in the Update statement. – Pranav Unde
Replacing the \n by spaces is a quiet different problem then getting indexes for all occurances of a special character. And it's much simpler:
UPDATE problems
SET name = trim(regexp_replace(name, E'\n', ' ', 'g'));
regexp_replace(..., 'g') finds all occurances of \n and does the replacing
trim() removes the whitespaces before and after the string if necessary (maybe because there was a trailing \n as in my example - which was replaced by a space as well in the step before)
demo:db<>fiddle

TSQL split comma delimited string

I am trying to create a stored procedure that will split 3 text boxes on a webpage that have user input that all have comma delimited strings in it. We have a field called 'combined_name' in our table that we have to search for first and last name and any known errors or nicknames etc. such as #p1: 'grei,grie' #p2: 'joh,jon,j..' p3: is empty.
The reason for the third box is after I get the basics set up we will have does not contain, starts with, ends with and IS to narrow our results further.
So I am looking to get all records that CONTAINS any combination of those. I originally wrote this in LINQ but it didn't work as you cannot query a list and a dataset. The dataset is too large (1.3 million records) to be put into a list so I have to use a stored procedure which is likely better anyway.
Will I have to use 2 SP, one to split each field and one for the select query or can this be done with one? What function do I use for contains in tsql? I tried using IN win a query but cannot figure out how it works with multiple parameters.
Please note that this will be an internal site that has limited access so worrying about sql injection is not a priority.
I did attempt dynamic SQL but am not getting the correct results back:
CREATE PROCEDURE uspJudgments #fullName nvarchar(100) AS
EXEC('SELECT *
FROM new_judgment_system.dbo.defendants_ALL
WHERE combined_name IN (' + #fullName + ')')
GO
EXEC uspJudgments #fullName = '''grein'', ''grien'''
Even if this did retrieve the correct results how would this be done with 3 parameters?
You may try use this to split string and obtain a tables of strings. Then to have all the combinations you may use full join of these two tables. And then do your select.
Here is the Table valued function I set up:
ALTER FUNCTION [dbo].[Split] (#sep char(1), #s varchar(8000))
RETURNS table
AS
RETURN (
WITH splitter_cte AS (
SELECT CHARINDEX(#sep, #s) as pos, 0 as lastPos
UNION ALL
SELECT CHARINDEX(#sep, #s, pos + 1), pos
FROM splitter_cte
WHERE pos > 0
)
SELECT SUBSTRING(#s, lastPos + 1,
case when pos = 0 then 80000
else pos - lastPos -1 end) as OutputValues
FROM splitter_cte
)
)

help with TSQL IN statement with int

I am trying to create the following select statement in a stored proc
#dealerids nvarchar(256)
SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (#dealerids)
I.DealerID is an INT in the table. and the Parameter for dealerids would be formatted such as
(8820, 8891, 8834)
When I run this with parameters provided I get no rows back. I know these dealerIDs should provided rows as if I do it individually I get back what I expect.
I think I am doing
WHERE convert(nvarchar(20), I.DealerID) in (#dealerids)
incorrectly. Can anyone point out what I am doing wrong here?
Use a table values parameter (new in SQl Server 2008). Set it up by creating the actual table parameter type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
#Ids IntTableType READONLY
AS
SELECT *
FROM ATable a
WHERE a.Id IN (SELECT ID FROM #Ids)
RETURN 0
GO
if you can't use table value parameters, see: "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog, then there are many ways to split string in SQL Server. This article covers the PROs and CONs of just about every method. in general, you need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(#Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(#SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = #SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
#Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',',#Ids))
You can't use #dealerids like that, you need to use dynamic SQL, like this:
#dealerids nvarchar(256)
EXEC('SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (' + #dealerids + ')'
The downside is that you open yourself up to SQL injection attacks unless you specifically control the data going into #dealerids.
There are better ways to handle this depending on your version of SQL Server, which are documented in this great article.
Split #dealerids into a table then JOIN
SELECT *
FROM INVOICES as I
JOIN
ufnSplit(#dealerids) S ON I.DealerID = S.ParsedIntDealerID
Assorted split functions here (I'd probably a numbers table in this case for a small string