FullText search with CONTAINS on multiple columns and predicate - AND - tsql

I have a search table with, say, 4 columns of text data to search.
I do something like this:
SELECT * FROM dbo.SearchTable
WHERE CONTAINS((co1, col2, col3, col4), 'term1 AND term2')
It looks like Contains only returns true if term1 and term2 are in the same column. Is there any way to specify that all columns should be included with an AND?
If not, my idea is to JSON all search columns and stick them into one. That way I can full text search them but still easily extract the individual columns in .NET. I'm presuming that the indexer won't have a problem with this and will dispense with the JSON characters and quotes. Is this correct?
Thanks
EDIT
Thinking about the JSON idea, the crawler would also index the property names so I'd have to rename {name}, {details}, {long_details} to something like {x1}, {x2}, {x3} to ensure they'd not be picked in a search. Hopefully if they're so short they wouldn't be indexed anyway.
EDIT2
I can create a Stoplist, based on the system Stoplist and put the property names into that.

This should work:
SELECT * FROM dbo.SearchTable
WHERE CONTAINS((co1, col2, col3, col4), 'term1')
AND CONTAINS((co1, col2, col3, col4), 'term2');
Alternatively, you could add a new computed column with a full text index on it. Add a column like this:
computedCol AS col1 + ' ' + col2 + ' ' + col3 + ' ' + col4
And create the full text index:
CREATE FULLTEXT INDEX ON SearchTable (computedCol LANGUAGE 1033)
KEY INDEX pk_SearchTable_yourPrimaryKeyName
Then you can do this:
SELECT * FROM dbo.SearchTable
WHERE CONTAINS(*, 'term1 AND term2')

Related

How do i extract texts from string and save it as two columns and add character at the end for third column

I m using TSQL, I want to extract text from the string and save it as two column and third one as
The following code is not complete but just getting rid of PRD1T_ the Finapp and not sure how to cater for rest of the text
Select substring(Table_name,
charindex('_',Table_name)+1,
Len(Table_name) - charindex('.',Table_name)) as Landing_Schema_Name
FROM [e].[Load_History_test]
When asking questions like this it is best to provide some sample data and expected results, as Ronen suggested. For SQL questions a really good way of doing this is with a temp table and sample data, like this:
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
So that provides something people can run and starts towards the criteria for a minimal, reproducible example, which is the secret to getting a good answer on StackOverflow. I have used SELECT with UNION ALL as Azure Synapse does not currently support the VALUES clause for multiple records.
For expected results, it's often good to display them in a table, something like this:
col1
col2
col3
PRD1T
FINAPP
HOLCONTRACT
PRD1T
FINAPP
TOCCASE
This way it's clear to people what you expect. It is not clear why you have two TOCCASE examples in your screenprint.
For your problem, there is more than one approach. You are along the right lines with CHARINDEX, SUBSTRING and LEFT but things can start to look complicated. Therefore I tend to wrap up some complexity in a Common Table Expression (CTE), see below for an example. There is also a kind of 'trick' approach with a built-in SQL function called PARSENAME. This is designed to extract from four-part object names common in SQL Server eg <server-name>.<database-name>.<schema-name>.<object-name>. As long as your object names will never have more than four parts, this approach will work for you. See the main help for PARSENAME here. See below for a complete demo that runs end to end with a temp table to demonstrate the different principles:
IF OBJECT_ID('#load_history_test') IS NOT NULL
DROP TABLE #load_history_test;
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
;WITH cte AS (
SELECT
table_name AS original_table_name,
CHARINDEX( '_', table_name ) underscorePos,
CHARINDEX( '.', table_name ) stopPos,
REPLACE( table_name, '_', '.' ) AS clean_table_name
FROM #load_history_test
)
SELECT
*,
PARSENAME( clean_table_name, 3 ) a,
PARSENAME( clean_table_name, 2 ) b,
PARSENAME( clean_table_name, 1 ) c,
LEFT( original_table_name, underscorePos - 1 ) getItemBeforeUnderscore,
SUBSTRING( original_table_name, underscorePos + 1, ( ( stopPos - 1 ) - underscorePos ) ) AS getItemAfterUnderscore,
SUBSTRING( original_table_name, stopPos + 1, 99 ) getItemAfterStop
FROM cte;

Trying to manipulate string such as if '26169;#c785643', then the result should be like 'c785643'

I am trying to manipulate string data in a column such as if the given string is '20591;#e123456;#17507;#c567890;#15518;#e135791' or '26169;#c785643', then the
result should be like 'e123456;c567890;e135791' or 'c785643'. The number of digits in between can be of any length.
Some of the things I have tried so far are:
select replace('20591;#e123456;#17507;#c567890;#15518;#e135791','#','');
This leaves me with '20591;e123456;17507;c567890;15518;e135791', which still includes the digits without 'e' or 'c' prefixed to them. i want to get rid of 20591, 17507 and 15518.
Create function that will keep a pattern of '%[#][ec][0-9][;]%' and will get rid of the rest.
The most important advise is: Do not store any data in a delimited string. This is violating the most basic principle of relational database concepts (1.NF).
The second hint is SO-related: Please always add / tag your questions with the appropriate tool. The tag [tsql] points to SQL-Server, but this might be wrong (which would invalidate both answers). Please tag the full product with its version (e.g. [sql-server-2012]). Especially with string splitting there are very important product related changes from version to version.
Now to your question.
Working with (almost) any version of SQL-Server
My suggestion uses a trick with XML:
(credits to Alan Burstein for the mockup)
DECLARE #table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT #table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643')
--the query
SELECT t.someid,t.somestring,A.CastedToXml
,STUFF(REPLACE(A.CastedToXml.query('/x[contains(text()[1],"#") and empty(substring(text()[1],2,100) cast as xs:int?)]')
.value('.','nvarchar(max)'),'#',';'),1,1,'') TheNewList
FROM #table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';','</x><x>') + '</x>' AS XML)) A(CastedToXml);
The idea in short:
By replacing the ; with XML tags </x><x> we can transform your delimited list to XML. I included the intermediate XML into the result set. Just click it to see how this works.
In the next query I use a XQuery predicate first to find entries, which contain a # and second, which do NOT cast to an integer without the #.
The thrid step is specific to XML again. The XPath . in .value() will return all content as one string.
Finally we have to replace the # with ; and cut away the leading ; using STUFF().
UPDATE The same idea, but a bit shorter:
You can try this as well
SELECT t.someid,t.somestring,A.CastedToXml
,REPLACE(A.CastedToXml.query('data(/x[empty(. cast as xs:int?)])')
.value('.','nvarchar(max)'),' ',';') TheNewList
FROM #table t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.somestring,';#','</x><x>') + '</x>' AS XML)) A(CastedToXml);
Here I use ;# to split your string and data() to implicitly concatenate your values (blank-separated).
UPDATE 2 for v2017
If you have v2017+ I'd suggest a combination of a JSON splitter and STRING_AGG():
SELECT t.someid,STRING_AGG(A.[value],';') AS TheNewList
FROM #table t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.somestring,';#','","'),'"]')) A
WHERE TRY_CAST(A.[value] AS INT) IS NULL
GROUP BY t.someid;
You did not include the version of SQL Server you are on. If you are using 2016+ you can use SPLIT_STRING, otherwise a good T-SQL splitter will do.
Against a single variable:
DECLARE #somestring VARCHAR(1000) = '20591;#e123456;#17507;#c567890;#15518;#e135791';
SELECT NewString = STUFF((
SELECT ','+split.item
FROM STRING_SPLIT(#somestring,';') AS s
CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
WHERE split.item LIKE '[a-z][0-9]%'
FOR XML PATH('')),1,1,'');
Against a table:
NewString
----------------------
e123456,c567890,e135791
-- Against a table
DECLARE #table TABLE (someid INT IDENTITY, somestring VARCHAR(50));
INSERT #table VALUES ('20591;#e123456;#17507;#c567890;#15518;#e135791'),('26169;#c785643')
SELECT t.*, fn.NewString
FROM #table AS t
CROSS APPLY
(
SELECT NewString = STUFF((
SELECT ','+split.item
FROM STRING_SPLIT(t.somestring,';') AS s
CROSS APPLY (VALUES(REPLACE(s.[value],'#',''))) AS split(item)
WHERE split.item LIKE '[a-z][0-9]%'
FOR XML PATH('')),1,1,'')
) AS fn;
Returns:
someid somestring NewString
----------- -------------------------------------------------- -----------------------------
1 20591;#e123456;#17507;#c567890;#15518;#e135791 e123456,c567890,e135791
2 26169;#c785643 c785643

Concatenate a column from multiple rows into a single formatted string

I have rows like so:
roll_no
---------
0690543
0005331
0760745
0005271
And I want string like this :
"0690543.pdf" "0005331.pdf" "0760745.pdf" "0005271.pdf"
I have tried concat but unable to do so
You can use an aggregate function like string_agg, after first mangling the quotes and the .pdf extension to your column data. Use a space as your delimiter:
SELECT string_agg('"'||roll_no||'.pdf "', ' ') from myTable
SqlFiddle here

returning negative matches in Postgres

Let us say that I want to find rows that contain a match. For example:
select * from tableName where tableName.colName like '%abc%' limit 5;
This is going to give me rows that contain the substring abc within the column colName in the table tableName. Now, how do I get rows rows where the column colName does not contain the string abc. Is there some sort of a negation operator in Postgres?
Use not like:
select * from tableName where tableName.colName not like '%abc%' limit 5;

TSQL find records with different writing in another table

I have a "MasterTable" with the following records:
MasterTable:
Col1
PX02894
PX02895
PX02896
PX02897/98
From the lookup table I want to get the Col2 links, keeping the formatting of the MasterTable, represented as the Output table below.
LookupTable:
Col1 Col2
PX02894-PX02895 Link001
PX02896 Link002
PX02897-PX02898 Link003
OutputTable:
Col1 Col2
PX02894 Link001
PX02895 Link001
PX02896 Link002
PX02897/98 Link003
As you can see the writing is different "/" and "-".
I've tried with
len(col1) > 7 THEN LEFT(col1,5) + RIGHT(col1,2)
but that's wrong. Do I need a Union first?
Here's a Fiddle
What do I need to do here? Thanks in advance.
select m.col1,l.col2
From MasterTable m
inner join linkTable l
On (Substring(m.col1,1,7) = SubString(l.col1,1,7)) or (Substring(m.col1,1,7) = Substring(l.col1,9,7))
should do it as long as you can trust the data formating. if not a few more checks e.g
Substring(l.col1,8,1) = '-'
Try this --
This works in sybase-- if you using SQL server find alternate functions
select substring(col1,1,(charindex("-",col1)-1)),
UNION
select substring(col1,(charindex("-",col1)+1),char_length(col1))
Thanks,
Gopal