T-SQL: Find column match within a string (LIKE but different)

Server: SQL Server 2008 R2
I apologize in advance, as I'm not sure of the best way to verbalize the question. I'm receiving a string of email addresses and I need to see if, within that string, any of the addresses exist as a user already. The query that obviously doesn't work is shown below, but hopefully it helps to clarify what I'm looking for:
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress LIKE '%user1@domain.com,user2@domain.com%'
I was hoping SQL had an "InString" operator that would check for matches "within the string", but my Google abilities must be weak today.
Any assistance is greatly appreciated. If there simply isn't a way, I'll have to dig in and do some work in the codebehind to split each item in the string and search on each one.
Thanks in advance,
Beems

Split the input string and use an IN clause.
To split the CSV into rows, use this:
SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
             + Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
             + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
Now use the above query in the WHERE clause:
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN (SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
                         FROM (SELECT Cast ('<M>'
                                      + Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
                                      + '</M>' AS XML) AS Data) AS A
                         CROSS APPLY Data.nodes ('/M') AS Split(a))
Or you can use an INNER JOIN:
SELECT A.f_emailaddress
FROM tb_users A
JOIN (SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') ))) AS f_emailaddress
      FROM (SELECT Cast ('<M>'
                   + Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
                   + '</M>' AS XML) AS Data) AS A
      CROSS APPLY Data.nodes ('/M') AS Split(a)) B
ON A.f_emailaddress = B.f_emailaddress

You first need to split the CSV list into a temp table and then use that to INNER JOIN with your existing table, as that will act as a filter.
You cannot use CONTAINS unless you have created a Full Text index on that table and column, which I doubt is the case here.
For example:
CREATE TABLE #EmailAddresses (Email NVARCHAR(500) NOT NULL);
INSERT INTO #EmailAddresses (Email)
SELECT split.Val
FROM dbo.Splitter(@IncomingListOfEmailAddresses);
SELECT usr.f_emailaddress
FROM tb_users usr
INNER JOIN #EmailAddresses tmp
ON tmp.Email = usr.f_emailaddress;
Please note that the reference to "dbo.Splitter" is a placeholder for whatever string splitter you already have or might get. Please do not use any splitter that makes use of a WHILE loop. The best options are either the SQLCLR- or XML-based ones. The XML-based ones are generally fast but do have some issues with encoding if the string to be split has special XML characters such as &, <, or ". If you want a quick and easy SQLCLR-based splitter, you can download the free version of the SQL# library (which I am the creator of, but this feature is in the free version) which contains String_Split and String_Split4k (for when the input is always <= 4000 characters).
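If you don't already have a splitter, here is a minimal sketch of an XML-based inline table-valued function you could drop in as dbo.Splitter (the function name, parameter, and the Val column are just placeholders matching the example above, not an existing API); the inner FOR XML PATH('') subquery entitizes special characters such as & and < so the CAST to XML doesn't fail:
CREATE FUNCTION dbo.Splitter (@List NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
    -- comma-delimited only; each piece comes back trimmed in the Val column
    SELECT LTRIM(RTRIM(Split.a.value('.', 'NVARCHAR(500)'))) AS Val
    FROM (SELECT CAST('<M>'
                 + REPLACE((SELECT @List FOR XML PATH('')), ',', '</M><M>')
                 + '</M>' AS XML) AS Data) AS A
    CROSS APPLY Data.nodes('/M') AS Split(a)
);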

SQL Server has a CONTAINS predicate and an IN operator. You can use either of those to accomplish your task; see MSDN for more information on each. Hope this helps.
CONTAINS
CONTAINS will look to see if any values in your data contain the entire string you provided. Kind of similar in presentation to LIKE '%myValue%'.
SELECT f_emailaddress
FROM tb_users
WHERE CONTAINS (f_emailaddress, 'user1@domain.com');
IN
IN will return matches for any values in the provided comma-delimited list. They need to be exact matches, however. You can't provide partial terms.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN ('user1@domain.com','user2@domain.com')
As far as splitting each of the values out into separate strings, have a look at the StackOverflow question found HERE. This might point you in the proper direction.

You can try something like this (not tested).
Before using this, make sure that you have created a Full Text index on that table and column.
Replace each comma with OR, then:
SELECT id, email
FROM t
WHERE CONTAINS(email, '"user1@domain.com" OR "user2@domain.com"');
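For reference, the full-text prerequisites look roughly like this; a hedged sketch with illustrative object names, assuming the Full-Text Search feature is installed and the table has a unique, non-nullable, single-column key index:
-- one-time setup (illustrative names)
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
CREATE UNIQUE INDEX ux_t_id ON t (id);
CREATE FULLTEXT INDEX ON t (email) KEY INDEX ux_t_id;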

--prepare table variable for testing
DECLARE @tb_users AS TABLE
(f_emailaddress VARCHAR(100))
INSERT INTO @tb_users
( f_emailaddress)
VALUES ( 'user1@domain.com' ),
( 'user2@domain.com' ),
( 'user3@domain.com' ),
( 'user4@domain.com' )
--Your query
SELECT f_emailaddress
FROM @tb_users
WHERE 'user1@domain.com,user2@domain.com' LIKE '%' + f_emailaddress + '%'
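One caveat with the reversed LIKE: it can produce false positives on partial matches (for example, 'one1@domain.com' stored in the table would be found inside 'someone1@domain.com' in the incoming list). A hedged variant that pads both sides with the delimiter avoids that:
SELECT f_emailaddress
FROM @tb_users
WHERE ',' + 'user1@domain.com,user2@domain.com' + ',' LIKE '%,' + f_emailaddress + ',%'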

Related

SQL Pivot using a subquery in FOR

Using SQL Server 2016 and referring to this article:
https://www.sqlshack.com/dynamic-pivot-tables-in-sql-server/
That article uses this pivot:
SELECT * FROM (
    SELECT
        [Student],
        [Subject],
        [Marks]
    FROM Grades
) StudentResults
PIVOT (
    SUM([Marks])
    FOR [Subject]
    IN (
        [Mathematics],
        [Science],
        [Geography]
    )
) AS PivotTable
How can you change the query so that the Subjects ([Mathematics], [Science], [Geography]) don't have to be hardcoded in the query?
Can you rather get the Subject list using a subquery? How do you get the FOR to work with a query like this?
...
FOR [Subject]
IN (
SELECT subject FROM grades WHERE student = "Jacob"
)
How can you change the query so that the Subjects ([Mathematics], [Science], [Geography]) don't have to be hardcoded in the query?
You can't; you'll have to form the SQL as a string and execute it dynamically.
SQL makes it easy to write a variable number of columns (you just write more words in a SELECT), which then also makes it easy to forget that columns are like properties of an object (and an entire row is like an instance of an object); they aren't something that varies dynamically every time you run a program. As a Person you don't have a Name this week and not next week.
The number of columns output from a query isn't meant to vary; the number of rows is. If you want variable numbers of attributes, you'll have to form them as rows and then have your front end behave differently to account for them (i.e. don't do the pivot). If you can't do this because you have no front end, and you really do need a varying number of columns, you have to write a different SQL each time (which you can do by concatenating together a new SQL string and EXECing it, but be under no illusions - it works because it's a totally different SQL, the programmatic equivalent of you editing your hardcoded query and re-running it).
It looks something like (not tested - consider this pseudocode):
DECLARE @sql VARCHAR(4000) = CONCAT('
SELECT * FROM (
    SELECT
        [Student],
        [Subject],
        [Marks]
    FROM Grades
) StudentResults
PIVOT (
    SUM([Marks])
    FOR [Subject]
    IN (',
    (SELECT STRING_AGG(Subject, ',') FROM (SELECT DISTINCT QUOTENAME(Subject) AS Subject FROM Grades) x),
    ' )
) AS PivotTable'
) --end concat
EXEC (@sql)
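One caveat: STRING_AGG is only available from SQL Server 2017 onward, and the question mentions SQL Server 2016. A hedged sketch of the same idea that builds the column list with the older FOR XML PATH pattern instead (again untested pseudocode, with placeholder variable names):
DECLARE @cols NVARCHAR(MAX), @pivotSql NVARCHAR(MAX)

-- build "[Mathematics],[Science],[Geography]" without STRING_AGG
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(Subject)
                      FROM (SELECT DISTINCT Subject FROM Grades) x
                      FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')

SET @pivotSql = N'
SELECT * FROM (
    SELECT [Student], [Subject], [Marks] FROM Grades
) StudentResults
PIVOT (
    SUM([Marks])
    FOR [Subject] IN (' + @cols + N')
) AS PivotTable'

EXEC sp_executesql @pivotSql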

Using a list as replacement for singular patterns in regexp_replace

I have a table that I need to delete random words/characters out of. To do this, I have been using a regexp_replace function with the addition of multiple patterns. An example is below:
select regexp_replace(combined,'\y(NAME|001|CONTAINERS:|MT|COUNT|PCE|KG|PACKAGE)\y','', 'g')
as description, id from export_final;
However, in the full list, there are around 70 different patterns that I replace out of the description. As you can imagine, the code is very cluttered. This leads me to my question: is there a way to put these patterns into another table and then use that table to check the descriptions?
Of course. Populate your desired 'other' table with the patterns you need. Then create a CTE that uses the string_agg function to build the regex. Example:
create table exclude_list( pattern_word text);
insert into exclude_list(pattern_word)
values('NAME'),('001'),('CONTAINERS:'),('MT'),('COUNT'),('PCE'),('KG'),('PACKAGE');
with exclude as
( select '\y(' || string_agg(pattern_word,'|') || ')\y' regex from exclude_list )
-- CTE simulates actual table to provide test data
, export_final (id,combined) as (values (0,'This row 001 NAME Main PACKAGE has COUNT 3 units'),(1,'But single package can hold 6 KG'))
select regexp_replace(combined,regex,'', 'g')
as description, id
from export_final cross join exclude;
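If the goal is to actually clean up export_final rather than just select from it, a hedged sketch of the same idea as an UPDATE (assuming the exclude_list table above and that you want to overwrite combined in place):
UPDATE export_final ef
SET combined = regexp_replace(ef.combined, x.regex, '', 'g')
FROM (SELECT '\y(' || string_agg(pattern_word, '|') || ')\y' AS regex
      FROM exclude_list) x;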

Casting rows to arrays in PostgreSQL

I need to query a table as in
SELECT *
FROM table_schema.table_name
only each row needs to be a TEXT[] with array values corresponding to column values casted to TEXT coming in the same order as in SELECT * so assuming the table has columns a, b and c I need the result to look like
SELECT ARRAY[a::TEXT, b::TEXT, c::TEXT]
FROM table_schema.table_name
only it shouldn't explicitly list columns by name. Ideally it should look like
SELECT as_text_array(a)
FROM table_schema.table_name AS a
The best I came up with looks ugly and relies on the "hstore" extension:
WITH columnz AS ( -- get ordered column name array
SELECT array_agg(attname::TEXT ORDER BY attnum) AS column_name_array
FROM pg_attribute
WHERE attrelid = 'table_schema.table_name'::regclass AND attnum > 0 AND NOT attisdropped
)
SELECT hstore(a)->(SELECT column_name_array FROM columnz)
FROM table_schema.table_name AS a
I have a feeling there must be a simpler way to achieve that.
UPDATE 1
Another query that achieves the same result, but is arguably as ugly and inefficient as the first one, is inspired by the answer by @bspates. It may be even less efficient but doesn't rely on extensions.
SELECT r.text_array
FROM table_schema.table_name AS a
INNER JOIN LATERAL ( -- parse ROW::TEXT presentation of a row
SELECT array_agg(COALESCE(replace(val[1], '""', '"'), NULLIF(val[2], ''))) AS text_array
FROM regexp_matches(a::text, -- parse double-quoted and simple values separated by commas
'(?<=\A\(|,) (?: "( (?:[^"]|"")* )" | ([^,"]*) ) (?=,|\)\Z)', 'xg') AS t(val)
) AS r ON TRUE
It is still far from ideal
UPDATE 2
I tested all 3 options available at the moment:
1) Using JSON. It doesn't rely on any extensions, it is short to write, easy to understand, and the speed is OK.
2) Using hstore. This alternative is the fastest (>10 times faster than the JSON approach on a 100K dataset) but requires an extension. hstore in general is a very handy extension to have, though.
3) Using regex to parse the TEXT representation of a ROW. This option is really slow.
A somewhat ugly hack is to convert the row to a JSON value, then unnest the values and aggregate it back to an array:
select array(select (json_each_text(to_json(t))).value) as row_value
from some_table t
Which is to some extent the same as your hstore hack.
If the order of the columns is important, then json_each_text together with WITH ORDINALITY can be used to preserve it:
select array(select val
from json_each_text(to_json(t)) with ordinality as t(k,val,idx)
order by idx)
from the_table t
The easiest (read: hacky-est) way I can think of is to convert to a string first, then parse that string into an array. Like so:
SELECT string_to_array(table_name::text, ',') FROM table_name
BUT depending on the size and type of the data in the table, this could perform very badly.
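If you do go that route, a hedged refinement is to trim the parentheses that wrap the row's text form; note that it still breaks on values that contain commas or that get double-quoted in the row representation:
SELECT string_to_array(trim(both '()' from t::text), ',')
FROM table_name t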

Dynamic number of fields in table

I have a problem with T-SQL. I have a number of tables, and each table contains a different number of fields with different names.
I need to dynamically take all these tables, read all records, and turn each record into a string where the values are separated by commas. And then do something with this string.
I think that I need to use CURSORS, but I can't FETCH them without knowing the concrete number of fields with their names and types. Maybe I can create a table variable with a dynamic number of fields?
Thanks a lot!
Makarov Artem.
I would repurpose one of the many T-SQL scripts written to generate INSERT statements. They do exactly what you require. Namely
Reverse engineer a given table to determine columns names and types
Generate a delimited string of values
The most complete example I've found is here
But just a simple Google search for "INSERT STATEMENT GENERATOR" will yield several examples that you can repurpose to fit your needs.
Best of luck!
SELECT
ORDINAL_POSITION
,COLUMN_NAME
,DATA_TYPE
,CHARACTER_MAXIMUM_LENGTH
,IS_NULLABLE
,COLUMN_DEFAULT
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'MYTABLE'
ORDER BY
ORDINAL_POSITION ASC;
from http://weblogs.sqlteam.com/joew/archive/2008/04/27/60574.aspx
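Building on that column list, a hedged sketch (illustrative table name, untested) that generates and runs a SELECT returning one comma-separated string per row might look like this:
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX)

-- turn every column into ISNULL(CAST(col AS NVARCHAR(MAX)), '') and join them with commas
SELECT @cols = STUFF((SELECT ' + '','' + ISNULL(CAST(' + QUOTENAME(COLUMN_NAME) + ' AS NVARCHAR(MAX)), '''')'
                      FROM INFORMATION_SCHEMA.COLUMNS
                      WHERE TABLE_NAME = 'MYTABLE'
                      ORDER BY ORDINAL_POSITION
                      FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 9, '')

SET @sql = 'SELECT ' + @cols + ' AS CsvRow FROM MYTABLE'

EXEC sp_executesql @sql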
Perhaps you can do something with this.
select T2.X.query('for $i in *
                   return concat(data($i), ",")'
                  ).value('.', 'nvarchar(max)') as C
from (
    select *
    from YourTable
    for xml path('Row'), elements xsinil, type
) as T1(X)
cross apply T1.X.nodes('/Row') T2(X)
It will give you one row for each row in YourTable with each value in YourTable separated by a comma in the column C.
This builds an XML for the entire table and then parses that XML. Might get you into trouble if you have tables with a lot of rows.
BTW: I saw from a comment that you can "use only pure SQL". I really don't think this qualifies as "pure SQL" :).

Easy way to identify required fields in a table

Scenario: Table with over 100 fields (not my doing... I inherited this)
Only 50 of these fields are required to be displayed on a web site
They want to maintain the other 50 fields for historical purposes.
There is a possibility that some of the not required fields may become required sometime in the future.
Problem: I'm looking for a way to easily identify the 50 required fields such that I could pull the field names with a query.
Pseudo query: Select FieldNames from TableName where Required = Yes
Is there a setting I could change?
What about using Extended Properties?
Thanks in advance for any direction you can provide.
Unless I'm missing a nuance to your question, use the INFORMATION_SCHEMA.COLUMNS view. This query identifies all the columns in the table dbo.dummy that are required (i.e. not nullable).
SELECT
IC.COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS IC
WHERE
IC.TABLE_SCHEMA = 'dbo'
AND IC.TABLE_NAME = 'dummy'
AND IC.IS_NULLABLE = 'NO'
After doing more thinking, perhaps you wanted a generic query that would grab all the required columns and then build out the select query. This query covers that possible request
DECLARE
@hax varchar(max)
, @schemaName sysname
, @tableName sysname
SELECT
@schemaName = 'dbo'
, @tableName = 'dummy'
; WITH A AS
(
-- this query identifies all the columns that are not nullable
SELECT
IC.TABLE_SCHEMA + '.' + IC.TABLE_NAME AS tname
, IC.COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS IC
WHERE
IC.TABLE_SCHEMA = @schemaName
AND IC.TABLE_NAME = @tableName
AND IC.IS_NULLABLE = 'NO'
)
, COLUMN_SELECT (column_list) AS
(
-- this query concatenates all the column names
-- returned by the above
SELECT STUFF((SELECT '], [' + A.Column_Name
FROM A
FOR XML PATH('')),1, 2, '')
)
-- Use the above to build a query string
SELECT DISTINCT
@hax = 'SELECT ' + CS.column_list + '] FROM ' + A.tname
FROM
A
CROSS APPLY
COLUMN_SELECT CS
-- invoke the query
EXECUTE (@hax)
How about creating a view that only has the required fields?
I am not sure if I understand the question correctly. Is this what you are looking for? The code is in MS SQL.
select t.name as TABLE_NAME, c.name as COLUMN_NAME, c.is_nullable
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
WHERE t.name = '<TableName>'
and c.is_nullable = 0
There's no flag you can put on a field to determine whether it's relevant or not -- that's what the SELECT list is for. A couple of ideas...
1) Split the historical data out into a separate table, with a one-to-one relationship to the source table.
2) Rename the historical fields in your table as "OBSOLETE_" + fieldname. This will at least give you a quick visual reference for when you're writing your SQL.
3) Create a view. Big drawback to this one would be that you can take some big performance hits as soon as you try to use the view as a table in other queries. But if you're just pulling off it directly without joining it, you should be fine.
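For idea 3, the view is just a matter of listing the ~50 required columns once (a sketch with illustrative names, not tied to your actual schema):
CREATE VIEW dbo.vw_MyTable_Required
AS
SELECT RequiredCol1, RequiredCol2, RequiredCol3 -- ...and the rest of the required columns
FROM dbo.MyTable;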
We use separate metatables describing all tables and columns in the database. We store information like friendly name (for example, the 'username' column should be displayed to the user as 'User name'), formatting, etc. You could use this approach to store information about required columns.
We have tried object extended properties (sp_addextendedproperty etc.), but metatable(s) solution came up better for us.
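Since the question specifically asked about Extended Properties, here is a hedged sketch of that approach (illustrative table and column names): tag each required column once, then query the tags back.
EXEC sp_addextendedproperty
     @name = N'Required', @value = N'Yes',
     @level0type = N'SCHEMA', @level0name = N'dbo',
     @level1type = N'TABLE',  @level1name = N'MyTable',
     @level2type = N'COLUMN', @level2name = N'SomeColumn';

-- list the columns flagged as required
SELECT c.name AS COLUMN_NAME
FROM sys.extended_properties ep
JOIN sys.columns c
    ON c.object_id = ep.major_id
   AND c.column_id = ep.minor_id
WHERE ep.class = 1 -- object/column level
  AND ep.major_id = OBJECT_ID('dbo.MyTable')
  AND ep.name = 'Required'
  AND CAST(ep.value AS NVARCHAR(10)) = 'Yes';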
Within T-SQL this is not easy, as you cannot dynamically build the columns in the select list, nor the alias names for those columns. The parser and query optimizer need some things to be static. Is it an ASP.NET web site? In your development environment (e.g. C#) you could dynamically build the query.