TSQL Select comma list to rows

How do I turn a field containing a comma-separated list into one row per value, displayed in a column?
For example,
ID | Colour
------------
1 | 1,2,3,4,5
to:
ID | Colour
------------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5

The usual way to solve this is to create a split function. You can grab one from Google, for example this one from SQL Team. Once you have created the function, you can use it like:
create table colours (id int, colour varchar(255))
insert colours values (1,'1,2,3,4,5')
select colours.id
, split.data
from colours
cross apply
dbo.Split(colours.colour, ',') as split
This prints:
id data
1 1
1 2
1 3
1 4
1 5
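The linked SQL Team function itself is not reproduced here. Purely as an illustration (a sketch, not the SQL Team code; untested), a simple recursive-CTE splitter that returns the expected data column might look like this:
CREATE FUNCTION dbo.Split (@List VARCHAR(8000), @Delim CHAR(1))
RETURNS TABLE
AS
RETURN
-- walk the string delimiter by delimiter; end_pos = 0 marks the last piece
WITH pieces (start_pos, end_pos) AS
(
    SELECT 1, CHARINDEX(@Delim, @List)
    UNION ALL
    SELECT end_pos + 1, CHARINDEX(@Delim, @List, end_pos + 1)
    FROM pieces
    WHERE end_pos > 0
)
SELECT data = SUBSTRING(@List, start_pos,
                        CASE WHEN end_pos > 0 THEN end_pos - start_pos ELSE LEN(@List) END)
FROM pieces;
Being recursive, it is subject to the default MAXRECURSION limit of 100 elements per string; the splitters discussed further down avoid that class of problem.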

Another possible workaround is to use XML (assuming you are working with SQL Server 2005 or greater):
DECLARE @s TABLE
(
ID INT
, COLOUR VARCHAR(MAX)
)
INSERT INTO @s
VALUES ( 1, '1,2,3,4,5' )
SELECT s.ID, T.Colour.value('.', 'int') AS Colour
FROM ( SELECT ID
, CONVERT(XML, '<row>' + REPLACE(Colour, ',', '</row><row>') + '</row>') AS Colour
FROM @s a
) s
CROSS APPLY s.Colour.nodes('row') AS T(Colour)

I know this is an older post but thought I'd add an update. Tally Table and cteTally table based splitters all have a major problem. They use concatenated delimiters and that kills their speed when the elements get wider and the strings get longer.
I've fixed that problem and written an article about it, which may be found at the following URL: http://www.sqlservercentral.com/articles/Tally+Table/72993/
The new method blows the doors off of all While Loop, Recursive CTE, and XML methods for VARCHAR(8000).
I'll also tell you that a fellow by the name of "Peter" made an improvement even to that code (in the discussion for the article). The article is still interesting and I'll be updating the attachments with Peter's enhancements in the next day or two. Between my major enhancement and the tweak Peter made, I don't believe you'll find a faster T-SQL-only solution for splitting VARCHAR(8000). I've also solved the problem for this breed of splitters for VARCHAR(MAX) and am in the process of writing an article for that, as well.
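For illustration only, here is a minimal sketch of the tally-table idea (not the article's code, and without Peter's enhancement; SQL Server 2008+ syntax; untested). Instead of concatenating delimiters onto the string, a tally of character positions locates each delimiter directly:
CREATE FUNCTION dbo.SplitSketch (@String VARCHAR(8000), @Delim CHAR(1))
RETURNS TABLE
AS
RETURN
-- up to 10,000 sequential numbers from cross-joined row sets
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N))
, Tally(N) AS (SELECT TOP (DATALENGTH(ISNULL(@String, 1)))
                      ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
               FROM E1 a, E1 b, E1 c, E1 d)
-- element starts: position 1, plus every position following a delimiter
, Starts(N1) AS (SELECT 1
                 UNION ALL
                 SELECT N + 1 FROM Tally WHERE SUBSTRING(@String, N, 1) = @Delim)
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY N1)
     , Item = SUBSTRING(@String, N1,
                        ISNULL(NULLIF(CHARINDEX(@Delim, @String, N1), 0) - N1, 8000))
FROM Starts;
It is used the same way as the CROSS APPLY example above, e.g. SELECT * FROM dbo.SplitSketch('1,2,3,4,5', ',').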

Related

User Sessions | Months Since Last Active Using SQL

UserID | CalMonth | ActiveFlag | Months_since_last_active
-------|----------|------------|-------------------------
A      | 1/1/2021 | 1          | 0
A      | 2/1/2021 |            | 1
A      | 3/1/2021 |            | 2
A      | 4/1/2021 | 1          | 0
B      | 1/1/2021 | 1          | 0
B      | 2/1/2021 |            | 1
B      | 3/1/2021 | 1          | 0
Problem --> The first 3 columns are given. Generate the last one, 'Months_since_last_active', by adding 1 until the user is active again.
My Solution as below:
With active_sessions as (
Select
User_Id
, CalMonth
, ActiveFlag as current_flag
, LAG (ActiveFlag,1) over (partition by User_Id order by CalMonth) as previous_flag
)
Select User_Id, CalMonth, current_flag, sum(case when current_flag =1 then 0
when current_flag IS NULL then Months_since_last_active + 1
END
) as Months_since_last_active
from active_sessions
order by 1,2
I was asked the above question in an interview and told that my proposed solution would not work because:
When it comes to 3/1/2021 and beyond, the previous values of 'Months_since_last_active' are not in the table yet -- they are only in the code
If I wanted to use the LAG function, then it'd take innumerable LAG functions to achieve what I was trying to do
I would appreciate it if someone could comment on my solution.
Your solution has 3 major problems, 2 of which may be related to copy/paste errors. The active_sessions CTE is missing its FROM clause, so there is no data source. Then the main portion uses the aggregate function SUM, but the query has no GROUP BY, which an aggregate requires. These are easily corrected. The other issue concerns the LAG function and your use of it.
First off, in the CTE you alias the result as previous_flag, then in the main query you reference Months_since_last_active, which does not exist yet. I think this is the source of the interviewer's first point.
The interviewer's second point also stems from the LAG function. As written it always looks back exactly 1 row from the current row, yet it needs to look back 2 rows for (userid, calmonth) = ('A', 2021-03-01), 3 rows for ('A', 2021-04-01), etc. Basically you need to look back to the last row with active_flag = 1. This leads directly to "it'd take innumerable LAG functions", as you do not know how far back you need to look. Suppose you had 30-40 or more inactive rows between active rows: you would need a LAG(activeflag, n) ... for each possibility.
A solution. I dislike the problem statement: it should not contain "by adding 1 until the user is active again" (is that yours or theirs?). Either way this is an XY problem. If theirs, they should be telling you what to solve, i.e. find the number of months since last active. If yours, you have created the problem for yourself. The problem statement should not say anything about how to solve it. I will ignore that portion of the problem (and in a real interview I would, and have, ignored it, but be prepared to explain why).
What you have is a version of Gaps And Islands (google it, you will find more to think about). In this version, let's consider each row with activeflag = 1 an island, and anything else a gap. Now what you are looking for is the length of the gaps between islands. In the following, the island_num CTE does 2 things: it assigns a sequence number to each row for a (userid, calmonth), and generates a boolean marking each island. The gap_points CTE then joins the result with itself, selecting the row number assigned to the latest island whose calmonth is less than the current row's calmonth. In the main part, Months_since_last_active is assigned 0 if the current row is an island, and the difference between the generated row numbers if it is a gap. (see demo)
with island_num (userid, cal_month, active_flag, is_island, row_num) as
( select am.*
, case when am.activeflag = 1 then true else false end is_island
, row_number() over (partition by am.userid order by am.calmonth) rn
from active_month am
) -- select * from island_num
, gap_points(userid, cal_month, active_flag, is_island, row_num, island_row) as
( select *
from island_num i1
join lateral
(select max(row_num)
from island_num i2
where i1.userid = i2.userid
and i2.cal_month < i1.cal_month
and i2.is_island
) s0
on true
) --select * from gap_points;
select userid "User Id"
, cal_month "Cal Month"
, active_flag "Active Flag"
, case when is_island then 0
else row_num - island_row
end "Months_since_last_active"
from gap_points;
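For comparison, here is a more compact sketch of the same idea using only window functions (same assumed table and columns as above; untested): a running MAX carries forward the row number of the most recent active row, so no self-join is needed. Rows before a user's first active month come out as NULL.
select userid
, calmonth
, activeflag
-- difference between this row's number and the last active row's number
, row_num - max(case when activeflag = 1 then row_num end)
            over (partition by userid order by calmonth) as months_since_last_active
from (select am.*
      , row_number() over (partition by am.userid order by am.calmonth) as row_num
      from active_month am
     ) t;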

How to combine PostgreSQL text search with fuzzystrmatch

I'd like to be able to query words from a column of type tsvector, but everything with a Levenshtein distance below X should be considered a match.
Something like this where my_table is:
id | my_ts_vector_column | sentence_as_text
------------------------------------------------------
1 | 'bananna':3 'tasty':2 'very':1 | Very tasty bananna
2 | 'banaana':2 'yellow':1 | Yellow banaana
3 | 'banana':2 'usual':1 | Usual banana
4 | 'baaaanaaaanaaa':2 'black':1 | Black baaaanaaaanaaa
I want to query something like "Give me id's of all rows, which contain the word banana or words similar to banana where similar means that its Levenshtein distance is less than 4". So the result should be 1, 2 and 3.
I know I can do something like select id from my_table where my_ts_vector_column @@ to_tsquery('banana');, but this would only get me exact matches.
I also know I could do something like select id from my_table where levenshtein(sentence_as_text, 'banana') < 4;, but this works only on a text column, and only if the sentence consisted of just the word banana.
But I don't know if or how I could combine the two.
P.S. The table I want to execute this on contains about 2 million records, and the query should be blazing fast (less than 100 ms for sure).
P.P.S. I have full control over the table's schema, so changing datatypes, creating new columns, etc. would be totally feasible.
2 million short sentences presumably contain far fewer distinct words than that. But if all your sentences have "creative" spellings, maybe not.
So you can perhaps create a table of distinct words to search relatively quickly with the unindexed distance function:
create materialized view words as
select distinct unnest(string_to_array(lower(sentence_as_text),' ')) word from my_table;
And create an index for exact matches on the larger table:
create index on my_table using gin (string_to_array(lower(sentence_as_text),' '));
And then join them together:
select * from my_table join words
ON (ARRAY[word] <@ string_to_array(lower(sentence_as_text),' '))
WHERE levenshtein(word,'banana')<4;
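One caveat: because words is a materialized view, it will not reflect rows added to my_table later, so it needs to be refreshed from time to time:
REFRESH MATERIALIZED VIEW words;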

Keyword search using PostgreSQL

I am trying to identify observations in my data using a list of keywords. However, the search results contain observations where only part of the keyword matches. For instance, the keyword ice returns varices. I am using the following code:
select *
from mytab
WHERE myvar similar to '%((ice)|(cool))%';
I tried to_tsquery and it does an exact match and does not include observations with varices. But this approach takes significantly longer to query (a 2-keyword search with similar to '% %' takes 5 secs, whereas to_tsquery takes 30 secs for a 1-keyword search, and I have more than 900 keywords to search).
select *
from mytab
where myvar @@ to_tsquery('ice');
Is there a way to query multiple keywords using to_tsquery, and any way to speed up the querying process?
I'd suggest using keywords in a relational sense rather than having a running list of them under one field, which makes for terrible performance. Instead, you can have a table of keywords with id's as primary keys and have foreign keys referring to mytab's primary keys. So you'd end up with the following:
keywords table
id | mytab_id | keyword
---+----------+--------
1  | 1        | liver
2  | 1        | disease
3  | 1        | varices
4  | 2        | ice

mytab table
id | rest of fields
---+---------------
1  | ....
2  | ....
You can then do an inner join to find what keywords belong to the specified entries in mytab:
SELECT * FROM mytab
JOIN keywords ON keywords.mytab_id = mytab.id
WHERE keyword = 'ice'
You could also add a constraint to make sure the keyword and mytab_id pair is unique, that way you don't accidentally end up with the same keyword for the same entry in mytab.
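As a sketch (table and column names assumed from the example above), the schema with that constraint could look like this:
CREATE TABLE mytab (
    id serial PRIMARY KEY
    -- ... rest of fields
);
CREATE TABLE keywords (
    id serial PRIMARY KEY,
    mytab_id integer NOT NULL REFERENCES mytab (id),
    keyword text NOT NULL,
    UNIQUE (mytab_id, keyword)  -- the same keyword at most once per mytab entry
);
-- an index on keyword speeds up the WHERE keyword = '...' lookups
CREATE INDEX ON keywords (keyword);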

Build a list of grouped values

I'm new to this page and this is the first time I've posted a question, so sorry for anything wrong. The question may be old, but I just can't find any answer for SQL Anywhere.
I have a table like
Order | Mark
======|========
1 | AA
2 | BB
1 | CC
2 | DD
1 | EE
I want to have result as following
Order | Mark
1 | AA,CC,EE
2 | BB,DD
My current SQL is
Select Order, Cast(Mark as NVARCHAR(20))
From #Order
Group by Order
and it just gives me a result completely the same as the original table.
Any idea for this?
You can use the ASA LIST() aggregate function (untested; you might need to enclose the Order column name in quotes, as it is also a reserved word):
SELECT Order, LIST( Mark )
FROM #Order
GROUP BY Order;
You can customize the separator character and order if you need.
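For example, something like this should work (untested; note the quotes around the reserved word):
SELECT "Order", LIST( Mark, ';' ORDER BY Mark ) AS Marks
FROM #Order
GROUP BY "Order";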
Note: it is rather a bad idea to
name your tables and columns after regular SQL keywords (ORDER BY)
use the same name for a column and a table (Order)

Alternative when the IN clause is given A LOT of values (PostgreSQL)

I'm using the IN clause to retrieve places that contains certain tags. For that I simply use
select .. FROM table WHERE tags IN (...)
For now the number of tags I provide in the IN clause is around 500, but soon (in the near future) the number of tags will probably jump to easily over 5000 (maybe even more).
I would guess there is some kind of limitation on both the size of the query AND the number of values in the IN clause (bonus question, out of curiosity: what is this limit?).
So my question is: what is a good alternative query that would be future-proof, even if in the future I were matching against, let's say, 10,000 tags?
P.S.: I have looked around and seen people mentioning "temporary tables". I have never used those. How would they be used in my case? Will I need to create a temp table every time I make a query?
Thanks,
Francesco
One option is to join the table to a values clause:
with parms (tag) as (
values ('tag1'), ('tag2'), ('tag3')
)
select t.*
from the_table t
join parms p on p.tag = t.tags;
You could create a table using:
tablename
id | tags
----+----------
1 | tag1
2 | tag2
3 | tag3
And then do:
select .. FROM table WHERE tags IN (SELECT tags FROM tablename);
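To address the P.S.: a temporary table is created and filled once per session (not once and for all), then joined like any other table. A sketch with assumed names:
-- the temp table lives until the session ends
CREATE TEMP TABLE query_tags (tag text PRIMARY KEY);
INSERT INTO query_tags VALUES ('tag1'), ('tag2'), ('tag3');
ANALYZE query_tags;  -- give the planner statistics on the freshly filled table
SELECT t.*
FROM the_table t
JOIN query_tags q ON q.tag = t.tags;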