How can I break each string in below column by using a sliding window in PostgreSQL.
Input
Column
TTTTACAATATAGCCAC
TTTGAAGAAAACATGCA
TTTCATACGGCTAGCGG
TTTAGTCTGTATGCTTG
For first string the expected output is below (sliding window = 9). I am expecting such output for every string of the column.
Output
TTTTACAAT
TTTACAATA
TTACAATAT
TACAATATA
ACAATATAG
CAATATAGC
AATATAGCC
ATATAGCCA
TATAGCCAC
Thanks
The generate_series function is your friend here.
https://www.postgresql.org/docs/current/functions-srf.html
Firstly you will need to split your string as such
WITH split AS(
SELECT generate_series(1, length('TTTTACAATATAGCCAC') - 8) AS start
)
SELECT substring('TTTTACAATATAGCCAC', split.start, 9)
FROM split;
Then, assuming you are getting it from a table, your query would go something like this.
WITH split AS(
SELECT
your_table_column as text,
generate_series(1, length(your_table_column) - 8) AS start
FROM your_table_name
)
SELECT substring(text, split.start, 9)
FROM split;
This will not display any columns that are below 9 characters, so other logic will need to be applied.
I am trying to query a column which contains string such as
595.1,N30.10
630.5,E10
I have tried separating the two values into different columns
split_part(code, ',', 1) AS code1,
split_part(code, ',', 2) AS code2
But now I see that some of the rows have 3 (or could be more)
785.59, R57.1, R
I wonder if there is a way to specify and query only the first part of the string without having to split the string. In this case only look for enteries with 595.1,785.59 and ignore the rest.
SELECT distinct ON (id) id,time,year,code
FROM data
where code= ANY('{595.1,785.59}');
I think you already the have logic you need, you only need to thread it together:
SELECT DISTINCT ON (id) id, time, year, code
FROM data
WHERE split_part(code, ',', 1) = ANY('{595.1,785.59}');
This logic appears to be working in the demo below.
Demo
To query on first part of code column you can do like below
SELECT * FROM tableName
WHERE split_part(code, ',', 1) = somevalue
Example
I have a string...
'/this/is/a/given/string/test.file'.
How can I get substring 'given/string/test.file' in PSQL?
Thank you!
You can use a regular expression
with example(str) as (
values('/this/is/a/given/string/test.file')
)
select regexp_replace(str, '(/.*?){4}', '')
from example;
regexp_replace
------------------------
given/string/test.file
(1 row)
or the function string_to_array():
select string_agg(word, '/' order by ord)
from example,
unnest(string_to_array(str, '/')) with ordinality as u(word, ord)
where ord > 4;
Read also How to find the 3rd occurrence of a pattern on a line.
I dont know how to get the nth occurence of a substring, but for this problem, you can use regular expression. Like this:
select substring('/this/is/a/given/string/test.file' from '/[^/]+/[^/]+/[^/]+/(.*)')
You can improve the regular expression, this is just for demo purpose.
I've been given a table with a few fields that hold comma-separated values (either blank or Y/N) like so (and the field name where this data is stored is People_Notified):
Y,,N,
,Y,,N
,,N,Y
Each 'slot' relates to a particular field value and I need to now include that particular field name in the string as well (in this case Parent, Admin, Police and Medical) but inserting a "N" if the current value is blank but leaving the existing Y's and N's in place. So for the above example, where there are four known slots, I would want a tsql statement to end up with:
Parent=Y,Admin=N,Police=N,Medical=N
Parent=N,Admin=Y,Police=N,Medical=N
Parent=N,Admin=N,Police=N,Medical=Y
I tried to use a combination of CHARINDEX and CASE but haven't figured a way to make this work.
js
Although a bit messy, in theory can be done in one statement:
select
'Parent=' +stuff((stuff((stuff(
substring((replace(
(','+(replace((replace(#People_Notified,',,,',',N,N,')),',,',',N,'))+','),',,',',N,')),2,7),7,0,
'Medical=')),5,0,'Police=')),3,0,'Admin=')
broken down is easier to follow:
declare #People_Notified varchar(100)=',,Y,Y' -- test variable
-- Insert Ns
set #People_Notified= (select replace(#People_Notified,',,,',',N,N,')) -- case two consecutive missing
set #People_Notified= (select replace(#People_Notified,',,',',N,')) -- case one missing
set #People_Notified= (select replace((','+#People_Notified+','),',,',',N,')) -- case start or end missing
set #People_Notified= substring(#People_Notified,2,7) -- remove extra commas added previously
-- Stuff the labels
select 'Parent=' +stuff((stuff((stuff(#People_Notified,7,0,'Medical=')),5,0,'Police=')),3,0,'Admin=')
If you're able to use XQuery in SQL Server, I don't think you need to get too complex. You could do something like this:
SELECT CONVERT(XML, REPLACE('<pn>' + REPLACE(People_Notified, ',', '</pn><pn>') + '</pn>', '<pn></pn>', '<pn>N</pn>')).query('
concat("Parent=", data(/pn[1])[1], ",Admin=", data(/pn[2])[1], ",Police=", data(/pn[3])[1], ",Medical=", data(/pn[4])[1])
')
FROM ...
Explanation: Construct an XML-like string out of the original delimited string by replacing commas with closing and opening tags. Add an opening tag to the start and a closing tag to the end. Replace each empty element with one containing "N". Convert the XML-like string into actual XML data so that you can use XQuery. Then just concatenate what you need using concat() and the right indexes for the elements' data.
Here's one way to do it:
;WITH cteXML (Id, Notified)
AS
(
SELECT Id,
CONVERT(XML,'<Notified><YN>'
+ REPLACE([notified],',', '</YN><YN>')
+ '</YN></Notified>') AS Notified
FROM People_Notified
)
select id,
'Parent=' + case Notified.value('/Notified[1]/YN[1]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[1]','varchar(1)') end + ',' +
'Admin=' + case Notified.value('/Notified[1]/YN[2]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[2]','varchar(1)') end + ',' +
'Police=' + case Notified.value('/Notified[1]/YN[3]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[3]','varchar(1)') end + ',' +
'Medical=' + case Notified.value('/Notified[1]/YN[4]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[4]','varchar(1)') end Notified
from cteXML
SQL Fiddle
Check this page out for an explanation of what the XML stuff is doing.
This page has a pretty thorough look at the various ways you can split a delimited string into rows.
In short, I am looking for a single recursive query that can perform multiple replaces over one string. I have a notion it can be done, but am failing to wrap my head around it.
Granted, I'd prefer the biz-layer of the application, or even the CLR, to do the replacing, but these are not options in this case.
More specifically, I want to replace the below mess - which is C&P in 8 different stored procedures - with a TVF.
SET #temp = REPLACE(RTRIM(#target), '~', '-')
SET #temp = REPLACE(#temp, '''', '-')
SET #temp = REPLACE(#temp, '!', '-')
SET #temp = REPLACE(#temp, '#', '-')
SET #temp = REPLACE(#temp, '#', '-')
-- 23 additional lines reducted
SET #target = #temp
Here is where I've started:
-- I have a split string TVF called tvf_SplitString that takes a string
-- and a splitter, and returns a table with one row for each element.
-- EDIT: tvf_SplitString returns a two-column table: pos, element, of which
-- pos is simply the row_number of the element.
SELECT REPLACE('A~B!C#D#C!B~A', MM.ELEMENT, '-') TGT
FROM dbo.tvf_SplitString('~-''-!-#-#', '-') MM
Notice I've joined all the offending characters into a single string separated by '-' (knowing that '-' will never be one of the offending characters), which is then split. The result from this query looks like:
TGT
------------
A-B!C#D#C!B-A
A~B!C#D#C!B~A
A~B-C#D#C-B~A
A~B!C-D-C!B~A
A~B!C#D#C!B~A
So, the replace clearly works, but now I want it to be recursive so I can pull the top 1 and eventually come out with:
TGT
------------
A-B-C-D-C-B-A
Any ideas on how to accomplish this with one query?
EDIT: Well, actual recursion isn't necessary if there's another way. I'm pondering the use of a table of numbers here, too.
You can use this in a scalar function. I use it to remove all control characters from some external input.
SELECT #target = REPLACE(#target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('#'),('#')) AS T(invalidChar)
I figured it out. I failed to mention that the tvf_SplitString function returns a row number as "pos" (although a subquery assigning row_number could also have worked). With that fact, I could control cross join between the recursive call and the split.
-- the cast to varchar(max) matches the output of the TVF, otherwise error.
-- The iteration counter is joined to the row number value from the split string
-- function to ensure each iteration only replaces on one character.
WITH XX AS (SELECT CAST('A~B!C#D#C!B~A' AS VARCHAR(MAX)) TGT, 1 RN
UNION ALL
SELECT REPLACE(XX.TGT, MM.ELEMENT, '-'), RN + 1 RN
FROM XX, dbo.tvf_SplitString('~-''-!-#-#', '-') MM
WHERE XX.RN = MM.pos)
SELECT TOP 1 XX.TGT
FROM XX
ORDER BY RN DESC
Still, I'm open to other suggestions.