Breaking strings in a column by using a sliding window in PostgreSQL

How can I break each string in the column below into substrings by using a sliding window in PostgreSQL?
Input
Column
TTTTACAATATAGCCAC
TTTGAAGAAAACATGCA
TTTCATACGGCTAGCGG
TTTAGTCTGTATGCTTG
For the first string, the expected output is below (sliding window = 9). I expect the same kind of output for every string in the column.
Output
TTTTACAAT
TTTACAATA
TTACAATAT
TACAATATA
ACAATATAG
CAATATAGC
AATATAGCC
ATATAGCCA
TATAGCCAC
Thanks

The generate_series function is your friend here.
https://www.postgresql.org/docs/current/functions-srf.html
First, split a single string like this:
WITH split AS(
SELECT generate_series(1, length('TTTTACAATATAGCCAC') - 8) AS start
)
SELECT substring('TTTTACAATATAGCCAC', split.start, 9)
FROM split;
Then, assuming you are getting it from a table, your query would go something like this.
WITH split AS (
    SELECT
        your_table_column AS str,
        -- one start position per window; no rows when the string is shorter than 9
        generate_series(1, length(your_table_column) - 8) AS start
    FROM your_table_name
)
SELECT substring(str, split.start, 9)
FROM split;
This will not return any rows for strings shorter than 9 characters, so those need separate handling.
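To sanity-check the window arithmetic outside the database, here is a minimal Python sketch of the same logic. The function name `sliding_windows` is hypothetical, and returning an empty list for short strings is an assumption chosen to mirror the query's behaviour. Neither is part of the original answer.

```python
def sliding_windows(text: str, size: int = 9) -> list[str]:
    """Return every contiguous substring of `text` that is `size` characters long.

    Mirrors generate_series(1, length(text) - (size - 1)): a string shorter
    than `size` yields no windows at all.
    """
    # range() is empty when len(text) < size, just as generate_series(1, n)
    # returns no rows when n < 1.
    return [text[start:start + size] for start in range(len(text) - size + 1)]


for window in sliding_windows("TTTTACAATATAGCCAC"):
    print(window)
```

A 17-character string gives 17 - 9 + 1 = 9 windows, which matches the nine rows in the expected output of the question.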

Related

Redshift how to split a stringified array into separate parts

Say I have a varchar column, religions, whose values look like this: ["Christianity", "Buddhism", "Judaism"] (yes, the brackets are part of the string). I want the string (not an array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the function JSON_PARSE to convert the varchar string into an array. Then you can use the strategy described in Convert varchar array to rows in redshift - Stack Overflow to convert the array to separate rows.
You can do the following.
Create a temporary table with a sequence of numbers.
Using that sequence and the split_part function available in Redshift, split the values by cross joining with the numbers in the temporary table.
To remove the double quotes and square brackets, use the regexp_replace function in Redshift.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["christianity","Buddhism","Judaism"]' as val) as t -- You can select the actual column from the table here.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;
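The two strategies above (parsing the string as JSON, or splitting on commas and stripping punctuation) can be compared in a small Python sketch. It is only an illustration of the logic, not Redshift code. The `.strip()` call is an added assumption to handle the spaces after the commas in the question's sample, which the SQL above does not remove.

```python
import json
import re

raw = '["Christianity", "Buddhism", "Judaism"]'

# Approach 1: treat the string as JSON, which is the idea behind JSON_PARSE.
parsed = json.loads(raw)

# Approach 2: split on commas, then strip brackets, quotes and stray spaces,
# which is the idea behind split_part + regexp_replace.
# This breaks if a value itself contains a comma.
stripped = [re.sub(r'[\[\]"]', "", part).strip() for part in raw.split(",")]

print(parsed)
print(stripped)
```

Both approaches agree on this input. The JSON route is the more robust of the two when values may contain commas or escaped quotes.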

PostgreSQL calculate prefix combinations after split

I have a string as input, of the form foo:bar:something:221. I'm looking for a way to generate a table with all prefixes of this string, like:
foo
foo:bar
foo:bar:something
foo:bar:something:221
I wrote the following query to split the string, but can't figure out where to go from there:
select unnest(string_to_array('foo:bar:something:221', ':'));
An option is to simulate a loop over all elements, then take the sub-array from the input for each element index:
with data(input) as (
values (string_to_array('foo:bar:something:221', ':'))
)
select array_to_string(input[1:g.idx], ':')
from data
cross join generate_series(1, cardinality(input)) as g(idx);
generate_series(1, cardinality(input)) generates as many rows as the array has elements, and the expression input[1:g.idx] takes the sub-array from the first element up to element idx. As the result is an array, I use array_to_string to rebuild the string with the : separator.
Alternatively, you can use string_agg as a window function. With an ORDER BY in the window, the default frame runs from the beginning of the partition to the current row:
SELECT string_agg(s, ':') OVER (ORDER BY n)
FROM unnest(string_to_array('foo:bar:something:221', ':')) WITH ORDINALITY AS u(s, n);
string_agg
-----------------------
foo
foo:bar
foo:bar:something
foo:bar:something:221
(4 rows)
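Both answers compute the same prefixes, one by slicing and one by running aggregation. A minimal Python sketch of the two ideas side by side, with variable names chosen only for illustration:

```python
from itertools import accumulate

parts = "foo:bar:something:221".split(":")

# Slice-based version, analogous to input[1:g.idx] driven by generate_series.
by_slice = [":".join(parts[:idx]) for idx in range(1, len(parts) + 1)]

# Running-aggregate version, analogous to string_agg(...) OVER (ORDER BY n).
by_running = list(accumulate(parts, lambda prefix, part: f"{prefix}:{part}"))

print(by_slice)
print(by_running)
```

The slice version rebuilds each prefix from scratch, while the running version extends the previous prefix by one element. The results are identical.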

Querying part of a string

I am trying to query a column which contains strings such as
595.1,N30.10
630.5,E10
I have tried separating the two values into different columns
split_part(code, ',', 1) AS code1,
split_part(code, ',', 2) AS code2
But now I see that some of the rows have 3 values (or possibly more):
785.59, R57.1, R
I wonder if there is a way to query only the first part of the string without having to split it into columns. In this case I want only the entries whose first part is 595.1 or 785.59, ignoring the rest.
SELECT distinct ON (id) id,time,year,code
FROM data
where code= ANY('{595.1,785.59}');
I think you already have the logic you need; you only need to put it together:
SELECT DISTINCT ON (id) id, time, year, code
FROM data
WHERE split_part(code, ',', 1) = ANY('{595.1,785.59}');
This logic appears to be working in the demo below.
Demo
To query on the first part of the code column, you can do something like this:
SELECT * FROM tableName
WHERE split_part(code, ',', 1) = somevalue
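As a rough illustration of what the filter does, here is a Python sketch using the sample values from the question. The helper name `first_code` is hypothetical. It returns everything before the first comma, which is what split_part(code, ',', 1) yields.

```python
rows = ["595.1,N30.10", "630.5,E10", "785.59, R57.1, R"]
wanted = {"595.1", "785.59"}


def first_code(code: str) -> str:
    # Everything before the first comma; the whole string if there is no comma.
    return code.split(",", 1)[0]


matches = [row for row in rows if first_code(row) in wanted]
print(matches)
```

The number of comma-separated parts does not matter here, because only the first part is ever inspected.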

Substring SQL Select statement

I have a number of references, each 20 digits long. I need to remove the first 12 digits, replace them with a G, and select the next 7 digits.
An example of the format of the numbers being received
50125426598525412584
I then need to remove first 12 digits and select the next 7 (not including the last)
2541258
Lastly I need to put a G in front of the number so I'm left with
G25412584
My SQL is as follows:
SELECT SUBSTRING(ref, 12, 7) AS ref
FROM mytable
WHERE ref LIKE '5012%'
The results of this will leave me with
25412584
But how do I insert the G in front of the number in the same SQL statement?
Many thanks
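Concatenate the G onto the substring. Note that SUBSTRING is 1-based, so skipping the first 12 characters means starting at position 13. With the + operator (SQL Server):
SELECT 'G' + SUBSTRING(ref, 13, 7) AS ref FROM mytable WHERE ref LIKE '5012%'
With CONCAT (for example in MySQL):
SELECT CONCAT('G', SUBSTRING('50125426598525412584', 13, 7)) FROM dual;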
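The start position is easy to get wrong by one, so here is a small Python sketch that checks the indexing on the sample reference. The function name `g_reference` is an illustrative assumption.

```python
def g_reference(ref: str) -> str:
    # SQL SUBSTRING(ref, 13, 7) is 1-based; the Python slice ref[12:19] is
    # 0-based and covers the same seven characters (positions 13 through 19).
    return "G" + ref[12:19]


print(g_reference("50125426598525412584"))
```

For the sample value this gives G2541258: the first 12 digits and the final digit are dropped, and the G is prepended.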

select first letter of different columns in oracle

I want a query which will return a combination of characters and numbers.
Example:
Table name - emp
Columns required - fname,lname,code
If fname=abc and lname=pqr and the row is the very first in the table, then the result should be code = ap001.
For next row it should be like this:
Fname = efg, lname = rst
Code = er002 and likewise.
I know that we can use substr to retrieve the first letter of a column, but I don't know how to use it with two columns or how to concatenate the results.
OK. You know you can use the substr function. To concatenate, you need the concatenation operator ||. To get the number of each row retrieved by your query, you need the rownum pseudocolumn. You will probably also need the to_char function to format the number. You can read about all of these functions and operators in the SQL reference. Anyway, I think you need something like this (I didn't check it):
select substr(fname, 1, 1) || substr(lname, 1, 1) || to_char(rownum, 'fm009') code
from emp
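The same construction can be sketched in Python to check the expected codes from the question. The helper name `make_code` is hypothetical, and the sketch assumes non-empty names and row numbers of at most three digits.

```python
def make_code(fname: str, lname: str, row_number: int) -> str:
    # First letter of each name, then the row number zero-padded to three
    # digits, similar to to_char(rownum, 'fm009').
    return f"{fname[0]}{lname[0]}{row_number:03d}"


people = [("abc", "pqr"), ("efg", "rst")]
codes = [make_code(f, l, n) for n, (f, l) in enumerate(people, start=1)]
print(codes)
```

As with rownum, the number here reflects the order in which rows are produced, so a stable ordering is needed if the codes must be reproducible.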