Extract specific columns from SQL Redshift - amazon-redshift

I have the following table in Redshift:
Column
-------------------
Lant On h1
Grent Off h3
Hasard Varvey On h1
Richie Unknown h1
I would like to have the following outcome:
Column
-------
On
Off
On
Unknown
Utilizing split_part(Column, ' ', 2) is not feasible, as the third record would return 'Varvey' instead of 'On'. Is there any way I could extract the desired values?

You could make use of REGEXP_SUBSTR, since it lets you apply a regex pattern that matches your use case:
select REGEXP_SUBSTR ('Lant On h1', '(On|Off|Unknown)');
select REGEXP_SUBSTR ('Grent Off h3', '(On|Off|Unknown)');
select REGEXP_SUBSTR ('Hasard Varvey On h1', '(On|Off|Unknown)');
select REGEXP_SUBSTR ('Richie Unknown h1', '(On|Off|Unknown)');
This picks out whichever of the values listed in the regex pattern is present.
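Applied to the table itself rather than to string literals (my_table is a hypothetical name; the column is double-quoted because Column would otherwise clash with the keyword):
select REGEXP_SUBSTR("Column", '(On|Off|Unknown)') as state
from my_table; -- my_table is a placeholder for your actual table name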

Related

Using a list as replacement for singular patterns in regexp_replace

I have a table that I need to delete random words/characters from. To do this, I have been using a regexp_replace function with multiple alternated patterns. An example is below:
select regexp_replace(combined,'\y(NAME|001|CONTAINERS:|MT|COUNT|PCE|KG|PACKAGE)\y','', 'g')
as description, id from export_final;
However, the full list contains around 70 different patterns that I replace out of the description. As you can imagine, the code is very cluttered. This leads me to my question: is there a way to put these patterns into another table and then use that table to check the descriptions?
Of course. Populate your desired 'other' table with the patterns you need, then create a CTE that uses the string_agg function to build the regex. Example:
create table exclude_list(pattern_word text);

insert into exclude_list(pattern_word)
values ('NAME'),('001'),('CONTAINERS:'),('MT'),('COUNT'),('PCE'),('KG'),('PACKAGE');

with exclude as
   ( select '\y(' || string_agg(pattern_word, '|') || ')\y' as regex
       from exclude_list )
   -- CTE simulates the actual table to provide test data
 , export_final(id, combined) as
   ( values (0, 'This row 001 NAME Main PACKAGE has COUNT 3 units')
          , (1, 'But single package can hold 6 KG') )
select regexp_replace(combined, regex, '', 'g') as description, id
  from export_final
 cross join exclude;
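Since the surrounding page is about Redshift, note that string_agg() is Postgres-only. A rough sketch of the same idea on Redshift would use LISTAGG() instead (untested; the \y word-boundary escape is Postgres-specific, so it is omitted here and the pattern may also match inside longer words):
with exclude as
   ( select '(' || listagg(pattern_word, '|') || ')' as regex
       from exclude_list )
select regexp_replace(e.combined, x.regex, '') as description, e.id
  from export_final e
 cross join exclude x;
-- Redshift's regexp_replace() replaces all occurrences by default,
-- so no 'g' flag is needed.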

CHARINDEX Function is not supported by Tableau

col1=LEFT([col2], CHARINDEX('_', [col2]) - 1)
I am trying to join on these columns: col1 should match col2, but for col2 only the characters before the delimiter '_' should be compared.
What can I use in the join condition instead of CHARINDEX, since Tableau does not support it?
col1
----
abc
dcb

col2
-------
abc_123
dcb_123
You can define a calculated field to use in your joins. The split() function is available and is designed for exactly this purpose: you can use split() to obtain the substring before the underscore, which you can then use as a join key.
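A minimal sketch of such a calculated field, using Tableau's SPLIT() (token number 1 means the first piece; the field name follows the question):
// Calculated field: the part of col2 before the first '_'
SPLIT([col2], '_', 1)
Join col1 against this calculated field instead of against col2 itself.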

Subtract multiple strings from one record

I am a novice with Postgres queries. I am trying to pull a substring from each record of a column, based on a specific pair of keywords.
Suppose I extract the substring between the keywords 'start' and 'end'. The catch is that there can be multiple occurrences of 'start' and 'end' in one record, and I need to extract what occurs between each such pair.
Is it possible to achieve this with a single query in Postgres, rather than creating a procedure? If yes, could you please help me with this or redirect me to where I can find related information?
Assuming that / always delimits the elements, you can use string_to_array() to convert the string into multiple elements and unnest() to turn the array into rows. You can then use regexp_replace() to get rid of the delimiters in the curly braces:
select d.id, regexp_replace(t.name, '{start}|{end}', '', 'g')
  from the_table d
 cross join unnest(string_to_array(d.body, '/')) as t(name);
SQLFiddle example: http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/8863
You can achieve all this using regular expressions, with the PostgreSQL regex functions regexp_matches() (to match the content between your tags) and regexp_replace() (to remove the tags):
with t(id, body) as (values
  (1, '{start}John{end}/{start}Jack{end}'),
  (2, '{start}David{end}'),
  (3, '{start}Ken{end}/{start}Kane{end}/{start}John{end}'))
select id,
       regexp_replace(
         (regexp_matches(body, '{start}.*?{end}', 'g'))[1],
         '^{start}|{end}$', '', 'g') as matches
  from t;
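For the three sample rows, the query should return one output row per {start}…{end} pair:
 id | matches
----+---------
  1 | John
  1 | Jack
  2 | David
  3 | Ken
  3 | Kane
  3 | John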

Error while using regexp_split_to_table (Amazon Redshift)

I have the same question as this:
Splitting a comma-separated field in Postgresql and doing a UNION ALL on all the resulting tables
Just that my 'fruits' column is delimited by '|'. When I try:
SELECT yourTable.ID,
       regexp_split_to_table(yourTable.fruits, E'|') AS split_fruits
FROM yourTable
I get the following:
ERROR: type "e" does not exist
Q1. What does the E do? I saw some examples where E is not used. The official docs don't explain it in their "quick brown fox..." example.
Q2. How do I use '|' as the delimiter for my query?
Edit: I am using PostgreSQL 8.0.2. unnest() and regexp_split_to_table() both are not supported.
A1
E is a prefix for Posix-style escape strings. You don't normally need this in modern Postgres. Only prepend it if you want to interpret special characters in the string, like E'\n' for a newline character. Details and links to documentation:
Insert text with single quotes in PostgreSQL
SQL select where column begins with \
E is pointless noise in your query, but it should still work. The answer you are linking to is not very good, I am afraid.
A2
It nearly works as is, but | is a regex metacharacter, so escape it in the pattern, and drop the E:
SELECT id, regexp_split_to_table(fruits, '\|') AS split_fruits
FROM tbl;
For simple delimiters, you don't need expensive regular expressions. This is typically faster:
SELECT id, unnest(string_to_array(fruits, '|')) AS split_fruits
FROM tbl;
In Postgres 9.3+ you'd rather use a LATERAL join for set-returning functions:
SELECT t.id, f.split_fruits
FROM tbl t
LEFT JOIN LATERAL unnest(string_to_array(fruits, '|')) AS f(split_fruits)
ON true;
Details:
What is the difference between LATERAL and a subquery in PostgreSQL?
PostgreSQL unnest() with element number
Amazon Redshift is not Postgres
It only implements a reduced set of features, as documented in its manual. In particular, there are no table functions, including the essential functions unnest(), generate_series(), and regexp_split_to_table(), when working with its "compute nodes" (i.e. when accessing any tables).
You should go with a normalized table layout to begin with (extra table with one fruit per row).
Or here are some options to create a set of rows in Redshift:
How to select multiple rows filled with constants in Amazon Redshift?
This workaround should do it:
Create a table of numbers with at least as many rows as there can be fruits in your column. Temporary, or permanent if you'll keep using it. Say we never have more than 9:
CREATE TEMP TABLE nr9(i int);
INSERT INTO nr9(i) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
Join to the number table and use split_part(), which is actually implemented in Redshift:
SELECT *, split_part(t.fruits, '|', n.i) AS fruit
FROM nr9 n
JOIN tbl t ON split_part(t.fruits, '|', n.i) <> '';
Voilà.
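If hand-typing the numbers feels brittle, a common alternative is to derive them from any existing table that is guaranteed to have enough rows (some_big_table is a placeholder):
CREATE TEMP TABLE nr9 AS
SELECT row_number() OVER () AS i   -- arbitrary numbering is fine here
FROM some_big_table                -- any table with at least 9 rows
LIMIT 9;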

How can I sort (order by) in postgres ignoring leading words like "the, a, etc"

I would like to be able to sort (order by) in postgres ignoring leading words like "the, a, etc"
One way: script (using your favorite language) the creation of an extra column containing the text with noise words removed, and sort on that.
Add a SORT_NAME column that has all that stuff stripped out. For bonus points, use an input trigger to populate it automatically, using your favorite SQL dialect's regex parser or similar.
Try splitting the column and sorting on the second item in the resulting array:
select some_col from some_table order by split_part(some_col, ' ', 2);
No need to add an extra column. Strip out the leading words in your ORDER BY:
SELECT col FROM table ORDER BY REPLACE(REPLACE(col, 'A ', ''), 'The ', '')
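One caveat: REPLACE() strips 'A ' and 'The ' anywhere in the value, not only at the start. Anchoring a regular expression at the beginning avoids that; a sketch for Postgres (some_table is a placeholder):
SELECT col
FROM some_table
ORDER BY regexp_replace(col, '^(The|A|An) ', '', 'i');
The 'i' flag makes the match case-insensitive; drop it if you only want to ignore capitalized articles.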