PostgreSQL - Pattern Match - String to Sub-string

I am trying to join two tables within a database, based upon a matching postcode, but am struggling where there are multiple postcodes relating to a single row of data.
i.e. table 1 has 2 columns (a unique ID and postcodes). It is possible for a record to have just a single postcode in this column or multiple postcodes in comma-separated form.
table 2 also has two columns (development description and postcode). In this table the postcode column can have only one postcode.
I would like to identify & join where the postcode from table 2 matches or is included within the relevant column in table 1. I have been able to do so where there is a single postcode within each column, but am currently unable to do so where there are multiple postcodes in table 1.
The below code brings back the matches where there is a single postcode.
SELECT t1.id,
t1.postcodes,
t2.dev_description,
t2.postcode
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t2.postcode LIKE t1.postcodes
WHERE t2.postcode = 'XXX XXX'
I have tried using '%'|| ||'%' and various other functions, but am at a bit of a loss to be honest.
If someone could help it would be great!
Thanks

You could join on:
',' || t1.postcodes || ',' like '%,' || t2.postcode || ',%'
This would expand to:
',1234AB,2345AB,3456AB,' like '%,1234AB,%'
Or you can use string_to_array and the @> (array contains) operator:
string_to_array(t1.postcodes, ',') @> array[t2.postcode]
This expands to:
array['1234AB','2345AB','3456AB'] @> array['1234AB']
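To see the delimiter-wrapping join end to end, here is a minimal sketch. It runs the same ON condition through Python's sqlite3 module, with SQLite standing in for PostgreSQL; the `||` and LIKE syntax is the same for this expression. The function name `match_postcodes` and the sample postcodes are made up for illustration, and it assumes the list has no spaces after the commas.

```python
import sqlite3

def match_postcodes(table1_rows, table2_rows):
    """Join table2 to table1 where table2.postcode is one entry of the comma-separated list."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for PostgreSQL here
    conn.execute("CREATE TABLE table1 (id INTEGER, postcodes TEXT)")
    conn.execute("CREATE TABLE table2 (dev_description TEXT, postcode TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", table2_rows)
    rows = conn.execute(
        """
        SELECT t1.id, t2.dev_description, t2.postcode
        FROM table1 AS t1
        INNER JOIN table2 AS t2
          -- wrap both sides in commas so only whole list entries can match
          ON ',' || t1.postcodes || ',' LIKE '%,' || t2.postcode || ',%'
        ORDER BY t1.id, t2.postcode
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(match_postcodes(
        [(1, "AB1 1AA"), (2, "CD2 2BB,B1 1AA"), (3, "EF3 3CC")],
        [("Site A", "B1 1AA"), ("Site B", "EF3 3CC")],
    ))
```

Row 1 is not returned for 'B1 1AA' even though 'AB1 1AA' contains it as a substring, because the surrounding commas force a whole-entry match.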

Hmmm, I've never joined two tables using ON and LIKE... Anyway, look up the STRPOS function.
Something like this perhaps:
...
OR (STRPOS(t1.postcodes, t2.postcode) > 0)
...
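One caveat with a plain substring test: it can match a postcode that is only part of a longer one. The sketch below compares a substring check with a comma-delimited check. It uses Python's sqlite3 as a stand-in, where `instr` plays the role of PostgreSQL's STRPOS; the function name and the sample values are hypothetical.

```python
import sqlite3

def substring_vs_delimited(postcodes, postcode):
    """Return (substring_hit, delimited_hit) for one list/postcode pair."""
    conn = sqlite3.connect(":memory:")
    substring_hit, delimited_hit = conn.execute(
        """
        SELECT instr(:list, :code) > 0,                          -- like STRPOS(...) > 0
               ',' || :list || ',' LIKE '%,' || :code || ',%'    -- whole-entry match
        """,
        {"list": postcodes, "code": postcode},
    ).fetchone()
    conn.close()
    return bool(substring_hit), bool(delimited_hit)

if __name__ == "__main__":
    # 'B1 1AA' is a substring of 'AB1 1AA' but is not an entry in the list
    print(substring_vs_delimited("AB1 1AA,CD2 2BB", "B1 1AA"))
    print(substring_vs_delimited("AB1 1AA,CD2 2BB", "CD2 2BB"))
```

If your postcodes can never be substrings of one another, the simpler STRPOS check may be good enough.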

Related

Using a list as replacement for singular patterns in regexp_replace

I have a table that I need to delete random words/characters out of. To do this, I have been using a regexp_replace function with the addition of multiple patterns. An example is below:
select regexp_replace(combined,'\y(NAME|001|CONTAINERS:|MT|COUNT|PCE|KG|PACKAGE)\y','', 'g')
as description, id from export_final;
However, in the full list, there are around 70 different patterns that I replace out of the description. As you can imagine, the code is very cluttered. This leads me to my question: is there a way to put these patterns into another table and then use that table to check the descriptions?
Of course. Populate your desired 'other' table with the patterns you need. Then create a CTE that uses the string_agg function to build the regex. Example:
create table exclude_list( pattern_word text);
insert into exclude_list(pattern_word)
values('NAME'),('001'),('CONTAINERS:'),('MT'),('COUNT'),('PCE'),('KG'),('PACKAGE');
with exclude as
( select '\y(' || string_agg(pattern_word,'|') || ')\y' regex from exclude_list )
-- CTE simulates actual table to provide test data
, export_final (id,combined) as (values (0,'This row 001 NAME Main PACKAGE has COUNT 3 units'),(1,'But single package can hold 6 KG'))
select regexp_replace(combined,regex,'', 'g')
as description, id
from export_final cross join exclude;
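One thing to watch: the table entries are pasted into the regex as-is, so an entry containing a regex metacharacter (a dot, a plus sign, and so on) changes the pattern. Below is a rough sketch of the same build-an-alternation idea in Python rather than PostgreSQL, with each entry escaped first. The helper names are hypothetical, and `\b` is Python's word boundary, used here in place of PostgreSQL's `\y`.

```python
import re

def build_exclude_regex(words):
    """Build one word-bounded alternation from a list of literal words."""
    # Escape each entry so it is matched literally; longest first so that
    # a longer entry is preferred over a shorter one that is its prefix.
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")

def strip_words(text, pattern):
    """Remove every match, then collapse the leftover runs of whitespace."""
    return " ".join(pattern.sub("", text).split())

if __name__ == "__main__":
    pattern = build_exclude_regex(
        ["NAME", "001", "CONTAINERS:", "MT", "COUNT", "PCE", "KG", "PACKAGE"]
    )
    print(strip_words("This row 001 NAME Main PACKAGE has COUNT 3 units", pattern))
    print(strip_words("But single package can hold 6 KG", pattern))
```

The match is case-sensitive, as in the original query, so the lowercase "package" in the second row is left alone.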

Snowflake invalid identifier when performing a join

I have been trying to do an outer join across two tables in two different schemas. Before joining, I am trying to filter the variants table down to SKUs that are 4 or 5 characters long. The join was not working with a simple WHERE clause at the end, hence this approach.
The problem is that if I leave out the quotes, Snowflake says I have invalid identifiers. When I run the query with the quotes, it works, but every row of the raw.stitch_heroku.spree_variants.SKU column just contains the column name itself!
SELECT
analytics.dbt_lcasucci.product_category.product_description,
'raw.stitch_heroku.spree_variants.SKU'
FROM analytics.dbt_lcasucci.product_category
LEFT JOIN (
SELECT * FROM raw.stitch_heroku.spree_variants
WHERE LENGTH('raw.stitch_heroku.spree_variants.SKU')<=5
and LENGTH('raw.stitch_heroku.spree_variants.SKU')>=4
) ON 'analytics.dbt_lcasucci.product_category.product_id'
= 'raw.stitch_heroku.spree_variants.SKU'
Is there a way to work around this? I am confused and have not found this issue on forums yet!
thx in advance
Firstly, single quotes define a string literal ('this is text'), whereas double quotes delimit identifiers such as table and column names ("this_is_a_table_name").
Adding aliases to the tables makes the SQL more readable, and the duplicated LENGTH call can be reduced to a BETWEEN, so this should work better:
SELECT pc.product_description,
sp.SKU
FROM analytics.dbt_lcasucci.product_category AS pc
LEFT JOIN (
SELECT SKU
FROM raw.stitch_heroku.spree_variants
WHERE LENGTH(SKU) BETWEEN 4 AND 5
) AS sp
ON pc.product_id = sp.SKU;
I reduced the sub-select's result to just SKU, since that is the only column you use from sp. But given that you are comparing product_id to SKU, as your example stands you don't need the join to sp at all.
The invalid identifier error suggests to me that something is named incorrectly. The first step is to check that the tables exist, that the columns are named as you expect, and that the column types match for the JOIN x ON y clause, via:
describe table analytics.dbt_lcasucci.product_category;
describe table raw.stitch_heroku.spree_variants;
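The symptom in the question (every row showing the column name) is what a quoted string literal does in any SQL dialect. Here is a small sketch using Python's sqlite3 as a stand-in for Snowflake; the table contents are made up. For the same reason, LENGTH of a quoted literal is a constant, so the original WHERE clause never looked at the SKU values.

```python
import sqlite3

def quoted_vs_identifier():
    """Select a single-quoted literal and the real column from the same table."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Snowflake
    conn.execute("CREATE TABLE spree_variants (sku TEXT)")
    conn.executemany("INSERT INTO spree_variants VALUES (?)", [("1234",), ("98765",)])
    # Single quotes: a string literal, repeated once per row
    literal_rows = conn.execute(
        "SELECT 'spree_variants.sku' FROM spree_variants ORDER BY sku"
    ).fetchall()
    # No quotes: an identifier, so the stored values come back
    column_rows = conn.execute(
        "SELECT spree_variants.sku FROM spree_variants ORDER BY sku"
    ).fetchall()
    conn.close()
    return literal_rows, column_rows

if __name__ == "__main__":
    literal_rows, column_rows = quoted_vs_identifier()
    print(literal_rows)
    print(column_rows)
```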

PostgreSQL select uniques from three different columns

I have one large table 100m+ rows and two smaller ones 2m rows ea. All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine into a single column of unique strings.
I'm using a version of the answer to "Combined 2 columns into one column SQL", but unsurprisingly the performance is very slow.
SELECT DISTINCT
unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#'))
FROM
tenders t,
awards a,
banking b
;
Any ideas on a more performant way to achieve this?
Update: the banking table is the largest table with 100m rows.
Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma-delimited string of the distinct names.
WITH cte
AS (
SELECT UPPER(T.buyer) NAMES
FROM tenders T
UNION
SELECT UPPER(A.aw_supplier_name) NAMES
FROM awards A
UNION
SELECT UPPER(b.supplier_source_string) NAMES
FROM banking b
)
SELECT array_to_string(ARRAY_AGG(cte.names), ',')
FROM cte
To get just a list of the combined names from all three tables, you could instead union together the selections from each table, like so:
select
upper(t.buyer)
from
tenders t
union
select
upper(a.aw_supplier_name)
from
awards a
union
select
upper(b.supplier_source_string)
from
banking b
;
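Note that the query in the question lists three tables in FROM with no join condition, which makes it a cross join of all three; that alone would explain the slowness. With UNION, each table is read once and duplicates are dropped. A small sketch of the UNION behaviour, using Python's sqlite3 as a stand-in for PostgreSQL and made-up company names:

```python
import sqlite3

def distinct_names():
    """Return the de-duplicated, upper-cased names from all three tables."""
    conn = sqlite3.connect(":memory:")  # stand-in engine with invented sample data
    conn.executescript("""
        CREATE TABLE tenders (buyer TEXT);
        CREATE TABLE awards (aw_supplier_name TEXT);
        CREATE TABLE banking (supplier_source_string TEXT);
        INSERT INTO tenders VALUES ('Acme Ltd'), ('Globex');
        INSERT INTO awards VALUES ('ACME LTD'), ('Initech');
        INSERT INTO banking VALUES ('globex'), ('Initech'), ('Umbrella');
    """)
    rows = conn.execute("""
        SELECT upper(buyer) AS name FROM tenders
        UNION                      -- UNION (not UNION ALL) removes duplicates
        SELECT upper(aw_supplier_name) FROM awards
        UNION
        SELECT upper(supplier_source_string) FROM banking
        ORDER BY name
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]

if __name__ == "__main__":
    print(distinct_names())
```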

Concatenate all items with matching FKeys

I have a table that has a PKey, a FKey, a LineNum, and a TextLine.
In my table, I have multiple results from the FKey. It's a 1 to many relationship.
What I want to do is have the TextLines that match the FKey be concatenated into a single row. (The reason for this is that we're converting from an old COBOL database to T-SQL, and transferring the information to a new database with a different structure, where these "Comments" will all be handled by a single field)
My end query will look something like this:
SELECT Fkey, Line1 + Line2...,
FROM Table1
The issue is that there is a non-consistent number of lines. In addition, I'm trying to avoid any dynamic queries, because I want un-trained/basic users to be able to modify and customize this query. Is there any way to do this?
You could do something like this to get all the data in a single row:
select
t.FKey,
STUFF((SELECT ',' + TextLine
from Table1 where FKey = t.FKey
order by LineNum -- keep the lines in their original order
FOR XML PATH('')), 1, 1, '') as ConcatTextLines
from
Table1 t
group by t.FKey
There will be some size limitations on the ConcatTextLines column, so this may not be applicable if you have thousands of lines for some foreign keys.
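If it helps to check the output of whichever query you end up with, this is the result you are after, sketched in plain Python rather than T-SQL: group by FKey, order by LineNum, join with a separator. The function name and the sample rows are hypothetical. On SQL Server 2017 or later, STRING_AGG with WITHIN GROUP (ORDER BY ...) may be a simpler way to get the same thing.

```python
from collections import defaultdict

def concat_by_fkey(rows, sep=","):
    """rows are (fkey, line_num, text_line) tuples; returns {fkey: joined text}."""
    grouped = defaultdict(list)
    # Sort by key, then by line number, so each group's lines keep their order
    for fkey, _line_num, text_line in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[fkey].append(text_line)
    return {fkey: sep.join(lines) for fkey, lines in grouped.items()}

if __name__ == "__main__":
    sample = [(1, 2, "world"), (1, 1, "hello"), (2, 1, "solo")]
    print(concat_by_fkey(sample))
```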

a dual variable not in statement?

I have the need to look at two tables that share two variables and get a list of the data from one table that does not have matching data in the other table. Example:
Table A: xName, Date, Place, xAmount
Table B: yName, Date, Place, yAmount
I need to write a query that checks Table A and finds entries that have no corresponding entry in Table B. If it were a one-variable issue I could use a NOT IN statement, but I can't think of a way to do that with two variables. A left join does not look like it would work either, and filtering by a specific date or place name is not practical, since we are talking about thousands of dates and hundreds of place names.
Thanks in advance to anyone who can help out.
SELECT TableA.Date,
TableA.Place,
TableA.xName,
TableA.xAmount,
TableB.yName,
TableB.yAmount
FROM TableA
LEFT OUTER JOIN TableB
ON TableA.Date = TableB.Date
AND TableA.Place = TableB.Place
WHERE TableB.yName IS NULL
OR TableB.yAmount IS NULL
SELECT * FROM A WHERE NOT EXISTS
(SELECT 1 FROM B
WHERE A.xName = B.yName AND A.Date = B.Date AND A.Place = B.Place AND A.xAmount = B.yAmount)
In Oracle:
select xName , xAmount from tableA
MINUS
select yName , yAmount from tableB
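The answers above compare different sets of columns. Going by the question, the shared variables are Date and Place, so here is a minimal sketch of the anti-join on just those two, written both ways. It uses Python's sqlite3 as a stand-in engine with made-up rows, and it assumes that a match on Date and Place is what counts as a "corresponding entry".

```python
import sqlite3

def unmatched_rows():
    """Return TableA names with no TableB row sharing both Date and Place."""
    conn = sqlite3.connect(":memory:")  # stand-in engine with invented sample data
    conn.executescript("""
        CREATE TABLE TableA (xName TEXT, Date TEXT, Place TEXT, xAmount REAL);
        CREATE TABLE TableB (yName TEXT, Date TEXT, Place TEXT, yAmount REAL);
        INSERT INTO TableA VALUES
            ('a1', '2020-01-01', 'Leeds', 10),
            ('a2', '2020-01-01', 'York', 20),
            ('a3', '2020-01-02', 'Leeds', 30);
        INSERT INTO TableB VALUES
            ('b1', '2020-01-01', 'Leeds', 5),
            ('b2', '2020-01-02', 'York', 7);
    """)
    # Variant 1: NOT EXISTS with both shared columns in the correlated subquery
    not_exists = conn.execute("""
        SELECT a.xName FROM TableA AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM TableB AS b
            WHERE b.Date = a.Date AND b.Place = a.Place
        )
        ORDER BY a.xName
    """).fetchall()
    # Variant 2: LEFT JOIN on both columns, keeping rows where no match was found
    left_join = conn.execute("""
        SELECT a.xName FROM TableA AS a
        LEFT OUTER JOIN TableB AS b
          ON b.Date = a.Date AND b.Place = a.Place
        WHERE b.Date IS NULL
        ORDER BY a.xName
    """).fetchall()
    conn.close()
    return [r[0] for r in not_exists], [r[0] for r in left_join]

if __name__ == "__main__":
    print(unmatched_rows())
```

Testing a join column for NULL in the LEFT JOIN variant avoids also picking up matched rows whose yName or yAmount happens to be NULL.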