MYSQL WORKBENCH: Separating last 2 characters from a column - mysql-workbench

I am trying to remove the last two characters from a column. The current column that I am targeting has already been created by separating a string, but as you'll see below, it wasn't successful for the 'City' column.
This is how the original looks (screenshot not included).
This is what I've been able to output from my code:
| StreetNumber | Street | **City** | State |
|---|---|---|---|
| 1808 | FOX CHASE DR | **GOODLETTSVILLE TN** | TN |
| 1832 | FOX CHASE DR | **GOODLETTSVILLE TN** | TN |
| 2005 | SADIE LN | **GOODLETTSVILLE TN** | TN |
This is my code:
select substring_index(substring_index(OwnerAddress, ' ', 1), ' ', -1) as StreetNumber,
substring(OwnerAddress, locate(' ', OwnerAddress),
(length(OwnerAddress) - (length(substring_index(OwnerAddress, ' ', 1))
+ length(substring_index(OwnerAddress, ' ', -2))))) as Street,
substring(substring_index(OwnerAddress, ' ', -2) from 1 for length(OwnerAddress)-2) as City,
substring_index(OwnerAddress, ' ', -1) as State
from nashhousing;
The goal is to remove the state abbreviations from the 'City' column because there's a state column already. I thought I could simply -2 for the last two characters but obviously, that didn't work. I hope I've explained my situation clearly, but if not, please let me know. I don't want to give up on this situation but I've been on it for 5 hours already and can't source a solution. Please help and thank you in advance!

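To directly answer your question, you are using the wrong field and value for the length portion of the SUBSTRING on the City field: the length must be taken from the two-word substring (city plus state), not from the whole OwnerAddress.
This should correct your issue (subtracting 3 drops the two-letter state and the space before it):
substring(substring_index(OwnerAddress, ' ', -2), 1, length(substring_index(OwnerAddress, ' ', -2)) - 3) as City,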
Or even
substring_index(substring_index(OwnerAddress, ' ', -2), ' ', 1) as City
Please Note:
The major issue with doing things this way is that you are assuming every entry is formatted the same, and you're not taking into account cities with multi-word names, e.g. New York City, Los Angeles, San Francisco, etc.
This is something that you likely need to parse outside of MySQL. Since you only need it to be parsed, you could likely write a decent enough RegEx to handle the majority of the cases. However, if accuracy is your top priority, I would recommend geocoding the data.
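As a rough illustration of parsing this outside MySQL with a regular expression, here is a minimal Python sketch. It assumes the address has no commas, that the street always ends in one of a small, hypothetical list of suffixes (`STREET_SUFFIXES`), and that the state is always a two-letter code at the end. None of that is guaranteed by the data in the question, so treat it as a starting point only.

```python
import re

# Hypothetical list of street suffixes; extend it to match the real data.
STREET_SUFFIXES = ("DR", "LN", "ST", "AVE", "RD", "CT", "BLVD")

ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?P<street>.+?\b(?:" + "|".join(STREET_SUFFIXES) + r"))\s+"
    r"(?P<city>.+)\s+"
    r"(?P<state>[A-Z]{2})$"
)


def parse_address(address):
    """Return (number, street, city, state), or None if the pattern does not match."""
    match = ADDRESS_RE.match(address.strip())
    if match is None:
        return None
    return match.group("number", "street", "city", "state")


print(parse_address("1808 FOX CHASE DR GOODLETTSVILLE TN"))
print(parse_address("12 MAIN ST NEW YORK CITY NY"))
```

Because the city is everything between the street suffix and the state, multi-word cities are kept whole. Rows that return None are the ones worth reviewing by hand or sending to a geocoder.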

Related

T-sql: Highlight invoice numbers if they occur in a payment description field

I have two sql-server tables: bills and payments. I am trying to create a VIEW to highlight the bill numbers if they occur in the payment description field. For example:
TABLE bll
|bllID | bllNum |
| -------- | -------- |
| 1 | qwerty123|
| 2 | qwerty345|
| 3 | 1234 |
TABLE payments
|paymentID | description |
| -------- | ---------------------------------- |
| 1 | payment of qwerty123 and qwerty345 |
I want to highlight both the 'qwerty123' and 'qwerty345' strings by adding html code to it. The code I have is this:
SELECT REPLACE(payments.description,
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), ''),
'<font color=red>' +
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), '') +
'</font>')
FROM payments
This works but only for the first occurrence of a bill number. If the description field has more than one bill number, the consecutive bill numbers are not highlighted. So in my example 'qwerty123' gets highlighted, but not 'qwerty345'
I need to highlight all occurrences. How can I accomplish this?
With the caveat that this is not a task best done in the database, one possible approach you could try is to use string_split to break your description into words and then join this to your Bills, doing your string manipulation on matching rows.
Note, according to the documentation, string_split is not 100% guaranteed to retain its correct ordering but always has in my usage. It could always be substituted for an alternative function to work on the same principle.
select w.paymentID, string_agg(w.word, ' ') [Description]
from (
select p.paymentID,
case when exists (select * from bll b where b.bllNum = s.value)
then Concat('<font color="red">', s.value, '</font>') else s.value end word
from payments p
cross apply String_Split(p.description, ' ') s
) w
group by w.paymentID
Okay, I understand, I can put code in the front-end application by looping through the bill numbers and replacing them as they are found in the description. Just thought/ hoped there was a simple solution to this using t-sql. But I understand the difficulty.
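The front-end loop described in this comment could look roughly like the following Python sketch. The function name `highlight_bill_numbers` and the choice to match whole words only are assumptions for illustration. The original PATINDEX query matched substrings anywhere, while this sketch behaves more like the string_split answer.

```python
import re


def highlight_bill_numbers(description, bill_numbers):
    """Wrap every whole-word occurrence of a bill number in a red <font> tag."""
    # Ignore empty values and try longer numbers first so a short number
    # cannot pre-empt a longer one that starts with the same characters.
    candidates = sorted((n for n in bill_numbers if n), key=len, reverse=True)
    if not candidates:
        return description
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in candidates) + r")\b"
    )
    return pattern.sub(r"<font color=red>\1</font>", description)


print(highlight_bill_numbers(
    "payment of qwerty123 and qwerty345",
    ["qwerty123", "qwerty345", "1234"],
))
```

A single pass with one alternation pattern avoids re-scanning text that has already been wrapped in markup.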

Postgres full text search and spelling mistakes (aka fuzzy full text search)

I have a scenario where I have data for informal communications that I need to be able to search. Therefore I want full text search, but I also need to make sense of spelling mistakes. The question is: how do I take spelling mistakes into account in order to do fuzzy full text search?
This is very briefly discussed in Postgres Full Text Search is Good Enough where the article discusses misspelling.
So I have built a table of "documents", created indexes etc.
CREATE TABLE data (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
text TEXT NOT NULL);
I can create an additional column of type tsvector and index accordingly...
alter table data
add column search_index tsvector
generated always as (to_tsvector('english', coalesce(text, '')))
STORED;
create index search_index_idx on data using gin (search_index);
I have for example, some text where the data says "baloon", but someone may search "balloon", so I insert two rows (one deliberately misspelled)...
insert into data (text) values ('baloon');
insert into data (text) values ('balloon');
select * from data;
id | text | search_index
----+---------+--------------
1 | baloon | 'baloon':1
2 | balloon | 'balloon':1
... and perform full text searches against the data...
select * from data where search_index @@ plainto_tsquery('balloon');
id | text | search_index
----+---------+--------------
2 | balloon | 'balloon':1
(1 row)
But I don't get back results for the misspelled version "baloon"... So using the suggestion in the linked article I've built a lookup table of all the words in my lexicon as follows...
"you may obtain good results by appending the similar lexeme to your tsquery"
CREATE TABLE data_words AS SELECT word FROM ts_stat('SELECT to_tsvector(''simple'', text) FROM data');
CREATE INDEX data_words_idx ON data_words USING GIN (word gin_trgm_ops);
... and I can search for similar words which may have been misspelled
select word, similarity(word, 'balloon') as similarity from data_words where similarity(word, 'balloon') > 0.4 order by similarity(word, 'balloon');
word | similarity
---------+------------
baloon | 0.6666667
balloon | 1
... but how do I actually include misspelled words in my query?
Isn't this what the article above means?
select plainto_tsquery('balloon' || ' ' || (select string_agg(word, ' ') from data_words where similarity(word, 'balloon') > 0.4));
plainto_tsquery
----------------------------------
'balloon' & 'baloon' & 'balloon'
(1 row)
... plugged into an actual search, and I get no rows!
select * from data where text @@ plainto_tsquery('balloon' || ' ' || (select string_agg(word, ' ') from data_words where similarity(word, 'balloon') > 0.4));
select * from data where search_index @@ phraseto_tsquery('baloon balloon'); -- no rows returned
I'm not sure where I'm going wrong here. Can anyone shed any light? I feel like I'm super close to getting this going.
SELECT to_tsquery('balloon |' ||
string_agg(word, ' | ')
)
FROM data_words
WHERE similarity(word, 'balloon') > 0.4;
For anyone looking at this thread, the accepted answer by @laurenz-albe needed a slight modification for me:
It required single quotes around the argument values passed to the string_agg function, which can be done using the format function along with the %L placeholder.
This updated code worked for me:
SELECT to_tsquery('balloon |' ||
string_agg(format('%L', word), ' | ')
)
FROM data_words
WHERE similarity(word, 'balloon') > 0.4;
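To see where the similarity values in the question (0.6666667 for 'baloon' versus 'balloon') come from, here is a small Python sketch of trigram similarity. It follows the documented pg_trgm convention of lowercasing each word and padding it with two leading spaces and one trailing space. It is an approximation for illustration, not the extension's actual implementation, and the function names are made up.

```python
import re


def trigrams(text):
    """Return the set of trigrams of each word, padded the way pg_trgm pads."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("baloon", "balloon"), 7))
```

'baloon' has 7 trigrams and 'balloon' has 8, with 6 in common, so the score is 6 / 9. That is why a threshold of 0.4 picks up the misspelling.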

Truncating leading zero from the string in postgresql

I'm trying to strip leading zeros from the address. Example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what I have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance
assuming that the address always has some number/letter address (1234, 1a, 33B) followed by a sequence of 1 or more spaces followed by the part you want to strip leading zeroes...
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
What you are looking for is the back references in the regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m matches the beginning of a word (to avoid replacing inside words, e.g. in 101Th)
0+ matches all the leading zeros (it is not included in the capturing parentheses, so they are dropped)
\d+ captures the remaining digits
\w+ captures the remaining word characters
A word character can be any alphanumeric character or the underscore _.
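The same back-reference idea can be tried outside the database. This Python sketch mirrors the pattern above, using `\b` in place of PostgreSQL's `\m` (Python's `re` has no `\m`; because the next character must be the word character 0, `\b` here can only match at the start of a word). The function name is hypothetical.

```python
import re

# \b0+ matches the zeros at the start of a word; group 1 keeps the rest.
LEADING_ZEROS = re.compile(r"\b0+(\d+\w+)")


def strip_leading_zeros(address):
    """Drop leading zeros from tokens such as 06TH or 001St."""
    return LEADING_ZEROS.sub(r"\1", address)


for sample in ["1 06TH ST", "12 02ND AVE", "123 001St CT", "5 101Th ST"]:
    print(strip_leading_zeros(sample))
```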

Return first and last words in a person name - postgres

I have a list of names and I want to separate the first and last words in a person's name.
I was trying to use the "trim" function without success.
Can someone explain how could I do it?
table:
Names
Mary Johnson Angel Smith
Dinah Robertson Donald
Paul Blank Power Silver
Then I want to have as a result:
Names
Mary Smith
Dinah Donald
Paul Silver
Thanks,
You can do it simply with regular expressions, like:
substring(trim(name) FROM '^([^ ]+)') || ' ' || substring(trim(name) FROM '([^ ]+)$')
Of course, it only works if you are 100% sure that at least a first and a last name are always supplied. I'm not sure that is the case for everybody in the world. For instance: would that work for names in Chinese? I'm not sure, and I avoid making any assumptions about people's names. The best approach is simply to ask the user for two fields, one for the "name" and another for "How would you like to be called?".
Another approach, which takes advantage of Postgres string processing built-in functions:
SELECT split_part(name, ' ', 1) as first_token,
split_part(name, ' ', array_length(regexp_split_to_array(name, ' '), 1)) as last_token
FROM mytable
Here's how I extracted full names from emails with a dot in them, e.g. Jeremy.Thompson@abc.com
SELECT split_part(email, '.', 1) || ' ' || replace(split_part(email, '.', 2), '@abc','')
FROM people
Result:
Jeremy | Thompson
You can easily replace the dot with a space:
SELECT split_part(email, ' ', 1) || ' ' || replace(split_part(email, ' ', 2), '@abc','')
FROM people
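For comparison, the first-word/last-word logic from the answers above is easy to check outside the database. This is a minimal Python sketch, assuming names are separated by whitespace. The function name is hypothetical, and returning a single-word name unchanged is a design choice made here, not something the answers specify.

```python
def first_and_last(name):
    """Return the first and last whitespace-separated words of a name."""
    words = name.split()  # split() also collapses repeated spaces
    if not words:
        return ""
    if len(words) == 1:
        return words[0]
    return words[0] + " " + words[-1]


for full_name in ["Mary Johnson Angel Smith", "Dinah Robertson Donald", "Paul Blank Power Silver"]:
    print(first_and_last(full_name))
```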

How do you exclude a column from showing up if there is no value?

Question about a query I'm trying to write in SQL Server Management Studio 2008. I am pulling 2 rows. The first row being the header information, the second row being the information for a certain Line Item. Keep in mind, the actual header information reads as "Column 0, 1, 2, 3, 4,.... etc."
The data looks something like this:
ROW 1: Model # | Item Description| XS | S | M | L | XL|
ROW 2: 3241 | Gray Sweatshirt| | 20 | 20 | 30 | |
Basically this shows that there are 20 smalls, 20 mediums, and 30 larges of this particular item. There are no XS's or XL's.
I want to create a subquery that puts this information in one row, but at the same time excludes the sizes with a blank quantity, as shown under the XS and XL sizes.
I want it to look like this when all is said and done:
ROW 1: MODEL #| 3241 | ITEM DESCRIPTION | Gray Sweatshirt | S | 20 | M | 20 | L | 30 |
Notice there are no XS or XL's included. How do I make it so those columns do not appear?
Since you are not posting your query, nor your table structure, I guess it is with columns Id, Description, Size. If so, you could do this and just replace with your table and column names:
DECLARE @columns varchar(8000)
-- Build the column list from the distinct sizes that actually occur
SELECT @columns = COALESCE (@columns + ',[' + cast(Size as varchar) + ']', '[' + cast(Size as varchar) + ']' )
FROM (SELECT DISTINCT Size FROM YourTableName WHERE Size IS NOT NULL) AS Sizes
DECLARE @query varchar(8000) = 'SELECT Id, Description, '
+ @columns +'
FROM
(SELECT Id, Description, Size
FROM YourTableName) AS Source
PIVOT
(
COUNT(Size)
FOR Size IN ('+ @columns +')
) AS Pvt'
EXEC(@query)
Anyhow, I also agree with @MichaelFredickson. I have implemented this pivot solution, yet it is absolutely better to let the presentation layer take care of this after just pulling the raw data from SQL. Otherwise you would be processing the data twice: once in SQL to create the table, and again in the presentation layer when reading and displaying the values with your C#/VB/other code.
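A presentation-layer version of this can be very small. The Python sketch below assumes the header row and the item row arrive as two lists of equal length (the sample values are taken from the question). The function name is hypothetical.

```python
def flatten_non_empty(header, values):
    """Interleave header/value pairs, skipping pairs whose value is blank."""
    flat = []
    for column, value in zip(header, values):
        if value is not None and str(value).strip() != "":
            flat.extend([column, value])
    return flat


header = ["Model #", "Item Description", "XS", "S", "M", "L", "XL"]
row = ["3241", "Gray Sweatshirt", "", "20", "20", "30", ""]
print(" | ".join(flatten_non_empty(header, row)))
```

Since the set of non-empty sizes differs per item, doing this per row in application code avoids building dynamic SQL for each result shape.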