I can't understand the behaviour of btrim() - postgresql

I'm currently working with PostgreSQL and learned about the function btrim. I checked many websites for an explanation, but I still don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?

From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
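Your examples leave the string unchanged because the function strips from both ends only while the outermost character is one of the characters specified. The second argument is a set of characters, not a substring. xyxtrimyyx starts and ends with x, which is in neither 'yzz' nor 'y', so stripping stops immediately at both ends and the y characters further in are never reached.
Let's take a look at the first example (from the docs):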
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of SQL. I could be wrong about the exact way that PostgreSQL does this, but I am fairly certain from the docs that this is how it is done.
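To make the walkthrough above concrete, here is a minimal sketch in Python (not PostgreSQL's actual implementation). The helper name btrim is only borrowed for illustration, and the sketch assumes the second argument is treated as a set of characters, as the quoted definition describes.

```python
def btrim(text: str, characters: str = " ") -> str:
    """Rough emulation of btrim(): `characters` is a set, not a substring."""
    start, end = 0, len(text)
    # Strip from the front while the first remaining character is in the set.
    while start < end and text[start] in characters:
        start += 1
    # Strip from the back while the last remaining character is in the set.
    while end > start and text[end - 1] in characters:
        end -= 1
    return text[start:end]


print(btrim("xyxtrimyyx", "xyz"))  # trim
print(btrim("xyxtrimyyx", "y"))    # xyxtrimyyx (first and last character are x)
print(btrim("xyxtrimyyx", "yzz"))  # xyxtrimyyx (same reason)
```

With 'y' or 'yzz' both loops stop at once, because the outermost x is not in the set.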

Related

Extracting Portions of String

I have a field with the following types of string
X000233756_9981900025_201901_EUR_/
Firstly, I need to take the characters to the left of the first _
Secondly I need to take the characters between the first and 2nd _
First _ is CHARINDEX('_',[Line_Item_Text],1) AS Position_1
Second _ is CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) AS Position_2
I was then expecting to be able to do
left([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)-1) AS Data_1
Substring([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1),CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) - CHARINDEX('_',[Line_Item_Text],1)+1)) AS Data_2"
Which should give me
X000233756
9981900025
But getting errors with incorrect number of functions when I start adding and subtracting from CHARINDEX Function.
Any ideas where I am going wrong?
TIA
Geoff
Actually, using the base string functions here is going to be an ugly nightmare. You might find that STRING_SPLIT along with some clever logic might be easier:
SELECT value
FROM STRING_SPLIT('X000233756_9981900025_201901_EUR_', '_')
WHERE LEN(value) > 6 AND NOT value LIKE '[A-Z]%';
This answer assumes that the third and fourth components are always a 6-digit date and a 3-letter currency code, and that the first (but not the second) component always starts with a letter. Under those assumptions the query returns only the second component, 9981900025.
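For comparison, here is a small Python sketch of the position arithmetic from the question, using the sample value given there. The variable names are illustrative only. Python's find is 0-based where CHARINDEX is 1-based, but the length of the second piece is the same either way: Position_2 - Position_1 - 1.

```python
line_item_text = "X000233756_9981900025_201901_EUR_/"

# Positions of the first and second underscore (0-based here).
position_1 = line_item_text.find("_")
position_2 = line_item_text.find("_", position_1 + 1)

# Everything left of the first underscore.
data_1 = line_item_text[:position_1]
# Everything between the first and second underscore;
# its length is position_2 - position_1 - 1.
data_2 = line_item_text[position_1 + 1:position_2]

print(data_1)  # X000233756
print(data_2)  # 9981900025

# Splitting on the delimiter gives the same two pieces more directly.
parts = line_item_text.split("_")
print(parts[0], parts[1])  # X000233756 9981900025
```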

Is it correct to use Word Joiner (U+2060) in the same word?

In Bangla, Hosonto (U+09CD) is used to create a ligature, which joins adjacent letters. For example ক্ক is created using ক + ্ + ক. But sometimes we need a non-joining Hosonto (ক্‌ক). To make it possible, traditionally we use a Zero-width non-joiner (‌‌‌‌‌U+200C‌).
The problem with ZWNJ is that when the line is too long and line wrapping occurs, the word is broken across two lines. To keep the word whole, I need a character that is something like a “zero-width non-breaking non-joiner”, but I don’t see such a character in Unicode. So I think Word Joiner (U+2060) is the best option.
To me, Word Joiner sounds like “joins two words”. But in my case, I need to join two parts of a single word. So, the question is, is it correct to use Word Joiner here?
U+200C ZERO WIDTH NON-JOINER has no effect on line breaking. Its absence or presence does not change where line wrapping can occur. If inserting a ZWNJ within a word causes that word to be broken across lines, then whatever application you are using to view your text does not implement the standard correctly.
ZWNJ is the only correct character for your purposes. More than that, using U+2060 WORD JOINER could in fact lead to inconsistent results. Much like ZWNJ does not affect line breaks, WJ is not supposed to affect joining behaviour (it is defined as “transparent” in that regard). While the standard doesn’t explicitly mention cases like this to the best of my knowledge, one could reasonably argue that inserting a WJ between the two letters in your example should not change the way they are displayed.
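As a small illustration of the sequences being discussed, the following Python sketch builds both forms of the example from the question and prints the code points involved. It only inspects the characters. How they render and wrap depends on the application and font, which this sketch does not test.

```python
import unicodedata

KA = "\u0995"           # Bangla letter ka
HOSONTO = "\u09CD"      # Unicode names this BENGALI SIGN VIRAMA
ZWNJ = "\u200C"         # zero width non-joiner
WORD_JOINER = "\u2060"  # word joiner

joined = KA + HOSONTO + KA             # forms the ligature
non_joined = KA + HOSONTO + ZWNJ + KA  # hosonto stays visible

for ch in non_joined:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(f"U+{ord(WORD_JOINER):04X} {unicodedata.name(WORD_JOINER)}")
```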

PATINDEX incorrect result when looking for dash character "-"

This simple example shows the issue I've run into, but I don't understand why...
I'm testing for the location of the first character that is either a lower or upper case letter, a single dash, or a period in a string parameter passed to me.
These two pattern matches appear to check the same thing, and yet run this code yourself and it will print a 0 then a 3:
PRINT PATINDEX ( '%[a-z,A-Z,-,.]%', '16-82')
PRINT PATINDEX ( '%[-,a-z,A-Z,.]%', '16-82')
I don't understand why it works only if the dash character is the first one we check for.
Is this a bug? Or working as designed and I missed something... I'm using SQL Server 2016, but I don't think that matters.
A dash within a character group may play either of the two roles:
It may denote the dash itself, like it does in the expression [-abc]
It may denote the "everything in between" operator, like it does in the expression [a-z].
In your particular example, the character group [a-z,A-Z,-,.] denotes the following:
Everything from a to z
Comma ,
Everything from A to Z
Everything from , to , (i.e. just the comma again).
Dot .
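So the dash itself is not in the set, which is why the first pattern returns 0. The commas are not separators either; each one is just a literal comma. You probably wanted to write [-a-zA-Z.]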
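The same range rule can be tried outside SQL Server. The sketch below uses Python's re module, whose bracket expressions treat a dash between two characters as a range in the same way. It is an analogy, not T-SQL, and the helper first_match_position is a made-up name that mimics PATINDEX's 1-based result (0 when nothing matches).

```python
import re


def first_match_position(pattern: str, text: str) -> int:
    """Return the 1-based position of the first match, or 0 if none."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0


# ",-," is read as the range from comma to comma, so "-" is not in the set.
print(first_match_position(r"[a-z,A-Z,-,.]", "16-82"))  # 0
# A leading dash is taken literally.
print(first_match_position(r"[-,a-z,A-Z,.]", "16-82"))  # 3
# The cleaned-up set without the stray commas.
print(first_match_position(r"[-a-zA-Z.]", "16-82"))     # 3
```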

How do I use pg_trgm to be more permissible

I used pg_trgm to check string matches and I am pretty happy with the results, but it does not work exactly the way I want. I want a search for "poduto" to find "produtos" (the r is missing), and a search for "sofáa" to find "sofa". I am using PostgreSQL 9.6.
It does find "vermelho" when I type "vermelo" (the h is missing), and it does find "sofa" when I type "sof". It seems that only some letters in the middle can be left out, while a final letter can always be missing. I want to be able to miss any letter in the middle of the word, and also to make "two mistakes", as in the case of sofáa and sofá (I used an accent and one additional "a").
The solution is to lower pg_trgm.similarity_threshold (or pg_trgm.word_similarity_threshold if you are using <% or %>).
Then words with lower similarity will also be found.
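To see why the threshold matters, here is a rough Python sketch of trigram similarity for single words. It assumes the scheme described in the pg_trgm documentation (lowercase the word, pad it with two spaces in front and one behind, then divide shared trigrams by total distinct trigrams). It is a simplification: the function names are made up, and values from a real PostgreSQL installation may differ, for example for accented or non-alphanumeric characters.

```python
def trigrams(word: str) -> set:
    """Distinct three-character windows of the padded, lowercased word."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(round(similarity("vermelo", "vermelho"), 3))  # 0.545
print(round(similarity("poduto", "produtos"), 3))   # 0.333
```

A letter missing near the start of a short word removes several trigrams at once, so the score drops faster than for a letter missing near the end. Lowering the threshold lets those lower-scoring pairs through.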

Strip excess padding from a string

I asked a question earlier today and got a really quick answer from llbrink. I really should have asked that question before I spent several hours trying to find an answer.
So - here's another question that I have never found an answer for (although I have created a work-around which seems very kludgy).
My AHK program asks the user for a login name. The program then compares the login name with an existing list of names in a file.
The login name in the file may contain spaces, but there are never spaces at the beginning of the name. When the user enters the name, he may include spaces at the beginning. This means that when my program compares the name with those in the file, it can not find a match (because of the extra spaces).
I want to find a way of stripping the spaces from the beginning of the input.
My work-around has been to split the input string into an array (which does ignore leading spaces) and then use the first element of the array. This is my code:
name := DoStrip(name)
DoStrip(xyz) ; strip leading and trailing spaces from string
{
StringSplit, out, xyz, `,, %A_Space%
Return out1
}
This seems to be a very laboured way to do it - is there a better way ?
I don't see a problem with your example if it works in all cases.
There is a much simpler way, though: just use AutoTrim, which works like this.
AutoTrim, On ; not required it is on by default
my_variable = %my_variable%
There are also many other ways to trim a string in AutoHotkey,
which you can combine into something useful.
You can also use the built-in functions LTrim() and RTrim() to remove whitespace at the beginning and at the end of a string, or Trim() for both. (#LTrim is a directive for continuation sections, not a string function.)