PostgreSQL function to sort characters within a string

Is there a PostgreSQL function, preferably a native one, that can sort the characters of a string, for example turning 'banana' into 'aaabnn'?
Algorithmic efficiency of sorting is not of much importance since words will never be too long. However, database join efficiency is of some but not critical importance.

There is no native function that does this, but you can use regexp_split_to_table like this:
select theword
from (select regexp_split_to_table('banana',E'(?=.)') theword) tab
order by theword;
The result will be:
theword
a
a
a
b
n
n
The pattern (?=.) is a zero-width lookahead, so the string is split in front of every character without consuming any of them. Spaces come through as rows of their own. If the word can contain spaces and you don't want them, split on E'(\\s*)' instead, which matches any run of whitespace (or the empty string between two characters) and discards it.
This is explained in the PostgreSQL documentation, in the section on regexp_split_to_table.
EDIT: The E before the string marks an escape string constant, in which backslash escape sequences are processed (hence the doubled backslash). See also: What's the "E" before a Postgres string?
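The query above returns one row per character. To get back the single string 'aaabnn', the rows have to be aggregated again; in PostgreSQL, string_agg(theword, '' ORDER BY theword) over the same subquery should do that. The Python sketch below only models the split, sort and join steps. Note that Python sorts by code point, whereas ORDER BY follows the column's collation, so mixed-case or accented input may come out in a different order.

```python
def sort_chars(word: str, drop_whitespace: bool = False) -> str:
    """Model of split-per-character, ORDER BY, then re-aggregate."""
    chars = list(word)  # one element per character, like splitting on (?=.)
    if drop_whitespace:
        # Mimics splitting on \s* instead, which throws the whitespace away.
        chars = [c for c in chars if not c.isspace()]
    return "".join(sorted(chars))  # sort, then glue back into one string


print(sort_chars("banana"))  # aaabnn
```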

Related

If there's even one non-English character in any entry in a column, I want 'TRUE' in another column

The column has many kinds of entries: English characters mixed with non-English characters, English characters with numbers/symbols, non-English characters with numbers/symbols, etc. If an entry contains even one non-English character, I want 'TRUE' in the adjacent column.
SELECT * 
FROM companies
WHERE name LIKE '%[a-z]%';
This code doesn't work.
You can achieve this using regular expressions. The regular expression below matches any character that is not a printable ASCII character, tab (\t), newline/line feed (\n) or carriage return (\r).
SELECT
*,
name ~ '[^\t\n\r\x20-\x7E]' AS has_bad_chars
FROM companies;
This flags any character outside printable ASCII (space through ~) other than tab, newline and carriage return, so accented letters such as é are reported, while A-Z, a-z, 0-9, spaces and ASCII punctuation are not.
Working from the assumption that the adjacent column you mention is defined in the table as has_non_english_char boolean, try:
update companies
set has_non_english_char = name ~ '[^A-Za-z0-9_.,/$]';
Alternatively, look into character classes, or add more characters to the bracket expression above. Note that the bracket expression must list every 'English' character you want to allow.
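To check which names the first pattern flags before running an UPDATE, the same negated bracket expression can be tried outside the database. This Python sketch assumes Python's re module reads \t, \n, \r and \x20-\x7E inside brackets the same way PostgreSQL's regex engine does, which holds for this simple class; the sample names are made up.

```python
import re

# Same negated bracket expression as the first query: anything that is not
# printable ASCII (space..~), tab, newline or carriage return counts as "bad".
NON_ASCII = re.compile(r"[^\t\n\r\x20-\x7E]")


def has_bad_chars(name: str) -> bool:
    return NON_ASCII.search(name) is not None


for sample in ["abc.com - \"Something", "Café", "a\tb"]:
    print(repr(sample), has_bad_chars(sample))
```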
Perhaps this is not the preferred SO protocol, but I think another answer may be better than just expanding a previous one. If not, community, please forgive me.
Run:
with companies as
( select * from (values ('abc.com:;'), ('abc.com - "Something')) as c(name))
SELECT 'My RE',name,name ~ '[^A-Za-z0-9&_`!@#$^&*()_+=\|\][{’\;:"<.,.}? -]' has_bad_chars, 'f' desired FROM companies
union
SELECT 'Your RE',name,name ~ '[^A-Za-z0-9&_`!@#$^&*()_+=\|][{’;:""<.,.}?-]', 'f' from companies
order by 2,1 desc;
Let's examine the REs themselves and see the difference:
My RE   [^A-Za-z0-9&_`!@#$^&*()_+=\|\][{’\;:""<.,.}? -]
Your RE [^A-Za-z0-9&_`!@#$^&*()_+=\|][{’;:""<.,.}?-]
                                   ^^^
Notice the difference at the indicated position: you have '|]' where I have '|\]'. I also have a space near the end.
See the regular expression reference from @cpburnz earlier, in particular section "9.7.3.2. Bracket Expressions".
Your RE breaks down to '[^...][...]', i.e. two separate bracket expressions. That tells the RE engine to find 'any character not in the first bracketed expression' followed immediately by 'any character that is in the second bracketed expression'.
The difference is that I escaped the right bracket ], which removes its special meaning in the RE and makes it just another character in the set. This is the nature of REs: individual characters can make all the difference. If you are going to do much of this kind of work, study REs closely.
Good luck finding the exact RE you need; it's out there, you just need to work with it until you find it.
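The effect of the unescaped ] can be reproduced with a cut-down pair of patterns. The Python sketch below uses a hypothetical, much shorter allowed set than the REs above. Python's re, like PostgreSQL's AREs, treats \] inside brackets as a literal ]; [ is escaped too, to stay clear of Python's nested-set warning. Both patterns are run against the two test strings from the query plus one accented name.

```python
import re

# One negated class: the escaped \] keeps the bracket expression open.
ESCAPED = re.compile(r'[^A-Za-z0-9|\]\[.:;" -]')
# Two classes: "[^A-Za-z0-9|]" followed by '[.:;"-]', as in "Your RE".
UNESCAPED = re.compile(r'[^A-Za-z0-9|][.:;"-]')

for name in ["abc.com:;", 'abc.com - "Something', "café"]:
    print(repr(name), bool(ESCAPED.search(name)), bool(UNESCAPED.search(name)))
```

With the escaped version neither test string is flagged. The unescaped version flags both (':;' and ' -' each happen to be "a non-allowed character followed by one from the second set"), and it misses 'café' because é is the last character, so nothing follows it for the second bracket expression to match.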

Office.js: search a Word document using a regular expression

I want to search for strings like "number 1", "number 152" or "number 36985".
In all of these strings "number " is constant, but the digits change and can be of any length.
I tried the search option with wildcards, but it doesn't seem to work.
Basic regex operators like + don't seem to work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This only selects up to one digit: if the string is 'number 12345', it only matches 'number 1' instead of the whole string.
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be something else, depending on the list separator set in Windows or on the Mac. My European locale uses a semicolon; the United States would use a comma, for example.)
"number [0-9]{1;100}"
The 100 is an arbitrary number I chose for the maximum number of repeats of the search term just before it. Depending on how long you expect a number to be, this can be much smaller...
The logic of the search text is: number is a literal; the valid range of characters following the literal are 0 through 9; there may be one to one hundred of these characters - anything in that range is a match.
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
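For the extract-the-text route, the Word wildcard "number [0-9]{1;100}" corresponds roughly to the regex below (a comma instead of the locale's list separator, and \b standing in for Word's < and > word anchors). This is a minimal Python sketch that assumes the document text has already been pulled out as a plain string; how the text is extracted is not shown.

```python
import re
from typing import List

# Rough regex equivalent of the Word wildcard "number [0-9]{1;100}".
NUMBER_RE = re.compile(r"\bnumber [0-9]{1,100}\b")


def find_numbers(text: str) -> List[str]:
    # Runs on plain extracted text, so any Word formatting is already gone.
    return NUMBER_RE.findall(text)


print(find_numbers("see number 1, number 152 and number 36985."))
```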
Try putting < and > around your search string to mark the beginning and end of a word. This works for me: '<number [1-9]*>'. So does '<number [1-9]@>', which is probably what you want (use [0-9] instead of [1-9] if the digits can include 0). Note that in Word wildcards, @ is used where + is used in other RegEx systems: one or more of the preceding character or expression.

I can't understand the behaviour of btrim()

I'm currently working with PostgreSQL and learned about the btrim function. I checked many websites for an explanation, but I still don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
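Your example doesn't remove the y because the function only strips characters from the two ends of the string, and it stops as soon as it meets a character that is not in the set. Your string starts and ends with x, which is in neither 'yzz' nor 'y', so nothing is stripped at all.
Let's take a look at the first example (from the docs):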
btrim('xyxtrimyyx', 'xyz')
This returns trim because it walks through xyxtrimyyx from the front until it reaches t, which is not in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While the last letter is one of x, y or z, remove it.
We stop at m, which is not in the set, so we are left with trim.
Note: I have never worked with any form of SQL, so I could be wrong about exactly how PostgreSQL does this, but I am fairly certain from the docs that this is how it works.
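The two-pass procedure described above can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not PostgreSQL's source; for this purpose it behaves like Python's own str.strip(characters).

```python
def btrim(string: str, characters: str = " ") -> str:
    """Model of btrim(): strip any characters in `characters` from both ends."""
    start, end = 0, len(string)
    # Front pass: advance while the current character is in the set.
    while start < end and string[start] in characters:
        start += 1
    # Back pass: walk back from the end the same way.
    while end > start and string[end - 1] in characters:
        end -= 1
    return string[start:end]


print(btrim("xyxtrimyyx", "xyz"))  # trim
print(btrim("xyxtrimyyx", "y"))    # xyxtrimyyx (both ends are x, not y)
```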

T-SQL Procedure parameter name

I have to create a procedure with the same parameter names as some Excel columns. Some look like 'xxx/xxx' or 'xxx - xxx'. Is there any workaround that lets me name stored procedure parameters like this?
Forward slash (/) or dash (-) are not allowed in variable names
According to http://msdn.microsoft.com/en-us/library/ms175874.aspx, that means that the allowed characters are:
Letters as defined in the Unicode Standard 3.2. The Unicode
definition of letters includes Latin characters from a through z,
from A through Z, and also letter characters from other languages.
Decimal numbers from either Basic Latin or other national scripts.
The at sign (@), dollar sign ($), number sign (#), or underscore (_).
Okay, first of all, why would you ever want to use special characters? That is like saying "I want to fry toast underwater with electricity, why can't I have an outlet that allows that?" Special characters denote special things, so many engines, not just in SQL but in most languages, will not allow reserved characters in variable names. The best you can do is name the parameters with '_' in place of the special character and then replace it afterwards, when you echo the names out. The placeholder name '@(something)' is really arbitrary and could be @X or @LookAtMe. Its type is important, because it forms a contract that must be fulfilled for execution, but the name is really just for hooking things up. Having said that, if you really must have these names echoed out, you could do something like this:
-- Parameter names may not contain / or -, so use _ as a stand-in.
CREATE PROC pSimpleParam @My_Param INT
AS
SELECT @My_Param
GO
-- Echo the parameter names back with _ swapped for -.
ALTER PROC pSimpleParam @My_Param INT
AS
SELECT
pr.name AS ParameterName
, REPLACE(pr.name, '_', '-') AS AlteredParameterName
FROM sys.procedures p
INNER JOIN sys.parameters pr ON pr.object_id = p.object_id
AND p.name = 'pSimpleParam'
GO
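If you generate the procedure from the Excel headers, the '_'-then-REPLACE trick has a catch: it cannot be reversed reliably, since 'xxx/xxx' and 'xxx - xxx' both collapse to the same name. Below is a minimal Python sketch of a hypothetical helper that builds the parameter names and keeps a lookup back to the original headers. It only allows ASCII letters, digits and _, which is stricter than the Unicode letters SQL Server accepts.

```python
import re
from typing import Dict, Iterable


def to_param_name(column: str) -> str:
    """Hypothetical helper: turn an Excel header into a T-SQL-safe parameter name."""
    # Collapse every run of characters other than ASCII letters, digits and "_" into one "_".
    body = re.sub(r"[^0-9A-Za-z_]+", "_", column).strip("_")
    return "@" + body


def build_mapping(columns: Iterable[str]) -> Dict[str, str]:
    """Map each generated parameter name back to its original header."""
    mapping: Dict[str, str] = {}
    for col in columns:
        name = to_param_name(col)
        if name in mapping:
            # Two headers collapsed to the same name, e.g. 'xxx/xxx' and 'xxx - xxx'.
            raise ValueError(f"{col!r} and {mapping[name]!r} both map to {name}")
        mapping[name] = col
    return mapping


print(build_mapping(["Price/Unit", "Qty"]))
```

With such a mapping you can echo the original header for each parameter instead of relying on REPLACE(pr.name, '_', '-').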

Selecting words out of table which sound similar

I read an interesting article about English and phonetics - and would like to see if my newfound knowledge can be applied in TSQL to generate a fuzzy result set. In one of my applications, there is a table containing words, which I extracted from a word list. It is literally a one-column table -
Word |
------
A
An
Apple
...
their
there
Is there a built-in function in SQL Server to select a word which sounds the same even though it is spelled differently? (The globalization settings are en-ZA, last time I checked.)
SELECT Word FROM WordTable WHERE Word = <word that sounds similar>
SoundEx()
SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken.
Difference()
Returns an integer value that indicates the difference between the SOUNDEX values of two character expressions.
DECLARE @word_that_sounds_similar varchar(100) = 'there';
-- DIFFERENCE() applies SOUNDEX() itself, so pass it the raw strings.
SELECT word
, SOUNDEX(word) AS word_soundex
, SOUNDEX(@word_that_sounds_similar) AS target_soundex
, DIFFERENCE(word, @word_that_sounds_similar) AS how_similar
FROM wordtable
WHERE DIFFERENCE(word, @word_that_sounds_similar) >= 3 /* quite close! */
The value returned by DIFFERENCE() indicates how similar the two words are, on a scale of 0 to 4.
A value of 4 indicates a strong match and a value of 0 means slim-to-no match.
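To see why 'their' and 'there' come out as a strong match, here is a Python model of the standard American Soundex algorithm. It is an assumption that SQL Server's SOUNDEX follows these rules exactly; it may differ in edge cases. similarity() is a hypothetical, simplified stand-in for DIFFERENCE() that just counts positions where the two codes agree. It is not SQL Server's actual DIFFERENCE algorithm.

```python
# Standard American Soundex letter groups (assumed; SQL Server may differ in edge cases).
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()        # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")  # ...but its code still suppresses a repeat
    for ch in letters[1:]:
        code = _CODES.get(ch, "")      # vowels, y, h and w have no code
        if code and code != prev:
            result += code
        if ch not in "hw":             # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]        # pad or truncate to four characters


def similarity(a: str, b: str) -> int:
    """Hypothetical stand-in for DIFFERENCE(): positions where the codes agree (0-4)."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


print(soundex("their"), soundex("there"), similarity("their", "there"))
```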