How to format a number with a comma every three integer digits in Presto - number-formatting

I want to format a number with a comma after every three integer digits, for example 12345.894 --> 12,345.894. I have no clue how to do it. I tried the following, but with no luck:
format('%,.2f', 12345.894)
The above code rounds the decimal part to 2 digits, so it returns 12,345.89. In my case, I want to keep all the decimals: 12,345.894.

You could use regular expression:
SELECT regexp_replace(cast(123456.8943 as VARCHAR), '(\d)(?=(\d{3})+\.)', '$1,')
Results:
-------
123,456.8943
Some explanation:
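First we cast to varchar, because the regex functions work on strings.
The regex says: match any digit (\d) that is followed by one or more (+) groups of three digits (\d{3}) ending right before the dot (\.). The lookahead (?=...) only checks what follows and does not consume it. Each matched digit is replaced by itself ($1) followed by a comma. Note that the pattern relies on the dot, so a value with no decimal point is left unchanged.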
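To try the pattern outside Presto, here is a minimal Python sketch of the same substitution. Python is not part of the original answer, the helper name add_thousands_separators is hypothetical, and Python's re module writes the backreference as \1 rather than $1.

```python
import re

# Same pattern as the Presto answer: a digit followed by one or more
# complete groups of three digits that end right before the dot.
PATTERN = re.compile(r"(\d)(?=(\d{3})+\.)")


def add_thousands_separators(text: str) -> str:
    """Insert commas into the integer part of a decimal string (hypothetical helper)."""
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    print(add_thousands_separators("123456.8943"))
    print(add_thousands_separators("12345.894"))
    # No dot in the input, so the lookahead never succeeds and nothing changes.
    print(add_thousands_separators("1234567"))
```

The digits after the dot are never touched, because no dot follows them and the lookahead cannot succeed there.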

If you want 3 decimal places, you can use %,.3f as the format string:
presto> select format('%,.3f', 12345.894);
_col0
------------
12,345.894
(1 row)

Related

How to replace a character using sed with different lengths in preceding string

I have a file in which I want to replace the "_" string with "-" in cases where it makes up a part of my gene name. Examples of the gene names and my intended output are:
aa1c1_123 -> aa1c1-123
aa1c2_456 -> aa1c2-456
aa1c10_789 -> aa1c10-789
In essence, the first four characters are fixed, followed by 1 or 2 characters depending on the chromosome, an underscore, and then the remainder of the gene ID, which can vary in length and characters. Importantly, the gene information column also contains other strings with underscores (e.g. "gene_id", "transcript_id", "five_prime_utr"), so using sed -i.bak 's/_/-/g' file.gtf
can't be done.
Perhaps not the most elegant way, but this should work:
sed -i.bak 's/\([0-9a-z]\{4\}[0-9][0-9]\?\)_/\1-/g' file.gtf
i.e. capture a group (referenced by \1 in the substitution) of 4 characters consisting of lowercase letters and digits, followed by exactly one digit and optionally a second digit, all of which is followed by an underscore; if found, replace it with the group's content and a dash. This should exclude your other occurrences, where the underscore is preceded by a letter rather than a digit. Note that \? is a GNU sed extension; \{0,1\} is the portable equivalent.
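As a quick way to check the expression against a sample line before running sed on the real file, the same substitution can be sketched in Python. The function name fix_gene_names and the sample line are made up for illustration; they are not taken from the question's file.

```python
import re

# Python spelling of the sed expression: four lowercase letters/digits,
# one digit, an optional second digit, then the underscore to replace.
GENE_UNDERSCORE = re.compile(r"([0-9a-z]{4}[0-9][0-9]?)_")


def fix_gene_names(line: str) -> str:
    """Replace the underscore in gene names only (hypothetical helper)."""
    return GENE_UNDERSCORE.sub(r"\1-", line)


if __name__ == "__main__":
    # Invented sample line mixing gene names with other underscored keys.
    sample = 'gene_id "aa1c10_789"; transcript_id "aa1c2_456.1"; five_prime_utr'
    print(fix_gene_names(sample))
```

Keys such as gene_id are left alone because the character before their underscore is a letter, while the pattern requires a digit there.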

I have a question about matching the first 5 digits in a string in a regular expression

I would like to limit a string of numbers to 14 digits, and require that the first 5 digits are: 26173. The rest of the digits can be any number between 1-9. Example: 26173000740380.
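The regexp 26173\d{9} specifies that the first 5 characters must be 26173 and the following 9 characters must be any decimal digit (\d).
If the remaining 9 characters must be between 1 and 9, you could use 26173[1-9]{9}, although that would reject your example, which contains zeros. Both examples use Java regexp syntax. To limit the whole string to 14 digits, match against the entire input (String.matches in Java does this) or anchor the pattern as ^26173\d{9}$.
Regexp planet is a good site for testing regular expressions: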
https://www.regexplanet.com/advanced/java/index.html
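The difference between the two patterns can be checked with a short sketch. It uses Python rather than Java, so the names here (is_valid, ANY_DIGITS, NONZERO_DIGITS) are illustrative only; fullmatch plays the role of matching the entire input, and re.ASCII restricts \d to 0-9.

```python
import re

# \d{9}: any nine decimal digits after the fixed prefix.
ANY_DIGITS = re.compile(r"26173\d{9}", re.ASCII)
# [1-9]{9}: nine digits, none of which may be zero.
NONZERO_DIGITS = re.compile(r"26173[1-9]{9}")


def is_valid(candidate: str, pattern: re.Pattern = ANY_DIGITS) -> bool:
    """True only if the whole string matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(is_valid("26173000740380"))                  # example from the question
    print(is_valid("26173000740380", NONZERO_DIGITS))  # contains zeros
    print(is_valid("261730007403801"))                 # 15 digits
```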

Inserting hyphens into length limited String using regex

Within a Swift project I have some regex which at present ensures that an input can only be 10 characters long:
"^[\\da-zA-Z]{10,10}$"
I need to tweak this slightly, so that the string which this is working on will have the below format:
#####-####
i.e., inserting a character after the fifth character.
So far I have tried combining what I have with some other regex, however this is incorrect and I can't figure out what I need to do differently to make this work:
"^[\\da-zA-Z]{10,10}$(.{5}),$1-$2"
If you have a string of 10 characters and you want to replace the sixth character with a hyphen, you could use 2 capturing groups.
Capture the first 5 characters in the first group, then match the sixth character which you want to replace and capture the last 4 in the second group.
^([\\da-zA-Z]{5})[\\da-zA-Z]([\\da-zA-Z]{4})$
In the replacement, use $1-$2, which gives 10 characters in total, as in your desired pattern #####-####.
Note that {10,10} can be written as {10}.
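The two-group replacement can be sketched as follows. This is Python rather than Swift, so the backreferences are written \1 and \2 instead of $1 and $2, and the helper name hyphenate is hypothetical.

```python
import re

# Group 1: first five characters; then the sixth character (dropped);
# group 2: the last four characters.
PATTERN = re.compile(r"^([\da-zA-Z]{5})[\da-zA-Z]([\da-zA-Z]{4})$")


def hyphenate(code: str) -> str:
    """Replace the sixth character of a 10-character code with a hyphen."""
    return PATTERN.sub(r"\1-\2", code)


if __name__ == "__main__":
    print(hyphenate("abcde1fghi"))
    # Inputs that are not exactly 10 allowed characters are returned unchanged.
    print(hyphenate("abc"))
```

Note that this discards the original sixth character, which is what keeps the result at 10 characters.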

PATINDEX does not recognize dot and comma

I have a column that should contain phone numbers but it contains whatever the user wanted. I need to create an update to remove all the characters after an invalid character.
To do this I am using a pattern with PATINDEX, PATINDEX('%[^0-9+-/()" "]%', [MobilNr]), and it seemed to work until I had some numbers such as +1235, 36446, where to my surprise the result is 0 instead of 6. It also returns 0 if the number contains a dot.
Does PATINDEX ignore dot (".") and comma (",")? Are there other characters that PATINDEX will ignore?
It's not that PATINDEX ignores the comma and the dot, it's your pattern that created this problem.
With PATINDEX, the hyphen char (-) has a special meaning - it's in fact an operator that denotes an inclusive range - like 0-9 denotes all digits between 0 and 9 - so when you do +-/ it means all the chars between + and / (inclusive, of course). The comma and dot chars are within this range, that's why you get this result.
Fixing the pattern is easy: move the hyphen to the end of the bracketed set, where it cannot form a range and is treated as a literal character:
SELECT PATINDEX('%[^0-9/()" "+-]%', '+1235, 36446') -- Result: 6
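The range effect is easy to see by listing the characters between + and /. The sketch below uses Python, whose character classes treat a hyphen between two characters as a range in the same way; it orders characters by code point, which is an assumption here, since SQL Server's ordering depends on the collation. The helper first_invalid is hypothetical and only mimics PATINDEX's convention of a 1-based position, or 0 when nothing matches.

```python
import re

# Every character from '+' to '/' in code-point order.
in_range = [chr(c) for c in range(ord("+"), ord("/") + 1)]


def first_invalid(char_class: str, text: str) -> int:
    """1-based position of the first match of char_class in text, or 0."""
    match = re.search(char_class, text)
    return match.start() + 1 if match else 0


if __name__ == "__main__":
    print(in_range)
    # Hyphen between '+' and '/': comma and dot fall inside the range.
    print(first_invalid(r'[^0-9+-/()" ]', "+1235, 36446"))
    # Hyphen moved to the end: it is a literal, so the comma is now invalid.
    print(first_invalid(r'[^0-9/()" +-]', "+1235, 36446"))
```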

Regex for currency - Exclude commas from the count limit

I am using the following regex in my app:
^(([0-9|(\\,)]{0,10})?)?(\\.[0-9]{0,2})?$
So it allows 10 characters before the decimal point and 2 characters after it.
But I am adding one more piece of functionality: formatting the text field as currency while typing. So if I have 1234567, it becomes 1,234,567 after formatting. The regex then fails because the limit counts 10 characters, commas included, rather than 10 digits. Ideally, the regex should ignore the commas when counting to 10.
I tried this too: ^(([0-9|(\\,)]{0,13})?)?(\\.[0-9]{0,2})?$ but it doesn't seem like the right approach.
Can anyone help me get a proper regex instead of using this tweak?
You may use
"^(?:,*[0-9]){0,10}(?:\\.[0-9]{0,2})?$"
Or, if there must be a digit after . in the fractional part, use
"^(?:,*[0-9]){0,10}(?:\\.[0-9]{1,2})?$"
(Both are written as string literals, with the backslash doubled as in your original pattern.) The (?:,*[0-9]){0,10} part is what does the job: it matches zero or more , chars followed by a single digit, 0 to 10 times. If , can also appear right before ., add ,* after (?:,*[0-9]){0,10}.
Details
^ - start of string
(?:,*[0-9]){0,10} - 0 to 10 occurrences of 0+ commas followed with a digit
(?:\.[0-9]{0,2})? - an optional sequence of:
\. - a period
[0-9]{0,2} - 0 to 2 digits (if there must be a digit after . use [0-9]{1,2})
$ - end of string.
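To see how the digit limit behaves with commas in place, here is a small Python sketch built on the first pattern. The helper name is_valid_amount and the sample inputs are invented for illustration; the app in the question may use a different regex engine.

```python
import re

# Up to 10 digits, each optionally preceded by commas, then an optional
# fractional part of up to 2 digits.
CURRENCY = re.compile(r"^(?:,*[0-9]){0,10}(?:\.[0-9]{0,2})?$")


def is_valid_amount(text: str) -> bool:
    """True if text has at most 10 integer digits and 2 decimals (hypothetical helper)."""
    return CURRENCY.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1,234,567", "1,234,567,890.12", "12,345,678,901", "1234567890.123"]:
        print(sample, is_valid_amount(sample))
```

The pattern only counts digits; it does not check that commas sit at thousands boundaries, so an input like 1,,2 is also accepted.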