How do I encode the percent symbol in a DTD? - unicode

How do I encode the percent sign in a DTD?
The following declaration is invalid because of the '%' sign:
<!ATTLIST disk unit_of_measurement (TB|TiB|GB|GiB|MB|MiB|kB|KiB|B|s|percent|percentage|%) #REQUIRED>
I've also tested the &#37; syntax, with no better success.
What is the trick to declare the percent sign as a valid value for the attribute unit_of_measurement?

Related

Postgresql 15 - syntax sensitivity

I've noticed that in older versions of PG (for example, 13),
when I had a query like:
select 1 where 1=1and 2=2
all was OK
but when I try this in PG 15 I get the error: trailing junk after numeric literal at or near "1a"
Has something changed, or is there maybe a new configuration option that makes it more strict?
This was changed in v 15.0.
From the release notes:
Prevent numeric literals from having non-numeric trailing characters (Peter Eisentraut)
Previously, query text like 123abc would be interpreted as 123 followed by a separate token abc.
and similarly:
Adjust JSON numeric literal processing to match the SQL/JSON-standard (Peter Eisentraut)
This accepts numeric formats like .1 and 1., and disallows trailing junk after numeric literals, like 1.type().

postgres: how to count multibyte emoji strings display length in UTF-8

Postgres (v11) counts the red heart ❤️ as two characters, and so on for other multibyte UTF-8 chars with selector units. Anyone know how I get postgres to count true characters and not the bytes?
For example, I would like both of the examples below to return 1.
select length('❤️') = 2 (Unicode: 2764 FE0F)
select length('🏃‍♂️') = 4 (Unicode: 1F3C3 200D 2642 FE0F)
UPDATE
Thank you to folks pointing out that postgres is correctly counting the Unicode code points and why and how this happens.
I don't see any other option other than pre-processing the emoji strings as bytes against a table of official Unicode character bytes, in Python or some such, to get the perceived length.
So one way to do this is to ignore all characters in the Variation Selectors range and, for each Zero Width Joiner (U+200D, in the General Punctuation range), decrement the count by 1 so that the character it joins is not counted a second time.
This could be converted into a postgres function.
python
"""
# For reference, these Unicode blocks apply to emojis
Name Range
Emoticons 1F600-1F64F
Supplemental_Symbols_and_Pictographs 1F900-1F9FF
Miscellaneous Symbols and Pictographs 1F300-1F5FF
General Punctuation 2000-206F
Miscellaneous Symbols 2600-26FF
Variation Selectors FE00-FE0F
Dingbats 2700-27BF
Transport and Map Symbols 1F680-1F6FF
Enclosed Alphanumeric Supplement 1F100-1F1FF
"""
emojis = "🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️"  # true count is 7, postgres length() returns 28
true_count = 0
for char in emojis:
    d = ord(char)
    char_type = None
    if d == 0x200D:
        char_type = "ZWJ"  # Zero Width Joiner (General Punctuation block)
    elif 0xFE00 <= d <= 0xFE0F:
        char_type = "VS"  # Variation Selector
    print(d, char_type)
    if char_type == "ZWJ":
        true_count -= 1  # cancels the +1 for the character joined after the ZWJ
    elif char_type != "VS":
        true_count += 1
print(true_count)

db2 remove all non-alphanumeric, including non-printable, and special characters

This may sound like a duplicate, but the existing solutions do not work.
I need to remove all non-alphanumerics from a varchar field. I'm using the following but it doesn't work in all cases (it works with diamond questionmark characters):
select TRANSLATE(FIELDNAME, '?',
TRANSLATE(FIELDNAME , '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
from TABLENAME
What it's doing: the inner TRANSLATE extracts all non-alphanumeric characters, then the outer TRANSLATE replaces them all with a '?'. This seems to work for the replacement character �. However, it throws "The second, third or fourth argument of the TRANSLATE scalar function is incorrect.", which is expected according to IBM:
The TRANSLATE scalar function does not allow replacement of a character by another character which is encoded using a different number of bytes. The second and third arguments of the TRANSLATE scalar function must end with correctly formed characters.
Is there any way to get around this?
Edit: @Paul Vernon's solution seems to be working:
· 6005308 ??6005308
–6009908 ?6009908
–6011177 ?6011177
��6011183�� ??6011183??
Try regexp_replace(c,'[^\w\d]','') or regexp_replace(c,'[^a-zA-Z\d]','')
E.g.
select regexp_replace(c,'[^a-zA-Z\d]','') from table(values('AB_- C$£abc�$123£')) t(c)
which returns
1
---------
ABCabc123
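For anyone who wants to try the two patterns outside Db2, here is a minimal Python sketch of the same character-class idea. It is only an illustration: Python's re module stands in for regexp_replace, and strip_non_alnum is a hypothetical helper name, not something Db2 provides. It also shows how the two patterns differ: in Python, \w matches the underscore as well, so '[^\w\d]' keeps it, while the explicit a-zA-Z class does not.

```python
import re

# Hypothetical helper mirroring regexp_replace(c, '[^a-zA-Z\d]', '') from the answer.
# [0-9] is used instead of \d so that only ASCII digits are kept.
_NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def strip_non_alnum(text: str) -> str:
    """Remove every character that is not an ASCII letter or digit."""
    return _NON_ALNUM.sub("", text)


sample = "AB_- C$£abc\ufffd$123£"

print(strip_non_alnum(sample))         # ABCabc123
print(re.sub(r"[^\w\d]", "", sample))  # AB_Cabc123 (the underscore survives)
```

If underscores must also go, the explicit class is the safer of the two patterns.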
BTW Note that the allowed regular expression patterns are listed on this page Regular expression control characters
Outside of a set, the following must be preceded with a backslash to be treated as a literal
* ? + [ ( ) { } ^ $ | \ . /
Inside a set, the following must be preceded with a backslash to be treated as a literal
Characters that must be quoted to be treated as literals are [ ] \
Characters that might need to be quoted, depending on the context are - &

how to convert base64 string to bytea in postgresql8.2

I need to convert a base64 string to the bytea type. But when I executed this SQL statement in pgAdmin III:
select decode("ygAAA", 'base64');
I got the following error message:
ERROR: syntax error at or near ")"
LINE 1: select decode('ygAAA', 'base64');
^
********** Error **********
ERROR: syntax error at or near ")"
SQL state: 42601
Character: 59
My postgresql's version is 8.2.15. And I could use encode function.
I googled it, but didn't find the solution. Can somebody help me? TKS!
Try it with single quotes instead of double quotes. Also, base64 strings turn groups of 4 characters into 3 bytes (the 24 bits in 3 bytes are spread across the lower 6 bits of 4 characters). So your base64 string, at 5 characters, is invalid.
This works:
select decode('ygAA', 'base64');
Hope this helps,
Adam.
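The 4-characters-to-3-bytes rule is easy to check outside the database. Below is a small sketch using Python's standard base64 module; it illustrates the encoding rule only and is not how PostgreSQL's decode() is implemented. try_decode is a hypothetical helper name.

```python
import base64
import binascii


def try_decode(text: str):
    """Return the decoded bytes, or None if the input is not valid base64."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None


print(try_decode("ygAA"))   # b'\xca\x00\x00' (4 characters -> 3 bytes)
print(try_decode("ygAAA"))  # None (5 characters cannot be split into whole bytes)
```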

What's the best candidate padding char for url-safe and filename-safe base64?

The padding char for the official base64 is '=', which might need to be percent-encoded when used in a URL. I'm trying to find the best padding char so that my encoded string can be both url safe (I'll be using the encoded string as parameter value, such as id=encodedString) AND filename safe (I'll be using the encoded string directly as filename).
Dot ('.') is a popular candidate, it's url safe but it's not exactly filename safe: Windows won't allow a file name which ends with a trailing dot.
'!' seems to be a viable choice, although I googled and I've never seen anybody using it as the padding char. Any ideas? Thanks!
Update: I replaced "+" with "-" (minus) and replaced "/" with "_" (underscore) in my customized base64 encoding already, so '-' or '_' is not available for the padding char any more.
The best solution (I've spent the last month working on this problem with an email sending website) is to not use the padding character (=) at all.
The only reason the padding character is there is because of "lazy" decoders. You can very easily add the missing '=' back: take the length of the text modulo 4, subtract that from 4, and the result is how many '=' you need to append to the end of the string (a result of 4 means no padding is needed). Here is C# code:
var pad = 4 - (text.Length % 4);
if (pad < 4)
text = text.PadRight(text.Length + pad, '=');
Also, most people who do this are interested in replacing + and / with other URL safe character... I propose:
+ replace with -
/ replace with _
DO NOT USE . as it can produce crazy results on different systems / web servers (for example on IIS Base64 encoded string can't end with . or IIS will search for the file)
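Putting the two suggestions together (drop the padding, swap + and / for - and _), a round trip could look like the Python sketch below. It relies on the standard library's urlsafe_b64encode / urlsafe_b64decode, which already use - and _; the helper names encode_unpadded and decode_unpadded are made up for this example.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """URL-safe base64 ('-' and '_' instead of '+' and '/') with '=' stripped."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    """Restore the stripped '=' padding, then decode."""
    pad = -len(text) % 4  # 0 to 3 characters, same count as the C# snippet
    return base64.urlsafe_b64decode(text + "=" * pad)


token = encode_unpadded(b"hello")
print(token)                   # aGVsbG8
print(decode_unpadded(token))  # b'hello'
```

The resulting token contains only letters, digits, - and _, so it can be used as an id=... parameter value or as a file name, subject to the case-sensitivity caveat raised elsewhere in this thread.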
The RFC 2396 unreserved characters in URIs are the alphanumerics plus these marks:
"-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
It's worth pointing out, though, that the Microsoft article also says "Do not assume case sensitivity." Perhaps you should just stick with base 16 or 32?
The Wikipedia article states: "a modified Base64 for URL variant exists, where no padding '=' will be used"
I would go with '-' or '_'
They're URL and file safe, and they look more or less like padding