Allowed characters in CSS 'content' property?

I've read that we must use Unicode values inside the content CSS property i.e. \ followed by the special character's hexadecimal number.
But what characters, other than alphanumerics, are actually allowed to be placed as is in the value of content property? (Google has no clue, hence the question.)

The rules for “escaping” characters are in the CSS 2.1 specification, clause 4.1.3 Characters and case. The special rules for quoted strings, as in content property value, are in clause 4.3.7 Strings. Within a quoted string, any character may appear as such, except for the character used to quote the string (" or '), a newline character, or a backslash character \.
The information that you must use \ escapes is thus wrong. You may use them, and may even need to use them if the character encoding of the document containing the style sheet does not let you enter all characters directly. But if the encoding is UTF-8, and is properly declared, then you can write content: '☺ Я Ω ⁴ ®'.
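To make the rule concrete, here is a minimal sketch in Python (used only as a neutral scripting language; the helper name css_string is made up for this illustration). It assumes a double-quoted string and escapes only what clause 4.3.7 forbids inside it: the quote character, the backslash, and newline characters. Everything else is passed through unchanged.

```python
def css_string(text: str) -> str:
    """Wrap text in double quotes for use as a CSS string value (sketch)."""
    out = []
    for ch in text:
        if ch in ('"', "\\"):
            # The quoting character and the backslash get a backslash prefix.
            out.append("\\" + ch)
        elif ch in ("\n", "\r", "\f"):
            # Newline characters are written as hex escapes; the trailing
            # space terminates the escape and is not part of the string.
            out.append(f"\\{ord(ch):X} ")
        else:
            # Any other character may appear as is.
            out.append(ch)
    return '"' + "".join(out) + '"'


rule = "content: " + css_string("☺ Я Ω ⁴ ®") + ";"
```

With a UTF-8 style sheet, rule ends up as content: "☺ Я Ω ⁴ ®"; with no escapes at all.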

As far as I know, you can insert any Unicode character. (Here's a useful list of Unicode characters and their codes.)
To utilize these codes, you must escape them, like so:
U+27BA Becomes \27BA
Or, alternatively, I think you may just be able to escape the character itself:
content: '\➺';
Source: http://mathiasbynens.be/notes/css-escapes
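If you do want the escaped form, the code-point-to-escape step can be sketched like this in Python (the function names are hypothetical, chosen for this example). One assumption worth flagging: the sketch appends a single space after each hex escape, which CSS treats as the terminator of the escape, so that a following character such as "A" or "1" is not read as part of the number.

```python
def css_hex_escape(ch: str) -> str:
    """Return the CSS hex escape for one character, e.g. U+27BA -> \\27BA."""
    # The trailing space ends the escape; CSS swallows it when parsing.
    return f"\\{ord(ch):X} "


def escape_non_ascii(text: str) -> str:
    """Escape only the characters outside ASCII, leaving the rest as is."""
    return "".join(ch if ord(ch) < 128 else css_hex_escape(ch) for ch in text)


escaped = escape_non_ascii("go \u27ba")
```

This is only useful when the style sheet's encoding cannot carry the character directly; otherwise the literal character is equivalent.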

Related

Perl regex presumably removing non ASCII characters

I found a code with regex where it is claimed that it strips the text of any non-ASCII characters.
The code is written in Perl and the part of code that does it is:
$sentence =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;
I want to understand how this regex works, and to do so I have used regexr. I found out that \000, \011, \013, \014, \016, \037, \041, \055, \173, \377 stand for individual characters such as NULL, TAB, VERTICAL TAB ... But I still do not get why the "-" symbols are used in the regex. Do they really mean "dash symbol" as shown in regexr, or something else? Is this regex really suited for deleting non-ASCII characters?
This isn't really a regex. The dash indicates a character range, like inside a regex character class [a-z].
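The expression deletes many ASCII characters, too (control characters such as tab, the punctuation from ! through -, and { | } ~ DEL), and it leaves any character above \377 untouched, even though those are not ASCII either; the full ASCII range would simply be \000-\177.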
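To be explicit, the d flag says to delete any characters listed between the first pair of slashes that have no corresponding character between the second pair (which is empty here, so all of them are deleted). See further the documentation.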
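To see exactly what survives, the ranges can be replayed outside Perl. The following is a rough Python emulation on byte strings (the names RANGES, DELETED, KEPT and strip_like_tr are invented for this sketch). It assumes the input is bytes, so it says nothing about characters above \377.

```python
# Octal ranges copied from the tr search list, inclusive on both ends.
RANGES = [(0o000, 0o011), (0o013, 0o014), (0o016, 0o037),
          (0o041, 0o055), (0o173, 0o377)]

# Every byte value that the tr///d expression would delete.
DELETED = bytes(b for lo, hi in RANGES for b in range(lo, hi + 1))

# Every byte value that is left alone.
KEPT = bytes(b for b in range(256) if b not in DELETED)


def strip_like_tr(data: bytes) -> bytes:
    """Delete the listed bytes, like tr/.../ /d with an empty replacement."""
    return data.translate(None, DELETED)


sample = strip_like_tr(b"Hello, World!\t{x} caf\xe9\n")
```

KEPT comes out as newline, carriage return, space, and the run from "." through "z", which confirms that commas, exclamation marks, tabs and braces are removed along with the high bytes.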

How should Fancytree node keys be escaped to avoid special characters?

I discovered that the '+' character in a custom node key is converted silently to a space character. I obviously need to escape these special characters, but I could not find documentation about which characters are not allowed in keys.
Thanks!
There should be no conversion, except for casting non-strings to a string.
When the generateIds option is used, the key is added as id="KEY" attribute to the generated HTML element, so the standard restrictions apply.
The key is also internally used as JavaScript hash key.
I'd recommend plain ASCII keys, but '{', '.', '~', ... should be no problem as well.
As far as I know, + is interpreted as space by browsers, when part of a URL, so maybe you see the conversion there.
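If the URL guess is right, the effect is easy to reproduce outside Fancytree. This sketch uses Python's standard urllib.parse only to illustrate form-style decoding; the key value is made up, and nothing here is Fancytree API.

```python
from urllib.parse import quote, unquote, unquote_plus

key = "a+b"  # hypothetical node key containing a plus sign

# Form-style decoding (as applied to query strings) turns "+" into a space.
decoded_as_form = unquote_plus(key)

# Plain percent-decoding leaves "+" alone.
decoded_plain = unquote(key)

# Percent-encoding the key first makes it survive either decoder.
safe_key = quote(key, safe="")
round_trip = unquote_plus(safe_key)
```

So if the key travels through a query string, percent-encoding it before sending (here giving a%2Bb) is one way to keep the plus sign intact.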

When are double quotes required to create a KDB/q symbol?

Normally, for simple character strings, a leading backtick does the trick.
Example: `abc
However, if the string has some special characters, such as space, this will not work.
Example: `$"abc def"
Example: `$"BAT-3Kn.BK"
What are the rules when $"" is required?
Simple syntax for symbols can be used when the symbol consists of alphanumeric characters, dots (.), colons (:), and (non-leading) underscores (_). In addition, slashes (/) are allowed when there is a colon before it. Everything else requires the `$"" syntax.
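As a sanity check, the rule as stated in this answer can be written down as a small predicate. This is a Python sketch of that description only, not of the q parser itself; the function names are invented, and "a colon before it" is read here as "a colon anywhere earlier in the text", which is an assumption.

```python
def simple_symbol_ok(text: str) -> bool:
    """True if text fits the simple backtick syntax as described above."""
    seen_colon = False
    for i, ch in enumerate(text):
        if (ch.isascii() and ch.isalnum()) or ch == ".":
            continue
        if ch == ":":
            seen_colon = True
            continue
        if ch == "_" and i > 0:       # underscores are fine unless leading
            continue
        if ch == "/" and seen_colon:  # slashes only after a colon
            continue
        return False
    return True


def q_symbol_literal(text: str) -> str:
    """Pick the backtick form when allowed, else the cast-from-string form."""
    if simple_symbol_ok(text):
        return "`" + text
    # Escape backslashes and double quotes for the q string literal.
    quoted = text.replace("\\", "\\\\").replace('"', '\\"')
    return '`$"' + quoted + '"'
```

For the examples in the question, q_symbol_literal("abc") gives the plain backtick form, while "abc def" and "BAT-3Kn.BK" both fall through to the cast form because of the space and the hyphen.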
The book 'Q for mortals', which is available online, has a section discussing datatypes. For symbols it states:
A symbol can include arbitrary text, including text that cannot be
directly entered from the console – e.g., embedded blanks and special
characters such as back-tick. You can manufacture a symbol from any
text by casting the corresponding list of char to a symbol. (You will
need to escape special characters into the string.) See §6.1.5 for
more on casting.
q)`$"A symbol with blanks and `"
`A symbol with blanks and `
The essential takeaway here is that converting a string to a symbol is required when special characters are involved. In the examples you have given both space " " and hyphen "-" are characters that cannot be directly placed into a symbol type.

Is Encoding the same as Escaping?

I am interested in the theory: is Encoding the same as Escaping? According to Wikipedia
an escape character is a character
which invokes an alternative
interpretation on subsequent
characters in a character sequence.
My current thought is that they are different. Escaping is when you place an escape character in front of one or more metacharacters to make them behave differently than they normally would.
Encoding, on the other hand, is all about transforming data into another form, and upon wanting to read the original content it is decoded back to its original form.
Escaping is a subset of encoding: You only encode certain characters by prefixing a special character instead of transferring (typically all or many) characters to another representation.
Escaping examples:
In an SQL statement: ... WHERE name='O\' Reilly'
In the shell: ls Thirty\ Seconds\ *
Many programming languages: "\"Test\" string" (or """Test""")
Encoding examples:
Replacing < with &lt; when outputting user input in HTML
The character encoding, like UTF-8
Using sequences that do not include the desired character, like \u0061 for a
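The items in both lists can be tried out with a few standard-library calls. This is only an illustrative sketch in Python, following the grouping used in the lists above; note that shlex.quote protects the space with single quotes rather than a backslash, which is a different spelling of the same idea.

```python
import html
import shlex

# Escaping: only the troublesome characters are marked, the rest is untouched.
shell_arg = shlex.quote("Thirty Seconds")   # quoted so the shell sees one word
quoted_literal = "\"Test\" string"          # backslash-escaped quotes in a literal

# Encoding: the data is transformed into another representation.
html_text = html.escape("<b>")              # "<" becomes the entity "&lt;"
utf8_bytes = "é".encode("utf-8")            # one character, two bytes
same_letter = "\u0061"                      # a sequence that avoids writing "a"

# Encodings are meant to be reversed to get the original back.
restored = utf8_bytes.decode("utf-8")
```

After running this, html_text is "&lt;b&gt;", utf8_bytes is b"\xc3\xa9", and same_letter compares equal to "a".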
They're different, and I think you're getting the distinction correctly.
Encoding is when you transform a logical representation of a text ("logical string", e.g. Unicode) into a well-defined sequence of binary digits ("physical string", e.g. ASCII, UTF-8, UTF-16). Escaping uses a special character (typically the backslash: '\') to initiate a different interpretation of the character(s) that follow it; escaping is necessary when you need to encode a larger number of symbols to a smaller number of distinct (and finite) bit sequences.
They are indeed different.
You pretty much got it right.

How do I replace literal \xNN with their character in Perl?

I have a Perl script that takes text values from a MySQL table and writes them to a text file. The problem is, when I open the text file for viewing I get a lot of hex sequences like \x92 and \x93, which stand for single and double quotes, I guess.
I am using DBI->quote function to escape the special chars before writing the values to the text file. I have tried using Encode::Encoder, but with no luck. The character set on both the tables is latin1.
How do I get rid of those hex characters and get the character to show in the text file?
ISO Latin-1 does not define characters in the range 0x80 to 0x9f, so displaying these bytes in hex is expected. Most likely your data is actually encoded in Windows-1252, which is the same as Latin1 except that it defines additional characters (including left/right quotes) in this range.
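The Windows-1252 theory is easy to test on a sample of the text. Below is a small Python sketch (not Perl, and the function name unescape_hex is made up) that assumes the file contains the literal four-character sequences \xNN and that the bytes they stand for are Windows-1252. Bytes that Windows-1252 leaves undefined are turned into the Unicode replacement character instead of raising an error.

```python
import re


def unescape_hex(text: str, encoding: str = "cp1252") -> str:
    """Replace each literal \\xNN with the character that byte encodes."""
    def repl(match: re.Match) -> str:
        byte = bytes([int(match.group(1), 16)])
        # errors="replace" covers the few bytes cp1252 does not define.
        return byte.decode(encoding, errors="replace")

    return re.sub(r"\\x([0-9A-Fa-f]{2})", repl, text)


fixed = unescape_hex(r"It\x92s \x93quoted\x94")
```

Here fixed contains the curly apostrophe U+2019 and the curly double quotes U+201C and U+201D. Decoding the same bytes as latin-1 instead would give the unprintable code points U+0092 to U+0094, which matches the symptom described in the question.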
\x92 and \x93 are empty characters in the latin1 character set (see here or here). If you are certain that you are indeed dealing with latin1, you can simply delete them.
It sounds like you need to change the character sets on the tables, or translate the non-latin-1 characters into latin-1 equivalents. I'd prefer the first solution. Get used to Unicode; you're going to have to learn it at some point. :)