Can't encode HTML special characters in Sublime Text 2 - encoding

I've always been able to encode html special characters in Sublime Text 2 with the keyboard shortcut [Shift]+[Cmd]+[p] then HTML: Encode special characters. For some unknown reason, the latter is no longer available when I type the keyboard shortcut. Is there a particular package that needs to be installed in order for the encoding to be applied?
Thanks

You can use the more full-featured SublimeStringEncode plugin instead.
Install it using Package Control (search for "StringEncode").
Then press Cmd + Shift + P and run "HTML entitize".
In addition you will have some other useful commands:
html_deentitize: Converts HTML entities to a character
url_encode: Uses urllib.quote to escape special URL characters
url_decode: Uses urllib.unquote to convert escaped URL characters
json_escape: Escapes a string and surrounds it in quotes, according to the JSON encoding.
json_unescape: Unescapes a string (include the quotes!) according to JSON encoding.
base64_encode: Uses base64 to encode into base64
base64_decode: Uses base64 to decode from base64
md5_encode: Uses sha package to create md5 hash
sha256_encode: Uses sha package to create sha256 hash
sha512_encode: Uses sha package to create sha512 hash
escape_regex: Escapes regex meta characters
escape_like: Escapes SQL-LIKE meta characters
safe_html_entitize: Converts characters to their HTML entity, but preserves HTML reserved characters
safe_html_deentitize: Converts HTML entities to a character, but preserves HTML reserved characters
xml_entitize: Converts characters to their XML entity
xml_deentitize: Converts XML entities to a character
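For a rough idea of what a few of these commands do, here is a minimal Python 3 sketch using only the standard library. It is not the plugin's source code; the mapping from command names to standard-library calls is an assumption based on the descriptions above (the urllib.quote mentioned in the list lives at urllib.parse.quote in Python 3).

```python
import base64
import html
import json
from urllib.parse import quote, unquote

text = "Tom & Jerry <3"

# Roughly "HTML entitize" / "html_deentitize" for HTML reserved characters.
entitized = html.escape(text)
restored = html.unescape(entitized)

# Roughly url_encode / url_decode.
url_encoded = quote("a b&c")
url_decoded = unquote(url_encoded)

# Roughly json_escape / json_unescape (note the surrounding quotes).
json_escaped = json.dumps('say "hi"')
json_unescaped = json.loads(json_escaped)

# Roughly base64_encode / base64_decode.
b64_encoded = base64.b64encode(b"hello").decode("ascii")
b64_decoded = base64.b64decode(b64_encoded)

print(entitized)
print(url_encoded)
print(json_escaped)
print(b64_encoded)
```

Each pair round-trips, which is a quick way to check that a selection was transformed the way you expected.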

Related

How can I encode UTF-8 to create case-insensitive ASCII filenames?

I have a list of UTF-8 strings, most of which are ASCII, but some of which contain Korean/Japanese characters. I want to create a file associated with each string. To avoid any problems with the filesystem not supporting these weird characters, I want to somehow encode each UTF-8 string into ASCII to generate its filename. The generated ASCII should be case-insensitive and all-lowercase to avoid problems with Windows' case-insensitive filesystem. (So e.g. Base64 encoding alone would not work.) What are some ways to do this?
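One possible approach, offered here only as a sketch and not taken from the original thread, is Base32: its alphabet is A-Z and 2-7, so lowercasing the output loses no information. The function names ascii_filename and original_name are hypothetical, and the sketch ignores filesystem limits on filename length.

```python
import base64


def ascii_filename(name: str) -> str:
    """Encode any string as lowercase ASCII using Base32 (hypothetical helper)."""
    raw = name.encode("utf-8")
    encoded = base64.b32encode(raw).decode("ascii")
    # Drop the "=" padding and lowercase; both steps are reversible.
    return encoded.rstrip("=").lower()


def original_name(filename: str) -> str:
    """Reverse ascii_filename (hypothetical helper)."""
    padded = filename.upper()
    # Base32 input must be a multiple of 8 characters, so restore the padding.
    padded += "=" * (-len(padded) % 8)
    return base64.b32decode(padded).decode("utf-8")


for name in ["hello", "한국어", "日本語 notes"]:
    encoded = ascii_filename(name)
    print(encoded, original_name(encoded) == name)
```

The cost is that plain ASCII names become unreadable too; a variant could leave names untouched when they are already lowercase ASCII, at the price of needing a marker to tell the two cases apart.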

What's the definition of "encoding-agnostic"?

In the Lua 5.3 reference manual, we can see:
Lua is also encoding-agnostic; it makes no assumptions about the contents of a string.
I can't understand what the sentence says.
The same byte value in a string may represent different characters depending on the character encoding used for that string. For example, the same value \177 may represent ▒ in Code page 437 encoding or ± in Windows 1252 encoding.
Lua makes no assumption as to what the encoding of a given string is, so the ambiguity needs to be resolved at the script level; in other words, your script needs to know whether to treat the byte sequence as a Windows 1252, Code page 437, UTF-8, or otherwise encoded string.
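The \177 example can be reproduced outside Lua. The following is a small Python sketch (Python is used only for illustration here; Lua itself ships no codec tables) that decodes the same single byte, decimal 177, under different encodings:

```python
# The byte written as "\177" in a Lua string (a decimal escape, so 0xB1).
raw = bytes([177])

as_cp437 = raw.decode("cp437")    # Code page 437 reading
as_cp1252 = raw.decode("cp1252")  # Windows 1252 reading

# On its own, the same byte is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(as_cp437, as_cp1252, valid_utf8)
```

The byte never changes; only the interpretation chosen by the program does, which is the decision Lua leaves to the script.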
Essentially, a Lua string is a counted sequence of bytes. If you use a Lua string for binary data, the concept of character encodings is not relevant and does not interfere with the binary data. In that way, a string is encoding-agnostic.
There are functions in the standard string library that treat string values as text: an uncounted sequence of characters. There is no text but encoded text. An encoding maps a member of a character set to a sequence of bytes. A string would have the bytes for zero or more such encoded characters. To understand a string as text, you must know the character set and encoding. To use the string functions, the encoding should be compatible with os.setlocale().

addPortalMessage requires decode('utf-8')

Currently it seems that in order for UTF-8 characters to display in a portal message you need to decode them first.
Here is a snippet from my code:
self.context.plone_utils.addPortalMessage(_(u'This document (%s) has already been uploaded.' % (doc_obj.Title().decode('utf-8'))))
If Titles in Plone are already UTF-8 encoded, the string is a unicode string and the underscore function is handled by i18ndude, I do not see a reason why we specifically need to decode utf-8. Usually I forget to add it and remember once I get a UnicodeError.
Any thoughts? Is this the expected behavior of addPortalMessage? Is it i18ndude that is causing the issue?
UTF-8 is a representation of Unicode, not Unicode and not a Python unicode string. In Python, we convert back and forth between Python's unicode strings and representations of unicode via encode/decode.
Decoding a UTF-8 string via utf8string.decode('utf-8') produces a Python unicode string that may be concatenated with other unicode strings.
Python will automatically convert a string to unicode if it needs to by using the ASCII decoder. That will fail if there are non-ASCII characters in the string -- because, for example, it is encoded in UTF-8.
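The snippet in the question is Python 2. As a hedged illustration in Python 3, where the byte string and text types are separate and the implicit ASCII conversion no longer happens, the same two steps can be spelled out explicitly. The title value is made up for the example.

```python
# A made-up title, as the UTF-8 encoded bytes a Python 2 Title() would return.
title_bytes = "Café".encode("utf-8")

# Explicit decode: bytes -> text, safe to combine with other text.
title_text = title_bytes.decode("utf-8")
message = "This document (%s) has already been uploaded." % title_text

# What Python 2 attempted implicitly: an ASCII decode, which fails on the
# non-ASCII bytes of the UTF-8 encoded "é".
try:
    title_bytes.decode("ascii")
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

print(message)
print(ascii_decode_ok)
```

This is why the error only shows up for titles that contain non-ASCII characters: pure ASCII titles pass the implicit conversion unnoticed.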

Allowed characters in CSS 'content' property?

I've read that we must use Unicode values inside the content CSS property i.e. \ followed by the special character's hexadecimal number.
But what characters, other than alphanumerics, are actually allowed to be placed as is in the value of content property? (Google has no clue, hence the question.)
The rules for “escaping” characters are in the CSS 2.1 specification, clause 4.1.3 Characters and case. The special rules for quoted strings, as in content property value, are in clause 4.3.7 Strings. Within a quoted string, any character may appear as such, except for the character used to quote the string (" or '), a newline character, or a backslash character \.
The information that you must use \ escapes is thus wrong. You may use them, and may even need to use them if the character encoding of the document containing the style sheet does not let you enter all characters directly. But if the encoding is UTF-8, and is properly declared, then you can write content: '☺ Я Ω ⁴ ®'.
As far as I know, you can insert any Unicode character. (Here's a useful list of Unicode characters and their codes.)
To utilize these codes, you must escape them, like so:
U+27BA Becomes \27BA
Or, alternatively, I think you may just be able to escape the character itself:
content: '\➺';
Source: http://mathiasbynens.be/notes/css-escapes
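To derive the hexadecimal escape for a character without looking it up, a short Python sketch can help. The helper names css_escape and css_string are hypothetical. css_string appends a space after each escape, relying on the CSS 2.1 rule that a single whitespace character after a hexadecimal escape is consumed as its terminator.

```python
def css_escape(char: str) -> str:
    """Return the backslash-hex CSS escape for one character (hypothetical helper)."""
    return "\\{:X}".format(ord(char))


def css_string(text: str) -> str:
    """Build a single-quoted CSS string, escaping what cannot appear as-is."""
    out = []
    for ch in text:
        # Escape the quote character, the backslash, control characters
        # (including newlines), and, to be encoding-independent, non-ASCII.
        if ch in "'\\" or ord(ch) < 32 or ord(ch) > 126:
            out.append(css_escape(ch) + " ")  # the space terminates the escape
        else:
            out.append(ch)
    return "'" + "".join(out) + "'"


print(css_escape("➺"))
print("content: " + css_string("a➺b") + ";")
```

If the style sheet is served as UTF-8 with the encoding properly declared, the non-ASCII escaping is optional, as the first answer explains.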

Is Encoding the same as Escaping?

I am interested in theory on whether Encoding is the same as Escaping? According to Wikipedia
an escape character is a character
which invokes an alternative
interpretation on subsequent
characters in a character sequence.
My current thought is that they are different. Escaping is when you place an escape character in front of one or more metacharacters to mark them as behaving differently than they normally would.
Encoding, on the other hand, is all about transforming data into another form, and upon wanting to read the original content it is decoded back to its original form.
Escaping is a subset of encoding: You only encode certain characters by prefixing a special character instead of transferring (typically all or many) characters to another representation.
Escaping examples:
In an SQL statement: ... WHERE name='O\' Reilly'
In the shell: ls Thirty\ Seconds\ *
Many programming languages: "\"Test\" string" (or """Test""")
Encoding examples:
Replacing < with &lt; when outputting user input in HTML
The character encoding, like UTF-8
Using sequences that do not include the desired character, like \u0061 for a
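The examples in both lists can be tried out in a few lines of Python. This is only an illustrative sketch; the sample strings are made up.

```python
import html
import re

# Escaping: a marker character changes how the following character is read.
quoted = "\"Test\" string"           # backslash keeps the quotes inside the literal
pattern = re.escape("1+1=2")         # backslash makes regex metacharacters literal
matches = re.fullmatch(pattern, "1+1=2") is not None

# Encoding: the data is transformed into another representation.
html_encoded = html.escape("<b>")    # < and > become HTML entities
utf8_bytes = "é".encode("utf-8")     # one character becomes two bytes
same_letter = "\u0061" == "a"        # \u0061 is another spelling of "a"

print(quoted, matches)
print(html_encoded, utf8_bytes, same_letter)
```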
They're different, and I think you're getting the distinction correctly.
Encoding is when you transform a logical representation of a text ("logical string", e.g. Unicode) into a well-defined sequence of binary digits ("physical string", e.g. ASCII, UTF-8, UTF-16). Escaping uses a special character (typically the backslash: '\') which initiates a different interpretation of the character(s) following it; escaping is necessary when you need to encode a larger number of symbols into a smaller number of distinct (and finite) bit sequences.
They are indeed different.
You pretty much got it right.