When I type a comma in VS code I get
,
but what I really want is
,
I'm not sure why there's two types of commas, but the first one is giving me run time errrors in python
"I'm not sure why there's two types of commas"
Well ... it is because Unicode defines them1. The first one is U+FF0C : FULLWIDTH COMMA. The second one is U+002C : COMMA
The latter is the one that should be bound to the "comma" key on your keyboard, but it is possible that something has changed your VSCode key bindings. This page describes how to examine your VSCode key bindings.
But I think the more likely explanation is that you have copy-pasted some source code from (for example) a PDF file that is using U+FF0C instead of U+002C for cosmetic reasons ... or something. It ia also possible that they were placed there by the author of the original document, or by their word-processing software.
You could try using the Gremlins extension to highlight any potentially troublesome characters in your source code.
1 - According to Wikipedia, the purpose is "so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to/from Unicode.".
Related
I would like to display list.emptyqm() as list.empty?() in function names for specific language. So, two symbols qm if they are at the end of the function name should be displayed as ? (possibly some unicode symbol looking similar to question mark).
Is that possible in VSCode?
The VSCode already knows that piece of text is string, or function-name/keyword/variable-name (as it highlights it properly), so the ligature should be displayed only if qm are the last
characters of function-name/keyword/variable-name. It shouldn't be displayed in the middle of the function name, like aqma() shouldn't be displayed as a?a().
You seem to misunderstand what a ligature is. A ligature describes how two individual letters can be combined to form a visual pleasing appearance. A ligature never changes the syntax of a text. Hence, converting qm to ? is a completely different thing.
Replacing text in vscode is of course possible, for instance as part of the format command. You can register your own formatter and determine the text edit actions that you want to be applied, including the transformation of these character sequences.
In a mapping editor, the display is correct after the legacy to unicode conversion for DEVANAGARI text shown using a unicode font (Arial Unicode MS). However, in MS-WORD, the display isn't as expected for the same unicode text in the unicode font (Arial Unicode MS) or any other Devanagari unicode fonts. The expected sequence of unicodes are provided as per the documentation. The sequence can be seen on the left-hand side table.
Please let me know where I am going wrong.
Thanks for your help!
Does your map have to insert the zero_width_joiner? The halant (virama) by itself is enough to get the half-consonant (for some combinations) and in particular, it may be that Word is using the presence of the ZWJ to keep them separate.
If getting rid of the ZWJ doesn't help, another possibility is that Word may be treating the individual characters of the text string as individual "runs" of text.
If those first 4 characters are not in a single run, this can happen.
[aside: the way to tell if it's being treated as a single run, is to save the document as an xml file and then open it with something like notepad++ and look at the xml "w:t" element (IIRC) associated with these characters. If they're all in separate w:t elements, it means they're in separate runs. In that case, you might need to copy the text from Word to some other tool (e.g. Notepad++) and then copy it from there and paste it back in Word -- that might cause it to be imported into Word in a single run.
I have a document which has been sloppily authored. It's a dictionary that contains cyrillic characters. Most of the dictionary is manageable, but I'm stuck with one thing I need help with. Words have accented letters in them and they're mostly formatted properly as a letter with a unicode accent (thus forming a single letter). However there are some very peculiar letters that look similar for example to: a;´ (where "a" is any arbitrary cyrillic letter). You'd expect á in its place. However it wouldn't be a problem per se if only this thing could be exported to, say HTML and manipulated in a text editor. The problem is that Word treats this "thing" as a single character/entity and
when exporting it is COMPLETELY omitted
when copied it can only be pasted into Notepad (which translates it into three separate characters), when being pasted into WordPad it just won't appear at all.
when a search is run in Word it won't find the letter, neither the actual character nor the exactly copied/pasted combination.
the letter will disappear when the document is opened in any other software, such as Libre Office
At this point I'm trying to:
understand what this combination is exactly
run a search/replace operation to find and weed out all of those errors
Here's a sample Word file.
Here's a screenshot of the word/letter in question:
which when typed correctly should appear like "скре́пка".
The 'character' appears to be a Word field of type 'eq' (equation). Here is the field with toggled field codes:
If it is a large document you could try to create a VBA routine that removes the fields and replaces them with corresponding characters.
Assuming that #Anonimista’s analysis is correct, as I think it is, you could fix the file by running some search and replace operations in Word, replacing e.g. ^19eq \o(е;´)^21 by е́ (the latter is Cyrillic letter е followed by combining acute accent U+0301). This is dull because you would need to do this for each vowel separately (and for uppercase vowels too). But I cannot find a way to use wildcards in this context; the codes ^19 and ^21 for start and end of field work only when wildcards are not enabled.
I have built a set of scripts, part of which transform XML documents from one vocabulary to a subset of the document in another vocabulary.
For reasons that are opaque to me, but apparently non-negotiable, the target platform (Java-based) requires the output document to have 'encoding="UTF-8"' in the XML declaration, but some special characters within text nodes must be encoded with their hex unicode value - e.g. '”' must be replaced with '”' and so forth. I have not been able to acquire a definitive list of which chars must be encoded, but it does not appear to be as simple as "all non-ASCII".
Currently, I have a horrid mess of VBScript using ADODB to directly check each line of the output file after processing, and replace characters where necessary. This is painfully slow, and unsurprisingly some characters get missed (and are consequently nuked by the target platform).
While I could waste time "refining" the VBScript, the long-term aim is to get rid of that entirely, and I'm sure there must be a faster and more accurate way of achieving this, ideally within the XSLT stage itself.
Can anyone suggest any fruitful avenues of investigation?
(edit: I'm not convinced that character maps are the answer - I've looked at them before, and unless I'm mistaken, since my input could conceivably contain any unicode character, I would need to have a map containing all of them except the ones I don't want encoded...)
<xsl:output encoding="us-ascii"/>
Tells the serialiser that it has to produce ASCII-compatible output. That should force it to produce character references for all non-ASCII characters in text content and attribute values. (Should there be non-ASCII in other places like tag or attribute names, serialisation will fail.)
Well with XSLT 2.0 you have tagged your post with you can use a character map, see http://www.w3.org/TR/xslt20/#character-maps.
As we're enabling our (English) application to be localized, we're replaced all in-line strings with NSLocalizedString() calls. Since all of our English strings are all there in-line with the code, e.g. NSLocalizedString(#"OK, #"OK button in a message box"), is there any reason we need the English version of Localizable.strings? When we try removing the strings from the English Localizable.strings, the program seems to work fine. Just wanted to double check if there was some side-effect to not having that around. Thanks, alex
One of the main points of using the NSLocalizedString() macro is so that your programming code can be parsed with the genstrings command-line tool to generate the corresponding Localizable.strings file(s) (see Resource Programming Guide: About the String-Loading Macros and Using the Genstrings Tool to Create Strings Files).
That Localizable.strings file then serves as a starting point for your translators, to use to translate to another language. Without that file to work with, your translators would basically need access to your source code in order to see all the strings you want to use (which kind of defeats the purpose).
Yes, your English version works fine right now, since if a localized version of the string you try to get in code –– for example, NSLocalizedString(#"OK", #"") –– cannot be found in a .strings file, it simply uses the #"OK" string that you passed in.
Another reason why you should likely be keeping the English Localizable.strings is that you should generally try to avoid using high-ASCII characters in your code, but should use the full range of available characters in your actual user interface. For example, you may not want to put the following characters in your code, but would want to use them in your user interface:
… (horizontal ellipsis) (U+2026)
“ (left double quotation mark) (U+201C)
” (right double quotation mark) (U+201D)
‘ (left single quotation mark) (U+2018)
’ (right single quotation mark) (U+2019)
So in code, you'd do something like this:
NSLocalizedString(#"Add Bookmark...", #"")
and then in your .strings file (which is UTF16, so this is fine):
"Add Bookmark..." = "Add Bookmark…";
It's not recommended to use the words as keys, if you want to change that Ok for something else and you have any other Localizable.strings, you will have to edit every single of them to update the Okkey.
Surely your translators will want your Localizable.strings file to work from. And want it to be ALL the strings.
(It is true that the system will fall back to the key, but relying on that seems like bad practice. And you'll find it more reliable to use proper punctuation in a Unicode file, which source code seldom is.)