I am using search and replace functions in Word Macros. But I could not figure out the basic difference between ^13 and ^p.
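These are not regular expressions; they are special character codes used by Word's Find and Replace.
Use the ^p character code to search for paragraph marks.
In MS Word, ^13 (character code 13, the carriage return) also matches the paragraph mark at the end of each paragraph.
So in an ordinary search there is no difference. With "Use wildcards" turned on, however, ^p is not accepted in the Find what box, so ^13 is used there instead.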
Related
I am trying to combine a symbol from the font range "combining diacritical marks for symbols" (20D5, Combining Clockwise Arrow Above) with an ordinary letter, and no luck. In fact, I don't seem to be able to combine it with any other character.
Now, I'm attempting this in MS Word 2010, using Arial Unicode MS, but I suspect this is a question about Unicode font combining rules, not about Word per se. (And FWIW, I do know the Word procedure to combine a normal diacritical with a normal character.)
So the name of the group that 20D5 belongs to says "for symbols". So perhaps there's a rule that says it must combine only with "symbols". So I tried it with Currency Symbols, Letterlike Symbols, Miscellaneous Symbols, and no success.
So what characters are these "combining diacritical marks for symbols" supposed to combine with?
Every programming language has its own interpretation of \n and \r.
Unicode supports multiple characters that can represent a new line.
From the Rust reference:
A whitespace escape is one of the characters U+006E (n), U+0072 (r),
or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR) or
U+0009 (HT) respectively.
Based on that statement, I'd say a Rust character is a new-line character if it is either \n or \r. On Windows it might be the combination of \r and \n. I'm not sure though.
What about the following?
Next line character (U+0085)
Line separator character (U+2028)
Paragraph separator character (U+2029)
In my opinion, we are missing something like a char.is_new_line().
I looked through the Unicode Character Categories but couldn't find a definition for new-lines.
Do I have to come up with my own definition of what a Unicode new-line character is?
There is considerable practical disagreement between languages like Java, Python, Go and JavaScript as to what constitutes a newline character and how that translates to "new lines". The disagreement shows in how the batteries-included regex engines treat patterns like $ against a string like \r\r\n\n in multi-line mode: are there two lines (\r\r\n, \n), three lines (\r, \r\n, \n, like Unicode says) or four (\r, \r, \n, \n, like JS sees it)? Go and Python do not treat \r\n as a single line ending for $, and neither does Rust's regex crate; Java's does, however. I don't know of any language whose batteries extend newline handling to any more Unicode characters.
So the takeaway here is:
It is agreed upon that \n is a newline
\r\n may be a single newline
unless \r\n is treated as two newlines
unless \r\n is "some character followed by a newline"
You shall not have any more newlines beside that.
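The two/three/four disagreement over \r\r\n\n can be made concrete by counting line terminators under each convention. This is only a minimal sketch; the Convention enum and count_terminators are hypothetical names made up for illustration, not part of any library or regex engine:

```rust
/// How a scanner decides what counts as one line terminator.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy)]
enum Convention {
    /// Only U+000A (LF) ends a line; CR is an ordinary character.
    LfOnly,
    /// CR LF is one terminator; a lone CR or a lone LF is also one.
    CrLfAsOne,
    /// Every CR and every LF is its own terminator.
    EachSeparately,
}

/// Counts the line terminators in `text` under the given convention.
fn count_terminators(text: &str, convention: Convention) -> usize {
    let mut count = 0;
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        match (convention, c) {
            // LF terminates a line under every convention.
            (_, '\n') => count += 1,
            // Under LfOnly nothing else matters.
            (Convention::LfOnly, _) => {}
            (Convention::CrLfAsOne, '\r') => {
                // Swallow the LF of a CR LF pair so it is not counted twice.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                count += 1;
            }
            (Convention::EachSeparately, '\r') => count += 1,
            _ => {}
        }
    }
    count
}
```

For the string "\r\r\n\n" this gives 2 under LfOnly, 3 under CrLfAsOne and 4 under EachSeparately, matching the three readings described above.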
If you really need more Unicode characters to be treated as newlines, you'll have to define a function that does that for you. Don't expect real-world input to rely on them, though. After all, we have had the ASCII record separator for a gazillion years, and everybody uses \t instead anyway.
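A minimal sketch of such a function, assuming you want the set that Unicode treats as mandatory line breaks (LF, VT, FF, CR, NEL, LS, PS). The name is_newline is hypothetical; nothing like it exists on char in the standard library:

```rust
/// Returns true for the characters Unicode lists as mandatory line breaks.
/// (Hypothetical helper; the chosen set is an assumption about what you need.)
fn is_newline(c: char) -> bool {
    matches!(
        c,
        '\u{000A}'   // LF, line feed
        | '\u{000B}' // VT, vertical tab
        | '\u{000C}' // FF, form feed
        | '\u{000D}' // CR, carriage return
        | '\u{0085}' // NEL, next line
        | '\u{2028}' // LS, line separator
        | '\u{2029}' // PS, paragraph separator
    )
}
```

Note that this works one character at a time, so a CR LF pair shows up as two newline characters; treating the pair as a single break has to happen wherever you scan the string.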
Update: See http://www.unicode.org/reports/tr14/tr14-32.html#BreakingRules section LB5 for why \r\r\n should be treated as two line breaks. You could read the whole page to get a grip on how your original question would have to be implemented. My guess is by the point you reach "South East Asian: line breaks require morphological analysis" you'll close the tab :-)
According to this documentation, the newline character is 0xA (U+000A, LF).
Sample: Rust Playground
fn main() {
    let c = '\n'; // c is our `char`
    // U+000A (LF) can be written directly as the literal '\n'
    if c == '\n' {
        println!("got a newline character");
    }
}
I'm having trouble breaking a word into its individual Unicode components. I'm working with the Devanagari script using Google Input Tools. An example is र्म (pronounced -rm), which I want to break into म (-m) and the hook at the top (-r). But I can't seem to find the Unicode character that corresponds to the hook at the top. Here are some of the things I tried:
1. Copy and paste र्म into MS Word and hit Alt+X. But this breaks the word into र् and म. It doesn't give me the Unicode character for the top hook.
2. I tried the site http://shapecatcher.com/. I found a character called Latin Egyptological Ain; while similar in shape, it cannot be used on top of another character. I'm looking for the conjunct version of the hook.
Any help would be appreciated. I'm using TekMaker on Windows 8.
The ‘hook at the top’ representing a preceding र् is an inseparable part of the glyph for a variety of biconsonantal ligatures. It's not a discrete, freely-combinable diacritical mark as we would understand it in Latin-like scripts.
Consequently the visual rendering element doesn't have its own Unicode representation distinct from its linguistic meaning र्, sorry!
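To see what the text actually contains, you can list its code points. A small sketch in Rust (code_points is a hypothetical helper name), assuming the word is encoded as RA, VIRAMA, MA, which is the usual encoding for this conjunct:

```rust
/// Formats each code point of `text` as "U+XXXX".
/// (Hypothetical helper, for illustration only.)
fn code_points(text: &str) -> Vec<String> {
    text.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

Calling code_points("\u{0930}\u{094D}\u{092E}") yields U+0930 (DEVANAGARI LETTER RA), U+094D (DEVANAGARI SIGN VIRAMA) and U+092E (DEVANAGARI LETTER MA). The hook is produced by the font when it renders RA + VIRAMA before a consonant; there is no fourth code point for it.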
I want to create an add-in for Word to convert between Vietnamese character sets, but I don't know how to scan each character in the document and check its font to decide how to convert it (there are many charsets and fonts in VN). After converting, I want to replace the original character with the converted one.
This is covered here: How can I loop through every letter in MS Word using VBA?
I need to bookmark parts of a document using the names of its paragraphs, but the name of a paragraph is not always a valid bookmark name. I have not found an exhaustive list of the limitations on bookmark names on Google or MSDN.
What special characters are forbidden?
The only thing I found is that the length must not exceed 40 characters.
If you are familiar with regular expressions, I would say it is
^(?!\d)\w{1,40}$
Here \w refers to the range of Unicode word characters, which also includes the underscore and the digits 0-9.
In plain English: The bookmark name must...
be between 1 and 40 characters long
consist of any combination of Unicode letters, digits, underscores
not start with a digit
not contain any kind of white space or punctuation
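The same rules can be checked without a regex engine. A minimal sketch in Rust; is_valid_bookmark_name is a hypothetical name, and is_alphanumeric / is_numeric only approximate what \w and \d match, so treat this as an illustration of the rules rather than Word's actual validation:

```rust
/// Checks a candidate bookmark name against the rules listed above.
/// (Hypothetical helper; approximates \w with `is_alphanumeric` plus '_'.)
fn is_valid_bookmark_name(name: &str) -> bool {
    // Between 1 and 40 characters long (counted here as Unicode scalar values,
    // which is an assumption about how Word counts).
    let length = name.chars().count();
    if length == 0 || length > 40 {
        return false;
    }
    // Must not start with a digit.
    if name.chars().next().map_or(true, |c| c.is_numeric()) {
        return false;
    }
    // Only letters, digits and underscores; no white space or punctuation.
    name.chars().all(|c| c.is_alphanumeric() || c == '_')
}
```

For example, "Chapter_1" passes, while "1Chapter", "my bookmark" and the empty string are rejected.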
As stated in the comments, bookmark names beginning with an underscore are treated as hidden. They will not appear in the regular user interface, but they can be used from VBA code. It is not possible to create bookmarks that begin with an underscore via the regular user interface, but you can do it through VBA code with Bookmarks.Add().