Hi, I am a complete novice with the Microsoft Word wildcard function (Ctrl+H).
I need some help finding every capital letter in a Word document and inserting a line break before each one.
Example
Perfect party wear Classic black jacket Open front Collarless
I need to change the above to:
Perfect party wear
Classic black jacket
Open front Collarless
I have tried using Find what: <[A-Z][a-z]{2,}> with Replace with: ^13, but it replaces all the words that start with a capital letter with a line break instead of inserting a line break before the capital letter.
I would really appreciate some help, please.
Find what: <[A-Z][a-z]{2,}>
Replace with: ^p^&
Options: Use wildcards
^p is a paragraph mark (end of paragraph, the same as pressing <Enter>, not the character ¶). ^& means "the text that was found". With the cursor in the Find what or the Replace with field, press the Special button to see (some of) the available special codes.
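For anyone who wants to try the same idea outside Word, here is a minimal Python sketch. It assumes that `\b[A-Z][a-z]{2,}\b` is a fair stand-in for Word's `<[A-Z][a-z]{2,}>`; the function name is made up for illustration. `\g<0>` plays the role of `^&`.

```python
import re

# Rough Python counterpart of Word's <[A-Z][a-z]{2,}>
# (assumption: \b behaves like Word's < and > word anchors here).
CAPITALISED_WORD = re.compile(r"\b[A-Z][a-z]{2,}\b")

def break_before_capitalised(text):
    # \g<0> re-inserts the whole match, like ^& in Word's "Replace with" box.
    return CAPITALISED_WORD.sub(r"\n\g<0>", text)

sample = "Perfect party wear Classic black jacket Open front Collarless"
print(break_before_capitalised(sample))
```

The very first word also gets a break in front of it, so the result starts with an empty line.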
So I have a body of space-separated text and I'm trying to mine names. The names are the first string in each row:
Tsuru Stork greeting for a long last life. Unisex
Yama Mountain; Restrainer; Unisex
Yuka A bright Star Unisex
Yumi A beautiful archery bow Unisex
Yuna The archer Unisex
How can I select everything right of the first string in each row?
I figured out how to select the names themselves with this:
(\n+)[A-Z]{1}\w+
But there doesn't seem to be an easy way in Word to highlight, copy and then paste the selection.
In summary, how do I select elements after the first string in a new line?
If this is done in Microsoft Word then try the following:
 *^13
(note that the pattern starts with a space)
This stands for:
(space) - A space character.
* - Match any sequence of characters.
^13 - Match a paragraph mark (carriage return, ASCII 13).
If I understood your question correctly, this will highlight all text to the right of the first word in each line.
If you actually need to make sure you select everything after the first run of multiple spaces, then maybe use a space followed by {3,4}*^13.
The separator inside the occurrence indicator depends on your locale: it is a comma in an English locale and a semicolon ({3;4}) in a Dutch one.
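If the text can be taken out of Word, the same selection can be sketched in Python. This is only an illustration: it assumes the first "word" is a run of non-space characters followed by one or more spaces, and the names below are hypothetical.

```python
import re

# Assumption: a name is the first run of non-space characters on a line.
# The capture group holds everything after the spaces that follow it.
REST_OF_LINE = re.compile(r"^\S+ +(.*)$", re.MULTILINE)

def text_after_first_word(text):
    return REST_OF_LINE.findall(text)

sample = (
    "Yuka A bright Star Unisex\n"
    "Yumi A beautiful archery bow Unisex\n"
    "Yuna The archer Unisex"
)
for rest in text_after_first_word(sample):
    print(rest)
```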
You can use a regex like: (^\w+)
^ matches the start of a line
\w+ matches a word character one or more times
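A short Python sketch of that pattern, assuming the text is available as a plain string. Note that `^` only matches at the start of every line when the MULTILINE flag is set.

```python
import re

sample = (
    "Tsuru Stork greeting for a long last life. Unisex\n"
    "Yama Mountain; Restrainer; Unisex\n"
    "Yuka A bright Star Unisex"
)

# Without re.MULTILINE, ^ would match only at the very start of the string.
names = re.findall(r"^\w+", sample, re.MULTILINE)
print(names)
```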
I have a CSV file that has random line breaks throughout. (They are probably load errors from when the file was created, where the loader somehow managed to put a carriage return into the field.)
How do I go in and remove all carriage returns / line breaks where the last character is not "?
I have Word and Sublime Text available as text editors.
I have tried find and replace with ^p and a letter in front, but that doesn't seem to work for some of the lines for some reason.
Example
"3203","Shelving Units
",".033"
instead of
"3203","Shelving Units",".033"
and
"3206","Broom
","1.00"
instead of
"3206","Broom","1.00"
Menu > Find > Replace... or Ctrl+H
Select "Regular Expression" (probably a .* icon in the bottom left, depending on your theme).
Use \n to select newlines (LF) or \r\n (CRLF).
As @GerardRoche said, you can use search and replace in Sublime Text. Open it via Ctrl+H and press Alt+R to enable regex mode. (You may want to create a backup of your file before making such changes.)
Search for (?<=[^"\n])\n+ and replace it with nothing; press Replace All or Ctrl+Alt+Enter to do so.
The regex just means: search for at least one (+) newline (\n) that is preceded by something other than a quotation mark or a newline (?<=[^"\n]).
You don't need to worry about carriage returns, because ST only uses them when reading and writing the file and not in the editor.
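The same regex also works in Python, which may be handy for files too large to open comfortably in an editor. This is a sketch that assumes the line endings are already plain \n (for example because the file was opened in text mode); the function name is hypothetical.

```python
import re

# A run of newlines that is preceded by something other than a quote or a newline.
STRAY_NEWLINES = re.compile(r'(?<=[^"\n])\n+')

def join_broken_rows(text):
    return STRAY_NEWLINES.sub("", text)

sample = (
    '"3203","Shelving Units\n'
    '",".033"\n'
    '"3206","Broom\n'
    '","1.00"\n'
)
print(join_broken_rows(sample))
```

Newlines that follow a closing quote are left alone, so intact rows stay on their own lines.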
I want to view a Word document along with the Unicode representation of its special characters.
For example, I want to view a Word doc containing:
Hi,
How are you ?
As ,
Hi \r\n How are you ?
Is there any way to do this?
Not programmatically. Any software or software mode would suffice.
In Word, select the character and press Alt+X.
This appears to be unavailable in Word for Mac version 2016 (according to Microsoft Answers), or in Office 365's version 16.
If you want to see format control characters as visible symbols, which is what your example is about, then there does not seem to be any direct way. But if you click on the “¶” button (on the Home tab, Paragraph group in new versions of Word), Word adds symbols at the ends of visible lines to indicate the presence of such controls, e.g.
Hi,·¶
How·are·you?·¶
Here “¶” indicates the presence of CR (U+000D CARRIAGE RETURN, “\r”), whereas a symbol resembling “⤶” would indicate LF (U+000A LINE FEED, “\n”), which indicates a forced line break without paragraph break in Word. And “·” indicates a normal space (U+0020 SPACE), whereas “°” would indicate a no-break space (U+00A0 NO-BREAK SPACE).
For visible characters, the Alt+X method described by @JasonPlutext works well. You don’t even need to select the character. You can just click between it and the next character, to place the cursor there, and then press Alt+X.
I can write Arabic/Urdu/Persian in MS Word or Notepad just fine, but whenever I insert any English word or number, the sequence is disturbed and it looks as if all the words in the sentence have been shuffled.
Look at the example below:
یہ ایک مثال ہے اردو کی ...
Now I inserted an English word and it became:
یہ ایک مثال ہےword اردو کی ...
So you can see that almost all of the words have been jumbled. What is the solution for that?
For example:
باللغة العربية “keyboard” انا أريد أن أعرف الكلمة
Finish typing the Arabic word and add a space after it (this space separates the embedded text from the Arabic text to its right).
Insert special character U+200F (to render the preceding space an Arabic character). The character name is "Right to Left Mark".
Insert special character U+202A (to begin the left-to-right embedding). The character name is "Left to Right Embedding".
Insert another space (to separate the embedded text from the Arabic text that will continue to its left).
Change the keyboard to e.g. English and type the left-to-right word.
Insert special character U+202C (to restore the bidirectional state to what it was before the left-to-right embedding). The character name is "Pop Directional Formatting".
Change the keyboard back and continue writing in Arabic.
If you're working in Microsoft Office or Open Office, the "special characters" can be found under "insert" [Insert -> symbols -> other symbols -> special characters in MS 2013]. Scroll through until you find the character with the appropriate Unicode number, and if the Unicode number does not appear in your version of MS Word, select it by its name [as indicated above].
You can also add the character by typing its Unicode code, selecting it and pressing Alt+X, but that can be confusing because it requires constantly switching between Arabic and English.
All of the special characters involved in this little manoeuvre are invisible characters (their job is simply to change the direction of the text) so don't be surprised if it looks like you're not inserting anything.
Pay attention to select the RTL option from the ribbon when the majority of your paragraph is RTL and keep it selected [as shown in the picture in this answer https://stackoverflow.com/a/46050171/8558867 ].
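To make the order of the invisible characters concrete, the steps above can be sketched as a string in Python. This is only an illustration of the sequence, not of how Word stores it; the function name and the sample words are made up.

```python
import unicodedata

RLM = "\u200F"  # Right to Left Mark
LRE = "\u202A"  # Left to Right Embedding
PDF = "\u202C"  # Pop Directional Formatting

def embed_ltr(before, ltr_word, after):
    # Arabic text, space, RLM, LRE, space, left-to-right word, PDF, Arabic text.
    return before + " " + RLM + LRE + " " + ltr_word + PDF + after

mixed = embed_ltr("الكلمة", "keyboard", " باللغة العربية")

# The three controls are invisible, so list them by code point and name instead.
for ch in (RLM, LRE, PDF):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```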
Before you start typing in Arabic/Persian, make sure you have chosen the "Right-to-Left-Direction" button. This button can be found on the Paragraph tab, just to the left of the AZ sorting button. Also select the "Align Text Right" button, which can be found on the Paragraph tab to the left of the Justify button.
Start typing in your language.
Before putting in an English word, type a space, then press left Alt+Shift and type your English word.
Once you have finished your English words, press right Alt+Shift, then type a space and carry on typing in your language.
Hope this helps.
This is OK; they're not shuffled: you're seeing them in LTR rendering mode.
You just need to make them right-to-left. In Notepad or Word, press right Ctrl+Shift to make their direction right-to-left and it will be okay. (It's like having <p dir="rtl">...</p> in HTML).
The control characters LRE and RLE (0x202A and 0x202B) and also LRM and RLM (0x200E and 0x200F) need to be applied to the whole paragraph, i.e. they should come at the beginning of the sequence. Some text display widgets on some platforms may discard these control characters though, particularly older (pre-2000) platforms or those that do not support the Unicode bidirectional algorithm correctly. Newer operating systems and programs should be fine; try with Windows Notepad for example.
I personally recommend using the platform's own means to make the text RTL and avoiding special control characters, because they're invisible and may cause surprising results if they get out of control. So you'd better use Word's API to make the text RTL, or, if your output is HTML, put the text in <div dir="rtl">...</div> tags. For a plain text file, the user has to press the Ctrl+Shift keys manually.
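For the HTML case, a minimal Python sketch of the wrapping might look like this. The helper name is hypothetical; `html.escape` is only there so that characters such as `<` in the text do not break the markup.

```python
import html

def wrap_rtl(text):
    # Let the markup carry the direction instead of invisible control characters.
    return '<div dir="rtl">' + html.escape(text) + "</div>"

print(wrap_rtl("یہ ایک مثال ہے word اردو کی"))
```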
Edit: this was written as a clarification answer to the first answer here, I later edited the first answer and added the important notes I wrote here [the edit still needs approval though].
I was able to fix my text by following the steps in the first answer here.
In case anyone faces troubles while following the steps, let me clarify some things:
If you are entering an English word in an Arabic text, make sure that the RTL option in the ribbon is selected.
Keep it selected throughout the paragraph irrespective of the language you are using [as long as the majority of the paragraph is written in an RTL language like Arabic or Hebrew].
Where to find the special characters and how to insert them:
You can type the Unicode code of the character, then select it and press Alt+X. However, this can be a bit confusing because of the need to switch back and forth between English and Arabic to type the codes, so the best thing to do is to enter the characters 'manually' by picking them by name.
You can do that by going to Insert -> Symbol -> More Symbols -> Special characters [scroll down]. Then select the name of the character you need instead of its Unicode code.
The names of the characters you'll need to use [as specified in the first answer here] are:
"Right to Left Mark" : U+200F.
"Left to Right Embedding": U+202A.
"Pop Directional Formatting": U+202C.
As the first answer says, nothing will appear on the screen because these are non-printing characters, so it's normal to feel as if nothing happened when you insert one.
If you need to do it the other way around, that is, insert a Hebrew or Arabic word in an English text, just reverse the use of the Unicode characters, or follow the steps in the following link: https://superuser.com/a/1247476/767967
If you want to know more about what the special characters do and what it means to make your paragraph LTR or RTL, visit the following link: http://dotancohen.com/howto/rtl_right_to_left.html#Directionality
Select the paragraph (e.g. using triple click) and use the button for right-to-left direction (¶◀) in the Paragraph group of the Home tab.
As Hossein’s answer explains, the issue is the directionality in the paragraph. It changes to left to right when you insert a Latin letter, and you need to fix this manually.
You need to add an invisible RLE Unicode character at the start of the line.
It is RIGHT-TO-LEFT EMBEDDING (RLE): 0x202B hex = 8235 decimal.
It's necessary for Notepad, but MS Word is able to handle it. You need to right-align your text correctly.
How to enter RLE: http://www.fileformat.info/tip/microsoft/enter_unicode.htm
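If the plain text is produced by a script rather than typed, the RLE character can be prepended in code. A small Python sketch (the helper name is hypothetical) that also double-checks the hex and decimal values given above:

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING

def force_rtl(line):
    # Put the invisible RLE character at the very start of the line.
    return RLE + line

print(hex(ord(RLE)), ord(RLE), unicodedata.name(RLE))
```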
In word processing, you have a main text direction which is either left-to-right or right-to-left (or top to bottom, but let's ignore that :-), and you have a text direction for individual characters, which will also be left to right or right to left.
The word processor splits the text into chunks of strings with the same character ordering, then displays these chunks according to the main text ordering.
It seems that your main text ordering was left to right. As long as all your text is Arabic, there is just one chunk of Arabic text. You can already see that it is displayed left-aligned rather than right-aligned, because the text ordering is left to right. The characters are displayed right to left because that is how Arabic is displayed.
When you inserted Latin text, you had three chunks: Arabic, Latin, Arabic. These three chunks are displayed left to right because that is the main text ordering. That would be fine for text that is mostly Latin (like "The Arabic words for dog and cow are ... and ..."). For text that is mostly Arabic with the occasional Latin word, you need to change the main text ordering to "right to left".
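The chunk model described here can be illustrated with a deliberately simplified Python sketch. It is not the full Unicode Bidirectional Algorithm: it assumes whitespace-separated words, takes each word's direction from its first strongly directional character, and all function names are made up.

```python
import unicodedata

def word_direction(word):
    # Direction of the first strongly directional character, or None if there is none.
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return None

def directional_chunks(text):
    chunks = []  # list of (direction, [words]) in typing order
    for word in text.split():
        direction = word_direction(word)
        if chunks and direction in (None, chunks[-1][0]):
            chunks[-1][1].append(word)
        else:
            chunks.append((direction or "ltr", [word]))
    return [(d, " ".join(words)) for d, words in chunks]

def visual_order(text, base):
    # Chunk texts as they appear on screen, listed from left to right.
    texts = [chunk for _, chunk in directional_chunks(text)]
    return texts if base == "ltr" else texts[::-1]

sample = "یہ ایک مثال word اردو کی"
print(visual_order(sample, "ltr"))
print(visual_order(sample, "rtl"))
```

With a left-to-right base the first Arabic chunk ends up on the far left; with a right-to-left base it ends up on the far right, which is where a reader of Arabic expects the sentence to start.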
Just follow this:
Copy the Arabic text from the Word or text document and paste it into Adobe Illustrator.
Save the Illustrator document in .EPS format.
Open InDesign and place the .EPS document where you want it.
Since InDesign can't handle Arabic text by itself, this method will help many designers.
Suppose you have this file:
x
a
b
c
x
x
a
b
c
x
x
and you want to find the sequence abc (and select the whole 3 lines) with Notepad++ . How to express the newline in regex, please?
Notepad++ can do that comfortably, you don't even need regexes
In the find dialogue box look in the bottom left and switch your search mode to Extended which allows \n etc.
Odds are you're working on a file in Windows format, so you'll be looking for \r\n (carriage return, newline):
a\r\nb\r\nc
Will find the pattern over three lines
Update 18th June 2012
With the new Notepad++ v6, you can indeed search for newlines with regexes. So you can just use
a\r\nb\r\nc
even with regular expressions to accomplish what you want. Note that \r\n is the Windows encoding of line breaks. In Unix files, it's just \n.
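For comparison, here is the same search as a Python sketch. Making the \r optional is my own addition so that one pattern covers both Windows and Unix line endings; the function name is hypothetical.

```python
import re

# \r? makes the carriage return optional, so CRLF and LF files both match.
ABC_LINES = re.compile(r"a\r?\nb\r?\nc")

def find_abc(text):
    return [m.group(0) for m in ABC_LINES.finditer(text)]

windows_text = "x\r\na\r\nb\r\nc\r\nx\r\nx\r\na\r\nb\r\nc\r\nx\r\nx\r\n"
unix_text = windows_text.replace("\r\n", "\n")
print(len(find_abc(windows_text)), len(find_abc(unix_text)))
```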
Unfortunately, you can't do that in Notepad++ when using regex search. Notepad++ is based on the Scintilla editor component, which doesn't handle newlines in regex.
You can use extended search for newline searching, but I don't think that will help you search for 3 lines.
Update: Robb and StartClass0830 were right about extended search. It does work, but not when using regular expressions search.
^a\x0D\x0Ab\x0D\x0Ac
This will work. \x0D is the carriage return (CR, ASCII 13) and \x0A is the line feed, i.e. newline (LF, ASCII 10). The assumption is that each line in your file ends with ASCII 13 followed by ASCII 10 (CR LF).
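The two hex escapes are easy to mix up, so here is a quick Python check of which is which, followed by the pattern itself applied to a Windows-style sample. This uses Python's re module, not Notepad++.

```python
import re

CR = "\x0D"  # carriage return, ASCII 13
LF = "\x0A"  # line feed (newline), ASCII 10
print(ord(CR), ord(LF))

sample = "x\r\na\r\nb\r\nc\r\nx"
# MULTILINE lets ^ match at the start of the "a" line, not only at the start of the text.
match = re.search(r"^a\x0D\x0Ab\x0D\x0Ac", sample, re.MULTILINE)
print(repr(match.group(0)))
```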
I found a workaround for this.
Simply, in Extended mode, replace every \r\n with a string that doesn't occur in the rest of the document, e.g. ,,,newline,,, (watch out for special regexp characters like $, &, and *).
Then switch to Regexp mode and do your replacements (a newline is now ,,,newline,,,).
Next, switch to Extended mode again and replace every ,,,newline,,, with \r\n.
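The three steps of this workaround can be sketched in Python as follows. Python's re module can match newlines directly, so this only illustrates the placeholder technique; the function name is hypothetical.

```python
import re

PLACEHOLDER = ",,,newline,,,"

def replace_across_lines(text, pattern, replacement):
    # The placeholder must not already occur in the document.
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text")
    flat = text.replace("\r\n", PLACEHOLDER)      # step 1: hide the line breaks
    flat = re.sub(pattern, replacement, flat)     # step 2: regex on a single line
    return flat.replace(PLACEHOLDER, "\r\n")      # step 3: restore the line breaks

nl = re.escape(PLACEHOLDER)
result = replace_across_lines("x\r\na\r\nb\r\nc\r\nx", f"a{nl}b{nl}c", "abc")
print(repr(result))
```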
For Notepad++ v6 and beyond, do this as a regular expression:
Select Search Mode > Regular expression (w/o . matches newline)
And in the Find what textbox: a[\r\n]+b[\r\n]+c[\r\n]+
or if you are looking at the (Windows or Unix) file to see its line breaks as \r\n or \n then you may find it easier to use Extended Mode:
Select Search Mode > Extended (\n, \r, \t, \0, \x...)
And in the Find what Textbox for Windows: a\r\nb\r\nc\r\n
Or in the Find what Textbox for Unix: a\nb\nc\n
It wasn't clear whether the OP's intent is to select the trailing line return (after the 'c') as well, as would be necessary to remove the lines.
To not select the trailing line return, as appropriate for replacing with a non-empty string, simply remove the final line return from the pattern.
Note that if the match would fall on the last line of the file and that line has no trailing line return, the match fails.
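The effect of including or dropping the trailing line return can be tried out in Python. This sketch assumes each line break is matched with the character class [\r\n]+, which covers both Windows and Unix files; the constant names are made up.

```python
import re

WITH_TRAILING = re.compile(r"a[\r\n]+b[\r\n]+c[\r\n]+")
WITHOUT_TRAILING = re.compile(r"a[\r\n]+b[\r\n]+c")

windows_text = "x\r\na\r\nb\r\nc\r\nx"
no_final_break = "x\na\nb\nc"  # the match sits on the last line, with no line return after it

print(repr(WITH_TRAILING.search(windows_text).group(0)))
print(WITH_TRAILING.search(no_final_break))
print(repr(WITHOUT_TRAILING.search(no_final_break).group(0)))
```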
a\r\nb\r\nc works for me, but not ^a\x0D\x0Ab\x0D\x0Ac
Hmm, too bad that newline is not working with regular expressions. Now I have to go back to Textpad again. :(
Select the search mode
Extended (\n, \r, \t, \0, \x...)
\n is a new line, and so on.
This is Manuel
Find: "(^a.*$)\r\n(b.*)\r\n^(c.*)$" - pick up 3 whole lines, storing only the data
Replace with: "\1\2\3" - put down (replay) the data
Works fine in regex mode with Notepad++ v7.9.5
Placeholders: ^ (start of line) and $ (end of line) can be inside or outside the () store as shown, though they are clearly not necessary in the given example. Note that "[^x]" is different: there "^" means "NOT".
The advantage of storing and replaying is that it allows much more complicated pattern matches without having to type in again what you want to end up with, and you can even change the order of replay: "\2\3\1" for "bca".
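The store-and-replay idea carries over to Python, with one adaptation that is mine rather than part of the answer: in Python's re module "." also matches \r, so this sketch uses [^\r\n]* for "rest of the line" and drops the $ anchors. The names are hypothetical.

```python
import re

# Three capture groups, one per line; \r?\n accepts Windows and Unix line breaks.
THREE_LINES = re.compile(
    r"^(a[^\r\n]*)\r?\n(b[^\r\n]*)\r?\n(c[^\r\n]*)", re.MULTILINE
)

def replay(text, order):
    # order is a replacement template such as r"\1\2\3" or r"\2\3\1".
    return THREE_LINES.sub(order, text)

sample = "x\r\na\r\nb\r\nc\r\nx"
print(repr(replay(sample, r"\1\2\3")))
print(repr(replay(sample, r"\2\3\1")))
```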
I have run across this little issue when the document uses Windows CR/LF line endings.
If you tick the box for ". matches newline", you need .. to match CR/LF, so if you have
<blah><blah>",
"<more><blah>
you need to use ",.." to match the end of one string, the comma, the CR/LF and the start of the next string.
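Python's DOTALL flag behaves much like the ". matches newline" checkbox, so the two-dots point can be checked with a short sketch (using Python's re module, not Notepad++):

```python
import re

sample = '<blah><blah>",\r\n"<more><blah>'

# With DOTALL, each "." consumes one character of the CR/LF pair.
with_dotall = re.search(r'",.."', sample, re.DOTALL)
# Without it, "." cannot match the \n, so the search finds nothing.
without_dotall = re.search(r'",.."', sample)

print(repr(with_dotall.group(0)), without_dotall)
```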
In Notepad++ you can also try highlighting the desired part of the text and then pressing Ctrl+J.
That joins the selected lines into one, thus removing the line endings between them.