Removing some paragraph marks in a Word document - ms-word

I copy text from PDF files into Word 2010 documents using ABBYY conversion software. The result contains many incorrect line breaks. Is there any way to remove these paragraph marks when they are not preceded by ".", "?" or "!"?
I write macros in Excel but have no experience with Word coding.

You could do a search and replace, depending on whether you can find some rules that you can apply. Maybe a little screenshot?
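The rule from the question (drop a break unless it follows ".", "?" or "!") can be sketched as a regular expression. This is only an illustration in Python on plain text, assuming the document text has already been exported or copied out of Word; it is not Word's own find-and-replace syntax, and the function name `join_broken_lines` is made up for the example.

```python
import re

def join_broken_lines(text):
    """Replace line breaks that do not follow '.', '?' or '!' with a space."""
    # Normalise Windows (\r\n) and bare \r line endings to \n first,
    # so the lookbehind below only ever sees the character before the break.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # (?<![.?!\n]) is a negative lookbehind: the break is removed only when the
    # previous character is not sentence-ending punctuation. "\n" is included
    # in the set so that blank lines between paragraphs are left alone.
    return re.sub(r"(?<![.?!\n])\n", " ", text)

sample = "This is a\nbroken sentence.\nIs it fixed?\nYes"
print(join_broken_lines(sample))
```

Breaks after abbreviations such as "e.g." would still be kept by this rule, so the result may need a manual pass.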

Related

Is it possible to ignore paragraph marks when using getTextRanges() in a Word add-in?

I am currently developing a Word add-in using the Office JS library. I need to get all sentences in the Word document as individual ranges. For this I used getTextRanges() on the body of the document with "." as the delimiter. However, it also splits on paragraph marks, which is not ideal for my use case. All I want is for the document to be divided into ranges where the only delimiter is ".", regardless of whether the ranges then extend across paragraphs.
Is there a way to ignore paragraph marks with getTextRanges(), or is there another method entirely that I seem to have overlooked?
Thanks.
I have been unable to resolve it.

VSCode multiline search of two words?

I saw an SO post that says you can search multiline text using a regex or the literal text itself. But what if you want to (quickly) search for two or three words within a given span of lines?
For example, what if you want to find a multiline region that contains "ruby" and "regex"? (Say you want to find where you took a note in a txt, Markdown or rich text file; you might search for "how to use regex in ruby" or "the ruby regex tutorial", right?)
Now you can use a simple (but redundant) regex like ruby(.*\n)+regex|regex(.*\n)+ruby. But to me it doesn't look beautiful. For three or more words, this kind of workaround becomes rapidly more redundant, because every ordering of the words needs its own alternative. Not good.
So is there a smarter way to do this? Thanks.
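One pattern that avoids listing every ordering is a chain of lookaheads, one per word. The sketch below uses Python's `re` module to show the idea; whether the same pattern works in the VS Code search box depends on its regex engine supporting lookahead, which is an assumption here. The helper name `contains_all` is hypothetical.

```python
import re

def contains_all(block, words):
    """Return True if every word occurs somewhere in block, in any order."""
    # Build one lookahead per word, e.g. (?=.*ruby)(?=.*regex).
    # Each lookahead scans from the start without consuming text,
    # so the order of the words in the block does not matter.
    pattern = "".join("(?=.*" + re.escape(w) + ")" for w in words)
    # DOTALL lets "." match newlines, so the scan spans multiple lines.
    return re.search(pattern, block, flags=re.DOTALL) is not None

note = "how to use regex\nin ruby"
print(contains_all(note, ["ruby", "regex"]))
print(contains_all("ruby only", ["ruby", "regex"]))
```

The pattern grows by one lookahead per extra word rather than by one alternative per ordering.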

VS Code - Select Current Word in JSON Files Includes Quotes

Using Cmd+D to select the current word in VS Code also annoyingly selects any quotes that may surround the word. Is there any way to prevent this?
Edit: This appears to only happen in JSON files.
I can't reproduce the quotes selection with Cmd+D, but I can with the Expand Selection command.
If you still have this issue, it may be fixed in VS Code v1.69. The quotes will no longer be considered part of a JSON word. From the release notes:
Every language comes with a word pattern that defines which characters
belong to a word when in that language. JSON was different to all
other language in that it included the quotes of string literals and
the full string literal content. It now follows other languages and
contains just letters, numbers and hyphens.
The change will fix issues when expanding the selection (Command: Expand Selection), allow word completions inside strings and fix some
quick suggestion abnormalities.

MS Word: Carriage Returns in numbering format

In MS Word 2010, is it possible to include a manual line break in the formatting for a numbered list?
What I mean is I'm creating a style that includes numbering in a list. I'd like the list to appear like this:
Section 1[MLB]
Benefits
Section 2[MLB]
Drawbacks
etc.
I'm in the Define New Number Format dialog box, trying to find a way to include a manual line break in the Number Format field. I've got the word "Section" in there, but the line break is a problem so far. I've tried ^|, which is the search-and-replace code for manual line breaks. But that inserts a literal caret followed by a pipe. Is there some other way of including things like paragraph breaks or line breaks in numbered lists? Thanks everyone.

How can I use the DocX library to change the font globally, remove superfluous spaces, and remove or add extra line breaks?

I want to use the DocX library [https://docx.codeplex.com/] to convert a .docx document to a different font. Does anybody know how to do that? The sample projects are very sparse, and the documentation is nonexistent.
I find, too, that there are often extraneous spaces in documents, and I want to iterate over these until there are never two contiguous spaces. I can do this in a loop, I guess, replacing "  " (2 spaces) with " " (1 space) until "  " (2 spaces) is no longer found.
However, I also want to remove superfluous line breaks that sometimes occur when copying-and-pasting text into a document. I can do it "manually" (in Libre Office, not sure how it's done in MS Word), as I got an answer to this question:
(select "Regular Expressions" and then replace "$" (without the quotes) with a space)
...but how programmatically, with DocX?
Additionally, in some cases I want to ADD line breaks/"paragraph returns" where there are legitimate line breaks between the end of one paragraph and the start of another, but no extra line to separate them visually. According to this:
...I can add a paragraph/line break to a legitimate line break by searching for "$" and replacing that with "\n\n"
This does work, too (manually, in Libre Office); but again...how to do this with the DocX library?
It appears that not all of this is possible with the current version of the DocX library you are using. If it is not exposed in documentation, the functions might as well not exist, and you should not be using undocumented features.
There is a much more mature library available, however, called the "Open XML SDK", that can do everything you need.
The correct way to change a font, regardless of whether you are doing it with the document editor or writing a program to manipulate these files, is to change the appropriate text's style attribute, or to change the definition of the style in use.
You should never, ever, ever, ever directly change the font of any text. Personally, I think that the 'font type' and 'font size' menus should be removed entirely from Word/LibreOffice/etc., and only be accessible inside a 'change style properties' dialog; the only reason to directly apply a font is if you are actually providing an example of a particular typeface under discussion!
See How to: Replace the styles parts in a word processing document (Open XML SDK) from the MSDN documentation for a description of the way that works.
To search and replace text, the applicable MSDN page is How to: Search and replace text in a document part (Open XML SDK). For specifically replacing multiple spaces with a single space, there are numerous results on Google that should all work to at least some degree.
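The text clean-up itself, independent of any document library, can be sketched on plain strings. The Python below is only an illustration of the two replacements discussed in the question; it does not use DocX or the Open XML SDK, and the function names are made up for the example.

```python
import re

def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    # A single regex pass is equivalent to looping a two-spaces-to-one
    # replacement until no double space remains.
    return re.sub(r" {2,}", " ", text)

def separate_paragraphs(text):
    """Put exactly one blank line between consecutive paragraphs."""
    # Every run of one or more line breaks becomes exactly two,
    # after trimming leading and trailing breaks.
    return re.sub(r"\n+", "\n\n", text.strip("\n"))

print(collapse_spaces("too    many  spaces"))
print(separate_paragraphs("first\nsecond\n\nthird"))
```

With a document library, the same replacements would be applied to the text of each paragraph or run rather than to one large string.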