VSCode multiline search of two words? - visual-studio-code

I saw a SO post that says you can search using regex or an actual literal text on it to search multiline texts. But what if you want to (quickly) search two or three of words within a specified lines of text content?
For example, what if you want to search for multiline text area that contains "ruby" and "regex" (assuming you want to know where you took a note on your txt (or markdown or rich text format) file. you may want to search for "how to use regex in ruby" or "the ruby regex tutorial", right? )
Now you can use a simple (but redundant) regex like ruby(.*\n)+regex|regex(.*\n)+ruby. But to me it doesn't look beautiful. For three or more words, this kind of regex workaround increases its redundancy exponentially also, not good.
So is there a smarter way to do this? Thanks.

Related

Is it possible to ignore paragraph marks when using getTextRanges() in word add in?

I am currently developing a word addin using the office js library. I need to get all sentences in the word document as individual ranges. For this I used getTextRanges() on the body of the document with "." as the delimiter. However, it also separates on paragraph mark which is not ideal for my use case. All I want is for the document to be divvied up into ranges where the only delimiter is "." - regardless of whether the ranges will then expand across paragraphs.
Is there a way to ignore paragraph marks with getTextRanges(), or is there another method entirely that I seem to have overlooked?
Thanks.
I have been unable to resolve it.

officejs : Search Word document using regular expression

I want to search strings like "number 1" or "number 152" or "number 36985".
In all above strings "number " will be constant but digits will change and can have any length.
I tried Search option using wildcard but it doesn't seem to work.
basic regEx operators like + seem to not work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This regular expression only selects upto one digit. e.g. If the string is 'number 12345' it only matches number 12345 (the part which is in bold).
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be soemthing else, depending on the list separator set in Windows (or on the Mac). My European locale uses a semicolon; the United States would use a comma, for example.
"number [0-9]{1;100}"
The 100 is an arbitrary number I chose for the maximum number of repeats of the search term just before it. Depending on how long you expect a number to be, this can be much smaller...
The logic of the search text is: number is a literal; the valid range of characters following the literal are 0 through 9; there may be one to one hundred of these characters - anything in that range is a match.
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
Try putting < and > on the ends of your search string to indicate the beginning and ending of the desired strings. This works for me: '<number [1-9]*>'. So does '<number [1-9]#>' which is probably what you want. Note that in Word wildcards the # is used where + is used in other RegEx systems.

MongoDB Text Search AND multiple search words with word stemming

I am trying to search for multiple words in text inclusively(AND operation)
without losing word stemming.
For example:
db.supplies.runCommand("text", {search:"printers inks"})
should return results with (printer and ink) or (printers ink) or (printers ink) or (printers inks) , instead of all results with either printer or ink.
This post covers the search for multiple words as an AND operation, but the solution doesn't search for stemmed words ->MongoDB Text Search AND multiple search words.
The only way I could think of is creating a permutation of all the words and then running the search for the number of permutations(which could be large)
This may not be an effective way to search on a large collection.
Is there a better and smarter way to do it ?
So is there a reason you have to use a text search? If it were me i would use a regular expression.
https://docs.mongodb.com/manual/reference/operator/query/regex/
Off the top of my head something like this.
db.collection.find({products:/printers inks|printers|inks/})
Now i suppose you can do the same thing with a text search too.
db.collection.find({$text:{$search : "\"printers inks\" printers inks"}})
note the escaped quotes.

Removing some paragraph marks in a word document

I copy text from PDF files into word 2010 documents using Abbyy conversion software. I find the result will contain many line breaks which are incorrect. Is there any way I can remove any such marks if they are not preceded by either "." or "?" or "!"
I write macros in excel but have no experience of word coding
You could do a search and replace depending if you can find some sort of rules wich you can apply. Mayeby a little screenshot?

How can I use the DocX library to change the font globally, remove superfluous spaces, and remove or add extra line breaks?

I want to, using the DocX library [https://docx.codeplex.com/], convert a .docx document to use a different font. Does anybody know how to do that? The samples projects are very spare, and the documentation is nonexistent.
I find, too, that often there are extraneous spaces in documents, and I want to iterate over all these until there are never two contiguous spaces. I can do this in a loop, I guess, replacing " " (2 spaces) with " " (1 space) until " " (2 spaces) is no longer found.
However, I also want to remove superfluous line breaks that sometimes occur when copying-and-pasting text into a document. I can do it "manually" (in Libre Office, not sure how it's done in MS Word), as I got an answer to this question:
(select "Regular Expressions" and then replace "$" (without the quotes) with a space)
...but how programmatically, with DocX?
Additionally, in some cases I want to ADD line breaks/"paragraph returns" where there are legitimate line breaks between the end of one paragraph and the start of another, but no extra line to separate them visually. According to this:
...I can add a paragraph/line break to a legitimate line break by searching for "$" and replacing that with "\n\n"
This does work, too (manually, in Libre Office); but again...how to do this with the DocX library?
It appears that not all of this is possible with the current version of the DocX library you are using. If it is not exposed in documentation, the functions might as well not exist, and you should not be using undocumented features.
There is a much more mature library available, however, called the "Open XML SDK", that can do everything you need.
The correct way to change a font, regardless of whether you are doing it with the document editor, or you are writing a program to manipulate these files, is to change the appropriate text's style attribute, or changing the definition of style in use.
You should never, ever, ever, ever directly change the font of any text. Personally, I think that the 'font type' and 'font size' menus should be removed entirely from word/libreoffice/etc, and only be accessible inside a 'change style properties' dialog; the only reason to directly apply a font is if you are actually providing an example of particular typeface under discussion!
See How to: Replace the styles parts in a word processing document (Open XML SDK) from the MSDN documentation for a description of the way that works.
To search and replace text, the applicable MSDN page is How to: Search and replace text in a document part (Open XML SDK). For specifically replacing multiple spaces with a single space, there are numerous results on Google that should all work to at least some degree.