I want to find and replace an ordered list in word from the . to the ) - ms-word

I have tried [0-9] and checked the use wildcard box but it replaces the individual numbers with the literal [0-9] string. How do I replace with the number it found plus a character?

Use backreferences. Your unspecified environment may or may not support them, but if it does, you would:
replace \([0-9][0-9]*\)
with \1 <then, whatever character you want>
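As a rough sketch of the same idea in sed (an assumption, since the question is about Word, whose wildcard mode has the same grouping and \1 concept but its own quantifier syntax): the hypothetical helper `renumber` below captures the digits at the start of a line, matches the literal dot after them, and writes back the captured number followed by a closing parenthesis. The sample list is made up for illustration.

```shell
# Hypothetical helper: turn "1." style list markers into "1)" style.
renumber() {
  # \([0-9][0-9]*\) captures one or more digits at the start of the line,
  # \. matches the literal dot, and \1) puts the digits back followed by ")".
  sed 's/^\([0-9][0-9]*\)\./\1)/'
}

# Made-up sample list.
printf '1. apples\n2. pears\n10. plums\n' | renumber
# prints:
# 1) apples
# 2) pears
# 10) plums
```

Requiring at least one digit (`[0-9][0-9]*` rather than `[0-9]*`) keeps the pattern from matching an empty string.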

Add words at beginning and end of a FASTA header line with sed

I have the following line:
>XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAGGTAGACTTATAATTAG
AGAAAACAAC
I would like to convert the first line as follows:
>INITWORD/XXX-220_5004_COVID-A6/FINALWORD
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT...
So far I have managed to add the first word as follows:
sed 's/>/>INITWORD\//I'
That returns:
>INITWORD/XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT
How can I add FINALWORD at the end of the first line?
Just substitute more. sed conveniently allows you to recall the text you matched with a back reference, so just embed that between the things you want to add.
sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%I' file.fasta
I also added a ^ beginning-of-line anchor, and switched to % delimiters so the slashes don't need to be escaped.
In some more detail, the s command's syntax is s/regex/replacement/flags where regex is a regular expression to match the text you want to replace, and replacement is the text to replace it with. In the regex, you can use grouping parentheses \(...\) to extract some of the matched text into the replacement; so \1 refers to whatever matched the first set of grouping parentheses, \2 to the second, etc. The /flags are optional single-character specifiers which modify the behavior of the command; so for example, a /g flag says to replace every match on a line, instead of just the first one (but we only expect one match per line so it's not necessary or useful here).
The I flag is non-standard but since you are using that, I assume it does something useful for you.
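To make the \1 / \2 numbering and the g flag described above concrete, here is a small sketch on made-up key=value input (the helper name `swap_pairs` and the sample data are assumptions, not part of the question). It uses only basic regular expression syntax, so it should behave the same in GNU and BSD sed.

```shell
# Hypothetical helper: swap each key=value pair into value=key.
swap_pairs() {
  # First group \(...\) captures the lowercase key, second group the digits.
  # The replacement emits them in the opposite order; g repeats the
  # substitution for every pair on the line instead of only the first.
  sed 's/\([a-z][a-z]*\)=\([0-9][0-9]*\)/\2=\1/g'
}

printf 'a=1 b=22\n' | swap_pairs
# prints: 1=a 22=b
```

Without the g flag only the first pair on each line would be swapped.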

SCALA Replace with $

I want to replace a word with text that contains a literal $. I tried:
var s = string.replaceAll("Register","$10")
I want the text "Register saved" to become "$10 saved".
The error I get is "Illegal group reference".
If you look at the scaladoc for replaceAll, you'll see that it takes a regular expression string as the parameter. Escape the $ with a \ (written "\\$10" in an ordinary Scala string literal), or use replaceAllLiterally.
replaceAll uses a regular expression to find the match. In the replacement string, $ is a special character that refers to a specific capture group in the match. You have no capture groups, so this is an error. It's not what you want anyway, since you want the literal text "$10".
Use replace instead of replaceAll. It just does a direct string replacement.

Text file search for match strings regex

I am trying to understand how regex works and what are the possibilities of working with it.
So I have a txt file and I am trying to search for 8-character strings containing numbers. For now I use a quite simple option:
clear
Get-ChildItem random.txt | Select-String -Pattern [0-9][a-z] | foreach {$_.line}
It sort of works, but I am trying to find a better option. At the moment it takes too long to read through the output, since it writes entire lines and does not filter them by length.
You can use a lookahead to assert that the string contains at least one digit, then specify the length of the match, and finally anchor it with ^ (start of string) and $ (end of string) if the string is on a line of its own, or with \b (word boundary) if it's part of an HTML document, as your comments seem to suggest. In the \b version the lookahead uses \w* rather than .*, so that the digit has to fall inside the 8-character word and not just somewhere later on the line:
Get-ChildItem C:\files\ |Select-String -Pattern '^(?=.*\d)\w{8}$'
Get-ChildItem C:\files\ |Select-String -Pattern '\b(?=\w*\d)\w{8}\b'
The pattern [0-9][a-z] matches a digit followed by a letter. If you want to match a sequence of 8 characters use .{8}. The dot in regular expressions matches any character except newlines. A number in curly brackets matches the preceding expression the given number of times.
If you want to match non-whitespace characters use \S instead of .. If you want to match only digits and letters use [0-9a-z] (a character class) instead of ..
For a more thorough introduction please go find a tutorial. The subject is way too complex to be covered by a single answer on SO.
What you're currently searching for is a single digit (0-9) followed by a single lowercase letter (a-z).
This, for example, will match any 8-character string containing only word characters (letters, digits, and underscore):
\w{8}
I often forget what some regex classes are, so I use this as a point of reference, and it may be useful to you as a learning tool: http://regexr.com/
It can also validate what you're typing inline via a text field, so you can see whether what you're doing works or not.
If you need more of a tutorial than a reference, I found this extremely useful when I learned: regexone.com
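For comparison, the two conditions discussed in the answers above (exactly 8 word characters, at least one digit) can also be sketched with standard shell tools. This is not the lookahead technique: it swaps in a plain three-step filter that splits the text into words first. The helper name `eight_with_digit` and the sample text are assumptions for illustration, and the pipeline prints only the matching words, not whole lines.

```shell
# Hypothetical helper: print every 8-character word that contains a digit.
eight_with_digit() {
  # 1. tr puts each run of word characters (letters, digits, _) on its own line.
  # 2. The first grep keeps lines that are exactly 8 characters long.
  # 3. The second grep keeps those that contain at least one digit.
  tr -cs '[:alnum:]_' '\n' | grep -E '^.{8}$' | grep '[0-9]'
}

# Made-up sample text.
printf 'id abc12345 toolongword1\nabcdefgh x1 9z_8y7w6\n' | eight_with_digit
# prints:
# abc12345
# 9z_8y7w6
```

Here abcdefgh is dropped because it has no digit, and toolongword1 because it is longer than 8 characters.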

Unable to use '*' to search/replace -- sed

I want to change all mentions of a.b.c.top*.gz to new-word/new-table.
Something like -->
es.fr.en.top20.gz becomes binarised-model/phrase-table
I did this :
sed -i 's/es\.fr\.en\.top*\.gz/binarised-model\/phrase-table/g' top*/mert-work/moses.ini
I had initially not used backslash before periods, but, once it did not work, I thought maybe period is tricky.
But, it does not seem to replace anything. What's going wrong ?
Thanks !
Using * as a wildcard is correct for bash globbing, but not if you work with regex, which is the case when using sed. Instead of *, try .*.
In regex, * means match the preceding character any number of times. The wildcard character is ., so .* matches any number of any characters.
If you know that the character you want to match is always a number, it's safer to use [0-9]*. If you even know how many characters this number will have, then you can even use e.g. [0-9]\{2\} to match exactly two numerals.
Sed uses regular expressions, not shell globbing. That means that (1) . matches any single character except a newline, so you are right to escape them to match a literal dot, and (2) * matches zero or more of the token preceding it, here that's p. You need
sed -i 's/es\.fr\.en\.top.*\.gz/binarised-model\/phrase-table/g' top*/mert-work/moses.ini
#                        ^
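The difference between the glob-style and the regex-style pattern can be checked on a single sample line without touching any file. In this sketch the helper names `glob_style` and `regex_style` and the sample input are assumptions. The second helper uses the stricter digits-only variant suggested above and omits -i so that it only reads standard input.

```shell
# Hypothetical helper: the original attempt. "p*" means zero or more "p",
# so after "to" and any run of "p" the pattern expects a literal dot.
glob_style() {
  sed 's/es\.fr\.en\.top*\.gz/binarised-model\/phrase-table/g'
}

# Hypothetical helper: "top" followed by one or more digits, then ".gz".
regex_style() {
  sed 's/es\.fr\.en\.top[0-9][0-9]*\.gz/binarised-model\/phrase-table/g'
}

printf 'es.fr.en.top20.gz\n' | glob_style
# prints: es.fr.en.top20.gz   (unchanged, because "20" is not matched)

printf 'es.fr.en.top20.gz\n' | regex_style
# prints: binarised-model/phrase-table
```

The glob-style pattern does match names such as es.fr.en.toppp.gz, which shows what the * is really repeating.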

sed - remove specific subscript from string

Please provide a sed one-liner that produces this output:
sdc3 sdc2
for the input:
sdc3[1] sdc2[0]
That is, remove every subscript value from the string.
sed 's/\[[^]]*\]//g'
reads: substitute any string with literal "[" followed by zero or more characters that aren't a "]", and then the closing "]", with an empty string.
You need the [^]] bit to prevent greedy matching treating "[1] sdc2[0]" as a single match in your sample string.
As for your comment:
sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g'
I switched the separator from the usual '/' to '#', just to avoid escaping the /dev/ bit you asked for (I won't say "for clarity")
the \(...\) bit matches a subgroup, here sdc2 or whatever, so we can refer to it in the replacement
the subgroup uses a similar character class to the one we used discarding the index: [^[ ] means any character except an "[" (again, to avoid greedily matching the index) or a space (assuming your values are space-delimited as per your post)
the replacement is now the literal "/dev/" followed by the first (and only) subgroup match
the g flag at the end tells it to perform multiple matches per line, instead of stopping at the first one
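The effect of the [^]] character class, and of the two commands above, can be seen side by side on the sample input. The helper names in this sketch are assumptions, and `strip_greedy` is a deliberately wrong variant added only to show what greedy matching does.

```shell
# Hypothetical helper: remove each [index] separately.
strip_index() {
  sed 's/\[[^]]*\]//g'
}

# Deliberately wrong variant: .* is greedy and runs to the last "]" on the line.
strip_greedy() {
  sed 's/\[.*\]//g'
}

# Hypothetical helper: drop the index and prefix each name with /dev/.
to_dev() {
  sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g'
}

printf 'sdc3[1] sdc2[0]\n' | strip_index
# prints: sdc3 sdc2

printf 'sdc3[1] sdc2[0]\n' | strip_greedy
# prints: sdc3

printf 'sdc3[1] sdc2[0]\n' | to_dev
# prints: /dev/sdc3 /dev/sdc2
```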