How to replace the second occurrence in Emacs Evil? - emacs

I have the following text
tset "abc" "123" kk
test "xyz" "345" zz
How do I replace the second string inside double quotes? The result should be
tset "abc" "replaced" kk
test "xyz" "replaced" zz

An Evil mode solution would be to use this command:
:g/".*".*".*"/norm 3f"lct"replaced
Which means:
g/".*".*".*" - On any line containing the regex (which matches two quoted strings)
norm - In normal mode
3f"lct"replaced - go to the third ", move one character right, and change the text until the next " to "replaced"
It also takes ranges, so you can use it on a subset of lines if you want to.

I'm assuming you want to do that in a regular Emacs buffer.
I'm also assuming you want to have the second string always replaced with the same value.
Then you can use the replace-regexp command in emacs as follows:
M-x replace-regexp <RET> \("[^"]*"\) "[^"]*" <RET> \1 "replaced" <RET>
<RET> represents the Enter/Return key on your keyboard.
This command searches your buffer for text that matches a regular expression and replaces it with whatever you want. The first part of the pattern, everything within the parentheses, is a group containing the text you want to keep. It is referred to as \1 in the replacement string.
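For comparison, the same "keep a group, replace what follows" idea can be sketched outside Emacs with sed. This is only an illustration, not part of either answer: the sample text comes from the question, the variable names are made up, and the sketch assumes each line has at least two double-quoted strings and no escaped quotes.

```shell
# Sample input from the question.
input='tset "abc" "123" kk
test "xyz" "345" zz'

# Group 1 keeps everything up to (but not including) the second quoted string;
# the second "..." is then replaced wholesale.
result=$(printf '%s\n' "$input" |
  sed 's/^\([^"]*"[^"]*"[^"]*\)"[^"]*"/\1"replaced"/')

printf '%s\n' "$result"
```

Lines with fewer than two quoted strings do not match and pass through unchanged.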

Related

Alphanumeric substitution with vim

I'm using the vscode vimplugin. I have a bunch of lines that look like:
Terry,169,80,,,47,,,22,,,6,,
I want to remove all the alphanumeric characters after the first comma so I get:
Terry,,,,,,,,,,,,,
In command mode I tried:
s/^.+\,[a-zA-Z0-9-]\+//g
But this does not appear to do anything. How can I get this working?
edit:
s/^[^,]\+,[a-zA-Z0-9-]\+//g
\+ is greedy; ^.\+, eats the entire line up to the last ,.
Instead of the dot (which means "any character") use [^,] which means "any but a comma". Then ^[^,]\+, means "any characters up to the first comma".
The difficulty with your requirement is that you want to anchor at the beginning of the line with ^, so the g flag does not help: an anchored pattern can match at most once per line. The only way I found to solve the puzzle is to use an expression: match and preserve the anchored text, then apply the substitute() function with the g flag to the rest.
I managed with the following expression:
:s/\(^[^,]\+\)\(,\+\)\(.\+\)$/\=submatch(1) . submatch(2) . substitute(submatch(3), '[^,]', '', 'g')/
Let me split it into parts. Searching:
\(^[^,]\+\) - first, match one or more non-comma characters at the start of the line
\(,\+\) - one or more commas
\(.\+\)$ - all remaining characters to the end of the line
Substituting:
\= - the substitution is an expression
See http://vimdoc.sourceforge.net/htmldoc/change.html#sub-replace-expression
submatch(1) - replace with the first match (non-commas anchored with ^)
submatch(2) - replace with the second match (commas)
substitute(submatch(3), '[^,]', '', 'g') - replace in the rest of the string
The last call to substitute() is simple: it replaces every non-comma character with an empty string.
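If the file can also be processed outside the editor, a rough equivalent is to blank every comma-separated field after the first. The sketch below uses awk instead of a Vim expression, so it is a different technique from the answer above. It assumes nothing after the first comma needs to be kept, and the variable names are illustrative.

```shell
# Sample line from the question.
line='Terry,169,80,,,47,,,22,,,6,,'

# Split on commas, blank every field after the first, and rejoin with commas.
result=$(printf '%s\n' "$line" |
  awk -F, -v OFS=, '{ for (i = 2; i <= NF; i++) $i = ""; print }')

printf '%s\n' "$result"
```

The number of commas is preserved because awk rebuilds the line from the same number of fields.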
PS. Tested in real vim, not vscode.

Add words at beginning and end of a FASTA header line with sed

I have the following line:
>XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAGGTAGACTTATAATTAG
AGAAAACAAC
I would like to convert the first line as follows:
>INITWORD/XXX-220_5004_COVID-A6/FINALWORD
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT...
So far I have managed to add the first word as follows:
sed 's/>/>INITWORD\//I'
That returns:
>INITWORD/XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT
How can I add the FINALWORD at the end of the first line?
Just substitute more. sed conveniently allows you to recall the text you matched with a back reference, so just embed that between the things you want to add.
sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%I' file.fasta
I also added a ^ beginning-of-line anchor, and switched to % delimiters so the slashes don't need to be escaped.
In some more detail, the s command's syntax is s/regex/replacement/flags where regex is a regular expression to match the text you want to replace, and replacement is the text to replace it with. In the regex, you can use grouping parentheses \(...\) to extract some of the matched text into the replacement; so \1 refers to whatever matched the first set of grouping parentheses, \2 to the second, etc. The /flags are optional single-character specifiers which modify the behavior of the command; so for example, a /g flag says to replace every match on a line, instead of just the first one (but we only expect one match per line so it's not necessary or useful here).
The I flag is non-standard but since you are using that, I assume it does something useful for you.
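A minimal way to check the back-reference approach on a shortened copy of the record from the question. The sketch drops the non-standard I flag so that it also runs on sed implementations without that extension; the variable names are only for illustration.

```shell
# Shortened sample record: one header line and two sequence lines.
fasta='>XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAAACAAC'

# Only lines starting with ">" match; sequence lines pass through unchanged.
result=$(printf '%s\n' "$fasta" |
  sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%')

printf '%s\n' "$result"
```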

Extracting substring from inside bracketed string, where the substring may have spaces

I've got an application that has no useful api implemented, and the only way to get certain information is to parse string output. This is proving to be very painful...
I'm trying to achieve this in bash on SLES12.
Given I have the following strings:
QMNAME(QMTKGW01) STATUS(Running)
QMNAME(QMTKGW01) STATUS(Ended normally)
I want to extract the STATUS value, ie "Ended normally" or "Running".
Note that the line structure can move around, so I can't count on the "STATUS" being the second field.
The closest I have managed to get so far is to extract a single word from inside STATUS like so
echo "QMNAME(QMTKGW01) STATUS(Running)" | sed "s/^.*STATUS(\(\S*\)).*/\1/"
This works for "Running" but not for "Ended normally"
I've tried switching the \S* for [\S\s]* in both "grep -o" and "sed" but it seems to corrupt the entire regex.
This is purely a regex issue. \S matches only non-whitespace characters, so \(\S*\) cannot match a value that contains a space, such as "Ended normally". Keep it simple by explicitly listing the characters allowed inside (..), for example [a-zA-Z ]*, i.e. zero or more upper- or lower-case letters and spaces.
sed 's/^.*STATUS(\([a-zA-Z ]*\)).*/\1/'
Or use character classes [:alnum:] if you want numbers too
sed 's/^.*STATUS(\([[:alnum:] ]*\)).*/\1/'
sed 's/.*STATUS(\([^)]*\)).*/\1/' file
Output:
Running
Ended normally
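A small sketch of the [^)]* variant, assuming STATUS may appear anywhere on the line and that some lines may not contain it at all. The third input line and the -n ... p combination are additions for illustration: without them, sed prints non-matching lines unchanged.

```shell
# Two lines from the question (one reordered) plus a line without STATUS.
input='QMNAME(QMTKGW01) STATUS(Running)
STATUS(Ended normally) QMNAME(QMTKGW01)
QMNAME(QMTKGW02)'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so lines without STATUS(...) are dropped.
result=$(printf '%s\n' "$input" |
  sed -n 's/.*STATUS(\([^)]*\)).*/\1/p')

printf '%s\n' "$result"
```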
Extracting a substring matching a given pattern is a job for grep, not sed. We should use sed when we must edit the input string. (A lot of people use sed and even awk just to extract substrings, but that's wasteful in my opinion.)
So, here is a grep solution. We need to make some assumptions (in any solution) about your input - some are easy to relax, others are not. In your example the word STATUS is always capitalized, and it is immediately followed by the opening parenthesis (no space, no colon etc.). These assumptions can be relaxed easily. More importantly, and not easy to work around: there are no nested parentheses. You will want the longest substring of non-closing-parenthesis characters following the opening parenthesis, no matter what they are.
With these assumptions:
$ grep -oP '\bSTATUS\(\K[^)]*(?=\))' << EOF
> QMNAME(QMTKGW01) STATUS(Running)
> QMNAME(QMTKGW01) STATUS(Ended normally)
> EOF
Running
Ended normally
Explanation:
Command options: o to return only the matched substring; P to use Perl extensions (the \K marker and the lookahead). The regexp: we look for a word boundary (\b) - so the word STATUS is a complete word, not part of a longer word like SUBSTATUS; then the word STATUS and opening parenthesis. This is required for a match, but \K instructs that this part of the matched string will not be returned in the output. Then we seek zero or more non-closing-parenthesis characters ([^)]*) and we require that this be followed by a closing parenthesis - but the closing parenthesis is also not included in the returned string. That's a "lookahead" (the (?= ... ) construct).

What does the following sed statement mean

sed 's/<img src=\"\([^"]*\).*/\1/g'
input:
<img src="https://geo.yahoo.com/b?s=792600534"; height="1" width="1" style="position: absolute;" />
output:
https://geo.yahoo.com/b?s=792600534
The part between the first two slashes is the regular expression to match; it contains a capturing group, later referred to as \1 (the first capturing group). It extracts the value of the src attribute.
First part of the regex -> <img src=\"
capturing group -> \([^"]*\)
rest of the regex -> .*
The expression inside the square brackets could be read as: "anything not a double quote".
sed is a scripting language. Its s command performs substitutions using regular expressions. The syntax is s/regex/replacement/flags. In your example, you have the regex
<img src=\"\([^"]*\).*
and the replacement
\1
and the flags
g
The regex is apparently attempting to parse HTML, which earns you a place in a warm location where a friendly gentleman with a pitchfork helps you with motivational issues. Far, far away, God reluctantly ends the life of a fluffy kitten.
The regular expression contains a capturing group, which is simply the text which matched between the parentheses. The replacement \1 refers back to this captured text. So in brief, you are taking away the parts which matched around this captured string.
s/foo\(bar\)baz/\1/
replaces foobarbaz with just bar, retrieving the "bar" part from whatever matched, rather than hard-coding a replacement string.
The regular expression .* matches any character any number of times; the regular expression engine will prefer the longest, leftmost possible match.
The bracket expression [^"] matches a single character which is not (newline or) ", and the * again says to match as many times as possible. So "\([^"]*\)" finds a double-quoted string and captures its contents; the negated " prevents the regular expression from matching past the closing quote when matching as many characters as possible. (As noted in comments, the backslash before the first " is unnecessary, but basically harmless. It just tells us that whoever wrote this isn't a regex wizard.)
However, your example just implicitly includes the closing quote in the .* match which will simply match everything from the closing quote through to the end of the line.
The g flag says to repeat the substitution command as many times as possible; so if an input line contains multiple matches, all of them will be replaced. (Without the g flag, sed will just replace the first match it finds on a line.) But since you just removed the rest of the line, the flag isn't actually useful here; there can only ever be a single match.
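To see that the g flag makes no difference here, the command can be run both ways on one line. The sample line is adapted from the input above and is assumed to carry a full URL in src. The unnecessary backslash before the quote is dropped, and the variable names are illustrative.

```shell
# Sample line adapted from the question (full URL assumed in src).
html='<img src="https://geo.yahoo.com/b?s=792600534"; height="1" width="1" style="position: absolute;" />'

# With and without g: the trailing .* consumes the rest of the line,
# so there is never a second match for g to act on.
with_g=$(printf '%s\n' "$html" | sed 's/<img src="\([^"]*\).*/\1/g')
without_g=$(printf '%s\n' "$html" | sed 's/<img src="\([^"]*\).*/\1/')

printf '%s\n%s\n' "$with_g" "$without_g"
```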
The gentleman with the pitchfork doesn't want me to tell you this, but this code is not suitable for a general-purpose script. There is no guarantee that the src attribute of the img element will be immediately adjacent to the img opening tag with just a single space in between; HTML allows arbitrary spacing (including a line wrap) and you can have other attributes like id or alt or title which could go before or after the src attribute. The proper solution is to use an HTML parser to extract the src attributes of img tags with proper understanding of the surrounding syntax.
xmlstarlet sel -T -t -m "/img" -m "@src" -v '.' -n
... though the stray semicolon after the src attribute is an HTML syntax violation; is it really there in your input?
(xmlstarlet command line shamelessly adapted from https://stackoverflow.com/a/3174307/874188)

sed - remove specific subscript from string

Please provide a sed one-liner that produces this output:
sdc3 sdc2
for this input:
sdc3[1] sdc2[0]
That is, remove every subscript value from the string.
sed 's/\[[^]]*\]//g'
reads: substitute any string with literal "[" followed by zero or more characters that aren't a "]", and then the closing "]", with an empty string.
You need the [^]] bit to prevent greedy matching treating "[1] sdc2[0]" as a single match in your sample string.
As for your comment:
sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g'
I switched the separator from the usual '/' to '#', just to avoid escaping the /dev/ bit you asked for (I won't say "for clarity")
the \(...\) bit matches a subgroup, here sdc2 or whatever, so we can refer to it in the replacement
the subgroup uses a similar character class to the one we used discarding the index: [^[ ] means any character except an "[" (again, to avoid greedily matching the index) or a space (assuming your values are space-delimited as per your post)
the replacement is now the literal "/dev/" followed by the first (and only) subgroup match
the g flag at the end tells it to perform multiple matches per line, instead of stopping at the first one
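Both commands can be checked against the sample string from the question. This is a quick sketch that assumes space-delimited values, as in the post; the variable names are made up.

```shell
input='sdc3[1] sdc2[0]'

# Strip every [...] subscript.
stripped=$(printf '%s\n' "$input" | sed 's/\[[^]]*\]//g')

# Strip the subscript and prefix each name with /dev/.
prefixed=$(printf '%s\n' "$input" | sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g')

printf '%s\n%s\n' "$stripped" "$prefixed"
```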