How to replace a character within a matched pattern using ampersand (&) - sed

When sed matches a pattern, the matched text is available in the replacement as the ampersand (&). Is there a way to replace a character in this matched text using the ampersand itself?
For example, if & contains the string "apple1", how can I use & to turn the string into "apple2" (i.e. replace 1 with 2)?

If I guessed right, what you want is to apply a substitution within the matched pattern. You can't do that using &. You want to do this instead:
echo apple1 apple3 apple1 apple2 botemo1 | sed '/apple./ { s/apple1/apple2/g; }'
This executes the substitution command only on the lines that match the pattern /apple./.

You can also use a capture group. A capture is used to grab a part of the match and save it into an auxiliary variable, that is named numerically in the order that the capture appears.
echo apple1 | sed -e 's/\(a\)\(p*\)\(le\)1/\1\2\32/g'
We used three captures:
The first one, stored in \1, contains an "a"
The second one, stored in \2, contains a sequence of "p"s (in the example it contains "pp")
The third one, stored in \3, contains the sequence "le"
Now we can print the replacement using the matches we captured: \1\2\32. Notice that we use three captures to generate "apple" and then append a 2. This won't be interpreted as capture \32 because sed supports at most nine captures (\1 to \9).
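A minimal sketch of the difference, assuming a POSIX sed and a made-up input line: & can only put the whole match back, while a capture group lets you keep part of the match and rewrite the rest.

```shell
# Hypothetical sample input, used only for illustration.
input='apple1 pear1'

# & stands for the entire match, so it can be wrapped or repeated but not edited.
wrapped=$(printf '%s\n' "$input" | sed 's/apple1/[&]/')

# Capture "apple" as \1 and write a new digit after it (\1 followed by a literal 2).
changed=$(printf '%s\n' "$input" | sed 's/\(apple\)1/\12/')

printf '%s\n%s\n' "$wrapped" "$changed"
```

The first line printed is [apple1] pear1 and the second is apple2 pear1; pear1 is untouched in both because it is never part of the match.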
Hope this helps =)

You can first match a pattern and then change the text only if it matched:
echo "apple1" | sed '/apple/s/1/2/' # gives you "apple2"
This changes the first 1 to 2 on every line containing apple. Add the g flag to change every 1 on those lines.
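A small sketch of how the address and the g flag interact, assuming a POSIX sed; the two input lines are made up for the example.

```shell
# Without g, only the first 1 on each line that contains "apple" is changed.
first_only=$(printf '%s\n' 'apple1 apple1' 'pear1' | sed '/apple/s/1/2/')

# With g, every 1 on those lines is changed; lines without "apple" are still skipped.
every=$(printf '%s\n' 'apple1 apple1' 'pear1' | sed '/apple/s/1/2/g')

printf '%s\n---\n%s\n' "$first_only" "$every"
```

In both runs the pear1 line passes through unchanged because the /apple/ address does not select it.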

This might work for you (GNU sed and Bash):
sed 's/apple1/sed "s|1|2|" <<<"&"/e' file


getting the first letter of a filtered part in sed

I have a filename e.g. 15736--1_brand-new-image.jpg
My goal is to get the first letter after the _ in this case the b.
With s/\(.*\)\_\(.*\)$/\2/ I am able to extract brand-new-image.jpg
which is partly based on the info found on https://www.oncrashreboot.com/use-sed-to-split-path-into-filename-extension-and-directory
I've already found "get first letter of words using sed" but failed to combine the two.
To validate my sed statement I've used https://sed.js.org/
How can I apply a second sed expression to the part I've filtered to get its first letter?
With the samples you have shown, please try the following.
echo "15736--1_brand-new-image.jpg" | sed 's/[^_]*_\(.\).*/\1/'
Explanation: the substitution matches everything up to the first _, saves the next single character in a back reference, and lets .* cover the rest of the line. The whole line is then replaced with the first back reference, which is the character right after the first _ (b in this case).
The following annotated version is for explanation purposes only; it is not meant to be run as written.
sed ' ##Starting sed program from here.
s/ ##using s to tell sed to perform substitution operation.
[^_]*_\(.\).* ##using regex to match till 1st occurrence of _ then using back reference \(.\) to catch value in temp buffer memory here.
/\1/ ##Substituting whole line with 1st back reference value here which is b in this case.
'
Using . or \w could also match an _ when there are two consecutive underscores (__).
If you want to match the first alphanumeric character and never an _, you could also use
echo "15736--1_brand-new-image.jpg" | sed 's/[^_]*_\([[:alnum:]]\).*/\1/'
Output
b
This might work for you (GNU sed):
sed -nE 's/^[^_]*_[^[:alpha:]]*([[:alpha:]]).*/\1/p' file
Since this is a filtering type of operation, use the -n option to print only when there is a positive match.
Match the first _ from the start of the line and then discard any non-alpha characters until an alpha character and finally discard any other characters.
Print the result if there is a match.
N.B. Anchoring the match to the start of the line prevents the result from containing more than one character; consider the string 123_456_abc, which might otherwise result in 4 or 123_a.
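To see why the character class matters, here is a sketch that runs two of the commands above on the tricky string 123_456_abc. It assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
name='123_456_abc'

# Takes whatever single character follows the first _, which is a digit here.
any_char=$(printf '%s\n' "$name" | sed 's/[^_]*_\(.\).*/\1/')

# Skips non-alphabetic characters after the first _ and keeps the first letter.
first_alpha=$(printf '%s\n' "$name" | sed -nE 's/^[^_]*_[^[:alpha:]]*([[:alpha:]]).*/\1/p')

printf '%s %s\n' "$any_char" "$first_alpha"
```

The first command yields 4 and the second yields a, so which one is right depends on whether "first letter" means the first character or the first alphabetic character.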

how to replace each ,, with ,?, using sed?

I have tried the following command:
echo "123456,,7,,,,890" | sed 's/,,/\,?,/g'
Result:
123456,?,7,?,,?,890
But the result I want is:
123456,?,7,?,?,?,890
Could anyone help me ?
Thanks
Your problem is that the ,, left in the result was never seen by the g flag.
One of its two commas comes from the previous replacement, and g does not rescan text that has already been replaced.
To get your desired output (I would have expected only three replacements instead of four...) you need to look at the result of one replacement and replace again, until no more replacements take place.
You can achieve that by making a loop, with :a, i.e. the label "a" and then go back after a successful replacement with ta, "to label a".
(The g becomes unnecessary, but might be more efficient. Time it to find out in your environment.)
sed ':a;s/,,/\,?,/g;ta'
result
123456,?,7,?,?,?,890
Regular expressions cannot match overlapping spans. Thus, if you have four commas (,,,,), the first two commas will be the first match, and the third and fourth commas will be the second match. There is no way to also match the second and third commas with /,,/.
Typically, this would be done using a lookahead, so that one of the commas is not part of the match; but sed does not support lookahead. So you can switch to a more powerful regex engine, like that of Perl:
echo "123456,,7,,,,890" | perl -pe 's/,(?=,)/,?/g'
Alternatively, since in your specific case a single pass only misses every other adjacent comma pair, you can just run your sed twice:
echo "123456,,7,,,,890" | sed 's/,,/,?,/g' | sed 's/,,/,?,/g'
or combine the two operations into one sed invocation:
echo "123456,,7,,,,890" | sed 's/,,/\,?,/g; s/,,/,?,/g'
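A sketch comparing the single pass from the question with the two-pass and looping fixes suggested above. The loop is written with separate -e options, on the assumption that this form is accepted by more sed implementations than the one-line ":a;...;ta" spelling.

```shell
input='123456,,7,,,,890'

# One pass: matches cannot overlap, so one comma pair inside ",,,," is missed.
one_pass=$(printf '%s\n' "$input" | sed 's/,,/,?,/g')

# Two passes: the second substitution picks up the pairs the first one created.
two_pass=$(printf '%s\n' "$input" | sed 's/,,/,?,/g; s/,,/,?,/g')

# Loop: t jumps back to label a for as long as the last s command changed something.
looped=$(printf '%s\n' "$input" | sed -e ':a' -e 's/,,/,?,/g' -e 'ta')

printf '%s\n%s\n%s\n' "$one_pass" "$two_pass" "$looped"
```

The one-pass result is 123456,?,7,?,,?,890, while the two-pass and looped results are both 123456,?,7,?,?,?,890.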

Extract filename from multiple lines in unix

I'm trying to extract the name of a file that has been generated by a Java program. This Java program prints multiple lines and I know exactly what the format of the file name is going to be. The informational text that the Java program prints is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed operator in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the filename to stop at .xml. The last dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed because you don't match anything after the capture groups
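One way to make the points above visible is to wrap the capture in angle brackets instead of printing it bare. This is only a diagnostic sketch; the < and > markers are my addition and not part of the original command.

```shell
line='Generated file YANNANI-0008876_17.xml.'

# Same regex as in the question, but the replacement marks what \1 really holds.
shown=$(printf '%s\n' "$line" | sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/<\1>/p')

printf '%s\n' "$shown"
```

This prints <YANNANI-00>08876_17.xml. The match ends after two characters following the dash, and the rest of the line, including the final period, is simply left in place because nothing in the regex consumes it.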
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround the regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
You can call sed twice: first in printing and then in replacement mode:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The last sed removes the trailing . from the end of every line produced by your first sed.
Or you can go for an awk solution if you prefer:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field, with its last character cut off by substr, for every line that matches the regex.

Perl one-liner: deleting a line with pattern matching

I am trying to delete a bunch of lines in a file if they match a particular pattern, which is variable.
I am trying to delete a line which matches with abc12, abc13, etc.
I tried writing a C-shell script, and this is the code:
#!/bin/csh
foreach $x (12 13 14 15 16 17)
perl -ni -e 'print unless /abc$x/' filename
end
This doesn't work, but when I use the one-liner without a variable (abc12), it works.
I am not sure if there is something wrong with the pattern matching or if there is something else I am missing.
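Yes, the problem is that you're using single quotes. Inside single quotes the shell does not expand $x, so Perl receives the text $x unchanged and treats it as its own (unset) variable instead of 12, 13, and so on.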
Of course, you're also doing it very inefficiently, because you're processing each file multiple times.
If you're looking to remove lines abc12 to abc17 you can do this all in one go:
perl -n -i.bak -e 'print unless m/abc1[234567]/' filename
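The quoting point can be checked without running Perl at all by looking at the string the shell would hand over. This sketch uses POSIX sh syntax rather than csh (an assumption on my part; csh treats single and double quotes the same way for variables).

```shell
x=12

# Single quotes: the shell leaves $x alone, so the program text still contains "$x".
single='print unless /abc$x/'

# Double quotes: the shell substitutes the value of x before the program is run.
double="print unless /abc$x/"

printf '%s\n%s\n' "$single" "$double"
```

The first line printed still contains $x, while the second contains abc12, which is the pattern the loop was meant to pass to Perl.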
Try this
perl -n -i.bak -e 'print unless m/abc1[2-7]/' filename
Using the range [2-7] simply removes the need to type [234567], which saves you three keystrokes.
man 1 bash: Pattern Matching
[...] Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the [ is a ! or a ^ then any character not enclosed is matched.
A - may be matched by including it as the first or last character in the set. A ] may be matched by including it as the first character in the set.
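For comparison, the same bracket range can be used in a sed address with the d command. This is a sketch on made-up input lines, assuming the goal is still to drop abc12 through abc17; it prints the result instead of editing a file in place.

```shell
# Delete every line that contains abc1 followed by a digit from 2 to 7.
kept=$(printf '%s\n' abc11 abc12 abc15 abc17 abc18 | sed '/abc1[2-7]/d')

printf '%s\n' "$kept"
```

Only abc11 and abc18 survive, since 1 and 8 fall outside the range.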

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark
Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with an underscore for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
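Here is the four-step command run against the sample line from the question, split into one -e per step so each stage is easier to read. It assumes GNU sed, because it relies on \n in the replacement inserting a newline.

```shell
line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"

result=$(printf '%s\n' "$line" | sed \
  -e "s/\('[^']*'\)/\n&\n/g" \
  -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" \
  -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" \
  -e "s/\n//g")

printf '%s\n' "$result"
```

The dots inside the quoted strings become - and _ as requested, and the unquoted MGCDC1008.S1 and MGCDC1009.A2 keep their dots.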
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon fed (instead of semicolons between commands, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/
This answers your specific question for the '10.0uF' case. If the pattern you need to fix isn't always like the example you provided, then you'll need multiple copies of this line, with the regular expressions modified to match your new change targets.
Note that the command is in two parts: "/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/" selects the lines that contain this pattern, while the trailing "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/" does the substitution. The single quotes are repeated in the replacement so that they are kept in the output. You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
The \(\) pairs in match pattern get converted into the numbered buffers on the substitution side of the command (i.e. \1 \2). This is what gives sed power that awk doesn't have.
If you're going to do much of this kind of work, I highly recommend O'Reilly's sed & awk book. The time spent going through how sed works will be paid back many times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer.
This is a job most suitable for awk or any language that supports breaking/splitting strings.
IMO, using sed for this task, which is regex based, is doable but difficult to read and debug, hence not the most appropriate tool for the job. No offense to sed fanatics.
awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /\047/) {       # field contains a single quote
      gsub(/\./, "_", $i)    # escape the dot: a bare "." would match every character
    }
  }
}1' file
The above says: for each field (the field separator is whitespace by default), check whether it contains a single quote, and if it does, substitute every "." with "_". This method is simple and doesn't need a complicated regex.
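One detail worth checking in this approach is how the dot is passed to gsub, since a string "." is treated as a regex that matches any character. A small sketch on a shortened, made-up line; passing the quote character in through -v q is my choice to avoid escaping it inside the awk program.

```shell
line="'10.0uF' MGCDC1008.S1"

# /\./ matches a literal dot only, so just the dot inside the quoted field changes.
escaped=$(printf '%s\n' "$line" |
  awk -v q="'" '{ for (i = 1; i <= NF; i++) if (index($i, q)) gsub(/\./, "_", $i) } 1')

# "." matches any character, so the whole quoted field is overwritten.
any_char=$(printf '%s\n' "$line" |
  awk -v q="'" '{ for (i = 1; i <= NF; i++) if (index($i, q)) gsub(".", "_", $i) } 1')

printf '%s\n%s\n' "$escaped" "$any_char"
```

The first result is '10_0uF' MGCDC1008.S1; the second turns the quoted field into eight underscores, which is the failure mode to watch for.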