sed too greedy (+ vs *) - sed

I have lines like this:
scaffold157|size21652:7243-9055/0_1813 10 -2127 86.5772 0 272 854 1813 1 185842 186425 147764049 254
I need to remove the part from "/" up to the next word boundary (the first tab), so in my example this part:
/0_1813
with this result:
scaffold157|size21652:7243-9055 10 -2127 86.5772 0 272 854 1813 1 185842 186425 147764049 254
However, my sed seems to be too greedy with
sed 's/\/0_.*\b//'
which eats all the remaining columns. With .+ instead, the command doesn't work at all and nothing is replaced. What am I doing wrong? Why is .+ not working?

The reason .+ behaves the way you are seeing is that + is only a metacharacter in EREs, and sed uses BREs by default. Unless you enable EREs by adding -r (or -E), or escape it as \+ (a GNU extension), sed treats + as a literal plus character.
That's an aside though, all you need is:
$ sed 's|/[^[:space:]]*||' file
scaffold157|size21652:7243-9055 10 -2127 86.5772 0 272 854 1813 1 185842 186425 147764049 254
You can probably replace [[:space:]] with \s and [^[:space:]] with \S in some seds, e.g. GNU.
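To see the BRE/ERE difference in isolation, here is a minimal sketch. The sample line is a shortened, hypothetical version of the one in the question, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# Shortened sample line (hypothetical); fields are separated by a single space.
line='scaffold157:7243-9055/0_1813 10 -2127'

# BRE (default): + is a literal plus sign, so nothing matches and the line is unchanged.
bre=$(printf '%s\n' "$line" | sed 's|/0_.+||')

# ERE (-E): + means "one or more"; the negated class stops at the first whitespace.
ere=$(printf '%s\n' "$line" | sed -E 's|/0_[^[:space:]]+||')

printf '%s\n' "$bre" "$ere"
```

The first output line is the untouched input; the second has /0_1813 removed and the separator kept.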

Match digits instead:
sed 's/\/0_[0-9]*//'
Or negated spaces:
sed 's/\/0_[^ \t]*//'
sed 's/\/0_[^[:blank:]]*//'
sed -r 's/\/0_\S*\b//'
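With negated spaces, \b is no longer necessary, since the match already stops at the first whitespace character. Note that \t, \S and \b are GNU extensions; [^[:blank:]] is the portable form.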
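The original command eats every column because .* is greedy: it runs to the end of the line and only backs off as far as the last word boundary. A small sketch of the difference, using a shortened, hypothetical sample line with tab separators (\b is a GNU extension, so the first command assumes GNU sed):

```shell
# Shortened sample line (hypothetical), tab-separated like the data in the question.
line=$(printf 'scaffold157|size21652:7243-9055/0_1813\t10\t-2127\t254')

# Greedy: .* extends to the LAST word boundary, which here is the end of the line.
greedy=$(printf '%s\n' "$line" | sed 's/\/0_.*\b//')

# Negated class: the match cannot cross a space or tab, so it stops at the first one.
fixed=$(printf '%s\n' "$line" | sed 's/\/0_[^[:blank:]]*//')

printf '%s\n' "$greedy" "$fixed"
```

The greedy version leaves only the first column; the negated-class version keeps all the remaining columns.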

I need to remove part from "/" until word boundary (first tab)
This one-liner gives your expected output:
sed -r 's#/\S*\b##'

Related

Using sed, insert a space at the 3rd last index of each line

I would like to insert a space, before the 3rd last character of each line, to turn this:
CC287999221
CHGFFDTTT34AAA387
CH654AZ0987XX277
Into this:
CC287999 221
CHGFFDTTT34AAA 387
CH654AZ0987XX 277
So far I've tried:
sed -i 's/.*\(...\)/ \1/' file
However, this also removes the preceding text.
Thank you
One way:
sed 's/\(...$\)/ \1/' file
Just match the last 3 characters, and in the replacement put a space followed by the matched group (\1).
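The attempt in the question drops the leading text because .* is part of the match, and everything that is matched gets replaced. A minimal sketch comparing the anchored version with one that captures both parts and puts them back (the variable names are only illustrative):

```shell
input='CC287999221
CHGFFDTTT34AAA387
CH654AZ0987XX277'

# Anchor on the last three characters and re-emit them after a space.
anchored=$(printf '%s\n' "$input" | sed 's/\(...\)$/ \1/')

# Equivalent: capture the leading text as well and put both groups back.
two_groups=$(printf '%s\n' "$input" | sed 's/^\(.*\)\(...\)$/\1 \2/')

printf '%s\n' "$anchored"
```

Both variants should produce the three lines shown in the question's desired output.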
With awk, you could try the following:
awk '{print substr($0,1,length($0)-3),substr($0,length($0)-2)}' Input_file
Tested with GNU sed:
sed -E 's/\S{3}\s*$/ &/' file
Another awk proposal:
awk '{sub(/.{3}$/," &")}1' file
CC287999 221
CHGFFDTTT34AAA 387
CH654AZ0987XX 277

Join certain lines with sed

I have an input which looks like this:
1
2

3
4
5

6
And I want to transform it with sed to:
12
345
6
I know it can be easily done with other tools but I want to do it specifically with sed as a learning exercise.
I have attempted this:
sed ':x ; /^ *$/{ N; s/\n// ; bx; }'
But it prints:
123456
Can someone help me fix this?
Quoting from the GNU sed manual:
A common technique to process blocks of text such as paragraphs (instead of line-by-line) is using the following construct:
sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
The first expression, /./{H;$!d} operates on all non-empty lines, and adds the current line (in the pattern space) to the hold space. On all lines except the last, the pattern space is deleted and the cycle is restarted.
The other expressions x and s are executed only on empty lines (i.e. paragraph separators). The x command fetches the accumulated lines from the hold space back to the pattern space. The s/// command then operates on all the text in the paragraph (including the embedded newlines).
And indeed,
sed '/./{H;$!d} ; x ; s/\n//g'
does what you want.
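A quick way to try this out, assuming the input has blank lines between the groups (after 2 and after 5), which is what the paragraph idiom relies on. GNU sed accepts {H;$!d} without a semicolon before the closing brace; other seds may need {H;$!d;}.

```shell
# Sample input with blank lines separating the groups (an assumption about the original data).
joined=$(printf '1\n2\n\n3\n4\n5\n\n6\n' | sed '/./{H;$!d} ; x ; s/\n//g')

printf '%s\n' "$joined"
```

Each non-empty line is appended to the hold space; on a blank line (or on the last line) the accumulated block is swapped back in and its embedded newlines are removed.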
FWIW here's how to really do that task in UNIX:
$ awk -v RS= -v OFS= '{$1=$1}1' file
12
345
6
The above will work on any UNIX box.
A GNU awk approach:
$ awk -F"\n" '{gsub("\n","");}1' RS='\n{2,}' file
12
345
6
Note that it will add a trailing newline (\n) after the last line.

sed pattern replace, trailing spaces ending with + only with +

input
Summary of differences with Numeric +34
First-step changes +34
output
Summary of differences with Numeric+34
First-step changes+34
Did not find answer here
Does this work for you?
sed 's/ *+/+/'
Add the g flag if you want multiple replacements per line.
sed -r 's|\s+([+]\S+)$|\1|' file
Output:
Summary of differences with Numeric+34
First-step changes+34
You can for example do:
$ sed -r 's/\s{2,}\+/+/g' file
Summary of differences with Numeric+34
First-step changes+34
This removes runs of whitespace (at least 2 characters) whenever they are followed by the + character. Note that with -r, + has to be escaped to be interpreted as a literal character and not as a regex quantifier.
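The answers above differ mainly in whether + is literal (BRE) or a quantifier (ERE). A minimal sketch of both, on sample lines where the number of spaces before the + is an assumption:

```shell
# Sample lines with several spaces before the + (the exact count is an assumption).
input='Summary of differences with Numeric    +34
First-step changes   +34'

# BRE: + is literal here, so " *+" means "any number of spaces, then a plus".
bre=$(printf '%s\n' "$input" | sed 's/ *+/+/')

# ERE: + is a quantifier, so the literal plus must be written as \+.
ere=$(printf '%s\n' "$input" | sed -E 's/[[:space:]]+\+/+/')

# With {2,}, a single space before the + is left alone.
single=$(printf 'a +1\n' | sed -E 's/[[:space:]]{2,}\+/+/g')

printf '%s\n' "$bre"
```

The BRE and ERE forms give the same result on these lines, while the {2,} form only acts when at least two whitespace characters precede the +.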

Replace particular string at fixed position using sed

I have a simple text file containing several lines of data where each line has got exactly 26 characters.
E.g.
10001340100491938001945591
10001340100491951002049591
10001340100462055002108507
10001340100492124002135591
10001340100492145002156507
10001340100472204002205591
Now, I want to replace the 12th and 13th characters with 58, but only if those two characters are 49.
I tried it like this:
sed 's/^(.{12})49/\58/' data.txt
but am getting this error:
sed: -e expression #1, char 18: invalid reference \5 on `s' command's RHS
Thanks in advance
The captured group is \1, so you want to put the 11 (not 12) characters, then 58:
sed -E 's/^(.{11})49/\158/' data.txt
You also need -E or -r if you don't want to escape the parentheses and curly braces.
With your input, the command changes 49 to 58 in lines 1, 2, 4 and 5.
If you'd like to try an awk solution:
awk 'substr($0,12,2)==49 {$0=substr($0,1,11)"58"substr($0,14)}1' file
10001340100581938001945591
10001340100581951002049591
10001340100462055002108507
10001340100582124002135591
10001340100582145002156507
10001340100472204002205591
In your expression you are replacing the first 14 characters (if you got it right). But you need to match the first 11 plus the two (12th, 13th) you want to replace. More importantly, sed is fussy about escaping brackets: in its default (basic) syntax you need backslashes in front of ( and { etc. Finally, the number of the capture group is 1. You omitted that number.
Putting it all together, you get
sed 's/^\(.\{11\}\)49/\158/'
There are flags you can use in sed to make the expressions more "regular" (swapping the meaning of ( and \(). Which flag that is depends on your platform: GNU sed accepts -r or -E, and BSD/macOS sed accepts -E.
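For comparison, a minimal sketch of the BRE and ERE spellings side by side, run on two of the sample lines from the question (one with 49 at positions 12 and 13, one with 46). It assumes a sed that accepts -E.

```shell
input='10001340100491938001945591
10001340100462055002108507'

# BRE: groups and intervals need backslashes. \158 is read as \1 followed by "58".
bre=$(printf '%s\n' "$input" | sed 's/^\(.\{11\}\)49/\158/')

# ERE (-E): the same expression without the backslashes.
ere=$(printf '%s\n' "$input" | sed -E 's/^(.{11})49/\158/')

printf '%s\n' "$bre"
```

Only the first line changes; the second is left as-is because its 12th and 13th characters are 46.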

What does this sed expression mean?

I'm trying to tool around with some scripts I have inherited at work and wanted to see if someone could decipher what this expression is attempting to accomplish:
|sed -e 's#\(.\{36\}\)\(.*\)#\1|\2#g' | sed -e 's#\(.\{49\}\)\(.*\)#\1|\2#g'
I have tried to reverse engineer this via the reference manuals and google, but have not been successful.
Thanks!
This is two sed statements. The first inserts a pipe character ('|') after the first 36 characters of the line, the second inserts a pipe character after the first 49 characters (including the pipe it inserted in the first step).
As far as I can tell, these could be written more concisely with the same effect:
|sed -e 's#\(.\{36\}\)#\1|#' | sed -e 's#\(.\{49\}\)#\1|#'
It means:
insert a '|' after the first 36 chars of each line;
in that output, insert a '|' after the first 49 chars.
Each insertion is done only if the line contains at least 36 chars, respectively 49 chars.
You can write it more concisely like this:
| sed ' s:^.\{36\}:&|:; s:^.\{49\}:&|: '
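To check where the pipes end up, here is a small sketch that runs the two substitutions on made-up lines of zeros (60 and 40 characters long; the data and variable names are hypothetical):

```shell
# Test lines made of zeros: 60 characters and 40 characters (hypothetical data).
long=$(printf '%060d' 0)
short=$(printf '%040d' 0)

# Same two substitutions in a single sed call; & stands for the matched text.
out_long=$(printf '%s\n' "$long" | sed 's/^.\{36\}/&|/; s/^.\{49\}/&|/')
out_short=$(printf '%s\n' "$short" | sed 's/^.\{36\}/&|/; s/^.\{49\}/&|/')

printf '%s\n' "$out_long" "$out_short"
```

On the 60-character line the pipes land at positions 37 and 50. The 40-character line only gets the first pipe, because it is still shorter than 49 characters after the first insertion.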